_ MediaWiki Software _ Your two rights on Wikipedia

Posted by: thekohser

I've heard it said that users have two rights on Wikipedia:

1. The right to "fork" the database
2. The right to leave the project

So, I'm thinking about forking the English Wikipedia. How exactly does one go about doing that? I thought that the Wikimedia Foundation had given up about 18 months ago on trying to produce regularly available data dumps of the entire project, presumably because their servers were choking on the process.

Is it now incumbent on a forker of the mother database (the "mother forker") to execute the entire process from "outside" Wikipedia?

And another question -- how might one fork the Simple English Wikipedia (http://simple.wikipedia.org/wiki/Main_Page), which has a much more manageable 25,704 articles?

Greg

Posted by: Nathan

Find the data dumps then import them into your database?

Posted by: thekohser

QUOTE(Nathan @ Mon 18th February 2008, 2:43pm) *

Find the data dumps then import them into your database?


Sure, but where are these elusive data dumps? I thought the last stable, successful one was back at the end of 2006!?

Greg

Posted by: GlassBeadGame

QUOTE(thekohser @ Mon 18th February 2008, 2:53pm) *

QUOTE(Nathan @ Mon 18th February 2008, 2:43pm) *

Find the data dumps then import them into your database?


Sure, but where are these elusive data dumps? I thought the last stable, successful one was back at the end of 2006!?

Greg


I don't think there is any way to execute a dump from outside, as you would need sufficient permissions on the database, so you would have to rely on an existing publicly available dump. It would still be an interesting project even if the dump were rather old. After all, it's not like the project is improving anymore. You could insist on IRL identities of editors, respect experts, treat businesses with respect, exercise editorial restraint and implement BLP reform. I think the approach would be like marble sculpture: cut away everything that doesn't look like an encyclopedia. You would have a much better product within a year, even with only a modest number of committed editors.

Posted by: gomi

This is one of the big lies of Wikipedia -- that you can fork it. There have been successful backups during 2007 -- as recently as December, but they get removed as soon as they are complete. There is a very small window in which to pick one up. Wordbomb has some, but I think they are old.


Posted by: EternalIdealist

The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google. :rolleyes:

http://download.wikimedia.org/backup-index.html

http://en.wikipedia.org/wiki/Wikipedia:Database_download

Posted by: Somey

Yeah! I even took a photo of one, just the other day:

[image]


I'm not sure how you'd fork something like that, though. Maybe a pitchfork...

Posted by: thekohser

QUOTE(EternalIdealist @ Wed 20th February 2008, 12:33am) *

The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google. :rolleyes:

http://download.wikimedia.org/backup-index.html

http://en.wikipedia.org/wiki/Wikipedia:Database_download


LOL. Try clicking the link to http://static.wikipedia.org/wikipedia/en/index.html. (Doesn't work.)

Try grabbing the XML dump of just the most current versions of the English Wikipedia, http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2. (Doesn't work.)

So, you were rolling your eyes, because...?

Posted by: Pumpkin Muffins

QUOTE(thekohser @ Mon 18th February 2008, 7:05pm) *

I've heard it said that users have two rights on Wikipedia:

1. The right to "fork" the database
2. The right to leave the project

So, I'm thinking about forking the English Wikipedia. How exactly does one go about doing that? I thought that the Wikimedia Foundation had given up about 18 months ago on trying to produce regularly available data dumps of the entire project, presumably because their servers were choking on the process.

Is it now incumbent on a forker of the mother database (the "mother forker") to execute the entire process from "outside" Wikipedia?

And another question -- how might one fork the Simple English Wikipedia (http://simple.wikipedia.org/wiki/Main_Page), which has a much more manageable 25,704 articles?

Greg


To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.




Posted by: thekohser

QUOTE(Pumpkin Muffins @ Wed 20th February 2008, 1:19am) *

To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.


Pumpkin, I realize that. Show me where I can get a working copy of the 6 GB file of "All pages, current versions only". Please!

Posted by: Pumpkin Muffins

QUOTE(thekohser @ Wed 20th February 2008, 6:24am) *

QUOTE(Pumpkin Muffins @ Wed 20th February 2008, 1:19am) *

To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.


Pumpkin, I realize that. Show me where I can get a working copy of the 6 GB file of "All pages, current versions only". Please!


http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-meta-current.xml.bz2 or http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2 ... I don't know if these files are functional, though. The XML dumps need to be converted to SQL before they can be imported; see http://meta.wikimedia.org/wiki/Xml2sql.
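
One rough way to find out whether a downloaded file is functional, before bothering with any conversion, is to stream through it and count the pages. The Python sketch below is only an illustration, not part of any Wikimedia tooling. It assumes the dump is a bzip2-compressed MediaWiki XML export made of `<page>` elements that each contain a `<title>`; the function names and the file path passed on the command line are hypothetical. A truncated or corrupt archive should raise an error partway through instead of returning a count.

```python
import bz2
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(stream):
    """Yield the <title> text of every <page> element in an XML export stream.

    Assumption: the stream is a MediaWiki XML export. The export namespace
    differs between schema versions, so tags are compared by local name only.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            title = None
            for child in elem:
                if local_name(child.tag) == "title":
                    title = child.text
                    break
            yield title
            # Drop pages already processed so memory use stays flat.
            root.clear()


def count_pages(path):
    """Decompress the .bz2 file at `path` on the fly and count its pages."""
    with bz2.open(path, "rb") as f:
        return sum(1 for _ in iter_page_titles(f))


if __name__ == "__main__":
    # Hypothetical usage: python count_pages.py some-dump.xml.bz2
    print(count_pages(sys.argv[1]))
```

Streaming matters here because the decompressed XML is far larger than the download, so the sketch never holds more than one page in memory at a time.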

Posted by: Nathan

QUOTE(thekohser @ Wed 20th February 2008, 12:52am) *

QUOTE(EternalIdealist @ Wed 20th February 2008, 12:33am) *

The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google. :rolleyes:

http://download.wikimedia.org/backup-index.html

http://en.wikipedia.org/wiki/Wikipedia:Database_download


LOL. Try clicking the link to http://static.wikipedia.org/wikipedia/en/index.html. (Doesn't work.)

Try grabbing the XML dump of just the most current versions of the English Wikipedia, http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2. (Doesn't work.)

So, you were rolling your eyes, because...?


There's a dump right here, though: http://download.wikimedia.org/enwiki/20071218/. Oops, not what you want.

Posted by: dtobias

When you gotta take a dump, you gotta take a dump!

To the tune of the William Tell Overture / Lone Ranger theme:

Take a dump, take a dump, take a dump dump dump
Take a dump, take a dump, take a dump dump dump
Take a dump, take a dump, take a dump dump dump
Every day, take a dump dump dump!


Posted by: Error59

Dtobias - you may enjoy http://en.wikipedia.org/wiki/The_Diarrhea_Song :happy:

Posted by: thekohser

Reminds me of a song a co-worker of mine would sing from the Men's room when I worked in a carpet warehouse as a teenager --

Stranded! Stranded! Stranded on the bathroom bowl...

What do you do, when you just had a poo...

And you gotta have a roll?!

Posted by: JohnA

I assume that Wikipedia has told you to go fork yourself?

Greg, this is probably what you want: http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2

Posted by: thekohser

QUOTE(JohnA @ Wed 20th February 2008, 9:38am) *

I assume that Wikipedia has told you to go fork yourself?

Greg, this is probably what you want: http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2


Perhaps. We'll see -- I'm 66% downloaded now.

Posted by: JohnA

Now that you've got it, what are you going to do with it?

Posted by: thekohser

QUOTE(JohnA @ Fri 22nd February 2008, 3:11am) *

Now that you've got it, what are you going to do with it?


Stay tuned. I'm assembling a strategy team and will likely be incorporating, either with or without venture capital. We've already discussed what I might do with it, elsewhere on here.

To discuss any more would just be sabotaging my own first-mover advantage.

Greg