MediaWiki Software: Your two rights on Wikipedia
Posted by: thekohser
I've heard it said that users have two rights on Wikipedia:
1. The right to "fork" the database
2. The right to leave the project
So, I'm thinking about forking the English Wikipedia. How exactly does one go about doing that? I thought that the Wikimedia Foundation had given up about 18 months ago on trying to produce regularly available data dumps of the entire project, presumably because their servers were choking on the process.
Is it now incumbent on a forker of the mother database (the "mother forker") to execute the entire process from "outside" Wikipedia?
And another question -- how might one fork the Simple English Wikipedia (http://simple.wikipedia.org/wiki/Main_Page), which has a much more manageable 25,704 articles?
Greg
Posted by: Nathan
Find the data dumps then import them into your database?
Posted by: thekohser
QUOTE(Nathan @ Mon 18th February 2008, 2:43pm)
Find the data dumps then import them into your database?
Sure, but where are these elusive data dumps? I thought the last stable, successful one was back at the end of 2006!?
Greg
Posted by: GlassBeadGame
QUOTE(thekohser @ Mon 18th February 2008, 2:53pm)
QUOTE(Nathan @ Mon 18th February 2008, 2:43pm)
Find the data dumps then import them into your database?
Sure, but where are these elusive data dumps? I thought the last stable, successful one was back at the end of 2006!?
Greg
I don't think there is any way to execute a dump from outside, as you would need sufficient permissions on the database, so you would have to rely on existing publicly available dumps. It would still be an interesting project even if the dump was rather old. After all, it's not like the project is improving anymore. You could insist on IRL identities of editors, respect experts, treat businesses with respect, exercise editorial restraint and implement BLP reform. I think the approach would be like marble sculpture: cut away everything that doesn't look like an encyclopedia. You would have a much better product within a year, even with only a modest number of committed editors.
Posted by: gomi
This is one of the big lies of Wikipedia -- that you can fork it. There have been successful backups during 2007 -- as recently as December, but they get removed as soon as they are complete. There is a very small window in which to pick one up. Wordbomb has some, but I think they are old.
Posted by: EternalIdealist
The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google.
http://download.wikimedia.org/backup-index.html
http://en.wikipedia.org/wiki/Wikipedia:Database_download
Posted by: Somey
Yeah! I even took a photo of one, just the other day:
I'm not sure how you'd fork something like that, though. Maybe a pitchfork...
Posted by: thekohser
QUOTE(EternalIdealist @ Wed 20th February 2008, 12:33am)
The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google.
http://download.wikimedia.org/backup-index.html
http://en.wikipedia.org/wiki/Wikipedia:Database_download
LOL. Try clicking http://static.wikipedia.org/wikipedia/en/index.html. (Doesn't work.)
Try grabbing the XML dump of just the most recent versions of the English Wikipedia: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2 (Doesn't work.)
So, you were rolling your eyes, because...?
Posted by: Pumpkin Muffins
QUOTE(thekohser @ Mon 18th February 2008, 7:05pm)
I've heard it said that users have two rights on Wikipedia:
1. The right to "fork" the database
2. The right to leave the project
So, I'm thinking about forking the English Wikipedia. How exactly does one go about doing that? I thought that the Wikimedia Foundation had given up about 18 months ago on trying to produce regularly available data dumps of the entire project, presumably because their servers were choking on the process.
Is it now incumbent on a forker of the mother database (the "mother forker") to execute the entire process from "outside" Wikipedia?
And another question -- how might one fork the Simple English Wikipedia (http://simple.wikipedia.org/wiki/Main_Page), which has a much more manageable 25,704 articles?
Greg
To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.
Posted by: thekohser
QUOTE(Pumpkin Muffins @ Wed 20th February 2008, 1:19am)
To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.
Pumpkin, I realize that. Show me where I can get a working copy of the 6 GB file of "All pages, current versions only". Please!
Posted by: Pumpkin Muffins
QUOTE(thekohser @ Wed 20th February 2008, 6:24am)
QUOTE(Pumpkin Muffins @ Wed 20th February 2008, 1:19am)
To fork, you'd want "All pages, current versions only", not "All pages with complete edit history". The latter is the one that crashes all the time before completing.
Pumpkin, I realize that. Show me where I can get a working copy of the 6 GB file of "All pages, current versions only". Please!
http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-meta-current.xml.bz2 or http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2 ... don't know if these files are functional, though. The XML dumps need to be converted to SQL with xml2sql: http://meta.wikimedia.org/wiki/Xml2sql
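For anyone wanting to check whether a downloaded dump is functional before converting it, here is a minimal sketch in Python. It assumes the file is a bz2-compressed MediaWiki XML export with one revision per page (the "current versions only" kind); the names local_name, iter_pages and count_pages are made up for illustration and are not part of any Wikimedia tool. It streams the file rather than loading it, so a truncated or corrupt download should surface as an exception partway through.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> in a MediaWiki XML export.

    Assumes one <revision> per page; with a full-history dump only the
    last revision's text seen for each page would be reported.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    title, text = None, ""
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            root.clear()  # discard finished pages so memory stays flat


def count_pages(path):
    """Stream a .xml.bz2 dump from disk and return how many pages it holds."""
    with bz2.open(path, "rb") as stream:
        return sum(1 for _page in iter_pages(stream))
```

Calling count_pages on the downloaded file (for example the pages-articles one) and getting a number back instead of an error is a rough sign the archive is intact. It says nothing about whether the later SQL conversion will succeed.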
Posted by: Nathan
QUOTE(thekohser @ Wed 20th February 2008, 12:52am)
QUOTE(EternalIdealist @ Wed 20th February 2008, 12:33am)
The misconception that database dumps are somehow rare or difficult to come by is one of the most persistent falsehoods. People really should bother to Google. :rolleyes:
http://download.wikimedia.org/backup-index.html
http://en.wikipedia.org/wiki/Wikipedia:Database_download
LOL. Try clicking http://static.wikipedia.org/wikipedia/en/index.html. (Doesn't work.)
Try grabbing the XML dump of just the most recent versions of the English Wikipedia: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-current.xml.bz2 (Doesn't work.)
So, you were rolling your eyes, because...?
There's a dump right here, though: http://download.wikimedia.org/enwiki/20071218/ Oops, not what you want.
Posted by: dtobias
When you gotta take a dump, you gotta take a dump!
To the tune of the William Tell Overture / Lone Ranger theme:
Take a dump, take a dump, take a dump dump dump
Take a dump, take a dump, take a dump dump dump
Take a dump, take a dump, take a dump dump dump
Every day, take a dump dump dump!
Posted by: Error59
Dtobias - you may enjoy http://en.wikipedia.org/wiki/The_Diarrhea_Song
Posted by: thekohser
Reminds me of a song a co-worker of mine would sing from the Men's room when I worked in a carpet warehouse as a teenager --
Stranded! Stranded! Stranded on the bathroom bowl...
What do you do, when you just had a poo...
And you gotta have a roll?!
Posted by: JohnA
I assume that Wikipedia has told you to go fork yourself?
Greg, this is probably what you want: http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2
Posted by: thekohser
QUOTE(JohnA @ Wed 20th February 2008, 9:38am)
I assume that Wikipedia has told you to go fork yourself?
Greg, this is probably what you want: http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-articles.xml.bz2
Perhaps. We'll see -- I'm 66% downloaded now.
Posted by: JohnA
Now that you've got it, what are you going to do with it?
Posted by: thekohser
QUOTE(JohnA @ Fri 22nd February 2008, 3:11am)
Now that you've got it, what are you going to do with it?
Stay tuned. I'm assembling a strategy team and will likely be incorporating, either with or without venture capital. We've already discussed what I might do with it, elsewhere on here.
To discuss any more would just be sabotaging my own first-mover advantage.
Greg