Wikipedia and Information Theory
The Wikipedia Review: A forum for discussion and criticism of Wikipedia


> Wikipedia and Information Theory
anthony
Post #1





The English Wikipedia database, uncompressed: 5.34 terabytes
The English Wikipedia database, compressed: 32 gigabytes
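
For scale, those two figures work out to a compression ratio of roughly 167:1. A quick sketch of the arithmetic, assuming decimal units (1 TB = 1,000 GB); the function and variable names are only illustrative:

```python
# Assumption: decimal (SI) units, i.e. 1 TB = 1000 GB.
GB_PER_TB = 1000


def compression_ratio(uncompressed_tb: float, compressed_gb: float) -> float:
    """Return uncompressed size divided by compressed size, both taken in GB."""
    return (uncompressed_tb * GB_PER_TB) / compressed_gb


# Figures quoted in the post: 5.34 TB uncompressed, 32 GB compressed.
ratio = compression_ratio(5.34, 32)
print(f"about {ratio:.0f}:1")
print(f"compressed size is {100 / ratio:.2f}% of the original")
```

If the sizes were actually meant as binary units (TiB and GiB), the factor would be 1,024 instead and the ratio would come out nearer 171:1.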
Replies
MZMcBride
Post #2





anthony is almost certainly referring to pages-meta-history.xml.7z, which is "all pages with complete edit history" and weighs in at 31.9 GB. 7-Zip is pretty nifty compression, so much so that it's been seriously suggested lately to drop the bzip2 dumps altogether.
Milton Roe
Post #3





QUOTE(MZMcBride @ Fri 2nd April 2010, 11:07pm)

anthony is almost certainly referring to pages-meta-history.xml.7z, which is "all pages with complete edit history" and weighs in at 31.9 GB. 7-Zip is pretty nifty compression, so much so that it's been seriously suggested lately to drop the bzip2 dumps altogether.


It still boggles my mind that all of WP, all past pages and non-oversighted history included, minus images, can be compressed and stored on one flash drive.

If we could identify one "stable" certified version of each article, and only store THAT, perhaps it would be only 1% of this? Or 1/167th of this? ;)

(How many edits does the average article have? Or, a better question: what is the ratio of the size of the LAST version of all the articles to the size of their entire edit history? Does anybody know?)

The point being that if we were looking at only the last "good" version, we might be able to store all WP text on one flash drive UNCOMPRESSED. Something like that has been done, but it's not user-friendly (some business person naturally wants to sell you a Kindle-type dedicated reader).

The world changes when you can get it all on a flash drive that is readable by any handheld computer-driven device, including a smartphone (these things all badly need flash drive ports). That would make access to WP independent of web access, and (as I've said before) actually have a chance of being of some use to the kid in the third world. For example, take the people of the island of Yap, whom I visited: internet access there is very, very expensive, but everybody speaks English and has a crying need for Western education (they go to Guam or Hawaii for secondary schooling).