FORUM WARNING [2] Division by zero (Line: 2933 of /srcsgcaop/boardclass.php)
Wikipedia and Information Theory -
     
 
The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages

Welcome, Guest! ( Log In | Register )

> Wikipedia and Information Theory
anthony
post
Post #1


Postmaster
*******

Group: Regulars
Posts: 2,034
Joined:
Member No.: 2,132



The English Wikipedia database, uncompressed: 5.34 terabytes
The English Wikipedia database, compressed: 32 gigabytes
User is offlineProfile CardPM
Go to the top of the page
+Quote Post
 
Reply to this topicStart new topic
Replies
Kelly Martin
post
Post #2


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



The full dump contains full-text versions of each revision of each article. That's a great deal of redundancy just there. Then you have all the "redundancy" of the XML structure wrapping, which uses way more bits than strictly needed and add very little entropy. And I don't know much of of that 5-odd terabytes is indexing; indexes obviously add no information at all.
User is offlineProfile CardPM
Go to the top of the page
+Quote Post



Reply to this topicStart new topic
1 User(s) are reading this topic (1 Guests and 0 Anonymous Users)
0 Members:

 

-   Lo-Fi Version Time is now:
 
     
FORUM WARNING [2] Cannot modify header information - headers already sent by (output started at /home2/wikipede/public_html/int042kj398.php:242) (Line: 0 of Unknown)