The full dump contains the full text of every revision of every article. That's a great deal of redundancy right there, since consecutive revisions of an article are usually nearly identical. Then you have all the "redundancy" of the XML wrapping, which uses far more bits than strictly needed and adds very little entropy. And I don't know how much of that 5-odd terabytes is indexing; indexes obviously add no information at all.
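To make the redundancy point concrete, here is a minimal sketch using synthetic data, not the actual dump. It fakes an "article" with many nearly identical revisions, wraps them in made-up XML-like tags, and compares how well the whole thing compresses against a single revision. The function names, tag names, and sizes are all assumptions chosen for illustration.

```python
import lzma
import random


def make_revisions(n_revisions=50, n_words=2000, seed=0):
    """Build synthetic revisions; each one differs from the previous by one word."""
    rng = random.Random(seed)
    words = [f"word{rng.randrange(5000)}" for _ in range(n_words)]
    revisions = []
    for _ in range(n_revisions):
        # Simulate a small edit: replace a single word, then store the full text again.
        words[rng.randrange(n_words)] = f"word{rng.randrange(5000)}"
        revisions.append(" ".join(words))
    return revisions


def wrap_xml(revisions):
    """Wrap revisions in hypothetical XML-like tags (not the real dump schema)."""
    parts = ["<page>"]
    for i, text in enumerate(revisions):
        parts.append(f"<revision><id>{i}</id><text>{text}</text></revision>")
    parts.append("</page>")
    return "\n".join(parts).encode("utf-8")


def compression_ratio(data):
    """Uncompressed size divided by LZMA-compressed size."""
    return len(data) / len(lzma.compress(data))


revisions = make_revisions()
dump = wrap_xml(revisions)
single = revisions[0].encode("utf-8")

print(f"single revision: {len(single)} bytes, ratio {compression_ratio(single):.1f}")
print(f"full history:    {len(dump)} bytes, ratio {compression_ratio(dump):.1f}")
```

The full history should compress far better than a single revision, because every revision after the first is almost entirely a repeat of text the compressor has already seen. The exact ratios depend on the synthetic data and say nothing precise about the real dump.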