My upcoming plagiarism report
The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages



> My upcoming plagiarism report, How should I present it?
Daniel Brandt
Post #1





I need suggestions on how to present my plagiarism report at wikipedia-watch.org. I still have several weeks of work to do, despite the fact that I've been working a few hours a day on it for the last three weeks.

I'm far enough along in separating the signal from the noise that I can now predict that the report will end up with between 100 and 300 examples. Here's a throwaway example that will probably get corrected as soon as someone from Wikipedia sees this post:

Wikipedia version as of mid-September, 2006

Source that was plagiarized

Most of my examples are similar to this -- except they're not from Britannica, but rather from everywhere imaginable. Almost all of the original sources carry clear copyright notices, the source is not acknowledged in the Wikipedia article, and anywhere from several sentences to several paragraphs are plagiarized.

My question is, "How can I format the report so that anyone looking at it will get the picture, within a few clicks, that Wikipedia has a plagiarism problem?"

So far my best idea is to have a doorway page explaining that my examples were culled from a sampling of slightly less than one percent of the 1.4 million English-language Wikipedia articles. If I have 200 examples, then we can presume that there are about 20,000 plagiarized articles in Wikipedia that no one has yet discovered. No one has made any attempt to discover them, and no one ever will. It's just too hard. Even for programmers with a pipeline into automated Google inquiries, it's still too hard. There's an amazing amount of manual checking that's required to reduce the noise without throwing out the signal.
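The extrapolation above is simple scaling, and it can be checked with a few lines of Python. This is only a sketch of the arithmetic: it rounds "slightly less than one percent" to exactly 1%, and it assumes the sample is representative of the whole of Wikipedia. The function name `extrapolate` is made up for illustration.

```python
def extrapolate(found_in_sample: int, sample_fraction: float) -> int:
    """Scale a count found in a sample up to the whole population.

    Assumes the sample is representative, which the post does not prove.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return round(found_in_sample / sample_fraction)


TOTAL_ARTICLES = 1_400_000   # English-language articles, per the post
SAMPLE_FRACTION = 0.01       # "slightly less than one percent", rounded up

# Number of articles actually sampled under these assumptions.
sample_size = round(TOTAL_ARTICLES * SAMPLE_FRACTION)

# Estimates for the low, middle, and high example counts mentioned in the post.
estimates = {n: extrapolate(n, SAMPLE_FRACTION) for n in (100, 200, 300)}

if __name__ == "__main__":
    print(f"sampled articles: {sample_size}")
    for found, total in estimates.items():
        print(f"{found} examples in sample -> about {total} in all of Wikipedia")
```

Because the real sample is a bit under 1%, the true multiplier is a bit over 100, so these figures lean slightly low.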

This doorway page will link to 200 subpages (Example 001, Example 002, ... Example 200). Each of the subpages will be titled "Plagiarism on Wikipedia - Example 001" and so on, and will have a link to the source, plus a link to the version on Wikipedia as of mid-September when I grabbed the page. Below this, only the text portion of that page (this is easy to strip out of the XML versions of the article that I already have) will be reproduced, and the sections that are plagiarized from the source will be highlighted with a yellow background.
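One way the yellow highlighting on each subpage could be produced is sketched below. It is not necessarily how the report will be built: it assumes the article text and the source text are already available as plain strings, and it uses Python's standard `difflib` to find runs of words that appear in both. The function name `highlight_copied` and the `min_words` threshold are made up for illustration.

```python
import difflib
import html


def highlight_copied(article: str, source: str, min_words: int = 6) -> str:
    """Return the article as HTML, with runs of words shared with the
    source wrapped in a yellow-background span.

    Only runs of at least `min_words` consecutive words are highlighted,
    so that common short phrases are not flagged.
    """
    a_words = article.split()
    s_words = source.split()
    matcher = difflib.SequenceMatcher(None, a_words, s_words, autojunk=False)

    pieces = []
    pos = 0  # index of the first article word not yet emitted
    for block in matcher.get_matching_blocks():
        if block.size < min_words:
            continue  # too short to count; also skips the final dummy block
        if block.a > pos:
            # Unmatched article text before this run.
            pieces.append(html.escape(" ".join(a_words[pos:block.a])))
        copied = html.escape(" ".join(a_words[block.a:block.a + block.size]))
        pieces.append(f'<span style="background-color: yellow">{copied}</span>')
        pos = block.a + block.size
    if pos < len(a_words):
        pieces.append(html.escape(" ".join(a_words[pos:])))
    return " ".join(pieces)


if __name__ == "__main__":
    article = ("Intro words here. the quick brown fox jumps over "
               "the lazy dog today. Ending.")
    source = ("As noted, the quick brown fox jumps over "
              "the lazy dog today. More text.")
    print(highlight_copied(article, source))
```

This only finds verbatim runs that appear in the same order in both texts, so reordered or lightly reworded passages would still need checking by hand.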

The effect will be that the visitor to the doorway page is given some information on how the examples were found, and is invited to click randomly on any of the 200 examples to see for themselves. I'm linking to the mid-September version, since it's possible that many editors will start cleaning up these 200 examples. One way they will try to clean it up is to acknowledge the source, but that still doesn't solve the problem that entire paragraphs were copied verbatim. They'll have to change sentences around too.

Therefore, I predict that Jimmy will claim that Wikipedia is amazingly free from plagiarism, because Wikipedia has always had a zero-tolerance policy. (This will be a lie -- there have been no efforts to identify plagiarism at Wikipedia.) Then he will completely zap all 200 articles (no history, no nothing) so that the links to the September version on my subpages won't work. That's why I have to reproduce the text from the article and highlight the plagiarized material. If I don't, my report will not be convincing after Jimmy zaps the 200 articles.

Any other ideas?
Replies
Daniel Brandt
Post #2





Citizendium is one example. Another example is the often-proposed distribution of Wikipedia in various forms that will be frozen at a particular point in time. For example, I just read that a Flash version of Wikipedia will soon be available. There has been talk about a print version. Those laptops for hungry African children will have Wikipedia pre-installed.

These new forms of distribution for Wikipedia mean that errors, copyright violations, and plagiarism will also be distributed -- and frozen in time. Right now (well, not right now, but after my plagiarism report goes live), no publisher will consider a bound version of Wikipedia without factoring in the cost of hiring editors and researchers to double-check every article they plan to publish. When you add that in, it makes no sense to have a print version at all. It's hard enough to sell a print version that has to compete with free versions online, but adding the cost of screening every single article is prohibitive. You cannot risk a print run if you're going to be printing plagiarism.

The problem is that Wikipedia is too big, and by now it's too late to install meaningful controls on the freedoms that rogue editors enjoy. Forget about plagiarism and copyright violations and errors for a second. Just the task of taking out all the pop-culture trivia, fancruft, porn, gaming esoterica -- it's overwhelming. Going through 1.4 million articles is not anyone's idea of a good time.

I think Jimmy has hyped himself into a corner. And if Larry Sanger thinks he can do better, then the only way for him to start is to delete stuff like crazy. Maybe there are 200,000 articles worth keeping out of the 1.4 million.

And the Internet noise that's been generated, by Wikipedia plus Google, with all the scrapers looking to sell something, is just shameful. I found 965 domains that scrape Wikipedia. These are the ones that don't give Wikipedia credit on their site. I had no idea that there were so many, because normally I'd think of scrapers in terms of those couple dozen sites that scrape as much of Wikipedia as they can for the ad revenue. But for every one of those, there are dozens of niche scrapers that are pushing particular types of merchandise. Art galleries scrape biographies of artists, for example, because it looks cool on their website next to an image of something by that artist that they're trying to sell. Tourism agencies add a little historical flavor to their packages by scraping some information about famous people who lived in the area. And on and on.

The GFDL is a crummy idea. There should be a "noncommercial use only" stipulation in it. Now it's probably too late for Wikipedia to change it without starting over.

What a mess. I remember that set of World Book encyclopedias that I had as a child. Pure signal, zero noise.
JohnA
Post #3





QUOTE(Daniel Brandt @ Mon 16th October 2006, 1:54pm)


The problem is that Wikipedia is too big, and by now it's too late to install meaningful controls on the freedoms that rogue editors enjoy. Forget about plagiarism and copyright violations and errors for a second. Just the task of taking out all the pop-culture trivia, fancruft, porn, gaming esoterica -- it's overwhelming. Going through 1.4 million articles is not anyone's idea of a good time.

I think Jimmy has hyped himself into a corner. And if Larry Sanger thinks he can do better, then the only way for him to start is to delete stuff like crazy. Maybe there are 200,000 articles worth keeping out of the 1.4 million.


I think there are. There are some excellent articles on Wikipedia, but unfortunately the people who wrote them are tarred with the same brush as the rest of Wikipedia's output.

QUOTE
What a mess. I remember that set of World Book encyclopedias that I had as a child. Pure signal, zero noise.


I learnt most in my childhood from Children's Britannica and a set of Encyclopedia Britannica from about 1960. Talk about zero noise, pure scholarship and clear language!

The trouble is Wikipedia's S/N is poor, and I don't think Sanger has a hope in hell of replacing Wikipedia.

I'm sorry, but Sanger has completely missed the point of why Wikipedia is successful and why people sacrifice their time to write for Wikipedia. I think Citizendium will fail, the latest in a long line of attempted Wikiforks that went nowhere.


