The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages


My upcoming plagiarism report: How should I present it?
Daniel Brandt
Post #1
Postmaster | Group: Regulars | Posts: 2,473 | Member No.: 77

I need suggestions on how to present my plagiarism report at wikipedia-watch.org. I still have several weeks of work to do, despite the fact that I've been working a few hours a day on it for the last three weeks.

I'm far enough along in separating the signal from the noise that I can now predict that the report will end up with between 100 and 300 examples. Here's a throwaway example that will probably get corrected as soon as someone from Wikipedia sees this post:

Wikipedia version as of mid-September, 2006

Source that was plagiarized

Most of my examples are similar to this -- except they're not from Britannica, but rather from everywhere imaginable. Almost all of the original sources have clear copyright notices on them, the source is not acknowledged in the Wikipedia article, and anywhere from several sentences to several paragraphs are plagiarized.

My question is, "How can I format the report so that anyone looking at it will get the picture, within a few clicks, that Wikipedia has a plagiarism problem?"

So far my best idea is to have a doorway page explaining that my examples were culled from a sampling of slightly less than one percent of the 1.4 million English-language Wikipedia articles. If I have 200 examples, then we can presume that there are about 20,000 plagiarized articles in Wikipedia that no one has yet discovered. No one has made any attempt to discover them, and no one ever will. It's just too hard. Even for programmers with a pipeline into automated Google inquiries, it's still too hard. There's an amazing amount of manual checking that's required to reduce the noise without throwing out the signal.
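The extrapolation in this paragraph is simple proportional scaling. As a worked version, assuming the sample is random (so that its plagiarism rate is representative of the whole encyclopedia), and using the illustrative symbols k for confirmed examples, f for the sampled fraction, N for the total article count, and P-hat for the estimated number of plagiarized articles:

$$
\hat{P} \;=\; \frac{k}{f} \;=\; \frac{200}{0.01} \;=\; 20{,}000
$$

$$
n = fN = 0.01 \times 1{,}400{,}000 = 14{,}000, \qquad \frac{k}{n}\,N = \frac{200}{14{,}000} \times 1{,}400{,}000 = 20{,}000
$$

Because the sample is slightly less than one percent (f a little under 0.01), the same 200 examples would put the estimate slightly above 20,000.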

This doorway page will link to 200 subpages (Example 001, Example 002, ... Example 200). Each subpage will be titled "Plagiarism on Wikipedia - Example 001" and have a link to the source, plus a link to the version on Wikipedia as of mid-September, when I grabbed the page. Below this, only the text portion of that page (this is easy to strip out of the XML versions of the article that I already have) will be reproduced, and the sections plagiarized from the source will be highlighted in yellow.
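Pulling the text portion out of a saved XML copy takes only a few lines of Python's standard library. This is a sketch under assumptions, not the script used for the report: it assumes the pages were saved in MediaWiki's Special:Export XML format, where each <page> holds a <title> and one or more <revision> elements with a <text> child. Because the export namespace version differs between dumps, it matches tags by local name. The function name extract_article_text is hypothetical, and what it returns is raw wikitext; stripping markup such as [[links]] and {{templates}} would be a further step.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_article_text(xml_string: str) -> dict[str, str]:
    """Map each page title to the <text> of its last <revision> (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    pages = {}
    # root.iter() also yields the root itself, so a bare <page> document works too.
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title = None
        text = ""
        for child in page:
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "revision":
                for field in child:
                    if local_name(field.tag) == "text":
                        # Later revisions overwrite earlier ones, keeping the newest.
                        text = field.text or ""
        if title is not None:
            pages[title] = text
    return pages
```

For example, extract_article_text(open("article.xml").read()) would return a dictionary from the article's title to its wikitext, where "article.xml" stands in for whatever file the export was saved to.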

The effect will be that visitors to the doorway page are given some information on how the examples were found, and are invited to click on any of the 200 examples at random to see for themselves. I'm linking to the mid-September version because it's possible that many editors will start cleaning up these 200 examples. One way they will try to clean them up is to acknowledge the source, but that still doesn't solve the problem that entire paragraphs were copied verbatim. They'll have to change sentences around too.

Therefore, I predict that Jimmy will claim that Wikipedia is amazingly free from plagiarism, because Wikipedia has always had a zero-tolerance policy. (This will be a lie -- there have been no efforts to identify plagiarism at Wikipedia.) Then he will zap all 200 articles completely (no history, no nothing) so that the links to the September version on my subpages won't work. That's why I have to reproduce the text from each article and highlight the plagiarized material. If I don't, my report will not be convincing after Jimmy zaps the 200 articles.

Any other ideas?
Replies
Uly
Post #2
Junior Member | Group: Contributors | Posts: 80 | Member No.: 250

You'll probably want to prepare an argument for why your hosting of the copyvio examples isn't a copyvio itself.

I expect you can easily argue fair use and proper attribution - but I'm also sure this'll be one of the first attacks levelled at you.
Daniel Brandt
Post #3
Postmaster | Group: Regulars | Posts: 2,473 | Member No.: 77

Somey: If you need Explorer to read them, that means I'd have to reverse-engineer the format to serve the files from a Linux box. The MSN cache copy looks a lot easier. I know you do Microsoft stuff, but my work is more generic. I use XP on my desktop (my servers are Linux), but at least half the time on my XP I'm doing stuff from a command window. I tried saving a web page once from IE, and it took me three minutes to track down all the directories it created to save the various types of content. That was the last time I saved anything from IE without doing a "view source" first. I do use IE to see how the web pages I code by hand look on IE, but I've got everything disabled in it for security reasons, which means I can't use it online apart from looking at my own sites.

QUOTE(Uly @ Wed 11th October 2006, 9:30am)
You'll probably want to prepare an argument for why your hosting of the copyvio examples isn't a copyvio itself. I expect you can easily argue fair use and proper attribution - but I'm also sure this'll be one of the first attacks levelled at you.

That has occurred to me. But then I thought, "Well, I sure won't get any criticism from the copyright holder, because I'm trying to defend their copyright."

And the next thing I thought was, "Wow, look at all those pathetic editors at Wikipedia who insist on calling me a privacy advocate, without any evidence that I'm any such thing, just so they can launch into a clever remark about how ironic it is that I'm violating the privacy of Wikipedia editors on hivemind."

Here is the latest example, from WP:ANI, about my IRC logs:
QUOTE
We can all try and amuse ourselves with the irony that Brandt is supposed to be a leading internet privacy advocate. We can also mention this irony in the press next time someone asks us about critics. --bainer (talk) 14:23, 8 October 2006 (UTC)

You are correct about this, Uly. If you give the average Wikipedian any sort of opening at all, they will instantly stick their tongue into it. A single column from Wikipedia with the duplicate sentences highlighted, and just a link to the original (which, after all, will be stable), is probably the smart thing to do.
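The single-column layout described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used for the report: it splits text into sentences with a naive regular expression (which will mis-split abbreviations such as "U.S."), treats a sentence as duplicated only if it appears verbatim in the source after lowercasing and collapsing whitespace, and wraps matches in a yellow-background <span>. The function names are hypothetical.

```python
import html
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalize(sentence: str) -> str:
    """Compare sentences case-insensitively, with runs of whitespace collapsed."""
    return " ".join(sentence.lower().split())


def highlight_duplicates(article_text: str, source_text: str, source_url: str) -> str:
    """Return one HTML column of the article, with sentences that also occur
    verbatim in the source shown on a yellow background (hypothetical helper)."""
    source_sentences = {normalize(s) for s in split_sentences(source_text)}
    parts = []
    for sentence in split_sentences(article_text):
        escaped = html.escape(sentence)
        if normalize(sentence) in source_sentences:
            parts.append(f'<span style="background-color: yellow">{escaped}</span>')
        else:
            parts.append(escaped)
    # Only a link to the original is shown; the source text itself is not reproduced.
    link = f'<p><a href="{html.escape(source_url)}">Original source</a></p>'
    return link + "\n<p>" + " ".join(parts) + "</p>"
```

An exact-match rule like this will miss sentences that were reworded slightly, so it can assist the manual checking but not replace it.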


