My upcoming plagiarism report

The Wikipedia Review: A forum for discussion and criticism of Wikipedia


My upcoming plagiarism report, How should I present it?

Daniel Brandt	Post #1
Postmaster Group: Regulars Posts: 2,473 Joined: Member No.: 77

I need suggestions on how to present my plagiarism report at wikipedia-watch.org. I still have several weeks of work to do, even though I've been working a few hours a day on it for the last three weeks. I'm far enough along in separating the signal from the noise that I can now predict the report will end up with between 100 and 300 examples.

Here's a throwaway example that will probably get corrected as soon as someone from Wikipedia sees this post:

Wikipedia version as of mid-September, 2006
Source that was plagiarized

Most of my examples are similar to this -- except they're not from Britannica, but rather from everywhere imaginable. In almost all of them, the original source carries a clear copyright notice, the source is not acknowledged in the Wikipedia article, and anywhere from several sentences to several paragraphs are plagiarized.

My question is, "How can I format the report so that anyone looking at it will get the picture, within a few clicks, that Wikipedia has a plagiarism problem?"

No one has made any attempt to discover these plagiarized articles, and no one ever will. It's just too hard. Even for programmers with a pipeline into automated Google inquiries, it's still too hard. There's an amazing amount of manual checking required to reduce the noise without throwing out the signal.

So far my best idea is to have a doorway page explaining that my examples were culled from a sampling of slightly less than one percent of the 1.4 million English-language Wikipedia articles. If I have 200 examples from a one-percent sample, then we can presume that there are about 20,000 plagiarized articles in Wikipedia that no one has yet discovered. This doorway page will link to 200 subpages (Example 001, Example 002, ... Example 200).
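For anyone who wants to check the extrapolation, here is a rough sketch of the arithmetic. It assumes the sample is exactly one percent of the articles; the post says "slightly less than one percent", so the real estimate would be somewhat higher. The function name `extrapolate` is made up for this illustration.

```shell
# Scale a count found in a sample up to the whole collection.
# $1 = examples found in the sample
# $2 = sample size as a whole-number percent of all articles
extrapolate() {
  echo $(( $1 * 100 / $2 ))
}

# Assumption: treat the sample as exactly 1 percent (the post says
# "slightly less than one percent", so these figures run a little low).
sample_percent=1

# The report is expected to end up with between 100 and 300 examples.
for found in 100 200 300; do
  echo "$found examples -> about $(extrapolate "$found" "$sample_percent") articles"
done
```

With 200 examples this gives the 20,000 figure quoted above; the 100 to 300 range corresponds to roughly 10,000 to 30,000 articles.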
Each of the subpages will be titled "Plagiarism on Wikipedia - Example 001" and have a link to the source, plus a link to the version on Wikipedia as of mid-September when I grabbed the page. Below this, the text portion only from that page will be reproduced (this is easy to strip out of the XML versions of the article that I already have), and the sections that are plagiarized from the source will be highlighted with a yellow background.

The effect will be that the visitor to the doorway page is given some information on how the examples were found, and is invited to click randomly on any of the 200 examples to see for themselves.

I'm linking to the mid-September version, since it's possible that many editors will start cleaning up these 200 examples. One way they will try to clean it up is to acknowledge the source, but that still doesn't solve the problem that entire paragraphs were copied verbatim. They'll have to change sentences around too.

Therefore, I predict that Jimmy will claim that Wikipedia is amazingly free from plagiarism, because Wikipedia has always had a zero-tolerance policy. (This will be a lie -- there have been no efforts to identify plagiarism at Wikipedia.) Then he will totally zap all 200 articles (no history, no nothing) so that the links to the September version on my subpages won't work. That's why I have to reproduce the text from the article and highlight the plagiarized material. If I don't, my report will not be convincing after Jimmy zaps the 200 articles.

Any other ideas?
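The yellow-highlight step could look something like the sketch below. This is only one possible approach, not necessarily how the report will be built: the function name `highlight` is hypothetical, the passage is matched as a literal string, and both inputs are assumed to be plain text that has already been HTML-escaped.

```shell
# Wrap every literal occurrence of a copied passage in a yellow-background span.
# $1 = article text, $2 = passage that also appears in the source
# Assumption: both strings are already HTML-escaped plain text.
highlight() {
  # Pass the strings through the environment so awk does not
  # interpret backslashes in them as escape sequences.
  TEXT=$1 PASSAGE=$2 awk 'BEGIN {
    text = ENVIRON["TEXT"]
    pat = ENVIRON["PASSAGE"]
    out = ""
    if (pat != "") {
      # index() does a plain substring search, so no regex escaping is needed.
      while ((i = index(text, pat)) > 0) {
        out = out substr(text, 1, i - 1) "<span style=\"background: yellow\">" pat "</span>"
        text = substr(text, i + length(pat))
      }
    }
    print out text
  }'
}

# Placeholder text, not taken from any real article.
highlight "First sentence. Copied sentence. Last sentence." "Copied sentence."
```

An exact-match search like this only catches verbatim copying; passages where sentences were changed around would still need to be marked by hand.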

Replies

Daniel Brandt	Post #2
Postmaster Group: Regulars Posts: 2,473 Joined: Member No.: 77

Look here -- I'm picking up the MSN cache copy from my wikipedia-watch server. Here's how I did it:

First I got the cache URL ID number from MSN by doing a search for:

site:wikipedia.org alain leroy locke

Then I pasted this cache ID into a one-line script that ran on Linux:

CODE
curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)" -o "msncache.html" "http://cc.msnscache.com/cache.aspx?q=4081175923447"

That should all be on one line; this board most likely wrapped it.

The only change I made to the msncache.html that curl fetched for me was to delete the MSN header at the top. The page looks like it came right out of Wikipedia, but the important content -- namely the text -- came from my server. The parts that came from Wikipedia are the templates and the image -- things that Wikipedia is unable to change.

This will work even if Wikipedia zaps the article and history. They could delete the image, but that's no big deal. Many of my samples don't have images. The other stuff is used so widely on Wikipedia that they have to leave it alone. They could block my server, but how embarrassing would that be for them?

I think I'll have to grab as many MSN cache copies as I can before I go live.
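Grabbing many cache copies could be scripted along these lines. This is a sketch under assumptions: `cache_ids.txt` is a hypothetical file with one MSN cache ID per line, the output naming `msncache-<id>.html`, the `DRY_RUN` switch, and the one-second pause are choices made for this illustration, and the URL pattern and user-agent string are simply copied from the one-liner above. It does not strip the MSN header from the fetched pages.

```shell
# User-agent string copied from the one-line curl command in the post.
UA='Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)'

# Build the cache URL for one MSN cache ID (URL pattern taken from the post).
cache_url() {
  printf 'http://cc.msnscache.com/cache.aspx?q=%s\n' "$1"
}

# Fetch one cached page into msncache-<id>.html.
# With DRY_RUN=1 the curl command is printed instead of being run.
fetch_cache() {
  id=$1
  out="msncache-$id.html"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "curl -A \"$UA\" -o \"$out\" \"$(cache_url "$id")\""
  else
    curl -A "$UA" -o "$out" "$(cache_url "$id")"
  fi
}

# cache_ids.txt is a hypothetical input file, one cache ID per line.
# Nothing is fetched if the file does not exist.
if [ -f cache_ids.txt ]; then
  while read -r id; do
    if [ -n "$id" ]; then
      fetch_cache "$id"
      sleep 1   # pause between requests (an arbitrary choice)
    fi
  done < cache_ids.txt
fi
```

Running it once with `DRY_RUN=1` set in the environment prints the curl commands so they can be checked before any pages are actually fetched.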
