The Wikipedia Review: A forum for discussion and criticism of Wikipedia


> Wikipedia and privacy
anthony
post Sat 24th April 2010, 2:12am
Post #1

QUOTE

On Thu, Apr 22, 2010 at 6:31 PM, Platonides <Platonides@gmail.com> wrote:

S. Nunes wrote:
> Hi all,
>
> I presume that Wikipedia keeps data about HTTP accesses to all articles.
> Can anybody inform me if this data is available for research purposes?

No. With the amount of traffic it has, space needs would be immense, and
Wikimedia is not interested in logging all accesses.


http://lists.wikimedia.org/pipermail/wiki-...ril/000987.html

Did most people here on Wikipedia Review know about this "sampled feed" (I've heard 1/100th all the way up to 1/10th for the sample rate)? Isn't this a huge breach of privacy? Why isn't anyone talking about this?

I'm especially surprised I've never heard Daniel Brandt bring it up. A log of what pages you're reading on Wikipedia is about as sensitive as a log of what searches you're doing on Google, and not only is the Wikimedia Foundation collecting it, but they're giving it out to third parties as well.

Yes, it's sampled, but the privacy policy doesn't say how often the samples are taken. I've heard as often as 1/10, and even the lesser figure of 1/100 still presents an unacceptable risk to regular users.
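
To put rough numbers on that risk, here is a minimal sketch. It assumes each request is independently included in the feed with probability equal to the sample rate; the real sampling method isn't documented in anything quoted here, so that is an assumption, as are the function names and the page-view counts, which are purely illustrative.

```python
def prob_logged_at_least_once(page_views: int, sample_rate: float) -> float:
    """Chance that at least one of a reader's requests lands in the sample.

    Assumes independent per-request sampling (an assumption, not a
    documented property of the Wikimedia feed).
    """
    if page_views < 0:
        raise ValueError("page_views must be non-negative")
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return 1.0 - (1.0 - sample_rate) ** page_views


def expected_logged(page_views: int, sample_rate: float) -> float:
    """Expected number of a reader's requests that appear in the sample."""
    return page_views * sample_rate


if __name__ == "__main__":
    # Hypothetical reading volumes; the two rates are the ones mentioned above.
    for rate in (1 / 100, 1 / 10):
        for views in (10, 100, 1000):
            p = prob_logged_at_least_once(views, rate)
            e = expected_logged(views, rate)
            print(f"rate={rate:.2f} views={views:5d} "
                  f"P(logged at least once)={p:.4f} expected logged={e:.1f}")
```

Under that assumption, the chance of a regular reader never appearing in the sample shrinks geometrically with the number of pages read, which is the point: a low sample rate protects a one-off visitor far better than it protects a heavy user.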

It's not even clear whether the procedure of giving out the data for "research purposes" complies with the privacy policy. Yes, the policy mentions sampled data, but it claims "the raw log data is not made public". Now, I'm not interested in getting into an argument with some Wikipedia apologist over whether the researchers fall under the rubric of "the public", or whether the modifications made to the data render it no longer "raw" (*). I admit it's ambiguous. The very fact that the privacy policy is so ambiguous is part of the problem.

(*) I would be interested in learning exactly what data *is* being released, in what form, and to whom.

This post has been edited by anthony: Sat 24th April 2010, 2:13am