The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages


> Checkuser data retention
gomi
post Mon 15th September 2008, 5:12pm
Post #1


Member
********

Group: Members
Posts: 3,022
Joined: Fri 17th Nov 2006, 6:38pm
Member No.: 565




here:

QUOTE
Tim Starling tstarling at wikimedia.org
Thu Sep 11 03:11:52 UTC 2008

Jon wrote:
> I could not find this in the privacy policy... however, what is
> Wikimedia's current data retention policy? That is to ask, how long do
> projects keep data for use in tools such as checkuser?

CheckUser data used to be kept for 3 months, but Aaron recently increased
it to 5 months. I'm not sure why or on whose authority.

<http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUser.php?r1=39734&r2=40620>

-- Tim Starling
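The pruning code quoted later in this thread computes its cutoff as time() - $wgCUDMaxAge, so the setting is a number of seconds. Below is a rough sketch of what the 3-month and 5-month figures mean in those units. It assumes 30-day months, and retentionSeconds() is a hypothetical helper, not part of CheckUser:

```php
<?php
// Hypothetical helper, not part of CheckUser: convert a retention window
// given in days to seconds, the unit implied by time() - $wgCUDMaxAge.
function retentionSeconds(int $days): int
{
    return $days * 24 * 3600;
}

// Assumption: a "month" is approximated as 30 days.
$threeMonths = retentionSeconds(3 * 30); // 90 days
$fiveMonths  = retentionSeconds(5 * 30); // 150 days

echo "3 months is about $threeMonths seconds\n";
echo "5 months is about $fiveMonths seconds\n";
```

Under that assumption the two settings come to 7,776,000 and 12,960,000 seconds respectively.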



Replies(1 - 17)
Rootology
post Mon 15th September 2008, 5:25pm
Post #2


Fat Cat
******

Group: Regulars
Posts: 1,489
Joined: Fri 26th Jan 2007, 11:11pm
Member No.: 877



Kelly said recently that the Checkuser data was moved to its own separate table that hadn't been cleared since November 2007. What's the discrepancy here?
Kelly Martin
post Mon 15th September 2008, 5:38pm
Post #3


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Rootology @ Mon 15th September 2008, 12:25pm) *
Kelly said recently that the Checkuser data was moved to its own separate table that hadn't been cleared since November 2007. What's the discrepancy here?
Probably my misunderstanding of the changes to the checkuser code. Those changes meant that checkuser data could potentially be kept indefinitely, which I misinterpreted to mean that they were being kept indefinitely. My mistake.
gomi
post Mon 15th September 2008, 5:48pm
Post #4


Member
********

Group: Members
Posts: 3,022
Joined: Fri 17th Nov 2006, 6:38pm
Member No.: 565



I should hasten to add that there are probably site-specific overrides possible, and it is well within the realm of possibility that Wikipedia uses one. That would explain why Starling was so cavalier about giving out the above information.

More on the subject, from the same thread:

QUOTE
Tim Starling tstarling at wikimedia.org
Thu Sep 11 04:26:16 UTC 2008

Gregory Maxwell wrote:
> I think Jon was inquiring about more than just checkuser (notice the
> "such as"). I would assume that anyone asking about data retention in
> general is not overly concerned with the specific modes of retention,
> but is more concerned with the maximum retention time (across all
> modes) of any particular type of private data.

The other logs are not automatically rotated, and need to be manually
purged. The retention time is thus not consistent. Typically we have kept
around 6 months of data. There are error logs, and logs for various kinds
of special requests. They are not used for sockpuppet investigation.

I've said in the past that I think 6 months would be a reasonable horizon
for all private data -- it would give us plenty of data for operations,
and would be a far shorter period than that used by the large commercial
websites.

-- Tim Starling
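Tim notes that the other logs are purged by hand, so their retention varies. An automatic horizon could look something like the following minimal sketch. It assumes plain *.log files in a single directory and approximates a six-month horizon as 180 days. purgeOldLogs() and the demo paths are hypothetical and are not Wikimedia's actual tooling:

```php
<?php
// Hypothetical sketch, not Wikimedia's tooling: delete *.log files in $dir whose
// modification time is older than $maxAgeSeconds, and return their names.
function purgeOldLogs(string $dir, int $maxAgeSeconds, ?int $now = null): array
{
    $cutoff = ($now ?? time()) - $maxAgeSeconds;
    $deleted = [];
    foreach (glob($dir . '/*.log') ?: [] as $path) {
        $mtime = filemtime($path); // false if the file cannot be stat'ed
        if ($mtime !== false && $mtime < $cutoff && unlink($path)) {
            $deleted[] = basename($path);
        }
    }
    sort($deleted);
    return $deleted;
}

// Demo in a temporary directory, with a six-month horizon taken as 180 days.
$dir = sys_get_temp_dir() . '/purge_demo_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
touch("$dir/old.log", time() - 200 * 86400); // about 200 days old
touch("$dir/new.log", time() - 10 * 86400);  // about 10 days old

$deleted = purgeOldLogs($dir, 180 * 86400);
print_r($deleted);
```

Run from cron or a similar scheduler, a job like this would make the retention time consistent instead of depending on when someone last purged by hand.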


C H
post Mon 15th September 2008, 8:59pm
Post #5


Junior Member
**

Group: Contributors
Posts: 51
Joined: Wed 19th Apr 2006, 6:50pm
Member No.: 142



Note, Tim Starling has since reverted the change to the CheckUser data maximum age, with the comment "If you want such a policy change, have an open discussion about it, don't get together with some troll-hunting mates on a private mailing list and make your own rules."

That makes me curious as to what he's talking about. Who is this developer Aaron and who are his "troll-hunting mates?" Is the private mailing list being referred to the global checkuser list or another one? I notice that Aaron is the one who added the email logging to checkuser, which Brion reverted, and then added a "toned-down version" of the email logging, which Brion also reverted.
Rootology
post Mon 15th September 2008, 9:01pm
Post #6


Fat Cat
******

Group: Regulars
Posts: 1,489
Joined: Fri 26th Jan 2007, 11:11pm
Member No.: 877



Links to the mail reversions?
Rootology
post Mon 15th September 2008, 9:11pm
Post #7


Fat Cat
******

Group: Regulars
Posts: 1,489
Joined: Fri 26th Jan 2007, 11:11pm
Member No.: 877



The curious question here is: does each project actually use the standard MediaWiki software that we're watching developers revert-war over here? Is this the EXACT trunk and SQL schema that English Wikipedia uses, and that Commons uses, and Meta, and Wikiquote?

IF it is, someone with enough familiarity with all of this and the public records may be able to ferret out all sorts of valuable information.
C H
post Mon 15th September 2008, 9:15pm
Post #8


Junior Member
**

Group: Contributors
Posts: 51
Joined: Wed 19th Apr 2006, 6:50pm
Member No.: 142



QUOTE(C H @ Mon 15th September 2008, 4:12pm) *

QUOTE(Rootology @ Mon 15th September 2008, 4:01pm) *

Links to the mail reversions?

Just scroll down that same page.


Here and here.

Note, it looks like the email logging was re-enabled here.

This post has been edited by C H: Mon 15th September 2008, 9:19pm
Random832
post Tue 16th September 2008, 12:34am
Post #9


meh
*******

Group: Regulars
Posts: 1,933
Joined: Thu 14th Feb 2008, 8:52pm
Member No.: 4,844




QUOTE(Rootology @ Mon 15th September 2008, 9:11pm) *

The curious question here is: does each project actually use the standard MediaWiki software that we're watching developers revert-war over here? Is this the EXACT trunk and SQL schema that English Wikipedia uses, and that Commons uses, and Meta, and Wikiquote?

IF it is, someone with enough familiarity with all of this and the public records may be able to ferret out all sorts of valuable information.


There's no record of which versions go live when (and not every version goes live at all), but otherwise, yeah, that's the software. There are some local configuration variables, though.

Some of these are at http://noc.wikimedia.org/conf/

I can probably decipher some of this if there's anything in particular.

This post has been edited by Random832: Tue 16th September 2008, 12:36am
Kelly Martin
post Tue 16th September 2008, 1:02am
Post #10


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(C H @ Mon 15th September 2008, 4:15pm) *

QUOTE(C H @ Mon 15th September 2008, 4:12pm) *

QUOTE(Rootology @ Mon 15th September 2008, 4:01pm) *

Links to the mail reversions?

Just scroll down that same page.


Here and here.

Note, it looks like the email logging was re-enabled here.
The email logging changes were originally reverted because they conflicted with SUL (single user login). That was a technical decision, not a policy one, and it was fully intended that they'd be brought back once the conflict was resolved.
anthony
post Tue 16th September 2008, 2:53am
Post #11


Postmaster
*******

Group: Regulars
Posts: 2,034
Joined: Mon 30th Jul 2007, 1:31am
Member No.: 2,132



QUOTE(C H @ Mon 15th September 2008, 8:59pm) *

Who is this developer Aaron


Aaron Schulz a.k.a. User:Voice of All?

Is there an up-to-date list of all CVS committers?
jch
post Fri 19th September 2008, 6:29am
Post #12


Quickly running out of Cache
***

Group: Contributors
Posts: 136
Joined: Sun 5th Aug 2007, 3:56am
Member No.: 2,249




QUOTE(anthony @ Tue 16th September 2008, 2:53am) *

QUOTE(C H @ Mon 15th September 2008, 8:59pm) *

Who is this developer Aaron


Aaron Schulz a.k.a. User:Voice of All?

Is there an up-to-date list of all CVS committers?

http://svn.wikimedia.org/users.php

Also, it's SVN, not CVS. I think CVS is a drugstore...
Lar
post Fri 19th September 2008, 4:28pm
Post #13


"His blandness goes to 11!"
*******

Group: Regulars
Posts: 2,116
Joined: Wed 26th Dec 2007, 6:04pm
From: A large LEGO storage facility
Member No.: 4,290



QUOTE(jch @ Fri 19th September 2008, 2:29am) *

QUOTE(anthony @ Tue 16th September 2008, 2:53am) *

QUOTE(C H @ Mon 15th September 2008, 8:59pm) *

Who is this developer Aaron


Aaron Schulz a.k.a. User:Voice of All?

Is there an up-to-date list of all CVS committers?

http://svn.wikimedia.org/users.php

Also, it's SVN, not CVS. I think CVS is a drugstore...


SVN aims to be a replacement for CVS, which in turn aimed to be a replacement for RCS... These are all "Open" version control systems, as opposed to, say, PVCS or VSS, which are "Closed". (With respect to version control, this refers to whether a module is locked or whether multiple people can work on it at once and then merge their changes; it is not Open/Closed in the free software sense.)
Rootology
post Fri 19th September 2008, 5:57pm
Post #14


Fat Cat
******

Group: Regulars
Posts: 1,489
Joined: Fri 26th Jan 2007, 11:11pm
Member No.: 877



Last night I poked around all the listed materials out of curiosity, and darned if I can find where wgCUDMaxAge and the retention period for CU data on the separate table are actually set. I have a feeling it's out of view. Since this is the principal "privacy" matter (especially after Poetgate) that everyone is always worried about, I'm honestly wondering if the benefits of hiding this information from being displayed in a clear fashion outweigh the possible harm.

This post has been edited by Rootology: Fri 19th September 2008, 5:58pm
jch
post Sat 20th September 2008, 2:10am
Post #15


Quickly running out of Cache
***

Group: Contributors
Posts: 136
Joined: Sun 5th Aug 2007, 3:56am
Member No.: 2,249




QUOTE(Lar @ Fri 19th September 2008, 4:28pm) *

QUOTE(jch @ Fri 19th September 2008, 2:29am) *

QUOTE(anthony @ Tue 16th September 2008, 2:53am) *

QUOTE(C H @ Mon 15th September 2008, 8:59pm) *

Who is this developer Aaron


Aaron Schulz a.k.a. User:Voice of All?

Is there an up-to-date list of all CVS committers?

http://svn.wikimedia.org/users.php

Also, it's SVN, not CVS. I think CVS is a drugstore...


SVN aims to be a replacement for CVS, which in turn aimed to be a replacement for RCS... These are all "Open" version control systems, as opposed to, say, PVCS or VSS, which are "Closed". (With respect to version control, this refers to whether a module is locked or whether multiple people can work on it at once and then merge their changes; it is not Open/Closed in the free software sense.)


That was what we Internets users call a "joke" Lar. I think you're spending too much time around your OTRS-using humorless sweetie.
Kelly Martin
post Sat 20th September 2008, 2:31am
Post #16


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Rootology @ Fri 19th September 2008, 12:57pm) *

Last night I poked around all the listed materials out of curiosity, and darned if I can find where wgCUDMaxAge and the retention period for CU data on the separate table are actually set. I have a feeling it's out of view. Since this is the principal "privacy" matter (especially after Poetgate) that everyone is always worried about, I'm honestly wondering if the benefits of hiding this information from being displayed in a clear fashion outweigh the possible harm.
This would be configured in the LocalSettings.php file, which cannot be directly published as it contains passwords for the database engine and other information that would be Bad to let get out. There is a redacted LocalSettings.php file somewhere on Wikimedia's site, but it is rather out of date, as I recall.
Krimpet
post Sat 20th September 2008, 2:46am
Post #17


Senior Member
****

Group: Regulars
Posts: 402
Joined: Mon 16th Jul 2007, 3:44am
From: Rochester, NY
Member No.: 1,975




QUOTE(Rootology @ Fri 19th September 2008, 1:57pm) *

Last night I poked around all the listed materials out of curiosity, and darned if I can find where wgCUDMaxAge and the retention period for CU data on the separate table are actually set. I have a feeling it's out of view. Since this is the principal "privacy" matter (especially after Poetgate) that everyone is always worried about, I'm honestly wondering if the benefits of hiding this information from being displayed in a clear fashion outweigh the possible harm.


From CheckUser.php:
CODE

    # On roughly 1 edit in 100 (chosen at random), prune the checkuser changes table.
    wfSeedRandom();
    if( 0 == mt_rand( 0, 99 ) ) {
        # Flush entries older than $wgCUDMaxAge seconds from the cu_changes table.
        # ($dbw is a write (master) database handle obtained earlier in the
        # enclosing function; that part is not shown in this excerpt.)
        global $wgCUDMaxAge;
        $cutoff = $dbw->timestamp( time() - $wgCUDMaxAge );
        $recentchanges = $dbw->tableName( 'cu_changes' );
        $sql = "DELETE FROM $recentchanges WHERE cuc_timestamp < '{$cutoff}'";
        $dbw->query( $sql );
    }
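The excerpt above does not prune on a fixed schedule. Instead, each edit has a 1-in-100 chance (mt_rand( 0, 99 ) returning 0) of deleting rows older than the cutoff. The following minimal in-memory sketch models that pattern, assuming rows are reduced to bare Unix timestamps. maybePrune() is hypothetical, and the random roll is injected so that the behaviour can be checked:

```php
<?php
// Hypothetical in-memory model of the pruning pattern in CheckUser.php.
// $rows holds Unix timestamps standing in for cuc_timestamp values.
function maybePrune(array $rows, int $now, int $maxAge, ?callable $roll = null): array
{
    // Default roll mirrors mt_rand( 0, 99 ): it returns 0 on about 1 call in 100.
    $roll = $roll ?? fn(): int => mt_rand(0, 99);
    if ($roll() !== 0) {
        return $rows; // no pruning on this edit
    }
    $cutoff = $now - $maxAge;
    // Keep rows at or after the cutoff, i.e. delete those with timestamp < cutoff.
    return array_values(array_filter($rows, fn(int $ts): bool => $ts >= $cutoff));
}

$day  = 86400;
$now  = 1000000000;
$rows = [$now - 100 * $day, $now - 10 * $day];

$pruned    = maybePrune($rows, $now, 90 * $day, fn(): int => 0);  // forced prune
$untouched = maybePrune($rows, $now, 90 * $day, fn(): int => 42); // roll fails, no prune
```

Because deletion only happens when the roll succeeds, a row can outlive $wgCUDMaxAge until some later edit triggers the DELETE. On a busy wiki that gap is short, but on a quiet one it can be longer.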


Also, while Wikimedia's LocalSettings does require() a file called "PrivateSettings.php" that's not world-viewable that contains database passwords and the like, it's loaded before the CheckUser extension is, meaning any attempts to hide CU settings in there would be overridden by the defaults when CheckUser is loaded.
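Krimpet's load-order argument can be illustrated with a small simulation. Two closures stand in for PrivateSettings.php and the CheckUser extension file. The 150-day override is a hypothetical value, and the real files contain far more than this single variable:

```php
<?php
// Hypothetical stand-ins for two settings files; only $wgCUDMaxAge is modelled.
$privateSettings = function (): void {
    global $wgCUDMaxAge;
    $wgCUDMaxAge = 150 * 86400; // hypothetical "private" override (about 5 months)
};
$checkUserExtension = function (): void {
    global $wgCUDMaxAge;
    $wgCUDMaxAge = 90 * 86400; // modelled as assigning its default unconditionally
};

// Order described in the thread: private settings loaded first, extension second.
$privateSettings();
$checkUserExtension();
$effectiveWhenEarly = $GLOBALS['wgCUDMaxAge'];

// Order that would let an override stick: extension first, override afterwards.
$checkUserExtension();
$privateSettings();
$effectiveWhenLate = $GLOBALS['wgCUDMaxAge'];

echo 'override before extension: ', $effectiveWhenEarly / 86400, " days\n";
echo 'override after extension:  ', $effectiveWhenLate / 86400, " days\n";
```

Whichever file assigns the variable last wins. That is why a value set before the extension loads would be replaced by the extension's default.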

I think it can be confidently said, then, that CU data on Wikimedia is deleted after the default 90 days. :)
tarantino
post Sat 20th September 2008, 2:44pm
Post #18


the Dude abides
******

Group: Regulars
Posts: 1,439
Joined: Mon 30th Jul 2007, 11:41pm
Member No.: 2,143



QUOTE(Krimpet @ Sat 20th September 2008, 2:46am) *

I think it can be confidently said, then, that CU data on Wikimedia is deleted after the default 90 days. :)

This is confirmed by Tim Starling -
QUOTE

It's the same everywhere, it's three months. Neither the Board nor the
executive have expressed any desire to make that decision, but they are
free to weigh in if they want to. We chose the three month figure as a
compromise between privacy advocates and troll hunters.
