Content contributors, statistical analysis
Peter Damian
I have as much free time as a Wikipedia admin!
Group: Regulars
Posts: 4,400
Member No.: 4,212
My blog post for today, http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html, asks whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.
Remaining question: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work? Easily learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics). So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

Replies

Peter Damian
QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm) How would you account for the people that work on making articles in their user subspace and then submit them whole to the mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.
Yes, of course there are 101 ways in which this number could fail to have the meaning it appears to have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's. All we can say, and all we need to say, is that:
1. In general, editors with low epp's tend to perform relatively mechanical, low-economic-value, easily learned tasks. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia.
2. Low epp's predominate in the admin corps. Hardly surprising, given that the work required of an admin consists precisely of low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits.
3. The theory of crowdsourcing says that this shouldn't happen.
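For readers following the numbers, here is a rough sketch of the statistic being argued about. It assumes "epp" means average edits per page, that is, an editor's total edits divided by the number of distinct pages edited. The thread never spells this out, so treat the definition, the helper name and the sample titles as assumptions.

```python
from collections import Counter

def edits_per_page(edited_titles):
    """Average number of edits per distinct page (the assumed meaning of 'epp')."""
    if not edited_titles:
        raise ValueError("no edits to average over")
    per_page = Counter(edited_titles)  # page title -> number of edits to it
    return len(edited_titles) / len(per_page)

# Hypothetical contribution logs: one page title per edit.
gnome = ["Page %d" % i for i in range(10)]                # ten pages, one edit each
writer = ["Some article"] * 8 + ["Another article"] * 2   # ten edits over two pages

print(edits_per_page(gnome))   # 1.0
print(edits_per_page(writer))  # 5.0
```

Silver seren's objection shows up directly in such a measure: an article drafted in user space and pasted into mainspace in one edit adds only a single edit to its mainspace title.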

radek
Über Member
Group: Regulars
Posts: 699
Member No.: 15,651
QUOTE(Peter Damian @ Sun 30th October 2011, 4:12pm)
QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm) How would you account for the people that work on making articles in their user subspace and then submit them whole to the mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.
Yes of course there are 101 ways in which this number could fail to have the meaning it may have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's. All we can say, and all we need to say is that: 1. In general, editors with low epp's tend to perform relatively mechanical low economic value easily learned tasks. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia. 2. Low epp's predominate in the admin corps. Hardly surprising, given that the qualities required of an admin are precisely low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits. 3. The theory of crowdsourcing says that this shouldn't happen.

As I mention above, after seeing Jechoman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages. If there were data you could run some regressions here:
1. The dependent variable is a 0/1 dummy for whether a person is an admin or a non-admin. The independent variables are epp, % of edits to article space, etc. Run this as a Probit or Logit.
2. Construct a measure of whether a person is a "content creator" by, say, counting up their GAs, FAs and maybe DYKs and just non-redirect articles, weighting these in some way (which would be arbitrary, but you could change the weighting to do robustness checks). Then correlate that with epp and % of edits to article space.
Overall I don't think the idea that there's "division of labor" on Wikipedia is controversial though. And some of that may even be justified. The problem is with the differential awards and the over- (under-) supply of one particular type relative to the other.
Edit: or as another counterexample take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I, more or less. Yet a simple measure such as yours would put him in the category of "content creator". (As a further aside, in that Dr. Blofeld discussion that was linked, some moron objects to people objecting to Dr. Blofeld's mass creation of one-sentence stubs because "we shouldn't interfere with the work of content creators". In other words, lots of these idiots actually think that auto-creating thousands of one-sentence, next-to-useless stubs is "content creation"!)
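The two regressions proposed in this post can be sketched in a few lines. This is only an illustration under loud assumptions: every number in `EDITORS` is invented, the score weights are arbitrary, and the logit is fitted by plain gradient ascent on standardized predictors rather than by a statistics package, which a real analysis would use instead.

```python
import math

# Hypothetical editors: (epp, article-space share, is_admin, GAs, FAs, DYKs).
# Every number here is invented for illustration; none comes from the thread.
EDITORS = [
    (1.4, 0.55, 1, 0, 0, 1), (1.9, 0.60, 1, 0, 0, 0), (2.3, 0.48, 1, 1, 0, 2),
    (3.1, 0.70, 1, 2, 0, 3), (5.2, 0.35, 1, 1, 1, 0), (2.8, 0.66, 0, 0, 0, 4),
    (4.6, 0.72, 0, 3, 1, 5), (6.3, 0.81, 0, 6, 4, 9), (7.5, 0.40, 0, 2, 1, 1),
    (10.6, 0.15, 0, 0, 0, 0),
]

def standardize(column):
    """Rescale a column to mean 0 and population standard deviation 1."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, x):
    """P(admin) for one row; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def log_likelihood(weights, rows, labels):
    return sum(math.log(predict(weights, x) if y == 1 else 1.0 - predict(weights, x))
               for x, y in zip(rows, labels))

def fit_logit(rows, labels, steps=2000, lr=0.5):
    """Regression 1: logit of the admin dummy, by gradient ascent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def content_score(gas, fas, dyks, weights=(2.0, 5.0, 1.0)):
    """Regression 2 input: arbitrary weighted count; vary weights as a robustness check."""
    return weights[0] * gas + weights[1] * fas + weights[2] * dyks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

epp = [e[0] for e in EDITORS]
share = [e[1] for e in EDITORS]
is_admin = [e[2] for e in EDITORS]
rows = list(zip(standardize(epp), standardize(share)))
logit_weights = fit_logit(rows, is_admin)   # [intercept, epp slope, share slope]
scores = [content_score(*e[3:]) for e in EDITORS]
epp_score_corr = pearson(epp, scores)
print(logit_weights, epp_score_corr)
```

With real data, the sign and size of the epp coefficient in the first regression would say whether low epp goes with adminship once article share is held fixed. Rerunning the second with different `weights` is the robustness check mentioned above.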

Peter Damian
QUOTE(radek @ Sun 30th October 2011, 9:24pm)
As I mention above, after seeing Jechoman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.
I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages), which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of epp > 4 to explain why it was higher. In nearly all cases the person has a hobby such as caterpillars or asteroids, or has FA and GA stars. In most cases of < 4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one. Interesting that David Gerard got the second lowest score; I should have mentioned that earlier :|
QUOTE Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator"
Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try to explain them. I will do this study again some time, but using the tool to check 720 editors takes exactly 2 days. Access to the database would be wonderful.

radek
QUOTE(Peter Damian @ Sun 30th October 2011, 4:32pm) QUOTE(radek @ Sun 30th October 2011, 9:24pm)
As I mention above, after seeing Jechoman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.
I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages) which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of > 4 to explain why it was higher. In nearly all cases the person has a hobby of caterpillars or asteroids, or has FA and GA stars. In most cases of <4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one. Interesting that David Gerard got the second lowest score, I should have mentioned that earlier :| QUOTE Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator"
Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try and explain them. I will do this study again some time, but using the tool to check 720 edits take exactly 2 days. Access to the database would be wonderful.

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends. I brought up the counterexamples above simply because I'm wondering how common they are and whether they could somehow be controlled for. High % "blue pages" and % "user's talk" would, I think, be good indicators that a particular editor with a high epp is in the "peanut gallery" category, not the "content creator" category.
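The asymmetric pattern described here amounts to a small decision rule. The sketch below is a toy version: the epp cutoffs of 2 and 4 echo the bands mentioned earlier in the thread, while the 0.4 "chatter" cutoff, the function name and the example shares are assumptions made purely for illustration.

```python
def classify(epp, project_share, user_talk_share,
             high_epp=4.0, bot_like_epp=2.0, chatter_cutoff=0.4):
    """Toy labelling rule; all cutoffs are illustrative, not fitted to data.

    project_share   -- fraction of edits to 'blue' Wikipedia: pages
    user_talk_share -- fraction of edits to 'purple' User talk: pages
    """
    if epp < bot_like_epp:
        return "bot-like gnome"
    if epp < high_epp:
        return "gnome or mixed"
    # High epp alone is ambiguous: check where the edits actually go.
    if project_share + user_talk_share >= chatter_cutoff:
        return "peanut gallery"
    return "content creator"

# The shares below are made up; only the 10.63 epp figure appears in the thread.
print(classify(10.63, 0.60, 0.20))  # peanut gallery
print(classify(7.00, 0.05, 0.10))   # content creator
print(classify(1.50, 0.10, 0.10))   # bot-like gnome
```

The rule is deliberately asymmetric: a low epp settles the label on its own, while a high epp only does so after the namespace shares are checked.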

Peter Damian
QUOTE(radek @ Sun 30th October 2011, 9:37pm) I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.
It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think. Could it or should it be controlled for? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on. The anomalies are actually in the 2-3 region, where content contributors also engage in regular frenetic 'gnoming'. On Baseball Bugs, I did another study a few months ago of edits to ANI over 2 years. He came out way ahead of anyone else and is, again, probably an anomaly.

radek
QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm)
QUOTE(radek @ Sun 30th October 2011, 9:37pm) I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.
It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think. Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on. The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'. On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here. Here, I made a matrix (and uploaded it to commons ;)) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:
(IMG: http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)
You could graph some "famous" editors on that matrix, like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative. (And on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature.)
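On the random-sampling question: if a list or stream of (username, edit count) pairs were available from somewhere, a uniform sample of editors above a threshold could be drawn in a single pass with reservoir sampling. The sketch below assumes such a stream exists; the `Editor%d` names and their counts are made up, and nothing here claims the wiki software offers a "random editor" feature.

```python
import random

def sample_editors(editors, k, min_edits=1000, seed=None):
    """Reservoir-sample up to k names with more than min_edits edits from a stream."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0  # number of eligible editors encountered so far
    for name, edit_count in editors:
        if edit_count <= min_edits:
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append(name)
        else:
            # Keep the newcomer with probability k / seen, evicting a random slot.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = name
    return reservoir

# Hypothetical stream: Editor1 has 100 edits, Editor50 has 5000.
stream = (("Editor%d" % i, i * 100) for i in range(1, 51))
sample = sample_editors(stream, k=5, seed=1)
print(sample)
```

Because the stream is read once and only `k` names are held at a time, the same approach would work on a full user table without loading it into memory.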

Ottava
Über Pokemon
Group: Contributors
Posts: 2,917
Member No.: 7,328
QUOTE(radek @ Sun 30th October 2011, 6:00pm) Here, I made a matrix (and uploaded it to commons ;)) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it: (IMG: http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png) You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative. (and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)

My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN, WP:FAC, which move edits from "article" or "article talk" to Wikipedia. Never mind, you mentioned that in your next post. By the way, Gatoclass writes very little actual content. He is just an admin who latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.

radek
QUOTE(Ottava @ Sun 30th October 2011, 5:52pm)
QUOTE(radek @ Sun 30th October 2011, 6:00pm) Here, I made a matrix (and uploaded it to commons ;)) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it: (IMG: http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png) You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative. (and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)
My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN, WP:FAC, which moves edits from "article" or "article talk" to Wikipedia. Nevermind, you mentioned that in your next post. By the way, Gatoclass writes very little actual content. He is just an admin that latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.

You are, for once, right on this. I'm actually taking down some of this data for various people and you come up as "Someone who uses Wikipedia as Facebook" but I don't think you were that - well, not that much - correction, you come up as "Drama Queen"... hmm, maybe not that far off. This is actually very similar to the problem that someone like SandyGeorgia comes up as indistinguishable along these two dimensions from someone like Baseball Bugs. All of that comes from the fact that the soxred data does not distinguish "Posting to AN/I way too much" from "Reviewing GAs and FAs" - it counts both under "Wikipedia", but qualitatively these are very different things. So... I'm still tweaking it.
If anyone can point me to a statistic which would allow me to distinguish "Posting to ANI way too much" from "Reviewing GAs" (or similar) kinds of people then I would appreciate it. For some editors who "opted in" to the whole soxred thing you can do it, but most haven't. Other than that, the only thing I can think of is to take an editor's last 1000 or so contributions and see what % were to ANI, AE etc. But that's a buttload of work at this point.
BTW, Malleus is a very clear outlier. Very high % in article space and pretty high % epp. Very clearly a "content contributor". Giano not so much (though still in that cell).
Update: Here's a bit of what I have so far:
(IMG: http://upload.wikimedia.org/wikipedia/commons/3/34/Div_of_labor2.png)
Again, the basic problem is that, given the data, in the "warm colors" category (red and orange) it is impossible to distinguish people who use WP:whatever type pages (the blue pages) for what could essentially be considered legitimate uses (reviewing FAs etc.) from people who are fucking around (playing on ANI, politicking on talk pages).
Also, related to the other thread, someone like Dr. Blofeld shows up as a "wiki gnome" because he mass creates a lot of one- or two-sentence stubs. This means his article % is high, but since he never goes back to see what happened to the children he sired he has a low epp. In this case I think "wiki gnome" is not too inaccurate (cough cough), so I'm not bothered by this.
Overall I think this illustrates some of the above discussion.
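The "last 1000 contributions" idea is tedious by hand but simple once the page titles are in a list. The sketch below assumes the titles have already been collected somehow. The prefix lists are my guess at where noticeboard and review activity lives, and the sample titles are invented.

```python
def share_by_prefix(titles, prefixes):
    """Fraction of edits whose page title starts with any of the given prefixes."""
    if not titles:
        return 0.0
    hits = sum(1 for title in titles if title.startswith(tuple(prefixes)))
    return hits / len(titles)

# Assumed prefixes; adjust to whichever pages count as 'drama' or 'review'.
DRAMA = ["Wikipedia:Administrators' noticeboard",
         "Wikipedia:Arbitration/Requests/Enforcement"]
REVIEW = ["Wikipedia:Featured article candidates",
          "Wikipedia:Good article nominations",
          "Wikipedia:Did you know"]

# Hypothetical recent contributions, one title per edit.
recent = (["Wikipedia:Administrators' noticeboard/Incidents"] * 4
          + ["Wikipedia:Featured article candidates/Example/archive1"]
          + ["Example article"] * 5)

print(share_by_prefix(recent, DRAMA))   # 0.4
print(share_by_prefix(recent, REVIEW))  # 0.1
```

Two editors with the same overall "Wikipedia" share would then separate on these two numbers, which is exactly the distinction the namespace totals cannot make.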

Malleus
Fat Cat
Group: Contributors
Posts: 1,682
From: United Kingdom
Member No.: 8,716
QUOTE(radek @ Sun 30th October 2011, 11:02pm) BTW, Malleus is a very clear outlier. Very high % in article space and pretty high % epp. Very clearly a "content contributor". Giano not so much (though still in that cell). Update: Here's a bit of what I have so far: (IMG: http://upload.wikimedia.org/wikipedia/commons/3/34/Div_of_labor2.png)

Are you sure you don't mean "outlaw" rather than "outlier"?