The Wikipedia Review: A forum for discussion and criticism of Wikipedia

> Content contributors, statistical analysis
Peter Damian
Post #1

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators rely on low-value skills such as vandal reversion, template adding, linking to the Estonian Wikipedia, etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

Replies
radek
Post #2

QUOTE(Peter Damian @ Sun 30th October 2011, 8:01am) *

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators rely on low-value skills such as vandal reversion, template adding, linking to the Estonian Wikipedia, etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.


Oh yeah Peter, one thing. Your methodology will overestimate "content creation" for admins who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from posting frequently to these drama boards, rather than from working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could somehow use the Namespace Totals % which are given to separate out the repeated edits to actual articles vs. repeated edits to drama boards and user talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes about how to do it, which I might).
Peter Damian
Post #3

QUOTE(radek @ Sun 30th October 2011, 5:25pm) *

Oh yeah Peter, one thing. Your methodology will overestimate "content creation" for admins who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from posting frequently to these drama boards, rather than from working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could somehow use the Namespace Totals % which are given to separate out the repeated edits to actual articles vs. repeated edits to drama boards and user talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes about how to do it, which I might).


Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.

And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.

Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.

QUOTE

But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.


I don't agree with that. If I create 100 pages and give 100 edits to each page, that's a very high e.p.p. of 100. Piotrus is probably contaminating his content work with mechanical repetitive editing. Which I understand well, because I relieve the writer's block doldrums with such activity myself.

QUOTE(radek @ Sun 30th October 2011, 5:53pm) *

Well, he can make whatever crappy analogies he wants to, but it still ain't. I think this just shows that Jimmy doesn't have much of a clue of what a market is or how it functions.


Well, he did publish a peer-reviewed paper on options pricing as part of his Ph.D., so he can't be a complete dunce. I think there are other explanations for why he said those things.

QUOTE(Ottava @ Sun 30th October 2011, 5:47pm) *

QUOTE
Ottava is not given to irony, but I assumed he was here.


I was being 100% honest.


OK so I was right about the bit before the 'but'.
radek
Post #4

QUOTE
Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.


Well, there's no perfect way of doing it but you could just subtract off the blue to get a probably better estimate.
So edits per article page would be (1 - (%wikipedia + %wikipedia talk)) * average edits per page

Ideally you'd want to adjust the number of "pages" as well by subtracting AN/I and AE or whatever, but since there aren't that many of these pages it won't get too skewed.

The only possible exception is FAR pages which also count as "wikipedia" (blue) even though a lot of that is obviously content related.

The real difficulty is adjusting for # of edits on user talk, since there's no way to tell how many different user talk pages a particular person posted to. And a lot of these admins basically spend the majority of their time politickin' on each other's talk pages so that's really something which should be taken into account. For example, Fetchcommons has 28.26% of his posts to user talk. Sandstein has 24.71%, SarekOfVulcan has 24.09%, Jechochman (who has a pretty high average edits per page, but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.
For comparison, only 6.62% of my edits are to user talk.

So none of the above have anything to do with average edits per ARTICLE page. Again, the difficulty is in adjusting both the numerator and denominator here.

Still I think the formula above would give a somewhat better picture of actual edits per article page.
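
As a rough sketch of the formula above (not the blog's actual method), the adjustment could be computed like this. The function and parameter names are made up for illustration, the namespace percentages are assumed to be given on a 0-100 scale, and the example numbers are hypothetical rather than any real editor's figures.

```python
def edits_per_page(total_edits: int, distinct_pages: int) -> float:
    """Plain average edits per page (e.p.p.): total edits / distinct pages edited."""
    if distinct_pages <= 0:
        raise ValueError("distinct_pages must be positive")
    return total_edits / distinct_pages


def adjusted_edits_per_page(avg_edits_per_page: float,
                            pct_wikipedia: float,
                            pct_wikipedia_talk: float) -> float:
    """Scale e.p.p. by the share of edits made outside the project namespaces.

    Implements (1 - (%wikipedia + %wikipedia talk)) * average edits per page.
    Only the numerator is adjusted: the page count is left untouched, and
    content-related project pages (e.g. FAR) are discounted along with the
    drama boards, so this is a blunt estimate.
    """
    project_share = (pct_wikipedia + pct_wikipedia_talk) / 100.0
    if not 0.0 <= project_share <= 1.0:
        raise ValueError("namespace percentages must sum to between 0 and 100")
    return (1.0 - project_share) * avg_edits_per_page


if __name__ == "__main__":
    # Hypothetical editor: 10,000 edits over 1,000 pages,
    # with 20% of edits in Wikipedia: and 10% in Wikipedia talk:.
    raw = edits_per_page(10_000, 1_000)
    print(raw, adjusted_edits_per_page(raw, 20.0, 10.0))
```

User talk edits are deliberately left out here, since the namespace percentages alone do not say how many different user talk pages were involved.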

QUOTE
And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.


Yes, and some people will work on articles on their word processor or sandbox and then just post the ready thing. Others (like me) like to do it bit by bit. So the measure is obviously going to miss that.

QUOTE
Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.


Well, any statistic summarizes information, almost by definition. And when you summarize information, by definition, you're going to lose some information (the only alternative is to somehow look at every single edit ever made at Wikipedia simultaneously). That doesn't mean that describing data with statistics is useless.
EricBarbour
Post #5

QUOTE(radek @ Sun 30th October 2011, 1:28pm) *

For example, Fetchcommons has 28.26% of his posts to user talk. Sandstein has 24.71%, SarekOfVulcan has 24.09%, Jechochman (who has a pretty high average edits per page, but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.

All of whom are notoriously contentious and abusive admins.
And none of whom adds very much in article content.

This chart is actually not bad, although there are some exceptions (but not very many).
Bear in mind that many of those "wiki gnomes" are heavy users of bots that scrape from other websites. I would call them something more descriptive, like "Benders". :)
(That's because Futurama is an extremely popular subject among WP admins....)
(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

