The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages


> General Discussion? What's that all about?

This subforum is for general discussion of Wikipedia and other Wikimedia projects. For a glossary of terms frequently used in such discussions, please refer to Wikipedia:Glossary. For a glossary of musical terms, see here. Other useful links:

Akahele.org • Wikipedia-Watch • Wikitruth • WP:AN • WikiEN-L/Foundation-L (mailing lists) • Citizendium forums

 
> Content contributors, statistical analysis
Peter Damian
Post #121


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

This post has been edited by Peter Damian:
SB_Johnny
Post #122


It wasn't me who made honky-tonk angels
*******

Group: Regulars
Posts: 2,128
Joined:
Member No.: 8,272



QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am) *

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.

Ottava
Post #123


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(SB_Johnny @ Sun 30th October 2011, 9:15am) *

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am) *

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.



You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.
Peter Damian
Post #124


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(SB_Johnny @ Sun 30th October 2011, 1:15pm) *

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am) *

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.


I didn't understand the 'to be fair' bit.
communicat
Post #125


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia
Peter Damian
Post #126


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(communicat @ Sun 30th October 2011, 4:19pm) *

Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia


Thanks, but yes, actually I am familiar with that one.
radek
Post #127


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 8:01am) *

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.


I think you already answered your own question - supply and demand, while always present, lead to the outcomes you describe (higher pay for scarcer skills) only in functioning markets. Wikipedia is not a market. So rewards are not necessarily related to productivity or usefulness but are instead determined through a messy social and political process (who's got what friends).



QUOTE(Ottava @ Sun 30th October 2011, 8:17am) *

QUOTE(SB_Johnny @ Sun 30th October 2011, 9:15am) *

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am) *

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.



You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.


Eh, they may be useful but are neither necessary nor even "extremely valuable".
Peter Damian
Post #128


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Sun 30th October 2011, 4:31pm) *

Wikipedia is not a market.


That's interesting because analysis of Wales' early posts to the lists in 2001 suggests a market economy was exactly what he had in mind. That's why he was so heavy on not biasing the outcome by having content committees or editors in chief and so on.



QUOTE

QUOTE(Ottava @ Sun 30th October 2011, 8:17am) *

You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.


Eh, they may be useful but are neither necessary nor even "extremely valuable".


Ottava is not given to irony, but I assumed he was here.
radek
Post #129


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 8:01am) *

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.


Oh yeah Peter, one thing. Your methodology will overestimate "content creation" by admins for ones who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from them posting frequently to these drama boards, rather than working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could use the Namespace Totals percentages that the tool gives to separate repeated edits to actual articles from repeated edits to drama boards and user talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes about how to do it, which I might).
Ottava
Post #130


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(communicat @ Sun 30th October 2011, 12:19pm) *

Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia



Hewitt doesn't understand that the Stewards haven't respected the Foundation since 2008 and that very few people ever actually listened to Jimbo to begin with. (Not to say that the few who did weren't powerful, but the Commons matter shows that Jimbo was only given some power when he was toeing the party line.)



Peter

QUOTE
Ottava is not given to irony, but I assumed he was here.


I was being 100% honest. History merges were annoying but utterly important to me many times. It is annoying to have to go find an admin, link the different pages, and hope they get it right instead of being able to do a bunch of history merges in a row. Remember, I was building a dozen or so articles on average in my user space and then moving them into article space, where many of the titles already had articles. The merging of histories was a valuable addition.

Also, seeing deleted pages is important if you are trying to see what was there before when trying to recreate something. Editing protected pages is good for when you are working on different pages, want to update the DYK queue, etc. Importing is another feature that I used. Suppressing redirects was quite regularly handy for me. And it is annoying when you want to move a page out to a title that is already a redirect.

This post has been edited by Ottava:
radek
Post #131


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 12:23pm) *

QUOTE(radek @ Sun 30th October 2011, 4:31pm) *

Wikipedia is not a market.


That's interesting because analysis of Wales' early posts to the lists in 2001 suggests a market economy was exactly what he had in mind. That's why he was so heavy on not biasing the outcome by having content committees or editors in chief and so on.


Well, he can make whatever crappy analogies he wants to, but it still ain't. I think this just shows that Jimmy doesn't have much of a clue of what a market is or how it functions.

Peter Damian
Post #132


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Sun 30th October 2011, 5:25pm) *

Oh yeah Peter, one thing. Your methodology will overestimate "content creation" by admins for ones who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from them posting frequently to these drama boards, rather than working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could use the Namespace Totals percentages that the tool gives to separate repeated edits to actual articles from repeated edits to drama boards and user talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes about how to do it, which I might).


Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.
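"Selective querying" here would mean counting only article-space edits when computing edits per page. A minimal sketch, assuming each contribution has already been pulled out as a hypothetical (page_title, namespace_id) pair; the namespace numbers are the standard MediaWiki ones (0 = articles, 4 = Wikipedia: pages), while the sample history is made up for illustration:

```python
from collections import Counter

# Standard MediaWiki namespace IDs: 0 = article (main) space,
# 4 = Wikipedia: (project) pages, 5 = Wikipedia talk.
ARTICLE_NS = 0
PROJECT_NS = 4


def edits_per_page(edits, namespaces=None):
    """Average edits per distinct page.

    edits: iterable of (page_title, namespace_id) pairs, one per edit
           (a hypothetical shape for rows returned by a database query).
    namespaces: optional set of namespace IDs to keep; None keeps everything.
    """
    counts = Counter(
        (title, ns)
        for title, ns in edits
        if namespaces is None or ns in namespaces
    )
    if not counts:
        return 0.0  # no matching edits: avoid dividing by zero
    return sum(counts.values()) / len(counts)


# Made-up contribution history: heavy posting to one drama board
# inflates the all-namespace figure.
history = (
    [("William of Ockham", ARTICLE_NS)] * 6
    + [("Logic", ARTICLE_NS)] * 2
    + [("Administrators' noticeboard/Incidents", PROJECT_NS)] * 10
)
print(edits_per_page(history))                # all namespaces: 18 / 3 = 6.0
print(edits_per_page(history, {ARTICLE_NS}))  # articles only: 8 / 2 = 4.0
```

Keying on (title, namespace) treats "Logic" and "Talk:Logic" as different pages, which is how the wiki itself counts them.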

And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.

Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.

QUOTE

But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.


I don't agree with that. If I create 100 pages and give 100 edits to each page, that's a very high e.p.p. of 100. Piotrus is probably contaminating his content work with mechanical repetitive editing. Which I understand well, because I relieve the writer's block doldrums with such activity myself.
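The arithmetic behind the 100-page example, plus a hypothetical variant showing how mixing in one-edit "gnoming" pages drags the figure down (the 900 extra pages are invented for illustration):

```python
def epp(total_edits: int, distinct_pages: int) -> float:
    """Edits per page: total edits divided by the number of distinct pages edited."""
    return total_edits / distinct_pages


# 100 pages with 100 edits each.
print(epp(100 * 100, 100))  # 100.0

# Hypothetical: the same 100 pages plus 900 pages touched once each
# by repetitive maintenance edits.
print(epp(100 * 100 + 900, 100 + 900))  # 10.9
```

Because the denominator counts pages rather than edits, a modest amount of page-hopping is enough to pull an otherwise "content" profile toward the middle of the range.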

QUOTE(radek @ Sun 30th October 2011, 5:53pm) *

Well, he can make whatever crappy analogies he wants to, but it still ain't. I think this just shows that Jimmy doesn't have much of a clue of what a market is or how it functions.


Well, he did publish a peer-reviewed paper on options pricing as part of his Ph.D., so he can't be a complete dunce. I think there are other explanations for why he said those things.

QUOTE(Ottava @ Sun 30th October 2011, 5:47pm) *

QUOTE
Ottava is not given to irony, but I assumed he was here.


I was being 100% honest.


OK so I was right about the bit before the 'but'.
thekohser
Post #133


Member
*********

Group: Regulars
Posts: 10,274
Joined:
Member No.: 911



QUOTE(radek @ Sun 30th October 2011, 12:31pm) *

Wikipedia is not a market.


For most editors, no, it's not.

Me, though... I just received another $100 PayPal payment for some fairly simple work on Wikipedia.

communicat
Post #134


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



Peter/Edward, in my experience there's another category of Wikipedian besides admins with low-value skills and actual 'content contributors'. I'm referring, of course, to the category of "supervisor": for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from past experience how much shit they'd have to put up with if they ever did try to make a useful contribution.)

This post has been edited by communicat:
radek
Post #135


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651





QUOTE
Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.


Well, there's no perfect way of doing it, but you could just subtract off the blue to get what is probably a better estimate.
So edits per article page would be (1 - (%Wikipedia + %Wikipedia talk)) * average edits per page

Ideally you'd want to adjust the number of "pages" as well by subtracting AN/I and AE or whatever, but since there aren't that many of these pages it won't get too skewed.

The only possible exception is FAR pages which also count as "wikipedia" (blue) even though a lot of that is obviously content related.

The real difficulty is adjusting for the # of edits to user talk, since there's no way to tell how many different user talk pages a particular person posted to. And a lot of these admins basically spend the majority of their time politickin' on each other's talk pages, so that's really something which should be taken into account. For example, Fetchcommons has 28.26% of his posts to user talk, Sandstein 24.71%, SarekOfVulcan 24.09%, Jechochman (who has a pretty high average edits per page - but that's not 'cause he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.
For comparison, only 6.62% of my edits are to users talk.

So none of the above have anything to do with average edits per ARTICLE page. Again, the difficulty is in adjusting both the numerator and denominator here.

Still I think the formula above would give a somewhat better picture of actual edits per article page.
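The proposed correction written out as a function. A minimal sketch, assuming the percentages are entered as 0-100 values read off the edit counter's namespace totals; the editor figures in the example are hypothetical:

```python
def adjusted_epp(avg_epp: float, pct_wikipedia: float, pct_wikipedia_talk: float) -> float:
    """The correction proposed above: (1 - (%Wikipedia + %Wikipedia talk)) * avg e.p.p.

    Scales the average edits per page by the share of edits that are NOT in the
    Wikipedia: or Wikipedia talk: namespaces. Only the numerator (edits) is
    adjusted; the page count and user talk edits are left untouched, which is
    the limitation noted in the post.
    """
    share_outside_project = 1 - (pct_wikipedia + pct_wikipedia_talk) / 100
    return share_outside_project * avg_epp


# Hypothetical editor: 8.0 average e.p.p., 30% Wikipedia:, 10% Wikipedia talk:
print(round(adjusted_epp(8.0, 30.0, 10.0), 2))  # 4.8
```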

QUOTE
And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.


Yes, and some people will work on articles on their word processor or sandbox and then just post the ready thing. Others (like me) like to do it bit by bit. So the measure is obviously going to miss that.

QUOTE
Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.


Well, any statistic summarizes information, almost by definition. And when you summarize information, you're going to lose some of it (the only alternative is to somehow look at every single edit ever made at Wikipedia simultaneously). That doesn't mean that describing data with statistics is useless.
Peter Damian
Post #136


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(communicat @ Sun 30th October 2011, 6:29pm) *

Peter/Edward, in my experience there's another category of Wikipedian besides admins with low-value skills and actual 'content contributors'. I'm referring, of course, to the category of "supervisor": for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from past experience how much shit they'd have to put up with if they ever did try to make a useful contribution.)


Isn't this similar to the way the Red Army used to have 'political officers'?
Silver seren
Post #137


Senior Member
****

Group: Contributors
Posts: 470
Joined:
Member No.: 36,940



How would you account for the people who build articles in their user subspace and then submit the whole thing to mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.
Peter Damian
Post #138


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm) *

How would you account for the people who build articles in their user subspace and then submit the whole thing to mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.


Yes of course there are a 101 ways in which this number could fail to have the meaning it may have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's.

All we can say, and all we need to say is that:

1. In general, editors with low epp's tend to perform relatively mechanical, easily learned tasks of low economic value. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia.

2. Low epp's predominate in the admin corps. Hardly surprising, given that the qualities required of an admin are precisely low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits.

3. The theory of crowdsourcing says that this shouldn't happen.
Ottava
Post #139


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(radek @ Sun 30th October 2011, 4:28pm) *

Well, there's no perfect way of doing it, but you could just subtract off the blue to get what is probably a better estimate.
So edits per article page would be (1 - (%Wikipedia + %Wikipedia talk)) * average edits per page

Ideally you'd want to adjust the number of "pages" as well by subtracting AN/I and AE or whatever, but since there aren't that many of these pages it won't get too skewed.

The only possible exception is FAR pages which also count as "wikipedia" (blue) even though a lot of that is obviously content related.

The real difficulty is adjusting for the # of edits to user talk, since there's no way to tell how many different user talk pages a particular person posted to. And a lot of these admins basically spend the majority of their time politickin' on each other's talk pages, so that's really something which should be taken into account. For example, Fetchcommons has 28.26% of his posts to user talk, Sandstein 24.71%, SarekOfVulcan 24.09%, Jechochman (who has a pretty high average edits per page - but that's not 'cause he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.
For comparison, only 6.62% of my edits are to users talk.

So none of the above have anything to do with average edits per ARTICLE page. Again, the difficulty is in adjusting both the numerator and denominator here.

Still I think the formula above would give a somewhat better picture of actual edits per article page.



Just for curiosity's sake: my user talk page percentage was 30.02%. My article percentage was 25.53%. I averaged 8.61 edits per page.

I also participated in over 300 different FAC reviews ("Wikipedia" page) and many DYK related matters (also "Wikipedia" page related).


I have a feeling that you might want to break down where exactly people are editing. Perhaps a much better way of identifying "content" contributors is to look for those who add large amounts of bytes to articles in edits that aren't undos? (It would be hard to filter out all of the undos, though.)
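A rough sketch of the byte-counting idea. It assumes each edit is available with its namespace, its change in page size, and a hypothetical is_revert flag; as noted in the post, reliably detecting undos is the hard part, so that flag is an assumption rather than something the edit counter provides:

```python
from dataclasses import dataclass


@dataclass
class Edit:
    namespace: int    # MediaWiki namespace ID; 0 = article space
    byte_delta: int   # change in page size produced by the edit
    is_revert: bool   # hypothetical flag: True for undos/rollbacks


def content_bytes_added(edits):
    """Bytes added to article space, skipping reverts and net removals."""
    return sum(
        e.byte_delta
        for e in edits
        if e.namespace == 0 and not e.is_revert and e.byte_delta > 0
    )


sample = [
    Edit(0, 5000, False),  # new prose in an article: counted
    Edit(0, 1200, True),   # undo restoring text: skipped
    Edit(4, 800, False),   # post to a Wikipedia: page: skipped
    Edit(0, -300, False),  # trimming an article: skipped
]
print(content_bytes_added(sample))  # 5000
```

Ignoring negative deltas is a design choice: it credits additions only, so a careful editor who cuts bad material scores nothing for that work.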
radek
Post #140


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 4:12pm) *

QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm) *

How would you account for the people who build articles in their user subspace and then submit the whole thing to mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.


Yes of course there are a 101 ways in which this number could fail to have the meaning it may have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's.

All we can say, and all we need to say is that:

1. In general, editors with low epp's tend to perform relatively mechanical, easily learned tasks of low economic value. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia.

2. Low epp's predominate in the admin corps. Hardly surprising, given that the qualities required of an admin are precisely low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits.

3. The theory of crowdsourcing says that this shouldn't happen.


As I mention above, after seeing Jechoman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.

If there was data you could do some regressions here:

1. Dependent variable is a 0/1 dummy for whether a person is an admin or a non-admin. Independent variables are epp, % edits to articles space etc. Run this as a Probit or Logit.

2. Construct a measure of whether a person is a "content creator" by, say, counting up their GAs, FAs and maybe DYKs and just non-redirect articles, weighting these in some way (which would be arbitrary but you could change the weighting to do robustness checks). Then correlate that with epp and % edits to article space.
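Both regressions can be sketched with the standard library alone. This is a minimal illustration, not a full analysis: it fits a logit (not a probit) by plain gradient ascent, uses an arbitrary weighting for the content score (the weights are a hypothetical default meant to be varied), and runs on invented numbers rather than real editor data:

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.1, steps=5000):
    """Logistic regression by plain gradient ascent on the log-likelihood.

    X: list of feature rows (e.g. [epp, share_of_article_edits]),
    y: list of 0/1 labels (1 = admin). Returns [intercept, b1, b2, ...].
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
            err = label - sigmoid(z)
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi + lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def content_score(fa, ga, dyk, articles, weights=(10, 5, 2, 1)):
    """Weighted count of FAs, GAs, DYKs and non-redirect articles.
    The default weights are arbitrary; vary them as a robustness check."""
    return sum(w * c for w, c in zip(weights, (fa, ga, dyk, articles)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Invented data for illustration only, not real editor statistics.
epps = [1.5, 2.0, 2.5, 3.0, 6.0, 2.2, 5.0, 7.0, 8.0, 9.0]
is_admin = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
coefs = fit_logit([[e] for e in epps], is_admin)
print("epp coefficient:", round(coefs[1], 2))  # negative here: higher epp, less likely admin

scores = [content_score(0, 0, 1, 5), content_score(0, 1, 2, 10), content_score(2, 3, 5, 20)]
print("corr(score, epp):", round(pearson(scores, [2.0, 4.5, 8.0]), 2))
```

The toy data deliberately overlaps (an admin at 6.0, a non-admin at 2.2) so that the maximum-likelihood fit is finite; with perfectly separated groups the logit coefficients would keep growing.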

Overall I don't think the idea that there's "division of labor" on Wikipedia is controversial, though. And some of that may even be justified. The problem is with the differential rewards and the over (under) supply of one particular type relative to the other.

Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator"

(As a further aside, in that Dr. Blofeld discussion that was linked, some moron objects to people objecting to Dr. Blofeld's mass creation of one-sentence stubs because "we shouldn't interfere with the work of content creators". In other words, lots of these idiots actually think that auto-creating thousands of next-to-useless one-sentence stubs is "content creation"!)

This post has been edited by radek:
Peter Damian
Post #141


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ottava @ Sun 30th October 2011, 9:18pm) *

I have a feeling that you might want to break down where exactly people are editing. Perhaps a much better way of identifying "content" contributors is to look for those who add large amounts of bytes to articles in edits that aren't undos? (It would be hard to filter out all of the undos, though.)


It all depends what you want to prove. I am trying simply to see if there is a simple statistical measure that suggests, with some degree of confidence, that there is a division of labour between 'content contributors' and 'gnomes'. We all know anecdotally that this exists, but here is an objective measure. The fact that the measure, like all statistical measures, only shows this with a certain degree of confidence, but no absolute certainty, does not matter. To be sure, some people we think of as content contributors have gnome-like characteristics (e.g. YellowMonkey). But we know that too, from his edits, when he was active.

The other point is that low epp count is indicative of low-value added - monotonous, easily learned repetitive labour.

The final point is that this low-value labour gives you high status on Wikipedia, unlike in the real world.
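For readers new to the term, "epp" in this thread means edits per page. A minimal sketch, assuming epp is total edits divided by the number of distinct pages edited, and assuming each editor's history is available as one page title per edit (a hypothetical input format, not the output of any particular tool):

```python
def edits_per_page(page_titles):
    """Average edits per page: total edits / distinct pages edited.

    `page_titles` holds one page title per edit (hypothetical format)."""
    if not page_titles:
        return 0.0  # no edits: avoid dividing by zero
    return len(page_titles) / len(set(page_titles))


# Hypothetical histories, not real editors.
gnome = ["Page A", "Page B", "Page C", "Page D"]   # one edit per page
writer = ["Article X"] * 9 + ["Article Y"] * 3    # many edits to few pages

print(edits_per_page(gnome))   # 1.0
print(edits_per_page(writer))  # 6.0
```

Note that the score says nothing about which namespace the edits went to, which is exactly the Baseball Bugs objection.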
Peter Damian
post
Post #142


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Sun 30th October 2011, 9:24pm) *


As I mention above, after seeing Jehochman's epp (7.46), I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.


I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages) which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of > 4 to explain why it was higher. In nearly all cases the person has a hobby of caterpillars or asteroids, or has FA and GA stars. In most cases of <4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one.

Interesting that David Gerard got the second lowest score, I should have mentioned that earlier :|

QUOTE

Edit: or, as another counterexample, take Baseball Bugs. His epp is 10.63. But we all know that's only because he edits AN/I, more or less. Yet a simple measure such as yours would put him in the "content creator" category.


Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try and explain them.

I will do this study again some time, but checking 720 editors with the tool takes exactly 2 days. Access to the database would be wonderful.
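The rough thresholds described above (above 4: usually FA/GA or hobby-topic article work; below 4: usually not; below 2: a bot or bot-like) can be written down as a first-pass screen. This is only a sketch of that reading: the band wording is a paraphrase, "ExampleBot" is a hypothetical account, and the other two scores are the ones quoted in this thread.

```python
def epp_band(epp):
    """Bucket an epp score using the rough cut-offs from the post (not a validated rule)."""
    if epp > 4:
        return "high: look for FA/GA or specialist article work"
    if epp < 2:
        return "very low: bot or bot-like"
    return "low: usually no specialist article work"


# Scores for Wehwalt and Baseball Bugs are from this thread; ExampleBot is made up.
scores = {"Wehwalt": 20.51, "Baseball Bugs": 10.63, "ExampleBot": 1.2}
for name, epp in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15} {epp:6.2f}  {epp_band(epp)}")
```

A high band is a prompt to look at the edits, not a verdict: Baseball Bugs lands in it for reasons that have nothing to do with article work.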

This post has been edited by Peter Damian:
radek
post
Post #143


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 4:32pm) *

QUOTE(radek @ Sun 30th October 2011, 9:24pm) *


As I mention above, after seeing Jehochman's epp (7.46), I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.


I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages) which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of > 4 to explain why it was higher. In nearly all cases the person has a hobby of caterpillars or asteroids, or has FA and GA stars. In most cases of <4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one.

Interesting that David Gerard got the second lowest score, I should have mentioned that earlier :|

QUOTE

Edit: or, as another counterexample, take Baseball Bugs. His epp is 10.63. But we all know that's only because he edits AN/I, more or less. Yet a simple measure such as yours would put him in the "content creator" category.


Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try and explain them.

I will do this study again some time, but checking 720 editors with the tool takes exactly 2 days. Access to the database would be wonderful.


I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.

I brought up the counterexamples above simply because I'm wondering how much of the pattern they account for and whether it could somehow be controlled for. A high % of "blue pages" and of "user's talk" would, I think, be good indicators that a particular editor with a high epp is in the "peanut gallery" category, not the "content creator" category.
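One way to do that control, sketched under assumptions: each edit is tagged with its namespace ("Article" stands in for the main namespace), and a high-epp editor whose edits are mostly "blue" (Wikipedia:) plus "purple" (User talk:) pages is flagged. Both cut-offs are illustrative guesses, not figures from this thread.

```python
from collections import Counter


def namespace_shares(namespaces):
    """Percentage of an editor's edits falling in each namespace.

    `namespaces` holds one namespace name per edit (hypothetical format)."""
    total = len(namespaces)
    if total == 0:
        return {}
    return {ns: 100.0 * n / total for ns, n in Counter(namespaces).items()}


def looks_like_peanut_gallery(epp, shares, epp_cutoff=4.0, meta_cutoff=40.0):
    """High epp, but mostly spent on Wikipedia: and User talk: pages.

    Both cut-offs are illustrative guesses."""
    meta = shares.get("Wikipedia", 0.0) + shares.get("User talk", 0.0)
    return epp > epp_cutoff and meta >= meta_cutoff


# Hypothetical editor: 10 edits, 8 of them outside article space.
edits = ["Wikipedia"] * 5 + ["User talk"] * 3 + ["Article"] * 2
shares = namespace_shares(edits)
print(shares)                                    # {'Wikipedia': 50.0, 'User talk': 30.0, 'Article': 20.0}
print(looks_like_peanut_gallery(10.63, shares))  # True
```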

This post has been edited by radek:
Peter Damian
post
Post #144


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



For the record, here are the top 20 scorers. Most of them are consistent with the hypothesis. Even FT2, whose top articles were Zoophilia, Labrador Retriever and Neurolinguistic Programming.

Marine 69-71 is Tony the Marine.

Zero0000 7.28
Maunus 7.36
Jmh649 7.38
Jehochman 7.46
Happyme22 7.77
AnemoneProjectors 7.87
Cailil 7.97
Masem 8.05
Mike Cline 8.23
Stephan Schulz 8.28
Cbl62 9.05
Gatoclass 9.2
FT2 9.27
Gwen Gale 9.27
SlimVirgin 9.4
Slrubenstein 9.66
COGDEN 9.67
Marine 69-71 10.08
Moni3 12.72
Wehwalt 20.51
Peter Damian
post
Post #145


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Sun 30th October 2011, 9:37pm) *

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.


It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

This post has been edited by Peter Damian:
communicat
post
Post #146


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



QUOTE(Peter Damian @ Sun 30th October 2011, 11:01pm) *

QUOTE(communicat @ Sun 30th October 2011, 6:29pm) *

PeterEdward, in my experience there's another category of wikipedian apart from admins with low-value skills and actual 'content contributors'. I'm referring of course to the category of "supervisor", namely the fact that for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from some past experience how much shit they'd have to put up with if ever they did try to make a useful contribution).


Isn't this similar to the way the Red Army used to have 'political officers'?

I see no convincing comparison or correlation between the Red Army's political commissars and WP's self-appointed supervisors. But if it's correlations you're after, try the one that exists between the decline of the American-dominated WP and the decline of the American economy (not to mention the decline in American international prestige following its disastrous interventions in Iraq and Afghanistan, and the disaster that's sure to follow in newly "liberated" Libya). Very few people outside of America attach much credibility these days to anything perceived to be American or American-based (including even or especially WP).
Peter Damian
post
Post #147


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(communicat @ Sun 30th October 2011, 9:47pm) *

I see no convincing comparison or correlation between the Red Army's political commissars and WP's self-appointed supervisors.



That reminds me of another study I need to complete. I was looking at regularly blocked content creators such as Malleus, Giano and, er, myself.

There was a regular pattern: one admin blocks for some supposed offence, and then another admin unblocks. You can easily put this into a table with 'block' on one side and 'unblock' on the other.

Now, if we were to plot the binary block/unblock against epp, what would we get, I wonder?

Suggestions or guesses please.
radek
post
Post #148


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm) *

QUOTE(radek @ Sun 30th October 2011, 9:37pm) *

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.


It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.


Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to Commons ;) ) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)
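Random Article picks a page, not a person, but the same idea works on a list of editors once you have one. A sketch, assuming you already hold a mapping of username to edit count (how to obtain a full list, whether from a database dump or elsewhere, is left open); the usernames and counts below are hypothetical.

```python
import random


def sample_editors(edit_counts, k, min_edits=1000, seed=None):
    """Draw k distinct editors uniformly at random from those with more than
    `min_edits` edits. `edit_counts` maps username -> edit count."""
    eligible = sorted(name for name, n in edit_counts.items() if n > min_edits)
    if k > len(eligible):
        raise ValueError(f"only {len(eligible)} editors have more than {min_edits} edits")
    rng = random.Random(seed)  # seeded generator so a sample can be reproduced
    return rng.sample(eligible, k)


# Hypothetical edit counts.
counts = {"UserA": 15000, "UserB": 800, "UserC": 2400, "UserD": 1001, "UserE": 60}
print(sample_editors(counts, 2, seed=42))
```

Sorting the eligible names first means the same seed always gives the same sample, which matters if anyone wants to check the study.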

This post has been edited by radek:
Peter Damian
post
Post #149


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Sun 30th October 2011, 10:00pm) *

QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm) *

QUOTE(radek @ Sun 30th October 2011, 9:37pm) *

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.


It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.


Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to Commons ;) ) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.


I think that hits the nail on the head.

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

There's very little remainder, actually.
radek
post
Post #150


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Sun 30th October 2011, 5:05pm) *

QUOTE(radek @ Sun 30th October 2011, 10:00pm) *

QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm) *

QUOTE(radek @ Sun 30th October 2011, 9:37pm) *

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.


It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.


Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to Commons ;) ) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.


I think that hits the nail on the head.

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

There's very little remainder, actually.


Looking at some editors' epps and %s, an allowance should be made for people who run certain projects. For example, both Gatoclass and SandyGeorgia would show up in the "Drama Queens" category. They have high epps because they post a lot to the same project page (DYKs and GAs), and low article-space percentages for the very same reason (or because they post to user talk to notify people that their articles are being reviewed/approved, etc.).
Ottava
post
Post #151


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(radek @ Sun 30th October 2011, 6:00pm) *

Here, I made a matrix (and uploaded it to Commons ;) ) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)



My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN and WP:FAC, which move edits from "article" or "article talk" to "Wikipedia". Never mind, you mentioned that in your next post.

By the way, Gatoclass writes very little actual content. He is just an admin that latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.

This post has been edited by Ottava:
radek
post
Post #152


Ãœber Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Ottava @ Sun 30th October 2011, 5:52pm) *

QUOTE(radek @ Sun 30th October 2011, 6:00pm) *

Here, I made a matrix (and uploaded it to Commons ;) ) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)



My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN and WP:FAC, which move edits from "article" or "article talk" to "Wikipedia". Never mind, you mentioned that in your next post.

By the way, Gatoclass writes very little actual content. He is just an admin that latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.


You are, for once, right on this. I'm actually taking down some of this data for various people, and you come up as "Someone who uses Wikipedia as Facebook", but I don't think you were that - well, not that much - correction, you come up as "Drama Queen"... hmm, maybe not that far off. This is actually very similar to the problem that someone like SandyGeorgia comes up as indistinguishable along these two dimensions from someone like Baseball Bugs. And all of that has to do with the fact that the soxred data does not distinguish "Posting to AN/I way too much" from "Reviewing GAs and FAs": it counts both under "Wikipedia", but qualitatively these are very different things.

So... I'm still tweaking it. If anyone can point me to a statistic that would let me distinguish the "Posting to ANI way too much" kind of people from the "Reviewing GAs" (or similar) kind, I would appreciate it. For some editors who "opted in" to the whole soxred thing you can do it, but most haven't. Other than that, the only thing I can think of is to take an editor's last 1000 or so contributions and see what % were to ANI, AE, etc. But that's a buttload of work at this point.
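The "last 1000 contributions" check is mechanical once the titles are in hand. A sketch, assuming the contributions are available newest first as a list of page titles; the title prefixes below are illustrative only, and other review venues would need their own patterns.

```python
# Illustrative title prefixes only; a real study would need a vetted list.
DRAMA_BOARDS = (
    "Wikipedia:Administrators' noticeboard/Incidents",  # WP:ANI
    "Wikipedia:Arbitration/Requests/Enforcement",       # WP:AE
)
REVIEW_PAGES = (
    "Wikipedia:Featured article candidates",            # WP:FAC
    "Wikipedia:Did you know",                           # WP:DYK
)


def board_vs_review(titles, limit=1000):
    """Return (% to drama boards, % to review pages) over the `limit` most
    recent contributions. `titles` is newest first, one page title per edit."""
    recent = titles[:limit]
    if not recent:
        return 0.0, 0.0
    boards = sum(t.startswith(DRAMA_BOARDS) for t in recent)
    reviews = sum(t.startswith(REVIEW_PAGES) for t in recent)
    return 100.0 * boards / len(recent), 100.0 * reviews / len(recent)


# Hypothetical history of 10 edits.
history = (["Wikipedia:Administrators' noticeboard/Incidents"] * 3
           + ["Wikipedia:Featured article candidates/Example/archive1"]
           + ["Some article"] * 6)
print(board_vs_review(history))  # (30.0, 10.0)
```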

BTW, Malleus is a very clear outlier. Very high % in article space and a pretty high epp. Very clearly a "content contributor". Giano not so much (though still in that cell).

Update:

Here's a bit of what I have so far:

(IMG:http://upload.wikimedia.org/wikipedia/commons/3/34/Div_of_labor2.png)

Again, the basic problem is that given the data, in the "warm colors" category (red and orange) it is impossible to distinguish people who use WP:whatever type pages (the blue pages) for what could essentially be considered legitimate uses (reviewing FAs etc.) vs. people who are fucking around (playing on ANI, politicking on talk pages)

Also, related to the other thread, someone like Dr. Blofeld shows up as a "wiki gnome" because they mass create a lot of one or two sentence stubs. This means their article % is high, but since he never goes back to see what happened to the children he sired he has a low epp. In this case I think "wiki gnome" is not too inaccurate (cough cough), so I'm not bothered by this. Overall I think this illustrates some of the above discussion.

This post has been edited by radek:
EricBarbour
post
Post #153


blah
*********

Group: Regulars
Posts: 5,919
Joined:
Member No.: 5,066



QUOTE(radek @ Sun 30th October 2011, 1:28pm) *

For example, Fetchcommons has 28.26% of his posts to user's talk. Sandstein has 24.71%. SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page, but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.

All of whom are notoriously contentious and abusive admins.
And none of whom adds very much in article content.

This chart is actually not bad, although there are some exceptions (but not very many).
Bear in mind that many of those "wiki gnomes" are heavy users of bots that scrape
from other websites. I would call them something more descriptive, like "Benders". :)
(That's because Futurama is an extremely popular subject among WP admins....)
(IMG:http://upload.wikimedia.org/wikipedia/commons/5/59/DIV_LABOR_WIKI.png)

This post has been edited by EricBarbour:
radek
post
Post #154


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(EricBarbour @ Sun 30th October 2011, 6:29pm) *


Bear in mind that many of those "wiki gnomes" are heavy users of bots that scrape
from other websites. I would call them something more descriptive, like "Benders". :)


Yes, very much so, as the chart right above illustrates. Here's where you get into semantics - mass creating a bunch of next-to-useless stubs is "gnomish", so the category is appropriate, just not refined enough. I do wish that I could somehow just get a huge data dump on all active (more than 100 edits per month) editors, to see what the relative supply of each kind is.
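Given such a dump, the relative supply is just a tally over the matrix. A sketch of a 2x2 reading of it (epp against % of edits in article space), where the cut-offs are illustrative guesses and the label for the low/low cell is an assumption; the other three labels follow the examples in this thread (Malleus, Dr. Blofeld, SandyGeorgia/Baseball Bugs). The editors below are hypothetical.

```python
from collections import Counter


def matrix_cell(epp, article_pct, epp_cut=4.0, article_cut=50.0):
    """Place an editor in a 2x2 grid of epp vs % of edits in article space.

    Cut-offs are illustrative guesses, not values taken from the chart."""
    high_epp = epp >= epp_cut
    high_article = article_pct >= article_cut
    if high_article and high_epp:
        return "Content contributor"
    if high_article:
        return "Wiki gnome"    # e.g. mass stub creation: many pages, few edits each
    if high_epp:
        return "Drama queen"   # many edits, mostly outside article space
    return "Uses Wikipedia as Facebook"  # assumed label for the low/low cell


def relative_supply(editors):
    """Share of editors in each cell; `editors` maps name -> (epp, article %)."""
    cells = Counter(matrix_cell(epp, pct) for epp, pct in editors.values())
    total = sum(cells.values())
    return {cell: n / total for cell, n in cells.items()} if total else {}


# Hypothetical editors, not real measurements.
sample = {"A": (9.5, 80.0), "B": (1.3, 90.0), "C": (8.0, 20.0), "D": (1.5, 25.0)}
print(relative_supply(sample))
```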
EricBarbour
post
Post #155


blah
*********

Group: Regulars
Posts: 5,919
Joined:
Member No.: 5,066



QUOTE(Peter Damian @ Sun 30th October 2011, 3:05pm) *

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

Yes, there is a reasonably clear division between obsessives and people who write on varied
article subjects. The real obsessives have truly disturbing contribs. Like the several guys who
can't stop talking about hurricanes, or the Doctor Who nerds.

This is why Jimbo's old comment that he "didn't want bias" from obsessive editors is so pathetic.
Because that's exactly what he's got. In spades.

This post has been edited by EricBarbour:
timbo
post
Post #156


Member
***

Group: Contributors
Posts: 102
Joined:
Member No.: 21,141



Radek's chart really nails it.

Silver seren makes a good point that some excellent content creators probably write outside of mainspace and then transfer everything at once. So there would be substantial content creation with minimal edits per page in this situation. I suppose a really scientific study would somehow include kilobytes of content incorporated into the first edit of a newly created page and weigh that into the equation.
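The suggestion above can be sketched as a weighted variant of edits per page, where creating a page credits extra "edits" in proportion to the size of the creating edit. The input format and the exchange rate (`kb_weight`, extra edits per kilobyte) are hypothetical choices, not anything from an existing tool.

```python
def weighted_epp(pages, kb_weight=1.0):
    """Edits per page, crediting page creations by the size of the first edit.

    `pages` maps title -> (edits by this editor, created the page?, bytes in
    the creating edit). Each kilobyte of a creating edit counts as `kb_weight`
    extra edits; that exchange rate is an arbitrary, hypothetical choice."""
    if not pages:
        return 0.0
    total = 0.0
    for n_edits, created, first_edit_bytes in pages.values():
        total += n_edits
        if created:
            total += kb_weight * first_edit_bytes / 1024
    return total / len(pages)


# Hypothetical: a 20 KB article drafted offline and pasted in with one edit,
# plus a one-edit fix elsewhere. Plain epp would be 1.0 for this editor.
pages = {"Drafted article": (1, True, 20480), "Typo fix": (1, False, 0)}
print(weighted_epp(pages))  # 11.0
```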

Myself, I like to build a framework, write a lead, add a couple lines to subsections and a source or two to keep the wolves at bay, and then to do the writing in mainspace.

As to the question why gnomish administrative sorts have high status and content creators low status, that's an ongoing sore spot with me. I think that part of it is to be corrected by simple consciousness raising among those who write. Speaking for myself, I felt somehow vindicated or rewarded or whatever the term is when I was given autoreviewed status -- so that new articles come through the front gate without being highlighted in yellow and therefore tampered with by gnomish administrative sorts for no good reason.

The New Articles spooler is akin to a shark tank sometimes; some of those reviewing new work are only semi-competent, working too fast and meddling too much. Obviously, there's a lot of swill rolling through the door that needs to be stopped, but it's still a source of annoyance to just get started and then have a series of edit conflicts with meddlesome new page patrollers.

My prescription for WP would be to have autoreviewed status made into a bigger deal as a mechanism for rewarding content creators.

I also wouldn't mind the gnomish sorts being taken down a peg by renaming "administrators" as "janitors." That would balance the field. But that's pettiness on my part, I suppose, owing to an aversion to people of that personality type and their clique mentality...

I think there are some administrative tools that would be useful for content creators. Being able to see deleted files would be a boon now and then -- but that ultimately is a pretty minor tidbit in the big scheme of things; certainly nothing worth undergoing the Lord of the Flies gauntlet of THOSE people.

tim

This post has been edited by timbo:
radek
post
Post #157


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(timbo @ Sun 30th October 2011, 8:57pm) *

That chart really nails it.

Silver seren makes a good point that some excellent content creators probably write outside of mainspace and then transfer everything at once. So there would be substantial content creation with minimal edits per page in this situation. I suppose a really scientific study would somehow include kilobytes of content incorporated into the first edit of a newly created page and weigh that into the equation.

Myself, I like to build a framework, write a lead, add a couple lines to subsections and a source or two to keep the wolves at bay, and then to do the writing in mainspace.

As to the question why gnomish administrative sorts have high status and content creators low status, that's an ongoing sore spot with me. I think that part of it is to be corrected by simple consciousness raising among those who write. Speaking for myself, I felt somehow vindicated or rewarded or whatever the term is when I was given autoreviewed status -- so that new articles come through the front gate without being highlighted in yellow and therefore tampered with by gnomish administrative sorts for no good reason.

The New Articles spooler is akin to a shark tank sometimes; some of those reviewing new work are only semi-competent, working too fast and meddling too much. Obviously, there's a lot of swill rolling through the door that needs to be stopped, but it's still a source of annoyance to just get started and then have a series of edit conflicts with meddlesome new page patrollers.

My prescription for WP would be to have autoreviewed status made into a bigger deal as a mechanism for rewarding content creators.

I also wouldn't mind the gnomish sorts being taken down a peg by renaming "administrators" as "janitors." That would balance the field. But that's pettiness on my part, I suppose, owing to an aversion to people of that personality type and their clique mentality...

I think there are some administrative tools that would be useful for content creators. Being able to see deleted files would be a boon now and then -- but that ultimately is a pretty minor tidbit in the big scheme of things; certainly nothing worth undergoing the Lord of the Flies gauntlet of THOSE people.

tim


In case you're wondering, you're in the "Wiki Gnome" category. Which perhaps just goes to show that the above chart is about the TYPE of contributions and not really about their QUALITY - in my mind it's useful to get the TYPE distribution down first. I mean, some AN/I commentatin' might actually be "of quality" or something. But I don't really know you, so it could be quality.

Second, the "autoreviewer" thing is a joke. Anyone who has managed to make a few edits without getting blocked as a vandal can get it. They throw these bones to you to make you think you're "important". You're not. (Neither am I)
timbo
post
Post #158


Member
***

Group: Contributors
Posts: 102
Joined:
Member No.: 21,141



QUOTE(EricBarbour @ Sun 30th October 2011, 4:29pm) *

QUOTE(radek @ Sun 30th October 2011, 1:28pm) *

For example, Fetchcommons has 28.26% of his posts to user's talk. Sandstein has 24.71%. SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page, but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.

All of whom are notoriously contentious and abusive admins. ***



I noticed Sarek's name on the Resigned Administrators list today. I've bumped bellies with him in the past once or twice... I don't think he's a bad person, just not temperamentally suited for power tools, in my estimation. I think maybe he has come to the same conclusion.

Here's hoping he has second wind as a content creator...

tim
timbo
post
Post #159


Member
***

Group: Contributors
Posts: 102
Joined:
Member No.: 21,141



QUOTE(radek @ Sun 30th October 2011, 7:11pm) *

Second, the "autoreviewer" thing is a joke. Anyone who has managed to make a few edits without getting blocked as a vandal can get it. They throw these bones to you to make you think you're "important". You're not. (Neither am I)


It made my life easier and less stressful and I value it.

Everybody is important and nobody is important. Getting some acknowledgement that one's work is noticed by others is good. Give a cowardly lion a medal and it makes him courageous.


tim
Peter Damian
post
Post #160


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?
dogbiscuit
post
Post #161


Could you run through Verifiability not Truth once more?
********

Group: Members
Posts: 2,972
Joined:
From: The Midlands
Member No.: 4,015



QUOTE(Peter Damian @ Mon 31st October 2011, 9:50am) *

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?

He is simply saying that you are biased by your experiences (or at minimum are seen as biased by your experiences) and you need to see that.

That you have a number of hypotheses about Wikipedia that can be construed as anti-project is probably fair comment; whether your experiences have made you uncritical and you cannot see that in yourself I wouldn't care to judge.
communicat
post
Post #162


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am) *

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?

Maybe try critical awareness of Mark Twain's famous quote: "Lies, damned lies, and statistics"?
thekohser
post
Post #163


Member
*********

Group: Regulars
Posts: 10,274
Joined:
Member No.: 911



Try this, Peter. Say five nice things about Wikipedia, and say them like you mean them. We can then see if you have the ability to objectively evaluate that cess pit.
communicat
post
Post #164


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am) *

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?


By "critical awareness", he/she may be referring to an analysis that draws on knowledge across the social sciences and humanities, not one that appears at present to rely exclusively on a quantitative analytical approach (both in this topic and in its current fork marked "Content contributors").

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.
Malleus
post
Post #165


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(communicat @ Mon 31st October 2011, 5:36pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am) *

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?


By "critical awareness", he/she may be referring to an analysis that draws on knowledge across the social sciences and humanities, not one that appears at present to rely exclusively on a quantitative analytical approach (both in this topic and in its current fork marked "Content contributors").

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

I think that demonstrates a fundamental misunderstanding of the scientific method, perhaps one that Peter shares. The point of a hypothesis is to state it in such a way that it is susceptible to empirical investigation designed to disprove it, not to prove it. And to suggest that a qualitative approach may be more objective than a quantitative one is just risible.
Peter Damian
post
Post #166


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(communicat @ Mon 31st October 2011, 5:36pm) *

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.


I suspect you are an idiot. Can you please read carefully the original post http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html and please tell me whether I was advancing or proving any hypothesis, and if so, what hypothesis you think I was advancing or trying to prove?

Please note the bit in the post that says "before you leap to conclusions".

[edit] Think also of this limiting case. I write an entire new article off-wiki, and then save it onto Wikipedia, links and all, and I never return to that article. I then write another article off-wiki and save that into Wikipedia. Repeat another 98 times. Thus I have written 100 complete articles, of the sort that would normally require thousands of edits. Yet my average epp = 1, exactly. I was suggesting that we shouldn’t leap to the natural conclusion that low epp = low value, or not a ‘content creator’, or anything like that.
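The limiting case can be checked with a few lines of Python. This is a minimal sketch, assuming epp means total edits divided by the number of distinct pages edited (the reading the thread implies); the input format, one page title per edit, is hypothetical.

```python
def edits_per_page(edited_pages):
    """Average edits per page (epp): total edits / distinct pages edited.

    `edited_pages` holds one page title per edit; this input format is a
    hypothetical stand-in for data exported from an edit counter.
    """
    if not edited_pages:
        return 0.0  # no edits at all: avoid dividing by zero
    return len(edited_pages) / len(set(edited_pages))


# Limiting case: 100 articles, each written off-wiki and saved in one edit.
one_shot_writer = [f"Article {i}" for i in range(1, 101)]
print(edits_per_page(one_shot_writer))  # 1.0
```

The metric alone cannot tell this writer apart from someone who makes one trivial edit to each of 100 pages.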

thekohser
post
Post #167


Member
*********

Group: Regulars
Posts: 10,274
Joined:
Member No.: 911



QUOTE(Peter Damian @ Mon 31st October 2011, 2:12pm) *

I suspect you are an idiot.

None of my experiments involving Communicat have been able to disprove that hypothesis, Peter.
radek
post
Post #168


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Mon 31st October 2011, 1:12pm) *

QUOTE(communicat @ Mon 31st October 2011, 5:36pm) *

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.


I suspect you are an idiot. Can you please read carefully the original post http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html and please tell me whether I was advancing or proving any hypothesis, and if so, what hypothesis you think I was advancing or trying to prove?

Please note the bit in the post that says "before you leap to conclusions".

[edit] Think also of this limiting case. I write an entire new article off-wiki, and then save it onto Wikipedia, links and all, and I never return to that article. I then write another article off-wiki and save that into Wikipedia. Repeat another 98 times. Thus I have written 100 complete articles, of the sort that would normally require thousands of edits. Yet my average epp = 1, exactly. I was suggesting that we shouldn’t leap to the natural conclusion that low epp = low value, or not a ‘content creator’, or anything like that.


That's right, I'm the one who went ahead and made that leap for you (with caveats and stuff).
Ottava
post
Post #169


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



One of the things I noticed is that even if you narrow down who are content contributors, you still have a lot of problematic contributors.

Take this for example. The guy altered many cited statements and made them 100% opposite of what the source says. The guy then adds a lot of blatant original research contradicted by other parts of the page that are cited. This is a highly read page and though he was reverted twice with people pointing out that he was adding original research, he is still allowed to continue it and his additions are now the current version of the page.

These people are rampant.
EricBarbour
post
Post #170


blah
*********

Group: Regulars
Posts: 5,919
Joined:
Member No.: 5,066



QUOTE(Ottava @ Mon 31st October 2011, 12:32pm) *
Take this for example (http://en.wikipedia.org/w/index.php?title=Kubla_Khan&diff=458271697&oldid=454934213). The guy altered many cited statements and made them 100% opposite of what the source says. The guy then adds a lot of blatant original research contradicted by other parts of the page that are cited. This is a highly read page and though he was reverted twice with people pointing out that he was adding original research, he is still allowed to continue it and his additions are now the current version of the page.

These people are rampant.

Those are "subtle vandals". I think there might be a few hundred of them, usually sticking to
certain areas (like the guy who uses his IP address to falsify British football statistics).
Wikipedia cannot deal with them, it is too corrupt and incompetent. One can't even figure out
how much subtle vandalism is going on because their changes look legitimate. It might be
possible to write a complex script to check simple things like sports stats, but you'd need a
verifiable database to check against, and it would be a big job. The people who could and SHOULD
do this, the guys who write editing and vandalism bots, won't. Because they would have to work
very hard to produce a script that is reliable, and because they don't care. Diddling Wikipedia is
supposed to be "fun", not work.

At the end of the day, Wikipedia is not an "encyclopedia". It is a fundraising scam.

They have to produce statistics that show the volunteer userbase isn't declining, so they wave
around the increase of total articles and the raw edit stats.
Nothing about the QUALITY of those articles. Nothing about the QUALITY of the edits.
Figuring out "quality" would cost a lot of money and their remaining "dedicated" volunteer
community is full of total raving flakes and fools, who don't want to hear there is a "problem".
So no one makes important changes, more and more bots generate crap content, and the
whole thing slowly declines.

I meant what I said in the other thread: Wikipedia will go the way of dmoz.org.
Ceoil
post
Post #171


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



Sorry Eric, you make really great, LOUD, tubes (I'm a serious analog noise fan and follow), but you are another person who is at the moment looking through a self-reinforcing prism. Speaking as a person there and suffering since 2006, I'm enamoured with the project in a lot of ways, but really really disillusioned in a lot of other ways too. Ye guys can guess. There is a lot of dissent within the project, notice Moni3's posts in the last few days, but the point is that it's not reflexive, it's thinking and constructive. It's not polemic, which is what I was trying to say to Peter.

Peter Damian
post
Post #172


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 8:13pm) *

Sorry Eric, you make really great, LOUD, tubes (I'm a serious analog noise fan and follow), but you are another person who is at the moment looking through a self-reinforcing prism. Speaking as a person there and suffering since 2006, I'm enamoured with the project in a lot of ways, but really really disillusioned in a lot of other ways too. Ye guys can guess. There is a lot of dissent within the project, notice Moni3's posts in the last few days, but the point is that it's not reflexive, it's thinking and constructive. It's not polemic, which is what I was trying to say to Peter.


Ah hello Ceoil, I had forgotten you post here.
Peter Damian
post
Post #173


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 8:13pm) *

Sorry Eric, you make really great, LOUD, tubes (I'm a serious analog noise fan and follow), but you are another person who is at the moment looking through a self-reinforcing prism. Speaking as a person there and suffering since 2006, I'm enamoured with the project in a lot of ways, but really really disillusioned in a lot of other ways too. Ye guys can guess. There is a lot of dissent within the project, notice Moni3's posts in the last few days, but the point is that it's not reflexive, it's thinking and constructive. It's not polemic, which is what I was trying to say to Peter.


Ah hello Ceoil, I had forgotten you post here.

But where is my polemic? My original blog post had some very tentative conclusions.

[edit] There is also a maxim in logic that if you are going to criticise an argument, you have to say what is wrong with it. I generally always try to give arguments. To accuse a logician of 'polemic' is a fairly bad accusation, not to be taken lightly.

Ceoil
post
Post #174


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



Hi Peter. I'd like to engage Eric; he is often astute but ruined by a perception of bitterness on the part of the faithful. I think his criticism would be more valuable if he dropped the veneer. God knows the project is lacking introspection, and tends to shoot messengers. <four tides: Ceoil>

Ceoil
post
Post #175


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



Peter, I'm not accusing you of anything, let's be fair. But I know the traps and would like to advise you if you'll listen.
Peter Damian
post
Post #176


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 8:35pm) *

Hi Peter. I'd like to engage Eric; he is often astute but ruined by a perception of bitterness on the part of the faithful. I think his criticism would be more valuable if he dropped the veneer. God knows the project is lacking introspection, and tends to shoot messengers. <four tides: Ceoil>


But you were having a go at me. Lacking self-awareness or whatever. What is the truth that you can see, that I am unable to see?
communicat
post
Post #177


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



QUOTE(Malleus @ Mon 31st October 2011, 8:08pm) *

QUOTE(communicat @ Mon 31st October 2011, 5:36pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am) *

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)


OK I need to develop this ‘critical awareness’ about Wikipedia. Can anyone help me here?


By "critical awareness", he/she may be referring to an analysis that draws on knowledge across the social sciences and humanities, not one that appears at present to rely exclusively on a quantitative analytical approach (both in this topic and in its current fork marked "Content contributors").

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

I think that demonstrates a fundamental misunderstanding of the scientific method, perhaps one that Peter shares. The point of a hypothesis is to state it in such a way that it is susceptible to empirical investigation designed to disprove it, not to prove it. And to suggest that a qualitative approach may be more objective than a quantitative one is just risible.

What I'm suggesting is that the quantitative approach just ain't working in this particular instance of Peter/Edward's stated intention of writing a book about WP (see Peter/Edward-initiated topic "New book about WP"). Even if the quantitative approach was working, which it is not, nobody in their right mind is going to go out of their way to buy a book about WP stubs and shit. That's all I'm suggesting.
Kelly Martin
post
Post #178


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



The problem I have with the proposed statistical "rules" that have been presented in this discussion is that they're all ad hoc, rather than empirical. That is, instead of taking a sample of edits or editors, categorizing them by inspection, and then doing an analysis of variance (or some other regression analysis) to identify metrics that are correlated with the already-determined categorizations, you instead identify metrics that you have a priori decided ought to correspond with categorizations. That's methodologically bankrupt; a decisional rule that uses a metric as proxy to categorize members of a population has to be empirically justified, and not just backdoored in by handwaving.

And it's not enough to generate a statistic and then look to see if the extremes fit your hypothesis (e.g. Peter's post giving the "top scorers" on some metric which I think is edits per page); such an analysis is vulnerable to confirmation bias. You need to look at a broad sample from the entire population, not just the three-sigma tail, if you want an actual predictive rule.

From where I sit the "statistical" evidence I've seen posted in this thread ranges from inadequate to farcical. Let's take radek's four-way categorization. Not hard to test it: Take a sample of about 50 editors, categorize them by inspection (not of their statistics, but of their apparent behavior based on examining their edits) into the categories provided. Then generate the statistics radek proposes, and run the numbers to see if the metrics really do predict the categorization, and with what degree of certainty. Until you actually do this, you're just pissing into the wind.
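Kelly's procedure (label editors first by inspecting their edits, then test whether a metric separates the labels) can be sketched as a one-way analysis of variance. The Python below is a minimal, dependency-free version; the category names and epp values are made-up placeholders, not data from this thread.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for {category: [metric values]}.

    Categories must be assigned by inspecting editors' edits *before*
    the metric is computed; otherwise the test is circular.
    """
    values = [x for vals in groups.values() for x in vals]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {cat: sum(vals) / len(vals) for cat, vals in groups.items()}
    # Variation of category means around the grand mean
    ss_between = sum(len(vals) * (means[cat] - grand_mean) ** 2
                     for cat, vals in groups.items())
    # Variation of individual editors around their own category mean
    ss_within = sum((x - means[cat]) ** 2
                    for cat, vals in groups.items() for x in vals)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical hand-labelled sample: epp per editor, grouped by category.
sample = {
    "content contributor": [4.0, 5.0, 6.0],
    "wiki-gnome": [1.0, 2.0, 3.0],
}
print(one_way_anova_f(sample))  # 13.5
```

A large F means the metric differs more between categories than within them; the p-value comes from the F distribution with (k-1, n-k) degrees of freedom, which `scipy.stats.f_oneway` reports directly if SciPy is available.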

(This is on my mind at the moment because I've been reading some of the materials related to the dual-polarization radar that the NWS just put up here in Chicago. They've done a lot of research to try to come up with rules to translate the various additional metrics the new radar offers into actionable information such as "hail", "snow", "freezing rain", and "graupel". While they have some concepts of what they think will happen, the actual methodology used in the field is based on collecting the metrics and cross-correlating it with "ground truth" reports of what is actually going on in the field. They're doing it right, which is why they can predict the weather, and you can't.)
Ceoil
post
Post #179


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



What Kelly said.

Peter, I was not having a go at you at all. I'm a blunt person, trying here to influence your methodology, which I think is at the moment skewed. We can talk; it doesn't have to be all or nothing.
communicat
post
Post #180


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155




Peter/Edward/Whatever: You're becoming as bad as Kohs, forking and convoluting topics, and soliciting people here at WR to go read your blogs posted elsewhere. But never mind, at least you've not yet started soliciting public donations, as Kohs does at Wikipedia Review.

radek
post
Post #181


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Kelly Martin @ Mon 31st October 2011, 3:49pm) *

The problem I have with the proposed statistical "rules" that have been presented in this discussion is that they're all ad hoc, rather than empirical. That is, instead of taking a sample of edits or editors, categorizing them by inspection, and then doing an analysis of variance (or some other regression analysis) to identify metrics that are correlated with the already-determined categorizations, you instead identify metrics that you have a priori decided ought to correspond with categorizations. That's methodologically bankrupt; a decisional rule that uses a metric as proxy to categorize members of a population has to be empirically justified, and not just backdoored in by handwaving.

And it's not enough to generate a statistic and then look to see if the extremes fit your hypothesis (e.g. Peter's post giving the "top scorers" on some metric which I think is edits per page); such an analysis is vulnerable to confirmation bias. You need to look at a broad sample from the entire population, not just the three-sigma tail, if you want an actual predictive rule.

From where I sit the "statistical" evidence I've seen posted in this thread ranges from inadequate to farcical. Let's take radek's four-way categorization. Not hard to test it: Take a sample of about 50 editors, categorize them by inspection (not of their statistics, but of their apparent behavior based on examining their edits) into the categories provided. Then generate the statistics radek proposes, and run the numbers to see if the metrics really do predict the categorization, and with what degree of certainty. Until you actually do this, you're just pissing into the wind.


I'm actually sort of doing this. There are two difficulties, however. The first is how to sample these 50 editors. I can just pull people off the top of my head or what have you, but I'm wary of some kind of bias; basically, I'm not sure how to randomly select these 50 people (this isn't a problem, at least to a first approximation, with articles, since we have the Random Article feature). The second, as already mentioned, is that for the low-epp editors there's no way to distinguish "Posting a Lot at AN/I" from "Running Featured Article Reviews" because soxred counts both as edits to WP. The only way I can think of separating them is by manually looking at the last 1000 or so edits of a particular editor and counting up the proportion of times they posted to ANI (or similar). This is doable but time-consuming.
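Once an editor's recent contributions are in hand (for example the page titles returned by the MediaWiki API's `list=usercontribs` query, which returns at most 500 edits per request for most accounts, so 1000 edits means following the continuation once), separating AN/I posting from FAR work is a matter of matching title prefixes. A minimal Python sketch; the prefix table and the sample titles are illustrative assumptions.

```python
# Hypothetical prefix table: project-space pages we want to tell apart.
PREFIXES = {
    "ANI": "Wikipedia:Administrators' noticeboard/Incidents",
    "FAR": "Wikipedia:Featured article review",
}


def project_space_breakdown(titles):
    """Fraction of the given edits (one page title each) under each prefix."""
    counts = {label: 0 for label in PREFIXES}
    for title in titles:
        for label, prefix in PREFIXES.items():
            if title.startswith(prefix):
                counts[label] += 1
                break  # count each edit at most once
    total = len(titles)
    return {label: (c / total if total else 0.0) for label, c in counts.items()}


# Made-up contribution history: 3 AN/I posts, 1 FAR edit, 4 article edits.
recent = (["Wikipedia:Administrators' noticeboard/Incidents"] * 3
          + ["Wikipedia:Featured article review/Example/archive1"]
          + ["Example article"] * 4)
print(project_space_breakdown(recent))  # {'ANI': 0.375, 'FAR': 0.125}
```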

Overall though, I'm not sure if even then I'd call this "scientific"; it's more like those "Political compass" tests if anything. One thing which WOULD BE interesting is if somehow I could get this data on ALL editors (say, those with more than 1000 edits) and see which "cell" (or corner of the scatter plot) is "saturated", which one is "over saturated" and which one comes up empty.

(and if you look at that scatterplot above, the 4-way categorization does correctly predict for the 5 people I labeled on there. Malleus is regarded as "content creator". Dr.Blofeld is a "wiki gnome" (under this definition of gnome). Etc. But that's still a small and non-random sample so while encouraging it's not serious evidence at this point)
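The "saturated" versus "empty" cells can be made precise by comparing the observed count in each quadrant with the count expected if the two axes were unrelated (row total times column total divided by the sample size, as in a chi-squared test of independence). A minimal Python sketch; the cut-offs and the points are hypothetical, since the thread does not fix them.

```python
from collections import Counter

LEVELS = ("low", "high")


def quadrant_counts(points, x_cut, y_cut):
    """Count (x, y) points falling in each quadrant split at the cut-offs."""
    cells = Counter()
    for x, y in points:
        cells[("high" if x >= x_cut else "low",
               "high" if y >= y_cut else "low")] += 1
    return cells


def expected_counts(cells):
    """Counts each quadrant would get if the two axes were independent."""
    n = sum(cells.values())
    x_totals, y_totals = Counter(), Counter()
    for (x_level, y_level), count in cells.items():
        x_totals[x_level] += count
        y_totals[y_level] += count
    return {(xl, yl): x_totals[xl] * y_totals[yl] / n
            for xl in LEVELS for yl in LEVELS}


# Hypothetical editors: (epp, some second metric), split at 3.0 and 0.5.
points = [(1.2, 0.1), (8.0, 0.7), (1.5, 0.2), (9.5, 0.8), (2.0, 0.9)]
observed = quadrant_counts(points, x_cut=3.0, y_cut=0.5)
expected = expected_counts(observed)
for cell in sorted(expected):
    print(cell, observed[cell], round(expected[cell], 2))
```

With only five points the numbers are illustrative; the comparison becomes informative only with a large sample such as the all-editors data set radek describes.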

Peter Damian
post
Post #182


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Kelly Martin @ Mon 31st October 2011, 8:49pm) *

The problem I have with the proposed statistical "rules" that have been presented in this discussion is that they're all ad hoc, rather than empirical.


I read no further than that sentence, as it is clear you have no idea what you are talking about. Radek clearly does.
Kelly Martin
post
Post #183


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



QUOTE(Peter Damian @ Mon 31st October 2011, 3:57pm) *

QUOTE(Kelly Martin @ Mon 31st October 2011, 8:49pm) *

The problem I have with the proposed statistical "rules" that have been presented in this discussion is that they're all ad hoc, rather than empirical.


I read no further than that sentence, as it is clear you have no idea what you are talking about. Radek clearly does.


Until I see a t-test or F-test score, I'm going to have to assume that you have no idea what you're talking about. Radek is at least making an effort. I'd like to see the actual analysis run; I rather doubt that the two axes in his proposal are truly orthogonal, for example.

I suppose I should actually break down and look at the data set and see what, if anything, can be sucked out of it. In any case, I've been around long enough to distrust sloppy stats. It's amazing how often people's intuitions are wrong about statistical measures, especially in populations that exhibit markedly unbalanced distributions.
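Whether the two axes are close to orthogonal can be checked, at least for linear dependence, by their correlation across editors. A minimal Python sketch; the numbers are placeholders. Given the skewed distributions mentioned above, a rank correlation (Spearman's, i.e. Pearson's r computed on ranks) may be the safer check; Python 3.10+ also ships `statistics.correlation` for the plain Pearson case.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant metric has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for two metrics measured on the same four editors.
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0: uncorrelated axes
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0: one axis is redundant
```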
Peter Damian
post
Post #184


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 8:55pm) *

What Kelly said.

Peter, I was not having a go at you at all. I'm a blunt person, trying here to influence your methodology, which I think is at the moment skewed. We can talk; it doesn't have to be all or nothing.


As I said, if you think an argument is wrong, you have to say what is wrong with it. You have to be clear what my argument is, which Kelly isn't, because she clearly hasn't even read the original post, and you have to say what is wrong with it.

The first point is that from a defined population (currently active admins) there is a wide range of epp values. That is an objectively measurable fact.

The second point is that there is no logical connection between high epp and high content. The limiting case is someone who edits an article 100 times by adding the numbers 1-99, then deleting them. That editor's epp is a very high 100, but the content is zero. Conversely, an editor who creates an entire article offline and then adds it to Wikipedia in a single edit has the lowest possible epp, but is creating a lot of content.

The third point is empirical: from the given, precisely defined sample, there is an empirical connection. Those with low epps tend to have mechanical, repetitive editing patterns: they are always doing the same kind of thing. By contrast, those with high epps tend (note the word 'tend') to be 'content contributors'.

The fourth point is a behavioural observation. Those who flit from article to article will find it difficult to make insightful and meaningful contributions, which requires careful (and long) study of the whole article.

These were my conclusions. The point that the sample was chosen in the way it was is stupid and irrelevant. If I want to study whether Conservatives drive expensive cars, obviously I have to select just Conservatives. And in any case, to avoid accusations of selection bias, I deliberately ran the study over all 720 admins, without exception.
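The two limiting cases in the second point can be made concrete by tracking net content alongside epp. A minimal Python sketch, assuming each edit is recorded as a (page title, change in page length) pair; the page names and the 25,000-character article size are made up for illustration.

```python
def epp_and_net_change(edits):
    """Return (edits per page, net characters added) for (page, delta) edits."""
    if not edits:
        return 0.0, 0
    pages = {page for page, _ in edits}
    epp = len(edits) / len(pages)
    net = sum(delta for _, delta in edits)
    return epp, net


# Case 1: add the numbers 1-99 one per edit (digits plus a newline),
# then remove all of them in a 100th edit.
additions = [("Some article", len(str(n)) + 1) for n in range(1, 100)]
churn = additions + [("Some article", -sum(d for _, d in additions))]
print(epp_and_net_change(churn))  # (100.0, 0)

# Case 2: a whole article written offline and saved in a single edit.
print(epp_and_net_change([("New article", 25000)]))  # (1.0, 25000)
```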

QUOTE(Kelly Martin @ Mon 31st October 2011, 9:08pm) *

especially in populations that exhibit markedly unbalanced distributions.


What do you mean by an 'unbalanced distribution'?
Ceoil
post
Post #185


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



I'm not a hallowed logician like you are, sitting on a cloud and only receptive to perfectly formed refutations with equations and things, but I can spot a specious argument when I see it. Dear god man, how unattractive was that post. Get a grip.

Kelly Martin
post
Post #186


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



QUOTE(radek @ Mon 31st October 2011, 3:56pm) *
I'm actually sort of doing this. There are two difficulties, however. The first is how to sample these 50 editors. I can just pull people off the top of my head or what have you, but I'm wary of some kind of bias; basically, I'm not sure how to randomly select these 50 people (this isn't a problem, at least to a first approximation, with articles, since we have the Random Article feature). The second, as already mentioned, is that for the low-epp editors there's no way to distinguish "Posting a Lot at AN/I" from "Running Featured Article Reviews" because soxred counts both as edits to WP. The only way I can think of separating them is by manually looking at the last 1000 or so edits of a particular editor and counting up the proportion of times they posted to ANI (or similar). This is doable but time-consuming.

Overall though, I'm not sure if even then I'd call this "scientific"; it's more like those "Political compass" tests if anything. One thing which WOULD BE interesting is if somehow I could get this data on ALL editors (say, those with more than 1000 edits) and see which "cell" (or corner of the scatter plot) is "saturated", which one is "over saturated" and which one comes up empty.

(and if you look at that scatterplot above, the 4-way categorization does correctly predict for the 5 people I labeled on there. Malleus is regarded as "content creator". Dr.Blofeld is a "wiki gnome" (under this definition of gnome). Etc. But that's still a small and non-random sample so while encouraging it's not serious evidence at this point)
It looks like the soxred data generates around two dozen metrics per user, some of which are obviously interdependent (as the percentages necessarily add to 100%, so there's at least one degree of freedom eaten there). We can get random users by sampling the "All Users" list, but the problem with that is that most of them will be extremely low (that is, zero) edit count users; not terribly useful. However, the filtering process could be automated using the exposed API (http://en.wikipedia.org/w/api.php), and that API could also be used to automate gathering the "how much does this editor post to ANI" statistics you showed an interest in (although the API throttle might make that a slow process). So it's not unattainable, not in the least.
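A minimal sketch of the sampling step described above, assuming the MediaWiki API's `list=allusers` module with `auprop=editcount`, paging via the `aufrom` value the API returns in its `continue` block. The 1000-edit cut-off and the sample size of 50 come from radek's post; the helper names are hypothetical, and the network fetch is left out so the filtering can be checked offline.

```python
import random
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def allusers_url(aufrom=None, limit=500):
    """Build one page of a list=allusers query that also returns edit counts."""
    params = {
        "action": "query",
        "list": "allusers",
        "auprop": "editcount",
        "auwitheditsonly": 1,  # skip the many zero-edit accounts
        "aulimit": limit,
        "format": "json",
    }
    if aufrom is not None:
        params["aufrom"] = aufrom  # resume where the previous page stopped
    return API_URL + "?" + urlencode(params)

def sample_active_users(users, min_edits=1000, k=50, seed=None):
    """Randomly pick up to k user names whose editcount is at least min_edits.

    `users` is the concatenation of the "allusers" lists from the parsed JSON pages.
    """
    eligible = [u["name"] for u in users if u.get("editcount", 0) >= min_edits]
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))
```

The sample is only random among the accounts actually fetched, so paging through the whole list (slow under the API throttle) matters more than the sampling call itself.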
radek
Post #187


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Kelly Martin @ Mon 31st October 2011, 4:08pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 3:57pm) *

QUOTE(Kelly Martin @ Mon 31st October 2011, 8:49pm) *

The problem I have with the proposed statistical "rules" that have been presented in this discussion is that they're all ad hoc, rather than empirical.


I read no further than that sentence, as it is clear you have no idea what you are talking about. Radek clearly does.
Until I see a t-test or F-test score, I'm going to have to assume that you have no idea what you're talking about. Radek is at least making an effort. I'd like to see the actual analysis run; I rather doubt that the two axes in his proposal are truly orthogonal, for example.

I suppose I should actually break down and look at the data set and see what, if anything, can be sucked out of it. In any case, I've been around long enough to distrust sloppy stats. It's amazing how often people's intuitions are wrong about statistical measures, especially in populations that exhibit markedly unbalanced distributions.


I think that what Kelly is talking about above is something like external validity (there seem to be some other criticisms mixed in as well).

One way to do it is to somehow generate a list of randomly selected editors, then pass this list out to people familiar with Wikipedia and ask them to categorize these people according to the criteria above. Then see to what extent the subjective categorizations match up with categorizations based on epp and % articles. This wouldn't be totally ideal, as people can have quite skewed and biased notions of themselves and others, in addition to having widely different definitions (a clear example is that guy calling Dr. Blofeld a "content creator" in that thread).
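One standard way to put a number on "to what extent the subjective categorizations match up" is Cohen's kappa, which discounts the agreement two raters would reach by chance alone. This is a swapped-in technique, not something proposed in the thread, and the label strings in the check are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Probability that both raters pick the same category if they labelled independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)
```

Values near 1 would mean the epp/% categories reproduce people's judgements; values near 0 would mean they do no better than chance.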

Another way would be to first define what a "gnomish" edit is, what a "content creating" edit is, what a "drama queen" post is, etc. Then with these pre-set definitions in hand go out and get that list of randomly selected editors and again, see if it matches up. This would be way too much work.

(and in fact I'm somewhat ok with just DEFINING high % low epp editors as "Wiki gnomes" and high % high epp editors as "Content creators". Most of the trouble is with the low % folks)
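The quadrant scheme can be written down as a single rule. The cut-offs `epp_cut` and `pct_cut` are hypothetical placeholders, since the thread does not fix them, and the two low-% cells are named generically because the thread does not settle how they map onto its other categories.

```python
def classify(epp, pct_article, epp_cut=5.0, pct_cut=50.0):
    """Assign an editor to one of four cells by edits per page and % of edits to articles.

    epp_cut and pct_cut are illustrative thresholds, not values taken from the thread.
    """
    if pct_article >= pct_cut:
        return "content creator" if epp >= epp_cut else "wiki gnome"
    return "low % / high epp" if epp >= epp_cut else "low % / low epp"
```

Tallying `classify` over a random sample of editors would show which cell is "saturated" and which comes up empty.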

Hmmm, so you want a t stat or an F test. One thing I could do is to see if epp or % articles predict admin status (the logit or probit regression I mentioned before). Two problems would be the lack of randomness I mentioned above, and also that ideally we'd want to have the epp and % articles BEFORE a person became an admin, so that we get the causality right. But I don't think there's data on that, though I might email soxred and axe him 'bout it.
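A self-contained sketch of the logit regression mentioned above: it fits P(admin) = 1 / (1 + exp(-(b0 + b1*x1 + ...))) by plain batch gradient ascent on the mean log-likelihood. Statistics packages typically use Newton-type iterations and also report standard errors and z scores, which this sketch omits; the toy data in the check is made up.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(beta, x):
    """P(y = 1 | x) for coefficients beta = [intercept, b1, b2, ...]."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logit(X, y, lr=0.05, iters=20000):
    """Maximise the mean log-likelihood by batch gradient ascent.

    X is a list of feature rows (e.g. [epp, pct_article]) and y a list of 0/1 admin flags.
    Large raw values (percentages up to 100) need rescaling to 0-1 or a smaller lr.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(beta, xi)  # residual drives the gradient
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

Gradient ascent is chosen here only because it needs no matrix algebra; with a few dozen editors it converges in well under a second.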

QUOTE(Kelly Martin @ Mon 31st October 2011, 4:19pm) *

QUOTE(radek @ Mon 31st October 2011, 3:56pm) *
I'm actually sort of doing this. There are two difficulties, however. First is how to sample these 50 editors. I can just pull people off the top of my head or what have you, but I'm wary of some kind of bias - basically, I'm not sure how to randomly select these 50 people (this isn't a problem - at least to first approx - with articles, since we have the Random Article feature). The second part, as already mentioned, is that for the low epp editors there's no way to distinguish "Posting a Lot at AN/I" from "Running Featured Article Reviews" because soxred counts both as edits to WP. The only way I can think of separating it out is by manually looking at the last 1000 or so edits of a particular editor and counting up the proportion of times they posted to ANI (or similar). This is doable but time consuming.

Overall though, I'm not sure if even then I'd call this "scientific" - it's more like those "Political compass" tests if anything. One thing which WOULD BE interesting is if somehow I could get this data on ALL editors (say, those with more than 1000 edits) and see which "cell" (or corner of the scatter plot) is "saturated", which one is "over saturated" and which one comes up empty.

(and if you look at that scatterplot above, the 4-way categorization does correctly predict for the 5 people I labeled on there. Malleus is regarded as "content creator". Dr.Blofeld is a "wiki gnome" (under this definition of gnome). Etc. But that's still a small and non-random sample so while encouraging it's not serious evidence at this point)
It looks like the soxred data generates around two dozen metrics per user, some of which are obviously interdependent (as the percentages necessarily add to 100%, so there's at least one degree of freedom eaten there). We can get random users by sampling the "All Users" list, but the problem with that is that most of them will be extremely low (that is, zero) edit count users; not terribly useful. However, the filtering process could be automated using the exposed API (http://en.wikipedia.org/w/api.php), and that API could also be used to automate gathering the "how much does this editor post to ANI" statistics you showed an interest in (although the API throttle might make that a slow process). So it's not unattainable, not in the least.


I have no idea on how to do any of that.
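For the ANI question, a rough sketch assuming the MediaWiki API's `list=usercontribs` module (`ucuser`, `uclimit`, `ucprop`), whose JSON entries carry each edit's `ns` (namespace number) and `title`. The helper names are hypothetical and the fetch itself is left out; the counting works on the parsed list of edits.

```python
from collections import Counter
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
ANI = "Wikipedia:Administrators' noticeboard/Incidents"

def usercontribs_url(user, limit=500):
    """Build a query for a user's most recent edits (500 per request for ordinary accounts)."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        "ucprop": "title|timestamp|sizediff",
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)

def namespace_shares(contribs):
    """Fraction of edits in each namespace number (0 = articles, 4 = Wikipedia:)."""
    if not contribs:
        return {}
    counts = Counter(c["ns"] for c in contribs)
    return {ns: n / len(contribs) for ns, n in counts.items()}

def share_to_page(contribs, title_prefix=ANI):
    """Fraction of edits whose page title starts with title_prefix."""
    if not contribs:
        return 0.0
    hits = sum(1 for c in contribs if c["title"].startswith(title_prefix))
    return hits / len(contribs)
```

The last 1000 edits take two requests, the second resuming from the `uccontinue` value the API hands back; `share_to_page` then separates ANI posting from other Wikipedia-namespace work such as FAR.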
Ottava
Post #188


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(Ceoil @ Mon 31st October 2011, 4:35pm) *

Hi Peter. I'd like to engage Eric; he is often astute but ruined by a perception of bitterness on the part of the faithful. I think his criticism would be more valuable if he dropped the veneer. God knows the project is lacking introspection, and tends to shoot messengers. <four tildes: Ceoil>



Hey, how are you doing?


I do have a bit of nitpicking to do regarding your defense of Amanda after she totally trashed the To Autumn page with OR, plagiarism, etc. You defended her. That is exactly the kind of person Wikipedia needs to boot. You also defended Fowler, who was someone who did that same thing quite regularly. I don't know if you still feel the same way about such people, but you were putting personal views of a person above what they were actually doing regarding content.

At least the true MySpacers and Drama Queens stay out of articles. The people who are dangerous are those who affect articles in a very negative manner.
Peter Damian
Post #189


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 9:18pm) *

I'm not a hallowed logician like you are, sitting on a cloud and only receptive to perfectly formed refutations with equations and things, but I can spot a specious argument when I see it. Dear God, man, how unattractive was that post. Get a grip.



Where is the specious argument?
timbo
Post #190


Member
***

Group: Contributors
Posts: 102
Joined:
Member No.: 21,141



Thinking out loud here...

Each edit changes article size.

Content Creators, whether they write by editing a page 50 times in a row or by writing offline and then adding everything at once, tend to MARKEDLY INCREASE article size in mainspace.

Administrative Gnomes, adding a link here or correcting a spelling there, tend to impact article size in mainspace very little.

Soap Opera Sallies tend to skew their editing away from mainspace.

I don't think the Facebook Faction is a real category. The real fourth group are the Temporary Tramps that come in, write a page about their uncle or their favorite Transformer character, and then leave the project. These can be quantified or eliminated from the equation on the basis of lifetime total edits.

Edits per page is probably highly correlated to these types, but it seems to me that change in article size is more important than edits per page as an identifying metric.


tim
Peter Damian
Post #191


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(radek @ Mon 31st October 2011, 9:23pm) *


(and in fact I'm somewhat ok with just DEFINING high % low epp editors as "Wiki gnomes" and high % high epp editors as "Content creators". Most of the trouble is with the low % folks)


That would actually be perfect. We start with the behavioural assumption first. Anyone who is editing 3 times a minute or more on different articles is unlikely to be contributing what we call 'content'. That's behind our whole idea of what 'content' is. Namely, stuff you have to study the whole article carefully in order to add.
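The three-edits-a-minute heuristic can be checked mechanically from contribution timestamps. A sketch, assuming entries shaped like the API's `usercontribs` output (`timestamp` in the form `2011-10-31T21:34:00Z`, plus `title`); "a minute" is read here as any 60-second sliding window, which is one interpretation among several, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def peak_distinct_pages(contribs, window=timedelta(minutes=1)):
    """Most distinct pages edited inside any single sliding window."""
    events = sorted(
        (datetime.strptime(c["timestamp"], "%Y-%m-%dT%H:%M:%SZ"), c["title"])
        for c in contribs
    )
    best, start = 0, 0
    for end in range(len(events)):
        # Drop edits that fall outside the window ending at events[end].
        while events[end][0] - events[start][0] >= window:
            start += 1
        best = max(best, len({title for _, title in events[start:end + 1]}))
    return best

def rapid_multi_page_editor(contribs, threshold=3):
    """True if the editor ever touched `threshold` different pages within one window."""
    return peak_distinct_pages(contribs) >= threshold
```

Counting distinct titles rather than raw edits keeps someone saving the same article three times in a minute out of the "gnome" bucket.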

I think the harder one is the high epp. E.g. FT2 famously spends a huge amount of time editing and re-editing the same sentence, sometimes 100 edits just for one paragraph. But it still reads like the long-winded, verbose nonsense that it was in the first place.

But even there, does it matter? Let's just define 'content' as what is added by high epp'ers. Then we have the logical deduction that a very high proportion of admins are not content-producers.

What is actually much more difficult is choosing another population to compare with. Non-admins are too large. Anything else risks selection bias.
Kelly Martin
Post #192


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



QUOTE(radek @ Mon 31st October 2011, 4:23pm) *
Another way would be to first define what a "gnomish" edit is, what a "content creating" edit is, what a "drama queen" post is, etc. Then with these pre-set definitions in hand go out and get that list of randomly selected editors and again, see if it matches up. This would be way too much work.
This, fundamentally, is the problem with an objective, quantitative analysis of Wikipedia editors and editing. It is, as you say, "way too much work" to code enough editors or edits to do any meaningful analysis. I've only seen a few studies that did, in fact, do such coding, and interestingly enough all of the studies I've seen that did do so ended up contradicting conventional wisdom in at least some ways.

So instead of doing it right, because doing it right is too much work, y'all settle for doing it wrong, and hoping nobody notices. Which, to be fair, nobody usually does. Back when I was in grad school, I knew a guy who (for a master's thesis, I believe) reviewed a couple hundred peer-reviewed papers in the social sciences; he reported that less than 10% of the papers he reviewed were free of serious methodological flaws in their use of statistical method, and nearly half stated conclusions that could not be supported from the data. And every one of these had been passed on in peer review. My conclusion is that social scientists, in general, do not understand statistics.
Peter Damian
Post #193


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



Also, for the record, here are the first 28 editors on http://en.wikipedia.org/wiki/Wikipedia:WBFAN

The distribution is quite different from that of the sample admins. You may argue over what "Wikipedia:WBFAN" signifies. I reply: it is an objectively verifiable fact that the one precisely defined population has an entirely different distribution from the other. I really don't understand Kelly's problem. As long as you define the criterion by which you select your population, there shouldn't be a problem.


YellowMonkey 3.69
Casliber 6.57
Hurricanehink 7.16
Wehwalt 20.51
Lord_Emsworth 3.47
Brianboulton 15.01
Ealdgyth 6.87
Ucucha 2.58
David_Fuchs 6.99
Mike_Christie 7.18
Malleus_Fatuorum 12.46
Sasata 3.25
Juliancolton 2.09
Awadewit 7.55
Cla68 6.15
Jimfbleak 4.13
DrKiernan 3.49
Iridescent 1.53
Serendipodous 12.34
Parrot_of_Doom 14.48
Ruhrfisch 4.18
Mav 3.1
Ian_Rose 8.57
Parsecboy 4.71
Piotrus 3.75
Acdixon 4.19
Johnleemk 2.41
Karanacs 3.92
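A quick summary of the 28 figures listed, assuming they are edits per page (epp), which the thread implies but does not state outright. The numbers are copied from the list; nothing else is assumed.

```python
from statistics import mean, median

# Figures copied from the WBFAN list above (assumed to be edits per page).
WBFAN_EPP = {
    "YellowMonkey": 3.69, "Casliber": 6.57, "Hurricanehink": 7.16, "Wehwalt": 20.51,
    "Lord_Emsworth": 3.47, "Brianboulton": 15.01, "Ealdgyth": 6.87, "Ucucha": 2.58,
    "David_Fuchs": 6.99, "Mike_Christie": 7.18, "Malleus_Fatuorum": 12.46, "Sasata": 3.25,
    "Juliancolton": 2.09, "Awadewit": 7.55, "Cla68": 6.15, "Jimfbleak": 4.13,
    "DrKiernan": 3.49, "Iridescent": 1.53, "Serendipodous": 12.34, "Parrot_of_Doom": 14.48,
    "Ruhrfisch": 4.18, "Mav": 3.1, "Ian_Rose": 8.57, "Parsecboy": 4.71,
    "Piotrus": 3.75, "Acdixon": 4.19, "Johnleemk": 2.41, "Karanacs": 3.92,
}

values = list(WBFAN_EPP.values())
print(f"n = {len(values)}")
print(f"mean epp = {mean(values):.2f}, median epp = {median(values):.2f}")
print(f"editors below 3.0 epp: {sum(v < 3.0 for v in values)}")
```

The mean sits well above the median because a handful of very high values (Wehwalt, Brianboulton, Parrot_of_Doom) pull it up, so the median is the safer single figure to compare against the admin sample.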
radek
Post #194


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(timbo @ Mon 31st October 2011, 4:33pm) *

Thinking out loud here...

Each edit changes article size.

Content Creators, whether they write by editing a page 50 times in a row or by writing offline and then adding everything at once, tend to MARKEDLY INCREASE article size in mainspace.

Administrative Gnomes, adding a link here or correcting a spelling there, tend to impact article size in mainspace very little.

Soap Opera Sallies tend to skew their editing away from mainspace.

I don't think the Facebook Faction is a real category. The real fourth group are the Temporary Tramps that come in, write a page about their uncle or their favorite Transformer character, and then leave the project. These can be quantified or eliminated from the equation on the basis of lifetime total edits.

Edits per page is probably highly correlated to these types, but it seems to me that change in article size is more important than edits per page as an identifying metric.


tim


The thing about change in article size is right but again, I don't know what an efficient way of collecting such data would be.
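On collecting size-change data: the API's `usercontribs` module can return a `sizediff` (bytes added or removed) for each edit when `ucprop` includes it, so timbo's metric reduces to a sum over mainspace edits. A sketch with a hypothetical helper name and made-up sample data:

```python
def mainspace_size_change(contribs):
    """Summarise byte changes for edits in namespace 0 (articles).

    Each entry is assumed to carry "ns" and "sizediff", as in usercontribs output.
    """
    diffs = [c.get("sizediff", 0) for c in contribs if c["ns"] == 0]
    return {
        "edits": len(diffs),
        "net_bytes": sum(diffs),
        "bytes_added": sum(d for d in diffs if d > 0),
        "mean_per_edit": sum(diffs) / len(diffs) if diffs else 0.0,
    }
```

A large `mean_per_edit` would fit timbo's content-creator profile, and a mean near zero the link-and-spelling gnome.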

The Facebook faction appears to contain some long term editors like GWH, so it's not just Temporary Tramps. TTs would look like "content creators" according to the categorization, since they'd have very high epp (that one page on their uncle) and very high % in articles, for the most part (if they get dragged through ANI for creating these uncle pages it might go down a bit).

Again, there's no presumption in any of the above that just cuz someone is in the "Content creator" category they're "good" - as Ottava points out many of these could be "bad". I can't think of such a distinction for the FFs but it might be there - I dunno, somebody who just makes everyone feel welcome or something.
Peter Damian
Post #195


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Kelly Martin @ Mon 31st October 2011, 9:38pm) *

QUOTE(radek @ Mon 31st October 2011, 4:23pm) *
Another way would be to first define what a "gnomish" edit is, what a "content creating" edit is, what a "drama queen" post is, etc. Then with these pre-set definitions in hand go out and get that list of randomly selected editors and again, see if it matches up. This would be way too much work.
This, fundamentally, is the problem with an objective, quantitative analysis of Wikipedia editors and editing. It is, as you say, "way too much work" to code enough editors or edits to do any meaningful analysis. I've only seen a few studies that did, in fact, do such coding, and interestingly enough all of the studies I've seen that did do so ended up contradicting conventional wisdom in at least some ways.

So instead of doing it right, because doing it right is too much work, y'all settle for doing it wrong, and hoping nobody notices. Which, to be fair, nobody usually does. Back when I was in grad school, I knew a guy who (for a master's thesis, I believe) reviewed a couple hundred peer-reviewed papers in the social sciences; he reported that less than 10% of the papers he reviewed were free of serious methodological flaws in their use of statistical method, and nearly half stated conclusions that could not be supported from the data. And every one of these had been passed on in peer review. My conclusion is that social scientists, in general, do not understand statistics.


What is your qualification in statistics, Kelly?

radek
Post #196


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Peter Damian @ Mon 31st October 2011, 4:42pm) *

Also, for the record, here are the first 28 editors on http://en.wikipedia.org/wiki/Wikipedia:WBFAN

The distribution is quite different from that of the sample admins. You may argue over what "Wikipedia:WBFAN" signifies. I reply: it is an objectively verifiable fact that the one precisely defined population has an entirely different distribution from the other. I really don't understand Kelly's problem. As long as you define the criterion by which you select your population, there shouldn't be a problem.


YellowMonkey 3.69
Casliber 6.57
Hurricanehink 7.16
Wehwalt 20.51
Lord_Emsworth 3.47
Brianboulton 15.01
Ealdgyth 6.87
Ucucha 2.58
David_Fuchs 6.99
Mike_Christie 7.18
Malleus_Fatuorum 12.46
Sasata 3.25
Juliancolton 2.09
Awadewit 7.55
Cla68 6.15
Jimfbleak 4.13
DrKiernan 3.49
Iridescent 1.53
Serendipodous 12.34
Parrot_of_Doom 14.48
Ruhrfisch 4.18
Mav 3.1
Ian_Rose 8.57
Parsecboy 4.71
Piotrus 3.75
Acdixon 4.19
Johnleemk 2.41
Karanacs 3.92


Quick comment on this list - while this probably isn't a problem for these guys above, once you start going down a further bit you have a bunch of people who didn't actually WRITE the FAs, just NOMINATED them.
Kelly Martin
Post #197


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



What I want is a test. That is, I want a decisional rule: something like "if editor's epp < 3.0, then editor is a content creator, with p=0.8". There are rigorous methods for adducing such decisional rules from appropriate sample data. But the proposed rules that have been offered so far are not derived using those methods; they are instead just generated ad hoc. This is appropriate for the investigatory phase of the analysis, but you can't just stop there.
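A rule of the kind described can be estimated directly once some editors have been hand-labelled. A sketch, assuming a list of `(epp, is_creator)` pairs (the data in the check is made up); it follows the direction of the example rule (`epp < threshold` means content creator), though the thread elsewhere links content creation with high epp, so the direction itself needs testing. Choosing the threshold and estimating p on the same sample overstates p; a held-out sample avoids that.

```python
def rule_precision(samples, threshold):
    """Estimate P(creator | epp < threshold) from labelled (epp, is_creator) pairs."""
    hits = [is_creator for epp, is_creator in samples if epp < threshold]
    if not hits:
        return None  # the rule never fires on this sample
    return sum(hits) / len(hits)

def best_threshold(samples, candidates):
    """Pick the candidate cut that labels the most samples correctly."""
    def accuracy(t):
        return sum((epp < t) == is_creator for epp, is_creator in samples) / len(samples)
    return max(candidates, key=accuracy)
```

With the cut chosen on one half of the labelled editors, `rule_precision` on the other half gives the "p" in a statement like "epp < 3.0 implies content creator with p = 0.8".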

And none of the hypotheses I've seen thrown out have been rigorously tested, even though in most cases I think they can be, in some cases fairly easily. Why is this?
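As one example of such a test, the claim that the WBFAN editors and the sampled admins have different epp distributions could be checked with a two-sample permutation test on the difference in means. This is one swapped-in choice among several (a t-test or a Mann-Whitney test would also do); the admin figures are not reproduced in this part of the thread, so the check uses made-up numbers.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided p-value for a difference in means between samples a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel editors at random under the null hypothesis
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)
```

Because epp is skewed (a few very high values), comparing medians instead of means is a reasonable variant; only the `mean` calls change.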
radek
Post #198


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Kelly Martin @ Mon 31st October 2011, 4:38pm) *

QUOTE(radek @ Mon 31st October 2011, 4:23pm) *
Another way would be to first define what a "gnomish" edit is, what a "content creating" edit is, what a "drama queen" post is, etc. Then with these pre-set definitions in hand go out and get that list of randomly selected editors and again, see if it matches up. This would be way too much work.
This, fundamentally, is the problem with an objective, quantitative analysis of Wikipedia editors and editing. It is, as you say, "way too much work" to code enough editors or edits to do any meaningful analysis. I've only seen a few studies that did, in fact, do such coding, and interestingly enough all of the studies I've seen that did do so ended up contradicting conventional wisdom in at least some ways.

So instead of doing it right, because doing it right is too much work, y'all settle for doing it wrong, and hoping nobody notices. Which, to be fair, nobody usually does. Back when I was in grad school, I knew a guy who (for a master's thesis, I believe) reviewed a couple hundred peer-reviewed papers in the social sciences; he reported that less than 10% of the papers he reviewed were free of serious methodological flaws in their use of statistical method, and nearly half stated conclusions that could not be supported from the data. And every one of these had been passed on in peer review. My conclusion is that social scientists, in general, do not understand statistics.


Well, I'm not going to send my four-color chart off to an academic journal or anything. Like I keep saying, to me, at this point, this is more like that Political Compass quiz - potentially informative but flawed. I'm interested in making this more rigorous (cuz I got that itch) but there's also a limited amount of time I'm willing to devote to it. If you can help with any of the suggestions I mentioned in a concrete way, I'd appreciate it.
Kelly Martin
Post #199


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



QUOTE(Peter Damian @ Mon 31st October 2011, 4:45pm) *
What is your qualification in statistics, Kelly?
What does that matter?
Peter Damian
Post #200


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Kelly Martin @ Mon 31st October 2011, 9:55pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 4:45pm) *
What is your qualification in statistics, Kelly?
What does that matter?


It matters a lot.
Kelly Martin
Post #201


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696



QUOTE(radek @ Mon 31st October 2011, 4:53pm) *
Well, I'm not going to send my four-color chart off to an academic journal or anything. Like I keep saying, to me, at this point, this is more like that Political Compass quiz - potentially informative but flawed. I'm interested in making this more rigorous (cuz I got that itch) but there's also a limited amount of time I'm willing to devote to it. If you can help with any of the suggestions I mentioned in a concrete way, I'd appreciate it.
I can probably help you with getting some of the stats out of the database (using the API) that we've talked about (on a limited basis), although it'll have to wait a bit as I have some other things that are higher priority. Think about what would be most useful to you and let me know.


QUOTE(Peter Damian @ Mon 31st October 2011, 4:57pm) *

QUOTE(Kelly Martin @ Mon 31st October 2011, 9:55pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 4:45pm) *
What is your qualification in statistics, Kelly?
What does that matter?
It matters a lot.
Your statement is conclusory and unsupported by evidence. Seems to be a pattern with you.
radek
Post #202


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651



QUOTE(Kelly Martin @ Mon 31st October 2011, 4:52pm) *

What I want is a test. That is, I want a decisional rule: something like "if editor's epp < 3.0, then editor is a content creator, with p=0.8". There are rigorous methods for adducing such decisional rules from appropriate sample data. But the proposed rules that have been offered so far are not derived using those methods; they are instead just generated ad hoc. This is appropriate for the investigatory phase of the analysis, but you can't just stop there.

And none of the hypotheses I've seen thrown out have been rigorously tested, even though in most cases I think they can be, in some cases fairly easily. Why is this?


Well ok. At this point though, like I said, all I got is a small (42) non-random sample of editors that just popped into my head, and all I can do is something like trying to predict current admin status by epp and % art.

So here you go, ran a logit:

       admin |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         epp |  -.1180069    .0945739   -1.25    0.212    -.3033683    .0673544
      perart |  -.0340222    .0168428   -2.02    0.043    -.0670335   -.0010109
       _cons |   2.592894    1.125109    2.30    0.021     .3877218    4.798066

Epp doesn't seem to matter much (could be the small non-random sample), but % of edits to articles is negatively related to being an admin, and even with a small sample this is significant at the 95% level (the z's are the fourth column). Pseudo R2 is .1, which again, given the sample size, ain't that bad.

Edit: What the above says, roughly, is that for every extra % to article space as opposed to other Wikipedia pages, the probability that you're an admin goes down by about 3.4%.
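Strictly, a logit coefficient is a change in log-odds rather than in probability. A small sketch using the coefficients from the table above: exp(-0.0340) is about 0.967, so each extra percentage point of article edits multiplies the odds of adminship by roughly 0.967 (about 3.3% lower odds), while the change in probability is approximately beta * p * (1 - p), which depends on the baseline p and is largest, about 0.85 percentage points, at p = 0.5. The function names are hypothetical.

```python
import math

# Coefficients copied from the logit output above (admin regressed on epp and perart).
CONS = 2.592894
B_EPP = -0.1180069
B_PERART = -0.0340222

def p_admin(epp, perart):
    """Predicted probability of being an admin under the fitted logit."""
    z = CONS + B_EPP * epp + B_PERART * perart
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(beta)

def marginal_effect(beta, p):
    """Approximate change in probability per unit increase, at baseline probability p."""
    return beta * p * (1.0 - p)

print(f"odds ratio per extra % in articles: {odds_ratio(B_PERART):.3f}")
print(f"largest probability change (p = 0.5): {marginal_effect(B_PERART, 0.5):.4f}")
```

Evaluating `p_admin` at a few realistic (epp, perart) pairs is usually the clearest way to show what the model implies.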

Again, this is very very very preliminary and I'd rather run it on this "Content creator/facebook/gnome/etc" thing but we just started playing with this yesterday so it's a little early to be asking for a finished academic paper at this point.

Ceoil
Post #203


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.

Malleus
Post #204


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(radek @ Sun 30th October 2011, 11:02pm) *

BTW, Malleus is a very clear outlier. Very high % in article space and pretty high % epp. Very clearly a "content contributor". Giano not so much (though still in that cell).

Update:

Here's a bit of what I have so far:

(IMG:http://upload.wikimedia.org/wikipedia/commons/3/34/Div_of_labor2.png)

Are you sure you don't mean "outlaw" rather than "outlier"?
radek
Post #205


Über Member
*****

Group: Regulars
Posts: 699
Joined:
Member No.: 15,651




QUOTE(Malleus) *

Are you sure you don't mean "outlaw" rather than "outlier"?

You're an Outlaw Outlier. Oooooo!
Malleus
Post #206


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(Peter Damian @ Mon 31st October 2011, 9:34pm) *

QUOTE(radek @ Mon 31st October 2011, 9:23pm) *


(and in fact I'm somewhat ok with just DEFINING high % low epp editors as "Wiki gnomes" and high % high epp editors as "Content creators". Most of the trouble is with the low % folks)


That would actually be perfect. We start with the behavioural assumption first. Anyone who is editing 3 times a minute or more on different articles is unlikely to be contributing what we call 'content'. That's behind our whole idea of what 'content' is. Namely, stuff you have to study the whole article carefully in order to add.

I think the harder one is the high epp. E.g. FT2 famously spends a huge amount of time editing and re-editing the same sentence, sometimes 100 edits just for one paragraph. But it still reads like the long-winded, verbose nonsense that it was in the first place.

But even there, does it matter? Let's just define 'content' as what is added by high epp'ers. Then we have the logical deduction that a very high proportion of admins are not content-producers.

What is actually much more difficult is choosing another population to compare with. Non-admins are too large. Anything else risks selection bias.

What about those users like me who failed at RfA?
Ceoil
Post #207


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



QUOTE(Malleus @ Mon 31st October 2011, 10:38pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 9:34pm) *

QUOTE(radek @ Mon 31st October 2011, 9:23pm) *


(and in fact I'm somewhat ok with just DEFINING high % low epp editors as "Wiki gnomes" and high % high epp editors as "Content creators". Most of the trouble is with the low % folks)


That would actually be perfect. We start with the behavioural assumption first. Anyone who is editing 3 times a minute or more on different articles is unlikely to be contributing what we call 'content'. That's behind our whole idea of what 'content' is. Namely, stuff you have to study the whole article carefully in order to add.

I think the harder one is the high epp. E.g. FT2 famously spends a huge amount of time editing and re-editing the same sentence, sometimes 100 edits just for one paragraph. But it still reads like the long-winded, verbose nonsense that it was in the first place.

But even there, does it matter? Let's just define 'content' as what is added by high epp'ers. Then we have the logical deduction that a very high proportion of admins are not content-producers.

What is actually much more difficult is choosing another population to compare with. Non-admins are too large. Anything else risks selection bias.

What about those users like me who failed at RfA?


Your RfA was not so much a failure as an assassination. I'm sure you're savvy enough to realise the orchestration.

Malleus
Post #208


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(Ceoil @ Mon 31st October 2011, 10:49pm) *


Your RfA was not so much a failure as an assassination. I'm sure you're savvy enough to realise the orchestration.

True.
The Joy
Post #209


I am a millipede! I am amazing!
********

Group: Members
Posts: 3,839
Joined:
From: The Moon
Member No.: 982



QUOTE(Ceoil @ Mon 31st October 2011, 6:15pm) *

Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.


Unless you are a hot broad, I do not think our A Horse With No Name would approve of that.


(Good gravy, I'm turning into the Baseball Bugs of WR. I need a drink...)
Ceoil
Post #210


Junior Member
**

Group: Contributors
Posts: 56
Joined:
Member No.: 8,131



A point Peter should make is that it's a hard and at times mind-bending slog pushing against boring, stupid people. Most of the onsite angst comes from this, and it tips otherwise good people over the edge.

The Joy: I'm fairly randy but even I draw the line at the HWNN, he's endearing and funny, and I like him, but jesus not my type. Also, and this is important, yes I'm a hot broad. 26, busty and a doctor, very lonely.

This post has been edited by Ceoil:
mbz1
post
Post #211


Senior Member
****

Group: Contributors
Posts: 461
Joined:
Member No.: 25,791



QUOTE(Malleus @ Mon 31st October 2011, 10:38pm) *

What about those users like me who failed at RfA?

Could you please provide the link to your RfA? Thanks.
Malleus
post
Post #212


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(mbz1 @ Tue 1st November 2011, 12:25am) *

QUOTE(Malleus @ Mon 31st October 2011, 10:38pm) *

What about those users like me who failed at RfA?

Could you please provide the link to your RfA? Thanks.

I had two:
this is the first, and here's the second.

I find it interesting to look back on dorks like Ryan Postlethwaite; it puts things in perspective, to a degree.

This post has been edited by Malleus:
EricBarbour
post
Post #213


blah
*********

Group: Regulars
Posts: 5,919
Joined:
Member No.: 5,066



QUOTE(Malleus @ Mon 31st October 2011, 7:47pm) *

Postlethwaite speaks:
QUOTE
Malleus is a fantastic article writer, along with being a great contributor to the FA and GA processes. However, he has no experience in the tasks that admins have to undertake.

Ha ha ha. The great majority of Wikipedia admins do little or no patrolling, or other "administrative tasks". They prefer to write actual articles. (And those are the admins who are quitting in great numbers.)
mbz1
post
Post #214


Senior Member
****

Group: Contributors
Posts: 461
Joined:
Member No.: 25,791



QUOTE(Malleus @ Tue 1st November 2011, 2:47am) *

QUOTE(mbz1 @ Tue 1st November 2011, 12:25am) *

QUOTE(Malleus @ Mon 31st October 2011, 10:38pm) *

What about those users like me who failed at RfA?

Could you please provide the link to your RfA? Thanks.

I had two:
this is the first, and here's the second.


Interesting! And it looks like you might be ready for the third one (IMG:smilys0b23ax56/default/smile.gif) Btw, I haven't been following Wikipedia for some time; are you going to take a break or something else? I mean, what is the question on your talk page, "Let me know if you're thinking of returning to editing any time soon", about?

This post has been edited by mbz1:
papaya
post
Post #215


Senior Member
****

Group: Contributors
Posts: 252
Joined:
Member No.: 1,255



Well, looking at my pie chart, about half my edits are articles, and about a quarter are to WP pages; I imagine that most of the edits to userspace non-talk are article edits too (I don't change my user page that much). The safe bet, though, is that a large chunk of the WP edits are singles, while far more of the article and user page edits are multiple hits on the same page. It's the nature of the beast: if you patrol AFD or AFC or the like, unless you're the obsessive "I can't let this article die" type, you're going to tend to have one edit per page in this group.

In comparing me to a couple of other people, though, I see some interesting patterns. For instance, if you look at NYBrad, you see what a bureaucrat looks like: 11% article edits, 30% user talk, and 55% WP and WP talk. With all that his EPP is 4.6, presumably from ARBCOM and the like. If you look at SV or MONGO, though, you see a pattern that looks superficially like mine: 43% articles, 24% WP. The big difference, though, is that she spends a lot of time talking: 17% of her edits are in article talk and another 11% are in user talk. And her EPP is quite high at 9.4. What this says is that she spends a lot of time arguing with people.
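The namespace percentages and EPP figures read off these pie charts can be reproduced from a raw list of edits. A minimal sketch, assuming each edit is recorded as a hypothetical (namespace, page title) pair and that EPP means total edits divided by distinct pages edited, which is how the thread appears to use it; the sample history is invented. Computing EPP separately per namespace makes the "singles" point visible: in the invented sample, project-space edits come out at 1.0 per page while article edits stack up on one page.

```python
from collections import Counter

def namespace_shares(edits):
    """Percentage of an editor's edits that fall in each namespace."""
    counts = Counter(ns for ns, _title in edits)
    return {ns: 100.0 * n / len(edits) for ns, n in counts.items()}

def epp(edits):
    """Edits per page: total edits divided by distinct pages edited."""
    return len(edits) / len(set(edits))

def epp_by_namespace(edits):
    """EPP computed separately within each namespace."""
    edit_counts = Counter(ns for ns, _title in edits)
    page_counts = Counter(ns for ns, _title in set(edits))
    return {ns: edit_counts[ns] / page_counts[ns] for ns in edit_counts}

# Hypothetical history: four AfD discussions touched once each,
# five edits to one article and three to its talk page.
history = (
    [("Wikipedia", f"Articles for deletion/Topic {i}") for i in range(4)]
    + [("Article", "Gustave Flaubert")] * 5
    + [("Talk", "Gustave Flaubert")] * 3
)

print(namespace_shares(history))
print(epp(history))               # 12 edits over 6 distinct pages
print(epp_by_namespace(history))  # project-space edits are all singles here
```

The overall EPP hides the split: a bureaucrat-style profile and a writer-style profile can share a similar headline number while their per-namespace figures differ sharply.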


This post has been edited by papaya:
Peter Damian
post
Post #216


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(Ceoil @ Mon 31st October 2011, 10:15pm) *

Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.

I can only think there has been a misunderstanding, so I will step aside from this discussion.
Ottava
post
Post #217


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(Peter Damian @ Tue 1st November 2011, 4:27am) *

QUOTE(Ceoil @ Mon 31st October 2011, 10:15pm) *

Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.

I can only think there has been a misunderstanding, so I will step aside from this discussion.



If you end up needing any help regarding info about certain content contributors or the rest, send me an email.
Malleus
post
Post #218


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(mbz1 @ Tue 1st November 2011, 3:43am) *

QUOTE(Malleus @ Tue 1st November 2011, 2:47am) *

QUOTE(mbz1 @ Tue 1st November 2011, 12:25am) *

QUOTE(Malleus @ Mon 31st October 2011, 10:38pm) *

What about those users like me who failed at RfA?

Could you please provide the link to your RfA? Thanks.

I had two:
this is the first, and here's the second.


Interesting! And it looks like you might be ready for the third one (IMG:smilys0b23ax56/default/smile.gif) Btw, I haven't been following Wikipedia for some time; are you going to take a break or something else? I mean, what is the question on your talk page, "Let me know if you're thinking of returning to editing any time soon", about?

There will be no third one, not ever. Balloonman was having a little joke.
A Horse With No Name
post
Post #219


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,471
Joined:
Member No.: 9,985



QUOTE(The Joy @ Mon 31st October 2011, 6:57pm) *

QUOTE(Ceoil @ Mon 31st October 2011, 6:15pm) *

Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.


Unless you are a hot broad, I do not think our A Horse With No Name would approve of that. (IMG:smilys0b23ax56/default/dry.gif) (IMG:smilys0b23ax56/default/hrmph.gif)


Oh, c'mon, can't a horse enjoy a graze in the grass without this inevitable collapse into innuendo? (IMG:smilys0b23ax56/default/angry.gif)

And that leads us to WR's Word of the Day - Innuendo - the Italian word for sodomy. (IMG:smilys0b23ax56/default/smile.gif)

QUOTE(Malleus @ Mon 31st October 2011, 10:47pm) *

I find it interesting to look back on dorks like Ryan Postlethwaite; it puts things in perspective, to a degree.


Hey, whatever happened to Ryan's hot girlfriend? Is she still around? (IMG:smilys0b23ax56/default/evilgrin.gif)
A Horse With No Name
post
Post #220


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,471
Joined:
Member No.: 9,985



QUOTE(Malleus @ Mon 31st October 2011, 6:25pm) *

Are you sure you don't mean "outlaw" rather than "outlier"?


And speaking of "outlaw"

(IMG:http://upload.wikimedia.org/wikipedia/en/b/ba/The_Outlaw_poster.jpg)

(IMG:smilys0b23ax56/default/evilgrin.gif) (IMG:smilys0b23ax56/default/evilgrin.gif) (IMG:smilys0b23ax56/default/evilgrin.gif) (IMG:smilys0b23ax56/default/evilgrin.gif)
Vigilant
post
Post #221


Senior Member
****

Group: Contributors
Posts: 307
Joined:
Member No.: 8,684



QUOTE(Ottava @ Tue 1st November 2011, 12:47pm) *

QUOTE(Peter Damian @ Tue 1st November 2011, 4:27am) *

QUOTE(Ceoil @ Mon 31st October 2011, 10:15pm) *

Peter, I notice two things; one is you are defensive and thus not objective and distant, and second you have not asked what the criteria for inclusion on WBFAN are; you just take it as it is. Both are basic, fundamental mistakes. And that I offered to help but you hounded me and were very aggressive in PMs in the last hour just shows that you are gathering factoids for a polemic, the truth be damned. I'm almost tempted to fuck a horse just to spite you.


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.

I can only think there has been a misunderstanding, so I will step aside from this discussion.



If you end up needing any help regarding info about certain content contributors or the rest, send me an email.


Go finish your thesis, Jeffrey.
Malleus
post
Post #222


Fat Cat
******

Group: Contributors
Posts: 1,682
Joined:
From: United Kingdom
Member No.: 8,716



QUOTE(A Horse With No Name @ Tue 1st November 2011, 3:27pm) *

Hey, whatever happened to Ryan's hot girlfriend? Is she still around? (IMG:smilys0b23ax56/default/evilgrin.gif)

Not if she's got any sense.
GlassBeadGame
post
Post #223


Dharma Bum
*********

Group: Contributors
Posts: 7,919
Joined:
From: My name it means nothing. My age it means less. The country I come from is called the Mid-West.
Member No.: 981



QUOTE(Peter Damian @ Tue 1st November 2011, 3:27am) *


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.



I've been watching this and related threads for a while. I guess I'm happy to see you admit your pursuit of an apology for "content creators." It saved me the need to confront you for seeming to take this direction. It might be conceptually useful to plot Wikipedians into four cells on two axes. But at the end of the day you are just looking at four kinds of nerds.

As I have already indicated I don't believe any book will result from any collaboration of users from WR. At least not one that isn't "self published." If I am wrong I would suspect the real headaches would just be beginning for the "authors."

But of all the failed books that might be possible, one focusing on the "defense of content contributors" would be one of the worst. It would fail to address any social criticism of Wikipedia and by necessity be inward-looking. The potential reader base for such a book is exactly equal to the number of "content creators" plus the number of their helicopter moms who support their self-indulgent pursuits.

Notwithstanding common reliance on it by the embattled, getting criticism from both sides doesn't prove any virtue.
thekohser
post
Post #224


Member
*********

Group: Regulars
Posts: 10,274
Joined:
Member No.: 911



QUOTE(GlassBeadGame @ Tue 1st November 2011, 5:15pm) *

As I have already indicated I don't believe any book will result from any collaboration of users from WR. At least not one that isn't "self published."


We should post a wager at LongNow.
Peter Damian
post
Post #225


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(GlassBeadGame @ Tue 1st November 2011, 9:15pm) *

QUOTE(Peter Damian @ Tue 1st November 2011, 3:27am) *


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.



But of all the failed books that might be possible, one focusing on the "defense of content contributors" would be one of the worst. It would fail to address any social criticism of Wikipedia and by necessity be inward-looking. The potential reader base for such a book is exactly equal to the number of "content creators" plus the number of their helicopter moms who support their self-indulgent pursuits.


The main point actually is to engage with the stupid 'emergentist' idea that Wikipedia is some magical Web 2.0 phenomenon that we have never seen before.

Demonstrating that Wikipedia actually has a division of labour along fairly conventional lines, with some workers doing repetitive 'unskilled' sorts of work, others doing the craftsmanlike bits like polishing up articles into some semblance of quality, others doing managerial tasks, others acting like policemen, and finally its own security force.

By demonstrating that, you have shown that, to the extent that Wikipedia works, it works because of old-fashioned methods of editorial oversight and control. Not the magic pixie dust of crowdsourcing.
Ottava
post
Post #226


Über Pokemon
********

Group: Contributors
Posts: 2,917
Joined:
Member No.: 7,328



QUOTE(Peter Damian @ Wed 2nd November 2011, 3:58am) *

QUOTE(GlassBeadGame @ Tue 1st November 2011, 9:15pm) *

QUOTE(Peter Damian @ Tue 1st November 2011, 3:27am) *


I'm very sorry about this. I really hadn't meant to offend - the 'content contributors' are the main group I would like to defend in the book, as it happens.



But of all the failed books that might be possible, one focusing on the "defense of content contributors" would be one of the worst. It would fail to address any social criticism of Wikipedia and by necessity be inward-looking. The potential reader base for such a book is exactly equal to the number of "content creators" plus the number of their helicopter moms who support their self-indulgent pursuits.


The main point actually is to engage with the stupid 'emergentist' idea that Wikipedia is some magical Web 2.0 phenomenon that we have never seen before.

Demonstrating that Wikipedia actually has a division of labour along fairly conventional lines, with some workers doing repetitive 'unskilled' sorts of work, others doing the craftsmanlike bits like polishing up articles into some semblance of quality, others doing managerial tasks, others acting like policemen, and finally its own security force.

By demonstrating that, you have shown that, to the extent that Wikipedia works, it works because of old-fashioned methods of editorial oversight and control. Not the magic pixie dust of crowdsourcing.


Indeed. One need only see how being featured on the main page actually damages articles rather than helping them. Crowdsourcing only brings in amateurs who destroy pages with crazy ideas - wanting to put up original research, plagiarism, etc. Crowdsourcing has never worked because people are not actually equal.

The "equality" in the Declaration of Independence is the same equality as found in Hobbes - everyone was born with the ability to kill another and we all eventually die (i.e. no one is invincible). That is the only equality that exists. Not everyone is smart, not everyone talented, etc. Very rarely does anyone but a gifted writer actually contribute.
timbo
post
Post #227


Member
***

Group: Contributors
Posts: 102
Joined:
Member No.: 21,141



QUOTE(Peter Damian @ Wed 2nd November 2011, 12:58am) *

The main point actually is to engage with the stupid 'emergentist' idea that Wikipedia is some magical Web 2.0 phenomenon that we have never seen before.

Demonstrating that Wikipedia actually has a division of labour along fairly conventional lines, with some workers doing repetitive 'unskilled' sorts of work, others doing the craftsmanlike bits like polishing up articles into some semblance of quality, others doing managerial tasks, others acting like policemen, and finally its own security force.

By demonstrating that, you have shown that, to the extent that Wikipedia works, it works because of old-fashioned methods of editorial oversight and control. Not the magic pixie dust of crowdsourcing.


I'd buy that book.

There are other aspects of WP that haven't been sufficiently covered. For all the carping on this site, it must be admitted that WP does a pretty amazing job keeping vandalism levels as low as it does. The whole article review/vandalism control process merits study. I'm not positive P.D. is dispassionate enough to analyze the processes that keep masses of skin healthy rather than to obsess upon warts, freckles, and blemishes.

Mixing my metaphors: if Wikipedia is a men's room painted white with Sharpie pens sitting in a big dish next to the sink, it is a remarkably clean and well-maintained men's room. How exactly does that manage to happen?

tim

This post has been edited by timbo:
Peter Damian
post
Post #228


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(timbo @ Wed 2nd November 2011, 4:40pm) *

There are other aspects of WP that haven't been sufficiently covered. For all the carping on this site, it must be admitted that WP does a pretty amazing job keeping vandalism levels as low as it does. The whole article review/vandalism control process merits study. I'm not positive P.D. is dispassionate enough to analyze the processes that keep masses of skin healthy rather than to obsess upon warts, freckles, and blemishes.

Mixing my metaphors: if Wikipedia is a men's room painted white with Sharpie pens sitting in a big dish next to the sink, it is a remarkably clean and well-maintained men's room. How exactly does that manage to happen?

tim


‘Vandalism’ can be simply defined as the stuff that the vandalism patrol processes pick up. Let’s suppose the men’s room has two kinds of pen: the ordinary kind, and those which are only visible under UV light. Then, to those without the right kind of spectacles, it looks nice and clean and white. To those with the specs, it is filled with all kinds of noxious graffiti. I give the following edit as a prime example:

http://en.wikipedia.org/w/index.php?title=...oldid=368771238 (18 June 2010)

QUOTE

In Bouvard et Pécuchet, Gustave Flaubert made fun of 18th and 19th century attempts to catalogue, classify, list, and record all of scientific and historical knowledge. To what extent is Wikipedia is an unaware continuation of the “Enlightenment” projects that Flaubert so brilliantly mocked?


which was not reverted for nearly a year. Very clever. There’s plenty more stuff like that, but you will have to get the book to find out.
SB_Johnny
post
Post #229


It wasn't me who made honky-tonk angels
*******

Group: Regulars
Posts: 2,128
Joined:
Member No.: 8,272



QUOTE(timbo @ Wed 2nd November 2011, 12:40pm) *

QUOTE(Peter Damian @ Wed 2nd November 2011, 12:58am) *

By demonstrating that, you have shown that, to the extent that Wikipedia works, it works because of old-fashioned methods of editorial oversight and control. Not the magic pixie dust of crowdsourcing.

I'd buy that book.

Is this a case of the blind leading the clueless, or is it the other way around? (IMG:smilys0b23ax56/default/blink.gif)

SB_Johnny
post
Post #230


It wasn't me who made honky-tonk angels
*******

Group: Regulars
Posts: 2,128
Joined:
Member No.: 8,272



QUOTE(Kelly Martin @ Mon 31st October 2011, 5:59pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 4:57pm) *

QUOTE(Kelly Martin @ Mon 31st October 2011, 9:55pm) *

QUOTE(Peter Damian @ Mon 31st October 2011, 4:45pm) *
What is your qualification in statistics, Kelly?
What does that matter?
It matters a lot.
Your statement is conclusory and unsupported by evidence. Seems to be a pattern with you.

I think that what Kelly was trying to point out here is that you seem to be using inductive reasoning based upon an intuition to prove a hypothesis about a proposed statistical model based upon a vanishingly small and otherwise arbitrary data set which is of questionable relevance to the topic you are trying to address.

Yes, that was a grammatically correct sentence, and needed no commas.

EricBarbour
post
Post #231


blah
*********

Group: Regulars
Posts: 5,919
Joined:
Member No.: 5,066



QUOTE(SB_Johnny @ Wed 2nd November 2011, 1:05pm) *

Is this a case of the blind leading the clueless, or is it the other way around? (IMG:smilys0b23ax56/default/blink.gif)

At least Carrite is starting to show interest in what is being said here.
In the past, he would show up on WR primarily to post incoherent hosannahs to the Magic Pedia,
and objections to any variety of criticism of it. So please, don't complain.
SB_Johnny
post
Post #232


It wasn't me who made honky-tonk angels
*******

Group: Regulars
Posts: 2,128
Joined:
Member No.: 8,272



QUOTE(EricBarbour @ Wed 2nd November 2011, 4:20pm) *

QUOTE(SB_Johnny @ Wed 2nd November 2011, 1:05pm) *

Is this a case of the blind leading the clueless, or is it the other way around? (IMG:smilys0b23ax56/default/blink.gif)

At least Carrite is starting to show interest in what is being said here.
In the past, he would show up on WR primarily to post incoherent hosannahs to the Magic Pedia,
and objections to any variety of criticism of it. So please, don't complain.

Fair enough, but he's still clueless, even if not hopelessly so. (IMG:smilys0b23ax56/default/rolleyes.gif)
Peter Damian
post
Post #233


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined:
Member No.: 4,212



QUOTE(SB_Johnny @ Wed 2nd November 2011, 8:15pm) *

I think that what Kelly was trying to point out here is that you seem to be using inductive reasoning based upon an intuition to prove a hypothesis about a proposed statistical model based upon a vanishingly small and otherwise arbitrary data set which is of questionable relevance to the topic you are trying to address.

Yes, that was a grammatically correct sentence, and needed no commas.


I think she (and perhaps you) is very confused about what I was doing. I was not trying to infer properties of a wider distribution (e.g. all Wikipedia editors) from the properties of a smaller one (current admins). I simply calculated the actual distribution by tabulating the epp of every admin. And since I did this for all 720 admins, it follows there is no statistical model. It is a fact that, as of October 2011, the epp had that distribution.
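Because every admin is counted, producing the distribution is a tabulation rather than an estimate: nothing is being inferred about a larger population. A minimal sketch of such a tabulation, assuming per-account totals of edits and distinct pages edited are already available (epp taken as edits divided by distinct pages); the account names, figures and bin edges are invented for illustration.

```python
from bisect import bisect_right

def epp_histogram(stats, edges):
    """Count accounts per epp bin.

    stats maps username -> (total_edits, distinct_pages). edges must be sorted,
    start at or below the smallest epp, and define half-open bins
    [edges[i], edges[i+1]) plus a final open-ended bin >= edges[-1].
    """
    labels = [f"[{lo}, {hi})" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">= {edges[-1]}")
    counts = dict.fromkeys(labels, 0)
    for total_edits, distinct_pages in stats.values():
        value = total_edits / distinct_pages
        # bisect_right finds the bin whose lower edge is the last one <= value.
        counts[labels[bisect_right(edges, value) - 1]] += 1
    return counts

# Invented figures for four hypothetical admin accounts.
admins = {
    "AdminA": (1200, 800),   # epp 1.5
    "AdminB": (3000, 1000),  # epp 3.0
    "AdminC": (9000, 600),   # epp 15.0
    "AdminD": (500, 250),    # epp 2.0
}
print(epp_histogram(admins, edges=[0, 2, 5]))
```

Run over the full list of accounts, the counts are the distribution itself; no sampling error or model enters until one starts generalising beyond that list.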

The second piece of reasoning was behavioural, not statistical. Namely, that 'flitters' (editors who move quickly from article to article) are unlikely to be adding 'content', and more likely to be doing monotonous repetitive work.

This reasoning leads to two tentative conclusions.

First tentative conclusion: editors with an epp of less than 2 are highly likely to be doing monotonous repetitive work. This is borne out by examining their edits.

Second tentative conclusion: editors with epp greater than 5 are highly likely to be adding content. This is borne out by examining their edits, and by lots of those badge things on their user page.

Epps between 2 and 5: hard to say. Thought experiment: one editor has 5,000 edits to 10 articles. epp = 500, very high. Another editor also has 5,000 edits to 10 articles, but contaminates this with vandalism work on another 5,000 different articles. Average epp ≈ 2. Yet they both have the same 'content' work.
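The thought experiment is easy to check numerically. A minimal sketch, assuming the second editor makes exactly one edit on each of the 5,000 extra articles (the post does not say); the band labels paraphrase the two tentative conclusions above. Under that assumption the blended figure comes out at 10,000 / 5,010 ≈ 1.996, just under 2, so a simple cut-off would file the second editor with the repetitive workers despite identical content work.

```python
def epp(total_edits, distinct_pages):
    """Edits per page."""
    return total_edits / distinct_pages

def band(value):
    """Tentative bands: <2 repetitive, >5 content, otherwise hard to say."""
    if value < 2:
        return "repetitive"
    if value > 5:
        return "content"
    return "hard to say"

# Editor A: 5,000 edits concentrated on 10 articles.
editor_a = epp(5_000, 10)

# Editor B: the same 5,000 edits to 10 articles, plus (assumed) exactly one
# vandalism-reversion edit on each of 5,000 other articles.
editor_b = epp(5_000 + 5_000, 10 + 5_000)

for name, value in [("A", editor_a), ("B", editor_b)]:
    print(name, round(value, 3), band(value))
```

A single ratio cannot separate the two editors; looking at how each editor's edits are spread across pages, rather than only at their average, would.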

The distributions I haven't looked at yet, but intend to, are:

(1) Individual article edit distributions. Under the crowdsourcing hypothesis, the distribution should be even, with no large maxima. My contrary hypothesis is that we will find that many or all articles have 'tails', with individual editors adding large blobs of content.

(2) Individual editor article edit distributions. Under the crowdsourcing hypothesis, editors will have an even distribution of edits across articles. The contrary hypothesis is that some editors will have a 'tail' in their edit distribution, with large contributions to some articles.
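Both planned distributions could be screened with the same simple statistic: the share of edits held by the largest single contributor (for an article) or the largest single article (for an editor), compared with what an even spread would give. A minimal sketch with invented histories; counting edits rather than bytes added is an assumption, and weighting by bytes added would track 'large blobs of content' more directly.

```python
from collections import Counter

def top_share(labels):
    """Fraction of all edits attributable to the single most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def even_share(labels):
    """Top share expected if edits were spread evenly over the same labels."""
    return 1 / len(set(labels))

# (1) Hypothetical article histories: each label is the editing username.
tailed_article = ["Alice"] * 40 + ["Bob", "Carol", "Dave", "Erin"] * 5
flat_article = ["Alice", "Bob", "Carol", "Dave", "Erin"] * 12

# (2) Hypothetical editor history: each label is the article title edited.
editor_history = ["Flaubert"] * 30 + ["Paris", "Rouen", "Croisset"] * 2

for name, labels in [("tailed article", tailed_article),
                     ("flat article", flat_article),
                     ("editor history", editor_history)]:
    print(name, round(top_share(labels), 3), round(even_share(labels), 3))
```

A top share well above the even-spread value is the kind of 'tail' described above; a top share close to it is what the crowdsourcing hypothesis would predict.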
communicat
post
Post #234


Senior Member
****

Group: Contributors
Posts: 270
Joined:
From: Southern Africa
Member No.: 61,155



QUOTE(Peter Damian @ Thu 3rd November 2011, 9:35am) *

QUOTE(SB_Johnny @ Wed 2nd November 2011, 8:15pm) *

I think that what Kelly was trying to point out here is that you seem to be using inductive reasoning based upon an intuition to prove a hypothesis about a proposed statistical model based upon a vanishingly small and otherwise arbitrary data set which is of questionable relevance to the topic you are trying to address.

Yes, that was a grammatically correct sentence, and needed no commas.


I think she (and perhaps you) is very confused about what I was doing. I was not trying to infer properties of a wider distribution (e.g. all Wikipedia editors) from the properties of a smaller one (current admins). I simply calculated the actual distribution by tabulating the epp of every admin. And since I did this for all 720 admins, it follows there is no statistical model. It is a fact that, as of October 2011, the epp had that distribution.

The second piece of reasoning was behavioural, not statistical. Namely, that 'flitters' (editors who move quickly from article to article) are unlikely to be adding 'content', and more likely to be doing monotonous repetitive work.

This reasoning leads to two tentative conclusions.

First tentative conclusion: editors with an epp of less than 2 are highly likely to be doing monotonous repetitive work. This is borne out by examining their edits.

Second tentative conclusion: editors with epp greater than 5 are highly likely to be adding content. This is borne out by examining their edits, and by lots of those badge things on their user page.

Epps between 2 and 5: hard to say. Thought experiment: one editor has 5,000 edits to 10 articles. epp = 500, very high. Another editor also has 5,000 edits to 10 articles, but contaminates this with vandalism work on another 5,000 different articles. Average epp ≈ 2. Yet they both have the same 'content' work.

The distributions I haven't looked at yet, but intend to, are:

(1) Individual article edit distributions. Under the crowdsourcing hypothesis, the distribution should be even, with no large maxima. My contrary hypothesis is that we will find that many or all articles have 'tails', with individual editors adding large blobs of content.

(2) Individual editor article edit distributions. Under the crowdsourcing hypothesis, editors will have an even distribution of edits across articles. The contrary hypothesis is that some editors will have a 'tail' in their edit distribution, with large contributions to some articles.

All this is gonna make gripping reading in your forthcoming book about WP. Can hardly wait to get my hands on a copy. When and by whom is it due to be published? (If the stats haven't changed by then).
carbuncle
post
Post #235


Fat Cat
******

Group: Regulars
Posts: 1,601
Joined:
Member No.: 5,544



QUOTE(timbo @ Wed 2nd November 2011, 4:40pm) *

There are other aspects of WP that haven't been sufficiently covered. For all the carping on this site, it must be admitted that WP does a pretty amazing job keeping vandalism levels as low as it does. The whole article review/vandalism control process merits study. I'm not positive P.D. is dispassionate enough to analyze the processes that keep masses of skin healthy rather than to obsess upon warts, freckles, and blemishes.

Mixing my metaphors: if Wikipedia is a men's room painted white with Sharpie pens sitting in a big dish next to the sink, it is a remarkably clean and well-maintained men's room. How exactly does that manage to happen?

I believe that even the harshest critics of WP would agree that the majority of the obvious vandalism is caught quite quickly. How it happens is not mysterious - edit filters prevent some of it, ClueBot reverts the really obvious stuff, and WP editors get most of what is left. Occasionally, a reader of WP will themselves attempt to edit vandalism out of an article.

The more interesting question is why anyone would leave a big dish of Sharpie pens in a white men's room if they didn't want people to write on the walls, and why they would keep painting those walls white over and over and over again. We'll ignore the first question for now.

ClueBot is simply following the second law of robotics, so we can ignore it. I assume that readers of WP are motivated by the desire to have a sensible article and prevent other readers from seeing the same joke/nonsense/obscenity. So what motivates the WP editors who remove the rest of the vandalism? For some, personal interest in the subject (I watchlist anything related to nacktreiten, for instance) is the motivator. Others may genuinely be concerned about accuracy (etc) in WP.

But most of the simple stuff is caught by "vandal-fighters" and "new page patrollers" who are simply playing a video game. It would be a really boring video game if they didn't know that there was a person on the other side of that IP or throw-away username. And that's why there is a big dish of Sharpies - the game would soon end if there wasn't...
iii
post
Post #236


Member
***

Group: Contributors
Posts: 114
Joined:
Member No.: 38,992



QUOTE(Kelly Martin @ Mon 31st October 2011, 5:52pm) *

What I want is a test. That is, I want a decisional rule: something like "if editor's epp < 3.0, then editor is a content creator, with p=0.8". There are rigorous methods for adducing such decisional rules from appropriate sample data. But the proposed rules that have been offered so far are not derived using those methods; they are instead just generated ad hoc. This is appropriate for the investigatory phase of the analysis, but you can't just stop there.

And none of the hypotheses I've seen thrown out have been rigorously tested, even though in most cases I think they can be, in some cases fairly easily. Why is this?


If one really wanted to do research with the publicly available Wikipedia database along these lines, I would recommend discriminating between models with something like the Bayesian information criterion. That would be convincing, in my opinion (which I'm sure no one cares about, but I'll offer it anyway). Demanding a frequentist test or asking for a p-value is almost as mealy-mouthed as doing subjective categorizations. If I cycle through enough arbitrary demarcations, I can get arbitrarily low p-values that'll knock your socks off. (Incidentally, high p-values tell you absolutely nothing: they just tell you that you can't statistically discriminate between two groups, but maybe I'm getting too pedantic here.)
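
For anyone tempted to try the BIC route, here is a minimal Python sketch of the idea, not anything proposed in the thread beyond "use something like BIC". Assumptions, stated loudly: the per-editor statistic (something like the "epp" in Kelly Martin's example) is roughly normal within each group, the two groups are labelled in advance rather than found by searching over cutoffs, and the numbers are invented. BIC is k*ln(n) - 2*ln(L), where k is the number of fitted parameters, n the number of editors and L the maximised likelihood; the model with the lower BIC is preferred. The function names are hypothetical helpers, not part of any existing tool.

```python
import math
from statistics import fmean


def fit_normal(values):
    """Maximum-likelihood mean and variance (divides by n, not n - 1)."""
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values to fit a variance")
    mu = fmean(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)
    return mu, var


def normal_loglik(values, mu, var):
    """Log-likelihood of `values` under a normal distribution N(mu, var)."""
    n = len(values)
    ss = sum((x - mu) ** 2 for x in values)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def bic(loglik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is preferred."""
    return k * math.log(n) - 2 * loglik


def compare_one_vs_two_groups(group_a, group_b):
    """Return (BIC of one shared normal, BIC of separate normals per group)."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    # Model 1: every editor drawn from one distribution (mean, variance: k = 2).
    mu, var = fit_normal(pooled)
    bic_one = bic(normal_loglik(pooled, mu, var), k=2, n=n)
    # Model 2: each labelled group has its own mean and variance (k = 4).
    ll_two = 0.0
    for group in (group_a, group_b):
        group = list(group)
        mu_g, var_g = fit_normal(group)
        ll_two += normal_loglik(group, mu_g, var_g)
    bic_two = bic(ll_two, k=4, n=n)
    return bic_one, bic_two


if __name__ == "__main__":
    # Invented per-editor values for two hypothetical, pre-labelled groups.
    gnomes = [1.2, 1.5, 2.0, 1.8, 2.2, 1.4]
    writers = [6.0, 7.5, 5.8, 8.1, 6.9, 7.2]
    one, two = compare_one_vs_two_groups(gnomes, writers)
    print(f"BIC one group: {one:.1f}, BIC two groups: {two:.1f}")
```

The extra parameters of the two-group model are penalised by ln(n) each, so a split only wins if it buys a real improvement in fit. If the cutoff itself were chosen by trying many demarcations, that search would also have to be counted against the model, which is the same trap as the p-value fishing described above.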

But if the goal is to discover something about the Wikipedia hell-hole, there needs to be a forum where theorists about the subject can generate hypotheses. To that end, I guess WR works as well as any other crucible, and objecting that such hypotheses are generated ad hoc is irrelevant if the test of each hypothesis is rigorous enough. Some decent hypotheses have been offered by certain people in these fora, and it shouldn't be too onerous to analyze them, if that's what one wants to do. I'm not sure what end this would serve. If someone here discovers that Wikipedia is failing (or not failing) for reason X, congratulations. But nobody outside the inbred community of Wikipedians and Wikipedia hangers-on is going to care.
Abd
Post #237


Postmaster

Group: Regulars
Posts: 1,919
From: Northampton, MA, USA
Member No.: 9,019



The kind of research being suggested here would be fine on Wikiversity. All that's needed is to avoid specific criticism of specific editors; otherwise, wikistudies are welcome there.

A problem has been that some users have used Wikiversity as a platform from which to attack other users (mostly from Wikipedia).

I used a user page on Wikiversity to document my block evasion on Wikipedia. That page did not coordinate blocking, did not attack Wikipedia (though it could be argued, I suppose, that my evasion was an attack), and did not criticize specific users; it reported responses neutrally. I suppose someone could claim selection bias, but I did try to avoid that.

And the result was actually positive on Wikipedia. Certain excessive responses were noticed and corrected, and a policy was rewritten. It is rare, I think, for a banned user to document his own IP edits! Mostly, banned users try to avoid detection, so little or no information can be compiled.

There was an attempt to delete the page on Wikiversity, by someone who, I think, didn't understand it. It failed. But "attack pages" are frequently deleted.

In any case, welcome to Wikiversity, anyone who wants to do original research in wiki studies. Let me know what you are doing, I'm User:Abd there, and I may be able to help. Wikiversity may be the only WMF wiki that allows original research. It is not an encyclopedia!

Detective
Post #238


Senior Member

Group: Contributors
Posts: 331
Member No.: 35,179



QUOTE(Abd @ Sat 5th November 2011, 12:07am) *

Wikiversity may be the only WMF wiki that allows original research.

That's not true at all, although the WV sites (in several languages) may be the only ones to admit to it.
the fieryangel
Post #239


the Internet Review Corporation is watching you...

Group: Regulars
Posts: 2,990
From: It's all in your mind anyway...
Member No.: 577



QUOTE(A Horse With No Name @ Tue 1st November 2011, 4:30pm) *

QUOTE(Malleus @ Mon 31st October 2011, 6:25pm) *

Are you sure you don't mean "outlaw" rather than "outlier"?


And speaking of "outlaw"

(IMG:http://upload.wikimedia.org/wikipedia/en/b/ba/The_Outlaw_poster.jpg)



The GLBT element here says "hubba hubba"!

Now, isn't it really weird that there is no hard demographic data on WP users? You'd think that, after all this time, there would be some stats, but there's just...practically nothing.

That's really, really strange...
communicat
Post #240


Senior Member

Group: Contributors
Posts: 270
From: Southern Africa
Member No.: 61,155



QUOTE(the fieryangel @ Sat 5th November 2011, 10:52pm) *

Now, isn't it really weird that there is no hard demographic data on WP users? You'd think that, after all this time, there would be some stats, but there's just...practically nothing.

That's really, really strange...

No need for "hard demographic data". Everybody knows which particular demographic group dominates WP. It also happens to be a very patriotic demographic group, about which Aldous Huxley once made a pertinent observation:
"One of the great attractions of patriotism - it fulfills our worst wishes. In the person of our nation we are able, vicariously, to bully and cheat. Bully and cheat, what's more, with a feeling that we are profoundly virtuous." - Aldous Huxley

This post has been edited by communicat: