The Wikipedia Review: A forum for discussion and criticism of Wikipedia

> Content contributors, statistical analysis
Ceoil
post Mon 7th November 2011, 2:47am
Post #121


Junior Member
**

Group: Contributors
Posts: 56
Joined: Sun 7th Sep 2008, 2:33pm
Member No.: 8,131



Oh for fuck's sake. If you just wanted to cram in a Huxley quote, you could at least come up with an insightful observation to justify it. What are you, 16? People, this website used to be fun. What happened? I miss the old days!

This post has been edited by Ceoil: Mon 7th November 2011, 2:52am
communicat
post Mon 7th November 2011, 12:01pm
Post #122


Senior Member
****

Group: Contributors
Posts: 270
Joined: Sun 31st Jul 2011, 11:31am
From: Southern Africa
Member No.: 61,155




QUOTE(Ceoil @ Mon 7th November 2011, 4:47am) *

Oh for fuck's sake. If you just wanted to cram in a Huxley quote, you could at least come up with an insightful observation to justify it. What are you, 16? People, this website used to be fun. What happened? I miss the old days!

The "insightful observation" you require is that the quote pertains to the question of which demographic group dominates WP (and WR), and the relationship of that particular group to content. You will have noticed this topic is headed "Content contributors". Do I really have to spell it out for you?

BTW the Huxley quote is lifted directly from Tarc's hypocritical profile, and Tarc's "insightful observations" are clear and present at the related thread that some mod moved to the tar-pit annex, because people here apparently found my observations embarrassing and they just couldn't handle the truth. The tar-pit topic is misleadingly headed "Communicat being disagreeable". http://wikipediareview.com/index.php?showt...view=getnewpost

Another BTW: Not only was the thread in question deemed "off topic", which it was not, but the now "off topic" topic has somehow been coded in such a way that it does not even show up in the tar-pit index. If that's not censorship, then I don't know what is. Thank you for your interest.


This post has been edited by communicat: Mon 7th November 2011, 12:32pm
EricBarbour
post Mon 7th November 2011, 7:18pm
Post #123


blah
*********

Group: Regulars
Posts: 5,919
Joined: Mon 25th Feb 2008, 2:31am
Member No.: 5,066




QUOTE(Ceoil @ Sun 6th November 2011, 6:47pm) *

People, this website used to be fun. What happened? I miss the old days!

It was more fun---when your bud was still contributing snotty asides.

He's been really scarce. WR needs more snotty asides.
communicat
post Tue 8th November 2011, 1:59pm
Post #124


Senior Member
****

Group: Contributors
Posts: 270
Joined: Sun 31st Jul 2011, 11:31am
From: Southern Africa
Member No.: 61,155




QUOTE
EricBarbour - Mon 7th November 2011, 9:18pm

WR needs more snotty asides.

Gomi might disagree with you. See his recent message to The Cat, verbatim below:

Cat - You are on the verge of being suspended. Knock off the ad hominem versus Kohs and everyone else, and chill out, because my patience is wearing thing. If you want to argue on-topic, Wikipedia-related issues without the personal insults, please feel free, but consider yourself warned.

-- gomi


BTW, I never did figure out what "thing" he says his patience is wearing. Some kind of biblical thing, maybe? Whoops, sorry, didn't mean that as a personal insult.

This post has been edited by communicat: Tue 8th November 2011, 2:00pm
thekohser
post Tue 8th November 2011, 2:22pm
Post #125


Member
*********

Group: Regulars
Posts: 10,274
Joined: Thu 1st Feb 2007, 10:21pm
Member No.: 911



QUOTE(communicat @ Tue 8th November 2011, 8:59am) *

Whoops, sorry, didn't mean that as a personal insult.


Any time now, Gomi.
Maunus
post Sat 26th November 2011, 3:24pm
Post #126


New Member
*

Group: Contributors
Posts: 46
Joined: Wed 23rd Nov 2011, 1:37am
Member No.: 71,134




How do I calculate where I fit in the contributor spectrum? I don't understand the scales on the axes - I assume the horizontal axis isn't scaled simply by epp since figures can be negative.

My stats are:

epp 7.36

Article space edits: 52.89%
Talk page edits: 24.24%
User talk space edits 9.32%
Wikipedia space edits: 7.16%
Wikipedia talk space edits 1.90%


Also I am wondering what it means when editors have few edits per page AND mostly non-article space edits. What kind of wiki activity does that cover? I would assume that ANI hangarounds and socialites would have many edits per page - either in project space or user talk space.

Also, where do article talk edits figure? In my experience high article talk contribution rates suggest editors engaged in discussion at articles for controversial topics - and editors using talkpages of controversial articles as discussion fora. Also what is the difference between editors with a high article space to article talk space ratio - and editors with about similar rates between the two? I would think that the former tend to write content in non-controversial, low-interest topics and therefore get to write the entire article without having to interact with anyone on the talkpage. Whereas editors who contribute content to topics that are on many editors' watchlists would have to interact more frequently on the talkpage. So perhaps statistics can also say something about the kind of topics that content contributors edit.

This post has been edited by Maunus: Sat 26th November 2011, 3:30pm
Peter Damian
post Sat 26th November 2011, 3:32pm
Post #127


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined: Tue 18th Dec 2007, 9:25pm
Member No.: 4,212




QUOTE(Maunus @ Sat 26th November 2011, 3:24pm) *

How do I calculate where I fit in the contributor spectrum? I don't understand the scales on the axes - I assume the horizontal axis isn't scaled simply by epp since figures can be negative.

My stats are:

epp 7.36

Article space edits: 52.89%
Talk page edits: 24.24%
User talk space edits 9.32%
Wikipedia space edits: 7.16%
Wikipedia talk space edits 1.90%


Also I am wondering what it means when editors have few edits per page AND mostly non-article space edits. What kind of wiki activity does that cover? I would assume that ANI hangarounds and socialites would have many edits per page - either in project space or user talk space.

Also, where do article talk edits figure? In my experience high article talk contribution rates suggest editors engaged in discussion at articles for controversial topics - and editors using talkpages of controversial articles as discussion fora. Also what is the difference between editors with a high article space to article talk space ratio - and editors with about similar rates between the two? I would think that the former tend to write content in non-controversial, low-interest topics and therefore get to write the entire article without having to interact with anyone on the talkpage. Whereas editors who contribute content to topics that are on many editors' watchlists would have to interact more frequently on the talkpage. So perhaps statistics can also say something about the kind of topics that content contributors edit.


I recently revised the metric - see here http://www.logicmuseum.com/x/index.php?tit...diting_patterns . I include talk page in article space, as well as template edits.

Maximum edit to any one page is a more reliable metric, given that the epp is easily contaminated by 'gnome' work. But in any case, an epp of 7.36 is pretty high.
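
To make the point about 'gnome' contamination concrete, here is a minimal Python sketch. It assumes an input with one page title per edit, and the two sample editing histories are invented for illustration; neither comes from the linked table. It computes edits per page (epp) alongside the maximum number of edits to any one page.

```python
from collections import Counter


def editing_stats(page_titles):
    """Return (edits per page, maximum edits to any one page).

    page_titles holds one entry per edit, naming the page that was edited.
    This input format is an assumption made for the sketch.
    """
    counts = Counter(page_titles)
    if not counts:
        return 0.0, 0
    epp = len(page_titles) / len(counts)
    max_edits = max(counts.values())
    return epp, max_edits


# Hypothetical writer: 40 edits concentrated on one article, plus a few others.
writer = ["Article A"] * 40 + ["Article B"] * 3 + ["Article C"] * 2

# The same writer after also making 135 one-off fixes on distinct pages.
writer_and_gnome = writer + [f"Page {i}" for i in range(135)]

print(editing_stats(writer))            # epp is 15.0, maximum is 40
print(editing_stats(writer_and_gnome))  # epp drops sharply, maximum stays at 40
```

In this made-up case the one-off fixes pull the epp down from 15 to roughly 1.3, while the maximum stays at 40, which is the sense in which the maximum is less easily contaminated.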

This post has been edited by Peter Damian: Sat 26th November 2011, 3:33pm
Anne Sexton
post Wed 30th November 2011, 6:23pm
Post #128


Neophyte


Group: Contributors
Posts: 5
Joined: Tue 29th Nov 2011, 1:57am
From: A writer is essentially a spy. / Dear love, I am that girl.
Member No.: 71,500



I apologize for jumping into this after 7 pages, which I'm sure I've only partly understood.

Peter Damian started the thread with the question: "why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?"

This seems like a reasonable question to me, except for the included assumption that status as measured by adminship is the only kind of status-based reward available. It's easily measurable, true, but mighten't there be off-wiki status-based rewards available to content contributors, if only internalized ones? I guess this is what Carrite was getting at when he talked about feeling rewarded.

The most interesting thing to me (and the one that finally got me to register an account here) is the discussion of the metric for deciding who's a content contributor. This: http://arxiv.org/abs/1002.0561 (maybe you've seen it?) proposes a metric for measuring quality of contributed content, which is the ratio of the amount of material surviving at the time of measurement to the material added to the article (w_surv / w_new). They find that material surviving after 5 edits correlates closely enough with material surviving indefinitely that they can use that to approximate w_surv.

I'm just thinking that maybe this kind of thing would let you ignore non-article space edits in your calculations, and decide who the content contributors are by ranking them according to amount of quality material added?
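
For anyone who wants to play with the idea, here is a rough Python sketch of a w_surv / w_new style ratio. It is a simplification under stated assumptions and not the paper's implementation: revisions are given as a hypothetical list of (author, text) pairs in chronological order, words are compared as lower-cased sets (so repeated words and word order are ignored), and survival is measured against the last revision only.

```python
def survival_ratio(revisions, user):
    """Return w_surv / w_new for `user`, or None if the user added no new words.

    revisions is a chronological list of (author, text) pairs; this format is
    an assumption made for the sketch.
    """
    seen = set()    # words present in any earlier revision
    added = set()   # words first introduced by `user`
    for author, text in revisions:
        words = set(text.lower().split())
        fresh = words - seen
        if author == user:
            added |= fresh
        seen |= words
    if not added:
        return None
    final_words = set(revisions[-1][1].lower().split())
    return len(added & final_words) / len(added)


# Invented three-revision history of a single article.
history = [
    ("alice", "wikis are websites"),
    ("bob", "wikis are collaborative websites edited by volunteers daily"),
    ("carol", "wikis are collaborative websites"),
]

print(survival_ratio(history, "alice"))  # 1.0
print(survival_ratio(history, "bob"))    # 0.2
print(survival_ratio(history, "carol"))  # None
```

Here bob introduces five new words and only one is still present in the last revision, so his ratio is 0.2, while carol only removes text and therefore has no ratio at all.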
thekohser
post Wed 30th November 2011, 7:43pm
Post #129


Member
*********

Group: Regulars
Posts: 10,274
Joined: Thu 1st Feb 2007, 10:21pm
Member No.: 911



QUOTE(Anne Sexton @ Wed 30th November 2011, 1:23pm) *

This: http://arxiv.org/abs/1002.0561 (maybe you've seen it?) proposes a metric for measuring quality of contributed content, which is the ratio of the amount of material surviving at the time of measurement to the material added to the article (w_surv / w_new). They find that material surviving after 5 edits correlates closely enough with material surviving indefinitely that they can use that to approximate w_surv.


I believe we discussed that metric or a similar one a couple of years ago. I believe we found that the metric could be easily botched if an administrator was fond of "moving" articles to a new page name, so that all of the previous work would be "credited" to the "new" article creator, who really had nothing to do with the content's quality. Also, I think the metric was easily botched, even if someone moved sentences or paragraphs around on the page -- the tool seemed to count all of the moved content as "new" content, and credit again went to the fiddler, not the creator.
gomi
post Wed 30th November 2011, 7:53pm
Post #130


Member
********

Group: Members
Posts: 3,022
Joined: Fri 17th Nov 2006, 6:38pm
Member No.: 565



First, welcome to the Review, and thank you for a very thoughtful contribution as your first post.

QUOTE(Anne Sexton @ Wed 30th November 2011, 10:23am) *
... the included assumption that status as measured by adminship is the only kind of status-based reward available. It's easily measurable, true, but mighten't there be off-wiki status-based rewards available to content contributors, if only internalized ones?
There are, no doubt, extrinsic as well as intrinsic rewards associated with Wikipedia editing. Perhaps the most obvious is the satisfaction of influencing others on a topic, especially a controversial one. There are also, to be sure, more positive or socially-acceptable extrinsic rewards. But I doubt that a significant minority of editors, even within the domain of a topic area, can agree on them. When they do agree, you get well-functioning "Wiki Project" teams, of which there are a few.

QUOTE(Anne Sexton @ Wed 30th November 2011, 10:23am) *
The most interesting thing to me ... is the discussion of the metric for deciding who's a content contributor. This: http://arxiv.org/abs/1002.0561 ... proposes a metric for measuring quality of contributed content
Specifically, this para:
QUOTE
The quality of a contribution is measured in terms of Wnew, the number of new words added by a user to Wikipedia articles, such that the words were not present in any previous revisions of those articles. We found a high correlation between the number of new words that survive 5 revisions, and the number Wsurv that survive to the last revision of the article (> 0.97), consistent with previous analyses of edit persistence. We therefore constructed a simple metric by taking the proportion of new words introduced by the user that are retained in the last version of a sufficiently frequently edited article: Wsurv/Wnew.

I am pretty dubious about this metric. The other metrics in the article, such as "Best Answer" selections by peers, all seem better than this. Surviving text over 5 revisions may indicate edit-warring, article ownership, or simply selecting non-controversial articles. Frequent serial revisers (those who make 10 or 30 or 50 small revisions in a row) would be "high quality" by this measure. No attempt in the paper is made to test this metric against a qualitative or "reader-rating" score of article quality. That is where I think this analysis falls down.

Also, the measure of quality for an "encyclopedia" article is (or should be) substantially different from a self-help "how to fix your PC" online forum or the equivalent.
QUOTE(Anne Sexton @ Wed 30th November 2011, 10:23am) *
I'm just thinking that maybe this kind of thing would let you ignore non-article space edits in your calculations, and decide who the content contributors are by ranking them according to amount of quality material added?
A measure of "contribution to article stability" might very well be interesting, but I don't think it is a proxy for "quality".
Anne Sexton
post Wed 30th November 2011, 9:05pm
Post #131


Neophyte


Group: Contributors
Posts: 5
Joined: Tue 29th Nov 2011, 1:57am
From: A writer is essentially a spy. / Dear love, I am that girl.
Member No.: 71,500



QUOTE(gomi @ Wed 30th November 2011, 7:53pm) *

First, welcome to the Review, and thank you for a very thoughtful contribution as your first post.


Thanks!

QUOTE(gomi @ Wed 30th November 2011, 7:53pm) *

I am pretty dubious about this metric. The other metrics in the article, such as "Best Answer" selections by peers, all seem better than this. Surviving text over 5 revisions may indicate edit-warring, article ownership, or simply selecting non-controversial articles. Frequent serial revisers (those who make 10 or 30 or 50 small revisions in a row) would be "high quality" by this measure. No attempt in the paper is made to test this metric against a qualitative or "reader-rating" score of article quality. That is where I think this analysis falls down.

Also, the measure of quality for an "encyclopedia" article is (or should be) substantially different from a self-help "how to fix your PC" online forum or the equivalent.
QUOTE(Anne Sexton @ Wed 30th November 2011, 10:23am) *
I'm just thinking that maybe this kind of thing would let you ignore non-article space edits in your calculations, and decide who the content contributors are by ranking them according to amount of quality material added?
A measure of "contribution to article stability" might very well be interesting, but I don't think it is a proxy for "quality".


Ah, yeah. Of course you're right, especially about the non-controversial article thing. That didn't even occur to me. Ditto with ownership and editing style. I can see on rereading what I wrote that I wasn't exactly clear about what I meant, also. I was thinking that this metric might be a good starting place for a better metric than EPP for deciding who's a content creator, but only a starting place. What's needed is a way to gauge quality of contributions, and then quantity of quality contributions would gauge content-creator status (so I guess that's just w_surv, measured over some number of revisions). Maybe it's possible to control for confounding variables by grouping articles into cohorts according to how much churn they're experiencing, how many contributors there are, especially maybe how many watchers there are? 2K of edits to an article that a thousand people watch that lasts through some fixed number of revisions is probably better quality than 2K in one only watched by 30.
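
Something like the following Python sketch is what I have in mind for the cohort idea. Everything in it is an assumption: the watcher cut-offs of 30 and 1000 are arbitrary, the (watchers, survival ratio) pairs are made up, and watcher count is only one of the possible grouping variables.

```python
from collections import defaultdict
from statistics import mean


def watcher_cohort(watchers, bounds=(30, 1000)):
    """Assign a cohort label from a watcher count; the cut-offs are arbitrary."""
    low, high = bounds
    if watchers < low:
        return "low"
    if watchers < high:
        return "medium"
    return "high"


def cohort_means(samples):
    """Return the mean survival ratio per watcher cohort.

    samples is a list of (watchers, survival_ratio) pairs, one per contribution.
    """
    groups = defaultdict(list)
    for watchers, ratio in samples:
        groups[watcher_cohort(watchers)].append(ratio)
    return {label: mean(values) for label, values in groups.items()}


# Invented data: (watchers on the article, share of the contribution surviving).
samples = [(12, 1.0), (25, 0.5), (400, 0.5), (1500, 0.25), (2000, 0.75)]

print(cohort_means(samples))  # {'low': 0.75, 'medium': 0.5, 'high': 0.5}
```

A contribution could then be compared with the mean of its own cohort rather than with all articles at once, so that surviving on a heavily watched article counts for more than surviving on one nobody looks at.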

And you're absolutely right about the problems with that paper in terms of the comparison with the answer forums and reader ratings. On the other hand, if we're talking about what WP editors mean by content creators, it seems reasonable to me to use survival of content (if it's possible to control for the variables you noted) as a relevant kind of reader rating. It's almost certainly different from what non-editing readers might think, but then pleasing them is only one of the many, many uses to which WP lends itself.
radek
post Wed 30th November 2011, 9:15pm
Post #132


Über Member
*****

Group: Regulars
Posts: 699
Joined: Sat 28th Nov 2009, 10:40pm
Member No.: 15,651




QUOTE(Maunus @ Sat 26th November 2011, 9:24am) *

How do I calculate where I fit in the contributor spectrum? I don't understand the scales on the axes - I assume the horizontal axis isn't scaled simply by epp since figures can be negative.

My stats are:

epp 7.36

Article space edits: 52.89%
Talk page edits: 24.24%
User talk space edits 9.32%
Wikipedia space edits: 7.16%
Wikipedia talk space edits 1.90%


Also I am wondering what it means when editors have few edits per page AND mostly non-article space edits. What kind of wiki activity does that cover? I would assume that ANI hangarounds and socialites would have many edits per page - either in project space or user talk space.

Also, where do article talk edits figure? In my experience high article talk contribution rates suggest editors engaged in discussion at articles for controversial topics - and editors using talkpages of controversial articles as discussion fora. Also what is the difference between editors with a high article space to article talk space ratio - and editors with about similar rates between the two? I would think that the former tend to write content in non-controversial, low-interest topics and therefore get to write the entire article without having to interact with anyone on the talkpage. Whereas editors who contribute content to topics that are on many editors' watchlists would have to interact more frequently on the talkpage. So perhaps statistics can also say something about the kind of topics that content contributors edit.


I originally just normalized the scale by the average of the non-random sample of editors (basically I just listed the first 30 folks that popped into my head) so your score should be % deviation from the mean. I asked for some help in getting a real random and larger sample of Wikipedia editors with more than 500 or 1000 edits but peoples' been slacking on that. Peter's approach might be better (remember that I was making it up as I was going along).
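
In case it helps, here is a minimal Python sketch of that normalization, assuming "% deviation from the mean" means (value - mean) / mean expressed as a percentage. The sample values are invented and are not the 30 editors I actually used. It also shows why figures on the axis can be negative.

```python
def percent_deviation(value, sample):
    """Return how far `value` lies from the sample mean, as a percentage of that mean."""
    sample_mean = sum(sample) / len(sample)
    return 100.0 * (value - sample_mean) / sample_mean


# Invented epp values for a small sample of editors; their mean is 5.0.
sample_epp = [2.0, 4.0, 6.0, 8.0]

print(percent_deviation(7.5, sample_epp))  # 50.0
print(percent_deviation(2.5, sample_epp))  # -50.0
```

An editor below the sample average gets a negative score, so the sign only says which side of the average you fall on.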

And the statistics are always going to miss some information. That's what statistics are supposed to be in fact, missers of information.
EricBarbour
post Wed 30th November 2011, 9:20pm
Post #133


blah
*********

Group: Regulars
Posts: 5,919
Joined: Mon 25th Feb 2008, 2:31am
Member No.: 5,066




Welcome to WR, Anne.

Just as an aside: one of the most contentious, difficult and downright politically nasty issues with
Wikipedia is determining article or edit "quality". The WMF is still waving around that badly-devised
Nature magazine study from 2005, while other academics have been squabbling about it ever since.

The problem always will be with Wikipedia's systemic opacity. Because random IP addresses and users
can edit anything, and people are motivated to push POV and game the system for their own
aggrandizement, this will always be an area subject to very loud disputes. It doesn't help that
Wikipedia's reputation in the academic world has been steadily declining. In fact, every schoolteacher
and university professor I've spoken to has said the same thing: they will NOT accept any information
taken from Wikipedia, because it is not trustworthy--not even the references. I suspect this is
discouraging people from studying Wikipedia on a serious basis.

I've tried to figure out a computer-friendly way to rate articles for factual accuracy, with no luck.
If you want to see a couple of (the very few existing) academic papers on Wikipedia article quality, PM me.

This post has been edited by EricBarbour: Wed 30th November 2011, 9:22pm
Anne Sexton
post Thu 1st December 2011, 12:15am
Post #134


Neophyte


Group: Contributors
Posts: 5
Joined: Tue 29th Nov 2011, 1:57am
From: A writer is essentially a spy. / Dear love, I am that girl.
Member No.: 71,500



QUOTE(EricBarbour @ Wed 30th November 2011, 9:20pm) *

Welcome to WR, Anne.

Just as an aside: one of the most contentious, difficult and downright politically nasty issues with
Wikipedia is determining article or edit "quality". The WMF is still waving around that badly-devised
Nature magazine study from 2005, while other academics have been squabbling about it ever since.

The problem always will be with Wikipedia's systemic opacity. Because random IP addresses and users
can edit anything, and people are motivated to push POV and game the system for their own
aggrandizement, this will always be an area subject to very loud disputes. It doesn't help that
Wikipedia's reputation in the academic world has been steadily declining. In fact, every schoolteacher
and university professor I've spoken to has said the same thing: they will NOT accept any information
taken from Wikipedia, because it is not trustworthy--not even the references. I suspect this is
discouraging people from studying Wikipedia on a serious basis.

I've tried to figure out a computer-friendly way to rate articles for factual accuracy, with no luck.
If you want to see a couple of (the very few existing) academic papers on Wikipedia article quality, PM me.


Thanks, I will do that. I'm fascinated by the way people want to decide once and for all if Wikipedia is "reliable" or not. It seems so beside the point. Jimmy Wales is a buffoon, who doesn't have the good sense to know what a marvel he's created. Judging Wikipedia by him is like judging the Libyan people by Muammar Gaddafi. Wikimedia Foundation seems to be staffed by incompetent weirdos who have no idea what it is they're running. I think you're right about why academics haven't studied Wikipedia seriously much. Academics who criticize Wikipedia don't often know what it is that they're talking about either because they're so wrapped up in the idea of epistemological authority. It's a fascinating phenomenon. I haven't been this entertained since the golden age of usenet. The most incisive comment I've seen on this forum about the nature of Wikipedia is The Glass Bead Game's username. That's exactly what it's like, but turned up to 11. Anyway, I'm drawing the thread far off topic, and will stop now and PM you.
Peter Damian
post Thu 9th February 2012, 6:06pm
Post #135


I have as much free time as a Wikipedia admin!
*********

Group: Regulars
Posts: 4,400
Joined: Tue 18th Dec 2007, 9:25pm
Member No.: 4,212




I have updated the editing patterns http://www.logicmuseum.com/x/index.php?tit...diting_patterns table to include a number of new editors. Notably user:Fae, who has been so famous recently. It irritates me slightly that his great and magnificent contributions to Wikipedia are so frequently lauded in his defence. The statistics confirm exactly what you would expect from a cursory examination of his edits. He has a very low ‘maximum edits in article space’ (column ‘A’ in the table linked to above). This usually indicates gnomish or patrolling behaviour with editing dispersed over a large number of different pages. (Content contributors, by contrast, focus their editing intensely on a single page as they build an article).

Fae’s recent activity seems largely confined to slapping ‘welcome’ templates onto new account pages. I’m not sure why he does this, given that he is a director of Wikimedia UK. Is it something to do with building up an edit count? I know a lot of people on the Wiki talk of his 50,000 edits with something approaching awe and great respect.

The other thing they talk about is Fae’s hugely important and valuable work on the Hoxne Hoard http://en.wikipedia.org/wiki/Hoxne_Hoard article. I looked at that too, but couldn’t see anything from Fae but spelling corrections, linking, adding templates and references. All very important but not ‘content creation’ as I would understand it.

This post has been edited by Peter Damian: Thu 9th February 2012, 6:07pm
