The Wikipedia Review: A forum for discussion and criticism of Wikipedia
Wikipedia Review Op-Ed Pages

> Open Source Diva named CTO
carbuncle
post Tue 8th March 2011, 1:14pm
Post #41


Fat Cat
******

Group: Regulars
Posts: 1,601
Joined: Sun 30th Mar 2008, 4:48pm
Member No.: 5,544



Perhaps the timing of this has something to do with their experiences rolling out the recent MediaWiki 1.17 release?
anthony
post Tue 8th March 2011, 1:26pm
Post #42


Postmaster
*******

Group: Regulars
Posts: 2,034
Joined: Mon 30th Jul 2007, 1:31am
Member No.: 2,132



QUOTE(thekohser @ Tue 8th March 2011, 3:57am) *

I wonder if the fact that Brion Vibber is being re-hired is any indication of:
  1. Danese doesn't know enough about computers to do Brion's job
  2. Once you're a Wikimediot, you can't last long in the "outside" work world
Maybe a little bit of both?


The roles of CTO and Lead Architect are very very different. It is extremely rare for someone to simultaneously be the best candidate for both positions.

I think this position will be much better suited for Brion.

Of course, I couldn't help but catch this nugget: "Since I joined WMF in February 2010, I have been looking for a Lead Architect to work on the future of the platform (both for our use and for the thousands of wikis that run on our engine)." (Translation: "both for our use and for the use of Wikia")
Zoloft
post Tue 8th March 2011, 4:48pm
Post #43


May we all find solace in our dreams.
******

Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621



Ultimately, if MediaWiki fails to move away from PHP, this by itself could be what kills Wikipedia.
Kelly Martin
post Tue 8th March 2011, 6:10pm
Post #44


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Zoloft @ Tue 8th March 2011, 10:48am) *
Ultimately, if MediaWiki fails to move away from PHP, this by itself could be what kills Wikipedia.
Jimmy Wales, and by extension Wikimedia, is far too conservative to "gamble" on a platform change at this late stage of the game. In Wikipedia's ten years there has been almost no functional or technical change in the Wikipedia platform. Contrast Facebook, whose current platform bears absolutely no technical resemblance and fairly little functional resemblance to what it was like when it premiered.
Zoloft
post Tue 8th March 2011, 6:52pm
Post #45


May we all find solace in our dreams.
******

Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621



QUOTE(Kelly Martin @ Tue 8th March 2011, 10:10am) *

QUOTE(Zoloft @ Tue 8th March 2011, 10:48am) *
Ultimately, if MediaWiki fails to move away from PHP, this by itself could be what kills Wikipedia.
Jimmy Wales, and by extension Wikimedia, is far too conservative to "gamble" on a platform change at this late stage of the game. In Wikipedia's ten years there has been almost no functional or technical change in the Wikipedia platform. Contrast Facebook, whose current platform bears absolutely no technical resemblance and fairly little functional resemblance to what it was like when it premiered.

In agreeing with you, I will differ to this extent; the WMF has enough cash to parallel-develop MediaWiki 2.0 while maintaining the 1.x line, then bringing up servers with the newer, faster, more scalable version and importing the database and links.

Properly managed, very low risk, really high upside.

I'm lazy enough to assume without even looking that no such development is actually planned.
Somey
post Wed 9th March 2011, 4:38am
Post #46


Can't actually moderate
*********

Group: Moderators
Posts: 11,814
Joined: Sat 17th Jun 2006, 7:47pm
From: Dreamland
Member No.: 275



QUOTE(Zoloft @ Tue 8th March 2011, 12:52pm) *
Properly managed, very low risk, really high upside.

Ehh, it's that "properly managed" part that seems to be the real sticking point...

I myself am sort of a programmer... It's an interesting question, to me, anyway, what programming language/platform they'd use if they rebuilt MediaWiki from the ground up to take advantage of the "latest technology." I know J2EE isn't bad, but they could do better than that, couldn't they? And isn't it the database (MySQL) that causes the real bottlenecks, as opposed to PHP, or is it the fact that they use both, or that they use an interpreted server-side language in the first place?
Kelly Martin
post Wed 9th March 2011, 1:00pm
Post #47


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Somey @ Tue 8th March 2011, 10:38pm) *
I know J2EE isn't bad, but they could do better than that, couldn't they? And isn't it the database (MySQL) that causes the real bottlenecks, as opposed to PHP, or is it the fact that they use both, or that they use an interpreted server-side language in the first place?
Given that MediaWiki doesn't have any real need for a fully ACID compliant database, there is no good reason not to use something like HBase (which is what Facebook uses) for the database. The other big performance win would be to rewrite the parser; right now it's a crazy mess of regular expression abuse combined with an XML parser that ends up being expensive in both time and space. Recoding it using either traditional or more modern parsing techniques would likely be a big win on multiple fronts; however, doing so would likely require making some small changes to the markup language. MediaWiki markup is definitely not in LL(n) for any n, and I think it's also not in LR(n) for any n; also, the parser currently requires database access, as the correct parsing of some constructs is dependent on database content. Making a few minor changes to the language "specification" (there really isn't one, just a reference implementation) would avoid both of these problems and make writing a proper parser much easier (that is, possible), but there is considerable reticence to making any change that would "break" Wikipedia.
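
To make the "parser requires database access" point concrete, here is a minimal sketch in Python. It is not MediaWiki's parser: the regex, the `render_links` function, and the `EXISTING_PAGES` set are all hypothetical, and the choice of link rendering as the example is an assumption about the kind of construct meant above. It only shows that the same markup can produce different output depending on what the page store contains.

```python
import re

# Hypothetical stand-in for the wiki's page table; a real parser would query the database.
EXISTING_PAGES = {"PHP", "MySQL"}

# Matches [[Target]] and [[Target|label]] (a deliberately simplified pattern).
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_links(wikitext, page_exists):
    """Replace wiki links with HTML anchors.

    The output depends on page_exists(), so identical markup renders
    differently depending on the content of the page store.
    """
    def repl(match):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Assumption for illustration: links to missing pages get a "new" class.
        css = "" if page_exists(target) else ' class="new"'
        return f'<a href="/wiki/{target.replace(" ", "_")}"{css}>{label}</a>'

    return LINK.sub(repl, wikitext)


html = render_links(
    "See [[PHP]] and [[HBase|the HBase store]].",
    EXISTING_PAGES.__contains__,
)
print(html)
```

Passing the lookup in as a function keeps the markup handling separate from the store, which is roughly the separation the proposed language changes would allow. Note that this sketch itself leans on a regular expression, so it illustrates the dependency, not the "proper parser".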
Cla68
post Wed 9th March 2011, 10:57pm
Post #48


Postmaster
*******

Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761




QUOTE(Somey @ Tue 8th March 2011, 7:24am) *


I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.


Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover that lets all the wiki-activists who are trying to save the world through Wikipedia carry on without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?
Zoloft
post Wed 9th March 2011, 11:20pm
Post #49


May we all find solace in our dreams.
******

Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621



QUOTE(Cla68 @ Wed 9th March 2011, 2:57pm) *
QUOTE(Somey @ Tue 8th March 2011, 7:24am) *
I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.
Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover which allows all the wiki-activists to continue trying to save the world using Wikipedia to continue their efforts without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?

Well, a lot of information about their two server clusters is at the WikiTech wiki.
One cluster is in Tampa, Florida and the other is in Amsterdam.

I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.

Edit:
Here's a Wikimedia Tech Blog entry about the failure of the automatic failover a year ago:
QUOTE
Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries.

However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.

We apologize for the inconvenience this has caused.
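
The "caching effects" in the quoted blog entry can be sketched with a toy resolver. This is a simulation only, written in Python: the `CachingResolver` class, the hostname, the addresses (taken from the reserved documentation range), and the one-hour TTL are all assumptions chosen to match the "up to an hour" figure, not details of Wikimedia's actual DNS setup.

```python
class CachingResolver:
    """Toy resolver that reuses a cached answer until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: hostname -> address
        self.ttl = ttl_seconds
        self.cache = {}  # hostname -> (address, expiry_time)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh: the authoritative zone is not consulted
        address = self.authoritative[name]
        self.cache[name] = (address, now + self.ttl)
        return address


# Hypothetical zone data; 192.0.2.0/24 is reserved for documentation.
zone = {"wiki.example": "192.0.2.1"}
resolver = CachingResolver(zone, ttl_seconds=3600)

before = resolver.resolve("wiki.example", now=0)     # answer cached until t=3600
zone["wiki.example"] = "192.0.2.2"                   # failover: repoint the record
during = resolver.resolve("wiki.example", now=1800)  # cache still holds the old answer
after = resolver.resolve("wiki.example", now=3600)   # cache expired, new answer fetched

print(before, during, after)
```

The same mechanism cuts both ways: a bad record published during a failover stays in downstream caches for up to the TTL even after the zone is corrected.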


Further Edit:
They are right now provisioning a new data center in Ashburn, Virginia although I'm not sure if that's Wikimedia or Wikipedia or both.

Hey! I found their Disaster Recovery Plan! *snicker heehee chortle gasp*

This post has been edited by Zoloft: Wed 9th March 2011, 11:42pm
thekohser
post Thu 10th March 2011, 1:21am
Post #50


Member
*********

Group: Regulars
Posts: 10,274
Joined: Thu 1st Feb 2007, 10:21pm
Member No.: 911



QUOTE(Zoloft @ Wed 9th March 2011, 6:20pm) *

Hey! I found their Disaster Recovery Plan! *snicker heehee chortle gasp*


Psst... the Disaster Recovery Plan was outlined in WikiVoices Episode #45. Find the audio tape of that episode, and you'll find the Plan.

fear.gif
Kelly Martin
post Thu 10th March 2011, 1:30am
Post #51


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm) *
One cluster is in Tampa, Florida and the other is in Amsterdam.

I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.
They cannot fail over from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that MediaWiki is remarkably slow and inefficient software.
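
A rough sketch of what single-master replication implies, as a toy Python model. Nothing here talks to MySQL; the `Node` and `Cluster` classes and the node names are hypothetical, and replication is modelled as an explicit, asynchronous copy step. It shows the two consequences described above: writes stop when the master is lost, and promoting a replica can silently drop edits that had not yet been replicated.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []      # ordered list of applied writes
        self.alive = True


class Cluster:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def write(self, row):
        # Only the single master accepts writes.
        if not self.master.alive:
            raise RuntimeError("no master: site is read-only")
        self.master.log.append(row)

    def replicate(self):
        # Asynchronous replication: replicas catch up only when this runs.
        if not self.master.alive:
            return
        for replica in self.replicas:
            if replica.alive:
                replica.log = list(self.master.log)

    def read(self):
        # Reads can be served by any live replica.
        for replica in self.replicas:
            if replica.alive:
                return list(replica.log)
        raise RuntimeError("no replica available")

    def promote(self):
        # Promote the live replica with the longest log to be the new master.
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("nothing to promote")
        new_master = max(candidates, key=lambda r: len(r.log))
        self.replicas.remove(new_master)
        self.master = new_master
        return new_master


cluster = Cluster(Node("tampa-master"), [Node("tampa-replica"), Node("ams-replica")])
cluster.write("edit 1")
cluster.replicate()
cluster.write("edit 2")       # accepted by the master but not yet replicated
cluster.master.alive = False  # the master is lost

try:
    cluster.write("edit 3")
    write_failed = False
except RuntimeError:
    write_failed = True       # editing is down, reading still works

reads = cluster.read()
new_master = cluster.promote()
cluster.write("edit 3")
print(write_failed, reads, new_master.name, new_master.log)
```

In this model "edit 2" is gone after the promotion, which is why promoting a replica is a recovery step and not a seamless failover.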

Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.
Cla68
post Thu 10th March 2011, 1:44am
Post #52


Postmaster
*******

Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761




QUOTE(Kelly Martin @ Thu 10th March 2011, 1:30am) *

QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm) *
One cluster is in Tampa, Florida and the other is in Amsterdam.

I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.
They cannot fail from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that Mediawiki is remarkably slow and inefficient software.

Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.


If I understand you right, it sounds like software limitations prevent them from having a seamless hot-site transfer ability if the Tampa location goes belly-up. Hopefully their new CTO, the subject of this thread, is aware of this and is using some of those millions of dollars in donations to find a solution. Trying to plan and implement a permanent solution on the fly in response to an unforeseen emergency probably isn't a very good idea.

Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?

This post has been edited by Cla68: Thu 10th March 2011, 1:51am
Zoloft
post Thu 10th March 2011, 2:22am
Post #53


May we all find solace in our dreams.
******

Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621



Although this is just trivia, I find it amusing that they have a server named 'sanger' but none named 'jimbo' or 'wales' - IT people have their own sense of history.
Kelly Martin
post Thu 10th March 2011, 4:34am
Post #54


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm) *
Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?
Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.

One outage, a few years ago, was caused by the nonredundant NFS mount point that contained the MediaWiki code being used by all the PHP servers going poof. It took them something like four hours to recover from that, and that was done by grabbing another box, installing the requisite components on it, and scrabbling about for copies of the relevant bits from wherever they could be found or reconstructed. Not by restoring a backup, as you'd expect to happen. A responsible operation would have (a) not had such a critical function being served nonredundantly and (b) had multiple forms of backups of the relevant servers in the event that all of them failed simultaneously (or some process caused all replicas to become corrupted). The Wikimedia server team has displayed significant cleverness in keeping Wikimedia running at all, but they are seriously lacking in methodology and discipline.

I think part of their problem is that very few of their people are experienced in operational management; they're mainly developers and the like, and thus they're not experienced in thinking about all the things us professional sysadmins think about all the time. There's also a culture of "getting by with as little as possible", which made sense in the early years, but they're flush with cash now and there's no reason to persist in running on a shoestring.
Gruntled
post Thu 10th March 2011, 6:05pm
Post #55


Quite an unusual member
***

Group: On Vacation
Posts: 222
Joined: Tue 2nd Feb 2010, 12:23pm
Member No.: 16,954



QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm) *
One cluster is in Tampa, Florida and the other is in Amsterdam.

If they have servers in Amsterdam, and if you view a page it may have come from there, why isn't Wikipedia subject to Dutch law on copyright and responsibility for the contents of these pages?
anthony
post Thu 10th March 2011, 6:13pm
Post #56


Postmaster
*******

Group: Regulars
Posts: 2,034
Joined: Mon 30th Jul 2007, 1:31am
Member No.: 2,132



QUOTE(Gruntled @ Thu 10th March 2011, 6:05pm) *

QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm) *
One cluster is in Tampa, Florida and the other is in Amsterdam.

If they have servers in Amsterdam, and if you view a page it may have come from there, why isn't Wikipedia subject to Dutch law on copyright and responsibility for the contents of these pages?


Why do you assume it isn't?
gomi
post Thu 10th March 2011, 6:36pm
Post #57


Member
********

Group: Members
Posts: 3,022
Joined: Fri 17th Nov 2006, 6:38pm
Member No.: 565



Does anyone want this disaster recovery/server config discussion split out from the CTO/Vibber talk?
Zoloft
post Thu 10th March 2011, 7:26pm
Post #58


May we all find solace in our dreams.
******

Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621



QUOTE(gomi @ Thu 10th March 2011, 10:36am) *

Does anyone want this disaster recovery/server config discussion split out from the CTO/Vibber talk?

That might be good.

Judging by the actual text of the Disaster Recovery Plan...
QUOTE
Disaster Recovery
On Brion's todo list


We might be working on the COOP/DRP more than the techs are... tongue.gif

Kelly Martin
post Thu 10th March 2011, 9:30pm
Post #59


Bring back the guttersnipes!
********

Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696



QUOTE(gomi @ Thu 10th March 2011, 12:36pm) *

Does anyone want this disaster recovery/server config discussion split out from the CTO/Vibber talk?
It's all part and parcel of the issue of how WMF (mis)manages technology.
Cla68
post Thu 10th March 2011, 11:08pm
Post #60


Postmaster
*******

Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761




QUOTE(Kelly Martin @ Thu 10th March 2011, 4:34am) *

QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm) *
Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?
Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.


You've got to be kidding me. Somehow, however, I'm not surprised.

To the moderator, perhaps we should split out the posts about the WMF's COOP/disaster recovery plan, or lack thereof. If I get a chance to ask some of the WMF leadership about this, I'd like to link to this conversation.