|
Open Source Diva named CTO |
|
|
|
|
Replies
Somey |
|
Can't actually moderate (or even post)
Group: Moderators
Posts: 11,816
Joined:
From: Dreamland
Member No.: 275
|
QUOTE(Alison @ Mon 7th March 2011, 10:04pm) QUOTE Maybe a little bit of both?
... or a little bit of neither? Let's be fair ... No, let's be unfair! I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR. That's not to say there wouldn't be some downside - in particular, the guilty-conscience thing could cause lack of sleep, what with knowing you were participating in a corrupt, socially-irresponsible fake-charity operation. But as it turns out, some people aren't really affected by that silly "conscience" stuff at all, so maybe Brion is one of the lucky ones.
|
|
|
|
Cla68 |
|
Postmaster
Group: Regulars
Posts: 1,763
Joined:
Member No.: 5,761
|
QUOTE(Somey @ Tue 8th March 2011, 7:24am)
I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.
Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover which allows all the wiki-activists to continue trying to save the world using Wikipedia to continue their efforts without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?
|
|
|
|
Zoloft |
|
May we all find solace in our dreams.
Group: Regulars
Posts: 1,332
Joined:
From: Erewhon
Member No.: 16,621
|
QUOTE(Cla68 @ Wed 9th March 2011, 2:57pm)
QUOTE(Somey @ Tue 8th March 2011, 7:24am)
I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.
Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover which allows all the wiki-activists to continue trying to save the world using Wikipedia to continue their efforts without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?

Well, a lot of information about their two server clusters is at the WikiTech wiki. One cluster is in Tampa, Florida and the other is in Amsterdam. I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.

Edit: Here's a Wikimedia Tech Blog entry about the failure of the automatic failover a year ago:

QUOTE
Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries.
However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.
We apologize for the inconvenience this has caused.

Further Edit: They are right now provisioning a new data center in Ashburn, Virginia although I'm not sure if that's Wikimedia or Wikipedia or both.

Hey! I found their Disaster Recovery Plan! *snicker heehee chortle gasp*
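For anyone wondering about the "caching effects" in that blog entry, here is a toy Python sketch of why a corrected DNS record can take a while to reach everyone. It is only an illustration: the `CachingResolver` class, the hostname, and the one-hour TTL are assumptions chosen to match the "up to an hour" remark, not Wikimedia's actual configuration.

```python
class CachingResolver:
    """Toy model of a resolver that caches answers for a fixed TTL."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: hostname -> address
        self.ttl = ttl_seconds
        self.cache = {}  # hostname -> (address, expiry_time)

    def resolve(self, host, now):
        cached = self.cache.get(host)
        if cached is not None and now < cached[1]:
            # Cached answer has not expired, so the authoritative
            # record is not consulted even if it has changed.
            return cached[0]
        address = self.authoritative[host]
        self.cache[host] = (address, now + self.ttl)
        return address


# Hypothetical records: traffic first points at Amsterdam, then is moved to Tampa.
records = {"wiki.example": "amsterdam"}
resolver = CachingResolver(records, ttl_seconds=3600)

resolver.resolve("wiki.example", now=0)      # caches the Amsterdam answer
records["wiki.example"] = "tampa"            # the failover changes the record
stale = resolver.resolve("wiki.example", now=1800)
fresh = resolver.resolve("wiki.example", now=3600)
```

In this model `stale` is still the Amsterdam answer half an hour after the change, and `fresh` only picks up Tampa once the cached entry expires, which is the same effect the blog entry apologizes for.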
|
|
|
|
Kelly Martin |
|
Bring back the guttersnipes!
Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696
|
QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm)
One cluster is in Tampa, Florida and the other is in Amsterdam.
I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.

They cannot fail from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that MediaWiki is remarkably slow and inefficient software.

Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.
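The "there can be only one" constraint can be shown with a toy Python model. This is a sketch under loud assumptions: the class and node names are made up, replication is modeled as an instant copy, and none of it reflects how MySQL or MediaWiki actually implement replication or promotion.

```python
class MasterUnavailable(Exception):
    """Raised when a write is attempted while the master is down."""


class Node:
    def __init__(self, name, site):
        self.name = name
        self.site = site
        self.up = True
        self.rows = []  # stand-in for the replicated data


class SingleMasterCluster:
    """Hypothetical cluster: one writable master, any number of read-only replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def write(self, row):
        # Writes are only ever accepted by the single master.
        if not self.master.up:
            raise MasterUnavailable("no master: the site is read-only")
        self.master.rows.append(row)
        for replica in self.replicas:
            if replica.up:
                replica.rows.append(row)  # simplified, instantaneous replication

    def read(self):
        # Reads can be served by any node that is still up.
        for node in [self.master, *self.replicas]:
            if node.up:
                return list(node.rows)
        raise RuntimeError("no database node is reachable")

    def promote(self, replica):
        # The manual step: turn a surviving replica into the new master.
        self.replicas.remove(replica)
        self.master = replica
```

With a Tampa master and an Amsterdam replica, marking the master as down leaves `read()` working while `write()` raises `MasterUnavailable` until someone calls `promote()` on the replica, which is the read-only window described above.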
|
|
|
|
Cla68 |
|
Postmaster
Group: Regulars
Posts: 1,763
Joined:
Member No.: 5,761
|
QUOTE(Kelly Martin @ Thu 10th March 2011, 1:30am)
QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm)
One cluster is in Tampa, Florida and the other is in Amsterdam.
I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.
They cannot fail from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that MediaWiki is remarkably slow and inefficient software. Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.

If I understand you right, it sounds like software limitations prevent them from having a seamless hot-site transfer ability if the Tampa location goes belly-up. Hopefully their new CTO, the subject of this thread, is aware of this and is using some of those millions of dollars in donations to find a solution. Trying to plan and implement a permanent solution on the fly in response to an unforeseen emergency probably isn't a very good idea. Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?
|
|
|
|
Kelly Martin |
|
Bring back the guttersnipes!
Group: Regulars
Posts: 3,270
Joined:
From: EN61bw
Member No.: 6,696
|
QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm)
Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?

Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.

One outage, a few years ago, was caused by the nonredundant NFS mount point that contained the MediaWiki code being used by all the PHP servers going poof. It took them something like four hours to recover from that, and that was done by grabbing another box, installing the requisite components on it, and scrabbling about for copies of the relevant bits from wherever they could be found or reconstructed. Not by restoring a backup, as you'd expect to happen. A responsible operation would have (a) not had such a critical function being served nonredundantly and (b) had multiple forms of backups of the relevant servers in the event that all of them failed simultaneously (or some process caused all replicas to become corrupted). The Wikimedia server team has displayed significant cleverness in keeping Wikimedia running at all, but they are seriously lacking in methodology and discipline.

I think part of their problem is that very few of their people are experienced in operational management; they're mainly developers and the like, and thus they're not experienced in thinking about all the things us professional sysadmins think about all the time. There's also a culture of "getting by with as little as possible", which made sense in the early years, but they're flush with cash now and there's no reason to persist in running on a shoestring.
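Points (a) and (b) above amount to a checklist that is easy to automate. A minimal Python sketch follows; the `Service` fields, the thresholds, and the example inventory are all hypothetical illustrations, not a description of Wikimedia's real systems.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    instances: int        # live copies currently serving the function
    offsite_backups: int  # independent backup copies outside the primary site


def audit(services):
    """Return (service name, finding) pairs for the two failure modes above."""
    findings = []
    for service in services:
        if service.instances < 2:
            # (a) a critical function served nonredundantly
            findings.append((service.name, "single point of failure"))
        if service.offsite_backups < 1:
            # (b) nothing to restore from if every live copy is lost or corrupted
            findings.append((service.name, "no offsite backup"))
    return findings


# Made-up inventory, loosely modeled on the kinds of components mentioned in the thread.
inventory = [
    Service("shared code mount", instances=1, offsite_backups=0),
    Service("text database", instances=2, offsite_backups=0),
    Service("web front ends", instances=100, offsite_backups=1),
]

report = audit(inventory)
```

The point of the design is that the check runs against an inventory before anything breaks, rather than being discovered during the scurrying.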
|
|
|
|
Cla68 |
|
Postmaster
Group: Regulars
Posts: 1,763
Joined:
Member No.: 5,761
|
QUOTE(Kelly Martin @ Thu 10th March 2011, 4:34am)
QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm)
Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?
Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.

You've got to be kidding me. Somehow, however, I'm not surprised.

To the moderator, perhaps we should split out the posts about the WMF's COOP/disaster recovery plan, or lack thereof. If I get a chance to ask some of the WMF leadership about this, I'd like to link to this conversation.
|
|
|
|