Open Source Diva named CTO
|
| anthony |
Tue 8th March 2011, 1:26pm
|
Postmaster
      
Group: Regulars
Posts: 2,034
Joined: Mon 30th Jul 2007, 1:31am
Member No.: 2,132

|
QUOTE(thekohser @ Tue 8th March 2011, 3:57am)  I wonder if the fact that Brion Vibber is being re-hired is any indication of: - Danese doesn't know enough about computers to do Brion's job
- Once you're a Wikimediot, you can't last long in the "outside" work world
Maybe a little bit of both?

The roles of CTO and Lead Architect are very, very different. It is extremely rare for someone to simultaneously be the best candidate for both positions. I think this position will be much better suited for Brion.

Of course, I couldn't help but catch this nugget: "Since I joined WMF in February 2010, I have been looking for a Lead Architect to work on the future of the platform (both for our use and for the thousands of wikis that run on our engine)." (Translation: "both for our use and for the use of Wikia")
|
| Kelly Martin |
Tue 8th March 2011, 6:10pm
|
Bring back the guttersnipes!
       
Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696

|
QUOTE(Zoloft @ Tue 8th March 2011, 10:48am)  Ultimately, if MediaWiki fails to move away from PHP, this by itself could be what kills Wikipedia.

Jimmy Wales, and by extension Wikimedia, is far too conservative to "gamble" on a platform change at this late stage of the game. In Wikipedia's ten years there has been almost no functional or technical change in the Wikipedia platform. Contrast Facebook, whose current platform bears absolutely no technical resemblance and fairly little functional resemblance to what it was like when it premiered.
|
| Zoloft |
Tue 8th March 2011, 6:52pm
|

May we all find solace in our dreams.
     
Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621

|
QUOTE(Kelly Martin @ Tue 8th March 2011, 10:10am)
QUOTE(Zoloft @ Tue 8th March 2011, 10:48am)  Ultimately, if MediaWiki fails to move away from PHP, this by itself could be what kills Wikipedia.
Jimmy Wales, and by extension Wikimedia, is far too conservative to "gamble" on a platform change at this late stage of the game. In Wikipedia's ten years there has been almost no functional or technical change in the Wikipedia platform. Contrast Facebook, whose current platform bears absolutely no technical resemblance and fairly little functional resemblance to what it was like when it premiered.

In agreeing with you, I will differ to this extent; the WMF has enough cash to parallel-develop MediaWiki 2.0 while maintaining the 1.x line, then bringing up servers with the newer, faster, more scalable version and importing the database and links. Properly managed, very low risk, really high upside.

I'm lazy enough to assume without even looking that no such development is actually planned.
|
| Somey |
Wed 9th March 2011, 4:38am
|

Can't actually moderate
        
Group: Moderators
Posts: 11,814
Joined: Sat 17th Jun 2006, 7:47pm
From: Dreamland
Member No.: 275

|
QUOTE(Zoloft @ Tue 8th March 2011, 12:52pm)  Properly managed, very low risk, really high upside.

Ehh, it's that "properly managed" part that seems to be the real sticking point... I myself am sort of a programmer... It's an interesting question, to me, anyway, what programming language/platform they'd use if they rebuilt MediaWiki from the ground up to take advantage of the "latest technology." I know J2EE isn't bad, but they could do better than that, couldn't they? And isn't it the database (MySQL) that causes the real bottlenecks, as opposed to PHP, or is it the fact that they use both, or that they use an interpreted server-side language in the first place?
|
| Kelly Martin |
Wed 9th March 2011, 1:00pm
|
Bring back the guttersnipes!
       
Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696

|
QUOTE(Somey @ Tue 8th March 2011, 10:38pm)  I know J2EE isn't bad, but they could do better than that, couldn't they? And isn't it the database (MySQL) that causes the real bottlenecks, as opposed to PHP, or is it the fact that they use both, or that they use an interpreted server-side language in the first place?
Given that MediaWiki doesn't have any real need for a fully ACID compliant database, there is no good reason not to use something like HBase (which is what Facebook uses) for the database.

The other big performance win would be to rewrite the parser; right now it's a crazy mess of regular expression abuse combined with an XML parser that ends up being expensive in both time and space. Recoding it using either traditional or more modern parsing techniques would likely be a big win on multiple fronts; however, doing so would likely require making some small changes to the markup language. MediaWiki markup is definitely not in LL(n) for any n, and I think it's also not in LR(n) for any n; also, the parser currently requires database access, as the correct parsing of some constructs is dependent on database content.

Making a few minor changes to the language "specification" (there really isn't one, just a reference implementation) would avoid both of these problems and make writing a proper parser much easier (that is, possible), but there is considerable reluctance to make any change that would "break" Wikipedia.
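
To make the "traditional parsing techniques" point concrete, here is a minimal sketch of a tokenizer plus recursive-descent parser. It is not MediaWiki's parser and does not handle real wiki markup: it assumes a made-up, strictly nested subset in which `'''` always opens or closes bold and `''` always opens or closes italic. The names `tokenize`, `parse`, `render` and `to_html` are hypothetical, and no HTML escaping is done.

```python
import re

# Hypothetical toy grammar (NOT real MediaWiki markup):
#   inline := (TEXT | "'''" inline "'''" | "''" inline "''")*
TOKEN_RE = re.compile(r"'''|''|[^']+|'")


def tokenize(text):
    """Split text into markers ("'''", "''"), lone apostrophes, and text runs."""
    return TOKEN_RE.findall(text)


def parse(tokens):
    """Return a tree: a list of strings and (tag, children) tuples."""
    pos = 0

    def parse_inline(stop):
        nonlocal pos
        nodes = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == stop:
                return nodes  # caller consumes the closing marker
            pos += 1
            if tok in ("'''", "''"):
                tag = "b" if tok == "'''" else "i"
                children = parse_inline(stop=tok)
                if pos >= len(tokens):
                    raise ValueError(f"unclosed {tok} marker")
                pos += 1  # consume the matching closing marker
                nodes.append((tag, children))
            else:
                nodes.append(tok)  # plain text or a lone apostrophe
        return nodes

    return parse_inline(stop=None)


def render(nodes):
    """Serialize the tree to HTML-like text (no escaping in this sketch)."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(node)
        else:
            tag, children = node
            parts.append(f"<{tag}>{render(children)}</{tag}>")
    return "".join(parts)


def to_html(text):
    return render(parse(tokenize(text)))


if __name__ == "__main__":
    print(to_html("plain '''bold ''both'' ''' end"))
```

The toy grammar only works because it rejects unbalanced or misnested markers outright, which is exactly the kind of small language change the post says would be needed; nothing here requires a database lookup to decide how to parse.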
|
| Cla68 |
Wed 9th March 2011, 10:57pm
|
Postmaster
      
Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761

|
QUOTE(Somey @ Tue 8th March 2011, 7:24am) 
I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.
Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover which allows all the wiki-activists to continue trying to save the world using Wikipedia to continue their efforts without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?
|
| Zoloft |
Wed 9th March 2011, 11:20pm
|

May we all find solace in our dreams.
     
Group: Regulars
Posts: 1,332
Joined: Fri 15th Jan 2010, 11:08pm
From: Erewhon
Member No.: 16,621

|
QUOTE(Cla68 @ Wed 9th March 2011, 2:57pm)
QUOTE(Somey @ Tue 8th March 2011, 7:24am)  I actually agree with Ms. Martin about why Brion Vibber would be willing to return, given that he'll be reporting to Danese Cooper and not Erik Moeller... but Mr. Kohs has actually made an interesting point here, as he often does. Sometimes the WMF management reminds me of kids let loose in a candy store (when they're not reminding me of Lord of the Flies, at least). They have lots of money, no appreciable oversight, no definable performance metrics other than just their ability to keep the websites online... it sounds like loads of fun for an IT person, and a helluva lot better for a dev/DBA than the average corporate IT shop, consulting firm, or software VAR.
Ms Cooper will have earned her salary the day the WMF's main server farm burns down or floods and the backup plan (COOP, or whatever you want to call it) kicks in with a flawless changeover which allows all the wiki-activists to continue trying to save the world using Wikipedia to continue their efforts without pause. Does anyone know where the WMF's main server farm is located and what their COOP plan dictates will happen if it does blow up?

Well, a lot of information about their two server clusters is at the WikiTech wiki. One cluster is in Tampa, Florida and the other is in Amsterdam. I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.

Edit: Here's a Wikimedia Tech Blog entry about the failure of the automatic failover a year ago:

QUOTE Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries.
However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.
We apologize for the inconvenience this has caused.

Further Edit: They are right now provisioning a new data center in Ashburn, Virginia although I'm not sure if that's Wikimedia or Wikipedia or both.

Hey! I found their Disaster Recovery Plan! *snicker heehee chortle gasp*

This post has been edited by Zoloft: Wed 9th March 2011, 11:42pm
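
The "caching effects" in the blog entry quoted above can be illustrated with a toy caching resolver. This is only a sketch under stated assumptions: the one-hour TTL is assumed because the entry says "up to an hour" (the real TTL is not given), the hostname and addresses are placeholders from reserved documentation ranges, and `CachingResolver` is a hypothetical class, not any real DNS library.

```python
class CachingResolver:
    """Toy resolver that caches answers until their TTL expires."""

    def __init__(self, zone, ttl_seconds):
        self.zone = zone          # stands in for the authoritative DNS data
        self.ttl = ttl_seconds    # assumed TTL; not taken from the source
        self.cache = {}           # name -> (address, expiry_time)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]       # still fresh: the zone is not consulted
        address = self.zone[name]
        self.cache[name] = (address, now + self.ttl)
        return address


if __name__ == "__main__":
    zone = {"wiki.example": "192.0.2.10"}            # placeholder record
    resolver = CachingResolver(zone, ttl_seconds=3600)

    resolver.resolve("wiki.example", now=0)          # answer gets cached
    zone["wiki.example"] = "198.51.100.20"           # operators fix the record

    for t in (60, 1800, 3600):
        print(t, resolver.resolve("wiki.example", now=t))
```

Under these assumptions, a resolver that cached an answer just before the fix keeps serving it until the TTL runs out, no matter how quickly the authoritative record was corrected.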
|
| Kelly Martin |
Thu 10th March 2011, 1:30am
|
Bring back the guttersnipes!
       
Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696

|
QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm)  One cluster is in Tampa, Florida and the other is in Amsterdam.
I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.

They cannot fail from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that MediaWiki is remarkably slow and inefficient software.

Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.
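
A toy model of the single-master arrangement described above, for anyone who wants to see why reads survive a master outage but edits do not. This is a sketch, not MySQL: `Node`, `SingleMasterCluster` and the node names are hypothetical, copies are called replicas rather than slaves, and replication is shown as immediate even though real MySQL replication is asynchronous.

```python
class Node:
    """One database server holding a copy of the data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = []


class SingleMasterCluster:
    """All writes go to one master; replicas only receive copies."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def write(self, row):
        if not self.master.up:
            raise RuntimeError("master is down: nobody can edit")
        self.master.rows.append(row)
        for replica in self.replicas:
            if replica.up:
                replica.rows.append(row)  # simplification: instant copy

    def read(self):
        # Reads can be served by any node that is still up.
        for node in [self.master, *self.replicas]:
            if node.up:
                return list(node.rows)
        raise RuntimeError("no database node available")

    def promote(self, replica):
        """Manually turn a replica into the new master."""
        self.replicas.remove(replica)
        self.master = replica


if __name__ == "__main__":
    tampa = Node("tampa-master")            # hypothetical names
    amsterdam = Node("amsterdam-replica")
    cluster = SingleMasterCluster(tampa, [amsterdam])

    cluster.write("edit 1")
    tampa.up = False                        # the master is lost
    print(cluster.read())                   # reads still work
    cluster.promote(amsterdam)              # manual step by an operator
    cluster.write("edit 2")                 # edits work again
    print(cluster.read())
```

In this model the promotion step is a deliberate manual action, and anything the old master accepted but had not yet copied to a replica would be lost, which the instant-copy simplification hides.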
|
| Cla68 |
Thu 10th March 2011, 1:44am
|
Postmaster
      
Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761

|
QUOTE(Kelly Martin @ Thu 10th March 2011, 1:30am)  QUOTE(Zoloft @ Wed 9th March 2011, 5:20pm)  One cluster is in Tampa, Florida and the other is in Amsterdam.
I don't know anything about their COOP plan, but I've seen them manually fail over from one cluster to the other recently when they had bandwidth issues.
They cannot fail from the Tampa cluster to the Amsterdam cluster, not completely: they are using single-master MySQL replication and thus "there can be only one". If they lose the master database server in Tampa, nobody can edit until they get it back (or make a new one: there is supposedly some way to make a slave into a master, although I don't imagine doing so is a pretty process). What "fails over" are the squids and the PHP front ends, of which there are hundreds to make up for the fact that MediaWiki is remarkably slow and inefficient software. Wikimedia does not have a meaningful disaster recovery plan; rather, when things break they scurry about like mad trying to figure out how to recover from whatever it was that broke. They're fairly good at scurrying, though, so they usually get back up within a fairly short time, and what data losses they've had (and there have been several instances of data loss, for various reasons) have not been serious enough to generate significant upset.

If I understand you right, it sounds like software limitations prevent them from having a seamless hot-site transfer ability if the Tampa location goes belly-up. Hopefully their new CTO, the subject of this thread, is aware of this and is using some of those millions of dollars in donations to find a solution. Trying to plan and implement a permanent solution on the fly in response to an unforeseen emergency probably isn't a very good idea.

Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?

This post has been edited by Cla68: Thu 10th March 2011, 1:51am
|
| Kelly Martin |
Thu 10th March 2011, 4:34am
|
Bring back the guttersnipes!
       
Group: Regulars
Posts: 3,270
Joined: Sun 22nd Jun 2008, 4:41am
From: EN61bw
Member No.: 6,696

|
QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm)  Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?

Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.

One outage, a few years ago, was caused by the nonredundant NFS mount point that contained the MediaWiki code being used by all the PHP servers going poof. It took them something like four hours to recover from that, and that was done by grabbing another box, installing the requisite components on it, and scrabbling about for copies of the relevant bits from wherever they could be found or reconstructed. Not by restoring a backup, as you'd expect to happen. A responsible operation would have (a) not had such a critical function being served nonredundantly and (b) had multiple forms of backups of the relevant servers in the event that all of them failed simultaneously (or some process caused all replicas to become corrupted). The Wikimedia server team has displayed significant cleverness in keeping Wikimedia running at all, but they are seriously lacking in methodology and discipline.

I think part of their problem is that very few of their people are experienced in operational management; they're mainly developers and the like, and thus they're not experienced in thinking about all the things us professional sysadmins think about all the time. There's also a culture of "getting by with as little as possible", which made sense in the early years but they're flush with cash now and there's no reason to persist in running on a shoestring.
|
| Cla68 |
Thu 10th March 2011, 11:08pm
|
Postmaster
      
Group: Regulars
Posts: 1,763
Joined: Fri 18th Apr 2008, 5:53pm
Member No.: 5,761

|
QUOTE(Kelly Martin @ Thu 10th March 2011, 4:34am)
QUOTE(Cla68 @ Wed 9th March 2011, 7:44pm)  Are they doing complete daily data dumps between the two sites so that if they lose Tampa they would only lose one day of data?
Last I heard there was no comprehensive backup solution whatsoever. For quite a long while there were no complete replicas of anything that weren't in the Tampa data center, but I think they do now have database slaves in Amsterdam from which, in theory, new database masters could be created. The media collection (that is, all the pictures and other nontext digital assets that live in the assorted File: namespaces) is, as far as I know, not replicated in any systematic way, and the loss of the Tampa data center would probably destroy 60% to 90% of the content in Commons. (We can only hope.) As far as I know, there is no systematic offsite backup of any aspect of the environment; they are completely and utterly vulnerable. It makes me twitch just thinking about it.

You've got to be kidding me. Somehow, however, I'm not surprised.

To the moderator, perhaps we should split out the posts about the WMF's COOP/disaster recovery plan, or lack thereof. If I get a chance to ask some of the WMF leadership about this, I'd like to link to this conversation.
|