When Wikipedia goes down

anthony
Postmaster
Group: Regulars
Posts: 2,034
Member No.: 2,132

Take a look at the server admin log (personally I'm subscribed to the RSS feed).
QUOTE 18:00 brion: things seem at least semi-working.
1. everything hung
2. suda had some kind of kernel crash
3. after reboot, it was found to have a couple flaky disks
4. brion hacked up MW config files to skip the NFS logging
5. mark set up an alternate /home NFS server
QUOTE 15:00 mark: Site down completely. Post-mortem:
1. Rob is untangling power cables in rack B2, and both asw-b2-pmtpa and asw3-pmtpa (in B4) lose power
2. Two racks unreachable, PyBal sees too many hosts down and won't depool more
3. Rob brings power to asw-b2-pmtpa back up, but connectivity loss to B4 is not noticed
4. Mark investigates why LVS isn't working, adjusts PyBal parameters, until PyBal pools not a single server
5. Apaches are unhappy about completely missing ES clusters
6. Connectivity loss to B4 discovered, restored
7. Site back online
There is an oversighted edit, though, that read:
QUOTE 14:45 godwin/gardner: Prepare downtime donation message and take a hammer to a few hard drives.
Just kidding.
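Step 2 of the post-mortem describes a safety valve: once too many hosts look dead, PyBal stops depooling, presumably so the load balancer is never left with too few backends. Below is a minimal sketch of that kind of threshold check, assuming a simple rule that at least a fraction `threshold` of all servers must stay pooled. `Server`, `can_depool`, `apply_health_checks` and the 0.5 default are hypothetical names chosen for illustration; this is not PyBal's actual code or configuration.

```python
"""Hypothetical sketch of a depool-threshold safety check (not PyBal's code)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool        # result of the latest (hypothetical) health check
    pooled: bool = True  # currently receiving traffic


def can_depool(servers: list[Server], threshold: float) -> bool:
    """Allow one more depool only if enough servers would remain pooled."""
    pooled_after = sum(s.pooled for s in servers) - 1
    return pooled_after >= threshold * len(servers)


def apply_health_checks(servers: list[Server], threshold: float = 0.5) -> list[str]:
    """Depool failing servers until the threshold stops us; return depooled names."""
    depooled = []
    for s in servers:
        if s.pooled and not s.healthy:
            if can_depool(servers, threshold):
                s.pooled = False
                depooled.append(s.name)
            # Otherwise the failing server stays pooled: the check prefers
            # serving errors from some backends over shrinking the pool further.
    return depooled


if __name__ == "__main__":
    # Two racks lost: 7 of 10 backends fail their health checks.
    fleet = [Server(f"srv{i}", healthy=(i >= 7)) for i in range(10)]
    print(apply_health_checks(fleet))  # ['srv0', 'srv1', 'srv2', 'srv3', 'srv4']
```

With the assumed 0.5 threshold, five of the seven dead servers are depooled and two stay in rotation. Lowering `threshold` to 0.0 lets every failing server be depooled, which is one way a pool can end up with no servers at all (compare step 4).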

lolwut
Photobucket staff are Marxists.
Group: Regulars
Posts: 571
Member No.: 6,235

QUOTE(GlassBeadGame @ Fri 20th February 2009, 5:21pm) Reminds me of the Woody Allen joke about the two old ladies at the restaurant. First lady says "The food here is so bad." Her friend replies "Yes, and the portions are so small."
Were you referring to my post there? I do get the joke and understand the analogy, but I actually kinda have more of a love/hate relationship with WP rather than just outright hate.
QUOTE(Eva Destruction @ Fri 20th February 2009, 6:06pm) To be boring for a minute: usually when you get this message it means someone's doing an action that's locked the system (either the devs updating the software, or something that puts a major strain on the server). Renaming an account with a lot of edits will do it, as will deleting or restoring pages with thousands of revisions. When Nawlinwiki was at his fearless-Grawp-hunter peak, you'd get this all the time as he deleted and selectively restored what felt like every single page.
He's stopped doing that now, probably for good. But he was going for massive overkill. It was pretty funny to watch while it lasted.

EricBarbour
blah
Group: Regulars
Posts: 5,919
Member No.: 5,066

Amazing how ignorant you guys are about your favorite "encyclopedia". Some of the servers are in Tampa, some are in Amsterdam, and some are in Seoul.
P.S. The Wikitech wiki hasn't been updated substantially in months or years, so there are probably other servers elsewhere. They use Squid. A lot. There's probably no practical way in hell to figure out where a chunk of data is in a system like this at any given moment.

tarantino
the Dude abides
Group: Regulars
Posts: 1,441
Member No.: 2,143

QUOTE(sbrown @ Thu 2nd July 2009, 10:08pm) QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) "European proxy caching cluster", apparently.
If those servers are in England, can someone sue the WMF for their contents under English law? heh
"Evoswitch helps us improve project access in Europe and beyond" (June 22nd, 2009): Evoswitch has recently allowed the WMF free access to caching servers at their leading, 100% carbon-neutral green data center in the Netherlands.
I don't think the WMF is using the servers in Seoul anymore.

emesee
ban me
Group: Tanked
Posts: 764
From: aww
Member No.: 8,586

Apathetic
Über Member
Group: Regulars
Posts: 594
Member No.: 7,383

QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) QUOTE(Apathetic @ Thu 30th July 2009, 1:10am) slow as hell right now
Are you Emesee in disguise?
no, why do you say that?
down again! =\
http://lists.wikimedia.org/pipermail/wikit...uly/044406.html
QUOTE Hello,
Due to a problem in one of our core routers in our Tampa cluster we need to perform some network maintenance tomorrow, Friday July 31st around 12:00 UTC. We will be performing a software upgrade and reboot of the router. This should not take more than a few minutes if everything goes well. Unfortunately this means that practically all sites and services will be down during that time.
For those interested: one of the line cards in the router failed earlier this week. A replacement has arrived, but does not boot up correctly after hot plugging. Because we want to upgrade the firmware anyway, we will reboot the entire box.
Cheers,
-- Mark Bergsma System & Network Administrator, Wikimedia Foundation

Kelly Martin
Bring back the guttersnipes!
Group: Regulars
Posts: 3,270
From: EN61bw
Member No.: 6,696

QUOTE(EricBarbour @ Thu 2nd July 2009, 4:55pm) You're starting to see the true horror of keeping a major multi-colocated database website up and going. It's like running LexisNexis, but on open-source software and with no money and no hot backup. I'm amazed they don't have major outages every day.
Indeed, it's amazing that they stay up as much as they do, given the number of single points of failure in their architecture. To be fair, many of them are forced by their choice to use MySQL as a backend database instead of a more robust product like, say, Oracle. The router downtime mentioned above should have been avoidable if they had a proper hot-spare environment. But that costs a bit more money, and spending money on that would cut down on the size of Jimmy's castle fund.
QUOTE(EricBarbour @ Thu 2nd July 2009, 5:34pm) They use Squid. A lot. There's probably no practical way in hell to figure out where a chunk of data is in a system like this at any given moment.
I think something like two-thirds of their servers are just Squid engines. MediaWiki is way too slow to generate all the pages that Wikimedia serves across all the projects in real time. It constantly amuses me to hear PHP culties go on about how well PHP scales, citing Wikipedia as proof, when the reality is that PHP scales like crap and the WMF has to throw tons of hardware at their half-assed content engine just to keep up.

sbrown
Senior Member
Group: Inactive
Posts: 441
Member No.: 11,840

QUOTE(Apathetic @ Fri 31st July 2009, 2:01pm) QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) Are you Emesee in disguise?
no, why do you say that?
I'm sure it's nothing personal. Accusing people of being each other is a popular sport here. :D

Apathetic
Über Member
Group: Regulars
Posts: 594
Member No.: 7,383

QUOTE(sbrown @ Sat 1st August 2009, 2:29am) QUOTE(Apathetic @ Fri 31st July 2009, 2:01pm) QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) Are you Emesee in disguise?
no, why do you say that?
I'm sure it's nothing personal. Accusing people of being each other is a popular sport here. :D
Yeah, I made a silly miscall a while back.