MediaWiki Software: When Wikipedia goes down

Posted by: thekohser

English Wikipedia (and Meta, and French Wikipedia, and Wikiversity, but strangely not Wikiquote) seems to have a server down right now. Anybody else notice this? Additionally, I notice that on the "error" page the Foundation has also craftily mentioned:

The Wikimedia Foundation is a non-profit organisation which hosts some of the most popular sites on the Internet, including Wikipedia. It has a constant need to purchase new hardware. If you would like to help, please donate.

I wonder if they might actually be planning downtime deliberately, in order to drive up contributions?

Posted by: michael

They're doing a poor job of planning it if they are...in my three years, I can only count about ten times it has gone down.

Posted by: luke

seems to be working again for the moment

Posted by: gomi

QUOTE(michael @ Mon 22nd September 2008, 10:38am) *

They're doing a poor job of planning it if they are...in my three years, I can only count about ten times it has gone down.

You must not be on very much. I see a transitory outage every week or two, with some more significant outage every few months.


Posted by: anthony

Take a look at the server admin log at https://wikitech.leuksman.com/view/Server_admin_log (personally I'm subscribed to the RSS feed).

QUOTE

18:00 brion: things seem at least semi-working.

1. everything hung
2. suda had some kind of kernel crash
3. after reboot, it was found to have a couple flaky disks
4. brion hacked up MW config files to skip the NFS logging
5. mark set up an alternate /home NFS server


QUOTE

15:00 mark: Site down completely. Post-mortem:

1. Rob is untangling power cables in rack B2, and both asw-b2-pmtpa and asw3-pmtpa (in B4) lose power
2. Two racks unreachable, PyBal sees too many hosts down and won't depool more
3. Rob brings power to asw-b2-pmtpa back up, but connectivity loss to B4 is not noticed
4. Mark investigates why LVS isn't working, adjusts PyBal parameters, until PyBal pools not a single server
5. Apaches are unhappy about completely missing ES clusters
6. Connectivity loss to B4 discovered, restored
7. Site back online
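
Step 2 of that post-mortem describes a load balancer that refuses to depool more hosts once too many are already down. A minimal Python sketch of such a limit follows. This is not PyBal's actual code: the class, the method names and the 0.5 default are all hypothetical, chosen only to illustrate the idea.

```python
class Pool:
    """Toy load-balancer pool with a depool limit (all names are illustrative)."""

    def __init__(self, servers, min_pooled_fraction=0.5):
        self.servers = set(servers)
        self.pooled = set(servers)
        # Assumed policy: never let the pooled set drop below this
        # fraction of all configured servers.
        self.min_pooled_fraction = min_pooled_fraction

    def report_down(self, host):
        """Depool a failed host unless that would leave too few pooled.

        Returns True if the host was depooled, False if the request was
        refused or the host was not pooled to begin with.
        """
        if host not in self.pooled:
            return False
        remaining = len(self.pooled) - 1
        if remaining < self.min_pooled_fraction * len(self.servers):
            return False  # too many hosts already down: keep it pooled
        self.pooled.remove(host)
        return True


pool = Pool(["srv1", "srv2", "srv3", "srv4"])
results = [pool.report_down(h) for h in ("srv1", "srv2", "srv3")]
print(results, sorted(pool.pooled))
```

With four servers and a limit of one half, the third failure report is refused, so a dead host stays in rotation. In this toy model, setting the limit to zero would allow every host to be depooled, which is one way a pool could end up with not a single server, as in step 4.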


There is an oversighted edit though, that read:

QUOTE

14:45 godwin/gardner: Prepare downtime donation message and take a hammer to a few hard drives.


Just kidding.

Posted by: michael

QUOTE(gomi @ Mon 22nd September 2008, 11:57am) *

You must not be on very much. I see a transitory outage every week or two, with some more significant outage every few months.


Really? It's hard for me to recall the last time I ever saw the "servers are down" message for an extended period of time, that's why I said that it must have only been about ten times. I have about 17,000 edits over two Wikipedias, so...

Posted by: cyofee

I see glitches and errors once every few weeks, but they go away after a refresh or two.

Posted by: wikiwhistle

QUOTE(michael @ Mon 22nd September 2008, 6:38pm) *

They're doing a poor job of planning it if they are...in my three years, I can only count about ten times it has gone down.


Really? I quite often get the message that they're 'experiencing technical problems' or whatever come up. It usually only lasts a very short time indeed tho. So it must be a matter of luck whether you're trying to edit/view at that time.

Posted by: LaraLove

When NawlinWiki was doing all his Grawp deletions the other month, the server was being locked down constantly. Only for a couple minutes at a time, but he was causing it repeatedly. I jump on Wikipedia sporadically, usually, and it seemed like every time I tried to make an edit, I was getting an error message. That's a pain in the ass.

Posted by: GlassBeadGame

QUOTE(anthony @ Mon 22nd September 2008, 1:01pm) *



There is an oversighted edit though, that read:

QUOTE

14:45 godwin/gardner: Prepare downtime donation message and take a hammer to a few hard drives.


Just kidding.


Well played. I was completely drawn into the idea that a log item was removed and didn't realize the gag until I had finished reading. Had a nice chuckle.

Posted by: Sylar

The English Wikipedia is extremely slow right now.

Edit: And, it's now down.

Posted by: lolwut

Pages are not loading for me at the moment.

Posted by: GlassBeadGame

Reminds me of the Woody Allen joke of the two old ladies at the restaurant. First lady says "The food here is so bad." Her friend replies "Yes and the portions are so small."

Posted by: Eva Destruction

To be boring for a minute – usually when you get this message it means someone's doing an action that's locked the system (either the devs updating the software, or something which puts a major strain on the server). Renaming an account with a lot of edits will do it, as will deletion/restoration of pages with thousands of revisions – when Nawlinwiki was at his fearless-grawp-hunter peak, you'd get this all the time as he deleted and selective-restored what felt like every single page.

Posted by: lolwut

QUOTE(GlassBeadGame @ Fri 20th February 2009, 5:21pm) *

Reminds me of the Woody Allen joke of the two old ladies at the restaurant. First lady says "The food here is so bad." Her friend replies "Yes and the portions are so small."

Were you referring to my post there? I do get the joke and understand the analogy, but I actually kinda have more of a love/hate relationship with WP rather than just outright hate.

QUOTE(Eva Destruction @ Fri 20th February 2009, 6:06pm) *

To be boring for a minute – usually when you get this message it means someone's doing an action that's locked the system (either the devs updating the software, or something which puts a major strain on the server). Renaming an account with a lot of edits will do it, as will deletion/restoration of pages with thousands of revisions – when Nawlinwiki was at his fearless-grawp-hunter peak, you'd get this all the time as he deleted and selective-restored what felt like every single page.

He's stopped doing that now, probably for good. But he was going for massive overkill. It was pretty funny to watch while it lasted.

Posted by: MZMcBride

QUOTE(Eva Destruction @ Fri 20th February 2009, 2:06pm) *

To be boring for a minute – usually when you get this message it means someone's doing an action that's locked the system (either the devs updating the software, or something which puts a major strain on the server). Renaming an account with a lot of edits will do it, as will deletion/restoration of pages with thousands of revisions – when Nawlinwiki was at his fearless-grawp-hunter peak, you'd get this all the time as he deleted and selective-restored what felt like every single page.


There are protections in place now that put large renames in the job queue and prevent deletions of pages with over 5,000 revisions.

Nowadays, when the site is down, it's nearly always related to a server issue of some sort, not something caused by an admin.
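
The two protections described above can be sketched in Python. This is only a sketch: the 5,000-revision limit is the figure given in the post, while the function names, the rename threshold and the queue are hypothetical and are not MediaWiki's actual code.

```python
from collections import deque

DELETE_REVISION_LIMIT = 5000   # figure quoted in the post
RENAME_INLINE_LIMIT = 1000     # hypothetical cutoff, for illustration only

job_queue = deque()            # stand-in for a background job queue


def delete_page(title, revision_count):
    """Refuse to delete pages with more revisions than the limit."""
    if revision_count > DELETE_REVISION_LIMIT:
        raise PermissionError(
            f"{title} has {revision_count} revisions; deletion refused"
        )
    return f"deleted {title}"


def rename_user(old_name, new_name, edit_count):
    """Rename small accounts immediately; defer large ones to the queue."""
    if edit_count > RENAME_INLINE_LIMIT:
        job_queue.append(("rename", old_name, new_name))
        return "queued"
    return "renamed"
```

The point of both checks is the same: an expensive operation either gets rejected outright or is moved off the request path, so a single admin action can no longer lock the database for everyone else.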

Posted by: lolwut

Wikipedia is slow as fuck tonight.

Posted by: Apathetic

QUOTE(lolwut @ Thu 2nd July 2009, 5:07pm) *

Wikipedia is slow as fuck tonight.

indeed !

Posted by: LaraLove

QUOTE(lolwut @ Thu 2nd July 2009, 5:07pm) *

Wikipedia is slow as fuck tonight.

Servers in the UK are down due to a power outage, so someone said in IRC. I thought all the servers were in the US.

Posted by: Eva Destruction

QUOTE(LaraLove @ Thu 2nd July 2009, 10:28pm) *

QUOTE(lolwut @ Thu 2nd July 2009, 5:07pm) *

Wikipedia is slow as fuck tonight.

Servers in the UK are down due to a power outage, so someone said in IRC. I thought all the servers were in the US.

"European proxy caching cluster" (http://techblog.wikimedia.org/2009/07/power-outage-in-wikimedias-european-servers/), apparently.

Posted by: Malleus

QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) *

"European proxy caching cluster", apparently.

No problem then. Who the hell cares about those damned Europeans anyway?

BTW, where is Europe? Anywhere near Idaho?

Posted by: A Horse With No Name

QUOTE(Malleus @ Thu 2nd July 2009, 5:38pm) *

QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) *

"European proxy caching cluster", apparently.

No problem then. Who the hell cares about those damned Europeans anyway?

BTW, where is Europe? Anywhere near Idaho?


Idaho is a fine place, Malley Baby. We should set up a Wikipedia Meet-up at Coeur d'Alene -- a great place to hang out and watch the bikini babes hit the lake. evilgrin.gif

Posted by: Eva Destruction

If any Europeans are really desperate, https://secure.wikimedia.org/wikipedia/en/wiki/ still works fine.

Posted by: LaraLove

QUOTE(Eva Destruction @ Thu 2nd July 2009, 5:48pm) *

If any Europeans are really desperate, https://secure.wikimedia.org/wikipedia/en/wiki/ still works fine.

I'm apparently a European now. Is Europe anywhere near North Carolina? Wikipedia isn't working for me at all, so I'm having to use the secure server.

Actually, it just came up. But it was down for me for about an hour. First the Foundation's "We're sorry our shit's not working, give us money plz" message and then the standard time out page.

Posted by: EricBarbour

You're starting to see the true horror of keeping a major multi-colocated database website up and going. It's like running LexisNexis, but on open-source software and with no money and no hot backup. I'm amazed they don't have major outages every day.

Posted by: sbrown

QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) *

"European proxy caching cluster", apparently.

If those servers are in England can someone sue WMF for their contents under English law?

Posted by: Eva Destruction

QUOTE(sbrown @ Thu 2nd July 2009, 11:08pm) *

QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) *

"European proxy caching cluster", apparently.

If those servers are in England can someone sue WMF for their contents under English law?

They're not in England (the clue's in the "European") – they're in Holland. Good luck trying to convince the Dutch of the evils of pornography.

Posted by: EricBarbour

Amazing how ignorant you guys are about your favorite "encyclopedia".

Some of the servers are in http://wikitech.wikimedia.org/view/Tampa_cluster, some are in http://wikitech.wikimedia.org/view/Kennisnet_cluster, and some are in http://wikitech.wikimedia.org/view/Yahoo_cluster.

PS: the wikitech wiki (http://wikitech.wikimedia.org/) hasn't been updated substantially in months/years, so there are probably other servers elsewhere.

They use Squid (http://en.wikipedia.org/wiki/Squid_(software)). A lot. There's probably no practical way in hell to figure out where a chunk of data is in a system like this at any given moment.

Posted by: tarantino

QUOTE(sbrown @ Thu 2nd July 2009, 10:08pm) *

QUOTE(Eva Destruction @ Thu 2nd July 2009, 10:32pm) *

"European proxy caching cluster", apparently.

If those servers are in England can someone sue WMF for their contents under English law?

heh
http://blog.wikimedia.org/2009/06/22/evoswitch-helps-us-improve-project-access-in-europe-and-beyond/
June 22nd, 2009
Evoswitch has recently allowed the WMF free access to caching servers at their leading, 100% carbon neutral green data center in the Netherlands.

I don't think the WMF are using the servers in Seoul anymore.

Posted by: sbrown

QUOTE(Eva Destruction @ Thu 2nd July 2009, 11:24pm) *

They're not in England (the clue's in the "European") – they're in Holland. Good luck trying to convince the Dutch of the evils of pornography.

I was thinking more of libel and holding WMF responsible for what's on there (no sec 230).

Posted by: Eva Destruction

QUOTE(sbrown @ Fri 3rd July 2009, 12:20pm) *

QUOTE(Eva Destruction @ Thu 2nd July 2009, 11:24pm) *

They're not in England (the clue's in the "European") – they're in Holland. Good luck trying to convince the Dutch of the evils of pornography.

I was thinking more of libel and holding WMF responsible for what's on there (no sec 230).

US based company, so will be immune from overseas libel judgements under the Free Speech Protection Act (assuming it passes) unless the content would also be libellous in the US, so we're back where we started.

Posted by: CharlotteWebb

QUOTE(Eva Destruction @ Thu 2nd July 2009, 11:24pm) *

They're not in England (the clue's in the "European") – they're in Holland. Good luck trying to convince the Dutch of the evils of pornography.

And as far as copyright infringement goes, the People's Republic of Laos might be a good choice.

Posted by: emesee

heckers McHeckers, ya; Epic Lulz. confused.gif confused.gif rolleyes.gif rolleyes.gif smile.gif smile.gif wub.gif

confused.gif confused.gif evilgrin.gif smile.gif

you like my montage???????? wub.gif confused.gif

Posted by: Apathetic

slow as hell right now

Posted by: Nerd

QUOTE(Apathetic @ Thu 30th July 2009, 1:10am) *

slow as hell right now


Are you Emesee in disguise?

Posted by: Apathetic

QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) *

QUOTE(Apathetic @ Thu 30th July 2009, 1:10am) *

slow as hell right now


Are you Emesee in disguise?


no, why do you say that?

down again! =\

http://lists.wikimedia.org/pipermail/wikitech-l/2009-July/044406.html
QUOTE

Hello,

Due to a problem in one of our core routers in our Tampa cluster we need
to perform some network maintenance tomorrow, Friday July 31st around
12:00 UTC. We will be performing a software upgrade and reboot of the
router. This should not take more than a few minutes if everything goes
well. Unfortunately this means that practically all sites and services
will be down during that time.

For those interested: one of the line cards in the router failed earlier
this week. A replacement has arrived, but does not boot up correctly
after hot plugging. Because we want to upgrade the firmware anyway, we
will reboot the entire box.

Cheers,

--
Mark Bergsma
System & Network Administrator, Wikimedia Foundation

Posted by: Kelly Martin

QUOTE(EricBarbour @ Thu 2nd July 2009, 4:55pm) *

You're starting to see the true horror of keeping a major multi-colocated database website up and going. It's like running LexisNexis, but on open-source software and with no money and no hot backup. I'm amazed they don't have major outages every day.

Indeed, it's amazing that they stay up as much as they do, given the number of single points of failure there are in their architecture. To be fair, many of them are forced by their choice to use MySQL as a backend database, instead of a more robust product like, say, Oracle.

The router downtime that was mentioned above should have been avoidable if they had a proper hot-spare environment. But that costs a bit more money, and spending money on that would cut down on the size of Jimmy's castle fund.

QUOTE(EricBarbour @ Thu 2nd July 2009, 5:34pm) *

They use Squid (http://en.wikipedia.org/wiki/Squid_(software)). A lot. There's probably no practical way in hell to figure out where a chunk of data is in a system like this at any given moment.

I think something like two-thirds of their servers are just Squid engines. MediaWiki is way too slow to generate all the pages that Wikimedia serves across all the projects in real time.

It constantly amuses me to hear PHP culties go on about how well PHP scales, citing Wikipedia as proof, when the reality is that PHP scales like crap and the WMF has to throw tons of hardware at their half-assed content engine just to keep up.
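
For anyone wondering why a caching layer in front of a slow page generator helps so much, here is a minimal Python sketch of the idea. It is an in-memory toy, not how Squid or MediaWiki actually work; the class and method names are made up for illustration.

```python
class CachingFrontend:
    """Serve rendered pages from memory; call the slow renderer only on a miss."""

    def __init__(self, render):
        self._render = render   # the slow page generator (any callable)
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, title):
        """Return the page for title, rendering it only if not cached."""
        if title in self._cache:
            self.hits += 1
            return self._cache[title]
        self.misses += 1
        page = self._render(title)
        self._cache[title] = page
        return page

    def purge(self, title):
        """Drop a cached page, e.g. after an edit, so it is rendered again."""
        self._cache.pop(title, None)
```

Since most requests are reads of pages that have not changed, almost all of them can be answered from the cache and the expensive renderer only runs after a purge. The flip side is that when the cache layer is unavailable, all of that traffic lands on the backend at once.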

Posted by: sbrown

QUOTE(Apathetic @ Fri 31st July 2009, 2:01pm) *

QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) *

Are you Emesee in disguise?

no, why do you say that?

I'm sure it's nothing personal. Accusing people of being each other is a popular sport here. biggrin.gif

Posted by: Apathetic

QUOTE(sbrown @ Sat 1st August 2009, 2:29am) *

QUOTE(Apathetic @ Fri 31st July 2009, 2:01pm) *

QUOTE(Nerd @ Wed 29th July 2009, 8:13pm) *

Are you Emesee in disguise?

no, why do you say that?

I'm sure it's nothing personal. Accusing people of being each other is a popular sport here. biggrin.gif

yea, I made a silly mis-call a while back.

Posted by: Apathetic

nvm

Posted by: SDJ

QUOTE(Apathetic @ Tue 22nd September 2009, 5:38pm) *

nvm

Yes. It's very strange, and something that's never happened to me before.

Posted by: Apathetic

QUOTE(SDJ @ Tue 22nd September 2009, 5:39pm) *

QUOTE(Apathetic @ Tue 22nd September 2009, 5:38pm) *

nvm

Yes. It's very strange, and something that's never happened to me before.


did you get the blank white page too?