
Why Facebook Fell Apart

Facebook went dark for more than two and a half hours yesterday. It was the biggest service outage in more than four years. Like buttons disappeared from all over the web. And Gene Weingarten rejoiced.

Before engineers in Palo Alto could finally fix what ailed the world's biggest social network they actually had to shut the whole thing down and reboot. So what went wrong?

Robert Johnson, Facebook's director of software engineering, explained it all here.
Charles Arthur at the Guardian did a nice job decoding it. And for the truly slothful I've attempted to create this plain English crib sheet.

With more than 500 million members uploading pictures, status updates and other blather every day, Facebook has enormous stores of data it needs to back up and cache. And like any big data storage operation, it can't simply rely on one set of servers to do the job.

So Facebook built a network of servers around what Johnson calls a "persistent store." Think of it like the hub on a bicycle wheel - with spokes connecting it to Facebook's other servers. When something goes wrong on a satellite server out there on the rim of the wheel, that server is programmed to check in with the hub for a fix.

But yesterday, Facebook's engineers inserted a bad piece of code into the heart of the machine. It was picked up by all the servers around the rim of the wheel - and when the code didn't work, the servers followed instructions and queried the hub asking for a fix...again and again and again.

This whole mess accelerated until the wheels came off, Facebook crashed, and the engineers at Palo Alto had to put the axle back together again and reboot.
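For the curious, here is a toy sketch of that loop in Python. It is not Facebook's code; the names (Hub, is_valid, simulate) and the "BAD"/"GOOD" values are made up for illustration. It only shows why the requests never stop while the hub itself holds the bad value: every "fix" a server fetches is just as broken as the copy it replaced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hub:
    """Hypothetical stand-in for the central 'persistent store'."""
    value: str
    queries: int = 0

    def fetch(self) -> str:
        # Every request for a fix counts as one query against the hub.
        self.queries += 1
        return self.value


def is_valid(value: str) -> bool:
    # Assumption for this sketch: the broken value is literally "BAD".
    return value != "BAD"


def run_round(hub: Hub, caches: List[str]) -> None:
    """Each server holding an invalid copy asks the hub for a replacement."""
    for i, cached in enumerate(caches):
        if not is_valid(cached):
            caches[i] = hub.fetch()


def simulate(num_servers: int, rounds: int,
             fix_after: Optional[int] = None) -> List[int]:
    """Return the cumulative number of hub queries after each round.

    If fix_after is given, the hub's value is repaired at the start of
    that round (0-based); otherwise it stays broken throughout.
    """
    hub = Hub(value="BAD")
    caches = ["BAD"] * num_servers
    history = []
    for r in range(rounds):
        if fix_after is not None and r == fix_after:
            hub.value = "GOOD"
        run_round(hub, caches)
        history.append(hub.queries)
    return history


if __name__ == "__main__":
    print(simulate(3, 4))               # hub never repaired
    print(simulate(3, 4, fix_after=2))  # hub repaired at round 2
```

With the hub left broken, the query count climbs every round without end. Once the hub's value is repaired, each server fetches a good copy once and the requests stop.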

About the author

Steve Henn was Marketplace’s technology and innovation reporter for the entire portfolio of Marketplace programs until December 2011.
jumpingkokanee - Sep 27, 2010

It would be nice if the FRONT of the faceBOOK, or the top of my page, wall, profile, etc., had
a nice, contrasting, easy-to-see NOTICE saying:

"We are experiencing technical difficulties don't waste your time continuing to be frustrated...you are just wasting your time and our ability to solve the problem....come back later and log on after this notice disappears"

Patrick Provart - Sep 26, 2010

This particular "Non-critical social network site" is also the way my local school district has chosen to send mass communication to concerned parents, and the local utility uses it in a similar fashion.

I do use Facebook mostly for fun -- but not everything it does is frivolous. . .

Steve Henn - Sep 27, 2010

I'm fascinated that your school system and your power company are using Facebook to reach out. Do they offer other alternatives? There are lots of people out there with excellent reasons not to join Facebook, for example anyone who has escaped an abusive relationship. So I'd love to hear more.

But I have to admit, when I first talked to my producer about how we should cover this story, I was in Sarosh and Eben's camp. Really, how important is it? Does it matter much that folks were deprived of status updates for a few hours?

But I was also curious about what happened. I wanted to know a bit more about how this giant machine, aka Facebook, worked and why it fell apart. Ultimately I think that kind of curiosity drives what we are trying to do.

So covering this story on the blog made sense to me. Folks who were just as curious could get a clue - if they wanted to dig deeper there were links to click.

For everyone else, I recommend Gene Weingarten's hysterical column in the Washington Post. In it he describes Facebook as an "ocean of banalities shared among persons with lives so empty they echo."

Sarosh Sepai - Sep 24, 2010

But again, this was just a "non-critical" social networking website where millions of people spend time on worthless, sedentary, non-productive gossip.....not the NYPD's security system that would affect millions of lives....pun intended.

Eben - Sep 24, 2010

Ok. So? A big machine broke down and had to be repaired. Was anyone's life at risk? Did anyone lose their homes, or their jobs, because of it? I expect more from Marketplace; I expect at all times that issues will be kept in perspective.

Deborah - Sep 24, 2010

It's more than just a big machine and it impacts millions of people. I think it is a valid story.