Amazon S3 Failure Delivers Sanity Check

Sometime just before 12 p.m. EDT, the Amazon Simple Storage Service (“S3” for short) began to fail, causing major disruptions for applications that rely on the service for cheap and virtually unlimited storage. Amazon’s string of status updates throughout the day barely masks the complete-meltdown nature of the incident. Five hours later the service was restored.

For the uninitiated, S3 is a central part of Amazon’s family of service-based solutions and has been regarded as one of the most useful and reliable of the bunch. In short, it lets developers offload storage of large or frequently requested files to a trusted third party without having to worry about costly bandwidth, hardware, and system administrators, and at far lower cost than content delivery networks (“CDNs”) like Akamai. While Amazon charges based on usage, stories abound of companies saving hundreds or thousands of dollars a month by shifting storage to S3.

Analysts have largely responded positively to the service, which originally grew out of Amazon’s internal infrastructure efforts and has come to symbolize the company as a major player in the emerging software-as-a-service landscape of so-called “utility computing.”

What’s noteworthy about this event is twofold: 1) the outage lasted over five hours, an eternity for such a service, and 2) the world barely noticed. Users of Twitter, one of the highest-profile services affected by the outage, probably responded with a shrug given its recent performance problems. Ironically, this time the culprit was one of the supposedly most reliable components of Twitter’s architecture rather than the much-maligned Ruby on Rails framework.

Most developers I’ve spoken with who have experimented with the service have been understandably reluctant to introduce S3 into a production architecture. If nothing else, this incident demonstrates that utility computing services like S3 are not nearly ready to be used in mission-critical applications and are best employed to supplement traditional storage solutions.
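What “supplement, don’t replace” might look like in practice can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recipe from Amazon or any particular team: local storage (the RAID array) stays the system of record, the remote service is used to offload reads, and reads fall back to local storage when the remote store fails. Every name here (`BlobStore`, `InMemoryStore`, `SupplementedStore`, `StoreUnavailable`) is hypothetical, and in-memory stand-ins replace both the disk and S3 so that no real SDK calls are implied.

```python
from typing import Dict, List, Protocol


class StoreUnavailable(Exception):
    """Raised by a store that cannot currently serve requests (hypothetical)."""


class BlobStore(Protocol):
    """Minimal key/value blob interface assumed for this sketch."""

    def get(self, key: str) -> bytes: ...

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in for either local disk or a remote service like S3."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}
        self.available = True  # flip to False to simulate an outage

    def get(self, key: str) -> bytes:
        if not self.available:
            raise StoreUnavailable(key)
        return self._blobs[key]

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise StoreUnavailable(key)
        self._blobs[key] = data


class SupplementedStore:
    """Local storage is authoritative; the remote store only offloads reads."""

    def __init__(self, local: BlobStore, remote: BlobStore) -> None:
        self.local = local
        self.remote = remote
        self.pending: List[str] = []  # keys not yet copied to the remote store

    def put(self, key: str, data: bytes) -> None:
        # Write locally first so the data survives a remote outage.
        self.local.put(key, data)
        try:
            self.remote.put(key, data)
        except StoreUnavailable:
            self.pending.append(key)  # retry later via resync()

    def get(self, key: str) -> bytes:
        # Prefer the remote copy to save local bandwidth; fall back on failure.
        try:
            return self.remote.get(key)
        except (StoreUnavailable, KeyError):
            return self.local.get(key)

    def resync(self) -> int:
        """Copy pending keys to the remote store; return how many succeeded."""
        synced = 0
        still_pending: List[str] = []
        for key in self.pending:
            try:
                self.remote.put(key, self.local.get(key))
                synced += 1
            except StoreUnavailable:
                still_pending.append(key)
        self.pending = still_pending
        return synced
```

During a simulated outage (`remote.available = False`), `get` keeps serving files from local storage and new writes queue up in `pending`; once the remote store recovers, `resync()` copies them over. The design choice is simply that an outage of the remote service degrades cost and bandwidth, not availability.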

The lesson is clear: Keep those RAID arrays humming. For now.
