Presentation: "Amazon S3: Architecting for Resiliency in the Face of Massive Load"

Time: Wednesday 16:50 - 17:50

Location: Franciscan I & II

Abstract: Amazon's Simple Storage Service (S3) provides a key-value-oriented interface to an infinitely scalable and durable storage system. One of the design requirements for S3 was to handle wildly varying access patterns initiated by our users. Maintaining availability when experiencing unexpected, unusual, and often massive load is a difficult problem. Nevertheless, a rigorously designed and implemented system will not experience widespread unavailability in these situations. This session will focus on the approaches taken by S3 in architecting its systems to survive unexpected and overwhelming load. The session begins with a generic description of a common service architecture and then discusses the potential failure points within the design. It describes the software and technology that all components in a service-oriented architecture must have as basic building blocks. Then, for each layer in the system, it presents architectures that mitigate the availability impact of overload scenarios. It concludes with a last-resort mechanism that all services must use to handle any remaining scenarios.

Jason McHugh, Senior Principal Engineer at Amazon


Jason McHugh is a senior principal engineer at Amazon, where he works on S3, Amazon's Simple Storage Service. Since joining Amazon in 2004, Jason has been fortunate enough to work on many interesting and challenging problems. For the last three years, Jason has worked in Amazon Web Services, where he has enjoyed learning more about large distributed systems, data storage technologies, strong and weak consistency models, epidemic protocols, and more.


Prior to joining Amazon, Jason spent five years at a Bay Area startup working on 3D graphics and distributed systems. Jason holds a BS degree in Computer Science from Boston College and an MS and PhD with a focus on database systems from Stanford.