Jason McHugh

Jason McHugh is a principal engineer at Amazon, where he works on S3, Amazon's Simple Storage Service. Since joining Amazon in 2004, Jason has been fortunate enough to work on many interesting and challenging problems. For the last three years, Jason has worked in Amazon Web Services, where he has enjoyed learning more about big distributed systems, data storage technologies, strong and weak consistency models, epidemic protocols, and more.

Prior to joining Amazon, Jason spent five years at a Bay Area startup (www.there.com) working on 3D graphics and distributed systems. Jason holds a BS in Computer Science from Boston College and an MS and PhD, with a focus on database systems, from Stanford.

Presentation: "Amazon S3: Architecting for Resiliency in the Face of Failures"

Time: Friday 16:15 - 17:15

Location: Metropolitan Ballroom

Abstract: Amazon's Simple Storage Service (S3) provides a web services interface to durably store and retrieve any amount of data. S3 servers are located on multiple continents and across many different data center facilities. From time to time these data centers or components within the data centers fail in unusual and spectacular ways. A rigorously designed and implemented system makes these failures largely unnoticeable to our users. This talk will focus on the approaches taken by S3 in architecting our systems to survive component, machine, and data center failures. We will begin by categorizing and classifying the different types of failures and evaluating which of these we are most concerned about. We will discuss some general design principles used by S3 to mitigate failures. We will conclude with a description of specific real-world failures and how the design of the system either mitigated the failure or compounded it.