Track: The Art of Chaos Engineering

Location: Ballroom BC

Day of week: Wednesday

Chaos Engineering is an emerging discipline, but the underlying concepts are not. Failure is going to happen. Are you ready? Put simply, Chaos Engineering is one approach to “breaking things on purpose” that teaches us new information about our systems through experimentation. By triggering incidents intentionally in a controlled way, we gain confidence that our systems can deal with those failures before they occur in production. Come learn from those just starting this journey as well as the experts pushing the state of the art. We will hear war stories from those putting out the fires in the middle of the night, as well as those starting the fires during the day! In the end, we’ll learn how to build systems and organizations that improve in the face of failure.

Track Host:
Kolton Andrus
Founder of Gremlin Inc, former Netflix

Kolton is the founder of Gremlin Inc, which helps companies build more robust services. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services. He designed and built FIT, Netflix’s failure injection service. Prior to that, he improved the performance and reliability of the Amazon Retail website. At both companies he has served as a ‘Call Leader’, managing the resolution of company-wide incidents. Kolton is passionate about building resilient systems, primarily because it lets him break things for fun and profit.

10:35am - 11:25am

by Adrian Cockcroft
VP Cloud Architecture Strategy @AWSCloud & Microservices Pioneer

Perfectly engineered resilient systems may be broken by confused operators when those systems behave differently in response to underlying failures. Highly available applications need to be resilient to failures in infrastructure, networks, applications and operators. Chaos engineering is needed to exercise the incident handling mechanisms at every level, including people and processes. This talk will look at best practices and challenges in getting to a chaos...

11:50am - 12:40pm

Abstract Coming Soon

1:40pm - 2:30pm

by Dave Hahn
Sr SRE, Reliability and Chaos Engineering @Netflix

Netflix is a strong believer in Chaos Engineering and the Velocity of Innovation. Most of the time, our customers never notice the former and appreciate the latter. Occasionally, however…

Cannot connect to Netflix. You press play and it doesn't work. You can't log in. Nothing is on the screen and Stranger Things Season 2 was just released!

A behind-the-scenes look at how Netflix engineering teams think about failure. The tools,...

2:55pm - 3:45pm

Abstract Coming Soon

5:25pm - 6:15pm

Abstract Coming Soon