Track: The Art of Chaos Engineering

Location: Ballroom BC

Day of week: Wednesday

Chaos Engineering is an emerging discipline, but the underlying concepts are not. Failure is going to happen - Are you ready? Put simply, Chaos Engineering is one approach to “breaking things on purpose” that teaches us new information about our systems through experimentation. By triggering incidents intentionally in a controlled way, we gain confidence that our systems can deal with those failures before they occur in production. Come learn from those just starting this journey as well as the experts pushing the state of the art. We will hear war stories from those putting out the fires in the middle of the night, as well as those starting the fires during the day! In the end we’ll learn how to build systems and organizations that improve in the face of failure.

Track Host:
Kolton Andrus
Founder of Gremlin Inc, former Netflix

Kolton is the founder of Gremlin, which helps companies build more robust services. He was a Chaos Engineer at Netflix, focused on the resilience of the Edge services. He designed and built FIT, Netflix’s failure injection service. Before that, he improved the performance and reliability of the Amazon Retail website. At both companies he has served as a ‘Call Leader’, managing the resolution of company-wide incidents. Kolton is passionate about building resilient systems, primarily because it lets him break things for fun and profit.

10:35am - 11:25am

by Willie Wheeler
Principal Application Engineer @Expedia

by Sahar Samiei
Senior Product Manager @Expedia

Those coming from product-driven organizations—where product features are often prioritized over resiliency-related concerns—will understand how challenging it can be to convince teams to do resiliency work. In this presentation we’ll share Expedia’s resiliency journey, starting with resiliency as an afterthought and progressing toward resiliency as a first-class concern. Attendees will learn about the importance of partnering with the teams experiencing operational struggles, and equipping...

11:50am - 12:40pm

by Nathan Äschbacher
Chief Technology Officer @PolySync

As the complexity and criticality of our software systems rapidly increase, our ability and available methodologies to ensure their determinism and correctness are often nascent or sometimes even non-existent. We see the effects of this paradox as we advance the role and responsibility of software in society. Often the evidence is observed in service outages, security breaches, financial market "flash crashes", and now the ever-shortening length of time between the development and...

1:40pm - 2:30pm

by Dave Hahn
Sr SRE, Reliability and Chaos Engineering @Netflix

Netflix is a strong believer in Chaos Engineering and the Velocity of Innovation. Most of the time, our customers never notice the former and appreciate the latter. Occasionally however…
Cannot connect to Netflix. You press play and it doesn't work. You can't log in. Nothing is on the screen and Stranger Things Season 2 just released!
A behind-the-scenes look at how Netflix engineering teams think about failure. The tools, techniques, and training we use...

2:55pm - 3:45pm

by Kolton Andrus
Founder of Gremlin Inc, former Netflix

by Willie Wheeler
Principal Application Engineer @Expedia

by Sahar Samiei
Senior Product Manager @Expedia

by Nathan Äschbacher
Chief Technology Officer @PolySync

by Dave Hahn
Sr SRE, Reliability and Chaos Engineering @Netflix

by Adrian Cockcroft
VP Cloud Architecture Strategy @AWSCloud & Microservices Pioneer

by Heather Nakama
Software Engineer @Microsoft - Azure Search

4:10pm - 5:00pm

by Adrian Cockcroft
VP Cloud Architecture Strategy @AWSCloud & Microservices Pioneer

Perfectly engineered resilient systems may be broken by confused operators when those systems behave differently in response to underlying failures. Highly available applications need to be resilient to failures in infrastructure, networks, applications, and operators. Chaos engineering is needed to exercise the incident handling mechanisms at every level, including people and processes. This talk will look at best practices and challenges in getting to a chaos...

5:25pm - 6:15pm

by Heather Nakama
Software Engineer @Microsoft - Azure Search

As the systems that support internet-scale services grow larger and ever more complex, chaos engineering has emerged as an industry best practice for ensuring system resiliency. Many companies maintain entire teams devoted to chaos testing their product. But what can you do if you don't have these kinds of resources to devote to the problem? How can you get started with chaos engineering without hiring an entire team of experts?

...


Tracks

  • Architectures You've Always Wondered About

    Architectural practices from the world's most well-known properties, featuring startups, massive scale, evolving architectures, and software tools used by nearly all of us.

  • Going Serverless

    Learn about the state of Serverless & how to successfully leverage it! Lessons learned in the track hit on security, scalability, IoT, and offer warnings to watch out for.

  • Microservices: Patterns and Practices

    Stories of success and failure building modern Microservices, including event sourcing, reactive, decomposition, & more.

  • DevOps: You Build It, You Run It

    Pushing DevOps beyond adoption into cultural change. Hear about designing resilience, managing alerting, CI/CD lessons, & security. Features lessons from open source, LinkedIn, Netflix, Financial Times, & more.

  • The Art of Chaos Engineering

    Failure is going to happen - Are you ready? Chaos engineering is an emerging discipline - What is the state of the art?

  • The Whole Engineer

    Success as an engineer is more than writing code. Hear inward looking thoughts on inclusion, attitude, leadership, remote working, and not becoming the brilliant jerk.

  • Evolving Java

    Java continues to evolve & change. Track covers Spring 5, async, Kotlin, serverless, the 6-month cadence plans, & AI/ML use cases.

  • Security: Attacking and Defending

    Offensive and defensive security evolution that application developers should know about, including SGX Enclaves, effects of AI, software exploitation techniques, & crowd defense.

  • The Practice & Frontiers of AI

    Learn about machine learning in practice and on the horizon. Learn about ML at Quora, Uber's Michelangelo, ML workflow with Netflix Meson, and topics on bots, conversational interfaces, automation, and deployment practices in the space.

  • 21st Century Languages

    Compile to Native, Microservices, Machine learning... tailor-made languages solving modern challenges, featuring use cases around Go, Rust, C#, and Elm.

  • Modern CS in the Real World

    Applied trends in Computer Science that are likely to affect Software Engineers today. Topics include category theory, crypto, CRDTs, logic-based automated reasoning, and more.

  • Stream Processing In The Modern Age

    Compelling applications of stream processing using Flink, Beam, Spark, Strymon & recent advances in the field, including Custom Windowing, Stateful Streaming, SQL over Streams.