Incident Response

Session Staff Plus Engineering

The Ironies of A^2 I^2

Wednesday Nov 19 / 01:35PM PST

In this talk, we'll explore some of the "ironies" of automation, and now artificial intelligence, in their interactions with software operators (i.e., you), especially during high-consequence, high-tempo situations (aka incidents).

J. Paul Reed

Staff Incident Operations Manager @Chime

Session Incident Response

Week-Long Outage: Lifelong Lessons

Wednesday Nov 19 / 02:45PM PST

Routine database upgrades should be straightforward, especially with familiar, well-established technology. We were confident heading into our Elasticsearch upgrade, equipped with a solid plan and excited to see performance gains like those we had seen in past upgrades.

Molly Struve

Staff Site Reliability Engineer @Netflix

Session Incident Analysis

The Time it Wasn't DNS

Wednesday Nov 19 / 03:55PM PST

In January 2023, the Microsoft Azure Wide Area Network experienced a global outage. If you were a Microsoft customer at the time, you were impacted.

Sean Klein

Principal Technical Program Manager - Modern Incident Analysis @Microsoft Azure