Track: Distributed Systems War Stories



Distributed systems are notoriously difficult to build. Asynchrony and partial failure make our job as implementers challenging. But even after we’ve developed and deployed our applications, unforeseen scenarios may occur and challenge all of our assumptions. In this track we’ll share stories of surviving the unexpected, with an emphasis on analysis and knowledge sharing.

Track Host:
Ines Sombra
Engineer @Fastly
Ines Sombra is an Engineer at @Fastly, where she spends her time helping the Web go faster. Ines holds an M.S. in Computology with an emphasis on Cheesy 80’s Rock Ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life she was a Data Engineer.

10:35am - 11:25am

by Kiran Bhattaram
Infrastructure/Developer @Stripe

“It was a dark and stormy night; the rain fell in torrents — except at occasional intervals, when it was checked by a violent gust of wind which swept up the streets (for it is in London that our scene lies), rattling along the housetops, and fiercely agitating the scanty flame of the lamps that struggled against the darkness.”

This sentence exhibits so many writing antipatterns that it's inspired an entire literary competition for terrible opening sentences. It's complicated...

11:50am - 12:40pm

by Jordan West
Data Infrastructure Engineer @Apple

As distributed systems builders, it is in our best interest to understand the theory underpinning our work and to learn from the research in our field. For many years, academic papers have provided the practitioner with an avenue to accomplish these goals, offering solutions to hard problems and defining the limits of what is possible. But implementing the system or algorithm described in a paper, or applying those theoretical limits to an implementation, is not a straightforward process. It...

1:40pm - 2:30pm

by Tyler McMullen
CTO @Fastly

Load balancing is something most of us assume is a solved problem. But the idea that load balancing is "solved" could not be further from the truth. If you use multiple load balancers, the problem is even worse. Most of us use "random" or "round-robin" techniques, which have certain advantages but are highly inefficient. Others use more complex algorithms like "least-conns," which can be more efficient but have horrific edge cases. "Consistent hashing" is a very useful technique, but only...

2:55pm - 3:45pm

Open Space
4:10pm - 5:00pm

by Jeff Hodges
CEO @DarkishGreen

Let's Encrypt is the first non-profit, open source, free, automated certificate authority, and it issued over 10 million HTTPS certificates in its first year. This talk is a rundown of the tradeoffs and designs that Let's Encrypt shipped with and how the social systems around it impacted those decisions.

5:25pm - 6:15pm

by Haley Tucker
Senior Software Engineer, Playback Features @Netflix

Imagine a world where you do everything within your power to ensure the code you are pushing into production is as ready as possible to take traffic. You have thorough test coverage, you push out canaries, and you use push windows. You have truly operated your microservice in a top-notch way. And then all of a sudden...CPU spikes, GC churns, latency increases, you start spewing errors...enter sad Netflix customers.

This talk will explore ways in which behaviors of other systems result...



Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9