Track: Evolving DevOps


DevOps, SRE, TechOps, System Administration, and the rest share a common goal: stable, reliable deployment and operation of production applications. We’ll discuss how mechanization of SRE tasks, observability, consistency, and training are vital for speed and operational excellence in a scalable production environment. We’ll also cover the history of DevOps competencies and how we can work toward increasing the pipeline of new folks in the field.

Track Host:
Lisa Phillips
VP, Site Reliability Engineering @Fastly
Lisa Phillips is a leader in reliability, with a particular interest in social media and speeding up content delivery. She has worked for 20 years in tech and database operational roles for large sites including LiveJournal, Six Apart, and Twitter, where she helped kill the fail whale. Lisa is returning from a year of world travelling and is happy to have landed at Fastly as Vice President of Site Reliability Engineering.

10:35am - 11:25am

by Lisa Phillips
VP, Site Reliability Engineering @Fastly

As a content delivery network, Fastly operates a large internetwork and a global application environment. Fastly developed its Incident Command protocol, which it uses to deal with large-scale events. Lisa will cover in detail the typical struggles a company of Fastly’s size runs into when building around-the-clock incident operations, and the things Fastly has put in place to make dealing with incidents easier and more effective. She will also cover common mistakes and lessons learned as Fastly...

11:50am - 12:40pm

by Cory Watson
Observability Specialist @Stripe

It's common to hear that an organization needs more observability, but what does that mean?

How do you change the culture of a company so that these needs are addressed sooner rather than later? I've got some ideas, and I've been trying them out at Stripe. Let's review how it's gone and talk about what worked and what didn't.

Let's talk about people, their needs and how to make them — and your observability — awesome.

1:40pm - 2:30pm

Open Space

2:55pm - 3:45pm

by Sayli Karmarkar
Senior Software Engineer, Diagnostics and Remediation Engineering (DaRE) @Netflix

Netflix is a collection of microservices that all come together to enable the product you love. Operations for these microservices are distributed across the owning teams and their engineers. Ever wondered how we manage to achieve high availability and reliability without a central operations team managing all of these individual services? We believe that engineers who know their service inside out are the best people to manage its operations as well. So instead of...

4:10pm - 5:00pm

by Pedro Canahuati
VP, Production Engineering & Site Reliability @Facebook

Development/Software orgs typically focus on building and shipping new features. Sometimes this happens at the expense of efficiency or stability. Operations orgs are typically built to ensure services run smoothly 24x7 and to do so at the lowest possible cost. Sometimes this means the teams’ incentives aren't quite aligned, and the situation can lead to an us-versus-them dynamic.

Facebook’s solution for this problem lies in the Production Engineering (PE) team. PE embeds...

5:25pm - 6:15pm

by Franziska Bell
Data Science Manager @Uber

The Observability team at Uber focuses on providing intelligent real-time outage detection and root cause exploration at scale. This encompasses multiple building blocks: (i) a proprietary, scalable back-end store for application telemetry data that can service more than 500 million time series in real time, (ii) a user-friendly and robust query language and UI for setting up alert configurations, (iii) the development of novel time series and machine learning models for fully automated,...



Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9