Track: Practices of DevOps & Lean Thinking

Location: Ballroom BC

Day of week: Monday

DevOps, SRE, TechOps, system administration, and the rest share a common goal: stable, reliable deployment and operation of production applications. We’ll discuss how mechanization of SRE tasks, observability, consistency, and training are vital for speed and operational excellence in a scalable production environment. We’ll also discuss the history of the competencies of DevOps and how we can work toward increasing the pipeline of new folks in the field.

Track Host: Jessica Kerr

Polyglot Functional Developer @atomist

Jessica Kerr (unique ID: @jessitron) develops development automation at Atomist. After a dozen years in Java, she branched out to Scala and Clojure and Ruby and Elm and more. Nowadays she works in TypeScript, on tools to help developers automate more of our own work -- DevOps is only the beginning! Jessica lives in St. Louis, MO with four humans and two felines. Her Pokémon Go trainer code is 5431 5140 6916.

CASE STUDY TALK (50 MIN)

10:35am - 11:25am

DevOps For The Database

Why is it hard to apply DevOps principles and practices to databases, and how can we get better at it? This talk explores real-life stories that answer these two questions, through the perspectives of teams that have changed the entrenched culture, processes, and tooling—and those who’ve tried. Along the way, we’ll cover topics including:

  • What the research shows about DevOps, databases, and company performance
  • Current and emerging trends in how we build and manage data tiers, and implications
  • The traditional dedicated DBA role, and what has happened as a result
  • What it takes to change from a DBA-centric culture, to one where database-related competencies and responsibilities are more distributed
  • Why some teams succeed in this transformation, while others fail

We can apply DevOps principles to the database, and our work will be better for it. This talk will show you how.

Baron Schwartz, CTO @VividCortex

CASE STUDY TALK (50 MIN)

11:50am - 12:40pm

Service Ownership @Slack

As recently as 2017, developers at Slack didn’t carry a pager. They deployed to production over a hundred times a day, and a centralized operations team took the calls in the night. Most pages were not very actionable because they weren’t set up by the dev teams that knew their systems best. Heroes and knowledge islands saved the day over and over. Postmortems were poorly attended and did not encourage learning.

Slowly, then quickly, all that changed. Slack moved to teams of empowered developers on-call, with embedded SREs, safer production deployments, and actionable alerts. Postmortems focus on learning, and meaningful analysis of incident patterns is done at all levels of the company.     

In this talk you’ll hear all about the bumps and scrapes, triumphs and pitfalls of our journey from a centralized ops team to development teams that own the full lifecycle of their systems. It wasn’t easy, but it wasn’t impossible. Hopefully, it will inspire you to try something radically different at your company too.

Holly Allen, Service Engineering @SlackHQ

CASE STUDY TALK (50 MIN)

1:40pm - 2:30pm

Whispers in the Chaos: Monitoring Weak Signals

The complexity of the socio-technical systems we engineer, operate, and exist within is staggering. Because we interact with our systems daily and know them well, the true gravity of this complexity becomes easy to ignore. (And, let's face it, ignoring it is a good coping strategy, too!) When those systems falter or fail, we often find in the postmortems and retrospectives afterward that there were "weak signals" that portended doom, but we didn't know they were there or how to sense them.

In this talk, we'll look at what the safety sciences have to say about humans operating complex socio-technical systems, including what aircraft carriers have to do with Internet infrastructure operations, how resilience engineering can help us, and the role heuristics play in incident response. All of these provide insight into ways we can improve one of the most advanced, and most effective, monitoring tools we have available to keep those systems running: ourselves.

J. Paul Reed, Build/Release Engineering, DevOps, and Human Factors Consultant

CASE STUDY TALK (50 MIN)

2:55pm - 3:45pm

Full Cycle Developers @Netflix

The year was 2012 and operating a critical service at Netflix was laborious. Deployments were like walking through wet sand. Testing devolved into verifying endurance rather than correct functionality. Researching issues felt like bouncing a rubber ball between teams: it was hard to catch the root cause, and harder yet to stop the bouncing. All of these were signs that changes were needed.

Fast forward to 2018. Netflix has grown to over 130M global members enjoying stories from all over the world. Deployments happen daily rather than monthly. The reliability of our service continues to improve and middle-of-the-night pages are much less common. Our mission-critical services are owned and operated by small teams of developers with no dedicated test teams and no dedicated operations teams. How did we make this transition?

This talk presents our journey from siloed teams to our Full Cycle Developer model for building and operating our services at Netflix.  I will discuss the various approaches we’ve tried, the motivations that pushed us to keep evolving, and the lessons learned along the way.  The audience will leave with an understanding of the Full Cycle Developer model, its pros and cons, and what’s required to make it work.  I hope that sharing our experiences inspires others to debate the alternatives and learn from our journey.

Greg Burrell, Sr SRE @Netflix, member of the Edge Developer Productivity Team

CASE STUDY TALK (50 MIN)

4:10pm - 5:00pm

DevOps & Lean Thinking Panel

The panelists confront deep questions like "How do you DevOps right?" and "Is testing waste?" Find pointers about selecting incident commanders, DevOps under auditing constraints, and low-overhead deploy coordination.

Jessica Kerr, Polyglot Functional Developer @atomist
Matt Stratton, DevOps Advocate @pagerduty
Bridget Kromhout, Principal Cloud Developer Advocate @Microsoft
J. Paul Reed, Build/Release Engineering, DevOps, and Human Factors Consultant
Greg Burrell, Sr SRE @Netflix, member of the Edge Developer Productivity Team
Holly Allen, Service Engineering @SlackHQ

CASE STUDY TALK (50 MIN)

5:25pm - 6:15pm

Day Two Kubernetes: Tools for Operability

Artisanally hand-crafting our own container hosting solutions can be a fun learning experience, but for repeatable production use, we want to deploy and manage Kubernetes clusters in a reproducible fashion. Using open source tools like Helm, Draft, Brigade, and Terraform, we can deploy and update our Kubernetes clusters via a trusted, versioned, repeatable process. We’ll discuss what containers and Kubernetes clusters are at a high level, look into the practical application of open source tools to simplify cluster management, and show you how to deploy Kubernetes clusters in a repeatable and portable fashion.

Bridget Kromhout, Principal Cloud Developer Advocate @Microsoft

Tracks

Monday, 5 November

Tuesday, 6 November

Wednesday, 7 November
