DevOps, SRE, TechOps, System Administration and the rest share a common goal: stable, reliable deployment and operation of production applications. We’ll discuss how mechanization of SRE tasks, observability, consistency, and training are vital for speed and operational excellence in a scalable production environment. We’ll also discuss the history of the competencies of DevOps, as well as how we can work toward increasing the pipeline of new folks in the field.
Track: Practices of DevOps & Lean Thinking
Location: Ballroom BC
Day of week: Monday
Track Host: Jessica Kerr
Jessica Kerr (unique ID: @jessitron) develops development automation at Atomist. After a dozen years in Java, she branched out to Scala and Clojure and Ruby and Elm and more. Nowadays she works in TypeScript, on tools to help developers automate more of our own work -- DevOps is only the beginning! Jessica lives in St. Louis, MO with four humans and two felines. Her Pokémon Go trainer code is 5431 5140 6916.
10:35am - 11:25am
DevOps For The Database
Why is it hard to apply DevOps principles and practices to databases, and how can we get better at it? This talk explores real-life stories that answer these two questions, through the perspectives of teams that have changed the entrenched culture, processes, and tooling—and those who’ve tried. Along the way, we’ll cover topics including:
- What the research shows about DevOps, databases, and company performance
- Current and emerging trends in how we build and manage data tiers, and implications
- The traditional dedicated DBA role, and what has happened as a result
- What it takes to change from a DBA-centric culture, to one where database-related competencies and responsibilities are more distributed
- Why some teams succeed in this transformation, while others fail
We can apply DevOps principles to the database, and our work will be better for it. This talk will show you how.
11:50am - 12:40pm
Service Ownership @Slack
As recently as 2017, developers at Slack didn’t carry a pager. They deployed to production over a hundred times a day, and a centralized operations team took the calls in the night. Most pages were not very actionable because they weren’t set up by the dev teams that knew their systems best. Heroes and knowledge islands saved the day over and over. Postmortems were poorly attended and did not encourage learning.
Slowly, then quickly, all that changed. Slack moved to teams of empowered developers on-call, with embedded SREs, safer production deployments, and actionable alerts. Postmortems focus on learning, and meaningful analysis of incident patterns is done at all levels of the company.
In this talk you’ll hear all about the bumps and scrapes, triumphs and pitfalls of our journey from a centralized ops team to development teams that own the full lifecycle of their systems. It wasn’t easy, but it wasn’t impossible. Hopefully, it will inspire you to try something radically different at your company too.
1:40pm - 2:30pm
Whispers in the Chaos: Monitoring Weak Signals
The complexity of the socio-technical systems we engineer, operate, and exist within is staggering. Due to our daily interactions with and familiarity with our systems, the true gravity of this complexity can become easy to ignore. (And... let's face it, as a good coping strategy, too!) When those systems falter or fail, we often find in the postmortems and retrospectives afterward that there were "weak signals" that portended doom, but we didn't know they were there or how to sense them.
In this talk, we'll look at what the safety sciences have to say about humans operating complex socio-technical systems, including what aircraft carriers have to do with Internet infrastructure operations, how resilience engineering can help us, and the role heuristics play in incident response. All of these provide insight into ways we can improve one of the most advanced, and most effective, monitoring tools we have available to keep those systems running: ourselves.
2:55pm - 3:45pm
Full Cycle Developers @Netflix
The year was 2012 and operating a critical service at Netflix was laborious. Deployments were like walking through wet sand. Testing devolved into verifying endurance rather than correct functionality. Researching issues felt like bouncing a rubber ball between teams, hard to catch the root cause and harder yet to stop from bouncing between one another. All of these were signs that changes were needed.
Fast forward to 2018. Netflix has grown to over 130M global members enjoying stories from all over the world. Deployments happen daily rather than monthly. The reliability of our service continues to improve and middle-of-the-night pages are much less common. Our mission critical services are owned and operated by small teams of developers with no dedicated test teams and no dedicated operations teams. How did we make this transition?
This talk presents our journey from siloed teams to our Full Cycle Developer model for building and operating our services at Netflix. I will discuss the various approaches we’ve tried, the motivations that pushed us to keep evolving, and the lessons learned along the way. The audience will leave with an understanding of the Full Cycle Developer model, its pros and cons, and what’s required to make it work. I hope that sharing our experiences inspires others to debate the alternatives and learn from our journey.
4:10pm - 5:00pm
DevOps & Lean Thinking Panel
The panelists confront deep questions like "How do you DevOps right?" and "Is testing waste?" Find pointers about selecting incident commanders, DevOps under auditing constraints, and low-overhead deploy coordination.
Matt Stratton, DevOps Advocate @pagerduty
Bridget Kromhout, Principal Cloud Developer Advocate @Microsoft
J. Paul Reed, Build/Release Engineering, DevOps, and Human Factors Consultant
Greg Burrell, Sr SRE @Netflix, member of the Edge Developer Productivity Team
Holly Allen, Service Engineering @SlackHQ
5:25pm - 6:15pm
Day Two Kubernetes: Tools for Operability
Artisanally hand-crafting our own container hosting solutions can be a fun learning experience, but for repeatable production use, we want to deploy and manage Kubernetes clusters in a reproducible fashion. Using open source tools like Helm, Draft, Brigade, and Terraform, we can deploy and update our Kubernetes clusters via a trusted, versioned, repeatable process. We’ll discuss what containers and Kubernetes clusters are at a high level, look into the practical application of open source tools to simplify cluster management, and show you how to deploy Kubernetes clusters in a repeatable and portable fashion.
Tracks
Monday, 5 November
- Microservices / Serverless Patterns & Practices
  Evolving, observing, persisting, and building modern microservices
- Practices of DevOps & Lean Thinking
  Practical approaches using DevOps & Lean Thinking
- JavaScript & Web Tech
  Beyond JavaScript in the Browser. Exploring WebAssembly, Electron, & Modern Frameworks
- Modern CS in the Real World
  Thoughts pushing software forward, including consensus, CRDTs, formal methods, & probabilistic programming
- Modern Operating Systems
  Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualization, including Linux on Windows, LinuxKit, and Unikernels
- Optimizing You: Human Skills for Individuals
  Better teams start with a better self. Learn practical skills for IC
Tuesday, 6 November
- Architectures You've Always Wondered About
  Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more
- 21st Century Languages
  Lessons learned from languages like Rust, Go-lang, Swift, Kotlin, and more.
- Emerging Trends in Data Engineering
  Showcasing DataEng tech and highlighting the strengths of each in real-world applications.
- Bare Knuckle Performance
  Killing latency and getting the most out of your hardware
- Socially Conscious Software
  Building socially responsible software that protects users' privacy & safety
- Delivering on the Promise of Containers
  Runtime containers, libraries, and services that power microservices
Wednesday, 7 November
- Applied AI & Machine Learning
  Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more
- Production Readiness: Building Resilient Systems
  More than just building software, building deployable production-ready software
- Developer Experience: Level up your Engineering Effectiveness
  Improving the end-to-end developer experience - design, dev, test, deploy, operate/understand.
- Security: Lessons Attacking & Defending
  Security from the defender's AND the attacker's point of view
- Future of Human Computer Interaction
  IoT, voice, mobile: Interfaces pushing the boundary of what we consider to be the interface
- Enterprise Languages
  Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track