You are viewing content from a past/completed QCon

Track: Production Readiness: Building Resilient Systems

Location: Ballroom BC

Day of week: Wednesday

A production readiness review is used by software companies to determine whether the design and implementation of a system are ready to be released to customers. The process identifies and addresses the reliability of a service, whether its privacy and security needs are sufficiently covered, and how easy it is to operate. This track explores which aspects of software need to be prepared before it takes on full production load with customers' data. Topics include observability, emergency response, capacity planning, release processes, and SLOs for availability and latency.

Track Host: Michelle Brush

Engineering Manager, SRE @Google

Michelle Brush is a math geek turned computer geek with over 15 years of software development experience. She has developed algorithms and data structures for pathfinding, search, compression, and data mining in embedded as well as distributed systems. In her current role as an SRE Manager for Google, she leads the team of SREs that ensures GCE's APIs are reliable. Previously, she served as the Director of HealtheIntent Architecture for Cerner Corporation, responsible for the data processing platform for Cerner’s Population Health solutions. Prior to her time at Cerner, she was the lead engineer for Garmin's automotive routing algorithm.

10:35am - 11:25am

Monitoring and Tracing @Netflix Streaming Data Infrastructure

Netflix's streaming data infrastructure transports trillions of events per day and supports hundreds of stream processing jobs. The team behind it is small and there is no separate operations team. To efficiently manage and operate this huge infrastructure and reduce operational burden for everyone, we developed a set of tools that enable automated operations and mitigations. Our Kafka monitoring tools provide comprehensive signals and great insights into the health of our Kafka brokers and consumers, from which we derived ways to automate error handling that improve the stability of brokers and stream processing jobs. For data streams that have high consistency requirements, instead of relying purely on aggregated counts that may be misleading, we trace individual events along their transport path. Enabled by stream processing with minimal resources, tracing provides insight into end-to-end data loss, duplicates, and latency in near real time and with high accuracy. These results helped us to further improve our service quality and validate design trade-offs.

The talk will give the design and implementation details of these dev/ops tools and highlight the critical roles they play in operating our data infrastructure. It will showcase how active and targeted tool development for operational use can quickly pay off with improved product quality and overall agility.

Allen Wang, Architect & Engineer in Real Time Data Infrastructure Team @Netflix
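
As a rough illustration of the per-event tracing idea in the abstract above (a minimal sketch, not Netflix's implementation), the following compares the events a producer sent with the events observed at the end of the pipeline and reports loss, duplicates, and latency. The function name `trace_summary`, the event IDs, and the timestamps are all hypothetical.

```python
from collections import Counter


def trace_summary(sent, received):
    """Summarize end-to-end delivery for traced events.

    sent: dict mapping event_id -> send timestamp in seconds (hypothetical input).
    received: list of (event_id, receive timestamp) pairs seen at the sink.
    """
    # How many times each event ID showed up at the sink.
    counts = Counter(event_id for event_id, _ in received)

    # Sent but never observed downstream.
    lost = sorted(set(sent) - set(counts))

    # Observed more than once downstream.
    duplicates = sorted(eid for eid, n in counts.items() if n > 1)

    # Latency is measured to the first occurrence in arrival order.
    first_seen = {}
    for eid, ts in received:
        if eid in sent and eid not in first_seen:
            first_seen[eid] = ts
    latencies = {eid: ts - sent[eid] for eid, ts in first_seen.items()}

    return {"lost": lost, "duplicates": duplicates, "latencies": latencies}


# Made-up sample data: "c" is lost, "b" is delivered twice.
sent = {"a": 0.0, "b": 1.0, "c": 2.0}
received = [("a", 0.5), ("b", 1.25), ("b", 1.5)]
summary = trace_summary(sent, received)
print(summary)
```

Unlike an aggregated count, which here would show three events sent and three received, the per-event comparison reveals that one event was lost and another duplicated.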

11:50am - 12:40pm

Observability in the Development Process: Not Just for Ops Anymore

Monitoring has historically been considered an afterthought of the software development cycle: something owned by the ops side of the room. But instead of trying to predict the various ways something might go sideways right before release, what might it look like to learn about our production systems in order to figure out what to build, how to build it, and for whom?

Observability is all about asking new questions of your systems -- and is something that should be built into the process of crafting software from the very beginning. In this talk, we'll explore what it looks like in practice, so that production stops being just where our development code runs into issues: it becomes where part of our development process lives.

Christine Yen, Cofounder @honeycombio

1:40pm - 2:30pm

Building Confidence in Healthcare Systems Through Chaos Engineering

Healthcare demands resilient software. Healthcare systems are resistant to change, as change can be viewed as a threat to system availability. To scale and modernize these systems, software engineers have to build confidence in how they can continually introduce change.

This talk will cover how Cerner evolved their service workloads and applied gameday exercises to improve their resiliency. It will focus on how they transitioned their Java services from traditional enterprise application servers to a container deployment on Kubernetes using Spinnaker. It will share how they standardized their service deployment to have consistent instrumentation to get deep insight into the overall behavior of their system. It will explain strategies for how they applied traffic management approaches to safely introduce chaos engineering experiments, improving their overall understanding of the system.

Carl Chesser, Principal Engineer @Cerner

2:55pm - 3:45pm

How to Invest in Technical Infrastructure

Deciding what to work on is always difficult and is especially treacherous for folks working as infrastructure engineers and leaders. Will Larson unpacks the process of picking and prioritizing technical infrastructure work, which is essential to long-term company success but discussed infrequently. Will shares Stripe's approaches to prioritizing infrastructure as your company scales, justifying (and maybe even expanding) your company's spend on technical infrastructure, exploring the whole range of possible areas of infrastructure investment, adapting your approach between periods of firefighting and periods of innovation, and balancing investment between supporting existing products and enabling new product development.

Will Larson, Foundation Engineering @Stripe

4:10pm - 5:00pm

Stop Talking & Listen; Practices for Creating Effective Customer SLOs

In this data-driven age we are constantly collecting and analyzing monumental quantities of data. We want to know everything about our product: how our customers use it, how long they use it, and, more importantly, whether the product is even working. With all this data, we should be able to answer all of these questions. But it turns out that’s not always the case. In this talk, we’ll discuss some of the common pitfalls that arise from collecting and analyzing service data, such as only using 'out-of-the-box' metrics and not having feedback loops. Then we'll discuss some practical tips for reducing noise and increasing effective customer signals with SLOs and for analyzing customer pain points.

Cindy Quach, Site Reliability Engineer @Google

2020 Tracks

  • Modern Data Engineering

    The innovations necessary to build towards a fully automated decentralized data warehouse.

  • Machine Learning for the Software Engineer

    AI and machine learning are more approachable than ever. Discover how ML, deep learning, and other modern approaches are being used in practice by Software Engineers.

  • Inclusion & Diversity in Tech

    The road map to an inclusive and diverse tech organization. *Diversity & Inclusion defined as the inclusion of all individuals within tech, regardless of gender, religion, ethnicity, race, age, sexual orientation, and physical or mental fitness.

  • Architectures You've Always Wondered About

    How do they do it? In QCon's marquee Architectures track, we learn what it takes to operate at large scale from well-known names in our industry. You will take away hard-earned architectural lessons on scalability, reliability, throughput, and performance.

  • Architecting for Confidence: Building Resilient Systems

    Your system will fail. Build systems with the confidence to know that when they do, you won't.

  • Remotely Productive: Remote Teams & Software

    More and more companies are moving to remote work. How do you build, work on, and lead teams remotely?

  • Operating Microservices

    Building and operating distributed systems is hard, and microservices are no different. Learn strategies for not just building a service but operating it at scale.

  • Distributed Systems for Developers

    Computer science in practice. An applied track that fuses together the human side of computer science with the technical choices that are made along the way.

  • The Future of the API: REST, gRPC, GraphQL and More

    Web-based APIs continue to evolve. The track provides the what, how, and why of future APIs, including GraphQL, Backend for Frontend, gRPC, & REST.

  • Resurgence of Functional Programming

    What was once a paradigm shift in how we thought of programming languages is now mainstream in nearly all modern languages. Hear how software shops are infusing concepts like pure functions and immutability into their architectures and design choices.

  • Social Responsibility: Implications of Building Modern Software

    Software has an ever-increasing impact on individuals and society. Understanding these implications helps build software that works for all users.

  • Non-Technical Skills for Technical Folks

    Being an effective engineer requires more than great coding skills. Learn the subtle arts of the tech lead, including empathy, communication, and organization.

  • Clientside: From WASM to Browser Applications

    Dive into some of the technologies that can be leveraged to ultimately deliver a more impactful interaction between the user and client.

  • Languages of Infra

    More than just Infrastructure as a Service, today we have libraries, languages, and platforms that help us define our infra. Languages of Infra explores languages and libraries being used today to build modern cloud native architectures.

  • Mechanical Sympathy: The Software/Hardware Divide

    Understanding the Hardware Makes You a Better Developer

  • Paths to Production: Deployments You've Always Wondered About

    Deployment pipelines allow us to push to production at ever-increasing volume. Paths to Production looks at how some of software's most well-known shops continuously deliver code.

  • Java, The Platform

    Mobile, Micro, Modular: The platform continues to evolve and change. Discover how the platform continues to drive us forward.

  • Security for Engineers

    How to build secure, yet usable, systems from the engineer's perspective.