
Presentation: The Highs and Lows of Stateful Containers

Track: Delivering on the Promise of Containers

Location: Bayview AB

Duration: 10:35am - 11:25am

Day of week: Tuesday


Level: Intermediate

Persona: DevOps Engineer

This presentation is now available to view on


What You’ll Learn

  1. Hear about some of the challenges encountered when running a stateful system in a container.
  2. Learn about the Kubernetes features for running stateful systems, some helpful patterns, and pitfalls to avoid.
  3. Find out what is still missing and what the future might bring for stateful containers.


As modern organizations have rapidly embraced containers in recent years, stateful applications have proven tougher to transition into this brave new world than other workloads. When persistent state is involved, more is required both of the container orchestration system and of the stateful application itself to ensure the safety and availability of the data. 

This talk will walk through my experiences trying to reliably run a distributed database on Kubernetes, optimize its performance, and help others do the same in their heterogeneous environments. We’ll look at what kinds of stateful applications can most easily be run in containers, which Kubernetes features and usage patterns are most helpful for running them, and a number of pitfalls I encountered along the way. Finally, we’ll ponder what’s missing and what the future may hold for stateful containers.


Tell me about the work that you do today.


I work on the open source CockroachDB database: the product itself, its performance, the stability of the core system, and making sure it runs well in all the environments where users and customers want to run it. A lot of people are trying to run it on Kubernetes, either in a single cluster or across multiple regions, so I've had a lot of exposure to making a stateful system work in these various orchestrated container environments.


What's the big challenge when you're talking about stateful systems with an orchestrator?


Orchestrated systems don't provide all the same guarantees you'd expect when running something directly on your own VMs. Many of the challenges come from not properly understanding how these systems work and which of their guarantees differ from what you might be used to in more traditional environments. You need to understand both the environment and the system you're running better than you would if you were running it directly on bare metal.


Can you give me an example of that?


Early on in particular, every container orchestration system assumed that all your containers were fungible. There was no need to think of one container as being any different from another, and no need to care about where containers were placed. Any container could be moved from machine to machine, or killed, without any concern for what it was doing. These assumptions don't work for many stateful systems. A given instance may be the only one holding a certain piece of data, or it may have an expensive boot-up process in which it has to reload a lot of data into memory.
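Kubernetes' main answer to the fungibility assumption is the StatefulSet: each pod gets a stable ordinal name (name-0, name-1, and so on) and, when paired with a headless Service, a stable DNS name that stays the same when the pod is rescheduled. The helper below is a hypothetical illustration that only computes those names; it assumes the default cluster domain cluster.local and made-up StatefulSet, Service, and namespace names.

```python
def stable_pod_dns_names(statefulset: str, service: str, namespace: str,
                         replicas: int, cluster_domain: str = "cluster.local") -> list[str]:
    """Return the per-pod DNS names a StatefulSet's pods get via a headless Service.

    Pods are named <statefulset>-<ordinal>, and each one resolves as
    <pod>.<service>.<namespace>.svc.<cluster_domain>. Unlike fungible pods,
    a pod keeps its name and DNS entry when it is recreated elsewhere.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical example: a 3-replica "cockroachdb" StatefulSet behind a headless
# Service also named "cockroachdb" in the "default" namespace.
for host in stable_pod_dns_names("cockroachdb", "cockroachdb", "default", 3):
    print(host)
```

Because each replica has a fixed identity, peers can address one another by these names instead of by pod IPs, which change whenever a pod is recreated.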


Who are you targeting with this talk?


It's primarily for people who are architecting their applications, deciding where to run their systems, and deciding whether to put stateful workloads into containers. I'll share problems I've run into myself and while helping others run stateful applications on these systems. The goal is to cut through the marketing hype, give a realistic sense of what it's like, and explain how to overcome some of the most common problems.


Can you illustrate one of the common problems you'll talk through, one that attendees will walk away with a pattern for?


One of the biggest mistakes people make when moving stateful workloads into containers is not understanding what they can rely on from the provider. People can end up simply losing their data because they assume the storage they're running on will always be there. It works great when they set up the demo by following the steps from a blog post, and it keeps working for a week or two, but when an unexpected failure hits, they realize they never planned for recovery, which is essential in any orchestrated environment. If you don't plan for failure and pick your storage medium properly, you're going to have a really bad time a week or two down the road when the first failure comes along.
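One way to stop assuming the storage will simply be there is to give each replica an explicit claim on storage that outlives the pod. A StatefulSet can declare volumeClaimTemplates: Kubernetes creates one PersistentVolumeClaim per pod and reattaches the same claim when that pod is recreated. The sketch below builds such a manifest as a Python dict (kubectl accepts JSON manifests as well as YAML). The function names, the image, the "ssd-retain" storage class, and the /data mount path are all hypothetical, and a real database deployment would also need ports, probes, resource requests, and the database's own startup flags. The manifest only fixes where the data lives; whether it survives a node failure still depends on the storage class behind the claim.

```python
import json


def statefulset_manifest(name: str, image: str, replicas: int,
                         storage_class: str, storage: str) -> dict:
    """Build a minimal apps/v1 StatefulSet whose pods each get their own PVC."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Mount the per-pod claim where the database keeps its data.
                        "volumeMounts": [{"name": "datadir", "mountPath": "/data"}],
                    }],
                },
            },
            # One PVC per pod, created from this template and bound to the same
            # ordinal again if the pod is deleted or rescheduled.
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    # Name the class explicitly instead of relying on the cluster default.
                    "storageClassName": storage_class,
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def claim_names(manifest: dict) -> list[str]:
    """PVC names the controller creates: <template name>-<statefulset name>-<ordinal>."""
    name = manifest["metadata"]["name"]
    replicas = manifest["spec"]["replicas"]
    return [
        f"{template['metadata']['name']}-{name}-{ordinal}"
        for template in manifest["spec"]["volumeClaimTemplates"]
        for ordinal in range(replicas)
    ]


# Hypothetical values throughout; pipe the JSON to `kubectl apply -f -`.
m = statefulset_manifest("db", "example.com/my-database:1.0", 3, "ssd-retain", "100Gi")
print(json.dumps(m, indent=2))
print(claim_names(m))
```

By default, Kubernetes does not delete these claims when the StatefulSet is deleted or scaled down, so the data remains available for recovery.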


When you say 'pick your storage correctly', what does it mean in this context?


Take Kubernetes, for example: you have a number of different options for where to store your data. You can store it inside the container itself, on the host's disk, or on remote network-attached storage, and if you're running in the cloud there are multiple varieties to choose from. If you don't think about what you're doing, you're probably defaulting to an incorrect choice that will leave you with lost data.
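The difference between these three options comes down to which failures the data remains available through. The model below is a deliberate simplification, assuming that "host disk" means a hostPath or local volume and "network-attached volume" means something like a cloud persistent disk. A network volume can only follow a pod to a node that is able to attach it (for example, one in the same zone), and its durability ultimately depends on the storage backend.

```python
from enum import Enum


class Failure(Enum):
    CONTAINER_RESTART = "container restarts in place"
    POD_MOVED = "pod rescheduled onto a different node"
    NODE_LOST = "node and its local disks permanently lost"


# Simplified model of which failures leave the data available to the pod.
AVAILABLE_AFTER = {
    # The container's writable layer is discarded when the container is replaced.
    "container filesystem": set(),
    # Data stays on the original node: usable after a restart there, but not
    # from another node, and gone if that node is lost.
    "host disk": {Failure.CONTAINER_RESTART},
    # The volume lives outside the node and can be reattached elsewhere.
    "network-attached volume": {Failure.CONTAINER_RESTART, Failure.POD_MOVED, Failure.NODE_LOST},
}


def data_available_after(storage: str, failure: Failure) -> bool:
    return failure in AVAILABLE_AFTER[storage]


for storage in AVAILABLE_AFTER:
    kept = [f.name for f in Failure if data_available_after(storage, f)]
    print(f"{storage:25} data available after: {kept or 'nothing'}")
```

Storing data inside the container is the only option here that loses data on a routine container restart, which is why it is rarely the right choice for a database.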


Are you saying 'Just use network storage and you're done', or you need to be intentional about what you're selecting?


First, you need to be aware of the most likely mistakes. The defaults are usually wrong, so you have to make a conscious choice to avoid them. Beyond that, both for performance reasons and to handle different types of failures, you have to make an intelligent choice among the standard options: local disk or some more advanced network solution.
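In Kubernetes, that conscious choice is often expressed as a StorageClass. Two of its defaults are worth overriding for stateful data: reclaimPolicy defaults to Delete, which removes the backing volume when its claim is deleted, and volumeBindingMode defaults to Immediate, which binds a volume before the pod is scheduled. The sketch below builds one class for local disks (statically created local PersistentVolumes, which use the kubernetes.io/no-provisioner provisioner) and one for network disks. The network provisioner shown is GKE's Persistent Disk CSI driver, an assumption you would replace with your own environment's provisioner; the helper names and class names are hypothetical.

```python
import json
from typing import Optional


def storage_class(name: str, provisioner: str,
                  parameters: Optional[dict] = None) -> dict:
    """Build a storage.k8s.io/v1 StorageClass with explicit, non-default choices."""
    sc = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Default is Delete, which removes the backing volume when its claim is
        # deleted; Retain keeps the volume around for recovery.
        "reclaimPolicy": "Retain",
        # Defer binding until a pod using the claim is scheduled, so the volume
        # matches the node or zone the pod lands on (recommended for local volumes).
        "volumeBindingMode": "WaitForFirstConsumer",
    }
    if parameters:
        sc["parameters"] = parameters
    return sc


def risky_defaults(sc: dict) -> list[str]:
    """List fields that are missing or left at defaults worth reconsidering."""
    issues = []
    if sc.get("reclaimPolicy", "Delete") == "Delete":
        issues.append("reclaimPolicy is Delete: deleting the claim deletes the data")
    if sc.get("volumeBindingMode", "Immediate") == "Immediate":
        issues.append("volumeBindingMode is Immediate: volume may be bound before scheduling")
    return issues


# Local disks: pre-created local PersistentVolumes, no dynamic provisioning.
LOCAL_SSD = storage_class("local-ssd", "kubernetes.io/no-provisioner")
# Network disks: GKE's Persistent Disk CSI driver (an assumption; substitute yours).
NETWORK_SSD = storage_class("network-ssd", "pd.csi.storage.gke.io", {"type": "pd-ssd"})

print(json.dumps([LOCAL_SSD, NETWORK_SSD], indent=2))
print(risky_defaults({"provisioner": "kubernetes.io/no-provisioner"}))
```

A claim that omits storageClassName typically falls back to whichever class the cluster marks as default, so naming the class explicitly in each claim is what turns the choice into a deliberate one.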


Is this specific to CockroachDB, or is it applicable to any stateful system you want to deploy in containers?


The same issues are applicable to any system that you're deploying in containers.

Speaker: Alex Robinson

Member of Technical Staff @CockroachDB, previously SWE @GCPcloud

Alex is a member of the technical staff at Cockroach Labs, the startup leading the development of the open source CockroachDB project, where he works on CockroachDB's core transactional storage layer and leads all integrations with orchestration systems. Previously, he was a senior software engineer at Google, where he spent his last two years as a core early developer of Kubernetes and GKE.

