You are viewing content from a past/completed QCon

Presentation: The Highs and Lows of Stateful Containers

Track: Delivering on the Promise of Containers

Location: Bayview AB

Duration: 10:35am - 11:25am

Day of week: Tuesday

Level: Intermediate

Persona: DevOps Engineer

This presentation is now available to view on InfoQ.com

What You’ll Learn

  1. Hear about some of the challenges encountered when running a stateful system in a container.
  2. Learn about the features Kubernetes offers for running stateful systems, helpful patterns, and pitfalls to avoid.
  3. Find out what is still missing and what the future might bring to stateful containers.

Abstract

As modern organizations have rapidly embraced containers in recent years, stateful applications have proven tougher to transition into this brave new world than other workloads. When persistent state is involved, more is required both of the container orchestration system and of the stateful application itself to ensure the safety and availability of the data. 

This talk will walk through my experiences trying to reliably run a distributed database on Kubernetes, optimize its performance, and help others do the same in their heterogeneous environments. We’ll look at what kinds of stateful applications can most easily be run in containers, which Kubernetes features and usage patterns are most helpful for running them, and a number of pitfalls I encountered along the way. Finally, we’ll ponder what’s missing and what the future may hold for stateful containers.

Question: 

Tell me about the work that you do today.

Answer: 

I work on the open source CockroachDB database: the product itself, the performance and stability of the core system, and making sure it runs well in all the environments where users and customers want to run it. A lot of people are trying to run it on Kubernetes, either in a single cluster or across multiple regions, so I've had a lot of exposure to making a stateful system work in these various orchestrated container environments.

Question: 

What's the big challenge when you're talking about stateful systems with an orchestrator?

Answer: 

Orchestrated systems don't provide all the same guarantees you'd expect when running something directly on your own VMs. A lot of the challenges come from not properly understanding how these systems work and which of their guarantees differ from what you might be used to in more traditional environments. You need to understand both the environment and the system you're running a little better than you would if you were running it directly on bare metal.

Question: 

Can you give me an example of that?

Answer: 

Particularly early on, every container orchestration system assumed that all your containers were fungible. There was no need to think of one container as being any different from another, and no need to care about where containers were placed. Any container could be moved from machine to machine, or killed, without any concern for what it was doing. These assumptions don't work for many stateful systems. A container could be the only instance holding a certain piece of data, or it may have an expensive boot-up process in which it has to reload a lot of data into memory.
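To make the contrast concrete, here is a minimal Python sketch of the stable identity that Kubernetes StatefulSets give each replica, written against a simplified model rather than the Kubernetes API: pods are numbered by ordinal and each ordinal is bound to its own volume claim, so a replacement pod comes back under the same name and is re-bound to the same data. The set name `cockroachdb`, the claim template `datadir`, and the helper functions are hypothetical.

```python
# Simplified model (not the Kubernetes API) of StatefulSet-style identity.
# Unlike interchangeable stateless replicas, each replica has a fixed ordinal,
# and that ordinal determines both its pod name and its volume claim.


def identity_for(set_name: str, claim_template: str, ordinal: int) -> tuple[str, str]:
    """Return the (pod name, volume claim name) pair for one ordinal.

    Pods are named "<set>-<ordinal>" and claims "<template>-<set>-<ordinal>",
    mirroring the StatefulSet naming convention.
    """
    pod = f"{set_name}-{ordinal}"
    return pod, f"{claim_template}-{pod}"


def stable_identities(set_name: str, claim_template: str, replicas: int) -> list[tuple[str, str]]:
    """Return identities for ordinals 0..replicas-1, in order."""
    return [identity_for(set_name, claim_template, i) for i in range(replicas)]


# Initial rollout: each replica gets a fixed name and its own claim.
for pod, claim in stable_identities("cockroachdb", "datadir", 3):
    print(pod, "->", claim)

# If cockroachdb-1 is killed and rescheduled, the replacement reuses ordinal 1,
# so it maps to the same claim instead of starting with empty storage.
print(identity_for("cockroachdb", "datadir", 1))
```

The design point is that identity is derived from the ordinal rather than generated randomly, which is what lets a restarted replica find its own data again.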

Question: 

Who are you targeting with this talk?

Answer: 

It's primarily for people who are architecting their applications, deciding where they want to run their systems, and deciding whether to put stateful workloads into containers. I'll share problems I've run into myself and while helping others run stateful applications on these systems, so attendees get a better sense of what it's really like. The goal is to cut through the marketing hype and give a better understanding of how to overcome some of the common problems.

Question: 

Can you illustrate one of the common problems you'll talk through, one that attendees will walk away with a pattern for?

Answer: 

One of the biggest mistakes people make when moving stateful workloads into containers is not understanding what they can rely on from the provider. People can end up actually losing their data because they assume the storage they're running on will always be there. Everything works great when they set up the demo by following the steps from a blog post, and it keeps working for a week or two, but when an unexpected failure hits they realize they didn't plan for recovery, which is essential in any orchestrator. If you don't plan for failure and pick your storage medium properly, you're going to have a really bad time when that first failure comes along.
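Planning for recovery usually starts with knowing how many failures your data can survive. The following is a minimal sketch that assumes majority-quorum (Raft-style) replication; that replication scheme is an assumption for illustration, not something stated in the interview, and the function names are hypothetical.

```python
# Minimal sketch, assuming majority-quorum replication: a piece of data stays
# available only while a strict majority of its replicas is reachable.


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be lost while a strict majority remains."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (replication_factor - 1) // 2


def survives(replication_factor: int, failed_replicas: int) -> bool:
    """True if a strict majority of replicas is still available."""
    return failed_replicas <= tolerated_failures(replication_factor)


for rf in (1, 3, 5):
    print(f"replication factor {rf}: tolerates {tolerated_failures(rf)} failure(s)")
```

The arithmetic only holds if failures are independent: replicas that share a host, or storage that can fail together, should be counted as a single failure.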

Question: 

When you say 'pick your storage correctly', what does it mean in this context?

Answer: 

Take Kubernetes, for example: you have a number of different options for where to store your data. You can store it inside the container itself, on the host's disk, or on remote network-attached storage, and there are multiple varieties to choose from if you're running in the cloud. If you don't think about what you're doing, you'll probably default to the wrong choice, one that's going to leave you with lost data.
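The three options fail in different ways. This Python sketch encodes a simplified survival table for them; the enum names and the table itself are illustrative assumptions, and real behavior depends on the specific volume type and cloud provider (for example, a zonal network disk can usually only be reattached to a machine in the same zone).

```python
from enum import Enum


class Storage(Enum):
    CONTAINER_FILESYSTEM = "inside the container"
    HOST_DISK = "on the host's disk"
    NETWORK_ATTACHED = "remote network-attached storage"


class Failure(Enum):
    CONTAINER_RESTART = "container restarts"
    POD_MOVED_TO_OTHER_NODE = "pod rescheduled onto another node"
    NODE_LOST = "host machine lost"


# Simplified, illustrative survival table. True means the data is still
# reachable by the workload after that failure.
SURVIVES = {
    Storage.CONTAINER_FILESYSTEM: {
        Failure.CONTAINER_RESTART: False,  # a restarted container gets a fresh filesystem
        Failure.POD_MOVED_TO_OTHER_NODE: False,
        Failure.NODE_LOST: False,
    },
    Storage.HOST_DISK: {
        Failure.CONTAINER_RESTART: True,
        Failure.POD_MOVED_TO_OTHER_NODE: False,  # the data stays behind on the original host
        Failure.NODE_LOST: False,
    },
    Storage.NETWORK_ATTACHED: {
        Failure.CONTAINER_RESTART: True,
        Failure.POD_MOVED_TO_OTHER_NODE: True,  # the volume can be reattached elsewhere
        Failure.NODE_LOST: True,
    },
}


def data_lost_by(storage: Storage) -> list[Failure]:
    """Failures (in declaration order) after which data on `storage` is gone."""
    return [f for f in Failure if not SURVIVES[storage][f]]


for storage in Storage:
    lost = data_lost_by(storage)
    summary = ("lost if " + " / ".join(f.value for f in lost)) if lost else "survives all listed failures"
    print(f"{storage.value}: {summary}")
```

Host-disk storage is not necessarily wrong, but with it, surviving the loss of a machine has to come from the application replicating its data across machines rather than from the storage layer.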

Question: 

Are you saying 'Just use network storage and you're done', or that you need to be intentional about what you're selecting?

Answer: 

You first need to be aware of the most likely mistakes. The defaults are usually wrong, so you have to make a conscious choice to avoid them. Beyond that, for performance reasons and to handle different types of failures, you have to choose intelligently among the various options: local disk or a more advanced network storage solution.
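As one illustration of how defaults can bite, the following hypothetical lint flags storage settings that silently fall back to defaults. It operates on a plain, simplified dict whose keys echo Kubernetes field names; it is not a real Kubernetes object or tool. It relies on two Kubernetes behaviors: a claim with no `storageClassName` uses the cluster's default StorageClass if one is configured, and dynamically provisioned volumes inherit their StorageClass's `reclaimPolicy`, which is `Delete` unless set otherwise. The class name `fast-ssd` is made up for the example.

```python
# Hypothetical lint over a simplified dict (not a real Kubernetes object).


def storage_warnings(spec: dict) -> list[str]:
    """Flag storage choices in `spec` that silently fall back to risky defaults."""
    warnings = []
    volume = spec.get("volume")
    if volume is None:
        warnings.append("no volume: data is written to the container filesystem "
                        "and is lost when the container restarts")
    elif volume == "emptyDir":
        warnings.append("emptyDir: data is deleted when the pod is removed from its node")
    elif volume == "persistentVolumeClaim":
        # Unset (as opposed to an explicit "") means the default class is used.
        if spec.get("storageClassName") is None:
            warnings.append("no storageClassName: the cluster's default class, "
                            "if any, will be used")
        # Dynamically provisioned volumes inherit the class's reclaim policy,
        # which is Delete unless the class says otherwise.
        if spec.get("reclaimPolicy", "Delete") == "Delete":
            warnings.append("reclaimPolicy Delete: the volume and its data are removed "
                            "when the claim is deleted; consider Retain")
    return warnings


for spec in (
    {},
    {"volume": "persistentVolumeClaim"},
    {"volume": "persistentVolumeClaim", "storageClassName": "fast-ssd", "reclaimPolicy": "Retain"},
):
    print(spec, "->", storage_warnings(spec) or "no default-related warnings")
```

A lint like this only catches accidental defaults; whether local disk or network storage is the right deliberate choice still depends on your performance needs and the failures you must survive.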

Question: 

Is this going to be specific to CockroachDB, or is it applicable to any stateful system you want to deploy in containers?

Answer: 

The same issues are applicable to any system that you're deploying in containers.

Speaker: Alex Robinson

Member of Technical Staff @CockroachDB, previously SWE @GCPcloud

Alex is a member of the technical staff at Cockroach Labs, the startup leading the development of the open source CockroachDB project, where he works on CockroachDB's core transactional storage layer and leads all integrations with orchestration systems. Previously, he was a senior software engineer at Google, where he spent his last two years as a core early developer of Kubernetes and GKE.
