Presentation: The Highs and Lows of Stateful Containers
Share this on:
What You’ll Learn
- Hear what are some of the challenges encountered when running a stateful system in a container.
- Learn about some of the features Kubernetes has for running stateful systems, helpful patterns and some of the pitfalls to avoid.
- Find out what is still missing and what the future might bring to stateful containers.
Abstract
As modern organizations have rapidly embraced containers in recent years, stateful applications have proven tougher to transition into this brave new world than other workloads. When persistent state is involved, more is required both of the container orchestration system and of the stateful application itself to ensure the safety and availability of the data.
This talk will walk through my experiences trying to reliably run a distributed database on Kubernetes, optimize its performance, and help others do the same in their heterogeneous environments. We’ll look at what kinds of stateful applications can most easily be run in containers, which Kubernetes features and usage patterns are most helpful for running them, and a number of pitfalls I encountered along the way. Finally, we’ll ponder what’s missing and what the future may hold for stateful containers.
Tell me about the work that you do today.
I work on the open source CockroachDB database, the product itself, performance, stability of the core system, and then making sure it runs really well in all environments that users and customers want it. There are a lot of people trying to run it on Kubernetes, in a single cluster or across multiple regions. I've had a lot of exposure to trying to make a stateful system work in these various orchestrated container environments.
What's the big challenge when you're talking about stateful systems with an orchestrator?
Orchestrated systems don't provide all the same guarantees you'd expect when you're running something directly on your own VMs. A lot of the challenges are from not properly understanding how these systems work and what guarantees they’re providing that are a little different from what you might be used to in more traditional environments. You need to understand the environment and the system that you're running a little better than you'd have to run something directly out on bare metal.
Can you give me an example of that?
Particularly early on every orchestration system for a container assumed that all your containers are fungible. There was no need to think of one container as being any different from another. And there was also no need to care about where the containers are placed. Every container can be statelessly moved from machine to machine, killed as well without any concern for what it was doing. These kinds of assumptions don't work for many stateful systems. They could be the only instance with a certain piece of data on them or may have an expensive bootup process where they have to reload a bunch of data into memory.
Who are you targeting with this talk?
It's primarily for people who are architecting their applications, deciding where they want to run their systems and whether to put stateful workloads into a container. I’ll be sharing different problems that I've run into myself, helping others with running stateful applications in these systems so they can have a better sense of what it's really like, and cutting through the marketing hype as well as trying to give a better understanding of how to overcome some of the common problems encountered.
Can you illustrate one of the common problems that you might talk through that somebody will walk away with a pattern for?
One of the biggest mistakes that people make when they try to move their stuff into stateful workloads into containers is not understanding what they can rely on from the provider. People can get themselves into situations where they actually just lose their data because they assume that the storage they're running on will always be there, and it works great when they set up the demo, when they follow the steps from a blog post and works well for a week or two, but then when an unexpected failure hits them they realize they didn't plan for recovery, which is something important inside any orchestrator. If you don't plan for failure and pick your storage medium properly, you're going to have a really bad time a week or two down the road when the first failure comes into play.
When you say 'pick your storage correctly', what does it mean in this context?
When you take Kubernetes for example, you have a number of different options for where you want to store your data. You can choose to store it inside the container itself, on the host's desk, on a remote network attached storage, and there are multiple varieties if you're running in the cloud, and if you don't think about what you're doing you're probably defaulting to an incorrect choice that's going to leave you with lost data.
Are you saying 'Just use network storage and you're done', or you need to be intentional about what you're selecting?
You need to first be aware of the most likely mistakes. The defaults are usually wrong, you have to make a conscious choice to avoid defaults. And beyond that for performance reasons and also for different types of failures. You have to make an intelligent choice between the various default options, local disk or some more advanced network solution.
Is this going to be specifically for Cockroach DB or is this applicable to any stateful system you want to deploy with containers?
The same issues are applicable to any system that you're deploying in containers.
Similar Talks
Tracks
Monday, 5 November
-
Microservices / Serverless Patterns & Practices
Evolving, observing, persisting, and building modern microservices
-
Practices of DevOps & Lean Thinking
Practical approaches using DevOps & Lean Thinking
-
JavaScript & Web Tech
Beyond JavaScript in the Browser. Exploring WebAssembly, Electron, & Modern Frameworks
-
Modern CS in the Real World
Thoughts pushing software forward, including consensus, CRDT's, formal methods, & probabilistic programming
-
Modern Operating Systems
Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualization, including Linux on Windows, LinuxKit, and Unikernels
-
Optimizing You: Human Skills for Individuals
Better teams start with a better self. Learn practical skills for IC
Tuesday, 6 November
-
Architectures You've Always Wondered About
Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more
-
21st Century Languages
Lessons learned from languages like Rust, Go-lang, Swift, Kotlin, and more.
-
Emerging Trends in Data Engineering
Showcasing DataEng tech and highlighting the strengths of each in real-world applications.
-
Bare Knuckle Performance
Killing latency and getting the most out of your hardware
-
Socially Conscious Software
Building socially responsible software that protects users privacy & safety
-
Delivering on the Promise of Containers
Runtime containers, libraries, and services that power microservices
Wednesday, 7 November
-
Applied AI & Machine Learning
Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more
-
Production Readiness: Building Resilient Systems
More than just building software, building deployable production ready software
-
Developer Experience: Level up your Engineering Effectiveness
Improving the end to end developer experience - design, dev, test, deploy, operate/understand.
-
Security: Lessons Attacking & Defending
Security from the defender's AND the attacker's point of view
-
Future of Human Computer Interaction
IoT, voice, mobile: Interfaces pushing the boundary of what we consider to be the interface
-
Enterprise Languages
Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track