The Highs and Lows of Stateful Containers


What You’ll Learn

Hear about some of the challenges encountered when running a stateful system in a container.
Learn about the features Kubernetes has for running stateful systems, helpful usage patterns, and some of the pitfalls to avoid.
Find out what is still missing and what the future might bring for stateful containers.

Abstract

As modern organizations have rapidly embraced containers in recent years, stateful applications have proven tougher to transition into this brave new world than other workloads. When persistent state is involved, more is required both of the container orchestration system and of the stateful application itself to ensure the safety and availability of the data.
This talk will walk through my experiences trying to reliably run a distributed database on Kubernetes, optimize its performance, and help others do the same in their heterogeneous environments. We’ll look at what kinds of stateful applications can most easily be run in containers, which Kubernetes features and usage patterns are most helpful for running them, and a number of pitfalls I encountered along the way. Finally, we’ll ponder what’s missing and what the future may hold for stateful containers.

Question:

Tell me about the work that you do today.

Answer:

I work on the open source CockroachDB database: the product itself, its performance, the stability of the core system, and making sure it runs really well in all the environments where users and customers want to run it. A lot of people are trying to run it on Kubernetes, in a single cluster or across multiple regions, so I've had a lot of exposure to making a stateful system work in these various orchestrated container environments.

Question:

What's the big challenge when you're talking about stateful systems with an orchestrator?

Answer:

Orchestrated systems don't provide all the same guarantees you'd expect when you're running something directly on your own VMs. A lot of the challenges come from not properly understanding how these systems work and how the guarantees they provide differ from what you might be used to in more traditional environments. You need to understand the environment and the system that you're running a little better than you would if you were running something directly on bare metal.

Question:

Can you give me an example of that?

Answer:

Particularly early on, every container orchestration system assumed that all your containers are fungible. There was no need to think of one container as being any different from another, and no need to care about where the containers were placed. Every container could be moved from machine to machine, or killed, without any concern for what it was doing. These kinds of assumptions don't work for many stateful systems. A container could be the only instance holding a certain piece of data, or it may have an expensive boot-up process in which it has to reload a lot of data into memory.
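
As a rough illustration of the opposite of fungibility, here is a minimal Python sketch of a Kubernetes StatefulSet manifest, one Kubernetes feature that gives each replica a stable name and its own storage claim. The interview does not name this feature, so treat it as an assumed example. The helper names, the `datadir` volume name, the `/data` mount path, and the image used in the usage line are hypothetical placeholders.

```python
import json


def pod_names(name, replicas):
    """Stable pod names a StatefulSet assigns: <name>-<ordinal>, starting at 0."""
    return [f"{name}-{i}" for i in range(replicas)]


def claim_names(name, replicas, volume="datadir"):
    """Each replica gets its own claim: <volume>-<pod name>."""
    return [f"{volume}-{pod}" for pod in pod_names(name, replicas)]


def statefulset_manifest(name, replicas, image, storage):
    """Build a StatefulSet manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that provides stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # "/data" is an assumed mount path for illustration
                        "volumeMounts": [{"name": "datadir", "mountPath": "/data"}],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per replica from this template,
            # so a rescheduled pod is reattached to the same claim.
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # "example/db:latest" is a placeholder image, not a real one.
    print(json.dumps(statefulset_manifest("db", 3, "example/db:latest", "10Gi"), indent=2))
```

With three replicas named `db`, the pods would be `db-0`, `db-1`, and `db-2`, each bound to its own claim, so the orchestrator no longer treats them as interchangeable.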

Question:

Who are you targeting with this talk?

Answer:

It's primarily for people who are architecting their applications, deciding where they want to run their systems, and weighing whether to put stateful workloads into containers. I’ll be sharing problems that I've run into myself and while helping others run stateful applications in these systems, so that attendees get a better sense of what it's really like. I also want to cut through the marketing hype and give a better understanding of how to overcome some of the common problems.

Question:

Can you illustrate one of the common problems that you might talk through that somebody will walk away with a pattern for?

Answer:

One of the biggest mistakes that people make when they try to move their stateful workloads into containers is not understanding what they can rely on from the provider. People can get themselves into situations where they actually lose their data because they assume that the storage they're running on will always be there. It works great when they set up the demo by following the steps from a blog post, and it works well for a week or two. Then an unexpected failure hits and they realize they didn't plan for recovery, which is important inside any orchestrator. If you don't plan for failure and pick your storage medium properly, you're going to have a really bad time when the first failure comes.

Question:

When you say 'pick your storage correctly', what does it mean in this context?

Answer:

Take Kubernetes, for example: you have a number of different options for where to store your data. You can store it inside the container itself, on the host's disk, or on remote network-attached storage, and there are multiple varieties of each if you're running in the cloud. If you don't think about what you're doing, you're probably defaulting to an incorrect choice that's going to leave you with lost data.
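
The storage options named above can be compared by which failures the data outlives. The following Python sketch is a simplified model, not an authoritative reference: the option labels, failure names, and the `options_surviving` helper are assumptions made for illustration, and real behavior depends on the volume type, the cloud provider, and the reclaim policy in use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    name: str
    survives: frozenset  # failure kinds the data is expected to outlive


# Simplified assumptions:
# - container filesystem: lost when the container is recreated
# - emptyDir volume: outlives container restarts, removed with the pod
# - host disk: stays on the node, but a pod rescheduled elsewhere cannot see it
# - network-attached volume: can follow the pod to another node, provided the
#   volume itself is not deleted
OPTIONS = (
    StorageOption("container filesystem", frozenset()),
    StorageOption("emptyDir volume", frozenset({"container_restart"})),
    StorageOption("host disk", frozenset({"container_restart", "pod_deletion"})),
    StorageOption(
        "network-attached volume",
        frozenset({"container_restart", "pod_deletion", "node_loss"}),
    ),
)


def options_surviving(failures):
    """Return the names of options whose data outlives every listed failure."""
    needed = frozenset(failures)
    return [opt.name for opt in OPTIONS if needed <= opt.survives]


if __name__ == "__main__":
    for failure in ("container_restart", "pod_deletion", "node_loss"):
        print(failure, "->", options_surviving([failure]))
```

Listing the failures you need to survive before choosing a volume type is one way to make the choice a conscious one instead of accepting whatever the setup defaults to.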

Question:

Are you saying 'Just use network storage and you're done', or you need to be intentional about what you're selecting?

Answer:

You first need to be aware of the most likely mistakes. The defaults are usually wrong, so you have to make a conscious choice to avoid them. Beyond that, you have to make an intelligent choice between the various options, such as local disk or a more advanced network solution, both for performance reasons and to handle different types of failures.

Question:

Is this going to be specific to CockroachDB, or is it applicable to any stateful system you want to deploy with containers?

Answer:

The same issues are applicable to any system that you're deploying in containers.

Speaker: Alex Robinson

Member of Technical Staff @CockroachDB, previously SWE @GCPcloud

Alex is a member of the technical staff at Cockroach Labs, the startup leading the development of the open source CockroachDB project, where he works on CockroachDB's core transactional storage layer and leads all integrations with orchestration systems. Previously, he was a senior software engineer at Google, where he spent his last two years as a core early developer of Kubernetes and GKE.



Tracks

Monday, 5 November

Microservices / Serverless Patterns & Practices

Evolving, observing, persisting, and building modern microservices
Practices of DevOps & Lean Thinking

Practical approaches using DevOps & Lean Thinking
JavaScript & Web Tech

Beyond JavaScript in the Browser. Exploring WebAssembly, Electron, & Modern Frameworks
Modern CS in the Real World

Thoughts pushing software forward, including consensus, CRDT's, formal methods, & probabilistic programming
Modern Operating Systems

Applied, practical, & real-world deep-dive into industry adoption of OS, containers and virtualization, including Linux on Windows, LinuxKit, and Unikernels
Optimizing You: Human Skills for Individuals

Better teams start with a better self. Learn practical skills for IC

Tuesday, 6 November

Architectures You've Always Wondered About

Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, & more
21st Century Languages

Lessons learned from languages like Rust, Go-lang, Swift, Kotlin, and more.
Emerging Trends in Data Engineering

Showcasing DataEng tech and highlighting the strengths of each in real-world applications.
Bare Knuckle Performance

Killing latency and getting the most out of your hardware
Socially Conscious Software

Building socially responsible software that protects users privacy & safety
Delivering on the Promise of Containers

Runtime containers, libraries, and services that power microservices

Wednesday, 7 November

Applied AI & Machine Learning

Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, PyTorch, & more
Production Readiness: Building Resilient Systems

More than just building software, building deployable production ready software
Developer Experience: Level up your Engineering Effectiveness

Improving the end to end developer experience - design, dev, test, deploy, operate/understand.
Security: Lessons Attacking & Defending

Security from the defender's AND the attacker's point of view
Future of Human Computer Interaction

IoT, voice, mobile: Interfaces pushing the boundary of what we consider to be the interface
Enterprise Languages

Workhorse languages found in modern enterprises. Expect Java, .NET, & Node in this track


The all-new QCon app!

Available on iOS and Android

The new QCon app helps you make the most of your conference experience. Easily browse and follow the conference schedule, star the talks you want to attend, and keep tabs on your personal itinerary. Download the app now for free on iOS and Android.

Track: Delivering on the Promise of Containers

Location: Bayview AB

Duration: 10:35am - 11:25am

Day of week: Tuesday

Level: Intermediate

Persona: DevOps Engineer
