
Presentation: Caching Beyond RAM: The Case for NVMe

Track: Modern Operating Systems

Location: Bayview AB

Time: 11:50am - 12:40pm



Level: Advanced

Persona: Backend Developer, Developer, DevOps Engineer


What You’ll Learn

  1. Explore the possibility of using new storage devices to reduce DRAM dependency for cache workloads.
  2. Understand the state of the art available today for distributed cache systems.
  3. Hear about use cases that optimize for different cache workloads.


Caching architectures at every layer of the stack embody an implicit tradeoff between performance and cost. These tradeoffs however are constantly shifting: new inflection points can emerge alongside advances in storage technology, changes in workload patterns, or fluctuations in hardware supply and demand. 

In this talk, we will explore the design ramifications of the increasing cost of RAM on caching systems. While RAM has always been expensive, DRAM prices rose by over 50% in 2017, and high densities of RAM require multi-socket NUMA machines, bloating power and overall costs. Concurrently, alternative storage technologies such as Flash and Optane continue to improve. They offer specialized hardware interfaces, consistent performance, high density, and relatively low cost. While there is increasing economic incentive to offload caching from RAM onto NVMe or NVM devices, the implications for performance are still not widely understood.


What is the focus of your work today?


I evaluate hardware and software improvements for distributed cache systems.


What’s the motivation for this talk?


I am exploring the possibility of using new storage devices to reduce DRAM dependency for cache workloads.

Caching systems have historically been limited to just RAM. They're written to heavily and read from heavily, so latency matters a great deal given the workloads people run against them.

Recently the cost of RAM has ballooned. Databases have also gotten a lot faster, making use of LSM trees, B+ trees, and novel Flash devices.

I was thinking hard about how cache systems can leapfrog or stay ahead of database systems again. That's why I sat down and did a thorough analysis of NVMe-based systems. Intel was gracious enough to help out as well, providing their Optane Persistent Memory for testing.

Distributed caches are, at some point, individual pieces of hardware. We need to come back and re-evaluate what's possible on the individual cache nodes. The focus of this talk is to discuss some of what I found.


How do you plan to discuss this with the audience?


The use cases are probably the most interesting way of talking about it. In my blog post on NVMe caches, I went through a couple of examples with diagrams. I'll do more of that with the actual slides—more concrete examples of which workloads can be used this way. There is still a limit to which workloads fit NVMe devices. It's mostly figuring out how people can quickly identify, 'I can actually benefit from this new system, how can I quickly put it to use?' or, 'there's a thing we never imagined we could do before but now it's possible.' Netflix has been taking advantage and rolling out petabytes of this stuff to launch new machine learning platforms.


What have you found through your testing that is a bad use case?


Any really small objects. In an average case, people may have 50 percent of cache memory used by objects that are fairly large and 50 percent that are fairly small. But in some use cases it's all very, very small—a couple hundred bytes or less. In those circumstances, device-backed storage doesn't really help them that much.
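As one concrete illustration of this size threshold, memcached's extstore feature keeps items below a configurable size in RAM and only flushes larger values to the flash device. A minimal sketch of a startup line follows; the device path, memory size, and thresholds are illustrative assumptions, not recommendations:

```shell
# Hypothetical extstore setup: 4 GB of RAM for hot and small items,
# backed by a 64 GB file on an NVMe device for larger, colder values.
memcached -m 4096 \
  -o ext_path=/mnt/nvme/extstore:64G \
  -o ext_item_size=512   # items under 512 bytes are never written to the device
```

With an all-small-object workload, nearly every item falls under such a threshold, so the device sits idle and the deployment gains nothing over plain RAM.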


Who is the target persona that you envision?


I'm looking for the tail latency folks. At this point, I'm looking for project leads and application designers. Every time somebody comes up with a new idea, a new project, or is reevaluating the cost of an old project, they have to look at what technologies are available. They have to ask, 'what can I physically do, what am I limited to via cost?' I want those folks to be aware that these are new options available to them when designing new applications or reevaluating old ones.

The talk is for performance-minded folks, or anyone who needs frequent access to a lot of data, for example, ML facts.


The cost efficiencies we're talking about, are they only realized at incredibly large scale or are you seeing them realized at smaller scale?


I put out a blog post showing how somebody running a $20 virtual machine can save themselves $40 a month by avoiding renting more RAM. I'm scaling all the way down. Also, there was another company recently that rolled it out with around 10 hosts and they 10x'd the amount of cache they had for the same amount of money and halved their back-end load. And this is a fairly popular website, just not Facebook or Netflix scale.
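The savings described above come down to simple arithmetic on the price gap between DRAM and NVMe. A back-of-the-envelope sketch follows; all prices are illustrative assumptions, not quotes from the talk or any provider:

```python
# Back-of-the-envelope comparison of monthly cache cost by media mix.
# Both per-GB prices below are assumed figures for illustration only.
RAM_GB_PER_MONTH = 4.00    # assumed rental cost of 1 GB of DRAM
NVME_GB_PER_MONTH = 0.25   # assumed rental cost of 1 GB of NVMe

def monthly_cost(ram_gb: float, nvme_gb: float) -> float:
    """Total monthly cost of a cache node with the given media mix."""
    return ram_gb * RAM_GB_PER_MONTH + nvme_gb * NVME_GB_PER_MONTH

# A 64 GB all-RAM cache versus 8 GB of RAM fronting 56 GB of NVMe:
# same total capacity, a fraction of the cost.
print(monthly_cost(64, 0))   # 256.0
print(monthly_cost(8, 56))   # 46.0
```

Under these assumed prices, the hybrid node delivers the same 64 GB of cache for roughly a fifth of the cost, which is the shape of the tradeoff whether you run one node or a thousand.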

I’d really like engineers at various different scales to walk away knowing it’s possible to exploit cache systems for entirely new problems, and that there are new opportunities to reduce the cost of cache (or increase cache and reduce backend cost).


What is the technology problem that keeps you up at night?


The industry slowdown keeps me up at night in two ways. RAM isn't getting much denser much faster. I think last year or the year before was the first time ever that performance per watt on CPUs didn't actually improve. That is terrifying because everybody's financial projections are expecting this to happen every 18 to 24 months.

On the other hand, it's great because I'm a performance person, so at some point, the cost overruns will hit the budget, and I'll have better job security. Kidding, mostly!

There are people just disregarding cache systems lately. They're trying to fix their issues by applying the same data structures and storage engines they use for databases, and I'm thinking, 'you have your database, why are you trying to architect it the same way as your cache?' You still want your cache to beat your database by 10x.

These are two of the things that keep me up at night.

Speaker: Alan Kasindorf

OSS Memcached Project Maintainer, previously Memcache / Mcrouter @Facebook & Dir of Edge Engineering @Fastly

Website scalability, distributed caching systems, and performance addict. Enjoys contributing to and learning from OSS.

