Presentation: Caching Beyond RAM: The Case for NVMe

Track: Modern Operating Systems

Location: Bayview AB

Duration: 11:50am - 12:40pm

Day of week: Monday

Level: Advanced

Persona: Backend Developer, Developer, DevOps Engineer


What You’ll Learn

  1. Explore the possibility of using new storage devices to reduce DRAM dependency for cache workloads.
  2. Understand the state of the art available today for distributed cache systems.
  3. Hear about use cases that optimize for different cache workloads.


Caching architectures at every layer of the stack embody an implicit tradeoff between performance and cost. These tradeoffs, however, are constantly shifting: new inflection points can emerge alongside advances in storage technology, changes in workload patterns, or fluctuations in hardware supply and demand.

In this talk, we will explore the design ramifications of the increasing cost of RAM on caching systems. While RAM has always been expensive, DRAM prices rose by over 50% in 2017, and high densities of RAM involve multi-socket NUMA machines, bloating power and overall costs. Concurrently, alternative storage technologies such as Flash and Optane continue to improve. They have specialized hardware interfaces, consistent performance, high density, and relatively low costs. While there is increasing economic incentive to explore offloading caching from RAM onto NVMe or NVM devices, the implications for performance are still not widely understood.


What is the focus of your work today?


I evaluate hardware and software improvements for distributed cache systems.


What’s the motivation for this talk?


I am exploring the possibility of using new storage devices to reduce DRAM dependency for cache workloads.

Caching systems have historically been limited to just RAM. They're written to a lot, they're read from a lot, and so latency matters an awful lot (given what people are using them for).

Recently the cost of RAM has ballooned. Databases have also gotten a lot faster, making use of LSM trees, B+ trees, and novel Flash devices.

I was thinking hard about how cache systems can leapfrog or stay ahead of database systems again. That's why I sat down and did a thorough analysis of NVMe-based systems. Intel was also gracious enough to help with testing their Optane Persistent Memory.

Distributed caches are, at some level, individual pieces of hardware. We need to come back and re-evaluate what’s possible on the cache nodes. The focus of this talk is to discuss some of what I found.


How do you plan to discuss this with the audience?


The use cases are probably the most interesting way of talking about it. In my blog post on NVMe caches, I went through a couple of examples with diagrams. I'll do more of that with the actual slides, with more concrete examples of what can be used this way. There is still a limit to which workloads suit NVMe devices. It's mostly figuring out how people can quickly identify, 'I can actually benefit from this new system, how can I quickly put it to use?' or, 'there's a thing we never imagined we could do before but now it's possible.' Netflix has been taking advantage and rolling out petabytes of this stuff to launch new machine learning platforms.


What have you found through your testing that is a bad use case?


Any really small objects. In an average case, people may have 50 percent of cache memory used by objects that are fairly large and 50 percent that are fairly small. But in some use cases it's all very, very small, a couple hundred bytes or less. In those circumstances, device-backed storage doesn't really help them that much.
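One way to see why small objects gain little is a rough back-of-the-envelope sketch. It assumes, hypothetically, that the key and a fixed-size index entry must stay in RAM so the item can still be found after its value moves to the device. The function name and the 40-byte key and 60-byte index figures are illustrative assumptions, not measurements from the talk or from any particular cache.

```python
def ram_fraction_saved(value_bytes: int,
                       key_bytes: int = 40,
                       index_overhead_bytes: int = 60) -> float:
    """Fraction of an item's RAM footprint freed by moving its value to flash.

    Assumption (hypothetical): the key plus a fixed-size index entry stays
    in RAM either way; only the value is offloaded to the device.
    """
    resident = key_bytes + index_overhead_bytes   # stays in RAM regardless
    all_in_ram = resident + value_bytes           # footprint with no offload
    return 1 - resident / all_in_ram


if __name__ == "__main__":
    # Compare a few value sizes, from "a couple hundred bytes" up to large.
    for size in (100, 200, 4096, 65536):
        saved = ram_fraction_saved(size)
        print(f"{size:>6} B value: {saved:.1%} of per-item RAM freed")
```

Under these assumed overheads, a value of a hundred bytes frees only about half of the item's RAM while every read of it now pays a device access, whereas a multi-kilobyte value frees nearly all of it.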


Who is the target persona that you envision?


I'm looking for the tail latency folks. At this point, I'm looking for project leads and application designers. Every time somebody comes up with a new idea, a new project, or is reevaluating the cost of an old project, they have to look at what technologies are available. They have to ask, 'what can I physically do, what am I limited to via cost?' I want those folks to be aware that these are new options available to them when designing new applications or reevaluating old ones.

The talk is for performance-minded folks, or anyone who needs frequent access to a lot of data, for example ML facts.


The cost efficiencies we're talking about, are they only realized at incredibly large scale or are you seeing them realized at smaller scale?


I put out a blog post showing how somebody running a $20 virtual machine can save themselves $40 a month by avoiding renting more RAM. I'm scaling all the way down. Also, there was another company recently that rolled it out with around 10 hosts and they 10x'd the amount of cache they had for the same amount of money and halved their back-end load. And this is a fairly popular website, just not Facebook or Netflix scale.

I’d really like engineers at different scales to walk away knowing it’s possible to exploit cache systems for entirely new problems, and that there are new opportunities to reduce the cost of cache (or increase cache and reduce backend cost).


What is the technology problem that keeps you up at night?


The industry slowdown keeps me up at night in two ways. RAM isn't getting much denser much faster. I think last year or the year before was the first time ever that performance per watt cost on CPUs didn't actually improve. That is terrifying because everybody's financial projections expect that improvement to arrive every 18 to 24 months.

On the other hand, it's great because I'm a performance person, so at some point, the cost overruns will hit the budget, and I'll have better job security. Kidding, mostly!

There are people just disregarding cache systems lately. They're trying to fix their issues by applying the same data structures and storage engines they use for databases, and I'm thinking, 'you have your database, why are you trying to architect it the same way as your cache?' You still want your cache to beat your database by 10x.

These are two of the things that keep me up at night.

Speaker: Alan Kasindorf

OSS Memcached Project Maintainer, previously Memcache / Mcrouter @Facebook & Dir of Edge Engineering @Fastly

Website scalability, distributed caching systems, and performance addict. Enjoys contributing to and learning from OSS.




