Ubiquitous Caching: A Journey of Building Efficient Distributed and In-Process Caches at Twitter

Abstract

Modern web applications deploy caches widely across the stack to speed up data access and improve throughput.

In this talk, I will discuss three trends in hardware, workload, and cache usage that shape the design of modern caches.
The slowdown of memory capacity scaling, the exploding rate of data creation, and the wide adoption of time-to-live (TTL) motivated the design of Segcache (https://paper.segcache.com and https://open-source.segcache.com).

Segcache is a new cache storage design that provides high memory efficiency and high throughput through three techniques: metadata sharing, which reduces the space spent on per-object metadata; approximate indexing, which removes short-lived data efficiently; and macro management, which delivers high throughput and linear scalability.
The design of Segcache is driven by the insight that a key-value cache is not simply a key-value store plus eviction, and it is the outcome of exploring different trade-offs between caching and storage.
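
To make the ideas of metadata sharing and approximate TTL indexing more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Segcache's actual implementation: the names `Segment`, `TtlBucketCache`, `bucket_width`, and `segment_capacity` are hypothetical, and eviction under memory pressure is left out entirely. Objects with similar TTLs are grouped into segments that store one creation time and one rounded TTL for the whole group, so expired data can be dropped a segment at a time instead of by scanning individual objects.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A group of objects that share one creation time and one rounded TTL."""
    created_at: float
    ttl: int                                   # shared by every object in the segment
    items: dict = field(default_factory=dict)  # key -> value, no per-object expiry

    def expired(self, now: float) -> bool:
        return now >= self.created_at + self.ttl


class TtlBucketCache:
    """Hypothetical sketch: segments are grouped into buckets by rounded TTL."""

    def __init__(self, bucket_width: int = 8, segment_capacity: int = 4):
        self.bucket_width = bucket_width
        self.segment_capacity = segment_capacity
        self.buckets = {}  # rounded TTL -> deque of segments, oldest first
        self.index = {}    # key -> segment that holds the current value

    def _round_ttl(self, ttl: int) -> int:
        # Round down so an object never lives longer than requested.
        # TTLs smaller than one bucket width are kept exact.
        rounded = (ttl // self.bucket_width) * self.bucket_width
        return rounded if rounded > 0 else ttl

    def set(self, key, value, ttl: int, now: float) -> None:
        # Drop any previous copy so each key lives in exactly one segment.
        old = self.index.pop(key, None)
        if old is not None:
            old.items.pop(key, None)
        rounded = self._round_ttl(ttl)
        bucket = self.buckets.setdefault(rounded, deque())
        # Open a new segment when the bucket is empty or its newest one is full.
        if not bucket or len(bucket[-1].items) >= self.segment_capacity:
            bucket.append(Segment(created_at=now, ttl=rounded))
        bucket[-1].items[key] = value
        self.index[key] = bucket[-1]

    def get(self, key, now: float):
        seg = self.index.get(key)
        if seg is None or seg.expired(now):
            return None
        return seg.items[key]

    def expire(self, now: float) -> int:
        """Drop expired segments from the head of each bucket; return objects removed."""
        removed = 0
        for bucket in self.buckets.values():
            # Within a bucket all segments share a TTL, so the oldest expires first.
            while bucket and bucket[0].expired(now):
                seg = bucket.popleft()
                for key in seg.items:
                    del self.index[key]
                removed += len(seg.items)
        return removed


cache = TtlBucketCache(bucket_width=10, segment_capacity=2)
cache.set("a", 1, ttl=25, now=0)  # TTL rounded down to 20
cache.set("b", 2, ttl=29, now=1)  # same bucket, same segment as "a"
print(cache.get("a", now=19), cache.expire(now=20))
```

The trade-off in this sketch is that an object can expire somewhat earlier than its requested TTL, both because the TTL is rounded down and because it inherits the creation time of its segment. In exchange, the cache stores no expiry timestamp per object and removes expired data in bulk.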

We deploy Segcache both as a standalone distributed cache and as an embedded off-heap cache for JVM-based microservices at Twitter. The distributed cache deployment significantly reduces the resources (CPU and DRAM) provisioned for cache clusters; the embedded deployment allows us to speed up data access and reduce GC pause time.

Speaker

Juncheng Yang

Ph.D. student @CarnegieMellon, Focus on Efficiency and Performance, Previously @Twitter & @Cloudflare, Facebook Fellow

Juncheng is a fifth-year Ph.D. student at Carnegie Mellon University and a member of the Parallel Data Lab. His research studies and improves the efficiency, performance, and reliability of large-scale web applications, with a current focus on caching systems.

His work, in collaboration with Twitter, Meta, Google, and Akamai, has been published at academic conferences such as OSDI, NSDI, SOSP, and SoCC and has won multiple best paper awards.

One of these designs, Segcache, a high-throughput, space-efficient in-memory cache, has been deployed in production at Twitter.

Alongside his Ph.D. studies, he has worked at Twitter and Cloudflare on improving the efficiency of their cache infrastructure. His research is sponsored by Meta, and he is a Facebook Fellow.

From the same track

Session Backends

Backends in Dart

Monday Oct 24 / 10:35AM PDT

Dart's popularity has surged in the past few years, as it's the language behind Flutter, Google's cross-platform front-end framework. That's now driving a notion of 'Full Stack Dart': if you've spent time learning Dart for the front end, why not also use it for the back end?

Chris Swan

Engineer @atsigncompan, Previously Fellow @DXCTechnology, CTO & Director of R&D roles @CohesiveNetworks @UBS @Capital SCF and @Credit Suisse

Session Backends

24/7 State Replication

Monday Oct 24 / 11:50AM PDT

Systems that operate non-stop, 24/7, are standard in many consumer-facing industries. Often, though certainly not always, these systems have neither aggressive SLAs nor high-availability needs to the degree that some financial systems demand. But that is changing.

Todd Montgomery

Ex Researcher @Nasa, Engineering Fellow @ Adaptive Financial Consulting and a High Performance Distributed Systems Whisperer

Session Backends

Leveraging Determinism

Monday Oct 24 / 05:25PM PDT

Determinism is a very powerful concept when paired with fast business logic. We discuss both intuitive and not-so-obvious architecture choices that can be made to dramatically scale and simplify systems with these properties.

Frank Yu

Director of Engineering @Coinbase, Previously Principal Engineer and Director @FairX

Session

Panel: Building Modern Backends

Monday Oct 24 / 02:55PM PDT

Join today’s speakers in an open panel conversation about modern backends and the languages used to build them. Our panelists today come from backgrounds focused on high performance, legacy modernization, and low latency. The speakers span a variety of software languages and industries.

Chris Swan

Engineer @atsigncompan, Previously Fellow @DXCTechnology, CTO & Director of R&D roles @CohesiveNetworks @UBS @Capital SCF and @Credit Suisse

Frank Yu

Director of Engineering @Coinbase, Previously Principal Engineer and Director @FairX

Juncheng Yang

Ph.D. student @CarnegieMellon, Focus on Efficiency and Performance, Previously @Twitter & @Cloudflare, Facebook Fellow

Session Microservices

Data Mesh: Are We There Yet?

Monday Oct 24 / 04:10PM PDT

Standing at an inflection point is a magical experience. It’s where we look at what has come before, learn from it, and choose a new path. Data Mesh has motivated many organizations to stand at an inflection point of their approach to data.

Zhamak Dehghani

CEO and Founder @Stealth Startup, Data Mesh Founder, Author, Speaker
