Presentation: In-memory caching: curb tail latency with Pelikan

Duration

10:35am - 11:25am

Key Takeaways

  • Gain insight into the first principles Twitter used in building Pelikan.
  • Understand the design decisions and architectural patterns behind Twitter’s replacement for Redis and Memcached.
  • Learn about techniques used to improve latency in high-throughput systems such as Twitter’s.

Abstract

In-memory caching in a distributed setting is ubiquitous, and users usually expect both high throughput and low latency from it. Actually achieving both requires understanding the details and subtleties of low-level system behavior, including memory management, threading model, locking, and network I/O.

Pelikan is a framework for implementing distributed caches such as Memcached and Redis. This talk discusses the system aspects that are important to the performance, especially the tail latencies, of such services. It also covers the decisions we made in Pelikan and how they help us reason about performance even before we run benchmarks.
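
As a back-of-the-envelope illustration of why tail percentiles deserve this attention (my own sketch, not material from the talk): in a fan-out read, one request waits on many cache shards, so the slowest shard sets the response time, and per-shard tail latency dominates end-to-end latency. A minimal Python sketch, assuming independent shards:

    # Illustrative fan-out math: the chance that at least one shard in a
    # fan-out responds at its own p99 latency or worse, assuming shards
    # behave independently.
    for shards in (1, 10, 100):
        p_slow = 1 - 0.99 ** shards
        print(f"{shards:>3} shards: {p_slow:6.1%} of requests hit a p99-or-worse shard")

With 100-way fan-out, roughly 63% of requests encounter at least one shard’s worst 1%, which is why p99 and p999, rather than averages, are the numbers to watch.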

Interview

Question: 
QCon: What is Pelikan?
Answer: 

Yao: Pelikan is the in-memory caching framework that I have been working on to replace the existing caching services at Twitter.

People know that Twitter is, historically, a very large Memcached shop. It is also a very large Redis shop, and we have done our own forks of both Memcached and Redis. As we accumulated more experience with these existing caching services, we developed a good idea of how to do it well. We realized that instead of having two solutions working toward the same purpose, we could unify them in a single architecture that is optimized for the use cases Twitter needs.

In short, Pelikan is the replacement for Memcached and Redis and everything they are responsible for at Twitter.

Question: 
QCon: So is it kind of a ground-up replacement for cache tools like Memcached and Redis?
Answer: 

Yao: I would describe it as twofold. From an architectural (or design) point of view, it is a clean-slate design. We came at it from first principles, asking questions like ‘What is the problem we are trying to solve with in-memory caching?’ and ‘What are the requirements?’ If we posed this problem as a new question today, how would we design an architecture that works for our scale and for the datacenter hardware we have, or that is available on the market? So from a design point of view, it is a clean slate.

However, new code is usually bad code (or if not bad, it’s ugly code). There is significant risk in using new code in a critical piece of the architecture. In terms of implementation, we are not so strict; we actually prefer using existing code if it fits into the design well. So let’s say you have an existing networking library that works really well: there is no need to write your own library to handle connections. You can just take the code that handles connections and drop it into your design, because it serves exactly the same purpose. With implementation, we are much more practical and utilitarian in examining the solutions we already have, including Memcached and Twemcache (which is our fork of Memcached). We will take a look at whoever has a good library out there that does something similar; gRPC is another one. There is actually a lot of code that does more or less the same thing, so we just look at all the codebases we can find. If there is something we can use, we will use it, and we will give credit to the open source project.

In reality, what happened is that we started by copying code and writing new code at about a one-to-one ratio. We were reusing about 50% of our code from existing open source projects; I lost track of the ratio over time, because we then started refactoring and polishing the code. But we started with 50% reuse. The new-code ratio has gone up slightly since then because, as we add more features and rewrite, we tend to bring more of our own code into the project.

Question: 
QCon: How would you describe the target audience or core persona?
Answer: 

Yao: I have two groups of people in mind. The first is architects and designers who are designing or re-designing systems from the ground up; considering these aspects will help them avoid pitfalls they would otherwise regret later. The second group is SREs (or the people who actually need to operate these systems). This group organically showed up after I gave other similar talks. They are the ones getting the question of "why is my p999 high?" If the design is flawed (or if you run in a very unstable environment and share resources with other processes), a lot of things are beyond your control. These are the people who will benefit if the service, and especially its quality of service, can be guaranteed more strongly or reasoned about more easily.
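
For context, p999 is shorthand for the 99.9th-percentile latency. Here is a minimal sketch of how such a percentile is computed and why it can look alarming while the mean looks healthy (my own illustration with synthetic numbers, not Pelikan code):

    import math
    import random
    import statistics

    def percentile(samples, p):
        """Nearest-rank percentile of a list of latency samples."""
        ranked = sorted(samples)
        return ranked[max(0, math.ceil(p * len(ranked)) - 1)]

    random.seed(1)
    # Synthetic workload: 100,000 requests at ~100us each, except that
    # 0.2% of them stall for 50ms (say, blocked on a lock or a slow syscall).
    latencies_us = [random.gauss(100, 10) for _ in range(100_000)]
    for i in range(0, len(latencies_us), 500):
        latencies_us[i] = 50_000.0

    print(f"mean: {statistics.mean(latencies_us):8.1f} us")   # ~200 us, looks fine
    print(f"p99:  {percentile(latencies_us, 0.99):8.1f} us")  # ~125 us, looks fine
    print(f"p999: {percentile(latencies_us, 0.999):8.1f} us") # 50000.0 us: the stalls

A service like this would pass any throughput or average-latency check, yet one request in five hundred takes 500 times longer than typical, which is exactly the kind of outlier the p999 question is about.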

Question: 
QCon: What’s the goal of your talk?
Answer: 

Yao: Since this talk is in the performance track, the goal of my talk is to focus on the performance aspects of Pelikan. I will focus on latency, because there are services that deliver very high throughput but, if you look at their latency (especially their tail latencies), there tend to be outliers. I believe Redis sometimes has this problem. Pretty much all the questions, discussions, and troubleshooting sessions are about why something is unexpectedly slow. So that’s really what I want to talk about.

Speaker: Yao Yue

Distributed Systems Engineer Working on Cache @Twitter

