Presentation: How Netflix Directs 1/3rd of Internet Traffic
Key Takeaways
- Hear real-world implementation details around microservices at Netflix
- Learn how Netflix solves some of the challenges around streaming and deploying content
- Discover Netflix's approach to rules engines and configuration-based deployment
Abstract
Every second, thousands of Netflix members hit the play button to stream content on more than 1,000 different Netflix device types. These moments are facilitated by microservices that manage the playback experience and a purpose-built Content Delivery Network distributed over hundreds of physical locations. In this talk, we'll walk through the architecture that makes streaming happen, while highlighting interesting lessons and design patterns that can be widely applied.
What you'll hear about:
- Embracing microservices with loosely-coupled APIs to improve development velocity and ease operations.
- Applying rules engines and configuration-based deployments to create a customized viewing experience which accommodates each member’s tastes, device capabilities and location.
- Using data about viewing habits and network topology to distribute content over CDN appliances in a cost-effective and timely manner.
- Using real-time network and capacity signals to pick the best CDN appliances a member should stream content from.
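As a rough illustration of the last point, the sketch below ranks CDN appliances for one playback session from health, capacity, and network signals. It is a minimal sketch, not Netflix's actual steering logic: the `Appliance` fields, the `network_cost` metric, and the 0.9 utilization cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    """A hypothetical CDN appliance with the signals assumed for this sketch."""
    name: str
    healthy: bool
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    network_cost: int   # assumed metric: lower means closer to the member


def rank_appliances(appliances, max_utilization=0.9, limit=3):
    """Return up to `limit` appliance names, best candidate first.

    Unhealthy or nearly full appliances are dropped; the rest are ordered
    by network cost, with utilization as the tie-breaker.
    """
    candidates = [
        a for a in appliances
        if a.healthy and a.utilization < max_utilization
    ]
    candidates.sort(key=lambda a: (a.network_cost, a.utilization))
    return [a.name for a in candidates[:limit]]


appliances = [
    Appliance("isp-cache-1", healthy=True, utilization=0.95, network_cost=1),
    Appliance("isp-cache-2", healthy=True, utilization=0.40, network_cost=1),
    Appliance("ix-cache-1", healthy=True, utilization=0.20, network_cost=5),
    Appliance("ix-cache-2", healthy=False, utilization=0.10, network_cost=5),
]
ranking = rank_appliances(appliances)
print(ranking)
```

Here the nearly full and the unhealthy appliances are skipped, so the member would be pointed at the closest appliance that still has headroom, with a more distant one as a fallback.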
Interview with Mohit Vora & Haley Tucker
QCon: What is your role today?
Mohit: I am responsible for what we internally refer to as the Control Plane of our Content Delivery Network. In classical networking speak, the Control Plane refers to the part of the architecture responsible for making routing decisions. So that is something that my team works on. We decide where each playback session streams from based on a number of signals. Besides that, we also make predictions on content popularity and proactively cache that content in opportune locations throughout the network.
Haley: When a customer clicks play, I’m responsible for providing them with the best possible playback experience for their device and user preferences, while still enforcing contract and business policies. I own and operate the systems that I will talk about which means I’m doing everything from requirements and coding for new features to production on-call triage.
QCon: Mohit, what is Open Connect?
Mohit: Open Connect is Netflix's purpose-built CDN. It's a directed caching CDN. Together with the Control Plane, it comprises thousands of cache nodes and networking equipment located in ISP and Internet Exchange data centers.
QCon: You mentioned Open Connect is a directed caching CDN. What do you mean by Directed Caching?
Mohit: When you can predict which data is going to be accessed more often than other data, you can proactively direct that data to the appropriate tiers of your caching infrastructure in advance. This contrasts with moving data around based on actual cache hits and misses.
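The idea can be sketched in a few lines: given predicted popularity scores, content is assigned to cache tiers ahead of time instead of being pulled in on a cache miss. This is a minimal sketch under stated assumptions, not Open Connect's implementation: the tier names, the title-count capacities, and the scores are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CacheTier:
    """A hypothetical cache tier; capacity is counted in titles for simplicity."""
    name: str
    capacity: int
    titles: list = field(default_factory=list)


def plan_placement(predicted_popularity, tiers):
    """Assign the most popular titles to the closest tiers, in advance.

    predicted_popularity: dict of title -> predicted score (higher is more popular)
    tiers: list of CacheTier ordered from closest-to-member to farthest
    """
    ranked = sorted(
        predicted_popularity, key=predicted_popularity.get, reverse=True
    )
    position = 0
    for tier in tiers:
        # Each tier takes the next slice of the ranking, up to its capacity.
        tier.titles = ranked[position:position + tier.capacity]
        position += tier.capacity
    return tiers


tiers = plan_placement(
    {"title_a": 0.9, "title_b": 0.2, "title_c": 0.7, "title_d": 0.4},
    [CacheTier("edge", 1), CacheTier("regional", 2), CacheTier("origin", 10)],
)
for tier in tiers:
    print(tier.name, tier.titles)
```

A reactive cache would instead fill itself only after members request a title; here the placement is decided from the prediction before any request arrives.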
QCon: Some might think what you plan to discuss are really Netflix-only scale problems. Are these things others could benefit from and use in their architectures?
Haley: Yes, I think so. We are focusing on lessons that can be applied regardless of scale, and I have definitely worked on systems in the past that could have benefited from similar changes.
QCon: What about Directed Caching? That’s a pretty specialized use case.
Mohit: With the popularity metric of our catalog, we are able to pre-position content close to our subscribers using Directed Caching, and hence reduce network costs for Internet Service Providers. We are able to achieve no cache churn in the serving state (i.e. no reactive caching). Most high-scale distributed architectures that utilize caching have highly predictable access patterns (think something like cache pre-warming). Directed Caching is applicable to them.
QCon: Haley, how does Netflix “manage the playback experience”?
Haley: We set up the context for the player which allows it to stream. This context contains the video, audio, and text assets that are displayed to the customer in the format that the device needs. It also contains recommendations for which audio and subtitles to start playback with based on our previous signals. In addition to setting up the context, these microservices are engaged for each phase of the lifecycle of a session (license, session events, bookmarks, etc.).
QCon: What types of things will you discuss in your talk that relate to managing the playback experience?
Haley: We use rules engines for customized viewing experiences and leverage configuration deployments.
For example, rules are used for things like filtering (removing streams or bad encodes that have been identified as causing problems for a particular device) and for enabling or disabling features at a fine-grained level for devices or different UIs that may or may not support them.
Our rules are all deployed via a pub-sub mechanism to all services running our code. There are times, for example, when we receive a call that a device is having issues with a particular language track. We can put a filter in place and, after about 5 minutes, that language track is filtered out for just the subset of devices having problems.
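A filter of this kind might look roughly like the following. This is a minimal sketch, not Netflix's rules engine: `FilterRule`, `RuleSubscriber`, the `publish` callback standing in for the pub-sub delivery, and the device names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    kind: str      # "audio", "video", or "text"
    language: str


@dataclass(frozen=True)
class FilterRule:
    """Hypothetical rule: drop streams of one kind and language for listed device types."""
    device_types: frozenset
    kind: str
    language: str

    def blocks(self, device_type, stream):
        return (
            device_type in self.device_types
            and stream.kind == self.kind
            and stream.language == self.language
        )


class RuleSubscriber:
    """Holds the latest published rule set; publish() stands in for a pub-sub callback."""

    def __init__(self):
        self.rules = []

    def publish(self, rules):
        # Replace the whole rule set, as a configuration push would.
        self.rules = list(rules)

    def filter_streams(self, device_type, streams):
        return [
            s for s in streams
            if not any(rule.blocks(device_type, s) for rule in self.rules)
        ]


streams = [Stream("audio", "en"), Stream("audio", "fr"), Stream("text", "fr")]
service = RuleSubscriber()
service.publish([FilterRule(frozenset({"tv_model_x"}), "audio", "fr")])

print(service.filter_streams("tv_model_x", streams))  # French audio removed
print(service.filter_streams("phone_y", streams))     # other devices unaffected
```

Because the rule is data rather than code, it can be changed without redeploying the services that evaluate it.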
Additionally, one of the changes we recently made to our architecture is to allow clients to control their own protocols. So, instead of a one-size-fits-all API, client protocols live in Groovy scripts which are deployed in a manner similar to configuration files. This allows TVs to integrate with us in a manner different from iPhones, and they can emit data back to the client in the format that they want it.
QCon: Can you give me an idea of some of the principles and patterns we will hear about in your talk?
Haley: We are focused on lessons learned for this talk, so we would like people to come away with an understanding of the problems we faced and how we addressed them.
Mohit: Some of the things you will hear include:
- Trend toward microservices, configuration, and improved flexibility
- Challenges around consistent hashing
- Employing a control loop feedback system for load balancing
- Our simplistic model for predicting content popularity over time
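For readers unfamiliar with the consistent hashing item above, the sketch below shows the basic technique: keys and nodes are hashed onto the same ring, and each key is served by the next node point clockwise. It is a generic illustration only; the node names, the use of MD5, and the 100 virtual points per node are assumptions, and it does not represent how Netflix applies or tunes the technique.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """A minimal consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"title-{i}" for i in range(1000)]
before = HashRing(["cache-1", "cache-2", "cache-3"])
after = HashRing(["cache-1", "cache-2", "cache-3", "cache-4"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding cache-4")
```

The property worth noting is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes, which is what keeps cache churn low when capacity changes.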
Similar Talks
Tracks
Covering innovative topics
Monday Nov 16
- Architectures You've Always Wondered About
  Silicon Valley to Beijing: Exploring some of the world's most intriguing architectures
- Applied Machine Learning
  How to start using machine learning and data science in your environment today. Latest and greatest best practices.
- Browser as a platform (Realizing HTML5)
  Exciting new standards like Service Workers, Push Notifications, and WebRTC are making the browser a formidable platform.
- Modern Languages in Practice
  The rise of 21st-century languages: Go, Rust, Swift
- Org Hacking
  Our most innovative companies reimagining the org structure
- Design Thinking
  Level up your approach to problem solving and leave everything better than you found it.
Tuesday Nov 17
- Containers in Practice
  Build resilient, reactive systems one service at a time.
- Architecting for Failure
  Your system will fail. Take control before it takes you with it.
- Modern CS in the Real World
  Real-world industry adoption of modern CS ideas
- The Amazing Potential of .NET Open Source
  From language design in the open to Rx.NET, there is amazing potential in an Open Source .NET
- Optimizing You
  Keeping life in balance is always a challenge. Learning lifehacks
- Unlearning Performance Myths
  Lessons on the reality of performance, scale, and security
Wednesday Nov 18
- Streaming Data @ Scale
  Real-time insights at Cloud Scale & the technologies that make them happen!
- Taking Java to the Next Level
  Modern, lean Java. Focuses on topics that push Java beyond how you currently think about it.
- The Dark Side of Security
  Lessons from your enemies
- Taming Distributed Architecture
  Reactive architectures, CAP, CRDTs, consensus systems in practice
- JavaScript Everywhere!
  JavaScript is everywhere. Learn why.
- Culture Reimagined
  Lessons on building highly effective organizations