Accelerating Netflix Data: A Cross-Team Journey from Offline to Online

Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.

The presentation titled Accelerating Netflix Data: A Cross-Team Journey from Offline to Online discusses the significant transformation of Netflix's data infrastructure from offline to online systems. The presentation highlights the collaborative effort and innovative strategies used to enhance data deployment efficiency.

  • Introduction

    The presentation opens by explaining the need for efficient data deployment in Netflix's backend to meet consumer demand within acceptable latency requirements. The shift from offline pipelines to a data abstraction layer was essential to improving system performance and data management.

  • Key Concepts
    • Cross-functional collaboration was crucial in defining requirements, overcoming challenges, and implementing solutions seamlessly.
    • The transformation led to a 99% reduction in data deployment time, along with a 70% cost saving.
    • The project necessitated rethinking architecture and core principles, aligning diverse stakeholders, and adapting prototypes rapidly.
  • Innovative Solutions
    • An architectural pivot involved transforming large datasets from offline to online serving systems using a framework called "capture, conversion, deployment."
    • The solution optimized data formats for deployment and integrated mechanisms for safety, observability, and validation.
    • Implemented a key-value abstraction to provide a stable interface for accessing databases, which minimized the need for direct application-level interventions.
  • Challenges and Strategies
    • Identifying and exploiting access patterns and aligning stakeholder expectations were key to the project's success.
    • Performance bottlenecks were overcome through strategic handling of massive datasets and effective use of cloud infrastructure.
  • Outcomes and Future Directions
    • The transition enabled faster data set provisioning, improved ML platform development, and facilitated robust business innovations.
    • Future aims include extending methodologies to mutable data sets and optimizing deployment processes for various data stores.
  • Conclusion

    The presentation concludes by emphasizing the importance of confidence and safety in technological transitions, and highlights the role of abstraction in achieving seamless, large-scale migrations.

This is the end of the AI-generated content.


Abstract

At Netflix, certain use cases demand the rapid transfer of massive datasets—such as 50 TB—from offline to online systems. Doing this efficiently, without disrupting applications interacting with our online systems, presents a significant challenge. Traditional data transfer methods, such as using batch processing systems and loading data into online systems via PUT APIs, posed serious scalability and cost hurdles, often leading to performance bottlenecks and degraded system efficiency. To overcome these limitations, an innovative architectural solution was developed. This approach involved pre-processing offline data into an optimized format by transforming it into RocksDB SST files, staging these files in the cloud, and enabling direct, on-demand ingestion into the serving system.
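The capture, conversion, and deployment flow described above can be sketched at a very high level. The following toy Python example is purely illustrative: all names are hypothetical, and a sorted, length-prefixed binary file stands in for a real RocksDB SST file (which a production pipeline would build with RocksDB's `SstFileWriter` and bulk-load via external-file ingestion). It shows why presorting data offline enables bulk ingestion instead of millions of individual PUTs:

```python
# Toy sketch of a "capture, conversion, deployment" pipeline.
# A sorted, length-prefixed blob stands in for an SST file; a dict
# stands in for the online serving store. Illustrative names only.
import io
import struct

def capture(records):
    """Capture: collect offline key-value records (e.g. from a batch job)."""
    return list(records)

def convert(records):
    """Conversion: sort by key and serialize into an SST-like file.

    SST files require keys in sorted order, which is what lets a store
    ingest the file directly rather than replaying per-key PUT calls.
    """
    buf = io.BytesIO()
    for key, value in sorted(records):
        for field in (key, value):
            data = field.encode("utf-8")
            buf.write(struct.pack(">I", len(data)))  # 4-byte length prefix
            buf.write(data)
    return buf.getvalue()

def deploy(blob, store):
    """Deployment: bulk-ingest the staged file into the online store."""
    view = memoryview(blob)
    offset = 0
    while offset < len(view):
        fields = []
        for _ in range(2):  # read key, then value
            (length,) = struct.unpack_from(">I", view, offset)
            offset += 4
            fields.append(bytes(view[offset:offset + length]).decode("utf-8"))
            offset += length
        store[fields[0]] = fields[1]
    return store

records = capture([("movie:2", "Stranger Things"), ("movie:1", "Dark")])
online_store = deploy(convert(records), {})
```

The key design point mirrored here is that the expensive work (sorting and serialization) happens offline, so the online system only performs a cheap, sequential bulk load.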

This process necessitated navigating complex internal discussions and aligning diverse stakeholders on new technical strategies. It also required rapidly adapting initial prototypes to meet urgent customer needs: prioritizing speed to onboard those customers onto the prototype first, then shifting efforts toward building a robust, scalable production system. Crucially, cross-functional collaboration proved essential. Teams from various domains worked closely to define requirements, overcome challenges, and ensure seamless implementation.

Ultimately, this collaborative effort led to the successful deployment of a system that provides enhanced performance, reducing data deployment time by 99% (from days to just 30 minutes) and cutting costs by 70%. This presentation will delve into the journey of transforming data pipelines at scale, highlighting the key technical strategies, strategic decisions, and crucial team efforts that made this significant improvement possible.

Key Takeaways:

  • Discover the challenges and solutions for large-scale data movement from batch storage to online serving systems.
  • Understand the innovative architectural approach to improve data deployment efficiency.
  • Learn about strategic decision-making and problem-solving in a high-pressure environment.
  • Appreciate the importance of cross-functional team collaboration in solving complex engineering problems.

Speaker

Rajasekhar Ummadisetty

Software Engineer @Netflix - Driving Scalable Data Abstractions, Leader in Distributed Systems and Data Management, Previously @Amazon and @Facebook

Raj Ummadisetty is a leading professional with over a decade of experience in solving distributed systems problems at scale. He currently leads the development of data abstractions at Netflix, focusing on scalable, high-performance solutions. Previously, Raj contributed significantly at Amazon and Facebook, where he honed his expertise in building systems at scale. He holds advanced degrees from Carnegie Mellon University and IIT Roorkee, providing a solid academic foundation. Known for his passion for continuous learning and staying abreast of industry trends, Raj consistently drives innovation and efficiency, making him a key player in distributed systems and data management.


Speaker

Ken Kurzweil

Software Engineer @Netflix - Leading a Data Movement Team Focused on Data Infrastructure Innovation, Previously @Amazon, @Shutterfly, and @Gannett Media

Ken Kurzweil is a seasoned technology leader with 25 years of experience building high-performance, large-scale distributed systems. He currently leads a Data Movement team at Netflix, focusing on data infrastructure innovation that ensures operational reliability and scalability. Prior to Netflix, Ken held engineering roles at Amazon, Shutterfly, Gannett Media, and contributed to pioneering startups like Lulu.com. Ken is recognized for his relentless focus on operational excellence and his mentorship of engineering teams, consistently pushing the boundaries of distributed data systems.


From the same track

Session AI/ML

Modernizing Relevance at Scale: LinkedIn’s Migration Journey to Serve Billions of Users

Tuesday Nov 18 / 11:45AM PST

How do you deliver relevant and personalized recommendations to nearly a billion professionals—instantly, reliably, and at scale? At LinkedIn, the answer has been a multi-year journey of architectural reinvention.

Nishant Lakshmikanth

Engineering Manager @LinkedIn, Leading Infrastructure for "People You May Know" and "People Follows", Previously @AWS and @Cisco

Session Architecture

Monolith Down: Cleaning Up After the Great Identity Migration Disaster

Tuesday Nov 18 / 10:35AM PST

One does not simply migrate a monolith. Imagine a team working on a monolith-to-microservices migration of a healthcare portal. A foundational first step - migrating to a commercial identity provider - takes 9 months, only to bring the entire portal crashing down on release day.

Sonya Natanzon

VP of Engineering @Heartflow, Decomplexifier, Software Architect, Healthcare and Life Sciences Specialist, and International speaker

Session

Unconference: Navigating Major Architecture Migrations

Tuesday Nov 18 / 01:35PM PST

Session Databases

Stripe’s Docdb: How Zero-Downtime Data Movement Powers Trillion-Dollar Payment Processing

Tuesday Nov 18 / 02:45PM PST

Stripe processes over $1 trillion in payments annually with industry-leading reliability, powered by its custom-built document database, DocDB, built on top of open source MongoDB. Stripe's DocDB serves over five million queries per second from Stripe’s product applications.

Jimmy Morzaria

Staff Software Engineer @Stripe, Previously Software Engineer on Amazon QLDB and Amazon Managed Streaming for Kafka

Session Migration

Migrating Uber Eats Feeds to Webview

Tuesday Nov 18 / 03:55PM PST

Uber Eats has many surfaces developed using native-first design. Historically these were built on the Android and iOS stacks. To accelerate development and enable rapid iteration and experimentation, while preserving the native-first design, a webview-powered stack was developed.

Nick DiStefano

Sr Staff Engineer @Uber, Previously iOS Lead @Tumblr