Presentation: Petabytes Scale Analytics Infrastructure @Netflix

Duration: 5:25pm - 6:15pm

Abstract

Netflix runs one of the largest big data analytics infrastructures in the public cloud. Our platform leverages the scalability, reliability, and flexibility of the cloud to move quickly and innovate.

In this talk, we will discuss the overall big data platform architecture and dive into the two key design choices that underpin our platform: storage and orchestration. We will discuss how we leverage S3 as our data warehouse storage layer. We rely on Parquet as our primary storage format, and we will cover the advantages of using Parquet on S3 along with many of the features and optimizations this advanced file format provides.

We will also discuss Genie, our open source federated job management and orchestration layer. Every day Netflix runs tens of thousands of jobs across numerous heterogeneous clusters (Hadoop, Presto, Spark, etc.). From Spark and Pig ETL SLA jobs to ad-hoc interactive queries on Presto to data movement with Sqoop or indexing with Druid, Genie orchestrates this diverse set of use cases across multiple clusters in our environment. Genie also helps us manage cluster and job lifecycles in the cloud.

Finally, we will cover where we plan to take Genie next, including scaling job resources via Docker.

Speaker: Tom Gianos

Senior Software Engineer, Big Data Platform @Netflix

Tom began his career working on many projects ranging from web applications to big data genetics applications. His interest in big data led him to take a position at PayPal within their data technology organization. There he helped lead the development of their big data event transformation, storage and extraction platform. He has worked at Netflix for two years on the big data platform team. He leads development of Genie and has a passion for merging web and big data technologies to solve interesting distributed systems problems.

Speaker: Dan Weeks

Leads Big Data Compute @Netflix

Daniel Weeks manages the Big Data Compute team at Netflix and is responsible for integrating and enhancing open source big data processing technologies including Spark, Presto, Hive and Hadoop. As an active member of the Apache community and Parquet PMC member, he works to improve the state of processing and storage technologies. Prior to joining Netflix, Daniel focused on research in big data solutions and distributed systems.
