Track:

Duration:

5:25pm - 6:15pm

Level:

Intermediate

Persona:

Data Scientist

Abstract

Netflix runs one of the largest big data analytics infrastructures in the public cloud. Our platform leverages the scalability, reliability, and flexibility of the cloud to move quickly and innovate.

In this talk, we will discuss the overall big data platform architecture and dive into the two key design choices that underpin our platform: storage and orchestration. We will discuss how we leverage S3 as our data warehouse storage layer. We rely on Parquet as our primary storage format and will cover the advantages of using Parquet on S3, along with many of the features and optimizations provided by this advanced file format.

We will also discuss Genie, our open source federated job management and orchestration layer. Every day, Netflix runs tens of thousands of jobs across numerous heterogeneous clusters (Hadoop, Presto, Spark, etc.). From Spark and Pig ETL SLA jobs to ad-hoc interactive queries on Presto to data movement with Sqoop or indexing with Druid, Genie orchestrates this diverse set of use cases across multiple clusters in our environment. Genie also helps us manage cluster and job lifecycles in the cloud.

Finally, we will cover where we plan to take Genie, including scaling job resources via Docker and more.

Speaker: Tom Gianos

Senior Software Engineer, Big Data Platform @Netflix

Tom began his career working on many projects ranging from web applications to big data genetics applications. His interest in big data led him to take a position at PayPal within their data technology organization. There he helped lead the development of their big data event transformation, storage and extraction platform. He has worked at Netflix for two years on the big data platform team. He leads development of Genie and has a passion for merging web and big data technologies to solve interesting distributed systems problems.

Find Tom Gianos at

Speaker page

Speaker: Dan Weeks

Leads Big Data Compute @Netflix

Daniel Weeks manages the Big Data Compute team at Netflix and is responsible for integrating and enhancing open source big data processing technologies including Spark, Presto, Hive and Hadoop. As an active member of the Apache community and Parquet PMC member, he works to improve the state of processing and storage technologies. Prior to joining Netflix, Daniel focused on research in big data solutions and distributed systems.

Find Dan Weeks at

Speaker page

Similar Talks

Senior Software Engineer, Playback Features @Netflix

Haley Tucker

Scaling Quality On Quora Using Machine Learning

Engineering Manager @Quora

Nikhil Garg

Query Understanding: a Manifesto

Data Scientist, Author of "Faceted Search"

Daniel Tunkelang

Iterative Design for Data Science Projects

Partner and Data Scientist @Datascope

Bo Peng

Using Mesos at Scale @ Apple

Senior Software Engineer @Apple

Elizabeth Lingg

Using Mesos at Scale @ Apple

Software Engineer @Apple

James Mulcahy

Amazon ECS: a Platform to Run Production Containers

Software Development Engineer @AmazonWebServices

Uttara Sridhar

Java SE 9: Continuing to Thrive in the Cloud!

Head of the Java Platform Development Team & VP @Oracle

Bernard Traversat

Scaling Dropbox

Software Engineer @Dropbox

Preslav Le

Tracks

Monday Nov 7

Architectures You've Always Wondered About

You know the names. Now learn lessons from their architectures

Distributed Systems War Stories

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.

Containers Everywhere

State of the art in Container deployment, management, scheduling

Art of Relevancy and Recommendations

Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.

Next Generation Web Standards, Frameworks, and Techniques

JavaScript, HTML5, WASM, and more... innovations targeting the browser

Optimize You

Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.

Tuesday Nov 8

Next Generation Microservices

What will microservices look like in 3 years? What if we could start over?

Java: Are You Ready for This?

Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.

Big Data Meets the Cloud

Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud

Evolving DevOps

Lessons/stories on optimizing the deployment pipeline

Software Engineering Softskills

Great engineers do more than code. Learn their secrets and level up.

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Wednesday Nov 9

Architecting for Failure

Your system will fail. Take control before it takes you with it.

Stream Processing

Stream Processing, Near-Real Time Processing

Bare Metal Performance

Native languages, kernel bypass, tooling - make the most of your hardware

Culture as a Differentiator

The why and how for building successful engineering cultures

//TODO: Security <-- fix this

Building security from the start. Stories, lessons, and innovations advancing the field of software security.

UX Reimagined

Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.

Conference for Professional Software Developers

Presentation: Petabytes Scale Analytics Infrastructure @Netflix
