Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Petabytes Scale Analytics Infrastructure @Netflix
Duration
Level:
- Intermediate
Persona:
- Data Scientist
Abstract
Netflix runs one of the largest big data analytics infrastructures in the public cloud. Our platform leverages the scalability, reliability, and flexibility of the cloud so that we can move quickly and innovate.
In this talk, we will discuss the overall big data platform architecture and dive into the two key design choices that underpin our platform: storage and orchestration. We will discuss how we leverage S3 as our data warehouse storage layer. We rely on Parquet as our primary storage format and will cover the advantages of using Parquet on S3, along with many of the features and optimizations this advanced file format provides.
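One reason Parquet suits S3 is that its footer can store per-column min/max statistics for each row group, so a reader can decide which row groups to fetch before downloading them. The following is a simplified, hypothetical sketch of that pruning step in plain Python; `RowGroupStats` and `row_groups_to_read` are illustrative names, not part of any Parquet library, and the sketch assumes an inclusive range predicate on a single column.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group,
    modeled on the column-chunk statistics a Parquet footer can carry."""
    row_group: int
    min_value: Any
    max_value: Any


def row_groups_to_read(stats: List[RowGroupStats], lo: Any, hi: Any) -> List[int]:
    """Return the row groups whose [min, max] range may overlap the predicate
    lo <= value <= hi. Every other row group provably has no matching rows
    and can be skipped without fetching its bytes."""
    return [
        s.row_group
        for s in stats
        if s.max_value >= lo and s.min_value <= hi
    ]


if __name__ == "__main__":
    # ISO-8601 date strings compare correctly as plain strings.
    footer = [
        RowGroupStats(0, "2017-01-01", "2017-03-31"),
        RowGroupStats(1, "2017-04-01", "2017-06-30"),
        RowGroupStats(2, "2017-07-01", "2017-09-30"),
    ]
    print(row_groups_to_read(footer, "2017-05-01", "2017-05-31"))  # [1]
```

Because each skipped row group is a byte range that never has to be read from S3, this kind of pruning can reduce both latency and data transfer for selective queries.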
We will also discuss Genie, our open source federated job management and orchestration layer. Every day Netflix runs tens of thousands of jobs across numerous heterogeneous clusters (Hadoop, Presto, Spark, etc.). From Spark and Pig ETL SLA jobs to ad hoc interactive queries on Presto, data movement with Sqoop, and indexing with Druid, Genie orchestrates this diverse set of use cases across multiple clusters in our environment. Genie also helps us manage cluster and job lifecycles in the cloud.
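An orchestration layer like Genie has to route each job to a suitable cluster among many. A minimal sketch of that idea, assuming routing by tag matching against a priority-ordered list of criteria, might look like this. The `Cluster` type, the `select_cluster` function, and the tag strings are hypothetical illustrations, not Genie's actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cluster:
    """Hypothetical cluster record: a name plus the tags it advertises."""
    name: str
    tags: Set[str] = field(default_factory=set)


def select_cluster(clusters: List[Cluster], criteria: List[Set[str]]) -> Optional[Cluster]:
    """Try each tag criterion in priority order and return the first cluster
    whose tags include every tag in that criterion; None if nothing matches."""
    for wanted in criteria:
        for cluster in clusters:
            if wanted <= cluster.tags:  # subset test: all requested tags present
                return cluster
    return None


if __name__ == "__main__":
    clusters = [
        Cluster("etl-yarn", {"type:yarn", "sched:sla"}),
        Cluster("adhoc-presto", {"type:presto", "sched:adhoc"}),
    ]
    # Prefer an ad hoc YARN cluster; fall back to any YARN cluster.
    chosen = select_cluster(clusters, [{"type:yarn", "sched:adhoc"}, {"type:yarn"}])
    print(chosen.name)  # etl-yarn
```

Ordering the criteria lets a job state a preference with a fallback. Returning the first match is a simplification; a production scheduler would likely balance load across all matching clusters.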
Finally, we will cover where we plan to take Genie, including scaling job resources via Docker, and more.
Tracks
Monday Nov 7
- Architectures You've Always Wondered About
  You know the names. Now learn lessons from their architectures.
- Distributed Systems War Stories
  “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport
- Containers Everywhere
  State of the art in container deployment, management, and scheduling
- Art of Relevancy and Recommendations
  Lessons on the adoption of practical, real-world machine learning practices. AI & deep learning explored.
- Next Generation Web Standards, Frameworks, and Techniques
  JavaScript, HTML5, WASM, and more: innovations targeting the browser
- Optimize You
  Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
- Next Generation Microservices
  What will microservices look like in 3 years? What if we could start over?
- Java: Are You Ready for This?
  Real-world lessons & prepping for JDK 9. Reactive code in Java today, performance/optimization, where Unsafe is heading, & the JVM compiler interface.
- Big Data Meets the Cloud
  Overviews and lessons learned from companies that have implemented their big data use cases in the cloud
- Evolving DevOps
  Lessons/stories on optimizing the deployment pipeline
- Software Engineering Soft Skills
  Great engineers do more than code. Learn their secrets and level up.
- Modern CS in the Real World
  Applied, practical, & real-world dive into industry adoption of modern CS ideas
Wednesday Nov 9
- Architecting for Failure
  Your system will fail. Take control before it takes you with it.
- Stream Processing
  Stream processing and near-real-time processing
- Bare Metal Performance
  Native languages, kernel bypass, tooling: make the most of your hardware
- Culture as a Differentiator
  The why and how for building successful engineering cultures
- Security
  Building security from the start. Stories, lessons, and innovations advancing the field of software security.
- UX Reimagined
  Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.