Keynote: ETL is dead; long-live streams



What happens if you take everything that is happening in your company—every click, every database change, every application log—and make it all available as a real-time stream of well-structured data?

I will discuss the experience at LinkedIn and elsewhere moving from batch-oriented ETL to real-time streams using Apache Kafka. I will talk about how the design and implementation of Kafka were driven by this goal of acting as a real-time platform for event data. I will cover some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn, supporting thousands of engineers, applications, and data systems in a self-service fashion.
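The core abstraction behind this platform is the append-only, offset-addressed log, from which many independent consumers can read at their own pace. As a hedged illustration only (a toy in-memory model, not the actual Kafka API; the `Log` class and its methods are invented for this sketch):

```python
class Log:
    """Toy append-only log: each record gets a monotonically
    increasing offset, and each consumer tracks its own position."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # Consumers pull from an offset they manage themselves
        return self.records[offset:offset + max_records]


log = Log()
log.append({"event": "click", "member": 42})
log.append({"event": "page_view", "member": 7})

# Two independent consumers read from different offsets, so the
# same stream of events can feed many downstream systems at once.
assert log.read(0) == [{"event": "click", "member": 42},
                       {"event": "page_view", "member": 7}]
assert log.read(1) == [{"event": "page_view", "member": 7}]
```

Because the log retains records and consumers manage their own offsets, adding a new downstream system is just adding a new reader, which is what makes the self-service model possible.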

I will describe how real-time streams can become the source of ETL into Hadoop or a relational data warehouse, and how real-time data can supplement the role of batch-oriented analytics in Hadoop or a traditional data warehouse.
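In this model, the "load" step of ETL becomes a consumer that continuously writes stream records into a warehouse table. A minimal sketch of that idea, using an in-memory SQLite database to stand in for the warehouse (the event tuples and table name are made up for illustration):

```python
import sqlite3

# Simulated event stream, e.g. page views arriving from a feed
events = [
    ("page_view", "/jobs", 1),
    ("page_view", "/feed", 2),
    ("page_view", "/jobs", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (event TEXT, page TEXT, member INTEGER)")

# The "L" of ETL: load events into the warehouse table as they arrive
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", events)

# Batch-style analytics then runs over the continuously loaded data
rows = conn.execute(
    "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('/feed', 1), ('/jobs', 2)]
```

The same stream that feeds the warehouse can feed Hadoop or any other system, so batch analytics becomes one consumer among many rather than the end of a bespoke pipeline.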

I will also describe how applications and stream processing systems such as Storm, Spark, or Samza can make use of these feeds for sophisticated real-time data processing as events occur.
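A common pattern in these systems is windowed aggregation over the event stream. As a rough sketch of the idea (pure Python, not the Storm, Spark, or Samza APIs; the function and sample events are invented for illustration):

```python
from collections import Counter

def tumbling_window_counts(events, window_size):
    """Group timestamped events into fixed-size tumbling windows
    and count events per (window_start, key) as they arrive."""
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_size)
        counts[(window_start, key)] += 1
    return counts

# (timestamp_in_seconds, event_type)
stream = [(1, "click"), (3, "click"), (7, "view"), (12, "click")]
counts = tumbling_window_counts(stream, window_size=10)

assert counts[(0, "click")] == 2   # two clicks in window [0, 10)
assert counts[(0, "view")] == 1
assert counts[(10, "click")] == 1  # one click in window [10, 20)
```

Real stream processors add the hard parts this sketch omits: partitioned parallelism, fault-tolerant state, and handling of late or out-of-order events.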

Similar Talks


Senior Software Engineer, Playback Features @Netflix
Senior Software Engineer @Netflix
Principal Data Analysis Leader @Infolace
Director of Design @BigNerdRanch, focused on Mobility & UX
UX Lead, Interaction Design Specialist @Fjord
Software Engineer @Instagram
Chief Architect @Slack, previously @Facebook
PMC Member/Committer @SamzaStream & Distributed Systems Engineer @LinkedIn



Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9

Conference for Professional Software Developers