Track: Stream Processing


Day of week:

Scalable stream processing has become essential to many practical applications, including demand and supply forecasting in a marketplace, fraud detection, ad-hoc experiments, and real-time recommendations. The development and operation of a resilient, high-volume stream processing system requires many areas of expertise, including distributed systems, applied statistics, and system optimization. The past few years have seen the emergence of multiple solutions in this space, including Spark Streaming, Kafka Streams, Flink, and Apache Beam. What are these technologies and how do they fit together? This track will shed light on these new technologies and also offer interesting applications of stream processing. The ideal mix would be 1-2 technology stories, 2-3 streaming architecture stories, and 1 story about an interesting use of streaming. The prospective speakers are color-coded according to these groupings.

Track Host:
Danny Yuan
Real-time Streaming Lead @Uber
Danny Yuan is a software engineer at Uber. He’s currently working on streaming systems for Uber’s marketplace platform. Prior to joining Uber, he worked on building Netflix’s cloud platform. His work includes predictive autoscaling, a distributed tracing service, a real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix’s low-latency crypto services.

Track Host Interview

Can you tell me a bit about your background?

I am a software engineer at Uber, and I have been working on the streaming systems for Uber’s Marketplace platform. We take care of the end-to-end processing of our streams, including indexing, serving, analysis, and forecasting. Prior to joining Uber, I worked on building Netflix’s cloud platform. I worked on predictive autoscaling and distributed tracing, and I also worked on data processing pipelines.

You are running the stream processing track at QCon SF 2016. Can you tell us a bit about the track, the talks, and the speakers that you have got?

One really exciting theme is what’s new in this practical field. What are the new technologies, and what functionality do they provide for practitioners? Who are the major players?

In that regard, we have a talk called “Fundamentals of Stream Processing with Apache Beam,” given by two Google engineers. Apache Beam is about unifying batch processing and stream processing. 

We also have two related talks, about using Apache Flink at Uber, and using Spark Streaming at Netflix for real-time recommendations.

Another exciting talk is about DynamoDB Streams, to be given by two software engineers from Amazon. Attendees will learn about turning the database inside out. At the core of a database there is a commit log that records every operation. How can we turn this commit log into streams? Once we do, we can implement really cool functionality on top of it, such as reliable replication and distributed state machines.

Since we are talking about stream processing, we know there are scalability challenges, and we want to talk about how to scale stream processing systems. That is why we included another talk about scaling near-real-time analytics at both LinkedIn and Uber.

What do you want people to walk away from this track with?

Hopefully, they can learn what’s new in the field, but, more importantly, they can learn how to apply those new technologies and best practices to their own domain, and how to use them to construct effective and efficient architectures to solve real-world problems.

10:35am - 11:25am

by Tyler Akidau
Engineer @ Google & Founder/Committer on Apache Beam

by Frances Perry
Engineer @ Google & Founder/Committer on Apache Beam

Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. Originally based on years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel), it has now been donated to the OSS community at large.

Come learn the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task.

Beam provides a model that allows developers to...
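To make "out-of-order stream processing" concrete, here is a minimal plain-Python sketch of two general ideas behind it: assigning events to fixed windows by event time (when the event happened) rather than arrival time, and using a watermark to decide when a window is complete. This is not Beam's API; the function names, the 60-second window, and the simple "max event time seen minus a fixed delay" watermark heuristic are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed window width


def window_start(event_time, size=WINDOW_SIZE):
    """Align an event timestamp to the start of its fixed event-time window."""
    return event_time - (event_time % size)


def process(events, max_delay, size=WINDOW_SIZE):
    """Sum values per event-time window (illustrative sketch, not Beam).

    events: iterable of (event_time, value) pairs in *arrival* order.
    The watermark is a simple heuristic: the largest event time seen so far
    minus max_delay. A window is emitted once the watermark passes its end;
    events whose window has already been emitted are counted as dropped.
    Returns (emitted, dropped), where emitted lists (window_start, total).
    """
    open_windows = defaultdict(int)
    emitted = []
    dropped = 0
    watermark = float("-inf")
    for event_time, value in events:
        start = window_start(event_time, size)
        if start + size <= watermark:
            dropped += 1  # too late: its window was already emitted
            continue
        open_windows[start] += value
        watermark = max(watermark, event_time - max_delay)
        # sorted() copies the keys, so popping inside the loop is safe.
        for s in sorted(open_windows):
            if s + size <= watermark:
                emitted.append((s, open_windows.pop(s)))
    # End of input: flush whatever windows are still open.
    for s in sorted(open_windows):
        emitted.append((s, open_windows[s]))
    return emitted, dropped


# The event at t=10 arrives after t=65 but still lands in window [0, 60);
# the event at t=20 arrives after the watermark has passed 60, so it is dropped.
emitted, dropped = process([(5, 1), (65, 1), (10, 1), (130, 1), (20, 1)], max_delay=30)
# emitted == [(0, 2), (60, 1), (120, 1)], dropped == 1
```

The trade-off the sketch exposes is the one out-of-order processing has to manage: a larger `max_delay` drops fewer late events but delays every result.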

11:50am - 12:40pm

by Akshat Vig
Senior Software Engineer @Amazon

by Khawaja Shams
VP of Engineering, Amazon's @ElementalTech; previously Head of Engineering, NoSQL @AWSCloud

Replicated state machines are the cornerstone of distributed systems, and at the heart of a replicated state machine is a transactional log. While these logs are fundamental to replication in a distributed system, they have recently emerged as the glue for event-driven systems. In-order processing of logs enables multiple developers to build a diverse set of applications on top of the same stream, ranging from materialized views to event-driven systems and real-time analytics. This powerful...
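The core idea of this abstract, that replicas applying the same log in the same order converge on the same state while other consumers derive different views from that same log, can be sketched as a toy model in Python. Everything here is an assumption for illustration: the `LogEntry`, `Replica`, and `WriteCounter` names and the put/delete command set are hypothetical and do not reflect the DynamoDB Streams API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    seq: int                     # position in the log; applied in this order
    op: str                      # "put" or "delete" (hypothetical command set)
    key: str
    value: Optional[str] = None


@dataclass
class Replica:
    """Deterministic state machine: same log, same order -> same state."""
    state: dict = field(default_factory=dict)
    applied: int = 0             # sequence number of the last applied entry

    def apply(self, entry: LogEntry) -> None:
        # Enforce in-order application; a gap or reordering would let replicas diverge.
        if entry.seq != self.applied + 1:
            raise ValueError(f"out-of-order entry {entry.seq}, expected {self.applied + 1}")
        if entry.op == "put":
            self.state[entry.key] = entry.value
        elif entry.op == "delete":
            self.state.pop(entry.key, None)
        else:
            raise ValueError(f"unknown op {entry.op!r}")
        self.applied = entry.seq


class WriteCounter:
    """A different consumer of the same log: a materialized view of writes per key."""

    def __init__(self) -> None:
        self.writes = {}

    def apply(self, entry: LogEntry) -> None:
        self.writes[entry.key] = self.writes.get(entry.key, 0) + 1


log = [
    LogEntry(1, "put", "order-1", "placed"),
    LogEntry(2, "put", "order-2", "placed"),
    LogEntry(3, "put", "order-1", "shipped"),
    LogEntry(4, "delete", "order-2"),
]

# The primary and the view consume the log as it is written...
primary, view = Replica(), WriteCounter()
for entry in log:
    primary.apply(entry)
    view.apply(entry)

# ...while a follower can replay the same log later and still converge.
follower = Replica()
for entry in log:
    follower.apply(entry)

# primary.state == follower.state == {"order-1": "shipped"}
# view.writes == {"order-1": 2, "order-2": 2}
```

Keeping the state machine deterministic and the log strictly ordered is what makes replay safe; each consumer only needs to remember how far into the log it has read.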

1:40pm - 2:30pm

by Danny Yuan
Real-time Streaming Lead @Uber

At the core of Uber's architecture is a marketplace platform, which is responsible for fulfilling requests for rides, eats, deliveries, and more. To make our marketplace system efficient and intelligent, we need to extract timely and deep insights from our carefully curated data, and make them available for both people and machines to consume in real time.

This talk will discuss how Uber builds its next-generation stream processing system to support real-time analytics as well as...

2:55pm - 3:45pm

by Elliot Chow
Senior Software Engineer @Netflix

Recommendations play a vital role in a great Netflix experience. Traditionally, these recommendations are precomputed using viewing history, scroll activity, and a variety of other signals in a near-line fashion. To be able to react more quickly to surges and dips in interest, we introduced the Trending Now row, which uses real-time data as an additional signal for generating recommendations. This allows us to not only personalize...

4:10pm - 5:00pm

Open Space

5:25pm - 6:15pm

by Yi Pan
PMC Member/Committer @SamzaStream & Distributed Systems Engineer @LinkedIn

by Chinmay Soman
PMC Member/Committer @SamzaStream & Staff Software Engineer @Uber

Modern businesses are pushing the limits of decision making. Advancements in stream processing and OLAP (Online Analytical Processing) technologies have enabled faster insights into incoming data, powering near-real-time decisions. Many use cases, such as fraud detection, operational dashboards, financial incentive pipelines, and experimentation (A/B testing), need SQL-like access to such streaming data.

This talk focuses on how Uber and LinkedIn use Apache Samza, Apache...



Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9