Track: Stream Processing
Scalable stream processing has become essential to many practical applications, including demand and supply forecasting in a marketplace, fraud detection, ad-hoc experiments, and real-time recommendations. Developing and operating a resilient, high-volume stream processing system requires expertise in many areas, including distributed systems, applied statistics, and system optimization. The past few years have seen the emergence of multiple solutions in this space, including Spark Streaming, Kafka Streams, Flink, and Apache Beam. What are these technologies, and how do they fit together? This track will shed light on these new technologies and also offer interesting applications of stream processing. The ideal mix would be 1-2 technology stories, 2-3 streaming architecture stories, and 1 story about an interesting use of streaming. The prospective speakers are color-coded as per these groupings.
I am a software engineer at Uber, and I have been working on the streaming systems for the Uber Marketplace platform. We take care of the end-to-end processing of our streams, including indexing, serving, analysis, and forecasting. Prior to joining Uber, I worked on building Netflix’s cloud platform. I worked on predictive autoscaling and distributed tracing, and I also worked on data processing and pipeline
One really exciting theme is what’s new in this practical field. What are the new technologies, and what functionality do they provide for practitioners? Who are the major players?
In that regard, we have a talk called “Fundamentals of Stream Processing with Apache Beam,” given by two Google engineers. Apache Beam is about unifying batch processing and stream processing.
We also have two related talks: one about using Apache Flink at Uber, and one about using Spark Streaming at Netflix for real-time recommendations.
Another exciting talk is about DynamoDB Streams, to be given by two software engineers from Amazon. Attendees will learn about turning the database inside out. Every database performs operations, and at the core of the database there is a commit log. How can we turn this commit log into streams? Then we could implement really cool functionality on top of it, such as reliable replication and distributed state machines.
Since we are talking about stream processing, we know there are scalability challenges, and we want to talk about how to scale our stream processing systems. That is why we include another talk about scaling near-real-time analytics at both LinkedIn and Uber.
Hopefully, attendees can learn what’s new in the field but, more importantly, how to apply those new technologies and best practices to their own domains, and how to use them to construct effective and efficient architectures that solve real-world problems.
by Tyler Akidau
Engineer @ Google & Founder/Committer on Apache Beam
by Frances Perry
Engineer @ Google & Founder/Committer on Apache Beam
Apache Beam (unified Batch and strEAM processing!) is a new Apache incubator project. Originally based on years of experience developing Big Data infrastructure within Google (such as MapReduce, FlumeJava, and MillWheel), it has now been donated to the OSS community at large.
Come learn the fundamentals of out-of-order stream processing, and how Beam’s powerful tools for reasoning about time greatly simplify this complex task.
Beam provides a model that allows developers to...
by Akshat Vig
Senior Software Engineer @Amazon
by Khawaja Shams
VP of Engineering Amazon's @ElementalTech, previously Head of Engineering NoSQL @AWSCloud
Replicated state machines are the cornerstone of distributed systems, and at the heart of a replicated state machine is a transactional log. While these logs are fundamental to replication in a distributed system, they have recently emerged as the glue for event-driven systems. In-order processing of logs enables multiple developers to build a diverse set of applications on top of the same stream, ranging from materialized views and event-driven systems to real-time analytics. This powerful...
by Danny Yuan
Real-time Streaming Lead @Uber
At the core of Uber's architecture is a marketplace platform, which is responsible for fulfilling requests for rides, eats, deliveries, and more. To make our marketplace system efficient and intelligent, we need to extract timely and deep insights from our carefully curated data, and make them available for both people and machines to consume in real time.
This talk will discuss how Uber is building its next-generation stream processing system to support real-time analytics as well as...
by Elliot Chow
Senior Software Engineer @Netflix
Recommendations play a vital role in a great Netflix experience. Traditionally, these recommendations are precomputed using viewing history, scroll activity, and a variety of other signals in a near-line fashion. To be able to react more quickly to surges and dips in interest, we introduced the Trending Now row, which makes use of real-time data as an additional signal for generating recommendations. This allows us to not only personalize...
by Yi Pan
PMC Member/Committer @SamzaStream & Distributed Systems Engineer @LinkedIn
by Chinmay Soman
PMC Member/Committer @SamzaStream & Staff Software Engineer @Uber
Modern businesses are pushing the limits of decision making. Advancements in stream processing and OLAP (Online Analytical Processing) technologies have enabled faster insights into incoming data, thus powering near-real-time decisions. Many use cases, such as fraud detection, operational dashboards, financial incentive pipelines, and experimentation (A/B testing), need SQL-like access to such streaming data.
This talk focuses on how Uber and LinkedIn use Apache Samza, Apache...
Monday Nov 7
Architectures You've Always Wondered About
You know the names. Now learn lessons from their architectures
Distributed Systems War Stories
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.
State of the art in Container deployment, management, scheduling
Art of Relevancy and Recommendations
Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.
Next Generation Web Standards, Frameworks, and Techniques
Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
Next Generation Microservices
What will microservices look like in 3 years? What if we could start over?
Java: Are You Ready for This?
Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.
Big Data Meets the Cloud
Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud
Lessons/stories on optimizing the deployment pipeline
Software Engineering Softskills
Great engineers do more than code. Learn their secrets and level up.
Modern CS in the Real World
Applied, practical, & real-world dive into industry adoption of modern CS ideas
Wednesday Nov 9
Architecting for Failure
Your system will fail. Take control before it takes you with it.
Stream Processing, Near-Real Time Processing
Bare Metal Performance
Native languages, kernel bypass, tooling - make the most of your hardware
Culture as a Differentiator
The why and how for building successful engineering cultures
//TODO: Security <-- fix this
Building security from the start. Stories, lessons, and innovations advancing the field of software security.
Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.