Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Real Time Recommendations using Spark Streaming
Duration
Level:
- Intermediate
Persona:
- Architect
- Developer
- Developer, JVM
Key Takeaways
- Understand the architecture, concerns, and challenges of building a high-volume real-time streaming solution like Trending Now with Spark Streaming.
- Hear practical advice on implementing a real-time streaming service, such as joining multiple live streams of data, data persistence patterns, and resiliency considerations.
- Learn some of the strengths and weaknesses Netflix uncovered using Spark Streaming to better understand whether it fits your individual use case.
Abstract
Recommendations play a vital role in a great Netflix experience. Traditionally, these recommendations are precomputed using viewing history, scroll activity, and a variety of other signals in a near-line fashion. To be able to react more quickly to surges and dips in interest, we introduced the Trending Now row, which uses real-time data as an additional signal for generating recommendations. This allows us not only to personalize this row based on context such as time of day and day of week, but also to react to sudden changes in members' collective interests caused by real-world events.
We will discuss the data pipeline that we built to process Netflix user activities in real time for the Trending Now row. We will share our experiences with developing, monitoring, and productionalizing a system that uses Kafka, Spark Streaming, and Cassandra.
Interview
Elliot: I am a software engineer at Netflix on the Personalization Infrastructure team. We develop data pipelines and other infrastructure to help the personalization algorithms team make recommendations.
Elliot: Definitely. With over 70 million subscribers, there is a lot of traffic every day. In addition to streaming content, we also collect telemetry about how our users are engaging with Netflix. Since we want to generate the best recommendations and user experience, we have to consider all of that data.
Elliot: A significant part of the real-time data we process is made up of what people browse through on Netflix and what people watch, and people are watching Netflix at all hours of the day. All of this data comes in through Kafka, so everything is fairly real time. There is obviously a lot of data cleaning that needs to be done, and depending on the algorithms that consume the data, we have to adjust our filtering criteria accordingly.
Elliot: Someone who is considering using Spark Streaming or using it more extensively. They would also be involved in implementation.
Elliot: Trending Now is one of the first use cases at Netflix where we try to make use of real-time data for personalization. We are also one of the first teams at Netflix to use Spark Streaming. There have been many questions about how well this will work, especially in a production environment at Netflix's scale. Our team had looked into Spark Streaming maybe a year or two ago, and it wasn't quite mature enough at the time. Many improvements have been made since then, which prompted us to revisit it. We wanted to share our learnings as we build this system and continue to improve it.
Elliot: A couple things I plan to discuss are challenges we faced around joining multiple input streams and some Spark Streaming quirks. We have also done quite a bit on the operations side, including things like failure handling, calculating metrics, and monitoring. These are critical pieces that make the system resilient and production ready.
Elliot: A bit of code will probably come up in a few places about Spark Streaming specifically, but that should only be a small part of the discussion. I'll focus on different patterns we tried.
Elliot: They should come away with an idea about how we used the technologies we chose for building a system like this, and whether they would be suitable for their use cases. Hopefully, they will also find some of our experiences with Spark Streaming useful.
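The stream join mentioned in the takeaways and the interview can be illustrated with a minimal sketch in plain Python. It is not Spark code and not the pipeline described in the talk. It assumes two hypothetical event streams, impressions and plays, each a list of (timestamp, video id) pairs, and a hypothetical 60-second window standing in for a micro-batch interval. Events are bucketed into fixed windows, counted per key, and inner-joined on (window, video id).

```python
from collections import Counter

WINDOW_SECONDS = 60  # hypothetical micro-batch interval, chosen for illustration


def window_of(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def count_by_window(events):
    """Count events per (window start, video id) pair."""
    return Counter((window_of(ts), video) for ts, video in events)


def join_streams(impressions, plays):
    """Inner-join two event streams on (window start, video id).

    Returns {(window, video): (impression_count, play_count)} for keys
    present in both streams. Keys seen in only one stream are dropped.
    """
    left = count_by_window(impressions)
    right = count_by_window(plays)
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


# Hypothetical (timestamp in seconds, video id) events.
impressions = [(0, "A"), (10, "A"), (20, "B"), (65, "A")]
plays = [(15, "A"), (70, "A"), (75, "C")]

joined = join_streams(impressions, plays)
for (window, video), (shown, played) in sorted(joined.items()):
    print(window, video, shown, played)
```

In Spark Streaming the analogous step would be a join on two keyed DStreams within a batch or window. The sketch leaves out the concerns that make this hard in practice, such as late or out-of-order events and streams that arrive at different rates.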
Tracks
Monday Nov 7
-
Architectures You've Always Wondered About
You know the names. Now learn lessons from their architectures
-
Distributed Systems War Stories
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.
-
Containers Everywhere
State of the art in Container deployment, management, scheduling
-
Art of Relevancy and Recommendations
Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.
-
Next Generation Web Standards, Frameworks, and Techniques
JavaScript, HTML5, WASM, and more... innovations targeting the browser
-
Optimize You
Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
-
Next Generation Microservices
What will microservices look like in 3 years? What if we could start over?
-
Java: Are You Ready for This?
Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.
-
Big Data Meets the Cloud
Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud
-
Evolving DevOps
Lessons/stories on optimizing the deployment pipeline
-
Software Engineering Softskills
Great engineers do more than code. Learn their secrets and level up.
-
Modern CS in the Real World
Applied, practical, & real-world dive into industry adoption of modern CS ideas
Wednesday Nov 9
-
Architecting for Failure
Your system will fail. Take control before it takes you with it.
-
Stream Processing
Stream Processing, Near-Real Time Processing
-
Bare Metal Performance
Native languages, kernel bypass, tooling - make the most of your hardware
-
Culture as a Differentiator
The why and how for building successful engineering cultures
-
//TODO: Security <-- fix this
Building security from the start. Stories, lessons, and innovations advancing the field of software security.
-
UX Reimagined
Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.