Presentation: Unified Big Data Processing with Apache Spark
While early big data systems, such as MapReduce, focused on batch processing, the demands on these systems have quickly grown. Users quickly needed to run (1) more interactive ad-hoc queries, (2) sophisticated multi-pass algorithms (e.g. machine learning), and (3) real-time stream processing. The result has been an explosion of specialized systems to tackle these new workloads. Unfortunately, this means more systems to learn, manage, and stitch together into pipelines. Spark is unique in taking a step back and trying to provide a *unified* post-MapReduce programming model that tackles all these workloads. By generalizing MapReduce to support fast data sharing and low-latency jobs, we achieve best-in-class performance in a variety of workloads, while providing a simple programming model that lets users easily and efficiently combine them.
Today, Spark is the most active open source project in big data, with high activity in both the core engine and a growing array of standard libraries built on top (e.g. machine learning, stream processing, SQL). I'm going to talk about the latest developments in Spark and show examples of how it can combine processing algorithms to build rich data pipelines in just a few lines of code.
Matei Zaharia Elsewhere
Similar Talks
Tracks
Covering innovative topics
Monday, 3 November
-
Architectures You've Always Wondered about
The newest and biggest Internet architectures
-
Real World Functional
Putting functional programming concepts to work in the real world.
-
The Future of Mobile
The future of mobile and performance improvements
-
Continuous Delivery: From Heroics to Becoming Invisible
Continuous Delivery philosophies, cultures, hiccups, and best practices.
-
Unleashing the Power of Streaming Data
This track explores a variety of use-cases, platforms, and techniques for processing and analyzing stream data from the companies deploying them at scale!
-
Sponsored Solutions Track I
Tuesday, 4 November
-
Engineering for Product Success
Architectures that make products more successful
-
Reactive Service Architecture
Reactive, Responsive, Fault Tolerant and More.
-
Modern CS In the Real World
How modern CS tackles problems in the real world.
-
Applied Machine Learning and Data Science
Understand your big big data!
-
Deploying at Scale
Containerizing Applications, Discovering Services, and Deploying to the Grid.
-
Sponsored Solutions Track II
Wednesday, 5 November
-
Beyond Hadoop
Emerging Big Data Frameworks and Technology
-
Scalable Microservice Architectures
This track addresses the ways companies with hundreds of fine-grained web-services (e.g. Netflix, LinkedIn) manage complexity!
-
Java at the Cutting Edge
The latest and greatest in the Java ecosystem
-
Engineering culture
Successes and failures in creating an engineering culture.
-
Next gen HTML5 and JS
How Web Components, the Future of CSS, and more are changing the web.
-
Sponsored Solutions Track III