Presentation: Lambda Architectures in Practice
Hybrid batch/real-time architectures (sometimes called “lambda architectures”) are a powerful pattern for building robust, production-quality, up-to-the-minute data analytics systems.
We’ll discuss why you may want to go hybrid, the sorts of challenges that can arise when building production data systems, and effective techniques for making them easier to deploy and manage. As an example, we’ll use the data system at Metamarkets, which uses Hadoop, Storm, Kafka, and Druid to ingest over 10 TB of new data every day and lets users query trillions of aggregated events.
We’ll talk about our experience running this system in production for the past year, with particular focus on data pipeline development and operations. We took a two-pronged approach involving both software and operations practices. The most helpful pieces of software so far have been a Scala library we developed to express common data processing needs, paired with an execution engine that can run those programs on both Hadoop and Storm. We’ll cover its design and implementation and the patterns we’ve seen emerge in its usage.
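To make the "write once, run on both engines" idea concrete, here is a minimal sketch. It is not the Metamarkets library, which is written in Scala and whose API is not described here. It is plain Python with no Hadoop or Storm involved, and every name in it (`CountPipeline`, `run_batch`, `run_streaming`, the sample events) is a hypothetical stand-in. It only illustrates the pattern of describing a job once and handing that description to two different executors.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

Event = Dict[str, str]


@dataclass(frozen=True)
class CountPipeline:
    """Hypothetical engine-independent job description: filter, then count by key."""
    keep: Callable[[Event], bool]
    key: Callable[[Event], str]


def run_batch(pipeline: CountPipeline, events: Iterable[Event]) -> Counter:
    """Batch-style executor: process the whole data set and return final counts."""
    return Counter(pipeline.key(e) for e in events if pipeline.keep(e))


def run_streaming(pipeline: CountPipeline, events: Iterable[Event]) -> Iterator[Counter]:
    """Streaming-style executor: update running counts one event at a time."""
    running: Counter = Counter()
    for e in events:
        if pipeline.keep(e):
            running[pipeline.key(e)] += 1
            yield Counter(running)  # snapshot of the counts so far


# The job is defined once (assumed example: clicks per country).
clicks_by_country = CountPipeline(
    keep=lambda e: e["type"] == "click",
    key=lambda e: e["country"],
)

sample_events = [
    {"type": "click", "country": "US"},
    {"type": "view", "country": "US"},
    {"type": "click", "country": "DE"},
    {"type": "click", "country": "US"},
]

# The same definition is executed by both executors.
batch_result = run_batch(clicks_by_country, sample_events)
stream_snapshots = list(run_streaming(clicks_by_country, sample_events))
```

In this toy setup the last streaming snapshot equals the batch result. That equivalence is what makes a shared job definition attractive in a hybrid system: the real-time path gives early answers, and the batch path can later recompute the same aggregation over the complete data.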
We'll also discuss operations practices, which we have found to be just as important as software. Ours center around a robust metrics collection system that feeds into alerting and metrics visualization. We'll talk about what kinds of metrics we've found most valuable, what kinds of things we alert on, and show some of the most useful visualizations.
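As a rough illustration of metrics feeding into alerting, the sketch below checks the latest value of each metric against a threshold. The abstract does not say which metrics or rules are used in production, so the metric names, limits, and the `AlertRule` and `evaluate` helpers are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: fire when a metric's latest value exceeds max_value."""
    metric: str
    max_value: float
    message: str


def evaluate(rules: List[AlertRule], latest: Dict[str, float]) -> List[str]:
    """Return one alert string per violated rule or per metric with no data."""
    fired = []
    for rule in rules:
        value = latest.get(rule.metric)
        if value is None:
            # A silent metric is treated as a problem in its own right.
            fired.append(f"{rule.metric}: no data reported")
        elif value > rule.max_value:
            fired.append(
                f"{rule.metric}: {rule.message} "
                f"(value={value}, limit={rule.max_value})"
            )
    return fired


# Assumed example rules and readings; names and limits are illustrative only.
rules = [
    AlertRule("ingest/lag_seconds", 300.0, "real-time ingestion is falling behind"),
    AlertRule("batch/failed_tasks", 0.0, "batch job has failed tasks"),
    AlertRule("query/p99_ms", 1000.0, "queries are slow"),
]
latest = {"ingest/lag_seconds": 420.0, "batch/failed_tasks": 0.0}

alerts = evaluate(rules, latest)
```

With these readings, two alerts fire: one for ingestion lag above its limit and one for the query latency metric that reported nothing. Treating missing data as an alert is a design choice of this sketch, made because a stalled pipeline often stops emitting metrics altogether.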
Speaker: Gian Merlino