Track: Unleashing the Power of Streaming Data

Batch processing based on the MapReduce framework has been the dominant paradigm for analyzing large amounts of data in the past few years. The growing popularity of this paradigm can be attributed to a combination of factors, including the increased ease with which companies can produce (e.g. via web-based click telemetry), collect (e.g. via ETL technology), and glean insights from (e.g. via ML platforms) large amounts of data at rest. As popular as this paradigm is, it is inherently batch oriented: it requires processing the entire data set to get an answer, and that data set must be at rest for the duration of the processing. In many cases, the business needs an answer within a few minutes and does not need the entire data set to be processed. For these cases, a new paradigm has emerged: real-time (a.k.a. streaming) data processing.
For example, many e-commerce sites (e.g. iTunes) and credit card issuers use a form of online analytics to detect and mitigate possibly fraudulent financial transactions before they complete, because the cost of repairing fraud after the fact is too high! In another example, most web-scale companies rely on real-time or near-real-time system metrics to detect when a portion of a web site has become unhealthy. In most cases, some level of automated failover is possible; where automation does not bring about recovery, real-time alerting is key to getting the right people to take corrective action and avoid a site meltdown! In yet another example, companies like LinkedIn generate real-time recommendations based on recent user actions (e.g. social gestures such as shares and likes).
What do you need to know when you eventually build and deploy one of these systems? This track explores a variety of use-cases, platforms, and techniques for processing and analyzing streaming data, from the companies deploying them at scale.

Track Host:
Danny Yuan
Distributed systems engineer at Netflix; owner of Netflix's data pipeline and predictive autoscaling engine
Danny is an architect and software developer on Netflix’s Platform Engineering team. He works on Netflix’s distributed crypto service, data pipeline, and real-time analytics. He owns Netflix’s open-sourced data pipeline, Suro, as well as Netflix’s predictive autoscaling engine. @g9yuayon
10:35am - 11:25am

by Terence Yim
Committer on Apache Twill, lead of Continuuity's JetStream

Distributed real-time event processing, also called stream processing, has recently gained a lot of traction in the Big Data community. Frameworks like Apache Storm and Apache Samza tackle various challenges in stream processing, including performance, scalability, reliability, and fault tolerance. However, these frameworks offer very limited transaction support. The lack of complete transaction support puts the burden on developers to maintain data integrity by implementing rollback...
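
To make the abstract's point concrete, here is a hedged sketch of the kind of hand-rolled compensation logic developers end up writing when a framework offers no transactions: apply a state change, then undo it manually if a downstream step fails. All names (kv_store, process_event, publish_downstream) are hypothetical; this is not Storm or Samza code.

```python
# Hypothetical sketch of manual rollback; not tied to any real framework.
kv_store = {}

def publish_downstream(event):
    # Placeholder for emitting the event to the next stage (e.g. a queue).
    pass

def process_event(event):
    """Apply an event, undoing the write by hand if a later step fails."""
    key, delta = event["key"], event["delta"]
    previous = kv_store.get(key, 0)
    kv_store[key] = previous + delta      # state mutation
    try:
        publish_downstream(event)         # may raise on failure
    except IOError:
        kv_store[key] = previous          # manual "rollback"
        raise                             # surface the failure for a retry

process_event({"key": "page_views", "delta": 1})
```

A framework with first-class transactions would bundle the state write and the downstream emit into one atomic unit, so none of this compensation code would be needed.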

11:50am - 12:40pm

by Fangjin Yang
Software Engineer at Metamarkets

by Nelson Ray
Quantitative analyst working on AdWords at Google

Many exact queries require computation and storage that scale linearly or super-linearly with the size of the data. For many classes of problems, however, exact results are not necessary.

We describe the roles of various approximation algorithms that allow Druid, a distributed datastore, to increase query speeds and minimize data volume while maintaining rigorous error bounds on the results. 

Exactly calculating the cardinality of 1 billion unique integer-valued IDs requires 4GB...
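
For context, the 4GB figure corresponds to materializing every ID exactly: 10^9 IDs at 4 bytes per 32-bit integer is 4GB. Probabilistic sketches such as HyperLogLog answer the same question in kilobytes with a small, bounded error. Below is a minimal, self-contained HyperLogLog in Python; it is illustrative only, not Druid's implementation.

```python
import hashlib

P = 14                              # precision: 2**14 registers
M = 1 << P                          # 16384 registers, roughly 16 KB of state
registers = [0] * M
ALPHA = 0.7213 / (1 + 1.079 / M)    # standard bias correction for large M

def add(item):
    # Derive a 64-bit hash of the item.
    h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
    idx = h >> (64 - P)                        # first P bits choose a register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 50 bits
    rank = (64 - P) - rest.bit_length() + 1    # position of leftmost 1-bit
    registers[idx] = max(registers[idx], rank)

def estimate():
    return ALPHA * M * M / sum(2.0 ** -r for r in registers)

for i in range(100_000):
    add(i)
print(f"estimated cardinality: {estimate():.0f}")   # close to 100000
```

With 2^14 registers the sketch holds about 16KB of state and its typical relative error is roughly 1% (1.04/sqrt(16384)), which is exactly the storage-versus-accuracy trade the abstract describes.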

1:40pm - 2:30pm

by Danny Yuan
Distributed system engineer in Netflix. Owner of Netflix's data pipeline and predictive autoscaling engine

by Justin Becker
Senior Software Engineer at Netflix

Netflix customers stream over two billion hours of content each month, accounting for over a third of downstream Internet traffic during peak hours. At this scale, Netflix's systems generate and collect millions of events every second, such as request traces, streaming client activities, and system metrics. It is essential for engineers to process such data streams efficiently and reliably to support real-time monitoring and alerting, outlier detection, application diagnostics, trend...
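
As one illustrative flavor of such real-time monitoring (and emphatically not Netflix's actual pipeline), the sketch below flags metric values that stray more than k standard deviations from a sliding-window mean; all names and parameters are assumptions.

```python
from collections import deque
from statistics import mean, stdev

def detect_outliers(stream, window=60, k=3.0):
    """Yield values more than k standard deviations from the recent mean."""
    recent = deque(maxlen=window)    # sliding window of recent samples
    for value in stream:
        if len(recent) >= 2:         # stdev needs at least two samples
            mu, sigma = mean(recent), stdev(recent)
            if abs(value - mu) > k * sigma:
                yield value          # alert on this sample
        recent.append(value)

# Usage: a steady signal with one spike.
samples = [100.0] * 120
samples[90] = 500.0
print(list(detect_outliers(samples)))   # -> [500.0]
```

A production system would add seasonality handling, per-dimension baselines, and alert routing; the sketch shows only the windowed-statistics core.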

2:55pm - 3:45pm

by Neha Narkhede
Key contributor to Apache Kafka & Samza

We are enjoying something of a renaissance in data infrastructure. The old workhorses like MySQL and Oracle still exist, but they are complemented by new specialized distributed data systems like Cassandra, Redis, Druid, and Hadoop. At the same time, what we consider data has changed too: user activity, monitoring, logging, and other event data are becoming first-class citizens for data-driven companies. Taking full advantage of all these systems and the relevant data creates a massive data...
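
For readers who have not used Kafka, the minimal sketch below publishes one activity event and reads it back. It assumes the third-party kafka-python client, a broker on localhost:9092, and an illustrative topic name; none of these specifics come from the talk.

```python
from kafka import KafkaProducer, KafkaConsumer

# Assumes `pip install kafka-python` and a broker at localhost:9092.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("user-activity", b'{"user": 42, "action": "share"}')
producer.flush()                      # make sure the event is on the wire

consumer = KafkaConsumer(
    "user-activity",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",     # read from the start of the topic
)
for message in consumer:
    print(message.value)              # raw bytes of the event payload
    break
```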

4:10pm - 5:00pm

by Tyler Akidau
Software Engineer at Google

Proponents of the Lambda Architecture argue that streaming systems are unreliable. That they can’t be used to provide consistent results. That they’re difficult to backfill when data changes. That the only way to get low latency and have precise results and be able to respond to changes in upstream data is by...

5:25pm - 6:15pm

by Richard Kasperowski
QCon Open Space Facilitator

Open Space

Join Danny Yuan, our speakers, and other attendees as we explore a variety of use-cases, platforms, and techniques for processing and analyzing streaming data from the companies deploying them at scale, and from each other!

What is Open Space?

Every day at QConSF, we’ll open space five times, once for each track. Open Space is a kind of unconference, a simple way to run productive meetings for 5 to 2000 or more people, and a powerful way to lead any kind of organization in...
