Presentation: Monitoring and Tracing @Netflix Streaming Data Infrastructure

Track: Production Readiness: Building Resilient Systems

Location: Ballroom BC

Duration: 10:35am - 11:25am

Day of week: Wednesday

Abstract

Netflix's streaming data infrastructure transports trillions of events per day and supports hundreds of stream processing jobs. The team behind it is small, and there is no separate operations team. To manage and operate this infrastructure efficiently and reduce the operational burden for everyone, we developed a set of tools that enable automated operations and mitigations. Our Kafka monitoring tools provide comprehensive signals on the health of our Kafka brokers and consumers, from which we derived ways to automate error handling that improve the stability of brokers and stream processing jobs. For data streams with high consistency requirements, instead of relying purely on aggregated counts, which can be misleading, we trace individual events along their transport path. Enabled by stream processing with minimal resources, tracing provides insight into end-to-end data loss, duplicates, and latency in near real time and with high accuracy. These results helped us further improve our service quality and validate design trade-offs.
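As a rough illustration of why per-event tracing can reveal what aggregated counts hide, the sketch below derives loss, duplicates, and latency from trace records. It is not the tooling described in the talk: the `TraceRecord` type, the `"sent"` and `"received"` stage names, and the `summarize_traces` function are all hypothetical, and a production system would compute this continuously in a stream processing job rather than over an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical trace record: one observation of an event at one stage."""
    event_id: str
    stage: str          # assumed stage names: "sent" or "received"
    timestamp_ms: int


def summarize_traces(records):
    """Return lost events, duplicated events, and per-event latency."""
    sent = {}                # event_id -> first send time
    received = Counter()     # event_id -> number of deliveries seen
    first_received = {}      # event_id -> earliest delivery time

    for r in records:
        if r.stage == "sent":
            sent.setdefault(r.event_id, r.timestamp_ms)
        elif r.stage == "received":
            received[r.event_id] += 1
            prev = first_received.get(r.event_id)
            if prev is None or r.timestamp_ms < prev:
                first_received[r.event_id] = r.timestamp_ms

    # Sent but never delivered: end-to-end loss.
    lost = sorted(eid for eid in sent if received[eid] == 0)
    # Delivered more than once: duplicates.
    duplicated = sorted(eid for eid in sent if received[eid] > 1)
    # Latency from send to first delivery, for delivered events only.
    latency_ms = {
        eid: first_received[eid] - sent[eid]
        for eid in sent
        if eid in first_received
    }
    return {"lost": lost, "duplicated": duplicated, "latency_ms": latency_ms}
```

Note that one lost event and one duplicated event leave the aggregate sent and received counts equal, which is the kind of case where counts alone would look healthy while per-event tracing would not.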

The talk will cover the design and implementation details of these dev/ops tools and highlight the critical roles they play in operating our data infrastructure. It will show how active, targeted tool development for operational use can quickly pay off in improved product quality and overall agility.

Speaker: Allen Wang

Architect & Engineer on the Real Time Data Infrastructure Team @Netflix

Allen Wang is an architect and engineer on the Real Time Data Infrastructure team at Netflix. He architected Netflix's multi-cluster Kafka infrastructure in a cloud environment and is heavily involved in developing the tools needed to operate the streaming data infrastructure. He is an open source contributor to Apache Kafka and NetflixOSS and a frequent speaker on Kafka.
