From Zero to A Hundred Billion: Building Scalable Real Time Event Processing At DoorDash

At DoorDash, real time events are an important data source to gain insight into our business but building a system capable of handling billions of real time events is challenging. Events are generated from our services and user devices and need to be processed and transported to different destinations to help us make data driven decisions on the platform.

Historically, DoorDash has a few data pipelines that get data from our legacy monolithic web application and ingest the data into Snowflake. These pipelines incurred high data latency, significant cost and operational overhead. Two years ago, we started the journey of creating a real time event processing system named Iguazu to replace the legacy data pipelines and address the following event processing needs we anticipated as the data volume grows with the business:

  • Data ingest from heterogeneous data sources including mobile and web
  • Easily accessible by streaming data consumers with higher abstractions
  • High data quality with end to end schema enforcement
  • Scalable, fault tolerant and easy to operate for a small team

To meet those objectives, we decided to create a new event processing system with a strategy to leverage open source frameworks which can be customized and better integrated with DoorDash’s infrastructure. Stream processing platforms like Apache Kafka and Apache Flink have matured in the past few years and become easy to adopt. By leveraging those excellent building blocks, applying best engineering practices and a platform mindset, we built the system from ground up and scaled it to process hundreds of billions of events per day with 99.99% delivery rate and only minutes of latency to Snowflake and other OLAP data warehouses.

The talk will discuss in detail the design of the system including major components of event producing, event processing with Flink and streaming SQL, event format and schema validation, Snowflake integration, and self-serve platform. It will also showcase how we have adapted and customized open source frameworks to address our needs, and solved some of the challenges including schema update and service/resource orchestration in an infrastructure-as-code environment.


Speaker

Allen Wang

Software Engineer @DoorDash

Allen Wang is currently a tech lead at data platform at DoorDash. He is the architect for the Iguazu event processing system and a founding member of the real time streaming platform team. 

Prior to joining DoorDash, he was a lead in the real time data infrastructure team at Netflix where he created the Kafka infrastructure for Netflix’s Keystone data pipeline and was a main contributor to shape the Kafka ecosystem at Netflix.

He is a contributor to Apache Kafka and NetflixOSS, and a two-time QCon speaker.

Read more

Date

Wednesday Oct 26 / 01:40PM PDT ( 50 minutes)

Share

From the same track

Session

Azure Cosmos DB: Low Latency and High Availability at Planet Scale

Wednesday Oct 26 / 10:35AM PDT

Azure Cosmos DB is a fully-managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs for Apache Cassandra, MongoDB, Gremlin, Tables, and the Core (SQL) APIs.

Mei-Chin Tsai

Partner Director of Software Eng Manager @Microsoft

Vinod Sridharan

Principal Software Engineering Architect @Microsoft

Session

Honeycomb: How We Used Serverless to Speed Up Our Servers

Wednesday Oct 26 / 11:50AM PDT

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad-hoc queries. Most are simple, some are not. Some are REALLY not; this load is both complex, spontaneous, and urgent.

Jessica Kerr

Principal Developer Evangelist @honeycombio

Session

Magic Pocket: Dropbox’s Exabyte-Scale Block Storage System

Wednesday Oct 26 / 02:55PM PDT

Magic Pocket is used to store all of Dropbox’s data.

Facundo Agriel

Software Engineer / Tech Lead @Dropbox

Session

AYAWA Panel

Wednesday Oct 26 / 04:10PM PDT

Details coming soon.