At DoorDash, the last year has seen a surge in applications of machine learning across the product verticals of our growing business. With this growth, however, our data scientists have faced growing bottlenecks in their development cycle because of our existing feature engineering process. At a daily volume of over 500 unique features and 10B feature values, each component of the feature engineering process, from feature generation and online materialization to offline serving and lifecycle management, was becoming operationally intensive and slow to iterate on.
To overcome these challenges, we designed Fabricator, an end-to-end, declarative, centralized feature engineering platform. The framework uses simple high-level YAML definitions to automate feature pipeline orchestration with Dagster, run scalable pipeline executions with Spark on Databricks, and simplify feature store materialization and management via Redis. Additionally, the entire framework is continuously deployed, bringing iteration times down to just a few minutes.
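To make the declarative idea concrete, here is a minimal sketch of how a high-level definition could be expanded into ordered pipeline steps. It is illustrative only and not Fabricator's actual schema or code: the field names (`source`, `compute`, `features`, `materialize`), the `Step` type, and the `plan` function are all assumptions, and a plain Python dict stands in for a parsed YAML file. It makes no Dagster, Spark, or Redis calls.

```python
from dataclasses import dataclass

# Hypothetical definition: the kind of content a YAML file might hold once parsed.
# None of these keys are taken from Fabricator itself.
DEFINITION = {
    "source": {
        "name": "store_order_stats",
        "compute": {
            "type": "spark_sql",
            "query": "SELECT store_id, COUNT(*) AS order_count "
                     "FROM orders GROUP BY store_id",
        },
        "schedule": "daily",
    },
    "features": [
        {"name": "order_count", "entity": "store_id",
         "materialize": {"online": True}},
        {"name": "avg_basket_size", "entity": "store_id",
         "materialize": {"online": False}},
    ],
}


@dataclass(frozen=True)
class Step:
    """One unit of work an orchestrator could schedule (hypothetical)."""
    kind: str          # "compute" or "materialize_online"
    target: str        # name of the source or feature the step acts on
    depends_on: tuple  # names of upstream targets that must finish first


def plan(definition):
    """Expand a declarative definition into an ordered list of steps."""
    source = definition["source"]
    # The source is computed once; every feature derived from it depends on it.
    steps = [Step("compute", source["name"], ())]
    for feature in definition["features"]:
        # Only features flagged for online serving get a materialization step.
        if feature.get("materialize", {}).get("online"):
            steps.append(
                Step("materialize_online", feature["name"], (source["name"],))
            )
    return steps


if __name__ == "__main__":
    for step in plan(DEFINITION):
        print(step)
```

The point of the sketch is the separation of concerns: the feature author edits only the definition, while the expansion into compute and materialization steps lives in one shared place.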
In this session, we’d like to present how our Machine Learning Platform team designed Fabricator by integrating various open source and enterprise solutions into a declarative, end-to-end feature engineering framework, and look at the wins this enabled us to deliver. Finally, we take a closer look at key optimizations and lessons learned, and discuss plans for extending the framework to hybrid real-time and batch architectures.
ML Platform Engineering Manager @DoorDash, Previously ML Platforms & Data Engineering frameworks @Airbnb & @YouTube
Kunal Shah is an ML Platform Engineering Manager at DoorDash, focusing on building a feature engineering platform. Over the last year he has launched declarative frameworks for both batch and real-time feature development, accelerating the development lifecycle by over 2x. Previously, he worked on ML platforms and data engineering frameworks at Airbnb and YouTube. He completed his undergraduate degree in Computer Science at IIT Bombay and holds a Master's in Data Science from UC Berkeley.