Fabricator: End-to-End Declarative Feature Engineering Platform

At DoorDash, the last year has seen a surge in applications of machine learning across the product verticals of our growing business. With this growth, however, our data scientists have faced increasing bottlenecks in their development cycle because of our existing feature engineering process. At a daily volume of over 500 unique features and 10B feature values, each component of the feature engineering process (feature generation, online materialization, offline serving, and lifecycle management) was becoming operationally intensive and slow to iterate on.

To overcome these challenges, we designed Fabricator, an end-to-end, declarative, centralized feature engineering platform. The framework uses simple high-level YAML definitions to automate feature pipeline orchestration with Dagster, run scalable pipeline executions with Spark on Databricks, and simplify feature store materialization and management via Redis. Additionally, the entire framework is continuously deployed, bringing iteration times down to just a few minutes.
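
As a rough illustration of the declarative idea, the sketch below shows how a single high-level definition could be expanded into the ordered steps a platform would run. It is not Fabricator's actual schema or API: the `FeatureSource` fields, the `plan_steps` helper, the step names, and the example table and feature names are all hypothetical, and a plain Python dataclass stands in for a parsed YAML definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSource:
    """Hypothetical stand-in for one parsed YAML feature definition."""

    name: str                      # pipeline name
    entity_key: str                # key the features are indexed by
    schedule: str                  # cron expression handed to the orchestrator
    query: str                     # SQL the batch engine would execute
    features: tuple[str, ...]      # feature columns produced by the query
    materialize_online: bool = True


def plan_steps(source: FeatureSource) -> list[str]:
    """Expand a declarative definition into an ordered list of step labels."""
    steps = [
        f"schedule:{source.name}@{source.schedule}",
        f"compute:{source.name}",
        f"write_offline:{source.name}",
    ]
    if source.materialize_online:
        # One upload step per feature, keyed by the entity.
        steps.extend(
            f"upload_online:{source.entity_key}:{feature}"
            for feature in source.features
        )
    return steps


# Hypothetical definition; table and column names are illustrative only.
store_stats = FeatureSource(
    name="store_order_stats",
    entity_key="store_id",
    schedule="0 2 * * *",
    query=(
        "SELECT store_id, COUNT(*) AS order_count_7d, "
        "AVG(subtotal) AS avg_subtotal_7d FROM orders GROUP BY store_id"
    ),
    features=("order_count_7d", "avg_subtotal_7d"),
)

for step in plan_steps(store_stats):
    print(step)
```

The point of the sketch is the division of labor: the author of a feature states what should exist (name, key, schedule, query), and the platform derives how to schedule, compute, and materialize it.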

In this session, we present how our Machine Learning Platform team designed Fabricator by integrating various open source and enterprise solutions into a declarative end-to-end feature engineering framework, and we look at the wins it enabled. We close with a closer look at key optimizations and lessons learned, and discuss plans for extending the framework to hybrid real-time and batch architectures.


Speaker

Kunal Shah

ML Platform Engineering Manager @DoorDash, Previously ML Platforms & Data Engineering frameworks @Airbnb & @YouTube

Kunal Shah is an ML Platform Engineering Manager at DoorDash, focusing on building a feature engineering platform. Over the last year he has launched declarative frameworks for both batch and real-time feature development, accelerating the development lifecycle by over 2x. Previously, he worked on ML platforms and data engineering frameworks at Airbnb and YouTube. He completed his undergraduate degree in Computer Science at IIT Bombay and holds a Master's in Data Science from UC Berkeley.

Date

Monday Oct 24 / 11:50AM PDT (50 minutes)

Location

Pacific DEKJ

Track

MLOps

Topics

Machine Learning YAML Pipeline Batch Architectures Architecture

From the same track

Session Machine Learning

Ray: The Next Generation Compute Runtime for ML Applications

Monday Oct 24 / 10:35AM PDT

Ray is an open source project that makes it simple to scale any compute-intensive Python workload. Industry leaders like Uber, Shopify, and Spotify are building their next-generation ML platforms on top of Ray.

Zhe Zhang

Head of Open Source Engineering @anyscalecompute, Previously Hadoop/Spark infra Team Manager @LinkedIn

Session Machine Learning

An Open Source Infrastructure for PyTorch

Monday Oct 24 / 01:40PM PDT

In this talk we’ll go over tools and techniques to deploy PyTorch in production. The PyTorch organization maintains and supports open source tools for efficient inference like pytorch/serve, job management like pytorch/torchx, and streaming datasets like pytorch/data.

Mark Saroufim

Applied AI Engineer @Meta

Session Machine Learning

Real-Time Machine Learning: Architecture and Challenges

Monday Oct 24 / 02:55PM PDT

Fresh data beats stale data for machine learning applications. This talk discusses the value of fresh data as well as different types of architecture and challenges of online prediction.  

Chip Huyen

Co-founder @Claypot AI, previously @Snorkel AI & @NVIDIA

Session Machine Learning

Declarative Machine Learning: A Flexible, Modular and Scalable Approach for Building Production ML Models

Monday Oct 24 / 04:10PM PDT

Building ML solutions from scratch is challenging for a variety of reasons: the long development cycles of writing low-level machine learning code and the fast pace of state-of-the-art ML methods, to name a few.

Shreya Rajpal

Founder @Guardrails AI, Experienced ML Practitioner with a Decade of Experience in ML Research, Applications and Infrastructure

Session

Unconference: MLOps

Monday Oct 24 / 05:25PM PDT

What is an unconference? At QCon SF, we’ll have unconferences in most of our tracks.

Shane Hastie

Global Delivery Lead for SoftEd and Lead Editor for Culture & Methods at InfoQ.com