At DoorDash, the last year has seen a surge in applications of machine learning across the product verticals of our growing business. With this growth, however, our data scientists have hit increasing bottlenecks in their development cycle because of our existing feature engineering process. At a daily volume of over 500 unique features and 10B feature values, each component of the feature engineering process (feature generation, online materialization, offline serving, and lifecycle management) was becoming operationally intensive and slow to iterate on.

To overcome these challenges, we designed Fabricator, an end-to-end, declarative, centralized feature engineering platform. The framework uses simple high-level YAML definitions to automate feature pipeline orchestration with Dagster, run scalable pipeline executions with Spark on Databricks, and simplify feature store materialization and management via Redis. Additionally, the entire framework is continuously deployed, bringing iteration times down to just a few minutes.
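
As a rough illustration of what "declarative" means in this context, the sketch below derives a list of pipeline steps from a feature definition alone, so the author of the definition writes no orchestration code. Everything in it is an assumption made for illustration: the field names (`name`, `entity_key`, `sql`, `schedule`, `online_store`), the `plan_steps` helper, and the step labels are hypothetical and are not Fabricator's actual schema or API. The real framework reads YAML and hands work to Dagster, Spark, and Redis, none of which appear in this toy example.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical spec: these fields are illustrative, not Fabricator's real schema.
@dataclass(frozen=True)
class FeatureSpec:
    name: str           # feature (group) name
    entity_key: str     # column that identifies the entity the feature describes
    sql: str            # query that computes the feature values
    schedule: str       # cron-style schedule for the batch job
    online_store: bool  # whether values should also be pushed to an online store


def plan_steps(spec: FeatureSpec) -> List[str]:
    """Derive an ordered list of pipeline steps purely from the definition."""
    steps = [f"generate:{spec.name}", f"write_offline:{spec.name}"]
    if spec.online_store:
        # Only definitions that ask for online serving get a materialization step.
        steps.append(f"materialize_online:{spec.name}")
    return steps


# Hypothetical definition; in a YAML-driven setup this would be loaded from a file.
spec = FeatureSpec(
    name="store_order_count_7d",
    entity_key="store_id",
    sql="SELECT store_id, COUNT(*) AS order_count FROM orders GROUP BY store_id",
    schedule="0 2 * * *",
    online_store=True,
)

for step in plan_steps(spec):
    print(step)
```

The design point is that changing the definition (for example, setting `online_store` to false) changes the planned pipeline without touching any pipeline code.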

In this session, we present how our Machine Learning Platform team designed Fabricator by integrating various open source and enterprise solutions to deliver a declarative end-to-end feature engineering framework, and we look at the wins this enabled. We close with a closer look at key optimizations and learnings, and discuss plans for extending the framework to hybrid real-time and batch architectures.

From the same track

Session Machine Learning

Ray: The Next Generation Compute Runtime for ML Applications

Monday Oct 24 / 10:35AM PDT

Ray is an open source project that makes it simple to scale any compute-intensive Python workload. Industry leaders like Uber, Shopify, and Spotify are building their next generation ML platforms on top of Ray.

Zhe Zhang

Head of Open Source Engineering @anyscalecompute, Previously Hadoop/Spark infra Team Manager @LinkedIn

Session Machine Learning

An Open Source Infrastructure for PyTorch

Monday Oct 24 / 01:40PM PDT

In this talk we’ll go over tools and techniques to deploy PyTorch in production. The PyTorch organization maintains and supports open source tools for efficient inference (pytorch/serve), job management (pytorch/torchx), and streaming datasets (pytorch/data).

Mark Saroufim

Applied AI Engineer @Meta

Session Machine Learning

Real-Time Machine Learning: Architecture and Challenges

Monday Oct 24 / 02:55PM PDT

Fresh data beats stale data for machine learning applications. This talk discusses the value of fresh data, as well as the different types of architecture and the challenges of online prediction.

Chip Huyen

Co-founder @Claypot AI, previously @Snorkel AI & @NVIDIA

Session Machine Learning

Declarative Machine Learning: A Flexible, Modular and Scalable Approach for Building Production ML Models

Monday Oct 24 / 04:10PM PDT

Building ML solutions from scratch is challenging for a variety of reasons: the long development cycles of writing low-level machine learning code and the fast pace of state-of-the-art ML methods, to name a few.

Shreya Rajpal

Founder @Guardrails AI, Experienced ML Practitioner with a Decade of Experience in ML Research, Applications and Infrastructure

Session

Unconference: MLOps

Monday Oct 24 / 05:25PM PDT

What is an unconference? At QCon SF, we’ll have unconferences in most of our tracks.

Shane Hastie

Global Delivery Lead for SoftEd and Lead Editor for Culture & Methods at InfoQ.com

Fabricator: End-to-End Declarative Feature Engineering Platform

Speaker

Kunal Shah
