ML models typically use upwards of 100 features to generate a single prediction. As a result, the number of data pipelines explodes and request fanout during prediction is high. On top of that, feature schemas evolve with every model iteration, and managing them manually becomes cumbersome. Models can also fail silently when input data shifts, which makes observability challenging. To make matters worse, for use cases that involve ranking, per-inference latency budgets are extremely tight.
In this presentation, we will talk about Airbnb’s Feature Platform, focusing on recent efforts to solve the challenges mentioned above. Specifically, we will cover four areas in detail: core APIs, training data generation, feature serving, and feature observability.
Main Takeaways
In this presentation, we will cover the API and architecture of Airbnb’s Feature Platform, with special focus on the following aspects:
- Training data generation: full support for the entire training data generation pipeline.
  - Feature bootstrap, label computation, and training set generation at large scale. We covered point-in-time feature backfill in the past, so we will introduce it only briefly and focus on the other areas.
- Feature serving: full support for advanced feature computation.
  - Feature derivations, feature chaining, and external and contextual feature support. We will describe how this works in batch, streaming, and application serving environments.
- Feature observability: pre-training and post-productionization monitoring for data failures.
  - Online-offline consistency, training data health metrics, and feature and prediction drift.
Speaker
Nikhil Simha
Author of "Chronon Feature Platform", Previously Built Stream Processing Infra @Meta and NLP Systems @Amazon & @Walmartlabs
Nikhil Simha is a Staff Software Engineer on the Machine Learning infrastructure team at Airbnb. He is currently working on Chronon, an end-to-end feature engineering platform. Prior to Airbnb, he was a founding engineer on the stream processing team at Facebook, where he built a scheduler (Turbine, ICDE '20) and a stream processing framework (RealTime Data @ FB, SIGMOD '16). He is interested in the intersection of compilers, machine learning, and real-time data processing systems. Nikhil received his Bachelor's degree in Computer Science from the Indian Institute of Technology, Bombay. When not working, he likes to walk with his dog, Leela.