Advanced machine learning (ML) models, particularly large language models (LLMs), require scaling beyond a single machine. As open-source LLMs become more prevalent on platforms and model hubs like HuggingFace (HF), ML practitioners and GenAI developers are increasingly inclined to fine-tune these models with their private data to suit their specific needs.

However, several concerns arise: which compute infrastructure should be used for distributed fine-tuning and training? How can ML workloads be effectively scaled for data ingestion, training/tuning, or inference? How can large models be accommodated within a cluster? And how can CPUs and GPUs be optimally utilized?

Fortunately, an opinionated stack built on open-source libraries is emerging among ML practitioners.

This session focuses on the integration of HuggingFace and Ray AI Runtime (AIR), which enables scaling of model training and data loading. We’ll delve into implementation details, explore the HF Transformers APIs, and demonstrate how Ray AIR facilitates an end-to-end ML workflow that encompasses data ingestion, training/tuning, and inference.
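To make the scaling pattern concrete, here is a minimal pure-Python sketch of synchronous data-parallel training, the pattern a distributed trainer typically orchestrates. Each worker ingests its own shard of the data and computes a gradient locally. The gradients are then averaged (an all-reduce), and every replica applies the same update. This is an illustrative simulation under stated assumptions, not Ray AIR's API: the functions `shard`, `local_gradient`, `all_reduce_mean`, and `train_data_parallel` are hypothetical names, the "workers" run sequentially in one process, and the model is a one-parameter linear regression.

```python
"""Pure-Python simulation of synchronous data-parallel training.

Assumption: this is NOT Ray AIR's API. Workers run sequentially in a
single process; on a real cluster each would be a separate process/GPU.
"""
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a one-parameter model y = w * x


def shard(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Round-robin split: each worker ingests a disjoint slice of the data."""
    return [data[rank::num_workers] for rank in range(num_workers)]


def local_gradient(w: float, batch: List[Sample]) -> float:
    """Mean gradient of 0.5 * (w * x - y) ** 2 with respect to w on one shard."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(grads: List[float]) -> float:
    """Stand-in for an all-reduce: every worker receives the average gradient."""
    return sum(grads) / len(grads)


def train_data_parallel(data: List[Sample], num_workers: int = 4,
                        lr: float = 0.5, steps: int = 200, w: float = 0.0) -> float:
    shards = shard(data, num_workers)
    for _ in range(steps):
        # On a cluster these gradient computations would run concurrently.
        grads = [local_gradient(w, s) for s in shards]
        # Every replica applies the same averaged update, so weights stay in sync.
        w -= lr * all_reduce_mean(grads)
    return w


# Synthetic data drawn from y = 3x; 100 samples split evenly across 4 workers.
data = [(i / 100, 3.0 * i / 100) for i in range(1, 101)]
print(round(train_data_parallel(data), 4))  # 3.0
```

Because each shard here holds the same number of samples, averaging the per-shard mean gradients reproduces the single-machine gradient exactly. With uneven shards, a real implementation would need to weight each worker's contribution by its shard size.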

Through the integration between HF and Ray AIR, we’ll discuss how Ray’s orchestration capabilities meet the computation and memory requirements of these workloads. We’ll also show how existing code written with the HF Transformers APIs, DeepSpeed, and Accelerate can integrate with Ray AIR’s Trainers, and how these pieces fit within this emerging component stack. Finally, we’ll demonstrate how to fine-tune an open-source LLM with the HF Transformers APIs and Ray AIR Trainers.
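To illustrate the memory side, the sketch below estimates per-GPU memory for model states when fully fine-tuning with mixed-precision Adam. It follows the accounting used in DeepSpeed's ZeRO paper: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states (master weights, momentum, variance). ZeRO stages 1, 2, and 3 partition optimizer states, then gradients, then the parameters themselves across the data-parallel GPUs. It is a back-of-the-envelope estimate under these assumptions only. It ignores activations, temporary buffers, and fragmentation, and the function name and the 7B-parameter, 8-GPU example are hypothetical.

```python
"""Back-of-the-envelope per-GPU memory for model states under DeepSpeed ZeRO.

Assumptions (not measured values): mixed-precision training with Adam,
model states only; activations, buffers, and fragmentation are ignored.
"""

BYTES_FP16_PARAM = 2   # fp16/bf16 copy of the weights
BYTES_FP16_GRAD = 2    # fp16/bf16 gradients
BYTES_ADAM_STATE = 12  # fp32 master weights + momentum + variance, 4 bytes each


def per_gpu_model_state_gb(num_params: float, num_gpus: int, zero_stage: int) -> float:
    """Return estimated GB (1e9 bytes) of model states held by each GPU."""
    params = BYTES_FP16_PARAM * num_params
    grads = BYTES_FP16_GRAD * num_params
    optim = BYTES_ADAM_STATE * num_params
    if zero_stage == 0:    # plain data parallelism: every GPU holds everything
        total = params + grads + optim
    elif zero_stage == 1:  # partition optimizer states
        total = params + grads + optim / num_gpus
    elif zero_stage == 2:  # also partition gradients
        total = params + (grads + optim) / num_gpus
    elif zero_stage == 3:  # also partition the parameters themselves
        total = (params + grads + optim) / num_gpus
    else:
        raise ValueError(f"unknown ZeRO stage: {zero_stage}")
    return total / 1e9


# Hypothetical example: a 7B-parameter model on 8 GPUs.
for stage in range(4):
    print(f"ZeRO-{stage}: {per_gpu_model_state_gb(7e9, 8, stage):.2f} GB")
# ZeRO-0: 112.00 GB
# ZeRO-1: 38.50 GB
# ZeRO-2: 26.25 GB
# ZeRO-3: 14.00 GB
```

Only stage 3 shrinks the per-GPU footprint roughly in proportion to the number of GPUs, which is why it is often used when the model states alone do not fit on a single device.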

From the same track

Session AI/ML

Chronon - Airbnb’s End-to-End Feature Platform

Tuesday Oct 3 / 10:35AM PDT

ML Models typically use upwards of 100 features to generate a single prediction. As a result, there is an explosion in the number of data pipelines and high request fanout during prediction.

Nikhil Simha

Author of "Chronon Feature Platform", Previously Built Stream Processing Infra @Meta and NLP Systems @Amazon & @Walmartlabs

Session AI/ML

Defensible Moats: Unlocking Enterprise Value with Large Language Models

Tuesday Oct 3 / 11:45AM PDT

Building LLM-powered applications using APIs alone poses significant challenges for enterprises. These challenges include data fragmentation, the absence of a shared business vocabulary, privacy concerns regarding data, and diverse objectives among data and ML users.

Nischal HP

Vice President of Data Science @Scoutbee, Decade of Experience Building Enterprise AI

Session AI/ML

Generative Search: Practical Advice for Retrieval Augmented Generation (RAG)

Tuesday Oct 3 / 02:45PM PDT

In this presentation, we will delve into the world of Retrieval Augmented Generation (RAG) and its significance for Large Language Models (LLMs) like OpenAI's GPT-4. With the rapid evolution of data, LLMs face the challenge of staying up-to-date and contextually relevant.

Sam Partee

Principal Engineer @Redis

Session AI/ML

Building Guardrails for Enterprise AI Applications W/ LLMs

Tuesday Oct 3 / 05:05PM PDT

Large Language Models (LLMs) such as ChatGPT have revolutionized AI applications, offering unprecedented potential for complex real-world scenarios. However, fully harnessing this potential comes with unique challenges such as model brittleness and the need for consistent, accurate outputs.

Shreya Rajpal

Founder @Guardrails AI, Experienced ML Practitioner with a Decade of Experience in ML Research, Applications and Infrastructure

Session

Unconference: Modern ML

Tuesday Oct 3 / 03:55PM PDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Modern Compute Stack for Scaling Large AI/ML/LLM Workloads


Speaker

Jules Damji
