Progressive Failure Modes of Modern AI Serving Systems

Abstract

Inference platforms fail in layers. Most organizations focus on model quality while underestimating the systems engineering required to operate production AI workloads safely and reliably at scale.

Long before GPU saturation becomes a problem, teams often expose models directly to ungoverned traffic, run without concurrency controls, fail to measure system behavior, overload memory bandwidth, and eventually lose their latency guarantees and operational stability.

This talk walks through the progressive failure modes of modern AI serving systems and shows how to architect scalable inference infrastructure that remains observable, resilient, and performant under real production workloads. I will walk attendees through real code paths, production failure scenarios, debugging strategies, and architectural tradeoffs, showing both how these systems fail and how to fix them systematically.

Speaker

Abi Aryan

AI Infrastructure Engineer and Educator

Abi Aryan is an AI infrastructure engineer and educator specializing in scalable inference systems and production AI infrastructure. She spends her time helping enterprises design and optimize large-scale inference serving architectures, improve observability in production pipelines, and solve performance bottlenecks across distributed GPU systems.

Outside of her startup work, Abi teaches distributed systems in a university HPC program, mentors AI Engineering Team Leads through her Maven course, and is currently writing a book on GPU Engineering. Her doctoral research explores the future of adaptive AI infrastructure.

From the same track

Session

The Revenge of the Data Scientist: Why Reliable AI Needs Evals, Traces, and Metrics

Most teams can now ship an AI prototype by calling a foundation-model API. The hard part is knowing whether that system works when real users, messy data, and business consequences arrive.

Hamel Husain

Machine Learning Engineer, 20+ Years in Applied AI, Machine Learning, and Data Science

Session

Skills, Memory, or Fine-Tuning? The Engineering Loop Behind Self-Improving Agents

As agents become mainstream, everyone wants to improve theirs either by making fewer mistakes on existing tasks or by taking on harder ones. This usually happens once an agent is already deployed in production.

Abhinav Sinha

CEO @Lucidic AI, Previously @Stanford AI Lab, @Citadel and Susquehanna International Group, and @Apple
