As agents become mainstream, every team wants to improve theirs, either by making fewer mistakes on existing tasks or by taking on harder ones. That work usually starts once the agent is already deployed in production: the team sees a bad trace and ends up asking the same question:

Should the team update the prompt, add a skill, change memory, fine-tune a model, rewrite a tool, add an eval, or rethink the architecture?

This talk presents practical heuristics for improving production agents safely and repeatedly, and for building the kind of self-improving agents teams are now after. It focuses on two core questions: how do you know the agent actually got better, and what part of the agent should you update when something goes wrong? We'll cover failure attribution, scalable vs. one-off fixes, overfitting to individual traces, regression prevention, and how teams can build a manual improvement loop that turns agent failures into durable system improvements.

Main Takeaways

How to tell whether an agent actually improved, rather than just performing better on a single failure case.
How to identify which part of an agent system should change: prompt, skill, memory, model fine-tuning, tool, eval, workflow, or architecture.
How to distinguish scalable improvements from brittle one-off patches that create future maintenance problems.
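
As an illustration of the first takeaway, here is a minimal sketch of a regression check, not the speaker's method. It assumes each eval run is recorded as a mapping from case name to pass/fail; the names `compare_runs`, `is_real_improvement`, and the example cases are hypothetical. A change counts as an improvement only if it fixes the targeted failure without breaking a case that previously passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalComparison:
    fixed: list[str]          # failed before the change, pass after it
    regressed: list[str]      # passed before the change, fail after it
    still_failing: list[str]  # failed in both runs


def compare_runs(before: dict[str, bool], after: dict[str, bool]) -> EvalComparison:
    """Compare per-case pass/fail results from two runs of the same eval suite."""
    if before.keys() != after.keys():
        raise ValueError("both runs must cover the same eval cases")
    fixed = sorted(c for c in before if not before[c] and after[c])
    regressed = sorted(c for c in before if before[c] and not after[c])
    still_failing = sorted(c for c in before if not before[c] and not after[c])
    return EvalComparison(fixed, regressed, still_failing)


def is_real_improvement(cmp: EvalComparison, target_case: str) -> bool:
    """Accept a change only if it fixes the target failure and regresses nothing."""
    return target_case in cmp.fixed and not cmp.regressed


# Hypothetical results: the patch fixes the bad trace but breaks another case.
before = {"refund_lookup": False, "order_status": True, "address_change": True}
after_patch = {"refund_lookup": True, "order_status": False, "address_change": True}

comparison = compare_runs(before, after_patch)
print(comparison.fixed, comparison.regressed)
print(is_real_improvement(comparison, "refund_lookup"))
```

In this made-up example the patch is rejected: it passes the single failing case it was written for, but only at the cost of a regression elsewhere in the suite.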

From the same track

Session

Progressive Failure Modes of Modern AI Serving Systems

Inference platforms fail in layers. Most organizations focus on model quality while underestimating the systems engineering required to operate production AI workloads safely and reliably at scale.

Abi Aryan

AI Infrastructure Engineer and Educator

Session

The Revenge of the Data Scientist: Why Reliable AI Needs Evals, Traces, and Metrics

Most teams can now ship an AI prototype by calling a foundation-model API. The hard part is knowing whether that system works when real users, messy data, and business consequences arrive.

Hamel Husain

Machine Learning Engineer, 20+ Years in Applied AI, Machine Learning, and Data Science

Skills, Memory, or Fine-Tuning? The Engineering Loop Behind Self-Improving Agents

Speaker

Abhinav Sinha
