Inference
Session
AI/ML
One Platform to Serve Them All: Autoscaling Multi-Model LLM Serving
Wednesday Nov 19 / 10:35AM PST
AI teams are moving away from hosted LLM APIs to self-hosted inference as fine-tuning drives model performance. The catch is scale: hundreds of model variants create long-tail traffic, cold starts, and duplicated serving stacks.
Meryem Arik
Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist