Producing the World's Cheapest Tokens: A How-to Guide

Abstract

AI inference is expensive, but it doesn't have to be. In this talk, we'll break down how to systematically drive down the cost per token across different types of AI workloads. Using real-world examples from data transformation, offline agents, and aggregated insights, we'll unpack how to measure, optimize, and ultimately produce the world's cheapest tokens. The session will be hardware-agnostic, featuring analysis of both Nvidia and AMD GPUs, and will include advice that can be implemented using open-source serving frameworks such as Dynamo, vLLM, and SGLang.

What you'll take away:

  1. Token Economics 101 - Understand what actually drives cost per token
  2. Inference Optimization Tactics - How to drive down unit economics for different types of AI workloads
  3. Right GPU, Right Job - How to choose hardware and a deployment strategy for maximum cost performance

Speaker

Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist

Meryem is the Co-founder and CEO of Doubleword (previously TitanML), a self-hosted AI inference platform empowering enterprise teams to deploy domain-specific or custom models in their private environment. An alumna of Oxford University, Meryem studied Theoretical Physics and Philosophy. She frequently speaks at leading conferences, including TEDx and QCon, sharing insights on inference technology and enterprise AI. Meryem has been recognized as a Forbes 30 Under 30 honoree for her contributions to the AI field.


From the same track

Session Capacity Planning

How Netflix Shapes our Fleet for Efficiency and Reliability

Wednesday Nov 19 / 11:45AM PST

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.


Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services


Argha C

Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live

Session AI Architecture

Realtime and Batch Processing of GPU Workloads

Wednesday Nov 19 / 01:35PM PST

SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We have the primitives for infrastructure as well as platform-as-a-service offerings such as Kubernetes, Kafka, NiFi, and databases.


Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Previous Apache Kafka Committer and PMC Member

Session Architecture

From ms to µs: OSS Valkey Architecture Patterns for Modern AI

Wednesday Nov 19 / 02:45PM PST

As AI applications demand faster and more intelligent data access, traditional caching strategies are hitting performance and reliability limits.


Dumanshu Goyal

Uber Technical Lead @Airbnb Powering $11B Transactions, Formerly @Google and @AWS

Session Platform Engineering

Write-Ahead Intent Log: A Foundation for Efficient CDC at Scale

Wednesday Nov 19 / 03:55PM PST

As companies grow, so does the complexity of keeping distributed systems in sync. At DoorDash, we tackled this challenge while building a high-throughput, domain-oriented data platform for capturing changes across hundreds of services.


Vinay Chella

Engineering Leader @DoorDash - Specializing in Distributed Systems, Streaming & Storage Platforms, Apache Cassandra Committer, Previously Engineering Leader @Netflix


Akshat Goel

Staff Software Engineer, Core Infra at @DoorDash, Previously Senior Software Engineer @Amazon