From Content to Agents: Scaling LLM Post-Training Through Real-World Applications and Simulation

Abstract

This talk presents a comprehensive journey through modern AI post-training techniques, from Pinterest's production-scale content discovery systems to enterprise agent training with Veris AI's simulation environments. We'll explore how reinforcement learning and supervised fine-tuning bridge the critical gap between base model capabilities and real-world performance across two distinct but complementary domains.

We begin by exploring industry advances in RL-enhanced diffusion models and their impact on bias reduction and human preference alignment. We then dive into Pinterest's implementation of these techniques at scale for content generation via PinLanding, a multimodal content-first architecture that turns billions of content items into shopping collections.
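
To make the content-first idea concrete, below is a minimal, hypothetical sketch (not the actual PinLanding pipeline): embed catalog items and candidate collection themes in a shared multimodal space with an off-the-shelf CLIP model, then group each item under its best-matching theme. The model name, themes, and catalog items are illustrative assumptions.

```python
# Illustrative sketch of content-first collection building with a shared
# multimodal embedding space (CLIP). Not Pinterest's implementation; the model,
# themes, and items below are assumptions for demonstration only.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

collection_themes = ["minimalist home office", "summer wedding guest outfits"]
catalog_items = ["oak standing desk", "linen midi dress", "ergonomic mesh chair"]

def embed_texts(texts):
    """Encode texts with CLIP's text tower and unit-normalize for cosine similarity."""
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

theme_emb = embed_texts(collection_themes)
item_emb = embed_texts(catalog_items)

# Assign each catalog item to its closest collection theme.
scores = item_emb @ theme_emb.T          # cosine similarities, shape (items, themes)
best = scores.argmax(dim=-1).tolist()
for item, idx in zip(catalog_items, best):
    print(f"{item} -> {collection_themes[idx]}")
```

At billions-of-items scale the same matching would typically run as approximate nearest-neighbor search over precomputed image and text embeddings rather than a brute-force matrix product.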

From there, we turn to the broader challenge of agent training, presenting a general-purpose simulation sandbox approach that generates high-fidelity training data for task-based agents. This system bridges theory and practice, showing how to transform LLM knowledge into agent experience through controlled environments that mirror real enterprise workflows.
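
As a concrete illustration of the sandbox idea, here is a small, hypothetical sketch of the data-generation loop: an agent policy acts against a mocked enterprise tool, its tool-call trajectory is recorded, and only successful episodes are kept as supervised fine-tuning examples. The mock ticketing environment, `toy_policy` (a stand-in for an LLM agent), and the success check are assumptions, not the system described in the talk.

```python
# Hypothetical sketch: collect agent trajectories from a simulated enterprise tool
# and keep successful episodes as SFT data. All names here are illustrative.
import json
import random

class MockTicketSystem:
    """Toy stand-in for an enterprise API the agent must learn to operate."""
    def __init__(self):
        self.tickets = {1: {"status": "open", "assignee": None}}

    def call(self, tool, args):
        if tool == "assign_ticket":
            self.tickets[args["id"]]["assignee"] = args["user"]
            return {"ok": True}
        if tool == "close_ticket":
            self.tickets[args["id"]]["status"] = "closed"
            return {"ok": True}
        return {"ok": False, "error": "unknown tool"}

def toy_policy(task, history):
    """Placeholder for an LLM agent choosing the next tool call."""
    options = [("assign_ticket", {"id": 1, "user": "oncall"}),
               ("close_ticket", {"id": 1})]
    return random.choice(options)

def run_episode(task, max_steps=4):
    env, history = MockTicketSystem(), []
    for _ in range(max_steps):
        tool, args = toy_policy(task, history)
        result = env.call(tool, args)
        history.append({"tool": tool, "args": args, "result": result})
        if env.tickets[1]["status"] == "closed":  # task-specific success check
            return history, True
    return history, False

# Keep only successful trajectories as supervised fine-tuning examples.
sft_examples = []
for _ in range(20):
    trajectory, success = run_episode("resolve ticket #1")
    if success:
        sft_examples.append({"task": "resolve ticket #1", "trajectory": trajectory})
print(json.dumps(sft_examples[:1], indent=2))
```

The same loop extends naturally to RL: rather than filtering to successful episodes, each trajectory can be scored with a task reward and used for policy optimization.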

Both systems demonstrate how post-training techniques (RL, SFT, and curriculum learning) solve the "demo-to-production" gap that plagues AI deployments, whether for content generation or autonomous task execution.
 

Interview:

What is the focus of your work these days?

The majority of my time is spent on:

  • Building and deploying large-scale AI systems that leverage reinforcement learning and multimodal architectures for content understanding and generation
  • Developing production-ready implementations of cutting-edge research (Stable Diffusion, CLIP, Vision-Language Models) that scale to billions of content items
  • Leading engineering teams that bridge research innovations with practical deployment challenges in high-traffic content platforms
  • Researching novel applications of RL for improving generative models and AI agents for automated content organization across diverse industry verticals
     

And what was the motivation behind your talk?

The motivation stems from the massive opportunity that reinforcement learning and multimodal AI represent for any industry managing large content collections. While most companies are still experimenting with basic LLM applications, we've moved beyond that to solve fundamental challenges in content generation and organization at unprecedented scale.

Our experience demonstrates that:

  • RL can dramatically improve any generative model: Our Stable Diffusion improvements aren't Pinterest-specific - they're applicable to any company using image generation for marketing, product design, or content creation
  • Multimodal AI is ready for production: Our content-first architecture patterns work for any large catalog - e-commerce product databases, media libraries, document repositories, or digital asset management systems
  • Scale reveals new opportunities: Moving from millions to billions of items reveals architectural insights that smaller-scale experiments miss

I want to share this because the techniques we've developed solve universal problems:

  • E-commerce platforms struggling to organize massive product catalogs
  • Media companies with overwhelming content libraries
  • Marketing teams needing better image generation capabilities
  • Any platform that relies on search traffic to help users discover relevant content from vast collections

Who is your session for?

ML Engineers implementing reinforcement learning for generative models or building multimodal AI pipelines in production

Data Scientists working with large-scale content catalogs, image generation, or content organization challenges

Engineering Leaders at e-commerce, media, or content platforms evaluating AI technology choices for catalog management and content discovery

Research Engineers bridging cutting-edge research (Stable Diffusion, CLIP, VLMs) with production deployment

Product Managers in retail, media, or content-heavy platforms seeking to understand the business impact of advanced AI architectures

Software Architects designing systems to handle massive content collections and user-facing AI features


Speaker

Faye Zhang

Staff Software Engineer @Pinterest, Tech Lead on GenAI Search Traffic Projects, Speaker, Expert in AI/ML with a Strong Background in Large Distributed Systems

Faye Zhang is a staff AI engineer and tech lead at Pinterest, where she leads multimodal AI work for search traffic discovery, driving significant user growth globally. She combines expertise in large-scale distributed systems with ongoing NLP and AI agent research at Stanford. She also volunteers at the intersection of AI and genomic science, working on mRNA sequence analysis, with work published in multiple scientific journals. A recognized thought leader, Faye regularly shares insights at conferences in San Francisco and Paris.


From the same track

Session

Dynamic Moments: Weaving LLMs into Deep Personalization at DoorDash

Tuesday Nov 18 / 10:35AM PST

In this talk, we’ll walk through how DoorDash is redefining personalization by tightly integrating cutting-edge large language models (LLMs) with deep learning architectures such as Two-Tower Embeddings (TTE) and Multi-Task Multi-Label (MTML) models.

Sudeep Das

Head of Machine Learning and Artificial Intelligence, New Business Verticals @DoorDash, Previously Machine Learning Lead @Netflix, 15+ Years in Machine Learning

Pradeep Muthukrishnan

Head of Growth for New Business Verticals @DoorDash, Previously Founder & CEO @TrustedFor, 15+ Years in Machine Learning

Session

Automating the Web With MCP: Infra That Doesn’t Break

Tuesday Nov 18 / 02:45PM PST

AI agents are only as strong as the infrastructure beneath them. In this talk, we’ll walk through the architecture behind Browserbase’s model context protocol (MCP), built to support stateful browser automation at scale.

Paul Klein

Founder @Browserbase, previously Director of Self-Service & Engineering Manager @Mux, Co-Founder & CTO @Stream Club, Technical Lead @Twilio Inc.

Session

Engineering at AI Speed: Lessons from the First Agentically Accelerated Software Project

Tuesday Nov 18 / 01:35PM PST

Claude Code is the first developer tool built specifically to maximize AI development velocity.

Adam Wolff

Engineer and Individual Contributor to Claude Code @Anthropic, Previously @Robinhood, @Facebook

Session

Deep Research for Enterprise: Unlocking Actionable Intelligence from Complex Enterprise Data with Agentic AI

Tuesday Nov 18 / 11:45AM PST

Deep Research as a consumer product redefined the AI space, delivering real impact by searching through hundreds of websites, reasoning deeply over the content, and generating a comprehensive report.

Vinaya Polamreddi

Staff ML Engineer; Agentic AI @Glean; Previously @Apple, @Meta, and @Stanford

Session

Improving Meta Generative Ad Text using Reinforcement Learning

Tuesday Nov 18 / 05:05PM PST

Reinforcement Learning with Performance Feedback (RLPF) unlocks a new way of turning generic GenAI models into customized models fine-tuned for specific tasks. This approach is especially powerful when combined with in-house data and performance metrics.

Alex Nikulkov

Research Scientist (RL lead for Monetization GenAI) @Meta