Abstract
This talk presents a comprehensive journey through modern AI post-training techniques, from Pinterest's production-scale content discovery systems to enterprise agent training via simulation at Veris AI. We'll explore how reinforcement learning and supervised fine-tuning bridge the critical gap between base model capabilities and real-world performance across two distinct but complementary domains.
We begin with industry advances in RL-enhanced diffusion models and their impact on bias reduction and human preference alignment. We then dive into Pinterest's implementation of these techniques at scale for content generation via PinLanding, a multimodal, content-first architecture that turns billions of content items into shopping collections.
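To make the RL-enhanced diffusion idea concrete, here is a minimal, hedged sketch of reward-weighted fine-tuning for a denoising model. It is not Pinterest's implementation: `TinyDenoiser`, `dummy_reward`, and the noise schedule are toy stand-ins for a production U-Net, a learned human-preference (or bias) reward model, and a real scheduler.

```python
# Illustrative sketch only -- not Pinterest's implementation. It shows the core idea of
# reward-weighted fine-tuning for a diffusion denoiser: score samples with a (hypothetical)
# preference reward model, then up-weight the denoising loss for high-reward samples.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy stand-in for a diffusion U-Net that predicts noise from a noisy input."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x_noisy, t):
        # Concatenate a normalized timestep so the model is conditioned on noise level.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_noisy, t_feat], dim=-1))

def dummy_reward(samples):
    """Hypothetical preference/bias reward model; in practice a learned scorer."""
    return -samples.pow(2).mean(dim=-1)  # toy rule: prefer samples near the origin

def reward_weighted_step(model, optimizer, x0, timesteps=1000):
    """One reward-weighted denoising update (RL viewed as weighted regression)."""
    noise = torch.randn_like(x0)
    t = torch.randint(0, timesteps, (x0.shape[0],))
    alpha = 1.0 - t.float().unsqueeze(-1) / timesteps        # toy noise schedule
    x_noisy = alpha.sqrt() * x0 + (1 - alpha).sqrt() * noise

    rewards = dummy_reward(x0)
    weights = torch.softmax(rewards, dim=0) * x0.shape[0]    # normalized advantages

    pred_noise = model(x_noisy, t)
    per_sample_loss = (pred_noise - noise).pow(2).mean(dim=-1)
    loss = (weights.detach() * per_sample_loss).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = TinyDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(3):
    batch = torch.randn(16, 64)  # stand-in for generated or curated samples
    reward_weighted_step(model, opt, batch)
```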
From there, we turn to the broader challenge of agent training, presenting a general-purpose simulation sandbox that generates high-fidelity training data for task-based agents. This system bridges theory and practice, showing how to transform LLM knowledge into agent experience through controlled environments that mirror real enterprise workflows.
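A minimal sketch of the simulation-sandbox idea, under assumed interfaces (the `MockCRM` tools, task spec, and `call_agent` stub are illustrative, not Veris AI's actual API): roll out an agent against a mocked enterprise tool, verify task success programmatically, and keep only successful trajectories as supervised fine-tuning data.

```python
# Hedged illustration of sandbox-generated agent training data; all names are hypothetical.
import json
import random

class MockCRM:
    """Simulated enterprise system the agent calls instead of a real backend."""
    def __init__(self):
        self.tickets = {i: {"id": i, "status": random.choice(["open", "closed"])} for i in range(5)}

    def get_ticket(self, ticket_id):
        return self.tickets.get(ticket_id, {"error": "not found"})

    def close_ticket(self, ticket_id):
        if ticket_id in self.tickets:
            self.tickets[ticket_id]["status"] = "closed"
            return {"ok": True}
        return {"error": "not found"}

def call_agent(task, observation):
    """Stand-in for an LLM policy; a scripted heuristic keeps the sketch runnable."""
    if observation is None:
        return {"tool": "get_ticket", "args": {"ticket_id": task["ticket_id"]}}
    if observation.get("status") == "open":
        return {"tool": "close_ticket", "args": {"ticket_id": task["ticket_id"]}}
    return {"tool": "done", "args": {}}

def rollout(env, task, max_steps=4):
    trajectory, observation = [], None
    for _ in range(max_steps):
        action = call_agent(task, observation)
        if action["tool"] == "done":
            break
        observation = getattr(env, action["tool"])(**action["args"])
        trajectory.append({"task": task, "action": action, "observation": observation})
    success = env.tickets[task["ticket_id"]]["status"] == "closed"  # programmatic check
    return trajectory, success

sft_examples = []
for ticket_id in range(5):
    env = MockCRM()
    traj, ok = rollout(env, {"goal": "close the ticket", "ticket_id": ticket_id})
    if ok:  # keep only verified-successful rollouts as training data
        sft_examples.extend(traj)

print(json.dumps(sft_examples[:2], indent=2))
```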
Both systems demonstrate how post-training techniques (RL, SFT, and curriculum learning) close the "demo-to-production" gap that plagues AI deployments, whether for content generation or autonomous task execution.
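To make the curriculum-learning ingredient concrete, here is a small illustrative sketch: it orders training examples from easy to hard using an assumed difficulty proxy (step count) and builds cumulative training stages. The scoring rule and stage split are assumptions for illustration, not the exact recipe presented in the talk.

```python
# Hedged curriculum-learning sketch: stage the data from easy to hard before fine-tuning.
def difficulty(example):
    # Assumed proxy: longer multi-step tasks are treated as harder.
    return len(example["steps"])

def curriculum_stages(dataset, num_stages=3):
    ordered = sorted(dataset, key=difficulty)
    stage_size = max(1, len(ordered) // num_stages)
    # Each stage adds the next-harder slice on top of everything seen so far.
    return [ordered[: stage_size * (i + 1)] for i in range(num_stages)]

dataset = [
    {"task": "lookup", "steps": ["get_ticket"]},
    {"task": "close", "steps": ["get_ticket", "close_ticket"]},
    {"task": "escalate", "steps": ["get_ticket", "summarize", "route", "notify"]},
]

for stage, examples in enumerate(curriculum_stages(dataset)):
    # In practice: run SFT (and later RL) on `examples` before moving to the next stage.
    print(f"stage {stage}: {[e['task'] for e in examples]}")
```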
Interview:
What is the focus of your work these days?
The majority of my time is spent on:
- Building and deploying large-scale AI systems that leverage reinforcement learning and multimodal architectures for content understanding and generation
- Developing production-ready implementations of cutting-edge research (Stable Diffusion, CLIP, Vision-Language Models) that scale to billions of content items
- Leading engineering teams that bridge research innovations with practical deployment challenges in high-traffic content platforms
- Researching novel applications of RL for improving generative models and AI agents for automated content organization across diverse industry verticals
And what was the motivation behind your talk?
The motivation stems from the massive opportunity that reinforcement learning and multimodal AI represent for any industry managing large content collections. While most companies are still experimenting with basic LLM applications, we've moved beyond that to solve fundamental challenges in content generation and organization at unprecedented scale.
Our experience demonstrates that:
- RL can dramatically improve any generative model: Our Stable Diffusion improvements aren't Pinterest-specific; they apply to any company using image generation for marketing, product design, or content creation
- Multimodal AI is ready for production: Our content-first architecture patterns work for any large catalog, from e-commerce product databases and media libraries to document repositories and digital asset management systems
- Scale reveals new opportunities: Moving from millions to billions of items reveals architectural insights that smaller-scale experiments miss
I want to share this because the techniques we've developed solve universal problems:
- E-commerce platforms struggling to organize massive product catalogs
- Media companies with overwhelming content libraries
- Marketing teams needing better image generation capabilities
- Any platform where users need to discover relevant content from vast collections through search
Who is your session for?
- ML Engineers implementing reinforcement learning for generative models or building multimodal AI pipelines in production
- Data Scientists working with large-scale content catalogs, image generation, or content organization challenges
- Engineering Leaders at e-commerce, media, or content platforms evaluating AI technology choices for catalog management and content discovery
- Research Engineers bridging cutting-edge research (Stable Diffusion, CLIP, VLMs) with production deployment
- Product Managers in retail, media, or content-heavy platforms seeking to understand the business impact of advanced AI architectures
- Software Architects designing systems to handle massive content collections and user-facing AI features
Speaker

Faye Zhang
Staff Software Engineer @Pinterest, Tech Lead on GenAI Search Traffic Projects, Speaker, Expert in AI/ML with a Strong Background in Large Distributed Systems
Faye Zhang is a staff AI engineer and tech lead at Pinterest, where she leads multimodal AI work for search traffic discovery, driving significant user growth globally. She combines expertise in large-scale distributed systems with cutting-edge NLP and AI agent research at Stanford. She also volunteers in AI x genomic science for mRNA sequence analysis, with work published in multiple scientific journals. As a recognized thought leader, Faye regularly shares insights at conferences in San Francisco and Paris.