From Reinforcement Learning Enhanced Image to AI Collection Generation: How Pinterest Cracked the Code on Content Discovery

Abstract

This talk presents Pinterest's journey in deploying AI at massive scale, from using Reinforcement Learning to create images to building multimodal AI and Agent systems that automatically organize billions of content items. We'll explore how we evolved from fixing algorithmic bias in diffusion models to creating millions of shopping collections that drive discovery for hundreds of millions of users globally.

The presentation covers two breakthrough systems: our RL-enhanced diffusion models that reduced gender and racial bias while improving human preference alignment by 80.3%, and PinLanding, our content-first architecture that achieved 4X coverage improvement over traditional search approaches.

The RL-enhanced diffusion model work was presented at the 18th European Conference on Computer Vision (ECCV 2024) in Milan, Italy. Both innovations resulted in US patent applications.

Main Takeaways:

  1. Reinforcement Learning transforms Stable Diffusion performance
  2. Content-first architecture scales better than behavior-based approaches
  3. Production-ready multimodal AI architecture patterns with Agents

Interview:

What is the focus of your work these days?

Most of my day is spent on:

  • Building and deploying large-scale AI systems that leverage reinforcement learning and multimodal architectures for content understanding and generation
  • Developing production-ready implementations of cutting-edge research (Stable Diffusion, CLIP, Vision-Language Models) that scale to billions of content items
  • Leading engineering teams that bridge research innovations with practical deployment challenges in high-traffic content platforms
  • Researching novel applications of RL for improving generative models and AI agents for automated content organization across diverse industry verticals

And what was the motivation behind your talk?

The motivation stems from the massive opportunity that reinforcement learning and multimodal AI represent for any industry managing large content collections. While most companies are still experimenting with basic LLM applications, we've moved beyond that to solve fundamental challenges in content generation and organization at unprecedented scale.

Our experience demonstrates that:

  • RL can dramatically improve any generative model: Our Stable Diffusion improvements aren't Pinterest-specific - they're applicable to any company using image generation for marketing, product design, or content creation
  • Multimodal AI is ready for production: Our content-first architecture patterns work for any large catalog - e-commerce product databases, media libraries, document repositories, or digital asset management systems
  • Scale reveals new opportunities: Moving from millions to billions of items reveals architectural insights that smaller-scale experiments miss

I want to share this because the techniques we've developed solve universal problems:

  • E-commerce platforms struggling to organize massive product catalogs
  • Media companies with overwhelming content libraries
  • Marketing teams needing better image generation capabilities
  • Any platform where users arriving from search need to discover relevant content in vast collections

Who is your session for?

ML Engineers implementing reinforcement learning for generative models or building multimodal AI pipelines in production

Data Scientists working with large-scale content catalogs, image generation, or content organization challenges

Engineering Leaders at e-commerce, media, or content platforms evaluating AI technology choices for catalog management and content discovery

Research Engineers bridging cutting-edge research (Stable Diffusion, CLIP, VLMs) with production deployment

Product Managers in retail, media, or content-heavy platforms seeking to understand the business impact of advanced AI architectures

Software Architects designing systems to handle massive content collections and user-facing AI features


Speaker

Faye Zhang

Staff Software Engineer @Pinterest, Tech Lead on GenAI Search Traffic Projects, Speaker, Expert in AI/ML with a Strong Background in Large Distributed Systems

Faye is a Staff Software Engineer at Pinterest, where she leads AI-driven search traffic initiatives and launched the company's first successful GenAI production experiment, driving significant user engagement growth. With a Computer Science degree from Georgia Tech and ongoing AI graduate studies at Stanford, she combines deep technical expertise in distributed systems with cutting-edge AI research. Her work spans both industry and academia, including contributions to university genomic science research. She regularly shares insights on AI innovation at technical conferences in San Francisco and Paris, focusing on scalable AI solutions that transform user experiences.

From the same track

Session

Dynamic Moments: Weaving LLMs into Deep Personalization at DoorDash

In this talk, we’ll walk through how DoorDash is redefining personalization by tightly integrating cutting-edge large language models (LLMs) with deep learning architectures such as Two-Tower Embeddings (TTE) and Multi-Task Multi-Label (MTML) models.

Sudeep Das

Head of Machine Learning and Artificial Intelligence, New Business Verticals @DoorDash, Previously Machine Learning Lead @Netflix, 15+ Years in Machine Learning

Pradeep Muthukrishnan

Head of Growth for New Business Verticals @DoorDash, Previously Founder & CEO @TrustedFor, 15+ Years in Machine Learning