Presentation: Personalization in the Pinterest Homefeed
Key Takeaways
- Understand that when you build machine learning into your product, you need to build it into your infrastructure and processes as well
- Hear data processing and architecture best practices in use at Pinterest
- Learn lessons about ranking, personalization, and recommendation at extremely high scale
Abstract
The Pinterest Homefeed personalizes and ranks 1B+ pins for 100M+ users on Pinterest, using data gathered from collaborative filtering, user curation, web crawls, and other sources. This talk will give an overview of the system and focus on the effective engineering choices made to enable productive ML development. To have multiple engineers effectively develop, test, and deploy machine-learned models for the Pinterest Homefeed, we've built a system that allows for continuous training and feature gathering. We will discuss our signal-gathering framework and the single system that drives config-based featurization, offline training, and online classification. Additionally, we will discuss other engineering constraints we've built around the system to satisfy business rules and requirements.
Interview with Dmitry Chechik
QCon: Can you give me an idea of the scale you are talking about?
Dmitry: We personalize homefeeds for over 100 million users and provide recommendations over more than 1 billion unique items. If you compare that to most other recommendation domains (movies number in the tens of thousands, songs in the single-digit millions), we are several orders of magnitude bigger. So part of the scaling challenge is understanding how to build and design a system under those constraints.
QCon: What does your ML stack look like?
Dmitry: There are a number of things we built ourselves and a number of things we build on top of. In terms of our serving infrastructure, we are built on top of HBase as a storage layer, with several Java services that do ranking, recommendation, and feed building.
We have a pretty heavy offline stack, with Hadoop jobs using Cascading, Hive, and other technologies to process and build data offline. One of the main things I want to discuss is a case study of a domain-specific language we use for modeling, machine learning, and feature transformation. That language is what allows us to scale the engineering aspects of the work.
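As a rough illustration of the serving side (not Pinterest's actual code), the sketch below shows how a Java service might pull a user's precomputed signals out of HBase before ranking. The table name, column family, and qualifier (`user_signals`, `s`, `topics`) are invented for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class UserSignalStore {
    // Hypothetical table and column names; the real schema is not described in the talk.
    private static final TableName TABLE = TableName.valueOf("user_signals");
    private static final byte[] FAMILY = Bytes.toBytes("s");
    private static final byte[] TOPICS = Bytes.toBytes("topics");

    private final Connection connection;

    public UserSignalStore(Configuration conf) throws Exception {
        this.connection = ConnectionFactory.createConnection(conf);
    }

    /** Fetch the serialized topic-affinity signal for one user, or null if absent. */
    public byte[] fetchTopicSignal(String userId) throws Exception {
        try (Table table = connection.getTable(TABLE)) {
            Result result = table.get(new Get(Bytes.toBytes(userId)));
            return result.getValue(FAMILY, TOPICS);
        }
    }

    public static void main(String[] args) throws Exception {
        UserSignalStore store = new UserSignalStore(HBaseConfiguration.create());
        byte[] signal = store.fetchTopicSignal("user_123");
        System.out.println(signal == null ? "no signal" : signal.length + " bytes");
    }
}
```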
QCon: Is this DSL something that is specific to Pinterest or is it available for other people to use?
Dmitry: Right now it is specific to Pinterest, but we may make it available to others. It is a pretty generic system that we can potentially use to serve many different kinds of models and to read data from many different data sources.
What I’m going to focus on with the DSL is several principles common to any system that attempts to use machine learning and classification online. Things like being able to:
- make experimentation really easy for engineers
- easily push changes online
- consume a variety of data sources
- combine all the work that goes into a model (everything from joins to feature transforms to the actual classifier and weights) into a single package (see the sketch after this list)
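The DSL itself isn't shown in the interview, but as a hedged sketch of the "single package" idea, the Java below bundles joins, feature transforms, and classifier weights into one object that can drive both training and scoring. All names here (`ModelSpec`, `transform`, `score`) are hypothetical and chosen only to mirror the principles above, not Pinterest's actual API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch: a model "package" that bundles the data joins,
 * the feature transforms, and the trained weights into one deployable
 * object, so a single definition can drive offline training and online scoring.
 */
public class ModelSpec {
    private final List<String> joinedSignals; // e.g. "user_topics", "pin_topics" (illustrative)
    private final Map<String, Function<Map<String, Double>, Double>> transforms = new LinkedHashMap<>();
    private final Map<String, Double> weights = new LinkedHashMap<>();

    public ModelSpec(List<String> joinedSignals) {
        this.joinedSignals = joinedSignals;
    }

    /** Declare a derived feature as a pure function of the joined raw signals. */
    public ModelSpec transform(String name, Function<Map<String, Double>, Double> fn) {
        transforms.put(name, fn);
        return this;
    }

    /** Attach trained linear weights, keyed by feature name. */
    public ModelSpec weights(Map<String, Double> trained) {
        weights.putAll(trained);
        return this;
    }

    /** Score one (user, pin) example; the same code path runs offline and online. */
    public double score(Map<String, Double> rawSignals) {
        double sum = 0.0;
        for (Map.Entry<String, Function<Map<String, Double>, Double>> t : transforms.entrySet()) {
            sum += weights.getOrDefault(t.getKey(), 0.0) * t.getValue().apply(rawSignals);
        }
        return sum;
    }

    public List<String> requiredSignals() {
        return joinedSignals;
    }
}
```

In a setup along these lines, the transforms and the weights travel together, so shipping a newly trained model is a matter of pushing a new package rather than changing serving code, which is one way the experimentation loop stays fast.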
The focus of the talk will be using the DSL as a case study in how to take machine learning from a one-off solution to a repeatable part of your infrastructure.
QCon: What did you mean when you said, "you need to make sure you build machine learning into your infrastructure and processes"?
Dmitry: It's about answering questions like: How do you make the process of data collection easy, and something you can iterate on as your underlying data changes or as you add data types? How do you make the process of building models easy? How do you make your online and offline systems really one-to-one, and how do you make sure they work in tandem? How do you make sure the work you do offline when training models transfers online? Finally, how do you make engineers productive?
Build your systems in a way that makes online experimentation easy, makes it easy to get data quickly, and keeps your online classifier environment as similar to your offline environment as possible.
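One hedged way to picture that online/offline symmetry is to route both paths through a single featurization entry point, so logged training examples and live requests see exactly the same transforms. The `Featurizer` interface and the call sites below are illustrative assumptions, not Pinterest's implementation.

```java
import java.util.Map;

/** Hypothetical: one featurization entry point shared by training and serving. */
public interface Featurizer {
    /** Turn raw (user, pin) signals into the exact feature vector the model consumes. */
    Map<String, Double> toFeatures(Map<String, String> rawSignals);
}

/** Offline: a Hadoop job would call the same featurizer for each logged example. */
class TrainingExampleBuilder {
    private final Featurizer featurizer;
    TrainingExampleBuilder(Featurizer featurizer) { this.featurizer = featurizer; }

    String buildExample(Map<String, String> loggedSignals, boolean clicked) {
        Map<String, Double> features = featurizer.toFeatures(loggedSignals);
        return (clicked ? 1 : 0) + "\t" + features; // simplified serialization for illustration
    }
}

/** Online: the serving path calls the identical featurizer before scoring. */
class OnlineRanker {
    private final Featurizer featurizer;
    OnlineRanker(Featurizer featurizer) { this.featurizer = featurizer; }

    double score(Map<String, String> liveSignals, Map<String, Double> weights) {
        double sum = 0.0;
        for (Map.Entry<String, Double> f : featurizer.toFeatures(liveSignals).entrySet()) {
            sum += weights.getOrDefault(f.getKey(), 0.0) * f.getValue();
        }
        return sum;
    }
}
```

Because both the example builder and the ranker depend on the same `Featurizer`, any change to a transform is picked up by training and serving together, which is the one-to-one property described above.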
QCon: Who do you feel is the main type of person you are talking to in your talk?
Dmitry: I think there are two kinds of people who will benefit from the talk. The first is someone who has been very focused on the machine learning aspect and wants to figure out how to scale the system over the long term and make it something other ML folks can contribute to as well. The other is someone working on infrastructure for machine learning: someone working with a data scientist or machine learning researcher to build out infrastructure and scale a machine learning product. These are some of the pieces that wind up being essential and that everyone has to build out, and it's worth thinking about them early.