Track:

Applied Machine Learning

Location:

Pacific LMNO

Duration:

10:35am - 11:25am

Key Takeaways

Learn real-world approaches to developing reusable machine learning models.
Discover approaches to making data science accessible to everyone.
Hear lessons learned from building a modular, reusable machine learning pipeline at Salesforce.

Abstract

80-90% of data science is data cleaning and feature engineering. However, if we counted what existing data science tools are built for, we would find that most of the innovation happens in data infrastructure and modeling. We want to change that, making data scientists much more productive while also improving the quality of their work.

In this talk I will describe the machine learning platform we wrote on top of Spark to modularize these steps. Modularity allows easy reuse of components, which simplifies building and changing models. The framework simplifies the data preparation and feature building stages with reusable classes for each data source, making subsequent feature generation a matter of a few lines of code.
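The idea of one reusable class per data source can be illustrated with a minimal sketch in plain Python rather than Spark and Scala, assuming records arrive as dicts. The names `DataSource`, `EmailEvents`, `clean`, and `feature` are hypothetical and are not the Salesforce platform's API. Cleaning is written once on the source class, and each new feature is then a one-line declaration.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


class DataSource:
    """Hypothetical base class: clean a raw source once, then reuse it for many features."""

    def __init__(self, raw: Iterable[Record]) -> None:
        # Cleaning runs once per source; every feature sees the cleaned records.
        self.records: List[Record] = [self.clean(r) for r in raw]
        self.features: Dict[str, Callable[[Record], Any]] = {}

    def clean(self, record: Record) -> Record:
        return record  # subclasses override with source-specific cleaning

    def feature(self, name: str, fn: Callable[[Record], Any]) -> "DataSource":
        # Registering a feature is one line; returning self allows chaining.
        self.features[name] = fn
        return self

    def build(self) -> List[Record]:
        # Materialize every registered feature for every cleaned record.
        return [{name: fn(r) for name, fn in self.features.items()} for r in self.records]


class EmailEvents(DataSource):
    """Hypothetical source whose raw fields have inconsistent casing and gaps."""

    def clean(self, record: Record) -> Record:
        return {
            "domain": (record.get("domain") or "unknown").strip().lower(),
            "opens": int(record.get("opens") or 0),
        }


raw = [{"domain": " Example.COM ", "opens": "3"}, {"opens": None}]
rows = (
    EmailEvents(raw)
    .feature("is_example", lambda r: r["domain"] == "example.com")
    .feature("opened", lambda r: r["opens"] > 0)
    .build()
)
# rows == [{"is_example": True, "opened": True}, {"is_example": False, "opened": False}]
```

A second model built on the same source would reuse `EmailEvents` unchanged and only declare its own features.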

Model selection classes wrap the functionality of pre-existing and custom algorithms to provide a uniform interface for modeling, allowing rapid iteration as models evolve. By breaking the machine learning process into a series of simple-to-implement, interchangeable pieces, we have democratized the process of building machine learning models.
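A uniform modeling interface can be sketched as follows, assuming every candidate exposes the same `fit`/`predict` methods. `Model`, `MeanModel`, `LinearModel`, and `ModelSelector` are hypothetical names. Selection here uses a single held-out validation split scored by mean squared error, which is one simple choice and not necessarily what the platform does.

```python
from typing import Callable, List, Protocol, Sequence


class Model(Protocol):
    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Model": ...
    def predict(self, x: float) -> float: ...


class MeanModel:
    """Baseline: always predicts the training mean."""

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, x):
        return self.mean


class LinearModel:
    """Ordinary least squares on one feature: y = a + b * x."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx if sxx else 0.0
        self.a = my - self.b * mx
        return self

    def predict(self, x):
        return self.a + self.b * x


def mse(model, xs, ys):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


class ModelSelector:
    """Fits each candidate through the shared interface and keeps the best on validation data."""

    def __init__(self, candidates: List[Callable[[], Model]], metric=mse):
        self.candidates = candidates
        self.metric = metric

    def fit(self, train, valid):
        scored = []
        for make in self.candidates:
            model = make().fit(*train)
            scored.append((self.metric(model, *valid), model))
        # Compare on score only, so model objects never need to be orderable.
        self.score, self.best = min(scored, key=lambda s: s[0])
        return self

    def predict(self, x):
        return self.best.predict(x)


train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
valid = ([4.0, 5.0], [9.0, 11.0])
selector = ModelSelector([MeanModel, LinearModel]).fit(train, valid)
print(type(selector.best).__name__, selector.score)  # LinearModel 0.0
```

Because the selector only relies on `fit` and `predict`, adding a custom algorithm means writing one more class with those two methods.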

Interview with Leah McGuire

QCon: Leah it sounds like you’re going to be talking a lot about a product you’ve built at Salesforce, is this a product talk?

Leah: I think I'll talk mostly about the biggest challenges in machine learning: the time sinks, and the way machine learning models are built. Generally, models are built from scratch: you build up your data set, design features, and do model selection. That was the case at LinkedIn, for example. The same data is used by hundreds of other data scientists to build similar models, but none of that work is reusable. So I'll talk about how we made it possible to reuse a lot of the feature extraction, feature cleaning, and so on that you have to do in order to do machine learning.

QCon: Are the problems you plan to discuss specific to Salesforce or something applicable to everyone?

Leah: The parts of the system I'm going to describe will be useful generally to people who are thinking about the same issues, because they have to build several different models from the same data sources. I think it can help with that. I think the principles and techniques I'm going to describe will be very useful to people who want to make building machine learning models more efficient. These are designs built to solve those problems.

QCon: So I have to ask, why didn’t you just use the Spark Machine Learning Pipeline?

Leah: There were a couple of reasons. The first was that it treats everything sequentially. So, basically, if you wanted to do a lot of transformations on your data, you would have to chain them all up one after another. The second was that it didn't allow for non-deterministic transformations. For example, if you need to pivot your data, the only way you can do that with the Spark ML framework is to know in advance exactly what will come out at the end, which is not always the case.
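To make the pivot example concrete: when you pivot on a categorical column, the output columns depend on which values actually appear in the data, so they cannot be declared before the data is seen. A minimal plain-Python sketch, assuming a hypothetical `PivotCounts` class with a `fit`/`transform` split in which `fit` discovers the output schema. It illustrates the idea and is not Spark ML's API or the Salesforce implementation.

```python
from typing import Any, Dict, List


class PivotCounts:
    """Hypothetical pivot: one count column per category value seen during fit."""

    def __init__(self, key: str, category: str) -> None:
        self.key = key
        self.category = category
        self.columns: List[str] = []

    def fit(self, rows: List[Dict[str, Any]]) -> "PivotCounts":
        # The output schema is data-dependent: discovered here, not declared up front.
        self.columns = sorted({str(r[self.category]) for r in rows})
        return self

    def transform(self, rows: List[Dict[str, Any]]) -> Dict[Any, Dict[str, int]]:
        out: Dict[Any, Dict[str, int]] = {}
        for r in rows:
            counts = out.setdefault(r[self.key], {c: 0 for c in self.columns})
            col = str(r[self.category])
            if col in counts:  # values unseen at fit time are ignored
                counts[col] += 1
        return out


events = [
    {"user": "a", "action": "click"},
    {"user": "a", "action": "view"},
    {"user": "b", "action": "click"},
    {"user": "a", "action": "click"},
]
pivot = PivotCounts("user", "action").fit(events)
table = pivot.transform(events)
# pivot.columns == ["click", "view"]
# table == {"a": {"click": 2, "view": 1}, "b": {"click": 1, "view": 0}}
```

Fixing the schema at `fit` time keeps `transform` consistent between training and scoring data, even if scoring data contains new category values.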

QCon: What is the key point you'd like people to leave your talk with?

Leah: The key is: don't think of machine learning as a one-time project, because it's something you want to integrate into your product in many different ways. If you're smart and you build it the way you'd build any other part of your architecture, you're going to save yourself a lot of time in the future. No one wants to write the same function over and over with slight changes, but that's what happens a lot in machine learning, and it's really not necessary if you think about it.

QCon: So will you dive into any code examples?

Leah: Undoubtedly, I will have code examples. This is all built on Spark and Scala.

I think the code will be more about examples of how you can implement specific ideas; it's not going to be "this is the way you should code this up." It's more like: if you want to make reusable transformations, this is the kind of interface you might write for that.
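In that spirit, here is one possible interface for reusable transformations, sketched in plain Python under stated assumptions. Each stage declares the named features it reads and the one it writes. `Stage` and `run` are hypothetical names. Because dependencies are explicit, the runner can order stages as a graph (here with the standard library's `graphlib`, Python 3.9+) instead of relying on one fixed linear chain.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """A reusable transformation: named inputs in, one named output out."""
    inputs: Tuple[str, ...]
    output: str
    fn: Callable[..., Any]


def run(stages: List[Stage], record: Dict[str, Any]) -> Dict[str, Any]:
    by_output = {s.output: s for s in stages}
    # Each output depends on its inputs; raw fields appear only as predecessors.
    graph = {s.output: set(s.inputs) for s in stages}
    values = dict(record)
    for name in TopologicalSorter(graph).static_order():
        stage = by_output.get(name)
        if stage is not None:  # raw input fields have no producing stage
            values[name] = stage.fn(*(values[i] for i in stage.inputs))
    return values


stages = [
    Stage(("ratio",), "high_ratio", lambda r: r > 0.5),  # listed before its input
    Stage(("opens", "sends"), "ratio", lambda o, s: o / s if s else 0.0),
]
out = run(stages, {"opens": 3, "sends": 4})
# out["ratio"] == 0.75 and out["high_ratio"] is True
```

The list order does not matter, and the same `Stage` objects can be shared across models that need the same derived features.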

Similar Talks

Spark: A Coding Joyride

Dir. of Training @NewCircle

Doug Bateman

Rethinking Streaming Analytics For Scale

VP of Product Engineering @Tuplejump

Helena Edelson

Stylus, Facebook's new stream processing platform

Lead developer of Facebook's Stylus

Jerry Chen

Resilience planning & how the empire strikes back

Senior Software Engineer @BlueJeansNetwork

Bhakti Mehta

Preparing PayPal for Launch

VP of Global Platform and Infrastructure @PayPal

Sri Shivananda

Beyond the Hype: 4 years of Go in Production

CTO & Iron.io Co-founder

Travis Reeder

How NOT to measure Latency

CTO and co-founder @AzulSystems

Gil Tene

Flying faster with Heron

Engineering Manager and Technical Lead for Real Time Analytics @Twitter

Karthik Ramasamy

Alibaba Mobile Infrastructure at "China Scale"

Senior Director for Alibaba Wireless Division

Zhuoran Zhuang

Tracks

Covering innovative topics

Monday Nov 16

Architectures You've Always Wondered About

Silicon Valley to Beijing: Exploring some of the world's most intriguing architectures
Applied Machine Learning

How to start using machine learning and data science in your environment today. Latest and greatest best practices.
Browser as a platform (Realizing HTML5)

Exciting new standards like Service Workers, Push Notifications, and WebRTC are making the browser a formidable platform.
Modern Languages in Practice

The rise of 21st century languages: Go, Rust, Swift
Org Hacking

Our most innovative companies reimagining the org structure
Design Thinking

Level up your approach to problem solving and leave everything better than you found it.

Tuesday Nov 17

Containers in Practice

Build resilient, reactive systems one service at a time.
Architecting for Failure

Your system will fail. Take control before it takes you with it.
Modern CS in the Real World

Real-world industry adoption of modern CS ideas
The Amazing Potential of .NET Open Source

From language design in the open to Rx.NET, there is amazing potential in an Open Source .NET
Optimizing You

Keeping life in balance is always a challenge. Learning lifehacks
Unlearning Performance Myths

Lessons on the reality of performance, scale, and security

Wednesday Nov 18

Streaming Data @ Scale

Real-time insights at Cloud Scale & the technologies that make them happen!
Taking Java to the Next Level

Modern, lean Java. Focuses on topics that push Java beyond how you currently think about it.
The Dark Side of Security

Lessons from your enemies
Taming Distributed Architecture

Reactive architectures, CAP, CRDTs, consensus systems in practice
JavaScript Everywhere!

JavaScript is everywhere. Learn why.
Culture Reimagined

Lessons on building highly effective organizations

Conference for Professional Software Developers

Presentation: The lego model for machine learning pipelines
