Track: Applied AI & Machine Learning

Location: Pacific LMNO

Day of week: Wednesday

Machine learning will soon have a profound effect on every industry on the planet. This revitalized wave creates huge demand, and new challenges, for software developers with this expertise. The Practical Machine Learning track will focus on how developers can successfully build real-world machine learning models based on proven techniques using viable APIs and frameworks. Because they are critical to using machine learning in our applications, we’ll also cover best practices for collecting and preprocessing data and for choosing and building models; these are some of the biggest challenges in putting machine learning into production.

Track Host: Sid Anand

Chief Data Engineer @PayPal

Sid Anand currently serves as PayPal's Chief Data Engineer, focusing on ways to realize the value of data. Prior to joining PayPal, he held several positions, including Data Architect at Agari, Technical Lead in Search at LinkedIn, Cloud Data Architect at Netflix, VP of Engineering at Etsy, and several technical roles at eBay. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. In his spare time, he is a maintainer/committer on Apache Airflow, a co-chair for QCon, and a frequent speaker at conferences. When not working, Sid spends time with his wife, Shalini, and their 2 kids.

CASE STUDY TALK (50 MIN)

10:35am - 11:25am

Human-Centric Machine Learning Infrastructure @Netflix

Netflix has over 100 data scientists applying machine learning to a wide range of business problems from title popularity predictions to quality of streaming optimizations. Our unique culture gives data scientists plenty of freedom to choose the modeling approach, libraries, and even the programming language that will make them productive at solving problems. However, we want to balance this freedom by providing a solid infrastructure for machine learning, ensuring models can be promoted quickly and reliably from prototype to production, and enabling reproducible and easily shareable results.

We started building this infrastructure a little over a year ago with a human-centric mindset. Many existing open-source machine learning frameworks are great at making advanced modeling possible. The job of our ML infrastructure is to make it remarkably easy to apply these frameworks to real business problems at Netflix. We have found that this requires an infrastructure that covers the day-to-day challenges of data scientists holistically, from understanding input data to building trust with consumers of models, not just the parts that are directly related to fitting and scoring models.

Come learn the techniques and underlying principles driving our approach, which you'll be able to adapt and apply to your own use cases.

Ville Tuulos, Machine Learning Infrastructure Engineer @Netflix

CASE STUDY TALK (50 MIN)

11:50am - 12:40pm

Deep Representation: Building a Semantic Image Search Engine

Many problems combine Natural Language Processing and Computer Vision. Drawing on his experience leading over a hundred applied AI projects at Insight, Emmanuel will give a step-by-step tutorial on how to build a semantic search engine for text and images, with code included! The approaches presented extend naturally to other applications such as image and video captioning, reading text from videos, selecting optimal thumbnails, and generating code from sketches of websites (all projects that were tackled at Insight), and more!

Emmanuel Ameisen, Head of AI @InsightDataSci

CASE STUDY TALK (50 MIN)

1:40pm - 2:30pm

Nearline Recommendations for Active Communities @LinkedIn

At LinkedIn, our mission is to use AI to connect every member of the global workforce to make them more productive and successful. The social network is the backbone for professionals to engage with each other at every stage of their career. In the first half of this talk, I will focus on technologies we have built to power LinkedIn’s “People You May Know” product, the primary driver to connect the world’s professionals to each other to form a basic community. Our platform allows for triangle closing and other graph walk algorithms in real time. It also allows models to consider near real-time features based on a user’s context. We will demonstrate improvements through A/B tests. We will then move on to discuss work done in predicting the downstream impact of forming an edge between two members on the overall activity of our ecosystem. We will show that how a member’s network evolves plays an important role in their downstream engagement. Finally, we will present our work on near real-time optimization of activity-based notifications that ensure that our members never miss a conversation that matters. We will describe our nearline platform for notification recommendation and show through experiments that delivering the right information to the right user (through better content targeting) at the right time (through delivery time optimization and message spacing) is critical to building an actively engaged community.
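
For readers unfamiliar with the term, triangle closing can be illustrated with a small sketch: suggest members who are not yet connected to a given member but share connections with them, ranked by the number of shared connections. This is not LinkedIn's implementation. It assumes a tiny undirected graph held in memory as a dictionary, and the function name, the member names, and the `top_k` parameter are all hypothetical.

```python
from collections import Counter


def triangle_closing_candidates(graph, member, top_k=3):
    """Rank non-connections of `member` by how many connections they share.

    `graph` maps each member to the set of members they are connected to
    (assumed undirected: every edge appears in both directions).
    """
    direct = graph.get(member, set())
    shared = Counter()
    for connection in direct:
        for candidate in graph.get(connection, set()):
            # Skip the member themselves and anyone already connected.
            if candidate != member and candidate not in direct:
                shared[candidate] += 1
    # Most shared connections first; ties broken by name for stable output.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy network, for illustration only.
graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy", "dee"},
    "cy": {"ana", "bo", "dee", "eli"},
    "dee": {"bo", "cy"},
    "eli": {"cy"},
}

print(triangle_closing_candidates(graph, "ana"))
```

Here "dee" ranks ahead of "eli" for "ana" because connecting them would close two open triangles (through "bo" and "cy") rather than one. A production system would likely combine such counts with other features rather than rank on them alone.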

Hema Raghavan, Senior Manager & Heading AI for Growth and Communication Relevance @LinkedIn

CASE STUDY TALK (50 MIN)

2:55pm - 3:45pm

Fairness, Transparency, and Privacy in AI @LinkedIn

How do we protect the privacy of users in large-scale systems? How do we ensure fairness and transparency when developing machine-learned models? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical and legal challenges encountered by researchers and practitioners alike. In this talk, we will first present an overview of privacy breaches as well as algorithmic bias / discrimination issues observed in the Internet industry over the last few years and the lessons learned, key regulations and laws, and the evolution of techniques for achieving privacy and fairness in data-driven systems. We will motivate the need for adopting a "privacy and fairness by design" approach when developing data-driven AI/ML models and systems for different consumer and enterprise applications. We will also focus on the application of privacy-preserving data mining and fairness-aware machine learning techniques in practice, by presenting case studies spanning different LinkedIn applications, and conclude with the key takeaways and open challenges.

Krishnaram Kenthapadi, Tech Lead Fairness, Transparency, Explainability & Privacy Efforts @LinkedIn

CASE STUDY TALK (50 MIN)

4:10pm - 5:00pm

Visualizing Machine Learning Models in Jupyter Notebooks

Jupyter Notebooks are becoming the IDE of choice for data scientists and researchers. They give users an exploratory environment where they can quickly research and prototype different models and visualize the results, all in one place. Notebooks are easy to share and can be converted into documents or slides to present to stakeholders.

With widget libraries like ipywidgets and bqplot, users can create rich, interactive, web-app-like functionality with just a few lines of Python code.

In this talk, we will see how we can understand and visualize machine learning models using interactive widgets. In the first part of the talk, I'll introduce the widget libraries and walk you through the code of a simple example so we understand how to assemble and link these widgets. Then we'll look at models like regressions, clustering and finally a wizard for building and training deep learning models with diagnostic plots.

Chakri Cherukuri, Senior Researcher in the Quantitative Financial Research Group @Bloomberg

Tracks

Monday, 5 November

Tuesday, 6 November

Wednesday, 7 November