Duration:

2:55pm - 3:45pm

Persona:

Architect
Data Scientist
Developer

Key Takeaways

Learn how Quora uses machine learning algorithms to determine the quality of questions and answers more accurately.
Hear approaches to solving machine learning problems and walk away with a better understanding of when to employ machine learning algorithms.
Understand some of the issues facing Quora and how machine learning helped to address them.

Abstract

Quora is a high-quality knowledge platform used by more than 100M people every month. In this presentation, I will introduce various ML problems that are important for Quora to solve in order to keep our quality high at such a massive scale. I will also describe our approach to these problems and share some lessons from building and maintaining these systems at production scale.

Interview

Question:

QCon: Can you explain your talk title to me?

Answer:

Nikhil: Quora, as you probably know, is a question and answer website. We think of ourselves as a knowledge platform. As we have grown larger (with more content and more people), the quality of content was expected to go down, but we have managed to keep it high even at a scale of 100M monthly uniques and growing, mostly through machine learning. We do a lot of work behind the scenes using machine learning to keep the content quality high. A big part of this is understanding whether a piece of content is good or bad to begin with.

This talk is about a lot of the things we have done in this space over the last two or three years. I will be talking about many of the machine learning problems that we have formulated, as well as some of the approaches we have taken towards them. I will also be talking about approaches taken by other people that have been published in white papers.

Question:

QCon: Can you give me an example? I mean, are you talking like sentiment analysis or something similar? What types of things do you do?

Answer:

Nikhil: That’s a good question. One example is answer quality. Let’s say you have a large number of answers to a question, and you want to understand what the best answer to that question is. We could rank answers on the basis of upvotes and downvotes, but that’s not very good. For example, joke answers get a lot of upvotes. If someone writes a new, high-quality answer, it’s likely never to reach the top. There are lots of other problems like this with voting-based ranking.

So we use machine learning. We have trained models which, given a question and an answer, output a score denoting the quality of the answer. The higher the score, the better the answer is.
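As a rough illustration of the difference between the two ranking strategies described here, the following Python sketch orders the same answers once by net votes and once by a per-answer score. Everything in it is an assumption made for illustration: the `Answer` type, the function names, and especially `toy_score`, which is a trivial word-overlap heuristic standing in for a trained model. It is not Quora's model or feature set.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    upvotes: int
    downvotes: int


def rank_by_votes(answers: List[Answer]) -> List[Answer]:
    """Baseline: order answers by net votes (upvotes minus downvotes)."""
    return sorted(answers, key=lambda a: a.upvotes - a.downvotes, reverse=True)


def rank_by_score(
    question: str,
    answers: List[Answer],
    score: Callable[[str, Answer], float],
) -> List[Answer]:
    """Order answers by a (question, answer) score; higher means better."""
    return sorted(answers, key=lambda a: score(question, a), reverse=True)


def toy_score(question: str, answer: Answer) -> float:
    """Hypothetical stand-in for a trained model.

    Returns the fraction of question words that also appear in the answer.
    A real system would use a learned model here, not this heuristic.
    """
    q_words = set(question.lower().split())
    a_words = set(answer.text.lower().split())
    return len(q_words & a_words) / len(q_words) if q_words else 0.0


question = "how do vaccines work"
joke = Answer("lol magic", upvotes=500, downvotes=3)
fresh = Answer("vaccines work by training the immune system", upvotes=0, downvotes=0)

by_votes = rank_by_votes([fresh, joke])
by_score = rank_by_score(question, [joke, fresh], toy_score)
```

With these made-up inputs, vote-based ranking puts the heavily upvoted joke first, while score-based ranking puts the new answer first even though it has no votes yet.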

Another example: given two questions, you need to understand whether they are asking the same thing. Are the questions essentially duplicates? The language might be very different. One might even be a subset of the other, but if they are about the same thing, you’ll want a single canonical question.
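To make the duplicate-question problem concrete, here is a minimal Python sketch of a naive baseline: compare the two questions by the overlap of their word sets (Jaccard similarity) and call them duplicates above a threshold. The stopword list, the function names, and the 0.5 threshold are all assumptions chosen for illustration. This is not the approach described in the interview, and it breaks down in exactly the case mentioned above, where the wording of two equivalent questions is very different.

```python
import re
from typing import Set

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "the", "is", "are", "what", "how",
    "do", "does", "i", "to", "of", "in",
}


def tokens(question: str) -> Set[str]:
    """Lowercase the question, split into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two questions' token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(a: str, b: str, threshold: float = 0.5) -> bool:
    """Naive duplicate check: token overlap at or above the threshold."""
    return jaccard(a, b) >= threshold


same = is_duplicate("How do I learn Python?", "How to learn Python")
different = is_duplicate("How do I learn Python?", "What is the capital of France?")
# Paraphrases with no shared words are missed by this baseline.
paraphrase = is_duplicate(
    "What is the best way to lose weight?",
    "How can I slim down quickly?",
)
```

The last pair shows the limitation: two questions that arguably ask the same thing share no tokens, so the baseline misses them. That gap is one reason to treat this as a learning problem rather than string matching.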

These are just two examples. I will discuss a much longer list of such things that we do and discuss them as machine learning problems.

Question:

QCon: How do you develop models for this domain? How do you develop something that lets you test whether a question (or an answer) is any good?

Answer:

Nikhil: I think that’s the most interesting part of the life cycle of a machine learning solution. The problem with these systems is that there is no obvious source of data to train and benchmark against. A problem like answer quality prediction is not like ad prediction, where you can just maximize the clicks themselves. I think this is where the secret sauce is.

We first try to define what we want the algorithm to optimize for in a very subjective, human form, and then bootstrap data using techniques specific to the problem. Some of these techniques might work for one problem domain but not for another. It’s really important for us to iterate so that we can test whether the model is actually improving things. I will be talking about these things in this talk.

Question:

QCon: How would you rate this talk: Beginner, Intermediate, or Advanced?

Answer:

Nikhil: I think it depends on the background of the attendees and their area of interest. If I were to classify it, I would call it an intermediate talk. I don’t plan on going into a lot of math (maybe none at all). But I will be talking about content. People who have never worked in the problem domain might have some difficulty understanding why this will be useful to them.

Question:

QCon: What are the key takeaways for this talk?

Answer:

Nikhil: I think the most important thing would be using machine learning in the right places. I think a lot of people are just asking machine learning to optimize for the wrong things. Optimizing for the right things is much harder because you might not have the data, and it’s harder to formulate problems like that. I’m hoping that attendees will walk away with a much better understanding of how they can make their machine learning algorithms produce more nuanced, subjective predictions than they do right now.

Question:

QCon: What do you feel is the most disruptive tech in IT right now?

Answer:

Nikhil: I think machine learning becoming more mainstream is the most disruptive change in the industry. I am not specifically talking about deep learning. I think deep learning is creating hype that is getting more and more people to try things out. But a much wider adoption of machine learning in the industry is an extremely revolutionary change and should have far-reaching consequences.

Speaker: Nikhil Garg

Engineering Manager @Quora

Nikhil Garg is an engineering manager at Quora where he is leading a team of great engineers working on various machine learning and infrastructure problems. He is very interested in distributed systems, machine learning, and product design.

Find Nikhil Garg at

https://github.com/nikhilgarg28

Similar Talks

IBM Distinguished Engineer

Mark Vanderwiele

Stranger Things: The Forces that Disrupt Netflix

Senior Software Engineer, Playback Features @Netflix

Haley Tucker

99.99% Availability via Smart Real-Time Alerting

Data Science Manager @Uber

Franziska Bell

Creating A Culture of Observability at Stripe

Observability Specialist @Stripe

Cory Watson

Migrating to a Fault Tolerant System with Spanner

Software Engineer @Google

Edwin Fuquen

Freeing the Whale: How to Fail at Scale

CTO @Buoyant

Oliver Gould

Automating Chaos Experiments In Production

Senior Software Engineer @Netflix

Ali Basiri

Architecting for Failure in a Containerized World

Principal Data Analysis Leader @Infolace

Tom Faulhaber

Further Together: Curated Pairing Culture @Pivotal

Software Engineer @Pivotal

Neha Batra

Tracks

Monday Nov 7

Architectures You've Always Wondered About

You know the names. Now learn lessons from their architectures

Distributed Systems War Stories

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.

Containers Everywhere

State of the art in Container deployment, management, scheduling

Art of Relevancy and Recommendations

Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.

Next Generation Web Standards, Frameworks, and Techniques

JavaScript, HTML5, WASM, and more... innovations targeting the browser

Optimize You

Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.

Tuesday Nov 8

Next Generation Microservices

What will microservices look like in 3 years? What if we could start over?

Java: Are You Ready for This?

Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.

Big Data Meets the Cloud

Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud

Evolving DevOps

Lessons/stories on optimizing the deployment pipeline

Software Engineering Softskills

Great engineers do more than code. Learn their secrets and level up.

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Wednesday Nov 9

Architecting for Failure

Your system will fail. Take control before it takes you with it.

Stream Processing

Stream Processing, Near-Real Time Processing

Bare Metal Performance

Native languages, kernel bypass, tooling - make the most of your hardware

Culture as a Differentiator

The why and how for building successful engineering cultures

//TODO: Security <-- fix this

Building security from the start. Stories, lessons, and innovations advancing the field of software security.

UX Reimagined

Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.

Conference for Professional Software Developers

Presentation: Scaling Quality On Quora Using Machine Learning