Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Scaling Quality On Quora Using Machine Learning
Persona:
- Architect
- Data Scientist
- Developer
Key Takeaways
- Learn how Quora uses machine learning algorithms to more accurately determine the quality of questions and answers.
- Hear approaches to solving machine learning problems and walk away with a better understanding of when to employ machine learning algorithms.
- Understand some of the issues facing Quora and how machine learning helped to address them.
Abstract
Quora is a high-quality knowledge platform used by more than 100M people every month. In this presentation, I will introduce various ML problems that Quora must solve to keep quality high at such a massive scale. I will also describe our approach to these problems and share some lessons from building and maintaining these systems at production scale.
Interview
Nikhil: Quora, as you probably know, is a question-and-answer website; we think of ourselves as a knowledge platform. As we grew (more content, more people), content quality would normally be expected to go down, but we have managed to keep it high even at a scale of 100M monthly unique visitors and growing, mostly through machine learning. We do a lot of work behind the scenes with machine learning to keep content quality high, and a big part of that is determining whether a given piece of content is good or bad in the first place.
This talk covers much of what we have done in this space over the last two or three years. I will walk through many of the machine learning problems we have formulated, along with some of the approaches we have taken to them. I will also discuss approaches other people have taken and published in white papers.
Nikhil: That's a good question. One example is answer quality. Say a question has a large number of answers, and you want to know which one is best. We could rank answers by upvotes and downvotes, but that doesn't work very well. For example, joke answers get a lot of upvotes, and a new, high-quality answer may never make it to the top. There are many other problems with vote-based ranking as well.
So we use machine learning. We have trained models that, given a question and an answer, output a score denoting the quality of the answer: the higher the score, the better the answer.
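To make the contrast concrete, here is a minimal Python sketch that ranks the same three answers two ways: by net votes and by a model score. The `Answer` fields, the answer IDs, and every number are hypothetical; `quality_score` merely stands in for the output of a trained (question, answer) quality model, which the talk does not specify.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    upvotes: int
    downvotes: int
    # Hypothetical placeholder for a trained model's (question, answer) -> score
    # output; higher means better. Not Quora's actual model or values.
    quality_score: float


def rank_by_votes(answers):
    """Rank by net votes (upvotes - downvotes), highest first."""
    return sorted(answers, key=lambda a: a.upvotes - a.downvotes, reverse=True)


def rank_by_model(answers):
    """Rank by the model-predicted quality score, highest first."""
    return sorted(answers, key=lambda a: a.quality_score, reverse=True)


answers = [
    Answer("joke", upvotes=900, downvotes=50, quality_score=0.20),
    Answer("solid_old", upvotes=300, downvotes=10, quality_score=0.75),
    Answer("new_detailed", upvotes=2, downvotes=0, quality_score=0.90),
]

print([a.answer_id for a in rank_by_votes(answers)])  # ['joke', 'solid_old', 'new_detailed']
print([a.answer_id for a in rank_by_model(answers)])  # ['new_detailed', 'solid_old', 'joke']
```

With these made-up values, net-vote ranking puts the joke answer first and the new answer last, while the score-based ranking reverses that order.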
Another example: given two questions, you need to determine whether they are asking the same thing, that is, whether they are essentially duplicates. The wording might be very different, and one question might even be a subset of the other, but if they are about the same thing, you want a single canonical question.
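A naive, purely lexical baseline helps show why duplicate detection calls for a learned model. The sketch below, an illustration rather than Quora's approach, flags two questions as duplicates when the Jaccard similarity of their word sets reaches a threshold; the `0.5` threshold and the example questions are arbitrary assumptions.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(q1: str, q2: str) -> float:
    """Size of the token-set intersection divided by the size of the union."""
    a, b = tokens(q1), tokens(q2)
    if not a and not b:
        return 1.0  # two empty questions: treat as identical
    return len(a & b) / len(a | b)


def is_duplicate(q1: str, q2: str, threshold: float = 0.5) -> bool:
    # The 0.5 threshold is an arbitrary illustrative choice, not a tuned value.
    return jaccard(q1, q2) >= threshold


print(is_duplicate("How do I learn Python?", "How do I learn Java?"))  # True (wrong)
print(is_duplicate(
    "How do I learn Python?",
    "What is the best way to get started with Python programming?",
))  # False (wrong)
```

The baseline fails in both directions: the Python and Java questions share 4 of 6 distinct words and are wrongly flagged, while the reworded Python question shares only 1 of 15 and is missed. Word overlap cannot tell when very different language is asking the same thing.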
These are just two examples. In the talk I will discuss a much longer list of such problems and how we frame them as machine learning problems.
Nikhil: I think that's the most interesting part of the life cycle of a machine learning solution. The problem with these systems is that there is no obvious source of data for training and benchmarking. Answer quality prediction is not like ad prediction, where you simply maximize clicks. I think this is where the secret sauce is.
We first define what we want the algorithm to optimize for in a very subjective, human form, and then bootstrap data using techniques specific to the problem. Some of these techniques work for one problem domain but not for another. It's really important for us to iterate so that we can test whether the model is actually improving things. I will be covering these things in the talk.
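One generic way to test whether a model is actually improving things is to check it against a small set of human judgments. The sketch below assumes hypothetical labeled pairs of the form (better answer, worse answer) and reports the fraction of pairs a scoring function orders correctly; the metric, the data format, and all values are illustrative assumptions, not something the talk describes.

```python
from typing import Callable, Iterable, Tuple


def pairwise_agreement(
    score: Callable[[str], float],
    preferred_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (better, worse) pairs that `score` orders the same way.

    Ties count as disagreements.
    """
    pairs = list(preferred_pairs)
    if not pairs:
        raise ValueError("need at least one labeled pair")
    agree = sum(1 for better, worse in pairs if score(better) > score(worse))
    return agree / len(pairs)


# Hypothetical scores for three answers to one question.
net_votes = {"a1": 850, "a2": 290, "a3": 2}          # baseline: upvotes - downvotes
model_scores = {"a1": 0.20, "a2": 0.75, "a3": 0.90}  # candidate model output

# Hypothetical human judgments: (better_answer_id, worse_answer_id).
human_pairs = [("a3", "a1"), ("a2", "a1"), ("a3", "a2")]

print(pairwise_agreement(lambda a: net_votes[a], human_pairs))     # 0.0
print(pairwise_agreement(lambda a: model_scores[a], human_pairs))  # 1.0
```

Here the made-up net-vote baseline agrees with none of the three human judgments and the made-up model scores agree with all of them. A fixed labeled set of this kind lets each iteration be compared against the previous one.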
Nikhil: I think it depends on the attendees' background and area of interest. If I had to classify it, I would call it an intermediate talk. I don't plan on going into much math (maybe none at all), but the talk will still be substantive. People who have never worked in this problem domain might find it harder to see why it is useful to them.
Nikhil: I think the most important takeaway is using machine learning in the right places. A lot of people are asking machine learning to optimize for the wrong things. Optimizing for the right things is much harder, because you might not have the data and such problems are harder to formulate. I'm hoping attendees will walk away with a much better understanding of how to make their machine learning algorithms produce more nuanced, subjective predictions than they do today.
Nikhil: I think machine learning becoming mainstream is the most disruptive change in the industry. I'm not specifically talking about deep learning; deep learning is creating hype that is getting more and more people to try things out. But much wider adoption of machine learning across the industry is a truly revolutionary change and should have far-reaching consequences.
Tracks
Monday Nov 7
- Architectures You've Always Wondered About
  You know the names. Now learn lessons from their architectures.
- Distributed Systems War Stories
  “A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.
- Containers Everywhere
  State of the art in container deployment, management, and scheduling.
- Art of Relevancy and Recommendations
  Lessons on the adoption of practical, real-world machine learning practices. AI & deep learning explored.
- Next Generation Web Standards, Frameworks, and Techniques
  JavaScript, HTML5, WASM, and more... innovations targeting the browser.
- Optimize You
  Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
- Next Generation Microservices
  What will microservices look like in 3 years? What if we could start over?
- Java: Are You Ready for This?
  Real-world lessons & prepping for JDK 9. Reactive code in Java today, performance/optimization, where Unsafe is heading, & the JVM Compiler Interface.
- Big Data Meets the Cloud
  Overviews and lessons learned from companies that have implemented their Big Data use cases in the Cloud.
- Evolving DevOps
  Lessons/stories on optimizing the deployment pipeline.
- Software Engineering Softskills
  Great engineers do more than code. Learn their secrets and level up.
- Modern CS in the Real World
  Applied, practical, & real-world dive into industry adoption of modern CS ideas.
Wednesday Nov 9
- Architecting for Failure
  Your system will fail. Take control before it takes you with it.
- Stream Processing
  Stream Processing, Near-Real Time Processing
- Bare Metal Performance
  Native languages, kernel bypass, tooling - make the most of your hardware
- Culture as a Differentiator
  The why and how for building successful engineering cultures
- //TODO: Security <-- fix this
  Building security from the start. Stories, lessons, and innovations advancing the field of software security.
- UX Reimagined
  Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.