Track:

Duration:

1:40pm - 2:30pm

Persona:

Architect
Data Scientist
Developer

Key Takeaways

Understand the importance of focusing on search queries to determine user intent.
Gain deeper insight into query understanding techniques such as search suggestions, query performance prediction, and query rewriting.
Walk away with a list of quick wins that improve both your understanding of search and its performance.

Abstract

Query understanding is about focusing less on the search results and more on the query itself. It's about figuring out what the searcher wants, rather than scoring and ranking results. Once you have established a query understanding mindset, your overall approach to search changes: you focus on query performance rather than ranking. In particular, you pay more attention to query suggestions, especially those generated through autocomplete.

In this talk, I'll show you what search looks like when viewed through a query understanding mindset. I'll focus on query performance prediction, query rewriting, and search suggestions. If you work on search problems, then come to this talk to discover opportunities for quick wins and longer-term investments in your search stack. Even if you don't work on search problems, seize this opportunity to gain a perspective on search that you won't find in an information retrieval textbook.

Interview

Question:

QCon: What is the main focus of your work today?

Answer:

Daniel: Since leaving LinkedIn in mid-2015 (after 4.5 years there), I’ve been advising and consulting for a variety of companies. These companies range from early-stage startups to established public companies. My specialty is search and discovery (query understanding in particular), but I generally help them make decisions around algorithms, technology, product strategy, hiring, and organizational structure.

Question:

QCon: Can you explain your talk title to me?

Answer:

Daniel: Ever since my pioneering work on faceted search at Endeca, I’ve been an evangelist for a more query-centric approach to search. While I see value in developing better ranking algorithms to improve search relevance, I feel that as an industry we’ve overemphasized result ranking and neglected what we can do with the queries themselves. While Endeca focused on query refinement, my later work has emphasized the entire query lifecycle.

My argument is simple: instead of treating search relevance as a result-ranking problem, let’s focus on the query understanding problem. This view is somewhat contrarian in the information retrieval space. My talk is not just an exposition of query understanding techniques; it is also a manifesto to persuade people to change their philosophical approach to search relevance.

Question:

QCon: When you are trying to analyze the query, is there a difference between consumer-facing search and typical enterprise (internal) search?

Answer:

Daniel: I think that the query workload is very much a function of the application and perhaps more broadly of the domain.

We often classify web search queries into navigational, informational, and transactional queries. E-commerce sites distinguish between specific product searches and category searches. Sites like LinkedIn or Facebook have name searches and searches based on people's characteristics.

So the domain certainly matters, but there are common themes. Let’s look at an example: "daniel carnegie mellon", a query we might see on a site like LinkedIn or Facebook.

The first step is to tokenize the query and segment it into entities, i.e., "daniel" and "carnegie mellon". We then want to associate those entities with classes, i.e., First Name: Daniel, School: Carnegie Mellon. This leads you to another level of understanding where you can infer that the searcher is looking for a person whose first name is Daniel and who attended Carnegie Mellon.
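The tokenize, segment, and classify steps described here can be illustrated with a minimal sketch. It is not the approach used at LinkedIn or anywhere else mentioned in the interview. It assumes a small hand-built dictionary (the hypothetical ENTITY_CLASSES) and a greedy longest-match segmenter (the hypothetical segment_and_tag); a production system would more likely rely on statistical or learned models.

```python
from typing import Dict, List, Tuple

# Hypothetical entity dictionary mapping known phrases to classes.
# "mellon" is included as a last name to show class overlap with the school.
ENTITY_CLASSES: Dict[str, str] = {
    "daniel": "First Name",
    "carnegie mellon": "School",
    "mellon": "Last Name",
}


def tokenize(query: str) -> List[str]:
    """Lowercase the query and split it on whitespace."""
    return query.lower().split()


def segment_and_tag(query: str, max_len: int = 3) -> List[Tuple[str, str]]:
    """Greedily match the longest known phrase at each position.

    Tokens that match nothing in the dictionary are tagged "Unknown".
    """
    tokens = tokenize(query)
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTITY_CLASSES:
                tagged.append((phrase, ENTITY_CLASSES[phrase]))
                i += n
                break
        else:
            # No phrase starting at this token is known.
            tagged.append((tokens[i], "Unknown"))
            i += 1
    return tagged


print(segment_and_tag("daniel carnegie mellon"))
# [('daniel', 'First Name'), ('carnegie mellon', 'School')]
```

Because the longest match wins, "carnegie mellon" is tagged as a school rather than leaving "mellon" to be read as a last name. That is one simple way to resolve overlap between classes, and it will not always pick the interpretation the searcher intended.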

The interpretation stack tends to have a lot of commonality across domains. But the particular query understanding challenges can be highly domain-dependent.

For example, how do you identify entities and their associated classes? Are your classes clearly distinct from one another, or do you have to worry about class overlap? Even when you correctly identify the entities in a query, could there still be ambiguity in the overall query intent?

How hard it is to address each of these questions is domain-dependent.

Question:

QCon: What do you think are the implications of a conversational interface like Alexa or Siri? Are there different complexities to understanding the query?

Answer:

Daniel: Absolutely. If you think about where search was going before voice and conversational interfaces, it was heading in a direction where, instead of seeing full queries, we were seeing instant search suggestions and even instant search results. As Google said in its "10 things we know to be true", fast is better than slow. In fact, search suggestions do more than save time and reduce effort; they also guide the searcher to better queries.

But how do we apply the instant search suggestions in the context of natural language and voice? Do we give up on them and instead require the users to enter complete sentences before giving them feedback? To me, that feels like a huge step back, all in the name of providing a more natural interface.

At the same time, we know that machines interrupting searchers to complete their sentences is probably not going to work. It's like a Clippy 2.0.

So I worry that we’re making a big sacrifice just because we believe people prefer a voice interface.

I’m also curious how our interactions with a conversational search engine will compare to our interactions with one another. People pay attention to tone of voice. We perceive nuances in everything from how someone speaks to the accompanying body language (or even the look on their face). We don't have that today with our voice-based applications, and it seems like we are not getting there with video yet, despite (in principle) the ability to do that. There's been some research on analyzing people’s faces to predict searcher frustration, but no practical application of this or related research as far as I know.

In general, I feel that we are in an uncanny valley as far as the way machines interact with us. It’s hard to know at what point we will overcome it. To pick a different domain, Pixar's movies have made animation truly on par with live action. We are not there yet with machines trying to interact with us like people.

Question:

QCon: How would you rate the level of this talk?

Answer:

Daniel: Intermediate. It will be most useful to people -- particularly engineers, data scientists, and product managers -- who know something about search. But I’ll try to make the talk sufficiently self-contained to be useful to any technical generalist with an interest in search.

Question:

QCon: QCon targets advanced architects and senior development leads. What do you feel will be actionable for that type of persona in your talk?

Answer:

Daniel: I believe that anyone who is responsible for a search engine or search-based application will walk away with a list of quick wins to improve relevance through query rewriting, a better scoring function for search suggestions, etc. And I hope that I’ll have influenced their longer-term strategy for improving search relevance, thus enabling them to better prioritize their roadmap.

Speaker: Daniel Tunkelang

Data Scientist, Author of "Faceted Search"

Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He currently advises and consults for various companies on search and discovery. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He then led a local search team at Google. After that, he was a director of data science and engineering at LinkedIn, where he established the query understanding team. Daniel is a recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He wrote the definitive textbook on faceted search (now a standard for ecommerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents.

Find Daniel Tunkelang at

Speaker page

@dtunkelang

Similar Talks

IBM Distinguished Engineer
Mark Vanderwiele

Understanding Hardware Transactional Memory
CTO @AzulSystems
Gil Tene

Stranger Things: The Forces that Disrupt Netflix
Senior Software Engineer, Playback Features @Netflix
Haley Tucker

99.99% Availability via Smart Real-Time Alerting
Data Science Manager @Uber
Franziska Bell

Creating A Culture of Observability at Stripe
Observability Specialist @Stripe
Cory Watson

Migrating to a Fault Tolerant System with Spanner
Software Engineer @Google
Edwin Fuquen

Freeing the Whale: How to Fail at Scale
CTO @Buoyant
Oliver Gould

Automating Chaos Experiments In Production
Senior Software Engineer @Netflix
Ali Basiri

Architecting for Failure in a Containerized World
Principal Data Analysis Leader @Infolace
Tom Faulhaber

Tracks

Monday Nov 7

Architectures You've Always Wondered About

You know the names. Now learn lessons from their architectures

Distributed Systems War Stories

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.

Containers Everywhere

State of the art in Container deployment, management, scheduling

Art of Relevancy and Recommendations

Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.

Next Generation Web Standards, Frameworks, and Techniques

JavaScript, HTML5, WASM, and more... innovations targeting the browser

Optimize You

Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.

Tuesday Nov 8

Next Generation Microservices

What will microservices look like in 3 years? What if we could start over?

Java: Are You Ready for This?

Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.

Big Data Meets the Cloud

Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud

Evolving DevOps

Lessons/stories on optimizing the deployment pipeline

Software Engineering Softskills

Great engineers do more than code. Learn their secrets and level up.

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Wednesday Nov 9

Architecting for Failure

Your system will fail. Take control before it takes you with it.

Stream Processing

Stream Processing, Near-Real Time Processing

Bare Metal Performance

Native languages, kernel bypass, tooling - make the most of your hardware

Culture as a Differentiator

The why and how for building successful engineering cultures

//TODO: Security <-- fix this

Building security from the start. Stories, lessons, and innovations advancing the field of software security.

UX Reimagined

Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.


Conference for Professional Software Developers


Presentation: Query Understanding: a Manifesto
