Presentation: Query Understanding: a Manifesto


1:40pm - 2:30pm


Key Takeaways

  • Understand the importance of focusing on search queries to determine user intent.
  • Gain deeper insights into search techniques such as search suggestions, query performance prediction, and query rewriting.
  • Walk away with a list of quick wins that will improve both query understanding and search performance.


Query understanding is about focusing less on the search results and more on the query itself. It's about figuring out what the searcher wants, rather than scoring and ranking results. Once you have established a query understanding mindset, your overall approach to search changes: you focus on query performance rather than ranking. In particular, you pay more attention to query suggestions, especially those generated through autocomplete.

In this talk, I'll show you what search looks like when viewed through a query understanding mindset. I'll focus on query performance prediction, query rewriting, and search suggestions. If you work on search problems, then come to this talk to discover opportunities for quick wins and longer-term investments in your search stack. Even if you don't work on search problems, seize this opportunity to gain a perspective on search that you won't find in an information retrieval textbook.
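Query performance prediction tries to estimate how well a query will perform before, or without, judging its results. As a minimal illustration, the sketch below implements one common pre-retrieval predictor: the average inverse document frequency (avgIDF) of the query terms over a toy corpus. The corpus and the function names are hypothetical, and the talk may well rely on different predictors.

```python
import math
from collections import Counter

def document_frequencies(docs: list[str]) -> Counter:
    """Count how many documents contain each term (lowercased, whitespace-tokenized)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df

def avg_idf(query: str, df: Counter, num_docs: int) -> float:
    """Pre-retrieval predictor: mean IDF of the query's terms.

    Rare (more specific) terms push the score up. Adding 1 to the numerator and
    denominator keeps unseen terms (df = 0) from causing a division by zero.
    """
    terms = query.lower().split()
    if not terms:
        return 0.0
    idfs = [math.log((num_docs + 1) / (df[term] + 1)) for term in terms]
    return sum(idfs) / len(idfs)

# Toy corpus (hypothetical).
docs = ["red running shoes", "blue running shoes", "red dress", "leather boots"]
df = document_frequencies(docs)
print(round(avg_idf("shoes", df, len(docs)), 3))          # 0.511
print(round(avg_idf("leather boots", df, len(docs)), 3))  # 0.916
```

Under these assumptions the broad query "shoes" scores lower than the more specific "leather boots". A single corpus statistic like this is a crude signal: it says nothing about whether the results will actually satisfy the searcher.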


QCon: What is the main focus of your work today?

Daniel: Since leaving LinkedIn in mid-2015 (after 4.5 years there), I’ve been advising and consulting for a variety of companies. These companies range from early-stage startups to established public companies. My specialty is search and discovery (query understanding in particular), but I generally help them make decisions around algorithms, technology, product strategy, hiring, and organizational structure.

QCon: Can you explain your talk title to me?

Daniel: Ever since my pioneering work on faceted search at Endeca, I’ve been an evangelist for a more query-centric approach to search. While I see value in developing better ranking algorithms to improve search relevance, I feel that as an industry we’ve overemphasized result ranking and neglected what we can do with the queries themselves. While Endeca focused on query refinement, my later work has emphasized the entire query lifecycle.

My argument is simple: instead of treating search relevance as a result-ranking problem, let’s focus on the query understanding problem. This view is somewhat contrarian in the information retrieval space. My talk is not just an exposition of query understanding techniques; it is also a manifesto to persuade people to change their philosophical approach to search relevance.

QCon: When analyzing queries, is there a difference between consumer-facing search and typical enterprise (internal) search?

Daniel: I think that the query workload is very much a function of the application and perhaps more broadly of the domain.

We often classify web search queries into navigational, informational, and transactional queries. E-commerce sites distinguish between specific product searches and category searches. Sites like LinkedIn or Facebook have name searches and searches based on people's characteristics.

So the domain certainly matters, but there are common themes. Let’s look at an example query, daniel carnegie mellon, that we might see on a site like LinkedIn or Facebook.

The first step is to tokenize the query and segment it into entities, i.e., daniel and carnegie mellon. We then want to associate those entities with classes, i.e., First Name: Daniel and School: Carnegie Mellon. This leads to another level of understanding: we can infer that the searcher is looking for a person whose first name is Daniel and who attended Carnegie Mellon.
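The tokenize, segment, and classify steps can be sketched as follows, assuming a tiny hand-built lexicon. LEXICON, segment_and_classify, and to_structured_query are hypothetical names, not taken from the talk, and greedy longest-match against a dictionary is only one simple way to segment: it cannot handle unseen names or overlapping classes.

```python
# Hypothetical lexicon mapping known phrases to entity classes.
LEXICON = {
    "daniel": "First Name",
    "carnegie mellon": "School",
    "linkedin": "Company",
}
MAX_PHRASE_LEN = 3  # longest lexicon phrase, in tokens, that we try to match

def tokenize(query: str) -> list[str]:
    """Lowercase and split on whitespace."""
    return query.lower().split()

def segment_and_classify(query: str) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation; unmatched tokens get class "Unknown"."""
    tokens = tokenize(query)
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, shrinking until one matches.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                segments.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:  # no phrase starting at position i is in the lexicon
            segments.append((tokens[i], "Unknown"))
            i += 1
    return segments

def to_structured_query(segments: list[tuple[str, str]]) -> dict[str, str]:
    """Turn classified segments into field constraints (last one wins per class)."""
    return {cls: text for text, cls in segments if cls != "Unknown"}

segments = segment_and_classify("daniel carnegie mellon")
print(segments)                       # [('daniel', 'First Name'), ('carnegie mellon', 'School')]
print(to_structured_query(segments))  # {'First Name': 'daniel', 'School': 'carnegie mellon'}
```

The final dictionary corresponds to the inferred intent, a person named Daniel who attended Carnegie Mellon; a search backend could then apply it as structured filters instead of matching the raw keywords.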

The interpretation stack tends to have a lot of commonality across domains. But the particular query understanding challenges can be highly domain-dependent.

For example, how do you identify entities and their associated classes? Are your classes clearly distinct from one another, or do you have to worry about class overlap? Even when you correctly identify the entities in a query, could there still be ambiguity in the overall query intent?

How hard it is to address each of these questions is domain-dependent.
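Class overlap and intent ambiguity can be made concrete by enumerating every way to label a query and scoring each labeling. The sketch below assumes a hypothetical lexicon in which "jordan" can be a first name, a last name, or a country, with made-up prior probabilities. Treating each token as its own segment and multiplying priors are simplifying assumptions for illustration, not a recommended model.

```python
from itertools import product

# Hypothetical lexicon: a phrase can belong to several overlapping classes,
# each with a made-up prior probability.
LEXICON = {
    "jordan": {"First Name": 0.5, "Last Name": 0.3, "Country": 0.2},
    "engineer": {"Job Title": 1.0},
}

def interpretations(tokens: list[str]) -> list[tuple[float, list[tuple[str, str]]]]:
    """Score every class assignment for the tokens, best first.

    Each token is treated as its own segment; unknown tokens get class "Unknown".
    An interpretation's score is the product of its per-token class priors.
    """
    options = [
        [(token, cls, prior) for cls, prior in LEXICON.get(token, {"Unknown": 1.0}).items()]
        for token in tokens
    ]
    ranked = []
    for combo in product(*options):
        score = 1.0
        for _, _, prior in combo:
            score *= prior
        ranked.append((score, [(token, cls) for token, cls, _ in combo]))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked

def is_ambiguous(ranked: list, margin: float = 0.25) -> bool:
    """Flag a query whose top two interpretations score within `margin` of each other."""
    return len(ranked) > 1 and ranked[0][0] - ranked[1][0] < margin

for score, interp in interpretations(["jordan", "engineer"]):
    print(f"{score:.2f}", interp)
print(is_ambiguous(interpretations(["jordan", "engineer"])))  # True
```

Here the top two interpretations differ by less than the arbitrary margin of 0.25, so the query is flagged as ambiguous even though every entity was identified; an application might respond by diversifying results or asking the searcher to clarify.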

QCon: What do you think are the implications of conversational interfaces like Alexa or Siri? Do they introduce different complexities in understanding the query?

Daniel: Absolutely. If you think about where search was going before voice and conversational interfaces, it was heading in a direction where, instead of seeing full queries, we were seeing instant search suggestions and even instant search results. As Google said in its "10 things we know to be true", fast is better than slow. In fact, search suggestions do more than save time and reduce effort; they also guide the searcher to better queries. 
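Instant search suggestions are typically driven by the partial query the searcher has typed so far. A minimal sketch, assuming a hypothetical query log with per-query counts: it returns completions of the typed prefix, ranked by popularity. Popularity is only a baseline scoring function, and the Autocomplete class and its suggest method are illustrative names, not a particular library's API.

```python
import bisect

class Autocomplete:
    """Prefix-based search suggestions ranked by how often each query was issued.

    A sorted list plus binary search stands in for a trie or similar index.
    """

    def __init__(self, query_counts: dict[str, int]):
        # Merge counts for queries that differ only in case.
        self.counts: dict[str, int] = {}
        for query, count in query_counts.items():
            key = query.lower()
            self.counts[key] = self.counts.get(key, 0) + count
        self.queries = sorted(self.counts)

    def suggest(self, prefix: str, k: int = 3) -> list[str]:
        prefix = prefix.lower()
        # Queries sharing a prefix form a contiguous run in sorted order,
        # starting at the first query >= prefix.
        start = bisect.bisect_left(self.queries, prefix)
        matches = []
        for query in self.queries[start:]:
            if not query.startswith(prefix):
                break
            matches.append(query)
        # Most frequent completions first; ties broken alphabetically.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:k]

# Hypothetical query-log counts.
ac = Autocomplete({
    "query understanding": 120,
    "query expansion": 95,
    "query rewriting": 80,
    "quantum computing": 40,
})
print(ac.suggest("query", k=2))  # ['query understanding', 'query expansion']
```

Because the suggestions come from queries other searchers actually issued, picking one can steer the searcher toward a query the system already handles, which is part of the guiding effect described above.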

But how do we apply the instant search suggestions in the context of natural language and voice? Do we give up on them and instead require the users to enter complete sentences before giving them feedback? To me, that feels like a huge step back, all in the name of providing a more natural interface. 

At the same time, we know that having machines interrupt searchers to complete their sentences probably won't work. It would be Clippy 2.0.

So I worry that we’re making a big sacrifice just because we believe people prefer a voice interface. 

I’m also curious how our interactions with a conversational search engine will compare to our interactions with one another. People pay attention to tone of voice. We perceive nuances in everything from how someone speaks to their body language, or even the look on their face. We don't have that today with our voice-based applications, and we don't seem to be getting there with video yet, even though it is possible in principle. There has been some research on analyzing people’s faces to predict searcher frustration, but as far as I know, there has been no practical application of this or related research.

In general, I feel that we are in an uncanny valley as far as the way machines interact with us, and it’s hard to know at what point we will overcome it. To draw an analogy from a different domain, Pixar's movies made animation truly on par with live action; we are not yet there with machines trying to interact with us like people.

QCon: How would you rate the level of this talk?

Daniel: Intermediate. It will be most useful to people (particularly engineers, data scientists, and product managers) who know something about search. But I’ll try to make the talk sufficiently self-contained to be useful to any technical generalist with an interest in search.

QCon: QCon targets advanced architects and senior development leads. What do you feel will be actionable for that type of persona in your talk?

Daniel: I believe that anyone who is responsible for a search engine or search-based application will walk away with a list of quick wins to improve relevance through query rewriting, a better scoring function for search suggestions, etc. And I hope that I’ll have influenced their longer-term strategy for improving search relevance, thus enabling them to better prioritize their roadmap.
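As one illustration of query rewriting, here is a minimal sketch assuming hand-built spelling and synonym tables. SPELLING, SYNONYMS, and rewrite are hypothetical names, this is not a description of the specific rewrites covered in the talk, and the boolean output syntax is only for display, since a real engine would have its own query language.

```python
# Hypothetical rewrite tables; in practice these might be mined from query
# logs (e.g., reformulations within a session) rather than written by hand.
SPELLING = {"sneekers": "sneakers", "adiddas": "adidas"}
SYNONYMS = {"sneakers": ["trainers", "running shoes"]}

def rewrite(query: str) -> str:
    """Correct known misspellings, then OR-expand terms that have synonyms."""
    clauses = []
    for token in query.lower().split():
        token = SPELLING.get(token, token)  # spelling correction
        alternatives = [token] + SYNONYMS.get(token, [])
        if len(alternatives) == 1:
            clauses.append(token)
        else:
            # Quote multi-word synonyms so they are matched as phrases.
            quoted = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
            clauses.append("(" + " OR ".join(quoted) + ")")
    return " ".join(clauses)

print(rewrite("adiddas sneekers"))
# adidas (sneakers OR trainers OR "running shoes")
```

Applying the rewrite before retrieval, rather than re-ranking afterward, is what makes this a query-side fix: the engine never sees the misspelled or overly narrow form of the query.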

Speaker: Daniel Tunkelang

Data Scientist, Author of "Faceted Search"

Daniel Tunkelang is a data science and engineering executive who has built and led some of the strongest teams in the software industry. He currently advises and consults for various companies on search and discovery. He studied computer science and math at MIT and has a PhD in computer science from CMU. He was a founding employee and chief scientist of Endeca, a search pioneer that Oracle acquired for $1.1B. He then led a local search team at Google. After that, he was a director of data science and engineering at LinkedIn, where he established the query understanding team. Daniel is a recognized writer and speaker. He is frequently invited to speak at academic and industry conferences, particularly in the areas of information retrieval, web science, and data science. He wrote the definitive textbook on faceted search (a technique that is now standard on e-commerce sites), established an annual symposium on human-computer interaction and information retrieval, and authored 24 US patents.




Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9

Conference for Professional Software Developers