Presentation: Performance and Search

Duration

Duration: 
2:55pm - 3:45pm

Persona:

Abstract

We’re going to use search as a case study on how to reason about performance before coding. Imagine you're building a search engine from scratch. There are maybe a trillion documents on the Internet. If each each document is 10kB, that's 10 petabytes of data! How do you query 10PB of data in a fraction of a second?

That might take years to implement, but it’s surprisingly easy to reason about the performance of various possible designs. We’ll talk about how to estimate performance using back of the envelope calculations that can be done in minutes or hours, even for applications that take months or years to implement.

In particular, we’ll walk through a core search algorithm used in Bing, discuss how we initially estimated the performance and decided the system was worth building, and how the actual performance compared to the estimated performance.

We’ll also talk about why the class of algorithm we’re discussing has been considered obsolete as a core search engine technology for almost two decades, despite having reasonable performance characteristics in theory and practice.

Speaker: Dan Luu

BitFunnel @Microsoft & previously TPU @Google

Similar Talks

.

Tracks

Monday Nov 7

Tuesday Nov 8

Wednesday Nov 9

Conference for Professional Software Developers