Jeremy Pollack

Scaling AncestryDNA using Hadoop and HBase
Track: Architectures you've always wondered about

Location:
Grand Ballroom A

Abstract:
What do you get when you take Bioinformatics Scientists with PhDs and mix them up with Software Engineers? Why, AncestryDNA on Hadoop and HBase! Get the whole story from both the management (Bill Yetman, Sr. Director) and developer (Jeremy Pollack, Principal Engineer/Team Lead) points of view. Find out how this unique cast of characters took academic programs and turned them into an industrial-strength, scalable DNA processing pipeline (a real Big Data problem) using Hadoop and HBase. The final implementation provided a 1700% performance improvement.

Don’t know how DNA matching works? No worries. We’ll provide a simple example so you can follow along. The numbers: a full autosomal test, 700,000 SNPs used for ethnicity and matching, a DNA pool of 120,000 samples, and over 6 million 4th-cousin matches already delivered to our users. Learn how Agile techniques (start simple, get going, iterate), the “measure everything” principle, and a mix of scientists and engineers came together to create a breakthrough architecture, and a unique family history product along the way.