Presentation: Better Together - Using Spark and Redshift to combine your data with public datasets

At Jawbone, the Data Science team correlated step and workout data from hundreds of thousands of UP wearers with publicly available external datasets to understand how various factors affect physical activity.

In this talk we will highlight the challenges of combining internal and external datasets: knowing how the data was generated and its limitations, understanding the domain logic and, most importantly, addressing data errors and outliers.

We will also compare two implementations, the first using Hadoop and Amazon Redshift and the second using Spark, and show how the choice of technology drives the way we model the problem.

The talk is intended for software engineers interested in practical applications of data science.

Monday, 3 November

Tuesday, 4 November

Wednesday, 5 November

Conference for Professional Software Developers