Presentation: Better Together - Using Spark and Redshift to combine your data with public datasets
At Jawbone, the Data Science team correlated step and workout data for hundreds of thousands of UP wearers with publicly available external datasets in order to understand how various factors affect physical activity.
In this talk we will highlight the challenges of combining internal and external datasets: knowing how the data was generated and its limitations, understanding the domain logic and, most importantly, addressing data errors and outliers.
We will also compare two implementations, the first using Hadoop and Amazon Redshift and the second using Spark, and show how the choice of technology drives the way we model the problem.
The talk is intended for software engineers interested in practical applications of data science.
Tracks
Covering innovative topics
Monday, 3 November
- Architectures You've Always Wondered about
  The newest and biggest Internet architectures
- Real World Functional
  Putting functional programming concepts to work in the real world.
- The Future of Mobile
  The future of mobile and performance improvements
- Continuous Delivery: From Heroics to Becoming Invisible
  Continuous Delivery philosophies, cultures, hiccups, and best practices.
- Unleashing the Power of Streaming Data
  This track explores a variety of use-cases, platforms, and techniques for processing and analyzing stream data from the companies deploying them at scale!
- Sponsored Solutions Track I
Tuesday, 4 November
- Engineering for Product Success
  Architectures that make products more successful
- Reactive Service Architecture
  Reactive, Responsive, Fault Tolerant and More.
- Modern CS In the Real World
  How modern CS tackles problems in the real world.
- Applied Machine Learning and Data Science
  Understand your big big data!
- Deploying at Scale
  Containerizing Applications, Discovering Services, and Deploying to the Grid.
- Sponsored Solutions Track II
Wednesday, 5 November
- Beyond Hadoop
  Emerging Big Data Frameworks and Technology
- Scalable Microservice Architectures
  This track addresses the ways companies with hundreds of fine-grained web-services (e.g. Netflix, LinkedIn) manage complexity!
- Java at the Cutting Edge
  The latest and greatest in the Java ecosystem
- Engineering culture
  Successes and failures in creating an engineering culture.
- Next gen HTML5 and JS
  How Web Components, the Future of CSS, and more are changing the web.
- Sponsored Solutions Track III



