Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Data Science in the Cloud @StitchFix
Duration:
Level:
- Intermediate
Persona:
- Data Scientist
Key Takeaways
- Learn approaches to leveraging the cloud to make data scientists more effective and efficient.
- Hear techniques and solutions a data-driven company uses to create competitive advantage.
- Understand how Stitch Fix's shift toward data scientists owning their own code gives them a lot of freedom without burning the house down.
Abstract
Stitch Fix is an online clothing retailer that not only focuses on delivering personalized clothing recommendations for our customers, but also applies the output of data science to automate numerous other business functions through the delivery of forecasts, predictions, and analyses via a robust API layer. We rely heavily on applied mathematics & statistics working in synergy with our human decision makers; doing this well requires us to merge art and science. However, with over eighty data scientists in residence, it can be challenging to support so many different needs from an infrastructure perspective.
In this talk we’ll cover how we use the cloud to enable over 80 data scientists to be productive.
Specifically we’ll cover our infrastructure for:
- Prototyping ideas, algorithms and analyses.
- How we set up and keep schemas in sync between Hive, Presto, Redshift & Spark, and make access easy for our data scientists.
- How we productionize recommendation algorithms, and our patterns for gracefully degrading and still serving fashion recommendations if something breaks down in our ETL.
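The graceful-degradation idea above can be sketched as a fallback to the most recent good ETL output. This is a minimal illustration, not Stitch Fix's actual implementation; the function and data names are hypothetical.

```python
# Hypothetical sketch: if today's ETL run failed, serve recommendations
# from the freshest prior run that is still acceptably recent.
import datetime


def load_scores(runs, today, max_age_days=3):
    """Return the freshest recommendation scores no older than max_age_days.

    `runs` maps a date to that day's ETL output (None if the run failed).
    """
    for age in range(max_age_days + 1):
        day = today - datetime.timedelta(days=age)
        scores = runs.get(day)
        if scores is not None:
            return scores, day
    raise RuntimeError("no usable ETL output within %d days" % max_age_days)


# If Monday's run broke, Sunday's scores still serve recommendations.
runs = {
    datetime.date(2017, 11, 13): None,              # today's ETL failed
    datetime.date(2017, 11, 12): {"client_1": 0.9},  # yesterday succeeded
}
scores, used = load_scores(runs, datetime.date(2017, 11, 13))
```

The staleness window bounds how degraded the results can get before the system fails loudly instead of serving stale data.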
Interview
Stefan: I am on the horizontally-focused data platform team at Stitch Fix. We help with building common infrastructure libraries and patterns for use by our data scientists. I am in charge of leading development of an algorithm development platform. My job is to make it easy and fast for our data scientists (or anyone else really) to prototype machine learning models and get them into production. For example, right now I am designing a nice API layer to make it easy for them to create new features, for those features to be versioned, and those features to be easily available to others.
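The feature API described above (create features, version them, share them) might look roughly like the following. All names here are assumptions for illustration, not Stitch Fix's real API.

```python
# Hypothetical sketch of a versioned feature store: each registration of a
# feature name creates a new version; consumers can pin a version or take
# the latest.
class FeatureStore:
    def __init__(self):
        self._features = {}  # name -> list of feature definitions

    def register(self, name, fn):
        """Register a new version of a feature; returns its version number."""
        versions = self._features.setdefault(name, [])
        versions.append(fn)
        return len(versions)  # versions are 1-indexed

    def get(self, name, version=None):
        """Fetch a specific version, or the latest if none is given."""
        versions = self._features[name]
        idx = (version or len(versions)) - 1
        return versions[idx]


store = FeatureStore()
store.register("avg_order_value", lambda client: 42.0)
v2 = store.register("avg_order_value", lambda client: 43.5)
latest = store.get("avg_order_value")
```

Pinning a version lets a production model keep using the feature definition it was trained against, even after colleagues publish newer versions.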
Stefan: Stitch Fix is a personal styling service for both men and women, where we send clothes and accessories tailored to your tastes and budget. Clients simply fill out an online style profile and Stitch Fix personal stylists handpick a selection of five clothing items and accessories. Clients keep what they love and easily return the rest.
The interesting part is that Stitch Fix is heavily reliant on data to deliver a great customer experience. For instance, we have an algorithm that personalizes and recommends what clothing should be sent to a client. Then a stylist mediates that process, picking out the items that make up what is called a "fix," which then gets sent out.
We have a large number of customers and a large volume of data, so we have a lot of feedback coming in, and that helps run the business. This data is used for everything from predicting inventory to figuring out what styles people like.
Stefan: I think anyone who tries to support a data science team will be interested in this talk. It will appeal to more technical data scientists who would like to know what other companies are doing, as well as to data architects who want ideas about good patterns and about the challenges that might lie further down the road.
Stefan: The talk is about how we utilize cloud infrastructure to empower our data scientists. I think we are kind of unique (at least for a start-up). We have close to 80 data scientists at last count, and subgroups of them work for different parts of the business, so our data scientists have their own data and resource needs. It's our philosophy at Stitch Fix to empower them as much as possible without requiring them to hand anything over to anyone else. The only way we can actually deliver that is by using the cloud. This talk discusses our approach.
Stefan: The main motivation is that we want data scientists to own as much code as possible. We had a pretty popular post on the Stitch Fix blog called "Engineers Shouldn't Write ETL." It explained that we don't have dedicated engineers who handle ETL; instead, our philosophy is that we want our data scientists to own as much as possible, which includes each of them owning their ETL. We could only do this at Stitch Fix by using the cloud; by building infrastructure and offering tooling, we lower the engineering bar and enable data scientists to own as much code as possible.
Stefan: Well, provisioning of hardware is difficult. Data scientists' workloads are generally kind of bursty; they do a lot of ad hoc work. So having something elastic keeps your costs down, because you only utilize resources when they are needed. If a data scientist wants to run four different versions of a model, rather than making them wait on fixed resources you can scale up dynamically and run everything in parallel. So they get their job done faster.
Stefan: You could say part of it is definitely economies of scale. It is also about tooling to lower the engineering bar for data scientists to get work done. You want things to be reproducible and auto scalable. The Cloud forces you into better habits for designing workflows that can specifically cater to different needs and resources. Data scientists are known for being better at engineering than statisticians, but they are not necessarily the best engineers. So I will talk about how we lower the bar for them to get their work done with the infrastructure we built.
Stefan: Strategies and design patterns you can take back to your organization for enabling data scientists. Things like: how to provide dependable, reproducible, elastic resources; some ideas on writing libraries and APIs for them to use that can also influence their behavior; and an understanding of what data science at scale looks like at a company.
Stefan: Sure. Everyone gets a MacBook to work on, but that has a limited amount of memory, which is a bottleneck for some data scientists. If you give them a shared EC2 instance, there is contention for resources. So how do you give them a push-button way to set up a Jupyter notebook with anywhere from 4 GB to 32 GB of memory (even as much as 64 GB), depending on the resources they need? I will talk through how we enable data scientists to basically just push a button and get a kind of containerized resource that allows them to work between sessions and save things whenever they want. The solution essentially gives them infinite compute resources when they need it.
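The push-button notebook idea can be sketched as translating a requested memory size into a container launch command. The image name, flag set, and allowed sizes below are assumptions for illustration, not Stitch Fix's actual tooling.

```python
# Hypothetical sketch of a "push button" notebook launcher: validate the
# requested memory tier and build a docker run command with a hard memory
# limit, so sessions are isolated from one another.
ALLOWED_MEMORY_GB = (4, 8, 16, 32, 64)


def notebook_command(memory_gb, image="jupyter/base-notebook"):
    if memory_gb not in ALLOWED_MEMORY_GB:
        raise ValueError("choose one of %s GB" % (ALLOWED_MEMORY_GB,))
    return [
        "docker", "run", "-d",
        "--memory=%dg" % memory_gb,  # hard per-session memory limit
        "-p", "8888:8888",           # expose the notebook port
        image,
    ]


cmd = notebook_command(32)
```

Because each session is its own container with an explicit memory cap, one data scientist's 64 GB job cannot starve a colleague sharing the same host.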
Tracks
Monday Nov 7
- Architectures You've Always Wondered About: You know the names. Now learn lessons from their architectures.
- Distributed Systems War Stories: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." - Lamport
- Containers Everywhere: State of the art in container deployment, management, and scheduling.
- Art of Relevancy and Recommendations: Lessons on the adoption of practical, real-world machine learning practices. AI & deep learning explored.
- Next Generation Web Standards, Frameworks, and Techniques: JavaScript, HTML5, WASM, and more... innovations targeting the browser.
- Optimize You: Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
- Next Generation Microservices: What will microservices look like in 3 years? What if we could start over?
- Java: Are You Ready for This? Real-world lessons & prepping for JDK 9. Reactive code in Java today, performance/optimization, where Unsafe is heading, & the JVM compiler interface.
- Big Data Meets the Cloud: Overviews and lessons learned from companies that have implemented their big data use cases in the cloud.
- Evolving DevOps: Lessons and stories on optimizing the deployment pipeline.
- Software Engineering Softskills: Great engineers do more than code. Learn their secrets and level up.
- Modern CS in the Real World: An applied, practical, real-world dive into industry adoption of modern CS ideas.
Wednesday Nov 9
- Architecting for Failure: Your system will fail. Take control before it takes you with it.
- Stream Processing: Stream processing and near-real-time processing.
- Bare Metal Performance: Native languages, kernel bypass, tooling. Make the most of your hardware.
- Culture as a Differentiator: The why and how of building successful engineering cultures.
- //TODO: Security <-- fix this: Building security from the start. Stories, lessons, and innovations advancing the field of software security.
- UX Reimagined: Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.