Presentation: Data Science in the Cloud @StitchFix


4:10pm - 5:00pm



Key Takeaways

  • Learn approaches to leveraging cloud in order to make Data Scientists more effective and efficient.
  • Hear techniques and solutions used by a Data Driven company to create advantage.
  • Understand how a shift towards Data Scientists owning their own code gives them a lot of freedom without burning the house down at Stitch Fix.


Stitch Fix is an online clothing retailer that not only delivers personalized clothing recommendations to our customers, but also applies the output of data science to automate numerous other business functions by delivering forecasts, predictions, and analyses through a robust API layer. We rely heavily on applied mathematics & statistics working in concert with our human decision makers; doing this well requires us to merge art & science. However, with over eighty data scientists in residence, it can be challenging to support so many different needs from an infrastructure perspective.

In this talk we’ll cover how we use the cloud to enable over 80 data scientists to be productive.
Specifically we’ll cover our infrastructure for:

  • Prototyping ideas, algorithms and analyses.
  • Setting up schemas and keeping them in sync between Hive, Presto, Redshift & Spark, and making access easy for our data scientists.
  • Productionizing recommendation algorithms, including our patterns for gracefully degrading and still serving fashion recommendations if something breaks down in our ETL.
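
As an illustration of the graceful-degradation idea in the last bullet, here is a minimal Python sketch. It is not Stitch Fix's implementation: the function name `recommend_with_fallback`, the three-tier order (fresh, then cached, then a static default list), and all parameter names are assumptions made for the example.

```python
from typing import Callable, List, Optional


def recommend_with_fallback(
    client_id: str,
    fresh_source: Callable[[str], List[str]],
    cached_source: Callable[[str], Optional[List[str]]],
    default_items: List[str],
) -> List[str]:
    """Serve the freshest recommendations available, degrading step by step.

    All three sources are hypothetical stand-ins:
    - fresh_source: reads the output of the latest ETL/model run
    - cached_source: reads the last known good recommendations
    - default_items: a static list served when nothing else is available
    """
    try:
        items = fresh_source(client_id)
        if items:
            return items
    except Exception:
        # The upstream ETL or model output is unavailable; fall through
        # to the next tier instead of failing the request.
        pass

    cached = cached_source(client_id)
    if cached:
        return cached

    return default_items
```

The point of the sketch is only that a failure upstream lowers the quality of the answer rather than removing it; which tiers exist and how stale a cached result may be are design decisions the abstract does not specify.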


QCon: What is your role today?

Stefan: I am on the horizontally-focused data platform team at Stitch Fix. We help with building common infrastructure libraries and patterns for use by our data scientists. I am in charge of leading development of an algorithm development platform. My job is to make it easy and fast for our data scientists (or anyone else really) to prototype machine learning models and get them into production. For example, right now I am designing a nice API layer to make it easy for them to create new features, for those features to be versioned, and those features to be easily available to others.

QCon: For people who aren’t familiar with Stitch Fix, could you tell us about the problem sets you’re dealing with?

Stefan: Stitch Fix is a personal styling service for both men and women, where we send clothes and accessories tailored to your tastes and budget. Clients simply fill out an online style profile and Stitch Fix personal stylists handpick a selection of five clothing items and accessories. Clients keep what they love and easily return the rest.

The interesting part is that Stitch Fix is heavily reliant on data to deliver a great customer experience. For instance we have an algorithm that personalizes and recommends what clothing should be sent to a client. Then we have a stylist who mediates that process and picks out the things that contribute to what is called a "fix" and that gets sent out.

We have a large number of customers and a large volume of data. So we have a lot of feedback coming in and that helps run the business. This data is used for everything from predicting inventory to figuring out what styles people like.

QCon: How would you describe the persona of the target audience of this talk?

Stefan: I think anyone who tries to support a data science team will be interested in this talk. I think it will appeal to everyone from more technical data scientists who would like to know what other companies are doing, through to data architects who want more ideas about good patterns or about the things that might challenge them further down the road.

QCon: Can you explain your talk title to me?

Stefan: The talk is about how we utilize Cloud infrastructure to empower our data scientists. I think we are kind of unique (at least for a start-up). We have close to 80 data scientists at last count, and subgroups of them work for different parts of the business. So our data scientists have their own data and resource needs. It’s our philosophy at Stitch Fix to empower them as much as possible without their having to hand anything over to anyone else. The only way that we can actually deliver that is by using the Cloud. This talk discusses our approach.

QCon: What’s the motivation for your talk?

Stefan: The main motivation is we want data scientists to own as much code as possible. We had a pretty popular post on the Stitch Fix blog called "Engineers Shouldn’t Write ETL." It explained that we don’t have dedicated engineers who handle ETL, and instead our philosophy is that we want our data scientists to own as much as possible, which includes each of them owning their ETL. We could only do this at Stitch Fix by using the Cloud: by building infrastructure and offering tooling, we can lower the engineering bar and enable data scientists to own as much code as possible.

QCon: When it comes to data science, why is it different with the Cloud versus on prem? What are some issues you ran into?

Stefan: Well, provisioning of hardware is difficult. Data scientists' workloads are generally kind of bursty. They do a lot of ad hoc stuff. So having something that’s elastic helps you keep your costs down, because you only utilize resources when they are needed. If a data scientist wants to run four different versions of a model, then rather than having them wait just because you have fixed resources, you can scale up dynamically to have everything run in parallel. So they get their job done faster.
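
To make the "four versions in parallel" point concrete, here is a small local Python sketch. It uses a thread pool as a stand-in for elastic cloud capacity, which is an assumption of the example; `run_model` and `run_all` are hypothetical names, and the real work would be a training job on separately provisioned machines rather than a string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence


def run_model(version: int) -> str:
    """Hypothetical stand-in for training or scoring one model variant."""
    return f"model-v{version}-done"


def run_all(versions: Sequence[int], max_workers: Optional[int] = None) -> List[str]:
    """Run every variant at once instead of queueing them on fixed capacity."""
    # One worker per variant by default, so no run waits on another.
    workers = max_workers or len(versions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(run_model, versions))


if __name__ == "__main__":
    print(run_all([1, 2, 3, 4]))
```

With a fixed pool (say `max_workers=1`), the same four runs would execute one after another, which is the waiting that elastic resources are meant to avoid.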

QCon: So is it mostly in the economies of scale with the cloud, or is it going to appeal to a startup? Are there other things that you are going to dive into about running on the Cloud?

Stefan: You could say part of it is definitely economies of scale. It is also about tooling to lower the engineering bar for data scientists to get work done. You want things to be reproducible and auto scalable. The Cloud forces you into better habits for designing workflows that can specifically cater to different needs and resources. Data scientists are known for being better at engineering than statisticians, but they are not necessarily the best engineers. So I will talk about how we lower the bar for them to get their work done with the infrastructure we built.

QCon: What are your key takeaways for this talk?

Stefan: Strategies or design patterns that you can take back to your organization for enabling data scientists. Things like: how you can give dependable and reproducible elastic resources, some ideas on writing libraries and APIs for them to use that also potentially influence their behavior, and then understanding what data scientists at scale look like at a company.

QCon: Can you give us an idea of one pattern that you are planning to talk about?

Stefan: Sure. Everyone gets a MacBook to work on, but that has a limited amount of memory, which is a bottleneck for some data scientists. If you give them a shared EC2 instance, there is contention for resources. So how do you give them a push-button way to set up a Jupyter notebook with anywhere from 4 GB to 32 GB of memory (even as much as 64 GB), depending on the resources they need? I will talk through how we enable data scientists to basically just push a button and get a kind of containerized resource that allows them to work between sessions and save things whenever they want. The solution essentially gives them infinite compute resources when they need them.
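
One small piece of such a push-button flow, mapping a requested amount of memory to a notebook size, might look like the Python sketch below. This is an assumption-laden illustration, not the Stitch Fix tooling: the interview mentions only 4 GB, 32 GB, and 64 GB, so the intermediate tiers and the names `MEMORY_TIERS_GB` and `pick_memory_tier` are invented for the example.

```python
import bisect

# Hypothetical tiers in GB. Only 4, 32, and 64 come from the interview;
# 8 and 16 are assumed intermediate sizes.
MEMORY_TIERS_GB = [4, 8, 16, 32, 64]


def pick_memory_tier(requested_gb: float) -> int:
    """Return the smallest tier that covers the requested memory."""
    if requested_gb <= 0:
        raise ValueError("requested memory must be positive")
    # Index of the first tier that is >= the request.
    idx = bisect.bisect_left(MEMORY_TIERS_GB, requested_gb)
    if idx == len(MEMORY_TIERS_GB):
        raise ValueError(f"no tier offers {requested_gb} GB")
    return MEMORY_TIERS_GB[idx]
```

Rounding up to a fixed set of sizes keeps the choice simple for the data scientist while letting the platform provision a known, reproducible container shape.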

Speaker: Stefan Krawczyk

Algo Dev Platform Lead @StitchFix

Stefan loves the stimulus of working at the intersection of design, engineering, and data. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence. At Stitch Fix he’s leading development of the algorithm development platform.




