Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Data Science in the Cloud @StitchFix
Duration:
Level:
- Intermediate
Persona:
- Data Scientist
Key Takeaways
- Learn approaches to leveraging the cloud to make data scientists more effective and efficient.
- Hear techniques and solutions a data-driven company uses to create competitive advantage.
- Understand how Stitch Fix's shift toward data scientists owning their own code gives them a lot of freedom without burning the house down.
Abstract
Stitch Fix is an online clothing retailer that not only focuses on delivering personalized clothing recommendations for our customers, but also applies the output of data science to automate numerous other business functions through the delivery of forecasts, predictions, and analyses via a robust API layer. We rely heavily on applied mathematics & statistics working in synergy with our human decision makers; doing this well requires us to merge art and science. However, with over eighty data scientists in residence, it can be challenging to support so many different needs from an infrastructure perspective.
In this talk we’ll cover how we use the cloud to enable over 80 data scientists to be productive.
Specifically we’ll cover our infrastructure for:
- Prototyping ideas, algorithms and analyses.
- How we set up and keep schemas in sync between Hive, Presto, Redshift & Spark, and make access easy for our data scientists.
- How we productionize recommendation algorithms, and our patterns for gracefully degrading and still serving fashion recommendations if something breaks down in our ETL.
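The graceful-degradation idea above can be sketched as a fallback to the most recent good ETL output. This is a minimal illustration, not Stitch Fix's actual implementation; the function and data names are hypothetical.

```python
# Hypothetical sketch: if today's ETL run failed, serve recommendations
# from the freshest prior run that is still acceptably recent.
import datetime


def load_scores(runs, today, max_age_days=3):
    """Return the freshest recommendation scores no older than max_age_days.

    `runs` maps a date to that day's ETL output (None if the run failed).
    """
    for age in range(max_age_days + 1):
        day = today - datetime.timedelta(days=age)
        scores = runs.get(day)
        if scores is not None:
            return scores, day
    raise RuntimeError("no usable ETL output within %d days" % max_age_days)


# If Monday's run broke, Sunday's scores still serve recommendations.
runs = {
    datetime.date(2017, 11, 13): None,              # today's ETL failed
    datetime.date(2017, 11, 12): {"client_1": 0.9},  # yesterday succeeded
}
scores, used = load_scores(runs, datetime.date(2017, 11, 13))
```

The staleness window bounds how degraded the results can get before the system fails loudly instead of serving stale data.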
Interview
Stefan: I am on the horizontally-focused data platform team at Stitch Fix. We help with building common infrastructure libraries and patterns for use by our data scientists. I am in charge of leading development of an algorithm development platform. My job is to make it easy and fast for our data scientists (or anyone else really) to prototype machine learning models and get them into production. For example, right now I am designing a nice API layer to make it easy for them to create new features, for those features to be versioned, and those features to be easily available to others.
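The feature API described above (create features, version them, share them) might look roughly like the following. All names here are assumptions for illustration, not Stitch Fix's real API.

```python
# Hypothetical sketch of a versioned feature store: each registration of a
# feature name creates a new version; consumers can pin a version or take
# the latest.
class FeatureStore:
    def __init__(self):
        self._features = {}  # name -> list of feature definitions

    def register(self, name, fn):
        """Register a new version of a feature; returns its version number."""
        versions = self._features.setdefault(name, [])
        versions.append(fn)
        return len(versions)  # versions are 1-indexed

    def get(self, name, version=None):
        """Fetch a specific version, or the latest if none is given."""
        versions = self._features[name]
        idx = (version or len(versions)) - 1
        return versions[idx]


store = FeatureStore()
store.register("avg_order_value", lambda client: 42.0)
v2 = store.register("avg_order_value", lambda client: 43.5)
latest = store.get("avg_order_value")
```

Pinning a version lets a production model keep using the feature definition it was trained against, even after colleagues publish newer versions.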
Stefan: Stitch Fix is a personal styling service for both men and women, where we send clothes and accessories tailored to your tastes and budget. Clients simply fill out an online style profile and Stitch Fix personal stylists handpick a selection of five clothing items and accessories. Clients keep what they love and easily return the rest.
The interesting part is that Stitch Fix is heavily reliant on data to deliver a great customer experience. For instance, we have an algorithm that personalizes and recommends what clothing should be sent to a client. Then a stylist mediates that process, picking out the items that make up what is called a "fix," which then gets sent out.
We have a large number of customers and a large volume of data, so we have a lot of feedback coming in, and that helps run the business. This data is used for everything from predicting inventory to figuring out what styles people like.
Stefan: I think anyone who tries to support a data science team will be interested in this talk. It will appeal to more technical data scientists who would like to know what other companies are doing, as well as to data architects who want ideas about good patterns and about the challenges that might lie further down the road.
Stefan: The talk is about how we utilize cloud infrastructure to empower our data scientists. I think we are kind of unique (at least for a start-up). We have close to 80 data scientists at last count, and subgroups of them work for different parts of the business, so our data scientists have their own data and resource needs. It's our philosophy at Stitch Fix to empower them as much as possible without requiring them to hand anything over to anyone else. The only way we can actually deliver that is by using the cloud. This talk discusses our approach.
Stefan: The main motivation is that we want data scientists to own as much code as possible. We had a pretty popular post on the Stitch Fix blog called "Engineers Shouldn't Write ETL." It explained that we don't have dedicated engineers who handle ETL; instead, our philosophy is that we want our data scientists to own as much as possible, which includes each of them owning their ETL. We could only do this at Stitch Fix by using the cloud; by building infrastructure and offering tooling, we lower the engineering bar and enable data scientists to own as much code as possible.
Stefan: Well, provisioning of hardware is difficult. Data scientists' workloads are generally kind of bursty; they do a lot of ad hoc work. So having something elastic keeps your costs down, because you only utilize resources when they are needed. If a data scientist wants to run four different versions of a model, rather than making them wait on fixed resources you can scale up dynamically and run everything in parallel. So they get their job done faster.
Stefan: You could say part of it is definitely economies of scale. It is also about tooling to lower the engineering bar for data scientists to get work done. You want things to be reproducible and auto scalable. The Cloud forces you into better habits for designing workflows that can specifically cater to different needs and resources. Data scientists are known for being better at engineering than statisticians, but they are not necessarily the best engineers. So I will talk about how we lower the bar for them to get their work done with the infrastructure we built.
Stefan: Strategies and design patterns you can take back to your organization for enabling data scientists. Things like: how to provide dependable, reproducible, elastic resources; some ideas on writing libraries and APIs for them to use that can also influence their behavior; and an understanding of what data science at scale looks like at a company.
Stefan: Sure. Everyone gets a MacBook to work on, but that has a limited amount of memory, which is a bottleneck for some data scientists. If you give them a shared EC2 instance, there is contention for resources. So how do you give them a push-button way to set up a Jupyter notebook with anywhere from 4 GB to 32 GB of memory (even as much as 64 GB), depending on the resources they need? I will talk through how we enable data scientists to basically just push a button and get a kind of containerized resource that allows them to work between sessions and save things whenever they want. The solution essentially gives them infinite compute resources when they need it.
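The push-button notebook idea can be sketched as translating a requested memory size into a container launch command. The image name, flag set, and allowed sizes below are assumptions for illustration, not Stitch Fix's actual tooling.

```python
# Hypothetical sketch of a "push button" notebook launcher: validate the
# requested memory tier and build a docker run command with a hard memory
# limit, so sessions are isolated from one another.
ALLOWED_MEMORY_GB = (4, 8, 16, 32, 64)


def notebook_command(memory_gb, image="jupyter/base-notebook"):
    if memory_gb not in ALLOWED_MEMORY_GB:
        raise ValueError("choose one of %s GB" % (ALLOWED_MEMORY_GB,))
    return [
        "docker", "run", "-d",
        "--memory=%dg" % memory_gb,  # hard per-session memory limit
        "-p", "8888:8888",           # expose the notebook port
        image,
    ]


cmd = notebook_command(32)
```

Because each session is its own container with an explicit memory cap, one data scientist's 64 GB job cannot starve a colleague sharing the same host.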
Tracks
Monday Nov 7
- Architectures You've Always Wondered About: You know the names. Now learn lessons from their architectures.
- Distributed Systems War Stories: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." - Lamport
- Containers Everywhere: State of the art in container deployment, management, and scheduling.
- Art of Relevancy and Recommendations: Lessons on the adoption of practical, real-world machine learning practices. AI & deep learning explored.
- Next Generation Web Standards, Frameworks, and Techniques: JavaScript, HTML5, WASM, and more... innovations targeting the browser.
- Optimize You: Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.
Tuesday Nov 8
- Next Generation Microservices: What will microservices look like in 3 years? What if we could start over?
- Java: Are You Ready for This? Real-world lessons & prepping for JDK 9. Reactive code in Java today, performance/optimization, where Unsafe is heading, & the JVM compiler interface.
- Big Data Meets the Cloud: Overviews and lessons learned from companies that have implemented their big data use cases in the cloud.
- Evolving DevOps: Lessons and stories on optimizing the deployment pipeline.
- Software Engineering Softskills: Great engineers do more than code. Learn their secrets and level up.
- Modern CS in the Real World: An applied, practical, real-world dive into industry adoption of modern CS ideas.
Wednesday Nov 9
- Architecting for Failure: Your system will fail. Take control before it takes you with it.
- Stream Processing: Stream processing and near-real-time processing.
- Bare Metal Performance: Native languages, kernel bypass, tooling. Make the most of your hardware.
- Culture as a Differentiator: The why and how of building successful engineering cultures.
- //TODO: Security <-- fix this: Building security from the start. Stories, lessons, and innovations advancing the field of software security.
- UX Reimagined: Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.