Track:

Duration:

11:50am - 12:40pm

Persona:

Data Scientist

Key Takeaways

Hear how Datascope uses human-centered design and an iterative design process to produce actionable insights.
Understand how Datascope built an Expert Finder application for a Fortune 50 company using design methodologies for both their user interfaces and algorithms.
Learn how the iterative design process puts business problems first and pivots as needs change.

Abstract

When solving problems, data scientists often start from the data, run analyses, and then almost as an afterthought, think about presenting results to stakeholders. This rigid, linear approach often fails to produce useful results.

At Datascope, we adopt methodologies from the design community, iteratively improving our work to ensure that our deliverable is useful to our clients. In building an Expert Finder application for a Fortune 50 company, we adopted such methodologies not only for the user interfaces, but also for our “expert finding” algorithms and data sources.

During this talk, I will go over how we iterated on the major pieces of this project to produce actionable insights and recommendations.

Interview

Question:

QCon: What is your role today?

Answer:

Bo: I am a partner and data scientist at Datascope, a data science consulting company in Chicago. Basically, if a company has some amount of data and isn't quite sure what to do with it, or has an idea of what to do with it but needs more bandwidth, we are a data science team for hire. As both a data scientist and a partner, I do actual data science work and am also involved in business development, outreach, speaking at conferences, etc.

Question:

QCon: What types of stacks are you typically working in?

Answer:

Bo: We try to be as technology-agnostic as possible. We really believe that the business problem should drive what we end up using. That said, we have a very strong preference for open source. If a company comes to us wanting some analysis done, we don't believe we should sell them other proprietary software to implement our model. Pretty much everything we have done to date is open source.

We use a lot of Python on the backend and the usual HTML and JavaScript on the frontend. For interactive visualizations, we use open source JavaScript libraries. Cloud storage and infrastructure are pretty client-dependent. Some clients prefer AWS; Rackspace was really popular, although it is less popular with our clients now. We are also seeing an uptick in Google Cloud services. It really depends on the client and what makes sense for them.

Question:

QCon: Can you explain your talk title to me?

Answer:

Bo: We recently completed a project for a large company. At the time, it was one of our larger projects, and it was a big learning experience for us. One of the features that distinguishes us from the competition is our iterative design process: we start with the business problem and then see what is possible with data (from the client or from outside sources).

We use this process to make sure that we are getting it right every step of the way, as opposed to starting with a long PRD (product requirements document), which makes it more difficult to pivot. What we see a lot of in-house data science teams do is start with the data and then try to dig things up. That is often a rather undirected and less efficient approach.

With this project, the customer came to us with a very hazy idea of what they wanted to do. So we started with the business problem and brainstormed, and we came up with several mock prototypes first. Right off the bat, we had a bunch of great ideas (at least, we thought so at the time) for data science applications using their data, but the client eliminated several of them right away. If we hadn't used the iterative design process, we probably would have tried to build out actual code and spent weeks on something.

By using this design process, we were able to zero in on what made sense from the beginning, so we lost as little time as possible. As we kept iterating, essentially asking for feedback on a regular basis and constantly communicating rather than deploying every couple of weeks, it became clear to us halfway through the project that we needed to pivot a bit. What we had both originally thought would be the perfect solution turned out to be missing something. The whole idea is to use this iterative design process rather than take a more traditional linear approach. It was a bumpy road, but at the end of the day, we were able to deliver something that was useful to them and drove a lot of business value.

Question:

QCon: Is this agile for data science? Is that what the talk is about or is it something different?

Answer:

Bo: I would say that it is. I don't think there is a better, more established term, because a lot of people in the software community are very familiar with agile project management. I know that some very particular parts of agile are not in here; for example, we don't have a proper ScrumMaster. We call it the design process instead, but at the end of the day it is a very similar concept: a process designed for data science that is not very different from agile.

Question:

QCon: But why the word design?

Answer:

Bo: I will answer this in two parts. First, design is a process, and this is the process that we implement; that is precisely the idea. Second, in this particular project and in almost all of our larger-scope projects, a key component is the end-user dashboard or some interactive visualization.

So when I say that we try to deliver and ask for feedback from the very beginning, the first prototypes we gave them were sketches of some type of interface: sketches of a network diagram, of some interactive bar charts, of custom internal searches. Totally low fidelity, pencil and paper. It took us five minutes.

In every project, and in every data science project, I think it's really important to first think about what business problem you are trying to solve. Second, once you have those answers or deliverables (or whatever the end product is) for your data science project, how are they going to be digested and used? In this project, for example, the results were going to be digested and used by people who had a technical background but were not data scientists. It was necessary to develop some type of end-user web app, and that requires some user interface design.

Question:

QCon: QCon targets architects and senior development leads. What do you feel will be actionable for that persona?

Answer:

Bo: For those in a more senior role (where it's important to help manage their team in terms of figuring out what to work on at any given point), one key takeaway will be to always keep in mind what the goal is, knowing that the client might change it.

Just because you have a clear perspective and plan for what to do today doesn't mean that next week your team won't find something new or your client won't have another idea. The whole idea is to have a big-picture goal but be willing to adapt, and to account for the very real possibility that your project might pivot over time.

Speaker: Bo Peng

Partner and Data Scientist @Datascope

Bo is a partner and a data scientist at Datascope, a leading data science consultancy in Chicago. At Datascope, she combines human-centered design with analytics to derive actionable business insights for clients like P&G, Motorola, and Thomson Reuters. Through Datascope's partnership with test-prep giant Kaplan, she helped design, launch, and teach Kaplan's first-ever 12-week Data Science Bootcamp, an immersive program to help people transition into a career in data science. Beyond Datascope, she is an active member of the technology community in the Midwest, co-organizing data science meetups in Chicago and Madison and serving as lead organizer of the Women in Machine Learning & Data Science Chicago chapter. She is a frequent speaker on data science projects and methods. Bo has a BS in Mathematics and an MS in Statistics, both from The University of Chicago.

Find Bo Peng at

@bo_p

Similar Talks

IBM Distinguished Engineer

Mark Vanderwiele

Stranger Things: The Forces that Disrupt Netflix

Senior Software Engineer, Playback Features @Netflix

Haley Tucker

99.99% Availability via Smart Real-Time Alerting

Data Science Manager @Uber

Franziska Bell

Creating A Culture of Observability at Stripe

Observability Specialist @Stripe

Cory Watson

Migrating to a Fault Tolerant System with Spanner

Software Engineer @Google

Edwin Fuquen

Freeing the Whale: How to Fail at Scale

CTO @Buoyant

Oliver Gould

Automating Chaos Experiments In Production

Senior Software Engineer @Netflix

Ali Basiri

Architecting for Failure in a Containerized World

Principal Data Analysis Leader @Infolace

Tom Faulhaber

Further Together: Curated Pairing Culture @Pivotal

Software Engineer @Pivotal

Neha Batra

Tracks

Monday Nov 7

Architectures You've Always Wondered About

You know the names. Now learn lessons from their architectures
Distributed Systems War Stories

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.
Containers Everywhere

State of the art in Container deployment, management, scheduling
Art of Relevancy and Recommendations

Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.
Next Generation Web Standards, Frameworks, and Techniques

JavaScript, HTML5, WASM, and more... innovations targeting the browser
Optimize You

Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.

Tuesday Nov 8

Next Generation Microservices

What will microservices look like in 3 years? What if we could start over?
Java: Are You Ready for This?

Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.
Big Data Meets the Cloud

Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud
Evolving DevOps

Lessons/stories on optimizing the deployment pipeline
Software Engineering Softskills

Great engineers do more than code. Learn their secrets and level up.
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Wednesday Nov 9

Architecting for Failure

Your system will fail. Take control before it takes you with it.
Stream Processing

Stream Processing, Near-Real Time Processing
Bare Metal Performance

Native languages, kernel bypass, tooling - make the most of your hardware
Culture as a Differentiator

The why and how for building successful engineering cultures
//TODO: Security <-- fix this

Building security from the start. Stories, lessons, and innovations advancing the field of software security.
UX Reimagined

Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.


Conference for Professional Software Developers

Presentation: Iterative Design for Data Science Projects