Presentation: Iterative Design for Data Science Projects

Duration

11:50am - 12:40pm

Key Takeaways

  • Hear how Datascope uses human-centered design and an iterative design process to produce actionable insights.
  • Understand how Datascope built an Expert Finder application for a Fortune 50 company using design methodologies for both their user interfaces and algorithms. 
  • Learn how the iterative design process puts business problems first and pivots as needs change.

Abstract

When solving problems, data scientists often start from the data, run analyses, and then, almost as an afterthought, think about presenting results to stakeholders. This rigid, linear approach often fails to produce useful results.

At Datascope, we adopt methodologies from the design community, iteratively improving our work to ensure that our deliverable is useful to our clients. In building an Expert Finder application for a Fortune 50 company, we adopted such methodologies not only for the user interfaces, but also for our “expert finding” algorithms and data sources.

During this talk, I will go over how we iterated on the major pieces of this project to produce actionable insights and recommendations.

Interview

Question: 
QCon: What is your role today?
Answer: 

Bo: I am a partner and data scientist at a data science consulting company out of Chicago. We are called Datascope. Basically, if a company has some amount of data and they aren’t quite sure what to do with it, or they do have an idea what to do with it but need more bandwidth, we are a data science team for hire. As both a data scientist and partner, I do actual data science work and am also involved in business development, outreach, speaking at conferences, etc.

Question: 
QCon: What types of stacks are you typically working in?
Answer: 

Bo: We try to be as truly agnostic as possible. We really believe that the business problem should drive what we end up using. With that said, we have a very strong preference for open source. We don’t believe that, if a company comes to us wanting some analysis done, we should sell them other proprietary stuff to implement our model. Pretty much everything to date is open source.

We use a lot of Python on the backend. On the frontend, the usual HTML and JavaScript. For interactive visualizations, we use open source JavaScript libraries. In terms of cloud storage and infrastructure, it's pretty client dependent. Some clients prefer AWS. Rackspace was really popular, although it is less popular now with our clients. We are also seeing an uptick in Google Cloud services. It really depends on the client and what makes sense for them.

Question: 
QCon: Can you explain your talk title to me?
Answer: 

Bo: We recently completed a project for a large company. At the time, it was one of our larger projects and it was a big learning experience for us. One of the features that separates us from the competition is our iterative design process. We start with the business problem and then see what is possible with data (from the client or from outside the client).

We use this process to make sure that, every step of the way, we are getting it right, as opposed to starting with a long PRD, which makes it more difficult to pivot. What we see a lot of in-house data science teams do is start with the data first and then try to dig up stuff. That is often a rather undirected and somewhat less efficient approach.

With this project, the customer came to us with a very hazy idea of what they wanted to do. So we started with the business problem first, and we brainstormed. We came up with several mock prototypes first. Then, right off the bat, we had a bunch of great ideas (at least, we thought we did at the time) for data science applications with their data. But the client eliminated several of them right away. If we hadn’t used the iterative design process, we would probably have tried to build out actual code and taken weeks to do something.

In using this design process, we were able to zero in on what makes sense from the beginning, so we lost as little time as possible. As we kept doing this iterative design process, essentially asking for feedback on a regular basis and constantly communicating rather than deploying every couple of weeks, it became clear to us halfway through the project that we needed to pivot a bit. What we had both originally thought would be the perfect solution turned out to be missing something. The whole idea is to use this iterative design process rather than take a more traditional linear approach. It was a bumpy road. But, at the end of the day, we were able to deliver something that was useful to them and drove a lot of business value.

Question: 
QCon: Is this agile for data science? Is that what the talk is about or is it something different?
Answer: 

Bo: I would say that it is. I don’t think there is a better, more established term in the software community because a lot of people are very familiar with agile project management. I know that there are very particular parts of agile that are not in here. For example, we don’t have a proper ScrumMaster. What we call it instead is the design process. At the end of the day, it is a very similar concept. It is designed for data science, which is not very different from agile.

Question: 
QCon: But why the word design?
Answer: 

Bo: I will answer this in two parts. First, design is a process, and this is the process that we implement. This is precisely the idea. Second, with this particular project and for almost all of our bigger-scope projects, a key component is the end-user dashboard or some interactive visualization.

So when I say that we try to deliver and ask them for feedback from the very beginning, the first prototypes that we gave them were sketches of some type of interface. Sketches of a network diagram, sketches of some interactive bar charts, sketches of custom internal searches. Totally low fidelity; pencil and paper. It took us five minutes. 

In every project, and in every data science project, I think it’s really important to first think about what the business problem is that you are trying to solve. Secondly, once you have said answers or deliverables (or whatever the end product is) for your data science project, how is it going to be digested and used? In this project, for example, it was going to be digested and used by people who had a technical background but were not data scientists. It was necessary to develop some type of end web app, and that requires some user interface design.

Question: 
QCon: QCon targets architects and senior development leads. What do you feel will be actionable for that persona?
Answer: 

Bo: For those in a more senior role (where it’s important for them to help manage their team in terms of figuring out what to work on at any given point), one key takeaway will be to always keep in mind what the goal is and to know that the client might change it.

Just because you have a clear perspective and plan of what to do today doesn’t mean that next week your team won’t find something new or your client won’t have another idea. The whole idea is to have a big-picture plan but to accept and account for the very real possibility that your project might pivot over time.

Speaker: Bo Peng

Partner and Data Scientist @Datascope

Bo is a partner and a data scientist at Datascope, a leading data science consultancy in Chicago. At Datascope, she combines human-centered design with analytics to derive actionable business insights for clients like P&G, Motorola, and Thomson Reuters. Through Datascope's partnership with test-prep giant Kaplan, she helped design, launch, and teach their first-ever 12-week Data Science Bootcamp, an immersive program to help people transition into a career in data science. Beyond Datascope, she is an active member of the technology community in the Midwest, co-organizing data science meetups in Chicago and Madison and serving as lead organizer of the Women in Machine Learning & Data Science Chicago chapter. She is a frequent speaker about data science projects and methods. Bo has a BS in Mathematics and an MS in Statistics, both from The University of Chicago.
