
Presentation: Mind the Software Gap: How We Can Operationalize Privacy & Compliance

Track: Ethics, Regulation, Risk, and Compliance

Location: Pacific LMNO

Duration: 10:35am - 11:25am

Day of week: Monday

What You’ll Learn

  1. Hear some of the ways GDPR and CCPA can influence software.
  2. Learn about some of the practical solutions to protecting data privacy and security.


With legislation like GDPR and CCPA, it has become newly urgent for organizations to understand internal and external data flows. In the push towards compliance, software organizations have been discovering just how difficult it is to maintain an up-to-date picture of data inventory and data flows. A major challenge is that modern software teams are developing and deploying software quickly and in decentralized ways. When each code change can cause data flow changes, building a clear, up-to-date map of data flows becomes more and more elusive. The state of the art (using human processes; catching data as it flows to untrusted locations) leaves gaps.

Understanding software behavior makes up a big part of the compliance gap--and automated techniques can help. In this talk, I discuss just what it could look like to get visibility into data flows and hint at what kinds of solutions could get us there.


You were assistant professor at Carnegie Mellon and now you're CEO of a company. What led to that transition?


For the last 10+ years, I'd been going after the problem of how we can build tools to help software teams better understand their software. I had decided that security and privacy were going to be one of the big application domains. The whole time, I tried to keep an eye on the best way to make an impact building software tools for security and privacy. For the first few years, it was pretty clear that academia was the best place to do this, because the world didn't care so much about it yet and a lot of the ideas I was thinking about weren't ready for primetime. But in the last couple of years, it's become really clear that companies care more about it, consumers care more about it, regulators care more about it, and the tooling ecosystem has gotten to a place where we can start building tools to automate some of the software problems that people have been treating as process problems: What data are my teams using, where are they sending that data, and what are they doing with that data? For me, it wasn't so much a decision to switch between academia and running a company. Every year I ask myself, is what I'm doing right now the best way to make an impact doing this? When GDPR came along, I thought about this question and concluded that building a product was the way to bring the tooling I wanted to the biggest number of developers.


What's the goal of this talk?


Part of it is education. I talk to a lot of companies these days about their problems with monitoring their data. The state of the art seems to be asking people what they're doing with their data and telling them to fill it into a spreadsheet. I want to reach as many people as possible and tell them it doesn't have to be this way. My goal here is to, step one, tell people, you can have nicer things than what you have now, and then, step two, lay out, here are some options for other things you can do. Hopefully, one of the conclusions people will come to is that they need the kinds of tools that we're building, because when we started working in this space, GDPR was pretty new and people were just starting to get their heads around it.


Can you give me some examples?


There's a class of problems that I was hearing about over and over again that inspired a large part of our solution now, which is developers accidentally writing passwords into logs. When I first started hearing about this problem, I was skeptical that it was really a problem. I was like, it doesn't seem like that big of a deal. Why haven't you solved it yet? Over the last year, I’ve seen this becoming an increasingly serious problem. Because of GDPR, companies now need to notify their users to change their passwords if they discover passwords in logs, and there's been a lot more discussion around this question. It's also analogous to many other problems that might sound like bigger deals, like sending credit card numbers or health information to Twilio or Salesforce, or using data for purposes it’s not supposed to be used for. The most recent was Twitter using two-factor authentication phone numbers for advertising. I really like this Hacker News thread from a few months ago, when Robinhood had the passwords-in-logs problem, or the leaked-passwords problem, and someone said, how can they possibly do this? Everyone said, here's one time I accidentally logged passwords to Apache; here’s another time passwords ended up in one of my crash dumps. There are just many ways to do it. The reason I think passwords in logs is a really good example is that it can happen at any time and passwords are hard to detect. Your password could be anything. It's really hard to pattern match; there's no rule like, OK, if you see a bunch of stuff with three dashes in between, that's the password. This is something that you'll probably miss in code review, because the code usually doesn't say log-password; it says log my user, or log something that contains some part of a user. At Akita, we’re very focused on solving this class of problems: how do we take sensitive data and provide tools that tell you this is where it goes? We’re building tools to support better software practices so teams can move faster without leaking data.
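
To make the code-review blind spot concrete, here is a minimal Python sketch. It is not Akita's tooling; `User`, `SafeUser`, and `capture_log` are hypothetical names used only for illustration. It shows how logging a whole user object can put a password into a log without the word "password" ever appearing in the log call, and one partial mitigation using the standard `dataclasses.field(repr=False)` option.

```python
import io
import logging
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical model: the default dataclass repr includes every field.
    name: str
    password: str


@dataclass
class SafeUser:
    # Same model, but the password is excluded from the generated repr.
    name: str
    password: str = field(repr=False)


def capture_log(obj) -> str:
    """Log obj the way a debugging line might, and return what was written."""
    stream = io.StringIO()
    logger = logging.getLogger(f"demo.{type(obj).__name__}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        # Nothing here says "password", so a reviewer is unlikely to flag it.
        logger.info("login failed for %s", obj)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


leaky = capture_log(User("ada", "hunter2"))      # contains the password
safe = capture_log(SafeUser("ada", "hunter2"))   # password left out of repr
```

Hiding a field from `repr` covers only this one path. As the answer above notes, passwords can also reach logs through server logs or crash dumps, so a per-class fix like this is a stopgap rather than a way of knowing where sensitive data goes.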


In your talk, will you be talking about the product or the practices?


We're still in stealth mode, so we're not talking about the product publicly yet. I'm going to outline the classes of problems that you should be worried about, some ways you can mitigate those concerns today, and where the gaps still are. If you are interested in filling those gaps, you should come talk to me.


What do you want someone to leave the talk with?


I would love for someone to leave the talk with a better understanding of the data flow problems around GDPR and of how to reason about software in its context. One of my goals is also to have people come away knowing what isn’t likely to work on its own. Previously, most legislation around software was not so tied to the software itself. GDPR was the first legislation that said, if you use this data over there or for something else, then that's problematic. Which means you have to know that your data ended up over there or ended up being used for something else.
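
One way to picture the "used for something else" problem is to attach the allowed purposes to the data itself. The following Python sketch is purely illustrative: `Purpose`, `Tagged`, `PurposeError`, and `use` are hypothetical names, not a mechanism required by GDPR or any product's API, and a runtime check like this only catches uses that actually go through `use`.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    TWO_FACTOR_AUTH = "two_factor_auth"
    ADVERTISING = "advertising"


class PurposeError(Exception):
    """Raised when data is requested for a purpose it was not collected for."""


@dataclass(frozen=True)
class Tagged:
    # Hypothetical wrapper pairing a value with the purposes it may serve.
    value: str
    allowed: frozenset


def use(data: Tagged, purpose: Purpose) -> str:
    """Return the raw value only if the purpose is allowed."""
    if purpose not in data.allowed:
        raise PurposeError(f"not collected for {purpose.value}")
    return data.value


# A phone number collected only for two-factor authentication.
phone = Tagged("+1-555-0100", frozenset({Purpose.TWO_FACTOR_AUTH}))
sms_target = use(phone, Purpose.TWO_FACTOR_AUTH)  # allowed

try:
    use(phone, Purpose.ADVERTISING)  # not an allowed purpose
    blocked = False
except PurposeError:
    blocked = True
```

The sketch makes the requirement visible but does not solve it: any code that reads `phone.value` directly bypasses the check, which is the kind of gap that reading code by hand struggles to find.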


Do you give specific ways to look at privacy?


Yes. My goal is to give an idea of how to think about responsible data practice if you wanted to do right by GDPR--and by your customers. Some of the solutions are going to be difficult to implement with the tools we have today. In fact, one of the main motivations for starting Akita was that GDPR is ahead of what we are able to support technologically. There are two possible outcomes. One is that everyone just ignores it and the state of privacy is where it was before, maybe even worse. Or people start following GDPR--but we're gonna need new tools to take us there, because reading code by hand and understanding data flows by hand is just not feasible. But since we can’t do all that in a single talk, I’m planning to give the audience actionable things they can start doing today.

Speaker: Jean Yang

Founder and CEO @AkitaSoftware

Jean Yang is the founder and CEO of Akita Software, an enterprise data monitoring company. She was previously an Assistant Professor in the Computer Science Department at Carnegie Mellon University, where she led a research group working on techniques for automating software-based security and privacy. She has also worked in this space during her PhD at MIT, at Microsoft Research, and at Facebook. In 2016, the MIT Technology Review named her one of the Top 35 Innovators Under 35 for her work in this area.
