Presentation: The Art of Relevance and Recommendations


10:35am - 11:25am


Key Takeaways

  • Build a recommender system that leverages content-based approaches, collaborative filtering, and multi-armed bandits in a simple step-by-step approach.
  • Learn about the tradeoffs in using different techniques to make recommendations.
  • Hear practices, tips, and approaches to building a recommendation system.
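Of the three techniques named above, the multi-armed bandit is the least familiar to many developers. The sketch below shows one common variant, epsilon-greedy, purely as an illustration; the abstract does not say which bandit strategy the talk uses, and the class name, parameters, and reward convention (1.0 for a click, 0.0 otherwise) are assumptions.

```python
import random


class EpsilonGreedyBandit:
    """Choose among candidate items, balancing exploration and exploitation.

    Illustrative only: the names and the epsilon-greedy strategy are
    assumptions, not taken from the talk.
    """

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # probability of exploring at random
        self.counts = [0] * n_arms        # how often each item was shown
        self.values = [0.0] * n_arms      # running mean reward per item
        self.rng = random.Random(seed)

    def select(self):
        """Return the index of the item to show next."""
        if self.rng.random() < self.epsilon:
            # Explore: pick any item uniformly at random.
            return self.rng.randrange(len(self.counts))
        # Exploit: pick the item with the best mean reward so far
        # (ties go to the lowest index).
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        """Fold one observed reward into the running mean for `arm`."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n
```

A caller would alternate `select()` and `update()` as feedback arrives. A larger `epsilon` explores more and learns faster about unpopular items, at the cost of showing weaker recommendations more often.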


The age of artificial intelligence is upon us. Whether you know it or not, we interact with systems powered by machine learning on a daily basis. If you ever wondered how social networks, online retailers, and video streaming sites seem to know exactly what content and products you desire, this session is for you.

In this talk, we will walk you through the creation of a real-world relevance and recommendation system from scratch. We will cover the machine learning theory powering such systems, focusing on useful hacks and techniques that are not typically covered in standard machine learning courses. Through this crash course on the black art of relevance and recommendation systems, you will be well on track to using artificial intelligence to maximize your product’s user retention, engagement, and conversion rates.


QCon: What is the focus of your work today?

Clarence: I am a Security Research Engineer at Shape Security. My role is to go through our data and come up with models to identify traffic and automated attacks on large websites and stop them. These are core competencies of our company. We gain these competencies in a variety of ways, but machine learning is starting to become more and more of a focus of the company.

QCon: What’s the motivation for your talk?

Clarence: The goal of the talk is to get developers who have a cursory understanding of machine learning to really get their hands dirty and to start doing ML on their own. I think many people have taken online courses on machine learning and have a good idea of what the general concepts are. But ML is a little bit intimidating when you actually have to build something from scratch. These online courses cover high level concepts, but they don’t cover the practical stuff that you need to know. They also skip many of the problems you may face when you actually build these systems. So these pragmatic concepts and practices are a big focus of the talk.

QCon: What are you going to go through in the talk?

Clarence: In general, the premise is that there will be a recommendation problem to solve (this will not be a complex problem). There won’t be too many different features or dimensions to look at, because I want to focus on the problems that developers face in general (as opposed to problems specific to a particular data set). I will go through, on a high level, the different kinds of recommendation systems that are possible, for example, ones that range from simple aggregate recommendations to ones tailored to the individual profiles of users on the platform. Then I will go on to content-based item recommendation systems, which generate item and user profiles, with collaborative filtering to match them up together. Then, I will go into feature engineering on explicit, implicit, and latent features. These are things that I will slowly add on with just a few lines of code. In each topic that I focus on, I will add on to the recommender system that we will build throughout the discussion.
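As a rough illustration of the content-based step described above, the sketch below builds a user profile by averaging the feature vectors of liked items and ranks the remaining items by cosine similarity to that profile. The feature columns, item names, and function names are hypothetical, and this is one simple way to do it rather than the method used in the talk.

```python
import math

# Hypothetical item profiles; columns are [comedy, action, documentary].
ITEM_FEATURES = {
    "video_a": [1.0, 0.0, 0.0],
    "video_b": [0.9, 0.1, 0.0],
    "video_c": [0.0, 1.0, 0.0],
    "video_d": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def user_profile(liked_items, item_features):
    """Average the feature vectors of the items a user liked."""
    vectors = [item_features[item] for item in liked_items]
    if not vectors:
        raise ValueError("need at least one liked item to build a profile")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked_items, item_features, k=2):
    """Return up to k unseen items whose profiles best match the user profile."""
    profile = user_profile(liked_items, item_features)
    candidates = [item for item in item_features if item not in liked_items]
    return sorted(
        candidates,
        key=lambda item: cosine(profile, item_features[item]),
        reverse=True,
    )[:k]
```

For a user who liked only `video_a`, `recommend(["video_a"], ITEM_FEATURES, k=1)` picks `video_b`, the item with the most similar feature vector. Note that this approach cannot say anything about a user with no liked items, which is one form of the cold start problem.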

The most useful part (I think) will be some practical hacks, such as how to reliably collect data for the utility matrix used in collaborative filtering, how to extrapolate scores from the known ones if you don’t have scores for many of the items in your dataset, how to measure performance (arguably one of the most important parts of building recommender systems), and also many common things like how to get past the cold start or new entity problem when you don’t have any users in the system.
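To make "extrapolating scores from the known ones" concrete, here is a minimal sketch of user-based collaborative filtering over a sparse utility matrix. It assumes ratings are stored as nested dictionaries and predicts a missing score as a similarity-weighted average of other users' scores; the function names and data layout are assumptions, and the talk may well use a different technique.

```python
import math


def similarity(ratings_a, ratings_b):
    """Cosine similarity of two sparse rating rows (missing entries count as 0)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[item] * ratings_b[item] for item in common)
    norm_a = math.sqrt(sum(score * score for score in ratings_a.values()))
    norm_b = math.sqrt(sum(score * score for score in ratings_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, ratings):
    """Estimate `user`'s score for `item` from users who did rate it.

    Returns None when no similar user has rated the item.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


# Hypothetical utility matrix: user -> {item: score on a 1-5 scale}.
RATINGS = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 4, "c": 5},
    "cat": {"a": 1, "c": 1},
}
```

Here `predict("ann", "c", RATINGS)` lands closer to bob's score than to cat's, because ann's known ratings resemble bob's more. A prediction like this can then be compared against held-out known scores to measure performance.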

QCon: How would you describe the persona of the target audience of this talk?

Clarence: The main target would be some kind of lead developer who is actively developing but also has some decision-making power over how to start a project and how to push it from the beginning to a level of maturity.

QCon: What should someone know before coming to your talk?

Clarence: I think that those coming to the talk would be interested in some kind of recommender systems and have a potential use case for it. They should know some basic terms in the space like ‘what is a recommender system?’ I will go through terms on a high level but it’s hard to compare all the different ways of solving a particular problem. Recommender systems are just one way of solving the general recommendation problems. So if they were to know in general what recommender systems are meant to do, that would be great. 

QCon: QCon targets advanced architects and senior development leads, what do you feel will be actionable for that persona when they leave your talk?

Clarence: I want them to walk away with a sense that while there are a lot of hidden problems when it comes to implementing recommender systems, these problems are actually pretty approachable if you know how to work around them. There are a lot of practical hacks around. For example, not many people know a general answer to the "cold start" problem, and there are some general best practices that are not that easy to find online.

I think they might be buried in books somewhere, especially machine learning text books. It is generally hard for someone who is unfamiliar with the topic to start building these systems and know where to look for patterns.

QCon: Can you give me an example of one of the ones that I might not know, like the cold start problem or something?

Clarence: I think the cold start problem is the most general one that people either don’t have a good answer to or think depends a lot on the context of the problem. For example, say you are building a recommender system for online videos, and you don’t have a good idea of which users might like a particular video. You can use user profiling to extrapolate from the knowledge that you already have, and use an external data source to perform that extrapolation on the existing users. By performing similarity matching on another dimension that is not based on video content or anything to do with the items that you are recommending, you are able to draw auxiliary conclusions and make a comparison on a dimension that you are not making a recommendation on.
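One way to read the cold start approach described above is sketched below: a new user has no viewing history, so they are matched to existing users on an auxiliary dimension (here, sets of interests assumed to come from an external data source), and videos liked by similar users are scored by that similarity. All names, the interest data, and the choice of Jaccard similarity are assumptions for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cold_start_recommend(new_profile, aux_profiles, liked, k=2):
    """Recommend videos to a user with no history.

    new_profile  -- set of auxiliary attributes for the new user
    aux_profiles -- existing user -> set of auxiliary attributes
    liked        -- existing user -> list of videos that user liked
    """
    scores = defaultdict(float)
    for user, profile in aux_profiles.items():
        sim = jaccard(new_profile, profile)
        for video in liked.get(user, ()):
            scores[video] += sim  # similar users' likes count for more
    # Highest score first; break ties alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [video for video, score in ranked if score > 0][:k]


# Hypothetical auxiliary data, unrelated to video content.
AUX_PROFILES = {
    "u1": {"cycling", "cooking"},
    "u2": {"cycling", "travel"},
    "u3": {"finance"},
}
LIKED = {
    "u1": ["bike_repair", "pasta"],
    "u2": ["bike_repair", "alps_tour"],
    "u3": ["stock_tips"],
}
```

For a new user whose profile is `{"cycling", "travel"}`, the call `cold_start_recommend({"cycling", "travel"}, AUX_PROFILES, LIKED)` ranks `bike_repair` first and `alps_tour` second, without using any information about the videos themselves.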

Speaker: Clarence Chio

Security Research Engineer @ShapeSecurity

Clarence Chio graduated with a B.S. and M.S. in Computer Science from Stanford, specializing in data mining and artificial intelligence. He currently works as a Security Research Engineer at Shape Security, building a product that protects high-value web assets from automated attacks. At Shape, he works on the data analysis systems used to tackle this problem. Clarence spoke on Machine Learning and Security at DEF CON 24, GeekPwn Shanghai, PHDays Moscow, BSides Las Vegas and NYC, Code Blue Tokyo, SecTor Toronto, and Hack in Paris (2015-2016). He has been a community speaker with Intel, and is also the founder and organizer of the ‘Data Mining for Cyber Security’ meetup group, the largest gathering of security data scientists in the San Francisco Bay Area.
