You are viewing content from a past QCon.

Presentation: The Evolution of Reddit.com's Architecture

Track: Architectures You've Always Wondered About

Location: Ballroom A

Level: Intermediate

Persona: Architect, Technical Engineering Manager

What You’ll Learn

  • Learn how Reddit is breaking down the monolith and moving to services
  • Hear about lessons learnt with their architecture evolution, and about the surprising trade-offs that weren’t just technology-focused
  • Gain insights into how to deal with ‘what’s next’ when experiencing immense growth

Abstract

A stroll through the history of the systems that power reddit.com, looking at things that worked, things that didn't, and where we're going next.

Question: 

What's the focus of the work that you're doing at Reddit?

Answer: 

We're doing a mishmash of things right now. At the core of it is building an environment that lets the product teams at Reddit act: build the things users want to see, feel confident about shipping them, and do that as quickly as possible, while balancing all of this against performance.

Question: 

So Netflix calls it a “Paved Road”. What does the paved road at Reddit look like?

Answer: 

We have a couple of main supported archetypes. We've got the frontend stack, which is Node-based, and the backend side, which is primarily in Python right now. We have a shared library for all of those services to use, and it brings with it things like automatic tracing, metrics collection, logging, and so on. Underlying all of that is a set of shared Puppet modules that handle log shipping, metrics collection, and the like. This reduces the amount of work each person has to do when launching a new service.

This is all pretty new. One of the weird things about Reddit is that for probably 10 of the 12 years it has existed, the entire engineering team has been between five and 10 people. Only in the last two years have we really started growing, so we've started needing to get rid of the monolith and figure out how to split it up into services. We're also focused on how to deal with all these new people.

Question: 

What's the infrastructure like there?

Answer: 

It's entirely on AWS, and right now we are not using any containers in production, so it's all just standard instances. Deployment can basically be summed up as a nice automated for loop.
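
To make the "automated for loop" description concrete, here is a minimal Python sketch of a rolling deploy over plain instances. It assumes the deploy step and the health check are supplied as callables; `rolling_deploy`, `deploy_to_host`, and `is_healthy` are hypothetical names, not Reddit's actual tooling.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    hosts: Iterable[str],
    deploy_to_host: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to each host in turn, halting at the first unhealthy host.

    Returns the hosts that were updated and passed their health check.
    """
    updated: List[str] = []
    for host in hosts:
        # Hypothetical step: push new code to the instance and restart it.
        deploy_to_host(host)
        if not is_healthy(host):
            # Stop the loop so a bad build never reaches the remaining hosts.
            raise RuntimeError(
                f"deploy halted: {host} failed its health check "
                f"after {len(updated)} host(s) were updated"
            )
        updated.append(host)
    return updated
```

Halting on the first failed health check keeps the blast radius of a bad deploy to a single instance; a fuller version might also deploy in batches or roll back the failed host.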

Question: 

What have been the major steps in the evolution of Reddit's architecture?

Answer: 

An early step was the jump from a data center to the cloud. There's some interesting stuff there in terms of how network latency changed drastically and how the architecture had to adapt. One of the major components of Reddit is listings on the site, like a subreddit or a comments page. Those have gone through a ton of iterations in how they are stored, precomputed, and fetched on the fly. Another has been dealing with the huge amount of data we have.
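
As one illustration of what "precomputing" a listing can mean, the Python sketch below does the ranking work when a vote arrives, so serving a listing is just a slice of an already-sorted list. This is an assumed, simplified design (in-memory storage, raw vote score as the rank, a hypothetical `ListingCache` class), not a description of how Reddit actually stores or ranks listings.

```python
from collections import defaultdict
from typing import Dict, List


class ListingCache:
    """Precompute each subreddit's listing at write time so reads are a slice.

    Hypothetical sketch: scores live in memory and the listing is re-sorted on
    every vote; a production system would persist and batch this work.
    """

    def __init__(self, max_len: int = 1000) -> None:
        self.max_len = max_len
        self.scores: Dict[str, Dict[str, int]] = defaultdict(dict)
        self.listings: Dict[str, List[str]] = {}

    def record_vote(self, subreddit: str, link_id: str, delta: int) -> None:
        scores = self.scores[subreddit]
        scores[link_id] = scores.get(link_id, 0) + delta
        # Write path: re-rank once here (highest score first, ties by id)
        # and keep only the top max_len links.
        ranked = sorted(scores, key=lambda lid: (-scores[lid], lid))
        self.listings[subreddit] = ranked[: self.max_len]

    def get_listing(self, subreddit: str, limit: int = 25) -> List[str]:
        # Read path: no sorting, just a slice of the precomputed list.
        return self.listings.get(subreddit, [])[:limit]
```

The trade-off is that writes get more expensive so that the far more frequent reads stay cheap.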

The other major thing is that over the last couple of years we have grown immensely, so we've been figuring out how to deal with a bunch of new people: starting to split up the monolith, working out what services look like for us and what we're doing with them, and figuring out how to create autonomy for the teams in the company.

Question: 

So was it a Python monolith, and now it's Python microservices?

Answer: 

Way back, the first version of Reddit was written in Lisp. About a year in, it was rewritten in Python, and R2 is now the current monolith. That is being split up into Python microservices. Well, we're just calling them services, because we're not sure about the micro part.

Question: 

What do you want someone who comes to your talk to walk away with? What lessons have you learnt from this journey?

Answer: 

That it's never done. Don't let the perfect be the enemy of the good. And know that a lot of the trade-offs involved are actually human, not technical.

Question: 

Who is the target persona for your talk?

Answer: 

Basically anybody who is going through the growth curve and trying to figure out what to do next, both with traffic scaling and people scaling.

Question: 

Can you give me an insight into what you might talk about when it comes to scaling an engineering team of 10, to a service based environment with 100 people? What were the lessons learnt?

Answer: 

Some things are really obvious in retrospect; coordination, for example, becomes really important. But there are also certain things, such as thinking designs through more up front, that become way more important, because designs become harder to change once different teams sit on either side of a boundary. Communication is a big part of it.

Question: 

What's the technology problem that keeps you up at night?

Answer: 

How to give people in the organization autonomy, and get them to feel empowered and able to ship things as quickly as makes sense, without compromising the security and stability of the site in the process.

Speaker: Neil Williams

Leads Infrastructure Team @Reddit

Neil Williams is an engineer at Reddit who's spent the past few years figuring out how to break things less often.