Presentation: Creating A Culture of Observability at Stripe

Duration: 11:50am - 12:40pm

Key Takeaways

  • Learn how to approach observability in your company
  • Understand what observability looks like for Stripe
  • See how important observability is to the engineering and operations process

Abstract

It's common to hear that an organization needs more observability, but what does that mean?

How do you change the culture of a company such that these needs are addressed sooner rather than later? I've got some ideas, and I've been trying them out at Stripe. Let's review how it's gone and talk about what worked and what didn't.

Let's talk about people, their needs and how to make them — and your observability — awesome.

Interview

Question: 
QCon: You work in Observability at Stripe. What does that mean for Stripe?
Answer: 

Cory: Stripe didn’t really have clear ownership of who did monitoring, who did metrics collection, who did all these other things.

I decided that there needed to be an observability team. We effectively took over all the pieces of ‘monitoring and observability’ within Stripe, put them under one umbrella, and fixed the broken windows. We are building the best platform we can to make our engineers as smart as possible about the state of their systems.

Question: 
QCon: What does observability look like for Stripe?
Answer: 

Cory: At Stripe, it’s an amalgamation of different tools. At its core, it’s a problem of runtime metric collection. Whether it is push or pull is unimportant, but we are push-based, so we have a lot of metrics streaming out at runtime. We have a lot of logs being collected at the same time and a lot of signals coming from other systems. The sum of all of those parts is what we would call observability.

Question: 
QCon: Your talk isn’t about necessarily the architecture, but creating a culture of observability on that architecture. Can you talk a bit about your goals?
Answer: 

Cory: I am going to give a talk about the work that we are doing and be honest about our progress, not try to present us as a company with everything figured out.

We had metrics, logs, and dashboards, but it wasn’t a cultural thing. I think culturally, what you have to do is teach people that this is a very important part of the engineering and operations process.

This is not just something you bolt on after the fact. This is something that needs to be considered from the genesis of your system. How do we teach an established company to include this as part of its culture right from the beginning?

Not everything works, so some of the ideas we had didn’t work and some of the biases that we had coming in didn’t work. I think that it’s really interesting to explore this culturally. How do you convince other people to do this work? That is the basis of the whole talk.

Question: 
QCon: What will you cover in the talk?
Answer: 

Cory: Probably the biggest decision point works like a choose-your-own-adventure. If you are trying to decide, “How do I embark on something like this in my organization?”, the decision you start with is: what kind of organization are you in?

Are you in a very devops-style org where ideas come from all over the place, or a more traditional org where the CTO has come and said, “We need observability because I heard about it at a conference”? (Nothing wrong with that, it’s great!)

I think that knowing the answer to that question really helps you decide how you should proceed. There are a lot of different tactics that we tried, that worked, that didn’t work, that have worked for me in other organizations, that didn’t work here.

One of the biggest was trying to find ways to do this with the smallest number of people on your team. If someone gave you this task and said, “Here are 20 engineers to do this work,” you would have a mandate and some energy.

But how do you do this when it is one person or two people or three people? It’s really all about finding champions in the organization that you can leverage.

One of the biggest points for us was finding these champions, and not only empowering them, but also learning from them. Learning from people empowers them, which lets you do this on a smaller budget, whether in terms of people, time, or money. It also makes the process better, because they feed new information back into your team, which you can use to make an even better product so that the next person is even more successful than the one you engaged with first.

Question: 
QCon: What is the core persona that you are talking to, that you want to address with your talk? Is it an architect, developer? Is it CTO?
Answer: 

Cory: I think that it’s very heavily weighted toward people in leadership positions because they are here to lead and to make change happen.

If you are an engineer, but you are not an executive, how do you effect change?

How do you find a way to communicate this and make it such that your people are interested? How do you show people that there is value in a change without being dictatorial and forceful about it?

I’m going to give people ideas and tactics for how to make this sort of change. You can be a CTO or an architect or a rank-and-file engineer.

Speaker: Cory Watson

Observability Specialist @Stripe

Cory G Watson is an Observability Engineer at Stripe. He's been an OSS contributor for 20 years, and observability has been a common thread in all his work, from collection and monitoring to charting libraries. Previously he was Principal Infrastructure Engineer at Keen IO and managed the Observability team at Twitter.

Conference for Professional Software Developers