Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Creating A Culture of Observability at Stripe
Key Takeaways
- Learn how to approach observability in your company
- Understand what observability looks like for Stripe
- See how important observability is to the engineering and operations process
Abstract
It's common to hear that an organization needs more observability, but what does that mean?
How do you change the culture of a company so that these needs are addressed sooner rather than later? I've got some ideas, and I've been trying them out at Stripe. Let's review how it's gone and talk about what worked and what didn't.
Let's talk about people, their needs, and how to make them (and your observability) awesome.
Interview
Cory: Stripe didn’t really have clear ownership of monitoring, metrics collection, and all these other things.
I decided that there needed to be an observability team. We took over all the pieces of ‘monitoring and observability’ within Stripe, put them under one umbrella, fixed the broken windows, and we are building the best platform we can to make our engineers as smart as possible about the state of their systems.
Cory: At Stripe, it’s an amalgamation of different tools. At its core, it’s a problem of runtime metric collection. Whether collection is push- or pull-based is unimportant, but we are push-based, so we have a lot of metrics streaming out at runtime. We also collect a lot of logs at the same time, along with many signals coming from other systems. The sum of all of those parts is what we would call observability.
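To make “push-based” concrete: in a push model, each service emits metrics as events happen, rather than waiting for a collector to scrape it. The sketch below assumes a StatsD-compatible agent listening on UDP port 8125 on localhost; the helper names, metric names, and the DogStatsD-style tag suffix are hypothetical illustrations, not Stripe’s actual tooling.

```python
import socket
from typing import Dict, Optional


def format_counter(name: str, value: int = 1, tags: Optional[Dict[str, str]] = None) -> str:
    """Build a StatsD-style counter line such as 'api.requests:1|c'."""
    line = f"{name}:{value}|c"
    if tags:
        # Hypothetical DogStatsD-style tag suffix; plain StatsD servers may reject it.
        # Tags are sorted so the output is deterministic.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def push_counter(name: str, value: int = 1, tags: Optional[Dict[str, str]] = None,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget push of one counter over UDP (assumed local agent)."""
    payload = format_counter(name, value, tags).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP keeps the hot path cheap: the service never blocks on the collector.
        sock.sendto(payload, (host, port))
```

For example, `format_counter("api.requests", 3, {"region": "us", "env": "prod"})` yields `api.requests:3|c|#env:prod,region:us`. The fire-and-forget UDP send is a common design choice for push pipelines because a slow or missing collector cannot stall the application, at the cost of occasionally dropped datapoints.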
Cory: I am going to give a talk about the work that we are doing and be honest about our progress, rather than try to present us as a company with everything figured out.
We had metrics, logs, and dashboards, but it wasn’t a cultural thing. I think culturally, what you have to do is teach people that this is a very important part of the engineering and operations process.
This is not just something you bolt on after the fact. It needs to be considered from the genesis of your system. How do you teach the people in an established company to make this part of their culture right from the beginning?
Not everything works: some of the ideas we had didn’t work, and some of the biases we came in with didn’t hold up. I think it’s really interesting to explore this culturally. How do you convince other people to do this work? That is the basis of the whole talk.
Cory: Think of it as a choose-your-own-adventure. If you are trying to decide “How do I embark on something like this in my organization?”, the first and probably biggest decision point is: what kind of organization are you in?
Are you in a very devops-style org where ideas come from all over the place, or a more traditional org where the CTO has come and said “We need observability because I heard about it at a conference”? (Nothing wrong with that, it’s great!)
I think that knowing the answer to that question really helps you decide how to proceed. We tried a lot of different tactics: some worked, some didn’t, and some that worked for me in other organizations didn’t work here.
One of the biggest was finding ways to do this with the smallest number of people on your team. If someone gave you this task and said, “Here are 20 engineers to do this work,” you’ve got a mandate and you’ve got some energy.
But how do you do this with one, two, or three people? It’s really all about finding champions in the organization that you can leverage.
One of the biggest points for us was finding these champions, and not only empowering them but also learning from them. Empowering them lets you do this on a smaller budget, whether in terms of people, time, or money. Learning from them also makes the process better: they feed new information back into your team, which you can use to build an even better product, so that the next person you engage is even more successful than the first.
Cory: I think that this kind of advice is usually very heavily weighted toward people in leadership positions, because they are there to lead and to make change happen.
If you are an engineer but not an executive, how do you effect change?
How do you find a way to communicate this and make it such that your people are interested? How do you show people that there is value in a change without being dictatorial and forceful about it?
I’m going to give people ideas and tactics for how to make this sort of change. You can be a CTO or an architect or a rank and file engineer.