Growing from the Few to the Many: Scaling the Operations Organization at Facebook
Billions of interactions take place on Facebook every day: people connect with each other, share photos and other content, discover new things, and more. Facebook has built out a massive, global infrastructure to enable all this activity. When problems arise in the operation of that infrastructure, we need to react quickly to minimize any potential disruption for the more than 1 billion people who rely on Facebook.
In this talk, I will describe how our operations organization works to maintain an infrastructure that is both highly reliable and highly scalable, focusing on some of the challenges we've faced and the lessons we've learned. Topics will include how we make prioritization calls and manage technical debt, how our organizational structures have evolved over time, and how we handle incident management.