QCon

Growing from the Few to the Many: Scaling the Operations Organization at Facebook

Location: Grand Ballroom B/C
Time: Monday, 2:50pm - 3:40pm
Abstract: 

Billions of interactions take place on Facebook every day -- people connect with each other, share photos and other content, discover new things, and more. Facebook has built out a massive, global infrastructure to enable all this activity, and when problems arise in the operation of that infrastructure, we need to react quickly to minimize any potential disruption for the more than 1 billion people who rely on Facebook in all these ways.

 

In this talk, I will describe how our operations organization maintains an infrastructure that is both highly reliable and highly scalable, focusing on the challenges we've faced and the lessons we've learned. Topics will include how we make prioritization calls and manage technical debt; how our organizational structures have evolved over time; and how we handle incident management.

Pedro Canahuati leads the Infrastructure Production Engineering and Site Reliability teams at Facebook, where he oversees teams scaling the infrastructure and making sure Facebook is available 24x7. Prior to this, Pedro was Director of Operations at BuzzMedia and Qloud. Before that, he used his network and systems knowledge to build datacenters and scale web operations at companies like NameMedia, Relera and Verio/NTT.