Conference: Nov 13-15, 2017
Workshops: Nov 16-17, 2017
Presentation: Scaling Instagram Infrastructure
Level:
- Intermediate
Persona:
- Architect
Key Takeaways
- Learn some of the issues and solutions Instagram’s Infrastructure team had around scalability.
- Hear how Instagram improved single-server capacity and reduced network latency.
- Learn some of the tools, techniques, and metrics Instagram uses to support 500 million monthly users.
Abstract
Instagram is a social network mobile app that allows people to share the world's moments as they happen. It serves 300 million users on a daily basis throughout the world.
In this talk, we will give an overview of the infrastructure that supports its users at this scale.
Topics will include:
- a brief history of infrastructure evolution
- overall architecture and multi-data center support
- tuning of uWSGI parameters for scaling
- performance monitoring and diagnosis
- and the Django/Python upgrade (why, challenges, and lessons learned)
Interview
Lisa: I am a software engineer on the Instagram Infrastructure Team. Our team’s main purpose is to keep the scalability of our systems up. While doing that, we identify both short term and long term fixes around scale. Additionally, we work closely with many other teams on the product side to help them to identify bottlenecks and make suggestions related to scale when they are shipping new features to our users.
Lisa: We are serving more than 500 million monthly active users, with 300M of them on Instagram every day.
Lisa: Our web tier stack is Django with Python, and we have backend services using Cassandra, MySQL, and Memcached. Those are basically our storage systems. We use Facebook’s Everstore as our photo storage. We also have an async tier with RabbitMQ and Celery.
Lisa: We do use containers. We basically use Linux LXC, or a variant of it. Facebook has its own Tupperware container system, which has also been discussed publicly. It’s a wrapper around LXC.
We moved from AWS to Facebook’s data center about two years ago. When we made that move, we expanded to multiple data centers.
Lisa: The rationale is really just about accessing Facebook’s servers more conveniently. Otherwise, you always have the firewall and things like that in between. So we really could not take advantage of some of the things Facebook had, like monitoring and scaling. Aside from that, I think there were some VDM limitations that caused us issues around data replication.
Lisa: I will discuss different aspects of scaling: horizontal, vertical, and scaling the dev team. I will talk about how we scaled to multiple data centers, how we define scaling up, and what tools we use and have built to identify scaling bottlenecks. I will also cover what we have done to enable product development velocity and our release process. Along with the things we have achieved, we’ll discuss some of the continuing challenges and our plans to address them.