Track:

Architectures You've Always Wondered About

Duration:

1:40pm - 2:30pm

Level:

Intermediate

Persona:

Architect

Key Takeaways

Learn some of the issues and solutions Instagram’s Infrastructure team had around scalability.
Hear how Instagram improved single server capacity and improved network latency.
Learn some of the tools, techniques, and metrics Instagram uses to support 500 million monthly users.

Abstract

Instagram is a social network mobile app that allows people to share the world's moments as they happen. It serves 300 million users daily throughout the world.

In this talk, we will give an overview of the infrastructure that supports its users at this scale.

Topics will include:

a brief history of infrastructure evolution
overall architecture and multi-data center support
tuning of uWSGI parameters for scaling
performance monitoring and diagnosis
and Django/Python upgrade (why, challenges, and lessons learned)

Interview

Question:

QCon: What are the main problems you are focused on today?

Answer:

Lisa: I am a software engineer on the Instagram Infrastructure Team. Our team’s main purpose is to keep our systems scalable. While doing that, we identify both short-term and long-term fixes around scale. Additionally, we work closely with many other teams on the product side to help them identify bottlenecks, and we make suggestions related to scale when they are shipping new features to our users.

Question:

QCon: What can you share about the scale Instagram is dealing with?

Answer:

Lisa: We are serving more than 500 million monthly active users, with 300M of them on Instagram every day.

Question:

QCon: What does the stack look like for Instagram?

Answer:

Lisa: Our web tier stack is Django with Python, and we have backend services using Cassandra, MySQL, and Memcached. Those are basically our storage devices. We use Facebook’s Everstore as our photo storage. We also have an async tier with RabbitMQ and Celery.
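
To make the division of labor between these tiers concrete, here is a minimal sketch of one common way a web tier, a cache, a persistent store, and an async queue cooperate. It is not Instagram's code: the cache-aside pattern, the function names (`get_profile`, `update_profile`, `run_worker`), the task name, and the in-memory stand-ins for Memcached, MySQL/Cassandra, and RabbitMQ/Celery are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical in-memory stand-ins, not real client libraries.
cache = {}                                    # plays the role of Memcached
database = {"user:1": {"name": "example"}}    # plays the role of MySQL/Cassandra
task_queue = deque()                          # plays the role of RabbitMQ + Celery


def get_profile(key):
    """Cache-aside read: try the cache, fall back to the store, then fill the cache."""
    if key in cache:
        return cache[key], "cache"
    value = database[key]
    cache[key] = value
    return value, "database"


def update_profile(key, value):
    """Write to the store, invalidate the cached copy, and defer slow work."""
    database[key] = value
    cache.pop(key, None)
    # Assumed task name; the real async workload is not described in the interview.
    task_queue.append(("notify_followers", key))


def run_worker():
    """Drain deferred tasks, as an async worker would outside the request path."""
    done = []
    while task_queue:
        done.append(task_queue.popleft())
    return done
```

In this sketch the first read of a key goes to the store and later reads are served from the cache until a write invalidates it, while the queued task keeps slow follow-up work off the request path.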

Question:

QCon: What about other aspects of the infrastructure? Does Instagram leverage containers? Are you cloud-based or on-prem?

Answer:

Lisa: We do use containers. We basically use a variant of Linux LXC. Facebook has its own Tupperware container system, which has also been discussed publicly. It’s a wrapper around LXC.

We moved from AWS to Facebook’s data center about two years ago. When we made that move, we expanded to multiple data centers.

Question:

QCon: This move from AWS to on-prem is interesting. What drove the move to Facebook’s infrastructure?

Answer:

Lisa: The rationale is really just about accessing Facebook’s servers more conveniently. Otherwise, you always have the firewall and things like that in between. So we really could not take advantage of some of the things Facebook had, like monitoring and scaling. Aside from that, I think there were some VDM limitations that caused us issues around data replication.

Question:

QCon: Can you tell me a bit about some of the things you plan to discuss in your talk?

Answer:

Lisa: I will discuss different aspects of scaling: horizontal, vertical, and the scale of the dev team. I will talk about how we scaled to multiple data centers; how we define scale up, and what tools we use and have built to identify scaling bottlenecks; and what we have done to enable product development velocity and our release process. Along with the things we have achieved, we’ll discuss some of the continued challenges and our plans to address them.

Speaker: Lisa Guo

Software Engineer @Instagram

Lisa has been a software engineer on Instagram infrastructure for the past 2.5 years. She has worked on various aspects of the backend, most recently focusing on the efficiency of the platform. Prior to Instagram, she worked on several Software Defined Networking projects on the Facebook infrastructure team. Life prior to social networks involved physical networking devices such as routers, switches, and security appliances at Juniper Networks and other networking companies.

Find Lisa Guo at

Speaker page

Similar Talks

Designed For Deployment

Developer @ThoughtWorks Inc

Badri Janakiraman

Managing Big Storage Clusters

Tech Lead of Manhattan Team @Twitter

Boaz Avital

JVMs across the Data Center

Staff Engineer, JVM Team @Twitter

John Coomes

JVMs across the Data Center

Technical Manager Aurora / Mesos Team @Twitter

Ian Downes

Hardware & Provisioning Engineering @Twitter

Provisioning Engineering SE @Twitter

Nik Johnson

Hardware & Provisioning Engineering @Twitter

Staff Hardware Engineer @Twitter

Matt Singer

Stranger Things: The Forces that Disrupt Netflix

Senior Software Engineer, Playback Features @Netflix

Haley Tucker

99.99% Availability via Smart Real-Time Alerting

Data Science Manager @Uber

Franziska Bell

Creating A Culture of Observability at Stripe

Observability Specialist @Stripe

Cory Watson

Tracks

Monday Nov 7

Architectures You've Always Wondered About

You know the names. Now learn lessons from their architectures

Distributed Systems War Stories

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Lamport.

Containers Everywhere

State of the art in Container deployment, management, scheduling

Art of Relevancy and Recommendations

Lessons on the adoption of practical, real-world machine learning practices. AI & Deep learning explored.

Next Generation Web Standards, Frameworks, and Techniques

JavaScript, HTML5, WASM, and more... innovations targeting the browser

Optimize You

Keeping life in balance is a challenge. Learn lifehacks, tips, & techniques for success.

Tuesday Nov 8

Next Generation Microservices

What will microservices look like in 3 years? What if we could start over?

Java: Are You Ready for This?

Real world lessons & prepping for JDK9. Reactive code in Java today, Performance/Optimization, Where Unsafe is heading, & JVM compile interface.

Big Data Meets the Cloud

Overviews and lessons learned from companies that have implemented their Big Data use-cases in the Cloud

Evolving DevOps

Lessons/stories on optimizing the deployment pipeline

Software Engineering Softskills

Great engineers do more than code. Learn their secrets and level up.

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas

Wednesday Nov 9

Architecting for Failure

Your system will fail. Take control before it takes you with it.

Stream Processing

Stream Processing, Near-Real Time Processing

Bare Metal Performance

Native languages, kernel bypass, tooling - make the most of your hardware

Culture as a Differentiator

The why and how for building successful engineering cultures

//TODO: Security <-- fix this

Building security from the start. Stories, lessons, and innovations advancing the field of software security.

UX Reimagined

Bots, virtual reality, voice, and new thought processes around design. The track explores the current art of the possible in UX and lessons from early adoption.

Conference for Professional Software Developers

Presentation: Scaling Instagram Infrastructure