Presentation: Scaling Slack

Track: Architectures You've Always Wondered About

Location: Ballroom A

Duration: 2:55pm - 3:45pm

Day of week: Monday

Level: Intermediate

Persona: Architect, Technical Engineering Manager

Abstract

Slack is a communication and collaboration platform for teams. Our millions of users spend 10+ hrs connected to the service on a typical working day. They expect reliability, low latency, and extraordinarily rich client experiences across a wide variety of devices and network conditions. In this talk, we'll examine the limitations that Slack's backend ran into and how we overcame them to scale from supporting small teams to serving gigantic organizations of hundreds of thousands of users. We'll hear stories about the edge cache service and the real-time messaging system, and how they evolved for major product efforts including Grid and Shared Channels.

Interview

Question: 
What is the focus of your work today?
Answer: 

I work on the edge cache tier for Slack. The focus is to make the service more performant as our user base grows and more resilient to failures. The other important aspect of my work is supporting new product features for Slack. At Slack we are always product first.

Question: 
What’s the motivation for this talk?
Answer: 

I feel developers are generally interested in how systems work. I will give a high-level introduction to how Slack works and then focus on our two-year journey of scaling Slack. There were mistakes made and lessons learned. Other companies with similar growth may learn a thing or two from our experience.

Question: 
How would you describe the persona and level of the target audience?
Answer: 

Our ability to scale services excites me day to day. I think the problems that Slack is dealing with are highly relevant to architects, system engineers, full stack engineers, and site reliability engineers.

Question: 
What do you want “that” persona to walk away from your talk knowing that they might not have known 50 minutes before?
Answer: 

Building Slack is not as easy as it may appear to be. Users expect low latency, high performance, and an extremely rich user experience. Slack contains a rapidly changing dataset, and many of its components, like users, messages, files, and channels, reference each other. Changes to them need to be consistent across all clients. So with the rapid growth of our user base and request volume, we have to make fundamental changes to our architecture to accommodate the growth, in addition to incremental steps. I think that’s the main takeaway I want the audience to get.

Question: 
What technology problem keeps you up at night?
Answer: 

As the business keeps growing, we have to evolve Slack’s architecture to support 10X today’s scale. Say that tomorrow Slack’s biggest client decides to provision a few hundred thousand more users. The additional load of those users may strain or break the product in new and exciting ways. Sure, we have a load testing framework to simulate a team with a large number of users and channels, but we need to understand the new usage patterns and non-linear scaling of such a titanic team. The difficulty comes with the unknown unknowns and the bottlenecks we will encounter.

The other problem is around resiliency and fast failure recovery. If we have a system-wide failure, like losing the whole us-east region or having a big network partition, how do we minimize user-perceived failures and shorten the failure recovery time? This may include building new components in the architecture and new tools surrounding it.

Speaker: Bing Wei

Software Engineer @Slack

Bing Wei is a software engineer on the infrastructure team at Slack, working on its edge cache service. Before Slack, she was at Twitter, where she contributed to the open source RPC library Finagle, worked on core services for Tweets and Timelines, and led the migration of Tweet writes from the monolithic Rails application to JVM-based microservices.


Tracks

  • Architectures You've Always Wondered About

    Architectural practices from the world's most well-known properties, featuring startups, massive scale, evolving architectures, and software tools used by nearly all of us.

  • Going Serverless

    Learn about the state of Serverless & how to successfully leverage it! Lessons learned in the track hit on security, scalability, IoT, and offer warnings to watch out for.

  • Microservices: Patterns and Practices

    Stories of success and failure building modern Microservices, including event sourcing, reactive, decomposition, & more.

  • DevOps: You Build It, You Run It

    Pushing DevOps beyond adoption into cultural change. Hear about designing resilience, managing alerting, CI/CD lessons, & security. Features lessons from open source, LinkedIn, Netflix, Financial Times, & more.

  • The Art of Chaos Engineering

    Failure is going to happen - Are you ready? Chaos engineering is an emerging discipline - What is the state of the art?

  • The Whole Engineer

    Success as an engineer is more than writing code. Hear inward looking thoughts on inclusion, attitude, leadership, remote working, and not becoming the brilliant jerk.

  • Evolving Java

    Java continues to evolve & change. Track covers Spring 5, async, Kotlin, serverless, the 6-month cadence plans, & AI/ML use cases.

  • Security: Attacking and Defending

    Offensive and defensive security evolution that application developers should know about, including SGX Enclaves, effects of AI, software exploitation techniques, & crowd defense.

  • The Practice & Frontiers of AI

    Learn about machine learning in practice and on the horizon. Learn about ML at Quora, Uber's Michelangelo, ML workflow with Netflix Meson and topics on Bots, Conversational interfaces, automation, and deployment practices in the space.

  • 21st Century Languages

    Compile to Native, Microservices, Machine learning... tailor-made languages solving modern challenges, featuring use cases around Go, Rust, C#, and Elm.

  • Modern CS in the Real World

    Applied trends in Computer Science that are likely to affect Software Engineers today. Topics include category theory, crypto, CRDTs, logic-based automated reasoning, and more.

  • Stream Processing In The Modern Age

    Compelling applications of stream processing using Flink, Beam, Spark, Strymon & recent advances in the field, including Custom Windowing, Stateful Streaming, SQL over Streams.  

Conference for Professional Software Developers