Azure Cosmos DB: Low Latency and High Availability at Planet Scale

Azure Cosmos DB is a fully managed, multi-tenant, distributed, shared-nothing, horizontally scalable database that provides planet-scale capabilities and multi-model APIs for Apache Cassandra, MongoDB, Gremlin, Tables, and Core (SQL). It currently powers many mission-critical services both within Microsoft (such as Microsoft Teams and Active Directory) and across large-scale Fortune 500 organizations (such as Walmart and Adobe).

This talk covers the internal architecture of Azure Cosmos DB and how it achieves high availability, low latency, and scalability. We will first cover the design of the storage engine, with particular emphasis on ensuring high availability and scalability through partitioning and replication. Next, we will zoom in on the request routing gateway to see how it has evolved to solve the well-known multi-tenant cloud infrastructure challenges of containing noisy neighbors and limiting blast radius. Lastly, we will discuss performance as a feature and as a culture. We will cover what we measure and how we think about SLOs to achieve and maintain low latency. 

Building planet-scale services necessitates solving complex scalability challenges and making numerous tradeoffs across various components in the product. We look forward to sharing our experiences and lessons learned in building Azure Cosmos DB.


Speaker

Mei-Chin Tsai

Partner Director of Software Eng Manager @Microsoft

Mei-Chin Tsai is an Engineering Director at Microsoft, responsible for the Azure Cosmos DB developer experience. She leads the charge to evolve a frictionless developer experience for Azure Cosmos DB, from the Software Development Kit and request routing gateway to OSS APIs and tooling (such as Notebook and Portal). She was previously the Development Manager for .NET Runtime and C# in Microsoft’s Developer Division. Mei-Chin graduated from the University of Illinois at Urbana-Champaign with a Ph.D. in Computer Science. She joined Microsoft in 1994 and was one of the original developers on .NET. She is passionate about scalability, performance, quality, and developer experience, and she is committed to growing and mentoring people. In her spare time, she loves to travel and is an avid tennis player.

Speaker

Vinod Sridharan

Principal Software Engineering Architect @Microsoft

Vinod Sridharan is a Principal Software Engineering Architect at Microsoft responsible for the Azure Cosmos DB APIs. He works on the design and architecture of the core components that power them: the gateway and the supporting distributed service infrastructure. Across various components including storage, transport, load balancing, and routing, Vinod drives low latency, high availability, and performance throughout the Azure Cosmos DB service. In his spare time, Vinod loves to travel, sing, and go hiking.

From the same track

Session

Honeycomb: How We Used Serverless to Speed Up Our Servers

Wednesday Oct 26 / 11:50AM PDT

Honeycomb is the state of the art in observability: customers send us lots of data and then compose complex, ad-hoc queries. Most are simple, some are not. Some are REALLY not; this load is complex, spontaneous, and urgent.

Jessica Kerr

Principal Developer Evangelist @honeycombio

Session

From Zero to A Hundred Billion: Building Scalable Real Time Event Processing At DoorDash

Wednesday Oct 26 / 01:40PM PDT

At DoorDash, real-time events are an important data source for gaining insight into our business, but building a system capable of handling billions of real-time events is challenging.

Allen Wang

Software Engineer @DoorDash

Session

Magic Pocket: Dropbox’s Exabyte-Scale Block Storage System

Wednesday Oct 26 / 02:55PM PDT

Magic Pocket is used to store all of Dropbox’s data.

Facundo Agriel

Software Engineer / Tech Lead @Dropbox

Session

AYAWA Panel

Wednesday Oct 26 / 04:10PM PDT

Details coming soon.