Stripe’s Docdb: How Zero-Downtime Data Movement Powers Trillion-Dollar Payment Processing

Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.

In the presentation titled Stripe’s Docdb: How Zero-Downtime Data Movement Powers Trillion-Dollar Payment Processing, Jimmy Morzaria, a Staff Software Engineer at Stripe, delves into Stripe's custom-built document database, DocDB, and its data movement platform, which enables scalable and reliable payment processing.

Key Points:

  • Scale and Reliability: Stripe processes over $1 trillion in payments annually using DocDB, handling over 5 million queries per second across more than 2,000 database shards with 5.5 nines of reliability.
  • Architecture and Evolution: Stripe started with MongoDB and evolved it into DocDB, introducing a database proxy service and a control plane to handle exponential growth.
  • Zero-Downtime Data Movement: The engineering approach to data movement ensures zero downtime through strategies like version gating and bidirectional replication, crucial for horizontal scalability and database upgrades.
  • Use Cases and Flexibility: The platform allows Stripe to add or merge database shards, facilitate version upgrades, and support transitions between multi-tenancy and single tenancy environments.

Design Principles:

  • Consistency: Ensures data integrity across source and target shards during migration.
  • Availability: Minimal downtime is essential due to the critical nature of payment processing.
  • Performance: Preserves database performance and throughput during data migrations.
  • Scalability: Adaptable to varying sizes and numbers of database shards and migrations.

Takeaways:

  • Reliability is non-negotiable: Stripe prioritizes API reliability as a key differentiator, with the data movement platform designed to maintain application uptime.
  • Invest in foundations: Stripe's data movement platform supports a variety of missions, from scaling to version upgrades, maintaining reliability and performance.

This is the end of the AI-generated content.


Abstract

Stripe processes over $1 trillion in payments annually with industry-leading reliability, powered by its custom-built document database, DocDB, built on top of open source MongoDB. Stripe's DocDB serves over five million queries per second from Stripe’s product applications. Our deployment is also highly customized to provide low latency and diverse access, with 10,000+ distinct query shapes over petabytes of important financial data that lives in 5,000+ collections distributed over 2,000+ database shards.

This session unveils the engineering behind Stripe’s DocDB and dives deep into its Data Movement Platform, a foundational primitive that allows us to migrate terabytes of data across database shards at lightning speed and is key to providing a durable, reliable, scalable, and efficient database-as-a-service to product teams at Stripe. We’ll explore the challenges of scaling the online database backing a trillion-dollar payment infrastructure while ensuring data consistency and high reliability. Key learnings include optimizing bulk data ingestion in MongoDB and implementing a traffic switch protocol for seamless, zero-downtime data movement across database shards. Attendees will gain practical insights for building high-availability database systems and navigating complex migrations without compromising performance.
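To make the idea of a version-gated traffic switch more concrete, here is a minimal in-memory sketch in Python. It is not Stripe's implementation: the names Shard, RoutingTable, Proxy, and switch_traffic are hypothetical, and the dictionary copy merely stands in for draining replication from source to target. The assumption illustrated is that each proxy tags requests with the routing version it holds, and a shard rejects any request whose version is older than the minimum it currently accepts, so stale routes fail fast and are retried against the new owner.

```python
from dataclasses import dataclass, field


class StaleVersionError(Exception):
    """Raised when a request carries a version token older than the shard accepts."""


@dataclass
class Shard:
    # Hypothetical stand-in for a database shard.
    name: str
    min_version: int = 0
    data: dict = field(default_factory=dict)

    def _check(self, version):
        if version < self.min_version:
            raise StaleVersionError(self.name)

    def write(self, key, value, version):
        self._check(version)
        self.data[key] = value

    def read(self, key, version):
        self._check(version)
        return self.data[key]


@dataclass
class RoutingTable:
    # Hypothetical routing metadata: which shard owns the data, at which version.
    shard: Shard
    version: int = 0


class Proxy:
    """Caches a route and retries once after refreshing it on a stale-version error."""

    def __init__(self, table):
        self.table = table
        self._refresh()

    def _refresh(self):
        self.shard, self.version = self.table.shard, self.table.version

    def _call(self, op, *args):
        try:
            return getattr(self.shard, op)(*args, self.version)
        except StaleVersionError:
            self._refresh()
            return getattr(self.shard, op)(*args, self.version)

    def write(self, key, value):
        self._call("write", key, value)

    def read(self, key):
        return self._call("read", key)


def switch_traffic(table, source, target):
    """Move ownership from source to target behind a version gate."""
    new_version = table.version + 1
    # 1. Gate the source: requests routed with the old version are now rejected.
    source.min_version = new_version
    # 2. Stand-in for letting replication catch up on the target.
    target.data.update(source.data)
    target.min_version = new_version
    # 3. Publish the new route so refreshed proxies reach the target.
    table.shard, table.version = target, new_version
```

In this sketch a proxy that still holds the old route gets a StaleVersionError from the source shard, refreshes its route, and retries against the target, so the caller sees a slightly slower request rather than a failure. A real system would also need to handle the window between gating the source and publishing the new route, which this single-threaded example collapses into one step.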

Interview:

What is your session about, and why is it important for senior software developers?

My talk covers the core architectural strategies to horizontally scale open-source MongoDB to handle millions of queries per second over petabytes of data with high reliability and low latency in multi-tenant environments. A key focus will be on the critical role of a robust, zero-downtime data movement platform in enabling elastic scaling without impacting the performance and availability of your workloads.

Why is it critical for software leaders to focus on this topic right now, as we head into 2026?

As we head toward 2026, reliability isn't just a feature—it's the ultimate moat in an era of AI-driven scale. At the core of this reliability challenge are databases. Investing in the foundational primitives to build a reliable, scalable and performant database platform for product engineering now fortifies the data backbone of your systems, turning potential single points of failure into scalable strengths that sustain growth amid scaling pressures.

What are the common challenges developers and architects face in this area?

Developers and architects face significant challenges when horizontally scaling stateful systems such as databases. These primarily revolve around increased architectural and operational complexity, and around maintaining data consistency, high availability, and optimal query performance during migrations in a distributed environment.

What's one thing you hope attendees will implement immediately after your talk?

I hope attendees audit their core infrastructure systems—like databases—for hidden reliability gaps right away: map out failure modes in a quick chaos drill or dependency graph, then prioritize one fix. It's the spark that turns "good enough" systems into unbreakable ones, and we've seen it work well with teams at Stripe.

What makes QCon stand out as a conference for senior software professionals?

QCon goes beyond hype with talks from builders who've solved the same hairy problems you face, plus ample time for peer discussions that yield fresh perspectives about the problems top of mind for you.


Speaker

Jimmy Morzaria

Staff Software Engineer @Stripe, Previously Software Engineer on Amazon QLDB and Amazon Managed Streaming for Kafka

Jimmy is a Staff Software Engineer at Stripe, where he contributes to developing secure, reliable, scalable, and efficient database infrastructure to support Stripe’s mission of increasing the GDP of the internet by building the economic infrastructure—a platform that processes over $1 trillion in payments annually. Previously, he spent over five years at Amazon Web Services, contributing to the development of a greenfield database service - Amazon Quantum Ledger Database and a managed streaming platform - Amazon Managed Streaming for Kafka. Outside of tackling large scale distributed systems challenges, Jimmy loves traveling, hiking, and spending time with friends and family, always seeking new experiences to fuel his creativity.

From the same track

Session AI/ML

Modernizing Relevance at Scale: LinkedIn’s Migration Journey to Serve Billions of Users

Tuesday Nov 18 / 11:45AM PST

How do you deliver relevant and personalized recommendations to nearly a billion professionals—instantly, reliably, and at scale? At LinkedIn, the answer has been a multi-year journey of architectural reinvention.

Nishant Lakshmikanth

Engineering Manager @LinkedIn, Leading Infrastructure for "People You May Know" and "People Follows", Previously @AWS and @Cisco

Session Architecture

Monolith Down: Cleaning Up After the Great Identity Migration Disaster

Tuesday Nov 18 / 10:35AM PST

One does not simply migrate a monolith. Imagine a team working on a monolith-to-microservices migration of a healthcare portal. A foundational first step - migrating to a commercial identity provider - takes 9 months, only to bring the entire portal crashing down on release day.

Sonya Natanzon

VP of Engineering @Heartflow, Decomplexifier, Software Architect, Healthcare and Life Sciences Specialist, and International speaker

Session

Unconference: Navigating Major Architecture Migrations

Tuesday Nov 18 / 01:35PM PST

Session Architecture

Accelerating Netflix Data: A Cross-Team Journey from Offline to Online

Tuesday Nov 18 / 05:05PM PST

At Netflix, certain use cases demand the rapid transfer of massive datasets—such as 50 TB—from offline to online systems. Doing this efficiently, without disrupting applications interacting with our online systems, presents a significant challenge.

Rajasekhar Ummadisetty

Software Engineer @Netflix - Driving Scalable Data Abstractions, Leader in Distributed Systems and Data Management, Previously @Amazon and @Facebook

Ken Kurzweil

Software Engineer @Netflix - Leading a Data Movement Team Focused on Data Infrastructure Innovation, Previously @Amazon, @Shutterfly, and @Gannett Media

Session Migration

Migrating Uber Eats Feeds to Webview

Tuesday Nov 18 / 03:55PM PST

Uber Eats has many surfaces developed using native-first design. Historically these were built on the Android and iOS stacks. To accelerate development and enable rapid iteration and experimentation, while preserving the native-first design, a webview-powered stack was developed.

Nick DiStefano

Sr Staff Engineer @Uber, Previously iOS Lead @Tumblr