Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.
In the presentation titled Stripe’s Docdb: How Zero-Downtime Data Movement Powers Trillion-Dollar Payment Processing, Jimmy Morzaria, a Staff Software Engineer at Stripe, delves into Stripe's custom-built document database, DocDB, and its data movement platform, which enables scalable and reliable payment processing.
Key Points:
- Scale and Reliability: Stripe processes over $1 trillion in payments annually using DocDB, handling over 5 million queries per second across more than 2,000 database shards with 5.5 nines of reliability.
- Architecture and Evolution: Stripe started with MongoDB and evolved its deployment into DocDB, introducing a database proxy service and a control plane to handle exponential growth.
- Zero-Downtime Data Movement: The engineering approach to data movement ensures zero downtime through strategies like version gating and bidirectional replication, crucial for horizontal scalability and database upgrades.
- Use Cases and Flexibility: The platform allows Stripe to add or merge database shards, facilitate version upgrades, and support transitions between multi-tenancy and single tenancy environments.
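
To make the version-gating idea above concrete, here is a minimal in-memory sketch. It is not Stripe's implementation: the `Shard`, `RoutingTable`, and `Proxy` classes and the `switch_traffic` function are hypothetical names invented for illustration, and the "replication" step is a plain dictionary copy. It assumes that each request carries a routing version and that a shard rejects requests with a version older than the one it requires, which forces a stale proxy to refresh its route.

```python
from dataclasses import dataclass, field


class StaleVersionError(Exception):
    """Raised when a request carries an older routing version than the shard requires."""


@dataclass
class Shard:
    name: str
    min_version: int = 1  # lowest routing version this shard will serve
    data: dict = field(default_factory=dict)

    def write(self, version: int, key: str, value: str) -> None:
        if version < self.min_version:
            raise StaleVersionError(self.name)
        self.data[key] = value


@dataclass
class RoutingTable:
    version: int
    owner: Shard  # shard that currently owns the data


class Proxy:
    """Hypothetical database proxy that caches the route it last saw."""

    def __init__(self, table: RoutingTable) -> None:
        self.table = table
        self.refresh()

    def refresh(self) -> None:
        self.cached_version = self.table.version
        self.cached_owner = self.table.owner

    def write(self, key: str, value: str) -> str:
        try:
            self.cached_owner.write(self.cached_version, key, value)
        except StaleVersionError:
            # The cached route is stale: reload it and retry once.
            self.refresh()
            self.cached_owner.write(self.cached_version, key, value)
        return self.cached_owner.name


def switch_traffic(table: RoutingTable, source: Shard, target: Shard) -> None:
    new_version = table.version + 1
    source.min_version = new_version  # 1. gate: source rejects old-version writes
    target.data.update(source.data)   # 2. stand-in for draining replication to the target
    target.min_version = new_version
    table.owner = target              # 3. publish the new route at the new version
    table.version = new_version
```

In this sketch a proxy that still holds the old route is rejected by the source shard after the switch, refreshes, and lands its write on the target, so no write is applied to the wrong shard. A real system would also have to handle the window between steps 1 and 3, retries with backoff, and failure of the switch itself.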
Design Principles:
- Consistency: Ensures data integrity across source and target shards during migration.
- Availability: Minimal downtime is essential due to the critical nature of payment processing.
- Performance: Preserves database performance and throughput during data migrations.
- Scalability: Adaptable to varying sizes and numbers of database shards and migrations.
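
As an illustration of the consistency principle above, one possible way to check that a source and a target shard hold the same documents is to compare order-independent fingerprints of the two sets. This is a sketch under stated assumptions, not the check described in the talk: `chunk_fingerprint` is a hypothetical helper, and documents are assumed to be JSON-serializable dictionaries.

```python
import hashlib
import json


def chunk_fingerprint(docs):
    """Return (count, digest) for a collection of documents, independent of order."""
    total = 0
    count = 0
    for doc in docs:
        # Canonical encoding so that key order inside a document does not matter.
        canonical = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")
        doc_hash = int.from_bytes(hashlib.sha256(canonical).digest(), "big")
        # Modular addition is commutative, so document order does not matter either.
        total = (total + doc_hash) % (1 << 256)
        count += 1
    return count, total
```

Two shards whose fingerprints differ have certainly diverged. Matching fingerprints indicate equality only up to hash collisions, so a check like this complements rather than replaces a document-by-document comparison.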
Takeaways:
- Reliability is non-negotiable: Stripe prioritizes API reliability as a key differentiator, with the data movement platform designed to maintain application uptime.
- Invest in foundations: Stripe's data movement platform supports a variety of use cases, from scaling to version upgrades, while maintaining reliability and performance.
This is the end of the AI-generated content.
Abstract
Stripe processes over $1 trillion in payments annually with industry-leading reliability, powered by its custom-built document database, DocDB, built on top of open source MongoDB. Stripe's DocDB serves over five million queries per second from Stripe’s product applications. Our deployment is also highly customized to provide low latency and diverse access, with 10,000+ distinct query shapes over petabytes of important financial data that lives in 5,000+ collections distributed over 2,000+ database shards.
This session unveils the engineering behind Stripe’s DocDB and dives deep into its Data Movement Platform, a foundational primitive that allows us to migrate terabytes of data across database shards at lightning speed and is key to providing a durable, reliable, scalable, and efficient database-as-a-service to product teams at Stripe. We’ll explore the challenges of scaling the online database backing a trillion-dollar payment infrastructure while ensuring data consistency and high reliability. Key learnings include optimizing bulk data ingestion in MongoDB and implementing a traffic switch protocol for seamless, zero-downtime data movement across database shards. Attendees will gain practical insights for building high-availability database systems and navigating complex migrations without compromising performance.
Interview:
What is your session about, and why is it important for senior software developers?
My talk covers the core architectural strategies for horizontally scaling open-source MongoDB to handle millions of queries per second over petabytes of data with high reliability and low latency in multi-tenant environments. A key focus will be on the critical role of a robust, zero-downtime data movement platform in enabling elastic scaling without impacting the performance and availability of your workloads.
Why is it critical for software leaders to focus on this topic right now, as we head into 2026?
As we head toward 2026, reliability isn't just a feature—it's the ultimate moat in an era of AI-driven scale. At the core of this reliability challenge are databases. Investing in the foundational primitives to build a reliable, scalable and performant database platform for product engineering now fortifies the data backbone of your systems, turning potential single points of failure into scalable strengths that sustain growth amid scaling pressures.
What are the common challenges developers and architects face in this area?
Developers and architects face significant challenges when horizontally scaling stateful systems such as databases. These challenges primarily revolve around increased architectural and operational complexity, and around maintaining data consistency, high availability, and optimal query performance during migrations in a distributed environment.
What's one thing you hope attendees will implement immediately after your talk?
I hope attendees audit their core infrastructure systems—like databases—for hidden reliability gaps right away: map out failure modes in a quick chaos drill or dependency graph, then prioritize one fix. It's the spark that turns "good enough" systems into unbreakable ones, and we've seen it work well with teams at Stripe.
What makes QCon stand out as a conference for senior software professionals?
QCon goes beyond hype with talks from builders who've solved the same hairy problems you face, plus ample time for peer discussions that yield fresh perspectives about the problems top of mind for you.
Speaker
Jimmy Morzaria
Staff Software Engineer @Stripe, Previously Software Engineer on Amazon QLDB and Amazon Managed Streaming for Kafka
Jimmy is a Staff Software Engineer at Stripe, where he contributes to developing secure, reliable, scalable, and efficient database infrastructure to support Stripe’s mission of increasing the GDP of the internet by building its economic infrastructure, a platform that processes over $1 trillion in payments annually. Previously, he spent over five years at Amazon Web Services, contributing to the development of a greenfield database service, Amazon Quantum Ledger Database, and a managed streaming platform, Amazon Managed Streaming for Kafka. Outside of tackling large-scale distributed systems challenges, Jimmy loves traveling, hiking, and spending time with friends and family, always seeking new experiences to fuel his creativity.