Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcome. Please reach out to info@qconsf.com with any comments or concerns.
The presentation "Parting the Clouds: The Rise of Disaggregated Systems" examines how cloud system architecture is evolving toward disaggregation. The speaker, Murat Demirbas, highlights the following key points:
- Shift in Cloud Architecture: Traditional shared-nothing designs struggle to deliver the elasticity, availability, and operational simplicity that the cloud demands. Disaggregated systems decouple compute, storage, and logs, which allows each to scale independently, enables faster failover, and provides shared durable storage.
- Economic and Operational Motivations: The decoupling is driven by economic and operational needs such as elastic scaling, fault isolation, and resource pooling. Compute is expensive and its demand fluctuates rapidly, whereas storage is cheap and its demand is comparatively stable, so it pays to scale the two separately.
- Technological Advances Enable Disaggregation: Faster networks and interconnect technologies such as RDMA and CXL have made disaggregated systems feasible.
- Benefits of Disaggregation: These include elastic scalability, fault isolation, faster recovery, simplified operations, and customer cost benefits via pay-per-use models.
- Challenges and Design Trade-offs: Disaggregation introduces new challenges, including added latency, synchronization overhead, and the risk of metastable failures. The talk discusses mitigations such as prefetching, faster interconnect fabrics, and intelligent caching.
- Connection to Distributed Systems Principles: Modern disaggregated architectures build on foundational distributed systems ideas. Paxos, with its separate roles of proposers, acceptors, and learners, serves as a model for splitting a system into roles while building in fault tolerance.
- Future Directions: The talk closes with an outlook on self-assembling databases that could reconfigure themselves automatically as workloads shift, taking advantage of advances in AI and hardware specialization.
This is the end of the AI-generated content.
Abstract
Cloud systems are undergoing an architectural shift. Traditional shared-nothing designs struggle to deliver the elasticity, availability, and operational simplicity that the cloud demands. The new generation of cloud systems (Amazon Aurora, Microsoft Socrates, Google AlloyDB, Snowflake, S3 Athena, Neon) embraces disaggregation, decoupling compute, storage, and, increasingly, logs. This separation allows independent scaling, faster failover, and shared durable storage across tenants, but it also brings new performance tradeoffs and system design challenges.
In this talk, I will survey why disaggregation matters, what we have learned from the past decade of industrial systems, and where research and practice are headed next. I will explain the economic and operational motivations (elastic scaling, fault isolation, pooling) and discuss how disaggregation reshapes core database components like logging, recovery, and concurrency control. I'll discuss how design choices like "log-as-database", shared-storage replication, and caching tiers affect throughput, latency, and cost.
Finally, I’ll connect these modern architectures back to the timeless ideas in distributed systems, using Lamport's proposers, acceptors, and learners as a way to reason about how we disaggregate coordination, availability, durability, and computation. Viewed through this lens, disaggregation is our new Paxos: an architecture for separating roles so systems can scale, fail, and recover gracefully.
Speaker
Murat Demirbas
Principal Research Scientist @MongoDB Research, previously Principal Applied Scientist @AWS and Professor of Computer Science at the University at Buffalo (SUNY)
Murat Demirbas is a Principal Research Scientist at MongoDB Research. Before joining MongoDB, he was a Principal Applied Scientist at AWS for 3 years, and a Professor of Computer Science at the University at Buffalo (SUNY) for 16 years. His work spans distributed systems and databases, with contributions to hybrid logical clocks, WPaxos, PigPaxos, and Paxos Quorum Reads. He received the NSF CAREER Award in 2008 and the UB School of Engineering Senior Researcher of the Year Award in 2016. Murat writes a widely read blog on distributed systems at http://muratbuffalo.blogspot.com, with over 5.6 million views.