Abstract
In late 2024, Netflix made a bet: consolidate the vast majority of our relational database use cases onto a single engine, Amazon Aurora PostgreSQL. This talk uses that consolidation as a case study in platform engineering: how we turned risky, bespoke database migrations into a repeatable internal platform capability.
Over 12 months, we migrated 100+ unique workloads off a third-party, distributed, PostgreSQL-compatible store and onto Aurora PostgreSQL. The workloads varied widely in size, query patterns, availability requirements, and team risk tolerance, and included use cases with tens of terabytes of data and business-critical traffic. To make this tractable, we built a migration platform that provided automated data movement, byte-for-byte validation, resumable workflows, and transparent cutovers with only single-digit minutes of write downtime. It standardized the common path while preserving escape hatches for the hard cases.
We’ll cover both the technical and organizational sides of the effort: how we minimized application-team churn, built confidence in our tooling, worked with external and internal partners, and coordinated a high-risk infrastructure change across hundreds of stakeholders without turning the platform team into a blocking approval gate.
Takeaways:
- How to turn a large migration program into an internal platform capability with reusable tooling, standardized workflows, and clear escape hatches.
- How to migrate business-critical relational workloads with only minutes of write downtime while keeping application-team effort low.
- How to build trust in migration tooling through validation harnesses, byte-for-byte correctness checks, resumable execution, and transparent cutovers.
- How to make build-vs-buy decisions when vendor tools and internal platform requirements do not fully overlap.
- How to coordinate a high-risk infrastructure change across many teams without adding overhead for developer teams.