Live Resharding Without Regret: Lessons from Building Valkey's Atomic Slot Migration

Abstract

Sharding is easy. Resharding under heavy load is notoriously difficult. How do you move gigabytes of state across live database nodes without dropping keys, blocking the main event loop, or breaking client abstractions?

Using Valkey and Redis as case studies, we'll survey different resharding architectures and dive deep into Valkey's new Atomic Slot Migration. We'll walk through the practical tradeoffs of these approaches, covering client redirections (MOVED/ASK), fork-based slot snapshotting, and rollback staging. Along the way, we'll shine a light on the edge cases that actually matter in production.