Many API backends can be scaled by deploying multiple instances, adding a load balancer in front of them, and pointing clients at the load balancer. Unfortunately, this simple plan doesn't work well when API requests can take more than a few seconds and when we have to deal with sudden bursts of requests. This is exactly what happens when deploying compute-heavy inference workloads on Kubernetes, particularly "generative AI" workloads such as large language models and Stable Diffusion models.
This is the kind of scenario where asynchronous architectures and message queues can save the day (or at least considerably improve it) by buffering requests and decoupling clients and servers (which become producers and consumers).
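To make the idea concrete, here is a minimal in-process sketch of the buffering pattern, using Python's standard `queue` and `threading` modules as a stand-in for a real message broker. The names `fake_inference`, `consumer`, and `run_burst` are hypothetical and chosen for illustration only; this is not the workshop's actual code.

```python
import queue
import threading
import time


def fake_inference(prompt: str) -> str:
    """Hypothetical stand-in for a slow model call."""
    time.sleep(0.01)  # pretend the model takes a while
    return prompt.upper()


def consumer(requests: queue.Queue, responses: dict) -> None:
    """Pull requests off the queue until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:  # sentinel: no more work for this consumer
            break
        request_id, prompt = item
        responses[request_id] = fake_inference(prompt)


def run_burst(prompts: list, num_consumers: int = 2) -> dict:
    """Enqueue a burst of prompts and let a fixed pool of consumers drain it."""
    requests: queue.Queue = queue.Queue()
    responses: dict = {}
    workers = [
        threading.Thread(target=consumer, args=(requests, responses))
        for _ in range(num_consumers)
    ]
    for worker in workers:
        worker.start()
    # The producer returns immediately after enqueueing; it never waits
    # for a consumer to be free, which is what absorbs the burst.
    for request_id, prompt in enumerate(prompts):
        requests.put((request_id, prompt))
    for _ in workers:
        requests.put(None)  # one sentinel per consumer
    for worker in workers:
        worker.join()
    return responses


if __name__ == "__main__":
    print(run_burst(["hello", "world", "queues"]))
```

The producer side stays fast regardless of how slow the consumers are; the queue simply grows during a burst and shrinks afterwards. In a real deployment the queue would live in a separate broker so that producers and consumers can be scaled independently.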
In this hands-on workshop, we will implement, deploy, and scale an application using a large language model on Kubernetes. We will leverage open source components such as:
- RabbitMQ and PostgreSQL to store requests and responses
- Benthos to implement API servers, producers, and consumers without writing code
- Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling
- Helm and Helmfile to automate deployment as much as possible
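As a rough illustration of autoscaling driven by queue depth, the sketch below computes a replica count from the number of pending messages. It is a simplified assumption for illustration, not KEDA's actual algorithm or configuration; the function name `desired_replicas` and its parameters are hypothetical.

```python
import math


def desired_replicas(
    queue_length: int,
    target_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 10,
) -> int:
    """Return how many consumers to run for a given backlog.

    Assumption: each replica should handle roughly `target_per_replica`
    pending messages, and the result is clamped to [min_replicas, max_replicas].
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for backlog in (0, 12, 1000):
        print(backlog, desired_replicas(backlog, target_per_replica=5))
```

With a minimum of zero, an empty queue lets the consumers scale away entirely, which matters when each replica holds expensive compute resources.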
This workshop is for:
- Data scientists who have been asked to deploy their models on Kubernetes
- Ops folks who have been asked to support their fellow data scientists
- Everyone in between!
Key Takeaways
1. An understanding of the challenges associated with the deployment and scaling of "Gen AI" and similar compute-heavy workloads.
2. Best practices and tools (such as Benthos and KEDA) to implement asynchronous data pipelines and autoscaling on Kubernetes.
3. An open source repository with all the samples, code, and configurations used during the workshop.
Speaker
Jérôme Petazzoni
Founder @Tiny Shell Script LLC, Cofounder @Enix France
Jérôme was part of the team that built, scaled, and operated the dotCloud PaaS, before that company became Docker. He worked seven years at the container startup, where he wore countless hats and ran containers in production before it was cool. He loves to share what he knows, which led him to give hundreds of talks and demos on containers, Docker, and Kubernetes. He trained thousands of people to deploy their apps with confidence on these platforms, and continues to do so as an independent consultant. He values diversity, and strives to be a good ally, or at least a decent social justice sidekick. He also collects musical instruments and can arguably play the theme of Zelda on a dozen of them.