Realtime and Batch Processing of GPU Workloads

Abstract

SS&C Technologies runs $47 trillion of assets on our global private cloud. We provide infrastructure primitives as well as platforms as a service such as Kubernetes, Kafka, NiFi, and databases. A year ago we broke ground and went live with AI as a service, providing RAG and inference for embeddings, LLM text, image, and voice, and we needed an efficient, low-TCO platform to power the needs of the business. Our centralized AI Gateway includes a prioritized job scheduler that we wrote, and we will discuss how over 300 production use cases run workloads in a way that meets their required SLAs while keeping GPU costs down. We also run on AWS around the globe, and we will discuss how the platform works in a multi-cloud environment, keeping costs down in different ways on AWS while still meeting SLAs.
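
The abstract does not spell out the scheduler's internals, so the following is only a minimal sketch, assuming a single fixed pool of GPU slots and integer priorities where a lower number means more urgent: a heap-ordered queue lets realtime inference jump ahead of batch work such as nightly embedding jobs. The names here (GpuJobScheduler, submit, worker) are hypothetical and not SS&C's API.

```python
import heapq
import itertools
import threading
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Job:
    priority: int                              # lower value = more urgent (realtime before batch)
    seq: int                                   # tie-breaker keeps FIFO order within a priority level
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)


class GpuJobScheduler:
    """Toy prioritized scheduler for a fixed pool of GPU slots (illustrative only)."""

    def __init__(self, gpu_slots: int) -> None:
        self._queue: list[Job] = []
        self._seq = itertools.count()
        self._slots = threading.Semaphore(gpu_slots)   # models the limited GPU capacity
        self._cond = threading.Condition()

    def submit(self, name: str, priority: int, run: Callable[[], None]) -> None:
        """Enqueue a job; realtime callers pass a lower priority number than batch callers."""
        with self._cond:
            heapq.heappush(self._queue, Job(priority, next(self._seq), name, run))
            self._cond.notify()

    def worker(self) -> None:
        """Pop the most urgent job and run it as soon as a GPU slot frees up."""
        while True:
            with self._cond:
                while not self._queue:
                    self._cond.wait()
                job = heapq.heappop(self._queue)
            with self._slots:
                job.run()


# Realtime chat inference (priority 0) jumps ahead of a nightly embedding batch (priority 9).
sched = GpuJobScheduler(gpu_slots=2)
threading.Thread(target=sched.worker, daemon=True).start()
sched.submit("nightly-embeddings", 9, lambda: time.sleep(0.1))
sched.submit("chat-completion", 0, lambda: time.sleep(0.01))
time.sleep(0.5)   # give the daemon worker a moment to drain the queue in this demo
```

In a production gateway the priorities would presumably be derived from SLA tiers and the slots from the actual GPU inventory per region or cloud, but the ordering idea is the same.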


Speaker

Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Former Apache Kafka Committer and PMC Member

Joe Stein is an Architect, Developer, and Security Professional with over 25 years of experience. He has worked on production environments (mostly running Apache Kafka at the core, and most often within a containerized environment) at Bloomberg, Verizon, EMC, CrowdStrike, Cisco, Bridgewater Associates, MUFG Union Bank, and US Bank. He was an Apache Kafka Committer and PMC member from January 2012 to August 2016. Currently he is the Principal Architect of Research & Development at SS&C Technologies.


From the same track

Session

How Netflix Shapes our Fleet for Efficiency and Reliability

Wednesday Nov 19 / 11:45AM PST

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics, and the Netflix products evolve, the cost-optimal hardware and configuration for running our services are constantly changing.


Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services


Argha C

Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live

Session

From ms to µs: OSS Valkey Architecture Patterns for Modern AI

Wednesday Nov 19 / 02:45PM PST

As AI applications demand faster and more intelligent data access, traditional caching strategies are hitting performance and reliability limits. 


Dumanshu Goyal

Uber Technical Lead @Airbnb Powering $11B Transactions, Formerly @Google and @AWS

Session

One Platform to Serve Them All: Autoscaling Multi-Model LLM Serving

Wednesday Nov 19 / 10:35AM PST

AI teams are moving away from hosted LLMs to self-hosted inference as fine-tuning drives model performance. The catch is scale: hundreds of variants create long-tail traffic, cold starts, and duplicated stacks.


Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist

Session

Write-Ahead Intent Log: A Foundation for Efficient CDC at Scale

Wednesday Nov 19 / 03:55PM PST

As companies grow, so does the complexity of keeping distributed systems in sync. At DoorDash, we tackled this challenge while building a high-throughput, domain-oriented data platform for capturing changes across hundreds of services.