Realtime and Batch Processing of GPU Workloads

Abstract

SS&C Technologies manages $47 trillion in assets on our global private cloud. We provide infrastructure primitives as well as platforms as a service such as Kubernetes, Kafka, NiFi, and databases. A year ago we broke ground and went live with AI as a service, providing RAG and inference for embeddings, LLM text, image, and voice, and we needed an efficient, low-TCO platform to power the needs of the business. Our centralized AI Gateway includes a prioritized job scheduler that we wrote, and we will discuss how over 300 production use cases run workloads in a way that meets the required SLAs while keeping GPU costs down. We also run on AWS around the globe, and will discuss how the platform works in a multi-cloud environment, meeting SLAs while keeping costs down in different ways on AWS.
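The abstract does not detail the scheduler's internals, but the core idea of a prioritized job queue can be sketched briefly. This is a minimal illustration, not SS&C's implementation; all names (`PriorityScheduler`, `submit`, `next_job`, the priority values) are hypothetical, and it assumes lower numbers mean higher priority so realtime inference jumps ahead of batch work on shared GPUs.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                     # lower = more urgent (e.g. realtime=0, batch=9)
    seq: int                          # tie-breaker: FIFO within the same priority
    name: str = field(compare=False)  # job name is not part of the ordering

class PriorityScheduler:
    """Hypothetical sketch of a prioritized job scheduler: realtime jobs
    are dequeued ahead of batch jobs, so interactive SLAs are met while
    batch work soaks up idle GPU capacity."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, name, priority):
        heapq.heappush(self._heap, Job(priority, next(self._counter), name))

    def next_job(self):
        return heapq.heappop(self._heap).name if self._heap else None

# Batch jobs are queued first, but a realtime request jumps ahead of them.
sched = PriorityScheduler()
sched.submit("nightly-embeddings", priority=9)
sched.submit("chat-inference", priority=0)
sched.submit("batch-summaries", priority=9)
print(sched.next_job())  # chat-inference
```

A real GPU scheduler would layer preemption, fairness, and capacity awareness on top of this ordering, but the priority-then-FIFO dequeue shown here is the basic mechanism that lets latency-sensitive and cost-driven batch workloads share the same hardware.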


Speaker

Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Former Apache Kafka Committer and PMC Member

Joe Stein is an architect, developer, and security professional with over 25 years of experience. He has worked on production environments (mostly with Apache Kafka at the core, and most often within a containerized environment) at Bloomberg, Verizon, EMC, CrowdStrike, Cisco, Bridgewater Associates, MUFG Union Bank, and US Bank. He was an Apache Kafka Committer and PMC member from January 2012 to August 2016. He is currently the Principal Architect of Research & Development at SS&C Technologies.


From the same track

Session

How Netflix Shapes our Fleet for Efficiency and Reliability

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.


Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services


Argha C

Staff Software Engineer @Netflix Building Highly Available, High Throughput Systems

Session

From ms to µs: OSS Valkey Architecture Patterns for Modern AI

As AI applications demand faster and more intelligent data access, traditional caching strategies are hitting performance and reliability limits. 


Dumanshu Goyal

Software Engineer @Airbnb - Leading Online Data Priorities, Previously @Google and @AWS

Session

One Platform to Serve Them All: Autoscaling Multi-Model LLM Serving

AI teams are moving from hosted LLMs to self-hosted inference as fine-tuning drives model performance. The catch is scale: hundreds of variants create long-tail traffic, cold starts, and duplicated stacks.


Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist

Session

Cost-Conscious Cloud: Designing Systems that Don't Break the Bank

Details coming soon.