Realtime and Batch Processing of GPU Workloads

Abstract

SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We provide infrastructure primitives as well as platforms as a service such as Kubernetes, Kafka, NiFi, and databases. A year ago we broke ground and went live with AI as a service, providing RAG, inference for embeddings, LLM text, image, and voice, and we needed an efficient, low-TCO platform to power the needs of the business. Our centralized AI Gateway has a prioritized job scheduler that we wrote, and we will discuss how over 300 production use cases run their workloads in a way that meets the required SLAs while keeping GPU costs down. We also run on AWS around the globe and will discuss how the platform works in a multi-cloud environment, where we keep costs down in different ways while still meeting SLAs.


Speaker

Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Former Apache Kafka Committer and PMC Member

Joe Stein is an Architect, Developer, and Security Professional with over 25 years of experience. He has worked on production environments (mostly with Apache Kafka at the core, and most often within a containerized environment) at Bloomberg, Verizon, EMC, CrowdStrike, Cisco, Bridgewater Associates, MUFG Union Bank, and US Bank. He was also an Apache Kafka Committer and PMC member from Jan 2012 to Aug 2016. Currently, he is the Principal Architect of Research & Development at SS&C Technologies.

From the same track

Session

How Netflix Shapes our Fleet for Efficiency and Reliability

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics, and Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.

Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services

Argha C

Staff Software Engineer @Netflix Building Highly Available, High Throughput Systems