Evaluating and Deploying State-of-the-Art Hardware to Meet the Challenges of Modern Workloads

At GEICO, we are on a journey to modernize our infrastructure entirely. We are building an open-source, cloud-agnostic hybrid stack that runs across public cloud and on-prem private cloud infrastructure without exposing vendor-specific stacks to our application developers. This hybrid stack gives us the flexibility to run workloads wherever we need them, and to migrate significant workloads from the public cloud to our on-prem infrastructure where doing so improves their cost or latency.

Through that process, we had to select new colocation facilities (moving from 6 facilities to 3 better-balanced, geo-distributed sites), Open Hardware servers (based on the workload characteristics of our legacy and cloud footprints, leveraging OpenBMC and Redfish for management), Open Network solutions (switches, routers, and our own NOS for those systems), and OpenStack (including Ceph for SDS) to deliver fleet management solutions across our on-prem footprint.

This change is driving cost savings of 30% to 3x per workload relative to the equivalent capacity, latency, and uptime at our current cloud providers. We have also completely redesigned our on-prem network and servers, moving from an isolated demilitarized-zone network approach on MPLS circuits to a fully untrusted network (only decrypting where the user/account/application is allowed to have access) using direct internet access, and we have profoundly simplified our hardware SKUs (going from over 200 instances in the public cloud down to 5 primary solutions and 15 specialty solutions, the latter to be phased out as our applications modernize).

In this session, we will walk through the hardware selection process: taking our workload characteristics from the cloud and using them to optimize a subset of SKUs for our on-prem cloud.


Speaker

Rebecca Weekly

VP of Infrastructure @GEICO

Rebecca is VP of Platform and Infrastructure Engineering at GEICO, leading their hybrid cloud transformation to repatriate key workloads, develop and deliver a true hybrid open-source stack, and modernize their physical infrastructure. She recently led the organization that built, validated, and automated the full lifecycle management of Cloudflare’s compute, network, storage, and AI systems in 300+ cities and 100+ countries, delivering >20% of the world’s Internet traffic. Rebecca is the former Open Compute Project President and Chairperson, where she helped ensure that hyperscale innovation can be scaled to all organizations. She is on Fortune’s 40 Under 40 2020 list of the most influential people in technology and on Business Insider's 2022 Cloudverse100 list of the builders of the next generation of the Internet, and she was voted CloudGirls Trailblazer for women in technology in 2023. In her "spare" time, she is the lead singer of the funk and soul band Sinister Dexter and pursues her passion for dance and choreography. She has two amazing little boys and loves to run (after them, and on her own). Rebecca graduated from MIT with a degree in Computer Science and Electrical Engineering.

Date

Wednesday Nov 20 / 01:35PM PST (50 minutes)

Location

Seacliff ABC

From the same track

Session

Maximizing Deep Learning Performance: Hardware and Software Innovations for Optimizing AI Workloads

Wednesday Nov 20 / 10:35AM PST

As deep learning continues to drive advancements across various industries, efficiently navigating the landscape of specialized AI hardware has a huge impact on the cost and speed of operation.

Bibek Bhattarai

AI Technical Lead @Intel, Computer Scientist Invested in Hardware-Software Optimization, Building Scalable Data Analytics, Mining, and Learning Systems

Session

High-Resolution Platform Observability

Wednesday Nov 20 / 02:45PM PST

Many observability tools fail to provide us with the relevant insights for understanding hardware health and utilization.

Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Session

Optimizing Custom Workloads with RISC-V

Wednesday Nov 20 / 11:45AM PST

This talk will explore how RISC-V architecture can accelerate custom workloads, focusing on AI/ML applications. We’ll start by examining the RISC-V ecosystem and its increasing relevance in the software development landscape.

Ludovic Henry

Member of Technical Staff @Rivos, Performance-Minded Engineer, Hardware & Software, Previously @Xamarin, @Microsoft, @Datadog

Session

Unleashing Llama's Potential: CPU-Based Fine-Tuning

Wednesday Nov 20 / 03:55PM PST

Details coming soon.

Anil Rajput

AMD Fellow, Software System Design Eng. Java Committee Chair @SPEC, Architected Industry Standard Benchmarks and Authored Best Practices Guides for Platform Engineering and Cloud

Dr. Rema Hariharan

Principal Engineer @AMD, Seasoned Performance Engineer With a Base in Quantitative Sciences and a Penchant for Root-Causing