High-Resolution Platform Observability

Many observability tools fail to provide us with the relevant insights for understanding hardware health and utilization. Whether due to incomplete instrumentation of key components or resolution that's too coarse to capture brief or intermittent disturbances, we're often left with gaps in our understanding and unanswered questions. In this talk, we'll explore techniques and technologies for getting a more detailed and comprehensive understanding of hardware health and utilization. With more comprehensive hardware telemetry, we can pinpoint issues or exonerate components. We can finally answer the questions we have and unlock better performance.


Speaker

Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Brian is a software engineer who focuses on performance optimization and distributed systems. He worked at Twitter for 8 years, initially with the Cache Team and later as a member of the newly created Performance Team. After November 2022, Brian joined his teammates from Twitter as a co-founder of IOP Systems and continues to work on improving software and platform performance, efficiency, and reliability.

From the same track

Session

Evaluating and Deploying State-of-the-Art Hardware to Meet the Challenges of Modern Workloads

Wednesday Nov 20 / 01:35PM PST

At GEICO, we are on a journey to entirely modernize our infrastructure. We are building an open-source, cloud-agnostic hybrid stack to run across public and on-prem private cloud infrastructure without having to expose vendor-specific stacks to our application developers.

Rebecca Weekly

VP of Infrastructure @GEICO

Session

Maximizing Deep Learning Performance: Hardware and Software Innovations for Optimizing AI Workloads

Wednesday Nov 20 / 10:35AM PST

As deep learning continues to drive advancements across various industries, efficiently navigating the landscape of specialized AI hardware has a huge impact on the cost and speed of operation.

Bibek Bhattarai

AI Technical Lead @Intel, Computer Scientist Invested in Hardware-Software Optimization, Building Scalable Data Analytics, Mining, and Learning Systems

Session

Optimizing Custom Workloads with RISC-V

Wednesday Nov 20 / 11:45AM PST

This talk will explore how RISC-V architecture can accelerate custom workloads, focusing on AI/ML applications. We’ll start by examining the RISC-V ecosystem and its increasing relevance in the software development landscape.

Ludovic Henry

Member of Technical Staff @Rivos, Performance-Minded Engineer, Hardware & Software, Previously @Xamarin, @Microsoft, @Datadog

Session

Unleashing Llama's Potential: CPU-Based Fine-Tuning

Wednesday Nov 20 / 03:55PM PST

Details coming soon.

Anil Rajput

AMD Fellow, Software System Design Eng., Java Committee Chair @SPEC, Architected Industry Standard Benchmarks and Authored Best Practices Guides for Platform Engineering and Cloud

Dr. Rema Hariharan

Principal Engineer @AMD, Seasoned Performance Engineer With a Base in Quantitative Sciences and a Penchant for Root-Causing