Observability & SRE at QCon San Francisco 2025

At QCon San Francisco 2025, discover the emerging trends and practices in Observability & SRE directly from the senior practitioners who are defining what's next.

November 17–21, 2025

Hyatt Regency, San Francisco

Early Bird Deadline November 11th

Conference: $2,970

Secure early bird savings - deadline coming soon!
Need to convince your boss? Use our templates.

Observability & SRE sessions at QCon San Francisco 2025

Nov 17

Continuous Delivery for Foundational Platforms

Platform teams frequently inherit systems that were never architected for their current scale, yet are so foundational that downtime can halt the business.

Ian Nowland

CEO @Junction Labs, Author of O'Reilly's Platform Engineering, Previously SVP Core Engineering at Datadog and Leader of AWS Nitro

Nov 17

Beyond Line Charts: Why Some Diversity in Telemetry Visualization Is Long Overdue

For decades, visualization of service metrics overwhelmingly converges to line charts. The time-centric nature of real-time telemetry further cemented this phenomenon via storage layouts and domain-specific query languages.

Yao Yue

Platform Engineer, Distributed System Aficionado, Cache Expert, and the Founder of IOP Systems

Nov 17

Architecting a Centralized Platform for Data Deletion at Netflix

What does it take to safely delete data at Netflix scale? In large-scale systems, data deletion cuts across infrastructure, reliability, and performance complexities.

Vidhya Arvind

Tech Lead & a Founding Architect for the Data Abstraction Platform @Netflix, Previously @Box and @Verizon

Shawn Liu

Senior Software Engineer @Netflix, Building Reliable and Extensible Systems for Consumer Data Lifecycle at Scale

Nov 17

Enhancing Reliability Using Service-Level Prioritized Load Shedding at Netflix

How does Netflix maintain a seamless viewing experience for millions of users, especially during traffic spikes or when backend datastores are overloaded? Autoscaling can help during traffic spikes, but it costs money, takes a few minutes to kick in, and capacity may not always be available.

Anirudh Mendiratta

Staff Software Engineer, Playback Lifecycle @Netflix, Previously @Amazon Prime Video and @fuboTV

Benjamin Fedorka

Staff Software Engineer, Productivity Engineering @Netflix

Nov 18

Monolith Down: Cleaning Up After the Great Identity Migration Disaster

One does not simply migrate a monolith. Imagine a team working on a monolith-to-microservices migration of a healthcare portal. A foundational first step - migrating to a commercial identity provider - takes 9 months, only to bring the entire portal crashing down on release day.

Sonya Natanzon

VP of Engineering @Heartflow, Decomplexifier, Software Architect, Healthcare and Life Sciences Specialist, and International Speaker

Nov 18

Modernizing Relevance at Scale: LinkedIn’s Migration Journey to Serve Billions of Users

How do you deliver relevant and personalized recommendations to nearly a billion professionals—instantly, reliably, and at scale? At LinkedIn, the answer has been a multi-year journey of architectural reinvention.

Nishant Lakshmikanth

Engineering Manager @LinkedIn, Leading Infrastructure for "People You May Know" and "People Follows", Previously @AWS and @Cisco

Nov 19

The Human Toll of Incidents & Ways To Mitigate It

Have you ever wondered what it's like to respond to a significant incident? Walk through an hour-by-hour reconstruction of an incident response or two, focusing on what it was like to be "in the room" and on the human response to the incidents.

Kyle Lexmond

Production Engineer @Meta, Previously @AWS and @Twitter

Nov 19

Instrumentation at Scale: Having Your Performance Cake and Eating It Too

In high-performance code, a single misplaced counter increment can cost more than the operation it’s measuring. That creates a paradox: instrument too much and you slow the system down; instrument too little and you miss the insights you need to continuously deliver.

Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Nov 19

When Incidents Refuse to End

As engineers, we’re used to managing failure, but long-running outages hit differently. They stretch teams, systems, and assumptions about how incidents “should” play out.

Vanessa Huerta Granda

Resiliency Manager @Enova, Co-Author of the Howie Guide on Post Incident Analysis

Nov 19

How Netflix Shapes our Fleet for Efficiency and Reliability

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.

Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services

Argha C

Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live

Nov 19

Week-Long Outage: Lifelong Lessons

Routine database upgrades should be straightforward, especially with familiar, well-established technology. We were confident heading into our Elasticsearch upgrade, equipped with a solid plan and excited to see performance gains like we had seen from past upgrades.

Molly Struve

Staff Site Reliability Engineer @Netflix

Nov 19

The Time it Wasn't DNS

In January of 2023, the Microsoft Azure Wide Area Network experienced a global outage. If you were a Microsoft customer at the time, you were impacted by this outage.

Sean Klein

Principal Technical Program Manager - Modern Incident Analysis @Microsoft Azure

Explore the schedule

QCon is where you discover what’s next, from the senior practitioners building it. We focus on emerging patterns proven in production, sharing the unfiltered story: the real-world trade-offs, the hard-won lessons, and what it actually took to ship.

Dio Synodinos

President, C4Media (makers of InfoQ and QCon)

Conversations that turn insight into impact

QCon for your team

The scheduled sessions at QCon are the agenda, but the real value is in the unscripted moments: the whiteboard debates in an unconference, the candid advice over coffee, the speaker dinner stories about failures and trade-offs. That's the perspective you can't get from a screen.

Luca Mezzalira

Principal Solutions Architect, QCon Speaker, O'Reilly Author, YouTuber

QCon for your team

QCon is designed for senior practitioners to move ideas forward and solve problems with peers.

Connect with senior developers who understand your challenges. Whether brainstorming new ideas, exploring learning paths, or engaging in casual conversations, our social events and learning spaces are designed for all interaction styles, helping you leave with fresh insights, new connections, and actionable ideas.

Convince your boss

Need to get approval to attend QCon San Francisco 2025?
We’ve made it easier.

Download a ready-to-use “Convince Your Boss” template, perfect for sharing with your manager.

Get a PDF version of the Observability & SRE talks at the conference, perfect for sharing with your manager or teammates who want to see what’s covered.


Unlock your potential at QCon San Francisco 2025

  • Gain concrete strategies from 60+ hand-picked speakers across 12 curated tracks.

  • Real-world talks curated for depth and value, without hidden product pitches.

  • Network with peers at Unconferences, in the 'hallway track', during extended breaks, over lunch, and at conference socials.

  • Gain 12 months of on-demand access to session recordings after the conference to continue your learning journey.

We've helped thousands of senior software engineers, software architects and tech leaders adopt the right patterns & practices for over 20 years.