Observability & SRE at QCon San Francisco 2025
At QCon San Francisco 2025, discover the emerging trends and practices in Observability & SRE directly from the senior practitioners who are defining what's next.
November 17–21, 2025
Hyatt Regency, San Francisco
Early Bird Deadline November 11th
Conference: $2,970
Secure early bird savings - deadline coming soon!
Need to convince your boss?
Use our templates.
Observability & SRE sessions at QCon San Francisco 2025
Nov 17
Continuous Delivery for Foundational Platforms
Platform teams frequently inherit systems that were never architected for their current scale, yet are so foundational that downtime can halt the business.
Ian Nowland
CEO @Junction Labs, Author of O'Reilly's Platform Engineering, Previously SVP Core Engineering at Datadog and Leader of AWS Nitro
Nov 17
Beyond Line Charts: Why Some Diversity in Telemetry Visualization Is Long Overdue
For decades, visualization of service metrics has overwhelmingly converged on line charts. The time-centric nature of real-time telemetry further cemented this phenomenon via storage layouts and domain-specific query languages.
Yao Yue
Platform Engineer, Distributed System Aficionado, Cache Expert, and the Founder of IOP Systems
Nov 17
Architecting a Centralized Platform for Data Deletion at Netflix
What does it take to safely delete data at Netflix scale? In large-scale systems, data deletion cuts across infrastructure, reliability, and performance complexities.
Vidhya Arvind
Tech Lead & a Founding Architect for the Data Abstraction Platform @Netflix, Previously @Box and @Verizon
Shawn Liu
Senior Software Engineer @Netflix, Building Reliable and Extensible Systems for Consumer Data Lifecycle at Scale
Nov 17
Enhancing Reliability Using Service-Level Prioritized Load Shedding at Netflix
How does Netflix maintain a seamless viewing experience for millions of users, especially during traffic spikes or when backend datastores are overloaded? Autoscaling can help during traffic spikes, but it costs money, takes a few minutes to kick in, and capacity may not always be available.
Anirudh Mendiratta
Staff Software Engineer, Playback Lifecycle @Netflix, Previously @Amazon Prime Video and @fuboTV
Benjamin Fedorka
Staff Software Engineer, Productivity Engineering @Netflix
Nov 18
Monolith Down: Cleaning Up After the Great Identity Migration Disaster
One does not simply migrate a monolith. Imagine a team working on a monolith-to-microservices migration of a healthcare portal. A foundational first step - migrating to a commercial identity provider - takes 9 months, only to bring the entire portal crashing down on release day.
Sonya Natanzon
VP of Engineering @Heartflow, Decomplexifier, Software Architect, Healthcare and Life Sciences Specialist, and International speaker
Nov 18
Modernizing Relevance at Scale: LinkedIn’s Migration Journey to Serve Billions of Users
How do you deliver relevant and personalized recommendations to nearly a billion professionals—instantly, reliably, and at scale? At LinkedIn, the answer has been a multi-year journey of architectural reinvention.
Nishant Lakshmikanth
Engineering Manager @LinkedIn, Leading Infrastructure for "People You May Know" and "People Follows", Previously @AWS and @Cisco
Nov 19
The Human Toll of Incidents & Ways To Mitigate It
Have you ever wondered what it's like to respond to a significant incident? Walk through an hour-by-hour reconstruction of an incident response or two, focusing on what it was like to be "in the room" and the human response to the incidents.
Kyle Lexmond
Production Engineer @Meta, Previously @AWS and @Twitter
Nov 19
Instrumentation at Scale: Having Your Performance Cake and Eating It Too
In high-performance code, a single misplaced counter increment can cost more than the operation it’s measuring. That creates a paradox: instrument too much and you slow the system down; instrument too little and you miss the insights you need to continuously deliver.
Brian Martin
Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter
Nov 19
When Incidents Refuse to End
As engineers, we’re used to managing failure, but long-running outages hit differently. They stretch teams, systems, and assumptions about how incidents “should” play out.
Vanessa Huerta Granda
Resiliency Manager @Enova, Co-Author of the Howie Guide on Post Incident Analysis
Nov 19
How Netflix Shapes our Fleet for Efficiency and Reliability
Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.
Joseph Lynch
Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services
Argha C
Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live
Nov 19
Week-Long Outage: Lifelong Lessons
Routine database upgrades should be straightforward, especially with familiar, well-established technology. We were confident heading into our Elasticsearch upgrade, equipped with a solid plan and excited to see performance gains like we had seen from past upgrades.
Molly Struve
Staff Site Reliability Engineer @Netflix
Nov 19
The Time it Wasn't DNS
In January of 2023, the Microsoft Azure Wide Area Network experienced a global outage. If you were a Microsoft customer at the time, you were impacted by this outage.
Sean Klein
Principal Technical Program Manager - Modern Incident Analysis @Microsoft Azure
Explore the schedule
QCon is where you discover what’s next, from the senior practitioners building it. We focus on emerging patterns proven in production, sharing the unfiltered story: the real-world trade-offs, the hard-won lessons, and what it actually took to ship.
President, C4Media (makers of InfoQ and QCon)
Conversations that turn insight into impact
The scheduled sessions at QCon are the agenda, but the real value is in the unscripted moments: the whiteboard debates in an unconference, the candid advice over coffee, the speaker dinner stories about failures and trade-offs. That's the perspective you can't get from a screen.
Principal Solutions Architect, QCon Speaker, O'Reilly Author, YouTuber
QCon is designed for senior practitioners to move ideas forward and solve problems with peers.
Connect with senior developers who understand your challenges. Whether brainstorming new ideas, exploring learning paths, or engaging in casual conversations, our social events and learning spaces are designed for all interaction styles, helping you leave with fresh insights, new connections, and actionable ideas.
Convince your boss
Need to get approval to attend QCon San Francisco 2025? We’ve made it easier.
Download a ready-to-use “Convince Your Boss” template, perfect for sharing with your manager.
Get a PDF version of the Observability & SRE talks at the conference, perfect for sharing with your manager or teammates who want to see what’s covered.
Unlock your potential at QCon San Francisco 2025
- Gain concrete strategies from 60+ hand-picked speakers across 12 curated tracks.
- Real-world talks curated for depth and value, without hidden product pitches.
- Network with peers at Unconferences, in the 'hallway track', during extended breaks, over lunch, and at conference socials.
- Gain 12 months of on-demand access to session recordings after the conference to continue your learning journey.