Leading and Analyzing Operational Incidents from Chaos to Resolution

If you work on systems that can adversely impact dependency availability or end customers, then this session is for you.

In today's fast-paced technical environment, high-severity incidents can strike at any moment, potentially impacting critical business services and customer experience. The difference between a well-managed incident and a chaotic response often comes down to effective leadership during these crucial moments. This training dives into real-world challenges and approaches to tackling them, drawing on more than a decade of experience working on systems that handle millions of transactions per second.

We will tackle two areas that come up frequently:

  1. How to take on the role of a confident incident commander who can turn chaos into coordinated action. This session covers the essential areas you need to create your own blueprint, suited to your space, for driving incidents to resolution effectively.
  2. What separates great incident analysis from the rest, and how to take actions that matter for future-proofing your systems.
     

Do you feel confused or worried about what actions to take when a system is not working as expected? If so, we will cover:

  • How to handle crises around system outages: not just resolving the problem at hand, but standing out in the organization while doing so.
  • How to write an effective post-mortem analysis and turn a crisis into a benefit for your entire organization.

 
This training will give you the essentials to lead an event and to write effective post-mortems for operational incidents. By the end of it, you'll have the confidence and competence to lead your team through any technical crisis, turning potential disasters into opportunities for system improvement and for personal and team growth.

Key Takeaways

1 Incident Leadership skills are a must if you aspire to become a technical lead or own systems that can impact multiple customers.

2 Crisis Management Techniques will not just benefit your customers when tough situations arise but will also help propel your career by making you stand out from your peers.

3 Post-Incident Analysis Excellence: Once an incident has passed, post-incident analysis is often delayed or skipped. We will look at why it matters and at battle-tested ways to write one.

4 Future-Proofing Your Systems: Action item prioritization frameworks, implementation tracking mechanisms, metrics for measuring improvement, and strategies for preventing similar incidents.


Speaker

Tejas Ghadge

Engineering Head @AWS Amplify, AWS Lambda Event Driven Applications and AWS Lambda Developer Experience where he leads an organization of 100+ engineers/managers

Tejas Ghadge is the engineering head for AWS Amplify, AWS Lambda Event Driven Applications and AWS Lambda Developer Experience, where he leads an organization of 100+ engineers/managers across multiple sites in the US and Canada.

With over 14 years of experience at AWS, Tejas brings deep operational and architectural experience from operating large-scale (millions of requests per second) event-driven systems, leading and analyzing hundreds of operational incidents, and successfully launching dozens of delightful customer features for AWS Lambda and AWS Amplify customers.
