How Do We Talk to Each Other? How Surfacing Communication Patterns in Organizations Can Help You Understand and Improve Your Resilience

As a system inevitably grows in complexity, it becomes impossible for a single operator to have a clear, unambiguous understanding of what's happening in it. Understanding the system requires a joint effort between teammates and technology. Often, we focus too narrowly on the single-operator experience to improve that shared understanding. In this talk, we will uncover how communication patterns in organizations reveal how systems actually work in practice, versus how we think they work in theory, and use this knowledge to improve the resilience of our systems.


Speaker

Nora Jones

Founder and CEO @jeli_io, Founder of Learning From Incidents (LFI) Online Community and Conference

Nora is the founder and CEO of Jeli. She is a dedicated and driven technology leader and software engineer with a passion for the intersection of how people and software work in practice in distributed systems. In November 2017, she keynoted at AWS re:Invent before an audience of ~40,000 people, sharing her experiences helping organizations large and small reach crucial availability and helping kick off the Chaos Engineering movement we see today. She founded the www.learningfromincidents.io movement to develop and open-source cross-organization learnings and analysis from reliability incidents, along with the business impacts of doing so.

Date

Tuesday Oct 3 / 01:35PM PDT ( 50 minutes )

Location

Ballroom BC

Topics

Resiliency, Communication, Practical Applications, System Resilience

From the same track

Session Database

How Netflix Ensures Highly-Reliable Online Stateful Systems

Tuesday Oct 3 / 02:45PM PDT

Under most stateless services are stateful databases, caches, and systems that form the bedrock on which applications are built.

Joseph Lynch

Distributed Systems Engineer @Netflix Working on Online Datastores and Data Abstractions

Session Architecture

Disaster Recovery Across a Million Pieces

Tuesday Oct 3 / 10:35AM PDT

Data recovery is more than just backing up and restoring a data store. The goal of any disaster recovery effort is getting the system back to working as expected across all of its parts.

Michelle Brush

Engineering Director, SRE @Google, Previously Director of HealtheIntent Architecture @Cerner Corporation & Lead Engineer @Garmin, Author of "2 out of the 97 Things Every SRE Should Know"

Session Resiliency

Orchestrating Resilience: Building Modern Asynchronous Systems

Tuesday Oct 3 / 03:55PM PDT

Building asynchronous, event-driven systems can be daunting. Managing states, ensuring resilience, maintaining traceability, and handling a myriad of other challenges often require more effort than building the functionality itself.

Sai Pragna Etikyala

Technical Lead @Twilio

Session Reliability

Designing Fault-Tolerant Software with Control System Transparency

Tuesday Oct 3 / 11:45AM PDT

Teams at NASA and JPL that create mission-critical software for spacecraft take a principled approach to fault tolerance. Let's see how those same principles, centered around a concept of transparency, can help us achieve reliability in pragmatic, modern software delivery settings.

Jon Moore

Staff Software Engineer @Stripe with over 35 years of software engineering experience across both academia and industry

Session

Unconference: Designing for Resilience

Tuesday Oct 3 / 05:05PM PDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.