Presentation: "And It All Went Horribly Wrong: Debugging Production Systems"
Time: Thursday 12:05 - 13:05
Location: Metropolitan Ballroom II & III
Despite the implications of its most breathless proponents, the trend towards building putatively reliable systems out of wholly unreliable components does not mean the end of software defects: even where bugs do not result in system outage, they can induce misbehavior, degradations of service, and cascading failures that can themselves lead to outage. So it remains critical to debug our software -- and more critical than ever that we are able to do so in production environments. This talk will discuss the two essential technologies for debugging such systems: postmortem debugging (when failure is fatal) and dynamic instrumentation (when failure is transient). We will cover the history, current state of the art, intersections, and open problems of these technologies -- and how they have been shaped in the kiln of unspeakable pain that is production systems failure.