In early December 2021, rumors began circulating on social media about a remote code execution (RCE) vulnerability in Log4j, soon dubbed Log4Shell. Over the next three days, those rumors were confirmed and the immense scope of the vulnerability became clear. The vulnerable library was present in a vast number of Java applications. Vulnerable versions of Log4j appeared both in the direct dependencies of organizations’ applications and in their transitive dependencies. They were also embedded in vendor products, including monitoring, visualization, and security tools. Remediation required dependency updates, testing and deployment cycles, and redeployment of vendor software.

In the aftermath of this vulnerability, some organizations responded quickly and relatively efficiently. Others lost days, weeks, or even months before they began their response.

There is much to learn from these differences among organizations, and this talk attempts to capture and synthesize some of those lessons.

This talk will (a) describe three broad categories of enterprises based on their responses to Log4Shell and (b) identify the key characteristics of each category.

The three broad categories of enterprises are:
- High performers / “The Good” - Had good visibility of the blast radius. Took control of the situation within days. Fixed most custom-developed applications within a week or so. Had the least toil and human impact.
- Medium performers / “The Not-So-Good” - Had partial visibility of the blast radius. Took control of the situation within a month or so. Fixed most custom-developed applications within two months. Had some human impact.
- Poor performers / “The Bad” - Had no visibility of the blast radius. Relied on many custom scripts and much manual labor to understand the impact. Took months to fix only the “known” applications.

Key characteristics of the above groups:
- High performers - (1) Have Software Composition Analysis (SCA) or similar capabilities that give full visibility into the bill of materials of all applications, (2) Have fully automated delivery pipelines, (3) Use Infrastructure as Code (IaC), (4) Have a high-ownership engineering culture (“You Built It, You Own It”), (5) Developers own security too (in consultation with the InfoSec team)
- Medium performers - (1) Have partial coverage via SCA or similar capabilities, (2) Have semi-automated delivery pipelines, (3) Run a mix of cloud and on-prem production workloads, (4) Have a mix of legacy and modern applications, (5) Show a bit of hero culture, (6) Experience high people impact, including people canceling vacation time
- Poor performers - (1) Weak tooling, (2) Legacy applications and technology stack, (3) Silos between Dev, Ops, and Security, (4) Poor engineering culture

This talk is based on a paper (to be published in Fall 2022 by IT Revolution) written by Randy Shoup (Chief Architect, eBay), Michael Nygard (author of Release It!), Chris Hill (Director, T-Mobile), Dominica DeGrandis (Principal Flow Advisor, Tasktop), and Topo Pal (VP, Fidelity Investments).

Key Takeaways:

- Discovery, inventory, and risk management - Open source software enters an enterprise in different “shapes and sizes”: from a few lines of code to full-blown frameworks and platforms, from developer tools to operating systems, from web proxies to RDBMSs, and more still via vendor software. It also comes in through many entryways, such as developers downloading software from the internet, build pipelines downloading directly from the internet or through a “proxy” such as Nexus or Artifactory, and production instances installing software directly from public repositories. You will learn how to (1) get full visibility of what gets in, what gets used, and how, (2) identify risks, and (3) design and implement controls to mitigate those risks.
- Fast impact analysis - You will learn how to quickly get a list of all the places where a particular piece of open source software is used.
- Automated deployment and release - Once a fix is made, it needs to reach production safely and promptly. You will learn about a few patterns to achieve this.
- Identifying the fix and fixing it - At the scale of Log4Shell, even the simplest fix can be confusing, mind-numbing, and frustrating. You will learn about various patterns for identifying and applying fixes while reducing friction between teams.
- Fire drills - It may be a good idea to establish a “fire drill” process going forward.
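
To make the fast impact analysis takeaway concrete, here is a minimal Python sketch, not the method from the paper. It assumes an organization already has a per-application list of resolved dependencies (direct and transitive), for example exported from an SCA tool. The inventory data, the application names, and the helpers `build_index` and `affected_apps` are all hypothetical, and the `2.15.0` threshold is just an illustrative parameter.

```python
from collections import defaultdict

# Hypothetical inventory: application -> resolved (direct + transitive) dependencies.
# In practice this would come from an SCA tool or SBOM export, not a literal dict.
INVENTORY = {
    "payments-api": [("org.apache.logging.log4j:log4j-core", "2.14.1"),
                     ("com.google.guava:guava", "31.0")],
    "report-batch": [("org.apache.logging.log4j:log4j-core", "2.17.1")],
    "legacy-portal": [("org.apache.logging.log4j:log4j-core", "2.11.0")],
    "static-site": [("com.google.guava:guava", "30.1")],
}


def parse_version(version):
    """Turn a purely numeric dotted version such as '2.14.1' into (2, 14, 1).

    Assumption: versions have no qualifiers like '-beta9'; real version
    schemes need a proper parser.
    """
    return tuple(int(part) for part in version.split("."))


def build_index(inventory):
    """Invert the inventory: component -> list of (application, version)."""
    index = defaultdict(list)
    for app, deps in inventory.items():
        for component, version in deps:
            index[component].append((app, version))
    return index


def affected_apps(index, component, fixed_in):
    """Return the sorted names of applications using `component` below `fixed_in`."""
    threshold = parse_version(fixed_in)
    return sorted(
        app
        for app, version in index.get(component, [])
        if parse_version(version) < threshold
    )


index = build_index(INVENTORY)
result = affected_apps(index, "org.apache.logging.log4j:log4j-core", "2.15.0")
print(result)
```

Building the inverted index once means each new advisory becomes a lookup rather than a scan of every repository, which is the kind of visibility the high performers are described as having.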