Log4Shell Response Patterns & Learnings From Them

In early December 2021, rumors began circulating on social media about a remote code execution (RCE) vulnerability in Log4j, soon dubbed Log4Shell. Over the next three days, those rumors were confirmed and the immense scope of the vulnerability became clear. Log4j is so widely used that the vulnerability turned up in almost all Java applications. Vulnerable versions of Log4j sat in the direct dependencies of organizations’ applications and in their transitive dependencies. They were also embedded in vendor products, including monitoring, visualization, and security tools. Remediation required dependency updates, testing and deployment cycles, and redeployment of vendor software.

In the aftermath, some organizations responded quickly and relatively efficiently. Others lost days, weeks, or even months before they began their response.

There is much we can learn from these differences among organizations, and this talk attempts to capture and synthesize some of those learnings.

This talk will (a) describe three broad categories of enterprises based on their responses to Log4Shell and (b) identify the key characteristics of each of these patterns.

  1. The three broad categories of enterprises are:
    • High performers / “The Good” - Had good visibility of the blast radius. Took control of the situation within days. Fixed most custom-developed applications in about a week. Had the least toil and human impact.
    • Medium performers / “The Not-so-good” - Had partial visibility of the blast radius. Took control of the situation within a month or so. Fixed most custom-developed applications within two months. Had some human impact.
    • Poor performers / “The Bad” - Had no visibility of the blast radius. Relied on many custom scripts and manual labor to understand the impact. Took months to fix only the “known” applications.
  2. Key characteristics of the above groups:
    • High performers - (1) Have Software Composition Analysis (SCA) or similar capabilities that give full visibility into the bill of materials of all applications (2) Have fully automated delivery pipelines (3) Use infrastructure as code (IaC) (4) Have a high-ownership engineering culture / “You Built It, You Own It” (5) Developers own security too (in consultation with the InfoSec team)
    • Medium performers - (1) Have partial coverage via SCA or similar capabilities (2) Have semi-automated delivery pipelines (3) Run a mix of cloud and on-prem production workloads (4) Have a mix of legacy and modern applications (5) Have a bit of hero culture (6) Suffer high people impact, including people canceling vacation time
    • Poor performers - (1) Weak tooling (2) Legacy applications and technology stacks (3) Silos of Dev, Ops, and Security (4) Poor engineering culture

This talk is based on a paper (to be published in Fall 2022 by IT Revolution) written by Randy Shoup (Chief Architect, eBay), Michael Nygard (Author of Release It!), Chris Hill (Director, T-Mobile), Dominica DeGrandis (Principal Flow Advisor, Tasktop), and Topo Pal (VP, Fidelity Investments).

Key Takeaways:

  1. Discovery, inventory, and managing risks - Open source software enters an enterprise in different “shapes and sizes”: from a few lines of code to full-blown frameworks and platforms, from a developer tool to an OS, from a web proxy to an RDBMS, and also inside vendor software. It also comes in through many entryways: developers downloading software from the internet, build pipelines downloading directly from the internet or through a “proxy” such as Nexus or Artifactory, production instances installing software directly from public repositories, and so on. You will learn how to (1) get full visibility into what gets in, what gets used, and how (2) identify risks and (3) design and implement controls to mitigate those risks.
  2. Fast impact analysis - You will learn how to quickly get a list of every place where a particular piece of open source software is used.
  3. Automated deployment and release - Once a fix is made, it needs to reach production safely and promptly. You will learn a few patterns to achieve this.
  4. Identifying the fix and fixing it - At the scale of Log4Shell, even the simplest fix can be confusing, mind-numbing, and frustrating. You will learn various patterns for identifying and applying fixes while reducing friction between teams.
  5. Fire drills - It may be a good idea to establish a “fire drill” process going forward.
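
As a rough illustration of the "fast impact analysis" takeaway, the sketch below walks a resolved dependency tree for each application and reports every path that leads to an affected log4j-core version, whether direct or transitive. It is not taken from the talk or the paper. The `Component` model, the function names, the sample applications, and the `com.example:web-framework` coordinate are all hypothetical, and treating every 2.x release below 2.15.0 as affected is a simplifying assumption; a real inventory would come from SCA tooling or generated bills of materials.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Maven coordinate of the library affected by Log4Shell (CVE-2021-44228).
LOG4J_CORE = "org.apache.logging.log4j:log4j-core"
# Simplifying assumption: every 2.x release below 2.15.0 counts as affected.
FIRST_FIXED = (2, 15, 0)


@dataclass
class Component:
    """Hypothetical bill-of-materials node: one artifact and its dependencies."""
    name: str
    version: str
    dependencies: list[Component] = field(default_factory=list)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.14.1' into (2, 14, 1); qualifiers such as '-beta9' are dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(component: Component) -> bool:
    """True for log4j-core 2.x versions older than the assumed first fix."""
    if component.name != LOG4J_CORE:
        return False
    version = parse_version(component.version)
    return bool(version) and version[0] == 2 and version < FIRST_FIXED


def find_affected_paths(root: Component) -> list[list[str]]:
    """Depth-first walk; assumes the resolved dependency tree has no cycles."""
    hits: list[list[str]] = []

    def walk(node: Component, path: list[str]) -> None:
        path = path + [f"{node.name}@{node.version}"]
        if is_affected(node):
            hits.append(path)
        for dependency in node.dependencies:
            walk(dependency, path)

    walk(root, [])
    return hits


def impact_report(inventory: list[Component]) -> dict[str, list[list[str]]]:
    """Map each affected application name to its paths to the vulnerable library."""
    report = {}
    for app in inventory:
        paths = find_affected_paths(app)
        if paths:
            report[app.name] = paths
    return report


# Hypothetical inventory: one transitive hit, one patched app, one direct hit.
payments = Component("payments", "1.0", [
    Component("com.example:web-framework", "3.2", [Component(LOG4J_CORE, "2.14.1")]),
])
reports = Component("reports", "4.1", [Component(LOG4J_CORE, "2.17.1")])
batch = Component("batch", "0.9", [Component(LOG4J_CORE, "2.0-beta9")])

report = impact_report([payments, reports, batch])
for app_name, paths in report.items():
    for path in paths:
        print(app_name, "->", " > ".join(path))
```

Reporting the full path, rather than a yes/no flag, shows which intermediate dependency has to be upgraded when the vulnerable library is not a direct dependency.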

Speaker

Tapabrata Pal

Vice President of Architecture @Fidelity

Tapabrata “Topo” Pal is a thought leader, keynote speaker, and evangelist in the areas of DevSecOps, Continuous Delivery, Cloud Computing, Open Source Adoption, and Digital Transformation. Topo is currently a Vice President of Architecture at Fidelity Investments. Prior to joining Fidelity, Topo spent 10 years at Capital One, where he was a key player in the company’s DevOps and Cloud journey. Topo is a programming committee member of DevOps Enterprise Summit (US, UK) and DevOps India Summit (India). Topo is a coauthor of Investments Unlimited: A Novel About DevOps, Security, Audit Compliance, and Thriving in the Digital Age (available September 13, 2022), published by IT Revolution.

Date

Monday Oct 24 / 05:25PM PDT (50 minutes)

Location

Bayview

Topics

Architecture, Java, Monitoring, Visualization, Security, Testing, Deployment Cycles, Continuous Integration, Continuous Delivery

From the same track

Session Architecture

Adopting Continuous Deployment at Lyft

Monday Oct 24 / 10:35AM PDT

All organizations, regardless of size, need to be able to make rapid changes and improvements in their constantly growing systems. How can we handle all this change while maintaining a reliable product? 

Tom Wanielista

Senior Staff Software Engineer @Lyft

Session Architecture

Dark Side of DevOps

Monday Oct 24 / 02:55PM PDT

Topics like “you build it, you run it” and “shifting testing/security/data governance left” are popular: moving things to earlier stages of software development, empowering engineers, and shifting control all sound good.

Mykyta Protsenko

Senior Software Engineer @Netflix

Session Architecture

Stress Free Change Validation at Netflix

Monday Oct 24 / 04:10PM PDT

How do you gain confidence that a system modification does what it’s supposed to do? A refactoring should not cause a functional change, whereas a feature modification should cause a specific kind of change.

Javier Fernandez-Ivern

Staff Software Engineer @Netflix with over 20 years in Software Engineering

Session

Enabling Change @ Scale Roundtable

Monday Oct 24 / 11:50AM PDT

Increasing the safe delivery of change has immense business value across a number of dimensions, so how can we improve our ability to manage change at scale?

Tom Wanielista

Senior Staff Software Engineer @Lyft

Mykyta Protsenko

Senior Software Engineer @Netflix

Tapabrata Pal

Vice President of Architecture @Fidelity

Javier Fernandez-Ivern

Staff Software Engineer @Netflix with over 20 years in Software Engineering

Session

Unconference: Architecting for Change

Monday Oct 24 / 01:40PM PDT

What is an unconference? At QCon SF, we’ll have unconferences in most of our tracks.

Shane Hastie

Global Delivery Lead for SoftEd and Lead Editor for Culture & Methods at InfoQ.com