Saturday, January 29, 2022

Bring on the SBOM

Mad Scramble

Software supply chains are a mess. Anyone who dealt with the Log4Shell scramble last year (or perhaps is still dealing with it) probably understands this. That particular vulnerability was not caused by a malicious actor taking over a repository or modifying a build process, but it was still a supply chain problem.

When the bug in Log4J that allowed easy remote code execution was made public, every executive, manager, and team was asking the same question: "Are we vulnerable?" It is such a simple question, but the answer turned out to be very difficult for many. If you work on a single Java project, your first thought might have been to look at your Maven POM or Gradle build. Of course, you cannot just look at what is declared in the file. You need to dig deeper into your dependencies' dependencies and keep following that thread until you reach the end. There are tools for that though (e.g. mvn dependency:tree) so no big deal, right?
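To make "keep following that thread" concrete, here is a minimal sketch of walking a dependency graph to its end. The graph and package names below are made up for illustration; a real build tool resolves this for you, and this is not how Maven or Gradle implement it.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to its direct dependencies.
DIRECT_DEPS = {
    "my-app": ["web-framework", "json-lib"],
    "web-framework": ["logging-facade"],
    "logging-facade": ["log4j-core"],
    "json-lib": [],
    "log4j-core": [],
}


def transitive_dependencies(root, graph):
    """Return every package reachable from root, excluding root itself."""
    seen = {root}
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited; also guards against cycles
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen - {root}


if __name__ == "__main__":
    deps = transitive_dependencies("my-app", DIRECT_DEPS)
    # log4j-core is never declared by my-app, yet it is still pulled in.
    print(sorted(deps))
```

The point of the toy example is that the affected library sits three levels down and never appears in the file you would have checked first.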

Wait a second. What about your deployment? In many cases, whatever you are building is getting deployed somewhere. Are you using an application server written in Java? Does it have Log4J? Do you have any Java agents augmenting your deployment like New Relic APM? What about the servers or containers hosting your application? Looks like the dependency problem is a bit murkier than you thought.

Beyond your specific application, what about the applications on which you depend? Do you have a database and is it affected? Is your cloud hosting provider affected? Is your log aggregator affected? Time to dig through your entire infrastructure stack. If you run packaged software that you bought from another vendor, you have another headache. Just finding out if that application is affected might be difficult depending on how well-funded the vendor happens to be and how aware of security vulnerabilities they are. Do they have the capability to deliver a patch now? If not, you might need to shut it down until they sort it out.

Now multiply this by tens, hundreds, or thousands depending on how big your organization has grown. Most corporate technology teams were probably struggling to identify affected applications with all the complications above. To make matters worse, it was a popular holiday season in the United States and many other places around the world. Teams were short-staffed, and those without automated build pipelines or at least some decent documentation probably spent a few long nights and weekends untangling unfamiliar systems to answer that simple question: "Are we vulnerable?"

Input/Output

Most industries track their inputs. Manufacturing companies track their raw inputs and sub-assemblies. Food processing companies track their ingredients. Yet most software teams do not. If E. coli contamination is found on some lettuce, that product is immediately traced through supply chains to every farm, handler, manufacturer, grocery store, and restaurant that interacted with the tainted produce. That is done with supply chain management and bills of materials.

So why do we, as software engineers, not bother to track our dependencies when we have arguably the easiest context in which to do it? Your dependencies have to be identified somewhere to make it into your build process. You are probably using a dependency manager (and if you are not, you should look into that right away). All of your raw ingredients are known to you when you put them together to compile and deploy your application. The information just needs to be published in a consistent way so you can aggregate it and search it.

Da BOM

The answer is a software bill of materials, or SBOM (pronounced "ess-bom") for short. There are a few common formats: SPDX, CycloneDX, and Syft's own JSON format (not yet a standard). There is already a plethora of tools that generate an SBOM automatically by scanning your source code or your deployment environment (container, virtual machine, etc.), and publishing could be as simple as putting the resulting file somewhere accessible for analysis. You could push it to your Maven repository for Java projects or drop your container SBOM into Rekor with Cosign, for example.
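If you have never looked inside one, an SBOM is less exotic than it sounds. The sketch below builds a stripped-down CycloneDX-style JSON document by hand, purely to show the shape. The function name and the sample components are my own, and the field set is a small subset of what the format allows; in practice you would let a generator tool produce the file rather than write it yourself.

```python
import json


def build_sbom(components):
    """Build a minimal CycloneDX-style document from (name, version) pairs.

    Illustrative only: real SBOMs carry much more, such as package URLs,
    hashes, licenses, and metadata about the tool that produced them.
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "version": 1,
        "components": [
            {"type": "library", "name": name, "version": version}
            for name, version in components
        ],
    }


if __name__ == "__main__":
    # Hypothetical ingredient list for a single project.
    sbom = build_sbom([("log4j-core", "2.14.1"), ("jackson-databind", "2.13.1")])
    print(json.dumps(sbom, indent=2))
```

It is just a structured list of ingredients, which is exactly why it is easy to publish and easy to search.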

Do not forget about your vendors, if you have any! You need SBOMs from them as well if you expect to use this data to reason about your entire environment. In the United States, the National Telecommunications and Information Administration (NTIA) has given you a boost in this area by publishing the minimum elements an SBOM should contain. The US government will likely require vendors to provide SBOMs eventually, so software companies should already be thinking about solutions. Asking your vendors for SBOMs should not be a surprise to them (and if it is, you might want to consider new vendors).

Once you have all your projects generating SBOMs as part of your pipeline and you are gathering them from your vendors, the real magic can happen. Using these standard formats, you can aggregate and analyze all the inputs that exist in your ecosystem. Are your projects complying with all open source licenses? You can answer that. What library do you depend on the most? You can answer that. The data is ripe for answering many questions about your environment and even your business. Maybe you will find out that your company is so reliant on a single open source project that if it went away, your business might collapse (obligatory xkcd). Now that you know this, it might be a good time to sponsor that project.
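Here is a rough sketch of what that aggregation might look like once the SBOMs are collected in one place. The project names, components, and helper functions are all hypothetical, and the SBOMs are reduced to just the `components` list from a CycloneDX-style document; a real tool would load the files from wherever you publish them.

```python
from collections import Counter

# Hypothetical SBOM fragments keyed by project name (in-house and vendor).
SBOMS = {
    "billing-service": {"components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.1"},
    ]},
    "report-tool": {"components": [
        {"name": "jackson-databind", "version": "2.12.3"},
    ]},
    "vendor-crm": {"components": [
        {"name": "log4j-core", "version": "2.17.1"},
        {"name": "jackson-databind", "version": "2.13.1"},
    ]},
}


def projects_using(sboms, library):
    """Map each project that includes the library to the version it ships."""
    found = {}
    for project, sbom in sboms.items():
        for component in sbom.get("components", []):
            if component["name"] == library:
                found[project] = component["version"]
    return found


def library_usage(sboms):
    """Count how many projects include each library (once per project)."""
    counts = Counter()
    for sbom in sboms.values():
        counts.update({c["name"] for c in sbom.get("components", [])})
    return counts


if __name__ == "__main__":
    print(projects_using(SBOMS, "log4j-core"))
    print(library_usage(SBOMS).most_common(1))
```

Deciding which of those versions is actually vulnerable still takes an advisory and a proper version comparison, but at least you now know exactly where to look.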

A Smarter Future

Imagine a time when all of your projects are generating SBOMs and publishing them to a central repository along with your vendor SBOMs. You have a small tool that reads that data and lets you query it. Suddenly, news breaks that a ubiquitous library underpinning a large portion of the Internet is laughably easy to exploit by every teenager with a phone. Your CEO turns to you and asks, "Are we vulnerable?"