Sunday, February 20, 2022

The Bill is Due

Rising Debt

Stop me if you have heard this story before. You have a deadline for a product launch. The business team is excited about a new feature your engineering team is adding to the company's flagship site. The timeline is tight, but there is a sense of urgency to get this idea to market before your competitors. You have an architectural design, but some compromises were made to meet the date. It was all documented and a card was written to revisit it after the launch. Your team is worried about getting it all done, but they are ready to try and the first iteration goes well.

On the second iteration, you run into some trouble. Some parts of the initial design are not working out the way you hoped, but there is not enough time to re-think it now unless everyone agrees to push back the launch date. Since delaying the launch is a hard "no," your team shoehorns in a few questionable bits of code and you move past the issue.

By the final iteration, you see the finish line. It is crunch time and the team is putting in a few extra hours, but nothing crazy. Changes are taking a bit longer because you had to take a few more shortcuts along the way. You added some cards to the backlog for the technical debt, but some of it was likely lost in the rush to get everything done in time. You plan to circle back around once this new feature is out the door.

Launch day arrives and the code is in production. You flip the switch to make the new feature live and adoption is quick. Customers seem to like the change and everyone is celebrating. This was a big win for the company. Your team delivered again. Tomorrow you can start on that backlog and get to work on fixing those workarounds you put in to make the launch deadline, right?

Not so fast. The new feature was a big hit, but it could be better. There are a few changes that would make customers even happier. Everything on the backlog drops in priority because this new round of improvements needs to be delivered before the shine wears off. Those shortcuts you took for the last launch because the design was not quite right are still there, but now you need to make more changes on top of them. The code is starting to look like Mom's spaghetti, but there is a new deadline and no time to fix it now. The team will circle back to it after this next release. At least, that is what they are telling themselves.

Does this story sound like the minutes from your last retro? If not, you are lucky. If it does, you probably know all too well that the can gets kicked down the road until it hits a brick wall. That wall can take many forms. Sometimes the code becomes so convoluted that every change introduces three new bugs and eventually everyone is looking at a rewrite. Or worse, a zero-day in one of your dependencies appears on Twitter, and that dependency has been neglected so long that you need to update your language runtime and rewrite half your application to fix it. By the way, you have the weekend to get that done.

Technical debt has been a problem for as long as there have been developers writing software. We probably all have an especially egregious horror story about some application that was a hot mess. Letting technical hygiene take a back seat to delivering new functionality is not uncommon, but the software world is not what it used to be. Gone are the days when you had weeks or even days to respond to a security threat. According to data from the US National Institute of Standards and Technology (NIST), the rate of new vulnerability disclosures is increasing every year, and there is no reason to expect that trend to reverse soon.

There is a very real possibility that at any moment your organization may need to apply a security patch and push it to production today. Even for very high-performing teams (back in 2017, Facebook was already pushing changes to production every few hours), this can still be a problem if the change you need to make is tied up in layers of technical debt that has built up over a long period.

So what do you do about this problem? You are not suddenly going to have the luxury of pushing back every product launch so the engineering teams can make everything perfect. You can, however, leverage some great tools and sound practices that help you spot the warning signs early and right the ship before you are bailing water.

Check Your Assets

The first step is knowing what is inside your application. You need to have a handle on your dependencies, and for that, the software bill of materials (SBOM) is your best friend. Not only can the SBOM tell you what you have, but you can also feed standard SBOM formats into other tools like vulnerability scanners that compare your dependencies against a database of known security issues. A tool like GitHub's Dependabot can not only tell you about a vulnerability, it can also regularly update all your dependencies regardless of vulnerability status.
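
If you want to see what that looks like in practice, here is a minimal sketch that feeds an existing SBOM file to a scanner. Grype is used as the example scanner (my choice, not one named above), the file name is a placeholder, and the flags are worth double-checking against the tool's current documentation:

# Scan a previously generated SBOM against databases of known vulnerabilities.
# The --fail-on flag makes this usable as a pipeline gate.
grype sbom:./sbom.cdx.json --fail-on high

The same file can be handed to any other scanner that understands SPDX or CycloneDX, which is exactly the point of using a standard format.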

Even without any special tools, your team could start to get a handle on your dependencies today by instituting a monthly dependency review process. For example, on the first Monday of every month, set aside an hour to review all your direct dependencies (package.json, pom.xml, requirements.txt, Dockerfile, and so on) to see if there are any updates available. If there are, check whether they can be applied quickly. If so, do it! If they need more research, write up a card and agree on a priority that results in the card actually getting pulled. Automation is better, but if your team is not ready to automate this, you can still start today with minimal time investment.
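
If you want something concrete to run during that hour, most ecosystems already ship a command for this check. A few examples, assuming the usual tooling for each ecosystem is installed:

# List newer versions of direct dependencies, per ecosystem
mvn versions:display-dependency-updates   # Maven (versions-maven-plugin)
npm outdated                              # Node.js, reads package.json
pip list --outdated                       # Python, checks installed packages

None of these change anything on their own; they just give the team a quick picture of how far the project has drifted behind.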

Create a Budget

Beyond keeping your dependencies updated, static analysis tools like SpotBugs, SonarQube, and Coverity can help reduce technical debt before it happens. The more you let code smells and anti-patterns creep into the code, the harder it will be to undo them later. When the next vulnerability hits, if you are mired in a big ball of mud, patching might be nearly impossible. By integrating static analysis into your build pipeline, you create a filter that keeps the code cleaner one commit at a time rather than relying on massive cleanup efforts that no business or engineering team wants to endure. As an added bonus, engineers learn from the collective wisdom encoded in the static analysis rules about what to avoid.
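
As a rough sketch of what "in the pipeline" can mean for a Maven project, either of the commands below can be added as a build step so that findings fail the build; the plugin coordinates, goals, and properties are assumptions to verify against your own setup:

# SpotBugs: analyze the compiled classes and fail the build if bugs are found
mvn compile com.github.spotbugs:spotbugs-maven-plugin:check

# SonarQube: run the scanner and fail the build if the quality gate fails
# (assumes a SonarQube server and the sonar-maven-plugin are configured)
mvn verify sonar:sonar -Dsonar.qualitygate.wait=true

The important part is not the specific tool but that the gate runs on every build instead of waiting for a periodic cleanup.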

If you do not have a code review process, treat that situation like your hair is on fire and get one set up. It does not have to be elaborate, but you are asking for trouble if there is only one set of eyes looking at code changes. Once that foundation is in place, add a static analysis scanner to it. If you have a team of one, static analysis can be your code reviewer. You can integrate these scanners into just about any build pipeline tool, and if you happen to use GitHub pull requests, it is even easier to add a scanner that gives direct, automated feedback in your review process. There are open source and proprietary options, and if your project is open source you can almost certainly find a free offering that meets your needs.
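
For the GitHub case, a small workflow file is usually all it takes to get scanner feedback on every pull request. This sketch reuses the SpotBugs step from above; the file path, action versions, and Java version are illustrative assumptions:

# Create a workflow that runs static analysis on every pull request
mkdir -p .github/workflows
cat > .github/workflows/static-analysis.yml <<'EOF'
name: static-analysis
on: pull_request
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-java@v3
        with:
          distribution: 'temurin'
          java-version: '17'
      - run: mvn -B compile com.github.spotbugs:spotbugs-maven-plugin:check
EOF

A failing scan then shows up as a failing check on the pull request, which is exactly the kind of direct, automated feedback a human reviewer should not have to provide by hand.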

Once you have a scanner in your pipeline, you can usually fine-tune what is considered a passing grade for the code being scanned. This is where you can define a technical debt budget that is appropriate for your product and team, but avoid being too lenient here. If you start light, immediately create a plan for ramping up to where you want to be, or the tool will be ignored. Also, plan for how you will manage false positives. As good as the scanners are, they will sometimes flag legitimate code as an issue, and you need to deal with that. If the scan results become littered with false positives, engineers will ignore them and the benefits are lost. All the well-known scanners have a way to close an issue with an explanation, so take advantage of that and agree up front on how the team will decide when a finding is truly a false positive.
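
Sticking with the SpotBugs example, one way to record an agreed-upon false positive is an exclude filter that lives in version control next to the code. The class name and bug pattern below are placeholders; the point is that every suppression carries an explanation:

# Record a reviewed false positive along with the reason it was excluded
cat > spotbugs-exclude.xml <<'EOF'
<FindBugsFilter>
  <!-- Reviewed by the team: this exposure is intentional; see the backlog card -->
  <Match>
    <Class name="com.example.LegacyAdapter"/>
    <Bug pattern="EI_EXPOSE_REP"/>
  </Match>
</FindBugsFilter>
EOF

Point the plugin's excludeFilterFile setting at this file and the suppression, along with its justification, gets reviewed like any other code change.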

Pay Your Bills

Finally, keep an eye on the design of your application and plan for continual adjustments. Refactoring is not a dirty word. Sometimes teams refer to it as "rework," labeling it as a cost and making it sound like having to modify the design is the result of poor engineering or planning. That kind of thinking is from the Dark Ages of corporate software engineering, before we realized that design is a continual process. It is incredibly naive to think your team will account for every possible change to a system and every future use case when it is first created. Each new requirement changes the system in ways that might not align with the mental model of the engineers who originally wrote it (even if those engineers are the ones currently working on it).

If we agree that design changes will be a constant need as new requirements are created and new functionality is built, then it would be irresponsible to ignore that fact until the system is too unwieldy to modify. Instead, if we treat refactoring as a standard component of every code change there is less need for radical redesigns that divert effort from delivery of business value. Robert "Uncle Bob" Martin calls this the Boy Scout Rule in his book "Clean Code," Martin Fowler calls this opportunistic refactoring, and many people call it continuous refactoring. Regardless of which term you use, get in there and start refactoring early and often.

The cost to maintain software will be paid one way or another and as the old adage goes, "An ounce of prevention is worth a pound of cure."

Saturday, January 29, 2022

Bring on the SBOM

Mad Scramble

Software supply chains are a mess. Anyone who dealt with the Log4Shell scramble last year (or perhaps is still dealing with it) probably understands this. That particular vulnerability was not caused by a malicious actor taking over a repository or modifying a build process, but it was still a supply chain problem.

When the bug in Log4J that allowed easy remote code execution was made public, every executive, manager, and team was asking the same question: "Are we vulnerable?" It is such a simple question, but the answer turned out to be very difficult for many. If you work on a single Java project, your first thought might have been to look at your Maven POM or Gradle build. Of course, you cannot just look at what is declared in the file. You need to dig deeper into your dependencies' dependencies and keep following that thread until you reach the end. There are tools for that though (e.g. mvn dependency:tree) so no big deal, right?
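
For example, a filtered dependency tree makes that transitive search quick on a Maven project; the coordinates below target Log4J 2's core artifact:

# Show where Log4J 2 core appears anywhere in the transitive graph
mvn dependency:tree -Dincludes=org.apache.logging.log4j:log4j-core

# Or, more bluntly
mvn dependency:tree | grep -i log4j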

Wait a second. What about your deployment? In many cases, whatever you are building is getting deployed somewhere. Are you using an application server written in Java? Does it have Log4J? Do you have any Java agents augmenting your deployment like New Relic APM? What about the servers or containers hosting your application? Looks like the dependency problem is a bit murkier than you thought.

Beyond your specific application, what about the applications on which you depend? Do you have a database and is it affected? Is your cloud hosting provider affected? Is your log aggregator affected? Time to dig through your entire infrastructure stack. If you run packaged software that you bought from another vendor, you have another headache. Just finding out if that application is affected might be difficult depending on how well-funded the vendor happens to be and how aware of security vulnerabilities they are. Do they have the capability to deliver a patch now? If not, you might need to shut it down until they sort it out.

Now multiply this by tens, hundreds, or thousands of applications, depending on how big your organization has grown. Most corporate technology teams were probably struggling to identify affected applications with all the complications above, and to make matters worse, the disclosure landed during the holiday season in the United States and many other places around the world. Teams were short-staffed, and those without automated build pipelines or at least some decent documentation probably spent a few long nights and weekends untangling unfamiliar systems to answer that simple question: "Are we vulnerable?"

Input/Output

Most industries track their inputs. Manufacturers track their raw materials and sub-assemblies. Food processors track their ingredients. Yet most software teams do not. If E. coli contamination is found on some lettuce, that product is immediately traced through the supply chain to every farm, handler, manufacturer, grocery store, and restaurant that interacted with the tainted produce. That is done with supply chain management and bills of materials.

So why do we, as software engineers, not bother to track our dependencies when we have arguably the easiest context in which to do it? Your dependencies have to be identified somewhere to make it into your build process. You are probably using a dependency manager (and if you are not, you should look into that right away). All of your raw ingredients are known to you when you put them together to compile and deploy your application. The information just needs to be published in a consistent way so you can aggregate it and search it.

Da BOM

The answer is a software bill of materials, or SBOM (pronounced "ess-bom") for short. There are a few common formats: SPDX, CycloneDX, and Syft's own format (not yet a standard). There is already a plethora of tools that can generate an SBOM automatically by scanning your source code or your deployment environment (container, virtual machine, etc.), and publishing could be as simple as putting the resulting file somewhere accessible for analysis. You could push it to your Maven repository for Java projects or drop your container SBOM into Rekor with Cosign, for example.
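
As a rough sketch of that container workflow, Syft can emit a CycloneDX SBOM for an image and Cosign can attach it to the image as a signed attestation (which, with keyless signing, is recorded in the Rekor transparency log). The image name is a placeholder, and the exact flags and signing setup should be checked against the current versions of both tools:

# Generate a CycloneDX SBOM for a container image with Syft
syft registry.example.com/shop/api:1.4.2 -o cyclonedx-json > sbom.cdx.json

# Attach the SBOM to the image as a signed attestation with Cosign
cosign attest --type cyclonedx --predicate sbom.cdx.json \
    registry.example.com/shop/api:1.4.2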

Do not forget about your vendors, if you have any! You need SBOMs from them as well if you expect to use this data to reason about your entire environment. In the United States, the National Telecommunications and Information Administration (NTIA) has given you a boost in this area by defining the minimum elements an SBOM should contain. The US government will likely require vendors to provide SBOMs eventually, so software companies should already be thinking about solutions. Asking your vendors for SBOMs should not come as a surprise to them (and if it does, you might want to consider new vendors).

Once you have all your projects generating SBOMs as part of your pipeline and you are gathering them from your vendors, the real magic can happen. Using these standard formats, you can aggregate and analyze all the inputs that exist in your ecosystem. Are your projects complying with all open source licenses? You can answer that. What library do you depend on the most? You can answer that. The data is ripe for answering many questions about your environment and even your business. Maybe you will find out that your company is so reliant on a single open source project that if it went away, your business might collapse (obligatory xkcd). Now that you know this, it might be a good time to sponsor that project.
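
Even before you build anything fancy, a directory full of CycloneDX JSON files and a little jq can answer the most urgent question. The component name below is just an example, and the field names follow the CycloneDX JSON format:

# Which of our collected SBOMs mention a given component?
for f in sboms/*.json; do
    if jq -e '.components[]? | select(.name == "log4j-core")' "$f" > /dev/null; then
        echo "$f contains log4j-core"
    fi
done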

A Smarter Future

Imagine a time when all of your projects are generating SBOMs and publishing them to a central repository along with your vendor SBOMs. You have a small tool that reads that data and lets you query it. Suddenly, news breaks that a ubiquitous library underpinning a large portion of the Internet is laughably easy to exploit by every teenager with a phone. Your CEO turns to you and asks, "Are we vulnerable?"