Sunday, February 20, 2022

The Bill is Due

Rising Debt

Stop me if you have heard this story before. You have a deadline for a product launch. The business team is excited about a new feature your engineering team is adding to the company's flagship site. The timeline is tight, but there is a sense of urgency to get this idea to market before your competitors. You have an architectural design, but some compromises were made to meet the date. It was all documented and a card was written to revisit it after the launch. Your team is worried about getting it all done, but they are ready to try and the first iteration goes well.

On the second iteration, you run into some trouble. Some parts of the initial design are not working out the way you hoped, but there is not enough time to re-think it now unless everyone agrees to push back the launch date. Since delaying the launch is a hard "no," your team shoehorns in a few questionable bits of code and you move past the issue.

By the final iteration, you see the finish line. It is crunch time and the team is putting in a few extra hours, but nothing crazy. Changes are taking a bit longer because you had to take a few more shortcuts along the way. You added some cards to the backlog for the technical debt, but some of it was likely lost in the rush to get everything done in time. You plan to circle back around once this new feature is out the door.

Launch day arrives and the code is in production. You flip the switch to make the new feature live and adoption is quick. Customers seem to like the change and everyone is celebrating. This was a big win for the company. Your team delivered again. Tomorrow you can start on that backlog and get to work on fixing those workarounds you put in to make the launch deadline, right?

Not so fast. The new feature was a big hit, but it could be better. There are a few changes that will make customers even happier. Everything on the backlog is lowered in priority because this new round of improvements needs to be delivered before the shine wears off. Those shortcuts you took for the last launch because the design was not quite right are still there, but now you need to make more changes on top of them. The code is starting to look like Mom's spaghetti, but there is a new deadline and no time to fix it now. The team will circle back to it after this next release. At least, that is what they are telling themselves.

Does this story sound like the minutes from your last retro? If not, you are lucky. If it does, you probably know all too well that the can gets kicked down the road until it hits a brick wall. That wall can take many forms. Sometimes the code becomes so convoluted that every change introduces three new bugs and eventually everyone is looking at a rewrite. Or worse, a zero-day in one of your dependencies shows up on Twitter, and your dependencies have been neglected for so long that you need to update your language runtime and rewrite half your application to fix it. By the way, you have the weekend to get that done.

Technical debt has been a problem for as long as there have been developers writing software. We probably all have an especially egregious horror story about some application that was a hot mess. The situation where technical hygiene takes a back seat to delivering new functionality is not uncommon, but the software world is not the same as it used to be. Gone are the days when you had weeks, or even days, to respond to a security threat. According to data from the US National Institute of Standards and Technology (NIST), the rate of new vulnerability disclosures is increasing every year and there is no reason to expect that trend to reverse soon.

There is a very real possibility that at any moment your organization may need to apply a security patch and push it to production today. Even for very high-performing teams (back in 2017, Facebook was already pushing changes to production every few hours), this can still be a problem if the change you need to make is tied up in layers of technical debt that have built up over a long period.

So what do you do about this problem? You are not suddenly going to have the luxury to push back every product launch so the engineering teams can make everything perfect. You can, however, leverage some great tools and sound practices that help identify the indicators that things are getting bad and right the ship before you are bailing water.

Check Your Assets

The first step is knowing what is inside your application. You need to have a handle on your dependencies and for that, the software bill of materials (SBOM) is your best friend. Not only can the SBOM tell you what you have, but you can also feed standard SBOM formats into other tools like vulnerability scanners that will compare your dependencies against a database of known security issues. A tool like GitHub's Dependabot can not only tell you about a vulnerability, but can also regularly update all your dependencies regardless of vulnerability status.
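If you host your code on GitHub, enabling those version updates is mostly a matter of adding a small configuration file. The sketch below is a minimal example rather than a drop-in recipe; the ecosystems and schedule are assumptions you would adjust to match your own project:

```yaml
# .github/dependabot.yml -- minimal sketch; ecosystems and schedule are examples
version: 2
updates:
  # Watch the npm dependencies declared in package.json
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
  # Watch the base image declared in the Dockerfile
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "weekly"
```

Each entry tells Dependabot which ecosystem to watch, where the manifest lives, and how often to open update pull requests.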

Even without any special tools, your team could start to get a handle on your dependencies today by simply instituting a monthly dependency review process. For example, on the first Monday of every month, set aside an hour to review all your direct dependencies (package.json, pom.xml, requirements.txt, Dockerfile, and so on) to see if there are any updates available. If there are, check to see if they can be applied quickly. If so, do it! If they need more research, write up a card and agree on a priority that results in the card actually getting pulled. Automation is better, but if your team is not ready to automate this, you can still start doing it with minimal time investment.
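Most ecosystems already ship a command that lists outdated dependencies, which makes that monthly review mostly mechanical. A few common examples, assuming you use these package managers:

```shell
# JavaScript projects (package.json)
npm outdated

# Python projects (requirements.txt)
pip list --outdated

# Java projects (pom.xml); Maven resolves the versions plugin on first run
mvn versions:display-dependency-updates
```

The output of each command is a simple list of current versus available versions, which is exactly what you need to decide whether an update is a quick win or needs its own card.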

Create a Budget

Beyond keeping your dependencies updated, static analysis tools like SpotBugs, SonarQube, and Coverity can help reduce that technical debt before it happens. The more you let code smells and anti-patterns creep into the code, the harder it will be to undo them later. When the next vulnerability hits, if you are mired in a big ball of mud, patching might be nearly impossible. By integrating static analysis into your build pipeline, you create a filter that keeps the code cleaner one commit at a time rather than relying on massive cleanup efforts that no business or engineering team wants to endure. As an added bonus, engineers learn from the collective wisdom encoded in the static analysis rules about what to avoid.
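As one illustration of that integration, a Java project built with Maven could attach SpotBugs to the verify phase so that findings fail the build. This is only a sketch, and the plugin version shown is illustrative rather than a recommendation:

```xml
<!-- pom.xml: run SpotBugs during "mvn verify" and fail the build on findings -->
<plugin>
  <groupId>com.github.spotbugs</groupId>
  <artifactId>spotbugs-maven-plugin</artifactId>
  <!-- illustrative version; use a current release -->
  <version>4.7.3.0</version>
  <executions>
    <execution>
      <goals>
        <!-- the "check" goal fails the build when bug patterns are detected -->
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Because the scan runs on every build, new findings show up while the change is still fresh in the author's mind instead of piling up for a later cleanup.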

If you do not have a code review process, treat that situation like your hair is on fire and get one set up. It does not have to be elaborate, but you are asking for trouble if there is only one set of eyes looking at code changes. Once that foundation is in place, add a static analysis scanner to it. If you have a team of one, static analysis can be your code reviewer. You can integrate these scanners into just about any build pipeline tool, and if you happen to use GitHub pull requests, it is likely even easier to add a static analysis scanner that gives direct, automated feedback in your review process. There are open source and proprietary options, and if your project is open source you can almost certainly find a free offering that will meet your needs.
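For GitHub pull requests specifically, a small workflow that runs the build (and therefore whatever scanner is attached to it) on every pull request is often enough to get that automated feedback. The sketch below assumes a Maven project with the SpotBugs check wired in as above; the action versions and Java version are examples:

```yaml
# .github/workflows/static-analysis.yml -- sketch; assumes the Maven setup above
name: static-analysis
on: pull_request
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "17"
      # "verify" runs the SpotBugs check goal and fails the job on findings
      - name: Build and scan
        run: mvn --batch-mode verify
```

A failed check sitting on the pull request is hard to ignore, which is exactly the point.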

Once you have it in your pipeline, you can usually fine-tune what is considered a passing grade for the code being scanned. This is where you can define a technical debt budget that is appropriate for your product and team, but avoid being too lenient here. If you start light, create a plan immediately for ramping up to where you want to be or the tool will be ignored. Also, plan for how you will manage false positives. As good as the scanners are, they will sometimes flag legitimate code as an issue, and you need to deal with that. If the scan results become littered with false positives, engineers will ignore them and the benefits are lost. All the well-known scanners have an option to close an issue with an explanation, so take advantage of that and agree as a team on how a finding will be judged and closed when a false positive turns up.
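In a Java codebase scanned with SpotBugs, for example, a reviewed false positive can be suppressed right where it occurs, with the justification recorded in the code so the decision survives the original conversation. The annotation comes from the spotbugs-annotations artifact, and the class and bug pattern shown here are hypothetical:

```java
import java.util.List;

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class ReportWriter {

    private final List<String> lines = List.of();

    // Suppress one reviewed finding and record why it is a false positive
    @SuppressFBWarnings(
        value = "EI_EXPOSE_REP", // example bug pattern reported by the scanner
        justification = "The list is immutable, so exposing the reference is safe")
    public List<String> getLines() {
        return lines;
    }
}
```

Scoping the suppression to a single method and requiring a justification keeps the escape hatch honest; a reviewer can see exactly what was waived and why.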

Pay Your Bills

Finally, keep an eye on the design of your application and plan for continual adjustments. Refactoring is not a dirty word. Sometimes teams refer to it as "rework," labeling it as a cost and making it sound like having to modify the design is the result of poor engineering or planning. That kind of thinking is from the Dark Ages of corporate software engineering, before we realized that design is a continual process. It is incredibly naive to think your team will account for every possible change to a system and every future use case when it is first created. Each new requirement changes the system in a way that might not align with the mental model of the engineers who originally wrote it (even if those engineers are the ones currently working on it).

If we agree that design changes will be a constant need as new requirements are created and new functionality is built, then it would be irresponsible to ignore that fact until the system is too unwieldy to modify. Instead, if we treat refactoring as a standard component of every code change there is less need for radical redesigns that divert effort from delivery of business value. Robert "Uncle Bob" Martin calls this the Boy Scout Rule in his book "Clean Code," Martin Fowler calls this opportunistic refactoring, and many people call it continuous refactoring. Regardless of which term you use, get in there and start refactoring early and often.
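As a small, hypothetical illustration in Java: while adding a new requirement (a member discount), the engineer also extracts the duplicated rounding logic they had to read to make the change, leaving the class a little cleaner than they found it:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PriceCalculator {

    // The new requirement (a member discount) is added, and the rounding logic
    // touched along the way is pulled into one well-named helper instead of
    // staying copy-pasted wherever a price happens to be computed.
    public BigDecimal discountedPrice(BigDecimal price, boolean isMember) {
        BigDecimal rate = isMember ? new BigDecimal("0.90") : new BigDecimal("0.95");
        return roundToCents(price.multiply(rate));
    }

    // Extracted while making the change above -- a small, safe cleanup
    private BigDecimal roundToCents(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP);
    }
}
```

The refactoring is tiny and rides along with a change the team already had to make, which is what keeps it from turning into a project of its own.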

The cost to maintain software will be paid one way or another and as the old adage goes, "An ounce of prevention is worth a pound of cure."