Thoughts on technical debt and experiences to leverage on them!

The sole purpose of “engineering” in “software engineering” is to keep building software at a sustainable velocity that is open to change and can quickly respond and adapt to evolving business needs and requirements.

Many times the reality is different. A project that is built in a hacky way meets a customer’s needs and, over time, quickly turns into this ball of mud that can only be managed and handled by people with tribal knowledge.

I experienced this even more within startups when I consulted them a few years ago who want to get to a Product Market Fit quickly and build hacky solutions often. While this is an excellent strategy to fail fast, we must remember that as soon as the product direction gets clearer, a massive effort must be deployed to restructure the architecture. The restructure should not only continue to be built fast, but it should also ensure that the software is scalable and has a very high level of confidence in the quality of the release.

A few startups with extraordinary Engineering Leadership do get to the restructuring and ensure that most of the Technical Debt is paid, but many others continue to accrue the debt over months only to realize that they can no longer capitalize on the ever-changing customer needs. They become slow.

In finance, debt is the leverage that you can use to build an asset. Similarly, technical debt is the leverage we can incur for a short time. But not repaying the debt quickly results into technical bankruptcy.

Well, why have technical debt in the first place? 

The answer is simple. While building software, we have to undergo many tradeoffs and later deal with them when an assumption changes.

E.g., In a highly scaleable system, we may want to sacrifice consistency but not availability. But if the business requirements change and we suddenly care for consistency as well, many changes are to be made. To work around this quickly, the team may want to implement quick fixes in the underlying system or add another layer to mimic levels of consistency rather than changing the architecture and storage model (the right thing to do) that supports consistency by default. This decision now has created a technical debt that, if not resolved in due course of time, will become a standard architecture pattern in the company that people will forget over time and the purpose with which it was built.

I have often heard that software needs to be rewritten in 18 – 20 months. Statements like these make me wonder why?

The only reason that I can think of is that the team did not pay the debt in time resulting in the compounding factor of having more technical debt in their system to support the previous decisions that now needs some form of easing in the form of rewrite. In finance terms, it is called quantitative easing.

I am drawn back to Uncle Bob’s philosophy: “Leave the camp ground much cleaner than you found it.” 

Why can’t we continue to pay the debt every few iterations and this way, we would have a much different system every 18 – 20 months evolved over time. With this approach, the business does not suffer a pause while the software is being rewritten.

But isn’t this risky? What if, while clearing the debt, we miss use cases that result in a production bug? 

This is a great question, and many times while we know that paying the debt is the best thing to do, we do not because of the fear of something breaking in production only to be found by the customer or after a significant impact.

The answer to this is obvious. Rather than remember the usecases or have them documented in a spreadsheet for manual testing, why not bring in a machine to take care of testing on demand? Humans hate monotony and doing repeated work, but devices are exceptionally excellent at performing the same tasks millions of times.

Why not from the get-go invest heavily in two things:

1. Having correct Abstractions at the level where the software interacts with each other

2. Having a great suite of test automation

For the first point, having abstractions allows you to think in the form of Domain Modeling that would likely not change too frequently (even when you pivot). Having good abstractions will enable you to write slightly hacky code in the short term to help you complete your deliverables for your startup when you want to build an MVP to get to Product Market Fit. This way, you can change the implementation at any time under the hood without breaking the rest of your system. 

For the second point, not having test automation from day one is a sin. There is a misconception that writing automation tests is a waste of time, but the same people would be OK spending more time doing manual testing before every release (which IMO is too late to catch defects). When you have automation (unit and integration), you have an ultimate safety net on which you can fall. Change your code and run the test. If all pass, you are good to go; if not, you have immediate feedback on what caused the failure. Using the idea from the first point, you can rapidly evolve your code under abstraction with proper automation testing.

With these two points, you now have superpowers to fix technical debt on a regular basis and meet market needs quickly. 

There are a few other items on the Technical Debt list that pop up over time and are not the team’s doing. These are migrations of Database Engines, Operating Systems, etc., as these dependencies have reached their EOL and now need replacing before it opens up a security hole or needs an urgent migration due to lack of support. In such cases, we should always subscribe to the release notes distribution list for all our dependencies to calendarize the migration. But here, too, having test automation is a lifesaver as it helps speed up our work.

We, as Engineering Leaders, need to have the back of our team and stand up to clean up technical debt regularly after a few sprints when we created it. Anything beyond a few sprints is too late.

By no means this post covers the entire breadth of Technical Debt, as there are many more, and since they rarely occur (at least in my experience), this post only focuses on Code, Architecture, and Dependencies.

Please share your thoughts via comments

Photo by Towfiqu barbhuiya on Unsplash