Paying down tech debt: further learnings
Using tech debt to get into the flow, and big rewrites needing heavyweight support.
Before we start: next week The Pragmatic Engineer Podcast kicks off. If you enjoy podcasts, please consider adding it to your podcast player so you get the first episode there. Add it on Apple Podcasts, Spotify, YouTube, or in your favourite player.
This is a follow-up to the article Paying down tech debt, written by industry veteran Lou Franco. Lou has been in the software business for over 30 years as an engineer, EM, and executive. He’s also worked at four startups and the companies that later acquired them; most recently at Atlassian, as a Principal Engineer on the Trello iOS app. Later this year, he’s publishing a book on tech debt. For updates on this upcoming release, subscribe here.
In this issue, we cover:
Use tech debt payments to get into the flow. A counter-intuitive observation: by making small, non-functional improvements, you gain confidence in a new codebase and can start to move faster.
Big rewrites need heavyweight support. Without the backing of management, a large-scale rewrite is likely to fail.
With this, it’s over to Lou.
Use tech debt payments to get into the flow and stay in it
A good reason to add new comments to old code before you change it is to speed up a code review. Adding comments is also a good way to reduce cognitive load. When it takes me time to learn what code does, writing something down helps me remember what I figured out. Clarifying the code is even better.
Reducing tech debt as I go also helps me get into ‘the flow.’ We all know what it feels like to stare at code and wonder what it does, scrolling up, scrolling down, and then command-tabbing over to Slack to procrastinate. By making small changes as I learn the code, I get more confidence, and soon I find myself going from adding tests and comments to making more substantive changes. The tech-debt payments weren’t for some future benefit. They are helping me right now.
Once I have gotten into flow, staying in flow is so important that whenever I feel resistance to whatever change I’m making, I make a small code change to keep me from dropping out of it, applying the same techniques that I used to get into it.
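One small change of this kind is a test that pins down what the code does today, before any larger change is attempted. The sketch below is illustrative only: `legacy_fee_cents` is a hypothetical stand-in for an inherited function, and the expected values are simply what that stand-in returns, not a statement of what the right behaviour should be.

```python
# Hypothetical inherited function: imagine it arrived without comments or tests.
def legacy_fee_cents(amount_cents, tier):
    # Comment added while reading: tier 2 pays 1.5% plus a flat 100 cents.
    if tier == 2:
        return amount_cents * 15 // 1000 + 100
    # Comment added while reading: above 100,000 cents the fee is 1%.
    if amount_cents > 100_000:
        return amount_cents // 100
    # Comment added while reading: everything else pays a flat 1,000 cents.
    return 1_000


def test_legacy_fee_is_unchanged():
    """Record current behaviour so a later refactor can be checked against it."""
    assert legacy_fee_cents(50_000, 1) == 1_000
    assert legacy_fee_cents(200_000, 1) == 2_000
    assert legacy_fee_cents(200_000, 2) == 3_100


test_legacy_fee_is_unchanged()
```

Writing the comments and the test takes minutes, and both record what was figured out while reading the code.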
Reducing tech debt as you go is also a great way to learn a new codebase. I learned this at my first job out of college, which was for a company that made DOS software to price foreign exchange options. When I started at the company, I didn’t know what a lot of the code was even supposed to do.
Every program has non-domain-specific code. In the early ’90s, DOS programs like the ones my company made had their own text UI screen rendering systems. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. I spent my first couple of weeks rewriting each “window” of the application to make it more memory efficient. By doing so, I got to see every screen of the system. This project helped onboard me to the software, its structure, its build, and our issue tracking and version control workflows. Along the way, I paid down some debt and also learned how our software stack worked!
I did this again when I started at Trello. My first project was supporting i18n (internationalization) in the app. The first step of that was getting each string in the code into a strings file that could be sent to translators. My goal was to fix the debt of hardcoded strings, but I learned a lot about the codebase and our process as I did it.
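As a rough illustration of that first step, the sketch below moves hardcoded strings into a keyed table and renders the table as a file for translators. It is not Trello's code: the keys, the English text, and the helper names `localized` and `to_strings_file` are all made up for the example, and the `"key" = "value";` output only imitates the general shape of a strings file.

```python
# Hypothetical string table: keys and English text are illustrative only.
STRINGS_EN = {
    "board.add_card": "Add a card",
    "board.archive_confirm": "Archive {count} cards?",
}


def localized(key, table, **params):
    """Look up a UI string by key; fall back to the key itself if it is missing."""
    template = table.get(key, key)
    return template.format(**params)


def to_strings_file(table):
    """Render the table as `"key" = "value";` lines, sorted for stable diffs."""
    def escape(text):
        # Escape backslashes first, then double quotes.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    return "\n".join(
        f'"{escape(key)}" = "{escape(value)}";'
        for key, value in sorted(table.items())
    )


# Call sites use keys instead of hardcoded text.
print(localized("board.archive_confirm", STRINGS_EN, count=3))
print(to_strings_file(STRINGS_EN))
```

Replacing each hardcoded string with a keyed lookup means visiting every screen that shows text, which is why this kind of change doubles as a tour of the codebase.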
This has even worked unintentionally. When I was hired to work on a large-scale rewrite, I ended up being an expert on the legacy code as well because I had to read it every day to reimplement its features.
Big rewrites need heavyweight support
I avoid large-scale rewrites, as I’ve seen them fail or drag on much more often than they succeed. My biggest mistake personally was trying to rewrite a C/C++ system as memory-safe C# code, which seemed like a good idea, but didn’t have the backing needed. It was eventually shelved. Sometimes it’s impossible to do this work in a way that delivers value until it’s done, so lots of commitment is required.
But I was also part of a two-year long rewrite that worked, and learned some things that I recommend to clients when they want to take one on.
In 2004, I was hired by ISO-NE, a non-profit that manages the electric grid in New England. There are ISOs all around the country established by the local energy companies to provide services to themselves – an important one is managing the electricity market, which trades 24x7 and determines the prices for generation. If it isn’t working, then there might be problems delivering electricity.
At ISO-NE, the electricity price publishing system was a pile of Bash, Perl, PHP, and C. These scripts mixed database access, HTML generation, and logic in unexpected ways. Sometimes a script would generate another script. You’d fix a bug in a generated script, and it would get overwritten. It was a mess!
I was hired to rewrite it as a clean Java-based system, and brought in for my experience with the legacy languages and J2EE. They doubled the team size from two to four to include a developer with a lot of Java Server Pages experience, and then later to eight members with contractors who only worked on the new system. They hired a manager who had done this kind of project before, and set a target date of nearly two years. The first lesson I learned about big rewrites was not to underestimate them.
The size of the project was well-understood and planned. Frankly, having two years for a rewrite with such a large team seemed wrong to me. In the end, we needed that time. We got these resources because the current situation was untenable: the code was fragile and mission critical, and our stakeholders ranked reliability as the #1 priority.
We ensured the legacy system was improved during the rewrite. I was hired to work full-time on the new code, but we realized the best thing for me to do was to spend some time maintaining legacy code. Contractors were recruited for their expertise in the new system, so couldn’t work on the legacy one. Because I was in both codebases, whenever I did something new with the legacy code I added it to the new code, or updated the spec. By the end of the project, all the non-contract engineers were working on both codebases in the same ratio as each other. The systems were built in parallel and both kept running during an overlap period, much like Gergely describes in Migrations Done Well.
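One way to use an overlap period like this is to feed both systems the same inputs and flag any output that differs. The sketch below shows the general idea only; it is not how the ISO-NE systems were compared. The two report functions, the node names, and the prices are hypothetical stand-ins.

```python
def legacy_price_report(prices):
    """Stand-in for the legacy output: one 'node=price' line per node."""
    lines = []
    for node in sorted(prices):
        lines.append(f"{node}={prices[node]:.2f}")
    return lines


def new_price_report(prices):
    """Stand-in for the rewritten system, expected to produce the same lines."""
    return [f"{node}={price:.2f}" for node, price in sorted(prices.items())]


def compare_runs(samples, legacy, new):
    """Run both implementations on every sample and collect the mismatches."""
    mismatches = []
    for sample in samples:
        old_out, new_out = legacy(sample), new(sample)
        if old_out != new_out:
            mismatches.append((sample, old_out, new_out))
    return mismatches


# Illustrative inputs only.
samples = [{"hub": 41.5, "north": 39.25}, {"south": 40.0}]
print(compare_runs(samples, legacy_price_report, new_price_report))
```

An empty list means the two implementations agreed on every sample. Each mismatch is either a bug in the new code or a legacy behaviour that the spec has not yet captured.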
We coupled the rewrite with a user-facing improvement project. We coupled the rewrite – which was invisible to users – with a move to a customer relationship management system (CRM). We branded both projects as “Warp” (Web Application Redesign Project). Our content management was implemented with similar home-spun scripts maintained by the same developers. We moved to an enterprise CRM system and created a fresh design.
Throughout the project, we mostly talked about things users could see, like the new design, the easier way to update content, the content workflow, etc. After the system had been up for a while, we could talk about the reduction of incidents and other improved reliability metrics because these were heavily monitored at ISO-NE, and showed significant improvement.
If the state of your system is untenable and a rewrite in small chunks over time isn’t possible, then ensure everyone in the company, all the way up to the executive management team, agrees and understands what needs to be done. Don’t underestimate what it will take.
This is the most important lesson, and it’s what I saw work at Atlassian fifteen years later in a multi-year rewrite that transformed the business from a mostly on-premises company to a cloud-based one.
A large-scale rewrite requires a temporary increase in the number of developers, which can be achieved with contractors or temporary reassignment if your company has the resources. Recruit expertise in the new technology and front-line managers who have done this kind of project before.
It also helped us to couple the project with one that was more visible to stakeholders. This was fine because it was maintained by the same team and had similar issues. If you can find a way to do it, you’ll have something to show when you’re done, and can follow up with quality metrics after measuring them.
For more tactics on how to deal with tech debt, see the article Paying down tech debt, published in The Pragmatic Engineer.
Thanks to Lou for sharing some hard-earned experience and advice for taming tech debt. As mentioned, Lou is working on a book dedicated to this topic; sign up to get emails from him with more thoughts about managing tech debt, and to be notified when the book is ready. If you have suggestions of topics for Lou to cover in the upcoming book, please connect with him on LinkedIn, or via his website.