Bug management that works (Part 2)
Making time for bug fixing, dedicated ‘goalies’, preventing bugs, and how to create a quality-focused culture which minimizes them
How do you deal with bugs in the software products you build? This topic is under-discussed, even though almost all software engineers deal with software bugs and regressions every week.
We reached out to two dozen engineering leaders and software engineers to learn about current, cutting-edge approaches to bug management. In Part 1 of this mini-series, we covered:
Catching bugs
Reporting bugs
Bug triage
“Don’t manage bugs: fix them!”
Zero-bugs policy
In this second, concluding article on this topic, we look into what typically happens after the bug triage stage:
Create time and space for bug fixing. Bug expiration dates, weekly ‘bug pickups’, and bug fixing time budgets can all help.
Dedicated bug fixers: ‘goalies’. An engineer dedicated to bug fixing for a couple of weeks may be known as a “support engineer”, as “being on bug duty”, or as the “first line”. It’s a popular approach with its own challenges.
Fixing bugs properly. A root cause analysis to fix underlying causes is a pragmatic approach. Several teams opt to treat high-priority bugs as incidents.
Prevention is better than cure. Automated tests like unit, integration, end-to-end, and performance tests, coupled with CI/CD, are a common approach. So is investing in other bug prevention approaches.
A quality culture for fewer bugs. It takes effort from engineering leadership to create a culture that prioritizes quality. At companies with this focus, tactics include bug metrics, operational reviews, and engineers not asking permission to do bug fixing.
Thank you to everyone who contributed insights to this article:
Anaïs van Asselt (senior QA engineer), Andrea Sipos (product leader), Felix Hageloh (lead engineer), Gus Fune (CTO), Hugo Valante (tech lead), Ignatius Nothnagel (Director of Platform Engineering), Ivan Tchomgue (People manager/product owner), Jason Diller (VP of Engineering), Jayesh Varma (Lead Android engineer), Marin Dimitrov (Head of Engineering), Matteo Galli (CTO), Maya Ziv (senior software engineer), Owain Lewis (Director of Engineering), Piotr Zolnierek (CTO), Neil Oler (senior software engineer), Rebecca Frost (QA leader), Rebecca Holm Ring (engineering leader), Ruben Weijers (engineering manager), Ryan Hanni (Director of Engineering), Serdar Biyik (engineering manager), Walter de Bruijn (Head of Engineering Productivity)
1. Create time and space for bug fixing
Fixing bugs when they happen is the single best approach, but unfortunately not always realistic. The next best thing is to ensure there’s enough time for engineers to fix problematic issues. Below are some approaches.
Fix bugs on the go
Several engineering leaders at smaller companies say their preferred approach is to simply fix bugs as they occur:
“We do continuous bug fixing: always balancing value creation (creating new features) with value loss prevention (removing the friction caused by bugs in existing features).” – Marin Dimitrov, Head of Engineering at Manual
“We prioritize fixing bugs over working on other things. Most of our bugs are cleared as they come in. We found this is more productive than having ‘bug fixing weeks’.” – Gus Fune, CTO at Div Brands
This approach seems harder to do at growing or large companies, where some bugs need several teams to fix them, or it’s unclear who owns a bug.
Bug expiration dates
A clever approach is to set an expiration date by which a bug should be resolved. The closer this date gets, the higher the bug’s priority. Ruben Weijers, engineering manager at TomTom, elaborates:
“All triaged bugs have an end-date and an owner. If a bug is past its end-date (meaning it ‘breaches’ this date), it becomes a release blocker, regardless of whether it's a low priority bug.”
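As a rough illustration of the rule described in the quote, here is a minimal Python sketch. The `Bug` fields, the function names, and the sample data are all hypothetical and not taken from TomTom’s tooling. It also assumes that “past its end-date” means strictly later than that date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bug:
    title: str
    owner: str
    priority: str   # e.g. "low", "medium", "high"
    end_date: date  # date by which the bug should be resolved
    resolved: bool = False


def is_release_blocker(bug: Bug, today: date) -> bool:
    """An open bug past its end-date blocks the release, whatever its priority."""
    return not bug.resolved and today > bug.end_date


def release_blockers(bugs: list[Bug], today: date) -> list[Bug]:
    """Return all breached bugs, the longest-overdue first."""
    breached = (b for b in bugs if is_release_blocker(b, today))
    return sorted(breached, key=lambda b: b.end_date)


# Hypothetical sample data.
bugs = [
    Bug("Typo in settings page", "ana", "low", date(2024, 5, 1)),
    Bug("Crash on login", "ben", "high", date(2024, 6, 30)),
    Bug("Slow search", "cy", "medium", date(2024, 4, 15), resolved=True),
]
blockers = release_blockers(bugs, today=date(2024, 6, 1))
print([b.title for b in blockers])  # ['Typo in settings page']
```

Note that the low-priority typo blocks the release while the high-priority crash does not yet, because only the end-date decides whether a bug has breached.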
Weekly ‘bug pickups’
Ryan Hanni, engineering director at Ontra:
“We have used a weekly bug pickup process. The way it worked was simple: pick up one bug per team, per week, and fix it! This helped our bug backlog stay reasonably sized. We would always pick a high priority bug until there were none left, then do this with Medium and Low priority ones.”
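The selection rule in this process could be sketched as follows. This is only an illustration: the backlog format, the field names, and the tie-break (oldest bug first within a priority level) are assumptions, not details from Ontra’s process.

```python
from collections import defaultdict

# Lower number = picked first. Priority names follow the quote above.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def weekly_pickup(backlog: list[dict]) -> dict[str, dict]:
    """Pick one open bug per team: highest priority first, oldest first on ties."""
    by_team = defaultdict(list)
    for bug in backlog:
        by_team[bug["team"]].append(bug)
    return {
        team: min(
            bugs,
            key=lambda b: (PRIORITY_ORDER[b["priority"]], b["reported_week"]),
        )
        for team, bugs in by_team.items()
    }


# Hypothetical backlog: "reported_week" is the week number the bug was filed.
backlog = [
    {"id": 1, "team": "payments", "priority": "medium", "reported_week": 3},
    {"id": 2, "team": "payments", "priority": "high", "reported_week": 7},
    {"id": 3, "team": "search", "priority": "low", "reported_week": 1},
    {"id": 4, "team": "search", "priority": "low", "reported_week": 5},
]
picks = weekly_pickup(backlog)
print({team: bug["id"] for team, bug in picks.items()})
# {'payments': 2, 'search': 3}
```

A team with no high priority bugs left falls through to Medium and then Low automatically, which matches the order described in the quote.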
Time budgets
A common approach is to reserve a fixed percentage of developers’ time for bug fixing, budgeted per sprint, per week, or per month.
“At various companies we’ve used quotas, which refers to the minimum percentage of time invested in bug fixing and quality improvements for each sprint. SLO-like targets on the resolution time for a bug, based on its severity (critical / high / medium / low), may be a good ‘forcing function’ to help teams balance better building new features vs bugfixing and quality improvements: when the SLO targets are regularly exceeded, this may be a signal that the team needs to increase the time (quota) allocated to bugfixing until the balance can be restored again.” – Marin Dimitrov, Head of Engineering at Manual.
“We first add bugs onto our sprint, allocating around 10-15% of our velocity. We prioritize bugs reported from our Live/Production environment. This approach means that we balance delivering new features with fixing existing issues.” – Jayesh Varma, lead Android engineer at Barclays
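To make the arithmetic behind these two quotes concrete, here is a small Python sketch of a sprint bug budget combined with an SLO-style check. Every number in it is an assumption chosen for illustration: the resolution targets per severity, the 20% breach threshold, the 5-point step, and the 50% cap do not come from the teams quoted.

```python
# Hypothetical resolution-time targets in days, per severity.
SLO_DAYS = {"critical": 1, "high": 7, "medium": 30, "low": 90}


def bug_budget(velocity_points: int, quota: float) -> int:
    """Story points reserved for bugs in a sprint, rounded to the nearest point."""
    return round(velocity_points * quota)


def slo_breach_rate(resolved: list[tuple[str, int]]) -> float:
    """Share of resolved bugs that took longer than their severity's target."""
    if not resolved:
        return 0.0
    breaches = sum(1 for severity, days in resolved if days > SLO_DAYS[severity])
    return breaches / len(resolved)


def next_quota(quota: float, breach_rate: float, threshold: float = 0.2,
               step: float = 0.05, cap: float = 0.5) -> float:
    """Raise the quota by one step when the breach rate exceeds the threshold."""
    if breach_rate > threshold:
        return min(cap, round(quota + step, 2))
    return quota


# A team with a velocity of 40 points and a 15% quota reserves 6 points.
print(bug_budget(40, 0.15))  # 6

# (severity, days to resolve) for bugs closed last sprint; hypothetical data.
resolved = [("critical", 2), ("high", 5), ("medium", 45), ("low", 10)]
rate = slo_breach_rate(resolved)
print(rate)  # 0.5

quota = next_quota(0.15, rate)
print(quota, bug_budget(40, quota))  # 0.2 8
```

This sketch only ever raises the quota. Deciding when the balance has been restored and the quota can drop again is left to the team.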
Ignatius Nothnagel, director of platform engineering at LocalStack, has seen this approach work, but also makes the case for not sprinting at all:
“I've seen two approaches actually work in the wild:
1. Dedicate a fixed, non-negotiable percentage of capacity during every sprint to bugs and improvements.
2. Not ‘sprinting.’ Drop the concept of sprints. In my experience, this works amazingly! It turns out that hiring responsible adults and holding them accountable for making the right trade-off decisions actually works.”
Bug sprints and bug days
Interestingly enough, the approach of ‘batching’ bug fixing into a few days or a week can be hit-and-miss: either it works well enough to be a regular thing, or teams drop it because the outcome disappoints.
Accounts of when it’s a hit:
“We do a quarterly ‘just do it day’ where all engineers get to work on whatever they want for a day. This usually ends up being quality of life (QOL) improvements, dev tooling, and refactoring/cleanup work. It’s everyone’s favorite holiday!” – Maya Ziv, senior software engineer at Pavilion
“Regular bug bashes and FixIt weeks have worked very well for teams I’ve worked at Uber, Hopin, Craft, and now Manual” – Marin Dimitrov, Head of Engineering at Manual
…and when it’s a miss:
“Bug fixing, keeping the lights on (KTLO), and other emergent work outside of regular product strategy increments happens on Fridays. It doesn’t work because a lot of this work won’t fit in a single Friday, and leads to lots of context switching, dead end effort, and wasted time.” – a software engineer at a small health tech company
“We avoid ‘fix it weeks’ in favor of continuous, weekly bug pickups. If our backlog gets too big, we meet with cross-functional stakeholders (PM, UX, Dev, QE) to divide up the bugs across teams and have them fixed within the next two weeks or so, working the bugs into their cycle as they see fit.” – Ryan Hanni, director of engineering at Ontra
Another criticism of regular bug sprints is that they incentivize engineers to not worry about bugs day to day because they know there’s a regular event for dealing with them. This can reduce motivation to keep software tidy and bug-free at all times.
Warranty sprints
A variation of regular bug sprints is the ‘warranty sprint.’ This refers to spending a week or two on addressing incoming bugs and feedback about a freshly released feature. It is similar to how Figma prioritizes bugs for newly released features, as covered in the “Fix all bugs for recently released features” section, but warranty sprints are more focused.
Jason Diller, VP of Engineering at Arteria AI, shares:
“For warranty sprints, we typically don’t shift a team off a project as soon as it ships. We expect and plan for feedback and bugs to be higher volume right after a delivery, and keep the team dedicated to addressing those for a sprint or two, rather than punting all of that to a backlog to be dealt with later.”
2. Dedicated bug fixers: ‘goalies’
At mid-sized and larger companies, a common approach to staying on top of bugs is for an engineer to focus only on bug-related work. This role goes by several names: “goalie” is the most common one, as in a soccer goalkeeper. Other terms are “support engineer”, “being on bug duty”, “bug duty officer”, “first line”, and even “bug trooper”.