Measuring Software Engineering Productivity
How to measure developer productivity, how DORA and SPACE are related, and some hard-learned lessons on the topic.
Q: I’d like to measure the productivity of my engineering team and use it to improve how we work. How would you suggest I get started?
This question comes up sooner or later at every company. At startups, it’s often the CEO telling the CTO they want to see some metrics, as they’re concerned about how much the engineering team costs and want to promote efficiency. At larger companies, engineering leaders often want ways to “debug” teams with productivity issues.
A few weeks ago, we did a community thread on measuring engineering productivity in which engineering leaders in the community shared thoughtful insights on what worked in their environments.
To dive deeper into this topic, I pulled in Laura Tacho, a VP of engineering and leadership coach who has faced this exact issue. She rolled out GitPrime (now Pluralsight Flow), got burnt in the process, and has since dived deep into this space.
Laura is well-known in the Docker community, having been involved from the early days as a software engineer and then as an engineering leader. She was Senior Director of Engineering at CloudBees, a CI/CD SaaS solution. She took on VP of Engineering roles at two scaleups before becoming a leadership coach to engineering leaders.
Laura gave the talk What Dashboards Don’t Tell You at LeadDev London in June 2022. She also runs the cohort-based course Measuring Development Team Performance, which is a hands-on take on this topic. The course is three days long and applications for the next cohort close on 4th August.
In this article, we will cover:
What is productivity?
DORA metrics
Rolling out a developer productivity tool
The SPACE framework and an updated mental model
Where to start
Over to Laura:
In software engineering, we spend a lot of time evaluating metrics. We’re used to looking at observability data in a dashboard, running queries, and finding inefficiencies. Then we issue a configuration change or merge a pull request and we can see results on the dashboard to validate our choices – or not. This feedback loop works for many of the problems we encounter on a daily basis: observe, debug, patch, repeat. But when you apply this method to systems of humans, you won’t get the result you hope for.
I’m talking about developer productivity metrics.
When I was a Senior Director of Engineering at CloudBees several years ago, I helped roll out GitPrime to 300+ developers. It was a flop. (It’s worth noting this was prior to GitPrime’s acquisition and rebranding to Pluralsight Flow, and before CloudBees released its own engineering efficiency tool.) Hindsight tells me why: beyond my own mistakes, the fundamental idea that I could boil productivity down to a handful of metrics which look nice on a dashboard was, and remains, a myth.
I learned these lessons the hard – and expensive – way. Like so many of you, I sought quantitative data to help me diagnose bottlenecks and free up the flow of work. On top of that, I had pressure from my senior leadership team to quantify performance and report on it. So, I bought into the idea that quantitative data is always better than qualitative data, though my teams had plenty to share about what slowed them down.
This experience taught me a lot about measurements, incentives, and motivation. Most importantly, it set me on a path of research and analysis to understand how and why engineering leaders measure development team performance.
1. What is productivity?
This is a simple question, but it lacks a simple answer. Developers might tell you productivity means coding time, or the number and size of contributions to a codebase. Managers might tie productivity to on-time delivery of projects. Executives might have an altogether different perspective.
So here is our first problem. Even after decades of trying to measure productivity, we’ve yet to agree on a definition.
A 2022 study by Cornell University, titled How Developers and Managers Define and Trade Productivity for Quality, outlines this in clear and surprising ways. It asks how developers and managers each define productivity, and more interestingly, how they think the other group defines it. It found that while developers define productivity in terms of activity in the codebase, managers often focus more on the quality and performance of delivered projects. The gap here makes it difficult to have useful conversations about measuring and improving productivity.
In my own experience, I find that performance and productivity are often used interchangeably. For some teams, they are the same thing. If released software is not performant – meaning it is of low quality, or simply does not function as intended – then the team is not productive, because they’re not hitting their goals.
This is usually the case on teams which practice lean methodologies, or teams where engineering is a main stakeholder in feature design, alongside product and design. Let’s say a team is working on a new landing page UI with Google SSO, designed to increase signups for a free trial.
The product team prioritizes the initiative, design is responsible for the UI, and engineering advocates implementing SSO, both for a more frictionless account creation experience and as a chance to refactor some spaghetti code in the existing authentication service. The developers are chugging out code, but when it’s released, signups don’t increase. From a business perspective, this project is not a success and the team has not been productive. There is output, but not good outcomes.
For other engineering teams, productivity is a measure of activity in a codebase, and not bound to business results. On these teams, developers often don’t have agency to decide what is being built, but they do control how (and importantly, how fast).
In a parallel universe, the same engineering team is handed wireframes and a product spec from their product owner. They execute efficiently, deliver on time, and the code holds up against spiky production loads. Still, the number of signups doesn’t go up. In this scenario, the engineering team is productive, even though the intended outcome isn’t realized.
Which of these scenarios is more similar to how your team works? Has your team collectively and explicitly defined productivity, or are you using a definition that’s assumed to be understood by everyone?
My own definition of productivity has evolved over time. Four years ago, you would find me on stage and on podcasts talking about the benefits of adopting DORA (DevOps Research and Assessment) metrics, after the book Accelerate was published in 2018. Years before that, I talked about story points and velocity.
Over time, my definition of productivity has matured, but also become more complex. It changes depending on context. A small team at a startup needs to have a different definition of productivity from a team within an org with 1,000+ developers. The scope of a team’s responsibility, as well as their business objectives, all influence definitions of productivity.
When I work with engineering organizations interested in measuring productivity, it’s usually for one of these reasons:
Find Key Performance Indicators (KPIs) to report to the executive team
Identify and debug inefficiencies in team workflows
Get an organization-wide overview of engineering health
Measure team performance
These are all good questions to ask. But metrics like lines of code, PRs, and code commits are all bad answers.
2. DORA Metrics
DORA metrics first came on the scene in 2016 via the State of DevOps Report, and gained momentum after the release of the book Accelerate. These four metrics seemed to answer many questions about the effectiveness of software delivery:
Deployment frequency: how often does the team release to production?
MTTR (Mean Time to Recovery): how long does it take to recover from a failure in production?
Change failure rate: of all releases, what percentage contain a defect?
Lead time: how long does it take for a commit to get to production?
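As a rough illustration, the four metrics above could be computed from a deployment log along the following lines. This is a minimal sketch, not the method DORA or any vendor tool uses: the Deployment record and its field names are hypothetical, lead time is taken from a single commit timestamp per release, and recovery time is assumed to start when the failing release was deployed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """One production release. All field names are hypothetical."""
    commit_at: datetime                     # when the change was committed
    deployed_at: datetime                   # when the release reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, Optional[float]]:
    """Compute the four DORA metrics over one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.failed]
    # Assumption: the failure began at deploy time and ended at restored_at.
    recoveries = [hours(d.restored_at - d.deployed_at)
                  for d in failures if d.restored_at is not None]
    lead_times = [hours(d.deployed_at - d.commit_at) for d in deploys]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        # None means no recoveries were recorded in the period.
        "mttr_hours": mean(recoveries) if recoveries else None,
    }


if __name__ == "__main__":
    t0 = datetime(2022, 7, 1, 9, 0)
    sample = [
        Deployment(t0, t0 + timedelta(hours=2)),
        Deployment(t0, t0 + timedelta(hours=4)),
        Deployment(t0, t0 + timedelta(hours=6), failed=True,
                   restored_at=t0 + timedelta(hours=7, minutes=30)),
        Deployment(t0, t0 + timedelta(hours=8)),
    ]
    for name, value in dora_metrics(sample, period_days=2).items():
        print(f"{name}: {value}")
```

Numbers like these only make sense at the team or organization level, over a period long enough to smooth out noise. Every choice in the sketch (median versus mean, where the recovery clock starts) is a definition a team would need to agree on first.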
When asked to sum up DORA research myself, I use four words: small changes, frequently released.
This practice enables teams to accelerate their delivery, while minimizing risk. But as an industry, we get really excited by frequent releases and moving faster, while downplaying or outright ignoring DORA’s call to reduce batch size. Both are key to performance.
DORA metrics are really useful if your teams are not yet practicing continuous delivery, and if your methodologies typically skew more toward waterfall. These metrics can help you benchmark against the industry, and they’ll also hold you accountable to your decisions as you start to see results of initiatives to accelerate software delivery.
If you are part of an organization that practices continuous delivery already, insights from DORA metrics may not be helpful. In some discussions leading up to this article, Gergely described his own experience on such a team: “It gave reassurance to us, though it did not change anything we did.” But benchmarking your team using DORA metrics can give you a lot of direction if you are a low or mid-range performer.
The one exception is MTTR. If outages are very rare, I recommend that every team runs drills to determine their MTTR quarterly, or at least every 6 months. There are a lot of definitions of MTTR, so a first step is defining it based on your SLAs, or your team’s culture. Alerting, monitoring, and your own on-call processes play big roles in MTTR, and data about them don’t show up in the other DORA metrics.
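To see why the definition matters, here is a small sketch of how a single drill can produce three different recovery times depending on where the clock starts. The timeline and event names are invented for illustration, and none of the three is the "correct" definition.

```python
from datetime import datetime


def recovery_minutes(start: datetime, restored: datetime) -> float:
    """Minutes elapsed between a chosen start event and service restoration."""
    return (restored - start).total_seconds() / 60


# Hypothetical timeline recorded during one drill.
incident = {
    "failure_began": datetime(2022, 7, 1, 10, 0),
    "alert_fired": datetime(2022, 7, 1, 10, 12),
    "engineer_acknowledged": datetime(2022, 7, 1, 10, 20),
    "service_restored": datetime(2022, 7, 1, 10, 50),
}

restored = incident["service_restored"]
by_definition = {
    start: recovery_minutes(incident[start], restored)
    for start in ("failure_began", "alert_fired", "engineer_acknowledged")
}

for start, minutes in by_definition.items():
    print(f"clock starts at {start}: {minutes:.0f} min")
```

With this timeline the same incident reads as 50, 38, or 30 minutes. The 12 minutes before the alert fired and the 8 minutes before someone acknowledged it are exactly the alerting and on-call effects that only show up if the chosen definition includes them.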
Since DORA metrics are so clearly articulated, they’re very accessible to leaders outside of software engineering. Such a tidy list of metrics seems to fit nicely into reporting dashboards already used by leadership teams. And this is where I have found myself in almost every leadership position I’ve held: with an executive team wanting KPIs and productivity metrics from the engineering org.
My own boss asking for engineering productivity metrics was a driver for using a tool like GitPrime in the past, and I know many of you reading this right now are responsible for KPIs, OKRs, and other health metrics. It’s not surprising; CEOs expect reports. They get them from sales, marketing, and every other department. It’s reasonable they’d expect something similar from software engineering, a department which often consists of the most highly paid positions in the company.
So, DORA seemed to be an answer to many of the questions CEOs and executives have about development team performance. And because these metrics became popular quickly, sometimes it was the CEO themself who asked for them.
But DORA metrics aren’t intended to measure the productivity of teams, and were certainly never intended to measure the productivity of individuals. They began as a way to benchmark DevOps adoption across the industry, and measure the performance of delivery practices. Using them as benchmarking data to track how your organization stacks up against others in the industry is sensible. They’re also effective for benchmarking against yourself, checking to see if your investments in your software delivery processes had the right outcomes.
3. Rolling out a developer productivity tool
Needing metrics to report on performance and also wanting to unblock my team and help them deliver more, I worked with a few other senior leaders to pilot and eventually roll out GitPrime to our engineering organization.
As I’ve mentioned already, it did not go as I’d hoped. I made three mistakes: