Measuring Software Engineering Productivity
How do you measure developer productivity, how are DORA and SPACE related, and some hard-learned lessons on this topic.
Q: I’d like to measure the productivity of my engineering team and use it to improve how we work. How would you suggest I get started?
This question comes up sooner or later at every company. At startups, it’s often the CEO telling the CTO they want to see some metrics, as they’re concerned how much the engineering team costs and want to promote efficiency. At larger companies, engineering leaders often want ways to “debug” teams with productivity issues.
A few weeks ago, we did a community thread on measuring engineering productivity in which engineering leaders in the community shared thoughtful insights on what worked in their environments.
To dive deeper into this topic, I pulled in Laura Tacho, a VP of engineering and a leadership coach who has faced this exact issue: she rolled out GitPrime (now Pluralsight Flow), got burned in the process, and has since dived deep into this space.
Laura is well-known in the Docker community, having been involved from the early days, first as a software engineer and then as an engineering leader. She was Senior Director of Engineering at CloudBees, a CI/CD SaaS vendor. She took on VP of Engineering roles at two scaleups before becoming a leadership coach to engineering leaders.
Laura gave the talk What Dashboards Don’t Tell You at LeadDev London in June 2022. She also runs the cohort-based course Measuring Development Team Performance, which is a hands-on take on this topic. The course is three days long and applications for the next cohort close on 4th August.
In this article, we will cover:
What is productivity?
DORA Metrics
Rolling out a developer productivity tool
The SPACE framework and an updated mental model
Where to start
Lessons learned
As a disclaimer, I am an investor in and advisor to DX, which Laura mentions in this article. Laura wrote this guest post independently, and I did not ask her to mention the company.
Over to Laura:
In software engineering, we spend a lot of time evaluating metrics. We’re used to looking at observability data in a dashboard, running queries, and finding inefficiencies. Then we issue a configuration change or merge a pull request and we can see results on the dashboard to validate our choices – or not. This feedback loop works for many of the problems we encounter on a daily basis: observe, debug, patch, repeat. But when you apply this method to systems of humans, you won’t get the result you hope for.
I’m talking about developer productivity metrics.
When I was a Senior Director of Engineering at CloudBees several years ago, I helped roll out GitPrime to 300+ developers. It was a flop. (It’s worth noting this was prior to GitPrime’s acquisition and rebranding to Pluralsight Flow, and before CloudBees released its own engineering efficiency tool.) Hindsight tells me why: beyond my own mistakes, the fundamental idea that I could boil productivity down to a handful of metrics that look nice on a dashboard was – and remains – a myth.
I learned these lessons the hard – and expensive – way. Like so many of you, I sought quantitative data to help me diagnose bottlenecks and free up the flow of work. On top of that, I had pressure from my senior leadership team to quantify performance and report on it. So, I bought into the idea that quantitative data is always better than qualitative data, even though my teams had plenty to share about what slowed them down.
This experience taught me a lot about measurements, incentives, and motivation. Most importantly, it set me on a path of research and analysis to understand how and why engineering leaders measure development team performance.
1. What is productivity?
This is a simple question, but it lacks a simple answer. Developers might tell you productivity means coding time, or the number and size of contributions to a codebase. Managers might tie productivity to on-time delivery of projects. Executives might have an altogether different perspective.
So here is our first problem. Even after decades of trying to measure productivity, we’ve yet to agree on a definition.
A 2022 study titled How Developers and Managers Define and Trade Productivity for Quality outlines this in clear and surprising ways. It asks how developers and managers each define productivity and, more interestingly, how they think the other group defines it. It found that while developers define productivity in terms of activity in the codebase, managers often focus more on the quality and performance of delivered projects. This gap makes it difficult to have useful conversations about measuring and improving productivity.
In my own experience, I find that performance and productivity are often used interchangeably. For some teams, they are the same thing. If released software is not performant – meaning it is of low quality, or simply does not function as intended – then the team is not productive, because they’re not hitting their goals.
This is usually the case on teams which practice lean methodologies, or teams where engineering is a main stakeholder in feature design, alongside product and design. Let’s say a team is working on a new landing page UI with Google SSO, designed to increase signups for a free trial.
The product team prioritizes the initiative, design is responsible for the UI, and engineering advocates implementing SSO for a more frictionless account creation experience, and to be able to refactor some spaghetti code in the existing authentication service. The developers churn out code, but when it’s released, signups don’t increase. From a business perspective, this project is not a success and the team has not been productive. There is output, but no good outcome.
For other engineering teams, productivity is a measure of activity in a codebase, not bound to business results. On these teams, developers often don’t have agency to decide what is being built, but they do control how (and, importantly, how fast).
In a parallel universe, the same engineering team is handed wireframes and a product spec from their product owner. They execute efficiently, deliver on time, and the code holds up against spiky production loads. Still, the number of signups doesn’t go up. In this scenario, the engineering team is productive, even though the intended outcome isn’t realized.
Which of these scenarios is closer to how your team works? Has your team collectively and explicitly defined productivity, or are you using a definition that’s assumed to be understood by everyone?
My own definition of productivity has evolved over time. Four years ago, you would find me on stage and on podcasts talking about the benefits of adopting DORA (DevOps Research and Assessment) metrics, after the book Accelerate was published in 2018. Years before that, I talked about story points and velocity.
Over time, my definition of productivity has matured, but also become more complex. It changes depending on context. A small team at a startup needs a different definition of productivity than a team inside an org with 1,000+ developers. The scope of a team’s responsibility and its business objectives both influence how productivity is defined.
When I work with engineering organizations interested in measuring productivity, it’s usually for one of these reasons: