Measuring developer productivity? A response to McKinsey, Part 2
The consultancy giant has devised a methodology they claim can measure software developer productivity. But that measurement comes at a high price – and we offer a more sensible approach. Part 2.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers.
If you’re not yet a full subscriber, you missed issues like Building a simple game and The ML toolset, among other articles. Subscribe to get two full issues every week. 👇
This is the second and final part of the response from the two of us – Gergely and Kent – to the McKinsey article “Yes, you can measure software developer productivity.” We believe that introducing the kind of framework McKinsey proposes is wrong-headed and certain to backfire. Such a framework will most likely do far more harm than good to organizations and to the engineering culture at companies – and the damage could take years to undo.
In Part 1 of this two-part article, we covered:
A mental model of the software engineering cycle
Where does the need for measuring productivity come from?
How do sales and recruitment measure productivity so accurately?
Measurement tradeoffs in software engineering
We now wrap up this topic with:
The danger of only measuring outcomes and impact
Team vs individual performance
Why does engineering cost so much?
How do you decide how much to invest in engineering?
How do you measure developers?
This article diverges into two voices at the end. Read Kent Beck’s thoughts in this article, published in his excellent newsletter, Software Design: Tidy First?
1. The danger of only measuring outcomes and impact
So far, sales and recruitment have been on something of an accountability pedestal, as both capture team and individual performance with indisputable metrics. However, we have seen the dark side of only focusing on measuring – and rewarding – outcomes and impact: people game the system for their own benefit in ways that defeat the purpose of measurement, and ultimately disadvantage the business by generating junk data.
Below, Kent shares what he’s seen happen when the only thing that matters is sales quotas:
“Individual goals discourage people from working together, as everyone is chasing their own quota. And this can lead to missed opportunities.
For example, take the opportunity to close a mega-sale that spans several regions. For this, multiple sales representatives must work together, and the sale would be credited to the representative who discovered the opportunity. But other sales folks are incentivized not to help. The company loses an attractive prospect without knowing it. Individual incentives can work against the long-term profitability goals of the company.
I saw firsthand how a ‘star’ salesperson operated. They always hit their quota and collected a hefty bonus. How did they do it? Well, they knew that sales could fall through at any time, so they always had several prospects ‘in the pocket’ whom they could convert at will – and they put these off until the end of each quarter, converting them only if they needed to hit their quota. In order to maximize personal gains, they did not maximize the company’s gain.”
In the area of recruitment, Gergely experienced how rewarding recruiters based on the number of candidates closed can backfire later:
“I once worked with a recruiter whom other hiring managers raved about. This recruiter had a 100% close rate – closing meaning that when a candidate gets an offer, we get them to sign. Back then, my part of the organization was so stretched that recruiters did most of the closing conversations and took care of the details. Most recruiters closed 70-80% at most. I was told this recruiter was a ‘rockstar’ among their peers.
I discovered in a painful way how they did it. About six months after this recruiter left, performance review and bonus time came around. Several engineers in our group complained about their bonus amounts, saying they’d been “guaranteed” 10x the amount they actually got. After some digging, all signs pointed to the rockstar recruiter; they’d made verbal promises to engineers in private settings which were outrageously untrue.
This recruiter focused on outcomes, and ignored several unwritten – as well as written – rules. It took us managers months to sort out the mess, and left engineers feeling tricked and with declining faith in the company.”
Measuring outcomes and impact is important, but there must be checks and balances which ensure outcomes are reached the right way. In the end, this is exactly what a healthy company culture is about. In contrast, in a “cutthroat” or toxic culture only easily-measurable outcomes and impact matter, and the ends always justify the means. A healthier culture takes outcomes and impact into account, and curtails the rewards of outcomes achieved in unprofessional ways, or in ways that ignore collaboration and the bigger picture.
2. Team vs individual performance
What’s more important, team performance or individual performance? Sport provides a pointer, as an industry where individual performance can be measured quite accurately.
Take soccer as an example. There are many examples which prove that team performance trumps individual performance: a team with objectively worse players can beat an opponent with more talented players by playing as a team. This was resoundingly proved when Greece won the Euro 2004 international soccer tournament with a squad ranked 15th most likely to triumph of the 16 national teams taking part. The secret behind this success? The documentary King Otto reveals it came down to teamwork, playing to players’ strengths, and an outstanding German coach, Otto Rehhagel.
It’s common for teams filled with star players to struggle for success, despite possessing individuals with objectively superior skills. The Spanish club team Real Madrid proved this with its “Galácticos” recruitment policy in the early-mid 2000s, where superstar players were signed but the team regularly failed to win trophies.
We see similar dynamics in software engineering: teams punching well above their skill and experience level thanks to working well together, high morale, and a manager with the right intuition. I’ve also seen a team filled with senior-or-above engineers struggle to deliver expected outcomes, suffering from low morale, confused direction, and poor management and leadership.
Let’s look at another sport: ice hockey. It uses an interesting statistic called “plus-minus,” which measures a player’s goal differential: how many more goals a team scores than it concedes while that player is on the ice. It’s a sort of “contribution to team success” indicator, and is useful for identifying players who make a team much more effective.
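To make the statistic concrete, here is a minimal sketch in Python with hypothetical data (and ignoring real-league subtleties, such as power-play goals being excluded):

```python
from collections import defaultdict

def plus_minus(goal_events):
    """goal_events: list of (our_team_scored, players_on_ice) tuples."""
    scores = defaultdict(int)
    for scored, on_ice in goal_events:
        delta = 1 if scored else -1  # +1 for a goal scored, -1 for one conceded
        for player in on_ice:
            scores[player] += delta
    return dict(scores)

# Hypothetical events: who was on the ice when each goal happened.
events = [
    (True,  ["ana", "ben", "cam"]),  # our goal: everyone on ice gets +1
    (False, ["ana", "cam", "dee"]),  # conceded: everyone on ice gets -1
    (True,  ["ben", "dee", "eva"]),
]
print(plus_minus(events))
# {'ana': 0, 'ben': 2, 'cam': 0, 'dee': 0, 'eva': 1}
```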
Could we find a kind of “plus-minus” indicator for software engineers? If such an indicator existed, it could be worth measuring. However, a five-on-five hockey game and a software engineering project with 5-10 engineers, designers, testers, and product specialists are very different. Hockey teams play games weekly, there are strict time limits, and the terms of victory are very clear: score more goals. In contrast, software projects tend to last much longer, may have no time limit, and there’s no simple scoring system.
Individual performance does not directly predict team performance. And if it’s not possible to deduce team performance from individual performance in a domain as easy to measure as sports, then we can expect even less success in software engineering.
Team performance is easier to measure than individual performance. Engineering teams track performance by projects shipped, business impact (eg revenue, profit generated, churn reduced etc), and other indicators, similarly to how sports teams track performance via numbers of wins, losses, and other stats.
3. Why does engineering cost so much?
The question “why does engineering cost so much?” comes up surprisingly often. Here’s a suggestion for how to tackle it:
Imagine a world where the company spends 0% of its budget on engineering. I know: it’s absurd. But do it anyway. What would this mean for the company? What would customers experience? How would the business trend?
Now, imagine the company spends 100% of its budget on engineering, and 0% on everything else. What would happen?
Now that we know the two extremes, ask: what percentage of the overall budget does the company actually spend on engineering? With this number in hand, the decision becomes what would happen if we moved it down by a few percentage points, or up by a few. For example, if engineering is 30% of overall spend, compare the likely outcomes at 25% versus 35%. Which approach would benefit the business more, and why?
This exercise turns the question from “why does engineering cost so much?” into a comparison exercise, where the decision is whether to reduce or increase engineering spend by $X, versus making this investment – or reduction – in another area.
4. How do you decide how much to invest in engineering?
Another common reason C-level executives want to measure the productivity of engineering is that they want a sense of how much it’s worth investing further in engineering, versus allocating the planned investment to, say, sales or marketing.
The real question an executive is asking, however, is not about productivity. The question is: “How much should we invest into engineering?”
To answer this question, consider that software engineering results are unpredictable, especially when measured on a small scale. Sure, there are industries where you know exactly what you want to build, and engineering is merely an execution game. But at companies where engineering innovates, the decision of what to invest in is more akin to how oil companies decide where to invest.
Oil companies don’t know for sure how much profit a drilling operation will generate, so they make smart investments. It’s impossible to tell whether any single exploratory drill will uncover a new, profitable oil field, so they fund several at once, expecting that some will eventually bring promising results. Armed with more data, bigger investment decisions are then made.
It is pragmatic for engineering leaders – and executives – to approach investing in software engineering, as a research and development activity, in a similar way: place many small, inexpensive bets, and double down on the ones that show tangible promise.
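As a back-of-the-envelope illustration of this betting logic, here is a minimal simulation sketch; the hit rate, cost, and payoff numbers are assumptions for illustration, not data:

```python
import random

def simulate_portfolio(n_bets=10, cost_per_bet=1.0, hit_rate=0.2,
                       payoff_on_hit=15.0, trials=10_000):
    """Average net return of funding n_bets independent exploratory bets."""
    total = 0.0
    for _ in range(trials):
        # Count how many of this trial's bets "hit" (pay off)
        hits = sum(random.random() < hit_rate for _ in range(n_bets))
        total += hits * payoff_on_hit - n_bets * cost_per_bet
    return total / trials

print(round(simulate_portfolio(), 1))          # ~20.0: the portfolio is positive
print(round(simulate_portfolio(n_bets=1), 1))  # ~2.0 on average, yet ~80% of
                                               # single bets lose money outright
```

The specific numbers don’t matter; the shape does. Any single bet usually loses, while a portfolio of cheap bets has a positive expected return, and the winners tell you where to double down.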
5. How do you measure developers?
This section is where our voices diverge. Below is my (Gergely’s) take on this question. Check out Kent Beck’s answer to the same question in this article. If you’ve not done so already, I recommend subscribing to Software Design: Tidy First?, written by Kent.

So, how do you measure developer productivity? Here is the framework I suggest.
Understand what the real need is. When someone asks how to measure developer productivity, it is never the true question. To discern what’s really being asked, consider these things: who is asking, and what is their real goal? The real topic will be something like:
“I need to decide which areas to invest more headcount in. Which allocation will give the business the best return?”
“I want to do performance management and identify low and high performers.”
“I want to pinpoint problematic teams, debug and fix them.”
“Our investors want us to reduce costs and I need to figure out how much I can cut without significantly impacting the business.”
“I need to justify the cost of engineering to the CEO who thinks we’re too expensive.”
Reframe the question. As an engineering leader, you’ll often get the request, “we want to measure the productivity of your team.” Instead of going along with it, take a step back.
Here is what Abi Noda – cofounder of DX, author of the Engineering Enablement newsletter, and co-author of the paper A new way to measure developer productivity – suggests for doing this (disclaimer: I am an investor and advisor at DX):

“My advice for eng leaders in this situation: reframe the problem. What the CEO wants is to know you’re being a good steward of their investment in engineering. You can demonstrate strong stewardship by providing a full picture of engineering, including:
Business impact
System performance (are our systems fast, reliable, etc)
And developer effectiveness (speed, ease, quality, satisfaction).”
Know that people optimize for what is being measured. Employees are smart enough to know that if a measure is used to evaluate them, then they should optimize that measure. This is captured by Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”
For example, I recall what happened when a team using Scrum started to measure whether or not they achieved sprint goals, as measured in velocity points. Our PM and EM started to describe teams that met their sprint goals as having ‘completed the sprint,’ and teams that didn’t as having ‘failed the sprint.’ We defined sprint goals in story points, not as the work committed.
The next thing I knew, one developer visibly working toward a promotion began to inflate task estimates, and to ‘sneak’ easy-to-complete tasks with high story-point estimates into the sprint. On paper, we did more story points.
I asked this developer why he was doing it, to which he replied that he didn’t want the sprint to fail, as it could make him look bad. The end result was that as a team we worked with much less focus, and it felt like people cared about story points, not building the stuff our customers wanted.
When you measure in the open, aim for team outcomes and impact, not effort and output. People will figure out how to “game” what you measure, and optimize for this. Here’s what happens when you measure each of the areas:
Measure effort: create high-effort busywork of dubious value
Measure output: increase the quantity of the output by what’s easiest to do. This might not help with outcomes or impact.
Measure outcomes: aim to beat targets, even if this means taking shortcuts
Measure impact: get creative in reaching this with less effort and output
Don’t ignore the effort and output people and teams produce, though! Instead of measuring them, use them to debug issues with outcomes or impact.
For example, if you measure lines of code produced and tie performance incentives to it, engineers will optimize for this number and increase tech debt. But if you measure outcomes and notice an engineer is barely shipping features, you will want to look into the code they produce, its quality, how they spend their time, and so on.
Be aware that frameworks which measure effort and output change behavior – and not always in obvious ways. If you measure the number of pull requests: people will create smaller pull requests, without changing much else. Is this a desirable thing? Perhaps it is: and if so, then introducing this measurement will shape the engineering culture in that direction.
However, be on the lookout for unexpected behavior changes. For example, it might not be desirable if people start splitting pull requests to separate code changes from test changes just to increase their PR count, as this approach is counterproductive.
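To see the mechanic, here is a minimal sketch of such a naive PR-count metric with hypothetical data, showing how the split inflates the number without adding value:

```python
from collections import Counter

# Hypothetical merged PRs; 'kim' split one logical change into two PRs.
merged_prs = [
    {"author": "sam", "title": "Add retry logic"},
    {"author": "kim", "title": "New checkout flow (code)"},
    {"author": "kim", "title": "New checkout flow (tests)"},
]

pr_count = Counter(pr["author"] for pr in merged_prs)
print(pr_count)  # Counter({'kim': 2, 'sam': 1})
# 'kim' looks twice as productive as 'sam' on this metric, while
# delivering one logical change; the split, not extra value, was rewarded.
```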
A problem with measurements focused on effort and output is that they transform the engineering culture into one where ‘slack time’ – free time between two pieces of work – is frowned upon. At companies that operate like factories, this might not be a problem. But at companies where engineering is a profit center, and a spontaneous interaction between colleagues during downtime can lead to prototyping and shipping a new idea, frameworks measuring effort and output will squash that culture.
Know that measurement interferes with the system. Every new thing you start measuring will lead to engineers optimizing to make that measure look better. The more you measure, the more you shape the culture and how people work.
It’s pretty easy to see how a highly productive engineering team becomes much less productive when the leadership starts to measure things, especially if it’s effort and output.
Keep in mind the risks of each measurement. Be skeptical when consultants claim it is possible to measure without impacting how people work. It never is. The best you can do is introduce changes that are actually desirable.
Running a high-performing team is a hands-on activity. Looking to metrics for numbers that show a team is high-performing is wishful thinking. In practice, the way you can tell you have a high-performing team is:
The impact of the team is in line with, or above, expectations. Look at impact measurements like revenue generated, contribution to profit, cost reduction, or other measures that tie back to profitability, growth, or other key business metrics
Engineers on the team are working efficiently – in the way that ‘efficient’ makes sense for this team
The leader (or leaders) on the team are hands-on enough to spot issues with execution, and address them promptly
Take two teams. Team A communicates their impact clearly, and has a hands-on leader. Team B has fancy metrics and a hands-off leader. I will take Team A over Team B, anytime.
Yes, you can measure developer productivity, but at what cost? McKinsey is not wrong to state that developer productivity can be measured. However, they avoid the question of what those measurements cost.
It is possible to measure the impact of an engineering team and an engineering organization – and you should be doing this, if you’re not already! For example, at Uber, my team had a wiki page where we listed our current and completed projects and their impact, broken down by quarter. Here is how it looked (🔒 document with more examples)
I found it curious that relatively few teams captured their impact in the format my team did. Having “hard impact data” made discussions about everything much easier: priorities, reorgs, headcount allocation, and so on.
Attributing this impact to individual engineers is also possible, but the more granular you make the attribution, the more this interferes with incentives, and the more likely you are to create an organization which incentivizes busywork.
My suggestion is that if you measure, then start with impact. Not individual impact: but team impact.
And, of course, as an engineering leader: stay close to the work: be hands-on when you can, and definitely remain technical.
If the impact is off, roll up your sleeves and debug the issues – which involves looking at effort, and outputs. But don’t default to measuring effort and output, “just in case” there could be an issue with outcomes or impact.
Related articles:
How do you measure developers? Kent Beck’s take
A new way to measure developer productivity – from the creators of DORA and SPACE
The full circle on developer productivity with Steve Yegge
Measuring software engineering productivity with Laura Tacho
Platform teams and developer productivity with Adam Rogal, director of developer platform at DoorDash
Hire Faster With The Pragmatic Engineer Talent Collective
If you’re hiring software engineers or engineering leaders, join The Pragmatic Engineer Talent Collective. It’s the #1 talent collective for software engineers and engineering managers. Get weekly drops of outstanding software engineers and engineering leaders open to new opportunities. I vet every software engineer and manager - and add a note on why they are a standout profile.
Companies like Linear use this collective to hire better and faster. Read what companies that are hiring say. And if you’re hiring, apply here:
Featured Pragmatic Engineer Jobs
Senior Frontend Developer at TalentBait. €60-80K + equity. Barcelona, Spain.
Technical Lead at Alby. £95-120K + equity. London or Remote (UK).
Senior Software Engineer, Missions at Ably. £80-100K + equity. Remote (UK).
Senior Software Engineer at LatchBio. $120-220K + equity. San Francisco.
Software Engineer at Freshpaint. $130-210K + equity. Remote (US).
Senior Software Engineer, Developer Ecosystems at Ably. £80-100K. Remote (UK).
Senior Web Engineer, Activation at Ably. £75-85K. Remote (UK).
Web Engineer at Ably. £70-75K. Remote (UK).
Founding Engineer at Layerup. $120-180K + equity. San Francisco.
Founding Engineer at Hotplate. $165-195K + equity. San Francisco.
See more senior engineer and leadership roles with great engineering cultures on The Pragmatic Engineer Job board - or post your own.