Measuring Engineering Efficiency at LinkedIn
Learnings and insights from a principal engineer at LinkedIn and veteran of developer tools and productivity, Max Kanat-Alexander.
👋 Hi, this is Gergely with a 🔒 subscriber-only issue 🔒 of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers.
A few months ago, I tweeted an observation that many of the developer productivity tools vendors build seem designed for CEOs and CFOs to buy, not to make engineering teams more productive. This is a big difference! Max Kanat-Alexander, principal engineer at LinkedIn, replied, saying:
“The more I read in this space, the more I want to publish what we are doing at LinkedIn.”
Max is an expert when it comes to developer productivity; he’s been in this segment for close to twenty years, spending the majority of it working on tools for software engineers. Over the past decade, he’s spent more and more time working in the developer productivity problem space.
I jumped at the opportunity to learn more from Max about his experiences in this field, and to get a glimpse of what teams at LinkedIn are doing – and why.
In today’s issue, we cover:
Max’s career path
Developer experience learnings from Google
Developer productivity goals at LinkedIn
Developer productivity approaches at LinkedIn
Developer productivity dashboards
The road ahead
Why is developer productivity so hard to measure?
We’ll also take a look at developer productivity dashboards that LinkedIn has built, like this one:
With that, it’s over to Max.
1. Max’s career path: Bugzilla, Google, YouTube, LinkedIn
I worked on Bugzilla for eight years. I started my career as a technical support engineer at a company then called Kerio, which made mail servers and firewalls. They wanted a bug tracker put in place, and the CEO asked if I had experience with Bugzilla. I’d done some volunteer bug triaging for the Mozilla project during college, so I was given the task of installing Bugzilla for the company.
The problem was that Bugzilla was ugly. However, there was a Red Hat fork that looked a bit nicer and ran on Postgres instead of MySQL, which the main project used. I liked Postgres better, so I installed this version. However, Red Hat later abandoned the fork, which became a problem when we needed to upgrade. There I was, having to choose between migrating our Postgres database over to MySQL, or adding Postgres support to Bugzilla. Well, of course I decided to add Postgres support to Bugzilla! And I refactored the whole database layer while I was at it.
To get my changes out, I had to become the release manager of the Bugzilla project. To become a release manager, I had to start doing a lot of code reviews. And while doing all this, I worked my way into becoming one of the two co-leads of the Bugzilla project!
This is kind of how things go in open source. If you want to do the work, you will eventually be in charge of the thing. This might be less true on larger projects, but for small projects, this is how it goes. It’s how I got into the Bugzilla project, and then did lots of refactoring, cleaning up of spaghetti code, and bringing the team along. Later, we became the most popular bug tracking system in the world!
Eight years at Google. I joined Google as the guy who was the architect of Bugzilla, a project written mostly in Perl and deployed on Linux boxes. On my first day at Google, my manager asked me, “Do you want to work on YouTube for the Xbox 360 in C# and Silverlight, developed on a Windows machine?” I said, “Okay, let me do it: it’s a new and different experience.”
And gosh, it was great. Visual Studio with C# was by far the best development environment I’d had up until that point. I don’t think people realize how good Microsoft is at developer tools, and instead think of them as an enterprise company. But my view is that Microsoft is the best developer tools company in the world.
Within Google, I quickly gravitated towards developer productivity. I became the technical lead for Code Health across YouTube, helping developers become more productive, working on large-scale refactorings, development practices, test frameworks and education, across all YouTube teams. I did a year-long stint working on the Java platform, and then moved to be technical lead for Google’s Code Health efforts. I was also the main author of Google’s code review guidelines.
Three years at LinkedIn. I joined LinkedIn more than three years ago, to work on developer tools and productivity, which is where I am today.
2. Developer experience at Google
I worked for eight years at Google, much of it on developer productivity. Here are a few things I learned about this space at the company. Note that I’m no longer a Google employee, so my views don’t represent those of Google.
At Google, I was a member of the Code Health group, and later led this collective. Within the group, we frequently had people show up to our meetings and ask:
“What metric should I use to measure code quality?”
We would tell them the same thing, every time:
“Don't use quantitative metrics to measure code quality.”
People would come and ask us about capturing more exotic measurements, for example “cyclomatic complexity,” which measures the number of potential paths through a function. Time and again, we would explain that these metrics are minimally useful, and that what matters for code quality are characteristics like simplicity, maintainability, and the like.
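To make the concept concrete, here is a rough sketch of what cyclomatic complexity counts. The AST-walking heuristic below is my own simplification for illustration, not the formal metric or any tool used at Google:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Rough cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    # Count branching constructs; each one adds a potential path.
    decisions = sum(
        isinstance(node, (ast.If, ast.For, ast.While, ast.BoolOp))
        for node in ast.walk(tree)
    )
    return 1 + decisions

snippet = """
def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # two `if` nodes -> 3
```

Note that a high or low score here says nothing about whether the code is easy to read or safely modify, which is exactly why we steered people away from leaning on it.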
Over time, I started to develop a deeper and more nuanced approach to the question of which metrics to measure:
“Simple code” means “easy to read, understand, and correctly modify.” This was my first realization, as I built up a mental model of code quality.
Code quality is experienced by a human being; it isn’t a quantity to be measured. There’s no “amount” of simple code. Instead, you have to find out from human beings what they are experiencing.
How to measure developer productivity? This was a question I kept getting during my tenure at Google, and not just from staff, but also from people outside the company. We knew commonly used metrics like “lines of code” were bad. A quote I’m fond of on this topic is from the computer scientist Edsger W. Dijkstra, writing in 1988:
“If we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.”
I had easily more than 100 conversations about this over the course of 10 years, with a variety of people. I eventually realized nobody could measure productivity because they had no idea what the word “productivity” meant.
When people can’t figure out how to measure something, it’s because they haven’t defined the thing they’re trying to measure. Measuring productivity was like trying to measure “foobar.” No one I talked to could define what it was they wanted to measure.
In 2017, I ended up publishing some general thoughts on measuring developer productivity, in which I touch on the importance of defining what we want to measure.
I wrote a very popular internal document at Google about some suggestions for developer productivity measurement. It included a list of experiments we could do to progress in this field. In the end, I didn’t get around to doing any of them.
Still, the popularity of the document was eye-opening. Developer productivity was something a lot of people cared about very deeply, and really wanted to solve!
Years after writing this internal document, I engaged in a project to reduce the time automated tests took to run. I worked on it for about three months and our work impacted tens of thousands of developers at Google, by optimizing lots of details on automated test execution.
As part of this project, I wanted to figure out if the project had accomplished anything that mattered. I dug through the results and confirmed we’d saved a lot of machine time, thanks to the tests running quicker. However, I could not find evidence that we’d saved human time, at least not in a way we were able to detect. I took my analysis a step further, and wrote a document in which I came to the conclusion that optimizing for human developer time is almost always the only worthwhile investment. I used actual numbers to come to this conclusion:
The value of human time at most software companies is 20-100x the value of machine time. This means it’s possible to lose money by doing optimizations which save only machine time, but not any human time!
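A quick back-of-the-envelope calculation shows why. The dollar figures below are hypothetical, chosen only to illustrate the 20-100x ratio:

```python
# Hypothetical rates, for illustration only: an engineer-hour at
# $100 and a machine-hour at $1 gives the 100x ratio mentioned above.
human_rate = 100.0    # $/hour (assumed)
machine_rate = 1.0    # $/hour (assumed)

# Suppose an optimization saves 1,000 machine-hours per year, but
# takes 20 engineer-hours to build and maintain:
machine_savings = 1000 * machine_rate   # $1,000 saved
human_cost = 20 * human_rate            # $2,000 spent
net = machine_savings - human_cost
print(net)  # -1000.0: the "optimization" loses money overall
```

With these (made-up) rates, even a sizable machine-time saving is wiped out by a modest amount of engineer time, which is the point: unless an optimization also saves human time, it can easily be a net loss.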
Towards the end of my tenure at Google, I started to read more documents and papers published by the company’s Engineering Productivity Research team. I learned a lot from Ciera Jaspan, who was the Tech Lead of that group. See Ciera’s authored publications here.
The “Goals, Signals, Metrics” framework is the most important thing I learned from Ciera, while at Google. This is a framework for defining metrics:
Define the actual goal you’re trying to accomplish. Opt for a descriptive definition: “here is what we actually want to accomplish with our work,” not the vaguer “I want to measure X.”
Define signals. How would you know if you accomplished that goal, if you had infinite knowledge of everything? These are called “signals.” For example, a signal is “how happy are people with my product?” You cannot directly know how happy people are with your product, unless you’re magically all-knowing. However, if having happy customers is one of your goals, then this is the right signal for that.
Figure out your metrics. These are proxies for that signal. Metrics are always proxies, there are no perfect metrics.
There’s a lot more to know about that framework. In 2019, Ciera gave a talk where she walked through the framework and gave examples. It’s been super useful.
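The three steps above can be sketched as a simple data structure. The example goal, signal, and metrics below are illustrative stand-ins of my own, not actual definitions from Google or LinkedIn:

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str     # a measurable proxy for a signal
    caveat: str   # every metric is a proxy, so record its limits

@dataclass
class Signal:
    question: str                 # what you'd observe with infinite knowledge
    metrics: list = field(default_factory=list)

@dataclass
class Goal:
    description: str              # what we want to accomplish, not "measure X"
    signals: list = field(default_factory=list)

# Hypothetical example: a build-speed goal, one signal, two proxy metrics.
goal = Goal(
    description="Developers can build and test changes without long waits",
    signals=[
        Signal(
            question="How much time do developers lose waiting on builds?",
            metrics=[
                Metric("p50/p90 local build duration",
                       "ignores the context-switching cost of waiting"),
                Metric("survey: satisfaction with build speed",
                       "subjective, and only sampled periodically"),
            ],
        )
    ],
)
print(len(goal.signals[0].metrics))  # 2
```

The value of writing it down this way is that each metric stays explicitly attached to the goal it serves, along with its known weaknesses, rather than floating free on a dashboard.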
One of the most effective things the Code Health group did at Google was add a question on code complexity to the company’s internal engineering survey. Adding a question to a Google-wide engineering survey is a big deal, because it makes leadership pay attention when areas don’t get good scores. Most engineering managers want their teams to ship products, but also want their reports to be happy. If a survey result indicates engineers are unhappy with an area, most managers want to do something about it. They might not know what to do, but they at least become motivated to act.
3. Developer productivity goals at LinkedIn
At LinkedIn, the project to measure developer productivity started when the leads of the Tools organization wanted to better understand the needs of our internal customers. To do that, we needed to look at our success with actual data from internal customers. The first meetings about understanding their needs through data took place just before I started at LinkedIn. Our original goals were to:
Transform internal developer tools through a deep understanding of our developers’ needs.
Help transform engineering as a whole, through data-driven insights.
3.1 Transform engineering through data-driven insights
Across the software industry, many companies have adopted the “religion” of being data-driven when it comes to user-facing products. It’s a big area of focus when training new Product Managers: how to collect data, how to present it, how to analyze it, and what to do with the results. However, this is not necessarily an area of focus for software engineers.
Most internal tooling teams don’t have product managers, so it’s up to chance whether a team has the capability or know-how to plan feature roadmaps and priorities based on data. When internal tooling teams do have PMs, those PMs often struggle to have impact, meaning the role isn’t a great fit for many of them. Still, not having a product manager creates a skill-set gap.
Many internal tools teams without product managers do look at and operate on data. However, setting up data pipelines, creating effective and easy-to-understand visualizations, and putting a planning process in place which incorporates them is not part of standard software engineer training anywhere.
It’s pretty much hit-or-miss as to whether an internal tooling team will operate on data insights. This is especially true when considering if they’d like to do so in a way that’s deeply baked into the culture of the team.
When I say, “transform engineering as a whole through data-driven insights,” this is what I’m talking about: to make it easy for every team to be able to operate on data about tools, productivity, craftsmanship, and related areas.
3.2 Setting realistic goals
How we went about setting ‘realistic’ goals is an interesting topic, because there are different ways of being realistic: