How Uber is Measuring Engineering Productivity
Inside Uber’s launch of the Eng Dashboard. How do engineers and managers feel about this new tool, and which metrics does it track?
On Thursday, 4 August, Uber held an All-Hands meeting. Presenting at the event was Uber’s CEO, Dara Khosrowshahi.
During the meeting, Dara showcased a new tool for all of engineering: the Eng Metrics Dashboard. It’s a dashboard showing pull request metrics (which Uber calls ‘diffs’), code review metrics, and focus time stats. This is what it looks like at first glance:
For the past two months, I’ve been talking with software engineers and engineering managers about their experience with this new tool, and I’ve seen their views on its usefulness shift. In this issue, we venture inside Uber to get a sense of what the Eng Metrics Dashboard is, the problems it attempts to solve, and feedback from engineers.
We cover:
The history of tracking coding stats. Uber has tracked pull requests – referred to as diffs – before, but in a way which was hidden from engineers. We also look at how Facebook and Amazon have similar approaches in place for tracking pull requests.
Uber’s Eng Metrics Dashboard: a high-level overview. A look at what it is and the metrics it tracks.
Why is the Eng Metrics Dashboard sponsored by the CEO? It’s highly unusual for a CEO to not only care about, but also to champion, an engineering productivity tool. My analysis on why it’s happened at Uber.
Initial worries about the new approach. Upon the announcement, both software engineers and engineering managers were worried this tool could be misused. Two months on, were their concerns valid?
Spotting outliers and regional differences. Using the tool to spot outliers, and how different regions have varying diff and code review counts.
Measuring software engineering productivity. I’ve covered this topic before: how does Uber’s approach stack up against practices worth following, and those best avoided? My analysis.
Appendix: Uber’s Eng Metrics Dashboard: a visual walkthrough. A mockup of most parts of the Eng Metrics Dashboard. Subscribers can access the full mockup as a single document in that section.
1. The history behind tracking coding stats
The Eng Metrics Dashboard has not appeared from nowhere. The first version of a tool at Uber to track high-level metrics for developer coding productivity was built around 2017. It was led by the Developer Platform organization, under VP of Engineering Matthew Mengerink, who had recently joined from Google. That tool included statistics like aggregated diff count at an organization level.
Access to the first version of the tool was limited to director-and-above levels. Engineering leadership believed data on individuals would result in engineers optimizing for the wrong behaviors. This worry was why usage was limited to organization leaders and why the tool supported only aggregate reporting.
I was aware of this tool while at Uber, but I never had access due to not being at director-level. During 2020 – my final year at the company – there was talk of potentially making the tool a bit more accessible at manager-level, but I never saw it happen.
Engineering leadership sometimes quoted statistics for all of engineering, derived from the tool. For example, at one All-Hands during 2020, a statistic was shared showing that software engineering productivity did not drop after the company adopted an all-remote work pattern, but in fact went slightly up. This was displayed on a graph of the number of diffs per engineer per week, across all of engineering.
Back in 2020, I thought this was a clever way to use diff statistics. There was no target on diff frequency, and most engineers were unaware that leadership even looked at this data in aggregate. Because of this, engineers did not optimize their work to produce more diffs, and I appreciated how this type of data collection surfaced genuinely interesting information.
Some organizations were tracking diff count per engineer before the Eng Metrics Dashboard. I talked with an engineer who shared how their organization built a custom dashboard to track the number of diffs per engineer per week, before a company-wide dashboard launched. Leadership in this organization already encouraged engineers to target 4–5 diffs per week, or about one code change request per workday.
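To make the math behind such a dashboard concrete, here is a minimal sketch of how per-engineer weekly diff counts could be aggregated and compared against a target. The diff records, engineer names, and target value below are made up for illustration; this is not Uber’s data model or implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical diff records: (author, date the diff landed). Illustrative only;
# not Uber's actual data.
diffs = [
    ("alice", date(2022, 8, 1)),
    ("alice", date(2022, 8, 3)),
    ("bob", date(2022, 8, 2)),
    ("bob", date(2022, 8, 4)),
    ("bob", date(2022, 8, 5)),
]

# Count diffs per (engineer, ISO year-and-week).
weekly_counts = Counter(
    (author, tuple(landed.isocalendar()[:2]))  # key: (author, (year, week))
    for author, landed in diffs
)

TARGET_PER_WEEK = 4  # the "4-5 diffs per week" guideline mentioned above

for (author, (year, week)), count in sorted(weekly_counts.items()):
    status = "meets" if count >= TARGET_PER_WEEK else "below"
    print(f"{author}, {year}-W{week:02d}: {count} diffs ({status} target)")
```

The point of the sketch is how little is involved: once diffs are tagged with an author and a landing date, a per-engineer weekly count is a one-line aggregation, which is partly why teams end up building these dashboards themselves.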
Approaches at Amazon, Facebook and GitLab
Unsurprisingly, Uber is not alone in building tooling that gives engineers a sense of engineering productivity. Amazon and Facebook have similar tools in place.
At Amazon, the tool is called Crux. From my article Inside Amazon’s Engineering Culture:
Crux: Amazon’s code review system. People can see stats for everyone on their team: number of pull requests, number of code reviews, and lines-of-code statistics.
I talked with an engineer at Amazon who shared that software development managers (SDMs) know the metrics in Crux can be “gamed,” for example via automated commits or trivial code submissions. This person believed Crux statistics mattered very little – if at all – for promotions.
Here’s what a former Amazon software engineer, who left in 2021, shared about this tool:
“Crux shows the standard code statistics stuff: the number of reviews you did and submitted, how many revisions it took on average to get approved, and when these were submitted in terms of time of day and days. It also visualizes lines of code (LOC) statistics.
While I was working on my promotion document, my manager specifically called out that Crux is only one datapoint. My manager also said that people can game metrics by submitting really long but trivial pieces of code, such as auto generated stuff, or deliberately doing lots of very small commits. At Amazon, everyone I've worked with has favored smaller, often stacked commits because these are easier to read and faster to review.
I don't think Crux changed how we wrote code or did code reviews.”
Facebook also has a similar internal tool for code metrics. As I cover in Inside Facebook’s Engineering Culture:
Internal manager tools for code metrics: both engineers and managers have access to internally built tools to visualize code-level metrics like lines of code (LOC) and diff count (basically, the number of pull requests, or PRs). People can add others to compare stats against their own numbers.
There are no targets at Facebook either, and engineers I talked with did not think stats carry much weight during the performance review process. Of course, there can be exceptions when these statistics are relevant, such as performance reviews for the Coding Machine archetype.
GitLab has been setting targets to increase merge requests (MRs) for years. GitLab shares its engineering objectives and key results (OKRs) publicly. Since 2020, the company has put goals in place to increase engineering productivity, and one way it has measured productivity is by tracking merge requests per engineer. In 2020, the company set a goal to increase average merge requests per engineer by 20% across all teams.
The company tracked both the average merge requests per engineer per month, and total merge requests across all of engineering and its subgroups. After the focus on increasing merge requests in 2020 and Q1 2021, I have seen no further goals published for improving or tracking this metric.
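For comparison, the arithmetic behind a “+20%” goal like GitLab’s is straightforward. The sketch below uses made-up baseline and current values, not GitLab’s actual figures:

```python
# The arithmetic behind a "+20% average MRs per engineer" goal.
# Baseline and current values are hypothetical, not GitLab's real numbers.
baseline_mrs_per_engineer = 8.0   # hypothetical monthly baseline
current_mrs_per_engineer = 9.2    # hypothetical current month

target = baseline_mrs_per_engineer * 1.20
change_pct = (current_mrs_per_engineer / baseline_mrs_per_engineer - 1) * 100

print(f"Target: {target:.1f} MRs per engineer per month")
print(f"Current: {current_mrs_per_engineer:.1f} ({change_pct:+.1f}% vs baseline)")
```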
2. Uber’s Eng Metrics Dashboard: a high-level overview
The Eng Metrics Dashboard has three main parts: