AI Tooling for Software Engineers: Rolling Out Company-Wide (Part 3)
How large tech companies are using internal AI tools. Also: guidelines and practical approaches for embracing LLM tools for software development, at the individual developer and organizational levels
Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox every week, subscribe:
Before we start: you can now use “table of contents” quick navigation on the right side of each article, when reading the newsletter on the web. Just click on the sidebar, and you can navigate this article – and all other Pragmatic Engineer articles. See it in action on the web. Happy browsing!
There’s no shortage of big claims about what LLM tools will be able to do, or should be able to do in the software engineering field. But what do they actually do, right now? We asked software engineers who regularly use these tools, and engineering leaders who oversee these tools in their organizations.
This article is based on a survey of 216 professionals and is the third and final part of a mini-series on GenAI tooling. It covers how these tools are being used ‘day-to-day’ in tech workplaces, and what engineers think about them. Today, we cover:
AI usage guidelines. A quarter of respondents follow company-wide usage guidelines. Interestingly, a minority of companies have banned GenAI tools over security and copyright worries.
Internal LLMs at Meta, Microsoft, Netflix, Pinterest, Stripe. Large, fast-moving companies not only embrace GenAI tools, but build their own internal versions for their engineers. Vendors are starting to offer similar products out of the box.
Reservations and concerns. Most common reservations, and how to overcome them – and why devs tend to start using LLMs regularly.
Advice for devs to get started with AI tools. Start small, verify outputs, don’t “outsource” coding, and other advice.
Advice for engineering leaders to roll out AI tooling, org-wide. A roundup of how companies adopted these tools successfully: guidelines, tooling, training, and how these impact junior engineers.
Measuring the impact of GenAI tools. Most engineering leaders say these tools have no visible or measurable impact – at least not yet. We suspect this is because the impact is hard to measure, not because there is no impact.
AI strategy. Why do companies incorporate GenAI into their software engineering workflow? Experimentation and hoping for increased productivity are two big reasons.
In Part 1 of this series, we covered:
Survey overview
Popular software engineering AI tools
AI-assisted software engineering workflows
The good
The bad
What’s changed since last year?
Part 2 was about:
What are AI tools similar to?
State of AI tooling in 2024: opinions
Critiques of AI tools
Changing views about AI tooling over time
Which tasks can AI already replace?
Time saved – and what it’s used for
Now, let’s dive into the final part of this mini-series.
1. AI usage guidelines across companies
We asked survey participants, “how is AI tooling used for development at your company?” The responses reveal a range of approaches.
The most referenced ones:
No formal guidelines. Around 25% of respondents (53 out of 216) say their company has no recommendations about AI tooling. People use it as they see fit.
Some guidelines. Around 23% (50 respondents) say their workplaces have rules and guidelines for AI tooling.
AI tools banned. Circa 12% (25 responses) say their businesses ban usage of AI tools, mostly due to concerns about code security, and potential copyright infringement. We previously covered how several open source projects have banned AI-generated code commits for this reason.
Working on guidelines. 7% of respondents (15 people) share that their company is trialing AI tooling, or is in the process of adopting guidelines.
Strongly encourage AI tool usage. 6% of respondents (12 people) work at places which encourage using these tools wherever possible.
Common features of guidelines across workplaces, based on survey responses:
Specifying which tools and LLM models may be used
No inputting of sensitive information into AI tools like ChatGPT
No entering of internal (closed-sourced) code into AI chat tools
It’s pretty clear some guidelines are responses to fears that LLMs may retain the data employees input, and use it for training. This is also why a handful of respondents shared that their companies go through the added complexity of running LLMs on their own infrastructure. It’s a reminder that LLM solutions which don’t store company data have a strong selling point for tech companies.
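To make this concrete, here is a minimal sketch of what an automated pre-send check behind such guidelines could look like. This is illustrative only: the `ALLOWED_MODELS` list and the detection patterns are hypothetical stand-ins for whatever a company’s security team actually maintains.

```python
import re

# Hypothetical allowlist; a real company would maintain this centrally.
ALLOWED_MODELS = {"gpt-4o", "claude-3-5-sonnet"}

# Simple patterns hinting at sensitive content; real filters are far more thorough.
SENSITIVE_PATTERNS = [
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),              # hardcoded credentials
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                     # US SSN-like numbers
]

def check_prompt(model: str, prompt: str) -> None:
    """Raise if sending this prompt would violate the usage guidelines."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Model '{model}' is not on the approved list")
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("Prompt appears to contain sensitive data; blocked")

check_prompt("gpt-4o", "How do I reverse a linked list in Python?")  # passes
```

In practice, checks like these tend to live in an IDE plugin or a network proxy rather than in each developer’s code, but the policy shape is the same: an allowlist of tools and models, plus filters on what may be sent to them.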
2. Internal LLMs at Meta, Microsoft, Netflix, Pinterest, Stripe
Only a fraction of respondents say their companies strongly encourage the use of LLM tools, but some of these companies are cutting-edge market leaders in tech. Let’s take a look at how a well-built internal LLM can help a business.
Meta
The social media giant has been investing heavily in ML and AI since before ChatGPT was released. Back in 2022, we covered how Meta was already preparing for AI/ML ‘wartime’ by investing heavily in AI hardware, and by hiring large numbers of AI and ML engineers. This investment has not slowed down since, and it’s little surprise that Meta seems to have built one of the leading in-house AI tools.
Meta’s internal tool is called Metamate. Director of Engineering Esther Crawford describes it:
“It’s an AI for employees that’s trained on an enormous corpus of internal company docs. I use it all the time for efficiency gains.
Any sizable company operating without an internal AI tool is already behind the curve.”
Esther explains what Metamate does:
“It has a ton of capabilities, from summarizing to coding. Simple use cases: [...]”
Here’s a practical example of how useful Meta’s tool is, from Shana Britt E, director of strategic initiatives:
“Recent use case: Performance reviews. Writing self-review, cleaning up peer reviews. For self-review, it can capture your diffs landed, status updates about your work from documents you published, etc. and puts it in a nice summary that you can then review and edit.”
Microsoft
The company offers Microsoft Copilot for Microsoft 365 to enterprises, and is dogfooding this system internally. I talked with software engineers who confirmed that the internal Microsoft Copilot is integrated with internal documents, and can thus provide more relevant context. It is also used in places like pull request reviews – although for this use case, I hear the quality of its feedback is hit-and-miss.
Stripe
The fintech company has a similar system to Metamate. Miles Matthias, product manager, shares:
“We have something similar [to Metamate] at Stripe and I spend a bunch of my time talking to it. I can imagine a world where I’m basically having a voice conversation with it all day every day as ‘work’ - especially when agents boom.”
Netflix
The company offers a central place to access Netflix-provided versions of LLMs. A senior software engineer told us:
“There are AI guidelines, and corporate-provided versions of GPT, Claude and other models in a unified interface. People can share prompts that they find useful to colleagues.
My org is also exploring AI for our specific use cases, but thus far have not found any AI tools to be where we need. There is an opportunity to automate some manual business processes and we thought GenAI could help, but it seems traditional engineered solutions are still much better than GenAI."
Pinterest
The company builds internal LLM tools. One clever utility is called Text-to-SQL: a feature where internal users describe the query they need in plain text, and the tool generates the right SQL to run in Querybook, the company’s internal big data query tool. The engineering team improved the first version with RAG, to help the model identify the right table names to use (we previously published a deep dive on applied RAG). The results are promising. As the company shares:
“We find a 35% improvement in task completion speed for writing SQL queries using AI assistance.”
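Pinterest hasn’t published reusable code for this, but the general shape of a RAG-assisted text-to-SQL pipeline is straightforward. Below is a minimal sketch under stated assumptions: the `vector_index.search` and `llm.complete` interfaces are hypothetical stand-ins for a real vector store and LLM client, and the prompt wording is illustrative.

```python
# Rough sketch of RAG-assisted text-to-SQL; not Pinterest's actual implementation.

def text_to_sql(question: str, vector_index, llm) -> str:
    # Retrieval step: find the table schemas most similar to the question,
    # so the model works with real table and column names instead of guessing.
    candidate_tables = vector_index.search(question, top_k=5)

    schema_context = "\n".join(
        f"Table {t.name}: columns {', '.join(t.columns)}" for t in candidate_tables
    )

    # Generation step: ask the LLM to write SQL grounded in those schemas only.
    prompt = (
        "You are a SQL assistant. Using only the tables below, write a SQL query "
        "that answers the question.\n\n"
        f"{schema_context}\n\n"
        f"Question: {question}\nSQL:"
    )
    return llm.complete(prompt)
```

The retrieval step is what the RAG improvement buys: without it, the model has to guess table names, which is exactly where early text-to-SQL tools tended to fail.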
Vendors offering similar capabilities
There are plenty of vendors offering a “Metamate-like” experience out of the box. Glean seems to be the leader in this area. Other options include Village Labs, Microsoft Copilot for M365, Coveo and Akooda. This category is relatively new and there are plenty of up-and-coming startups. Search for terms like “AI company knowledge management tools” to find them.
The perceived productivity of these systems rarely matches reality, though. Despite being a leader in the AI field, Meta is still figuring out how these tools can help it operate more efficiently. Metamate sounds impressive – and it’s ahead of what most companies have – but it doesn’t work optimally just yet, according to current Meta engineers I’ve talked with.
The reason companies like Meta are investing so much into this area was articulated by CEO Mark Zuckerberg two months ago, on the company’s earnings call. He talked about how AI investments will take years to pay off, and Meta wants to be early. He said:
“You can also get a pretty good sense of when things are going to work years in advance. And I think that the people who bet on those early indicators tend to do pretty well, which is why I wanted to share in my comments the early indicator that we had on Meta AI, which is [...] early.”
3. Reservations and concerns
When starting to use AI tooling, companies and developers often need to overcome reservations, or find workarounds. Let’s start by summarizing these reservations.
Reasons for not using AI tooling
Reasons for disallowing – or heavily limiting – AI tools include security and privacy worries, especially about internal confidential information and proprietary code being leaked. A few respondents also mention customer data.
Several larger companies have worked around these concerns by using in-house, self-hosted LLMs, whose inputs and outputs their security and compliance teams filter. This approach is clever:
Security and compliance teams can tweak filters to catch confidential or customer information that shouldn’t be shared
If confidential information is fed into a self-hosted model, the data never leaves the company for an external vendor
The obvious downside is that such a setup is not trivial to build and maintain. However, given that leading tech companies already have internal models and are heavy users, it’s likely other businesses will follow: either building in-house, or using a vendor offering hosted LLMs whose filters internal security teams can tweak.
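As an illustration, the filtering layer respondents describe can be as thin as a proxy that screens traffic in both directions. This is a minimal sketch, assuming a self-hosted model exposed through a `model.generate()` call; the redaction rules are hypothetical placeholders for whatever detection logic a security team actually maintains (regex sets, classifiers, DLP tooling).

```python
import re

# Example inbound rule; real deployments maintain far more thorough rule sets.
INPUT_RULES = [
    (re.compile(r"(?i)customer_id\s*[:=]\s*\d+"), "customer_id=<REDACTED>"),
]
OUTPUT_RULES = INPUT_RULES  # outbound rules often overlap with the inbound set

def redact(text: str, rules) -> str:
    """Apply each (pattern, replacement) rule to the text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

def handle_request(user_prompt: str, model) -> str:
    # Inbound filter: strip confidential or customer data before the prompt
    # reaches the model, so nothing sensitive lands in logs or training sets.
    clean_prompt = redact(user_prompt, INPUT_RULES)

    # The model runs on company infrastructure, so even unfiltered text
    # would not leave for an external vendor.
    raw_response = model.generate(clean_prompt)

    # Outbound filter: catch anything sensitive the model might echo back.
    return redact(raw_response, OUTPUT_RULES)
```

Because both filters sit in a proxy the security team owns, rules can be tightened without touching the model or any client tooling.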
Developers’ reservations
But it’s not just companies dragging their feet; in the survey, developers also shared their own hesitations about LLMs:
Commonly cited ethical and environmental concerns:
“The model used to power most AIs represents a large theft of labor from the commons, all to deliver increasingly terrible results.” – Senior software engineer, 19 YOE
“I have ethical concerns about code theft, and environmental concerns about energy consumption.” – Project lead, 9 YOE
“I feel their massive energy use goes against my personal code of ethics” – Software engineer, 8 YOE
“I am uncomfortable with its resource and energy usage, biases and AI hype, as ways to concentrate even more money and power at big tech companies and their culty leaders, which feels too adjacent to the Blockchain hype and grifts from a few years back for comfort.” – Software engineer, 40 YOE
These criticisms are valid. Large language models are known to be trained on copyrighted code, as well as on copyleft-licensed code whose license terms are not complied with. And the surge in energy usage is also real, as covered in Is GenAI creating more carbon pollution by cloud providers?:
“It appears that the latest technological wave of GenAI may be getting in the way of corporate climate goals. Large language models (LLM) are very hardware and energy-intensive, and Azure, Google Cloud and AWS have all rapidly expanded their data center capacity and power usage, in order to meet demand. With energy usage surging, so are carbon emissions; which is the opposite direction from what they need to be going in, if these companies are to hit Net Zero in 2030 or any time after.
Google: carbon emissions up 48% in 2023, compared to 2019
Microsoft: carbon emissions up 30% in 2023, compared to 2020.”
There are clear benefits to GenAI, but also technological downsides. The ethical concerns seem to have no easy answers. As for energy: the history of computing has been one of making computers ever more efficient, so we should expect the same here. At the same time, it’s concerning that GenAI is being used to justify building data centers which consume massive amounts of energy, and even to consider nuclear-powered data centers, to keep up with computing demand.
Not enough utility, yet. We previously summarized negative sentiment in the “Unimpressed” critiques in Part 2 of this survey. Common complaints about AI from engineers include:
Useful for simple stuff only, poor performance in more complex tasks
Little use outside of repetitive changes and boilerplate generation
Unreliable due to generating buggy code
Seen as a “fancy autocomplete”
More a hindrance than a help
Tedious to work with
Here are two more comments from engineers who stopped using AI tools:
“Seems useful for certain tasks, particularly writing related. For specific coding I could see it being used to generate more boilerplate, but personally my team tends to invest more in common libraries that reduce boilerplate anyway (while ensuring best practices are followed)” – Senior software engineer, 5 YOE
“ChatGPT is a novel tool with some potential to speed up boilerplate work and learning/investigation. It is not a high value for expert software engineers yet, but I’m optimistic that it will improve in a few years.” – Principal software engineer, 20 YOE
These reservations are valid, but survey responses show that using LLM tools for 6+ months changes the views of many developers: mostly to a more positive, or more grounded, viewpoint. If you have an underwhelming first impression of these tools, it might be worth trying them daily for a bit before making up your mind.
Why do devs start using LLMs?
We asked tech professionals why they started using these tools. The most common responses, listed by frequency:
Company pushes LLM usage. Several large businesses set targets for departments on the number of developers using LLM tools. Companies buying GitHub Copilot licenses also pushed workers to onboard. We’ve heard about company mandates, LLM training, and management expecting devs to use these tools for productivity.
To be more efficient / too lazy to google it. Developers adopted these tools to become more efficient, or because they couldn’t be bothered doing a web search. Either way, they found the tools help them get unblocked faster.
Peer pressure. Friends and peers in tech, and cofounders at startups, recommended them.
Hype. Non-stop news about AI played a role in influencing software engineers to check out how the new technology works.
Pressure to keep up. Not wanting to fall behind in the industry, while seeing others use AI tools.
Curiosity. Discovering how the tech can help with their work, or how much better (or worse) it works compared to their existing workflow (especially versus Googling, or using Stack Overflow, when solving problems).
An interesting detail for us is that company mandates and pushes are the single most-cited reason for starting to use AI tools. It seems these do work – at least for that initial “push” to give the tools a go.