<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Pragmatic Engineer]]></title><description><![CDATA[Big Tech and startups, from the inside. Highly relevant for software engineers, AI engineers and engineering leaders, useful for those working in tech.]]></description><link>https://newsletter.pragmaticengineer.com</link><image><url>https://substackcdn.com/image/fetch/$s_!6TJt!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecbf7ac-260b-423b-8493-26783bf01f06_600x600.png</url><title>The Pragmatic Engineer</title><link>https://newsletter.pragmaticengineer.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 03:53:41 GMT</lastBuildDate><atom:link href="https://newsletter.pragmaticengineer.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Gergely Orosz]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[pragmaticengineer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[pragmaticengineer@substack.com]]></itunes:email><itunes:name><![CDATA[Gergely Orosz]]></itunes:name></itunes:owner><itunes:author><![CDATA[Gergely Orosz]]></itunes:author><googleplay:owner><![CDATA[pragmaticengineer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[pragmaticengineer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Gergely Orosz]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Designing Data-Intensive Applications: The Cloud & Doing the Right Thing]]></title><description><![CDATA[How the cloud changes the way we build applications, and why engineers&#8217; ethical choices matter more than ever. Excerpt from the book, &#8216;Designing Data-Intensive Applications&#8217;, 2nd edition]]></description><link>https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications-book-excerpt</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications-book-excerpt</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 05 May 2026 16:46:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!C6W6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09e245ca-cbda-4c91-b38d-36c8074a7800_1310x940.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In 2016, <a href="https://martin.kleppmann.com/">Martin Kleppmann</a> published <em>&#8216;Designing Data-Intensive Applications&#8217;</em>, which quickly became a go-to book for those of us building backend applications and distributed systems. In it, Martin combined his experience as a startup founder with observations from his time at LinkedIn, and invested years of rigorous, full-time research into the book.</p><p>Nine years later, he felt the time was ripe for an updated edition, with cloud computing much more widespread than in 2016. 
So, Martin teamed up with software engineer and investor <a href="https://cnr.sh/">Chris Riccomini</a>, a former colleague at LinkedIn and the author of <a href="https://nostarch.com/missing-readme">The Missing README</a>, for a full refresh that brings the book up to date.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!C6W6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09e245ca-cbda-4c91-b38d-36c8074a7800_1310x940.png" width="1310" height="940" alt="My copy of the new edition"><figcaption class="image-caption"><em>My copy of the new edition</em></figcaption></figure></div><p>Martin was recently on The Pragmatic Engineer Podcast, where <a href="https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications">we discussed</a> this updated volume and many related cloud computing matters. We also looked into some topics that have become less relevant over time, like details on MapReduce.</p><p>I asked Martin if this newsletter could share an excerpt of the updated edition of the book about a timeless, important topic, and he generously agreed. So, today we cover:</p><ol><li><p>Cloud versus self-hosting tradeoffs</p></li><li><p>Doing the right thing as a software engineer</p></li></ol><p>These excerpts are only a small part of the book; the first edition has been on my shelf for years and is now in well-worn condition. I jumped at the chance to get the second edition, and if you&#8217;re interested in building resilient systems, I recommend it as an excellent resource.</p><p class="button-wrapper"><a class="button primary" href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/"><span>Get the second edition of the book</span></a></p><p><em>My usual disclaimer: as with all my recommendations, I was not paid for this article, and none of the links are affiliate links. See <a href="https://blog.pragmaticengineer.com/ethics-statement/">my ethics statement</a> for more.</em></p><div><hr></div><p><em>The excerpt below is from &#8220;<a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/">Designing Data-Intensive Applications</a>&#8221;, second edition, by Martin Kleppmann and Chris Riccomini. Copyright &#169; 2026 Martin Kleppmann, Chris Riccomini. Published by O&#8217;Reilly Media, Inc. Used with permission.</em></p><h2>1. Cloud versus self-hosting tradeoffs</h2>
<p><em>This excerpt is from Chapter 1: &#8220;Trade-Offs in Data Systems Architecture&#8221;</em></p><p>For anything that an organization needs to do, one of the first questions is whether it should be done in-house or outsourced. That is, should you build or should you buy?</p><p>Ultimately, this is a question about business priorities. A common rule of thumb is that things that are a core competency or a competitive advantage of your organization should be done in-house, whereas things that are non-core, routine, or commonplace should be left to a vendor [<a href="https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47e0">20</a>]. To give an extreme example, most companies do not fabricate their own CPUs, since it is cheaper to buy them from the semiconductor manufacturers.</p><p>With software, two important decisions are who builds the software and who deploys it. The spectrum of possibilities is illustrated in Figure 1-2. At one extreme is bespoke software that you write and run in-house; at the other extreme are widely used cloud services or SaaS products that are implemented and operated by an external vendor and that you access only through a web interface or API.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!ZkjW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc75be7d6-2d6d-4564-a598-cfb501bbcc84_2048x384.png" width="1456" height="273" alt="Figure 1-2. The spectrum of decisions on outsourcing software and its operations"><figcaption class="image-caption"><em>Figure 1-2. The spectrum of decisions on outsourcing software and its operations</em></figcaption></figure></div><p>The middle ground is off-the-shelf software (open source or commercial) that you self-host, or deploy yourself &#8211; for example, if you download MySQL and install it on a server you control. This could be on your own hardware (often called &#8216;on-premises,&#8217; even if the server is in a rented datacenter rack and not literally on your own premises), or on a virtual machine (VM) in the cloud (<em>infrastructure as a service</em>, or IaaS). There are more points along this spectrum, such as taking open source software and running a modified version of it.</p><p>A related question is how you deploy services, either in the cloud or on premises &#8211; for example, whether you use an orchestration framework such as Kubernetes. However, choice of deployment tooling is beyond the scope of this book, since other factors have a greater influence on the architecture of data systems.</p><h3>Pros &amp; Cons of Cloud Services</h3><p>Using a cloud service, rather than running comparable software yourself, essentially outsources the operation of that software to the cloud provider. There are good arguments for and against this approach. Cloud providers claim that using their services saves time and money and allows you to move faster compared to setting up your own infrastructure.</p><p>However, whether using a cloud service is actually cheaper and easier than self-hosting depends very much on your skills and the workload on your systems. If you already have experience of setting up and operating the systems you need, and if your load is quite predictable (i.e., the number of machines you need does not fluctuate wildly), then it&#8217;s often cheaper to buy your own machines and run the software on them yourself [<a href="https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47e0">21</a>, <a href="https://specbranch.com/posts/one-big-server/">22</a>].</p><p>On the other hand, if you need a system that you don&#8217;t already know how to deploy and operate, adopting a cloud service is often easier and quicker than learning to manage the system. Hiring and training staff specifically to maintain and operate the system can get very expensive. 
You still need an operations team when you&#8217;re using the cloud, but outsourcing the basic system administration can free up your team to focus on higher-level concerns.</p><p>Outsourcing the operation of a system to a company that specializes in running it can potentially result in better service, since the provider gains operational expertise from providing the service to many customers. On the other hand, if you run the service, you can configure and tune it to perform well on your particular workload. A cloud service would likely be unwilling to make such customizations on your behalf.</p><p>Cloud services are particularly valuable if the load on your systems varies a lot over time. If you provision your machines to be able to handle peak load, but those computing resources are idle most of the time, the system becomes less cost-effective. In this situation, cloud services have the advantage that they can make it easier to scale your computing resources up or down in response to changes in demand.</p><p>For example, analytical systems often have extremely variable load. Running a large analytical query quickly requires a lot of computing resources in parallel, but once the query completes, those resources sit idle until a user makes the next query. Predefined queries (e.g., for daily reports) can be enqueued and scheduled to smooth out the load, but for interactive queries, the faster you want them to complete, the more variable the workload becomes. If your dataset is so large that querying it quickly requires significant computing resources, using the cloud can save money as you can return unused resources to the provider rather than leaving them idle. For smaller datasets, this difference is less significant.</p><p>The biggest downside of a cloud service is that you have no control over it:</p><ul><li><p>If it is lacking a feature you need, all you can do is politely ask the vendor whether they will add it; you generally cannot implement it yourself.</p></li><li><p>If the service goes down, all you can do is wait for it to recover.</p></li><li><p>If you are using the service in a way that triggers a bug or causes performance problems, diagnosing the issue will be difficult. With software that you run yourself, you can get performance metrics and debugging information from the operating system to help you understand its behavior, and you can look at the server logs. With a service hosted by a vendor, you usually do not have access to these internals.</p></li><li><p>If the service shuts down or becomes unacceptably expensive, or if the vendor changes their product in a way you don&#8217;t like, you are at their mercy; continuing to run an old version of the software is usually not an option, so you&#8217;ll be forced to migrate to an alternative service [23]. 
This risk is mitigated if alternative services expose a compatible API, but for many cloud services there are no standard APIs, which raises the cost of switching, making vendor lock-in a problem.</p></li><li><p>If the cloud provider is in another country and a political conflict arises between that country and your own, you risk being locked out of the service due to imposed sanctions.</p></li><li><p>The cloud provider needs to be trusted to keep the data secure, which can complicate the process of complying with privacy and security regulations.</p></li></ul><p>Despite all these risks, it has become more and more popular for organizations to build new applications on top of cloud services, or to adopt a hybrid approach in which cloud services are used for some aspects of a system. However, cloud services will not subsume all in-house data systems. Many older systems predate the cloud, and for any services that have specialist requirements that existing cloud services cannot meet, in-house systems remain necessary. For example, very latency-sensitive applications such as high-frequency trading require full control of the hardware.</p><h3>Cloud-Native System Architecture</h3><p>Besides having a different economic model (subscribing to a service instead of buying hardware and licensing software to run on it), the rise of the cloud has also had a profound effect on how data systems are implemented on a technical level. The term &#8220;cloud native&#8221; is used to describe an architecture that is designed to take advantage of cloud services.</p><p>In principle, almost any software that you can self-host could also be provided as a cloud service, and indeed, such managed services are now available for many popular data systems. However, systems that have been designed from the ground up to be cloud native have been shown to have several advantages: better performance on the same hardware, faster recovery from failures, being able to quickly scale computing resources to match the load, and supporting larger datasets [<a href="https://media.amazonwebservices.com/blog/2017/aurora-design-considerations-paper.pdf">24</a>, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2019/05/socrates.pdf">25</a>, <a href="https://www.usenix.org/system/files/nsdi20-paper-vuppalapati.pdf">26</a>]. 
Table 1-2 lists some examples of both types of systems.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!GWaK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78c9fb84-7b57-4970-9dd3-64032ba79950_1268x378.png" width="1268" height="378" alt="Table 1-2. Examples of self-hosted and cloud-native database systems"><figcaption class="image-caption"><em>Table 1-2. Examples of self-hosted and cloud-native database systems</em></figcaption></figure></div><h4>Layering of cloud services</h4><p>Many self-hosted data systems have simple system requirements; they run on a conventional operating system such as Linux or Windows, they store their data as files on the filesystem, and they communicate via standard network protocols such as TCP/IP. A few systems depend on special hardware such as GPUs (for ML) or remote direct memory access (RDMA) network interfaces, but on the whole, self-hosted software tends to use generic computing resources: CPUs, RAM, a filesystem, and an IP network.</p><p>In a cloud, this type of software can be run in an IaaS environment, using one or more VMs (or instances) with a certain allocation of CPUs, memory, disk, and network bandwidth. Compared to physical machines, cloud instances can be provisioned faster and come in a greater variety of sizes, but otherwise they are similar to traditional computers: you can run any software you like on them, but you are responsible for administering it yourself.</p><p>In contrast, the key idea of cloud-native services is not only to use the computing resources managed by your operating system, but also to build upon lower-level cloud services to create higher-level services. For example:</p><ul><li><p>Object storage services such as Amazon S3, Azure Blob Storage, and Cloudflare R2 store large files. They provide more limited APIs than a typical filesystem (basic file reads and writes), but they have the advantage that they hide the underlying physical machines; the service automatically distributes the data across many machines so that you don&#8217;t have to worry about running out of disk space on any one machine. Even if some machines or their disks fail entirely, no data is lost.</p></li><li><p>Many other services are, in turn, built upon object storage and other cloud services. For instance, Snowflake is a cloud-based analytical database (data warehouse) that relies on S3 for data storage [<a href="https://www.usenix.org/system/files/nsdi20-paper-vuppalapati.pdf">26</a>], and some other services, in turn, build upon Snowflake.</p></li></ul><p>As always with abstractions in computing, there is no one right answer to what you should use. As a general rule, higher-level abstractions tend to be more oriented toward particular use cases.</p>
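<p>Object storage is a good example of such a higher-level abstraction: the API is narrow, but it hides machines, disks, and replication entirely. Here is a minimal sketch of that interface in Python with boto3 &#8211; the bucket name and key are placeholders, and credentials are assumed to be configured in the environment:</p><pre><code>import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-bucket"  # placeholder for a bucket you own

# Write: an object is uploaded in its entirety under a key. Unlike a
# filesystem, there is no append or in-place update of an existing object.
s3.put_object(Bucket=BUCKET, Key="events/2026-05-05.log", Body=b"log line 1\n")

# Read: fetch the object back (ranged reads of large objects are also possible).
resp = s3.get_object(Bucket=BUCKET, Key="events/2026-05-05.log")
data = resp["Body"].read()

# Where the bytes physically live is the service's concern: replication
# across machines and durability are handled for you.
print(len(data), "bytes")</code></pre><p>Notice what is absent: no disks to provision, no machine names, no replication to configure. That narrowness is exactly what lets the provider operate the service at scale.</p><p>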
If your needs match the situations for which a higher-level system is designed, using the existing higher-level system will probably meet your needs with much less hassle than building it yourself from lower-level systems would. On the other hand, if no high-level system meets your needs, building it yourself from lower-level components is the only option.</p><h4>Separation of storage and compute</h4><p>In traditional computing, disk storage is regarded as durable (we assume that once something is written to disk, it will not be lost). To tolerate the failure of an individual hard disk, RAID (redundant array of independent disks) is often used to maintain copies of the data on several disks attached to the same machine. RAID can be implemented either in hardware or in software by the operating system, and it is transparent to the applications accessing the filesystem.</p><p>In the cloud, compute instances (VMs) may also have local disks attached, but cloud-native systems typically treat these disks more like an ephemeral cache and less like long-term storage. This is because the local disk becomes inaccessible if the associated instance fails, or if the instance is replaced with a bigger or a smaller one (on a different physical machine) to adapt to changes in load.</p><p>As an alternative to local disks, cloud services also offer virtual disk storage that can be detached from one instance and attached to a different one (e.g., Amazon EBS, Azure managed disks, and persistent disks in Google Cloud). Such a virtual disk is not a physical disk, but rather a cloud service provided by a separate set of machines that emulates the behavior of a disk (a block device, where each block is typically 4 KiB in size). This technology makes it possible to run traditional disk-based software in the cloud, but the block device emulation introduces overheads that can be avoided in systems that are designed from the ground up for the cloud [<a href="https://media.amazonwebservices.com/blog/2017/aurora-design-considerations-paper.pdf">24</a>]. The use of virtual disks also makes the application very sensitive to network glitches, since every I/O operation on the virtual block device is a network call [<a href="https://planetscale.com/blog/the-real-fail-rate-of-ebs">27</a>].</p><p>To address this problem, cloud-native services generally avoid using virtual disks and instead build on dedicated storage services that are optimized for particular workloads. Object storage services such as S3 are designed for long-term storage of fairly large files, ranging from hundreds of kilobytes to several gigabytes in size. 
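</p><p>Database rows and individual values, by contrast, are typically tiny. As the next paragraph describes, cloud databases bridge this size mismatch by buffering many small values and writing them out to the object store as one larger block. Here is a toy sketch of that batching idea in Python with boto3 &#8211; the class, bucket name, and key scheme are hypothetical illustrations, not the design of any particular database:</p><pre><code>import io
import uuid

import boto3

class BlockWriter:
    """Toy sketch: buffer small records in memory, flush them as one larger object."""

    def __init__(self, bucket, prefix, flush_bytes=8 * 1024 * 1024):
        self.s3 = boto3.client("s3")
        self.bucket = bucket
        self.prefix = prefix
        self.flush_bytes = flush_bytes  # target block size (here 8 MiB)
        self.buf = io.BytesIO()

    def append(self, record: bytes):
        self.buf.write(record)
        if self.buf.tell() >= self.flush_bytes:
            self.flush()

    def flush(self):
        if self.buf.tell() == 0:
            return  # nothing buffered yet
        key = f"{self.prefix}/block-{uuid.uuid4()}.bin"
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=self.buf.getvalue())
        self.buf = io.BytesIO()  # start a new block

# Usage: thousands of small rows become a handful of multi-megabyte objects,
# the size range that object stores are designed to handle well.
writer = BlockWriter("example-data-bucket", "table/orders")
for i in range(100_000):
    writer.append(f"row-{i},some,encoded,fields\n".encode())
writer.flush()</code></pre><p>A real system adds an index, a write-ahead log, and background compaction on top of this basic pattern, but the core economics &#8211; few large writes instead of many small ones &#8211; are the same.</p><p>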
The individual rows or values stored in a database are typically much smaller than such files; cloud databases therefore manage smaller values in a separate service and store larger data blocks (containing many individual values) in an object store [<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2019/05/socrates.pdf">25</a>, <a href="https://blog.colinbreck.com/predicting-the-future-of-distributed-systems/">28</a>].</p><p>In traditional systems architecture, the same computer is responsible for both storage (disk) and computation (CPU and RAM), but in cloud-native systems, these two responsibilities have become somewhat separated, or disaggregated [<a href="https://dl.acm.org/doi/abs/10.1145/3514221.3526055">9</a>, <a href="https://www.usenix.org/system/files/nsdi20-paper-vuppalapati.pdf">26</a>, <a href="https://www.thenile.dev/blog/storage-compute-separation">29</a>, <a href="https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-intelligent-scalable-storage">30</a>]: for example, S3 only stores files, and if you want to analyze that data, you will have to run the analysis code somewhere outside of S3. This implies transferring the data over the network.</p><p>Furthermore, cloud-native systems are often multitenant, which means that rather than having a separate machine for each customer, data and computation from several customers are handled on the same shared hardware by the same service [<a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch01.html#Vanlightly2023serverless">31</a>]. Multitenancy can enable better hardware utilization, easier scalability, and easier management by the cloud provider, but it also requires careful engineering to ensure that one customer&#8217;s activity does not affect the performance or security of the system for other customers [<a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/ch01.html#Jonas2019">32</a>].</p><h3>Operations in the Cloud Era</h3><p>Traditionally, the people managing an organization&#8217;s server-side data infrastructure were known as database administrators (DBAs) or system administrators (sysadmins). More recently, many organizations have tried to integrate the roles of software development and operations into teams with a shared responsibility for both backend services and data infrastructure; the DevOps philosophy has guided this trend. Site reliability engineers (SREs) are Google&#8217;s implementation of this idea [<a href="https://www.oreilly.com/library/view/site-reliability-engineering/9781491929117/">33</a>].</p><p>The role of operations is to ensure that services are reliably delivered to users (including configuring infrastructure and deploying applications) and to ensure a stable production environment (including monitoring and diagnosing any problems that may affect reliability). For self-hosted systems, operations traditionally involves a significant amount of work at the level of individual machines, such as capacity planning (e.g., monitoring available disk space and adding more disks before you run out of space), provisioning new machines, moving services from one machine to another, and installing operating system patches.</p><p>Many cloud services present an API that hides the individual machines implementing the service. 
For example, cloud storage replaces fixed-size disks with metered billing, where you can store data without planning your capacity needs in advance, and you are then charged based on the space used. Moreover, many cloud services remain highly available, even when individual machines have failed.</p><p>This shift in emphasis from individual machines to services has been accompanied by a change in the role of operations. The high-level goal of providing a reliable service remains the same, but the processes and tools have evolved.</p><p>The DevOps/SRE philosophy places greater emphasis on the following:</p><ul><li><p>Setting up automation; preferring repeatable processes over manual one-off jobs</p></li><li><p>Using ephemeral VMs and services rather than long-running servers</p></li><li><p>Enabling frequent application updates</p></li><li><p>Learning from incidents</p></li><li><p>Preserving the organization&#8217;s knowledge about the system, even as individuals come and go [<a href="https://queue.acm.org/detail.cfm?id=3434773">34</a>]</p></li></ul><p>With the rise of cloud services, a bifurcation of roles has occurred. Operations teams at infrastructure companies specialize in the details of providing a reliable service to a large number of customers, while the customers of the service spend as little time and effort as possible on infrastructure [<a href="https://www.pluralsight.com/resources/blog/cloud/the-future-of-ops-jobs">35</a>].</p><p>Customers of cloud services still require operations, but they focus on different aspects, such as choosing the most appropriate service for a given task, integrating services with each other, and migrating from one service to another. Even though metered billing removes the need for capacity planning in the traditional sense, it&#8217;s still important to know what resources you are using for which purpose so that you don&#8217;t waste money on cloud resources that are not needed. Capacity planning becomes financial planning, and performance optimization becomes cost optimization [<a href="https://medium.com/riskified-technology/over-pay-as-you-go-for-your-datastore-11a29ae49a8b">36</a>]. Additionally, cloud services do have resource limits or quotas (such as the maximum number of processes you can run concurrently), which you need to know about and plan for before you run into them [<a href="https://thenewstack.io/serverless-doesnt-mean-devopsless-or-noops/">37</a>].</p><p>Adopting a cloud service can be easier and quicker than provisioning and running your own infrastructure, although you still have to learn how to use the cloud service and perhaps work around its limitations. Integration among services becomes a particular challenge as a growing number of vendors offer an ever-broader range of cloud services targeting different use cases [<a href="https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html">38</a>, <a href="https://benn.substack.com/p/the-data-os">39</a>]. ETL is only part of the story; operational cloud services also need to be integrated with each other. 
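</p><p>What does such integration look like? Often, hand-written glue code along these lines &#8211; a sketch in Python with boto3, where the queue URL, bucket, and key scheme are hypothetical placeholders:</p><pre><code>import json

import boto3

# Hypothetical glue: drain events from one managed queue and land them in
# object storage, where a second service expects to find its input.
sqs = boto3.client("sqs")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/orders"  # placeholder
BUCKET = "example-data-bucket"  # placeholder

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )  # long-poll for up to 10 messages
    for msg in resp.get("Messages", []):
        event = json.loads(msg["Body"])
        # Land each event where the downstream service expects it.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"incoming/{msg['MessageId']}.json",
            Body=json.dumps(event).encode(),
        )
        # Delete only after the hand-off has succeeded.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])</code></pre><p>Every pair of services tends to need its own version of this kind of code, with its own error handling, retries, and monitoring.</p><p>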
At present, we lack standards to facilitate this sort of integration, so it often involves significant manual effort.</p><p>Other operational aspects that cannot fully be outsourced to cloud services include maintaining the security of an application and the libraries it uses, managing the interactions between your own services, monitoring the load on your services, and tracking down the cause of problems such as performance degradations or outages. While the cloud is changing the role of operations, the need for operations is as great as ever.</p><h2>2. Doing the right thing as a software engineer</h2><p><em>The excerpt below is a section from Chapter 14, &#8220;Doing the Right Thing&#8221;</em></p><p>In the final chapter of this book, let&#8217;s take a step back. Throughout, we have examined a wide range of architectures for data systems, evaluated their pros and cons, and explored techniques for building reliable, scalable, and maintainable applications. However, we have left out a fundamental part of the discussion, which we should now fill in.</p><p>Every system is built for a purpose; every action we take has both intended and unintended consequences. The purpose may be as simple as making money, but the consequences may be far-reaching. We, the engineers building these systems, have a responsibility to carefully consider those consequences and to ensure that our decisions do not cause harm.</p><p>We talk about data as an abstract thing, but remember that many datasets are about people: their behavior, their interests, their identities. We must treat such data with humanity and respect. Users are humans too, and human dignity is paramount [<a href="https://schmud.de/posts/2024-08-18-data-is-a-bad-idea.html">1</a>].</p><p>Software development increasingly involves making important ethical choices. There are guidelines to help software engineers navigate these issues, such as the ACM Code of Ethics and Professional Conduct [<a href="https://www.acm.org/code-of-ethics">2</a>], but they are rarely discussed, applied, or enforced in practice. As a result, engineers and product managers sometimes take a cavalier attitude to privacy and the potential negative consequences of their products [<a href="https://www.linkedin.com/blog/engineering/archive/making-hard-choices-the-quest-for-ethics-in-machine-learning">3</a>, <a href="https://www.theguardian.com/commentisfree/2015/dec/06/algorithm-writers-should-have-code-of-conduct">4</a>].</p><p>A technology is not good or bad in itself &#8211; what matters is how it is used and how it affects people. This is true of a software system such as a search engine in much the same way as it is for a weapon like a gun. The ethical responsibility is ours to bear; it is not sufficient for software engineers to focus exclusively on the technology and ignore its consequences.</p><p>In contrast to much of computing, however, the concepts at the heart of ethics are not fixed or determinate in their precise meaning; they require interpretation, which may be subjective [<a href="https://cacm.acm.org/opinion/ethical-ai-is-not-about-ai/">5</a>]. What makes something &#8220;good&#8221; or &#8220;bad&#8221; is not well defined, and serious discourse on the subject among computing professionals is lacking [<a href="https://www.benzevgreen.com/wp-content/uploads/2019/11/19-ai4sg.pdf">6</a>]. Reasoning about ethics is difficult, but also too important to ignore. What does this entail? 
&#8220;Ethics&#8221; is not a checklist to comply with; it is a participatory and iterative process of reflection, carried out in dialogue with the people involved, with accountability for the results [<a href="https://cacm.acm.org/opinion/ethics-as-a-participatory-and-iterative-process/">7</a>].</p><h3>Predictive Analytics</h3><p>Predictive analytics is a major part of why people are excited about big data and AI. It&#8217;s also an area that is fraught with ethical dilemmas. Using data analysis to predict the weather, or the spread of diseases, is one thing [<a href="https://cacm.acm.org/news/what-happens-when-big-data-blunders/">8</a>]; it is another matter to predict whether a convict is likely to reoffend, whether an applicant for a loan is likely to default, or whether an insurance customer is likely to make expensive claims [<a href="https://www.cl.cam.ac.uk/research/security/seminars/archive/video/2023-03-07-t196231.html">9</a>]. The latter have a direct effect on people&#8217;s lives.</p><p>Naturally, payment networks want to prevent fraudulent transactions, banks want to avoid bad loans, airlines want to avoid hijackings, and companies want to avoid hiring ineffective or untrustworthy people. From their point of view, the cost of a missed business opportunity is low, but the cost of a bad loan or a problematic employee is much higher, so it is to be expected that organizations are cautious. If in doubt, they are better off saying &#8220;no&#8221;.</p><p>However, as algorithmic decision making becomes more widespread, someone who has (accurately or falsely) been labeled as risky by an algorithm may suffer a large number of &#8220;no&#8221; decisions. Systematically being excluded from jobs, air travel, insurance coverage, property rental, financial services, and other key aspects of society is such a large constraint on an individual&#8217;s freedom that it has been called &#8220;algorithmic prison&#8221; [<a href="https://www.theatlantic.com/technology/archive/2014/02/welcome-to-algorithmic-prison/283985/">10</a>]. In countries that respect human rights, the criminal justice system presumes innocence until proven guilty; on the other hand, automated systems can systematically and arbitrarily exclude a person from participating in society without any proof of guilt and with little chance of appeal.</p><h4>Bias &amp; discrimination</h4><p>Decisions made by an algorithm are not necessarily any better or any worse than those made by a human. Everyone is likely to have biases, even if they actively try to counteract them, and discriminatory practices can become culturally institutionalized. There is hope that basing decisions on data, rather than subjective and instinctive human assessments, could be more fair and give a better chance to people who are often overlooked or disadvantaged in the traditional system [<a href="https://www.theatlantic.com/magazine/archive/2013/12/theyre-watching-you-at-work/354681/">11</a>].</p><p>When we develop predictive analytics and AI systems, we are not merely automating a human&#8217;s decision by using software to specify the rules for when to say &#8220;yes&#8221; or &#8220;no&#8221;; we are leaving the rules themselves to be inferred from data. However, the patterns learned by these systems are opaque: even if the data indicates a correlation, we may not know why. 
If the input to an algorithm carries a systematic bias, the system will most likely learn and amplify that bias in its output [<a href="https://www.theguardian.com/technology/2016/aug/03/algorithm-racist-human-employers-work">12</a>].</p><p>In many countries, anti-discrimination laws prohibit treating people differently depending on protected traits such as ethnicity, age, gender, sexuality, disability, or beliefs. Other features of a person&#8217;s data may be analyzed, but what happens if they are correlated with protected traits? For example, in racially segregated neighborhoods, a person&#8217;s postal code or even their IP address is a strong predictor of race. Put like this, it seems ridiculous to believe that an algorithm could somehow take biased data as input and produce fair and impartial output from it [<a href="https://www.scientificamerican.com/article/how-a-machine-learns-prejudice/">13</a>, <a href="https://www.ftc.gov/system/files/ftc_gov/pdf/EEOC-CRT-FTC-CFPB-AI-Joint-Statement%28final%29.pdf">14</a>]. Yet this belief often seems to be implied by proponents of data-driven decision making; an attitude that has been satirized as &#8220;machine learning is like money laundering for bias&#8221; [<a href="https://idlewords.com/talks/sase_panel.htm">15</a>].</p><p>Predictive analytics systems merely extrapolate from the past; if the past is discriminatory, they codify and amplify that discrimination [<a href="https://www.zdnet.com/article/artificial-intelligence-in-healthcare-is-racist/">16</a>]. If we want the future to be better than the past, moral imagination is required, and that&#8217;s something only humans can provide [<a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815">17</a>]. Data and models should be our tools, not our masters.</p><h4>Responsibility and Accountability</h4><p>Automated decision-making raises the question of responsibility and accountability [<a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815">17</a>]. If a human makes a mistake, they can be held accountable, and the person affected by the decision can appeal. Algorithms make mistakes too, but who is accountable when they go wrong? [<a href="https://www.nytimes.com/2016/08/01/opinion/make-algorithms-accountable.html">18</a>] When a self-driving car causes an accident, who is responsible? If an automated credit scoring algorithm systematically discriminates against people of a particular race or religion, is there any recourse? If a decision by your ML system comes under judicial review, can you explain to the judge how the algorithm made its decision? People should not be able to evade responsibility by blaming an algorithm.</p><p>Credit rating agencies are a classic example of collecting data to make decisions about people. A bad credit score makes life difficult, but at least a credit score is normally based on relevant facts about a person&#8217;s actual borrowing history, and any errors in the record can be corrected (although the agencies normally do not make this easy). 
Scoring algorithms based on machine learning, however, typically use a much wider range of inputs and are much more opaque, making it harder to understand how a particular decision has come about and whether someone is being treated in an unfair or discriminatory way [<a href="https://arxiv.org/abs/1606.08813">19</a>].</p><p>A credit score summarizes &#8220;how did you behave in the past?&#8221; whereas predictive analytics usually work on the basis of &#8220;who is similar to you, and how did people like you behave in the past?&#8221; Drawing parallels to others&#8217; behavior implies stereotyping people; for example, based on where they live (a close proxy for race and socioeconomic class). What about people put in the wrong bucket? Furthermore, if a decision is incorrect because of erroneous data, recourse is almost impossible [<a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815">17</a>].</p><p>Much data is statistical in nature, which means that even if the probability distribution on the whole is correct, individual cases may well be wrong. For example, if the average life expectancy in your country is 80 years, that doesn&#8217;t mean you&#8217;re expected to drop dead on your 80th birthday. From the average and the probability distribution, you can&#8217;t say much about the age to which someone will live. Similarly, the output of a prediction system is probabilistic and may well be wrong in individual cases.</p><p>A blind belief in the supremacy of data for making decisions is not only delusional, but also positively dangerous. As data-driven decision making becomes more widespread, we will need to figure out how to avoid reinforcing existing biases, how to make algorithms accountable and transparent, and how to fix them when they inevitably make mistakes.</p><p>We will also need to figure out how to realize the positive potential of data and prevent it from being used to harm people. For example, analytics can reveal financial and social characteristics about personal lives. On the one hand, this power could be used to focus aid and support to help those who need it most. On the other hand, it is sometimes used by predatory businesses seeking to identify vulnerable people and sell them risky products such as high-cost loans or worthless college degrees [<a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815">17</a>, <a href="https://www.commerce.senate.gov/wp-content/uploads/media/doc/12.18.13%20Senate%20Commerce%20Committee%20Report%20on%20Data%20Broker%20Industry.pdf">20</a>].</p><h4>Feedback loops</h4><p>Even with predictive applications with less immediately far-reaching effects on people, such as recommendation systems, there are difficult issues that we must confront. When services become good at predicting the content users want to see, they may end up showing them only opinions they already agree with, leading to echo chambers in which stereotypes, misinformation, and polarization can breed. We already know the impact that social media echo chambers can have on election campaigns.</p><p>When predictive analytics affect people&#8217;s lives, particularly pernicious problems arise because of self-reinforcing feedback loops. For example, consider the case of employers using credit scores to evaluate potential hires. You may be a good worker with a good credit score, but suddenly find yourself in financial difficulties due to a misfortune beyond your control. 
As you miss payments on your bills, your credit score suffers, and you will be less likely to find work. Joblessness pushes you toward poverty, which further worsens your score, making it even harder to find employment [<a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815">17</a>]. It&#8217;s a downward spiral due to poisonous assumptions, hidden behind a camouflage of mathematical rigor and data.</p><p>As another example of a feedback loop, economists found that when gas stations in Germany introduced algorithmic pricing, competition was reduced and prices for consumers went up, because the algorithms learned to collude [<a href="https://economics.yale.edu/sites/default/files/clark_acex_jan_2021.pdf">21</a>].</p><p>We can&#8217;t always predict when such feedback loops may happen. However, many consequences can be predicted by thinking about an entire system (not just the computerized parts, but also the people interacting with it), in an approach known as &#8220;systems thinking&#8221; [<a href="https://www.amazon.nl/Thinking-Systems-Primer-Diana-Wright/dp/1844077268">22</a>]. We can try to understand how a data analysis system responds to different behaviors, structures, or characteristics. Does the system reinforce and amplify existing differences between people (e.g., making the rich richer or the poor poorer), or does it try to combat injustice? Even with the best intentions, we must beware of the possibility of unintended consequences.</p><h3>Surveillance</h3><p><em>The excerpt below is from another section in Chapter 14, &#8220;Doing the Right Thing&#8221;</em></p><p>As a thought experiment, try replacing the word &#8220;data&#8221; with &#8220;surveillance&#8221;, and observe whether common phrases still sound so good [<a href="https://x.com/hashbreaker/status/598076230437568512">23</a>]. How about this: &#8220;In our surveillance-driven organization we collect real-time surveillance streams and store them in our surveillance warehouse. Our surveillance scientists use advanced analytics and surveillance processing in order to derive new insights.&#8221;</p><p>This thought experiment is unusually polemical for this book, &#8220;<em>Designing <strong>Surveillance</strong>-Intensive Applications</em>&#8221;, but strong words are needed to emphasize this point. In our attempts to make software &#8220;eat the world&#8221; [<a href="https://a16z.com/why-software-is-eating-the-world/">24</a>], we have built the greatest mass surveillance infrastructure ever seen. We are rapidly approaching a world in which every inhabited space contains at least one internet-connected microphone, in the form of smartphones, smart TVs, voice-controlled assistant devices, baby monitors, and even children&#8217;s toys that use cloud-based speech recognition. Many of these devices have terrible security track records [<a href="https://arstechnica.com/information-technology/2016/01/how-to-search-the-internet-of-things-for-photos-of-sleeping-babies/">25</a>].</p><p>What is new compared to the past is that digitization has made it easy to collect large amounts of data about people. Surveillance of our location and movements, our social relationships and communications, our purchases and payments, and our health data has become almost unavoidable. 
A surveillance organization may end up knowing more about a person than that person knows about themselves; for example, identifying illnesses or economic problems before that individual is aware of them.</p><p>Even the most totalitarian, repressive regimes of the past could only dream of putting a microphone in every room and forcing every person to constantly carry a device capable of tracking their location and movements. Yet the benefits that we get from digital technology are so great that we now voluntarily accept this state of total surveillance. The difference is just that the data is being collected by corporations to provide us with services, rather than by government agencies seeking control [<a href="https://www.schneier.com/books/data-and-goliath">26</a>].</p><p>Not all data collection necessarily qualifies as surveillance, but examining it as such can help us understand our relationship with the data collector. Why are we seemingly happy to accept surveillance by corporations? Perhaps you feel you have nothing to hide; in other words, you are totally in line with existing power structures, you are not a marginalized minority, and you needn&#8217;t fear persecution [<a href="https://grugq.tumblr.com/post/142799983558/nothing-to-hide">27</a>]. Not everyone is so fortunate. Or perhaps it&#8217;s because the purpose seems benign; it&#8217;s not overt coercion or enforced conformity, merely better recommendations and more personalized marketing. However, combined with the discussion of predictive analytics from the last section, that distinction seems less clear.</p><p>We are already seeing driving-behavior data &#8211; tracked by vehicles without drivers&#8217; consent &#8211; affect drivers&#8217; insurance premiums [<a href="https://www.ftc.gov/news-events/news/press-releases/2025/01/ftc-takes-action-against-general-motors-sharing-drivers-precise-location-driving-behavior-data">28</a>], and health insurance coverage that depends on the customer wearing a fitness tracking device. When surveillance is used to make decisions that hold sway over important aspects of life, such as insurance coverage or employment, it starts to appear less benign. Data analysis can also reveal surprisingly intrusive things; for example, the movement sensor in a smartwatch or fitness tracker can be used to work out what you are typing (e.g., passwords) with fairly good accuracy [<a href="https://arxiv.org/abs/1512.05616">29</a>]. Sensor accuracy and algorithms for analysis are only going to get better.</p><h2>Takeaways</h2><p>Thanks to Martin for writing this book, and to him and Chris for revamping it for the second edition. The volume is now even more relevant to how we build systems in 2026 and beyond. You can purchase a hard copy from <a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/">the publisher&#8217;s website</a> or <a href="https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1098119061">Amazon</a>.</p><p>The first edition has a timeless quality because it focused on the fundamentals of large systems, and the second edition follows the same approach, as laid out in its preface:</p><blockquote><p>&#8220;Although the landscape of technologies for processing and storing data is diverse and fast-changing, the underlying principles endure. If you understand those principles, you&#8217;re in a position to see where each tool fits in, how to make good use of it, and how to avoid its pitfalls. 
This book focuses on those principles.&#8221;</p></blockquote><p>Since the first edition appeared nine years ago, some things have changed in the tech industry:</p><ul><li><p><strong>Much greater focus on the cloud. </strong>Building large systems on top of cloud infrastructure is more common. This lowers complexity, as cloud primitives hide a lot of implementation detail, but it also means accepting more risk, because when the cloud is down, so is your system.</p></li><li><p><strong>Systems which AI tools build upon are more relevant. </strong>Vector databases, <a href="https://www.geeksforgeeks.org/python/pandas-create-test-and-train-samples-from-dataframe/">DataFrames</a> (for training datasets), and the processing of large amounts of training data with batch processing systems are relevant to anyone building production AI systems.</p></li><li><p><strong>Local-first software. </strong>Martin focuses on this area in his work, and with AI, we could see more demand for running models locally. Operating systems like Ubuntu are also <a href="https://newsletter.pragmaticengineer.com/i/195753987/4-betting-on-local-first-and-plans-for-agentic-workflows">focusing on this</a>.</p></li><li><p><strong>Formal methods. </strong>The advent of AI-generated code means this topic is getting more attention industry-wide, and the second edition covers it.</p></li><li><p><strong>Regulation and legal context. </strong>Regulations like the EU&#8217;s General Data Protection Regulation (GDPR) are something software engineers increasingly need to know about, and the book now covers this area.</p></li></ul><p>If I had to summarize the evolution of the book in its second edition, it would be: more focus on cloud and AI, and more on local-first software, testing, and how regulations affect engineers. Interestingly, this mirrors how the tech industry has developed over time, too.</p><p>I very much appreciate that the book closes with the final chapter focused on &#8220;doing the right thing&#8221; as a software engineer. Software systems have wide-ranging societal impact, and engineers working on these systems have a great say in what gets built, and how it gets built. As engineers, we owe it to ourselves, at the very least, to consider the broader impact of our decisions &#8212; and doing so might also force us to make important ethical choices.
There&#8217;s too little discussion of the ethics of software engineering, and I&#8217;m glad that Martin and Chris did not shy away from going deeper into this topic.</p><p>If you&#8217;d like to get more background on the book &#8211; and on the hard parts of building large-scale systems &#8211; check out <a href="https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications">our podcast episode with Martin Kleppmann.</a></p>]]></content:encoded></item><item><title><![CDATA[The Pulse: AI load breaks GitHub – why not other vendors?]]></title><description><![CDATA[Also: Anthropic&#8217;s speed run to break devs&#8217; goodwill, big price increases from GitHub Copilot, Mitchell Hashimoto on the &#8220;building block economy,&#8221; and more]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-github-breaks</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-github-breaks</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 30 Apr 2026 14:23:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mdep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F218f2a28-6c30-4753-8c43-2e33ce891050_1656x1038.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Today, we cover:</p><ol><li><p><strong>Load from AI breaks GitHub &#8211; but why not other vendors? </strong>GitHub&#8217;s reliability is less than one nine, and getting worse. Prolific open source contributor Mitchell Hashimoto is quitting GitHub because he thinks it&#8217;s not suited for professional work. GitHub&#8217;s leadership blames a 3.5x increase in service load for the degradation &#8211; or it might be self-inflicted.</p></li><li><p><strong>Anthropic&#8217;s speedrun to destroy trust.</strong> Anthropic could do no wrong until recently, but in the past month, that&#8217;s all changed. Silently nerfing Claude Code, banning companies from Claude, and baffling price rises all add to a sense that Anthropic is in its &#8220;extraction&#8221; era of generating more revenue for the same or worse service.</p></li><li><p><strong>Industry pulse. </strong>Dramatic price increases at GitHub Copilot, explosive growth at Codex, Google scrambling to build a good coding model, Cursor might be bought by SpaceX, an AI agent deletes a car business, and more.</p></li><li><p><strong>Mitchell Hashimoto &amp; the &#8220;building block economy.&#8221; </strong>Ghostty&#8217;s creator finds that open source &#8220;building blocks&#8221; are the best way for software components to win massive adoption &#8211; but it&#8217;s got harder to build a business on top of open building blocks.</p></li></ol><p><em>The bottom of this article could be cut off in some email clients. <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-github-breaks">Read the full article uninterrupted, online.</a></em></p><h2>1. Load from AI breaks GitHub &#8211; but why not other vendors?</h2><p>GitHub&#8217;s reliability has been beyond unacceptable recently: last month, third-party measurements pinned it at <a href="https://newsletter.pragmaticengineer.com/i/192229275/1-does-github-still-merit-top-git-platform-for-ai-native-development-status">one nine</a> (right at 90%).</p>
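<p><em>A quick aside on the arithmetic: &#8220;nines&#8221; of availability are conventionally counted as -log10(1 - availability), so 99.9% uptime is &#8220;three nines&#8221; and 90% is exactly one nine. A minimal Python sketch of that calculation (the helper function is ours, for illustration):</em></p><pre><code class="language-python">import math

def nines(availability: float) -> float:
    """Count the 'nines' of an availability figure."""
    return -math.log10(1.0 - availability)

print(round(nines(0.999), 2))  # 3.0  -> "three nines"
print(round(nines(0.90), 2))   # 1.0  -> "one nine"
print(round(nines(0.86), 2))   # 0.85 -> not even one nine
</code></pre>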
<p>This month, reliability has been down to <em>zero</em> nines &#8211; 86% &#8211; as per <a href="https://mrshu.github.io/github-statuses/">a third-party tracker</a>, and last week, things got even worse: a frankly embarrassing data integrity incident, more outages, and, eventually, a partial explanation from GitHub.</p><h3>Data integrity incident</h3>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-github-breaks">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[Building Pi, and what makes self-modifying software so fascinating]]></title><description><![CDATA[Mario Zechner, creator of Pi, joins Armin Ronacher to explore AI coding&#8217;s limits, arguing that human judgment still matters most in an agent-driven world.]]></description><link>https://newsletter.pragmaticengineer.com/p/building-pi-and-what-makes-self-modifying</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/building-pi-and-what-makes-self-modifying</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Wed, 29 Apr 2026 14:30:17 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/195661847/609e90b3dfa49402fb98b56be3c601e9.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Stream the latest episode</h3><p><strong>Listen and watch now on <a href="https://youtu.be/n5f51gtuGHE">YouTube</a>, <a href="https://open.spotify.com/episode/1fDw9cSN5Xx6wkgVQLKTHs">Spotify</a>, and <a href="https://podcasts.apple.com/us/podcast/the-pragmatic-engineer/id1769051199">Apple</a>.</strong> See the episode transcript at the top of this page, and timestamps for the episode at the bottom.</p><h3><strong>Brought to You by</strong></h3><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" width="800" height="70" alt=""/></figure></div><p>&#8226; <strong><a href="http://statsig.com/pragmatic">Statsig</a></strong> &#8211; The unified platform for flags, analytics, experiments, and more. Stop switching between different tools, and have them all in one place.</p><p>&#8226; <strong><a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a> &#8212; </strong>The makers of SonarQube, the industry standard for code verification and automated code review. As AI agents generate extreme volumes of code, verification can&#8217;t be optional: SonarQube acts as the independent, zero&#8209;trust, multi-layered verification engine that checks every line of code against your quality, security, and architectural standards, so only safe, reliable, and auditable code reaches production. <a href="https://www.sonarsource.com/plans-and-pricing/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=sq-download&amp;utm_content=podcast-sonar-verification&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Try it out for yourself</a>.</p><p>&#8226; <strong><a href="https://workos.com/">WorkOS</a></strong> &#8211; Designing large systems is about tradeoffs. But one thing isn&#8217;t a tradeoff: enterprise features. WorkOS gives you APIs to ship enterprise features &#8211; SSO, directory sync, RBAC, audit logs &#8211; in days, not months.
Visit <a href="http://workos.com">WorkOS.com</a> to learn more.</p><h3><strong>In this episode</strong></h3><p>Mario Zechner is the creator of <a href="https://github.com/badlogic/pi-mono">Pi</a>, a minimalist, self-modifying AI coding agent that is the foundation upon which OpenClaw (created by Peter Steinberger) is built. Meanwhile, Armin Ronacher is the creator of Flask, and a longtime user of Pi. The pair are also friends.</p><p>I sat down with Mario and Armin for the latest episode of the Pragmatic Engineer Podcast for an interesting conversation about AI and their reservations about it &#8211; even though both are heavily invested in building AI-powered tools.</p><p>Mario explains why he built Pi, and gives his take on why it has become so popular. Armin walks us through how he uses AI tools, including building a game with Pi, and why he always puts human judgment firmly at the heart of his approach.</p><p>We cover the risks of over-automation, the limits of agentic workflows, and why strong engineers with informed judgment still matter. We also get into the challenges of working with code written by non-engineers, and whether open source can withstand a tidal wave of agent-generated code.</p><h3>My observations from the conversation with Mario and Armin</h3><p>Here are 9 of my most interesting takeaways from talking with Armin and Mario:</p><div id="youtube2-n5f51gtuGHE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;n5f51gtuGHE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/n5f51gtuGHE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>1. Pi was built because Claude Code became unpredictable. </strong>Mario was a big fan of Claude Code at first. But as the team behind it pushed velocity and added features, he found that bugs multiplied and the tool&#8217;s behavior started to change. Mario wanted an AI harness that behaves in a stable, consistent way. He observed that the addition of new features caused Claude Code to act unpredictably, so he resolved to add as few features as possible to Pi.</p><p><strong>2. It should be MUCH easier to build specialized tools for specific tasks. </strong>Different projects need different harness types because, as Mario points out, the same hammer is not ideal for every single construction job. As such, Pi is built with the goal of allowing the creation of specialized harnesses. It can modify itself so that a user can create the bespoke harness needed for any task. Mario believes it&#8217;s a preview of how self-modifiable software might look in the future.</p><p><strong>3. Automation bias is one of the biggest risks of working with AI agents.</strong> Once devs confirm that an AI agent can produce acceptable code, they start to review its output less often, even though agents can &#8211; and do! &#8211; produce slop. Mario advises being far more sceptical with agents, and cautions that the quality of their output isn&#8217;t guaranteed, however well they performed previously.</p><p><strong>4. AI agents decrease code quality, but this is not on purpose.
</strong>From talking with 30+ engineering teams, Armin found that code quality is down everywhere, and serious projects are shipping with &#8220;vibe slop.&#8221; A potential cause of this is that keeping agentic output clean and of high quality takes <em>deliberate</em> effort, but it&#8217;s not clear to many devs exactly <em>how </em>to do this. There&#8217;s also PR review fatigue and automation bias (the assumption that AI agents invariably generate good code).</p><p><strong>5. New trend: AI makes it harder for senior engineers to reject pointless complexity. </strong>Historically, senior engineers kept software complexity at bay simply by saying &#8220;no&#8221; a lot. But Armin observes that these days, more junior engineers and product managers deploy agent-scripted counterarguments when a senior colleague kicks an idea to the curb. This makes decision-making exhausting, and more bad ideas make it into production as a result.</p><p><strong>6. Junior engineers &gt; AI agents. </strong>Mario points out that, unlike humans, agents don&#8217;t retain lessons in the same way, nor feel the pain of bad code. Junior engineers do, and the pain of maintenance teaches them to simplify interfaces and avoid bad abstractions &#8211; which are both qualities of an effective senior engineer. In this way, a junior engineer is more valuable than an AI agent!</p><p><strong>7. Agents refactor less because they feel no &#8220;pain.&#8221; </strong>Humans rewrite bad interfaces because maintaining them <em>hurts</em>, whereas agents will obliviously churn out and extend a terrible structure, <em>ad infinitum</em>. This is a big reason why AI agents keep adding more tech debt.</p><p><strong>8. Frictionless shipping can actually be harmful. </strong>Armin notes that some friction is desirable; for example, multi-reviewer approvals on critical services, SLO gates (different gates based on the service level objective offered), and migration checklists. The good thing about friction is that it makes humans stop and think.</p><p><strong>9. Does not being in San Francisco help people stay grounded about AI? </strong>I asked Mario how he keeps level-headed about AI while building one of the most popular AI agent harnesses. In response, he credits living in Austria, being a father, and enjoying the great outdoors, as his antidotes to all the hype.</p><h3><strong>The Pragmatic Engineer deepdives relevant for this episode</strong></h3><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code">The creator of OpenClaw: &#8220;I ship code that I don&#8217;t read&#8221;</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/building-great-sdks">Building great SDKs</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/what-is-inference-engineering">What is inference engineering? 
Deepdive</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026">The impact of AI on software engineers in 2026: key trends</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/cycles-of-disruption-in-the-tech">Cycles of disruption in the tech industry</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-ai-engineering-stack">The AI engineering stack</a></p><h3><strong>Timestamps</strong></h3><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=450s">07:30</a>) How Mario, Armin, and Peter Steinberger met</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=915s">15:15</a>) How 30 dev teams use AI agents: learnings</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=1310s">21:50</a>) The importance of judgment</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=1466s">24:26</a>) Challenges when non-engineers write code</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=1710s">28:30</a>) Downsides of over-automation</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=1938s">32:18</a>) Pi</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=2889s">48:09</a>) OpenClaw + Pi</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=3054s">50:54</a>) &#8220;Clankers&#8221;</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=3452s">57:32</a>) Open source and AI</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=3622s">1:00:22</a>) Complexity as the enemy</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=3770s">1:02:50</a>) Building an AI-native startup</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=4312s">1:11:52</a>) &#8220;Slow the F down&#8221;</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=4600s">1:16:40</a>) MCPs vs. 
CLI</p><p>(<a href="https://www.youtube.com/watch?v=n5f51gtuGHE&amp;t=5103s">1:25:03</a>) Predictions and staying up to date</p><h3><strong>References</strong></h3><p><strong>Where to find Mario Zechner:</strong></p><p>&#8226; X: <a href="https://x.com/badlogicgames">https://x.com/badlogicgames</a></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/mariozechner">https://www.linkedin.com/in/mariozechner</a></p><p>&#8226; Website: <a href="https://mariozechner.at">https://mariozechner.at</a></p><p><strong>Where to find Armin Ronacher:</strong></p><p>&#8226; X: <a href="https://x.com/mitsuhiko">https://x.com/mitsuhiko</a></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/arminronacher">https://www.linkedin.com/in/arminronacher</a></p><p>&#8226; Website: <a href="https://mitsuhiko.at">https://mitsuhiko.at</a></p><p>&#8226; Blog: <a href="https://lucumr.pocoo.org">https://lucumr.pocoo.org</a></p><p><strong>Mentions during the episode:</strong></p><p>&#8226; Python, Go, Rust, TypeScript and AI with Armin Ronacher: <a href="https://newsletter.pragmaticengineer.com/p/python-go-rust-typescript-and-ai">https://newsletter.pragmaticengineer.com/p/python-go-rust-typescript-and-ai</a></p><p>&#8226; Pi: <a href="https://pi.dev">https://pi.dev</a></p><p>&#8226; OpenClaw: <a href="https://openclaw.ai">https://openclaw.ai</a></p><p>&#8226; Flask: <a href="https://flask.palletsprojects.com/en/stable">https://flask.palletsprojects.com/en/stable</a></p><p>&#8226; The creator of Clawd: &#8220;I ship code that I don&#8217;t read&#8221;: <a href="https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code">https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code</a></p><p>&#8226; Amiga 500: <a href="https://en.wikipedia.org/wiki/Amiga_500">https://en.wikipedia.org/wiki/Amiga_500</a></p><p>&#8226; i486: <a href="https://timeline.intel.com/1989/meet-the-i486">https://timeline.intel.com/1989/meet-the-i486</a></p><p>&#8226; Peter Steinberger on X: <a href="https://x.com/steipete">https://x.com/steipete</a></p><p>&#8226; Sentry: <a href="https://sentry.io">https://sentry.io</a></p><p>&#8226; Nat Friedman on X: <a href="https://x.com/natfriedman">https://x.com/natfriedman</a></p><p>&#8226; Chroma: <a href="https://www.trychroma.com">https://www.trychroma.com</a></p><p>&#8226; Siemens: <a href="https://www.siemens.com">https://www.siemens.com</a></p><p>&#8226; Y Combinator: <a href="https://www.ycombinator.com">https://www.ycombinator.com</a></p><p>&#8226; The Final Bottleneck: <a href="https://lucumr.pocoo.org/2026/2/13/the-final-bottleneck">https://lucumr.pocoo.org/2026/2/13/the-final-bottleneck</a></p><p>&#8226; Children&#8217;s Learning With Tablet Technology is Often Too Passive: <a href="https://news.utexas.edu/2017/08/22/childrens-learning-with-tablet-technology-is-often-passive">https://news.utexas.edu/2017/08/22/childrens-learning-with-tablet-technology-is-often-passive</a></p><p>&#8226; Amp: <a href="https://ampcode.com">https://ampcode.com</a></p><p>&#8226; OpenCode: <a href="https://opencode.ai">https://opencode.ai</a></p><p>&#8226; Agent Design Is Still Hard: <a href="https://lucumr.pocoo.org/2025/11/21/agents-are-hard">https://lucumr.pocoo.org/2025/11/21/agents-are-hard</a></p><p>&#8226; How Linux is built with Greg Kroah-Hartman: <a href="https://newsletter.pragmaticengineer.com/p/how-linux-is-built-with-greg-kroah">https://newsletter.pragmaticengineer.com/p/how-linux-is-built-with-greg-kroah</a></p><p>&#8226; Mario&#8217;s post on X about 
complexity:</p><blockquote><p>&#8220;your biggest enemy is still complexity. it&#8217;s also your agent&#8217;s biggest enemy. but it has no holistic view of your code base, so it keeps adding complexity. and you think that&#8217;s how it&#8217;s supposed to be, because the clanker shat it out, and you don&#8217;t know the stack. glhf!&#8221; &#8211; <a href="https://x.com/badlogicgames/status/2031128616545747414">Mario Zechner (@badlogicgames) on X</a></p></blockquote><p>&#8226; VibeTunnel: <a href="https://vibetunnel.sh">https://vibetunnel.sh</a></p><p>&#8226; Thoughts on slowing the F down: <a href="https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down">https://mariozechner.at/posts/2026-03-25-thoughts-on-slowing-the-fuck-down</a></p><p>&#8226; StackOverflow: <a href="https://stackoverflow.com">https://stackoverflow.com</a></p><p>&#8226; David Cramer on LinkedIn: <a href="https://www.linkedin.com/in/dmcramer">https://www.linkedin.com/in/dmcramer</a></p><p>&#8226; Stainless: <a href="https://www.stainless.com">https://www.stainless.com</a></p><p>&#8212;</p><p>Production and marketing by <a href="https://penname.co/">Pen Name</a>.</p>]]></content:encoded></item><item><title><![CDATA[How will AI change operating systems? Part 1: Ubuntu and Linux]]></title><description><![CDATA[A deepdive with the Canonical team into how AI is changing Ubuntu, why they&#8217;re betting on local-first LLMs, and a look into other Linux distributions]]></description><link>https://newsletter.pragmaticengineer.com/p/ubuntu-and-ai</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/ubuntu-and-ai</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 28 Apr 2026 14:25:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!4X83!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa93367d9-f13e-4630-952d-68caf3c34f4e_1075x612.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI is affecting how many of us software engineers build; we&#8217;re prompting more code and producing much more of it. The tools are also adapting, with command-line interfaces gradually becoming more popular than IDEs. But what about operating systems? To find out, I reached out to the leading Linux distribution &#8211; the team at Ubuntu &#8211; and the Windows team, about how AI is changing their operating systems.</p><p>Today&#8217;s article focuses on Linux and Ubuntu, and we&#8217;ll cover Windows in a follow-up issue.<em> I also reached out to Apple but heard nothing back, unsurprisingly. If you&#8217;re reading this and happen to work at Apple, it&#8217;d be great to learn more!</em></p><p><a href="https://jnsgr.uk/">Jon Seager</a> is VP of Engineering at Canonical &#8211; the company behind Ubuntu &#8211; and has provided new details about what the team there has built for AI support, and some new ideas that they&#8217;re brewing up.
Today, we cover:</p><ol><li><p><strong>Hardware enablement: support for GPUs, NPUs and DPUs. </strong>When you turn on a machine with AI accelerators, Ubuntu aims for the hardware to perform at its full potential. This means having proper driver support for PCs and cloud data centers&#8217; computing units.</p></li><li><p><strong>Hardware partnerships. </strong>Working closely with NVIDIA, AMD, and Intel means Ubuntu can support those vendors&#8217; new hardware from release day.</p></li><li><p><strong>CPU architecture variants</strong>. New versions in a CPU family add to, or change, features. An operating system needs to support a new version of the CPU architecture variant in order to fully utilize it. Ubuntu does this for the x86&#8209;64 family, making it a <em>lot</em> more performant on newer CPUs &#8211; while still supporting older CPUs.</p></li><li><p><strong>Local-first bet &amp; plans for agentic workflows</strong>. There&#8217;s a big focus on running local models and using &#8220;inference snaps&#8221; which help choose the right model with the right quantization. There&#8217;s also an intention to support agentic workflows at the OS level one day; this is currently at the early exploration stage.</p></li><li><p><strong>Developer ecosystem</strong>. There&#8217;s a plan to add more support for AI dev tools, a focus on sandboxing at the OS level, a push to support ARM64 laptops more, and we touch on the popularity of Windows Subsystem for Linux (WSL).</p></li><li><p><strong>Engineering culture. </strong>A skeptical attitude to AI at Canonical has given way to one where experimentation is encouraged and devs lean into AI tools, but there are no targets for token usage or amounts of AI-generated code.</p></li><li><p><strong>What other Linux distributions are doing. </strong>Arch Linux takes the &#8220;DIY your AI setup&#8221; approach, Omarchy makes it easy to install AI tools, while Red Hat Enterprise Linux ships with AI integrated into the command-line and support for AI accelerators &amp; popular AI tools.</p></li></ol><p><em>The bottom of this article could be cut off in some email clients. <a href="https://newsletter.pragmaticengineer.com/p/ubuntu-and-ai">Read the full article uninterrupted, online.</a></em></p><h2>1. Hardware enablement: support for GPUs, NPUs &amp; DPUs</h2><p>Jon mentioned he detects a &#8220;Dotcom Boom&#8221;-era vibe in the industry, reminiscent of when &#8220;web 1.0&#8221; was created; indeed, lots of startups today aim to be the Google-style success story of this &#8220;AI era&#8221;. At Canonical, the team asked: what does that mean for Ubuntu as an operating system?</p><p>For instance, should Ubuntu join the competition and try to position itself closer to AI, or keep focusing on what it&#8217;s done for decades: build an operating system? Jon said:</p><blockquote><p>&#8220;We need to make sure to remain a relatable and accessible system. I don&#8217;t think we should blur the line between application features and the OS itself.
So, the most powerful thing we can do is hardware enablement.&#8221;</p></blockquote><p>Hardware enablement means that if a computer (typically a laptop) has AI-related hardware, Ubuntu should make full use of it. This involves adding support for GPUs, NPUs, DPUs and other types of accelerator cards. Let&#8217;s briefly go through each.</p><h3>GPUs</h3><p>As is likely widely known by readers, &#8216;GPU&#8217; stands for Graphics Processing Unit. Originally built for graphics rendering, GPUs now see their #1 use case not in video games but in AI training and inference. GPUs come in two forms:</p><ul><li><p>Integrated GPUs: located on the same <a href="https://en.wikipedia.org/wiki/Die_(integrated_circuit)">die</a> (integrated circuit) as the CPU, like GPUs on Apple&#8217;s M-series processors</p></li><li><p>Discrete GPUs: separate chips on their own board; often for gaming, or in standalone GPU rigs for AI and ML workloads</p></li></ul><p>NVIDIA leads the market in GPUs for data center rigs with its <a href="https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/">Blackwell family</a>, and in standalone GPU cards with the <a href="https://www.nvidia.com/en-us/geforce/rtx/">NVIDIA RTX</a> series.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!unR1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7757d200-11b3-4ad0-ac48-3cebcdcf78aa_1440x970.png" width="1440" height="970" alt=""/><figcaption class="image-caption"><em>Hands full: NVIDIA CEO Jensen Huang with the Blackwell GPU (left) and GB200 superchip. Source: <a href="https://fortune.com/2024/03/19/nvidia-new-blackwell-chip-ai-carbon-footprint-problem/">Fortune</a></em></figcaption></figure></div><h3>NPUs</h3><p>Neural Processing Units (NPUs) are also called &#8220;AI accelerators.&#8221; An NPU is a dedicated block on the system-on-a-chip (SoC) of modern processors, designed especially for running <a href="https://newsletter.pragmaticengineer.com/p/what-is-inference-engineering">AI inference</a> efficiently on&#8209;device.
Since 2022, many modern processors have had a dedicated NPU block, including all of Apple&#8217;s M-series chips (from the M1 up), Intel&#8217;s Core Ultra and Core Ultra &#8220;Series 2&#8221;, AMD&#8217;s Ryzen AI 300 series, and Qualcomm&#8217;s Snapdragon X Elite and Snapdragon X Plus.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!4X83!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa93367d9-f13e-4630-952d-68caf3c34f4e_1075x612.png" width="1075" height="612" alt=""/><figcaption class="image-caption"><em>AMD&#8217;s Ryzen AI Pro 300 series processors have dedicated NPUs, like most modern laptop processors</em></figcaption></figure></div><p>A key number quoted for each NPU is TOPS: Tera (trillions of) Operations Per Second, where the operation in question is a &#8220;multiply-accumulate&#8221; (MAC), which <a href="https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics">Qualcomm describes as:</a></p><blockquote><p>&#8220;A multiply-accumulate (MAC) operation executes the mathematical formulas at the core of AI workloads. A matrix multiply consists of a series of two fundamental operations: multiplication and addition to an accumulator. A MAC unit can, for example, run one of each per clock cycle, meaning it executes two operations per clock cycle. A given NPU has a set number of MAC units that can operate at varying levels of precision, depending on the NPU&#8217;s architecture.&#8221;</p></blockquote><p>How TOPS is calculated: TOPS = 2 &#215; MAC unit count &#215; Frequency / 1 trillion.</p><p>&#8220;Frequency&#8221; refers to the clock speed (cycles per second) at which an NPU and its MAC units (as well as a CPU or GPU) operate, and it directly influences overall performance. Higher frequencies allow for more operations, but also mean more energy consumed, more heat generated, and shorter battery life. The TOPS number quoted for a processor is generally calculated at its peak operating frequency.</p>
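<p><em>To make the arithmetic concrete, here&#8217;s a minimal Python sketch of the formula above. The MAC count and clock frequency are made-up illustrative values, not the specs of any real NPU:</em></p><pre><code class="language-python">def tops(mac_units: int, frequency_hz: float) -> float:
    # Each MAC unit executes 2 operations (a multiply and an add) per cycle.
    return 2 * mac_units * frequency_hz / 1e12

# Hypothetical NPU: 16,384 MAC units at a 1.5 GHz peak clock.
print(tops(16_384, 1.5e9))  # ~49.2 TOPS
</code></pre>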
<p>NPUs are often ideal for low-power, local inference, and for running smaller, local models. They can be useful for things like local speech&#8209;to&#8209;text (dictation, captions, meeting transcription), video background blur/replacement or auto&#8209;framing, small local language summarization, etc. NPUs are more typical of laptop and PC processors, although some phone processors ship with them too, such as Apple&#8217;s A-series chips in iPhones and Google&#8217;s Tensor processors in Pixel phones. Basically, NPUs promise to bring efficiently-running local models on laptops one step closer.</p><h3>DPUs</h3><p>Data Processing Units (DPUs) are typically found in data centers, moving massive amounts of data fast. NVIDIA&#8217;s explanation:</p><blockquote><p>&#8220;The CPU is for general-purpose computing, the GPU is for accelerated computing, and the DPU, which moves data around the data center, does data processing.</p><p>A DPU is a new class of programmable processor that combines three key elements. A DPU is a system on a chip, or SoC, that combines:</p><ul><li><p>An industry-standard, high-performance, software-programmable, multi-core CPU, typically based on the widely used Arm architecture, tightly coupled to the other SoC components.</p></li><li><p>A high-performance network interface capable of parsing, processing and efficiently transferring data at line rate, or the speed of the rest of the network, to GPUs and CPUs.</p></li><li><p>A rich set of flexible and programmable acceleration engines that offload and improve applications&#8217; performance for AI and machine learning, zero-trust security, telecommunications, and storage, among others.&#8221;</p></li></ul></blockquote><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!DX7g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb36c2607-8ed3-4da3-8fa9-c93d6dbf890d_1500x1020.png" width="1456" height="990" alt=""/><figcaption class="image-caption"><em>NVIDIA BlueField-3 DPU</em></figcaption></figure></div><p>Several major chipmakers manufacture DPUs, of which NVIDIA&#8217;s BlueField family is the most widespread. Others include AMD Pensando DPUs (Elba, Giglio), and Intel IPU / DPU cards (E2100, E2200 series).</p><p>DPUs are most commonly deployed inside hyperscale cloud providers (AWS, Azure, GCP, OCI), in AI and high-performance computing (HPC) data centers, and in larger private clouds. DPUs make sense when GPU traffic is huge, or when the network telemetry overhead is so great that it could overwhelm the CPUs processing the data transfer.</p><h2>2. Hardware partnerships</h2><p>The easiest way to add hardware support is to work with the leading chip manufacturers, which is why Ubuntu maintains relationships with them. As a result, the OS sometimes offers day-one support for cutting-edge AI supercomputers.</p><h3>Partnership with NVIDIA</h3><p>In September 2025, Canonical announced it would package and distribute the full NVIDIA CUDA toolkit directly within Ubuntu&#8217;s repositories.
This collapsed what had previously been a multi-step manual installation process (downloading from NVIDIA&#8217;s site, importing GPG keys, pinning a separate APT repo, and praying nothing broke) into a single standard <a href="https://linuxize.com/post/how-to-use-apt-command/">apt</a> install.</p><p>Packaging and distributing the CUDA toolkit makes developing with CUDA easier. From Jon:</p><blockquote><p>&#8220;One of the trickiest things for developers who have to use this tech is the dance of matching the right version of Python, with the right version of CUDA, with the right driver. Projects end up with different versions of CUDA, and then machines end up breaking because the driver configuration gets inadvertently broken along the way.</p><p>The number one thing we can do as an operating system is to make this setup as easy as possible.&#8221;</p></blockquote><p>Ubuntu&#8217;s strategy of working directly with chipmakers seems to be working. NVIDIA recently discontinued its custom NVIDIA DGX OS &#8212; a modified Ubuntu it maintained for years &#8212; and now ships plain Ubuntu. Jon:</p><blockquote><p>&#8220;Previously, NVIDIA shipped NVIDIA DGX OS for which NVIDIA had an agreement with Canonical where they could take Ubuntu, modify it with the kernel modules and software they needed, do some product-specific optimization, and ship that as NVIDIA DGX OS.</p><p>This more recent development sees NVIDIA just shipping Ubuntu as it comes.</p><p>When NVIDIA released the DGX Spark, a $4,000 AI workstation with an ARM64 chipset, it shipped running vanilla Ubuntu as the only supported operating system.&#8221;</p></blockquote><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!4om0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87eb9b54-c572-4157-8593-a5e8798bd0cc_2048x1298.png" width="1456" height="923" alt=""/><figcaption class="image-caption"><em>NVIDIA DGX Spark AI supercomputer: one of several NVIDIA DGX servers powered by NVIDIA&#8217;s DGX OS</em></figcaption></figure></div><p>At CES 2026 in January, Canonical <a href="https://canonical.com/blog/nvidia-vera-rubin-ubuntu-support">announced</a> Ubuntu support for the NVIDIA Vera Rubin NVL72 rack-scale architecture, with day-one platform readiness in Ubuntu version <a href="https://documentation.ubuntu.com/release-notes/26.04/">26.04 LTS</a> (Long-Term Support: at least 15 years for enterprise customers).</p>
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ORn3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ORn3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ORn3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png" width="1200" height="675" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:675,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ORn3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 424w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 848w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 1272w, https://substackcdn.com/image/fetch/$s_!ORn3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74d5121a-c63e-48eb-8d00-83acb326d458_1200x675.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>The NVIDIA Vera Rubin NVL72 rack</em></figcaption></figure></div><h3>AMD and Intel</h3><p>It&#8217;s clear Ubuntu and NVIDIA enjoy a strong partnership, but Canonical aims to remain neutral, Jon says:</p><blockquote><p>&#8220;We have an amazing partnership with NVIDIA, but we do the same with Intel, the same with AMD, the same with Qualcomm, and the same with MediaTek because in reality there is hardware being released every day, and if we don&#8217;t maintain those partnerships, the ecosystem becomes even more fragmented than it already naturally is.&#8221;</p></blockquote><p>Last December, Ubuntu announced native support for AMD ROCm, and also ships with Intel&#8217;s OpenVINO toolkit. Ubuntu 26.04 LTS will be the first major distribution to natively package all three GPU compute stacks &#8212; NVIDIA, AMD, and Intel &#8212; with long-term enterprise support. Under Ubuntu Pro, ROCm LTS releases receive up to 15 years of security maintenance.</p><p><em>Security maintenance means that if vulnerabilities or critical incompatibilities are discovered in an LTS version, Canonical will patch them even if the upstream vendor no longer supports those versions and no longer backports security patches.</em></p><p>AMD Instinct accelerators are gaining traction in HPCs and sovereign AI deployments, as enterprises look for alternatives to CUDA-locked hardware. AMD&#8217;s SVP and Chief Software Officer, Andrej Zdravkovic, said the partnership would make it &#8220;easier for developers and enterprises to deploy AMD solutions on supported systems.&#8221;</p><p><strong>Chip vendors want to collaborate because it means less work for them to add operating system-level support.</strong> Jon:</p><blockquote><p>&#8220;It&#8217;s a win-win on both ends. Silicon companies are in the business of building the best chips they can, and partnering with Canonical means they have to concentrate on fewer things which are not their core focus. My hope is that partnering with Canonical helps them to focus on what they&#8217;re best at, while enabling us to help with what we&#8217;re best at: integrating, shipping and maintaining a Linux distribution.&#8221;</p></blockquote><h2>3. Architecture variants</h2><p>Modern x86 processors support multiple instruction set generations: x86_64 v1, v2, v3, v4, and v5. ARM has a similar hierarchy. Each generation adds capabilities, such as AVX-512 instructions that accelerate machine learning workloads.</p><p>Let&#8217;s take the x86_64 instruction set. 
<p><strong>Canonical has reworked its build infrastructure to produce binaries with <em>specific</em> architecture variant support.</strong> So if you run an x86_64 v3-compatible processor, you can download an Ubuntu variant compiled specifically for x86_64 v3.</p><p>One tradeoff the Ubuntu team had to make was building binaries several times over, which costs more processing time and storage on their end. Then again, Canonical doing this once means users don&#8217;t need to recompile anything, which made it an easy tradeoff, Jon told me.</p><p>Today, Ubuntu supports x86_64 v3 as an architecture variant, and plans to add more. Jon says:</p><blockquote><p>&#8220;Today, we&#8217;ve released x86_64 v3 as a variant, but the capability in our build and delivery pipelines unlocks the ability to add variants for the next RISC-V RVA versions, for ARMv9, ARMv10, ARMv11 and so on.</p><p>We will start now onboarding variants to make sure that when you go and buy your latest Snapdragon laptop, your operating system and all of the parts of it are using the silicon to its fullest.&#8221;</p></blockquote><p><strong>Adding support for architecture variants was a significant undertaking.</strong> Jon explains:</p><blockquote><p>&#8220;This work was especially complex because combined with having the hardware physically available in the build farm, Canonical also needed to make the build scheduler aware, and thread the capability through the build systems of Debian packages, Snaps, OCI images, virtual machine images, etc. As it stands, the capability exists for Debian packages, and support for further package types will land shortly.</p><p>In addition to the build infrastructure, work needed to be done on downstream package managers (apt, snap, &#8230;) and schedulers to ensure they pull the right version of packages, and consideration needs to be given to what happens if a VM containing x86_64 v3 code ends up trying to boot on v1 hardware, and so on.&#8221;</p></blockquote><h2>4. Betting on local-first &amp; plans for agentic workflows</h2><p>If you&#8217;ve tried to run an LLM locally on your machine, you&#8217;ll know it comes with friction. Jon:</p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/ubuntu-and-ai">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[The Pulse: AI token spending out of control – what’s next?]]></title><description><![CDATA[Details from 15 tech companies on the rapid growth of token spend, and their responses to it. Also: AI vendors can&#8217;t keep up with demand, plummeting morale at Meta, and more.]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-ai-token-spending-out-of</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-ai-token-spending-out-of</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 23 Apr 2026 16:51:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RLFW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66a69a14-0903-4f04-a1d4-13222f40c4ee_1834x1074.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hello from Florida &#8211; today and tomorrow, I&#8217;m at React Miami. I&#8217;ve always wanted to attend this conference, and finally made it happen. If you&#8217;re around, say hi!</em></p>
srcset="https://substackcdn.com/image/fetch/$s_!kpwS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb122385b-0692-46ab-9650-dd7901513149_1488x1344.png 424w, https://substackcdn.com/image/fetch/$s_!kpwS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb122385b-0692-46ab-9650-dd7901513149_1488x1344.png 848w, https://substackcdn.com/image/fetch/$s_!kpwS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb122385b-0692-46ab-9650-dd7901513149_1488x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!kpwS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb122385b-0692-46ab-9650-dd7901513149_1488x1344.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">(L-R): Myself, NeetCode founder, Navdeep Singh, &amp; YouTuber &amp; Twitch streamer, ThePrimeagen at React Miami</figcaption></figure></div><p>Let&#8217;s get to today&#8217;s topics:</p><ol><li><p><strong>New trend: token spend breaks budgets &#8211; what next? </strong>In the past 2-3 months, spending on AI agents has exploded at many tech companies, and the ramifications of this are starting to dawn on engineering leaders. We&#8217;ve sourced details from 15 companies, including the different ways they are coping with this realization.</p></li><li><p><strong>New trend: more AI vendors can&#8217;t keep up with demand. </strong>Related to massively increased spending, GitHub Copilot and Anthropic are starting to limit less-profitable individual users, so they can serve business users whose spend has easily 10x&#8217;d in the last few months. The exception is OpenAI and Codex.</p></li><li><p><strong>Morale at Meta hits all-time low? </strong>Business is booming but devs at Meta are furious and worried due to looming layoffs, and an invasive tracking program rolled out to all US employees.</p></li></ol><h2>1. New trend: token spend breaks budgets &#8211; what next?</h2>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-ai-token-spending-out-of">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[Designing Data-intensive Applications with Martin Kleppmann]]></title><description><![CDATA[Martin Kleppmann on scaling, his updated Designing Data-Intensive Applications, and what&#8217;s next for AI-era systems.]]></description><link>https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/designing-data-intensive-applications</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Wed, 22 Apr 2026 16:19:26 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/194990093/b984a6b1c943fb163612882a754d2ac8.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Stream the latest episode</h3><p><strong>Listen and watch now on <a href="https://youtu.be/SVOrURyOu_U">YouTube</a>, <a href="https://open.spotify.com/episode/0iJ8NpuQvAeO9Yhp41givL">Spotify</a>, and <a href="https://podcasts.apple.com/us/podcast/designing-data-intensive-applications-with-martin/id1769051199?i=1000763097607">Apple</a>.</strong> See the episode transcript at the top of this page, and timestamps for the episode at the bottom.</p><h3><strong>Brought to You by</strong></h3><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/d9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" alt="Sponsor banner" width="800" height="70"></figure></div>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:70,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.pragmaticengineer.com/i/185094534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gh57!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 424w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 848w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1272w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>&#8226; <strong><a href="http://statsig.com/pragmatic">Statsig</a></strong> &#8211; &#8288; The unified platform for flags, analytics, experiments, and more. Stop switching between different tools, and have them all in one place.</p><p>&#8226; <strong><a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a></strong> &#8211; The makers of SonarQube, the industry standard for code verification and automated code review. Sonar helps teams close the &#8220;architecture gap&#8221; by preventing code complexity and structural decay. <a href="https://www.sonarsource.com/solutions/architecture/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-sonar-architecture26&amp;utm_content=podcast-sonar-architecture&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Learn how Sonar</a> is empowering the Agent Centric Development Cycle with new architecture management capabilities that ensure both humans and AI agents respect your system&#8217;s blueprint.</p><p>&#8226; <strong><a href="https://workos.com/">WorkOS</a></strong> &#8211; Designing large systems is about tradeoffs. But one thing isn&#8217;t a tradeoff: enterprise features. WorkOS gives you APIs to ship enterprise features &#8211; SSO, directory sync, RBAC, audit logs &#8211; in days, not months. 
Visit <a href="http://workos.com">WorkOS.com</a> to learn more.</p><h3><strong>In this episode</strong></h3><p><a href="https://martin.kleppmann.com">Martin Kleppmann</a> is a researcher and the author of <a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/">Designing Data-Intensive Applications</a>, one of the most influential books on modern distributed systems. As of this month, the second, heavily <a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058/">updated edition of the book is out</a>.</p><p>In this episode of Pragmatic Engineer, we discuss Martin&#8217;s career in tech building startups, how he ended up writing this iconic book, and what he&#8217;s focused on, these days, after moving from industry, into academia.</p><p>We talk about the tradeoffs behind modern infrastructure, how the cloud has changed what it means to scale, and the thinking behind Designing Data-Intensive Applications, including what&#8217;s changing in the second edition.</p><p>Martin reflects on lessons from building startups like Rapportive, which he sold to LinkedIn, and shares how his experience in both academia and industry shaped his perspective.</p><p>We also explore what&#8217;s ahead: why formal verification may become more important in an AI-assisted world, the challenges of building local-first software, and his recent research into using cryptography to improve transparency in supply chains without exposing sensitive data.</p><h3><strong>Key observations from Martin</strong></h3><p>Here are 12 of my most interesting takeaways from talking with Martin:</p><p><strong>1. Seeing Kafka as it was built at LinkedIn heavily shaped the ideas behind the book.</strong> Kafka (a popular event streaming platform) was open-sourced while Martin was at LinkedIn. Seeing this large system up close helped Martin build a mental model of how various data systems fit together, what they have in common, and their fundamental principles.</p><p><strong>2. Martin wrote the book because he wished he had this resource when they were &#8220;drowning&#8221; in design decisions at his startup.</strong> At Rapportive, they hit database performance problems and were searching in the dark, with no idea what to do, because they lacked foundations. Martin wrote the book, so hopefully others won&#8217;t have to learn the fundamentals the hard way that his team did.</p><p><strong>3. Knowing system internals as a superpower for application developers.</strong> Martin maintains that Designing Data-Intensive Applications is not a book for people who build databases or even infrastructure, but it&#8217;s helpful for application developers to develop an intuition for making good design decisions and debugging performance issues they will encounter.</p><p><strong>4. Multi-region and multi-cloud are risk/cost trade-offs, not best practices. </strong>Martin does not believe that there is a &#8220;best practice&#8221; in deciding whether to go multi-region or multi-cloud. This decision is a tradeoff between risk and costs. It&#8217;s a business decision to be made. Designing Data-Intensive Applications gives engineers the vocabulary to articulate the tradeoffs, not to dictate answers.</p><p><strong>5. Scaling </strong><em><strong>down</strong></em><strong> can be as challenging as scaling up</strong>. When talking about scaling systems, most engineers associate this with scaling up. 
But building a system that can operate efficiently and scale down when there&#8217;s less traffic is an exciting (and challenging) problem as well! Solutions like Serverless are valuable building blocks for scaling down efficiently.</p><p><strong>6. Replication for fault tolerance is more relevant these days than sharding.</strong> Though the book has a full chapter on sharding, Martin said that the cloud has reduced the need for manual sharding for the majority of teams. This is also because machines are increasingly bigger, and more workloads fit on a single machine. Sharding across machines is increasingly a specialist concern; replication for fault tolerance, however, is still relevant at every scale.</p><p><strong>7. MapReduce might be &#8220;dead,&#8221; but it is still worth knowing about.</strong> The second edition of the book cut most MapReduce coverage because Martin observed that, these days, practically nobody uses it: technologies like Spark and Flink have replaced MapReduce. The second edition of the book has a reference to MapReduce purely as a learning tool, for understanding partitioned batch systems.</p><p><strong>8. Distributed systems theory makes deliberately paranoid assumptions: this is on purpose! </strong>The theory assumes that there&#8217;s no upper bound on how long it might take for a message to go over the network:  it might arrive in 100 microseconds or 10 years. Clocks, crashes, and network delays all get similarly worst-case treatment. Occasionally, reality will hit some of these extremes!</p><p><strong>9. An engineer&#8217;s job is increasingly about surfacing risks &#8212; including societal ones &#8212; to decision-makers. </strong>Martin believes that engineers need to articulate tradeoffs in a way that enables business leaders to make informed decisions. These tradeoffs include reputational and societal risks, not just technical ones.</p><p><strong>10. Formal verification was too expensive to use across the industry, and LLMs may change this. </strong>Martin said that he never used formal verification in his time in the industry because it was too time-consuming. Now he sees two things happening at once:</p><ul><li><p>LLMs are producing so much code that human review becomes the bottleneck</p></li><li><p>LLMs are getting good at writing formal proofs as well</p></li></ul><p>Put both together, and we might see more formal verification happening!</p><p><strong>11. Building local-first software has difficult engineering challenges.</strong> Decentralized access control sounds trivial, but it becomes pretty hard without a single server to arbitrate. For example, a revoked user can make a concurrent edit, and different devices will disagree about what happened. Martin is currently working in this problem space.</p><p>&#8203;<strong>12. Industry and academia dismiss each other, and this is not great for either field! </strong>The tech industry calls academia &#8220;theoretical&#8221; and misses useful research. Academia, in turn, often calls industry work just engineering and misses the interesting problems they solve. Martin has worked in both industry and academia, and would like to build better respect in both directions. 
The best PhD students he works with have a few years of real engineering experience.</p><h3><strong>The Pragmatic Engineer deepdives relevant for this episode</strong></h3><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/bluesky">Building Bluesky: a distributed social network</a> (Martin is an advisor at Bluesky)</p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/uber-move-to-cloud">Inside Uber&#8217;s move to the cloud</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-history-of-servers-the-cloud?">The history of servers, the cloud, and what&#8217;s next</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-past-and-future-of-backend-practices">The past and future of modern backend practices</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/how-kubernetes-is-built-with-kat">How Kubernetes is built</a></p><h3><strong>Timestamps</strong></h3><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U">00:00</a>) Early career</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=346s">05:46</a>) Building Rapportive</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=647s">10:47</a>) Working at LinkedIn</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=849s">14:09</a>) Writing Designing Data-Intensive Applications</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=1380s">23:00</a>) Reliability, scalability, and repeatability</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=1584s">26:24</a>) DDIA: the second edition</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=1850s">30:50</a>) Tradeoffs of using cloud services</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=2342s">39:02</a>) How the cloud changed scaling</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=2573s">42:53</a>) The trouble with distributed systems</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=2942s">49:02</a>) Ethics for software engineers</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=3165s">52:45</a>) Formal verification</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=3612s">1:00:12</a>) Academia vs. 
industry</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=3830s">1:03:50</a>) Local-first software</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=4190s">1:09:50</a>) Computer science education</p><p>(<a href="https://www.youtube.com/watch?v=SVOrURyOu_U&amp;t=4712s">1:18:32</a>) Martin&#8217;s current research and advice</p><h3><strong>References</strong></h3><p><strong>Where to find Martin: </strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/martinkleppmann">https://www.linkedin.com/in/martinkleppmann</a></p><p>&#8226; Bluesky: <a href="https://bsky.app/profile/martin.kleppmann.com">https://bsky.app/profile/martin.kleppmann.com</a></p><p>&#8226; Website: <a href="https://martin.kleppmann.com">https://martin.kleppmann.com</a></p><p>&#8226; Distributed Systems lecture series: <a href="https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB">https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_HdUFe97RItdiB</a></p><p>&#8226; Designing Data Intensive Applications, 2nd edition: <a href="https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058">https://learning.oreilly.com/library/view/designing-data-intensive-applications/9781098119058</a></p><p><strong>Mentions during the episode:</strong></p><p>&#8226; Selenium: <a href="https://www.selenium.dev">https://www.selenium.dev</a></p><p>&#8226; SauceLabs: <a href="https://saucelabs.com">https://saucelabs.com</a></p><p>&#8226; Rapportive on YC&#8217;s website: <a href="https://www.ycombinator.com/companies/rapportive">https://www.ycombinator.com/companies/rapportive</a></p><p>&#8226; Kafka: <a href="https://engineering.linkedin.com/teams/data/data-infrastructure/streams/kafka">https://engineering.linkedin.com/teams/data/data-infrastructure/streams/kafka</a></p><p>&#8226; The Log: What every software engineer should know about real-time data&#8217;s unifying abstraction: <a href="https://engineering.linkedin.com/teams/data/data-infrastructure/streams/kafka">https://engineering.linkedin.com/teams/data/data-infrastructure/streams/kafka</a></p><p>&#8226; Materialized View (Chris Riccomini&#8217;s newsletter): <a href="https://materializedview.io">https://materializedview.io</a></p>
<p>&#8226; The Missing README: A Guide for the New Software Engineer: <a href="https://www.amazon.com/Missing-README-Guide-Software-Engineer/dp/1718501838">https://www.amazon.com/Missing-README-Guide-Software-Engineer/dp/1718501838</a></p><p>&#8226; How AWS S3 is built: <a href="https://newsletter.pragmaticengineer.com/p/how-aws-s3-is-built">https://newsletter.pragmaticengineer.com/p/how-aws-s3-is-built</a></p><p>&#8226; MapReduce: <a href="https://en.wikipedia.org/wiki/MapReduce">https://en.wikipedia.org/wiki/MapReduce</a></p><p>&#8226; Prediction: AI will make formal verification go mainstream: <a href="https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html">https://martin.kleppmann.com/2025/12/08/ai-formal-verification.html</a></p><p>&#8226; Isabelle proof assistant: <a href="https://isabelle.in.tum.de">https://isabelle.in.tum.de</a></p><p>&#8226; Rocq: <a href="https://rocq-prover.org">https://rocq-prover.org</a></p><p>&#8226; Lean: <a href="https://lean-lang.org">https://lean-lang.org</a></p><p>&#8226; TLA+: <a href="https://github.com/tlaplus">https://github.com/tlaplus</a></p><p>&#8226; FizzBee: <a href="https://fizzbee.io">https://fizzbee.io</a></p><p>&#8226; Local-First Software: You Own Your Data, in spite of the Cloud: <a href="https://martin.kleppmann.com/papers/local-first.pdf">https://martin.kleppmann.com/papers/local-first.pdf</a></p><p>&#8226; How AI assistance impacts the formation of coding skills: <a href="https://www.anthropic.com/research/AI-assistance-coding-skills">https://www.anthropic.com/research/AI-assistance-coding-skills</a></p><p>&#8226; Cryptography: <a href="https://en.wikipedia.org/wiki/Cryptography">https://en.wikipedia.org/wiki/Cryptography</a></p><p>&#8212;</p><p>Production and marketing by <a href="https://penname.co/">Pen Name</a>.
</p>]]></content:encoded></item><item><title><![CDATA[Learnings from conducting ~1,000 interviews at Amazon]]></title><description><![CDATA[Steve Huynh, formerly Principal Engineer at Amazon, shares observations from 10+ years of interviewing software engineers, and an excerpt from his new book, Technical Behavioral Interview]]></description><link>https://newsletter.pragmaticengineer.com/p/learnings-from-conducting-1000-interviews</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/learnings-from-conducting-1000-interviews</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 21 Apr 2026 12:49:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_3W5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc215ccd0-2cd3-4ab3-93a5-9ba11a7ba196_2048x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Steve Huynh, formerly Principal Engineer at Amazon, shares observations from his Bar Raiser interviews, and an excerpt from his new book, Technical Behavioral Interview</em></p><p>Tech interviews have two parts: the technical interview, focused on things like coding, software architecture, and problem solving; and the behavioral interview, focused on past experience, the situations that show you&#8217;d be a good fit at the company, and things like attitude, motivation, and culture fit. Technical interviews are going through a big change thanks to AI tools: some companies are bringing in new, AI-assisted types of interviews, while others are trying to make &#8220;pre-AI&#8221; interview formats work.</p><p>What doesn&#8217;t seem to be changing is the second type: the behavioral interview. I&#8217;ve found the topic of behavioral interviews from a software engineer&#8217;s perspective somewhat under-discussed &#8211; even though this interview carries huge weight in whether you secure an offer, and at what level you come in. No matter how strong your technical skills are, especially at mid-sized and larger companies, you are unlikely to get an offer if you are deemed not to be a fit for what the company is looking for.</p><p>Steve Huynh was an engineer at Amazon for 17 years &#8211; I previously did a podcast episode with him on the reality of being a principal engineer at Amazon. During this time, Steve conducted nearly 1,000 interviews, of which around 600 were Bar Raiser ones.
<em>Bar Raiser interviews are unique to Amazon: it&#8217;s an interview conducted by someone outside the hiring team, with the goal of ensuring that the new hire raises the company&#8217;s talent bar.</em></p><p>After leaving the e-commerce giant, Steve spent 2 years researching and writing the book <a href="https://www.amazon.com/dp/1548441708">Technical Behavioral Interview: An Insider&#8217;s Guide</a>.</p><div class="captioned-image-container"><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/c215ccd0-2cd3-4ab3-93a5-9ba11a7ba196_2048x1536.jpeg" alt="Book cover photo" width="2048" height="1536"><figcaption class="image-caption"><em>My copy of <a href="https://www.amazon.com/dp/1548441708">Technical Behavioral Interview: An Insider&#8217;s Guide</a></em></figcaption></figure></div>
<p>Today, we cover two topics on interviews and behavioral interviews:</p><p><strong>1. Learnings from conducting ~1,000 behavioral interviews at Amazon</strong>. Steve reflects on major observations from his 17 years at Amazon, covering:</p><ul><li><p>You&#8217;re over-prepared for one interview and unprepared for the other</p></li><li><p>How you deliver the story matters as much as the story itself</p></li><li><p>The interview is an audition for what it&#8217;s like to work with you</p></li></ul><p><strong>2. What companies are looking for during behavioral interviews</strong>. An excerpt from Steve&#8217;s new book, Technical Behavioral Interview, covering ~75% of a full chapter (out of 14 total chapters). We get into:</p><ul><li><p>Understanding fit: role and company</p></li><li><p>The four dimensions that determine your level</p></li><li><p>What each level looks like</p></li><li><p>Reading and calibrating your own level</p></li><li><p>Researching what companies really value</p></li></ul><p><em>Longtime readers might remember Steve from my podcast with him a year back: <a href="https://newsletter.pragmaticengineer.com/p/what-is-a-principal-engineer-at-amazon">What is a Principal Engineer at Amazon? With Steve Huynh</a></em></p><p><em>My usual disclaimer: as with all my recommendations, I was not paid for this article, and none of the links are affiliates. See <a href="https://blog.pragmaticengineer.com/ethics-statement/">my ethics statement</a> for more.</em></p><p>With this, it&#8217;s over to Steve:</p><div><hr></div><h2>1.
Learnings from conducting ~1,000 behavioral interviews at Amazon</h2><p>A Bar Raiser is a specially trained interviewer whose job is to ensure that every hire raises the average talent level at Amazon. I had veto power over any candidate. I sat on nearly a thousand interview loops across every level from intern to Principal Engineer.</p><p>After 50 or so interviews as a Bar Raiser, the patterns became impossible to miss. And this was the biggest one:</p><p>The candidates who didn&#8217;t get offers seldom failed because they lacked technical skill. <strong>They failed because of how they presented themselves.</strong></p><p>For sure, technical preparation is crucial, and I&#8217;m not telling you to skip it. But most candidates have massive blind spots when it comes to non-technical matters, which is a big problem. Why? Because that blind spot is where most hiring decisions are made.</p><p>The Bar Raiser who trained me put it this way:</p><blockquote><p>&#8220;Technical skills are the ante. They get you into the game. But they&#8217;re not what wins you the hand.&#8221;</p></blockquote><p>I didn&#8217;t fully appreciate what that meant until I&#8217;d seen candidates who were technically very strong get rejected because of everything else.</p><p>Think about it. By the time you&#8217;re sitting in a final round of interviews, you&#8217;ve already passed at least one technical screen or take-home assignment. The company already knows you could probably do the job. They already know you want to work with them.</p><p>But that&#8217;s not what the final round is for.</p><p><strong>The final round is when the team figures out whether they want to work with you.</strong> Being technically proficient is part of it, but it&#8217;s not all of it. Can you explain your thinking clearly when you&#8217;re stumped? How do you handle it when things go wrong? Can they picture you in a design review or in a tough conversation with a partner team?</p><p>Fit.</p><p>Fit is what decides most hiring outcomes, yet it&#8217;s the thing most candidates spend the least time preparing for. After nearly a thousand interviews, I can tell you exactly where the gap is and how you can close it.</p><h3>Learning #1: You&#8217;re over-prepared for one interview and unprepared for the other</h3><p>The average candidate preparing for a tech interview probably spends 95% of their time on technical preparation and 5% on everything else. Some spend literally zero on everything else.</p><p>I get why. Technical preparation feels concrete. You can grind coding problems and measure your progress. You can study system design patterns and feel yourself getting sharper. There&#8217;s a clear input/output relationship. Do more problems, get better at problems.</p><p>For most technical interviews, even if you haven&#8217;t seen the exact problem before, you can still do a decent job. It&#8217;s simply not possible to prepare for every problem, so it&#8217;s expected that you can reason through an unfamiliar coding question and pick up on hints the interviewer gives you. You can work through a system design problem by applying fundamentals you already know. It&#8217;s expected that you will encounter new questions during an interview, so it isn&#8217;t fatal if you&#8217;re a competent engineer who can think on your feet.</p><p>However, the non-technical rounds are the opposite. You cannot wing them and expect to do well.
When an interviewer says, &#8220;Tell me about a time something went wrong on a project and how you handled it&#8221; and you haven&#8217;t thought about that question before, there is no hint they can give you. There&#8217;s no reasoning your way through it in real time. You either have a prepared story ready to go, or you&#8217;re going to mumble your way through a word salad while the interviewer watches.</p><p>I&#8217;ve seen this play out hundreds of times. A candidate would crush the coding round, then I would ask them about a difficult decision they made, and they would fall apart. They would pick a half-remembered example, start rambling, backtrack to add context they forgot, and in the process lose track of the question. Then, five minutes later, they would land on something like, &#8220;So, yeah, it worked out in the end.&#8221;</p><p>These candidates were often strong coders, but that didn&#8217;t matter. At the debriefs, the feedback was always some version of &#8220;I couldn&#8217;t get a concrete answer about their experience. Every story was vague and unconvincing.&#8221; We couldn&#8217;t extend an offer when a candidate couldn&#8217;t articulate how they worked.</p><p>The technical bar was met, but the hiring decision was made in the behavioral round.</p><p>Here&#8217;s what&#8217;s frustrating about this. Non-technical preparation takes a fraction of the time that technical preparation does.</p><p>If you&#8217;re going to spend 80 to 100 hours preparing for an interview cycle, spending a single weekend on your stories might be the highest-leverage investment you make.</p><p>Ten hours of story prep can completely change the outcome of your behavioral rounds. Meanwhile, your 80th hour of LeetCode will give you almost nothing you didn&#8217;t already have at 60.</p><p>The returns on technical prep diminish rapidly. The returns on story prep are outsized, because almost nobody does it at all.</p><p><strong>What to do:</strong> How are you currently splitting your interview prep time? If it&#8217;s 99% technical and 1% everything else, you&#8217;re over-indexed on the part with diminishing returns and under-indexed on the part where hiring decisions get made. You don&#8217;t need to cut your technical prep dramatically. Just reallocate. If you&#8217;re planning to spend 80 hours preparing, take 10 of those hours and move them to non-technical preparation. That reallocation will do more for your odds than 10 more hours working on practice problems.</p><h3>Learning #2: How you deliver the story matters as much as the story itself</h3><p>You can have the most impressive accomplishment of your career ready for your interview and completely waste it with bad delivery. The most common version of this is what I call the &#8220;ramble and stumble.&#8221;</p><p>The candidate starts talking, and you genuinely can&#8217;t tell if they&#8217;re figuring out the story as they go or if they&#8217;ve simply never said these words out loud before. Or they might give you five minutes of context and then still backtrack to add details they forgot. By the time they reach the outcome, you&#8217;ve lost track of how you got there.</p><p>Here&#8217;s something that&#8217;s always struck me as odd. If you had a big presentation at work, you&#8217;d spend hours preparing for it, right? You&#8217;d think about the structure, the flow, the key points. You&#8217;d rehearse it. You might even do a couple of dry runs with a colleague.
Nobody wants to walk into a presentation and wing it.</p><p>But in a job interview, where the stakes are arguably higher than any single presentation you&#8217;ll ever give? People wing those constantly. They walk in having never practiced their stories out loud. They might have thought about them, but they&#8217;ve never spoken the words, heard how they sound, or timed how long they take. Then they&#8217;re surprised when the words come out as a mess.</p><p>Think about any other high-stakes skill. You wouldn&#8217;t expect to be good at golf without practicing at the driving range. You wouldn&#8217;t expect to give a great keynote the first time you stepped on stage. Nobody calls a musician fake for rehearsing before a concert.</p><p>But for some reason, many people feel that preparing interview stories is inauthentic. As if it&#8217;s cheating somehow. As if the &#8220;real&#8221; version of you is the one that stumbles through an unrehearsed answer under pressure.</p><p>It&#8217;s not. The real you communicates clearly what you&#8217;ve done and what you&#8217;re capable of.</p><p><strong>What to do: </strong>Good delivery doesn&#8217;t require a lot of charisma or natural presentation skills, but it does require practice. Start with the two questions that come up in virtually every interview: &#8220;Tell me about yourself&#8221; and &#8220;Why do you want to work here?&#8221; Write down your answers. Then record yourself delivering them. Watch the recording and take notes. Where did you ramble? Where did you fill space with filler words? Did you look nervous? Then do it again. And again. Keep going until you watch the recording back and think &#8220;That sounds like someone I&#8217;d like to work with.&#8221;</p><p>Once those two are solid, pick stories from your career and do the same thing. This process will be uncomfortable at first. Most people hate watching themselves on camera. Do it anyway. Thirty minutes of this will up-level your interview performance much more than 20 hours of coding exercises could ever do.</p><h3>Learning #3: The interview is an audition for what it&#8217;s like to work with you</h3><p>Most candidates think the interview is an exam. If you get the right answers, then you&#8217;ll pass the test and get the job. That&#8217;s simply not how it works. Yes, you are being evaluated, and what you say matters. But there is no answer key. The interviewer doesn&#8217;t have a rubric with the &#8220;correct&#8221; responses to which they compare your answers. They&#8217;re forming an impression of you as a person, and that impression is far more nuanced than &#8220;right&#8221; or &#8220;wrong.&#8221;</p><p>By the time you&#8217;re sitting across from the interviewer, you&#8217;ve already jumped through some technical hoops. The company already has evidence from your resume that you can code or design systems at the level they need. That bar has been cleared. The final round goes deeper on the technical side, but it&#8217;s also trying to answer a completely different question: Would we want this person on the team? Would we trust their judgment in a crisis? Would they make our team&#8217;s software better or worse?</p><p>As a Bar Raiser, my specific job was to determine whether a candidate would raise the bar, meaning that they would be better than at least 50% of the people already at the company in that role.</p><p>The thing most people don&#8217;t realize is that the type of coding we asked about in interviews wasn&#8217;t what we did on the job. 
Nobody was writing algorithms on a whiteboard during their workday. The questions we asked tested problem-solving ability in an artificial environment.</p><p>But the behavioral questions, the soft questions, those tested situations we dealt with every single day. Navigating disagreements, handling projects that were going sideways, influencing without authority, making tradeoffs with incomplete information. These weren&#8217;t hypothetical scenarios pulled out of a textbook. They were just another Tuesday.</p><p>So when I asked a candidate to tell me about a time they had to push back on a stakeholder, I wasn&#8217;t waiting to hear the right answer; I was picturing them in our next planning and prioritization meeting. When they described how they handled a conflict on their team, I was asking myself whether I&#8217;d want to be in that room with them. Every answer was a preview of what it would be like to work alongside that person day to day.</p><p>The candidates who treated it like a test tried to figure out what I wanted to hear and then gave me that answer. That&#8217;s exactly the wrong approach. They gave polished, rehearsed answers with no rough edges and perfect endings where everything worked out and every decision was the right one. I&#8217;d walk out thinking &#8220;I have no idea what it would actually be like to work with this person.&#8221; And when that uncertainty showed up across multiple interviews in the debrief, it almost always turned into a &#8220;No.&#8221;</p><p><strong>What to do: </strong>For each story you&#8217;re preparing, stop thinking about what the interviewer wants to hear. Instead, think about what you&#8217;d want to hear from someone interviewing to join your team. You&#8217;d want to hear how they actually think. You&#8217;d want the real version of what happened, including the parts that were hard and the calls that were close. You&#8217;d want to walk away feeling like you understood what it would be like to work with them on a tough problem. Give your interviewer that same thing. Be honest and let them see how you think. That&#8217;s worth more than any polished answer.</p><h3>What ~1,000 interviews taught me</h3><p>After all those interviews, the lesson I keep coming back to is simple.</p><p><strong>The people who get hired are the ones who can walk into a room and tell a clear story. </strong>This story is about their work and their capabilities, and makes the interviewer think, &#8220;I want to work with that person.&#8221;</p><p>Being able to tell this story is a skill. And like any skill, it gets better with practice. Most people never practice it because they don&#8217;t think of it as something you can prepare for, but you can. And a little preparation here goes further than almost anything else you can do for your career.</p><h2 style="text-align: justify;">2. What companies are looking for during behavioral interviews</h2><p><em>Below are excerpts from Chapter 2 of <a href="https://www.amazon.com/dp/1548441708">Technical Behavioral Interview: An Insider&#8217;s Guide</a>. Some sections have been cut out and lightly edited for this article. Copyright &#169; 2026 Steve Huynh. Used with permission.</em></p><div><hr></div><p style="text-align: justify;">Technical skills alone don&#8217;t determine your offer. Otherwise, everyone who can solve the coding and system design problems would get the same result.
Instead, companies use behavioral interviews to answer two critical questions: <em>Do you fit with both the role and the company?</em> And if you do fit<em>, at what level will you be most effective?</em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!H27U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b79d6bb-4d5d-44eb-96eb-16014fdee589_1398x918.png" width="1398" height="918" alt=""></figure></div><p>Get both right, and you will receive an offer at the appropriate level. Get the fit wrong, and you&#8217;ll be rejected regardless of your skills. Get the level wrong, and you&#8217;ll be either down-leveled or rejected for being underqualified.</p><p style="text-align: justify;">This chapter explains how companies make their assessments of fit and level by analyzing the signals in your stories. Once you understand these dimensions, you&#8217;ll pick better stories and signal the right level.</p><h2 style="text-align: justify;">Understanding Fit: Role and Company</h2><p style="text-align: justify;">The primary consideration for any tech role is whether you have the technical skills to do the job. Companies will assess this mostly through the technical parts of the interview, for example, coding challenges, system design, or whatever technical evaluation matches your role. If you can&#8217;t demonstrate the core technical capability, nothing else matters.</p><p style="text-align: justify;">But technical skills alone don&#8217;t predict success. Companies learned this the hard way by hiring smart people who couldn&#8217;t work effectively in their environment. That&#8217;s why behavioral interviews focus on two additional types of fit:</p><p style="text-align: justify;"><strong>Role Fit:</strong> Can you handle the specific challenges and working conditions of this position? A backend role at a fast-growing startup requires different capabilities than a backend role at an established enterprise. The technical skills might be similar, but the role demands will be different.</p><p style="text-align: justify;"><strong>Company Fit:</strong> Will you thrive in the environment in which this organization operates? This goes beyond surface-level culture. They are assessing whether your working style, decision-making approach, and values match with how the company gets things done.</p><h2 style="text-align: justify;">How Companies Detect Fit Through Signals</h2><p style="text-align: justify;">Companies can&#8217;t directly ask the question, &#8220;Would you fit here?&#8221; What candidate would torpedo their chance of success by answering with a &#8220;No&#8221;?
Instead, companies look for signals in your stories that indicate alignment or misalignment.</p><p style="text-align: justify;"><strong>Role Fit Signals</strong> emerge from how you describe handling situations similar to what the role requires:</p><ul><li><p style="text-align: justify;">If the role requires working with ambiguous requirements, do your stories show comfort with uncertainty?</p></li><li><p style="text-align: justify;">If the position involves cross-team coordination, do you show an ability to cope with organizational complexity?</p></li><li><p style="text-align: justify;">If the job needs rapid iteration, do your examples show shipping quickly and adjusting based on feedback?</p></li></ul><p style="text-align: justify;"><strong>Company Fit Signals</strong> come from the choices you made and how you describe them:</p><ul><li><p style="text-align: justify;">A company that values &#8220;bias for action&#8221; looks for stories that show you moving quickly despite incomplete information.</p></li><li><p style="text-align: justify;">An organization that prizes &#8220;customer obsession&#8221; wants to hear examples of you going deep to understand user needs.</p></li><li><p style="text-align: justify;">A place that emphasizes &#8220;radical transparency&#8221; seeks stories that show you sharing information openly, even when you&#8217;re uncomfortable.</p></li></ul><p style="text-align: justify;">The same story can send different signals to different companies. You spending three weeks perfecting a solution might demonstrate attention to quality at one company but analysis paralysis at another. Moving fast and fixing issues later demonstrates good judgment at a growth startup but recklessness at an established healthcare company.</p><h3 style="text-align: justify;">Common &#8220;Mis-Fits&#8221;</h3><p style="text-align: justify;">Even a talented candidate will get rejected sometimes if they are not a good fit. The same behaviors that are positive at one company can signal poor fit at another.</p><p style="text-align: justify;"><strong>Independence vs. Collaboration</strong>: This covers both how you work and how you make decisions. Some companies need people who pick up a problem, run with it, and come back with a solution. Others expect you to bring the team along at every step. These often go together: companies that want you to work solo also tend to want you to make calls on your own, and companies that want collaborative work also want group buy-in on decisions.</p><p style="text-align: justify;">If every story you tell involves going off and building something alone, consensus-driven companies will worry you&#8217;ll steamroll people or make choices that won&#8217;t stick. Flip it around: if every story involves checking with the group before you act, companies that prize individual ownership will wonder whether you can make a decision without a meeting.</p><p style="text-align: justify;"><strong>Speed vs. Thoroughness</strong>: Startups often need rapid experimentation, where you ship MVPs and iterate based on feedback, while companies in healthcare or finance require careful validation before any release. This tension also shows up in how teams think about code quality: some organizations will happily spend extra weeks on clean architecture, while others want a working solution on deadline even if the code needs cleanup later. 
Whereas stories about methodical testing might bore a startup, your &#8220;ship it and fix it&#8221; examples could terrify a medical device company.</p><p style="text-align: justify;"><strong>Excellence vs. Pragmatism</strong>: Some organizations value technical excellence and clean architecture above all else. Others need pragmatic solutions that ship on deadline even if imperfect. Focusing on perfect code fails at deadline-driven companies, just as accepting technical debt everywhere fails at companies maintaining critical infrastructure.</p><p style="text-align: justify;"><strong>Innovation vs. Stability</strong>: Some roles require creating new solutions and challenging existing approaches, while others need you to maintain and optimize proven systems. If you say that you&#8217;re constantly reinventing established processes, teams that value stability will not consider you a good fit. Conversely, stories that show you only follow existing patterns will disappoint teams that are looking for creative problem-solving.</p><p style="text-align: justify;"><strong>Direct vs. Diplomatic</strong>: Some cultures prize radical candor and want you to say exactly what you think. Others value maintaining harmony and face-saving communication. If you are too blunt, you will not fit in well at a relationship-focused company. If you are not direct enough, you will not like working at a company that values &#8220;disagree and commit.&#8221;</p><p style="text-align: justify;"><strong>Data vs. Intuition</strong>: Some companies require data to justify every decision (&#8220;data-driven&#8221; cultures), while others trust experienced judgment and move on gut feel. Showing that you make decisions based on instinct does not impress analytical companies, and telling a company that values experienced judgment that you conduct three A/B tests to choose a button color will get you struck off their list.</p><p style="text-align: justify;"><strong>Specialist vs. Generalist</strong>: Large companies often want deep experts who master one domain, while smaller companies need people who are comfortable wearing multiple hats.
Know which sort of company you are walking into.</p><p style="text-align: justify;">Once you understand fit, you can pick stories that match the company and the role.</p><h2 style="text-align: justify;">The Four Dimensions That Determine Your Level</h2><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!ZC_j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17d2f9f7-4e20-440e-bfc8-b158f2668801_1398x918.png" width="1398" height="918" alt=""></figure></div><p>Companies assess your level through four dimensions that appear in every story you tell. Each dimension reveals different aspects of your capability. Together, they show the company where you operate most effectively.</p><h3>Scope (Dimension #1)</h3><p>Scope measures how many people&#8217;s work is affected by your actions: your own team at first and, extending outward as you advance, people well beyond it. The greater the number affected, the higher your level for this dimension.</p><p style="text-align: justify;"><strong>Entry Level:</strong> Your work affects your own productivity and starts to help other team members. For example, you might improve how you handle assigned tasks or fix issues that were slowing down a few teammates.</p><p style="text-align: justify;"><strong>Mid Level:</strong> Your work affects aspects of the team and shapes how it operates. You might redesign a process that changes a significant part of how your team works or solve problems that affect most of the team&#8217;s effectiveness.</p><p style="text-align: justify;"><strong>Senior Level:</strong> Your work directly impacts your entire team and is beginning to influence at least one other team. Perhaps you create solutions that change how your whole team operates and affect workflows in adjacent teams, or you solve problems that require coordination with other groups. You may also start collaborating more closely with product or design partners on your immediate team&#8217;s work.</p><p style="text-align: justify;"><strong>Staff Level:</strong> Your work directly impacts at least two teams and is beginning to have an influence on the broader division or organization. Examples of this include developing technical strategies that change how multiple teams make decisions and solving problems that require buy-in across several parts of engineering. Your influence extends beyond engineering into product, design, and program management as you shape solutions that affect how cross-functional partners work.</p><p style="text-align: justify;"><strong>Principal Level:</strong> Your work affects many teams or changes how large parts of the organization operate. Perhaps you have created technical strategies that have influenced how dozens of teams make decisions.
Or you have solved problems that cut across a large engineering organization. At this level, your influence regularly extends into business strategy, shaping decisions alongside product, design, program, and business leadership.</p><h3 style="text-align: justify;">Contribution (Dimension #2)</h3><p>Contribution captures what you did, not what happened around you. It is important to be precise about the line between &#8220;I&#8221; and &#8220;we.&#8221; Companies will expect to see evidence of increasing leadership and ownership as you advance in your career.</p><p><strong>Entry Level:</strong> You execute assigned work and are beginning to take ownership of small pieces. Examples: implementing solutions designed by others; fixing bugs in existing systems; taking full responsibility for well-defined features within larger projects.</p><p style="text-align: justify;"><strong>Mid Level:</strong> You own complete solutions from problem to implementation while also guiding others. Perhaps you have identified issues, designed the approaches, implemented them, verified that they work, and helped your teammates understand the reasons for your decisions.</p><p style="text-align: justify;"><strong>Senior Level:</strong> You lead initiatives requiring coordination. You&#8217;re expected to make progress even when the requirements are unclear or the path forward is uncertain. Examples of this include driving technical decisions for your team; mentoring others through complex problems; architecting solutions to be implemented by others; and ensuring quality work outcomes for many people.</p><p style="text-align: justify;"><strong>Staff Level:</strong> You lead cross-team initiatives and establish technical direction, often in situations where the right approach isn&#8217;t obvious and stakeholders have competing priorities. This could look like defining technical approaches that are adopted by multiple teams, creating systems that enable other teams to solve problems on their own, or driving agreement on complex technical decisions across several teams.</p><p style="text-align: justify;"><strong>Principal Level:</strong> You create organizational capabilities and establish new ways of working. At this level, you&#8217;re frequently operating in highly ambiguous environments where you must define the problem before you can solve it. You might define technical standards that guide dozens of teams, build systems that enable others to solve entire classes of problems, or transform how the organization approaches its hardest challenges.</p><h3 style="text-align: justify;">Impact (Dimension #3)</h3><p>Impact shows what changed for the better as a result of your work. Companies want to see that your work produced results worth the investment. Strong stories put numbers on the impact and connect technical wins to business or user outcomes.</p><p style="text-align: justify;"><strong>Entry Level:</strong> You improve your personal productivity and are starting to help the team work better. Examples include reducing the time you spend on repetitive tasks, fixing issues that were slowing down teammates, or improving the quality of code in the areas you touch. Even simple measures matter at this level: time saved or bugs prevented.</p><p style="text-align: justify;"><strong>Mid Level:</strong> You improve team effectiveness in specific areas and influence team-wide practices.
Perhaps you reduced deployment times for specific workflows, eliminated categories of bugs in your domain, or created tools that made the team more productive in particular areas. You can quantify these improvements and connect them to broader outcomes like feature velocity or reliability.</p><p style="text-align: justify;"><strong>Senior Level:</strong> You transform how your entire team works and are starting to have an impact beyond your team. For example, you might have introduced new workflows that changed your team&#8217;s capabilities, eliminated major sources of operational problems, or created improvements that adjacent teams adopted. Your impact extends beyond just engineering metrics to product outcomes, user experience, or operational costs.</p><p style="text-align: justify;"><strong>Staff Level</strong>: You improve how multiple teams operate and drive organizational improvements. This sort of impact comes from achievements such as establishing practices that several teams adopt, solving infrastructure problems that were impeding multiple teams, or creating new capabilities that open up new types of work across teams. Your measurable impact can be tied to business metrics like revenue, customer retention, or time-to-market.</p><p style="text-align: justify;"><strong>Principal Level</strong>: You create organizational capabilities and drive strategic changes. Impact at this level could come from establishing technical foundations that dozens of teams use to build upon, solving problems that were blocking major business initiatives, or creating leverage that compounds benefits across the company. Your impact is measured in business outcomes and strategic capability, not just technical improvements.</p><h3 style="text-align: justify;">Difficulty (Dimension #4)</h3><p>Difficulty reflects the complexity of problems you&#8217;ve tackled, the constraints you have faced, and the trade-offs you have managed. Under this dimension, solving easy problems with big impacts is less impressive than hard problems solved well.</p><p style="text-align: justify;"><strong>Entry Level:</strong> You work on straightforward problems within established patterns. For example, you might face challenges learning new technologies or debugging unfamiliar code, but the path forward becomes clearer once you understand the problem or ask for help.</p><p style="text-align: justify;"><strong>Mid Level:</strong> You work through challenges and obstacles. The problems you tackle have more moving parts and less obvious solutions. These could include competing requirements or technical complexity you haven&#8217;t seen before. Or perhaps you have had to manage dependencies within your team that affected your timeline, or figure out solutions when the approach wasn&#8217;t immediately obvious.</p><p style="text-align: justify;"><strong>Senior Level:</strong> You manage constraints and make technical decisions with team-level architectural implications. The problems you solve involve multiple interacting systems and competing concerns. You might have to balance needs across multiple stakeholders with different priorities.
Maybe you make architectural decisions that affect how your whole team works, or you have to work around technical limitations that require creative solutions, or solve problems that require you to address both technical and business factors.</p><p style="text-align: justify;"><strong>Staff Level: </strong>You manage competing trade-offs across multiple teams while handling problems with significant technical and organizational complexity. Examples of difficulty at staff level include:</p><ul><li><p style="text-align: justify;">Balancing different technical approaches when teams have genuinely conflicting needs.</p></li><li><p style="text-align: justify;">Creating solutions that affect how several teams work together.</p></li><li><p style="text-align: justify;">Making architectural decisions that have to work across diverse contexts.</p></li><li><p style="text-align: justify;">Getting teams to agree when the technically optimal solution differs for each team.</p></li></ul><p style="text-align: justify;"><strong>Principal Level:</strong> You handle fundamental trade-offs between competing organizational needs or solve problems where no clear solution exists. The complexity at this level often involves novel problems that lack established patterns or precedents. You might balance technical excellence against delivery speed at organizational scale; work within organizational constraints while maintaining technical integrity; create approaches for entire classes of problems the company hasn&#8217;t solved before; or make decisions that affect company strategy and require executive buy-in.</p><h3 style="text-align: justify;">What Each Level Looks Like</h3><p style="text-align: justify;">Here&#8217;s how the same types of accomplishments look across each level. These aren&#8217;t templates. They&#8217;re meant to help you develop a sense for the difference between a mid-level story and a senior one. 
Compare adjacent levels and notice what actually changes as you move up and down.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!KY-A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac2b115d-cc3e-4f34-8143-a62b7b1b3eb1_1398x897.png" width="1398" height="897" alt=""></figure></div><h2 style="text-align: justify;">Researching What Companies Really Value</h2><p style="text-align: justify;">You&#8217;ll never have perfect information about what a specific company values, but a little focused research will often reveal surprising insights that most other candidates will miss. The difference between having even partial intelligence and going in blind can determine whether you emphasize the right things in your stories.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!jzl-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7366e636-8732-48d6-bccc-7e508c635992_1398x918.png" width="1398" height="918" alt=""></figure></div><h3 style="text-align: justify;">Start With Your Recruiter</h3><p style="text-align: justify;">Most candidates treat recruiters as gatekeepers to avoid, but if you do this, you will waste your best source of insider information. Recruiters want you to succeed, because their performance is measured by how many of the candidates they put forward end up accepting offers.
They have prep materials, they know the interviewers&#8217; focus areas, and they understand what interviewers are looking for.</p><p style="text-align: justify;">Ask your recruiter directly: &#8220;What should I know about this company&#8217;s current challenges?&#8221; Or &#8220;What competencies matter most for this role?&#8221; Or &#8220;Can you share any interview prep materials?&#8221; Many recruiters have documents about interview format, team priorities, or even the specific behavioral competencies they evaluate. Questions used as examples in the prep materials are very likely to come up in the actual interviews.</p><h3 style="text-align: justify;">Mine Publicly Available Information</h3><p style="text-align: justify;">When companies repeat certain words across their job postings, they&#8217;re telling you what matters. For example, a job posting that mentions &#8220;fast-paced&#8221; several times signals something different from one emphasizing compliance. Those words are there for a reason.</p><p style="text-align: justify;"><strong>Where to dig:</strong></p><ul><li><p style="text-align: justify;"><strong>Engineering blogs:</strong> How do they describe their wins? What problems do they celebrate solving?</p></li><li><p style="text-align: justify;"><strong>Tech talks and conferences:</strong> What topics do their engineers present? Speed of delivery? Scale? Innovation?</p></li><li><p style="text-align: justify;"><strong>Open source contributions:</strong> What they choose to open source reveals their priorities. If they open source developer tools, this suggests they value community. If they are happy to make internal tools public, this shows transparency.</p></li><li><p style="text-align: justify;"><strong>Technical documentation:</strong> The existence of public API docs or technical guides (and the quality thereof) shows how they support both users and their own teams.</p></li><li><p style="text-align: justify;"><strong>Status pages and postmortems:</strong> Companies that publish detailed postmortems demonstrate that they value learning from failure. A company that shares its incident response processes likely has a strong operational culture.</p></li></ul><p style="text-align: justify;">Even companies without engineering blogs will leave traces. Product release patterns tell you about their development pace. Technology choices show their priorities: newer frameworks suggest a focus on innovation, whereas relying on proven technologies indicates they prefer stability.</p><h3 style="text-align: justify;">Look for Patterns in Discussions</h3><p style="text-align: justify;">Glassdoor, Blind, and Reddit contain gold buried amongst rubble. Ignore the rubble (e.g., individual rants). Instead, look for patterns across multiple posts. If five different people mention &#8220;lots of process&#8221; or &#8220;no work-life balance&#8221; or &#8220;amazing learning culture,&#8221; that&#8217;s a pattern you will want to know about.</p><p style="text-align: justify;">Pay attention to what people complain about and what they praise. Complaints about &#8220;too many meetings&#8221; may suggest the company has a collaborative, consensus-driven culture, or, alternatively, that an excessive number of meetings is inhibiting productivity. Praise for &#8220;autonomy&#8221; indicates they trust their people to make decisions without checking in.
Both types of comments reveal what behaviors the company will reward.</p><h3 style="text-align: justify;">Talk to Current Employees</h3><p style="text-align: justify;">If you know someone at the company, ask them directly what behaviors get rewarded and, conversely, what behaviors will cause people to struggle. Skip surface-level queries about culture, and ask specific questions:</p><ul><li><p style="text-align: justify;">&#8220;When someone gets promoted here, what do they do to earn it?&#8221;</p></li><li><p style="text-align: justify;">&#8220;What behaviors get negative feedback?&#8221;</p></li><li><p style="text-align: justify;">&#8220;How does the team make decisions when there&#8217;s disagreement?&#8221;</p></li><li><p style="text-align: justify;">&#8220;What surprised you most about working here?&#8221;</p></li></ul><p style="text-align: justify;">Current employees will tell you truths the company website never would. Perhaps they&#8217;ll tell you that at their company, &#8220;customer obsession&#8221; really means checking usage data before writing code, or that &#8220;ownership&#8221; means being available to resolve production issues at two o&#8217;clock in the morning.</p><h3 style="text-align: justify;">What You&#8217;re Really Looking For</h3><p style="text-align: justify;">All this research serves one purpose: understanding what stories will resonate at your interview. Think of it as finding the real intersection between your experience and what they care about.</p><p style="text-align: justify;">If research reveals they prize speed over perfection, then emphasize stories that show how you shipped quickly and iterated. If they value technical depth, highlight examples of diving deep to understand root causes. If they care about collaboration, make sure your stories focus on cross-team work rather than solo accomplishments.</p><p style="text-align: justify;">The research will also help you decide whether this company is the right place for you. If everything you learn suggests they value the kinds of behaviors you don&#8217;t naturally demonstrate or don&#8217;t want to develop, then perhaps you don&#8217;t need to pursue that particular role.</p><h3 style="text-align: justify;">Putting It All Together</h3><p style="text-align: justify;">Companies aren&#8217;t just evaluating whether you can do the job. They&#8217;re also assessing whether you&#8217;ll thrive in their specific environment and at what level you&#8217;ll be most effective. These two dimensions determine not just whether you will get an offer, but also whether that offer will position you for success.</p><p style="text-align: justify;">Understanding fit helps you know which of your experiences will connect most with what the company values. This small company needs someone who ships fast and figures things out alone. That enterprise needs someone who navigates processes and builds consensus. Neither is inherently better than the other. They&#8217;re simply different environments that reward different approaches.</p><p style="text-align: justify;">Understanding levels helps you position your stories appropriately. The same project can demonstrate entry-level execution, mid-level ownership, or senior-level leadership depending on your actual contribution and how you frame it. Get this wrong and you will either get rejected for overreaching or down-leveled for not properly communicating your capabilities.</p><p style="text-align: justify;">The payoff is immediate.
You&#8217;ll pick better stories, focus on the right details, and make it easier for interviewers to see what you can do. You&#8217;ll make better decisions about which roles actually match who you are and what you want to do. The goal isn&#8217;t to get <em>any</em> offer. The goal is to get the <em>right</em> offer at the <em>right</em> level at the <em>right</em> company to ensure your success.</p><h2 style="text-align: justify;">Takeaways</h2><p><em>Gergely, again. </em>Thanks to Steve for sharing both his learnings as an interviewer and nearly a full chapter from his book. The book goes a lot deeper than the above sample chapter. A few parts I found helpful:</p><ul><li><p>High-signal storytelling (Chapter 3): a framework for explaining your work in a way that &#8220;sticks&#8221; with the interviewer</p></li><li><p>9 competencies with many examples and stories throughout the book: ones like &#8220;delivery&#8221; (Chapter 6), &#8220;earning trust and dealing with conflict&#8221; (Chapter 8) and &#8220;strategic leadership and thinking big&#8221; (Chapter 13)</p></li><li><p>Examples of what interviewers typically see as key signals, yellow flags and red flags</p></li></ul><p>If you would like a fresh resource to prepare for behavioural interviews at tech companies, the full book offers far more explanations, tactics and exercises:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.amazon.com/dp/1548441708&quot;,&quot;text&quot;:&quot;Get the full book on Amazon&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.amazon.com/dp/1548441708"><span>Get the full book on Amazon</span></a></p><p><em>Steve also writes a newsletter titled <a href="https://alifeengineered.substack.com/about">A Life Engineered</a>: you can sign up to it <a href="https://alifeengineered.substack.com/about">here</a>.</em></p><p><strong>It&#8217;s helpful to understand how and why companies hire, and what they look for. </strong>To us engineers, hiring processes often look illogical from the outside. We&#8217;ll ask things like:</p><ul><li><p>&#8220;Why does the interview process not resemble day-to-day work?&#8221;</p></li><li><p>&#8220;I already have open source code I wrote: why does the company need to do a coding interview to confirm what is clear: that I can code?&#8221;</p></li><li><p>&#8220;Why did I get a rejection, even though I did well on all of the interviews?&#8221;</p></li></ul><p>It feels to me that there are similarities between hiring and dating: both parties show up with goals and expectations in their head, which are often not communicated. Sometimes there&#8217;s a match; sometimes there is not. This phase of a relationship is often about &#8220;selling:&#8221; as a candidate on the job market, it&#8217;s about selling yourself, and convincing the company that you would be a fit for what they are looking for.</p><p><strong>Doing your research on the company is underrated, and not all that many candidates do so, in my observation. </strong>When I was a hiring manager at Uber, roughly half of the people who got on the call with me did not do <em>any</em> research about the company, and perhaps 1 out of 10 candidates did any research on the team they interviewed for &#8211; when we had public blog posts about our work, on the company blog!
So, candidates who showed up prepared stood out in the &#8220;motivation&#8221; dimension from the get-go.</p><p><strong>It all starts with being able to pass the &#8220;technical&#8221; interviews &#8211; but it&#8217;s a mistake to sleep on the &#8220;behavioural parts.&#8221; </strong>To state the obvious: candidates who do not do well on the technical interview rounds will not get offers. But I&#8217;ve personally had to say &#8220;no&#8221; to several candidates who did great on the technical side of things but turned out to be misaligned with what we were looking for, as confirmed in the behavioral rounds.</p><p>And I do believe you can get better at these behavioral rounds: start by researching what the company&#8217;s culture is like, practicing how to present yourself, and putting yourself in the interviewers&#8217; shoes to understand what they are looking for.</p><p>I know plenty of software engineers who refuse to do any preparation for interviews, saying &#8220;if the company doesn&#8217;t want me as I am, they don&#8217;t deserve me anyway.&#8221; This is a valid strategy, and can work for highly in-demand professionals, the same way as showing up to a first date in sweatpants and slippers can still work out for highly attractive and desirable people. For the rest of us, who are not as in-demand for the positions we apply for, it&#8217;s probably worth putting in the additional effort, in hopes of better outcomes during interviews.</p>]]></content:encoded></item><item><title><![CDATA[The Pulse: ‘Tokenmaxxing’ as a weird new trend]]></title><description><![CDATA[&#8230; which will probably be the shortest-lived trend because it&#8217;s so wasteful. Also: coding AI agent subsidies could be ending, Cal.com going closed source and blaming it on AI, and more.]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 16 Apr 2026 16:47:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ts7y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f3ad311-8920-43d7-b127-df05eae6c00c_1286x1050.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Today, we cover:</p><ol><li><p><strong>Tokenmaxxing: weird new trend.</strong> At Meta, Microsoft, Salesforce and other large companies, devs are purposefully burning tokens (and money!) to inflate their AI usage and hit AI usage metrics which they treat as targets.</p></li><li><p><strong>Are coding AI-agent subsidies doomed? </strong>At the same time as Anthropic stopped subsidizing enterprise plans, Uber managed to burn through its entire 2026 AI token budget in just 3 months. I expect per-engineer AI budgets to be rolled out across more companies soon.</p></li><li><p><strong>Industry Pulse. </strong>The myth of Claude Mythos, Claude&#8217;s degradation, Cal.com going closed source due to AI threat, Vercel open sources its &#8220;agent factories&#8221; tool, sensible AI usage guidelines in the Linux kernel, and more.</p></li><li><p><strong>Cal.com goes closed source &#8211; but is it really because of AI?
</strong>The open source Calendly alternative moved a good part of its code to a closed repo, citing AI and security concerns. But perhaps this was just a business model change that would have happened, AI or not.</p></li></ol><h2>1. Tokenmaxxing: weird new trend</h2>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-tokenmaxxing-as-a-weird">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The impact of AI on software engineers in 2026: key trends]]></title><description><![CDATA[Our AI tooling survey finds concerns about mounting AI costs, more engineers hitting usage limits, and AI tools having uneven effects upon different types of engineers]]></description><link>https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 14 Apr 2026 16:01:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ekej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174b9ab2-ef0b-4a40-ba4a-0de330620ff0_1324x758.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, we <a href="https://newsletter.pragmaticengineer.com/p/ai-tooling-2026">ran a survey</a> asking readers of The Pragmatic Engineer how you use AI tools, which tools you use, what does and doesn&#8217;t work, and what it&#8217;s like working with AI, in general.</p><p>For today&#8217;s issue, we&#8217;ve dug into your 900+ responses to look for trends in AI tool usage among software engineers and engineering leaders. This article surfaces insights that are less about specific tools, and more about the effect these tools have on tech professionals. We cover:</p><ol><li><p><strong>Costs. </strong>Unsurprisingly, companies pay for most tool usage, and those responsible for budgets are increasingly nervous that AI-related costs are headed only one way: up.</p></li><li><p><strong>Usage limits. </strong>Around 30% of respondents say they have hit limits. Switching tools, upgrading plans, or moving over to API pricing are common responses.</p></li><li><p><strong>Impact on &#8220;Builders.&#8221; </strong>Folks who make larger code changes and do &#8220;quality-of-life&#8221; work are builders, and they&#8217;re also dealing with more AI slop. Some also grapple with a loss of professional identity.</p></li><li><p><strong>AI tools speed up &#8220;Shippers.&#8221; </strong>Engineers who focus more on getting things done are the most positive about AI tools. But they also add tech debt faster and might build the wrong things.</p></li><li><p><strong>&#8220;Coasters:&#8221; learning faster while generating AI slop. </strong>Less adept engineers can uplevel faster with AI, but they generate a lot of &#8220;AI slop&#8221; while doing so, which frustrates builders.</p></li><li><p><strong>Changing software engineer &amp; engineering manager (EM) roles.</strong> Engineers have to orchestrate and context switch more often, while engineering managers can be more hands on. It&#8217;s interesting to see the engineer and manager roles becoming more similar.</p></li><li><p><strong>Other impacts on the craft. </strong>We&#8217;re going from &#8220;how&#8221; to build to &#8220;what&#8221; to build, solo devs are seeing improved results, workloads are increasing with AI tools, and more.</p></li></ol><p>We previously published a detailed summary of the survey which focused on <a href="https://newsletter.pragmaticengineer.com/p/ai-tooling-2026">AI tooling for software engineers</a>, covering the most-used AI tools, trends, AI agent usage, company size and usage, and tools that engineers love.</p><h2>1. 
Costs</h2><p>Concern about the cost of AI tools is a trend throughout the survey, with around 15% of respondents mentioning it in some way.</p><p><strong>Tech companies foot the bill for the majority of spending on AI tools</strong>. More respondents say their employers pay for AI coding tools than say they pay for them personally, and predictably, employers fund more expensive packages than individuals buy for themselves.</p><p>Companies commonly pay for &#8220;max&#8221; plans with the likes of Claude Code, Cursor, and Codex (around $100-200/month per engineer), although some companies&#8217; budgets only stretch to $20/month per engineer &#8211; around the price point of GitHub Copilot, and the cheapest Claude or ChatGPT subscriptions.</p><p>The most-mentioned AI tool spending patterns:</p><ul><li><p><strong>When companies pay: </strong>~$200/month plans. Many have enterprise subscriptions, sometimes with subsidies and vendor lock-in. Some companies allow usage-based coverage on top of monthly plans.</p></li><li><p><strong>When personally paying for tools: </strong>~$20/month or free tiers. This can stack up across different tools. Around 5% of respondents have separate work and personal subscriptions, and free tier usage is widespread for personal use.</p></li></ul><p>For now, companies seem to be in the experimentation phase with AI tools, and several respondents say they believe their companies have unsustainable AI-tooling budgets. This is likely because businesses are still figuring out the best way of leveraging the tools, and the message to engineers at such places is not to worry about price and usage while that unfolds. A CTO at a small, US-based company shares:</p><blockquote><p>&#8220;Right now, we&#8217;re not sweating the costs because we&#8217;re trying to evolve best practices for the tools, but that has resulted in some devs really blowing through budget, so we may start instituting caps on spending.&#8221;</p></blockquote><h3>Breaking the budget</h3><p>At small and mid-sized companies, leadership teams seem more comfortable with going over budget than with engineers running out of budget. There are more accounts from C-level folks and founders about racking up large bills than there are from engineers. A CPTO (Chief Product and Technology Officer) at a mid-sized company:</p><blockquote><p>&#8220;I ran up several monthly bills of $600 with Cursor. We have the dev team subscribed to ~$100/month plans. We&#8217;re now in the process of moving the rest of the team to Claude Code, as we can get more resources for around $100/month in cost.&#8221;</p></blockquote><p><strong>Top spenders can be allocated higher budgets.</strong> A number of tech businesses have separate, larger budgets for their heaviest AI users. A senior C++ engineer working in the video game industry says:</p><blockquote><p>&#8220;I&#8217;ve become my team&#8217;s AI champion. In theory, my limits are higher than normal, but I keep myself limited to what others can use, so I can show them useful things they can do.&#8221;</p></blockquote><p><strong>UK and EU companies worry more about budgets than US-based ones. </strong>Most responses that mention finance teams pushing back against spending even $30-50/month per engineer on AI tools come from the UK and EU. 
One amusing example is a 10-person, seed-stage startup where the CEO questioned why they were paying as much as &#163;25/month per engineer for one of the cheapest AI tools around.</p><p>In general, it feels like European companies want to see clear value-add in order to justify an increase in tooling spend, whereas US companies are more comfortable with investing first and measuring impact later. At present, the impact of these tools is hard to quantify.</p><p><strong>A niche approach is AI teams educating devs to use cheaper models. </strong>Some European companies go as far as offering training on model selection to new joiners. From an AI Enablement Lead at a 1,000+ person, digital transformation company:</p><blockquote><p>&#8220;Within our organisation, we&#8217;ve had incidents where our Claude users have overshot their limits. We&#8217;re now attempting to educate devs in knowing the difference between different models (knowing when to use Claude Sonnet versus Claude Opus).&#8221;</p></blockquote><h3>Cost trajectory worries</h3><p>The cost trajectory of AI tools is generally considered unsustainable by survey respondents. Devs using the tools heavily tend to hit usage limits, and their employers then have to pay more. At places with API-based pricing, usage is increasing. Those in leadership positions who are responsible for budgets are generally concerned about the direction of costs.</p><p><strong>Subsidies are keeping costs at bay &#8211; for now. </strong>A common pattern in our survey is heavily-subsidized enterprise plans that come with vendor lock-in. Several responses raise concerns about what will happen when the subsidies run dry. Experienced engineering leaders recall that cloud providers played the same game of subsidizing for a few years, then raising prices once a customer was fully &#8220;locked in.&#8221;</p><p><strong>The AI hype cycle is dampening awkward conversations about budgets at some places. </strong>A principal engineer at a fintech tells us:</p><blockquote><p>&#8220;The AI hype has created a special, generous budget for AI tools, and there&#8217;s no effective budget &#8211; yet!&#8221;</p></blockquote><p><strong>But some finance teams are getting grumpy. </strong>A CTO at a sports-tech company says:</p><blockquote><p>&#8220;It&#8217;s hard to keep our CFO supportive about investing in these tools because the productivity benefits have proven difficult to conclusively prove. The point that resonated the most was the loss of value when people hit daily limits: having to stop work immediately! Surprisingly, our CFO is still pushing back, despite having experience of getting a lot of value through their own AI usage with their spreadsheets.&#8221;</p></blockquote><p><strong>Most survey respondents think the price of AI tools will have to rise</strong>. If that happens, it would cause problems at several companies &#8211; particularly those in Europe:</p><blockquote><p>&#8220;I cannot see how the spend on AI tools is fiscally sustainable in its current form; Max 100 with Claude Code is $100 a month. A single small task powered by Kimi K2.5 using OpenCode is $5, mostly in input cost. If we assume that the third party inference providers are doing so at a sustainable price, the much more expensive Opus model cannot be sustainable, never mind profitable at these plan costs.&#8221; &#8212;<em> Founder at a seed-stage company, Europe.</em></p><p>&#8220;From the economic perspective, at some point, these companies will need more funding or profit, I&#8217;m curious how much it costs them to have a proper agent, and still become profitable. It feels slow when you run out of credits when working on repetitive tasks.&#8221; &#8212; <em>Principal Software Engineer at a seed-stage company, Europe</em></p></blockquote>
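<p>To make that founder&#8217;s arithmetic concrete, here is a minimal, illustrative Python sketch. The dollar figures are the round numbers from the quote above, not measured data, and real per-task costs vary widely:</p><pre><code># Break-even between a flat monthly plan and per-task API pricing.
# Assumed round numbers from the quote above: a $100/month plan,
# and roughly $5 of API spend per small agent task.
PLAN_COST_PER_MONTH = 100.0  # e.g. a "Max 100"-style subscription
API_COST_PER_TASK = 5.0      # one small task at API prices

break_even = PLAN_COST_PER_MONTH / API_COST_PER_TASK
print(f"Plan matches API spend at {break_even:.0f} tasks/month")
# -&gt; 20 tasks/month, i.e. less than one per working day. A heavy user
# running several agent tasks a day consumes far more inference than
# the plan price covers, which is why respondents suspect the plans
# are subsidized.</code></pre>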
<h2>2. Usage limits</h2><p>Another major trend in our survey results is the topic of usage limits:</p><ul><li><p><strong>Hitting limits: ~30%</strong> of respondents. Running out of tokens or hitting reset limits is frustrating and disruptive, especially when you&#8217;re working on a task or are in a flow state. The majority of respondents who complain about hitting limits are on cheaper plans (typically $20/month). But the issue also comes up on higher subscription tiers.</p></li><li><p><strong>Under the limit: ~20%</strong> of respondents. Avoiding usage caps generally correlates with being on a more expensive plan with higher limits, being in a role with enough non-coding work for caps not to matter, or doing enough work &#8220;manually&#8221; for AI usage not to be an issue.</p></li></ul><h3>Why users of AI tools hit limits</h3><p>Common reasons cited in the survey:</p><p><strong>Being a new AI user or a power user. </strong>These are two distinct groups, but an engineering manager at a mid-sized company in Canada says that each blows through token limits in its own way:</p><blockquote><p>&#8220;We&#8217;re mindful of trying to manage costs by setting AI spend limits across the org. We have two subsets of users at odds with each other:</p><ol><li><p>Individuals who are still learning and blow through their credits at an inordinate rate, forcing us to keep limits low.</p></li><li><p>Power users who hit the limit through regular use and apply pressure to raise the limit.</p></li></ol><p>It&#8217;s a tough balance.&#8221;</p></blockquote><p><strong>Using Opus for all work. </strong>A few engineers mention being careful about how they use Opus because it previously ate up their token budgets. Here&#8217;s a software engineer at a mid-sized company in Europe:</p><blockquote><p>&#8220;I made the mistake of using Opus in the past and burning through budgets quickly. Now, my routine is to start in &#8216;plan&#8217; mode with Opus. I then paste the acceptance criteria and description of the issue and let the plan mode figure it out. I then switch to Composer or Sonnet and have the agent take over from there.&#8221;</p></blockquote><p><strong>Mistakes that eat up tokens are easy to make. </strong>These include starting on a problem from the wrong end, using AI directly for a task rather than opting for a simple script, and trying some new tool or technique that ends up consuming tokens (OpenClaw and Ralph Loops are cited), among others.</p><h3>What happens when the limit is hit?</h3><p>Hitting the limit with an AI tool is inconvenient and happens to many developers, who take a variety of next steps:</p><p><strong>Switch the model or tool. </strong>Around a quarter of respondents who hit limits mentioned switching; a rough sketch of what an automated fallback could look like follows below.</p>
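<p>For illustration only: a small wrapper script could try a primary CLI agent and fall back to a second one when the first fails, for example on a usage limit. The command names, flags, and exit-code behavior below are assumptions for the sketch &#8211; check how your own tools actually signal limits before relying on anything like this:</p><pre><code>import subprocess

# Hypothetical fallback chain: try each CLI agent in order until one
# succeeds. The commands are illustrative stand-ins; real tools differ
# in how they report that a usage limit was hit.
AGENTS = [
    ["claude", "-p"],   # primary agent, non-interactive mode
    ["codex", "exec"],  # fallback agent
]

def run_prompt(prompt: str) -&gt; str:
    for agent in AGENTS:
        result = subprocess.run(agent + [prompt],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        # Non-zero exit (e.g. limit reached): move on to the next tool.
    raise RuntimeError("all agents failed or hit their usage limits")

if __name__ == "__main__":
    print(run_prompt("Summarize the failing tests in ./reports"))</code></pre>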
<p>From a software engineer working at Atlassian:</p><blockquote><p>&#8220;In my company, for Cursor and Windsurf we have monthly limits. Our internal coding tool (called codelassian) also has daily prompt and hourly token limits. When I hit a limit in one tool, I switch to the other.&#8221;</p></blockquote><p><strong>Upgrade to a pricier plan. </strong>When it&#8217;s an option, this is a no-brainer at most places, especially as the alternative would be devs twiddling their thumbs waiting for the limit to be reset. A senior engineering manager at a mid-sized company says:</p><blockquote><p>&#8220;In my team, we are regularly hitting session limits with Claude. We upgraded some teammates to the Max 20x plan &#8211; and on this plan we have not been hitting limits, so far.&#8221;</p></blockquote><p><strong>Adopt API-based pricing. </strong>This is the easiest way to keep working without abandoning a task you&#8217;re knee-deep in. A senior engineer at a large company says:</p><blockquote><p>&#8220;The company provides both the Claude and Copilot corporate offerings. When the limits are reached, I tend to use API keys that my teammates give me.&#8221;</p></blockquote><h2>3. Impact on &#8220;builders&#8221;</h2><p>We identified three different types of professional in the survey:</p><ul><li><p><strong>Builders</strong>: those who care about quality, good architecture, and good coding practices, and who talk about the craft of software engineering.</p></li><li><p><strong>Shippers</strong>: those who primarily focus on outcomes for a product, features, testing, and experimenting with users. A fair number of leaders, managers, and engineers who were more hands-off with coding before AI tools are in this category, as are product engineers.</p></li><li><p><strong>Coasters</strong>: engineers who are not considered particularly good or great, but who get the work done. They often do this without much taste or concern for quality, and seem to be mostly coasting along and doing what they&#8217;re told.</p></li></ul><p>The overall consensus in our survey results is that AI will amplify and multiply tendencies and patterns that existed before, and the impact of the tools varies accordingly among users. 
Let&#8217;s start with the impact we&#8217;ve observed upon builders in the responses:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Ekej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F174b9ab2-ef0b-4a40-ba4a-0de330620ff0_1324x758.png" width="1324" height="758" alt=""><figcaption class="image-caption"><em>The good and bad of AI tools, as shared by respondents in the Builder archetype</em></figcaption></figure></div><p>Builders say they get value from AI tools in the following areas:</p><p><strong>Larger code changes. </strong>Builders generally find AI helpful for work like:</p><ul><li><p>Refactoring</p></li><li><p>Migrations</p></li><li><p>Improving test coverage</p></li><li><p>Carrying out large codebase changes</p></li></ul><p>All of these are laborious changes that aren&#8217;t technically very challenging, but they do require the experience to know what you want to do and how to do it.</p><p><strong>Accomplishing &#8220;quality of life&#8221; tasks. </strong>Builders mention that with AI tools, they get to fix and improve things like nagging bugs that otherwise wouldn&#8217;t be &#8220;worth&#8221; the time invested; AI lowers that barrier to entry.</p><p>A good example of this is in last week&#8217;s podcast <a href="https://newsletter.pragmaticengineer.com/p/dhhs-new-way-of-writing-code">with David Heinemeier Hansson (DHH)</a>, the creator of Ruby on Rails, in which he revealed how one of their engineers optimized P1 &#8211; the fastest 1% of web requests:</p><blockquote><p>&#8220;One of our most agent-accelerated people asked: &#8220;What about P1? What about the floor? Can we fix the floor?&#8221; He found that the floor [of request speed] was 4 milliseconds.</p><p>Well, 4 milliseconds can add up if you have a bunch of fast requests. So, he just said: &#8220;We&#8217;re going to optimize P1. The fastest 1% of requests, we&#8217;re going to make them even faster.&#8221; He took it from 4 milliseconds to less than half a millisecond. He did this P1 project over a couple of days, like a side project.</p><p>He had an intuition that there was something here. He let agents run with it. The work ended up being 12 pull requests, and about 2,500 lines of code changed.</p><p>This is exactly why the explosion of the pie suddenly lets us look at problems we would never have contemplated looking at before.&#8221;</p></blockquote><p><strong>Typing is no longer a bottleneck. </strong>Some builders report falling even more in love with coding with the help of AI and agents, since physically typing out code is no longer a bottleneck for them. They enjoy being able to prompt. 
From one &#8220;builder&#8221;:</p><blockquote><p>&#8220;For someone who loves to build &#8211; but also values code quality, performance, reliability, and security &#8211; I ship a lot more quality code faster, if for no other reason than because the AI can read and write 100x faster than me. I get to stay at the conceptual level of shipping a product, and I can dive into debugging with the agent as needed. But if the agent has a good handle on the situation, I can give it as much of the tedious parts as I wish.&#8221;<em> &#8211; Staff Engineer at a large tech company, US</em></p></blockquote><p>The negative sides of AI tools, as experienced by builders:</p><ul><li><p><strong>More AI slop. </strong>Builders seem to be the most overwhelmed and derailed by reviewing a lot more AI-generated code. They can get frustrated with low-quality code shipped by colleagues which could be categorized as &#8220;AI slop.&#8221;</p></li><li><p><strong>More debugging. </strong>AI-generated code introduces bugs and issues, and builders tend to spend the most time debugging and fixing those issues.</p></li><li><p><strong>Identity loss. </strong>Some builder-types report a sense of identity loss and even some grief. Much of this relates to no longer doing hands-on coding because they cannot justify it, since AI agents generate pretty decent code faster than someone can type it.</p></li></ul><h2>4. AI tools speed up &#8220;Shippers&#8221;</h2><p>Engineers fitting the &#8220;shipper&#8221; archetype thrive on getting things to production quickly. This group is by far the most enthusiastic about AI tools in survey responses. They are also the ones who praise &#8211; or hype up &#8211; the tools because of their personal experiences of shipping much faster with them.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!yBgU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66546d37-a053-4179-8750-3e7784309ac8_1278x742.png" width="1278" height="742" alt=""><figcaption class="image-caption"><em>Good and bad things about AI tools for shippers</em></figcaption></figure></div><p>The biggest upsides mentioned by shippers:</p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-impact-of-ai-on-software-engineers-2026">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[DHH’s new way of writing code]]></title><description><![CDATA[David Heinemeier Hansson shares why he shifted to an agent-first AI workflow, and what it means for how software is built and who builds it.]]></description><link>https://newsletter.pragmaticengineer.com/p/dhhs-new-way-of-writing-code</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/dhhs-new-way-of-writing-code</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Wed, 08 Apr 2026 17:16:28 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193375117/e43a05202591d6438766f475d8a031ba.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Stream the latest episode</h3><p><strong>Listen and watch now on <a href="https://youtu.be/JiWgKRgdgpI">YouTube</a>, <a href="https://open.spotify.com/episode/3N2FJc9kPkYvK0m2S4ubop">Spotify</a>, and <a href="https://podcasts.apple.com/us/podcast/the-pragmatic-engineer/id1769051199">Apple</a>.</strong> See the episode transcript at the top of this page, and timestamps for the episode at the bottom.</p><h3><strong>Brought to You by</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gh57!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gh57!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 424w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 848w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1272w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" width="800" height="70" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:70,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.pragmaticengineer.com/i/185094534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gh57!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 424w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 848w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1272w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><strong>&#8226; <a href="http://statsig.com/pragmatic">Statsig</a></strong> &#8211; &#8288; The unified platform for flags, analytics, experiments, and more. Stop switching between different tools, and have them all in one place.</p><p><strong>&#8226; <a href="https://workos.com/">WorkOS</a></strong> &#8211; The infrastructure B2B and AI-native companies use to sell to enterprise. It covers everything enterprise security requires: SSO, SCIM, RBAC, Audit Logs, AI governance, and more. Engineering teams ship it in days. Trusted by 2,000+ fast-growing companies, including OpenAI, Anthropic, Cursor, and Vercel. <a href="http://workos.com/">WorkOS.com</a></p><p><strong>&#8226; <a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a></strong> &#8211; The makers of SonarQube, the industry standard for automated code review. <a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a> helps reduce outages, improve security, and lower risks associated with AI and agentic coding. <a href="https://www.sonarsource.com/products/sonarqube/advanced-security/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-advanced-security&amp;utm_content=podcast-sonarqube-advanced-security&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">See how SonarQube Advanced Security</a> is empowering the Agent Centric Development Cycle (AC/DC) with new capabilities.</p><h3><strong>In this episode</strong></h3><p>David Heinemeier Hansson (DHH) is the creator of Ruby on Rails and Omarchy, co-founder and CTO of <a href="https://37signals.com/">37signals</a> (maker of Basecamp and HEY), and the author of several books including the best-seller, <em><a href="https://www.amazon.com/Remote-Office-Required-Jason-Fried/dp/0091954673">Remote: Office Not Required</a></em>, co-written with <a href="https://world.hey.com/jason">Jason Fried</a>.</p><p>Six months ago, in an episode of <a href="https://lexfridman.com/dhh-david-heinemeier-hansson">the Lex Fridman podcast</a>, David shared how he doesn&#8217;t use AI tools to write code: he types out all his code. 
But things have changed a lot since then.</p><p>In this episode, we discuss his approach to building software, how it&#8217;s changed in the last six months, why he now takes an agent-first approach, and how he barely writes any code by hand. We go into how he uses AI agents, which have altered how he builds and explores ideas, and how his standards of quality and craft remain the same.</p><p>We also discuss how 37signals thinks about product development, from the role of designers to the importance of aesthetics and taste. David gets into how he sees beauty and functionality as closely linked, and why strong opinions about design lead to better software.</p><p>Finally, we look into the uneven impact of AI, which amplifies senior engineers while creating challenges for junior developers, and what this may mean for the role of the software engineer.</p><h3><strong>Key observations from DHH</strong></h3><p>Here are 12 of my most interesting takeaways from talking with DHH:</p><div id="youtube2-JiWgKRgdgpI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;JiWgKRgdgpI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/JiWgKRgdgpI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>1. His philosophy on AI has not changed, but the available tools very much have. </strong>Autocomplete-style coding assistants were genuinely annoying for experienced developers six months ago. Things changed with the shift from tab-completion to agent harnesses, plus the emergence of powerful models like Opus 4.5 &#8211; when agents started producing code which DHH does want to merge with little to no alteration.</p><p><strong>2. Beautiful code and products aren&#8217;t matters of vanity; they&#8217;re signals of correctness.</strong> Dipping into philosophy, DHH says: &#8220;When something is beautiful, it&#8217;s likely to be correct.&#8221; He argues that Steve Jobs wanted the <em>inside</em> of a computer to be beautiful because people who care about circuit board layout are also those who sweat the details of the UI.</p><p><strong>3. DHH&#8217;s development workflow, today:</strong> he runs <a href="https://github.com/tmux/tmux/wiki">tmux</a> to have two models running, and <a href="https://neovim.io/">neovim</a> in the center; a rough sketch of this layout follows after the list. Specifics:</p><ul><li><p>One fast LLM running (typically Gemini 2.5) in one split terminal</p></li><li><p>A slow but more powerful model in another terminal (usually Opus)</p></li><li><p>NeoVim for reviewing diffs via <a href="https://github.com/jesseduffield/lazygit">Lazygit</a></p></li></ul>
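<p>For illustration only, here is a minimal Python sketch of what such a two-agent tmux layout might look like. The session name, pane commands, and model choices are assumptions for the example, not DHH&#8217;s actual configuration:</p><pre><code>import subprocess

# Hypothetical layout: an editor pane for reviewing diffs, plus one
# pane per CLI agent. The "gemini" and "claude" commands are
# illustrative stand-ins for a fast and a slower, stronger model.
def tmux(*args: str) -&gt; None:
    subprocess.run(["tmux", *args], check=True)

tmux("new-session", "-d", "-s", "agents", "nvim")     # editor, pane 0
tmux("split-window", "-h", "-t", "agents", "gemini")  # fast model
tmux("split-window", "-h", "-t", "agents", "claude")  # stronger model
tmux("select-layout", "-t", "agents", "even-horizontal")
tmux("attach-session", "-t", "agents")</code></pre>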
<p><strong>4. Ruby on Rails seems to be enjoying a renaissance thanks to AI.</strong> Rails is one of the most token-efficient ways of building web apps and is well-suited for agent workflows. Testing is part of the framework, which helps agents write tests and validate their own outputs. It also produces code that humans can read and verify, which matters when reviewing agent output at speed.</p><p><strong>5. A big win from using AI agents is tackling stuff that you wouldn&#8217;t have before.</strong> A senior engineer at 37signals ran a &#8220;P1 optimization&#8221; project to improve the <em>fastest</em> 1% of requests, taking P1 from 4 milliseconds to under half a millisecond. This is the sort of work that wouldn&#8217;t have been considered previously!</p><p><strong>6. Running several AI agents feels less like &#8220;project management&#8221; and more like &#8220;wearing a mech suit.&#8221; </strong>Being a project manager of agents did not appeal to DHH, but now that he&#8217;s building with several agents, he feels in control of work that is being hyper-accelerated.</p><p><strong>7. Senior engineers benefit from AI a lot more than juniors.</strong> At 37signals, senior engineers gain more from AI tools as they can validate whether an agent&#8217;s output is production-ready. DHH also notes that Amazon reached the same conclusion, and no longer lets junior programmers ship agent-generated code to production without review.</p><p><strong>8. 37signals has one designer for every two engineers.</strong> The company has around 20 software engineers and 10 designers. Designers do far more than design; they&#8217;re also product managers and &#8220;implementers&#8221; rolled into one. On top of making things look good, they figure out what should be built, how it should work, and often build the first version. DHH compares design at 37signals to jewelry design: &#8220;you should know the properties of gold. You should know how it bends.&#8221;</p><p><strong>9. AI agents could turn 37signals&#8217; &#8220;designer model&#8221; into the industry standard. </strong>AI tools now empower designers to implement more of their vision directly, and DHH suspects the rest of the industry is converging toward what 37signals has always done: working with small teams, where designers are also builders.</p><p><strong>10. Command-line interfaces (CLIs) feel like the ultimate AI interface, which validates the Unix philosophy of the 1970s</strong>. DHH is building CLIs for all 37signals products because they let agents chain tools together. &#8220;GitHub also has a CLI, and Sentry as well,&#8221; he says. &#8220;You can tie all these things together so an agent can check errors, write a fix, post a PR, and report back to Basecamp.&#8221;</p><p><strong>11. The demise of the two-month product development cycle described in the book </strong><em><strong>&#8216;Shape Up: Stop Running in Circles and Ship Work that Matters&#8217;. </strong></em>The <a href="https://basecamp.com/shapeup">2019 title</a> by Ryan Singer covered how 37signals worked at the time, and DHH reveals that this methodology now needs rewriting because AI acceleration has made that timeline feel slow.</p><p><strong>12. Eight hours of sleep is non-negotiable &#8211; even during an AI gold rush!</strong> DHH believes the dopamine loop of shipping with agents is intoxicating and can lead to a higher risk of burnout. 
So, he sleeps eight hours and doesn&#8217;t use an alarm.</p><h3><strong>The Pragmatic Engineer deepdives relevant for this episode</strong></h3><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/are-ai-agents-actually-slowing-us">Are AI agents actually slowing us down?</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/how-claude-code-is-built">How Claude Code is built</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-future-of-software-engineering-with-ai">The future of software engineering with AI: six predictions</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/the-ai-engineering-stack">The AI Engineering Stack</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/mitchell-hashimoto">Mitchell Hashimoto&#8217;s new way of writing code</a></p><p>&#8226; <a href="https://newsletter.pragmaticengineer.com/p/how-linux-is-built-with-greg-kroah">How Linux is built</a> with Greg Kroah-Hartman</p><h3><strong>Timestamps</strong></h3><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=131s">02:11</a>) Omarchy and Ruby on Rails</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=505s">08:25</a>) 37signals overview</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=612s">10:12</a>) Launching HEY</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=1118s">18:38</a>) Building HEY</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=1367s">22:47</a>) Designers at 37signals</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=1688s">28:08</a>) The craft of design</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=1912s">31:52</a>) Why DHH now embraces AI workflows</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=2385s">39:45</a>) The AI inflection point</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=2663s">44:23</a>) DHH&#8217;s agent-first workflow</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=3309s">55:09</a>) AI&#8217;s impact on junior developers</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=3788s">1:03:08</a>) Developer experience with AI</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=4603s">1:16:43</a>) What does AI mean for developers?</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=5013s">1:23:33</a>) 37signals teams and hiring</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=5900s">1:38:20</a>) Work-life balance with AI</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=6101s">1:41:41</a>) Why DHH keeps building</p><p>(<a href="https://www.youtube.com/watch?v=JiWgKRgdgpI&amp;t=6324s">1:45:24</a>) Closing</p><h3><strong>References</strong></h3><p><strong>Where to find DHH: </strong></p><p>&#8226; X: <a href="https://x.com/dhh">https://x.com/dhh</a></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/david-heinemeier-hansson-374b18221">https://www.linkedin.com/in/david-heinemeier-hansson-374b18221</a></p><p>&#8226; Website: <a href="https://dhh.dk">https://dhh.dk</a></p><p>&#8226; Newsletter: <a href="https://world.hey.com/dhh">https://world.hey.com/dhh</a></p><p>&#8226; Podcast: <a href="https://37signals.com/podcast">https://37signals.com/podcast</a></p><p><strong>Mentions during the episode:</strong></p><p>&#8226; Omarchy: <a 
href="https://omarchy.org">https://omarchy.org</a></p><p>&#8226; Linux: <a href="https://www.linux.org">https://www.linux.org</a></p><p>&#8226; Ubuntu: <a href="https://ubuntu.com">https://ubuntu.com</a></p><p>&#8226; Arch Linux: <a href="https://archlinux.org">https://archlinux.org</a></p><p>&#8226; Hyprland: <a href="https://hypr.land">https://hypr.land</a></p><p>&#8226; Ruby on Rails: <a href="https://rubyonrails.org">https://rubyonrails.org</a></p><p>&#8226; Basecamp: <a href="https://basecamp.com">https://basecamp.com</a></p><p>&#8226; Fizzy: <a href="https://www.fizzy.do">https://www.fizzy.do</a></p><p>&#8226; Jason Fried on X: <a href="https://x.com/jasonfried">https://x.com/jasonfried</a></p><p>&#8226; HEY: <a href="https://www.hey.com">https://www.hey.com</a></p><p>&#8226; Shape Up: Stop Running in Circles and Ship Work that Matters: <a href="https://basecamp.com/shapeup">https://basecamp.com/shapeup</a></p><p>&#8226; Zolt&#225;n Hossz&#250; applying to 37signals: <a href="https://zoltan.co/37signals">https://zoltan.co/37signals</a></p><p>&#8226; Daring Fireball: <a href="https://daringfireball.net">https://daringfireball.net</a></p><p>&#8226; Smalltalk: <a href="https://en.wikipedia.org/wiki/Smalltalk">https://en.wikipedia.org/wiki/Smalltalk</a></p><p>&#8226; DHH: Future of Programming, AI, Ruby on Rails, Productivity &amp; Parenting | Lex Fridman Podcast #474: </p><div id="youtube2-vagyIcmIGOQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;vagyIcmIGOQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/vagyIcmIGOQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>&#8226; Homer&#8217;s typing Bird: </p><div id="youtube2-R_rF4kcqLkI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;R_rF4kcqLkI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/R_rF4kcqLkI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>&#8226; Real-world engineering challenges: building Cursor: <a href="https://newsletter.pragmaticengineer.com/p/cursor">https://newsletter.pragmaticengineer.com/p/cursor</a></p><p>&#8226; Building a best-selling game with a tiny team &#8211; with Jonas Tyroller: <a href="https://newsletter.pragmaticengineer.com/p/thronefall">https://newsletter.pragmaticengineer.com/p/thronefall</a></p><p>&#8226; Andrej Karpathy on X: <a href="https://x.com/karpathy">https://x.com/karpathy</a></p><p>&#8226; Reflexive AI usage is now a baseline expectation at Shopify: </p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/tobi/status/1909251946235437514&quot;,&quot;full_text&quot;:&quot;https://t.co/6i6h3sKi3x&quot;,&quot;username&quot;:&quot;tobi&quot;,&quot;name&quot;:&quot;tobi 
lutke&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1999293930936909824/_HWYanot_normal.jpg&quot;,&quot;date&quot;:&quot;2025-04-07T14:28:30.000Z&quot;,&quot;photos&quot;:[],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:287,&quot;retweet_count&quot;:947,&quot;like_count&quot;:6762,&quot;impression_count&quot;:2262661,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>&#8226; Claude Code: <a href="https://code.claude.com">https://code.claude.com</a></p><p>&#8226; OpenCode: <a href="https://opencode.ai">https://opencode.ai</a></p><p>&#8226; MacBook Neo: <a href="https://www.apple.com/macbook-neo/">https://www.apple.com/macbook-neo/</a></p><p>&#8226; tmux: <a href="https://github.com/tmux/tmux/wiki">https://github.com/tmux/tmux/wiki</a></p><p>&#8226; Kimi K2.5: <a href="https://kimik2ai.com/k2.5">https://kimik2ai.com/k2.5</a></p><p>&#8226; Agent first, agent native: <a href="https://basecamp.com/agents">https://basecamp.com/agents</a></p><p>&#8226; Sentry: <a href="https://sentry.io">https://sentry.io</a></p><p>&#8226; Moore&#8217;s law: <a href="https://en.wikipedia.org/wiki/Moore%27s_law">https://en.wikipedia.org/wiki/Moore%27s_law</a></p><p>&#8226; The Bitter Lesson: <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">http://www.incompleteideas.net/IncIdeas/BitterLesson.html</a></p><p>&#8226; Scaling Uber with Thuan Pham (Uber&#8217;s first CTO): <a href="https://newsletter.pragmaticengineer.com/p/scaling-uber-with-thuan-pham-ubers">https://newsletter.pragmaticengineer.com/p/scaling-uber-with-thuan-pham-ubers</a></p><p>&#8226; Waymo: <a href="https://waymo.com">https://waymo.com</a></p><p>&#8226; Elon Musk: &#8220;There will not be a steering wheel&#8221; in 20 years: <a href="https://www.axios.com/2017/12/15/elon-musk-there-will-not-be-a-steering-wheel-in-20-years-1513304216">https://www.axios.com/2017/12/15/elon-musk-there-will-not-be-a-steering-wheel-in-20-years-1513304216</a></p><p>&#8226; Leopold Aschenbrenner &#8212; 2027 AGI, China/US super-intelligence race, &amp; the return of history: </p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:145136502,&quot;url&quot;:&quot;https://www.dwarkesh.com/p/leopold-aschenbrenner&quot;,&quot;publication_id&quot;:69345,&quot;publication_name&quot;:&quot;Dwarkesh Podcast&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!QEPJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F90fa9666-5b8b-4685-a8fb-4b64cb7e0333_1080x1080.png&quot;,&quot;title&quot;:&quot;Leopold Aschenbrenner &#8212; 2027 AGI, China/US super-intelligence race, &amp; the return of history&quot;,&quot;truncated_body_text&quot;:null,&quot;date&quot;:&quot;2024-06-04T15:39:37.715Z&quot;,&quot;like_count&quot;:89,&quot;comment_count&quot;:16,&quot;bylines&quot;:[{&quot;id&quot;:4281466,&quot;name&quot;:&quot;Dwarkesh Patel&quot;,&quot;handle&quot;:&quot;dwarkesh&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!5eJb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb715ffd1-f7d7-4755-af88-c48efe647f5b_400x400.jpeg&quot;,&quot;bio&quot;:&quot;Host of Dwarkesh 
Podcast&quot;,&quot;profile_set_up_at&quot;:&quot;2021-06-09T22:58:10.864Z&quot;,&quot;reader_installed_at&quot;:&quot;2022-04-03T20:37:19.142Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:246192,&quot;user_id&quot;:4281466,&quot;publication_id&quot;:69345,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:69345,&quot;name&quot;:&quot;Dwarkesh Podcast&quot;,&quot;subdomain&quot;:&quot;dwarkesh&quot;,&quot;custom_domain&quot;:&quot;www.dwarkesh.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Deeply researched interviews&quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/90fa9666-5b8b-4685-a8fb-4b64cb7e0333_1080x1080.png&quot;,&quot;author_id&quot;:4281466,&quot;primary_user_id&quot;:4281466,&quot;theme_var_background_pop&quot;:&quot;#D10000&quot;,&quot;created_at&quot;:&quot;2020-07-18T16:36:25.723Z&quot;,&quot;email_from_name&quot;:&quot;Dwarkesh Patel&quot;,&quot;copyright&quot;:&quot;Dwarkesh Patel&quot;,&quot;founding_plan_name&quot;:&quot;Founding Member&quot;,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;enabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:null,&quot;is_personal_mode&quot;:false,&quot;logo_url_wide&quot;:null}}],&quot;twitter_screen_name&quot;:&quot;dwarkesh_sp&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:100,&quot;status&quot;:{&quot;bestsellerTier&quot;:100,&quot;subscriberTier&quot;:5,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;bestseller&quot;,&quot;tier&quot;:100},&quot;paidPublicationIds&quot;:[3087928,1163860,1134099,6819723,2118966,3409707,89120,22108,104058],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;podcast&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://www.dwarkesh.com/p/leopold-aschenbrenner?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!QEPJ!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F90fa9666-5b8b-4685-a8fb-4b64cb7e0333_1080x1080.png" loading="lazy"><span class="embedded-post-publication-name">Dwarkesh Podcast</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title-icon"><svg width="19" height="19" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
  <path d="M3 18V12C3 9.61305 3.94821 7.32387 5.63604 5.63604C7.32387 3.94821 9.61305 3 12 3C14.3869 3 16.6761 3.94821 18.364 5.63604C20.0518 7.32387 21 9.61305 21 12V18" stroke-linecap="round" stroke-linejoin="round"></path>
  <path d="M21 19C21 19.5304 20.7893 20.0391 20.4142 20.4142C20.0391 20.7893 19.5304 21 19 21H18C17.4696 21 16.9609 20.7893 16.5858 20.4142C16.2107 20.0391 16 19.5304 16 19V16C16 15.4696 16.2107 14.9609 16.5858 14.5858C16.9609 14.2107 17.4696 14 18 14H21V19ZM3 19C3 19.5304 3.21071 20.0391 3.58579 20.4142C3.96086 20.7893 4.46957 21 5 21H6C6.53043 21 7.03914 20.7893 7.41421 20.4142C7.78929 20.0391 8 19.5304 8 19V16C8 15.4696 7.78929 14.9609 7.41421 14.5858C7.03914 14.2107 6.53043 14 6 14H3V19Z" stroke-linecap="round" stroke-linejoin="round"></path>
</svg></div><div class="embedded-post-title">Leopold Aschenbrenner &#8212; 2027 AGI, China/US super-intelligence race, &amp; the return of history</div></div><div class="embedded-post-cta-wrapper"><div class="embedded-post-cta-icon"><svg width="32" height="32" viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg">
  <path classname="inner-triangle" d="M10 8L16 12L10 16V8Z" stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round"></path>
</svg></div><span class="embedded-post-cta">Listen now</span></div><div class="embedded-post-meta">2 years ago &#183; 89 likes &#183; 16 comments &#183; Dwarkesh Patel</div></a></div><p>&#8226; Terminator 2 Things &amp; Ideas we would have never thought: </p><div id="youtube2-_GHX3iZtuKg" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;_GHX3iZtuKg&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/_GHX3iZtuKg?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>&#8226; Commodore 64: <a href="https://en.wikipedia.org/wiki/Commodore_64">https://en.wikipedia.org/wiki/Commodore_64</a></p><p>&#8226; PlayStation: <a href="https://www.playstation.com">https://www.playstation.com</a></p><p>&#8226; Jevons paradox: <a href="https://en.wikipedia.org/wiki/Jevons_paradox">https://en.wikipedia.org/wiki/Jevons_paradox</a></p><p>&#8226; OpenClaw: <a href="https://openclaw.ai">https://openclaw.ai</a></p><p>&#8226; The creator of Clawd: &#8220;I ship code I don&#8217;t read&#8221;: <a href="https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code">https://newsletter.pragmaticengineer.com/p/the-creator-of-clawd-i-ship-code</a></p><p>&#8226; John Carmack on X: <a href="https://x.com/ID_AA_Carmack">https://x.com/ID_AA_Carmack</a></p><p>&#8226; TDD, AI agents and coding with Kent Beck: <a href="https://newsletter.pragmaticengineer.com/p/tdd-ai-agents-and-coding-with-kent">https://newsletter.pragmaticengineer.com/p/tdd-ai-agents-and-coding-with-kent</a></p><p>&#8226; <em>Extreme Programming Explained: Embrace Change</em>: <a href="https://www.amazon.com/Extreme-Programming-Explained-Embrace-Change/dp/0321278658">https://www.amazon.com/Extreme-Programming-Explained-Embrace-Change/dp/0321278658</a></p><p>&#8226; <em>Smalltalk Best Practice Patterns</em>: <a href="https://www.amazon.com/Smalltalk-Best-Practice-Patterns-Kent/dp/013476904X">https://www.amazon.com/Smalltalk-Best-Practice-Patterns-Kent/dp/013476904X</a></p><p>&#8226; From IDEs to AI Agents with Steve Yegge: <a href="https://newsletter.pragmaticengineer.com/p/from-ides-to-ai-agents-with-steve">https://newsletter.pragmaticengineer.com/p/from-ides-to-ai-agents-with-steve</a></p><p>&#8212;</p><p>Production and marketing by <a href="https://penname.co/">Pen Name</a>. </p><p></p>]]></content:encoded></item><item><title><![CDATA[Cycles of disruption in the tech industry: with software pioneers Kent Beck & Martin Fowler]]></title><description><![CDATA[Parallels between technology shifts in the past decades and what we&#8217;re seeing with AI. 
Also: ways to avoid burnout when working with AI agents, TDD back in style, and more.]]></description><link>https://newsletter.pragmaticengineer.com/p/cycles-of-disruption-in-the-tech</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/cycles-of-disruption-in-the-tech</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 07 Apr 2026 16:27:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6D40!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe801e0ca-cd70-4ed7-9d94-9ca1e44509a1_1594x1452.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The recent <a href="https://www.pragmaticsummit.com/">Pragmatic Summit</a> saw two legends of software development share a stage in what was one of the most popular sessions at our debut live event in San Francisco. In front of a packed audience, Martin Fowler and Kent Beck tackled a range of highly relevant topics, with me hosting proceedings.</p><p>Martin and Kent go back decades, and Martin jokes that his career is &#8220;mostly about writing down Kent Beck&#8217;s ideas.&#8221; They first collaborated in the 1990s, and each has published influential books &#8211; &#8216;<em>Extreme Programming Explained&#8217; </em>and <em>&#8216;Test-Driven Development&#8217; </em>by Kent, and <em>&#8216;Refactoring&#8217; </em>and<em> &#8216;Patterns of Enterprise Application Architecture&#8217; </em>by Martin.</p><p>At the Pragmatic Summit, they each shared a wealth of hard-earned learnings and decades&#8217; worth of perspective, along with a healthy dose of skepticism. Needless to say, the conversation did not disappoint, and this article summarizes what we discussed, in their own words. You can also <a href="https://youtu.be/CZs8J1ZD0CE">check out the full recording</a>.</p><p>We cover:</p><ol><li><p><strong>Technology shifts similar to AI. </strong>The arrival of the microprocessor, the introduction of object-oriented languages, the Internet, and agile software development principles were all major changes &#8211; but one big difference was that it took time for these technologies to be adopted. Not so with AI.</p></li><li><p><strong>Agile and AI similarities. </strong>With Agile, company incentives were often misaligned, &#8220;snake oil&#8221; vendors were everywhere, and a &#8220;mid pack&#8221; of developers who resisted the change saw their career prospects hit. These trends look likely to repeat with AI.</p></li><li><p><strong>What&#8217;s happening inside companies</strong>. There&#8217;s some confusion &#8211; and even panic &#8211; at large companies, while AI tools don&#8217;t work nearly as well on large and complex codebases as on greenfield projects. Also, a &#8220;re-soloing&#8221; of software development is inbound.</p></li><li><p><strong>Avoiding burnout with AI agents</strong>. Set and maintain boundaries, and pay attention. Martin suggests catching the moment when you start producing &#8220;negative value&#8221;: that&#8217;s when to take a break.</p></li><li><p><strong>Unhealthy performance metrics.</strong> Companies are starting to measure things like the frequency of pull requests &#8211; when they should be looking to quantify outcomes and results.</p></li><li><p><strong>Lower quality on purpose? </strong>It seems every business is optimizing for speed with AI, but quality can get dropped. 
Also: with AI, building features is a more obvious choice than investing in &#8220;futures.&#8221;</p></li><li><p><strong>Test-Driven Development (TDD): tests no longer optional? </strong>Kent pioneered TDD, and today it&#8217;s more relevant than ever for working with AI.</p></li><li><p><strong>Thriving in an AI-native industry.</strong> Focus on working with agents to express your craft, try to find more enjoyment in <em>understanding</em> your domain, and take on more ambitious work.</p></li></ol><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!6D40!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe801e0ca-cd70-4ed7-9d94-9ca1e44509a1_1594x1452.png" width="1456" height="1326" alt=""><figcaption class="image-caption"><em>Martin Fowler (center), Kent Beck (right), and me at The Pragmatic Summit</em></figcaption></figure></div><p><em>Before we start, a programming note: this week, there will be no The Pulse on Thursday &#8212; I&#8217;ll be attending <a href="https://www.ai.engineer/europe">AI Engineer Europe</a> in London on Thursday and Friday, taking part in one fireside chat and hosting another with Linear CTO Tuomas Artman.</em></p><h2>1. Technology shifts similar to AI</h2><p><strong>Do you recall a tech change as promising and unpredictable as AI?</strong></p><p><strong>Martin: </strong>&#8220;Nothing has hit with the magnitude of AI. This is a whole size different from anything we&#8217;ve faced before. On a smaller scale, we were very much involved in the growth of object-oriented languages, which scared a lot of people. It didn&#8217;t scare us so much because we were part of it.</p><p>Looking back, the internet had a huge impact on us all, and of course, Agile software development, too. Agile had a very big impact on a lot of organizations: you could tell by how hard they resisted it. We had to persuade people of the importance of these technological changes; yes, even the internet! 
It may sound surprising, but there were people who didn&#8217;t think it was important.</p><p>The thing about AI is that today there is no argument about how important it is.&#8221;</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!NR_1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F819fa7b4-c3f3-4ab7-9a89-263e19893663_1600x1067.png" width="1456" height="971" alt=""><figcaption class="image-caption"><em>Martin Fowler (left) speaks at the Summit</em></figcaption></figure></div><p><strong>Kent: </strong>&#8220;The other analogy I have is the introduction of the microprocessor. Before that, computers were big boxes; you couldn&#8217;t move them around. If you wanted another computer, you&#8217;d mortgage your house for it. Having a computer was a <em>big</em> deal.</p><p>I was a kid in Silicon Valley with my dad as a programmer when the Intel 4004 hit the market [in 1971]. We went: &#8220;Wait a minute, that <em>chip</em> is a computer? Oh my goodness!&#8221; The possibilities of computing suddenly expanded thanks to it. If you could figure out how to write software on this chip and figure out how to design hardware around this thing, you could suddenly do things you hadn&#8217;t even imagined.</p><p>And so I think part of AI is this expansion of imagination. I&#8217;m taking on ridiculously ambitious projects: I&#8217;m working on a persistent <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>. 
I&#8217;m writing library-quality code for Rust.&#8221;</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Kug2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F334ad5f3-31fe-4454-b222-7b256dfde10d_1390x808.png" width="1390" height="808" alt=""><figcaption class="image-caption"><em>Kent predicts AI will expand software engineering like the Intel 4004 did. Source: <a href="https://www.intel.com/content/www/us/en/history/virtual-vault/articles/the-intel-4004.html">Intel</a></em></figcaption></figure></div><h3>Balancing skepticism and curiosity</h3><p><strong>What was the feeling in the industry during those revolutions, and what separated the professionals who thrived back then from those who didn&#8217;t?</strong></p><p><strong>Martin:</strong> &#8220;There was a mix of people chasing the hype and those saying, &#8220;this new thing is nothing special.&#8221; I think you&#8217;ve always got to have that balance of skepticism and curiosity, and to be selective about it. I mean, I have been completely skeptical about some big changes: Blockchain was one I was extremely skeptical about.</p><p>My skepticism is well-rooted because I&#8217;ve seen so much &#8220;snake oil&#8221; over the years. In fact, my skepticism has to be absolute and total, which means I have to be skeptical about my skepticism! To be that skeptical also requires curiosity: you&#8217;ve got to be curious enough to ask, &#8220;how do I probe in order to detect signs of something useful?&#8221;</p><p>You also need to be aware that your early interactions may not actually be a <em>true</em> signal. When I started playing around with AI, it was with GitHub Copilot a year and a half ago. I was pretty unimpressed; occasionally it would give you something wonderful, but most of the time it gave you such garbage that you would just delete it right away. If that had been my only impression of AI, I would&#8217;ve immediately flipped the &#8220;<a href="https://en.wikipedia.org/wiki/Bozo_bit">bozo bit</a>&#8221; on it, like I did with blockchain.&#8221;</p><p><strong>Kent:</strong> &#8220;Here&#8217;s the thing: the capabilities of AI can change week to week. I&#8217;ll try something with Gemini one week and it fails miserably. Then Claude Code works pretty well, and then it doesn&#8217;t. And then I try Gemini for the same thing and it works, when it hadn&#8217;t worked last week!</p><p>People want an answer, but the answer&#8217;s always changing. In this environment, you can&#8217;t possibly have <em>the </em>answer. That&#8217;s the bad news, but the good news is that nobody else has the answer either. So, you&#8217;re just as smart as everybody else because we&#8217;re all equally ignorant.&#8221;</p><h2>2. 
Agile and AI similarities</h2><p><strong>In 2001, the &#8216;Agile Manifesto&#8217; came out, and you were both co-authors. I think many companies are expecting the same thing with AI as Agile promised: better, faster, cheaper software. But how did Agile adoption really play out?</strong></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!o3em!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe4d78ab-2662-4ef7-b18f-a25317de5dd5_1600x1067.png" width="1456" height="971" alt=""><figcaption class="image-caption"><em>Full house: The conversation with Martin (left) and Kent (right) drew a large audience</em></figcaption></figure></div><p><strong>Kent:</strong> &#8220;It turns out people don&#8217;t want faster, cheaper, better! Inside some companies, the incentives are misaligned with actually achieving that. So, as geeks trying to achieve these improvements and saying, &#8220;it&#8217;s 40% better, 12% cheaper and less fattening,&#8221; people will punish you if that doesn&#8217;t align with <em>their</em> incentives inside organizations.</p><p>In the ideal organization, everybody would care about the same things, but that&#8217;s just not the way it works! So, if AI is coming along to promise the same things, we&#8217;re going to see the same reaction as before.&#8221;</p><p><strong>Martin:</strong> &#8220;An obvious difference is the sheer magnitude and speed of AI. Also, I think there will be a big difference between people who use it well and people who use it badly. The trick is figuring out how to use it well and putting the effort in to learn. There will be a big distinction between those two groups.</p><p>But I suspect there will still be some similarities with Agile. The core notions behind Agile and extreme programming are solid and good, but a huge snake-oil industry appeared around them &#8211; the &#8220;Agile industrial complex&#8221;, as I refer to it. This is also happening with AI right now, and it&#8217;s often hard to see the difference between snake oil and the real stuff.&#8221;</p><h3>AI as an amplifier</h3><p><strong>Kent:</strong> &#8220;AI is an amplifier. If you&#8217;re young and learning quickly, AI can amplify your learning. I personally think this is the golden age of the junior programmer. I get people coming to me all the time saying things like &#8220;my son started his second year in CS and wants to go into something more commercial, like art history.&#8221; And I&#8217;d say, &#8220;this is like if you&#8217;re a carpenter and they just introduced the circular saw and you think, &#8216;oh, well, carpentry is over. Anybody can build a house now.&#8217; Well, no! Now, you have more powerful tools. 
You have less of the crummy work to do.&#8221;</p><p>I think that young people are going to learn faster, and experienced folks who are working effectively are going to get quicker and even more effective.&#8221;</p><h3>Developers stuck in the middle</h3><p><strong>Kent: </strong>&#8220;My concern is that there&#8217;s a &#8220;middle&#8221; of people who got into programming as a way to make money. If we look back at the Dotcom crash, there was a &#8220;mid pack&#8221; of such people who ended up going into real estate, more or less. But today, I don&#8217;t know where that &#8220;middle&#8221; will go, and it&#8217;s also much bigger now than 25 years ago.&#8221;</p><p><strong>Martin:</strong> &#8220;But that middle has also been &#8220;flushed out&#8221; to some degree by retrenchment in the software industry at the <a href="https://newsletter.pragmaticengineer.com/p/zirp">end of the zero interest rate period</a>. So, that&#8217;s an interesting difference, because we&#8217;ve had these things occurring at once: the AI boom, and the economic headwinds of the past 2-3 years.</p><p>This is an interesting mix that wasn&#8217;t present in the &#8216;90s with the Dotcom Boom. Back then, it was pretty much <em>all</em> a solid boom.&#8221;</p><h3>Return of &#8220;let&#8217;s get rid of programmers!&#8221;</h3><p><strong>Kent: </strong>&#8220;Another interesting confluence of factors is the periodic &#8220;we can get rid of all the programmers, woo-hoo&#8221; trend, which started with Cobol in the 1970s. With Cobol, business analysts were supposedly going to be able to write the programs, and the logic was that we wouldn&#8217;t need programmers anymore. That trend comes back repeatedly.</p><p>Agile, however, was definitely <em>not</em> a &#8220;let&#8217;s get rid of programmers&#8221; trend. With Agile, we wanted programmers to be more <em>effective</em> in their jobs. And since we started it, and were programmers, we were able to push that agenda pretty effectively.</p><p>However, today the &#8220;get rid of programmers&#8221; trend is repeating. As programmers, it behooves us to think about why people periodically want to axe us: some of that&#8217;s about us as programmers, and some of it is not. 
In the end, this trend amps up the fear factor that everybody&#8217;s experiencing.&#8221;</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!fZXk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c75e55a-42f8-42be-bac8-5af9c6d7ab19_1600x1067.png" width="1456" height="971" alt=""><figcaption class="image-caption"><em>In the middle of the discussion</em></figcaption></figure></div><h3>&#8220;Re-soloing&#8221; of programming</h3><p><strong>Kent:</strong> &#8220;A big trend is the &#8220;re-soloing&#8221; [reduced in-person collaboration] of programming.</p><p>A big part of extreme programming (XP) was creating a safe social environment for basically antisocial people. On an XP team, people are talking to each other for hours a day, and are happy to do so because it&#8217;s set up to be a positive experience.</p><p>Now, I see programmers saying, &#8220;I&#8217;ve got six agents, so really I&#8217;m managing a team.&#8221; No, you&#8217;re not: you&#8217;re using six tools at once, which is fine, but it&#8217;s very different from having a conversation with somebody who sees things slightly differently, or has a different energy level from you on the day.</p><p>We used to have programmers in individual offices with doors, and you&#8217;d shut the door and slide the pizza underneath. That was easy to manage, but then along came this messy, social, complicated, chaotic process of software development, which just happened to produce really good results.</p><p>But now, instead of 50 people on my team, I can have five and they don&#8217;t have to talk to each other, and each can have 10 agents. Is that the same? 
No, it&#8217;s not.&#8221;</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!NdVf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97d1f960-88db-4779-bee7-6511138902c0_1600x1062.png" width="1456" height="966" alt=""><figcaption class="image-caption"><em>Swag: As well as the usual merch at the Summit, there were books by speakers, including Martin and Kent</em></figcaption></figure></div><h3>More effective two-pizza teams &amp; the future of pairing</h3><p><strong>Martin: </strong>&#8220;Are we seeing two-pizza teams [of 5-10 people] becoming one-pizza teams because agents don&#8217;t eat pizza, or do we see two-pizza teams staying and becoming much more effective and capable? My bet is on more effective two-pizza teams.</p><p>We&#8217;re beginning to see some interesting feedback in terms of pair programming. With pair programming, is it one human and the genie (AI) programming, or is it two humans and one genie? If it&#8217;s two of us, perhaps we can control the genie a bit better, and we also have interaction.</p><p>I&#8217;ll be very interested in reports of people trying to control genies in pairs, possibly even beyond pairs. There&#8217;s also the whole &#8216;mob programming&#8217; thing, and how that will go with genies. I don&#8217;t necessarily think that one person and many genies is the right answer.&#8221;</p><p><strong>Kent:</strong> &#8220;My experience of pairing with two humans, plus one or more genies, has been very positive. And the fact that the AI is slow is really nice. Every time models come out and are faster, I&#8217;m like, &#8220;Oh, there&#8217;s less time to talk.&#8221; When the AI goes away for three minutes, we can talk about our philosophy of naming, or how we express conditionals, or about what we should be doing next. But if it pops back in 15 seconds, you don&#8217;t have time for that conversation.&#8221;</p><h2>4. Avoiding burnout with AI agents</h2><p><strong>Do you find yourself getting close to burnout, especially when spinning up multiple threads? Do you have strategies for managing the mental impact?</strong></p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/cycles-of-disruption-in-the-tech">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Pulse: Industry leaders return to coding with AI]]></title><description><![CDATA[Mark Zuckerberg and Garry Tan join the trend of C-level folks jumping back into coding with AI. Also: a bad week for Claude Code and GitHub, and more]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-industry-leaders-return</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-industry-leaders-return</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 02 Apr 2026 16:29:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Lw9q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf88b1aa-309a-49a2-bf21-63c4b281cefa_1552x456.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Today, we cover:</p><ol><li><p><strong>Founders back coding with AI: Mark Zuckerberg &amp; Garry Tan.</strong> The Meta chief is shipping diffs after 20 years, while Garry Tan at Y Combinator is knee-deep in coding, 15 years later. Founders with technical backgrounds being hands-on with AI agents could be a good thing &#8211; especially when the &#8220;honeymoon&#8221; period ends.</p></li><li><p><strong>A bad week for Claude Code and GitHub. </strong>Claude Code&#8217;s source code was leaked when a sourcemap file was accidentally uploaded; the leak revealed that the tool uses anti-distillation measures against competitors, and hinted at potential future features, such as an always-on background agent. Also: DMCA copyright strikes from Anthropic raise a big question: can a codebase that is fully AI-generated be covered by copyright?</p></li><li><p><strong>Industry pulse. </strong>Meta sets targets for AI-generated code, GitHub&#8217;s 6 years of reliability issues, massive job losses at Oracle, GitHub Copilot rolls out &#8211; then rolls back &#8211; ads, RAM prices fall (for now), and more.</p></li></ol><h2>1. Founders back coding with AI: Mark Zuckerberg &amp; Garry Tan</h2><p>Two interesting stories of busy founders starting to write code again, encouraged by AI agents.</p><h3>Mark Zuckerberg back to landing diffs, 20 years later</h3>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-industry-leaders-return">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Scaling Uber with Thuan Pham (Uber’s first CTO)]]></title><description><![CDATA[Thuan Pham (Uber's first CTO) on scaling Uber from constant outages to global infrastructure, the shift to microservices and platform teams, and how AI is reshaping engineering.]]></description><link>https://newsletter.pragmaticengineer.com/p/scaling-uber-with-thuan-pham-ubers</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/scaling-uber-with-thuan-pham-ubers</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Wed, 01 Apr 2026 16:49:59 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/192665750/cb39c381900b42debebc05f86342e85d.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Stream the latest episode</h3><p><strong>Listen and watch now on <a href="https://youtu.be/3jjRNVfm3V4">YouTube</a>, <a href="https://open.spotify.com/episode/13v42Y6P0TH36fVxZsMVmc">Spotify</a>, and <a href="https://podcasts.apple.com/us/podcast/the-pragmatic-engineer/id1769051199">Apple</a>.</strong> See the episode transcript at the top of this page, and timestamps for the episode at the bottom.</p><h3><strong>Brought to You by</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gh57!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png"><img src="https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" width="800" height="70" alt=""></a></figure></div><p>&#8226; <strong><a href="http://statsig.com/pragmatic">Statsig</a></strong> &#8211; The unified platform for flags, analytics, experiments, and more. Stop switching between different tools, and have them all in one place.</p><p>&#8226; <strong><a href="https://workos.com/">WorkOS</a></strong> &#8211; Everything you need to make your app enterprise-ready. WorkOS gives you APIs to ship enterprise features in days: features like authentication, SSO, SCIM, RBAC, and audit logs. Visit <a href="http://workos.com">WorkOS.com</a></p><p>&#8226; <strong><a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a></strong> &#8211; The makers of SonarQube, the industry standard for automated code review. <a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a> helps reduce outages, improve security, and lower risks associated with AI and agentic coding. 
<a href="https://www.sonarsource.com/products/sonarqube/advanced-security/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-advanced-security&amp;utm_content=podcast-sonarqube-advanced-security&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">See how SonarQube Advanced Security</a> is empowering the Agent Centric Development Cycle (AC/DC) with new capabilities like malicious package detection to provide the same rigorous guardrails for AI agents as you would for a human developer.</p><h3><strong>In this episode</strong></h3><p><a href="https://www.linkedin.com/in/thuanqpham/">Thuan Pham</a> was Uber&#8217;s first and longest-serving CTO, and today he&#8217;s the CTO of Faire, a B2B wholesale platform. Back when Thuan joined Uber, it had around 40 engineers and 30,000 rides per day, and the system crashed multiple times a week. Over seven years, he helped rebuild the system, move it from a monolith to microservices, and scaled the engineering organization behind it. <em>I had the privilege of working with Thuan for four of those seven years. Later, the very first issue of The Pragmatic Engineer newsletter was a <a href="https://newsletter.pragmaticengineer.com/p/program-platform-split-uber">deepdive into Uber&#8217;s Program and Platform split</a>. This episode of the podcast contains a nice &#8220;full circle&#8221; moment, where Thuan shares even more details about why Uber chose to embrace that structure.</em></p><p>We discuss what it takes to operate and build in that kind of environment. Thuan explains how he divided his time at Uber into three &#8220;tours of duty,&#8221; from stabilizing a fragile system, to re-architecting it, and scaling the org.</p><p>We go deep into the platform-and-program split, the Helix app rewrite, and what it took to launch Uber in China in just five months (the original estimate was 18 months). We also cover Uber&#8217;s in-house tools and explain why they were necessary to support rapid growth.</p><p>Finally, we discuss his role today as CTO of Faire, how the company is using AI, and how he sees AI changing software engineering.</p><h3><strong>Key observation from Thuan</strong></h3><p>14 takeaways from Thuan that I find the most interesting:</p><div id="youtube2-3jjRNVfm3V4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;3jjRNVfm3V4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/3jjRNVfm3V4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>1. Your professional reputation is a compounding asset that pays off unpredictably</strong>. Bill Gurley recruited Thuan to Uber based on knowing him from a startup a decade earlier. Similarly, when Thuan needed to hire for critical infrastructure teams at Uber, he reached out to engineers at VMware whom he&#8217;d previously worked with, and they followed him to the ridesharing app because they trusted him.</p><p><strong>2. The program/platform split came before microservices</strong>. 
The concept of cross-functional &#8220;program&#8221; teams and dedicated &#8220;platform&#8221; teams became necessary because an org split across backend, frontend and mobile engineers slowed down in execution speed when Uber grew to around 100 engineers. Every feature required negotiating bandwidth across the mobile, backend, and dispatch teams. Thuan, Travis Kalanick, and Jeff Holden literally used color-coded sticky notes with people&#8217;s names to reorganize into self-sufficient teams. We cover more about this split in the deepdive, <a href="https://newsletter.pragmaticengineer.com/p/program-platform-split-uber">The Platform and Program split at Uber.</a></p><p><strong>3. Microservices at Uber were more about surviving hypergrowth than anything else.</strong> Uber needed to decompose its massive monolith called &#8220;API.&#8221; To do so, a simple rule was applied: anything new needed to be built outside of the monolith so that no team blocked another. Teams started to build microservices, but decomposing the monolith took a good two years. Fun fact: in 2026, Uber has somewhat fewer microservices (around 4,500) than back in 2016 (around 5,000).</p><p><strong>4. When retiring a monolith, sometimes it gets even bigger before shrinking.</strong> After Uber decided to pull services out of the massive monolith, it still kept growing because the business kept adding features! There was an ugly middle phase before the monolith started to shrink. Keep this in mind if you look into decomposing a monolith.</p><p><strong>5. Expect multiple rewrites during hypergrowth.</strong> The right architecture depends on how fast a product and company are growing. At Uber, repeated rewrites were common because each one &#8220;bought&#8221; another window of survival for the company. Thuan&#8217;s recommendation is to understand that a rewrite simply means a company is outrunning its existing architecture: this is not necessarily a bad thing!</p><p><strong>6. Controversial launch advice: start with the hardest launch first. </strong>When Uber rolled out in China, Travis insisted on starting with Chengdu, the <em>largest</em> launch city. Looking back, it was scary but also helpful, as launching in the &#8220;hardest&#8221; city first gave the team confidence and made subsequent city launches much easier.</p><p><strong>7. Travis Kalanick spent 30+ hours interviewing Thuan. </strong>This took place over two weeks, as a series of one-on-ones. The sessions became a simulation of working together: disagreeing, aligning, and working things out. I&#8217;ve yet to hear of such an intense &#8211; and technical! &#8211; recruitment process by another CEO.</p><p><strong>8. Uber is the only major company that had a &#8220;Senior 1&#8221; and &#8220;Senior 2&#8221; level &#8211; and Thuan is unapologetic.</strong> Thuan introduced the Senior 1 (L5A) and Senior 2 (L5B) levels because the jump from senior (L5) to Staff (L6) became very big, and larger than between previous levels. One problem this split level created was that Uber&#8217;s L5B was akin to Google&#8217;s and Facebook&#8217;s L6/E6. Thuan resisted the title inflation of just renaming L5B to &#8216;Staff&#8217;.</p><p><strong>9. Name your services clearly; you don&#8217;t work at a &#8220;Mickey Mouse shop.&#8221; </strong>As Uber grew more complex, whimsical service names (like &#8220;Mustafa&#8221;) made navigating systems more tricky, and onboarding for new joiners more painful. 
Thuan sent a company-wide email which called for professional-sounding naming conventions, and reminded everyone that Uber was not a &#8220;Mickey Mouse shop.&#8221; The email didn&#8217;t fully solve the issue, but did force the growing org to take itself more seriously.</p><p><strong>10. Great engineering talent is global, so bring the opportunity to developers. </strong>During Thuan&#8217;s time, Uber opened nine engineering offices worldwide in order to access world-class talent. For example, the relatively small Denmark office built and operated core parts of Uber&#8217;s infrastructure, such as the trip datastore, <a href="https://www.uber.com/blog/schemaless-part-two-architecture/">Schemaless</a>.</p><p><strong>11. What&#8217;s the most important part of a CTO&#8217;s job? </strong>Thuan thinks that it&#8217;s to build a high-talent-density team, and to &#8220;see around the corner&#8221; 18&#8211;24 months in advance. As he puts it: &#8220;your team handles the six-month problems, while you figure out what the organization needs to look like two years from now.&#8221;</p><p><strong>12. The hardest use case of AI in software engineering is building new features on legacy codebases. </strong>At Faire, Thuan&#8217;s team uses &#8220;swarm coding&#8221; (orchestrated AI agents working in parallel) and some engineers there have doubled their output in three months. But generating greenfield code is easy; the real challenge is dealing with millions of lines of code and building features on top with all those existing dependencies.</p><p><strong>13. AI raises the floor, but doesn&#8217;t change what makes engineers great. </strong>AI enables people who can&#8217;t code to produce decent apps, but great engineers are still finding ways to leverage AI tools and accelerate even more. The differentiators remain the same as before AI: curiosity, fearlessness, and a willingness to innovate and learn new things.</p><p><strong>14. Thuan&#8217;s career advice: think of it in phases. 
</strong>Each segment of your career has different priorities, which Thuan sees like this:</p><ul><li><p>First 5&#8211;10 years: seek maximum learning and push yourself hard.</p></li><li><p>Mid-career as a senior/staff engineer: seek roles where you can make an outsized impact, perhaps at a smaller company.</p></li><li><p>In leadership roles: teach and coach others, and bring them along with you.</p></li></ul><h3><strong>The Pragmatic Engineer deepdives relevant for this episode</strong></h3><ul><li><p><a href="https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development">How Uber uses AI for development: inside look</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/program-platform-split-uber">The Platform and Program split at Uber</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/uber-eng-productivity">How Uber is measuring engineering productivity</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/uber-move-to-cloud">Inside Uber&#8217;s move to the cloud</a></p></li><li><p><a href="https://blog.pragmaticengineer.com/uber-app-rewrite-yolo/">Uber&#8217;s crazy YOLO app rewrite, from the front seat</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/how-uber-built-its-observability-platform">How Uber built its observability platform</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/developer-experience-at-uber">Developer experience at Uber</a> with Gautam Korlam</p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/the-scoop-46">Uber&#8217;s engineering level change</a></p></li></ul><h3><strong>Timestamps</strong></h3><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=332s">05:32</a>) Getting into tech</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=969s">16:09</a>) The dot-com bust</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=1242s">20:42</a>) VMware</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=1589s">26:29</a>) Getting hired by Travis at Uber</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=2002s">33:22</a>) Early days at Uber and scaling challenges</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=2457s">40:57</a>) Uber&#8217;s China launch</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=2832s">47:12</a>) The platform and program split</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3026s">50:26</a>) From monolith to microservices</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3218s">53:38</a>) Internal tools at Uber</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3425s">57:05</a>) Helix: Uber&#8217;s mobile app rewrite</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3595s">59:55</a>) Thuan&#8217;s email about naming</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3723s">1:02:03</a>) Org structure changes under Thuan</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=3994s">1:06:34</a>) Thuan&#8217;s work philosophy</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=4343s">1:12:23</a>) The &#8220;three tours of duty&#8221; at Uber</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=4537s">1:15:37</a>) Why Thuan left Uber</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=4654s">1:17:34</a>) Coupang and Nubank</p><p>(<a
href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=4919s">1:21:59</a>) Faire</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=5131s">1:25:31</a>) How Faire uses AI</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=5304s">1:28:24</a>) AI&#8217;s impact on software engineering</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=5469s">1:31:09</a>) The role of the CTO</p><p>(<a href="https://www.youtube.com/watch?v=3jjRNVfm3V4&amp;t=5713s">1:35:13</a>) Career advice</p><h3><strong>References</strong></h3><p><strong>Where to find Thuan Pham:</strong></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/thuanqpham">https://www.linkedin.com/in/thuanqpham</a></p><p><strong>Mentions during the episode:</strong></p><p>&#8226; HP Labs: <a href="https://www.hp.com/hk-en/shop/tech-takes/post/what-is-hp-labs">https://www.hp.com/hk-en/shop/tech-takes/post/what-is-hp-labs</a></p><p>&#8226; Silicon Graphics: <a href="https://en.wikipedia.org/wiki/Silicon_Graphics">https://en.wikipedia.org/wiki/Silicon_Graphics</a></p><p>&#8226; Miro: <a href="https://miro.com">https://miro.com</a></p><p>&#8226; VMware: <a href="https://www.vmware.com">https://www.vmware.com</a></p><p>&#8226; Bill Gurley on LinkedIn: <a href="https://www.linkedin.com/in/billgurley">https://www.linkedin.com/in/billgurley</a></p><p>&#8226; Travis Kalanick on X: <a href="https://x.com/travisk">https://x.com/travisk</a></p><p>&#8226; DiDi: <a href="https://web.didiglobal.com">https://web.didiglobal.com</a></p><p>&#8226; The Platform and Program Split at Uber: A Milestone Special: <a href="https://newsletter.pragmaticengineer.com/p/the-platform-and-program-split-at">https://newsletter.pragmaticengineer.com/p/the-platform-and-program-split-at</a></p><p>&#8226; Rewriting Uber Engineering: The Opportunities Microservices Provide: <a href="https://www.uber.com/blog/building-tincup-microservice-implementation">https://www.uber.com/blog/building-tincup-microservice-implementation</a></p><p>&#8226; Up: Portable Microservices Ready for the Cloud: <a href="https://www.uber.com/blog/up-portable-microservices-ready-for-the-cloud/">https://www.uber.com/blog/up-portable-microservices-ready-for-the-cloud</a></p><p>&#8226; How Uber Built its Observability Platform: <a href="https://newsletter.pragmaticengineer.com/p/how-uber-built-its-observability-platform">https://newsletter.pragmaticengineer.com/p/how-uber-built-its-observability-platform</a></p><p>&#8226; The Uber Engineering Tech Stack, Part I: The Foundation: <a href="https://www.uber.com/blog/tech-stack-part-one-foundation">https://www.uber.com/blog/tech-stack-part-one-foundation</a></p><p>&#8226; How Ringpop from Uber Engineering Helps Distribute Your Application: <a href="https://www.uber.com/blog/ringpop-open-source-nodejs-library">https://www.uber.com/blog/ringpop-open-source-nodejs-library</a></p><p>&#8226; PostgreSQL: <a href="https://en.wikipedia.org/wiki/PostgreSQL">https://en.wikipedia.org/wiki/PostgreSQL</a></p><p>&#8226; MySQL: <a href="https://www.mysql.com">https://www.mysql.com</a></p><p>&#8226; Uber&#8217;s Crazy YOLO App Rewrite, From the Front Seat: <a href="https://blog.pragmaticengineer.com/uber-app-rewrite-yolo">https://blog.pragmaticengineer.com/uber-app-rewrite-yolo</a></p><p>&#8226; Hypergrowth startups: Uber and CloudKitchens with Charles-Axel Dein: <a 
href="https://newsletter.pragmaticengineer.com/p/high-growth-startups-uber-and-cloudkitchens">https://newsletter.pragmaticengineer.com/p/high-growth-startups-uber-and-cloudkitchens</a></p><p>&#8226; Coupang: <a href="https://www.aboutcoupang.com">https://www.aboutcoupang.com</a></p><p>&#8226; Nubank: <a href="https://international.nubank.com.br">https://international.nubank.com.br</a></p><p>&#8226; Max Rhodes on LinkedIn: <a href="https://www.linkedin.com/in/max-rhodes">https://www.linkedin.com/in/max-rhodes</a></p><p>&#8226; Sequoia: <a href="https://sequoiacap.com">https://sequoiacap.com</a></p><p>&#8226; Wyan Gretzky&#8217;s quote: <a href="https://www.brainyquote.com/quotes/wayne_gretzky_383282">https://www.brainyquote.com/quotes/wayne_gretzky_383282</a></p><p>&#8212;</p><p>Production and marketing by <a href="https://penname.co/">Pen Name</a>. </p><p></p>]]></content:encoded></item><item><title><![CDATA[What is inference engineering? Deepdive]]></title><description><![CDATA[Many engineers use inference daily, but inference engineering is a bit obscure &#8211; and an area rich with interesting challenges. Philip Kiely, author of the new book, &#8220;Inference Engineering,&#8221; explains]]></description><link>https://newsletter.pragmaticengineer.com/p/what-is-inference-engineering</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/what-is-inference-engineering</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 31 Mar 2026 17:01:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FctC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39fe534-3703-4096-acc3-fcc01d4d5d00_1600x1200.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Two years ago, we learned about <a href="https://blog.pragmaticengineer.com/how-does-chatgpt-work/">how LLMs work</a> at a high level from <a href="https://blog.pragmaticengineer.com/how-does-chatgpt-work/">the ChatGPT team</a>, and today, almost all software engineers use large language models (LLMs) in our day-to-day work. The most visible part of using an LLM is <strong>inference</strong>; when an existing model takes an input (prompt) and generates an output, one token at a time. So, with AI models and AI agents everywhere across the tech industry in 2026, that means so is inference.</p><p><strong>And now, inference engineering is becoming more widespread, too, as open LLM models grow more capable. </strong>This is because with closed models, inference engineering is done only by the AI engineers who build the model, whose number might add up to a few thousand globally. In contrast, with the open models which tech companies are adopting, it&#8217;s possible to tweak them to perform better at inference. 
For example, Cursor built its new Composer 2.0 model <a href="https://newsletter.pragmaticengineer.com/i/192229275/backlash-after-cursor-hides-that-composer-2-is-based-on-open-source-model">on top of</a> the open Kimi 2.5 model, and successfully applied plenty of inference engineering techniques to make it even faster.</p><p>Given this industry-wide prevalence and the need for strong technical performance, it&#8217;s worth understanding as a software engineer what inference engineering actually is, and some interesting approaches worth knowing about.</p><p>For some answers, I turned to <a href="https://x.com/philipkiely">Philip Kiely</a>, a software engineer who has spent four years at the inference startup, Baseten. Drawing on that hard-earned experience, Philip has written an excellent, in-depth book about precisely this topic, <em>&#8220;Inference Engineering.&#8221;</em></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!FctC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff39fe534-3703-4096-acc3-fcc01d4d5d00_1600x1200.jpeg" alt=""><figcaption class="image-caption"><em>My personal copy of Inference Engineering</em></figcaption></figure></div><p>In today&#8217;s issue, we cover:</p><ol><li><p><strong>Setting the stage: why is inference so important? </strong>More capable, widespread, open models are driving demand for inference engineering.</p></li><li><p><strong>What is inference? </strong>As the phase that comes after training a model, the inference layer introduces new engineering challenges like batching, caching, and quantization.</p></li><li><p><strong>When is inference engineering needed?</strong> Investing in this area is typically worth it when your product and usage scale up, and there are product requirements which current, off-the-shelf solutions lack.</p></li><li><p><strong>What hardware does inference use? </strong>Datacenter GPUs are the most common, while on-premises, air-gapped GPUs are also employed.</p></li><li><p><strong>What software does inference use? </strong>Commonly-used software includes NVIDIA&#8217;s CUDA and Dynamo, as well as hardware-agnostic projects like PyTorch and vLLM, which are growing in popularity.</p></li><li><p><strong>What infrastructure does inference need? </strong>Autoscaling is a baseline requirement. Kubernetes is a popular choice for autoscaling inside a cluster, while multi-cloud inference might be necessary for high-scale use cases.</p></li><li><p><strong>Five approaches to make inference faster.
</strong>Quantization (reducing the numerical precision of a model&#8217;s weights), speculative decoding (taking advantage of spare compute to generate &#8220;draft tokens&#8221;), caching, parallelism (tensor parallelism and expert parallelism), and disaggregation (separating the prefill and decode phases to run on separate workers, not the same GPU).</p></li></ol><p>This deepdive uses a few abbreviations and concepts that are everyday lingo for inference engineers, but may be unfamiliar to those less versed in the domain:</p><ul><li><p><strong>CUDA: </strong>Compute Unified Device Architecture. NVIDIA&#8217;s proprietary API to program NVIDIA GPUs for high-performance computing, including LLM-related use cases.</p></li><li><p><strong>TTFT</strong>: time to first token. Think of this as the &#8220;time to process the prompt.&#8221; This metric determines the perceived responsiveness of models and GenAI systems.</p></li><li><p><strong>TPS</strong>: tokens per second. Akin to a model&#8217;s &#8220;typing speed.&#8221;</p></li><li><p><strong>ITL</strong>: intertoken latency. The time between generating one token and the next.</p></li><li><p><strong>KV cache</strong>: key-value cache. The cached results of the attention algorithm, reused between requests to speed up inference. <em>We cover more on the KV cache in the <a href="https://newsletter.pragmaticengineer.com/i/141865286/challenge-1-kv-cache-and-gpu-ram">Scaling ChatGPT deepdive</a>.</em></p></li><li><p><strong>Prefill / decode: </strong>the two phases of inference. Prefill is when the model processes the full input, outputting the KV cache. Decode is the phase in which the model generates one token at a time.</p></li><li><p><strong>MoE</strong>: Mixture of Experts. An architecture that enables models to be pretrained with far less compute. <a href="https://huggingface.co/blog/moe#what-is-a-mixture-of-experts-moe">More details on this approach.</a></p></li></ul>
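<p>To make the three latency metrics concrete, below is a minimal Python sketch of how TTFT, ITL, and TPS fall out of the timestamps of a token stream. The <code>stream_tokens</code> callable and the fake client are hypothetical stand-ins for any streaming inference API; the arithmetic is the point.</p><pre><code class="language-python">import time

def measure_stream(stream_tokens, prompt):
    """Compute TTFT, average ITL, and TPS from a token stream.

    `stream_tokens` is a hypothetical stand-in for any streaming
    inference client: a callable that yields generated tokens one
    at a time for a given prompt.
    """
    start = time.perf_counter()
    arrivals = []
    for _token in stream_tokens(prompt):
        arrivals.append(time.perf_counter())

    ttft = arrivals[0] - start    # time to first token (dominated by prefill)
    total = arrivals[-1] - start  # wall-clock time for the whole response
    n = len(arrivals)
    # average intertoken latency: gap between consecutive tokens during decode
    itl = (arrivals[-1] - arrivals[0]) / (n - 1) if n > 1 else 0.0
    tps = n / total               # tokens per second: the "typing speed"
    return ttft, itl, tps

# Demo with a fake client: ~200 ms of prefill, then a token every ~20 ms.
def fake_stream(prompt):
    time.sleep(0.2)
    for token in ["Inference", " is", " the", " second", " phase", "."]:
        time.sleep(0.02)
        yield token

ttft, itl, tps = measure_stream(fake_stream, "What is inference?")
print(f"TTFT={ttft * 1000:.0f} ms, ITL={itl * 1000:.0f} ms, TPS={tps:.1f}")
</code></pre><p>Note how TTFT is dominated by the prefill phase, while ITL and TPS describe decode behavior.</p>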
<p>Below is an introduction to inference adapted from Philip&#8217;s book, <em>&#8220;Inference Engineering&#8221;</em>, which is <a href="https://baseten.com/inference-engineering">free to download as an e-book</a>. Physical copies are currently sold out, but Philip is printing more as fast as possible.</p><p><em>My usual disclaimer: as with all my recommendations, I was not paid to mention this book, and no links in this article are affiliates. See my <a href="https://blog.pragmaticengineer.com/ethics-statement/">ethics statement</a> for more.</em></p><p><em>With that, it&#8217;s over to Philip:</em></p><h2>1. Setting the stage: why is inference so important?</h2><p>Inference is the most valuable category in the AI industry, but inference engineering is still in its infancy. Inference engineers work across the stack, from CUDA to Kubernetes, in pursuit of faster, less expensive, and more reliable serving of generative AI models in production.</p><p>When ChatGPT launched in late 2022, there were perhaps a few hundred inference engineers in the world, and they didn&#8217;t call themselves that. These specialists mostly worked at frontier labs like OpenAI, Midjourney, and Anthropic, or at big tech companies like Google and NVIDIA.</p><p>Back then, it looked like this might be the way of the AI industry: that training generative AI models would be so hard and expensive that only a handful of companies would develop closed models and thereby require inference engineering for production serving. In that alternate future, the rest of the world would be mere consumers of AI via APIs, renting intelligence a token at a time.</p><p>Three years later, it turns out that training generative AI models is indeed both hard and expensive &#8211; but not so hard and expensive as to be limited to a handful of players. Instead, a proliferation of open models &#8211; more than two million and counting on <a href="https://huggingface.co/">Hugging Face</a> (the &#8220;GitHub for AI&#8221;) &#8211; means that today every engineer can deploy their own intelligence to power AI products.</p><p>Research labs around the world, from OpenAI and NVIDIA Nemotron in America, to Mistral AI and Black Forest Labs in Europe, to Alibaba Qwen, DeepSeek AI, Z AI, and Moonshot AI in China, regularly release open models of all modalities.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Hir-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fb80caa-61e3-4346-bafc-3ffda6aa18bf_1600x1305.png" alt=""><figcaption class="image-caption">Well over two million open models on Hugging Face, 25 times more than five years ago</figcaption></figure></div><p>Even as closed models get smarter and cheaper, the movement toward open models is accelerating. The two categories differ in the availability of their weights:</p><ul><li><p><strong>Closed model: </strong>A proprietary model whose weights are unavailable to the public, like GPT-5 and Claude Sonnet.</p></li><li><p><strong>Open model: </strong>A model whose weights are publicly available, like Llama or DeepSeek, and which is usually released under the MIT license or a similar permissive license (some models restrict commercial use, so always double-check license terms).</p></li></ul>
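<p>To make the distinction tangible: because an open model&#8217;s weights are public, anyone can download them and run inference locally. Here is a minimal sketch using Hugging Face&#8217;s transformers library; the model ID is just an example of a small open model, and any open checkpoint works the same way:</p><pre><code class="language-python"># Minimal sketch: pull public weights from Hugging Face and run local inference.
# Assumes `pip install transformers torch`; the model ID below is an example.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
result = generator("What is inference engineering?", max_new_tokens=64)
print(result[0]["generated_text"])
</code></pre><p>Nothing comparable is possible with a closed model: its weights never leave the provider&#8217;s infrastructure.</p>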
<p>Before December 2024, there was a meaningful gap in intelligence between closed and open models, but when DeepSeek V3 and R1 were released, that gap disappeared. <em>Note from Gergely: we previously covered <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-122-deepseek-rocks-the">how DeepSeek&#8217;s release rocked the AI industry.</a></em></p><p>Today, new closed models are matched by open models within months, if not weeks, and occasionally, open models like Kimi K2 Thinking even exceed closed models&#8217; capabilities for brief periods.</p><p>Even though open models are constantly chasing closed models on benchmarks, they nonetheless change the equation for AI product builders. And as both types improve, closed and open models cross capability thresholds.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!TyQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0874a12e-2bf7-4ab8-80ce-19ba8db78283_1600x1096.png" alt=""><figcaption class="image-caption"><em>Open and closed models improve rapidly, making new products possible</em></figcaption></figure></div><p>In 2022, it was impossible to build the kinds of AI-native products that define the industry today. But over time, closed models got smarter, and new categories like customer service voice agents and AI-powered IDEs became possible. The early models were slow, expensive, and unreliable, but the capabilities were there, and AI engineers began building companies around them.</p><p><strong>As open models crossed the same capability thresholds, these builders began using them to replace closed models.</strong> Many also began fine-tuning open models to cross capability thresholds faster, and even exceed closed-model quality in their specific product and domain.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!k0BW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F776ec68f-41aa-401d-8f4f-8afb231ab590_1600x1096.png" alt=""><figcaption class="image-caption"><em>Customizing open models retains control over latency, reliability, and economics</em></figcaption></figure></div><p>Switching to open models means the opportunity to use inference engineering to make the models powering AI products better in new ways:</p><ul><li><p><strong>Latency:</strong> Closed model APIs are built for throughput, but open models can be optimized for real-time applications.</p></li><li><p><strong>Availability: </strong>While APIs for GPT and Claude are stuck at two nines of uptime, it&#8217;s possible to achieve four nines or better with dedicated
deployments of open models.</p></li><li><p><strong>Cost:</strong> Open models are often at least 80 percent less expensive at scale.</p></li></ul><p>So, whereas three years ago inference engineering looked like a niche field, today every company aiming to build truly differentiated and competitive AI products needs an inference strategy.</p><p>AI-native startups like Cursor, Clay, Gamma, and Mercor are redefining hypergrowth by building products that rely on open and in-house models. Leading digital-native companies like Notion and Superhuman succeed by deeply integrating AI capabilities into their category-defining products.</p><p>Elsewhere, a new generation of blended research and engineering teams &#8211; World Labs, Writer, Mirage, and dozens more &#8211; are building businesses by training and productizing their own foundation models.</p><p>Adoption is strong even in enterprise and regulated industries, which historically were slow to adopt new technologies. Companies like OpenEvidence, Abridge, and Ambience are making generative AI ubiquitous in healthcare, while at the world&#8217;s largest companies, AI initiatives are moving past the pilot stage into massive user adoption. Market-wide demand for inference means that everyone from developers to executives has the opportunity to learn inference engineering and use it to advance their career and business.</p><p>I&#8217;ve been incredibly fortunate to have a front-row seat in the fastest-moving market in history over the last four years at Baseten, where we power mission-critical inference for the best AI products, including every company listed in the previous paragraphs.</p><p><strong>The good news is that you are early. </strong>There are still relatively few professionals working on inference, and newcomers can become experts quickly. Also, the potential and impact of inference are becoming ever clearer, but the domain is still in its infancy. That means there are enormous opportunities to solve novel, interesting, and deeply technical problems at all levels of the stack.</p><h2>2. What is inference?</h2><p>Inference is the second phase of a generative AI model&#8217;s lifecycle:</p><ul><li><p><strong>Training:</strong> The process of learning model weights from data.</p></li><li><p><strong>Inference:</strong> Serving generative AI models in production.</p></li></ul><p>During the past decade&#8217;s machine learning (ML) boom, hundreds of thousands of data scientists and ML engineers became familiar with the full lifecycle of training and inference for ML models.</p><p>Inference for classic ML models is relatively straightforward. In the early days of Baseten, we ran inference for models built with tools like XGBoost on lightweight CPUs with a simple software stack.</p><p>In contrast, inference for generative AI models is complex. You can&#8217;t simply take model weights, get some GPUs, and expect inference to be fast and reliable enough for large-scale production use.
Doing inference well requires three layers:</p><ul><li><p><strong>Runtime:</strong> Optimizing the performance of a single model on a single GPU-backed instance.</p></li><li><p><strong>Infrastructure:</strong> Scaling across clusters, regions, and clouds without creating silos, while maintaining excellent uptime.</p></li><li><p><strong>Tooling:</strong> Providing engineers working on inference with the right level of abstraction to balance control with productivity.</p></li></ul><p>These three layers must work together to create a system that can handle mission-critical inference at scale.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!Aqry!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcba5fc58-2204-4293-820e-c6b60467e165_1595x1600.png" alt=""><figcaption class="image-caption">A complete inference stack includes runtime and infrastructure optimizations</figcaption></figure></div><p>The runtime layer ensures that an individual model running on a GPU (or across several GPUs in a single instance) runs as performantly and efficiently as possible. This layer depends on a sophisticated software stack, from CUDA to PyTorch to inference engines like vLLM, SGLang, and TensorRT-LLM.
Low-level optimization is important, with kernels like FlashAttention delivering significant performance gains.</p><p>The runtime layer relies on a number of model performance techniques that apply new research to the challenges of inference on generative AI models:</p><ul><li><p><strong>Batching:</strong> Run incoming requests in parallel, weaving them together on a token-by-token basis to increase throughput.</p></li><li><p><strong>Caching:</strong> Reuse the KV cache &#8211; the cached results of the attention algorithm &#8211; between requests that share prefixes.</p></li><li><p><strong>Quantization:</strong> Lower the precision of select pieces of the model to access more compute and reduce memory burden.</p></li><li><p><strong>Speculation:</strong> Generate and validate draft tokens to produce more than one token per forward pass during decode.</p></li><li><p><strong>Parallelism:</strong> Efficiently leverage more than one GPU to accelerate large models without introducing new bottlenecks.</p></li><li><p><strong>Disaggregation:</strong> Separate the two phases of LLM inference, prefill and decode, onto independently scaling workers.</p></li></ul>
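<p>Several of these techniques are a configuration flag away in a modern inference engine. Below is a hedged sketch using vLLM&#8217;s offline API: passing a list of prompts exercises batching, <code>enable_prefix_caching</code> turns on KV cache reuse across shared prefixes, <code>quantization</code> lowers weight precision, and <code>tensor_parallel_size</code> shards the model across GPUs. The model ID and flag values are examples only; supported options vary by vLLM version and hardware (FP8 quantization, for instance, requires recent GPUs).</p><pre><code class="language-python"># Sketch of runtime techniques via vLLM's offline API (flags vary by version).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # example open model
    tensor_parallel_size=2,            # parallelism: shard weights across 2 GPUs
    quantization="fp8",                # quantization: lower weight precision
    enable_prefix_caching=True,        # caching: reuse KV cache for shared prefixes
)

params = SamplingParams(temperature=0.7, max_tokens=128)
prompts = [
    "Explain batching in one sentence.",
    "Explain quantization in one sentence.",
]
# Batching: the engine weaves these requests together token by token.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
</code></pre><p>Speculation and disaggregation are similarly configuration-level features in the major engines, though the exact setup is more involved.</p>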
<p>These model performance techniques apply to all modalities, not just LLMs: vision language models, embedding models, automatic speech recognition, speech synthesis, image generation, and video generation all extend the capabilities of AI systems and require their own inference optimizations. But these runtime optimizations are not enough: no matter how performant a single instance of a model server is, it will eventually receive more traffic than it can handle. This is not a CUDA problem or a PyTorch problem; it&#8217;s a systems problem that needs to be solved at the infrastructure layer.</p><p>The nature of infrastructure problems changes at each level of scale. At first, the problems are around autoscaling: knowing when to add and remove replicas, and figuring out how to do so quickly.</p><p><strong>Past a certain scale &#8211; generally a few hundred GPUs &#8211; infrastructure problems are defined by capacity.</strong> To get access to enough GPUs, inference engineers begin spreading workloads across multiple regions and cloud providers. This quickly leads to silos, where models in one cluster may be starved for resources while other clusters have unused capacity. The final level of scale in infrastructure is a global system that treats all available resources as a single unified pool of compute.</p><p>Thoughtful multi-cloud infrastructure also improves reliability, protecting against downtime in any individual region or cloud provider. And for global applications, running inference near end users improves end-to-end latency.</p><p>Once these runtime and infrastructure capabilities are built, they need to be presented at the appropriate level of abstraction. Inference providers like Baseten and internal teams building inference need to consider what tooling and developer experience to provide as the critical third layer in a complete inference platform.</p><p>Of course, developer experience is subjective. For inference, one extreme is the black box: give a platform model weights, and get back an API. At the other extreme is providing only basic constructs for compute, network, disk, and so forth.</p><p>The right developer experience is somewhere in the middle, where inference engineers have enough control to run mission-critical inference confidently, and enough abstraction to work productively.</p><p>This article &#8211; an excerpt of <em>Inference Engineering</em> &#8211; presents an overview of the technologies and techniques that power inference across all three layers: runtime, infrastructure, and tooling.</p><h2>3. When is inference engineering needed?</h2><p>Inference engineering adds speed and scale to AI products by optimizing the production serving of generative models. Optimization means identifying the best solution from a range of options.</p><p>Before optimizing model performance and building robust infrastructure, you need to know what &#8220;best&#8221; means for your product; many performance improvements come from making tradeoffs in latency, throughput, and quality. In practice, optimization is often about finding the right balance, rather than maximizing a single factor.</p><p>For example, NFL players are big, fast, and strong. But they&#8217;re not as big as sumo wrestlers, as fast as Olympic sprinters, or as strong as champion powerlifters. Their bodies and skills are optimized to fulfill the specific demands of their position over the course of a full season.</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!BjhW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5cd0e2b2-8865-4806-9c03-58289d752f05_1600x805.png" alt=""><figcaption class="image-caption"><em>Like elite athletes, inference services must be specialized for the demands of their workloads</em></figcaption></figure></div><p>Similarly, your inference system must be optimized to fulfill the specific demands of your model, product, and traffic.
The more constraints you can introduce, the better the outcome you can achieve.</p><p>You should know:</p><ul><li><p><strong>Model requirements: </strong>Which model(s) do you need to run inference on?</p></li><li><p><strong>Application interface: </strong>How will inputs be delivered to the model, and how is the output expected to be formatted?</p></li><li><p><strong>Latency budget: </strong>How fast does your product need to respond to a user action, end-to-end?</p></li><li><p><strong>Unit economics: </strong>How much does it make sense to spend on a per-request, per-user, or per-month basis?</p></li><li><p><strong>Usage patterns: </strong>How many concurrent users are you serving, and is there any pattern to their usage (e.g., more activity during business hours)?</p></li></ul>
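<p>To make the unit economics question concrete, here is a back-of-the-envelope sketch. Every number is an illustrative assumption, not a benchmark; substitute your own GPU pricing and measured throughput:</p><pre><code class="language-python"># Back-of-the-envelope inference unit economics.
# All numbers are illustrative assumptions, not benchmarks.
GPU_COST_PER_HOUR = 4.00    # assumed $/hour for one GPU-backed replica
THROUGHPUT_TPS = 2_000      # assumed total tokens/second at a healthy batch size
TOKENS_PER_REQUEST = 1_000  # assumed average input + output tokens

tokens_per_hour = THROUGHPUT_TPS * 3600
cost_per_million_tokens = GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000
cost_per_request = cost_per_million_tokens * TOKENS_PER_REQUEST / 1_000_000

print(f"${cost_per_million_tokens:.2f} per million tokens")  # ~$0.56
print(f"${cost_per_request:.5f} per request")                # ~$0.00056
</code></pre><p>Runs like this also show why throughput optimizations matter so much: doubling tokens per second halves the cost per token.</p>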
From variations between cloud providers to NVIDIA&#8217;s idiosyncratic naming conventions, there are many nuances in selecting the right accelerator.</p><h2>5. What software does inference use?</h2><p>NVIDIA&#8217;s market dominance in the inference space is in no small part due to the robust, mature software ecosystem around its hardware. Hardware iteration cycles are slow. Best-in-class hardware companies like Apple and NVIDIA release new architectures and generations at most annually, with two-year release cycles being more common. But software iteration is fast. Often, to run a newly released open model on day zero, you need to install a nightly build or other pre-release version of each software dependency just to get support for the new model.</p><p>Software&#8217;s fast iteration cycle and lower barrier to entry dramatically expand the landscape of inference engineering. There are countless companies building software at various levels of the inference stack, in contrast to hardware, which centers on NVIDIA and a few competitors.</p><p>For inference engineers, these are some of the key software players:</p><ul><li><p><strong>NVIDIA:</strong> Invests heavily in its own sometimes-proprietary software ecosystem, from CUDA up to Dynamo.</p></li><li><p><strong>Hugging Face:</strong> Maintains a model registry for all open models, plus <a href="https://huggingface.co/docs/transformers/en/index">transformers</a> (for models built on the <a href="https://en.wikipedia.org/wiki/Transformer_(deep_learning)">transformer architecture</a>) and <a href="https://huggingface.co/docs/diffusers/index">diffusers</a> (for <a href="https://en.wikipedia.org/wiki/Diffusion_model">diffusion-based</a> generative models).</p></li><li><p><strong>The Linux Foundation:</strong> Maintains hardware-agnostic projects like PyTorch and vLLM.</p></li><li><p><strong>LMSYS Org:</strong> Develops essential tools for inference and evaluation, most notably SGLang.</p></li></ul><p>There are thousands more companies, universities, and research institutions making essential open-source contributions to inference. Over time, technologies have been built at increasing levels of abstraction:</p><ul><li><p><strong>CUDA: </strong>Direct communication with the GPU for explicit control over computations and memory.</p></li><li><p><strong>Deep learning frameworks: </strong>Abstractions over CUDA for training, exporting, and running neural networks in Python.</p></li><li><p><strong>Inference engines: </strong>Highly configurable PyTorch-backed inference for common architectures.</p></li><li><p><strong>NVIDIA Dynamo: </strong>Sits on top of inference engines to power large-scale deployments.</p></li></ul><p>Most inference engineering today happens at the higher levels of abstraction, configuring and deploying inference engines and orchestrating inference across multiple GPUs. No matter which level of the stack you work at, it&#8217;s essential to have a strong mental model for the adjacent levels of abstraction to guide your work.</p>
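<p>To make the &#8220;higher levels of abstraction&#8221; concrete, here is what offline inference looks like through the open-source vLLM engine. The model name and sampling settings are illustrative; the point is how much batching, KV cache management, and kernel work hides behind a few lines:</p><pre><code>from vllm import LLM, SamplingParams

# Load an open model from the Hugging Face registry into the vLLM
# inference engine; vLLM handles batching, KV cache management, and
# GPU kernels behind this interface.
llm = LLM(model="facebook/opt-125m")  # illustrative small model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Inference engineering is"], params)
print(outputs[0].outputs[0].text)</code></pre>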
<h2>6. What infrastructure does inference need?</h2><p>When you scale production traffic, your assumptions are rigorously tested. Everything from input and output sequence lengths, to traffic patterns, to the topics users decide to chat about impacts your observed performance in production. And maintaining secure, robust infrastructure is an entirely different skillset from optimizing model inference on the GPU.</p><p>No matter how fast and efficiently a single instance can serve a model, the service will be overwhelmed if traffic gets high enough. That is an infrastructure problem, not a PyTorch or CUDA problem, and it requires a different mindset and different technologies.</p><p>Scaling in production introduces new complexities: where and how to get GPUs, how to balance traffic across them, and how to prevent downtime. The goal of autoscaling is to ensure you always have enough resources to serve all incoming requests, while maintaining latency SLAs and without wasting money on idle GPUs.</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/58d466c2-fc03-4920-80ba-909b6d08a471_1600x939.png" alt="" /><figcaption><em>Without autoscaling, inference systems waste resources during traffic lulls and miss SLAs during traffic spikes</em></figcaption></figure>
<figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png" alt="" /><figcaption><em>A strong autoscaling system for inference matches resources to demand</em></figcaption></figure>
src="https://substackcdn.com/image/fetch/$s_!5hww!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png" width="1456" height="854" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5hww!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png 424w, https://substackcdn.com/image/fetch/$s_!5hww!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png 848w, https://substackcdn.com/image/fetch/$s_!5hww!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png 1272w, https://substackcdn.com/image/fetch/$s_!5hww!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5223ec4d-8b1e-4eb4-86f2-364e026cf470_1600x939.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>A strong autoscaling system for inference matches resources to demand</em></figcaption></figure></div><p>Autoscaling systems use Kubernetes, an open-source container orchestration system, along with a cluster-level system for provisioning and deallocating compute. 
Kubernetes can run one or more replicas of a model container, each on its own instance. An instance includes the GPUs and other hardware resources that the container requires.</p><p>Unless your traffic is unusually consistent, there probably isn&#8217;t a specific number of replicas that perfectly matches your needs.</p><p>Autoscaling is the practice of dynamically adjusting the number of replicas allocated to a given model within a cluster. There are two ways to make autoscaling decisions:</p><ul><li><p><strong>Utilization:</strong> Scale up or down based on GPU utilization signals like memory usage or compute usage.</p></li><li><p><strong>Traffic:</strong> Scale up and down based on the number of requests being processed in the system.</p></li></ul><p>Utilization and traffic don&#8217;t always match. For example, in LLM prefill, a few requests with hundreds of thousands of uncached input tokens could cause much higher utilization than many small requests with high cache hit rates.</p><p>Traffic-based scaling decisions can be made proactively, while utilization is a lagging indicator. Use both in combination to keep system resources matched with demand.</p><p>When designing a traffic-based autoscaling system, you want to configure five factors:</p><ul><li><p><strong>Min replicas:</strong> What is the minimum number of replicas that stay running, regardless of traffic?</p></li><li><p><strong>Max replicas:</strong> What is the maximum number of replicas you can allocate when traffic is high?</p></li><li><p><strong>Autoscaling window:</strong> How long is the sliding timeframe used to measure traffic and make autoscaling decisions?</p></li><li><p><strong>Scale-down delay:</strong> For how long after a scale-down is suggested do you wait, in case there&#8217;s another traffic spike?</p></li><li><p><strong>Concurrency target:</strong> How many requests can each replica handle at once?</p></li></ul><p>The exact configuration determines how well the autoscaling system achieves its goals of maintaining latency SLAs without wasting resources. For example, increasing the scale-down delay prevents premature scale-downs during spiky traffic, but could result in unnecessary spend after traffic has properly cooled down.</p>
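<p>As a toy illustration of how those five knobs interact, here is a hypothetical traffic-based decision function. The names, signals, and thresholds are invented; in practice this logic lives in the cluster&#8217;s autoscaler, not in application code:</p><pre><code>import math

MIN_REPLICAS = 2          # floor, kept warm regardless of traffic
MAX_REPLICAS = 64         # ceiling for traffic spikes
CONCURRENCY_TARGET = 8    # requests each replica should handle at once
SCALE_DOWN_DELAY_S = 300  # wait before acting on a suggested scale-down

def desired_replicas(inflight: float) -> int:
    # `inflight` is the average number of in-system requests, measured
    # over the sliding autoscaling window (not an instantaneous sample).
    want = math.ceil(inflight / CONCURRENCY_TARGET)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, want))

def next_replica_count(current: int, inflight: float, seconds_low: float) -> int:
    want = desired_replicas(inflight)
    if want >= current:
        return want                   # scale up immediately on demand
    if seconds_low >= SCALE_DOWN_DELAY_S:
        return want                   # scale down only after the delay
    return current                    # hold steady in case traffic spikes again</code></pre>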
<p>Autoscaling within a single cluster works up to a certain point, but high-volume deployments serving a global user base need thousands of GPUs distributed around the world.</p><p>It&#8217;s straightforward to build multi-cloud inference as a collection of siloed compute across different cloud providers. But in these setups, there&#8217;s no way to use inter-cloud compute fluidly, and moving workloads across clouds is a tedious, error-prone process.</p><p>True multi-cloud inference requires building a multi-region, multi-provider bin-packing tool, which treats distinct pools of compute as fungible with each other. Like Kubernetes within a single cluster, multi-cloud capacity management must take a global view, enabling global scheduling.</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/40620524-e497-4f72-9f40-c84605d4461f_1600x1007.png" alt="" /><figcaption><em>A multi-cloud approach extends the idea of control and workload planes to a multi-cluster, multi-region system</em></figcaption></figure>
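<p>A toy greedy packer shows the &#8220;fungible pools&#8221; idea; the pool data and the one-GPU-per-replica assumption are purely illustrative, and a real scheduler would also weigh price, user proximity, and data-sovereignty constraints:</p><pre><code>pools = [
    # Hypothetical capacity pools: (provider, region, free GPUs)
    ("cloud-a", "us-east", 48),
    ("cloud-b", "eu-west", 16),
    ("cloud-a", "ap-south", 24),
]

def place(replicas_needed: int, pools: list) -> dict:
    """Greedily pack replicas into whichever pools have the most free
    GPUs, treating all pools as interchangeable (one GPU per replica)."""
    placement, remaining = {}, replicas_needed
    for provider, region, free in sorted(pools, key=lambda p: -p[2]):
        if remaining == 0:
            break
        take = min(free, remaining)
        placement[(provider, region)] = take
        remaining -= take
    if remaining:
        raise RuntimeError(f"short {remaining} replicas of capacity")
    return placement

# 60 replicas -> 48 on cloud-a/us-east, 12 on cloud-a/ap-south
print(place(60, pools))</code></pre>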
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>A multi-cloud approach extends the idea of control and workload planes to a multi-cluster, multi-region system</em></figcaption></figure></div><p>Running true multi-cloud inference unlocks:</p><ul><li><p><strong>Capacity:</strong> Pool capacity from multiple providers for greater, more flexible GPU access.</p></li><li><p><strong>Redundancy: </strong>Split inference across providers for resiliency against outages.</p></li><li><p><strong>Latency:</strong> Run inference close to end users to reduce network latency overhead.</p></li><li><p><strong>Compliance: </strong>Run inference in compliance with data sovereignty and other regulatory requirements.</p></li></ul><p>Scaling from one cluster in one cloud to many clusters in many clouds requires a new coordination layer. A multi-cloud architecture contains:</p><ul><li><p><strong>Control plane:</strong> Handles model deployment and global scaling decisions, receives real-time event streams.</p></li><li><p><strong>Workload planes:</strong> Handles direct inference traffic and in-cluster scaling decisions, reports utilization and demand.</p></li></ul><p>This separation of responsibilities ensures that individual workload planes can serve traffic independently. If something happens to the control plane or any given workload plane, other workloads should be unaffected.</p><h2>7. Five approaches to make inference faster</h2><p>One of the coolest things about working in inference engineering is that, unlike many industries where new academic research takes years or decades to be adopted, techniques from new papers are live in production within months or even weeks.</p><p>But there is a gap to bridge between research and production, and some of the most visible inference engineering work of all comes from doing so.</p><p>Real-world traffic defies constraints. But with volume, you can adapt systems over time to match the changing nature of usage. Tuning the parameters of inference engines, speculation algorithms, and model servers isn&#8217;t a one-time task. Instead, either through iterative deployments or dynamic runtime adjustments, you can continuously improve the performance of an inference system.</p><p>Finding the right combination of techniques and configurations takes patient experimentation. 
I remember an internal hackathon during which one of Baseten&#8217;s inference engineers worked on an autocomplete model for code, and ended up trying 77 different configurations via a handwritten script before finding a non-obvious solution that doubled TPS (tokens per second) for a customer&#8217;s model.</p><p>Sometimes, techniques are symbiotic or incompatible, which makes inference optimization even more complex. For example, quantizing the KV cache alleviates a bottleneck in disaggregation, but increasing batch size reduces the compute available for speculation. An inference engineer&#8217;s challenge is always to create a balanced set of optimizations that delivers more than the sum of its parts.</p><p>Let&#8217;s look into the key categories of applied research for inference acceleration: quantization, speculation, caching, parallelism, and disaggregation.</p><h3>Approach #1: Quantization</h3><p>Quantization means reducing the numerical precision of a model&#8217;s weights. It improves latency (both TTFT [time to first token] and TPS), increases system throughput, and opens up headroom for other optimizations like disaggregation, speculation, and prefix caching to be even more effective. But when it goes wrong, quantization can materially reduce a model&#8217;s output quality.</p><p>Models are trained with weights, activations, and other components represented in a certain native number format. Usually, this is <a href="https://en.wikipedia.org/wiki/Bfloat16_floating-point_format">BF16</a> or <a href="https://en.wikipedia.org/wiki/Half-precision_floating-point_format">FP16</a>, although 8-bit and 4-bit native precisions are becoming more popular for training.</p><p>Post-training quantization works by changing those model weights and other values from their native number format to a lower-precision format. Cutting precision in half improves performance in both phases of inference:</p><ul><li><p><strong>Prefill: </strong>Compute-bound prefill now runs on lower-precision Tensor Cores with twice the FLOPS.</p></li><li><p><strong>Decode: </strong>Memory-bound decode now loads half as much data per value, effectively doubling memory bandwidth.</p></li></ul><p>Working with quantized data introduces overheads, so going from 16 to 8 bits is not linearly twice as fast. In practice, quantizing down a single level of precision generally offers 30%-50% better performance for LLMs. The catch is that quantization runs the risk of reducing a model&#8217;s output quality, and has the potential to introduce precision errors throughout the calculations that power inference.</p><p>Precision errors compound over time. 
Consider what happens when you square and cube different precisions of Pi:</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/e8988ece-35ae-4377-9551-ca6adc714317_902x308.png" alt="Squaring and cubing Pi at different precisions" /></figure>
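<p>You can reproduce the effect with NumPy&#8217;s half-precision type; this quick illustration is mine, not the article&#8217;s figure:</p><pre><code>import math
import numpy as np

pi16 = np.float16(math.pi)   # rounds to 3.140625 immediately
pi64 = np.float64(math.pi)

approx, exact = pi16, pi64
for power in (2, 3):
    approx = np.float16(approx * pi16)   # re-round after every multiply
    exact = exact * pi64
    print(f"pi**{power}: fp16={float(approx):.6f} "
          f"fp64={float(exact):.6f} err={abs(float(approx) - float(exact)):.5f}")

# The absolute error grows roughly tenfold between pi**2 and pi**3:
# every low-precision operation compounds the rounding before it.</code></pre>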
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Most of the work in quantization is in preventing precision errors and minimizing their impact on the final model output.</p><p>Sixteen-bit, 8-bit, and 4-bit precisions are the primary formats for inference. Number formats contain:</p><ul><li><p><strong>Precision:</strong> The number of bits used to express a single value in the format. For example, FP16 uses 16 bits.</p></li><li><p><strong>Type:</strong> Whether these bits are interpreted to represent an integer (non decimal) or a floating-point number (decimal).</p></li><li><p><strong>Scale factor:</strong> A multiplier used to map values from a low-precision format back to the higher-precision format.</p></li></ul><p>Combined, these attributes determine the two factors behind how well a number format represents the values used in inference:</p><ul><li><p><strong>Dynamic range:</strong> The difference between the lowest and highest value that can be represented in the format.</p></li><li><p><strong>Granularity:</strong> The number of parameters or other values that are quantized along a single scale factor.</p></li></ul><p>Dynamic range is essential to low-precision inference without quality loss. Sixteen bits can represent 65,536 distinct values, while 8 bits can only represent 256 different values. The dynamic range is the distribution of these values &#8211; the difference between the smallest and largest available value.</p><p>Dynamic range explains why floating-point formats are better than integer formats for inference. Floating-point formats have three properties:</p><ul><li><p><strong>Sign: </strong>A single bit that represents whether a number is positive or negative.</p></li><li><p><strong>Exponent: </strong>A set of bits that, taken together, represent an exponent factor.</p></li><li><p><strong>Mantissa: </strong>A set of bits that together represent the base value multiplied by two to the exponent.</p></li></ul><p>An FP8 number in an E4M3 data format means it has a 4-bit exponent and a 3-bit mantissa, with the remaining bit for the sign. 
<figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/689475aa-baa7-44f7-9ad9-8cb5d3dda6b6_1600x933.png" alt="" /><figcaption><em>Floating-point number formats have exponent and mantissa bits, along with the sign bit</em></figcaption></figure>
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Floating-point number formats have exponent and mantissa bits, along with the sign bit</em></figcaption></figure></div><p>The exponent in floating-point numbers gives it a higher dynamic range, meaning it can better express very large and very small numbers. This is important because outlier values are significant in inference, and floating-point number formats better represent outliers after quantization.</p><p>Within floating-point formats, there are multiple options at each precision, like FP4, MXFP4, and NVFP4. These formats differ in granularity, or in the number of values quantized by a single scale factor.</p><p>Quantization can be applied at three levels:</p><ul><li><p><strong>Tensor level: </strong>Calculate a single scale factor for the entire QKV tensor.</p></li><li><p><strong>Channel level: </strong>Calculate a different scale factor for each feature vector within the tensor.</p></li><li><p><strong>Block level: </strong>Within each feature vector, divide the vector into blocks of N values and calculate a scale factor for each block.</p></li></ul><p>More granular quantization has a lower chance of smoothing over outliers, which preserves quality. However, more granularity also introduces extra overhead for storing and applying scale factors.</p><p>The components of a model have varying sensitivities to quantization. Reducing the precision of more sensitive components runs a higher risk of quality degradation. From the least to most sensitive components:</p><ol><li><p><strong>Weights: </strong>the linear layers are least sensitive to quantization.</p></li><li><p><strong>Activations: </strong>The intermediate output of activation functions are only somewhat sensitive to quantization. 
<p>The components of a model have varying sensitivities to quantization. Reducing the precision of more sensitive components runs a higher risk of quality degradation. From the least to the most sensitive components:</p><ol><li><p><strong>Weights: </strong>The linear layers are least sensitive to quantization.</p></li><li><p><strong>Activations: </strong>The intermediate outputs of activation functions are only somewhat sensitive to quantization. They are rarely quantized, as they are such a tiny fraction of the model&#8217;s weights.</p></li><li><p><strong>KV cache: </strong>The cached values from the attention calculation are moderately sensitive to quantization.</p></li><li><p><strong>Attention: </strong>The attention layers of a model are highly sensitive to quantization, especially operations like softmax.</p></li></ol><p>Within each component, you can get more selective about quantization.</p><p>Even in linear layers and activations &#8211; generally the least sensitive to quantization due to their size &#8211; early and late layers, like the input and output layers of the neural network, may be left in their original precision, as these layers are more sensitive.</p><p>While quantizing weights and activations helps performance, KV cache quantization gives an additional boost to techniques like prefix caching and disaggregation. The KV cache is a valuable resource, and quantizing it allows inference engines to store more of it in memory and read it more quickly.</p><p>However, the KV cache for each token is used by each subsequent token. This means precision errors introduced by quantization can compound from token to token. Compounding errors are exactly why attention layers are the riskiest to quantize: not only is attention very sensitive to dynamic range, but each attention calculation relies on the results of each previous attention calculation. Therefore, over a sequence of thousands of tokens, errors accumulate quickly.</p><p>All but the most aggressive quantization schemes run functions like softmax in their original precision.</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/bbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png" alt="" /><figcaption><em>Quantization risk is low for weights and activations, moderate for KV cache, and high for attention</em></figcaption></figure>
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1600,&quot;width&quot;:1453,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C71B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png 424w, https://substackcdn.com/image/fetch/$s_!C71B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png 848w, https://substackcdn.com/image/fetch/$s_!C71B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!C71B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbf0898f-e872-4167-8d6b-9e019982e65a_1453x1600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Quantization risk is low for weights and activations, moderate for KV cache, and high for attention</em></figcaption></figure></div><p>A moderate approach to low-precision inference uses a format like FP8 with high dynamic range &#8211; if possible, a microscaling format like MXFP8 &#8211; to carefully quantize select linear layers, activations, and often KV cache values. 
Even with these high dynamic range formats, components of the attention layer are rarely quantized.</p><h3>Approach #2: Speculative decoding</h3><p>The decode phase of LLM inference is an autoregressive process in which tokens are generated one at a time. The bottleneck on decode is memory bandwidth, with compute sitting idle at low-to-moderate batch sizes as weights are read from memory.</p><p>Speculative decoding takes advantage of that spare compute to try to generate multiple tokens per forward pass through the target model. If an inference engine could generate two, three, or even more tokens for each round-trip of weights through memory, it would generate far more tokens per second. Note that speculative decoding only improves TPS / ITL (inter-token latency), not TTFT (time to first token).</p><p>There are multiple algorithms for speculative decoding, and they share a common mechanism:</p><ol><li><p>The speculator generates one or more <strong>draft tokens</strong>.</p></li><li><p>The <strong>target model</strong> &#8211; the underlying model that you&#8217;re trying to accelerate &#8211; performs <strong>validation</strong> on these tokens to check if they match what the model would generate.</p></li><li><p>The target model accepts any valid draft tokens and generates an additional token itself, completing the forward pass.</p></li></ol><p>This generates N+1 tokens per forward pass, or iteration through the decode loop, where N is the number of accepted draft tokens.</p><p>Generating draft tokens is not free; it takes both compute and memory. However, it is much faster for a target model to validate a draft token than to generate an original token. If you imagine a sudoku puzzle, solving it is hard, but checking whether the solution is correct is very easy. 
For the target model, generating a token is like solving a sudoku, while validating a draft token is like checking a finished sudoku.</p><p>The performance uplift from any speculative decoding strategy depends on three factors:</p><ol><li><p><strong>Draft token cost: </strong>Time taken to generate a draft token.</p></li><li><p><strong>Draft sequence length: </strong>The number of draft tokens generated per forward pass.</p></li><li><p><strong>Token acceptance rate:</strong> The percentage of draft tokens accepted by the target model.</p></li></ol><p>Token acceptance rate is high early in the draft sequence, but draft tokens get less reliable deeper in the sequence.</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/2aa828ac-64d7-41cb-9cf3-37a308efabfb_1182x1600.png" alt="" /><figcaption>Speculative decoding from draft token generation and validation to prefix acceptance with subsequent token generation</figcaption></figure>
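<p>In code, the draft-verify-accept cycle looks roughly like the greedy sketch below. Here <code>draft</code> and <code>target</code> are stand-ins for models that return the next token; a real engine verifies all draft positions in one batched forward pass rather than one call per token:</p><pre><code>def speculative_step(target, draft, ctx, k=4):
    """One decode iteration: propose k draft tokens, keep the accepted
    prefix, then let the target add one token of its own (N+1 total)."""
    proposed, seq = [], list(ctx)
    for _ in range(k):
        tok = draft(seq)            # cheap speculator proposes
        proposed.append(tok)
        seq.append(tok)

    accepted, seq = [], list(ctx)
    for tok in proposed:
        if target(seq) != tok:      # verification (batched in practice)
            break                   # first rejection discards the rest
        accepted.append(tok)
        seq.append(tok)

    accepted.append(target(seq))    # the target's own bonus token
    return accepted</code></pre><p>Under the simplifying assumption that each draft position is accepted with probability p given that the previous ones were, a length-k draft yields p + p^2 + ... + p^k accepted tokens plus the bonus token per pass &#8211; which is why short, high-acceptance drafts tend to win.</p>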
<p><strong>Aim for short, high-percentage sequences</strong> because while generating and validating draft tokens is inexpensive relative to generating tokens in the original model, it still comes with meaningful overhead. Additionally, once a single draft token is rejected as wrong, all subsequent tokens in the sequence are also rejected.</p><p>Working with speculation is interesting because so many factors affect token acceptance rate. The big one is the temperature parameter &#8211; higher temperatures yield token distributions that are harder to predict, reducing the effectiveness of speculative decoding. But even factors as simple as subject matter can make a difference to acceptance rate, if the draft model or additional decoder head used for speculation is better versed in, say, math than history.</p><p>Another limitation of speculative decoding is that it&#8217;s most useful at low batch sizes, where there are spare compute cycles. At higher batch sizes, speculative decoding must be dynamically disabled, as compute is too saturated to afford verification.</p><p>Each speculation algorithm navigates these tradeoffs differently, and careful implementation of the right algorithm for the situation can lead to major improvements in TPS.</p><h3>Approach #3: Caching</h3><p>During prefill, the inference engine builds a KV cache (a store of keys and values for each token) on the input sequence. It then updates the KV cache for each token during decode. 
As inference is autoregressive, the value for each new token depends on the value of every previous token in the sequence.</p><p>Every inference engine uses KV caching by default on a request-by-request basis. Without KV caching, LLM inference would be unbearably slow, since each previous value in the entire sequence would need to be recalculated for each subsequent token.</p><p>However, engineers can get more utility from the KV cache by reusing it between requests, rather than solely within each inference sequence.</p><p>Consider the following two prompts, each with four tokens on most tokenizers:</p><figure><img src="https://substack-post-media.s3.amazonaws.com/public/images/a294de82-8f97-475a-baa2-fb0e9bb83329_1600x970.png" alt="" /><figcaption><em>A pair of four-token sequences with two-token matching prefixes</em></figcaption></figure>
<p>However, engineers can get more utility from the KV cache by reusing it between requests, rather than solely within each inference sequence.</p><p>Consider the following two prompts, each with four tokens on most tokenizers:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1jj0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa294de82-8f97-475a-baa2-fb0e9bb83329_1600x970.png"><img src="https://substackcdn.com/image/fetch/$s_!1jj0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa294de82-8f97-475a-baa2-fb0e9bb83329_1600x970.png" width="1456" height="883" alt="" loading="lazy" /></a><figcaption class="image-caption"><em>A pair of four-token sequences with two-token matching prefixes</em></figcaption></figure></div><p>By default, the inference engine has to run prefill on all four tokens of each prompt. But the first two tokens of each prompt &#8211; &#8220;Weather in&#8221; &#8211; form a shared prefix between the pair.</p><p>With prefix caching, you can reuse the KV cache from the first request to improve TTFT on the second request, by skipping prefill on the first two tokens and reading in the existing KV cache instead.</p><p>When you see pay-per-token APIs charge less for &#8220;cache hit&#8221; input tokens than for &#8220;cache miss&#8221; tokens, this is why: reusing cached tokens takes very little compute power and time. As an inference engineer, you can apply the same principle to reduce latency, improve throughput, and therefore save money on your own deployments.</p><p>Saving two tokens won&#8217;t make a big impact on TTFT, but prefix caching can skip prefill on thousands of tokens in certain domains:</p><ul><li><p><strong>Complex system prompts: </strong>Agents, customer-facing chatbots, RAG scaffolds, and tool calls often feature long, complex system prompts on every call.</p></li><li><p><strong>Code completion: </strong>Code completion, code generation, and other coding functions require passing the same thousands of lines of code as shared context.</p></li><li><p><strong>Documents and retrieval: </strong>Document summarization, question answering, and retrieval all add repeated context ahead of user prompts.</p></li><li><p><strong>Multi-turn conversations: </strong>Ordinary conversations repeat back every message in a chat template, increasing the savings from prefix caching with every turn.</p></li></ul>
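<p>Here&#8217;s a hedged sketch of how an engine can detect reusable prefixes across requests, loosely modeled on block-based automatic prefix caching (as in engines like vLLM); the block size and data structures are illustrative:</p><pre><code># Illustrative block-based prefix cache: each full block of tokens is
# hashed together with its entire prefix, so a block only hits the
# cache when everything before it matches too.

BLOCK = 16                  # tokens per cached KV block (illustrative)
cache = {}                  # block hash -> stored KV block (stubbed)

def prefix_block_hashes(tokens):
    full = len(tokens) - len(tokens) % BLOCK
    return [hash(tuple(tokens[: i + BLOCK])) for i in range(0, full, BLOCK)]

def prefill_with_cache(tokens):
    reused = 0
    for h in prefix_block_hashes(tokens):
        if h in cache:
            reused += BLOCK              # skip prefill for this block
        else:
            break                        # first miss ends the reusable prefix
    for h in prefix_block_hashes(tokens):
        cache.setdefault(h, "kv-block")  # store blocks for later requests
    return reused, len(tokens) - reused  # (cache-hit tokens, tokens to prefill)

shared_system_prompt = list(range(64))
print(prefill_with_cache(shared_system_prompt + [900, 901]))  # (0, 66): cold
print(prefill_with_cache(shared_system_prompt + [902, 903]))  # (64, 2): warm</code></pre><p>Two requests that share a long system prompt hit the same leading blocks, which is exactly the effect the domains listed above exploit.</p>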
<p>Prefix caching works from the start of the input sequence until the first non-repeated token. The fourth token in the weather example, a question mark, is shared between the two input sequences. However, the prefix ends at the first non-repeated token, so the fourth token isn&#8217;t read from cache.</p><p>Since prefixes end at the first unique token, your context engineering determines TTFT savings. Consider a different approach to the same prompt:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ehf3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65f49c0-53a8-49cf-bade-4144a02a728b_1600x871.png"><img src="https://substackcdn.com/image/fetch/$s_!ehf3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb65f49c0-53a8-49cf-bade-4144a02a728b_1600x871.png" width="1456" height="793" alt="" loading="lazy" /></a><figcaption class="image-caption"><em>A pair of four-token sequences with no prefix match. The first tokens are different, so it doesn&#8217;t matter that the next three are the same</em></figcaption></figure></div>
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>A pair of four-token sequences with no prefix match. The first tokens are different, so it doesn&#8217;t matter that the next three are the same</em></figcaption></figure></div><p>Here, there is no savings from prefix caching, as the very first token differs between the two sequences, even though every subsequent token is the same. To take advantage of prefix caching, ensure that novel tokens are as late in your prompt as possible.</p><h3>Approach #4: Parallelism</h3><p>Tensor Parallelism (TP) should be your default strategy for multi-GPU model inference. 
<h3>Approach #4: Parallelism</h3><p>Tensor Parallelism (TP) should be your default strategy for multi-GPU model inference. It supports dense models like Llama 405B, and the MoE (mixture of experts) models that currently dominate the open model landscape.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hyIE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3fca43a-177d-483f-aaaa-09a77d7ded48_1600x635.png"><img src="https://substackcdn.com/image/fetch/$s_!hyIE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3fca43a-177d-483f-aaaa-09a77d7ded48_1600x635.png" width="1456" height="578" alt="" loading="lazy" /></a><figcaption class="image-caption"><em>Tensor Parallelism splits weights across GPUs, effectively sharing VRAM resources to run large models fast</em></figcaption></figure></div>
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Tensor Parallelism splits weights across GPUs, effectively sharing VRAM resources to run large models fast</em></figcaption></figure></div><p>TP works by splitting apart each layer of the model (as opposed to Pipeline Parallelism, which keeps layers intact) and distributing the layer fragments across the allocated GPUs. For each layer, the expense of reading from weights&#8217; memory and executing matrix multiplication is shared across the GPUs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!upQ4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!upQ4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 424w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 848w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 1272w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!upQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png" width="1456" height="578" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!upQ4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 424w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 848w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 1272w, https://substackcdn.com/image/fetch/$s_!upQ4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ba6857f-6b2f-43a1-8bb7-d6bd5956cfba_1600x635.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>For an MoE models, each expert runs across multiple GPUs with Tensor Parallelism</em></figcaption></figure></div><p>However, the results of each layer need to be communicated in an all-reduce fashion (across all eight GPUs) into a single output before the next layer can be computed. 
<p>Increasing Tensor Parallelism improves TPS on a per-user basis, assuming the model is large enough and the sequences are long enough that the communication overhead doesn&#8217;t outweigh the faster forward pass &#8211; which is the case for most frontier models.</p><p>Expert Parallelism (EP) neatly divides experts across GPUs, so that in a model with 128 experts served in EP8 across eight GPUs, each GPU hosts 16 full experts.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ky6k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f4985f5-2ca5-46d3-82e1-dcd6b5d79e11_1600x635.png"><img src="https://substackcdn.com/image/fetch/$s_!ky6k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f4985f5-2ca5-46d3-82e1-dcd6b5d79e11_1600x635.png" width="1456" height="578" alt="" loading="lazy" /></a><figcaption class="image-caption"><em>EP runs each expert within a single GPU, with each GPU hosting multiple experts</em></figcaption></figure></div>
<p>EP improves total system throughput, making inference more scalable and less expensive. With individual experts processing tokens separately, each token takes just as long, but the system as a whole can handle more simultaneous tokens.</p>
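<p>A toy sketch of EP placement and routing for that 128-expert, EP8 example &#8211; the hash-based router here is only a stand-in for the model&#8217;s learned gating layer:</p><pre><code># Expert Parallelism: 128 experts over 8 GPUs, 16 full experts each.

NUM_EXPERTS, NUM_GPUS = 128, 8
PER_GPU = NUM_EXPERTS // NUM_GPUS            # 16

def gpu_for_expert(expert_id):
    # Contiguous placement: experts 0-15 on GPU 0, 16-31 on GPU 1, ...
    return expert_id // PER_GPU

def route(token_id, top_k=2):
    # Stand-in for the replicated expert router; in a real model this
    # is a learned gating layer, not a hash.
    return [(token_id * 31 + i * 17) % NUM_EXPERTS for i in range(top_k)]

# Group a batch of tokens by destination GPU (the all-to-all exchange).
dispatch = {g: [] for g in range(NUM_GPUS)}
for tok in range(6):
    for expert in route(tok):
        dispatch[gpu_for_expert(expert)].append((tok, expert))

for gpu, work in dispatch.items():
    print(gpu, work)</code></pre>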
<p>Many deployments use a mix of TP and EP to achieve both benefits.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lp5z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29868580-3456-4211-bad8-74d93a75ac3e_1600x859.png"><img src="https://substackcdn.com/image/fetch/$s_!Lp5z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29868580-3456-4211-bad8-74d93a75ac3e_1600x859.png" width="1456" height="782" alt="" loading="lazy" /></a><figcaption class="image-caption"><em>This deployment uses TP for attention and EP for the sparse MoE layer</em></figcaption></figure></div>
<p>EP requires less inter-GPU communication than Tensor Parallelism. The Expert Router, which determines which experts each token activates, is replicated onto each GPU, as it is a relatively small component of the model. Inter-GPU communication is necessary for passing tokens from expert to expert, but unlike in TP, it is not required to collect the results of each layer.</p><p>Thanks to this lower communication overhead, EP scales well to multi-node deployments and systems with limited interconnect bandwidth.</p><h3>Approach #5: Disaggregation</h3><p>Disaggregation combines three important ideas in inference engineering:</p><ol><li><p>Prefill is a compute-bound process that determines the time to first token (TTFT), while decode is a memory-bound process that determines TPS.</p></li><li><p>Specialization improves performance in everything from kernel selection to inference engine parameter tuning.</p></li><li><p>You can effectively parallelize model serving over multiple GPUs, or even multiple nodes, if you can avoid bottlenecks from lower-bandwidth interconnects.</p></li></ol><p>When prefill and decode run on the same node under heavy traffic, they have a higher chance of interfering with one another. Ideally, prefill uses more compute resources, while decode uses more memory, and the two can co-exist efficiently. However, with larger batches and more compute-intensive optimizations, prefill and decode start competing for resources.</p>
<p>Disaggregation, or disaggregated serving, is the idea of separating prefill and decode into separate engines on separate GPUs or nodes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lh6b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90917382-7d84-4ddf-806f-a514cd61d581_1600x1014.png"><img src="https://substackcdn.com/image/fetch/$s_!lh6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90917382-7d84-4ddf-806f-a514cd61d581_1600x1014.png" width="1456" height="923" alt="" loading="lazy" /></a><figcaption class="image-caption">Disaggregation assigns prefill workers to generate the first token and decode workers to generate subsequent tokens</figcaption></figure></div>
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Disaggregation assigns prefill workers to generate the first token and decode workers to generate subsequent tokens</figcaption></figure></div><p>Disaggregation turns LLM inference into a three-step process:</p><ol><li><p>The prefill engine takes the input sequence and generates a KV cache while computing the first token.</p></li><li><p>The prefill engine sends the KV cache over the hardware interconnect to the decode engine.</p></li><li><p>The decode engine computes all subsequent tokens.</p></li></ol><p>In conditional disaggregation, the request is first sent to the decode engine, which checks if the input sequence is already cached, or is short enough to handle locally:</p><ol><li><p>If so, the decode engine handles prefill locally, skipping disaggregation.</p></li><li><p>If not, the decode engine transfers the request to the prefill engine for disaggregated serving.</p></li></ol><p>Conditional disaggregation is better for real-world traffic.</p><p>Another benefit of disaggregation is that with separate prefill and decode engines, you can optimize each engine individually and the system as a whole. For example, the compute-bound prefill engine requires a lower TP than the memory-bound decode engine.</p><h2>Takeaways</h2><p><em>This is Gergely again. 
<p>Another benefit of disaggregation is that with separate prefill and decode engines, you can optimize each engine individually, and the system as a whole. For example, the compute-bound prefill engine requires a lower TP degree than the memory-bound decode engine.</p><h2>Takeaways</h2><p><em>This is Gergely again.</em> Thanks to <a href="https://x.com/philipkiely">Philip</a> for this deepdive into inference engineering, which is around 10% of the contents of his new book, <em>&#8220;Inference Engineering&#8221;</em>. If you&#8217;d like to go deeper into this topic, you can download the full book for free:</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://baseten.com/inference-engineering&quot;,&quot;text&quot;:&quot;Get the full e-book, for free&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://baseten.com/inference-engineering"><span>Get the full e-book, for free</span></a></p><p>This title will also be available in physical, printed form: sign up to the <a href="https://www.baseten.co/inference-engineering/paper-waitlist/">waitlist</a> to be notified when it&#8217;s available.</p><p><strong>It&#8217;s encouraging that inference engineering is no longer a &#8220;monopoly&#8221; belonging to a few leading AI labs. </strong>Top AI model makers like OpenAI and Anthropic control all aspects of their AI models &#8211; from training to inference &#8211; so there&#8217;s no inference engineering to be done when using them.</p><p>However, thanks to increasingly capable open models, engineering teams have the opportunity to tweak how they use models, and this is where the theory and practice of inference engineering becomes invaluable.</p><p><strong>Even so, the discipline of inference engineering still seems to only make sense for a subset of tech companies. </strong>To justify investment in inference engineering, you need to be spending big money on inference from vendors. This is the point at which it can make sense to invest time and money to see if you can set up your own inference stack on top of open models, and swap out some existing usage.</p><p><strong>I wonder if inference engineering is the AI version of the &#8220;build vs buy&#8221; dilemma. </strong>For software-as-a-service (SaaS), the question for every company is whether to build it in-house, or buy from a vendor. For example, should you build project management software (it&#8217;s possible!), or just buy an existing one? And what about feature flagging, not to mention observability?</p><p>Experienced engineers all understand the pros and cons of building it yourself (time and maintenance, which are a constant drag). Tuning and operating your own LLM stack is a much newer field, and inference engineering is at the heart of building better inference stacks than what comes &#8220;out of the box&#8221; with open models.</p><p><strong>Picking up the basics of inference engineering feels like a valuable skill &#8211; and it&#8217;s also new and interesting. </strong>If you become well-versed in inference engineering, you could create optionality for your own team and company in LLM usage. Running your own inference stack on top of an open model gives you control over what you&#8217;re running, and over pricing. Inference engineering helps create options for achieving better performance from an open model, by using the approaches covered in the extract above from Philip&#8217;s book.</p>]]></content:encoded></item><item><title><![CDATA[The Pulse: is GitHub still best for AI-native development?]]></title><description><![CDATA[Poor availability has dogged GitHub for months and raises questions about its status and focus. 
Plus, Microsoft promises Windows will not be &#8220;Microslop&#8221;, a massive LLM supply chain attack, and more]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 26 Mar 2026 17:23:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!X6bS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48b5fd12-15b7-46b8-acd4-d78d55ef2fe4_1576x432.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Today, we cover:</p><ol><li><p><strong>Does GitHub still merit &#8220;top git platform for AI-native development&#8221; status?</strong> Availability has dropped to one nine (~90% &#8211; !!), partly due to not being able to handle increased traffic from AI coding agents. There&#8217;s also no CEO and an apparent lack of direction.</p></li><li><p><strong>Should a tool auto-add itself as a contributor to PRs? </strong>Claude Code and GitHub Copilot auto-add themselves to commits, which is effectively free advertising. Codex and OpenCode purposely do not.</p></li><li><p><strong>Microsoft promises Windows will not be &#8220;Microslop.&#8221; </strong>After years of forced Copilot integrations, Start menu ads, and mandatory Microsoft accounts, the Windows team is promising to undo the self-inflicted damage done to the OS. It&#8217;s better late than never, but why did Microsoft allow the &#8220;Microslop&#8221; perception to stick around so long?</p></li><li><p><strong>Industry pulse. </strong>Massive LLM supply chain attack via LiteLLM, backlash after Cursor forgets to mention that Composer 2 is based on an open source model, what happens when you stop reviewing AI code, OpenAI kills Sora, and more.</p></li></ol><h2>1. Does GitHub still merit &#8220;top git platform for AI-native development&#8221; status?</h2><p>We&#8217;re used to highly reliable systems that target four nines of availability (99.99%, meaning about 52 minutes of downtime per year), and to it being embarrassing to barely hit three nines (around 9 hours of downtime per year). And yet, in the past month, GitHub&#8217;s reliability is down to one nine!</p><p>Here&#8217;s data from the third-party &#8220;<a href="https://mrshu.github.io/github-statuses/">missing GitHub status page</a>&#8221;, which was built after GitHub stopped updating its own status page due to terrible availability. Recently, things have looked poor:</p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[“How to be a 10x engineer” – interview with a standout dev]]></title><description><![CDATA[An interview with an engineer who has no public GitHub contributions and sets clear boundaries &#8211; and yet didn&#8217;t need to apply for positions when searching for a job, because referrals found him]]></description><link>https://newsletter.pragmaticengineer.com/p/how-to-be-a-10x-engineer-interview</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/how-to-be-a-10x-engineer-interview</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 24 Mar 2026 18:26:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wORh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83f9d2f-711f-4edd-bc8a-303b8de422e5_1600x1300.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It was at Uber that I met one of the best engineers I&#8217;ve had the fortune to work with; let&#8217;s call them &#8220;Sam&#8221; for this article. As engineers, we briefly worked together, and when I became a manager, Sam&#8217;s name regularly came up during <a href="https://newsletter.pragmaticengineer.com/p/performance-calibrations">performance calibrations</a> as being among the company&#8217;s top 10% of engineers. One year, he was in the &#8220;top, top&#8221; bucket reserved for the best 3% of engineers.</p><p>After I left Uber, we stayed in touch, and a few months ago I heard he was exploring his next opportunity, and found out from him that <strong>Sam&#8217;s job search looked nothing like most people&#8217;s:</strong> he didn&#8217;t apply for a single role. Instead, there were reachouts from former colleagues desperate to hire him.</p><p>When we talked for this article, Sam had three warm leads that wanted to interview him ASAP. One startup was not even hiring, but the founder was ready to create a new position just for him.</p><p>I posted a message on LinkedIn about Sam:</p><blockquote><p>&#8220;I hate the term &#8220;10x engineer&#8221; but this engineer is a role model for what a standout engineer is - in fact, some of my writing of standout engineers reference my interactions with folks like them (e.g. my article on the product-minded engineer, this one: https://lnkd.in/et7nWBgW)</p><p>And still, from the outside, this engineer is nearly completely invisible.</p><p>No social media footprint. The LinkedIn profile lists his companies worked at, and nothing else: no technologies, no projects, nothing. 
Their GitHub is empty for the last 5 years, and has perhaps a dozen commits throughout the last 10 years.&#8221;</p></blockquote><p>This is Sam&#8217;s GitHub contribution graph for the last several years: absolutely nothing.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wORh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83f9d2f-711f-4edd-bc8a-303b8de422e5_1600x1300.png"><img src="https://substackcdn.com/image/fetch/$s_!wORh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb83f9d2f-711f-4edd-bc8a-303b8de422e5_1600x1300.png" width="1456" height="1183" alt="" /></a><figcaption class="image-caption">Zero public contributions: behind the profile, one of the best software engineers I&#8217;ve worked with</figcaption></figure></div>
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Zero public contributions: Behind the profile, one of the best software engineers I&#8217;ve worked with</figcaption></figure></div><p>One of the most upvoted comments on my post was by cloud technologist <a href="https://www.linkedin.com/feed/update/urn:li:activity:7381615884282462209?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7381615884282462209%2C7381647792106270721%29&amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287381647792106270721%2Curn%3Ali%3Aactivity%3A7381615884282462209%29">Olivier Frolovs</a>, who requested an article on Sam for others to learn how he operates. Now, Sam has generously agreed to an interview and has asked to remain anonymous, hence the <em>nom de plume </em>(pseudonym)<em>.</em></p><p>He doesn&#8217;t seek public attention and has a strong professional reputation. During our chat, he offered pointers for engineers looking to excel, and also as proof that an empty GitHub profile and zero social media presence don&#8217;t mean you can&#8217;t be a truly standout developer.</p><p>I also interviewed one of his former managers from Uber for that perspective. Today, we cover:</p><ol><li><p><strong>Getting things done.</strong> High-level task breakdowns, combined with communicating delays as tradeoffs to stakeholders.</p></li><li><p><strong>Setting boundaries.</strong> Saying &#8220;no,&#8221; prioritizing family or work &#8211; and being clear about it, and treating prioritization as a daily practice.</p></li><li><p><strong>Office politics.</strong> Participate selectively and cautiously, build relationships with influential colleagues, pre-sell ideas, direct communication.</p></li><li><p><strong>Negotiation and conflict. </strong>Approach engineers before their managers, build bottom-up consensus, and start with relationship-building.</p></li><li><p><strong>Promotions, keeping up-to-date, and finding the next job. </strong>A personal take on The Big Tech promotion processes, keeping up with the industry, and relying on referrals more.</p></li><li><p><strong>Becoming a manager. </strong>Ownership and independence separate &#8220;good&#8221; engineers from &#8220;great&#8221; ones.</p></li><li><p><strong>Feedback from Sam&#8217;s ex-manager. 
</strong>A former manager reveals what made Sam stand out to them &#8211; and shares some potential growth opportunities.</p></li></ol><h2>Background</h2><p>In this article, my questions are in <em>italics</em>, and Sam&#8217;s answers are in normal text.</p><p><em>Sam, how did you get into tech?</em></p><p>I was always intrigued by computers, software, and &#8211; as it came around &#8211; the internet. My dad had a black-and-white screen laptop for work with Windows 3.1 and some games. We got our first personal computer with Windows 95, and I remember it vividly.</p><p>I started developing websites for my primary school and the company my mom worked at when I was 12. I got paid a small sum, so that was my first-ever paid programming project! Around then, I taught myself Visual Basic 6 and started building 2D and 3D mini-games. I found the website <a href="http://directx4vb.vbgamer.com/">DirectXVB</a> (which is still live today), emailed the website&#8217;s owner with issues I ran into, and they helped me with pointers. Later, I taught myself PHP and built more dynamic websites.</p><p>From the age of 14, I stopped all coding &#8211; I just got tired of it! &#8211; and focused on my studies, and getting into college. I chose a non-computer science major for university, but picked up coding on the side and rediscovered that spark. So, in the first year of my Master&#8217;s, I decided to switch and do a Computer Science Bachelor&#8217;s. It was during college that I started to build apps and websites, and it&#8217;s when I got truly hooked on software development.</p><h3>From agency to large company, then Uber</h3><p>I joined an agency as their first hire, building apps for local companies. It was a small team, we learned by trial and error, and finishing at 2-3 o&#8217;clock in the morning was common enough. I stayed 18 months and learned a lot about ownership, the importance of an eye for detail, and collaborating with others.</p><p><strong>My favorite part of the job was the few times I worked directly with a designer</strong>: our agency employed freelance designers who were not involved in most of the projects, because the company was trying to save money by having them work less and stay out of planning and rollouts. But during the implementation phase, I&#8217;d find myself talking with the designer and bouncing implementation and design ideas around.</p><p>I then joined a small startup where we built our own product. A highlight was having two designers on the team fulltime, whom I could work with and learn from. Engineering also felt like a level up: everyone cared about software quality and UX details.</p><p>Our startup got acquired by a larger company and most of us moved to the Bay Area. We stayed together as a team and were told we would maintain a &#8220;startup culture.&#8221; The founders tried their best to stay true to their word, but they couldn&#8217;t shield us from the reality of working in a corporation.</p><p><strong>I learned a lot about corporate processes, and it was more interesting than I&#8217;d expected. </strong>As I was getting closer to the senior engineer level, I had to understand how internal politics worked, how to &#8220;massage&#8221; peer teams to help support our proposals, and how to talk with engineering leaders like senior managers and directors. Our company was also hugely focused on the annual company event: it was eye-opening for me to see just how much effort went into preparation. 
It consisted of several rehearsals and dedicated engineering work to showcase our stuff in a way that was near-flawless on the day.</p><p>After a few years, I felt ready for a change and joined Uber. I took a &#8220;title cut,&#8221; something akin to the &#8220;<a href="https://newsletter.pragmaticengineer.com/p/the-seniority-rollercoaster">seniority rollercoaster.</a>&#8221; At Uber, I worked in a new area and got promoted several times. After Uber, I worked at another Big Tech, and now &#8211; very recently &#8211; I&#8217;ve begun at a startup.</p><h2>1. Getting things done</h2><p><em>Feedback at Uber about you during performance calibrations was that you&#8217;re excellent at getting things done. What&#8217;s your process?</em></p><p>When I started out as a junior dev, I pulled long hours so I could deliver on time &#8211; regardless of how much effort it took. I don&#8217;t know what it was, but I always felt that failing to deliver on time was <em>never</em> an option.</p><p>I still vividly remember one project where I worked incredibly hard but still failed to deliver with the quality I expected from myself. As embarrassing as it is, I was so exhausted that I almost started crying on the spot. One of my coworkers comforted me and told me:</p><blockquote><p>&#8220;Man, you&#8217;re crying about the wrong thing. No one died, no one got hurt, and no one will even care that we&#8217;re a few days late, save for the project manager. But even he&#8217;s used to everything being late. Go home, have some sleep, come back tomorrow and take it easy.&#8221;</p></blockquote><p>They were right, of course. Still, I&#8217;m pretty sure this inner pressure to be unsatisfied with &#8220;good enough&#8221; explains a lot about how I work.</p><p><strong>Later in my career, my &#8220;secret&#8221; has been a high-level breakdown of the work, combined with communicating with stakeholders</strong>. After a few years as a dev, my estimation skills got better and I had to pull fewer late nights. A hack I found that greatly helped was doing a high-level breakdown as early as possible, in <em>all</em> cases. As soon as I understand what the work is, I break it all down, ideally on a whiteboard or paper.</p><h3>Importance of communication</h3><p><em>You were also seen as a strong communicator, whether it was with engineers, engineering managers, or product managers. How do you get your point across?</em></p><p>Communicating delays as &#8220;tradeoffs&#8221; works extremely well. As soon as I start a project that I&#8217;m the lead on, I establish communication channels with key stakeholders &#8211; product managers, my engineering leadership, and business stakeholders &#8211; via email or Slack. I keep them in the loop at least weekly, and flag anything that could be a roadblock.</p><p><strong>In my experience, delays are not an issue as long as they are communicated upfront with an explanation and potential alternatives.</strong> When we hit a roadblock that slows down our work, I would never communicate that we&#8217;re &#8220;behind&#8221;. I would offer alternatives like:</p><ul><li><p>We can still ship on time, but we&#8217;d need to cut X and Y features for this release</p></li><li><p>If we are not comfortable cutting X and Y features, then we will need to push out the target date by 2 weeks. 
If we are only comfortable cutting one of them, we can push it out by just 1 week</p></li></ul><p>The trick, I&#8217;ve found, is to make it clear to stakeholders that we have a <em>choice</em>: keep every feature and ship later, or drop a lower-priority feature and ship on time.</p><p>I learned most of my hacks from people who are good at getting things done, and they have a few attributes:</p><ul><li><p><strong>Task breakdown:</strong> early in my career, there was a senior engineer who was methodical about breaking down tasks and making estimates, even for seemingly trivial projects &#8211; and it worked!</p></li><li><p><strong>Communication tools: </strong>I observed the few <em>really</em> organized product managers, engineering managers, and tech leads, and made their communication styles into a &#8220;package&#8221; that worked for me; things like email updates, facilitating kickoff meetings, launch announcements (including how to communicate a failed/sunset project as a successful launch), and more.</p></li></ul><p>Being good at communication means having a solid foundation, then developing a feel for how best to utilize the tools you have. There&#8217;s no &#8220;one-size-fits-all&#8221; approach: people react better or worse to different things. Try to get to know folks around you and put yourself in their shoes.</p><h3>Doing great work</h3><p><em>What does &#8220;standout&#8221; work look like to you?</em></p><p><strong>I think about the quality of my work similarly to the quality of work I do at home. </strong>I have moved house and renovated several times, and I greatly care about the quality of that work. And I&#8217;ve seen plenty of contractors come to my place, perform their work, and then leave without actually caring about the quality. They just want to &#8220;get s*** done&#8221; and be out of there. I never understood how someone can keep doing their job without feeling a lot of love for it!</p><p><strong>I need to get energy from everything I do, not just in my job. </strong>Whether it&#8217;s playing games with my kids, helping my wife with her website, or building a new website feature for the company I work for: I approach it all with the same attitude.</p><p>Equally, if I no longer get energy from the work I do, then I basically stop enjoying it, and this can be a nudge to start looking for something else. If it continues for a long time, this urge becomes more persistent, and that&#8217;s the point at which I have switched companies or teams. I can go on for some time without getting energy from my work, but it drains me. I try to catch myself before it gets too bad, and I&#8217;ve managed to do so, up to now. This is why I quit my last job without having anything lined up: I stopped getting energy from it for many months and talked with my management chain about it, but they were unable or unwilling to change anything. I needed a change, so it was me who made it.</p><h3>Stepping outside of domain expertise</h3><p><em>You frequently went outside of your domain, working with engineering teams on other platforms and contributing to codebases you&#8217;re not expert in. You seemed to have a great relationship with most engineers, in contrast to some other devs. 
How did you do this?</em></p><p>I am pretty curious and prefer to talk directly with engineers. So, when I&#8217;d work on a project with engineers on a different stack, I would ask them to explain their high-level architecture approaches, and roll up my sleeves to make small code changes in a stack I was unfamiliar with.</p><p>Once you understand the high-level structure of a different codebase, and you also know how to make a few small changes, suddenly, it&#8217;s so much easier to figure things out on your own!</p><p><strong>An approach that consistently worked for me is tackling problems from the customer&#8217;s perspective, and being genuinely curious.</strong> For example, I might ping an engineer working on a different system and ask:</p><blockquote><p>&#8220;I noticed a customer has this problem, and to fix it, we probably need to touch the system you own. I don&#8217;t know much about this system: can you explain how it works, and what we could perhaps do to solve this issue that causes frustration for the customer?&#8221;</p></blockquote><p>By making it clear that my goal is to solve a customer problem, I&#8217;m not coming across as just digging around for nothing. And by making it clear that I&#8217;d like to learn from them, I avoid being seen as someone trying to second-guess what they are doing, which could come across as arrogant &#8211; especially when the other engineer is the expert on their own system. I&#8217;ve found fellow engineers are happy to explain their understanding and decisions.</p><h2>2. Setting boundaries</h2><p><em>At Uber, I recall you were very good at setting boundaries and saying &#8220;no.&#8221; How do you do that?</em></p><p><strong>Honestly, I find it tough to say no &#8211; but I learned that it&#8217;s worse when I don&#8217;t. </strong>I found that saying &#8216;yes&#8217; to everything usually results in an unmanageable, unbalanced pile of work. Prioritizing is key: I always remind myself to focus on what matters most. For me, the &#8220;most important&#8221; thing for any given topic could be:</p><ul><li><p>A shipping deadline that needs to be hit and is non-negotiable</p></li><li><p>Family needs</p></li><li><p>An urgent task that needs to be done on the same day</p></li></ul><p><strong>Family is very important to me. </strong>When I worked at Uber, I had a reasonably long commute to the office. I blocked out my calendar so I could leave on time in order to be home for dinner with my family. This did not mean I stopped work immediately; I would sometimes work during my commute and, when necessary, I logged back on to continue working after my kids were in bed.</p><p>When we had important deadlines at work, I agreed with my partner that I would stay longer in the office, because I knew it was important to put in extra effort and deliver standout work then.</p><p>It goes back to prioritizing and focusing on the most important thing. Looking back, I&#8217;d say most of the time, the most important thing for me was family, and work only overrode this every now and then.</p><p>My approach to prioritizing keeps changing, though. Demands at home keep changing and expectations at work also change; after Uber, other jobs increasingly focused on async and remote work. 
This meant more flexibility to accommodate family time &#8211; but work could spill over into evening hours if I did not finish everything.</p><p><strong>If I can give one piece of advice, it&#8217;s to understand what is important for </strong><em><strong>you</strong></em><strong>. </strong>Know your number one, number two, and number three priorities, and arrange your workday so you get the top ones done. Don&#8217;t compromise on the most important one!</p><h2>3. Office politics</h2><p><em>At work, how plugged in were you to office politics?</em></p><p><strong>I was aware of politics and tried to build relationships with &#8220;influential&#8221; people. </strong>I try to stay away from &#8220;cocky&#8221; types, and to figure out how to achieve what I want through different folks.</p><p>The importance of politics is something I really started to understand when working at Uber. Initially, I was ignorant, but the more experience I got in Big Tech, the more obvious it became. It took a while before I was able to participate in it. I never liked it; I tend to be direct and transparent, but that does not work in every situation.</p><p><em>Did you take part in it to get stuff done?</em></p><p>Yes, sometimes by being direct and transparent, and communicating the right amount of information, you can get a lot done. Occasionally, it required me to &#8220;massage&#8221; an idea with multiple people before going to the person who called the shots.</p><p><em>What is your view of engineers who are seen as &#8220;political&#8221;?</em></p><p>It&#8217;s part of the game and sometimes it&#8217;s useful to have a good relationship with those people, as you can use that for your own benefit, as well. I personally would never invest much time in understanding and practicing politics, as I prefer to focus on building product.</p><h2>4. Negotiation &amp; conflict</h2><h3>Negotiating with teams</h3><p><em>You were perceived as being good with other teams, and at removing roadblocks for your own. How did you approach this?</em></p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/how-to-be-a-10x-engineer-interview">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Pulse: Is the FDE role becoming less desirable?]]></title><description><![CDATA[Also: AI-agent generated pull requests cause headaches for large open source projects, OpenAI acquires the creator of uv, a sudden Cursor price hike annoys some enterprise customers, and more]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-is-the-fde-role-becoming</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-is-the-fde-role-becoming</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 19 Mar 2026 17:45:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zoD4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e2af820-b16c-41c7-8e80-15563de1864f_1400x1094.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Today, we cover:</p><ol><li><p><strong>Is the FDE role becoming less desirable? </strong>Job postings for Forward Deployed Engineers (FDEs) have surged, but many professionals don&#8217;t want the role because it&#8217;s more like solutions&#8230;</p></li></ol>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-is-the-fde-role-becoming">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Building WhatsApp with Jean Lee]]></title><description><![CDATA[Jean Lee, engineer #19 at WhatsApp, on scaling the app with a tiny team, the Facebook acquisition, and what it reveals about the future of engineering.]]></description><link>https://newsletter.pragmaticengineer.com/p/building-whatsapp-with-jean-lee</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/building-whatsapp-with-jean-lee</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Wed, 18 Mar 2026 17:20:22 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/191213865/da03f33b54b9fc288fb0794ada97a07b.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<h3>Stream the latest episode</h3><p><strong>Listen and watch now on <a href="https://youtu.be/5Kn32cIWPSY">YouTube</a>, <a href="https://open.spotify.com/episode/56bXJZveAm2QfPViN8FPuk">Spotify</a>, and <a href="https://podcasts.apple.com/us/podcast/the-pragmatic-engineer/id1769051199">Apple</a>.</strong> See the episode transcript at the top of this page, and timestamps for the episode at the bottom.</p><h3><strong>Brought to You by</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gh57!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gh57!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 424w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 848w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1272w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png" width="800" height="70" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:70,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.pragmaticengineer.com/i/185094534?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gh57!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 424w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 848w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1272w, https://substackcdn.com/image/fetch/$s_!Gh57!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9835d46-a4d0-40e1-a16b-dba8068fd6ad_800x70.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>&#8226; <strong><a href="http://statsig.com/pragmatic">Statsig</a></strong> &#8211; &#8288; The unified platform for flags, analytics, experiments, and more. Stop switching between different tools, and have them all in one place.</p><p>&#8226; <strong><a href="https://www.sonarsource.com/pragmatic/?utm_medium=paid&amp;utm_source=pragmaticengineer&amp;utm_campaign=ss-ai&amp;utm_content=podcast-sonar-ai-lp&amp;utm_term=ww-all-x&amp;s_category=Paid&amp;s_source=Paid%20Other&amp;s_origin=pragmaticengineer">Sonar</a></strong> &#8211; The makers of SonarQube, the industry standard for automated code review. Sonar helps reduce outages, improve security, and lower risks associated with AI and agentic coding. <a href="https://www.sonarsource.com/pragmatic/">See how Sonar</a> is empowering the Agent Centric Development Cycle with new products and capabilities that strengthen the guide, verify, and solve phases of development.</p><p>&#8226; <strong><a href="https://workos.com/">WorkOS</a></strong> &#8211; Everything you need to make your app enterprise ready. Skip the rebuild for enterprise features. Keep shipping. Visit <a href="http://workos.com">WorkOS.com</a>.</p><h3><strong>In this episode</strong></h3><p>How did a tiny team of 30 engineers build the world-famous messaging app more than a decade ago, and what can dev teams learn from that feat today? <a href="http://linkedin.com/in/jeanklee">Jean Lee</a> was engineer #19 at <a href="https://www.whatsapp.com/">WhatsApp</a>, joining when the company was still small, with almost no formal processes. 
She helped it scale to hundreds of millions of users, went through the $19B acquisition by Facebook, and later worked at Meta.</p><p>In this episode of <em>Pragmatic Engineer</em>, I talk with Jean about what it was like building WhatsApp. When Facebook bought WhatsApp in 2014, only around 30 engineers supported hundreds of millions of users across eight platforms.</p><p>We discuss how the founders kept things simple, saying &#8220;no&#8221; to most feature requests for years. Jean explains why WhatsApp chose Erlang for the backend, why the team avoided cross-platform abstractions, and how charging users $1 per year paid everyone&#8217;s salaries, while keeping growth intentionally slow.</p><p>Jean also shares what the Facebook acquisition was like on the inside, how she dealt with sudden personal wealth, and what it was like transitioning from an IC to a manager at Facebook &#8211; including the reality of calibration meetings and performance reviews.</p><p>We also discuss how AI enables smaller engineering teams, and why WhatsApp&#8217;s experience suggests ownership and trust might matter more than tools.</p><h3><strong>Key observations from Jean</strong></h3><p>Ten takeaways from Jean that I find the most interesting:</p><div id="youtube2-5Kn32cIWPSY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;5Kn32cIWPSY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/5Kn32cIWPSY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>1. WhatsApp built a billion-dollar business with a tiny team, and no AI tools. </strong>WhatsApp served 450 million users with only 30 engineers, long before AI tools existed. Jean says: &#8220;I wonder if being able to move fast is independent from AI. When you&#8217;re small, you&#8217;re just more efficient.&#8221;</p><p><strong>2. WhatsApp had no code reviews after the first one. </strong>WhatsApp cofounder, Brian Acton, reviewed the very first pull request of each new hire, and after that, there were no more code reviews. Jean recounts how Brian reviewed her debut PR in extreme detail. This first (and only!) review set the bar high, and she wrote code to that standard from then on.</p><p><strong>3. WhatsApp had close to zero formal processes</strong>. WhatsApp had no Scrum, no Agile, no TDD (test driven development), and no formal code reviews beyond the first commit. In contrast, Skype had 1,000 engineers and mandatory Scrum training, but WhatsApp still outcompeted it and won. Jean&#8217;s response to hearing of all the formal processes Skype used in order to execute faster: &#8220;I&#8217;m surprised to hear they thought they were shipping faster because of it.&#8221; Perhaps process is often a substitute for trust, not quality?</p><p><strong>4. WhatsApp&#8217;s office had a display counting the days since the last outage.</strong> When an outage happened, no emails were sent around, and no meetings were called. The number simply reset to zero. Avoiding outages was on everyone&#8217;s mind as a result. This is an example of how visible metrics can create accountability without bureaucracy.</p><p><strong>5. 
WhatsApp delayed video calling for years, until it was extremely polished.</strong> Contrary to the &#8220;launch early, then iterate&#8221; mantra, WhatsApp held features like video calling back. They also tested features extensively with family members before releasing anything publicly, as part of their refusal to launch something of less than top-notch quality.</p><p><strong>6. Saying &#8220;no&#8221; to features was a competitive advantage.</strong> WhatsApp&#8217;s CEO, Jan Koum, rejected 99% of feature requests from the team. While competitors shipped dozens of shiny new features, WhatsApp ruthlessly prioritized reliability and simplicity. Jan repeatedly told the team what the mission was. &#8220;I want a grandma living in the countryside to be able to use our app&#8221;, he said.</p><p><strong>7. WhatsApp&#8217;s team was older and more experienced than most startup teams at the time. </strong>In 2014 when Facebook acquired WhatsApp, only four of its 30 engineers were under 30 years old. Perhaps part of the reason for WhatsApp&#8217;s stunning success was having an unusually experienced team from the start.</p><p><strong>8. AI won&#8217;t replace the human touch in engineering management</strong>. Jean sees areas such as OKR management, documentation, and performance data gathering as domains in which AI can take on most of the work. But she believes that understanding and unblocking engineers is best done person-to-person, not by AI.</p><p><strong>9. Posting about your work on Meta&#8217;s &#8220;internal Facebook&#8221; site affects career growth there. </strong>Jean noted that engineers at the social media giant who regularly posted about their launches and learnings enjoyed a sizable advantage in performance calibration reviews.</p><p><strong>10. 
Jean&#8217;s advice to new grads: invest in the fundamentals.</strong> &#8220;Tools come and go, languages come and go, but foundations don&#8217;t go anywhere,&#8221; she says.</p><h3><strong>The Pragmatic Engineer deepdives relevant for this episode</strong></h3><ul><li><p><a href="https://newsletter.pragmaticengineer.com/p/building-the-threads-app">How Meta built Threads</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/project-management-in-tech">How Big Tech runs tech projects and the curious absence of Scrum</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/performance-calibrations">Performance calibrations at tech companies</a></p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/engineers-leading-projects-part-2">Software engineers leading projects</a></p></li></ul><h3><strong>Timestamps</strong></h3><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY">00:00</a>) Intro</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=99s">01:39</a>) Early years in tech</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=378s">06:18</a>) Becoming engineer #19 at WhatsApp</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=833s">13:53</a>) WhatsApp&#8217;s tech stack</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=1089s">18:09</a>) WhatsApp&#8217;s unique ways of working</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=1527s">25:27</a>) Countdown displays and outages</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=1627s">27:07</a>) Why WhatsApp won</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=1733s">28:53</a>) The Facebook acquisition</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=1993s">33:13</a>) Life after acquisition</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=2367s">39:27</a>) Working at Facebook in London</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=2647s">44:07</a>) Transitioning to management</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=2847s">47:27</a>) Performance reviews as a manager</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=3209s">53:29</a>) After Facebook</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=3533s">58:53</a>) AI&#8217;s impact on engineering</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=3754s">1:02:34</a>) Jean&#8217;s advice to new grads and startups</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=4005s">1:06:45</a>) Empowering employees</p><p>(<a href="https://www.youtube.com/watch?v=5Kn32cIWPSY&amp;t=4097s">1:08:17</a>) Book recommendations</p><h3><strong>References</strong></h3><p><strong>Where to find Jean Lee:</strong></p><p>&#8226; Substack: <a href="https://exaltitude.substack.com/">https://exaltitude.substack.com</a></p><p>&#8226; LinkedIn: <a href="https://www.linkedin.com/in/jeanklee">https://www.linkedin.com/in/jeanklee</a></p><p>&#8226; YouTube: <a href="https://www.youtube.com/@exaltitude">https://www.youtube.com/@exaltitude</a></p><p>&#8226; Website: <a href="https://www.exaltitude.io">https://www.exaltitude.io</a></p><p><strong>Mentions during the episode:</strong></p><p>&#8226; WhatsApp: <a href="https://www.whatsapp.com">https://www.whatsapp.com</a></p><p>&#8226; 
KakaoTalk: <a href="https://en.wikipedia.org/wiki/KakaoTalk">https://en.wikipedia.org/wiki/KakaoTalk</a></p><p>&#8226; Jan Koum: <a href="https://en.wikipedia.org/wiki/Jan_Koum">https://en.wikipedia.org/wiki/Jan_Koum</a></p><p>&#8226; Brian Acton on LinkedIn: <a href="https://www.linkedin.com/in/brianacton">https://www.linkedin.com/in/brianacton</a></p><p>&#8226; Yahoo: <a href="https://www.yahoo.com">https://www.yahoo.com</a></p><p>&#8226; Sequoia: <a href="https://sequoiacap.com">https://sequoiacap.com</a></p><p>&#8226; Cocktail Flow: <a href="https://cocktailflow.com">https://cocktailflow.com</a></p><p>&#8226; KaiOS: <a href="https://en.wikipedia.org/wiki/KaiOS">https://en.wikipedia.org/wiki/KaiOS</a></p><p>&#8226; Erlang: <a href="https://www.erlang.org">https://www.erlang.org</a></p><p>&#8226; Ericsson: <a href="https://www.ericsson.com">https://www.ericsson.com</a></p><p>&#8226; Erlang Factory 2014 - That&#8217;s &#8216;Billion&#8217; with a &#8216;B&#8217;: Scaling to the Next Level at WhatsApp: </p><div id="youtube2-c12cYAUTXXs" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;c12cYAUTXXs&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/c12cYAUTXXs?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>&#8226; WeChat: <a href="https://www.wechat.com">https://www.wechat.com</a></p><p>&#8226; Skype: <a href="https://en.wikipedia.org/wiki/Skype">https://en.wikipedia.org/wiki/Skype</a></p><p>&#8226; What is Scrum?: <a href="https://www.scrum.org/resources/what-scrum-module">https://www.scrum.org/resources/what-scrum-module</a></p><p>&#8226; Mark Zuckerberg: <a href="https://en.wikipedia.org/wiki/Mark_Zuckerberg">https://en.wikipedia.org/wiki/Mark_Zuckerberg</a></p><p>&#8226; Wealthfront: <a href="https://www.wealthfront.com">https://www.wealthfront.com</a></p><p>&#8226; A Random Walk Down Wall Street: The Best Investment Guide That Money Can Buy: <a href="https://www.amazon.com/Random-Walk-Down-Wall-Street/dp/1324051132">https://www.amazon.com/Random-Walk-Down-Wall-Street/dp/1324051132</a></p><p>&#8226; Surrounded by Idiots: The Four Types of Human Behavior and How to Effectively Communicate with Each in Business: <a href="https://www.amazon.com/Surrounded-Idiots-Revised-Expanded-Effectively/dp/1250420458">https://www.amazon.com/Surrounded-Idiots-Revised-Expanded-Effectively/dp/1250420458</a></p><p>&#8226; Performance Calibrations at Tech Companies: Part 1: <a href="https://newsletter.pragmaticengineer.com/p/performance-calibrations">https://newsletter.pragmaticengineer.com/p/performance-calibrations</a></p><p>&#8226; Performance Calibrations at Tech Companies: Part 2: <a href="https://newsletter.pragmaticengineer.com/p/performance-calibrations-part-2">https://newsletter.pragmaticengineer.com/p/performance-calibrations-part-2</a></p><p>&#8226; Anthropic: <a href="https://www.anthropic.com">https://www.anthropic.com</a></p><p>&#8226; <em>What Color Is Your Parachute? for College: Pave Your Path from Major to Meaningful Work</em>: <a href="https://www.amazon.com/What-Color-Your-Parachute-College/dp/1984857568">https://www.amazon.com/What-Color-Your-Parachute-College/dp/1984857568</a></p><p>&#8212;</p><p>Production and marketing by <a href="https://penname.co/">Pen Name</a>. 
</p>]]></content:encoded></item><item><title><![CDATA[Are AI agents actually slowing us down?]]></title><description><![CDATA[As more software engineers use AI agents daily, there&#8217;s also more sloppy software, outages, quality issues, and even a slowdown in shipping velocity. What&#8217;s happening, and how do we solve it?]]></description><link>https://newsletter.pragmaticengineer.com/p/are-ai-agents-actually-slowing-us</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/are-ai-agents-actually-slowing-us</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Tue, 17 Mar 2026 16:59:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/35d38ae8-5fc7-4307-84d4-de2706908538_1674x1258.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When it comes to AI agents and AI tooling, most of the discussion focuses on their potential boosts for efficiency, faster iteration, and pushing out more code, faster.</p><p>Last week, we took an inside look at <a href="https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development">how Uber is adopting AI</a> internally. The rideshare giant has built close to a dozen internal systems to deal with code generated by AI agents. However, when quantifying the impact of AI, the focus was on how much output has increased, and how devs who use more AI also generate more pull requests; these are the &#8220;power user&#8221; devs who generate 52% more PRs than devs who use AI less. There was no mention of product quality &#8211; at all!</p><p>And there are signs that product quality is dropping overall. Today, we dig into this under-discussed topic, covering:</p><ol><li><p><strong>Anthropic: degraded flagship website.</strong> An annoying UX issue irritated paying Claude customers &#8211; and no one at Anthropic noticed. The company moves very fast, generates 80%+ of production code with Claude, but quality and user experience seem to be taking a backseat.</p></li><li><p><strong>Amazon: AI-agent reliance triggers SEVs. </strong>Amazon&#8217;s retail org has a leap in outages caused by its own AI agents. Now, senior sign-off is needed for junior engineers&#8217; AI-assisted changes.</p></li><li><p><strong>Big Tech: &#8220;use AI or you&#8217;re unproductive.&#8221;</strong> Companies like Meta and Uber are tracking AI token usage in performance reviews, putting pressure on engineers to use it heavily &#8212; irrespective of the tools&#8217; quality impact.</p></li><li><p><strong>OpenCode: more time spent cleaning up.</strong> Dax Raad, OpenCode&#8217;s creator, warns that AI agents are lowering the bar for what ships, discouraging refactoring, and don&#8217;t speed teams up.</p></li><li><p><strong>Startups: founders see LLMs slowing down long-term velocity. </strong>Sentry&#8217;s CTO and others observe that while AI removes the barrier to getting started, it also produces bloated, hard-to-maintain code that slows long-term development.</p></li><li><p><strong>Research: AI agents underperform claims.</strong> Some studies show AI coding tools produce short-lived velocity gains followed by significant tech debt increases.</p></li><li><p><strong>How do we solve it?</strong> Engineers with strong architectural sense become more critical than ever; proposed solutions include formal validation methods, and perhaps reviving some old-school QA ideas.</p></li></ol><h2>1. 
Anthropic: degraded flagship website</h2><p>This article&#8217;s genesis was last week, when I&#8217;d finally had enough of a persistent UX bug on Claude&#8217;s flagship website: the prompt I typed in regularly got lost. Below is a video of me typing &#8220;How can I&#8230;&#8221; &#8211; and &#8220;losing&#8221; the first two words when the page loaded:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;5b3077ba-7213-44f8-a963-91fc8b9e47f5&quot;,&quot;duration&quot;:null}"></div><p>It&#8217;s pretty straightforward:</p><ol><li><p>The page starts to render and the textbox is displayed</p></li><li><p>The user starts to type their prompt, but the page has not finished loading subscription data</p></li><li><p>The subscription information loads around a second later</p></li><li><p>The textbox is reset and the typing is lost</p></li></ol><p>This is a pretty basic bug you might expect in a prototype, except that this is the landing page of Claude.ai, and it&#8217;s a bug that impacts every paying customer &#8211; easily millions &#8211; every day. Even worse, the bug happens <em>every time</em> you visit the site.</p>
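<p><em>To make the failure mode concrete, here is a minimal TypeScript sketch of this kind of race condition, and the fix. It is an illustration only &#8211; fetchSubscription() and placeholderFor() are made-up stand-ins, not Anthropic&#8217;s actual code:</em></p><pre><code>// Illustrative stubs standing in for the real data fetch
async function fetchSubscription() {
  await new Promise((resolve) => setTimeout(resolve, 1000)); // data arrives ~1s after first paint
  return { plan: "pro" };
}

function placeholderFor(subscription: { plan: string }) {
  return subscription.plan === "pro" ? "Ask anything (Pro)" : "Ask anything";
}

const promptBox = document.getElementById("prompt") as HTMLTextAreaElement;

// Buggy version: re-initializes the textbox once subscription data arrives,
// clobbering whatever the user typed during the load window.
async function initPromptBox() {
  const subscription = await fetchSubscription();
  promptBox.value = "";                                 // user input lost here
  promptBox.placeholder = placeholderFor(subscription);
}

// Fixed version: async initialization merges with user state instead of
// replacing it; promptBox.value is never touched, so typing survives.
async function initPromptBoxFixed() {
  const subscription = await fetchSubscription();
  promptBox.placeholder = placeholderFor(subscription);
}</code></pre>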
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19905f01-a3c2-485f-bd1b-53769c587e53_1190x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:1190,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Hl1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19905f01-a3c2-485f-bd1b-53769c587e53_1190x356.png 424w, https://substackcdn.com/image/fetch/$s_!4Hl1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19905f01-a3c2-485f-bd1b-53769c587e53_1190x356.png 848w, https://substackcdn.com/image/fetch/$s_!4Hl1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19905f01-a3c2-485f-bd1b-53769c587e53_1190x356.png 1272w, https://substackcdn.com/image/fetch/$s_!4Hl1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19905f01-a3c2-485f-bd1b-53769c587e53_1190x356.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Product manager Robert Bye confirms the bug will be fixed. Source: <a href="https://x.com/RobertJBye/status/2032109640134066319?s=20">Robert Bye</a></em></figcaption></figure></div><p>To their credit, three days later the bug was gone. There&#8217;s no longer a &#8220;double load&#8221; of the textbox: it takes a bit longer to load but only does so once.</p><p>Still, it makes me wonder how much longer this issue would&#8217;ve continued had nobody complained. Also, how many more bugs are present on the Claude website that nobody highlighted on social media? 
How many more features could be shipped in a state that is subpar for production-grade software with millions of paying customers?</p><p><strong>Anthropic seems to be prioritizing moving </strong><em><strong>very</strong></em><strong> fast over doing so with high quality. </strong>There is no denying that the company is moving at incredible speed and running laps around competitors. A good example is how they built Claude Cowork in just 10 days. Claude Cowork handled work with Microsoft Word and Excel documents surprisingly well, to the point that it set off a &#8220;code red&#8221; inside Microsoft&#8217;s Office division, I understand.</p><p>Microsoft responded as fast as it could, but it still took 2-3 months to launch its (cloned) response, Copilot Cowork, earlier this month &#8211; with full access still to follow.</p><p><strong>In the case of Anthropic, moving fast with okay quality seems to make good business sense: </strong>they build a better product than what already exists, so it doesn&#8217;t matter much if it&#8217;s a bit rough around the edges; they can fix quality issues post-launch and still be months ahead of the competition.</p><h2>2. Amazon: reliance on AI agents causes SEVs</h2><p>Anthropic can afford to move fast while it&#8217;s growing at an extremely high rate and expanding its market share rapidly. At the same time, established players like Amazon have an extreme focus on reliability: AWS has become the top cloud provider not least by being extremely reliable (as well as aggressive on pricing).</p><p>Well, reliability at the online retailer seems to be getting worse, too, and the company&#8217;s AI agent, Kiro, could be causing SEVs (Amazon&#8217;s term for outages), <a href="https://www.ft.com/content/7cab4ec7-4712-4137-b602-119a44f771de">according</a> to The Financial Times (emphasis mine):</p><blockquote><p>&#8220;Amazon&#8217;s ecommerce business has summoned a large group of engineers to a meeting on Tuesday for a &#8220;deep dive&#8221; into a spate of outages, including incidents tied to the use of AI coding tools.</p><p><strong>The online retail giant said there had been a &#8220;trend of incidents&#8221; in recent months, characterised by a &#8220;high blast radius&#8221; and &#8220;Gen-AI assisted changes&#8221; among other factors,</strong> according to a briefing note for the meeting seen by the FT.</p><p>Under &#8220;contributing factors&#8221; the note included &#8220;novel GenAI usage for which best practices and safeguards are not yet fully established&#8221;.</p><p>&#8220;Folks, as you likely know, the availability of the site and related infrastructure has not been good recently,&#8221; Dave Treadwell, a senior vice-president at the group, told employees in an email, also seen by the FT. (...)</p><p>He asked staff to attend the meeting, which is normally optional.</p><p>Junior and mid-level engineers require more senior engineers to sign off any AI-assisted changes, Treadwell added in the briefing note.&#8221;</p></blockquote><p>This meeting was the regular <em>&#8220;This Week in Stores Tech&#8221;</em> operational one, but what was new was the note telling staff to attend this &#8220;optional&#8221; meeting, and the mandate for senior engineers to sign off code changes from juniors. The outages may have been caused by less experienced engineers over-trusting GenAI&#8217;s output. 
Also, there were incidents caused by AI changes, said the FT:</p><blockquote><p>&#8220;Separately, the company&#8217;s cloud computing arm &#8212; Amazon Web Services &#8212; has suffered at least two incidents linked to the use of AI coding assistants, which the company has been actively rolling out to its staff.</p><p>AWS suffered a 13-hour interruption to a cost calculator used by customers in mid-December after engineers allowed the group&#8217;s Kiro AI coding tool to make certain changes, and the AI tool opted to &#8220;delete and recreate the environment&#8221;, the <a href="https://www.ft.com/content/00c282de-ed14-4acd-a948-bc8d6bdb339d">FT previously reported</a>.&#8221;</p></blockquote><p>Again, a tool causing an outage is not its own fault: it&#8217;s on the engineer who lets the tool run wild. If I delete two lines of code, then push it to production, and the server crashes, the fault is not with the text editor or the Git client, but with me, the person who made the change. Similarly, if you prompt an AI agent to do something, and the agent goes off and does something that causes an outage, then responsibility lies with the engineer who didn&#8217;t set up guardrails for the agent.</p><p><strong>However, there is the issue that AI agents can wreak havoc in ways devs don&#8217;t quite understand or expect, until they learn the hard way. </strong>This was what took down a lesser-used AWS service, according to the report:</p><blockquote><p>&#8220;Amazon Web Services experienced a 13-hour interruption to one system used by its customers in mid-December [2025] after engineers allowed its Kiro AI coding tool to make certain changes, according to four people familiar with the matter.</p><p>The people said the agentic tool, which can take autonomous actions on behalf of users, determined that the best course of action was to &#8220;delete and recreate the environment&#8221;.&#8221;</p></blockquote><p>It sounds like an engineer gave overly broad permissions to the coding agent, which then used its scope to delete a service. As mentioned, the engineer is responsible, but there is also a learning curve with these AI agents to consider: this type of outage simply did not happen in the past. Plus, companies like Amazon are heavily incentivizing using AI agents for as much work as possible, which naturally leads to overuse.</p>
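<p><em>One common mitigation is to run agents behind an explicit allowlist, where destructive actions are hard-blocked and escalated to a human. Below is a hypothetical TypeScript sketch of such a guardrail &#8211; purely illustrative, not Kiro&#8217;s or any other real tool&#8217;s configuration or API:</em></p><pre><code>// Hypothetical least-privilege policy for actions an agent proposes.
// AgentPolicy and permits() are made-up names for illustration.
type AgentPolicy = {
  denied: RegExp[];  // hard blocks, checked first
  allowed: RegExp[]; // explicit allowlist; everything else needs human review
};

const policy: AgentPolicy = {
  denied: [/rm\s+-rf/, /terraform\s+destroy/, /delete.+environment/i],
  allowed: [/^git (status|diff|add|commit)\b/, /^npm (test|run lint)\b/],
};

function permits(action: string, p: AgentPolicy): boolean {
  if (p.denied.some((re) => re.test(action))) return false;
  return p.allowed.some((re) => re.test(action));
}

// A proposed plan like "delete and recreate the environment" is rejected
// and escalated to a human, instead of being executed autonomously:
console.log(permits("git status", policy));                          // true
console.log(permits("Delete and recreate the environment", policy)); // false</code></pre>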
<h2>3. Big Tech: &#8220;use AI or you&#8217;re unproductive&#8221;</h2><p>Something happens at places that measure devs&#8217; AI usage: pressure builds for all devs to use more AI, or else be seen as unproductive and at risk of poor performance reviews, potentially leading to a PIP or worse.</p><p><strong>Meta is taking token usage into account during perf reviews.</strong> A current engineering manager at the social media giant told me that the token usage of each engineer is now a data point &#8212; one of many! &#8212; for performance calibrations. By itself, it is not a positive or negative signal, but someone perceived as having low impact <em>and</em> with low token usage is now seen as a blatant low performer. For high performers with outstanding impact, very high token usage is seen as a <em>good</em> thing, as it conveys to the manager group that they&#8217;re personally invested in AI and are improving their workflow &#8211; as proven by results.</p><p><em>We previously covered <a href="https://newsletter.pragmaticengineer.com/p/performance-calibrations">how performance calibrations work at places like Meta.</a></em></p><p><strong>Big Tech CEOs are starting to see AI &#8220;power user devs&#8221; as superior to their coworkers.</strong> Uber is a good example: the Dev Platform team started to analyze the output of engineers by whether or not they&#8217;re in the &#8220;power user&#8221; category, meaning they use AI agents at least 20 days per month. They found that power users produce more PRs. So far, that&#8217;s useful data, but it&#8217;s just one piece of information, and doesn&#8217;t reveal the quality of the PRs, the impact of the engineer, or any other business outcome.</p><p>By the time this data reaches CEO level, it has turned into something else. Here&#8217;s Uber CEO, Dara Khosrowshahi, interpreting the same data points on the Diary of a CEO podcast (emphasis mine):</p><blockquote><p>&#8220;While 90% of our engineers are using AI tools of some sort, there&#8217;s about 30% of them that are using them at a completely accelerated pace. <strong>And it [using AI tools heavily] really is changing their productivity in a way that I&#8217;ve never ever seen before.</strong>&#8221;</p></blockquote><p>There&#8217;s a step from observing more PRs per engineer to judging power users as more productive for that reason. Dara continues:</p><blockquote><p>&#8220;I can imagine maybe 5 years from now, as the engineers get more and more productive, that I may not decide to add engineering headcount because at that point <strong>instead of adding an engineer, I should add agents and buy some more GPUs from Nvidia.</strong> That may be the investment in the future.&#8221;</p></blockquote><p>Unsaid in the above is that by that time, only engineers &#8220;using AI at a completely accelerated pace&#8221; would be employed. Would it also mean that engineers not on the bandwagon are on the way out? I appreciate Dara speaking his mind and shedding light on the thought process of a Big Tech CEO.</p><p><strong>Inside large tech companies, it&#8217;s becoming a career risk to not use AI at an accelerated pace, regardless of output quality. </strong>These large companies are the ones likely to be mulling layoffs, like Meta <a href="https://www.reuters.com/business/world-at-work/meta-planning-sweeping-layoffs-ai-costs-mount-2026-03-14/">reportedly preparing</a> to cut up to 20% of staff. And when it comes to identifying redundancies, it&#8217;s a fair assumption that things like &#8220;AI usage&#8221; and &#8220;pull requests per engineer&#8221; will be taken into account, especially as one theme of such layoffs will almost certainly be that the employer wants to focus more on AI.</p><p>So, it&#8217;s common sense (and self-preservation) for devs to use more AI, if only to avoid being seen as unproductive. Their perceived output will rise, and engineering leadership will share more reports of productivity being up, interpreting more generated code and more pull requests as the proof.</p><h2>4. 
OpenCode: &#8220;more time spent cleaning up&#8221;</h2><p>Dax Raad is founder and CEO of <a href="https://opencode.ai/">OpenCode</a>, an open-source AI coding agent into which you can plug models like Claude, ChatGPT, Gemini, and others. It&#8217;s an increasingly popular alternative to the likes of Claude Code and Codex. In our <a href="https://newsletter.pragmaticengineer.com/i/189777574/2-most-used-ai-tools">recent AI tooling survey</a>, it came up as a tool used nearly as much as Google&#8217;s Gemini CLI and Antigravity. The small team working on this influential tool is seeing problems with AI overuse. Dax wrote <a href="https://x.com/thdxr/status/2031377117007454421?s=20">this note</a> to the OpenCode team (emphasis mine):</p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/are-ai-agents-actually-slowing-us">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[The Pulse: What will the Staff Engineer role look like in 2027 and beyond?]]></title><description><![CDATA[Also: new trend of token costs becoming a worry for CTOs, 10% cuts at Atlassian, and more.]]></description><link>https://newsletter.pragmaticengineer.com/p/the-pulse-what-will-the-staff-engineer</link><guid isPermaLink="false">https://newsletter.pragmaticengineer.com/p/the-pulse-what-will-the-staff-engineer</guid><dc:creator><![CDATA[Gergely Orosz]]></dc:creator><pubDate>Thu, 12 Mar 2026 17:46:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uTLZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1886a999-fc50-43f4-8989-ac9cb6f395dd_2048x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>The Pulse is a series covering events, insights, and trends within Big Tech and startups. Notice an interesting event or trend? Hit reply and share it with me.</em></p><p>Before we start, I&#8217;d like to share updates about data in the two most recent articles:</p><p><strong>Uber&#8217;s AI adoption numbers. </strong>The Dev Platform folks at Uber have been kind enough to share the latest numbers on AI adoption, following <a href="https://newsletter.pragmaticengineer.com/p/how-uber-uses-ai-for-development">Tuesday&#8217;s article</a>, which reported that 31% of all code is AI-authored. It turns out this was incorrect, due to a bug with one of the tools. Here&#8217;s how things look there:</p><ul><li><p><strong>84%</strong> of devs at Uber are agentic coding users (either using CLI-based agents, or making more agentic requests than tab-completions in the IDE).</p></li><li><p><strong>65-72% </strong>of code is AI-generated inside IDE-based tools. For AI command line tools like Claude Code, the figure is, naturally, 100%.</p></li><li><p><strong>Claude Code usage </strong>almost doubled in 3 months: from 32% last December to 63% by February. Meanwhile, IDE-based tool usage (Cursor, IntelliJ) has plateaued.</p></li></ul><p>Separately, last week&#8217;s edition of The Pulse reported that Block did not make job cuts between 2022 and 2025, which was incorrect. Layoffs happened in <a href="https://www.businessinsider.com/block-layoffs-jack-dorsey-tech-industry-cuts-2024-1">Jan 2024</a> and <a href="https://www.sfchronicle.com/tech/article/layoffs-block-jack-dorsey-20242797.php">March 2025</a>. I have <a href="https://newsletter.pragmaticengineer.com/i/190020609/2-job-cuts-at-block-what-if-ais-not-to-blame">updated my analysis</a> with these details; apologies for the error.</p><p>Today, we cover:</p><ol><li><p><strong>Staff+ engineers in 2027 and beyond. </strong>What happens to the Staff engineer role when agents write more code? Actually, they could be more in demand than ever!</p></li><li><p><strong>New trend? AI token costs are a rising concern for CTOs.</strong> Accounts from two engineering leaders who are raising the alarm about steeply climbing AI costs and the need to slow down spending.</p></li><li><p><strong>10% layoffs at Atlassian: is it AI&#8217;s fault? 
</strong>Atlassian says it wants to invest savings in AI, but is there more to it?</p></li><li><p><strong>Industry Pulse.</strong> An AI-powered library reimplementation sparks copyleft licensing debate, Anthropic launches $15&#8211;25 per-review code reviews, Microsoft ships Copilot Cowork, a Claude-powered Claude Cowork clone, and Apple is the lone Big Tech not ramping up AI infrastructure spending.</p></li></ol><h2>1. Staff+ engineers in 2027 and beyond</h2><p>I was among 50 attendees at a recent two-day workshop in Utah, US, named <em>&#8216;The Future of Software Development&#8217;</em> and organized by Martin Fowler. We self-organized the sessions: everyone could suggest a topic close to their heart, and all suggestions went on an agenda:</p>
      <p>
          <a href="https://newsletter.pragmaticengineer.com/p/the-pulse-what-will-the-staff-engineer">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>