CI/CD with Robert Erez

The Pragmatic Engineer

Robert Erez of Octopus Deploy joins me to discuss Kubernetes, GitOps, progressive delivery, AI in CI/CD, and the evolving practices behind modern software delivery.

Gergely Orosz

Jun 17, 2026

Stream the latest episode

Listen and watch now on YouTube, Spotify, and Apple. See the episode transcript at the top of this page, and timestamps for the episode at the bottom.

Brought to You by

• Antithesis – if you're using agents to code, the problem isn't writing the code but making sure it didn't break anything. Antithesis goes beyond code review: it runs your whole system faster than real time and identifies hard-to-find bugs before your users hit them in production. Antithesis enables teams like Jane Street, Fly.io, and the etcd community to use agents safely and ship better code, faster. Learn more

• WorkOS – make your app and agents Enterprise Ready, with SSO, SCIM, RBAC, and more. Get started.

• turbopuffer – a vector and full-text search engine built on object storage. It’s fast, cheap, and extremely scalable. The teams building the smartest AI products out there — Cursor, Notion, Cognition, Anthropic — they all run on turbopuffer.

In this episode

Robert Erez is a principal engineer at Octopus Deploy, and a longtime expert in CI/CD, deployment systems, and software delivery. Rob and I were also once colleagues on the Skype web team, working on large-scale deployments and release processes.

In this episode of The Pragmatic Engineer, I sit down with Rob to discuss how teams deploy software safely and efficiently at scale. We cover Kubernetes, GitOps, platform engineering, progressive delivery, feature flags, cloud development environments, and the growing role of AI in CI/CD workflows. We also get into the tradeoffs in different deployment approaches, why self-hosted software still matters for some organizations, and the recent evolution of software delivery practices.

Key observations on deployments and CI/CD from the conversation with Rob

Here are 10 interesting takeaways from our chat:

1. Roll forward, never backwards. When a system has state – which typically means it uses databases – then doing a rollback can leave the code talking to a schema that’s no longer in sync. Rob’s advice is to not treat a failure in v2 as a trip back to v1, but rather as a push to v3 with the fix in it.

2. GitOps isn’t actually about Git. None of the four pillars of GitOps – 1) declarative, 2) versioned and immutable, 3) pulled, not pushed, 4) continuously reconciled – require Git, although Git can work under these constraints. Yet, the term ‘GitOps’ has made the industry dogmatic about cramming everything into a repo – even things like secrets that absolutely shouldn’t be there!

3. Continuous deployment can be overkill; continuous delivery is more practical. Shipping every single change to prod (continuous deployment) is not as necessary as many people think, Rob says, and there’s often more value in continuous delivery, where changes flow through testing and the deployment process itself is validated. With continuous delivery, you can decide whether to push to production automatically, or click a button once a week.

4. Feature toggles are a better safety net than rollbacks. When something breaks in production, switching a feature off with a toggle lets you “stop the bleeding” and then calmly diagnose the issue. Flipping a feature flag is far less nerve-jangling than scrambling to force a redeployment in the middle of the night!

5. One problem with feature flags is that they’re addictive. Flags are so easy to add that they can create a hygiene crisis when they keep being added but are never removed. Treat feature-toggle cleanup as a form of gardening, and “weed” fully rolled-out toggles from the codebase.

6. A Git repo can be a bottleneck at scale. Rob mentions that some companies run thousands of independent Kubernetes clusters that pull state from a Git repository. But such clusters can get throttled by the repo, forcing them into workarounds. Pull-based GitOps doesn’t scale infinitely for free.

7. A sizable number of major institutions remain on-prem – and this won’t change. Banks, other financial institutions, and governments demand full control over their hardware, upgrades, and downtime. That’s why Rob expects this segment won’t move to cloud-based SaaS.

8. Platform teams pay off at larger companies. These teams earn their keep in big organizations with multiple teams and projects, because they bring sanity and focus.

9. There’s a trend of ephemeral environments replacing test/staging environments. Companies used to have a few testers fighting over a handful of static test environments, but today, it’s trivial to spin up a full environment per feature branch, pre-merge. This “ephemeral” environment is used to check that things work, and is torn down once the branch is merged. It speeds up the feedback loop.

10. AI shifts the CI/CD calculus from speed to risk. Today, shaving ten minutes off the CI build time matters because a long-running build blocks human devs. But this time saving will be insignificant when an AI agent writes most of the code and “babysits” a slow pipeline without context switching. Then, the new priority will be to reduce the risk of an AI agent shipping a bug to production, so it will make much more sense to run extra, more thorough tests, even if they are slower.
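
To make takeaway 1 concrete, here is a minimal sketch of why rolling back code can fail once a schema has changed. Everything in it is an assumption for illustration, not something from the episode: the `users` table, the v2 migration that splits `name` into two columns, and the function names are all hypothetical. It uses an in-memory SQLite database so it runs anywhere.

```python
import sqlite3


def make_v1_db():
    # Hypothetical v1 schema: a single "name" column.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("INSERT INTO users (id, name) VALUES (1, 'Ada Lovelace')")
    return db


def migrate_to_v2(db):
    # Hypothetical v2 migration: split "name" in two and drop the old column.
    db.executescript(
        """
        CREATE TABLE users_v2 (
            id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
        );
        INSERT INTO users_v2 (id, first_name, last_name)
            SELECT id,
                   substr(name, 1, instr(name, ' ') - 1),
                   substr(name, instr(name, ' ') + 1)
            FROM users;
        DROP TABLE users;
        ALTER TABLE users_v2 RENAME TO users;
        """
    )


def v1_display_name(db, user_id):
    # v1 code: still expects the old "name" column.
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


def v3_display_name(db, user_id):
    # v3 code: the roll-forward fix, written against the migrated schema.
    first, last = db.execute(
        "SELECT first_name, last_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return f"{first} {last}"


def rollback_works(db):
    # Returns False if v1 code can no longer talk to the current schema.
    try:
        v1_display_name(db, 1)
        return True
    except sqlite3.OperationalError:
        return False
```

In this sketch, `rollback_works` is true before `migrate_to_v2` runs and false afterwards, while `v3_display_name` keeps working. That is the situation the roll-forward advice is meant to avoid: the data has already moved on, so the fix has to move on with it.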
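
The four GitOps properties in takeaway 2 can be sketched as a small reconcile loop. This is an illustrative toy rather than how any real GitOps tool is implemented: the `State` type (service name mapped to a version string), `diff`, and `reconcile_once` are hypothetical names. Note that `fetch_desired` is just a callable, which mirrors the point that the desired state does not have to come from Git.

```python
from typing import Callable

# Hypothetical declarative state: service name -> version it should run.
State = dict[str, str]


def diff(desired: State, actual: State) -> list[tuple[str, str, str]]:
    # Compare desired and actual state and list the actions needed to converge.
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, version))
        elif actual[name] != version:
            actions.append(("update", name, version))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, actual[name]))
    return actions


def reconcile_once(fetch_desired: Callable[[], State], actual: State):
    # Pull the desired state (pulled, not pushed), then apply the difference.
    # A real agent would call this repeatedly on a timer (continuous reconcile).
    desired = fetch_desired()
    actions = diff(desired, actual)
    for op, name, version in actions:
        if op == "delete":
            del actual[name]
        else:
            actual[name] = version
    return actions
```

A usage example: with a desired state of `{"api": "v2", "web": "v1"}` and an actual state of `{"api": "v1", "old": "v0"}`, one pass updates `api`, creates `web`, and deletes `old`. A second pass returns no actions, because the cluster already matches what was declared.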
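
Takeaways 4 and 5 can be shown together in one small sketch: a toggle used as a kill switch, plus a check that surfaces old flags for cleanup. The `FlagStore` class, the `new-pricing` flag, and the 90-day threshold are all assumptions made up for this example; real feature-flag services have their own APIs.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Flag:
    name: str
    enabled: bool
    created: date


@dataclass
class FlagStore:
    flags: dict[str, Flag] = field(default_factory=dict)

    def add(self, name: str, enabled: bool, created: date) -> None:
        self.flags[name] = Flag(name, enabled, created)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing flag never enables new code.
        flag = self.flags.get(name)
        return flag is not None and flag.enabled

    def turn_off(self, name: str) -> None:
        # The "stop the bleeding" switch: no redeployment involved.
        self.flags[name].enabled = False

    def stale(self, today: date, max_age_days: int = 90) -> list[str]:
        # Flags older than the threshold are candidates for "weeding".
        return sorted(
            f.name
            for f in self.flags.values()
            if (today - f.created).days > max_age_days
        )


def checkout_total(store: FlagStore, prices: list[float]) -> float:
    # Hypothetical feature: new pricing applies a 10% discount.
    if store.is_enabled("new-pricing"):
        return round(sum(prices) * 0.9, 2)
    return sum(prices)
```

If the new pricing misbehaves in production, calling `store.turn_off("new-pricing")` sends every later `checkout_total` call down the old path immediately. Running `store.stale(today)` on a schedule is one possible way to keep the list of lingering flags visible, so they get removed rather than accumulating.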