Three Cloud Providers, Three Outages: Three Different Responses
This year, AWS, Azure, and Google Cloud have all suffered comparable regional outages. How did they respond, and why do Azure’s processes stand out compared to its rivals?
It’s rare for all three major cloud providers to suffer regional outages within a few months of each other, but that’s exactly what happened between April and July:
25 April 2023: GCP. A Google Cloud region (europe-west9) went offline for about a day, and one of its zones (europe-west9-a) stayed offline for two weeks (incident details). We did a deep dive into this incident in What is going on at Google Cloud?
13 June 2023: AWS. The largest AWS region (us-east-1) was heavily degraded for 3 hours, impacting 104 AWS services. The joke goes that when us-east-1 sneezes, the whole world feels it, and this held true: Fortnite matchmaking stopped working, orders couldn’t be placed through the McDonald’s and Burger King apps, and customers of services like Slack, Vercel, Zapier and many more felt the impact (incident details). We did a deep dive into this incident earlier in AWS’s us-east-1 outage.
5 July 2023: Azure. A region (West Europe) partially went down for about 8 hours due to a major storm in the Netherlands. Customers of Confluent, CloudAmp, and several other vendors running services out of this region suffered disruption (incident details). We touched on this outage in The Scoop #55: how can a storm damage fiber cables?
A regional outage is rare for any cloud provider because regions are built to be resilient. The fact that each major cloud provider suffered one within a few months lets us compare their responses and draw some lessons about best practices.
Today, we cover:
1. What is a cloud region and how does it differ between cloud providers? A recap.
2. Communicating during the incident
3. Preliminary incident details
4. Incident postmortem and retrospective
5. Why is AWS so opaque to the public?
6. Why is Azure stepping up in transparency and accountability?
7. Lessons for engineering teams from the three cloud providers
1. What is a cloud region?
Before we dive in, a quick refresher on what a cloud region is. Definitions vary by cloud provider: