
When One Cloud Sneezes: The Amazon AWS Outage That Took Half the Internet With It

Snapchat, Fortnite, Duolingo, Signal — and parts of UK banking — were knocked sideways by an AWS outage. The fix came quickly.


At around 8:00 a.m. BST, parts of the internet juddered. Amazon Web Services (AWS) — the back-end engine for much of the web — began reporting “increased error rates and latencies” in its US-EAST-1 (Virginia) region. Within minutes, users saw Snapchat messages fail to send; Fortnite logins time out; Duolingo sessions stall; Signal and Reddit sputter; even Amazon’s own retail site, Alexa, and Prime Video faltered. In the UK, Lloyds Bank, Halifax, Bank of Scotland, Vodafone, BT, and HMRC services were among those reporting issues.


By 10:30 a.m., AWS said it was seeing “significant signs of recovery,” and by roughly 11:00 a.m. it confirmed services that rely on US-EAST-1 had recovered — though queues and throttling lingered for some workloads. Later status updates echoed the same line: the “underlying DNS issue has been fully mitigated.” 


What exactly broke — and who said what

Early analyses pointed to a DNS resolution failure affecting a database endpoint (DynamoDB) in US-EAST-1, which triggered timeouts across dependent services. Junade Ali, Fellow at the Institution of Engineering and Technology, told Reuters the issue appeared to involve a networking system that controls a database product — the kind of problem that “can usually be resolved centrally” once identified.
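
To make the failure mode concrete, here is a minimal Python sketch of why a resolution problem at one endpoint ripples so widely: every client that depends on that name fails before a request ever reaches the service. The hostname follows AWS's standard regional endpoint pattern, and the `can_resolve` helper is purely illustrative, not anything from the AWS SDK.

```python
import socket

# Standard regional endpoint naming pattern, shown for illustration only.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def can_resolve(hostname: str) -> bool:
    """Illustrative helper: return True if DNS resolution succeeds for hostname."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        # If the name will not resolve, every dependent client errors out
        # before a single byte reaches the service, which is why one DNS
        # issue cascades across so many downstream apps.
        return False

if __name__ == "__main__":
    print(f"{ENDPOINT} resolvable: {can_resolve(ENDPOINT)}")
```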


Rafe Pilling, director of threat intelligence at Sophos, pushed back on cyberattack speculation: “When anything like this happens the concern that it’s a cyber incident is understandable… In this case it looks like it is an IT issue on the database side.”


From the user side, executives and platforms publicly connected the dots. Aravind Srinivas, CEO of Perplexity, posted that “the root cause is an AWS issue.” Signal president Meredith Whittaker likewise confirmed the messaging app was affected.


A UK government spokesperson acknowledged the scale and sensitivity: “We are aware of an incident affecting Amazon Web Services, and several online services which rely on their infrastructure… we are in contact with the company.” Lloyds Bank apologized to customers, noting services were “coming back online.”


The blast radius — and why it felt so personal

Today’s failures rippled across everyday life: Snapchat, Fortnite, Roblox, Duolingo, Coinbase, Slack, Wordle, Peloton, Pokémon Go, PlayStation Network, Ring, Reddit, Zoom, Just Eat, Ocado, Microsoft 365, Square, Strava, Tidal, Eventbrite — plus Amazon, Alexa, and Prime Video — all saw reported issues at some point, according to multiple outlets and Downdetector rollups. Ookla estimated more than 4 million user reports tied to the incident.


This wasn’t a single app going dark; it was a reminder that a huge slice of the web shares the same underlying pipes. The consequence is synchronized inconvenience that can spill into essential services — banking and government portals among them — on an ordinary Monday morning.


Anxiety now, and the future risks

Today: Even with recovery starting before lunch, outages like this generate psychological drag — uncertainty about payments clearing, deliveries scheduling, or whether your doorbell cam will connect. That erosion of confidence matters. As Reuters framed it, this was the largest general internet disruption since the 2024 CrowdStrike meltdown — a reminder of how interconnected and fragile daily digital life can be.


Tomorrow: The structural worry is concentration. Dr. Corinne Cath-Speth (ARTICLE 19) warned that democratic discourse and secure communications shouldn’t hinge on so few providers: “We urgently need diversification in cloud computing.” Even if AWS, Microsoft Azure, and Google Cloud continue to deliver world-class uptime, the shared blast radius means a hiccup in one region can jolt media, finance, education, retail, and public services at once.

What technologists will (and should) do next

Platform leads will now do the unglamorous work:

  • Interrogate region dependency. If US-EAST-1 is your “everything” region, that’s a risk decision, not a default. Consider active-active deployment or graceful failover across regions; a minimal failover sketch follows this list.

  • Map vendor-of-vendor exposure. It’s not just your AWS usage — it’s the auth, payments, search, and analytics vendors you rely on that also run on AWS. (Today showed how those indirect links compound.)

  • Design for brownouts. When the database endpoint is flaky or DNS resolution is failing, can your app degrade instead of die — cached reads, limited features, queued writes? (AWS said most requests should succeed as it worked through backlogs — your app should handle that reality.) A degraded-mode sketch follows the list, after the failover example.

  • Communicate fast. Clear status messages from banks and government portals helped reduce panic today; silence fuels speculation.
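
On the first bullet, here is a minimal, library-agnostic Python sketch of client-side regional failover. The region list and the `call_with_regional_failover` helper are hypothetical; real deployments would more likely lean on DNS routing, health checks, and replicated data stores than on a simple retry loop.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical region preference list; the point is that "everything in
# US-EAST-1" is a choice, not a default.
REGIONS = ["us-east-1", "us-west-2", "eu-west-2"]

def call_with_regional_failover(
    regions: Sequence[str],
    operation: Callable[[str], T],
) -> T:
    """Try `operation` against each region in order and return the first
    success. `operation` is any callable taking a region name, e.g. a thin
    wrapper around your database or API client."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return operation(region)
        except Exception as exc:  # in practice, catch your client's specific errors
            last_error = exc
            print(f"{region} failed ({exc!r}); trying next region")
    raise RuntimeError("all configured regions failed") from last_error
```

A caller would pass any region-aware operation, for example `call_with_regional_failover(REGIONS, lambda region: my_client(region).fetch_profile(user_id))`, where `my_client` stands in for whichever SDK wrapper you actually use.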

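And on the brownout bullet: a sketch of “degrade instead of die,” serving slightly stale cached reads and queuing writes for replay when the backing store misbehaves. The cache, queue, and function names here are hypothetical, not a prescription.

```python
import time
from collections import deque
from typing import Any, Callable

# Illustrative in-process cache and write queue; a real system would use
# something durable and shared.
_read_cache: dict = {}            # key -> (timestamp, value)
_pending_writes: deque = deque()  # (key, value) pairs awaiting replay

def read_with_fallback(key: str, fetch: Callable[[str], Any], max_stale_s: float = 300.0) -> Any:
    """Try the live store; on failure, serve a cached value if it is fresh enough."""
    try:
        value = fetch(key)
        _read_cache[key] = (time.time(), value)
        return value
    except Exception:
        stamped = _read_cache.get(key)
        if stamped and time.time() - stamped[0] <= max_stale_s:
            return stamped[1]  # degraded, but the feature still works
        raise  # nothing usable cached; let the caller show a limited UI

def write_or_queue(key: str, value: Any, store: Callable[[str, Any], None]) -> bool:
    """Attempt the write; if the store is unavailable, queue it for later replay."""
    try:
        store(key, value)
        return True
    except Exception:
        _pending_writes.append((key, value))  # replay once the backlog clears
        return False
```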

The takeaway

By late morning in the UK, AWS said the DNS issue was mitigated, and platforms from Snapchat to Fortnite, Duolingo, Signal, Alexa, and Amazon’s retail site gradually returned to normal operations. But the outage punctured the illusion that the cloud is everywhere and nowhere. It’s somewhere — in data centers, with real dependencies — and when one place has a bad day, millions feel it.

“The main reason for this issue is that all these big companies have relied on just one service.” — Nishanth Sastry, Director of Research, University of Surrey (via Reuters)

Until the risk is distributed — across regions, architectures, and suppliers — mornings like this will keep happening. The code will be fixed; the anxiety lingers.

 
 
 
