AWS outage — another cloud dependency wake-up call

The outage comes just days before the biggest online shopping date in the retail calendar, Black Friday, narrowly avoiding derailing dependent retailers’ services. 
26 November 2020

A large-scale AWS outage has handed us another reminder of how large swathes of internet apps, services, and websites are at the mercy of just a handful of public cloud giants. 

Problems on Wednesday, which Amazon claimed affected only one of its 23 geographic AWS regions (US-East-1), hit multiple firms including Roku, Adobe, Glassdoor, Autodesk, The Wall Street Journal, and 1Password, among others, as well as a large number of Amazon’s own services, including its home security camera company Ring.

Major AWS customers including Apple, Slack, and Netflix didn’t seem to be affected.

The outage comes just days before Black Friday, the biggest online shopping date in the retail calendar, and narrowly avoided derailing dependent retailers’ services.

“We are actively working toward full recovery for all affected services, and will continue to provide updates regularly as we have new information to share,” Amazon wrote on the update page for the AWS incident at the time of the outage. 

Later, on the AWS Service Health Dashboard, the e-commerce giant’s cloud arm said that the issue affecting its Kinesis Data Streams API and other dependent services (including CloudWatch, DynamoDB, Lambda, Managed Blockchain, Rekognition, SageMaker, and WorkSpaces, among many others) had been fixed, but that the service was not yet fully operational again.

“We have now fully mitigated the impact to the subsystem within Kinesis that is responsible for the processing of incoming requests and are no longer seeing increased error rates or latencies. However, we are not yet taking the full traffic load and are working to relax request throttles on the service. Over the next few hours, we expect to relax these throttles to previous levels. We expect customers to begin seeing recovery as these throttles are relaxed over this timeframe.”

Mike Kiersey, Principal Technologist at Dell Technologies company Boomi, observed that even cloud giants such as AWS — which holds 33% of the US$100 billion global cloud market — aren’t immune from the challenges that data, integration, data streaming, and API management present, which “never truly go away”. 

“With architecture as extensive as AWS’, it is imperative across the board that each and every element of this is integrated correctly, from datacenter through to each digital service,” said Kiersey. 

“The issues affecting Kinesis underlines the absolute need to be able to process and manage real-time data. If the data stream stops functioning, the fallout can be huge, especially for cloud providers.”

The latest AWS outage is one of a spate of recent outages and IT failures which have highlighted the risk of being dependent on a small number of cloud technology providers.

Gmail and Google Drive suffered an outage earlier this year, while separately, photographers lost data when Adobe Lightroom updates deleted users’ photos and Canon’s Camera Cloud Platform ‘lost’ original photo and video files.

“We are increasingly dependent on a small number of players who dominate the market,” Peter Groucutt, the managing director at Databarracks, previously told TechHQ. “Recent events show the challenge of maintaining productivity in outages and highlighted the importance of external backups.”

And while replacing on-premises server rooms with the likes of AWS, Azure, and Google Cloud has been the most viable evolution of our IT infrastructure, it has inadvertently created a cloud computing “oligopoly”, Groucutt added.