Fastly outage – is the internet too dependent on edge cloud hosting?
- Cloud computing provider Fastly, which underpins a lot of websites, said there had been issues with its global content delivery network (CDN)
- A huge swathe of international news sites crashed, including the New York Times, CNN, and the BBC, while millions of pounds in revenue were choked off for corporations including Amazon, Boots, and eBay.
Just a few minutes into an outage at cloud platform provider Fastly on the morning of Tuesday 8 June, a significant chunk of the internet went dark. People attempting to visit a wide array of websites received a blank white page and an error message telling them the connection was unavailable. The outage underscores just how important little-known internet infrastructure companies like Fastly are, and how even isolated edge cloud disruptions can bring vast swathes of the online activity we typically take for granted to a screeching halt.
The interruption at the San Francisco-based cloud services provider was relatively brief, lasting slightly more than an hour in most cases. Fastly later said on its website that it had identified a problem and applied a fix, allowing the affected sites to come back online. But the interruption hit a collection of high-profile web portals so swiftly, including well-known news sites and government portals such as the UK’s ‘.gov.uk’ platforms, that it was reminiscent of earlier disruptions at providers like Amazon Web Services (AWS) and Cloudflare – shining a spotlight on just how vulnerable the underlying web infrastructure can be, even for the biggest operators.
What is Fastly and what caused the internet outage?
The company offers a content delivery network (CDN) service. When it works, a CDN is supposed to improve the speed and reliability of the internet. Rather than visitors to a website all having to connect to servers run by that company – which might not even be in the same country as they are – they instead contact Fastly, which runs huge server farms all around the world that host copies of its clients’ websites.
That means that, although the site is hosted in the cloud, the page loads faster for users: because they are located closer to those edge server farms, physical signals don’t have to travel as far. It also improves the reliability of the website by ensuring that a big spike in traffic first hits Fastly’s servers, which are designed to handle heavy loads.
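To make the idea concrete, here is a minimal sketch of edge caching in Python. It is a toy model, not Fastly's implementation: the `Origin`, `EdgePop`, and `nearest_pop` names, the two locations, and the distances are all assumptions made up for illustration. It shows a visitor being routed to the closest point of presence (POP), and a traffic spike being absorbed by the edge cache so that the website's own servers are contacted only once.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Stand-in for the website's own servers (hypothetical)."""
    pages: dict
    requests_served: int = 0

    def fetch(self, path):
        self.requests_served += 1
        return self.pages[path]


@dataclass
class EdgePop:
    """One CDN point of presence holding cached copies of origin pages."""
    name: str
    distance_km: dict  # region -> illustrative distance from that region
    cache: dict = field(default_factory=dict)

    def get(self, path, origin):
        if path not in self.cache:
            # Cache miss: fetch from the origin once and keep a copy.
            self.cache[path] = origin.fetch(path)
        # Cache hit: served from the edge, the origin is not contacted.
        return self.cache[path]


def nearest_pop(pops, region):
    """Pick the POP with the smallest distance to the visitor's region."""
    return min(pops, key=lambda pop: pop.distance_km[region])


origin = Origin(pages={"/": "<h1>Home</h1>"})
pops = [
    EdgePop("london", {"uk": 50, "us": 5500}),
    EdgePop("new-york", {"uk": 5600, "us": 300}),
]

# A spike of 1,000 UK visitors all land on the London POP.
for _ in range(1000):
    body = nearest_pop(pops, "uk").get("/", origin)

print(nearest_pop(pops, "uk").name)  # london
print(origin.requests_served)        # 1
```

A real CDN also handles cache expiry, invalidation, and routing at the network level, all of which this sketch leaves out.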
We identified a service configuration that triggered disruptions across our POPs globally and have disabled that configuration. Our global network is coming back online. Continued status is available at https://t.co/RIQWX0LWwl
— Fastly (@fastly) June 8, 2021
The problem was identified quickly: within minutes, Fastly had acknowledged on its status page that it was experiencing problems. With the exception of a few organisations that had backup systems in place, including the BBC, every affected website had to wait for Fastly to fix the error before it could restore service.
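What a backup system can look like in principle is sketched below in Python. This is not a description of the BBC's actual setup, which the reporting does not detail. The `fetch_with_failover` function, the `ProviderDown` exception, and the two simulated providers are hypothetical names introduced for illustration. The idea is simply to try delivery providers in order and fall through to the next one when the first fails.

```python
class ProviderDown(Exception):
    """Raised by a (simulated) delivery provider that cannot serve content."""


def fetch_with_failover(path, providers):
    """Try each (name, fetch) pair in order; return the first success."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(path)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")
    # Every provider failed: surface all collected errors together.
    raise ProviderDown("; ".join(errors))


def primary_cdn(path):
    # Simulates the primary CDN during an outage.
    raise ProviderDown("503 Service Unavailable")


def backup_cdn(path):
    # Simulates a second, independent delivery path.
    return f"content for {path}"


served_by, body = fetch_with_failover(
    "/news", [("primary", primary_cdn), ("backup", backup_cdn)]
)
print(served_by)  # backup
```

Sites without a second delivery path have nothing to fall through to, which is why most had to wait for Fastly's own fix.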
Fastly is one of a few major CDN providers: others include Cloudflare and Amazon’s CloudFront. But, to give a sense of how well regarded Fastly is within cloud web hosting circles, Amazon’s own retail website actually runs through Fastly, rather than CloudFront, and has done so since May 2020.
Nick Rockwell, Fastly’s SVP of Engineering and Infrastructure, said in a blog post: “We experienced a global outage due to an undiscovered software bug that surfaced on June 8 when it was triggered by a valid customer configuration change. We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal.”
Who was impacted by this?
Among the large organizations impacted by Tuesday’s internet outage were the New York Times, Amazon, and Hulu. Other news outlets affected by the outage included CNN, the Guardian, Bloomberg News, the Financial Times, and The Verge. All of the British government’s websites using the ‘gov.uk’ suffix were reportedly taken down as well, limiting access to public services.
Another well-known service to suffer was Spotify, the music streaming platform. Chat forum site Reddit experienced disruption, as did picture-sharing site Pinterest, live game streaming site Twitch, and video-on-demand platforms Hulu and HBO Max. In some instances, the issue affected individual features on platforms that were otherwise unharmed: Twitter users were briefly unable to use emojis because the servers that host them were affected.
This isn’t the first time a cloud service provider has drawn attention to the perils of putting all your eggs in someone else’s basket. Earlier this year, AWS unilaterally suspended service to social media site Parler for largely arbitrary reasons that seemed to have more to do with political tribalism than anything else. This time, even the UK government was unable to digitally serve its own population for some time.