GitHub places the internet under 100m of permafrost

GitHub completes its apocalypse-proof backup strategy with an underground vault.
24 July 2020

Scene from Svalbard. Source: Shutterstock

The world's largest repository of free and open-source software (and a goodly amount of private data, too) is hosted on GitHub, for many years the de facto standard for collaborative code development.

The organization has announced it has finally managed to back up its entire repository of data (comprising 21TB) onto digital film, and stored the media below 100 meters of permafrost in a disused mine in Svalbard, Norway. The theory goes that even when humankind crumbles as a species and the very concept of electricity becomes but a memory, generations to come will still at least be able to read the code that, in 2020, runs the world.

GitHub’s storage partner, Piql, specializes in writing data to digital film, which holds both digitized information and (rather like the microfiche so beloved of local libraries until recently) miniaturized, human-readable text. Companies using Piql can choose, as GitHub has done, to store a physical copy of their data in its Arctic World Archive, a temperature-controlled albeit chilly environment that will, it’s hoped, be immune to troublesome events like global warming, nuclear strikes, and whatever other self-inflicted catastrophes humankind manages to visit upon itself.

Why Gits?

A Git repository is an active archive of information pertaining, typically, to software development, although the format can be used by any creative project that requires versioning and is particularly suited to collaborative workflows.

If you or anyone in your organization has ever written a piece of software, the chances are better than good that it has been stored, shared, and worked on using a Git repository. Although some organizations run their own Git servers (anyone can compile the source code) for internal projects and an extra layer of security, GitHub is often the first and most straightforward choice.
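By way of illustration, the minimal sketch below pulls down a complete, versioned copy of a project hosted on GitHub and inspects its history; the repository URL is a hypothetical placeholder, and any public repository would do.

```python
# Minimal sketch: clone a repository from GitHub and list its recent history.
# The repository URL below is a hypothetical placeholder; any public repo works.
import subprocess

REPO_URL = "https://github.com/example/project.git"  # hypothetical example repo
CLONE_DIR = "project"

# Fetch a full copy of the repository, history included.
subprocess.run(["git", "clone", REPO_URL, CLONE_DIR], check=True)

# Show the five most recent commits: every clone carries the whole version history.
log = subprocess.run(
    ["git", "-C", CLONE_DIR, "log", "--oneline", "-5"],
    check=True, capture_output=True, text=True,
)
print(log.stdout)
```

That last point is the quiet strength of the format: every clone is itself a complete backup of the project and its entire history.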

Given that the majority of the code that runs the infrastructure of the internet, the web, the cloud, most smartphones, and much else besides belongs in the open-source realm, archiving GitHub is effectively archiving the blueprints that describe how to rebuild what we today call “tech.”

Backups to the nth degree (below zero)?

Because most data storage remains physically delicate and susceptible to accidental or deliberate corruption, every sensible organization backs up its critical data at least twice, using different methods. (If your company does not, we would advise you to stop reading right now and get on the case.) Most backup media, from the disks that live in every laptop and data center to the silicon in smaller devices like smartphones and IIoT hardware, are susceptible to some degree of degradation over time.
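A common way to catch that degradation is to checksum each copy and compare the results periodically. The sketch below shows the idea using SHA-256; the file paths are hypothetical examples, not anyone's actual backup layout.

```python
# Minimal sketch: verify that two backup copies of a file are still identical
# by comparing SHA-256 checksums. File paths are hypothetical examples.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

primary = Path("backups/primary/archive.tar")     # hypothetical first copy
secondary = Path("backups/offsite/archive.tar")   # hypothetical second copy

if sha256_of(primary) == sha256_of(secondary):
    print("Checksums match: the copies are (still) identical.")
else:
    print("Checksum mismatch: one copy has degraded or been altered.")
```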

Many organizations continue to use slow yet relatively reliable tape drives, while others prefer layered archives comprising successively slower disk drives. Some still use optical media for archival purposes; among them, it is rumored, Facebook, which uses Blu-ray discs for rarely accessed information.

Claims of lifespans of hundreds of years made for CD-R and DVD-R media have proved to be false, although some research into Blu-ray discs that use inorganic dyes to store information suggests that their shelf life may be significantly longer.

For organizations that really value what they hold, entombing their data under Svalbard (aka the icy location featured in Philip Pullman’s His Dark Materials trilogy) is probably as good as it gets. But it’s a safe bet that GitHub, like all sensible organizations, has at least two other copies of its data, in different formats and checksummed, kicking around.