Sneakerware and edge tech help push to cloud

If you have lots of zeroes and ones that need moving, sneakerware is sometimes the best bet.
2 October 2018

Two more devices have joined the Microsoft range of data movers. Source: Microsoft

Early internet adopters will remember having to wait – wait for progress bars to reach their ends, wait for apps to install or update, and especially, wait for data to upload or download.

Internet speeds were glacial compared to today’s (unless you could pay a monthly king’s ransom for a leased line) and therefore most data was kept locally.

With today’s cloud – powered by much quicker connection speeds – a great deal of data is kept remotely, often in geographically distant data centers, or across multiple nodes.

And although data can move far more quickly and freely, organizations are producing exponentially growing volumes of information.

It’s a wise IT department in today’s litigious and governance-heavy environment that archives as much data as it can.

Most enterprises back up their data, and over time, those backups move into archive or cold storage (often hosted on older, spinning-disk arrays), to be eventually moved offsite.

Sneakerware – the movement of data by physically shifting it from A to B – is often the most effective way of moving large data repositories quickly.

Alternatively, many organizations want to move their data to the cloud not for secondary storage, but to leverage that data’s hidden value.

Deploying data-mining algorithms on previously archived information can yield very positive results, and this is best achieved by use of distributed compute networks in the cloud. The issue remains, however, of how to move huge datasets to the cloud quickly and safely.

Despite the formula of fast connections, cheap storage, and data-enabled devices, the enterprise still struggles, on occasion, to move large quantities of data from point to point.

There is the potential for data corruption, cybersecurity concerns, and a limit to available bandwidth and connection speeds – prioritizing network traffic means that moving archives can be slow and painful.

In the storage and cloud business, there’s still, therefore, a particular reliance on sneakerware: picking up hard drives, lifting them onto a truck, and moving them to a data center where they’re literally plugged in.

For large data stores, this is still the most efficient transportation method – and is, of course, the only way for companies that collect data offline.
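To see why a truck can beat a wire, a back-of-the-envelope calculation helps. The sketch below is illustrative only: the function name, the link speeds, and the 50 percent utilization figure are assumptions chosen for the example, not numbers from any vendor.

```python
def transfer_days(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Estimate the days needed to send data_tb terabytes over a network link.

    utilization is the share of the link the transfer may use (an assumption;
    archive traffic is usually deprioritized behind production traffic).
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("need data_tb >= 0, link_gbps > 0 and 0 < utilization <= 1")
    bits = data_tb * 1e12 * 8                        # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400                          # seconds per day


if __name__ == "__main__":
    # Hypothetical link speeds, each with half the bandwidth available.
    for gbps in (0.1, 1, 10):
        days = transfer_days(100, gbps, utilization=0.5)
        print(f"100 TB over {gbps} Gbps at 50% utilization: {days:.1f} days")
```

On these assumptions, 100 TB takes more than nine days even on a fully dedicated 1 Gbps link, which is the kind of gap that makes shipping drives attractive.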

Amazon and Google both offer their variations on sneakerware (Amazon’s is called Snowball), and Azure also has its version, the Data Box, with a 100 TB capacity. FedEx will transport your Azure Data Box to the cloud, and upload it for you – it makes business sense for the Redmond giant, of course.

The company has now announced the Data Box Heavy, with a one-petabyte capacity (real-world results offer 800 TB), which is now in preview.

The press pictures show a metal box on castors with a handle to aid transportation – there’s no mention, however, of how heavy a Data Box Heavy might be.

The company’s website states:

“Data Box Heavy is ideally suited to transfer data sizes larger than 500 TB in scenarios with limited to no network connectivity. The data movement can be one-time, periodic, or an initial bulk data transfer followed by periodic transfers.”

The Box connects via a 40 Gbps interface, and its contents are AES 256-bit encrypted, in case of a careless misplacing of the hardware.
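Even at 40 Gbps, filling a box of this size is not instant. As a rough lower bound, assuming the interface runs at full line rate with no protocol or disk overhead (real-world loads will be slower), the arithmetic looks like this; the function name is made up for the example.

```python
def min_fill_hours(capacity_tb: float, interface_gbps: float) -> float:
    """Best-case hours to fill a device: capacity divided by raw line rate."""
    if capacity_tb < 0 or interface_gbps <= 0:
        raise ValueError("need capacity_tb >= 0 and interface_gbps > 0")
    bits = capacity_tb * 1e12 * 8            # decimal terabytes to bits
    return bits / (interface_gbps * 1e9) / 3600


if __name__ == "__main__":
    # 800 TB is the real-world figure quoted above; 40 Gbps is the interface speed.
    print(f"{min_fill_hours(800, 40):.1f} hours")
```

That works out to a little under two days of continuous copying in the best case.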

Also announced is the Data Box Edge, a one-unit (1U) rack-mounted array of 12 TB capacity, with onboard compute optimized for AI algorithms, designed – as the name suggests – for edge deployments. The box runs Microsoft’s open source IoT Edge software so that organizations can develop their own customized applications.

Unlike the other Data Boxes, the Edge unit isn’t designed to be sneakerware-d anywhere. Instead, the idea is that the device can collate and process data and send the resulting (and presumably smaller) data to Azure.
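The general pattern of processing at the edge and uploading only the result can be sketched in a few lines. This is plain Python, not the IoT Edge API; the function name, the record fields, and the window size are all hypothetical, and a real deployment would choose its own reduction.

```python
from statistics import mean


def summarize(readings: list[float], window: int) -> list[dict]:
    """Collapse raw readings into one small summary record per window."""
    if window <= 0:
        raise ValueError("window must be positive")
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append({
            "count": len(chunk),
            "min": min(chunk),
            "max": max(chunk),
            "mean": mean(chunk),
        })
    return summaries


if __name__ == "__main__":
    raw = [20.1, 20.3, 20.2, 35.9, 20.0, 20.4]   # made-up sensor readings
    for record in summarize(raw, window=3):
        print(record)                            # only these records would be uploaded
```

Here six raw readings become two records; at sensor scale, the same reduction is what keeps the upload small.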

The Edge unit contains field-programmable gate arrays (FPGAs), which can be adjusted to deliver high performance for specific purposes. Microsoft has previously installed FPGAs in its own servers, for Azure as well as its Bing search engine.

For businesses that need to prioritize data flows to and from the internet, this may be a good choice. By doing much of the “grunt work” in-house and pushing data to the cloud only once it’s known to hold value, they can reduce the impact of those transfers on the connectivity of mission-critical applications.