The challenges of unstructured data

Tackling unstructured data is a Herculean challenge.

18 October 2022

Tony Fyler

@more__hybrid

fyler@hybrid.co

A beach is like a company’s unstructured data. Could YOU categorize every grain?

The business world doesn’t like to be surprised – especially not by extremely complex, potentially expensive problems. Which is why the problem of unstructured data is so fundamentally vexing. Most medium-to-large companies already hold a huge amount of unstructured data, and it’s getting bigger every day. Knowing about it, categorizing it, storing it, and eventually even monetizing it is a whole sequence of enormous headaches that business could do without. But if you get all that right, it can prove to be like finding gold in your basement – so you can’t afford to just sweep unstructured data under the carpet.

We sat down with Krishna Subramanian, President at Komprise, a company that specializes in unstructured data management, to explore the scale of the challenge.

THQ:

So – what are the challenges that companies are facing with their unstructured data?

KS:

Well, 85% of the data in the world today is unstructured data. That’s all the data that doesn’t sit in a database somewhere – everything from photos on your phone to X-rays in your medical records, to videos on TikTok to genomic sequence files. And it wasn’t always this way – if it had been, we’d have had protocols in place for it long before now.

The second big issue begins a kind of domino effect. Because companies don’t know what the data is. Or where it is. Or how many times they have the same data. Or what data’s important. Or what could be hot for monetization, and what’s cold. In fact, most companies are still treating all their unstructured data as though it’s the same sort of data, when it clearly isn’t. But – and here’s the crunch – if you don’t know what you have and where it is, you have to treat it all the same, because you don’t know what might be important.

Which is why lots of companies (68% of our survey respondents) are spending over 30% of their IT budget on unstructured data management. Relatively suddenly – again, this wasn’t an issue a handful of years ago. If you’re an IT department trying to fend off a growing cybersecurity threat, and 30% of your budget is suddenly being siphoned off to manage your unstructured data, probably fairly poorly, your company’s in potential trouble.

Because that’s the real problem with treating all unstructured data as if it’s hot. 80% of it is actually cold. Think about it in your own life. You probably take a lot of videos of your kids or pictures on your cellphone. We all do that these days. But how often do you go and look at every photo? You don’t look at all of them all the time. A lot of that data is cold data.

So, it could be better managed. What if it could sit in storage in the cloud, but look like it’s still on your phone? You can see the thumbnail on your phone, and whenever you want the photo, you can click on it and get it, but it’s not eating up all the storage on your phone, right? It’s the same thing for companies.

So knowing what you have, knowing what’s important and what isn’t, knowing what’s being used and what isn’t and how much it’s costing you is the first step.
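As an editorial illustration of that first step (not a description of Komprise’s product), here is a minimal Python sketch that walks a directory tree and totals how much data looks hot and how much looks cold. The `scan` function, the `Inventory` class, and the one-year threshold are all hypothetical choices made for this example. It also assumes that a file’s last-modified time is a reasonable stand-in for "being used", which will not hold for every workload.

```python
import os
import time
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 86_400


@dataclass
class Inventory:
    """Running totals for files judged hot or cold (hypothetical structure)."""
    hot_bytes: int = 0
    cold_bytes: int = 0
    hot_files: int = 0
    cold_files: int = 0

    @property
    def cold_fraction(self) -> float:
        """Share of total bytes that is cold; 0.0 when nothing was scanned."""
        total = self.hot_bytes + self.cold_bytes
        return self.cold_bytes / total if total else 0.0


def scan(root: str, cold_after_days: int = 365,
         now: Optional[float] = None) -> Inventory:
    """Walk `root` and classify each file by its last-modified time.

    Assumption: a file not modified for `cold_after_days` days is cold.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    inv = Inventory()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                inv.cold_bytes += st.st_size
                inv.cold_files += 1
            else:
                inv.hot_bytes += st.st_size
                inv.hot_files += 1
    return inv
```

Calling `scan("/some/share")` on a path of your own returns the totals, and `cold_fraction` gives the share of bytes that has not been modified within the threshold. A real inventory would also need to handle duplicates, ownership, and per-tier storage cost, none of which this sketch attempts.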

The second step to solving unstructured data management is to mobilize your data efficiently. Unstructured data is so big a ‘thing,’ with so many file types and so many sizes, that moving it from one place to another isn’t all that easy. Doing it manually is in no way time-efficient – it would build up faster than you could move it. So you need some sort of automated process to do it for you, one that can adapt to any of the multiple networks you might be using in the silos where your unstructured data is stored. That’s a step in the right direction towards data mobilization.
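To make the idea of an automated process concrete, here is a deliberately small Python sketch, offered as an editorial illustration and not as any vendor’s method. The `tier_cold_files` function and its parameters are hypothetical. It assumes both locations are ordinary filesystem paths, that the archive location sits outside the source tree, and that last-modified time decides what counts as cold. Production tools also have to deal with network failures, permissions, and data still in use, which this does not.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86_400


def tier_cold_files(source_root, archive_root, cold_after_days=365, dry_run=True):
    """Move files not modified for `cold_after_days` days into `archive_root`.

    Relative paths are preserved under the archive. With dry_run=True
    (the default) nothing is moved; the function only reports what it
    would move. Assumption: archive_root is outside source_root.
    """
    cutoff = time.time() - cold_after_days * SECONDS_PER_DAY
    selected = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime >= cutoff:
                continue  # modified recently, so leave it in place
            rel = os.path.relpath(src, source_root)
            if not dry_run:
                dst = os.path.join(archive_root, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
            selected.append(rel)
    return sorted(selected)
```

Running it first with `dry_run=True` gives a list of candidate files to review before anything is actually moved.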

What about security? 80% of your unstructured data may be cold data as far as you’re concerned, but it’s still your data – you want it secured, so that no one else can steal it and potentially monetize it in ways you’ve never imagined.

THQ:

So the bottom line is, you need a set of automated data management tools to be able to deal with unstructured data even remotely effectively in 2022?

KS:

We think so. That’s why we built some.

In Part 2 of this article, we’ll dig deeper into the complexities of using the right tools to handle your unstructured data – and, essentially, how to sell the expense of unstructured data management to the C-Suite.