The challenges of unstructured data

Tackling unstructured data is a Herculean challenge.
18 October 2022

A beach is like a company’s unstructured data. Could YOU categorize every grain?

The business world doesn’t like to be surprised – especially not by extremely complex, potentially expensive problems. Which is why the problem of unstructured data is so fundamentally vexing. Most medium-to-large companies already hold a huge amount of unstructured data, and it’s getting bigger every day. Knowing about it, categorizing it, storing it, and eventually monetizing it is a whole sequence of enormous headaches that businesses could do without. But if you get all that right, it can prove to be like finding gold in your basement – so you can’t afford to just sweep unstructured data under the carpet.

We sat down with Krishna Subramanian, President at Komprise, a company that specializes in unstructured data management, to explore the scale of the challenge.


So – what are the challenges that companies are facing with their unstructured data?


Well, 85% of the data in the world today is unstructured data. That’s all the data that doesn’t sit in a database somewhere – everything from photos on your phone to X-rays in your medical records, to videos on TikTok to genomic sequence files. And it wasn’t always this way – if it had been, we’d have had protocols in place for it long before now.

Even ten years ago, people wouldn’t have thought of unstructured data as data per se, and there was vastly less of it about. It’s grown very rapidly in the age of the smartphone, the cloud, and the ubiquitous video.

That means it’s caught everybody off guard.

So it’s a problem of suddenness and scale?

To some extent yes. We ran a survey of practitioners last year, and we ran it again this year, and even between the two, the difference is startling. This year, more than half the respondents said they’re dealing with over five petabytes of data. And a petabyte is 1,024 TB.

So, say 500 laptops full. To the brim.


Yeah. Certainly, it’s the equivalent of around 10 billion photos. Per organization.

So to us, that’s the first and biggest challenge of unstructured data – the rapid growth of the issue.
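As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. The assumptions here are ours, not the interviewee's: we read the "500 laptops" comparison as applying to one petabyte, and the "10 billion photos" comparison as applying to the full five petabytes.

```python
# Back-of-the-envelope check of the storage figures quoted in the interview.
TB = 1024 ** 4   # bytes in a terabyte (binary convention, matching "1,024 TB")
PB = 1024 * TB   # bytes in a petabyte

total_bytes = 5 * PB           # "over five petabytes"
total_tb = total_bytes // TB   # the same amount expressed in terabytes

# Assumption: the "500 laptops" comparison refers to ONE petabyte.
laptop_tb = (PB / TB) / 500

# Assumption: "10 billion photos" refers to the full five petabytes.
bytes_per_photo = total_bytes / 10_000_000_000

print(f"5 PB = {total_tb:,} TB")
print(f"1 PB across 500 laptops = {laptop_tb:.2f} TB each")
print(f"5 PB across 10 billion photos = {bytes_per_photo / 1000:.0f} KB each")
```

Under those assumptions the comparisons imply laptops of about 2 TB each and photos of a little over half a megabyte each, so both are best read as order-of-magnitude illustrations rather than exact conversions.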

The second big issue begins a kind of domino effect. Because companies don’t know what the data is. Or where it is. Or how many times they have the same data. Or what data’s important. Or what could be hot for monetization, and what’s cold. In fact, most companies are still treating all their unstructured data as though it’s the same sort of data, when it clearly isn’t. But – and here’s the crunch – if you don’t know what you have and where it is, you have to treat it all the same, because you don’t know what might be important.

Which is why lots of companies (68% of our survey respondents) are spending over 30% of their IT budget on unstructured data management. Relatively suddenly – again, this wasn’t an issue a handful of years ago. If you’re an IT department trying to fend off a growing cybersecurity threat, and 30% of your budget is suddenly being siphoned off to manage your unstructured data, probably fairly poorly, your company’s in potential trouble.


The double jeopardy of a rising data management threat and potentially underfunding your IT department in everything else it’s called on to do.



So all of that together is really the biggest problem. How do you manage the root of this data? Without treating all this data as if it’s the same, and importantly, without interrupting the user experience.


Okay – how do you do that?


It all starts with visibility. It’s very difficult to solve a problem you can’t see, or can’t fully understand. So, step 1 – find out what you have.


As easy as that?


As difficult as that. Because unstructured data is piling up in many different places, all the time. And so it’s very hard to get an inventory of it, because it might be sitting in different storage systems, it might be sitting inside or behind applications, or it might be piling up in the cloud.

So the first thing you need is something that can give you visibility across all the silos, and show you exactly how much unstructured data your organization has, how fast it’s growing, what is hot and important, what people are actually using, and what data is cold.

Because that’s the real problem with treating all unstructured data as if it’s hot. 80% of it is actually cold. Think about it in your own life. You probably take a lot of videos of your kids or pictures on your cellphone. We all do that these days. But how often do you go and look at every photo? You don’t look at all of them all the time. A lot of that data is cold data.

So, it could be better managed. What if it could sit in storage in the cloud, but look like it’s still on your phone? You can see the thumbnail on your phone, and whenever you want to click on it, you can get it, but it’s not eating up all the storage on your phone, right? It’s the same thing for companies.

So knowing what you have, knowing what’s important and what isn’t, knowing what’s being used and what isn’t and how much it’s costing you is the first step.
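To make that first step concrete, here is a minimal sketch of a hot/cold inventory for a single directory tree. It is an illustration only, not a description of how Komprise or any other product works. The function name `inventory`, the one-year threshold, and the rule "last used is the later of the access and modification times" are all assumptions chosen for the example. Many file systems also update access times lazily or not at all, so real tools need better signals.

```python
import os
import time


def inventory(root, cold_after_days=365, now=None):
    """Walk `root` and total file counts and bytes for hot vs cold data.

    Hypothetical helper: a file counts as cold if it has been neither
    accessed nor modified within `cold_after_days`.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # seconds per day
    totals = {
        "hot": {"files": 0, "bytes": 0},
        "cold": {"files": 0, "bytes": 0},
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Assumption: "last used" is the later of access and modification time.
            last_used = max(st.st_atime, st.st_mtime)
            label = "cold" if last_used < cutoff else "hot"
            totals[label]["files"] += 1
            totals[label]["bytes"] += st.st_size
    return totals
```

Calling `inventory("/some/share")` returns file counts and byte totals for each label. Running the same scan across every storage silo, and again over time to measure growth, is where the real difficulty described above comes in.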

The second step to solving unstructured data management is to mobilize your data efficiently. Unstructured data is so big, and it comes in so many file types and sizes, that moving it from one place to another isn’t all that easy. Doing it manually is in no way time-efficient – it would build up faster than you could move it. So you need some sort of automated process to do it for you, one that can adapt to any of the multiple networks you might be using in the silos where your unstructured data is stored. That’s a step in the right direction towards data mobilization.
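As a toy illustration of automating that kind of movement, the sketch below moves cold files from one directory tree to another while preserving their relative paths. The function name `archive_cold_files`, the one-year threshold, and the assumption that the archive location lives outside the source tree are ours. A production tool would also need to cope with network protocols, permissions, retries, and leaving something behind so users can still find their files, none of which is attempted here.

```python
import os
import shutil
import time


def archive_cold_files(source_root, archive_root, cold_after_days=365, now=None):
    """Move files unused for `cold_after_days` from source_root to archive_root.

    Hypothetical helper. Relative paths are preserved, and the list of
    relative paths that were moved is returned. Assumes archive_root is
    NOT inside source_root.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # seconds per day
    moved = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            st = os.stat(src)
            if max(st.st_atime, st.st_mtime) >= cutoff:
                continue  # still hot, leave it in place
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(archive_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
            moved.append(rel)
    return moved
```

Scheduling a function like this to run regularly is the simplest form of automation. The harder part, as the interview suggests, is doing it across many silos without interrupting the people who use the data.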

What about security? 80% of your unstructured data may be cold data as far as you’re concerned, but it’s still your data – you want it secured, so that no one else can steal it and potentially monetize it in ways you’ve never imagined.


So the bottom line is, you need a set of automated data management tools to be able to deal with unstructured data even remotely effectively in 2022?


We think so. That’s why we built some.


In Part 2 of this article, we’ll dig deeper into the complexities of using the right tools to handle your unstructured data – and essentially, how to sell the expense of unstructured data management to the C-Suite.