BIG DATA

The tools to manage unstructured data

With the right tools, you can turn a liability into a significant asset.

18 October 2022

Tony Fyler

Unstructured data, managed – from beach to sand garden.

In Part 1 of this article, we explored the nature and the scale of the challenge businesses face with the rise and rise of unstructured data. We sat down with Krishna Subramanian, President at Komprise – a specialist in unstructured data management – to explore that challenge.

While we had her in the chair, we asked Krishna to expand on the ways to turn the relatively unknown chaos of unstructured data into the order of managed unstructured data – the principles at work, and the tools that make it possible.

THQ:

You mentioned that the problems and opportunities of unstructured data had taken the business world by surprise.

KS:

Analysis is an element of it. Can you assess what’s in these environments? Can you give visibility? Can you help someone plan? Then there’s the question of whether you can deliver policy-based automated data movement, so you don’t have to babysit the solution. You need to be able to just say “I want data moved here, and I want its lifecycle managed like this.” So you need data analysis tools, data mobilization tools, and then data extraction tools, because ultimately, why are you keeping all this data around?

Again, in our survey, 43% of respondents said they want to give more self-service to their departmental users for unstructured data. And if your unstructured data is a mess, your users probably don’t even know it’s there. So how do you make it easier for people to search and call and find the data that’s interesting, and then use it in a big data application or AI or ML application, so you can monetize this data better?

Those are the different ways in which unstructured data management is evolving.

THQ:

How does one get visibility on a data issue of this size, which continues to grow?

KS:

You need to have a standards-based solution. Storage environments all speak some common languages these days. There are file languages like NFS and SMB, and there are object languages like Amazon S3.

So, if your tools can talk to various storage environments in common languages, they can look at what’s inside those environments, and give analysis. And if you can do that, then you don’t need a proprietary solution for every environment. You can have an independent solution that works with your whole data center and your cloud accounts.

That should show you how much data you have, how much is hot, how much is cold, who’s using it, all those things. The information is there in the metadata of all these files and objects, but you need a query engine that can look it up across these environments. That’s how you can solve that problem.
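As a rough illustration of the metadata idea described here (and not a representation of how Komprise's product works), the sketch below walks a file share that is already mounted locally, for example over NFS or SMB, and totals bytes by "hot" versus "cold" and by owner. The function name `summarize_tree` and the 365-day threshold are assumptions made for this example only.

```python
import os
import time
from collections import defaultdict

# Hypothetical threshold: files untouched for a year count as "cold".
COLD_AFTER_DAYS = 365


def summarize_tree(root, cold_after_days=COLD_AFTER_DAYS, now=None):
    """Walk a locally mounted share and total bytes by temperature and owner.

    Only file metadata (size, timestamps, owner id) is read; file
    contents are never opened.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400
    summary = {
        "hot_bytes": 0,
        "cold_bytes": 0,
        "bytes_by_owner": defaultdict(int),
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata only, do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the most recent of access/modify time as "last used".
            last_used = max(st.st_atime, st.st_mtime)
            key = "cold_bytes" if last_used < cutoff else "hot_bytes"
            summary[key] += st.st_size
            summary["bytes_by_owner"][st.st_uid] += st.st_size
    return summary
```

Calling `summarize_tree("/mnt/share")` on a mounted share (the path is a placeholder) returns a dictionary of totals. Access times can be unreliable on volumes mounted with options such as `noatime`, which is one reason a real tool would need more than this.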

THQ:

Which is what you do.

KS:

Which is what we do.

It’s about taking the chaos and giving the customer control. They can say “Here are my data centers, and here are my cloud accounts,” and we will find all the storage environments, find the data that’s sitting in them, organize it by who has it, how fast it’s growing and so on.

And then the customer gets to set policies. Anything over three years old, maybe write a policy that it goes to Amazon Glacier. Anything that’s really hot and important, write a policy to put it on your most expensive flash storage, so people can really get value out of it. Other, less important data, you can write a policy to put it into standard long-term storage. You set these policies, we move the data according to your policies so it’s in the right place, and we move it transparently.
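The policy examples in this answer can be sketched as an ordered list of rules, where the first matching rule decides the storage tier. This is a minimal illustration only: the `FileRecord` and `Policy` types, the tier names, and the "100 reads in 30 days" definition of hot data are all assumptions for the example, not anything the interview specifies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical per-file metadata gathered by an earlier scan."""
    path: str
    age_days: int
    reads_last_30_days: int


@dataclass(frozen=True)
class Policy:
    """A named rule: if `matches` is true, the file belongs on `target_tier`."""
    name: str
    matches: Callable[[FileRecord], bool]
    target_tier: str


# Rules are checked in order; the first match wins.
POLICIES: List[Policy] = [
    # "Anything over three years old" goes to an archive tier.
    Policy("archive-old", lambda f: f.age_days > 3 * 365, "archive"),
    # Assumed definition of "really hot": at least 100 reads in 30 days.
    Policy("hot-on-flash", lambda f: f.reads_last_30_days >= 100, "flash"),
]

# Everything else lands on standard long-term storage.
DEFAULT_TIER = "standard"


def choose_tier(record, policies=POLICIES, default=DEFAULT_TIER):
    """Return the tier named by the first policy that matches the record."""
    for policy in policies:
        if policy.matches(record):
            return policy.target_tier
    return default
```

Because the first match wins, a file that is both old and frequently read would be archived under this ordering. Swapping the two rules would keep it on flash, which shows why rule order is itself a policy decision.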

THQ:

So what you’re essentially doing is taking a massive data problem and creating a new data architecture, governed by the policies that the client wants, once they’re aware of all the unstructured data they own?

KS:

Exactly. We’re enabling them to non-disruptively evolve their data architecture.

THQ:

What do you think is the overall mood of the market when it comes to unstructured data?

KS:

I think it’s a very exciting time in the market. Because even though this problem kind of crept up on people, there is a lot of innovation happening in terms of how to solve it. And for me, what’s most exciting is AI, because AI and ML actually require unstructured data, not structured data. To have really advanced AI or ML you need unstructured data management, because you have to bring unstructured data into those systems. So it feels as though there are multiple market forces at play that make this a very exciting area of innovation.

THQ:

An unstructured data gold rush, even.

KS:

Yes, it is. It really is.