The tools to manage unstructured data
In Part 1 of this article, we explored the nature and the scale of the challenge businesses face with the rise and rise of unstructured data. We sat down with Krishna Subramanian, President at Komprise – a specialist in unstructured data management – to explore that challenge.
While we had her in the chair, we asked Krishna to expand on the ways to turn the relatively unknown chaos of unstructured data into the order of managed unstructured data – the principles at work, and the tools that make it possible.
You mentioned that the problems and opportunities of unstructured data had taken the business world by surprise.
The challenges of unstructured data
Yes, everyone was looking at how to deal with structured data as it grew and grew, and now suddenly there’s a new data problem to tackle. Everything we hadn’t exactly realized was data because it didn’t fit in a database – suddenly it’s important.
We discussed the importance of having data visibility in Part 1. We imagine that’s the first tool you need, the tool that shows you what you have and where it is. And you mentioned that there was an issue with moving data?
Yes, that was a big finding from the survey we ran on data management. Around 43% of companies are trying to move data to the cloud. How do you do that without disrupting either the company’s operation or the user’s experience?
We mentioned the real-world example of hosting all your photos on your cellphone, and the potential of instead hosting them all in low-cost cloud storage in a way that still, to you the cellphone owner, looked and felt as though you were accessing them directly on your phone, but which didn’t cost you storage space on your phone. Imagine that on a company-wide scale, and that’s the sort of tool you need to build: something that can pull that off without anyone noticing except the CFO, who sees the reduced storage bill.
The trick is to move the data to its optimal storage location without disrupting users. Because if they know about it, if it impinges on their experience, they will resist. They won’t want their data moved, and they certainly don’t want all their applications to suddenly start breaking. They don’t want to go looking for a particular file, and suddenly not be able to find it because you’ve moved it to somewhere else.
And they especially won’t want it moved as soon as you say you have to move it.
Right, exactly. You need a transparent moving process that causes no disruption to what the users are doing. The data looks like it’s still local, but it can actually be sitting somewhere else.
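To make the idea of a transparent move concrete, here is a minimal Python sketch. It is an illustration only, not a description of how Komprise or any other product works: it assumes a POSIX file system where symbolic links are available, and the function name `tier_transparently` and the `archive_dir` parameter are hypothetical.

```python
import os
import shutil


def tier_transparently(path: str, archive_dir: str) -> str:
    """Move `path` into `archive_dir` and leave a symlink at the original path.

    Assumption for illustration: the cheaper storage tier is mounted as a
    directory, and a symbolic link is an acceptable stand-in for the file.
    """
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path))
    if os.path.exists(target):
        # Refuse to overwrite; a real tool would need a collision-free layout.
        raise FileExistsError(target)
    shutil.move(path, target)
    # Users and applications keep opening the old path and follow the link.
    os.symlink(os.path.abspath(target), path)
    return target
```

After the call, opening the original path still returns the same contents, while the bytes themselves live in the archive directory.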
So, talk to us about the suite of tools we actually need to manage unstructured data.
For decades, there have been lots of products built for structured data. There’s a whole variety of tools to analyze structured data, to sort and place structured data in the right place, to run data lakes on structured data. We need similar tools for unstructured data.
People have always thought of unstructured data as a storage problem. “Oh, I’ll just buy the cheapest storage I can, and that will take care of it.”
But now, the data volumes are too large. It’s gone beyond being a storage problem. Realizing that data management needs actual data management tools for unstructured data is the first step. And those tools have to give you visibility.
Analysis is an element of it. Can you assess what’s in these environments? Can you give visibility? Can you help someone plan? Then there’s the question of whether you can deliver policy-based automated data movement, so you don’t have to babysit the solution. You need to be able to just say “I want data moved here, and I want its lifecycle managed like this.” So you need data analysis tools, data mobilization tools, and then data extraction tools, because ultimately, why are you keeping all this data around?
Again, in our survey, 43% of respondents said they want to give more self-service to their departmental users for unstructured data. And if your unstructured data is a mess, your users probably don’t even know it’s there. So how do you make it easier for people to search for and find the data that’s interesting, and then use it in a big data, AI or ML application, so you can monetize this data better?
Those are the different ways in which unstructured data management is evolving.
How does one get visibility on a data issue of this size, which continues to grow?
You need to have a standards-based solution. Storage environments all speak some common languages these days. There are file languages like NFS and SMB, and there are object languages like Amazon S3.
So, if your tools can talk to various storage environments in common languages, they can look at what’s inside those environments, and give analysis. And if you can do that, then you don’t need a proprietary solution for every environment. You can have an independent solution that works with your whole data center and your cloud accounts.
That should show you how much data you have, how much is hot, how much is cold, who’s using it, all those things. The information is there in the metadata of all these files and objects, but you need a query engine that can look it up across these environments. That’s how you can solve that problem.
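As a rough illustration of answering those questions from metadata alone, the Python sketch below walks a mounted file share and totals bytes by age and by owner. It is a simplified assumption-laden example, not any vendor's query engine: the names `FileRecord`, `scan` and `summarize` are hypothetical, "cold" is defined here as not modified within `cold_after_days`, and object stores such as Amazon S3 would need their own listing calls.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    owner_uid: int
    last_modified: float  # seconds since the epoch


def scan(root: str) -> Iterator[FileRecord]:
    """Yield metadata for every file under `root`; file contents are never read."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            yield FileRecord(path, st.st_size, st.st_uid, st.st_mtime)


def summarize(records: Iterable[FileRecord], now: float,
              cold_after_days: int = 365) -> dict:
    """Total bytes that are hot or cold, and bytes held by each owner."""
    cutoff = now - cold_after_days * 86400
    totals = {"hot_bytes": 0, "cold_bytes": 0, "bytes_by_owner": {}}
    for r in records:
        key = "cold_bytes" if r.last_modified < cutoff else "hot_bytes"
        totals[key] += r.size_bytes
        by_owner = totals["bytes_by_owner"]
        by_owner[r.owner_uid] = by_owner.get(r.owner_uid, 0) + r.size_bytes
    return totals
```

A call such as `summarize(scan("/mnt/share"), time.time())`, with `/mnt/share` standing in for any NFS or SMB mount and `time` imported, would give a first view of how much of that share is cold.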
Which is what you do.
Which is what we do.
It’s about taking the chaos and giving the customer control. They can say “Here are my data centers, and here are my cloud accounts,” and we will find all the storage environments, find the data that’s sitting in them, organize it by who has it, how fast it’s growing and so on.
And then the customer gets to set policies. Anything over three years old, maybe write a policy that it goes to Amazon Glacier. Anything that’s really hot and important, write a policy to put it on your most expensive flash storage, so people can really get value out of it. Other, less important data, you can write a policy to put it into standard long-term storage. You set these policies, we move the data according to your policies so it’s in the right place, and we move it transparently.
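The three example policies above can be sketched as a small rule table in Python. This is a hedged illustration rather than a real product configuration: the tier names, the `Policy` and `FileInfo` types, and the 30-day definition of "hot" are all assumptions made for the example, and the sketch only plans where data should go without moving anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

DAY = 86400  # seconds


@dataclass
class FileInfo:
    path: str
    last_access: float  # seconds since the epoch


@dataclass
class Policy:
    name: str
    matches: Callable[[FileInfo, float], bool]
    target: str  # hypothetical tier label


def age_days(f: FileInfo, now: float) -> float:
    return (now - f.last_access) / DAY


# Rules are checked in order; the first match wins.
POLICIES: List[Policy] = [
    Policy("older than three years", lambda f, now: age_days(f, now) > 3 * 365, "archive"),
    Policy("hot (assumed: used in last 30 days)", lambda f, now: age_days(f, now) <= 30, "flash"),
]
DEFAULT_TARGET = "standard"  # everything else goes to standard long-term storage


def plan_moves(files: Iterable[FileInfo], now: float,
               policies: List[Policy] = POLICIES) -> Dict[str, str]:
    """Return a mapping of file path to the tier its first matching policy names."""
    plan = {}
    for f in files:
        plan[f.path] = next(
            (p.target for p in policies if p.matches(f, now)), DEFAULT_TARGET
        )
    return plan
```

Keeping the rules as data rather than hard-coded branches means a policy can be added or reordered without touching the planning loop.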
So what you’re essentially doing is taking a massive data problem and creating a new data architecture, governed by the policies that the client wants, once they’re aware of all the unstructured data they own?
Exactly. We’re enabling them to non-disruptively evolve their data architecture.
What do you think is the overall mood of the market when it comes to unstructured data?
I think it’s a very exciting time in the market. Because even though this problem kind of crept up on people, there is a lot of innovation happening in terms of how to solve it. And for me, what’s most exciting is AI, because AI and ML actually require unstructured data, not structured data. To have really advanced AI or ML you need unstructured data management, because you have to bring unstructured data into those systems. So it feels as though there are multiple market forces at play that make this a very exciting area of innovation.
An unstructured data gold rush, even.
Yes, it is. It really is.