Data management specialist warns against data dangers of generative AI

It's 10pm in generative America. Do you know where your data is?
26 May 2023

OpenAI boss Sam Altman telling Congress that generative AI would need regulation. Source: ANDREW CABALLERO-REYNOLDS / AFP

• The rush to add generative AI to products brings data risks.
• Safeguards and regulations could be slow to rein the technology in.
• The potential for bias in the data intensifies over time.

Generative AI has been taking the world by storm since OpenAI launched ChatGPT in November 2022. There is no doubt that it represents a transformation in the way the world works. But there has also been significant backlash, particularly over training data and data usage. That backlash culminated in the Biden administration demanding that the generative AI industry implement standards and responsibilities for those using the tech – or have them imposed by government.

Krishna Subramanian, COO at data management specialist Komprise, says the sudden uptake of generative AI across the board can have serious consequences for companies’ ability to manage their data.

We sat down with her to find out what those consequences might be.


Let’s start with the sensational question: has generative AI been launched into the world too early? Before the data ramifications have been thought through, and before standards have been set in place?

I think it’s good that we got to see what generative AI could do, especially with things like ChatGPT, because we’ve been talking about AI and machine learning for many years, and some people were wondering whether it was all just hype. Seeing ChatGPT work, and being able to interact with it, has been extremely helpful in terms of quieting those doubters.

The problem is not that we shouldn’t try it out and understand it. The problem is people and companies rushing to have commercial products based on it. That’s where we’re moving prematurely.

Generative AI: everybody’s favorite new toy.


That’s the thing, though, isn’t it? Almost as soon as it was released, almost everybody found ways of using it somehow.

And that’s the danger. Especially in America, where you’re rewarded for taking risks and moving fast. Without any regulation, companies can move fast and innovate, so from a corporate perspective there’s no real downside to adding ChatGPT to any business model. Yet. And I think that’s the challenge.

We saw the meeting of the kings of generative AI at the White House, the takeaway from which seemed to be that if you’re going to make billions of dollars out of this technology, responsibilities need to come along with the rewards.

And of course, that was followed by Sam Altman of OpenAI talking to Congress and agreeing that regulation was needed, and then hearings in the Senate around AI.

The good news is that I think the open letter that a lot of executives sent to Congress saying that the government needs to step in has had an impact and people are looking into it. I think the problem might be that in general, regulations tend to follow after you’ve seen one or two bad things happen. And in the case of AI, it may be too late if it takes us that long to get regulations.

Generative AI: a rapidly evolving genie?

There’s no real way of putting this particular genie back in any kind of bottle. It’s just how we contain it somehow so that we know what it’s about to do, right?

Right. But we have contained other things before. We contained nuclear weapons. We know how to contain drug research that could be dangerous if done the wrong way.

So the idea of regulating something that can have a great impact is not a new concept. It’s just that the regulation has to happen very quickly. What’s new is the pace at which this is moving.

Absolutely – after all, just weeks ago, the tech giants were the news, with their generative AI models. Then open-source coders got their hands on LLaMA, and now suddenly they’re the news, with their smaller, more agile, more function-specific generative AI models.

Six months from now, you can place your bets on what’ll be changing the world in this area. So how is a regulatory regime that takes a year to create supposed to stay relevant, when the technology has outpaced it two or three times over in the time it’s taken to draw up?

“I’m afraid I can’t do that, Dave…”

Exactly. Part of the point, part of how generative AI can give us great benefits, is that it does things much faster than humans can. So it’s very powerful, but there’s also a lot of risk, because people don’t understand what it does and how it does it.

In particular, there’s a lot of risk around the data that generative AI is based on, because at the end of the day, it’s still a machine that’s learning from patterns, and it’s gleaning those patterns from the data that you give it. People forget that. Because it sounds so human, we attribute it with human qualities, but it’s really machine learning at the end of the day.

That’s the whole issue with China’s condemnation of the technology, isn’t it? They don’t want to have any version of it in the country that isn’t trained on solidly socialist principles. So what you’ll get when you ask it things is solidly socialist answers, which may not necessarily reflect “the truth,” as we see it in the West.

Exactly. That notion of bias in the data is very prevalent. And because it’s generative, because it’s generating new content, you tend to think it is also innovating and changing mindsets. But it’s not. It’s actually perpetuating the bias that already exists in the data that you fed it. It’s generating more and more of the same kind of data.
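That funnel effect can be sketched numerically. Below is a hypothetical toy model (not any real training pipeline): a single number tracks the share of a majority pattern in the data, and each “retraining” round over-represents the most common pattern via a simple sharpening map, standing in for a model that favors its modal output.

```python
# Toy bias-amplification loop: p is the share of a majority pattern "A"
# in the training data, starting at a modest 55%.
p = 0.55
history = [p]

for generation in range(10):
    # Each round, the "model" over-represents its most common pattern:
    # a sharpening map that pushes the majority share toward 100%.
    p = p**2 / (p**2 + (1 - p) ** 2)
    history.append(p)

print(history[0], history[-1])  # a slight 55% majority ends up near 100%
```

The specific map is illustrative, but the qualitative behavior matches the point in the interview: feeding a model more of its own most common output narrows the distribution over time rather than diversifying it.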


Leading to a funnel effect over time, where you continue to narrow the focus of the results you get?

Yeah, and people don’t get that. AI is a big area and we have been using AI already in a lot of commercial products. But standard, non-generative AI – the AI we’ve been using up to this point – is really more predictive automation or domain-specific machine learning, which is very narrow in its scope. And it’s also very deterministic.

So you can tell, based on your algorithm, what the outcomes are going to be. You can give it some objectives and you know how it’s going to hit those objectives. It’s just going to do it faster and automate things so that, as a human, you don’t have to do them.

So that kind of AI has already been in the market for a while. Most companies use it in some form or another. Almost every product out there has some element of it.

What is new is what we’re calling generative AI, where it’s learning something from the data that you give it. And it’s not deterministic.

It’s like a toddler learning something: even if you tell the toddler not to do something, they may listen to you 10 or 15% of the time – and the other 85 or 90% of the time they won’t. So that’s generative AI – it’s forming its own opinions.

I mean, what we consider opinions – they’re not really opinions; it’s basing its conclusions on high statistical likelihoods, drawn from patterns. It’s trying to predict the next pattern.
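As a minimal sketch of that “predict the next pattern” idea (a toy bigram counter, nothing like a production model): the program learns nothing but frequency patterns from the text it is given, then predicts whatever most often came next.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the "data that you give it".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which -- the statistical patterns in the data.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most likely next word after `word`."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("the"))  # "cat" follows "the" more often than "mat" or "fish"
```

The prediction isn’t an opinion: change the corpus and the answer changes with it, which is exactly the data-dependence the interview is warning about.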

Generative AI: a powerful toddler?

Like a toddler…

And it’s doing that on its own, through a neural network of some kind. And because it is doing that, we’re applying it to general domains like natural language or image management and things like that.

That is where we really need regulation, because when you apply something in the general domain, it has tremendous potential to deliver both positive and negative impacts.

And when you have something that’s not deterministic, you don’t know what it’s going to do.

And in that short sentence, you just described about half of the science fiction dystopian short stories that have been written in the last six months or so.

You don’t know what it’s going to do – and yet it’s almost everywhere, being put to a whole variety of uses, underpinning businesses across the world, using enormous quantities of company data in ways that are potentially obscure.

That’s why data management is going to be integral to the future of generative AI, and why we need some data safeguards in place.


In Part 2 of this article, we’ll explore the potential data issues that using generative AI opens you up to – and how, in practical terms, you can guard against them, while regulations are mulled by the powers-that-be.