Can generative AI ever be safe to use with proprietary data?

Samsung bumped into a key issue with generative AI. How can other companies avoid following suit?
17 July 2023

Generative AI – lots of fun, but hungry for your data.

• Generative AI can eat your proprietary data if you feed it.
• Many businesses are caught between the need to use it and the fear of losing their data rights.
• The right technology can make generative AI data-safe.

Since ChatGPT burst out of the chest of OpenAI and Microsoft in November 2022, it – and the wave of generative AI tools that have followed in its wake – has been rapidly adopted by businesses of every size, from SMEs to multinational enterprises.

But generative AI has also had significant hurdles to overcome in the first seven months of 2023. Italy briefly banned ChatGPT over data concerns. Google released details of Bard's training data, which turned out to be significantly less verifiably fact-based than businesses would – or should – comfortably accept at the heart of their operations.

Governments around the world have clamored for regulation of an industry that’s currently evolving too fast to effectively be regulated by traditionally slow-moving procedures (with China looking to leapfrog both the US and the EU in that regard).

But behind all the geopolitics and scaremongering, a very real issue has emerged. In April 2023, Samsung made an egregious – but at the time completely understandable – mistake with ChatGPT.

The company gave engineers in its semiconductor arm access to the generative AI and encouraged them to use it in the workplace, to see how generative AI might improve efficiency, streamline processes and generally make life better. In particular, given generative AI's democratizing effect on code-writing, Samsung was keen to find out whether the tool could help speed that process up.

What no one had considered until Samsung made its error is that if you feed source code – or indeed any confidential memo – into a generative AI like ChatGPT and ask it to perform wonders, the system can absorb that material and use it elsewhere, outside your company. Your confidential information, and potentially your proprietary code, becomes part of the generative AI, and you no longer have sole control over it.

Samsung’s experience of generative AI caused headlines.

While Samsung took its lumps and started developing an entirely in-house generative AI to learn some safer lessons from, the case highlighted a major potential flaw in the whole generative AI project for companies all around the world. If you couldn’t add real proprietary data to the system without losing control of the data forever, could you even use generative AI in any deep way to deliver insights?

Samsung clearly thought not – it ordered its employees not to use the technology within the workplace on the principle that it was once bitten, twice shy.

But a large part of the point of generative AI is its ability to help companies achieve insights that generate economies, connections, or profits through the application of new technology. If it couldn’t do that, would the business case for generative AI evaporate?

The answer is a fairly obvious “no.” Generative AI is something of an “everything engine” – the number and variety of ways it can find uses in the world are almost infinite. But as Samsung showed, the data-hunger of generative AI did create a significant stumbling block to its widespread use within companies on proprietary data.

We took that dilemma to Rich Davis, Head of Solutions Marketing at Netskope – a company that claims to have a world-first product that makes generative AI safe for exactly that Samsung-style use.

Generative AI safety.


A world first in securing generative AI for use in companies?


Yeah. The background is that, ever since our inception ten years ago, we’ve been focused on protecting data as it moves from users to SaaS apps. And really, generative AI is just another SaaS app.

As it started to appear, we were able to build a parser that understands the language that the client talks when it talks to ChatGPT. And that’s the core of what we’re doing. And what that’s allowed us to do is to get really good visibility into the usage and growth of not just ChatGPT, but all of the generative AI tools.
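Netskope hasn't published the internals of that parser, but as a rough, purely illustrative sketch of what "understanding the language the client talks to ChatGPT" could involve: an intercepted chat request is typically a JSON body, and pulling out the user-authored prompt text is what would let data-protection rules inspect it before it leaves the company. The JSON shape below mirrors OpenAI's public chat-completions format; the function name and logic are assumptions for illustration, not Netskope's implementation.

```python
import json

def extract_prompts(request_body: str) -> list[str]:
    """Return the user-authored message contents from a chat request body,
    so downstream DLP rules could scan them for sensitive material."""
    payload = json.loads(request_body)
    return [m["content"] for m in payload.get("messages", [])
            if m.get("role") == "user"]

# Example intercepted request body (OpenAI chat-completions shape):
body = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Review this source code: ..."},
    ],
})
print(extract_prompts(body))  # ['Review this source code: ...']
```

The point of visibility at this layer is that once the prompt text is isolated, ordinary data-loss-prevention matching (for source code, credentials, confidential memos) can run on it like on any other SaaS upload.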

And from there, we’ve been able to pinpoint where we should focus first, get an idea as to the types of usage that are growth industries, and from there, understand from our customers what they are actually trying to do, what they’re trying to solve, and where their biggest concerns lie, so we can build a component of our system that uses existing technology to enable the safe use of generative AI.


How many customers actually know what they’re trying to do with it? Or are they just trying to actively do something with it? Or, come to that, is it just growing into part of what they do?


I talk to customers who ask me for insight into what other customers are doing and what people are saying, and about 10% of our customer base has just outright blocked it – they’ve used that Samsung strategy.

A generative AI ban.

But when you say you’re just going to block all access to any generative AI tools, that becomes very problematic when you look at things like applications talking to applications – API-based access – because you can’t use your normal web gateway for that; it requires more advanced capabilities. So that’s the first thing – the companies that are just disengaging with it completely are missing out on capabilities they need.
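The gap Davis describes can be seen in a toy model: a naive web-gateway blocklist catches browser traffic to the well-known chat front ends, but an application calling a provider's API directly uses a different hostname entirely and sails past the list. The hostnames and function below are hypothetical illustrations, not any vendor's actual policy engine.

```python
# Hypothetical hostname blocklist of the kind a simple web gateway might use.
NAIVE_BLOCKLIST = {"chat.openai.com", "bard.google.com"}

def naive_gateway_blocks(host: str) -> bool:
    """Return True if a plain hostname blocklist would stop this request."""
    return host in NAIVE_BLOCKLIST

# Browser use of the chat front end is caught...
assert naive_gateway_blocks("chat.openai.com")
# ...but an application calling the API endpoint directly is not.
assert not naive_gateway_blocks("api.openai.com")
```

This is why blanket bans tend to leak: controlling app-to-app generative AI traffic needs inspection of the request content and destination category, not just a list of consumer chat URLs.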

But certainly, a lot of people have just made that snap decision, thinking “I don’t really know how this is going to impact me, so the safest option is to just not do anything.”

The problem that most companies worry about is that they don’t want to miss the boat, they don’t want their competition to use some of this technology to innovate, to get ahead of the game, and get a competitive advantage.


Because if your competitors can find a way to use it, and you can’t, chances are you’re going out of business.


Pretty much. So they want to enable usage, they want to allow their teams to start investigating the usage of these tools, to discover how they can be used, whether they’re using the broad brush ChatGPT, Bard and the like, or whether they’re trying to understand how they can start using open-source versions. That means you have to ask “How can I tailor this? How can I bring this in-house on my own dataset and make use of it that way?”

So you’ve got two different discussions ongoing. And people can’t really have the latter conversation without at least understanding the former.

The biggest trend I see is organizations that never wanted to use it… starting to use it, starting to understand it within their business units, but doing it in a safe way. The last thing they want is their core intellectual property being thrown in there.

So really, that’s the buzzword. It boils down to how we can safely allow people to use it. And the other thing most organizations really haven’t understood yet is the impact. Where’s this data going? Is the data I’m submitting being used to retrain the model at this point or not? Might it be used that way in the future?

The questionable accuracy of generative AI.

The third really interesting topic surrounding business use of generative AI is the question “Is what I’m getting back accurate? Can this actually be used negatively to poison results? Could this negatively affect my brand? Could the data I’m getting back actually be poisoned by a competitor or somebody else to negatively impact my business?”

Generative AI brings fundamental data questions for companies.

Generative AI brings a range of data questions to companies.


It’s a strange combination that’s gotten hold of the industry right now, isn’t it? The combination of fear, paranoia, and a kind of yearning to make use of something, because everyone else probably will.


Yeah, exactly.

The general media coverage hasn’t helped much, because anything that makes the mainstream news gets non-cyber folks worried. It’s been probably the biggest thing I’ve heard CISOs talk about – that the board is suddenly raising these cybersecurity issues with them. The board never talked to them about cybersecurity before, but suddenly, because this is in the news, and it’s a big thing, they want to understand what their policy is and how they’re using it.

Is it secure? How can they use it to help their business? People are scrambling to understand it, and nobody really wants to take six months to get a real handle on it, and then potentially miss the opportunity to jump ahead of their competitors.

Generative AI is hitting the thought processes of boards.

“What do you MEAN, cybersecurity?”


Hey, six months is an eternity in generative AI.


Ha. You’re not wrong.


In Part 2 of this article, we’ll take a deeper dive into the mechanics and the engineering of how Netskope’s new tool makes generative AI safe – and take a look at the ethics of the solution.

Generative AI – always hungry for data.