How could blockchain solve the AI copyright problem?

Can blockchain genuinely bring generative AI and copyright holders together?
29 January 2024

LLMs need to eat copyrighted material to have any function. Which could be tricky…

• AI companies are being sued for copyright infringement.
• The AI companies claim they couldn’t train their models without copyright material.
• Does blockchain offer a way forward?

As generative AI continues to grab headlines around the world, its future is not all rosy and certain. In particular, AI has a copyright problem. The companies behind large language models, such as OpenAI with ChatGPT, face legal battles over alleged copyright infringement. Crypto technologies may be the answer to AI’s issues, according to Grayscale CEO Michael Sonnenshein.

It may seem like generative AI is a juggernaut quickly on its way into uncharted realms of technology, but these lawsuits could put the brakes on its rapid development. A central issue is the allegation that AI companies use copyrighted content as free training material for their chatbots.

AI and the copyright problem

In December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft (part owner of OpenAI). It alleged that OpenAI used over a million New York Times articles to train its chatbots. This was the first instance of a major American media organization suing an AI company over copyright infringement, though computer programmers and novelists had previously filed copyright suits against various AI companies.

Other AI companies have faced legal action too, such as Stability AI, the startup behind the Stable Diffusion image generator. Last year, Getty Images sued the company, claiming it used copyrighted images from Getty’s library to help train the Stable Diffusion model. The case is set to go to trial in the UK this year.

OpenAI has responded to copyright violation claims by saying that copyrighted material is a key requirement for developing large language models. Without this material, it would be impossible to train AI chatbots, OpenAI told the UK Parliament’s House of Lords Communications and Digital Select Committee in December 2023.

According to OpenAI, copyright “covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents.” Therefore, the company argues that “it would be impossible to train today’s leading AI models without using copyrighted materials”. It’s clearly a matter of generative AI vs the media right now.

The blockchain factor

There may be a light at the end of the tunnel for generative AI, however. Michael Sonnenshein, CEO of Grayscale (an American digital currency asset management company), believes blockchain technology could be the solution to AI’s copyright woes. It could help create a fairer system, one that allows copyright owners to track when their material is used by large language models and other generative AI systems. That way, owners can be compensated fairly when their material is used in any shape or form by AI.
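The tracking-and-compensation idea can be made concrete with a small Python sketch. It is an illustration only, not a system Grayscale or anyone else has described: the usage log, the owner names, and the `split_royalties` function are all hypothetical, and the pro rata rule is just one possible way to share a payment pool.

```python
from collections import Counter

def split_royalties(usage_log, pool_cents):
    """Split a royalty pool pro rata by how often each owner's work was used.

    usage_log is a list of (owner, work_id) pairs, one per recorded use.
    Amounts are integer cents; any remainder from rounding down is left unpaid.
    """
    counts = Counter(owner for owner, _work_id in usage_log)
    total_uses = sum(counts.values())
    if total_uses == 0:
        return {}
    return {owner: pool_cents * uses // total_uses for owner, uses in counts.items()}

# Hypothetical usage records, as they might be read back from a shared ledger.
log = [("alice", "essay-1"), ("bob", "photo-7"), ("alice", "essay-2"), ("alice", "essay-1")]
print(split_royalties(log, 1000))  # {'alice': 750, 'bob': 250}
```

The hard part in practice is producing a trustworthy usage log in the first place, which is where a tamper-evident ledger would come in.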

How can AI function without infringing copyright?

Imagine trying to output a solution “in the style of Proust”…without knowing who Proust was. That’s the generative AI copyright dilemma.

Identifying the true copyright owner, authenticating information, and coping with the rise of deepfakes are just some of the challenges currently facing AI. The solution could be to ward off threats posed by one powerful technology, in this case generative AI, with another: blockchain.

Functioning as a shared digital ledger, blockchain enables the transparent sharing of information and makes recorded data very hard to manipulate after the fact. Blockchain may be best known as the engine that runs cryptocurrency, but it is already being used to improve transparency and the sharing of medical records in the healthcare industry. It has also become a key tool for tracing the food supply chain in agriculture. So teaming up blockchain with AI could potentially overcome various hurdles, enabling further AI development.
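Why a ledger like this resists manipulation is easier to see in code. The following is a minimal Python sketch of a hash-chained ledger, assuming a single in-memory list rather than a real distributed network; the block layout and the names `append_block` and `is_valid` are made up for illustration. Each block stores the hash of the one before it, so changing an old record breaks every later link.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block's contents."""
    payload = json.dumps({"index": index, "data": data, "prev_hash": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a record that is linked to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "data": data, "prev_hash": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True

ledger = []
append_block(ledger, {"work": "article-001", "owner": "Example Publisher"})
append_block(ledger, {"work": "photo-042", "owner": "Example Agency"})
print(is_valid(ledger))  # True

ledger[0]["data"]["owner"] = "Someone Else"  # tamper with an old record
print(is_valid(ledger))  # False
```

A real blockchain adds many independent copies of the ledger and a consensus rule on top of this, so that a tampered copy is rejected by the other participants rather than merely detected.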

Whether the tool is ChatGPT, Stable Diffusion, or Midjourney, a central question remains: who owns material generated by AI, such as an image made with Midjourney? There is a growing belief that issues of ownership and authenticity could be resolved if AI outputs are tied back to a blockchain or represented as tokens. Tokenizing AI-generated artwork or text could heighten security and trust by improving traceability and authentication. In theory, this could help resolve copyright concerns more quickly and easily.
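As a rough picture of what tying an output to a token could mean, here is a minimal Python sketch. It assumes a plain dictionary standing in for an on-chain registry, and the `OutputToken` record, `mint`, and `lookup` are hypothetical names rather than any real token standard. The token ID is simply a SHA-256 fingerprint of the content.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class OutputToken:
    token_id: str  # SHA-256 fingerprint of the generated content
    creator: str   # who registered the output
    model: str     # which AI system produced it

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def mint(registry: dict, content: bytes, creator: str, model: str) -> OutputToken:
    """Register a piece of content once; a second claim on the same bytes is refused."""
    token_id = fingerprint(content)
    if token_id in registry:
        raise ValueError("content already registered")
    token = OutputToken(token_id, creator, model)
    registry[token_id] = token
    return token

def lookup(registry: dict, content: bytes):
    """Return the token for this exact content, or None if it was never registered."""
    return registry.get(fingerprint(content))

registry = {}
image = b"...bytes of a generated image..."
mint(registry, image, creator="alice", model="hypothetical-model-v1")
print(lookup(registry, image).creator)   # alice
print(lookup(registry, image + b"x"))    # None
```

Note the limits of this approach: the fingerprint only matches byte-for-byte identical content, so a resized or re-encoded copy would not be found, and a registry entry records who made a claim first, not who legally owns the work.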

Whether blockchain and generative AI’s relationship is a happily ever after story or suffers a tragic Shakespearean fate is as yet unknown. But there are signs that they could grow quickly and powerfully together, benefiting generative AI models, copyright owners, and creators alike.