Blockchain won’t stop AIs stealing copyrighted work

AI might be clever - but is it art?
23 January 2024

“2014 – Panama Canal Transit – Dredger” by Ted’s photos – For Me & You is licensed under CC BY-NC-SA 2.0.

  • Generative AI copyright issues won’t be solved by blockchain.
  • Multiple lawsuits attempt to claw back owners’ rights.
  • AI companies claim use of ‘scraped’ materials is ‘fair use.’

The economic viability of machine learning as a service (MLaaS) is being stymied by a host of lawsuits, most notably against Anthropic and OpenAI. In all cases, owners of copyrighted material object to, and seek compensation for, the use of their work without permission to train machine learning models.

Generative AI models used by companies like OpenAI scrape vast amounts of data from the public internet, but the companies claim their methods constitute ‘fair use’ of publicly available materials. There are several legal arguments in play, including “volitional conduct,” which refers to the idea that a company accused of copyright infringement has to be shown to have control over the output of the disputed materials. In short, if you can get OpenAI to disgorge a line of poetry, verbatim, that’s been published under a notice of copyright (say, at the bottom of the web page it’s published on), OpenAI is in breach of copyright.

In the case of the New York Times’s action against OpenAI, the newspaper claims the ML engine crawled and absorbed millions of NYT articles to inform the popular AI engine, gaining a “free ride on the Times’s massive investment in its journalism,” according to the text of the lawsuit.

Generative AI copyright lawsuits

Similar cases have been brought against Midjourney and Stability AI (owners of Stable Diffusion) by Getty Images, who also cite copyright infringement of images they own the rights to. Class action suits have also been brought against DeviantArt, whose machine learning models produce images from users’ text prompts.

Giving evidence to the UK’s House of Lords Communications and Digital Select Committee, OpenAI claimed, “[…] it would be impossible to train today’s leading AI models without using copyrighted materials.” In this, the company admits it has trained models on materials legally owned by others, but maintains that doing so is ‘fair use.’

Toot illustrating generative AI copyright article.

Source: fosstodon.org

The case of Getty Images is particularly noteworthy. The company had steered clear of any AI image creation offering, citing “real concerns with respect to the copyright of outputs from these models and unaddressed rights issues with respect to the imagery, the image metadata, and those individuals contained within the imagery,” as Getty’s CEO, Craig Peters, put it in an article in The Verge in 2022.

However, the company announced ‘Generative AI by iStock’ at CES 2024, which draws on its library of images, claiming legal protection and usage rights for content creators. “You can rest assured that the images you generate, and license, are backed by our uncapped indemnification,” the company’s website now states, adding that it has “created a model that compensates […] content creators for the use of their work in our AI model, allowing them to continue to create more […] pre‑shot imagery you depend on.”

Enforcing copyright has always been problematic online, especially if the owner of published media lacks the backing of a phalanx of sharp-toothed lawyers, shoals of whom tend to congregate around large businesses and organizations rather than independent content creators. The choice for artists, musicians, and even part-time bloggers has always, since the internet began, been whether or not to publish. Put the message or media ‘out there,’ and it’s open to potential exploitation by others. Don’t publish digitally, and risk obscurity. Halfway houses like robots.txt files that state “No ML crawlers,” in the same way that it’s hoped search engines will not index website pages (“no follow, pleeease”), are a gamble that trusts the inherent good nature of the huge corporations controlling the ML models, like Microsoft in the case of OpenAI.

Because nobody ever got burned trusting huge corporations. Right?
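To make the robots.txt gamble concrete, here is a minimal sketch in Python using the standard library’s robots.txt parser. The file contents and the example.com URL are made up for illustration, and “GPTBot” is assumed here to be the user-agent token OpenAI publishes for its crawler. The point to notice is who runs the check: the crawler itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it asks one ML crawler to stay
# out and allows everyone else. "GPTBot" is assumed to be the token OpenAI
# publishes for its crawler; the site and page below are invented.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs on itself before
    requesting a page. Nothing on the server enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/poems/dredger.html"
    for agent in ("GPTBot", "SomeSearchBot"):
        verdict = "allowed" if may_fetch(agent, page) else "asked to stay away"
        print(f"{agent}: {verdict}")
```

A crawler that never calls anything like may_fetch, or that ignores the result, can still download the page. The file is a request, not a lock.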

Blockchain to solve copyright issues?

Speaking to Business Insider last week, the CEO of cryptocurrency trading specialist Grayscale, Michael Sonnenshein, suggested blockchain would be an immutable way of proving copyrighted material’s provenance. “[…] To us, it’s just so obvious that you need an irrefutable, immutable technology to marry [authenticity and ownership], to actually head-on address some of the issues, and that technology is blockchain, which underpins crypto[currency]. […] All of a sudden, issues like provenance and authenticity and ownership, etc., get resolved really, really quickly.”

There are three reasons why Sonnenshein’s assertions are fallacious. Firstly, we already have a means of ascertaining authenticity via blockchain: it’s a Ponzi scheme called NFTs, which are valued only by the idiots who trade in them. Secondly, blockchain publication has the potential to be ecologically disastrous. The coin mining industry for Bitcoin, Ether, and Monero produces carbon emissions equivalent to those of a medium-sized country each year. For example, each $1 value of Bitcoin produces $0.50 in environmental and health damage (primarily through air pollution from fossil fuel generation powering mining rigs). Thirdly, if we ensure creators are “properly compensated and credited for what they produce,” because blockchain tells us, without doubt, who they are, we have come full circle. To re-quote OpenAI’s statement in the UK’s House of Lords committee rooms:

“[…] it would be impossible to train today’s leading AI models without using copyrighted materials.” What the word “impossible” means, in context, is “too costly.” Could you imagine a world where big AI-as-a-service providers track down and pay every content creator on the internet for the use of their work to train AI models? No, neither can we.

As a content creator, the only sure-fire way to protect copyright is to digitally encrypt every item, or place it behind some kind of insurmountable barrier where it can’t be scraped. That means a paywall, or an equivalent walled garden, in front of every creator’s work. At a stroke, the internet – designed to be a place for the free and open interchange of ideas, knowledge, and perhaps art – becomes the victim of voracious machine learning algorithms controlled by global businesses.

The world in which generative AI and copyright peacefully co-exist may be attainable. What it won’t be is either cost-effective or easy to achieve.

Trawling the seas to illustrate generative AI/ML copyright discussion article.

“Golden Sky Trawler” by 4BlueEyes Pete Williamson is licensed under CC BY-NC-ND 2.0.