ChatGPT screening: OpenAI text classifier versus GPTZero app
What used to take hours of painstaking research and crafting can now be generated in seconds. AI writing tools have put an end to writer’s block – no more staring at a blank screen, wondering what to type. In sales, marketing, and other business communications, AI-generated text and writing suggestions offer a productivity boost. And the uplift in email open rates, read time, and other metrics sells the benefits.

King of the hill, currently, is OpenAI’s wildly successful chatbot, ChatGPT. And its text-completion API is attracting a wave of product developers, allowing a variety of apps to easily harness the powerful capabilities of large language models such as GPT-3, which has a mind-blowing 175 billion parameters.

But as machines speed ahead, humankind has some catching up to do, and some experts worry that AI-generated text makes it too easy to spread misinformation. The OpenAI text classifier, released just a few days ago, provides an online tool for checking whether articles have been written by humans or generated using AI. However, it may provide little comfort – at least based on the testing carried out by TechHQ.
Putting the OpenAI text classifier to the test
To understand the ability of AI content detectors to differentiate between human-written prose and machine-generated text, we fed a series of five different samples into the OpenAI text classifier. OpenAI cautions that its text classifier isn’t fully reliable in detecting AI content, particularly on shorter documents with fewer than 1,000 characters (200–250 words). All of the text samples tested against OpenAI’s screening tool were substantially longer than this lower limit.
Details on each of the test samples fed into the OpenAI text classifier are given below, together with the output results.
- Sample 1: news article from a publisher experimenting with using AI-writing tools as part of its content generation process.
‘The classifier considers the text to be possibly AI-generated.’
- Sample 2: news article from a publisher experimenting with using AI-writing tools as part of its content generation process.
‘The classifier considers the text to be unlikely AI-generated.’
- Sample 3: tech story generated by ChatGPT based on the prompt – ‘write an 800-word news item about telecommunications in the style of a human’.
‘The classifier considers the text to be possibly AI-generated.’
- Sample 4: 100% guaranteed human-written tech story published on TechHQ.
‘The classifier considers the text to be very unlikely AI-generated.’
- Sample 5: machine-generated email produced using an AI-powered sales enablement tool.
‘The classifier considers the text to be unclear if it is AI-generated.’
Analysing the results of AI content detection
It’s a small sample of results, but there are already trends that jump out. The OpenAI text classifier appears to lean towards cheerleading human-written content rather than calling out articles generated using a chatbot. OpenAI’s development team notes that the tool is configured to minimize the number of false positives – in other words, the number of times that human-written text is misclassified as being generated by AI. And these settings appear to make the OpenAI text classifier more cautious and less willing to point the finger, even for documents that are fully machine-generated, as was the case for sample 3.
Our results using the OpenAI text classifier aren’t isolated. The Poynter Institute, a US-based not-for-profit promoting responsible journalism, put itself in the shoes of potential bad actors and created a fake news website in minutes using available AI tools. Feeding examples of the machine-made content into OpenAI’s classifier tool, the digital judge remained on the fence, only considering the text to be ‘possibly’ AI-generated.
“I’m always skeptical about tech freak-outs,” commented Alex Mahadevan, Director of Poynter’s MediaWise program. “But, in just a few hours, anyone with minimal coding ability and an axe to grind could launch networks of false local news sites — with plausible-but-fake news items, staff and editorial policies — using ChatGPT.”
GPTZero to the rescue?
At this point in the story, we could do with some good news. And it may turn out to arrive in the shape of GPTZero, an AI content detector developed by Edward Tian – a computer scientist studying at Princeton University, US. GPTZero – which debuted on the data-app-sharing platform Streamlit and can now be found at gptzero.me – produced some strikingly honest results when tested using our sample data. The AI content detector highlights passages of text considered more likely to have been generated by a machine than written by a human. And the screening tool provides two accompanying scores dubbed ‘perplexity’ and ‘burstiness’. According to Tian’s descriptions, perplexity is a measure of the randomness of the input text, and burstiness indicates the variation in perplexity.
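GPTZero’s exact implementation isn’t public, but the two scores can be illustrated with a toy sketch. The snippet below uses a hypothetical unigram word-frequency model in place of a real language model: it computes perplexity as the exponential of the average negative log-probability of the tokens, and burstiness as the spread of per-sentence perplexity scores.

```python
import math
import statistics
from collections import Counter

def unigram_model(corpus_tokens):
    # Toy stand-in for a real language model: plain word frequencies.
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def perplexity(tokens, model, floor=1e-8):
    # exp of the mean negative log-probability; lower = more predictable.
    nll = -sum(math.log(model.get(t, floor)) for t in tokens) / len(tokens)
    return math.exp(nll)

def burstiness(sentences, model):
    # Spread (population standard deviation) of per-sentence perplexity.
    scores = [perplexity(s.split(), model) for s in sentences]
    return statistics.pstdev(scores)

corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = unigram_model(corpus)
print(perplexity("the cat sat on the mat".split(), model))
print(burstiness(["the cat sat on the mat", "a fox ran past"], model))
```

In a real detector the probabilities would come from a large language model rather than word counts, but the intuition carries over: text whose tokens the model finds highly predictable scores a low perplexity, and uniformly low per-sentence perplexity (low burstiness) is the tell-tale such tools look for.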
GPTZero isn’t perfect at differentiating AI-generated text from human prose – it can struggle if sentences are very short – but, in this writer’s opinion, it did a much better job than the OpenAI text classifier. The AI content detector doesn’t hold back in calling out text that it believes (based on perplexity and burstiness scores) to have been generated by a machine. Recall that OpenAI’s text classifier was ‘unclear’ on whether the sales enablement email (sample 5) had been generated by a human or an AI chatbot. In contrast, GPTZero cautioned that the sample text ‘may include parts written by AI’ and flagged suspect sentences in yellow.
The internal workings of GPTZero are a mystery, but plotting our test scores on a graph (see below), it’s clear that human-written text stands apart when comparing perplexity and burstiness scores for human versus machine. The observation could point to more predictable, more consistent behavior exhibited by AI-writing tools. After all, machine learning has a statistical basis, and GPT-3’s autocompleting skills are guided by picking the statistically most likely next words.
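That statistical guidance can be sketched with a toy bigram model – a hypothetical stand-in for GPT-3’s far larger machinery – which completes text by always picking the most frequent next word seen in its training data:

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    # Count which words follow which (toy stand-in for an LLM).
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def greedy_complete(model, start, length):
    # Always choose the statistically most likely next word.
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break  # no known continuation
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the fish".split()
model = bigram_model(corpus)
print(greedy_complete(model, "the", 4))
```

Because the same prompt always yields the same most-likely continuation, purely greedy generation is highly predictable – which is exactly the low-perplexity, low-burstiness signature that detectors try to exploit.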
Other AI content detectors – Writer.com and Copyleaks
Making predictions can be risky, but forecasting a growth in AI content detection feels like a safe bet. In addition to the OpenAI text classifier and GPTZero, readers wanting a third or fourth opinion on whether web pages, or other documents, have been machine-generated can utilize Writer.com’s AI Content Detector, as well as the AI content detector offered by Copyleaks, which is currently available in beta.
Running our test samples through both the Writer.com and Copyleaks AI content detection tools, the screening software did an excellent job of correctly identifying that sample 3 had been created using AI (ChatGPT). However, both tools still attributed a portion of the text to a human author, when in fact the human contribution was zero, aside from the initial text prompt fed into the advanced AI chatbot.
The 100% guaranteed human-written tech story passed with flying colors and was classified as ‘human text’. But the danger with screening tools, as OpenAI warns, is that AI-written text can be edited to evade detection. Also, even GPTZero could struggle if chatbots learn how to mimic the ‘perplexity’ and ‘burstiness’ of human writers, which can’t be ruled out with GPT-3’s successor likely whirring away somewhere in the background.
20 March 2023