ChatGPT’s Japanese discernable with 100% accuracy

8 September 2023

“an old Japanese language text book” by ken2754@Yokohama is licensed under CC BY-SA 2.0.

  • AI researchers in Japan spot 100% of fakes.
  • ChatGPT worst at faux language compared to competitors.

Unless you’ve been living the life of a mole or have taken refuge on North Sentinel Island (one of the most isolated islands in the world), you will know about OpenAI’s ChatGPT. The generative artificial intelligence (AI) chatbot became the “fastest-growing consumer application in history” earlier this year, according to a UBS report. This accolade was short-lived, however, as Meta’s Threads has since become the fastest-growing app.

Nonetheless, there’s no arguing with the fact that ChatGPT has become a worldwide phenomenon. Currently, the Natural Language Processing (NLP) model can generate text in around 95 languages.

AI researchers’ findings

While the chatbot initially appears to be fluent in many of these languages, a recent study conducted by AI researchers Wataru Zaitsu (associate professor of criminal psychology at Mejiro University in Tokyo) and Jin Mingzhe (professor of data science at Kyoto University of Advanced Science) found that ChatGPT’s generated Japanese texts can be distinguished from texts written by humans.

A Mastodon post concerning AI research in the educational sector.

Source: Mastodon

Studies like this one have become crucial as organizations, particularly those associated with education, have grown increasingly concerned about writing created by AI but passed off as human-authored. This includes academic papers, reports, legal documents, and university essays.

Steps have already been taken to distinguish AI-generated papers from human-written pieces. AI detectors, such as GPTZero and Winston AI, can be used to check that essays and papers are not artificially generated.

Researchers can already tell whether English papers are artificial or ‘real,’ but no Japanese texts had been subjected to such tests until Zaitsu and Jin’s recent research. They studied 72 Japanese-language psychology academic texts and compared them with a further 144 papers generated by ChatGPT.

The AI papers were produced using two ChatGPT versions, 3.5 and 4. (Three versions are currently available: Legacy 3.5, Default 3.5, and the most recent update, 4.)

Methods used by AI researchers Zaitsu & Jin

The study looked at specific stylistic features, with a view to finding any common sequences used by AI. The main stylistic features examined included the placement of commas and the patterns of parts of speech in the texts.

The researchers discovered that ChatGPT texts tended to place commas more often after “wa,” a postpositional particle. Not only that, but the prefix “hon” (meaning “the present,” as in “the present study”) was employed more frequently.
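To make these two cues concrete, here is a rough sketch of how they could be counted in a piece of Japanese text. It is not the researchers’ method: it assumes that “wa” corresponds to the character は and “hon” to the character 本, and it counts raw characters rather than running a proper morphological analysis, so it will also pick up は and 本 where they appear inside other words. The function names and the sample sentence are made up for illustration.

```python
def comma_after_wa_rate(text: str) -> float:
    """Share of 'は' characters immediately followed by the Japanese comma '、'.

    Assumption: every 'は' is treated as the particle "wa", which is only
    an approximation without a morphological analyzer.
    """
    total = text.count("は")
    if total == 0:
        return 0.0
    return text.count("は、") / total


def hon_per_1000_chars(text: str) -> float:
    """Occurrences of the character '本' per 1,000 characters of text."""
    if not text:
        return 0.0
    return 1000 * text.count("本") / len(text)


def style_features(text: str) -> dict:
    """Bundle both crude stylistic measurements for one text."""
    return {
        "comma_after_wa": comma_after_wa_rate(text),
        "hon_per_1000": hon_per_1000_chars(text),
    }


# Hypothetical sample: "This study examined stylistic features. The result is clear."
sample = "本研究は、文体の特徴を調べた。結果は明確である。"
print(style_features(sample))
```

In the sample, one of the two は characters is followed by a comma, so the first measurement comes out at 0.5. A real analysis would tokenize the text first so that only genuine particles and prefixes are counted.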

AI researchers find it's easy to discern fakes in Japanese texts.

“japanese text is cool” by rombocket is licensed under CC BY-ND 2.0.

Using these features, a machine learning classifier, a type of artificial intelligence algorithm, was able to differentiate between the human-written texts and ChatGPT’s with 100% accuracy.
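For readers curious about what such a classifier does with stylistic measurements, the sketch below shows the general idea using a nearest-centroid classifier, a deliberately simple stand-in and not the algorithm used in the study. The two features (comma-after-“wa” rate and “hon” per 1,000 characters) follow the cues described above, but every number in the training data is invented for illustration and is not taken from the research.

```python
from math import dist
from statistics import mean

# Invented toy feature vectors: (comma-after-"wa" rate, "hon" per 1,000 chars).
# These values are illustrative only and are NOT data from the study.
TRAINING = {
    "human": [(0.20, 1.0), (0.25, 1.5), (0.30, 0.8)],
    "chatgpt": [(0.60, 4.0), (0.70, 5.2), (0.65, 4.5)],
}


def centroid(rows):
    """Average each feature column to get one representative point."""
    return tuple(mean(column) for column in zip(*rows))


# One centroid per label, computed from the toy training data.
CENTROIDS = {label: centroid(rows) for label, rows in TRAINING.items()}


def classify(features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(CENTROIDS, key=lambda label: dist(features, CENTROIDS[label]))


print(classify((0.62, 4.8)))  # a text with many commas after "wa" and frequent "hon"
print(classify((0.22, 1.1)))  # a text with few of either
```

A production classifier would be trained on many more features and texts, would normalize features that sit on different scales, and would be evaluated on texts it has not seen before.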

The techniques used to analyze parts of speech and comma placement have previously been used in criminal investigations to determine or validate the origin of letters, notes, and other written documents. Zaitsu, a former researcher with the Toyama Prefectural Police, conducted such tests as part of the force’s criminal investigation laboratory.

Zaitsu suggests that, at first glance, the texts will present as if they were authored by humans. However, referring to AI-generated text, he stated, “if you rely on data, you can tell them apart from human-written texts without much difficulty.”

The findings are so conclusive that it is easier to distinguish AI text from human text than it is to tell different human authors apart.

ChatGPT is undoubtedly one of the most popular and widely used AI text generators out there right now, alongside other examples like Jasper AI. Zaitsu noted that other text-generating AI chatbots excel in producing Japanese texts that closely resemble human-written content, making them more challenging to distinguish.

With AI chatbots and technology continuing to improve, more methods will need to be put in place in settings where human input is essential. But the more AI technology develops, the harder it will be to tell the difference.

The full research paper can be found here.