Now that GPT-4 is here, what sets it apart from OpenAI’s GPT-3.5?

GPT-4 is more reliable and creative and can handle more nuanced instructions than GPT-3.5.
17 March 2023


San Francisco artificial intelligence company OpenAI has released GPT-4, the AI model that the company had been working on for most of 2022. The latest milestone in OpenAI’s effort to scale up deep learning, GPT-4 is strikingly good at writing essays and solving complex coding problems, among other tasks.

OpenAI was meant to release GPT-4 first, but when it was almost ready, to the surprise of many, the company decided to shelve its launch plans. Instead, OpenAI updated an unreleased chatbot that used a souped-up version of GPT-3, the company’s previous language model, released in 2020.

That was how ChatGPT, a chatbot built on GPT-3.5, came to be launched. The generative AI chatbot quickly became a global phenomenon. But just as the world was getting to grips with ChatGPT, OpenAI released its latest large multimodal model, which accepts image and text inputs and emits text outputs. “While less capable than humans in many real-world scenarios, [GPT-4] exhibits human-level performance on various professional and academic benchmarks,” OpenAI said in a blog post.

“A year ago, we trained GPT-3.5 as a first ‘test run’ of the system. We found and fixed some bugs and improved our theoretical foundations,” Tuesday’s blog post reads. As a result, OpenAI’s GPT-4 training run was “unprecedentedly stable,” making it the company’s first large model whose training performance could be accurately predicted ahead of time.

Interestingly, according to a blog post by Microsoft’s head of consumer marketing Yusuf Mehdi, those who had begun using the new Bing in preview in the last six weeks would already have had an early look at the power of the latest model.

GPT-4 vs. GPT-3.5

OpenAI said GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. GPT-4 even passed a simulated bar exam with a score around the top 10% of test takers – by contrast, GPT-3.5’s score was around the bottom 10%. 

“We’ve spent six months iteratively aligning GPT-4 using lessons from our adversarial testing program and ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails,” OpenAI said. The company also says the distinction between GPT-3.5 and GPT-4 can be subtle in a casual conversation. 

“The difference comes out when the complexity of the task reaches a sufficient threshold,” it added, describing GPT-4 as “more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.” However, OpenAI also warned that despite its capabilities, GPT-4 has limitations similar to those of earlier GPT models.

“Most importantly, it still is not fully reliable,” the blog post reads. GPT-4 still “hallucinates” facts and makes reasoning errors. “While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models. GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations,” OpenAI said.

The latest large multimodal model, however, generally lacks knowledge of events that occurred after the cutoff for the vast majority of its training data (September 2021), and it does not learn from its experience. OpenAI said it could sometimes make simple reasoning errors that do not seem to comport with its competence across so many domains, or be overly gullible in accepting obviously false statements from a user.

“And sometimes it can fail at hard problems the same way humans do, such as introducing security vulnerabilities into code it produces,” the company said. “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake,” OpenAI added.

So far, GPT-4 access is limited to ChatGPT Plus subscribers and comes with a usage cap. The AI company said it would adjust the exact usage cap depending on demand and system performance in practice. “Depending on the traffic patterns we see, we may introduce a new subscription level for higher-volume GPT-4 usage; we also hope to offer some free GPT-4 queries so those without a subscription can try it too,” the company concluded.