Demystifying language models – an explainer on how LLMs work

Mascha Kurpicz-Briki’s book takes readers under the hood of today’s chatbots, explaining how LLMs work and helping them see through the hype.

20 December 2023

James Tyrrell

More than a Chatbot – Language Models Demystified – explains how LLMs work.

Ever wondered what’s happening under the hood of today’s powerful chatbots? Having a better understanding of how large language models (LLMs) work not only shines a light on what’s possible, but also helps to pinpoint their limitations (and see through the hype). And a great place to start that journey is to learn from experts in applied machine intelligence, such as Mascha Kurpicz-Briki – who has just written a book on demystifying LLMs.

Her guide to how LLMs work steps through all of the need-to-know concepts underpinning natural language processing. And by the end of the journey, readers will have a clear understanding of what it takes for computers to process written text.

Machine learning enables AI

The path begins with machine learning – looking at how computers can learn to solve tasks involving similar but previously unseen data – before moving into the world of deep learning. Here, readers are introduced to the fascinating fundamentals of neural networks and get to dive into the topic of word embeddings, which represent the words of human language as mathematical vectors.
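
To make the idea of word embeddings concrete, here is a minimal Python sketch. The three-dimensional vectors are hand-picked assumptions for illustration only (real models learn vectors with hundreds or thousands of dimensions), and cosine similarity is used as one common way to compare them.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
# They are NOT taken from any trained model.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])

print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these made-up numbers, "cat" sits closer to "dog" than to "car", which is the kind of relationship a trained embedding is expected to capture.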

One of the big advantages of LLMs, as Kurpicz-Briki highlights, is that their next-word predictive power considers not just how two words relate to each other, but also acknowledges ‘previous states’. And the fact that chatbots can consider other words in the sentence before predicting what’s likely to come next is a big tell on how LLMs work.
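
The value of extra context can be sketched with a toy model. The example below is a simple count-based predictor, not a neural network and not the approach described in the book; the corpus and the function names `train` and `predict` are assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "the cat drinks milk",
    "the cat drinks milk",
    "the baby drinks milk",
    "the man drinks coffee",
    "the man drinks coffee",
]


def train(sentences, context_size):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def predict(counts, context):
    """Return the most frequent continuation seen after `context`, or None."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


one_word = train(CORPUS, context_size=1)
two_words = train(CORPUS, context_size=2)

# Looking only at the previous word, the most common follower wins.
short_guess = predict(one_word, ["drinks"])
# Looking one word further back changes the prediction.
long_guess = predict(two_words, ["man", "drinks"])

print(short_guess, long_guess)
```

Knowing only that the previous word is "drinks", the toy model picks the most frequent follower overall; once it also sees "man", it picks the continuation that fits that subject. LLMs take the same principle much further by weighing far longer stretches of text.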

Reaching the heart of the book, the author considers how clever context mechanics have improved model output and tackles the topic of encoders and decoders.

Encoders and decoders

Encoders can be trained using word masking, and – once complete – they pave the way for next-sentence prediction, which begins to paint a picture of how decoders work.
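
Word masking can be pictured as a fill-in-the-blank exercise. The sketch below shows how one training example might be built; the `[MASK]` placeholder, the 15% masking rate and the helper name `make_masked_example` are assumptions chosen for illustration, not details taken from the book.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption for this sketch


def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Hide some tokens and return (masked_tokens, labels).

    `labels` maps each masked position to the original word that a model
    would be asked to recover during training.
    """
    if rng is None:
        rng = random.Random()
    masked = list(tokens)
    labels = {}
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            labels[i] = token
    # Guarantee at least one masked position so the example is never empty.
    if not labels:
        i = rng.randrange(len(tokens))
        masked[i] = MASK
        labels[i] = tokens[i]
    return masked, labels


tokens = "the cat sat on the mat".split()
masked, labels = make_masked_example(tokens, rng=random.Random(0))

print(" ".join(masked))
print(labels)
```

During training, a model would see only the masked sentence and be scored on how well it recovers the hidden words, using the words on both sides of each gap as clues.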

At a high level, finding information whose vector encoding is similar to the sentence embedding of a user’s question can identify source material that may serve as a useful response. And the more training data that you can feed a model, the greater the potential for improving on those results.
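
The matching step described above can be sketched in a few lines of Python. As a loudly labeled simplification, the `embed` function below uses plain word counts as a stand-in for a learned sentence embedding, and the documents are invented for the example.

```python
import math
from collections import Counter

# Made-up source material, for illustration only.
DOCUMENTS = [
    "Resetting your password takes two minutes from the login page.",
    "Our office is closed on public holidays.",
    "Invoices are emailed at the start of each month.",
]


def embed(text):
    """Stand-in 'embedding': lowercase word counts, NOT a learned vector."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(question, documents):
    """Return the document whose vector is closest to the question's."""
    q = embed(question)
    return max(documents, key=lambda d: cosine_similarity(q, embed(d)))


best = most_similar("How do I reset my password?", DOCUMENTS)
print(best)
```

The word-count stand-in only matches on the shared word "password" and cannot tell that "reset" and "resetting" are related. Closing that kind of gap is what learned embeddings are for.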

Having described what it takes to make a model that’s accurate and precise, Kurpicz-Briki then raises the topic of model bias. What comes out is related to what goes in – it’s how LLMs work. And that can show up as undesirable pairs of questions and answers, with chatbots generating some ugly and downright dangerous responses.

Mitigating bias is not straightforward. It’s a major undertaking for developers to try to align their model so that they don’t offend users. We learned earlier how it was an advantage to have contextualized word embeddings. But it also means that bias can be highly entangled in LLMs.

Developers may think that they’ve removed everything, but rephrasing a text prompt could elicit previously hidden bias. “Existing detection methods typically identify a very specific bias and mitigate it in the best case, but do not solve the problem as a whole,” the author cautions.

Having gained a good understanding of how LLMs work, their origins, and some of the dangers that go with the territory, readers are well-placed for the final section of the book, which certainly provides food for thought.

The future of humans and language models is a pressing topic. There are huge advantages in utilizing LLMs to make it possible to query vast amounts of data using natural language prompts. Users can talk to data without having to learn specific commands or spend time formatting information. And that’s just one of the many ways that chatbots can boost productivity for firms.

Don’t overlook the uncanny valley

AI will reshape the business landscape, education, and other key elements of our lives. And having a knowledge of how LLMs work will certainly help in understanding some of those twists and turns.

For example, Kurpicz-Briki raises the issue of the uncanny valley effect, which has its origins in robotics – where designs that are clearly distinguishable from humans are better accepted by users – but also applies to chatbots. It’s a topic that touches on the design of VR applications too.

Users tend to get freaked out when systems mimic humans too closely, and billion-parameter statistical text predictors have gotten pretty good at that. It’s another reason to read ‘More than a Chatbot – Language Models Demystified’ – to realize that while LLMs are impressive, they are not magic.