Demystifying language models – an explainer on how LLMs work

Mascha Kurpicz-Briki’s book takes readers under the hood of today’s chatbots by explaining how LLMs work, and sees through the hype.
20 December 2023

More than a Chatbot – Language Models Demystified – explains how LLMs work.

Ever wondered what’s happening under the hood of today’s powerful chatbots? Having a better understanding of how large language models (LLMs) work not only shines a light on what’s possible, but also helps to pinpoint their limitations (and see through the hype). And a great place to start that journey is to learn from experts in applied machine intelligence, such as Mascha Kurpicz-Briki – who has just written a book on demystifying LLMs.

Her guide to how LLMs work steps through all of the need-to-know concepts underpinning natural language processing. And by the end of the journey, readers will have a clear understanding of what it takes for computers to process written text.

Machine learning enables AI

The path begins with machine learning – looking at how computers can learn to solve tasks involving similar, but previously unseen data – before moving into the world of deep learning. Here, readers are introduced to the fascinating fundamentals of neural networks and get to dive into the topic of word embeddings – representing human language as a series of mathematical vectors.

A key takeaway here is that words of similar meaning have vectors that are closer together, which chimes with the linguistic notion that ‘you shall know a word by the company it keeps’ – attributed to J.R. Firth. And neural networks take this concept to the next level.
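To make the idea of 'closer together' concrete, here is a minimal sketch that compares word vectors using cosine similarity, one common way to measure how closely two vectors point in the same direction. The three-dimensional vectors and the chosen words are invented for illustration; real embeddings are learned from data and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings, made up for this example only.
embeddings = {
    "strawberry": [0.9, 0.8, 0.1],
    "raspberry": [0.8, 0.9, 0.2],
    "tractor": [0.1, 0.2, 0.9],
}

sim_fruit = cosine_similarity(embeddings["strawberry"], embeddings["raspberry"])
sim_mixed = cosine_similarity(embeddings["strawberry"], embeddings["tractor"])

print(f"strawberry vs raspberry: {sim_fruit:.3f}")
print(f"strawberry vs tractor:   {sim_mixed:.3f}")
```

With these made-up numbers, the two fruits score much higher than the fruit and the tractor, which is the pattern a trained embedding is expected to show for words of similar meaning.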

“Higher-dimensional vectors help to capture different properties of the words and thus improve the quality of the relations between the word embeddings,” writes Kurpicz-Briki.

Rather than learning the meaning of the word directly, algorithms generate those dictionaries of word vectors by recognizing common words that often appear together in sentences, the author points out. And this hints at the statistical marvel that is today’s chatbot.
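The 'words that appear together' idea can be sketched with simple co-occurrence counts. This is a deliberately simplified stand-in, not the method described in the book: the tiny corpus, the `cooccurrence_counts` helper, and the window size are all assumptions for illustration, and real embedding methods learn dense vectors rather than storing raw counts.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
corpus = [
    "i ate a sweet strawberry",
    "i ate a sweet raspberry",
    "i drove a red tractor",
]


def cooccurrence_counts(sentences, window=2):
    """Count how often each word appears within `window` positions of another."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the word itself
                    counts[word][tokens[j]] += 1
    return counts


counts = cooccurrence_counts(corpus)
print("strawberry:", dict(counts["strawberry"]))
print("raspberry: ", dict(counts["raspberry"]))
print("tractor:   ", dict(counts["tractor"]))
```

In this toy corpus, 'strawberry' and 'raspberry' end up with identical neighbor counts because they keep the same company, while 'tractor' does not. No definition of either fruit was ever supplied.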

By this point in the multi-stage tour into how LLMs work, readers can expect to be well-equipped with a high-level view of language models, as well as having an understanding of part-of-speech tagging and how word dependency is established.

One of the big advantages of LLMs, as Kurpicz-Briki highlights, is that their next-word predictive power considers not just how two words relate to each other, but also acknowledges ‘previous states’. And the fact that chatbots can consider other words in the sentence before predicting what’s likely to come next is a big tell on how LLMs work.

Reaching the heart of the book, the author considers how clever context mechanics have improved model output and tackles the topic of encoders and decoders.

Taking the sentence ‘There is a field of strawberries, and it is so beautiful!’ as an example, she explains how applying self-attention boosts the weight given to ‘field’, as it is effectively referred to twice: once by name and once by ‘it’.

“Using this mechanism of self-attention, we do not lose track of words referring to other words that have appeared previously in the sentence,” writes Kurpicz-Briki.
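For readers who want to see the mechanics, the following is a minimal sketch of scaled dot-product self-attention. It makes several simplifying assumptions: the token vectors are invented so that 'it' resembles 'field', and the inputs are used directly as queries, keys, and values, whereas real transformer models apply learned projections first. It is not taken from the book.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    all_weights, outputs = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, vectors)) for c in range(dim)]
        )
    return all_weights, outputs


# Hypothetical vectors, chosen so that "it" points in a similar direction to "field".
tokens = ["field", "strawberries", "it", "beautiful"]
vectors = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.5, 0.2, 0.1],
    [0.0, 0.3, 2.0],
]

weights, outputs = self_attention(vectors)
it_weights = weights[tokens.index("it")]
for token, weight in zip(tokens, it_weights):
    print(f"it -> {token}: {weight:.2f}")
```

With these toy vectors, the row of weights for 'it' puts most of its mass on 'field', so the output for 'it' is pulled toward the word it refers to.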

Transformer-based models enable contextualized word embeddings. For example, orange can appear alongside other fruits in the vector space. But the word can be mapped with colors too, recognizing that it has different meanings, which are encoded accordingly by considering whole sentences.

Encoders and decoders

Encoders can be trained using word masking, and – once complete – they pave the way for next-sentence prediction, which begins to paint a picture of how decoders work.
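As a rough illustration of word masking, the sketch below hides a random subset of tokens and records the originals as the targets a model would be asked to predict. The `[MASK]` placeholder, the `mask_tokens` helper, and the 15% default masking rate are assumptions made for this example rather than details from the book, and no model is trained here.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact symbol is an assumption


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns the masked sequence and a dict mapping each masked
    position to the original token (the prediction target).
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


sentence = "there is a field of strawberries and it is so beautiful".split()
masked_sentence, targets = mask_tokens(sentence, mask_prob=0.3)

print(" ".join(masked_sentence))
print(targets)
```

During training, an encoder would see only the masked sentence and be scored on how well it recovers the hidden words from the surrounding context.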

At a high level, finding information that is similar, in terms of its vector encoding, to the sentence embedding of a user’s question can identify source material that may serve as a useful response. And the more training data that you can feed a model, the greater the potential for improving on those results.
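The retrieval step described above can be sketched in a few lines. This example assumes a toy 'sentence embedding' made of plain word counts and a hypothetical `most_similar` helper; production systems would use embeddings produced by a trained model, but the comparison step follows the same idea.

```python
import math
from collections import Counter


def embed(text):
    """Toy sentence embedding: a bag-of-words count vector (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(question, docs):
    """Return the document whose embedding is closest to the question's."""
    question_vec = embed(question)
    return max(docs, key=lambda doc: cosine(question_vec, embed(doc)))


# Made-up source material, for illustration only.
documents = [
    "strawberries grow in a field in summer",
    "a transformer uses self-attention over tokens",
    "the tractor is parked in the barn",
]

best = most_similar("where do strawberries grow", documents)
print(best)
```

Word counts only match exact words, so this toy version would miss a paraphrased question. Learned embeddings are meant to close that gap by placing sentences of similar meaning near each other.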

Having described what it takes to make a model that’s accurate and precise, Kurpicz-Briki then raises the topic of model bias. What comes out is related to what goes in – it’s how LLMs work. And that can be reflected in some undesirable pairs of questions and answers, leading chatbots to generate some ugly and downright dangerous responses.

Mitigating bias is not straightforward. It’s a major undertaking for developers to try to align their model so that it doesn’t offend users. We learned earlier how it was an advantage to have contextualized word embeddings. But it also means that bias can be highly entangled in LLMs.

Developers may think that they’ve removed everything, but rephrasing a text prompt could elicit previously hidden bias. “Existing detection methods typically identify a very specific bias and mitigate it in the best case, but do not solve the problem as a whole,” the author cautions.

Having gained a good understanding of how LLMs work, their origins, and some of the dangers that go with the territory, readers are well-placed for the final section of the book, which certainly provides food for thought.

The future of humans and language models is a pressing topic. There are huge advantages in utilizing LLMs to make it possible to query vast amounts of data using natural language prompts. Users can talk to data without having to learn specific commands or spend time formatting information. And that’s just one of the many ways that chatbots can boost productivity for firms.

Don’t overlook the uncanny valley

AI will reshape the business landscape, education, and other key elements of our lives. And having a knowledge of how LLMs work will certainly help in understanding some of those twists and turns.

For example, Kurpicz-Briki raises the issue of the uncanny valley effect, which has its origins in robotics – where designs that are clearly distinguishable from humans are better accepted by users – but also applies to chatbots. It’s a topic that touches on the design of VR applications too.

Users tend to get freaked out when systems mimic humans too closely, and billion-parameter statistical text predictors have gotten pretty good at that. It’s another reason to read ‘More than a Chatbot – Language Models Demystified’ – to realize that while LLMs are impressive, they are not magic.