LLaMA leak mixed blessing for Facebook AI

Meta granted approved researchers access to its set of large language models, but the LLaMA model weights have reportedly escaped the lab.
8 March 2023

Model weights in the wild? Image credit: Shutterstock Generate.


Large language models (LLMs) such as GPT-3 work by predicting the words that are most likely to come next in a sequence, and they have transformed the performance of chatbots and are inspiring a variety of enterprise applications. Wading through years of emails using a regular search tool is likely to be a slow and tedious process, but an advanced chatbot that has chewed through the data could potentially serve up useful answers in seconds. Breakthroughs in LLMs could give AI-powered customer service agents a much greater capacity to understand incoming queries and help contact centers manage traffic more efficiently.

So far, the conversation has been dominated by one firm in particular: OpenAI, which has captivated millions of users through its incredibly popular ChatGPT tool. But OpenAI isn’t the only LLM developer in town. Facebook operator Meta has built a set of models too, known as LLaMA, which has flown comparatively under the radar. But reports that LLaMA model weights have been leaked online could change that.
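To make the ‘predicting what comes next’ idea concrete, here is a minimal sketch using the Hugging Face transformers library. It loads the small, openly available GPT-2 model purely as a stand-in for illustration; none of the models discussed in this article are being loaded here.

```python
# Minimal sketch of next-token prediction, using GPT-2 purely as an
# openly available stand-in for the larger models discussed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The customer emailed to ask about"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model assigns a probability to every possible next token;
# here we simply print the five it considers most likely.
next_token_probs = logits[0, -1].softmax(dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

Chatbots such as ChatGPT are built by repeatedly sampling from exactly this kind of next-token distribution, then feeding the chosen token back in as part of the prompt.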

What makes LLaMA so special?

The creation of GPT-3 was an incredible achievement. The LLM has 175 billion parameters, a record at the time of its release. But digesting vast amounts of text scraped from the internet takes some doing, and OpenAI required a custom-built supercomputer – hosted on Azure by its investment partner, Microsoft – to complete the task. The machine features a whopping 10,000 GPUs, which had to whirr away for months and consumed plenty of energy in the process. Meta, on the other hand, isn’t focusing on building the biggest LLM. Instead, the social media giant is striking out in a different direction, as its AI team revealed in a recent blog post. “Smaller, more performant models such as LLaMA enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field,” writes Meta AI.

LLaMA, which was apparently trained exclusively on publicly available datasets, consists of a set of LLMs ranging from 7 billion to 65 billion parameters in size. And, according to results published on arXiv [PDF], ‘LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B’. For reference, PaLM was developed by Google, and Chinchilla-70B is one of DeepMind’s LLMs. In fact, DeepMind’s work back in 2022 flagged that simply making LLMs bigger and bigger may not be the way to go. Writing up its results in a technical blog, the team found that current large language models are ‘far too large for their compute budget and are not being trained on enough data’.

DeepMind tested its data-scaling hypothesis by training Chinchilla on 1.3 trillion tokens (one token represents around four characters of English text, and 100 tokens equate to roughly 75 words). Chinchilla outperformed Gopher, DeepMind’s previous LLM, despite having just 70 billion parameters. That may sound like a lot, but it is 210 billion fewer than Gopher. LLaMA’s developers have clearly picked up on this and have gone big on data (Meta AI trained its set of models on trillions of tokens) rather than parameters. For Meta, the upshot of this strategy is a powerful LLM that can be run on a relatively modest computing setup, albeit not one that everyone will have lying around at home.
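As a rough back-of-the-envelope illustration of that data-versus-parameters trade-off, the sketch below applies the roughly 20-tokens-per-parameter compute-optimal rule of thumb reported in the Chinchilla paper, together with the token-to-word conversion quoted above. The specific constants and helper function are illustrative assumptions, not figures from Meta or DeepMind’s code.

```python
# Back-of-the-envelope numbers for the data-vs-parameters trade-off.
# Assumes the ~20-tokens-per-parameter rule of thumb from the Chinchilla
# paper and the rough token/word conversion quoted in the article.

TOKENS_PER_PARAM = 20    # Chinchilla compute-optimal rule of thumb
WORDS_PER_TOKEN = 0.75   # 100 tokens ~ 75 English words

def compute_optimal_tokens(params: float) -> float:
    """Approximate training-token budget for a compute-optimal model."""
    return params * TOKENS_PER_PARAM

for name, params in [("Chinchilla-70B", 70e9), ("Gopher-280B", 280e9)]:
    tokens = compute_optimal_tokens(params)
    print(f"{name}: ~{tokens / 1e12:.1f}T tokens "
          f"(~{tokens * WORDS_PER_TOKEN / 1e12:.1f}T words)")
```

On this arithmetic a 70-billion-parameter model wants on the order of 1.4 trillion training tokens, while a 280-billion-parameter model like Gopher would need several times more data to be trained efficiently, which is the sense in which earlier LLMs were ‘far too large for their compute budget’.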

According to reports of the leaked model weights, Meta’s 13-billion-parameter version of LLaMA – the configuration said to be comparable to GPT-3, although experts caution that AI benchmarking isn’t always reliable – can be run on a single A100 GPU. NVIDIA’s A100 Tensor Core GPU is part of its HPC data center platform, so perhaps calling it a ‘modest computing setup’ is a stretch. But it is far more affordable than the 10,000-GPU infrastructure used to create GPT-3, and it does represent a shift towards much more energy-efficient development of LLMs.
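As a rough sanity check on the single-GPU claim: at 16-bit precision each parameter takes two bytes, so a 13-billion-parameter model needs on the order of 26 GB just for its weights, which fits within the 40 GB (or 80 GB) of memory on a single A100. The sketch below shows roughly how such a model could be loaded in half precision on one GPU; it assumes a checkpoint that has already been converted to the Hugging Face format and sits at a hypothetical local path, and is not Meta’s own release tooling.

```python
# Rough illustration only: assumes a LLaMA-13B checkpoint already converted
# to the Hugging Face format at the hypothetical local path below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

params = 13e9
bytes_per_param = 2  # 16-bit (half-precision) weights
print(f"Approx. weight memory: {params * bytes_per_param / 1e9:.0f} GB")  # ~26 GB

model_path = "./llama-13b-hf"  # hypothetical path to converted weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision to fit on one 40/80 GB A100
    device_map="auto",          # place the weights on the available GPU
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```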

Meta’s plan was to make its foundation set of LLaMA models available ‘on a case-by-case basis’ to research labs. There’s a big difference between ChatGPT, which has been optimized for conversation and scrubbed of toxic output by human-in-the-loop teams, and LLaMA in its raw foundation state. And it’s possible that Meta is wary of how LLaMA will behave when users put it to the test with all kinds of strange prompts. Better, then, to keep rogue outputs behind closed doors and team up with trusted partners to fine-tune the model.

But, if the reports are correct, LLaMA’s model weights have escaped into the wild, which means that in the hands of someone with the knowledge to get the LLM up and running, its output – good, bad, and ugly – could become very public. Then again, there’s no such thing as bad publicity, right? The incident has certainly boosted attention on Meta’s progress in generative AI. And the team behind GPTZero (an AI-generated content detector) has – according to an email sent to subscribers – taken the opportunity to generate ‘large swaths of LLaMA text’ so that it can tune its screening tool not just to current tools, but also to upcoming generative AI products.