Like the sound of private, air-gapped LLMs? Welcome to generative AI on CPUs.

Dynamic sparsity allows generative AI to run on CPUs, side-stepping GPU shortages and enabling private, air-gapped LLMs.
13 July 2023

Air-gapped generative AI: dynamic sparsity makes it possible to run billion-parameter models on CPUs and have pocket LLMs that users could carry around. And if that wasn’t enough, the approach could fix GPU shortages too.


We are living in boom times for GPUs. The huge appetite for augmenting software with large language models (LLMs) and building advanced chatbots is great news for chip designers such as NVIDIA, whose hardware powers generative AI. Users are scrambling for LLM training stacks and inference architecture, leading to supply issues. But what if you could sidestep GPU shortages entirely by running deep learning algorithms on CPUs instead?

It’s a radical idea that some analysts believe could impact NVIDIA’s stock price in the long term. And not only would switching to currently underutilized CPUs address the problem of GPU shortages, but energy-efficient approaches to training LLMs on CPUs would bring other benefits to the table as well.

Fed with huge datasets, LLMs can take months to train and optimize, consuming large amounts of energy and water (generative AI is thirsty for cooling) in the process. And environmentalists are concerned about the carbon footprint of having banks of GPUs running 24/7 as developers race to build ever more powerful LLMs and push ahead of the competition.

Dynamic sparsity saves resources

OpenAI’s GPT-3, which provided the foundation for the hugely successful ChatGPT, has 175 billion parameters – the various weights that determine how deep learning neural networks map inputs into outputs. And, as it turns out, the number of parameters the model actually uses for any given input can be made dynamically much smaller without impacting model accuracy, which is a game changer for democratizing AI.

Being able to train deep learning models with hundreds of thousands of input dimensions and several thousand hidden layers on a CPU makes generative AI much more portable and standalone. So-called pocket LLMs – which could be carried in a backpack and kept offline – provide air-gapped generative AI capabilities that are ideal for keeping company data safe.


In the excitement to see what’s possible using conversational AI, many users may have inadvertently submitted sensitive data to LLMs hosted in the cloud. AI heavyweights such as OpenAI and Google warn users not to enter sensitive information – pointing out that human reviewers may process conversations for quality purposes.

Big players have invested substantial sums in reinforcement learning with human feedback to make advanced chatbots the success story that they’ve become. And air-gapped, pocket LLMs running on CPUs give users the chance to bring that model refinement in-house and take full control over the development of domain-specific solutions trained with private data.

Chips in high demand – why are there GPU shortages?

GPUs became all the rage in the AI community when developers saw that bigger was better. Large models with many hidden layers fed with huge amounts of data produced rubbish to begin with. But leaving that compute whirring away for days – back-propagating results to optimize the weights of the neural network, and completing multiple passes (epochs) through the whole training set – produced staggering results.
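
For readers who want to see that loop in miniature, the sketch below trains a toy two-weight model with plain gradient descent over repeated epochs. It is purely illustrative: the data, model size, and learning rate are made up and bear no relation to a real LLM training stack.

```python
# Toy illustration of the training loop described above: repeated passes
# (epochs) over a dataset, with the error propagated back to nudge the
# weights. Everything here (data, model size, learning rate) is made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                                 # training inputs
y = X @ np.array([3.0, -1.5]) + 0.1 * rng.normal(size=1000)    # targets

w = np.zeros(2)        # the model's weights, optimized over time
lr = 0.5               # learning rate

for epoch in range(20):                  # multiple passes through the data
    pred = X @ w                         # forward pass
    grad = X.T @ (pred - y) / len(y)     # gradient of the squared error
    w -= lr * grad                       # weight update (gradient descent)
    loss = np.mean((pred - y) ** 2)
    print(f"epoch {epoch:2d}  loss {loss:.4f}")
```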

In the case of next-word predicting LLMs, data doesn’t even have to be labelled. Models are shown sentences with words removed and have to guess what’s missing. Incredibly, when performed at scale, this unsupervised learning method is capable of teaching computers how to translate languages, write code, and converse convincingly with humans – to list just a few wonders of generative AI.
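
Here’s a toy illustration of how that self-supervised setup manufactures training examples from raw text, with no labelling required. The sentence and the whole-word splitting are simplifications; real systems work on sub-word tokens and vastly more text.

```python
# A toy example of self-supervised next-word prediction: every position in
# an unlabelled sentence yields a (context, next word) training pair.
text = "dynamic sparsity allows generative ai to run on cpus"
words = text.split()

examples = [(words[:i], words[i]) for i in range(1, len(words))]
for context, target in examples[:3]:
    print(" ".join(context), "->", target)
# dynamic -> sparsity
# dynamic sparsity -> allows
# dynamic sparsity allows -> generative
```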

Under the hood, words are broken down into sub-parts known as tokens and stored in large vectors that are multiplied together. And it’s the ability to efficiently perform vector multiplication that gives GPUs the edge, hence the high demand for chips from leading designers such as NVIDIA.
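
The sketch below shows that idea at toy scale: each token maps to a row of an embedding matrix, and a layer’s work is one large matrix multiplication over those vectors. The vocabulary, dimensions, and random weights are invented purely for illustration; real models use thousands of dimensions and tens of thousands of sub-word tokens.

```python
# Minimal sketch of the vector arithmetic described above: tokens become
# rows of an embedding matrix, which are then multiplied through a layer.
import numpy as np

vocab = {"air": 0, "gapped": 1, "llm": 2, "cpu": 3}
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(len(vocab), 8))        # one 8-d vector per token

tokens = ["air", "gapped", "llm"]
vectors = embeddings[[vocab[t] for t in tokens]]     # look up the token vectors

weights = rng.normal(size=(8, 16))                   # one dense layer
activations = vectors @ weights                      # 3 tokens x 16 outputs
print(activations.shape)                             # (3, 16)
```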

GPUs have relatively small instruction sets, but large numbers of cores. Originally, they were devised to perform simple tasks – turning pixels on and off – but rapidly, in parallel and at scale. And today, GPU use goes beyond graphics, having proven popular when mining cryptocurrency was all the rage, and more recently underpinning the latest AI boom.

But for how long? As we hinted at earlier, dynamic sparsity changes things by only considering the AI model parameters that are necessary for the current sample. Training deep learning algorithms on GPUs is a brute-force approach that performs billions of vector multiplications whether they are needed or not. In reality, only a few thousand model parameters may actually be updated when shown a new input, with the bulk of the calculations simply multiplying something by zero.
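
Some rough arithmetic makes the point. The layer width and active-neuron count below are illustrative numbers, not measurements of any particular model, but they show how much of a dense pass is wasted when only a few thousand neurons matter for a given sample.

```python
# Back-of-the-envelope arithmetic for the point above (illustrative numbers).
input_dim = 4096
layer_width = 50_000        # neurons in one wide hidden layer
active_neurons = 2_000      # neurons that actually matter for this sample

dense_ops = input_dim * layer_width        # what a dense (GPU-style) pass computes
sparse_ops = input_dim * active_neurons    # what the sample actually needs
print(f"dense : {dense_ops:,} multiply-adds")
print(f"sparse: {sparse_ops:,} multiply-adds "
      f"({100 * sparse_ops / dense_ops:.0f}% of the dense work)")
```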

Bolt: a dynamic sparsity engine for CPUs

By focusing the compute on the high activations and ignoring the low activations, developers find that it’s possible to use commonly available CPUs – even though such chips have fewer cores. And users can once again benefit from the superior memory capacity of CPUs – one of the constraints of fast, parallel-processing GPUs.
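
Below is a minimal sketch of what such a sparse forward pass looks like on a CPU: only the weight rows belonging to the high-activation neurons are touched, and the rest of the layer is skipped. Crucially, how an engine like Bolt identifies those neurons cheaply is the hard part and isn’t shown here; the active set in this sketch is simply assumed to be known, and all sizes are toy values.

```python
# Minimal sketch of a sparse forward pass on a CPU: compute only the weight
# rows for the neurons expected to fire and skip the rest. The active set
# below is assumed to be given; finding it cheaply is the real engineering.
import numpy as np

rng = np.random.default_rng(2)
input_dim, layer_width, k = 512, 10_000, 256

x = rng.normal(size=input_dim)                 # one input sample
W = rng.normal(size=(layer_width, input_dim))  # layer weights

dense_out = W @ x                              # dense pass: every neuron computed

active = rng.choice(layer_width, size=k, replace=False)   # assumed given
sparse_out = np.zeros(layer_width)
sparse_out[active] = W[active] @ x             # only k rows of W are touched

print("dense multiply-adds :", layer_width * input_dim)
print("sparse multiply-adds:", k * input_dim)
```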

“People haven’t seen what CPUs can do,” Anshu Shrivastava, CEO and Founder of ThirdAI (pronounced third-eye), told TechHQ. “We’ve done all the heavy lifting to make model training very efficient.”

Rather than having to pipe their business intelligence into a third-party service, users can instead – thanks to ThirdAI’s dynamic sparsity engine, dubbed Bolt – build and deploy billion-parameter models that run on their own CPUs. And having that information locally, in-house, makes it so much easier to keep those generative AI solutions – which could be internal company chatbots, or highly interactive document search tools – up to date.

“Every data point that goes through inference can become part of a new training set,” adds Shrivastava. “It’s a continuous process.”