Edge AI: how to make deep learning more efficient

Quantization, pruning, and teacher and student models are just a few ways to make deep learning more efficient to open up edge AI use cases.

22 August 2023

Edge AI use cases benefit from leaner, more efficient deep learning models.

James Tyrrell

@JT_bluebird1

james.tyrrell@hybrid.co

Efficient edge AI on the horizon: multi-layer neural networks are powerful universal function approximators that can now be squeezed into smaller footprint hardware thanks to a suite of deep learning optimization techniques.

Artificial intelligence (AI) is transformative across today’s industrial landscape. Everything from enterprise software to machine automation is benefiting from the ability of multi-layered neural networks – with sufficient data and training – to make sense of our world. But as the size of deep learning models balloons, opening the door to more complex natural language processing (NLP) and other AI applications, so does the amount of compute that’s required. And that’s a problem when it comes to edge AI.

Edge AI trend

Deploying deep learning algorithms on portable computing hardware, such as smartphones or computers onboard vehicles, gives users access to powerful image recognition capabilities – to give just one of many use cases. And running models locally on edge AI hardware provides resilience against any interruption in connectivity.

There are also energy considerations. Users are starting to question the environmental impact of running giant AI algorithms in the cloud, given the energy cost of training models with billions of parameters and the large amounts of cooling water consumed in the process. But, as it turns out, developers have become adept at pruning their models to reduce the computing demands of deep learning inference with only a minor impact on the accuracy of results.

At an abstract level, you can think of a deep neural network as a universal function approximator. Given enough parameters, a network can approximate almost any mapping from inputs to outputs as a mathematical function. You might have seen formulae that look like shells when plotted in 3D or fractals that resemble tree branches. In the same way, large numbers of artificial neurons have proven to be capable of describing images and finding missing words in sentences.

Training these AI algorithms involves adjusting millions of model weights to make patterns of artificial neurons sensitive to certain inputs, such as edge features in an image. It’s also necessary to set biases for each of the nodes in the network, to determine the strength of the activation that’s required to make the corresponding artificial neurons ‘fire’.
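As a rough illustration of what weights and biases do, the sketch below computes the output of a single artificial neuron in plain Python. The function name `neuron_output`, the ReLU activation, and the example numbers are all assumptions chosen for illustration; they are not taken from any particular model.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ReLU: the neuron only 'fires' when the total is above zero.
    return max(0.0, total)


# Hypothetical values: three input features feeding one neuron.
inputs = [0.5, 1.0, 2.0]
weights = [1.0, -1.0, 0.5]

# The weighted sum here is 0.5, so the bias decides whether the neuron fires.
fires = neuron_output(inputs, weights, bias=0.0)
silent = neuron_output(inputs, weights, bias=-1.0)
```

With a bias of zero the neuron outputs 0.5; lowering the bias to -1.0 pushes the total below zero and the neuron stays silent. Training amounts to nudging millions of such weights and biases until the network as a whole responds to the right inputs.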

If you’ve ever seen an analog music synthesizer covered in knobs, this is a good analogy, but multiply the number of dials by a million or more. And our input could be the feed from a video camera, which – after passing through all of the settings – turns on a light whenever a dog is seen in the image.

A Survey on Model Compression for Large Language Models

"Addressing the imperative need for efficient deployment, we delve into various methodologies, encompassing quantization, pruning, knowledge distillation, and more." https://t.co/aXv66pAdFv

Posted by llm.ai (@llmgen_AI), August 22, 2023

Looking at the numbers on the dials, we might see that some parameters are more important than others. And that brings us to the concept of model pruning, which is one approach to squeezing algorithms onto edge AI hardware.

Today, developers use a variety of methods to make edge AI neural networks faster to run and small enough to fit on constrained hardware without compromising performance. One approach is to zero out very small model weights, on the basis that weights close to zero mark connections that have little impact on how the algorithm behaves.
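Zeroing out small weights is often called magnitude pruning. A minimal sketch, assuming the weights of one layer have been flattened into a Python list, might look like the following; the function name `prune_by_magnitude` and the sample values are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of weights to remove, between 0 and 1.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Hypothetical weights for one layer, flattened into a list.
weights = [0.8, -0.02, 0.05, -1.2, 0.01, 0.4]
pruned = prune_by_magnitude(weights, sparsity=0.5)
```

With these numbers, the three weights closest to zero are removed and `pruned` becomes `[0.8, 0.0, 0.0, -1.2, 0.0, 0.4]`. The zeros only save memory and compute if the storage format or hardware can take advantage of sparse values.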

Another trick is to retrain the pruned model over a few iterations, which fine-tunes the remaining parameters and can recover much of the lost accuracy. Some pruned image recognition algorithms can even outperform the original neural networks, which is a great result for edge AI.
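The retraining step can be pictured with a toy example. The sketch below assumes a tiny linear model trained by gradient descent, with a mask that keeps pruned weights fixed at zero while the surviving weights are adjusted. The names `mse` and `finetune_pruned`, the data, and the learning rate are all illustrative assumptions, and real networks would use a deep learning framework instead.

```python
def mse(weights, samples):
    """Mean squared error of a linear model y = w . x over the samples."""
    errors = [sum(w * x for w, x in zip(weights, xs)) - y for xs, y in samples]
    return sum(e * e for e in errors) / len(samples)


def finetune_pruned(weights, mask, samples, lr=0.1, steps=50):
    """Gradient descent that only updates weights whose mask entry is 1."""
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xs, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, xs)) - y
            for j, xj in enumerate(xs):
                grad[j] += 2 * err * xj / len(samples)
        # Pruned positions (mask 0) are skipped, so they stay at zero.
        w = [wi - lr * g if m else wi for wi, g, m in zip(w, grad, mask)]
    return w


# Hypothetical data: the middle weight was pruned to zero beforehand.
samples = [([1.0, 0.0, 0.0], 2.0), ([0.0, 1.0, 0.0], 0.1),
           ([0.0, 0.0, 1.0], -1.0), ([1.0, 1.0, 1.0], 1.1)]
pruned = [1.5, 0.0, -0.5]
mask = [1, 0, 1]

loss_before = mse(pruned, samples)
tuned = finetune_pruned(pruned, mask, samples)
loss_after = mse(tuned, samples)
```

Here the error after fine-tuning is lower than before, even though the pruned weight never comes back. The remaining weights shift slightly to compensate for the one that was removed.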

Unfortunately, large language models (LLMs) can be harder to optimize as the retraining step isn’t trivial. But a new approach termed Wanda (pruning by weights and activations), which has been evaluated on the LLaMA family of LLMs, shows that scoring each weight by its magnitude together with the size of the input activations it acts on allows around 50% of the weights to be pruned without a major loss in performance. And, importantly, the training doesn’t need to be rerun to update the weights.
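To make the idea of combining weights and activations concrete, here is a simplified sketch. It assumes the scoring rule described in the Wanda paper: the absolute value of each weight multiplied by the L2 norm of the matching input feature over a small calibration batch, with weights compared within each output row. The function name `wanda_style_prune` and the sample values are hypothetical, and this is not the authors' implementation.

```python
import math


def wanda_style_prune(weight_rows, calib_inputs, sparsity=0.5):
    """Zero the lowest-scoring weights in each output row.

    weight_rows:  one list of weights per output neuron.
    calib_inputs: calibration samples, each a list of input features.
    """
    n_features = len(weight_rows[0])
    # L2 norm of every input feature across the calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs))
             for j in range(n_features)]
    pruned_rows = []
    for row in weight_rows:
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        n_prune = int(n_features * sparsity)
        order = sorted(range(n_features), key=lambda j: scores[j])
        drop = set(order[:n_prune])
        pruned_rows.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned_rows


# Hypothetical layer: one output neuron, four inputs.
weights = [[0.9, 0.1, -0.5, 0.2]]
# Feature 1 carries large activations, feature 0 barely any.
calib = [[0.01, 10.0, 1.0, 1.0],
         [0.01, 10.0, 1.0, 1.0]]

pruned = wanda_style_prune(weights, calib)
```

In this example the result is `[[0.0, 0.1, -0.5, 0.0]]`. The largest weight (0.9) is removed because its input is almost always near zero, while the small weight (0.1) survives because it acts on a strongly activated feature. Pruning by magnitude alone would have made the opposite choice.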

Considering how the weights are represented can help too – for example, storing values as 8-bit integers rather than in single-precision floating-point format (FP32) cuts the memory needed for the weights to a quarter. The integers are stored together with a scale factor, so close approximations of the original floating-point values can still be recovered for processing.
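A minimal sketch of this kind of quantization follows. It assumes a simple symmetric scheme with one shared scale factor per list of weights; production toolkits offer more refined variants, and the function names and values here are illustrative only.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step of the original.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

For these values the stored integers are `[50, -127, 0, 100]`. Note that the very small weight 0.003 rounds to zero, which is the precision traded away in return for storing each value in 8 bits instead of 32.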

Another strategy for making algorithms more efficient for edge AI applications is to deploy so-called teacher and student models, where the student learns from the richer information provided by the teacher. Specifically, the teacher model can give the student model its probability distribution over the most likely results to use as training targets.
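The teacher's probability distribution can be sketched in a few lines. The example below assumes the common knowledge distillation recipe in which both models' raw scores are softened with a temperature before being compared by cross-entropy. The temperature value, class labels, and scores are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's softened outputs."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical scores for the classes (dog, wolf, cat).
teacher_logits = [5.0, 3.0, -1.0]
soft_targets = softmax(teacher_logits, temperature=2.0)

good_student = distillation_loss(teacher_logits, [4.8, 3.1, -0.9])
poor_student = distillation_loss(teacher_logits, [0.0, 0.0, 5.0])
```

The soft targets tell the student not only that the image is most likely a dog, but also that a wolf is a far more plausible mistake than a cat. A student whose outputs resemble the teacher's receives a lower loss than one that confidently predicts the wrong class.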

AI success stories

AI success stories in the cloud are inspiring a variety of use cases. And, as developers become more accomplished at compressing that algorithmic performance into smaller footprints, we can expect those benefits to translate into edge AI applications too.

Also, users have a growing number of tools to lean on to optimize their machine-learning models. Google’s TensorFlow Model Optimization Toolkit supports the deployment of models to edge devices that have restrictions on processing, memory, power consumption, network usage, and model storage space.
