How MIT is making headway towards ‘greener’ AI

When training an AI model produces five times the lifetime emissions of an average US car, something’s got to change.
27 April 2020

GPUs: high-energy-consumption hardware used to process data.

  • Training a single large AI model can emit some 626,000 pounds of carbon dioxide 
  • MIT researchers developed an advanced AI system to address the environmental issue
  • The new network could accelerate the integration of advanced AI into consumer tech 

Artificial intelligence (AI) is a powerful, transformational technology whose applications across sectors and industries have world-changing potential. Right now, for example, it is playing a strategic role in helping scientists hunt for a COVID-19 vaccine.

But the more we use it, the more this future-facing promise collides with our quest for sustainability. 

A decade ago, AI models could be trained on a commodity laptop or server. Today, training and development rely on specialized hardware, including GPUs and TPUs, both of which carry high costs and massive energy consumption. 

A recent research paper from the University of Massachusetts Amherst revealed that training a single large AI model can emit about 626,000 pounds of carbon dioxide, the equivalent of roughly five times the lifetime emissions of an average US car, including its manufacturing. 
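The comparison checks out as a quick back-of-envelope calculation, assuming the study's own per-car figure of roughly 126,000 pounds of CO2-equivalent over a car's lifetime, including fuel:

```python
# Back-of-envelope check of the "five cars" comparison, assuming the
# study's figures: ~626,000 lbs CO2e for training one large model and
# ~126,000 lbs CO2e for an average US car's lifetime (incl. fuel).
training_emissions_lbs = 626_000
car_lifetime_lbs = 126_000
print(f"Ratio: {training_emissions_lbs / car_lifetime_lbs:.1f}x")  # ~5.0x
```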

Those massive numbers reflect the energy needed to power hardware that runs for weeks or months at a time during training.

This consumption only intensifies during the deployment phase, when deep neural networks must be adapted to diverse hardware platforms, each with its own properties and computational resources.

Recognizing the need to lower costs and shrink carbon footprints in AI model training, researchers from MIT have developed a new automated AI system for training and running certain neural networks. 

Researchers found that by enhancing the system's computational efficiency, the carbon emissions of training and subsequent deployment can be cut from six figures to the low triple digits, in pounds of CO2.

In the study, a Once-for-All (OFA) methodology was designed to “decouple model training from architecture search.” 

The OFA network serves as a ‘mother’ network that nests an exceptionally large number of subnetworks. The mother network shares its learned weights with these subnetworks, leaving each one a trained model able to operate independently, with no retraining required.

In contrast with previous methods, which required a neural network to be retrained for each deployment scenario, the OFA network supports a wide variety of architectural configurations, varying in “elastic depth, width, kernel size, and resolution,” as written in the research paper.
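To make the weight-sharing idea concrete, here is a minimal PyTorch sketch, not MIT's released code, of a ‘mother’ network whose subnetworks reuse slices of its weights: a shallower subnet uses only the first few layers, and a smaller kernel is taken as the center crop of the shared weight tensor (the paper additionally learns a kernel transformation; elastic width and resolution are omitted for brevity, and all layer sizes are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticConvNet(nn.Module):
    """A toy 'mother' network whose subnets vary in depth and kernel size."""
    MAX_DEPTH, MAX_KERNEL = 4, 7

    def __init__(self, channels=32, num_classes=10):
        super().__init__()
        # Each layer stores the largest (7x7) kernel; smaller kernels are
        # taken as the center crop of the same shared weight tensor.
        self.layers = nn.ModuleList(
            nn.Conv2d(channels if i else 3, channels, self.MAX_KERNEL)
            for i in range(self.MAX_DEPTH)
        )
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, depth=4, kernel=7):
        crop = (self.MAX_KERNEL - kernel) // 2
        for layer in self.layers[:depth]:            # elastic depth
            w = layer.weight
            if crop:                                 # elastic kernel size
                w = w[:, :, crop:-crop, crop:-crop]
            x = F.relu(F.conv2d(x, w, layer.bias, padding=kernel // 2))
        return self.head(x.mean(dim=(2, 3)))         # global average pool

net = ElasticConvNet()
x = torch.randn(1, 3, 32, 32)
big = net(x, depth=4, kernel=7)    # the full 'mother' network
small = net(x, depth=2, kernel=3)  # a lighter subnet, same shared weights
```

Because both calls read from the same parameters, training the mother network once leaves every child configuration ready to run.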

By design, an OFA network avoids training and retraining subnetworks over and over, saving on costs and lowering overall carbon emissions.

Apart from being a more sustainable method, the OFA system also has the potential to revolutionize both enterprise and consumer devices. 

The OFA network can be used as a search base to find the most suitable subnetwork for a given device, weighing “the accuracy and latency tradeoffs that correlate to the platform’s power and speed limits,” as a press release stated. 
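In practice, that search can be as simple as scoring candidate configurations against a device's latency budget. The sketch below is hypothetical: the real system trains accuracy and latency predictors and uses an evolutionary search, whereas the two stand-in functions here are made up for illustration.

```python
# A hypothetical sketch of how a deployment tool might search the OFA
# 'vault' for a subnet that fits one device. The real system learns
# accuracy and latency predictors; these stand-in models are invented.
import itertools
import random

DEPTHS, KERNELS, WIDTHS = (2, 3, 4), (3, 5, 7), (3, 4, 6)

def predicted_latency_ms(d, k, w):
    # Stand-in cost model: latency grows with depth, kernel area, width.
    return 0.2 * d * (k ** 2) * w

def predicted_accuracy(d, k, w):
    # Stand-in quality model: bigger subnets score higher (plus noise).
    return 70 + 1.5 * d + 0.4 * k + 0.9 * w + random.uniform(-0.5, 0.5)

def search(latency_budget_ms):
    """Return the highest-scoring config that fits the latency budget."""
    feasible = [(predicted_accuracy(d, k, w), (d, k, w))
                for d, k, w in itertools.product(DEPTHS, KERNELS, WIDTHS)
                if predicted_latency_ms(d, k, w) <= latency_budget_ms]
    return max(feasible)[1] if feasible else None

print("Phone   :", search(latency_budget_ms=60))  # a larger subnet fits
print("IoT node:", search(latency_budget_ms=15))  # must pick a small one
```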

IoT devices and smartphones, for instance, have very different subnetwork requirements, which underpins the complexity of searching for and scaling a neural network architecture to each device. 

An OFA system can comprise more than 10 quintillion (that’s a one followed by 19 zeros) different architectural settings, a vault of networks that can be enormously beneficial in pushing technological progress. 
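That headline number is easy to reproduce, assuming the elastic choices reported in the paper: five stages, each two to four layers deep, with three kernel sizes and three width settings per layer (varying the input resolution adds further variety on top).

```python
# Rough combinatorics behind "10 quintillion," assuming the elastic
# choices reported in the OFA paper: per stage, depth in {2,3,4} and,
# per layer, kernel size in {3,5,7} and width expansion in {3,4,6}.
per_layer = 3 * 3                      # kernel choices x width choices
per_stage = sum(per_layer ** d for d in (2, 3, 4))
total = per_stage ** 5                 # 5 independent stages
print(f"{total:.2e}")                  # ~2e19, i.e. >10 quintillion
```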

Song Han, an assistant professor in the Department of Electrical Engineering and Computer Science, added, “Searching efficient neural network architectures has until now had a huge carbon footprint. But we reduced that footprint by orders of magnitude with these new methods.”

Han concluded that the goal is to achieve “smaller, greener neural networks,” and in a bid to deploy next-gen AI on consumer devices, “we have to figure out how to shrink AI down to size.”

John Cohn, an IBM fellow and member of the MIT-IBM Watson AI Lab, also weighed in on the significance of reducing AI’s carbon footprint in order to support its rapid growth in society today. 

“The upside of developing methods to make AI models smaller and more efficient is that the models may also perform better,” Cohn said.