Helping robots learn: GPT-3 tool descriptions add value

Large language models have been put to the test in helping robots learn by training industrial-like machines to use different tools.
6 January 2023

Programming puzzle: teaching robots how to use new tools could become easier thanks to large language models such as GPT-3. Image credit: Shutterstock.

The jury is still out on whether talking to plants helps them grow, but machines are definitely becoming more responsive to human language. Robots, in particular, can benefit from natural-sounding text generated by large language models such as GPT-3. For example, providing machines with linguistic information about new tools is valuable in helping robots learn about and manipulate previously unfamiliar objects.

“Extra information in the form of language can help a robot learn to use the tools more quickly,” said Anirudha Majumdar, a researcher based at Princeton University. In a recent study [PDF] – which was supported by the Toyota Research Institute – Majumdar and colleagues used GPT-3 to generate descriptions of 36 implements, including a hammer, an axe, a squeegee, and other assorted items. The information was then fed into an artificial intelligence (AI) model, which featured language and image recognition elements.

Training and testing

To test the effectiveness of the approach, the researchers compared the performance of AI policies with and without language components. Having learned its parameters from 27 of the tools, the algorithm was then tested on nine untrained implements. For each tool in the test, the robot arm was given four tasks – pushing, lifting, sweeping, and hammering.

In many cases, the robot performed tasks much more effectively when it had been given a description of the tool. And the researchers observed notable improvements – for example, in how the robot adapted to using a crowbar to manipulate a bottle.

“With the language training, it learns to grasp at the long end of the crowbar and use the curved surface to better constrain the movement of the bottle,” said Allen Ren, who led the write-up of the group’s results. “Without the language, it grasped the crowbar close to the curved surface and it was harder to control.”

The language models enable the robot to build up a shared structure of the tools and their utility, which is critical in helping robots learn more rapidly. It reduces the need for detailed instructions, which can be a barrier to automation.

Helping robots learn

Large language models such as GPT-3, trained on vast amounts of text gathered from the internet, make it straightforward to generate tool descriptions from a simple text prompt. Rather than having to search for the information, developers can simply use an API to generate the details. And because the information is task agnostic, it encourages the AI model to generalize its response – in other words, to suit a broad range of inputs.
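As a rough illustration, gathering descriptions for a batch of tools can be as simple as templating one prompt per tool and sending each to a completion API. The prompt wording below is an assumption for illustration, not the exact prompt used in the study:

```python
# Sketch: building text prompts that ask a language model to describe
# tools. Each prompt would be sent to a completion API (e.g. GPT-3);
# the wording here is illustrative, not the study's actual prompt.

def make_tool_prompt(tool_name: str) -> str:
    """Return a prompt asking for a short, task-agnostic tool description."""
    return (
        f"Describe the shape and typical use of a {tool_name} "
        "in 2-4 sentences."
    )

def make_prompts(tools: list[str]) -> dict[str, str]:
    """Map each tool name to its generation prompt."""
    return {tool: make_tool_prompt(tool) for tool in tools}

prompts = make_prompts(["hammer", "axe", "squeegee"])
print(prompts["hammer"])
```

Because the same template works for any tool name, scaling from a handful of implements to all 36 is just a longer input list.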

Each tool was paired with 800 automatically generated language descriptions, each of which was 2-4 sentences long. And the group used Google’s pre-trained BERT model to convert the natural language into a vectorized form that could be processed using AI.
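The conversion step can be sketched as pooling BERT's per-token output into one fixed-size vector per description. Here random arrays stand in for real BERT activations (which would come from a pre-trained model), and mean pooling is one common choice of summary, assumed for illustration:

```python
import numpy as np

# Sketch: reducing a variable-length sequence of BERT token embeddings
# to a single fixed 768-dimensional description vector via mean pooling.
# Random arrays stand in for real BERT output; in practice the tokens
# would come from a pre-trained model such as bert-base-uncased.

EMBED_DIM = 768  # hidden size of BERT-base

def pool_description(token_embeddings: np.ndarray) -> np.ndarray:
    """Average token vectors of shape (n_tokens, 768) into one (768,) vector."""
    return token_embeddings.mean(axis=0)

rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(23, EMBED_DIM))  # e.g. a 23-token description
vec = pool_description(fake_tokens)
print(vec.shape)  # (768,)
```

Whatever the description length, the policy network always receives the same 768-dimensional input.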

Using a method known as t-distributed stochastic neighbor embedding (t-SNE), it’s possible to visualize BERT’s output – which originally takes the form of a 768-dimensional vector – as a graph of clustered data points. And the exercise provides a useful semantic sense-check of the processing.

The visualization indicates that, without any fine-tuning, the model is capable of recognizing similar tools. On the 2D chart, ‘mallet’ and ‘hammer’ – to give one example – are clustered close to each other. ‘Shovel’ and ‘trowel’ are also displayed as near neighbors.
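A minimal version of that sense-check looks like the following, assuming scikit-learn is available. Synthetic vectors stand in for the real BERT embeddings, with related tools deliberately placed near each other so the expected clustering is visible:

```python
import numpy as np
from sklearn.manifold import TSNE

# Sketch: projecting 768-dimensional description embeddings down to 2-D
# with t-SNE. The vectors here are synthetic stand-ins: two "families"
# of tools (hammer/mallet, shovel/trowel) are sampled near shared
# centres, mimicking how semantically similar descriptions embed nearby.

rng = np.random.default_rng(0)
tools = ["hammer", "mallet", "shovel", "trowel"]
family = {"hammer": 0, "mallet": 0, "shovel": 1, "trowel": 1}
centres = rng.normal(size=(2, 768))
X = np.stack([centres[family[t]] + 0.05 * rng.normal(size=768)
              for t in tools])

# perplexity must be smaller than the number of samples
proj = TSNE(n_components=2, perplexity=2, init="random",
            random_state=0).fit_transform(X)
print(proj.shape)  # (4, 2)
```

Plotting the resulting 2-D points with their tool names labeled reproduces the kind of chart described above.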

Significant words

As the researchers note, a common feature among tools is the handle. And, interestingly, when the team removed the word from the descriptions – to test its significance – the robot failed to grip objects as firmly and ended up dropping tools.
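The ablation itself amounts to deleting one word from every description before re-training. A minimal sketch, with an illustrative description rather than one generated in the study:

```python
import re

# Sketch: ablating a significant word ("handle") from a tool description
# to test how much the policy relies on it. Whole-word, case-insensitive
# matching also catches the plural form.

def ablate_word(description: str, word: str) -> str:
    """Remove every occurrence of `word` (and its plural) from the text."""
    stripped = re.sub(rf"\b{re.escape(word)}s?\b", "", description,
                      flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", stripped).strip()

desc = "A hammer has a long wooden handle and a heavy metal head."
print(ablate_word(desc, "handle"))
# "A hammer has a long wooden and a heavy metal head."
```

Re-training on the ablated descriptions and comparing grip success against the original run isolates the contribution of that single word.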

All of the various scenarios were built using PyBullet – a popular real-time physics simulation environment. In the simulator, the team configured a 7-DOF Franka Panda robot arm to test its meta-learning framework, dubbed Accelerated Learning of Tool Manipulation with Language (ATLA).

PyBullet has a long list of commands dedicated to robot control and provides a useful virtual platform for developers to test out their ideas. As the above YouTube clip shows, the correlation between virtual and real-world behavior is impressive. In fact, by paying careful attention to the quality of the physics simulation, PyBullet can be used to learn control policies that are sufficiently robust for actual robots.

Future prospects

Looking at trends, the use of large language models in helping robots to learn and become more useful is on the rise. In 2022, Google Research showed (in collaboration with Everyday Robots) how tapping into the world knowledge encoded in large language models can upgrade robot performance.

In the demo, robots were able to execute more complex and abstract tasks thanks to the addition of linguistic information. And, as models evolve, the interaction between operator and machine is likely to become even more natural and conversational.

Such progress would see robots not just as tools for performing repetitive and easy-to-automate tasks, but also as human helpers – for example, in healthcare and other scenarios.