Helping robots learn: GPT-3 tool descriptions add value

Large language models have been put to the test in helping robots learn by training industrial-like machines to use different tools.
6 January 2023

Programming puzzle: teaching robots how to use new tools could become easier thanks to large language models such as GPT-3. Image credit: Shutterstock.

The jury is still out on whether talking to plants helps them grow, but machines are definitely becoming more responsive to human language. Robots, in particular, can benefit from natural-sounding text generated by large language models such as GPT-3. For example, providing machines with linguistic information about new tools is valuable in helping robots learn about and manipulate previously unfamiliar objects.

“Extra information in the form of language can help a robot learn to use the tools more quickly,” said Anirudha Majumdar, a researcher based at Princeton University. In a recent study [PDF] – which was supported by the Toyota Research Institute – Majumdar and colleagues used GPT-3 to generate descriptions of 36 implements, including a hammer, an axe, a squeegee, and other assorted items. The information was then fed into an artificial intelligence (AI) model, which featured language and image recognition elements.

Training and testing

To test the effectiveness of the approach, the researchers compared the performance of AI policies with and without language components. Having learned its parameters from 27 of the tools, the algorithm was then tested on nine untrained implements. For each tool in the test, the robot arm was given four tasks – pushing, lifting, sweeping, and hammering.

In many cases, the robot performed tasks much more effectively when it had been given a description of the tool. And the researchers observed notable improvements – for example, in how the robot adapted to using a crowbar to manipulate a bottle.

“With the language training, it learns to grasp at the long end of the crowbar and use the curved surface to better constrain the movement of the bottle,” said Allen Ren, who led the write-up of the group’s results. “Without the language, it grasped the crowbar close to the curved surface and it was harder to control.”

The language models enable the robot to build up a shared structure of the tools and their utility, which is critical in helping robots learn more rapidly. It reduces the need for detailed instructions, which can be a barrier to automation.

Helping robots learn

Large language models such as GPT-3, trained on vast amounts of text gathered from the internet, make it straightforward to generate tool descriptions from a simple text prompt. Rather than having to search for the information, developers can simply use an API to generate the details. And because the information is task agnostic, it encourages the AI model to generalize its response – in other words, to suit a broad range of inputs.
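As a rough illustration, gathering descriptions for a batch of tools can be as simple as templating one prompt per tool and sending each to a completion API. The prompt wording below is an assumption for illustration, not the exact prompt used in the study:

```python
# Sketch: building text prompts that ask a language model to describe
# tools. Each prompt would be sent to a completion API (e.g. GPT-3);
# the wording here is illustrative, not the study's actual prompt.

def make_tool_prompt(tool_name: str) -> str:
    """Return a prompt asking for a short, task-agnostic tool description."""
    return (
        f"Describe the shape and typical use of a {tool_name} "
        "in 2-4 sentences."
    )

def make_prompts(tools: list[str]) -> dict[str, str]:
    """Map each tool name to its generation prompt."""
    return {tool: make_tool_prompt(tool) for tool in tools}

prompts = make_prompts(["hammer", "axe", "squeegee"])
print(prompts["hammer"])
```

Because the same template works for any tool name, scaling from a handful of implements to all 36 is just a longer input list.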

Each tool was paired with 800 automatically generated language descriptions, each of which was 2-4 sentences long. And the group used Google’s pre-trained BERT model to convert the natural language into a vectorized form that could be processed using AI.
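The conversion step can be sketched as pooling BERT's per-token output into one fixed-size vector per description. Here random arrays stand in for real BERT activations (which would come from a pre-trained model), and mean pooling is one common choice of summary, assumed for illustration:

```python
import numpy as np

# Sketch: reducing a variable-length sequence of BERT token embeddings
# to a single fixed 768-dimensional description vector via mean pooling.
# Random arrays stand in for real BERT output; in practice the tokens
# would come from a pre-trained model such as bert-base-uncased.

EMBED_DIM = 768  # hidden size of BERT-base

def pool_description(token_embeddings: np.ndarray) -> np.ndarray:
    """Average token vectors of shape (n_tokens, 768) into one (768,) vector."""
    return token_embeddings.mean(axis=0)

rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(23, EMBED_DIM))  # e.g. a 23-token description
vec = pool_description(fake_tokens)
print(vec.shape)  # (768,)
```

Whatever the description length, the policy network always receives the same 768-dimensional input.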

Using a method known as t-distributed stochastic neighbor embedding (t-SNE), it’s possible to visualize BERT’s output – which originally takes the form of a 768-dimensional vector – as a graph of clustered data points. And the exercise provides a useful semantic sense-check of the processing.

The visualization indicates that, without any fine-tuning, the model is capable of recognizing similar tools. On the 2D chart, ‘mallet’ and ‘hammer’ – to give one example – are clustered close to each other. ‘Shovel’ and ‘trowel’ are also displayed as near neighbors.
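A minimal version of that sense-check looks like the following, assuming scikit-learn is available. Synthetic vectors stand in for the real BERT embeddings, with related tools deliberately placed near each other so the expected clustering is visible:

```python
import numpy as np
from sklearn.manifold import TSNE

# Sketch: projecting 768-dimensional description embeddings down to 2-D
# with t-SNE. The vectors here are synthetic stand-ins: two "families"
# of tools (hammer/mallet, shovel/trowel) are sampled near shared
# centres, mimicking how semantically similar descriptions embed nearby.

rng = np.random.default_rng(0)
tools = ["hammer", "mallet", "shovel", "trowel"]
family = {"hammer": 0, "mallet": 0, "shovel": 1, "trowel": 1}
centres = rng.normal(size=(2, 768))
X = np.stack([centres[family[t]] + 0.05 * rng.normal(size=768)
              for t in tools])

# perplexity must be smaller than the number of samples
proj = TSNE(n_components=2, perplexity=2, init="random",
            random_state=0).fit_transform(X)
print(proj.shape)  # (4, 2)
```

Plotting the resulting 2-D points with their tool names labeled reproduces the kind of chart described above.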

Significant words

As the researchers note, a common feature among tools is the handle. And, interestingly, when the team removed the word from the descriptions – to test its significance – the robot failed to grip objects as firmly and ended up dropping tools.
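The ablation itself amounts to deleting one word from every description before re-training. A minimal sketch, with an illustrative description rather than one generated in the study:

```python
import re

# Sketch: ablating a significant word ("handle") from a tool description
# to test how much the policy relies on it. Whole-word, case-insensitive
# matching also catches the plural form.

def ablate_word(description: str, word: str) -> str:
    """Remove every occurrence of `word` (and its plural) from the text."""
    stripped = re.sub(rf"\b{re.escape(word)}s?\b", "", description,
                      flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", stripped).strip()

desc = "A hammer has a long wooden handle and a heavy metal head."
print(ablate_word(desc, "handle"))
# "A hammer has a long wooden and a heavy metal head."
```

Re-training on the ablated descriptions and comparing grip success against the original run isolates the contribution of that single word.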

All of the various scenarios were built using PyBullet – a popular real-time physics simulation environment. In the simulator, the team configured a 7-DOF Franka Panda robot arm to test its meta-learning framework, dubbed Accelerated Learning of Tool Manipulation with Language (ATLA).

PyBullet has a long list of commands dedicated to robot control and provides a useful virtual platform for developers to test out their ideas. As the above YouTube clip shows, the correlation between virtual and real-world behavior is impressive. In fact, by paying careful attention to the quality of the physics simulation, PyBullet can be used to learn control policies that are sufficiently robust for actual robots.

Future prospects

Looking at trends, the use of large language models in helping robots to learn and become more useful is on the rise. In 2022, Google Research showed (in collaboration with Everyday Robots) how tapping into the world knowledge encoded in large language models can upgrade robot performance.

In the demo, robots were able to execute more complex and abstract tasks thanks to the addition of linguistic information. And, as models evolve, the interaction between operator and machine is likely to become even more natural and conversational.

Such progress would see robots not just as tools for performing repetitive and easy-to-automate tasks, but also as human helpers – for example, in healthcare and other scenarios.