AUTONOMOUS VEHICLES

Synthetic data for AI fills gaps in edge cases

Self-driving car developers safely explore extreme scenarios during autonomous vehicle training thanks to the rise of synthetic data for AI.

18 October 2022

James Tyrrell

@JT_bluebird1

james.tyrrell@hybrid.co


Movie magic: computer-generated images of automotive scenarios provide valuable synthetic data for AI. Image credit: Shutterstock

Deep learning has pushed the capabilities of artificial intelligence (AI) to new levels, but there are still some kinks to straighten out, particularly in safety-critical applications such as self-driving cars. If an AI recommendation engine gets its predictions wrong and puts a strange advert in your browser window, you might raise an eyebrow, but no long-term damage will have been done. Things are very different, of course, when algorithms get behind the wheel and encounter something they’ve never seen before. Rare events, or edge cases, present a tricky problem for developers of autonomous vehicles. Fortunately, synthetic data for AI – based on lifelike simulations of real-world events – could help to fill in the gaps.

No pixels were harmed during filming

The ability to create computer-generated images that wow movie audiences can be repurposed. Instead of bringing aliens to life, digital tools can, for example, create a vast array of unlikely but theoretically possible automotive scenarios, such as a truck and trailer upended on the highway. This synthetic data for AI can then be deployed to put autonomous driving systems safely to the test.
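To give a flavour of how such a "vast array" of scenarios might be enumerated, here is a minimal Python sketch. It is not any vendor's tooling: the `Scenario` fields, the lists of obstacles, weather and lighting conditions, and the distance range are all hypothetical values chosen for illustration. The sketch only samples scenario descriptions; rendering them into images would be the job of a separate simulation or graphics pipeline.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical scenario ingredients, invented for this illustration.
OBSTACLES = ["overturned_truck_and_trailer", "debris_on_lane", "stalled_vehicle"]
WEATHER = ["clear", "rain", "fog"]
TIMES_OF_DAY = ["day", "dusk", "night"]


@dataclass(frozen=True)
class Scenario:
    obstacle: str
    weather: str
    time_of_day: str
    obstacle_distance_m: float  # distance ahead of the test vehicle


def sample_scenarios(n: int, seed: int = 0) -> List[Scenario]:
    """Draw n random scenario descriptions; the same seed gives the same list."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    return [
        Scenario(
            obstacle=rng.choice(OBSTACLES),
            weather=rng.choice(WEATHER),
            time_of_day=rng.choice(TIMES_OF_DAY),
            obstacle_distance_m=round(rng.uniform(20.0, 200.0), 1),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    for scenario in sample_scenarios(3, seed=42):
        print(scenario)
```

Fixing the seed means a scenario that trips up the driving system can be regenerated exactly and rerun, which is one of the practical attractions of working in simulation.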

Cost-benefit analysis

Developers could, of course, use crash test dummies and various props to achieve the same thing, but doing so takes far more time and money than deploying synthetic data. Plus, if things went wrong, you’d risk damaging the vehicle and its sensors, whereas in a simulated environment everything can simply be reset and rerun.

Firms such as Synthesis AI have shown how synthetic data can be used to test the effectiveness of driver safety monitoring systems. These tools work by tracking the driver’s face to identify signs of drowsiness or distraction. Output can be linked to advanced driver-assistance systems (ADAS) – for example, to prime pre-crash mechanisms if the safety monitoring alerts fail to trigger a response from the driver.

Naturally, developers wouldn’t ask a test driver to fall asleep at the wheel on purpose – as a vehicle speeds along the road – so that they could put a potential facial detection algorithm (and the mitigations that go with it) to the test. Instead, synthetic data can be deployed. Synthesis AI points out that even 5-10 seconds of sleep – defined as a ‘microsleep episode’ – can be sufficient to cause an accident, so driver safety monitoring systems need to be capable of responding quickly and accurately to triggers. Changes in steering patterns can be one sign that the driver is becoming drowsy. More recently, though, vision systems capable of differentiating between blinking and sleeping have gained interest, as they could potentially pick up warning signs sooner. Facial expressions, too, can reveal signs of drowsiness.
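One simple way to picture the difference between blinking and sleeping is to look at how long the eyes stay closed. The Python sketch below is an illustration only, not Synthesis AI's method or any production algorithm. It assumes an upstream vision model has already produced a per-frame "eyes closed" flag, and the half-second blink threshold (`blink_max_s`) is a hypothetical value picked for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Closure:
    start_s: float     # when the eyes closed, in seconds from the first frame
    duration_s: float  # how long they stayed closed
    label: str         # "blink" or "possible_sleep"


def classify_closures(
    eyes_closed: List[bool],
    fps: float,
    blink_max_s: float = 0.5,  # hypothetical threshold, for illustration only
) -> List[Closure]:
    """Group consecutive closed-eye frames and label each run by its duration."""
    closures: List[Closure] = []
    run_start = None
    # A trailing open-eye frame flushes any closure still open at the end.
    for i, closed in enumerate(list(eyes_closed) + [False]):
        if closed and run_start is None:
            run_start = i
        elif not closed and run_start is not None:
            duration = (i - run_start) / fps
            label = "blink" if duration <= blink_max_s else "possible_sleep"
            closures.append(Closure(run_start / fps, duration, label))
            run_start = None
    return closures


if __name__ == "__main__":
    # Synthetic 10 fps sequence: a short closure, then a three-second one.
    frames = [False] * 5 + [True] * 2 + [False] * 10 + [True] * 30 + [False] * 3
    for closure in classify_closures(frames, fps=10.0):
        print(closure)
```

Because the input here is just a list of flags, test sequences containing long closures can be generated synthetically, with no need for a real driver to nod off.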

Head start

The availability of realistic synthetic data for AI can give firms a helping hand in entering markets where competitors may hold large datasets that would otherwise provide a high barrier to entry. Making it straightforward for start-ups to generate useful AI training sets based on synthetic data gives newer companies the capacity to quickly build momentum without needing to invest large amounts of capital.

Synthetic data goes beyond just recreating scenarios that would be problematic in the real world. The concept gives AI developers much more scope to dig into new real-world areas where training data would ordinarily be expensive or time-consuming (or both) to collect at scale. Analyst firm Gartner is bullish on the growth of synthetic data for AI. The company forecasts that – over the next decade – synthetic data will become the main form of data used in AI.