Voice tech: accelerating audio-based authentication

Clever signal analysis techniques combined with fast-acting AI are providing developers with innovative options for unlocking devices and protecting revenue.
30 August 2022

Customer focus: acoustic signatures powered by nimble machine learning methods can provide a safe and efficient means of identifying genuine callers. Image credit: Shutterstock.

Knowledge-based authentication (KBA) – the series of identity verification questions that financial service providers ask callers – is far from perfect. It slows down the conversation, delays help that customers may anxiously need – for example, if they’ve lost their bank card – and, perhaps worst of all, offers patchy protection against fraudsters. If asked, could you remember the correct response to all of the secret questions on your accounts? Analysis shows that adversaries have an annoying habit of outscoring genuine customers in KBA tests, with pass rates for bad actors averaging as high as 92% in some cases. Trustworthy callers, on the other hand, managed a pass rate of only 46%. And this is where looking at voice signals in more detail turns out to be a game-changer.

Signal specifics

Frustrated by the poor experience facing genuine customers, engineers in the US set about developing a process that they named ‘phone-printing’ to give operators a more effective way of recognizing their clients over voice channels. “The founders realized that they could use audio analysis to solve this problem,” Nick Gaubitch, who heads up Pindrop’s EMEA research team and has a background in acoustic signal processing, told Tech HQ. “The important part is not what is said, but who said it.”

Today, Pindrop’s algorithms can make sense of more than 1,300 acoustic features of incoming calls to give call center agents a valuable heads-up on the likelihood that they are interacting with a genuine customer. The process, which needs less than 20 seconds of audio to score a call against a reference model built from around 60 seconds of onboarding data, considers the entire voice transmission chain from the caller’s microphone to the operator’s headset.
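The exact features and scoring method are proprietary, but the basic idea – reducing a call to a feature vector and comparing it against an enrolled reference – can be sketched in a few lines. Everything below is illustrative: the function names, the cosine-similarity measure, and the 0.8 threshold are assumptions for the sake of the example, not Pindrop’s actual algorithm.

```python
import numpy as np

def cosine_score(call_features: np.ndarray, reference: np.ndarray) -> float:
    """Cosine similarity between a call's feature vector and the
    customer's enrolled reference model (1.0 = identical direction)."""
    num = float(np.dot(call_features, reference))
    denom = float(np.linalg.norm(call_features) * np.linalg.norm(reference))
    return num / denom if denom else 0.0

def risk_decision(score: float, threshold: float = 0.8) -> str:
    """Map a similarity score to a simple agent-facing verdict."""
    return "likely genuine" if score >= threshold else "flag for review"

# Toy data: a 1,300-dimensional feature vector per call
rng = np.random.default_rng(42)
enrolled = rng.standard_normal(1300)                       # from onboarding audio
same_caller = enrolled + 0.1 * rng.standard_normal(1300)   # same voice, new call
impostor = rng.standard_normal(1300)                       # unrelated caller

print(risk_decision(cosine_score(same_caller, enrolled)))  # likely genuine
print(risk_decision(cosine_score(impostor, enrolled)))     # flag for review
```

In practice the feature vector would be derived from the audio itself (spectral, channel, and behavioral measurements) rather than sampled at random, but the comparison-against-a-reference pattern is the same.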

“You have the voice component, but there’s a lot of other things going on besides,” said Gaubitch. “Microphone characteristics, audio compression; there’s a wealth of information that can be used, which gives a specific signature to the communication.” A patent search points to some of the techniques that can be deployed to gather inputs. These include comparing ideal touch-tone dialing sounds – technically known as dual-tone multi-frequency (DTMF) signals – with the waveforms present on the incoming call. Looking beyond simply what’s being said allows the team to better protect call centers against audio replay attacks and deepfakes (synthetic voices). And today, eight out of the top 10 banks in the US use Pindrop’s technology, according to Gaubitch.
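The DTMF idea is straightforward to illustrate: each keypad press is defined as a pair of pure sine tones, so an ideal reference can be synthesized and compared against what actually arrives over the line – any deviation carries information about the call path. The sketch below, a minimal toy rather than any vendor’s actual method, measures that deviation as the distance between normalized magnitude spectra; the function names are invented for this example.

```python
import numpy as np

# Standard DTMF frequency pairs (low Hz, high Hz) for each keypad key
DTMF_FREQS = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def ideal_tone(key: str, duration: float = 0.1, rate: int = 8000) -> np.ndarray:
    """Synthesize the ideal two-tone waveform for a keypad press."""
    lo, hi = DTMF_FREQS[key]
    t = np.arange(int(duration * rate)) / rate
    return 0.5 * (np.sin(2 * np.pi * lo * t) + np.sin(2 * np.pi * hi * t))

def spectral_deviation(observed: np.ndarray, key: str, rate: int = 8000) -> float:
    """Distance between the observed tone's magnitude spectrum and the
    ideal one: ~0 for a pristine tone, larger as the call path distorts it."""
    reference = ideal_tone(key, len(observed) / rate, rate)
    obs_spec = np.abs(np.fft.rfft(observed))
    ref_spec = np.abs(np.fft.rfft(reference))
    # Normalize so the comparison ignores overall loudness
    obs_spec /= np.linalg.norm(obs_spec) or 1.0
    ref_spec /= np.linalg.norm(ref_spec) or 1.0
    return float(np.linalg.norm(obs_spec - ref_spec))

# A clean "5" deviates far less than one degraded by channel noise
clean = ideal_tone("5")
noisy = clean + 0.2 * np.random.default_rng(0).standard_normal(len(clean))
print(spectral_deviation(clean, "5") < spectral_deviation(noisy, "5"))  # True
```

A real system would look at many such distortions together – codec artifacts, packet loss signatures, filtering – to characterize the device and network behind a call.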

Embedded protection

Gaining traction in the market is one thing, but to be successful long term the algorithms need to be extremely efficient – managing billions of calls in real time. The progress made on this front has helped Pindrop’s engineers to shrink the computing footprint such that voice authentication services can be made available as standalone chips for use in embedded devices. This caters for scenarios – in the IoT domain and elsewhere – where products operate in a mixture of online and offline modes. It also opens the door to the idea of voice keys, either replacing or augmenting traditional solutions for unlocking automobiles, to give another potential example.

Pindrop has been providing its authentication and fraud-detection services for more than a decade. And one of the big trends over that time has been the rise of voice as a hands-free way of issuing digital commands. At the same time, voice assistants and other audio-enabled devices have boomed in popularity. Together, this growth has created prospects beyond call centers, which now include voice authentication for smart TVs – recognizing different users in the room – and opportunities to reduce the friction involved in securing the connected home.

But, as the popularity of voice-based solutions takes off in multiple markets and users rely more widely on the technology, what happens if our voices change – for example, when we catch a cold, or if we require surgery? How do algorithms adjust to the new signals? Unsurprisingly, if everything changes, the model will raise a red flag and users will most likely have to re-enroll so that the classifier can be re-trained on newly approved data. But Gaubitch and his team have shown that the firm’s AI approach can help in other ways. “As you grow older, your voice changes,” he points out. “And we’ve found a predictable way to adjust for that.”
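One generic way to track gradual voice drift – not necessarily Pindrop’s method, which the article doesn’t detail – is to nudge the enrolled reference toward features from each call that is independently verified as genuine. The sketch below simulates a slowly ageing voice and compares a static reference against one updated with an exponential moving average; all names and parameters here are assumptions for illustration.

```python
import numpy as np

def update_reference(reference: np.ndarray, call_features: np.ndarray,
                     alpha: float = 0.05) -> np.ndarray:
    """Blend a verified call into the enrolled reference so the model
    tracks gradual voice drift without forcing full re-enrollment."""
    return (1 - alpha) * reference + alpha * call_features

rng = np.random.default_rng(7)
true_voice = rng.standard_normal(64)     # toy 64-dim voice representation
static_ref = true_voice.copy()           # frozen at enrollment time
adaptive_ref = true_voice.copy()         # updated after each verified call

for _ in range(200):                     # many verified calls over the years
    true_voice += 0.01 * rng.standard_normal(64)        # slow ageing drift
    call = true_voice + 0.05 * rng.standard_normal(64)  # per-call noise
    adaptive_ref = update_reference(adaptive_ref, call)

static_err = np.linalg.norm(true_voice - static_ref)
adaptive_err = np.linalg.norm(true_voice - adaptive_ref)
print(adaptive_err < static_err)  # True: the adaptive model tracks the drift
```

The key design point is that only calls already verified by other means should feed the update – otherwise a persistent impostor could slowly pull the reference toward their own voice.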

Early adopters

Appetite for seamless authentication methods is growing among customers, particularly younger users who readily engage with digital services – so it is interesting to discover that audio-based authentication algorithms have the capacity to evolve with their audience. And today, there is increasing evidence that KBA alternatives are proving to be a valuable addition to the authentication and fraud detection space.

Analysis shows that services such as phone-printing, which combines a multitude of acoustic features into an abstract reference analogous to a fingerprint (albeit one that can adapt and mature), have helped to drive down fraud rates in banking and finance from 1 in 717 calls (in 2017) to 1 in 1,175 (in 2021). This welcome trend is particularly notable given that providers of financial services are prime targets for fraudsters, and companies in this sector can face high volumes of attacks over long periods of time.