Synthetic voice trend amps up sonic branding

AI-powered audio engines can leverage as many as 600 TTS voices and auto-produce the audio content with just a few clicks of the trackpad.
11 October 2022

Stepping up: tech firms such as Aflorithmic are bringing fresh thinking to the audio space. Image credit: Aflorithmic.

It’s no secret that audio is a fast-growing media channel for businesses, tech firms included. But what might have escaped the news is how improvements in text-to-speech (TTS) combined with artificial intelligence (AI) templates are radically automating production workflows for generating Spotify ads, voice-overs, podcasts, and more. Synthetic voice has reached its tipping point.

AI-powered audio engines can leverage as many as 600 TTS voices generated by leading providers such as Google Cloud Text-to-Speech, IBM Watson Text to Speech, Microsoft Azure Cognitive Services Speech, and Amazon Polly. By combining different voices in the same piece, the tech layer can readily synthesize interview-like experiences. AI templates then select backing music and top and tail the main content with an intro and outro – all with just a few clicks of the trackpad. Plus, as new services become available, they can simply be dropped in and accessed using the same set of tools.
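To make the production step concrete, here is a minimal sketch (not Aflorithmic's actual API) of the 'top and tail' stage: once the TTS segments have been rendered as WAV files, stitching an intro, the voice parts, and an outro into a single piece takes only Python's standard `wave` module.

```python
import wave

def stitch_segments(segment_paths, out_path):
    """Concatenate WAV segments (e.g. intro, TTS voice parts, outro)
    into one output file, in the order given."""
    params_set = False
    with wave.open(out_path, "wb") as out:
        for path in segment_paths:
            with wave.open(path, "rb") as seg:
                if not params_set:
                    # Copy channel count, sample width, and rate from
                    # the first segment; nframes is fixed up on close.
                    out.setparams(seg.getparams())
                    params_set = True
                out.writeframes(seg.readframes(seg.getnframes()))
```

The helper assumes every segment shares the same sample rate, sample width, and channel count; a real audio engine would also resample, normalize loudness, and mix in the backing music rather than simply concatenating.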

“What makes us unique is that we are an open platform,” Timo Kunz of Aflorithmic – a London- and Barcelona-based tech firm whose name nods to what’s possible when you combine ‘algorithmic’ with ‘flow’ – told TechHQ. “And the more choice the better.” Exploring the demo on the company’s homepage certainly opens your ears to what’s possible thanks to dramatic improvements in synthetic voice.

End-to-end pipeline

TTS development brings together expertise in linguistics, acoustics, digital signal processing, and artificial intelligence – the latter having been catapulted forward by refinements in deep learning. Trend-setting AI models include FastSpeech 2 [PDF], which includes contributions from Microsoft, and Google’s Wave-Tacotron [PDF]. These approaches, together with other related work such as Baidu’s ClariNet [PDF], have banged the drum for using sequence-to-sequence neural networks to simplify processing pipelines and provide so-called end-to-end TTS. And these new architectures, which are much faster than their predecessors, have given rise to much more believable and, by extension, more listenable synthetic voice options.
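The architectural shift can be pictured as function composition. The toy Python below is purely illustrative (the stage functions are trivial stand-ins, not real models); it shows how an end-to-end sequence-to-sequence network collapses the classic multi-stage pipeline into a single learned mapping:

```python
def text_to_phonemes(text):
    # Real front ends use pronunciation dictionaries and G2P models;
    # here we just treat each character as a "phoneme".
    return list(text.lower())

def predict_durations(phonemes):
    # A prosody model would predict per-phoneme timing; fixed here.
    return [5 for _ in phonemes]

def acoustic_model(phonemes, durations):
    # Would emit mel-spectrogram frames; one dummy frame per tick here.
    return [[0.0] for d in durations for _ in range(d)]

def vocoder(frames):
    # Would turn spectrogram frames into waveform samples.
    return [0.0] * len(frames)

def classic_tts(text):
    # Classic pipeline: explicit hand-offs between separate stages.
    phonemes = text_to_phonemes(text)
    durations = predict_durations(phonemes)
    return vocoder(acoustic_model(phonemes, durations))

def end_to_end_tts(text, model):
    # A sequence-to-sequence network maps text straight to audio,
    # learning the intermediate representations implicitly.
    return model(text)
```

Fewer hand-crafted intermediate stages means fewer places where errors compound, which is part of why end-to-end systems sound more natural.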

Recently, Aflorithmic teamed up with DeepZen – which is known for speech models that add rhythm, stress, and intonation to written text – to extend the list of lifelike voices available on its platform. And, according to Kunz, more TTS providers are on the way. Today, expressive voices are state-of-the-art and algorithms are capable of generating speech with a wide variety of accents. In the lab, Kunz and his team, which includes co-founders Peadar Coyle and Björn Ühss, are exploring the vocal range of sports commentators to better understand the expressive capabilities of synthetic voice.

Jingle 2.0

Favourably familiar digital speech plays into the rising field of ‘sonic branding’. “Companies are recognizing that they don’t just want to look a certain way, they want to sound a certain way too,” explains Kunz. Synthetic voice allows firms to deliver a consistent, and readily identifiable, audio signature across different touchpoints. But rather than having to re-record the message for different campaigns, all that’s required is an easy edit to the original text. And, again, the necessary audio production just happens in the background. “Users can generate a fully produced piece without any knowledge of sound engineering,” said Kunz. “With AI a lot of innovation comes from the data.”
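As an illustration of that workflow, the sketch below uses a hypothetical `render_audio` callable standing in for an audio-as-a-service API (an assumption for this example, not Aflorithmic's actual SDK): a brand keeps one script template and regenerates the fully produced piece whenever the copy changes.

```python
from string import Template

# One reusable brand script; only the campaign copy changes between runs.
BRAND_SCRIPT = Template(
    "Welcome back to $brand. This week: $offer. "
    "Hear more wherever you get your podcasts."
)

def campaign_audio(brand, offer, render_audio):
    """Regenerate the branded piece from edited text alone.

    `render_audio` is a stand-in for a synthesis API that applies the
    brand's voice and production template (music, intro, outro)."""
    script = BRAND_SCRIPT.substitute(brand=brand, offer=offer)
    # Same voice, same backing track, new copy: no re-recording needed.
    return render_audio(script, voice="brand-voice", template="signature-v1")
```

The `voice` and `template` identifiers are hypothetical; the point is that the audio signature lives in reusable configuration while the text stays freely editable per campaign.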

Taking that innovation to the maximum is the ability for users to clone their own voices. From a branding point of view, firms can generate unique audio signatures designed to retain the attention of existing customers and attract new ones. And Aflorithmic is careful to prioritize the security of its clients’ data. “We only clone if we have the consent to do so and we keep the data safe,” said Kunz. Full details on the security steps taken can be found on Aflorithmic’s API reference pages.

Success stories

Voice-cloning success stories include UneeQ – a developer of ‘AI-powered customer experience ambassadors that recreate human interactions’. The New Zealand-based company, with offices in Australia and the US, used the platform to create a digital Albert Einstein. And the voice-cloned conversational AI proved to be a big hit – more than tripling website traffic and delivering a 270% increase in booked meetings for the client.

Traditional media firms are using synthetic voices to breathe new life into their content. Publishers in Germany are using audio engines to auto-generate fully produced newscasts that give listeners an up-to-date bulletin of key stories and daily events. Looking at the stats, the approach seems to be working. According to Aflorithmic, the first project has reached over a million plays since launch. And 12 other German publishers have recently signed up to the portal that provides the AI newscast creation tool.

Looking ahead – and noting the trend for devices capable of speaking to their users and responding to voice commands – it feels like a sure bet that applications featuring expressive TTS will keep coming. “Currently, people are interested in having very lively voices,” revealed Kunz, who notes lots of growth potential in areas such as audio advertising. Certainly, automating the audio production as well as the vocal components leaves more time for users to get creative and explore innovative ways of deploying synthetic speech in their markets.