AI in golf: the business of text-to-speech hits the fairway

Generative AI in golf now includes computer-created commentary of Masters Tournament video clip highlights thanks to IBM technology.
6 April 2023

Training session: AI in golf makes it possible to add synthetic speech as commentary to video clips of golfers in action. Image credit: Shutterstock Generate.


Generative AI is rocket fuel for business automation. The speed of commercial adoption and the sheer range of use cases for large language models (LLMs) such as OpenAI’s flagship GPT-4 are staggering. And while it’s hard not to delight in the capabilities of advanced chatbots that can code, understand multiple languages, and summarize business reports – to list just a handful of the talents of ChatGPT and similar implementations of LLMs – concern is certainly growing over what generative AI means for employment. Robots have been chasing production work for decades, but today’s AI algorithms can replace fashion models and many other jobs that, until now, have been largely unaffected by the march of automation. And, as the 2023 Masters Tournament begins at the Augusta National Golf Club in the US, generative AI in golf is chipping away at sports commentary too.

AI in golf

IBM iX, the design experience arm of the veteran computing firm, has teamed up with the Masters digital team to train AI in the language of golf, which at the Augusta National Golf Club includes terms such as ‘patrons’ rather than ‘fans’. Plus, there’s a long list of other golfing vocabulary with the potential to confuse AI text engines that rely on deep-learning-derived probabilities to predict the words most likely to appear next in sentences. Examples include ‘eagle’ (which could be a bird, if one flew over or was perched somewhere on the golf course, but is more likely to mean two under par), ‘bogey’, ‘sand traps’, ‘bunkers’, ‘rough’, ‘second cut’, and the list goes on.

To fine-tune the system, IBM fed the foundation model domain-specific data. And, according to Noah Syken – Vice President, Sports and Entertainment Partnerships at IBM – it took just three hours to bring the AI algorithm up to speed on the finer points of the game of golf. For the autogenerated commentary to be a success, AI in golf needs to tap into information that it knows to be accurate and can trust. The list of data sources employed includes shot details, scoring, other approved golfing stats, and hole-by-hole video footage.

“The AI translates the metadata from each shot into descriptive textual elements,” explains Syken. “That text goes through two neural networks, where hundreds of millions of computations are performed to produce thousands of possible sentences.” From the long list of candidates, the fine-tuned model then selects what it believes to be the most probable commentary and feeds that text into IBM’s Watson Text-to-Speech service, syncing the audio with the on-screen video highlights.
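The stages Syken describes – metadata in, descriptive text out, candidate sentences generated and the most probable one selected for text-to-speech – can be sketched in miniature. The code below is a hypothetical toy, not IBM’s implementation: every function name, the scoring rule, and the shot fields are invented stand-ins for the fine-tuned model and Watson Text-to-Speech service.

```python
# Toy sketch of the commentary pipeline described above.
# All names and logic are illustrative stand-ins, not IBM's actual system.

def metadata_to_text(shot):
    """Stage 1: translate shot metadata into a descriptive textual element."""
    return (f"{shot['player']} hits a {shot['distance_yd']}-yard "
            f"{shot['club']} on hole {shot['hole']}")

def generate_candidates(description):
    """Stage 2: stand-in for the neural networks that would produce
    thousands of possible sentences; here, just a handful of templates."""
    return [
        f"{description}, landing safely on the green.",
        f"{description}... what a shot!",
        f"{description}.",
    ]

def score(sentence):
    """Stage 3: toy 'probability' score; a real model would rank
    candidates by learned likelihood, not length."""
    return len(sentence)

def best_commentary(shot):
    """Pick the highest-scoring candidate; its text would then be
    handed to a text-to-speech service and synced with the video."""
    candidates = generate_candidates(metadata_to_text(shot))
    return max(candidates, key=score)

shot = {"player": "Player A", "club": "7-iron", "distance_yd": 165, "hole": 12}
print(best_commentary(shot))
```

The real system performs hundreds of millions of computations where this sketch uses a template list and a length heuristic, but the shape of the flow – metadata, text, candidates, selection, speech – is the same.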

And if you’re concerned about computer-generated vocals sounding robotic, think again. IBM has some sample voices on its Watson Text-to-Speech demo page, and these demonstration audio models are just the tip of the iceberg. Modern synthetic speech can be incredibly expressive and lifelike. In fact, real-life sports commentary has played a role in the development of much more realistic-sounding voices as sound model designers have looked to mimic the expressive nature of calling play-by-play action in games such as soccer and other athletic endeavours.

Synthetic voice

Today, firms such as Aflorithmic allow users to automatically generate high-quality video voiceovers from text scripts or subtitles, as well as other spoken audio content such as synthetic radio adverts, news announcements and computer-made podcasts. It’s a booming sector, and developers are using the service to add voice to a wide range of consumer products as well as exploring opportunities for sonic branding. And once you get used to the possibilities it is hard to see a road back.

Returning to the topic of auto-generated commentary driven by AI in golf, fans of the Tiger Woods PGA Tour series of games developed by EA Sports for the Xbox and PlayStation consoles will remember the human voices of Jim Nantz and David Feherty. The broadcasters provided stock phrases that added to the entertainment of making good and bad shots, and golf simulations without commentary were definitely left lacking in comparison.

Whether IBM’s use of generative AI and text-to-speech in golf will add the same charisma remains to be heard. As this author types, the first highlights videos featuring computer-created commentary have yet to appear on the Masters website. But by the end of the tournament, which teed off today and concludes at the 18th hole on Sunday 9 April, the system is expected to have provided narration for more than 20,000 video clips. And that number is telling.

Assuming each clip is 90 seconds long, it would take a single human more than 20 days of non-stop talking to commentate on the footage – finishing long after the tournament itself has ended. Generative AI has a work rate that leaves humans for dust, but – putting a positive spin on things ahead of the long bank holiday weekend – that leaves more time for playing golf. Fore!
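For the sceptical, the back-of-the-envelope figure checks out:

```python
# Sanity check on the "more than 20 days" claim.
clips = 20_000
seconds_per_clip = 90

total_hours = clips * seconds_per_clip / 3600   # 1,800,000 s -> 500 hours
total_days = total_hours / 24                   # ~20.8 days of non-stop talking

print(round(total_days, 1))  # 20.8
```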