Can NVIDIA reinvent voice technology for greater possibilities?

Innovation in conversational AI is transforming how voice technology can be utilized.
15 September 2021

Voice technology continues to attract innovation as businesses find more use cases for it. Conversational AI, a key component of voice technology, is breaking new ground not only in customer service but across a variety of industries.

Today, conversational AI is enabling voice technology to evaluate the biometrics of voice. Not only can it detect and understand what a person is saying, but voice technology tools are also beginning to pick up accents. The technology can pick up words and break them up into segments of several tones, which are analyzed to understand vocal patterns.
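
As a rough illustration of that segmentation step, the sketch below cuts an audio signal into short overlapping frames and measures the energy of each one. It is only a minimal sketch under assumed values: the 16 kHz sample rate, the 25 ms frame, the 10 ms hop, and the synthetic 220 Hz tone are illustrative choices, not details of any vendor's system.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Cut a signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

SAMPLE_RATE = 16000  # assumed sample rate in Hz

# Synthetic stand-in for speech: one second of a 220 Hz tone.
signal = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
          for n in range(SAMPLE_RATE)]

# 400 samples = 25 ms frames, 160 samples = 10 ms hop (assumed values).
frames = split_into_frames(signal, frame_len=400, hop=160)
energies = [rms(f) for f in frames]
```

A real system would go on to extract richer per-frame features, such as pitch or spectral shape, before looking for vocal patterns.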

The global conversational AI market is expected to reach US$13.9 billion by 2025. North America is projected to hold the largest share of the market due to increasing demand for enhanced customer retention initiatives. At the same time, more players are emerging in the industry, especially smaller start-ups that are enabling more conversational AI use cases.

The NVIDIA answer to voice technology

While there are still plenty of criticisms of the accuracy of voice technology and its inherent limitations, companies like NVIDIA are developing new tools and applications to perfect it. For example, NVIDIA Riva is a fully accelerated software development kit (SDK) for building multimodal conversational AI applications that use an end-to-end deep learning pipeline.

Riva lets developers easily fine-tune state-of-the-art models on their own data to gain a deeper understanding of their specific context, and then optimize those models into end-to-end real-time services that run in less than 300 milliseconds (ms) and deliver seven times higher throughput on GPUs than on CPUs.

According to Sid Sharma, NVIDIA’s Head of Product Marketing, AI Software, Riva is set to offer a controllable text-to-speech service in the near future to help developers build expressive conversational AI applications. He explained how Riva fuses vision, audio, and other sensor inputs simultaneously to provide capabilities such as multi-user, multi-context conversations in applications such as virtual assistants, multi-user diarization, and call center assistants.

Despite English being the most common language, Sharma explained that every language is critical for deploying conversational AI applications globally. “As a first step towards democratizing speech technology, NVIDIA partnered with Mozilla Common Voice. We plan to make conversational AI available in the majority of languages,” said Sharma.

He added that the controllable text-to-speech capability allows people to direct the AI voice by modifying pitch, tone, and prosody, which also helps create a calming vocal affect during conversations.

Voice biometrics and security

As conversational AI perfects voice technology applications, voice biometrics is becoming increasingly sought after for authentication purposes. While it may be some time before voice authentication becomes mainstream, it could well revolutionize biometric security.

For Sharma, it’s all about how conversational AI transforms interactions with machines. To realize this vision, Sharma said that NVIDIA is currently focused on advancing the state of the art in speech and language comprehension, and also has a range of research and development initiatives across the company with customers.

“Speaker identification is one such component of conversational AI systems and we have released a few models publicly as part of NeMo. We will continue to work with our customers and the developer community to map out the future of these technologies,” explained Sharma.
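
A common way to approach speaker identification is to turn each voice sample into a fixed-length vector (an embedding) and compare vectors by cosine similarity. The sketch below shows only that comparison step. It is an assumption-laden illustration rather than NeMo's API: the embeddings are made-up three-number vectors standing in for the output of a real model, and the names and the 0.7 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify_speaker(query, enrolled, threshold=0.7):
    """Return (name, score) for the closest enrolled speaker,
    or (None, score) if no one is similar enough."""
    best_name, best_score = None, -1.0
    for name, embedding in enrolled.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Hypothetical embeddings; a real model would produce hundreds of dimensions.
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}
query = [0.85, 0.15, 0.35]

name, score = identify_speaker(query, enrolled)
```

The threshold is what lets the system answer "unknown speaker" instead of always picking the nearest enrolled voice; in practice it would be tuned on real data.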

NeMo is an open-source conversational AI toolkit that offers developers an easy and flexible way to create, train, and fine-tune new models with the best performance on NVIDIA GPUs. NeMo integrates with leading open-source conversational AI libraries such as Hugging Face and PyTorch Lightning.

As more voice players get into the market, Sharma pointed out that NVIDIA provides several core technologies, platforms, and solutions to advance the state of the art in conversational AI. “For businesses looking to leverage voice technologies, Riva will only make it easy for every enterprise to build custom, domain-specific services and apps that can be run on every platform from the edge to the cloud,” said Sharma.