Gearing up secure enclaves for processing voice data

Voice tech can activate devices, authenticate users, and even warn of employee burnout. Secure enclaves for processing voice data are the next step.
13 December 2022

Confidential computing: encrypted processing hardware paves the way for companies such as Intelligent Voice to add an on-chip security layer to speech recognition and sentiment analysis services. Image credit: NVIDIA.


How many conversations do you have during your working day, and what kind of business value are you able to extract from that data? Perhaps you keep a few notes; there might be formal minutes available or even a recording to refer to. Increased use of online meeting platforms such as Zoom or Microsoft Teams is generating more data in the workplace. But without the right tools, making sense of hours and hours of discussion is going to take a long time. Fortunately, voice technology has come on in leaps and bounds: speech recognition tools are not only easy to find, but also more feature-rich than ever. And roadmaps now include secure enclaves for processing voice data.

Voices contain a huge number of personal identifiers, and spoken content has the potential to be highly sensitive – for example, if it reveals intellectual property details or touches on other confidential information. In an age of GDPR-awakened digital sovereignty, users are asking many more questions about how and where their data is stored. And companies that are proactive will come out on top as legislation continues to tighten. Voice tech firms such as Intelligent Voice, UK experts in GPU-based speech-to-text conversion, are looking at next-generation hardware to deliver true end-to-end encryption of voice data.

Confidential computing

Examples of hardware that can provide such secure enclaves for processing voice data include NVIDIA’s Hopper architecture. Anyone who has used BitLocker on their laptop, or who looks out for a trusty padlock in their browser’s address bar, will be familiar with encrypting data at rest and in transit. But the very latest confidential computing solutions take this to the next level by making sure that the data remains protected on-chip during processing. This protection of data in use marks the next step in the rise of voice technology, which has a growing list of use cases.

Surveillance might be one of the first applications that comes to mind – voice-to-text systems have had some big wins in spotting code words being used inside prisons. Warning of financial fraud is another key area, and there’s been a wave of adoption in the banking sector as firms look to both protect customers and save themselves from massive fines. But voice technology isn’t just about keeping criminals at bay. “In terms of where voice is going, we should look for the positives,” Nigel Cannings – founder and CTO of Intelligent Voice – told TechHQ. “We’re trying to change people’s perception.”

As noted earlier, speech-to-text systems have the potential to unlock huge amounts of business value. And combining voice data with natural language processing dials up the possible gains even further, something Cannings has long been championing. Having trained as a lawyer, he has a keen interest in language, and Cannings combines that with a fascination for technology, something that runs in the family – his father introduced the first PCs to Europe in the 1970s.

Today, Intelligent Voice’s technology is finely honed and clients can easily access extracted data using an audio review player that quickly finds keywords and comes with a host of other features. Another option is to use an API, and the firm’s services are integrated into various compliance tools. Underpinning the system are some cleverly deployed machine learning techniques, which include the use of convolutional neural networks (CNNs).

Soon after forming in 2015, the company received a UK Smart Grant to explore the merits of using CNNs running on GPUs. CNNs are best known for image tasks such as object detection, and by looking at ‘pictures’ of the voice data (produced by turning the sound information into spectrograms) the firm’s system can perform very accurate speech-to-text conversion. Feeding the algorithm with large amounts of training data has expanded its capabilities to include many languages and dialects.
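
To make the idea of a voice ‘picture’ more concrete, here is a minimal sketch of how audio samples can be turned into a time-frequency grid of the kind an image-style network could consume. It is an illustration only, not Intelligent Voice’s pipeline: the function name, frame size, hop length, and the synthetic 1 kHz test tone are all assumptions, and a production system would use an optimized FFT library rather than this plain-Python transform.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame.

    Hypothetical helper for illustration; parameters are arbitrary choices.
    """
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
        for n in range(frame_size)
    ]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Naive discrete Fourier transform over the non-negative frequencies.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Synthetic stand-in for speech: a 1 kHz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)

# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames x", len(spec[0]), "bins; peak near", peak_hz, "Hz")
```

Each row of the result is one slice of time and each column one frequency band, so the whole grid can be treated like a grayscale image, which is what lets image-oriented networks such as CNNs be applied to sound.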

Vox in a box

The GPU component was key, making it possible to process in parallel the various feature extraction and classification layers involved in machine learning, enabling super-fast speech recognition. But it also meant that systems could be run on-premises, something that was hugely appealing to security-conscious customers who were very cautious about putting sensitive data in the cloud.

Among the beneficiaries of voice-to-text services, which Intelligent Voice augments with sentiment analysis tools, are emergency services contact centers. “There’s a real issue with burnout,” Cannings comments. “Operators are highly trained, but they are still human beings.” Conversational analytics can be used to protect agent welfare by identifying when a control room operator is exhibiting signs of stress or fatigue that merit supervisor intervention.

Cannings has no shortage of ideas for the firm’s technology. And, as touched upon, that includes gearing up secure enclaves for processing voice data. In fact, he sees true end-to-end encryption (where the only person able to view the data would be the customer) as being a game-changer on a much larger scale. He points out that it could transform people’s confidence in the security of voice assistants, and might even lead to some successful subscription models in that sector.