STREAMING

An environmental sustainability solution for streaming and calls

You'll have to pry our iPod from our cold, dead - oh, you've already moved on?

8 November 2023

Can AI help retain environmental sustainability in audio transmission?

Tony Fyler


AI – like a telephone operator, only better.

• Environmental sustainability does not sit well with 21st century entertainment streaming.
• But high resolution business telephony is also a culprit in imperiling environmental sustainability.
• Streaming has become the new normal – so technology needs to evolve to make it work for the environment.

In Part 1 of this article, we spoke to Rob Reng, CTO of IRIS Audio, an AI audio startup which aims to deliver clearer audio in call center settings without the current weight of energy wastage or carbon burn.

Rob explained to us the true ecological cost of our post-pandemic world of streaming audio and video – globally each year, the world burns as much carbon in streaming media as the whole of Spain burns across all its industries in the same period. The streams of a single mega-selling Spotify song can release thousands of transatlantic flights' worth of carbon emissions.

The consequences of this invisible streaming cost are profound – video conferencing with images costs carbon. YouTube, Spotify, Alexa – carbon. All the systems companies have traditionally used to improve audio quality in their telephonic systems – burn carbon. None of which is good for our environmental sustainability.

Which is what led IRIS to invest in an AI-based way to reduce the amount of high quality (and so, heavyweight) audio transmitted in scenarios such as sales and customer service calls.

The quality of audio is not strained…

We had an obvious question.

THQ:

How does that process actually work?

RR:

Well, in some respects, it’s the same as all other machine learning-based algorithms. You can teach an algorithm to do almost anything based on the training data you use. So if your training data is vast, which it has to be, you can teach a piece of software to make a prediction, because compression, whether it’s voice or music, is all about removing things that it believes you don’t need.

So for voice content, normally the transmission process just shaves off everything above about 4 kilohertz (human hearing goes up to about 20 kilohertz). It just takes away that top 16 kilohertz of sound.

When you’re hearing somebody on the phone, and you think, “Oh, God, this sounds terrible,” that’s because all of that audio information has just been brutally discarded.

So, given what we see at the bottom of the range, we can then predict what should have been there. Once you know that, it’s just a case of filling in the dots around it. This has already been done for image recognition, as well as in medical equipment. You intensify and clarify the image by filling in those dots where stuff is missing with elements most likely to have been there.
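To put rough numbers on why band-limited audio is so much lighter to transmit, here is a minimal back-of-envelope sketch. It assumes a telephone band of about 4 kilohertz, the commonly cited upper limit of human hearing of about 20 kilohertz, and uncompressed 16-bit mono audio; those figures and the function names are our own illustrative assumptions, not anything IRIS has published. Real codecs compress heavily, so actual bitrates are far lower, but the proportion gives a feel for the saving.

```python
# Back-of-envelope sketch: raw (uncompressed) bitrate needed for a given audio bandwidth.
# All constants below are illustrative assumptions, not IRIS figures.

def nyquist_sample_rate(bandwidth_hz):
    """Minimum sample rate needed to capture frequencies up to bandwidth_hz."""
    return 2 * bandwidth_hz

def pcm_bitrate_kbps(bandwidth_hz, bits_per_sample=16):
    """Raw mono PCM bitrate in kilobits per second at the Nyquist sample rate."""
    return nyquist_sample_rate(bandwidth_hz) * bits_per_sample / 1000

PHONE_BAND_HZ = 4000    # assumed upper limit of a narrowband phone call
FULL_BAND_HZ = 20000    # assumed upper limit of human hearing

phone_kbps = pcm_bitrate_kbps(PHONE_BAND_HZ)
full_kbps = pcm_bitrate_kbps(FULL_BAND_HZ)
ratio = full_kbps / phone_kbps

print(f"Phone band: {phone_kbps} kbit/s, full band: {full_kbps} kbit/s, ratio: {ratio}x")
```

Under these assumptions, sending the full audible band costs five times the raw data of the phone band, which is the gap a prediction-based approach tries to close at the receiving end rather than on the wire.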

Environmental sustainability versus call quality?

We’ve all had those “What did you just say?” moments. Some people have them all day long.

THQ:

Oh, like the Samsung camera moon shot idea?

RR:

Something like that, yeah. In the world of imagery, it’s called super resolution. But it’s not really done so much in audio – or at least, it hasn’t been, yet. We’re very keen to take the ideas from the world of imaging and put them into the world of audio.

THQ:

And, without wishing to sound repetitive, how do you do that?

Training AI to respect environmental sustainability.

RR:

We train a network on thousands of hours of high resolution and low resolution sounds. So when it sees something that’s low res, it goes, “Oh, I know how to make that high res.” And it just interpolates and rebuilds what it believes should have been there in the first place. And so far, it’s been very accurate.
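The idea of training on matched high resolution and low resolution sounds can be sketched in a few lines. This is a toy illustration under our own assumptions, not IRIS's pipeline: the helper names are hypothetical, the "low res" version is made by crude block averaging, and plain linear interpolation stands in as the naive baseline that a trained network would be expected to beat.

```python
import math

def make_tone(freqs_hz, sample_rate, num_samples):
    """Synthesize a test signal as a sum of sine waves (stand-in for real speech)."""
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
        for i in range(num_samples)
    ]

def degrade(signal, factor):
    """Make a crude low-res copy: average each block of `factor` samples
    (a rough low-pass filter) and keep one value per block."""
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]

def linear_upsample(low, factor):
    """Naive baseline: linearly interpolate between neighbouring low-res samples."""
    out = []
    for i, a in enumerate(low):
        b = low[i + 1] if i + 1 < len(low) else a
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    return out

def mse(x, y):
    """Mean squared error over the overlapping part of two signals."""
    n = min(len(x), len(y))
    return sum((p - q) ** 2 for p, q in zip(x[:n], y[:n])) / n

# One training pair: the low-res input and the high-res target it came from.
high = make_tone([300.0, 6000.0], sample_rate=16000, num_samples=1600)
low = degrade(high, factor=4)
training_pair = (low, high)

# How far the naive reconstruction is from the original high-res signal.
baseline = linear_upsample(low, factor=4)
print("baseline error:", mse(baseline, high))
```

A learned model would be trained on many such pairs to drive that error down, predicting the missing high-frequency content instead of merely joining the dots.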

Bring back the iPod!

RR:

Yes, that’s true. You do need to train your system, either on a machine that you can run in your own office, which is the most cost-effective way of doing it, or, because usually it’s good to get ad hoc training on top of that, we run experiments on Amazon and Google. But obviously, we try to limit that as much as we can, and use our own infrastructure.

THQ:

The only reason we harp on that is that, as we said, some ecological impacts are beginning to be understood and others aren’t, and audio isn’t yet widely seen by either the general public or the business world as something that has to be accounted for in terms of environmental sustainability.

So, do you think people are aware enough yet of those impacts? Enough to make them immediately see the benefits of this kind of system?

RR:

I think things are currently set up to be too easy, certainly. It’s too easy to stream, for instance. The whole ecosystem is set up in such a way as to make it seamless, so you just open Spotify and you have the world’s music at your fingertips, and you can just press a button and the music’s there waiting for you with 5G.

That makes it that much easier than sitting there waiting for seconds or minutes at a time for your song to be downloaded for you to listen to. But that completely hides the fact that there’s music being stored somewhere.

Convenience versus environmental sustainability.

RR:

The streaming infrastructure just makes it too easy to have all that music at your fingertips and to be able to effortlessly switch between artists and find new music, which is obviously a positive thing in a lot of ways.

Which means it’s not embedded in our psyche anymore to download music and to run it off a single device like your iPod. Most people seem to be streaming now.

THQ:

Which is the point, yes? Streaming is the ultra-convenient alternative to all that downloading malarkey that people used to do.

RR:

Exactly, yeah.

THQ:

So take us on a tangent.

Do we think people who are now intensely familiar with the convenience of one-touch entertainment streaming would necessarily care about the environmental sustainability impact of it?

If we were to say “Stream and the planet gets it!” would most people change their behavior? Or are we looking to put the carbon burden onto the streaming companies or the companies that run the systems?

RR:

Yeah, I’m excited to make people care, but it’s a valid question – and it’s not one that only exists with regard to streaming. Look at our food choices. We all know it would be far better, ecologically, not to eat meat, but we choose to do it anyway. Because we like it. So it’s really hard to force people to be vegetarian music downloaders with just one device and a lunchbox full of tofu.

THQ:

Because we’re fighting against previous norms of convenience and pleasure, which makes us come off as tedious miserabilists?

RR:

Yeah – we all have choices, and of course, you can’t do everything. You just have to do the things you think are enough to try and make a difference. And, you know, do the best you can.

The irony of including this link to a video you can stream from a YouTube server somewhere is…not entirely lost on us.

In Part 3 of this article, we’ll dive deeper into the AI solution to call quality IRIS has developed, and explore what it can do right now – and what the hopes are for its future.