An environmental sustainability solution for streaming and calls
• Environmental sustainability does not sit well with 21st century entertainment streaming.
• But high resolution business telephony is also a culprit in imperiling environmental sustainability.
• Streaming has become the new normal – so technology needs to evolve to make it work for the environment.
In Part 1 of this article, we spoke to Rob Reng, CTO of IRIS Audio, an AI audio startup which aims to deliver clearer audio in call center settings without the current weight of energy wastage or carbon burn.
In fact, Rob explained to us the true ecological cost of our post-pandemic world of streaming audio and video – globally each year, the world burns as much carbon in streaming media as the whole of Spain burns across all its industries in the same period. The streams of a single mega-selling song on Spotify can release thousands of transatlantic flights’ worth of carbon emissions.
The consequences of this invisible streaming cost are profound – video conferencing with images costs carbon. YouTube, Spotify, Alexa – carbon. All the systems companies have traditionally used to improve audio quality in their telephonic systems – burn carbon. None of which is good for our environmental sustainability.
Which is what led IRIS to invest in an AI-based way to reduce the amount of high quality (and so, heavyweight) audio transmitted in scenarios such as sales and customer service calls.
In the IRIS solution, sound is sent from wherever it’s produced or stored in low quality, and AI at the receiver’s end boosts it to high quality in situ. That means there’s no perception of a quality drop, and no need to send heavyweight high quality audio, incurring the carbon cost that entails.
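As a rough sense of scale for "low quality" versus "high quality" audio, here is a back-of-the-envelope Python sketch. The figures are illustrative assumptions, not IRIS numbers: it compares uncompressed mono audio at 8 kHz and 48 kHz sample rates over a ten-minute call, and the helper name `pcm_bytes` is hypothetical. Real calls use codecs that shrink both figures, so treat this only as a picture of how data volume scales with audio resolution.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
    """Size in bytes of uncompressed (PCM) audio with the given parameters."""
    return sample_rate_hz * bits_per_sample // 8 * seconds * channels

CALL_SECONDS = 10 * 60  # an assumed ten-minute call

# Assumed rates: 8 kHz for "low quality", 48 kHz for "high quality".
low_quality = pcm_bytes(8_000, 16, CALL_SECONDS)
high_quality = pcm_bytes(48_000, 16, CALL_SECONDS)

print(f"low quality:  {low_quality / 1e6:.1f} MB")
print(f"high quality: {high_quality / 1e6:.1f} MB")
print(f"ratio: {high_quality / low_quality:.0f}x")
```

Under these assumptions, the less data that has to travel, the less network and storage work is done per call, which is the saving the approach is aiming for.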
The quality of audio is not strained…
We had an obvious question.
How does that process actually work?
Well, in some respects, it’s the same as all other machine learning-based algorithms. You can teach an algorithm to do almost anything based on the training data you use. So if your training data is vast, which it has to be, you can teach a piece of software to make a prediction, because compression, whether it’s voice or music, is all about removing things that it believes you don’t need.
So for voice content, normally the transmission process just cuts everything off at about 4 kilohertz (human hearing goes up to about 15 kilohertz). It just takes away roughly the top 11 kilohertz of sound.
When you’re hearing somebody on the phone, and you think, “Oh, God, this sounds terrible,” that’s because all of that audio information has just been brutally discarded.
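To make that discarding concrete, here is a minimal Python sketch. It is an illustration only, not IRIS's or any phone network's actual processing: the 32 kHz sample rate, the two test tones and the function names are all assumptions, and a plain discrete Fourier transform stands in for real telephony filtering. It builds a signal with one tone below the roughly 4 kilohertz cutoff and one above it, then zeroes everything above the cutoff.

```python
import cmath
import math

RATE = 32_000      # samples per second (assumed for illustration)
N = 320            # 10 ms of audio, so each frequency bin is RATE / N = 100 Hz wide
CUTOFF_HZ = 4_000  # roughly where phone-quality audio stops, per the interview

def dft(samples):
    """Naive discrete Fourier transform; fine for a few hundred samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def band_energy(spectrum, lo_hz, hi_hz):
    """Sum of squared magnitudes over positive-frequency bins in [lo_hz, hi_hz)."""
    n = len(spectrum)
    return sum(
        abs(spectrum[k]) ** 2
        for k in range(n // 2)
        if lo_hz <= k * RATE / n < hi_hz
    )

def band_limit(spectrum):
    """Zero every bin whose frequency is above CUTOFF_HZ."""
    n = len(spectrum)
    out = list(spectrum)
    for k in range(n):
        freq = min(k, n - k) * RATE / n  # bins past n/2 mirror the lower ones
        if freq > CUTOFF_HZ:
            out[k] = 0
    return out

# A 1 kHz tone (survives) plus a quieter 8 kHz tone (gets thrown away).
signal = [
    math.sin(2 * math.pi * 1_000 * t / RATE)
    + 0.5 * math.sin(2 * math.pi * 8_000 * t / RATE)
    for t in range(N)
]

spectrum = dft(signal)
limited = band_limit(spectrum)

high_before = band_energy(spectrum, CUTOFF_HZ, RATE / 2)
high_after = band_energy(limited, CUTOFF_HZ, RATE / 2)
low_before = band_energy(spectrum, 0, CUTOFF_HZ)
low_after = band_energy(limited, 0, CUTOFF_HZ)

print("energy above cutoff, before vs after:", high_before, high_after)
print("energy below cutoff, before vs after:", low_before, low_after)
```

The low band comes through untouched while the high band is emptied, which is the gap a receiver-side model would then have to predict.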
So, given what we see at the bottom of the range, we can then predict what should have been there. Once you know that, it’s just a case of filling in the dots around it. This has already been done for image recognition, as well as in medical equipment. You intensify and clarify the image by filling in those dots where stuff is missing with elements most likely to have been there.
Oh, like the Samsung camera moon shot idea?
Something like that, yeah. In the world of imagery, it’s called super resolution. But it’s not really done so much in audio – or at least, it hasn’t been, yet. We’re very keen to take the ideas from the world of imaging and put them into the world of audio.
And, without wishing to sound repetitive, how do you do that?
Training AI to respect environmental sustainability.
We train a network on thousands of hours of high resolution and low resolution sounds. So when it sees something that’s low res, it goes, “Oh, I know how to make that high res.” And it just interpolates and rebuilds what it believes should have been there in the first place. And so far, it’s been very accurate.
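The idea of learning from paired low resolution and high resolution examples can be sketched in a few lines of Python. This is a toy under stated assumptions, not IRIS's method: a nearest-neighbour lookup table stands in for a trained neural network, the "audio" is a handful of synthetic sine tones, and the names `make_pair`, `train` and `enhance` are hypothetical. The low-res version of each clip simply keeps every fourth sample.

```python
import math

RATE = 16_000  # samples per second for the "high res" clips (assumed)
CLIP_LEN = 32  # samples per clip
FACTOR = 4     # the low-res clip keeps every FACTOR-th sample

def make_pair(freq_hz):
    """Return (low_res, high_res) versions of one synthetic tone."""
    high = [math.sin(2 * math.pi * freq_hz * t / RATE) for t in range(CLIP_LEN)]
    low = high[::FACTOR]
    return low, high

def train(freqs_hz):
    """'Training' here is just remembering every (low_res, high_res) pair."""
    return [make_pair(f) for f in freqs_hz]

def enhance(low, table):
    """Return the high-res clip whose low-res partner is closest to `low`."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best_high = min(table, key=lambda pair: distance(pair[0], low))
    return best_high

table = train([200, 400, 800])

# At the receiving end only the low-res clip arrives.
received_low, original_high = make_pair(400)
rebuilt = enhance(received_low, table)

print("low-res samples received:", len(received_low))
print("high-res samples rebuilt:", len(rebuilt))
print("matches the original:", rebuilt == original_high)
```

A real system has to generalize to sounds it has never seen, which is why it needs a network trained on thousands of hours rather than a lookup table; the sketch only shows the shape of the problem, low res in and a predicted high res out.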
OK, allow us a moment of Devil’s advocacy here. We don’t hear a lot about the ecological impact of audio, but we are increasingly familiar with the potential environmental sustainability impact of generative AI and the data center eco-costs it entails. So where’s the balance-point between using an AI system to enhance audio and the eco-costs of generative AI?
Ah – yes. Well, the point with our kind of system is that it’s running at the endpoint, so the processing power is distributed across millions of small devices, as opposed to having one massive data center crunching Bitcoin or doing what all these other industries do with their huge data farms. That would involve having to power the machines, and cool them down with a lot of air conditioning and fans, etc – which is where the questions of environmental sustainability really kick up a gear.
In this case, when we run it on the endpoint, we don’t have to worry about cloud infrastructure, we’re just deploying the power of a mobile phone or a laptop for the duration of that call. And in addition to that, because it’s a valid concern, we’ve done a huge amount of work to make our algorithms very light, precisely so they’re not eating up all the power of your battery while they’re running. You’re not going to kill your laptop by running our endpoint AI solution or anything like that.
And obviously, by running it locally at the endpoint, firstly, you don’t have to pay for a server, and secondly, you don’t have to have the overhead of all the operational costs and the environmental costs that come with it.
That’s really why we’ve always opted to run our AI algorithms at the endpoint and on the customer’s device – so you don’t pay for it. And you don’t have to worry about the knock-on environmental sustainability effects.
But you still have to train the system, which is going to require machine learning and the carbon costs that incurs, right?
Bring back the iPod!
Yes, that’s true. You do need to train your system, either on a machine that you can run in your own office, which is the most cost-effective way of doing it, or, because usually it’s good to get ad hoc training on top of that, we run experiments on Amazon and Google. But obviously, we try to limit that as much as we can, and use our own infrastructure.
The only reason we harp on that is because, as we said, there are things that are beginning to become understood in terms of ecological impact, and there are things that aren’t, and audio is something that isn’t yet widely seen by either the general public or the business world as a thing that has to be accounted for in terms of environmental sustainability.
So, do you think people are aware enough yet of those impacts? Enough to make them immediately see the benefits of this kind of system?
I think things are currently set up to be too easy, certainly. It’s too easy to stream, for instance. The whole ecosystem is set up in such a way as to make it seamless, so you just open Spotify and you have the world’s music at your fingertips, and you can just press a button and the music’s there waiting for you with 5G.
That makes it that much easier than sitting there waiting for seconds or minutes at a time for your song to be downloaded for you to listen to. But that completely hides the fact that there’s music being stored somewhere.
The billions of tracks in various sizes are being stored in huge, huge data centers somewhere. That’s the forgotten part of the whole experience. And rather than, for instance, just downloading the song once, on one device, most people have several streaming services, on several devices, and stream the song as needed on any of them, or many of them. And each time, you’re having an impact, rather than just having downloaded it once and running it off iTunes. You’re incurring a cost each time you stream it.
So, on behalf of those of us who are as old as dirt and are still nursing our last iPod Classic along, that was a more ecologically sound solution?
Absolutely, it was. It would be a much better model to bring back.
Yo, Apple! Hear us! Make new iPods! For the sake of the planet!
However, I think we’ve unfortunately gone past that.
Convenience versus environmental sustainability.
The streaming infrastructure just makes it too easy to have all that music at your fingertips and to be able to effortlessly switch between artists and find new music, which is obviously a positive thing in a lot of ways.
Which means it’s not embedded in our psyche anymore to download music and to run it off a single device like your iPod. Most people seem to be streaming now.
Which is the point, yes? Streaming is the ultra-convenient alternative to all that downloading malarkey that people used to do.
So take us on a tangent.
Do we think people who are now intensely familiar with the convenience of one-touch entertainment streaming would necessarily care about the environmental sustainability impact of it?
If we were to say “Stream and the planet gets it!,” would most people change their behavior? Or are we looking to put the carbon-burden onto the streaming companies or the companies that run the systems?
Yeah, I’m excited to make people care, but it’s a valid question – and it’s not one that only exists with regard to streaming. Look at our food choices. We all know it would be far better, ecologically, not to eat meat, but we choose to do it anyway. Because we like it. So it’s really hard to force people to be vegetarian music downloaders with just one device and a lunchbox full of tofu.
Because we’re fighting against previous norms of convenience and pleasure, which makes us come off as tedious miserabilists?
Yeah – we all have choices, and of course, you can’t do everything. You just have to do the things you think are enough to try and make a difference. And, you know, do the best you can.
The irony of including this link to a video you can stream from a YouTube server somewhere is…not entirely lost on us.
In Part 3 of this article, we’ll dive deeper into the AI solution to call quality IRIS has developed, and explore what it can do right now – and what the hopes are for its future.