AI can decipher what you’re typing on your keyboard with 95% accuracy – should we be worried?
- AI can decipher keyboard inputs by sound with 95 percent accuracy.
- This demonstrates potential for ‘acoustic side-channel attacks.’
- TechHQ talks to the researcher about the implications of their study.
Unfortunately, it’s not only people peering over your shoulder that you have to worry about when typing in your password. That’s because an AI has been successfully trained to work out which keys were pressed just from the sound of your keystrokes, with up to 95 percent accuracy. While the so-called ‘acoustic side-channel attack’ is not a new concept, this is by far the most accurate demonstration of one.
The study was conducted by Dr Maryam Mehrnezhad from Royal Holloway, University of London, Joshua Harrison from Durham University, and Dr Ehsan Toreini from the University of Surrey. A pre-print of their results was published online at the very start of this month, causing widespread alarm in the mainstream media; after all, everything from your Alexa to your Ring Doorbell to your phone has a microphone these days – so any of them could be manipulated into becoming a possible entry point for attack.
Even the least tech-savvy among us are also acutely aware of the power that today’s AIs have, thanks to ChatGPT’s conversational prowess. TechHQ caught up with Dr Mehrnezhad to find out whether the panic is really warranted.
AI and keyboard monitoring – a real threat?
“Modern technologies have found their way to every corner of our lives, such as children’s education and digital health,” she said.
“In other words, there is a high value of sensitive data that is collected – for example, from our smart home assistants’ microphones – transferred, processed, and shared, sometimes without user knowledge and consent.
“Side-channel attacks are only one category of potential risks and harms associated with modern technologies, and it is true that the computational power is also increasing for machine/deep learning programs. This would help potential bad actors implement more expensive and sophisticated attacks by using off-the-shelf hardware and software – something we did in our research.
“In another work, we used motion sensor data in mobile phones to detect user physical activities (walking, talking on the phone, running, etc.), touch patterns (scrolling, clicking, zooming, etc.), and PINs via an attack inside the browser – i.e., no need to install a malicious app. Different information about users is of interest to various stakeholders – for example, an insurance company would be interested in your lifestyle, while a cybercriminal would be more interested in PINs and passwords.”
The conspiracy theory that our smartphones are listening to us has been circulating ever since the devices hit the mainstream. There is something eerie about having a conversation with someone, expressing thoughts you have never typed out, and then being presented with an advert that seemingly relates to those thoughts shortly after.
To some extent, our devices are eavesdropping – to be able to respond to our calls, our robot pals Siri and Alexa do have to have their ears constantly pricked. However, if you’ve ever had the ‘Your iCloud is nearly full’ message, you know that most smartphones do not have the capacity to store hours and hours of audio data to send to advertising companies.
Even Facebook swears AI is not listening to your keyboard clicks
Most virtual assistants are programmed to delete audio data before a wake word is ever uttered, only storing the brief snippets of speech that immediately follow. Even famously shifty tech overlord Mark Zuckerberg has shut down the notion that Facebook uses microphones to get our data. But this does not necessarily mean they are safe from a hostile takeover.
To train their machine learning algorithms, Dr Mehrnezhad and her colleagues recorded themselves typing on the keyboard of a 2021 MacBook Pro. They used both the laptop’s inbuilt microphone, activated during a Zoom call, and that of an iPhone 13 Mini placed 17 cm away. They pressed all 36 letter and number keys on the keyboard 25 times in a row, varying the finger used and pressure applied, before feeding part of the audio data from each microphone into an AI deep learning model.
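The first step in a pipeline like this is isolating each individual keystroke from the raw recording. The paper does not publish its code, so as a much-simplified illustration only, here is a minimal sketch of that segmentation step using a plain energy threshold on a mono sample stream (the function name, threshold, and gap values are all assumptions for demonstration, not the researchers’ method):

```python
# Illustrative keystroke segmentation: scan a mono audio signal for
# bursts of samples whose amplitude exceeds a threshold, merging loud
# samples separated by short quiet gaps into a single keystroke event.
def segment_keystrokes(samples, threshold=0.2, min_gap=100):
    """Return (start, end) sample-index pairs, one per detected keystroke.

    Loud samples closer than `min_gap` samples apart are treated as
    part of the same keystroke.
    """
    events = []
    start = None       # index where the current burst began
    last_loud = None   # index of the most recent above-threshold sample
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud > min_gap:
            events.append((start, last_loud + 1))
            start = None
    if start is not None:  # burst still open at end of recording
        events.append((start, last_loud + 1))
    return events

# Two synthetic "taps" in an otherwise silent signal.
samples = [0.0] * 500 + [0.5] * 50 + [0.0] * 500 + [0.6] * 50 + [0.0] * 500
print(segment_keystrokes(samples))  # → [(500, 550), (1050, 1100)]
```

Real systems would work on spectrogram energy rather than raw amplitude, but the idea – carve the recording into 25 presses per key before feature extraction – is the same.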
Despite the ‘tap’ of each keystroke being largely indistinguishable to the human ear, the system was able to match them up, sorting them into 36 groups of the same sounds. The researchers are still not sure exactly which acoustic features the AI used to do this, though Dr Mehrnezhad suspects it has to do with “the location of the key and its distance from the microphone.”
The jigsaw puzzle of data your AI can capture
“There are multiple features contributing to the knowledge an attacker can obtain from an acoustic channel,” she added.
Joshua Harrison, the study’s first author, told The Guardian that he believes that each keystroke makes a different sound depending on how close it is to the edge of the keyboard.
Finally, the researchers labeled each group of keystroke sounds with its corresponding letter or number, before putting the deep learning algorithm to the test by feeding it the rest of the audio data. It was 93 percent accurate at telling which key was being pressed in the iPhone recording and 95 percent on the Zoom call.
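The label-then-evaluate step above amounts to assigning each held-out keystroke to the nearest learned key class and counting how often that assignment is right. The study used a deep learning model; purely as a stand-in to illustrate the idea, here is a toy nearest-centroid classifier over made-up 2-D feature vectors (all names, vectors, and labels are invented for the example):

```python
# Toy stand-in for the classification step: assign each feature vector
# to its nearest class centroid, then score accuracy on labeled data.
def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest (squared Euclidean) to vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist2(vector, centroids[i]))

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Three toy centroids standing in for the 36 learned keystroke classes.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_vectors = [(0.1, 0.0), (0.9, 0.2), (0.2, 1.1), (0.6, 0.6)]
true_labels = [0, 1, 2, 2]

preds = [nearest_centroid(v, centroids) for v in test_vectors]
print(accuracy(preds, true_labels))  # → 0.75
```

The researchers’ 93 and 95 percent figures are exactly this kind of held-out accuracy, just computed over a deep model’s predictions rather than a centroid rule.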
The study is not the first of its kind. A few years ago, Dr Toreini tested whether an AI could decipher words typed on an Enigma machine in a similar way. That attack was only about 70 percent accurate, but the new study uses much more advanced technology.
While Dr Mehrnezhad’s team did not perform the study in a location with significant background noise, she says that it is a legitimate proof-of-concept; there is the potential for AIs that can accurately read what you’re typing just through keyboard clicks.
“We provide a range of mitigation methods in the paper, such as being vigilant about your surroundings and video/voice connections when dealing with sensitive information such as PINs and passwords,” she added.
“It is always good to follow general security and privacy advice such as choosing strong passwords and not opening random links.
“We have tested one videoconferencing platform for our attacks, but there are other platforms that apply noise-canceling algorithms to automatically remove the typing sounds. If such approaches become common practice across other companies, at least one version of the attack will be mitigated.
“However, I don’t think the pressure should be on the end user. The cybersecurity and privacy community – academia, industry, policymaking, standardization, etc. – need to come up with more secure and privacy-preserving solutions, legislation, and enforcement, and enable citizens to use modern technologies in order to improve the quality of their lives without any risk and fear.”