New acoustic attack steals data from keystrokes with 95% accuracy::A team of researchers from British universities has trained a deep learning model that can steal data from keyboard keystrokes recorded using a microphone with an accuracy of 95%.
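For context on what the article describes: the reported attack isolates individual key presses from a recording, turns each into a spectrogram, and classifies them with a deep model. Below is a toy numpy sketch of just the isolation step; the window size, energy threshold, and press length are invented for illustration, and the real attack's classifier is far more involved than anything shown here.

```python
import numpy as np

def isolate_keystrokes(audio, sr, win=0.01, thresh=5.0, press_len=0.05):
    """Split a recording into per-keystroke clips via energy thresholding.

    A toy stand-in for the isolation step described in reporting on the
    attack; the real pipeline then feeds a spectrogram of each clip to a
    deep classifier. All thresholds here are made-up illustration values.
    """
    hop = int(sr * win)
    # Short-time energy per non-overlapping window.
    energy = np.array([np.sum(audio[i:i + hop] ** 2)
                       for i in range(0, len(audio) - hop, hop)])
    floor = np.median(energy) + 1e-12
    # Rising edges: windows where energy jumps above thresh * floor.
    onsets = np.flatnonzero((energy[1:] > thresh * floor) &
                            (energy[:-1] <= thresh * floor)) + 1
    n = int(sr * press_len)
    return [audio[o * hop: o * hop + n] for o in onsets
            if o * hop + n <= len(audio)]

# Synthetic demo: two "keystrokes" (noise bursts) in otherwise silent audio.
sr = 16000
rng = np.random.default_rng(0)
audio = np.zeros(sr)
for start in (2000, 9000):
    audio[start:start + 800] = rng.normal(0, 1, 800)
clips = isolate_keystrokes(audio, sr)
print(len(clips))  # 2 bursts detected
```

On real recordings the energy floor comes from background noise rather than true silence, which is one reason the commenters below argue lab accuracy may not transfer.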
I’ll believe it when it actually happens. Until then you can’t convince me that an algorithm can tell what letter was typed from hearing the action through a microphone.
This sounds like absolute bullshit to me.
The part that gets me is that the ONLY reason this works is because they first have to use a keylogger to capture the keystrokes of the target, then use that as an input to train the algorithm. If you switch out the target with someone else it no longer works.
This process starts with using a keylogger. The fuck you need “ai” for if you have a keylogger?!? Lol.
That’s pretty much what the article says. The model needs to be trained on the target keyboard first, so you won’t just have people hacking you through a random Zoom call.
I think you might have misunderstood the article. In one case they used the sound input from a Zoom meeting, and as a reference they used the chat messages from said Zoom meetings. No keyloggers required.
I haven’t read the paper yet, but the article doesn’t go into detail about possible flaws. Like, how would the software differentiate between double assigned symbols on the numpad and the main rows? Does it use spell check to predict words that are not 100% conclusive? What about external keyboards? What if the distance to the microphone changes? What about backspace? People make a lot of mistakes while typing. How would the program determine if something was deleted if it doesn’t show up in the text? Etc.
I have no doubt that under lab conditions a recognition rate of 93% is realistic, but I doubt that this is applicable in the real world. Nobody sits in a video conference quietly typing away at their keyboard. A single uttered word can throw off your whole training data. Most importantly, all video or audio call apps or programs have an activation threshold for the microphone enabled by default to save on bandwidth. Typing is mostly below that threshold. Any other means of collecting the data will require you to have access to the device to a point where installing a keylogger is easier.
It doesn’t need a keylogger. Just a video call meeting, a Discord call while you type in a public call, a recording of you on YouTube streaming and demoing something… etc.
Well, to train AI you need to know what the correct answer is.
It’s bad now, but where we’re at with AI… It’s like complaining that MS paint in 1992 couldn’t make photorealistic fake images. This will only get better, never worse. Improvements will come quickly.
Sounds like a fantastic way to target a streamer, but it’s otherwise very limited.
It’s gonna sound crazy, but I think you can skip the keylogger step!
You could make a “keystroke-sound-language-model” (a language model that combines multiple modalities, e.g., Flamingo), then train it with self-supervised learning to match audio with text, and end up with a system that maps keystroke sounds to typed text without ever needing labeled keystrokes.
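The “match audio with text” objective this comment describes can be sketched with a CLIP-style contrastive (InfoNCE) loss: matched audio/text embedding pairs should score higher than mismatched ones. This is a minimal numpy illustration over precomputed embeddings, not the Flamingo architecture the comment names; the embedding dimension, batch size, and temperature are all invented for the example.

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(audio_emb, text_emb, temp=0.1):
    """Symmetric InfoNCE loss: the i-th audio clip should match
    the i-th text snippet more strongly than any other pairing."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temp            # pairwise cosine similarities
    idx = np.arange(len(a))
    loss_audio = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_audio + loss_text) / 2

# Demo: correctly paired embeddings yield a lower loss than misaligned ones.
rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 32))
aligned = contrastive_loss(emb, emb + 0.01 * rng.normal(size=emb.shape))
misaligned = contrastive_loss(emb, np.roll(emb, 1, axis=0))
print(aligned < misaligned)  # True: matched pairs score lower loss
```

Minimizing this loss over large amounts of loosely paired data (say, call audio plus chat text) is the self-supervised shortcut the comment is gesturing at, though whether keystroke sounds carry enough signal for it is exactly what the skeptics above dispute.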
I think it’s very narrow to think that, just because this research case requires a keylogger, these systems couldn’t evolve over time to combine other techniques.