As tens of millions of delighted owners know, Siri, Alexa, Cortana and Google will do lots of useful things in response to voice commands.
But what if an attacker could find a way to tell them to do something their owners would rather they didn’t?
Researchers have discovered that it is possible to hide commands inside audio, such as voice statements or music streams, in a way that is inaudible to humans.
A human being would hear something innocuous, while the virtual assistants would interpret the same audio as specific commands.
The researchers have previously demonstrated how this principle could be used to fool the Mozilla DeepSpeech speech-to-text engine.
The New York Times claims that researchers at UC Berkeley were able to:
…embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.
How might attackers exploit this?
The obvious examples are manipulated audio buried inside a radio or TV broadcast, podcast, YouTube video or online game, or perhaps even autoplaying audio on a phishing website.
As for which commands, the answer is more or less anything the device can be asked to do, from dialling a phone number to accessing a website or perhaps even buying something.
For example, the researchers claim they were able to hide the phrase “okay google, browse to evil.com” inside the sentence “without the dataset the article is useless.”
A vulnerable device would be any that responds to voice commands, which today would be home speakers and smartphones.
The problem the research highlights is how little is known about how internet companies implement speech technologies and what, if any, safeguards are built in.
On the face of it, smartphones would be harder to manipulate because in most cases they require users to unlock them before their embedded digital assistants will activate. Always-on home speakers, by contrast, might be easier to target.
This research is a red flag that these devices could, in theory, be remotely controlled, not evidence that they are being misused.
There does seem to be an unstoppable movement to embed voice control inside all sorts of devices that have never had such a feature before, including home security and door locking, which is opening up a whole new world of security and privacy concerns.
For now, it is much more likely that the current generation of devices would be targeted to carry out unwanted surveillance (including by the companies themselves) than to carry out advanced command spoofing.
But as security watchers know from experience, where theory goes, practice has a habit of following.