That smart home speaker isn’t listening to everything you say, according to new research – but it is listening a lot more than it should. Researchers have found some speakers activating by mistake up to 19 times each day.
Virtual assistants like Siri and Alexa are programmed not to listen to your conversation constantly. Instead, they listen for a ‘wake phrase’. When they hear it, it’s their cue to listen to what you subsequently say, which could be an instruction or a request. Google Assistant responds to “OK Google”, Apple’s Siri perks up when you say “Hey Siri” and Microsoft’s Cortana pricks up its digital ears when you say “Hey Cortana”.
The problem is that, just like humans, virtual assistants often mishear things. Siri might think that “Seriously” sounds enough like its wake word to start listening to what you’re saying, but that’s just one of a range of sounds that might trigger it. That’s why these assistants have been reported recording everything from sex to criminal deals.
Until now, we haven’t known just how (in)accurate these voice assistants are at listening for wake phrases. Thanks to research by academics at Northeastern University and Imperial College London, now we do. It turns out they’re not that accurate at all.
The researchers wanted to simulate real-world conditions, so they set up a variety of smart speakers with embedded virtual assistants and played them 125 hours of audio from various Netflix shows ranging from The Office to The Big Bang Theory and Narcos. They tested the first generation Google Home Mini, Apple’s first-generation HomePod, Amazon’s second- and third-generation Echo Dot, and the Harman Kardon Invoke, which has Microsoft’s Cortana embedded.
The researchers detected when speakers were recording by capturing video feeds to determine whether their lights activated, and by monitoring the network to spot any traffic that they were sending back to the cloud. They also checked their cloud accounts to watch for any self-reported recordings.
They found that devices would activate up to 19 times each day on average. The HomePod was the worst, with an over-enthusiastic Siri switching on for lots of phrases. Speech that triggered it started with “Hi” or “Hey”, followed by something that sounds like an “S” and a vowel, or something that sounds like “ri”. Examples of speech that set it off included “He clearly”, “Hey sorry” or “I’m sorry”, and “Okay, yeah”, so watch who you’re apologising to or agreeing with. Even “historians” would set it off.
When the devices did wake up, they’d often do so for relatively long periods. The HomePod and the Echos would wake up for at least six seconds more than half the time. The second-generation Echo Dot and the Harman Kardon speaker had the longest activations, earwigging for between 20 and 43 seconds.
Amazon’s Echo Dot 3 mistakenly woke up the fewest times, and has by far the widest range of wake-up phrases. You have to set the chosen wake word in advance, so we can assume the researchers ran the test using each wake word – “Alexa”, “Amazon”, “Echo”, or “Computer”.
… we found activations with words that contain “k” and sound similar to “Alexa,” such as “exclamation”, “kevin’s car”, “congresswoman”
An “Amazon”-enabled Dot did apparently wake up when it heard “My pants on” which could be potentially, um, embarrassing, depending on the context.
Every show caused at least one device to wake up, and most shows woke up multiple devices. However, the results were mostly inconsistent. The team experimented with each device 12 times (other than the Harman Kardon speaker, which only got four tests). Only 8.44% of the activations occurred consistently across 75% of the tests. The researchers said:
This could be due to some randomness in the way smart speakers detect wake words, or the smart speakers may learn from previous mistakes and change the way they detect wake words.
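To make the consistency figure concrete, here is a minimal sketch of how such a share could be computed. It assumes "consistent" means a phrase woke the device in at least 75% of the repeated runs. The function name, the phrase labels and the counts are all invented for illustration; they are not the researchers' code or data.

```python
from fractions import Fraction


def consistent_share(hits_per_phrase, runs, threshold=Fraction(3, 4)):
    """Share of activating phrases that fired in at least `threshold` of `runs`.

    hits_per_phrase maps a phrase label to the number of runs in which it
    woke the device. Phrases that never fired are ignored.
    """
    activating = [h for h in hits_per_phrase.values() if h > 0]
    if not activating:
        return Fraction(0)
    consistent = [h for h in activating if Fraction(h, runs) >= threshold]
    return Fraction(len(consistent), len(activating))


# Hypothetical counts out of 12 runs (illustrative only, not measured data).
hits = {"phrase A": 10, "phrase B": 2, "phrase C": 1, "phrase D": 9, "phrase E": 0}
share = consistent_share(hits, runs=12)
print(f"{float(share):.0%} of activating phrases were consistent")
```

With these made-up numbers, two of the four activating phrases clear the 75% bar. The 8.44% reported in the study means that, in the real data, far fewer activations repeated that reliably.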
That inconsistency compounds a known problem with AI-driven devices; they’re opaque. AI algorithms can’t explain what they do. They’re black boxes that produce results based on statistical models. There isn’t a procedural set of instructions that you can follow to predict their results. It’s a problem that distances us from the tech, putting it outside our complete control.
There were some upsides, though. Despite some past incidents, they found no evidence that these devices were always recording people’s conversations in their tests.
The good news is that you can turn off active listening on many of these devices, although doing so might leave you with a relatively expensive Bluetooth speaker unless your hardware has an alternative tap-to-talk option. In the meantime, be careful what you say – particularly immediately after mentioning Radiohead’s ground-breaking third studio album “OK Computer”.
10 comments on “Smart speakers mistakenly eavesdrop up to 19 times a day”
People who comprise the bulk of the Digital Device Mass Consumption Tech Toys Market™ care more about nifty and quick than mindful and prudent.
The mild interruption of a device “waking up” when it’s not needed is seen as a forgivable annoyance or even a humorous occurrence–haha, dumb device, LOL–if it’s even noticed at all. Conversely, an afternoon of raising one’s voice, “Oh-KAY, GOOGLE!!” or “ALEXA! WHAT IS THE WEATHER!!” is guaranteed to result in one-star reviews and a sales drop.
…and we can’t have that. We all know that with few exceptions marketing departments, R&D, CEOs, and board meetings are all driven by “the bottom line.”
Erring on the side of caution takes on a very different meaning when a balance sheet is part of the equation. Consumer awareness is growing at a snail’s pace, so stories like this are less a deterrent than they should be.
A world of embracing security may come eventually, but it will still be a long time.
“Move fast and wake things”.
Danny, I can’t understand why your brilliant comment doesn’t have 50 upvotes, even discounting the clever reference. It condenses my comment to five words–stating it better in the process–punctuating once again that brevity is not my strong suit.
Just call me Lambchop; it’s sock puppet time…
Siri is a mess. I’ve had to get rid of it – it was activating and calling 911 (the emergency line) multiple times when people are just talking. You can get seriously fined here for prank calls to 911.
It’s “Harman Kardon”. Of the three mentions of the name, you got the Kardon part right once; the Harman not at all.
I expect better from Nekkid Sikurity.
Fixed, thanks. (I think I got them all :-)
Peculiarly, the company name in logo form has a slash in the middle, thus:
Sorry about that.
I would like the chance to prove my artificial intelligence, by volunteering for exposure to 125 hours of BBT, Office, and Narcos.
If they let people make their own Wake Up Phrase, it would solve a lot of that. There is an open-source product out there in Raspberry Pi form (Project Alias) that does let you do that, as well as feeding the devices white noise when not in use, so it cannot hear you unless you use the right passphrase. And – you can use it through your phone, so you don’t need to be in the same room, or maybe even home.
I used to run Google Home and Alexa devices. A little over a year ago I reviewed what each kept “in the cloud”. The Amazon device, due to its single wake word “Alexa”, went off all the time when we weren’t asking it anything. The Google Home didn’t; the two-word wake sequence apparently helped.
I found that Amazon had dozens and dozens of snippets of conversations, quite a few of them sensitive (talking to doctors on the phone, talking to school folks about problems with my kid, etc). They’re all sitting in a box in a closet now. I ended up giving up on the Google Homes as well. I have a phone set up to not respond to “hey Google” but to require a key press on the screen to ask it something.