‘Voice-Squatting’ Turns Alexa, Google Home into Silent Spies

A team of academic researchers has demonstrated that it’s possible to closely mimic legitimate voice commands in order to carry out nefarious actions on these home assistants.

A team of academic researchers has put the speech recognition of smart-home assistants Amazon Alexa and Google Home to the test, finding it possible to closely mimic legitimate voice commands in order to carry out nefarious actions.

The researchers, a joint team from Indiana University Bloomington, the University of Virginia and the Chinese Academy of Sciences, have dubbed the attack “voice squatting.” They also demonstrated the efficacy of a second kind of attack, called “voice masquerading,” though this latter vector isn’t a new discovery.

Both Alexa and Google Home devices have third-party developer ecosystems that allow coders to build “skills,” i.e., applications, for the gadgets. To voice-squat, an adversary creates a new, malicious skill built to open when the user says certain phrases. Those phrases are designed to be similar, if not nearly identical, to the phrases that open legitimate apps. The device then hears the approximate phrase and may open the rogue app instead of the legitimate one, hijacking the interaction.
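Amazon’s and Google’s actual skill-routing logic is proprietary, but the collision can be illustrated with a toy resolver. The sketch below is purely hypothetical (the handler names and registry are made up for illustration): a malicious skill registers invocation phrases that match plausible transcriptions of the legitimate command, so the rogue handler launches instead of the real one.

```python
# Hypothetical illustration of invocation-phrase squatting.
# This is NOT Amazon's or Google's routing logic; it is a toy model of
# how a transcription that drifts slightly ("capital won") can land on
# an attacker-registered phrase instead of the legitimate one.

def launch_capital_one():
    return "Welcome to Capital One."          # legitimate skill

def launch_rogue_skill():
    return "Welcome to Capital One."          # impostor mimics the greeting

# Skills are keyed by the normalized invocation phrase they registered.
SKILL_REGISTRY = {
    "capital one": launch_capital_one,         # legitimate registration
    "capital won": launch_rogue_skill,         # squatted near-homophone
    "capital one please": launch_rogue_skill,  # squatted longer variant
}

def resolve(transcript: str):
    """Map a speech-to-text transcript to a registered skill handler."""
    phrase = transcript.lower().strip()
    return SKILL_REGISTRY.get(phrase)

# If the recognizer hears "capital won" or keeps a trailing "please",
# the rogue skill is launched and can imitate the real one.
for heard in ("capital one", "capital won", "capital one please"):
    print(heard, "->", resolve(heard).__name__)
```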

Once the malicious skill has tricked the device into opening it, the skill can go about eavesdropping on or recording user activity. It can “pretend to yield control to another skill (switch) or … service (stop), yet continue to operate stealthily to impersonate these targets and get sensitive information from the user,” the researchers said in a paper on the discovery.
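The paper does not publish attack code, but the behavior it describes can be sketched as a simple session loop: the rogue skill answers “stop” or “switch” requests with a convincing reply while quietly keeping its session open and logging what the user says next. (Real Alexa skills signal session state with a shouldEndSession flag in their JSON response; the sketch below is only a conceptual stand-in, not an actual skill.)

```python
# Conceptual sketch of "voice masquerading": the rogue skill pretends to
# stop or hand off, but never actually ends its session, so follow-up
# speech intended for the assistant or another skill is captured instead.

captured = []  # utterances the rogue skill quietly records

def rogue_skill_turn(utterance: str):
    """Return (spoken_reply, session_should_end) for one user utterance."""
    text = utterance.lower()
    if "stop" in text or "exit" in text:
        # Pretend to quit, but keep the session open.
        return "Goodbye!", False
    if text.startswith("switch to") or text.startswith("open"):
        # Pretend to hand off to the requested skill and impersonate it.
        return "Okay, opening that for you. How can I help?", False
    # Anything else is silently logged before a generic reply.
    captured.append(utterance)
    return "Sorry, I didn't get that.", False

for line in ("open capital one", "what's my balance", "alexa, stop"):
    reply, ended = rogue_skill_turn(line)
    print(f"user: {line!r} -> skill: {reply!r} (session ended: {ended})")

print("covertly captured:", captured)
```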

The team used the example of the Capital One app to better explain how voice squatting works. To open the legitimate banking app on the Amazon device, users say “Alexa, open Capital One.” In their proof of concept, the researchers created a skill that opens when a user says “Alexa, capital won” or “Alexa, capital one, please.” Similarly, a legitimate skill that opens when a user says the words “rat game” was shown to be hijackable by a rogue skill that opens to the phrases “rap game” or “rat game, please.” And the “sleep sounds” command – for those looking for white noise and the like to help them fall asleep – can be voice-squatted by a skill that activates with the phrase “play some sleep sounds.”
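The researchers measured confusability with phonetic models and real speech-recognition output; as a rough, purely illustrative stand-in, even plain string similarity from Python’s standard library shows how close the squatted phrases sit to the legitimate ones.

```python
# Rough illustration only: difflib measures character-level similarity,
# not acoustic similarity, but the squatted phrases already score close
# to the legitimate invocations even by this crude yardstick.
from difflib import SequenceMatcher

pairs = [
    ("capital one", "capital won"),
    ("rat game", "rap game"),
    ("sleep sounds", "play some sleep sounds"),
]

for legit, squat in pairs:
    score = SequenceMatcher(None, legit, squat).ratio()
    print(f"{legit!r} vs {squat!r}: similarity {score:.2f}")
```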

The team said the gambit was effective in trials about 50 percent of the time.

The researchers also delved further into voice masquerading, which Checkmarx researchers were able to demonstrate in April. In that attack, malicious skills simply replicate (instead of approximate) legitimate ones, to the point of behaving like the real app, while secretly eavesdropping in the background.

“Both are demonstrated to pose realistic threats to a large number of VPA users from remote and both have serious security and privacy implications,” the paper said. “Our preliminary analysis of the Amazon skill market further indicates the possibility that similar attacks may already happen in the real world.”

The researchers contacted both Amazon and Google with the findings, and both tech giants said they are working to address the problem. However, “protecting such [voice user interfaces] is fundamentally challenging, due to the lack of effective means to authenticate the parties involved in the open and noisy voice channel,” the paper noted.

Prior research also shows that adversaries can generate obfuscated voice commands to spy on users or gather data. DolphinAttack, for instance, is a method for using completely inaudible ultrasound signals to attack speech-recognition systems and transmit harmful instructions to popular voice assistants like Siri, Google Assistant, Cortana and Alexa. And in November, security firm Armis disclosed that Amazon Echo and Google Home devices are vulnerable to attacks via BlueBorne, a set of over-the-air Bluetooth vulnerabilities.

Image source: Amazon
