A new way to eavesdrop on people’s mobile phone calls has come to light in the form of Spearphone – an attack that makes use of Android devices’ on-board accelerometers (motion sensors) to infer speech from the devices’ speakers.
An acronym for “Speech privacy exploit via accelerometer-sensed reverberations from smartphone loudspeakers,” Spearphone was pioneered by an academic team from the University of Alabama at Birmingham and Rutgers University.
They discovered that essentially any audio played through the speakers in speakerphone mode can be picked up by certain accelerometers in the form of sound-wave reverberations. And because accelerometers are always on and don’t require permissions to provide their data to apps, a rogue app or malicious website can simply listen to the reverberations in real time, recording them or livestreaming them back to an adversary, who can then analyze them and infer private data.
Picking Up Sound Waves
The team tested three Android models: the LG G3, the Samsung Galaxy Note 4 and the Samsung Galaxy S6. In all three cases, because the accelerometer and the speakers are located so close together, the sensor can “feel” the audio.
“These speech reverberations are generated due to the smartphone’s body vibrating due to the principle of forced vibrations, behaving in a manner similar to a sounding board of a piano,” explained the research team, in a recently released paper. “It is possible to compromise the speech privacy of a live human voice, without the need of recording and replaying it at a later time.”
They added, “These speech characteristics may be their gender, identity or even the spoken words during the call (by performing speech recognition or reconstruction).”
As for how a threat actor would gain access to the accelerometer’s data and therefore be able to glean the reverberation information, that comes down to app permissions – or rather a lack thereof.
“A known security vulnerability associated with smartphone motion sensors is the unrestricted access to the motion sensor readings on most current mobile platforms (e.g., the Android OS), essentially making them zero-permission sensors,” according to the paper.
Thus, any app or mobile website can gain access to accelerometer data.
As an example of how an attack scenario would play out in the real world, consider someone putting a call on speakerphone, say while driving or making dinner. The accelerometer could capture the speech characteristics of the remote party on the call.
It’s not just phone calls, either – information can be leaked about any media played through the speakers, including songs, videos, voice assistant interactions and more.
“On-board motion sensors can also be exploited to reveal any audio/video file played on the victim’s smartphone loudspeaker,” according to the paper. “In this instance, the attacker could exploit motion sensors by logging the output of motion sensors during the media play, and learn about the contents of the audio played by the victim.”
This could be exploited by advertisers, for instance, using the information gleaned from eavesdropped media content.
Smart voice assistants like Google Assistant or Samsung Bixby meanwhile repeat the user’s voice commands using the phone’s loudspeakers, to affirm that they understood the request. This opens up the possibility of an attacker learning the voice assistant’s responses, thus learning more about the user and her habits.
Under the Hood
Spearphone attempts to compromise speech privacy by performing gender, speaker and speech classification, via signal processing along with machine learning, the team said.
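To make the signal-processing step concrete, here is a minimal sketch of how speech-driven vibration might be separated from gravity in an accelerometer trace and reduced to simple features. Everything in it is an assumption for illustration: the 200 Hz sampling rate, the synthetic tones standing in for speech, the filter constant, and the two features (RMS energy and zero-crossing rate). The article does not state which filters or features the researchers actually used.

```python
import math

SAMPLE_RATE_HZ = 200  # assumed sensor sampling rate, not taken from the article
GRAVITY = 9.81        # constant offset seen on the z-axis of a phone lying flat


def synthetic_trace(tone_hz, seconds=1.0, amplitude=0.02):
    """Fake z-axis readings: gravity plus a tiny vibration at tone_hz."""
    n = int(SAMPLE_RATE_HZ * seconds)
    return [
        GRAVITY + amplitude * math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE_HZ)
        for i in range(n)
    ]


def remove_gravity(samples, alpha=0.9):
    """First-order high-pass filter: drops the slowly varying gravity and
    posture component and keeps the fast vibration component."""
    filtered = []
    prev_in, prev_out = samples[0], 0.0
    for x in samples:
        out = alpha * (prev_out + x - prev_in)
        filtered.append(out)
        prev_in, prev_out = x, out
    return filtered


def frame_features(samples):
    """Return (RMS energy, zero-crossing rate) for one frame of samples."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return rms, crossings / (n - 1)


# A lower-pitched and a higher-pitched vibration, plus a phone at rest.
low = frame_features(remove_gravity(synthetic_trace(20)))
high = frame_features(remove_gravity(synthetic_trace(60)))
still = frame_features(remove_gravity([GRAVITY] * SAMPLE_RATE_HZ))
print("low :", low)
print("high:", high)
print("still:", still)
```

In this toy setup the higher-pitched vibration produces a higher zero-crossing rate than the lower-pitched one, which is the kind of difference a classifier could learn from. Real speech is far messier than a pure tone, so this only illustrates the general shape of the pipeline.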
Gender classification helps the attacker narrow down the set of candidate speakers for unidentified speech samples, thereby increasing the accuracy of speaker identification. It can also compromise privacy in scenarios where a person’s gender may be used to target them for advertising or for discrimination.
“Certain oppressive societies put restrictions on particular genders and may use gender classification to target individuals in potentially harmful ways,” the researchers noted.
Speaker classification, meanwhile, gives the attacker more context about the communicated speech (in addition to revealing the identity of one of the parties involved in a private voice call).
“For example, an attacker can learn if a particular individual was in contact with the phone owner at a given time,” researchers said. “Another example could be a person of interest under surveillance by law enforcement who is in contact with the phone owner. It could also lead to leakage of the entire phone log of the phone owner.”
And then there’s speech classification, which reveals the contents of the speech itself that may be considered private between the two communicating parties. This requires deeper analysis to pull off.
“In order to perform speech classification, we build a classification model based on a finite word list,” the researchers explained. “Speech features from the obtained sensor readings for isolated words are compared against the labeled features of the word list by the classification model that provides the attacker with a possible rendition of the actual spoken word. We also study the feasibility of performing speech reconstruction by isolating possible words from natural speech and then using word recognition on isolated words to reconstruct speech.”
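The word-list approach the researchers describe can be sketched roughly as follows. This is not their model: the three-word list, the two-dimensional feature values, the energy threshold and the nearest-neighbor comparison are all hypothetical stand-ins, chosen only to show the idea of matching features from isolated words against labeled examples and then stringing the guesses together.

```python
import math

# Hypothetical labeled feature vectors (energy, zero-crossing rate),
# a few examples per word in a finite word list.
LABELED_FEATURES = {
    "yes": [(0.80, 0.10), (0.75, 0.12)],
    "no": [(0.30, 0.40), (0.35, 0.45)],
    "stop": [(0.55, 0.80), (0.60, 0.85)],
}


def classify_word(features, labeled=LABELED_FEATURES):
    """Return the word whose labeled example is closest to `features`
    (nearest neighbor by Euclidean distance)."""
    best_word, best_dist = None, math.inf
    for word, examples in labeled.items():
        for example in examples:
            dist = math.dist(features, example)
            if dist < best_dist:
                best_word, best_dist = word, dist
    return best_word


def isolate_words(frames, energy_threshold=0.1):
    """Group consecutive frames whose energy is above the threshold;
    low-energy frames are treated as the silence between words."""
    segments, current = [], []
    for frame in frames:
        if frame[0] >= energy_threshold:
            current.append(frame)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def reconstruct(frames):
    """Classify each isolated segment by the mean of its frame features."""
    words = []
    for segment in isolate_words(frames):
        mean = tuple(
            sum(frame[i] for frame in segment) / len(segment) for i in range(2)
        )
        words.append(classify_word(mean))
    return words


# Made-up per-frame features: silence, a "no"-like burst, silence,
# a "stop"-like burst, silence.
frames = [
    (0.01, 0.00),
    (0.32, 0.42), (0.34, 0.44),
    (0.02, 0.00),
    (0.58, 0.82), (0.57, 0.84),
    (0.01, 0.00),
]
print(reconstruct(frames))
```

With only a finite word list to choose from, every segment is forced onto the nearest known word, which is why the researchers describe the output as a “possible rendition” rather than a transcript.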
Side-channel attacks that exploit motion sensors could be shut down by eliminating the zero-permission aspect of these sensors in an upcoming OS update. However, that’s not very feasible, the paper pointed out.
“To mitigate such attacks, Android platform could implement stricter access control policies that restrict the usage of these sensors,” they explained. “However, a stricter access control policy for the sensors directly affects the usability of the smartphones. Even implementing the explicit usage permission model by the applications often does not work since users do not pay proper attention to the asked permissions.”
However, a potential defense against Spearphone lies in hardware redesign.
“The internal build of the smartphone should be such that the motion sensors are insulated from the vibrations generated by the phone’s speakers,” according to the paper. “One way to implement this approach would be to mask or dampen the vibrations leaked from the phone’s speakers by surrounding the inbuilt speakers with vibration-dampening material. This form of speech masking would prevent speech reverberations emanated from the phone’s speakers, possibly without affecting the quality of sound.”
Until that happens, users should carefully vet the apps they download (and the websites they visit while using the speakerphone feature) in order to protect themselves.
Samsung and LG did not immediately respond to a request for comment on the findings.