Voice assistants are growing rapidly in popularity — but at the same time, the privacy concerns and security issues with popular home assistant devices like Amazon Echo and Google Home are peaking too.
Earlier in July, Amazon came under fire after acknowledging that it retains the voice recordings and transcripts of customers’ interactions with its Alexa voice assistant indefinitely – raising questions about how long companies should be able to save highly-personal data collected from voice assistant devices.
Amazon continues to find itself in hot water regarding privacy policies around its Echo devices. In April, Amazon came under fire after a report revealed the company employs thousands of auditors to listen to Echo users’ voice recordings. And last year, Amazon inadvertently sent 1,700 audio files containing recordings of Alexa interactions by a customer to a random person, and later characterized it as a “mishap” that came down to one employee’s mistake.
Below is a lightly edited transcript of the podcast.
Lindsey O’Donnell: Welcome to the Threatpost podcast. This is Lindsey O’Donnell with Threatpost. And I’m here today with Tim Mackey. Tim is the principal security strategist at the cyber security research center for Synopsys. Tim, thanks so much for joining us today.
Tim Mackey: You’re welcome, Lindsey. And we actually managed to get ourselves a truly wonderful day outside for all of our discussions today.
LO: For once, if not a little too hot, but it’s summer, so you got to expect it.
TM: I like the sun part of it.
LO: Exactly. So I wanted to chat about a pretty popular topic today. And that is voice assistants like Amazon Echo devices, Google Home, Apple HomePod, that are becoming increasingly popular in houses. And I don’t know about you, Tim, I don’t know if you have one of these devices. But full disclosure, I have multiple Amazon Echoes around my house. So you know, having to read and even write every day about these privacy and security news stories kind of makes me want to go and throw them in the trash.
TM: Yeah, but for me, I don’t actually have one. My girlfriend has effectively forbidden it from the house because of her concerns. My eldest, she has one for when she’s in college, and she just loves the device. So that’s a mixed household.
LO: Right. I feel like I can definitely see the advantages of these devices, for sure. But I feel like there are so many questions being raised about how much data is being collected, what specifically that data is, how long it’s being retained and who accesses it. So, Tim, on your end, what would you say are some of the top concerns that you have right now regarding voice assistant data privacy, either about voice assistants in general or any one in particular?
TM: So I’m going to keep it to general. And the biggest concern that I have is actually around data retention policies and disclosure. So we have an expectation that these are connected devices, and that perhaps short of the Alexa-then-perform-action activity, the communication, the actual processing of our request, is going to occur on an Amazon server, Google server or so forth. And so that level of variability in our voice patterns between myself, yourself, other family members, visitors and so forth, and Alexa or Google Home’s ability to react to this, is a positive. And intrinsically, we kind of expect that it’s going to take the immediate action, process it and then be done. And what we’re learning is that the providers tend to keep this data for an indeterminate amount of time. And that’s, from my perspective, a significant risk, because the volume of data itself means that it’s potentially very interesting to a malicious actor someplace who wishes to, say, target an individual.
LO: Yeah, that’s a really good point. And I think, you know, a lot of these concerns came to a peak, I think it was last week, the first week of July, when Amazon acknowledged that it does retain the voice recordings and transcripts of customers’ interactions with Alexa voice assistants indefinitely, unless the customer, you know, manually goes in and asks for those recordings to be deleted. So I think that was a really good wake-up call for a lot of people, maybe not necessarily about the fact that this is happening, but in really raising a lot of these questions, like, how long is it safe for data to be retained? And what would happen if a malicious actor, like you said, does gain access to all this data? So I think that raises a lot of really good questions. And, you know, it’s not just Amazon, too; I think Google Home does something similar. So I think it’s a really good point. And it brings up questions about what would be the best practice in this case for Amazon or other vendors to better polish up their data retention policies.
TM: So we have a little bit of a template with GDPR on what the best path forward should be. Under GDPR, there are, for practical purposes, six possible scenarios under which pieces of data should be collected and potentially acted on. And two of them are either to perform the valid function of the device, or user consent. And with user consent, there has to be a level of disclosure. What are you collecting? Why are you collecting? How long is it going to be retained for, and who’s going to be touching this data? And in the case of an indeterminate, indefinite retention policy on the part of Amazon, we actually see that everyone was really surprised that this was actually happening. And I know that, from my perspective, there really isn’t a whole lot of value to an indeterminate or indefinite data retention policy. Engineers might like to take the data that is being processed and pass it through the next version of a product to go and say, ‘Yep, does it still work the way that we expect? And has it fixed the bugs that we were trying to address?’ But once you’ve released that next version, there really isn’t a whole lot of value to the bulk of the data. And so with the end user not being part of the consent process, effectively not having a say in what happens to the data in their own home, we’re opening ourselves up to a litany of issues. We’ve seen, for example, courts subpoena Amazon for background Alexa data following, say, a murder investigation in New Hampshire last year. We’ve seen that in Arkansas a couple years ago, as well. And so effectively, the position we’re now in is, has Amazon effectively breached our expectation of privacy? And are we actually participating in that overall discussion? And it sounds like we’re at a point where the ability of the engineers to actually go and implement the code hasn’t yet caught up with what the regulators are saying, or what our users are expecting of us in general.
LO: So my question is if Amazon isn’t getting too much out of saving and retaining all this data, what’s the reason to keep it indefinitely? Is it for advertising purposes, or for just having that data?
TM: So I don’t have a definitive answer to that. I can say that, in general, the value of the data is going to diminish over time. But it’s also looking at what the potential future functionality might be. So there might be some value in the data for a feature as yet to be determined. It might be something that we’re going to potentially see in this year’s revision of the product, it might also be a recognition that the devices themselves have a shelf life to them, that’s a little bit longer than the provider would expect.
So for example, first generation Echoes are still very much alive and well out there. But they had a potentially more limited hardware capacity than what we see in the current generation devices. And so how do you actually allow for the level of service that people expect while working with those older devices? So there’s value during development, there’s definitely future-proofing value, there’s value in resolving issues. So for example, if you make a statement, ‘Alexa, perform action,’ and it doesn’t perform the correct action, then taking the data that was actually processed and using that in a ‘how do I fix that’ type of scenario becomes immensely valuable when you’re trying to target a general population with, let’s say, different accents and different speech patterns.
LO: I know, too, another aspect of all this is who has access to that data in particular, you know, beyond Amazon and developers, and I know that this has been kind of a big issue that keeps cropping up. I think it was in April, Amazon came under fire after a report revealed that the company was allowing auditors to listen to Echo users’ voice recordings. And then, you know, if you remember, there was that incident last year, when Amazon accidentally sent a couple of audio files containing recordings of Alexa interactions to a random customer, by mistake. So I think that, too, kind of how that data is viewed, who it’s viewed by, how it’s shared, may be making consumers and regulators worried as well.
TM: Completely agree, and this isn’t something that’s unique to the home assistants. We had a disclosure in January of this year that the video files associated with Ring doorbells were accessible to the development teams. And that effectively, anyone with appropriate credentials was in a position to go and say, well, let’s go and see what the live stream of this doorbell’s camera might be. And the reports included anecdotes of developers using this in a more juvenile, prank-like manner. So the fundamental question comes down to why is this data being retained, and who knows that it’s being retained? I know that for myself, the amount of data that is collected on a given individual in my family is a concern, and the lack of controls over that and what could be the result of mining that data is equally a concern, particularly as we start to head into election cycles, or when we have high-profile breaches, as we seem to be getting on a, sadly, more ongoing basis. Right?
LO: It just goes to show kind of the level of invasion of privacy that this really has the potential to have, with these devices being right in your home. It really makes you think. Do you think there will be any sort of consumer breaking point where voice assistant owners and IoT device owners will start to be worried by these data privacy issues? I mean, you and I, we’re part of the security community. So we kind of see this from, like, the bubble of the security space. But do you think that this is going to be a bigger issue on the consumer front?
TM: I definitely do. And I think the poster child for this is Facebook, and all the challenges that they’ve had as an application, independent of the company itself, over the last few years coming out of the Cambridge Analytica scandal, coming out of the breaches last year, in terms of being able to have ongoing net new users coming to the platform. Other platforms within the Facebook empire are able to attract users, but Facebook itself as an application has become tainted, and it’s effectively in a more damage-recovery kind of scenario. And coupling that with regulations like GDPR, and the potential fines that have come down, particularly around consent, opt-in and disclosures, for example, the fine that French regulators levied against Google for the initial startup experience on an Android device, really hammers home that from a software development perspective, from a security perspective, the mantra of ‘well, just give me more data so that I can act on it’ is no longer something that is going to be acceptable, and that the consumers themselves are starting to recognize that they should have a voice in what happens. The biggest concern that I have is actually what the potential privacy implications are in the event that someone accesses a home automation device of some form, collects the data on that someone and on the occupants of that building, that structure, and that there aren’t necessarily all of the appropriate warrants in place. And now we have the potential Fourth Amendment case that would then get entwined with the First Amendment. And that’s a messy scenario for any organization to be involved in. We just need to look at Apple’s response and all of the various opinions that were expressed on how Apple responded following the San Bernardino attacks.
LO: These attacks wouldn’t be generic. They seem like they might be more targeted types of attacks, if they were to happen, given the ultra-personalized types of data that are on these devices. Speaking of regulation, by the way, where do you think regulation and voice assistant devices are going to go in the future? Do you think that there’s going to be any sort of, you know, ultimate fallout here over how these devices are collecting and using private data? Do you think that this is going to be the next big kind of GDPR issue?
TM: I definitely think that we’re going to have, let’s say within the next 15 months, a GDPR ruling on the data collection policy of home automation devices. Voice assistants will probably be high on the list, as would things like video doorbells. And effectively it’s going to be a case of what was disclosed and how was the information processed. So under GDPR, there’s a right to be forgotten. Suppose someone were to, for example, go to Amazon and say, ‘I want all of the data that is associated with my Alexa device that has ever been collected to be deleted.’
Could Amazon comply with that? Technically, perhaps, perhaps not. It might not be something that was actually designed into the system originally, and now they have to bolt that on. And it’s when we start bolting things on that stuff is able to slip through the cracks. And therein lies the bigger long-term security issue, wherein if we don’t know that the data has been collected, and we don’t know how the data has been processed, or how it’s been stored, then how can we actually know that we’ve attended to all the obligations that are associated with, say, a right-to-be-forgotten type of scenario?
LO: Well, we’ll definitely be keeping an eye out for any potential situation like that. Tim, thanks so much for joining us on the show today to discuss these voice assistants, like Amazon Echo and Google Home, and kind of the privacy implications there.
TM: Certainly, Lindsey. Thank you ever so much.
LO: Great. Once again, this is Lindsey O’Donnell here with Tim Mackey from Synopsys. Catch us next week for more interviews, news wraps and more on the Threatpost podcast.