Research Finds MAC Address Hashing Not a Fix for Privacy Problems

A quick research project by a Stanford graduate student on the security of hashed MAC addresses in retail analytics software has shown that time and the inevitable advancement of technology are the greatest enemies of cryptography.

UPDATE–Cryptographic algorithms and hash functions are designed to be resistant to a variety of attacks, but one of the things that they can’t defend against is time. Time and the inevitable advancement of technology have turned out to be the greatest enemies of cryptography, and a quick research project done by a graduate student at Stanford on the security of hashed MAC addresses in retail analytics software has shown that to be true once again.

One of the things that has raised the hackles of privacy advocates in recent years is the rise of passive tracking of consumers’ mobile devices as they move through stores, coffee shops, malls and other locations. Retailers can use software that detects the network announcements that cell phones with WiFi and Bluetooth enabled make periodically in order to track a given person’s device. This allows retail analytics firms to build databases that include the various locations that a device has been tracked in over a period of time.

This presents some rather obvious privacy issues, because most consumers have no idea that their devices are sending out these signals, let alone that retailers are gathering the information and building massive databases with the results. In October, a code of conduct surrounding retail analytics was released, and one of the provisions is for firms to hash the MAC addresses of users’ devices after they’re collected as a way to preserve users’ privacy. Jonathan Mayer, a PhD student at Stanford University, decided to take a look at how difficult it would be to reverse the hash of a given device’s MAC address, something that is meant to be quite difficult.

Hash functions take an input, in this case a device’s MAC address, and produce a fixed-length, seemingly random series of letters and numbers as the output, the hash value. The same input always yields the same hash value, but attackers should not be able to take the hash value and reverse it to get the MAC address. Mayer found that this was not only possible but quite cheap and quick to do. Using a rented Amazon AWS server with a fast graphics card, Mayer ran the hash-cracking program oclHashcat and was able to reverse the hash of his own cell phone’s MAC address in about 12 minutes.
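As a rough illustration of the hashing step described above, the following Python sketch hashes a MAC address with SHA-1. The function name hash_mac and the choice to hash the six raw bytes (rather than a text form of the address) are assumptions made for this example; the article does not say how any particular analytics product encodes the address before hashing.

```python
import hashlib


def hash_mac(mac: str) -> str:
    """Return the SHA-1 hex digest of a MAC address.

    Assumption for illustration: the address is normalized to its
    6 raw bytes before hashing. A real product may hash a text form.
    """
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address is 6 bytes long")
    return hashlib.sha1(raw).hexdigest()


if __name__ == "__main__":
    # The same device always produces the same 40-character digest,
    # which is what lets a tracker recognize it on a later visit.
    print(hash_mac("00:11:22:33:44:55"))
```

Because the output is deterministic, a hashed MAC address still works as a stable identifier for a device.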

“Some back of the envelope math suggested the task was doable. There are 6 bytes in a MAC address; the first 3 bytes are allocated to the network device vendor, and the last 3 bytes are chosen by the vendor. In total, then, there are 2^48 possible MAC addresses. Since only 19,130 vendor prefixes have been actually allocated for use, however, there are at most 2^38.22 validly assigned MAC addresses. That number might sound big, but modern consumer hardware can calculate roughly 2^30 hashes per second. In other words, it should be possible to check every validly assigned MAC address in just a few minutes,” Mayer wrote.
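Mayer's arithmetic can be checked with a few lines of Python. The figures (19,130 vendor prefixes, 3 vendor-chosen bytes, roughly 2^30 hashes per second) come from his quote; the function and variable names are made up for this sketch.

```python
import math

VENDOR_PREFIXES = 19_130        # allocated prefixes, per Mayer's figure
SUFFIXES_PER_PREFIX = 2 ** 24   # 3 vendor-chosen bytes
HASHES_PER_SECOND = 2 ** 30     # Mayer's rough estimate for consumer hardware


def keyspace_estimate():
    """Return (valid addresses, log2 of that count, seconds to hash them all)."""
    valid = VENDOR_PREFIXES * SUFFIXES_PER_PREFIX
    return valid, math.log2(valid), valid / HASHES_PER_SECOND


if __name__ == "__main__":
    valid, bits, seconds = keyspace_estimate()
    print(f"{valid:,} addresses, about 2^{bits:.2f}")
    print(f"about {seconds / 60:.1f} minutes at 2^30 hashes per second")
```

At that rate the full sweep takes a little under five minutes, which matches the "just a few minutes" in the quote.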

Mayer was using the SHA-1 algorithm during his test, but said that the same approach would work using other algorithms. His research shows that an attacker who was able to access a database of hash values would have the ability to reverse those values and get the MAC addresses associated with the hashes. The attacker still would need to connect those MAC addresses to individual devices and their owners somehow, but Mayer said that can be done.
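The attack itself is an exhaustive search: hash every candidate address and compare it with the stored value. The sketch below shows the idea in plain Python. It is not Mayer's setup (he used oclHashcat on a GPU), and the function name crack_mac_hash, the raw-byte encoding, and the suffix_bits parameter, which shrinks the search for quick demonstrations, are all assumptions of this example.

```python
import hashlib
from typing import Iterable, Optional


def crack_mac_hash(target_hex: str,
                   prefixes: Iterable[bytes],
                   suffix_bits: int = 24) -> Optional[str]:
    """Search vendor prefixes for the MAC whose SHA-1 digest is target_hex.

    prefixes: 3-byte vendor prefixes to try.
    suffix_bits: how many low bits of the 3-byte suffix to enumerate
                 (24 covers the whole suffix; smaller values are for demos).
    """
    for prefix in prefixes:
        for n in range(1 << suffix_bits):
            candidate = prefix + n.to_bytes(3, "big")
            if hashlib.sha1(candidate).hexdigest() == target_hex:
                return ":".join(f"{b:02x}" for b in candidate)
    return None


if __name__ == "__main__":
    # Hash a made-up address, then recover it from the digest alone.
    secret = bytes.fromhex("001122000abc")
    digest = hashlib.sha1(secret).hexdigest()
    print(crack_mac_hash(digest, [bytes.fromhex("001122")], suffix_bits=12))
```

Pure Python is far slower than a GPU cracker, so this only illustrates the logic; the point is that nothing beyond hashing and comparing is required.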

“Some businesses and network operators keep a mapping between MAC addresses and individuals. A government agency could subpoena the device vendor for the purchaser’s identity. At any rate, the MLA Code of Conduct seems to concede a MAC address is identifiable; it suggests a MAC has to be hashed to be ‘de-personalized’,” Mayer said via email.

Unless every organization that is recording MAC information is hashing them, then an attacker could be able to link a MAC address

“Hashing is not a silver bullet for electronic privacy. As we have seen, it is possible to test retail analytics data against every possible device. If data is associated with a particular device, it is always linkable back to an individual,” he said.

Most hash functions were produced in a time when the average person had no legitimate access to the kind of computing power it would take to reverse them. Indeed, only a handful of government agencies likely possessed that kind of power until very recently. But the rapid improvement in hardware and the concurrent rise of commodity cloud computing platforms such as AWS have made high-level compute power available to the masses at low prices. When the set of possible inputs is as small as the MAC address space, reversing a hash value produced by an algorithm such as SHA-1 by trying every input is now within reach for just about any attacker.

“The specific hash function doesn’t matter much, though. All three of the problems I wrote about arise from any hash function. One caveat with respect to reversing hashes: Key stretching would make brute force attacks more difficult. It runs up against practical constraints, though, because retail analytics services have to be able to calculate hashes live in production,” Mayer said.
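Key stretching, the caveat Mayer mentions, means deliberately making each hash expensive to compute. One common way to do it is PBKDF2, available in Python as hashlib.pbkdf2_hmac. The sketch below is only an illustration: the function name, the fixed salt, and the iteration count are hypothetical choices, and the article does not say that any analytics service works this way.

```python
import hashlib

# Hypothetical deployment-wide salt. It has to stay fixed so that the
# same device maps to the same value on every visit.
SALT = b"example-deployment-salt"


def stretched_mac_hash(mac_bytes: bytes,
                       salt: bytes = SALT,
                       iterations: int = 100_000) -> str:
    """Return a PBKDF2-HMAC-SHA1 digest of a 6-byte MAC address.

    Each extra iteration adds work for an attacker sweeping the address
    space, and the same work for the service on every device it observes.
    """
    return hashlib.pbkdf2_hmac("sha1", mac_bytes, salt, iterations).hex()


if __name__ == "__main__":
    print(stretched_mac_hash(bytes.fromhex("001122334455")))
```

The attacker's cost grows roughly in proportion to the iteration count, but so does the cost of every live lookup, which is the practical constraint Mayer describes.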

This story was updated on March 20 to add comments from Mayer.

Image from Flickr photos of Jerry Seaman
