The great mystery since the NSA and other intelligence agencies’ cyber-spying capabilities became watercooler fodder has not been the why of their actions, but the how?

For example, how are they breaking crypto to decode secure Internet communication?

A team of cryptographers and computer scientists from a handful of academic powerhouses is pretty confident they have the answer after having pieced together a number of clues from the Snowden documents that have been published so far, and giving the math around the Diffie-Hellman protocol a hard look.

The answer is an implementation weakness in Diffie-Hellman key exchanges, specifically in the massive and publicly available prime numbers used as input to compute the encryption key.

The team of 14 cryptographers presented their paper, “Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice,” this week at the ACM Conference on Computer and Communications Security, which explains that given the budgets at the disposal of the NSA, for example, such an agency could build enough custom hardware and invest the time required to derive an output that would give the attacker “intermediate” information that would eventually lead to the breaking of individual encrypted connections.

“It’s not arriving at the key, instead it’s telling you something about the mathematical structure about that particular choice of the prime number when used in Diffie-Hellman,” said J. Alex Halderman associated professor computer science and engineering at the University of Michigan and one of the authors of the paper. “The analogy is sort of cracking the prime. After you crack the prime, breaking individual Diffie-Hellman connections that use that prime is easy.”

The prime numbers are the most likely target, he said, because they’re usually not generated from scratch, instead are plucked from previous work or taken from recommendations in established standards. Halderman told Threatpost that an intelligence agency of the NSA’s caliber would need to spend hundreds of millions of dollars to build the custom hardware required for such a large computation.

“The payoff you get is just so enormous,” Halderman said. “You built this machine and after a year or so of cranking on the problem, it produces some output that then lets you break individual connections very quickly. So you do this enormous amount of math that involves only the prime, and what you’re left with is an intermediate product of the attack, information about the mathematical structure of the prime. So you can amortize that cost of building that computer across potentially trillions of encrypted connections that you can then very quickly and cheaply break.”

The paper explains that Diffie-Hellman picking a prime number from an available pool was an implementation choice that makes sense from a development standpoint, but is a disastrous choice in the context of security and privacy.

“The core lesson of our paper is about this disconnect in the analysis of some foundational crypto between the mathematical cryptography community and the world of people who are applying crypto in real systems,” Halderman said. “We’re taking stuff about the cryptanalysis that’s been known within certain research circles, but the people in those research circles didn’t realize quite how Diffie-Hellman was being used in the real world. They didn’t realize that there was such widespread duplication of primes and primes of such weak strength.”

Most at risk if a common 1024-bit prime is broken, the paper says, are IPsec VPNs, 66 percent of which rely on Diffie-Hellman key exchanges. Such an attack would also break 26 percent of SSH connections and 18 percent of HTTPS connections on the top one million domains.

From the Snowden documents we already know the NSA had the capability and motivation to passively snag traffic from Internet backbones, collecting encrypted VPN traffic for example. Halderman said an attacker could then send that collected traffic back to the purpose-built supercomputer which hacks away at the math until the key is derived.

“The mystery is how are they attacking the cryptography because standard cryptography we use for the Web and VPNs has been developed with the intention that even if you have access to enormous supercomputers, you shouldn’t be able to break it,” Halderman said. “So there must be something weak about the cryptography everyone depends on that the applied security community hadn’t realized. Our result is simply an explanation for what the big problem likely is that is enabling that large-scale decryption with high-performance computing.”

The NSA’s black budget, which was explained in a Snowden document, includes $1 billion annually for computer network exploitation, and other related programs afforded hundreds of millions for the same task, putting this type of attack within range of the agency.

In the meantime, it’s likely going to be years before this is properly addressed, Halderman said, because so many protocols are affected.

“This isn’t a flaw in a particular protocol, it’s a property of the math the underlies Diffie-Hellman, which is part of the foundation of almost every important cryptographic protocol we use,” Halderman said. “It’s certainly not an overnight [fix]. One of the problems is that the standards behind any important protocols like the IPsec VPN protocol specify that everyone will use these particular primes that by virtue of being so lightly used are made weaker. I think it’s going to be years unfortunately before standards and implementations are widely updated to account for this threat.”

Fascinating article! The huge (and hugely expensive) upfront task of finding the DH weakness could be seen as a deterrent to every hacker or hacking community out there. Even most nation-state actors. But the NSA is not most actors, they certainly have shown the appetite and have the budget to pull it off. Unfortunately it may have pulled out the underpinnings of a huge swath of existing Internet security as well.

So RSA is susceptible to breaking with quantum computers, DH suffers from massive implementation mistakes. What does it leave us with?

Absolutely right.

We also have further evidence to support this, from the evidence of tampering with the random functions to allow prediction of the next bit, & therefore the selected prime(s) then used in the maths.

I’ve also thought about a system that would literally use prime numbers as a regular computer users integers. Stunningly computationally expensive to get started, but once the work is done the entire field is open, much as you suggest here. Consider a rainbow table of enormous primes. Yes, the number of primes is infinite, but only in the infinite set. We are bounded by 1024bits, so “only” 2^1024 ÷ log 2^1024 are prime from that set. Discard those that are too small and the set becomes far more manageable. Add in the fact that you record what the result of each bit of maths said in another enormous database from which you can pull the previous answer, & “small” systems that use only 1024 bits become easy targets.

Bigger keys are clearly one answer, but moving to eliptic curves might be a better one.

This isn’t a flaw in the protocol, but in the math? Somehow I doubt that anyone affected cares one iota about this subtle distinction, or even has any idea whatsoever as to its meaning.

Thanks for that annoying comment “Dr Haliard”. I find the article easy to digest as a non crypto expert. Thanks for a great article.

If I got it right the problem is that some implementations took primes from pre-created pools. Well thats no math problem – this is an implementation problem….

Yah. Great. Now. 14 people figured that out on a relatively benign budget. NSA reportedly employs (tens of?)thousands of crypto experts.And they have an almost limitless budget. We should just realise we live in civil world, are permitted some privacy. Unlike high confidentiality environments requiring secret sauce.