The great mystery since the NSA and other intelligence agencies’ cyber-spying capabilities became watercooler fodder has not been the why of their actions, but the how?

For example, how are they breaking crypto to decode secure Internet communication?

A team of cryptographers and computer scientists from a handful of academic powerhouses is pretty confident they have the answer after having pieced together a number of clues from the Snowden documents that have been published so far, and giving the math around the Diffie-Hellman protocol a hard look.

The answer is an implementation weakness in Diffie-Hellman key exchanges, specifically in the massive and publicly available prime numbers used as input to compute the encryption key.

The team of 14 cryptographers presented their paper, “Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice,” this week at the ACM Conference on Computer and Communications Security, which explains that given the budgets at the disposal of the NSA, for example, such an agency could build enough custom hardware and invest the time required to derive an output that would give the attacker “intermediate” information that would eventually lead to the breaking of individual encrypted connections.

“It’s not arriving at the key, instead it’s telling you something about the mathematical structure about that particular choice of the prime number when used in Diffie-Hellman,” said J. Alex Halderman associated professor computer science and engineering at the University of Michigan and one of the authors of the paper. “The analogy is sort of cracking the prime. After you crack the prime, breaking individual Diffie-Hellman connections that use that prime is easy.”

The prime numbers are the most likely target, he said, because they’re usually not generated from scratch, instead are plucked from previous work or taken from recommendations in established standards. Halderman told Threatpost that an intelligence agency of the NSA’s caliber would need to spend hundreds of millions of dollars to build the custom hardware required for such a large computation.

“The payoff you get is just so enormous,” Halderman said. “You built this machine and after a year or so of cranking on the problem, it produces some output that then lets you break individual connections very quickly. So you do this enormous amount of math that involves only the prime, and what you’re left with is an intermediate product of the attack, information about the mathematical structure of the prime. So you can amortize that cost of building that computer across potentially trillions of encrypted connections that you can then very quickly and cheaply break.”

The paper explains that Diffie-Hellman picking a prime number from an available pool was an implementation choice that makes sense from a development standpoint, but is a disastrous choice in the context of security and privacy.

“The core lesson of our paper is about this disconnect in the analysis of some foundational crypto between the mathematical cryptography community and the world of people who are applying crypto in real systems,” Halderman said. “We’re taking stuff about the cryptanalysis that’s been known within certain research circles, but the people in those research circles didn’t realize quite how Diffie-Hellman was being used in the real world. They didn’t realize that there was such widespread duplication of primes and primes of such weak strength.”

Most at risk if a common 1024-bit prime is broken, the paper says, are IPsec VPNs, 66 percent of which rely on Diffie-Hellman key exchanges. Such an attack would also break 26 percent of SSH connections and 18 percent of HTTPS connections on the top one million domains.

From the Snowden documents we already know the NSA had the capability and motivation to passively snag traffic from Internet backbones, collecting encrypted VPN traffic for example. Halderman said an attacker could then send that collected traffic back to the purpose-built supercomputer which hacks away at the math until the key is derived.

“The mystery is how are they attacking the cryptography because standard cryptography we use for the Web and VPNs has been developed with the intention that even if you have access to enormous supercomputers, you shouldn’t be able to break it,” Halderman said. “So there must be something weak about the cryptography everyone depends on that the applied security community hadn’t realized. Our result is simply an explanation for what the big problem likely is that is enabling that large-scale decryption with high-performance computing.”

The NSA’s black budget, which was explained in a Snowden document, includes $1 billion annually for computer network exploitation, and other related programs afforded hundreds of millions for the same task, putting this type of attack within range of the agency.

In the meantime, it’s likely going to be years before this is properly addressed, Halderman said, because so many protocols are affected.

“This isn’t a flaw in a particular protocol, it’s a property of the math the underlies Diffie-Hellman, which is part of the foundation of almost every important cryptographic protocol we use,” Halderman said. “It’s certainly not an overnight [fix]. One of the problems is that the standards behind any important protocols like the IPsec VPN protocol specify that everyone will use these particular primes that by virtue of being so lightly used are made weaker. I think it’s going to be years unfortunately before standards and implementations are widely updated to account for this threat.”