Academics Make Theoretical Breakthrough in Random Number Generation

Two University of Texas academics have made what some experts believe is a breakthrough in random number generation that could have lasting implications for cryptography and computer security.


David Zuckerman, a computer science professor, and Eshan Chattopadhyay, a graduate student, published a paper in March that will be presented in June at the Symposium on Theory of Computing. The paper describes how the academics devised a method for the generation of high quality random numbers. The work is theoretical, but Zuckerman said down the road it could lead to a number of practical advances in cryptography, scientific polling, and the study of other complex environments such as the climate.

“We show that if you have two low-quality random sources—lower quality sources are much easier to come by—two sources that are independent and have no correlations between them, you can combine them in a way to produce a high-quality random number,” Zuckerman said. “People have been trying to do this for quite some time. Previous methods required the low-quality sources to be not that low, but more moderately high quality.

“We improved it dramatically,” Zuckerman said.
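
For intuition, a classic, much simpler construction in the same spirit is the Chor–Goldreich inner-product extractor, which combines two independent weak bit-strings into one output bit. This is not the authors' construction, which is far more elaborate and tolerates much lower-quality sources; the function and sample values below are purely illustrative:

```python
# Toy two-source extractor (Chor-Goldreich inner product mod 2).
# Illustrative only: this is NOT the construction from the paper,
# which tolerates far lower min-entropy in the two sources.

def inner_product_extractor(x, y):
    """Combine two independent weak bit sources into one output bit."""
    assert len(x) == len(y), "sources must be equal length"
    return sum(a & b for a, b in zip(x, y)) % 2

# Two 8-bit samples from hypothetical independent weak sources:
bit = inner_product_extractor([1, 0, 1, 1, 0, 0, 1, 0],
                              [0, 1, 1, 0, 1, 0, 0, 1])
print(bit)  # 1
```

The key property, as in the new result, is that neither input needs to be truly random on its own; independence between the two sources is what makes the combination work.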

The technical details are described in the academics’ paper “Explicit Two-Source Extractors and Resilient Functions.” The academics’ introduction of resilient functions into their new algorithm built on numerous previous works to arrive at a landmark moment in theoretical computer science. Already, one other leading designer of randomness extractors, Xin Li, has built on their work to create sequences of many more random numbers.

“You expect to see advances in steps, usually several intermediate phases,” Zuckerman said. “We sort of made several advances at once. That’s why people are excited.”

In fact, academics worldwide have taken notice. Oded Goldreich, a professor of computer science at the Weizmann Institute of Science in Israel, called it a fantastic result.

“It would have been great to see any explicit two-source extractor for min-entropy rate below one half, let alone one that beats Bourgain’s rate of 0.499,” Goldreich said on the Weizmann website. “Handling any constant min-entropy rate would have been a feast (see A Challenge from the mid-1980s), and going beyond that would have justified a night-long party.”
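
The “min-entropy rate” Goldreich refers to measures a source’s worst-case unpredictability: an n-bit source with min-entropy k bits has rate k/n. A minimal sketch of the quantity (illustrative code, not taken from the paper):

```python
import math

def min_entropy(probs):
    """H_min(X) = -log2(max_x Pr[X = x]): bits of worst-case surprise."""
    return -math.log2(max(probs))

# A fair coin yields 1 bit of min-entropy per output bit (rate 1.0);
# a 75/25-biased coin yields only ~0.415 (rate below one half).
print(min_entropy([0.5, 0.5]))    # 1.0
print(min_entropy([0.75, 0.25]))  # ~0.415
```

Bourgain’s earlier extractor needed both sources at rate just under one half; handling any constant rate, as the new result does, is what Goldreich calls party-worthy.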

Henry Yuen, an MIT PhD student in theoretical computer science, called the paper “pulse-quickening.”

“If the result is correct, then it really is — shall I say it — a breakthrough in theoretical computer science,” Yuen said.

The study of random number generators used in commercial applications has intensified since the Snowden documents were published; sometimes random numbers aren’t so random. Low-quality random numbers are much easier to predict, and their use weakens the security and cryptography protecting data. For now, Zuckerman’s and Chattopadhyay’s result is theoretical, and work remains to lower the margins of error, Zuckerman said.

Previous work on randomness extractors, including advances made by Zuckerman, required that one sequence used by the algorithm be truly random, or that both sources be close to truly random. The academics’ latest work clears those restrictions, allowing the use of sequences that are only weakly random. Their method requires fewer computational resources and yields higher-quality randomness. Today’s random number generators, by contrast, are fast but far more ad hoc.

“This is a problem I’ve come back to over and over again for more than 20 years,” says Zuckerman. “I’m thrilled to have solved it.”

  • GreyCloud on

    Thank you for the news post. Early on, with ANSI C, I realized a key problem in PRNG was the loop. With loops, the computer was so regular that I could see statistical patterns form during testing. Adding a second time-based random delay, while taking longer to generate numbers, creates a more even spread. Not attempting to achieve a perfect RNG with a chaos or noise source, it sufficed. It seems natural to add 2+ variables if you are willing to sacrifice computational time.

    Even my random delay code suffers from the effects of thread timing and pre-compiled libraries, but it statistically worked well to prevent a range of extremes from occurring. How you employ these second methods depends on your mission. Mine depends on the fact that the initiation of the PRNG is either manual or not timed, so the random delay acts as a quasi-second seed. Every operation on the stack has time constraints, and threads rent time on the core. That creates frightening regularity and is an external variable that has to be dealt with.

    When you run an application and when you run the PRNG function can be different every time, and that is a good place to start when attempting to even out the stats, even though your random delay algorithm may not directly affect the generation (although it should). You have to code to avoid birthday or preimage attacks, and make them infeasible.
  • bob on

    Why not just use a good hash function on the two inputs?
    • GreyCloud on

      Because the hash could be called into question. Similar to a programmer's method of translating the PRNG result into a range, two hashes, even with compression, could result in a reduction operation depending upon how you do the math. If you simplify, even a bit shift and XOR would result in repeat collisions with 2 hash sources creating one result. In other words, eventually you would get at least some variables that would spit out same/similar/partially same results. A hash should only be a final product for this reason. You have to define how you are doing these hashes. What I was trying to describe with timing regularity is that you can get groups of numbers close to each other (I call them neighbors). Not necessarily under the stat bell curve, but looped results that give a hacker the ability to build a better brute-force initial guess. With distributed attacks, you could guess right. Timing regularity and reduction operations are like palming a 6-sided die. You may not be perfect, but close enough to double your odds of success. A modern distributed attack is like 6 guys guessing on that palmed 6-sided die. One of them may get it.
  • RNGuy on

    I wonder how this compares to RANLUX, the RNG written in Fortran. At the higher levels, it proves to be mathematically random.
  • GreyCloud on

    My final post on this: anyone can test their own attempts. Loop your algorithm 100, 500, 1024, 2048, 4096 times and run distribution stats. Over a long enough timeline the distribution will even out and flatline. The quicker your distribution evens out from the median, the more successful the algorithm is. The key is to identify groupings of numbers and outliers that break a traditional bell curve. It may take tens of thousands of iterations to discover this. You can take a perfectly good PRNG and destroy it with a small range. If you create/randomize a translation table within your program to create a range, you are at least trying to not be linear in results. Don't dismiss the fail points in how RNG is implemented. That is how the hacker thinks. Work around the perimeter also.
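
The looped distribution test described in the comments above can be sketched roughly as follows; this is a crude chi-square flatness check, with function names and parameters chosen for illustration only:

```python
import random
from collections import Counter

def uniformity_score(gen, draws, bins):
    """Bucket many draws and return a chi-square statistic against a
    uniform expectation; smaller values mean a flatter distribution."""
    counts = Counter(gen() % bins for _ in range(draws))
    expected = draws / bins
    return sum((counts[b] - expected) ** 2 / expected for b in range(bins))

# A seeded Mersenne Twister should score low, while a constant
# "generator" piles every draw into one bucket and scores very high.
rng = random.Random(42)
good = uniformity_score(lambda: rng.getrandbits(32), 4096, 16)
bad = uniformity_score(lambda: 7, 4096, 16)
```

A flat histogram alone does not prove unpredictability, as the article's distinction between statistical quality and cryptographic quality makes clear; this kind of check only catches gross non-uniformity.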
