Artificial intelligence and deep learning are creeping into information security, and one of the early applications of those approaches focuses on passwords.
Researchers from the Stevens Institute of Technology and the New York Institute of Technology have recently published early results from their work using Generative Adversarial Networks (GANs) to generate password guesses at a better rate, they said, than the manual rule-generation techniques that feed tools such as Hashcat or John the Ripper. By turning to these analytical tools, the researchers said, they can use machines to learn from existing data, such as the millions of passwords leaked in the last 18 months, and develop new password rules that not only improve the efficiency of the pen-testing tools but could also someday become the primary tool for recovering or guessing passwords.
“Let’s say tomorrow there is another password leak; if you’re building rules manually and you want to take advantage of that knowledge from the leak, you have to get people to go through it and see what is not matched, and figure out how to match it manually by coming up with new rules and keywords. It’s a manual work,” said Paolo Gasti of NYIT, one of the researchers involved. “What we are doing instead is we take the password dump, give it to the tool and let it run for a day, a week or a month and you’re done. You’ve already learned as much as the tool can learn from this new dataset.”
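Rule-based tools such as Hashcat and John the Ripper take a wordlist and apply "mangling" rules to each word to produce candidate guesses; those rules are what experts otherwise write by hand after studying a leak. The sketch below illustrates the general idea only, assuming a handful of hypothetical rules written as plain Python functions. It is not Hashcat or John rule syntax, and the rule set is invented for illustration.

```python
from typing import Callable, Iterable, Iterator

# A rule maps one dictionary word to one candidate guess.
Rule = Callable[[str], str]

# Hypothetical hand-written rules, loosely in the spirit of Hashcat/John rule
# files (NOT their actual syntax).
RULES: list[Rule] = [
    lambda w: w,                                      # word unchanged
    lambda w: w.capitalize(),                         # "password" -> "Password"
    lambda w: w + "1",                                # append a digit
    lambda w: w + "123",                              # append a common suffix
    lambda w: w[::-1],                                # reverse the word
    lambda w: w.replace("a", "@").replace("o", "0"),  # simple leetspeak
]


def apply_rules(wordlist: Iterable[str], rules: list[Rule]) -> Iterator[str]:
    """Yield every guess produced by applying each rule to each word,
    skipping guesses that were already emitted."""
    seen: set[str] = set()
    for word in wordlist:
        for rule in rules:
            guess = rule(word)
            if guess not in seen:
                seen.add(guess)
                yield guess


if __name__ == "__main__":
    for guess in apply_rules(["password", "monkey"], RULES):
        print(guess)
```

Every new pattern spotted in a fresh leak means another hand-written rule in a list like this one, which is the manual step Gasti describes replacing.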
Gasti, along with his colleagues Briland Hitaj, Giuseppe Ateniese and Fernando Perez-Cruz of the Stevens Institute of Technology and the New York Institute of Technology, recently released a paper called “PassGAN: A Deep Learning Approach for Password Guessing.” PassGAN, the technique the paper describes, uses GANs to improve on rule-based password generation tools, they said.
“PassGAN represents a substantial improvement on rule-based password generation tools because it infers password distribution information autonomously from password data rather than via manual analysis,” they wrote. “As a result, it can effortlessly take advantage of new password leaks to generate richer password distributions.”
The paper includes results from a number of experiments on passwords leaked in the LinkedIn and RockYou breaches, showing that PassGAN outperformed John the Ripper’s SpiderLabs rules and that its results “were competitive” with Hashcat’s best64 and gen2 rules.
“When we combined the output of PassGAN with the output of HashCat, we were able to match 18%-24% more passwords than HashCat alone,” they wrote. “This is remarkable because it shows that PassGAN can generate a considerable number of passwords that are out of reach for current tools.”
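A figure like "18%-24% more passwords" compares how many passwords in a held-out test set are matched by one tool's guesses versus the union of two tools' guesses. The sketch below shows that arithmetic, assuming "more" means a relative increase over the baseline's matches; the function names and the tiny password sets are made up for illustration and are not data from the paper.

```python
def match_rate(guesses: set[str], test_set: set[str]) -> float:
    """Fraction of test passwords that appear among the guesses."""
    if not test_set:
        return 0.0
    return len(guesses & test_set) / len(test_set)


def relative_gain(baseline: set[str], other: set[str], test_set: set[str]) -> float:
    """Relative increase in matched test passwords when `other`'s guesses
    are added to `baseline`'s (assumed meaning of "% more passwords")."""
    base = len(baseline & test_set)
    combined = len((baseline | other) & test_set)
    if base == 0:
        raise ValueError("baseline matches no test passwords")
    return (combined - base) / base


# Toy, invented data: not results from the PassGAN paper.
TEST_SET = {"password", "123456", "iloveyou", "dragon", "qwerty1"}
HASHCAT_GUESSES = {"password", "123456", "letmein"}
PASSGAN_GUESSES = {"123456", "dragon", "abc123"}

if __name__ == "__main__":
    print(match_rate(HASHCAT_GUESSES, TEST_SET))                      # 0.4
    print(relative_gain(HASHCAT_GUESSES, PASSGAN_GUESSES, TEST_SET))  # 0.5
```

Only guesses that the baseline missed ("dragon" here) raise the combined count, which is why the researchers read the gain as evidence that PassGAN reaches passwords current tools do not.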
GANs, meanwhile, are deep-learning tools made up of two deep neural networks: a generator, which produces new samples, and a discriminator, which tries to tell those samples apart from real training data. The technique is used in many applications to generate something new from a data set (e.g., scanning thousands of images of faces or rooms to create a new, unique image). Gasti said this may be the first application of GANs in security, and their intent was to teach the deep neural networks what user-chosen passwords look like without providing the network any context, such as personal information like dates of birth or pet names, which users often combine when forming what they believe are complex passwords.
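For readers who want the generator/discriminator contest in symbols, below is the standard GAN objective from Goodfellow et al. (2014), where $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real (here, leaked) passwords, and $z$ random noise fed to the generator. This is the textbook formulation; the article does not say which GAN variant PassGAN trains with, and the paper's exact objective may differ.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator is pushed to output values near 1 for real passwords and near 0 for generated ones, while the generator is pushed to produce samples the discriminator cannot tell apart from real ones. Nothing in this objective refers to birthdays, pet names or other personal context, which matches Gasti's description of learning from the passwords alone.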
“We are not providing any information, just blindly giving a set of passwords to the machine, and the machine is figuring out what a password is. The idea is that this machine will go through these passwords hundreds of thousands of times and every time it runs through them, it learns something new, some new relationship between components of a password,” Gasti said. “The hundred-thousandth pass might be ‘I’ve identified this word and numbers and know the relationship between them and the probability that binds them.’ Every time it goes through something, it learns something new. Instead of having a team of people manually go through hundreds of thousands of passwords, the machine does it for you.”
Ideally, he said, a fast cluster of machines could analyze millions of passwords for a month and extract rules that a manual process could never generate.
“I feel we’re raising the bar for what a good password is supposed to look like,” Gasti said. “Since we’re better at guessing passwords now, you can imagine what we thought was secure before we now know is not. Passwords that were un-guessable before now become guessable not because they are weaker, but because we have a better way to assess them.”