Google has fixed a bypass for its reCAPTCHA authentication mechanism – the Turing test-based methodology for proving that website users aren’t robots, commonly spotted on log-in pages online. The news comes as Google releases a new version of reCAPTCHA in beta.
Google has been working on refining and strengthening reCAPTCHA for years, and last year extended it to mobile websites for Android users. Essentially, web developers can drop in a reCAPTCHA code fairly easily using Google’s API. Once embedded, it determines whether to trust website visitors based on their ability to solve a simple puzzle, such as clicking on all street signs in a presented photo, solving an audio challenge, or typing in a word or number that’s presented in distorted form.
The internet giant said that more than 300 million reCAPTCHAs are solved each day.
Behind the scenes, a handshake is going on. Once a user solves the challenge and clicks verify, the reCAPTCHA function sends an HTTP request to the web application. The application in turn sends its own request to the Google reCAPTCHA API, both identifying itself as a trusted application and asking for verification that the visitor solved the reCAPTCHA correctly.
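As a rough illustration of the second leg of that handshake, the Python sketch below shows how a web application might ask Google to check a visitor’s answer. The siteverify endpoint and its secret and response parameters come from Google’s public reCAPTCHA documentation; the function names and the timeout value are assumptions made for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Verification endpoint documented by Google for reCAPTCHA.
SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, user_response: str) -> Request:
    """Build the server-to-server POST (hypothetical helper name)."""
    # urlencode escapes every value, so a visitor-supplied string cannot
    # smuggle extra parameters into the request body.
    body = urlencode({"secret": secret, "response": user_response}).encode("ascii")
    return Request(SITEVERIFY_URL, data=body, method="POST")


def verify(secret: str, user_response: str, timeout: float = 5.0) -> bool:
    """Return True if the API reports that the challenge was solved."""
    request = build_verify_request(secret, user_response)
    with urlopen(request, timeout=timeout) as reply:
        return bool(json.load(reply).get("success"))
```

A log-in handler would call verify() with the application’s own secret and the value the browser submitted, and reject the attempt when it returns False.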
An exploit for the bypass vulnerability required an HTTP parameter pollution in the web application, according to independent app security expert Andres Riancho, who reported the bypass (and earned $500 from the Google bug-bounty program for his efforts). In other words, the web application would need to send verification requests to the reCAPTCHA API in an insecure way. That precondition reduces the severity of the flaw, but where it is met, the exploit has a 100-percent success rate.
“When this situation occurred the attacker was able to bypass the protection every time,” Riancho said in a post about the flaw published on Monday, adding that GitHub searches showed that about 60 percent of web applications with reCAPTCHA integrations have an HTTP parameter pollution.
“HTTP parameter pollution is almost everywhere: client-side and server-side, and the associated risk depends greatly on the context,” explained Riancho. “In some specific cases it could lead to huge data breach, but in most cases it is a low-risk finding.”
An attacker in this case can send the vulnerable web application a specially crafted reCAPTCHA response containing a hard-coded secret key that disables reCAPTCHA response verification. The key is provided to developers as an easy way to disable reCAPTCHA’s verification in staging environments. If a malicious actor inserted the key within the string ahead of the application’s own secret in the request to the API, he or she could get around the reCAPTCHA.
“Note that the request contains two secret parameters, the first one is controlled by the attacker (due to the HTTP parameter pollution in the vulnerable web application) and the second one is controlled by the application itself…the reCAPTCHA API [always] uses the first one,” Riancho said.
The vulnerable web application thus needs to create the URL with the response parameter first, and then the secret, and only 5 to 10 percent of the reCAPTCHA implementations meet this requirement.
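The ordering requirement can be sketched in a few lines of Python. This is an illustration only: the placeholder strings and function names are hypothetical, and the rule that the first secret wins is taken from Riancho’s description of the API’s behavior before the fix.

```python
from urllib.parse import parse_qsl, urlencode


def vulnerable_query(user_response: str, app_secret: str) -> str:
    # Unescaped concatenation, with the response placed before the secret.
    return "response=" + user_response + "&secret=" + app_secret


def safe_query(user_response: str, app_secret: str) -> str:
    # urlencode escapes "&" and "=", so the response stays one parameter.
    return urlencode({"response": user_response, "secret": app_secret})


def first_value(query: str, name: str):
    """Mimic a parser that keeps only the first occurrence of a parameter."""
    for key, value in parse_qsl(query):
        if key == name:
            return value
    return None


# Hypothetical attacker input: a response with a second "secret" appended.
crafted = "any-response&secret=TEST-SECRET"

print(first_value(vulnerable_query(crafted, "APP-SECRET"), "secret"))  # TEST-SECRET
print(first_value(safe_query(crafted, "APP-SECRET"), "secret"))        # APP-SECRET
```

With concatenation, the attacker-supplied secret lands ahead of the application’s own; with proper encoding, the injected text is treated as part of the response value.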
“So, if I would have wanted to exploit this in the wild, only ~3 percent of the sites which use reCAPTCHA would have been vulnerable: not bad since this is used everywhere,” said Riancho.
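The ~3 percent figure appears to follow from multiplying the two shares quoted above. The short calculation below assumes the two conditions are independent, which is an assumption of this sketch and not something stated in the post; the variable names are illustrative.

```python
# Share of reCAPTCHA integrations with an HTTP parameter pollution.
polluted_share = 0.60
# Share that put the response parameter ahead of the secret (5 to 10 percent).
ordering_low, ordering_high = 0.05, 0.10

# Both conditions must hold, so the shares are multiplied.
exploitable_low = polluted_share * ordering_low
exploitable_high = polluted_share * ordering_high

print(f"{exploitable_low:.0%} to {exploitable_high:.0%}")  # 3% to 6%
```

Riancho’s estimate matches the lower end of that range.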
Google has fixed the security issue upstream in the reCAPTCHA REST API, which fortunately means no modifications are required to the affected web applications.
This is not the first time an exploit has been designed for the mechanism. Last fall saw the debut of unCaptcha, an artificial intelligence-based automated system designed at the University of Maryland. It was able to break Google’s audio-based reCAPTCHA challenges (offered as an option for people with disabilities) with an accuracy of 85 percent. In a test of 450 reCAPTCHA challenges, it reached that accuracy level in an average of 5.42 seconds per challenge: less time than it takes to listen to the challenge in the first place.
unCaptcha combines free, public, online speech-to-text engines with a phonetic mapping technique. The system downloads the audio challenge, breaks it into several digital audio clips, then runs them through several speech-to-text systems to determine exact and near-homophones, weights the aggregated results by confidence level, and then sends the most probable answer back to Google.
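The confidence-weighting step can be pictured with a small Python sketch. It is not unCaptcha’s actual code: it assumes each clip is a single spoken digit, and the phonetic table, function name and sample confidences are all hypothetical.

```python
from collections import defaultdict

# Illustrative mapping of exact matches and near-homophones to digits.
PHONETIC_MAP = {
    "one": "1", "won": "1",
    "two": "2", "too": "2", "to": "2",
    "four": "4", "for": "4", "fore": "4",
    "eight": "8", "ate": "8",
}


def most_probable_digit(candidates):
    """Pick a digit from (word, confidence) guesses made by several engines."""
    scores = defaultdict(float)
    for word, confidence in candidates:
        digit = PHONETIC_MAP.get(word.lower())
        if digit is not None:
            # Guesses that map to the same digit pool their confidence.
            scores[digit] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)


# Three hypothetical engine outputs for one clip.
print(most_probable_digit([("for", 0.6), ("four", 0.3), ("ate", 0.7)]))  # 4
```

Here two weaker guesses that both map to "4" outweigh one stronger guess for "8", which is the point of aggregating before choosing.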
Meanwhile, Google recently announced the beta version of a fresh approach to reCAPTCHA that eliminates the puzzles. reCAPTCHA v3 instead returns a trust score for each request based on interactions with a website.
Google said that the mechanism won’t interrupt users, and website administrators will be able to decide when to run it. The scores are based on user interactions with a site, with behavioral characteristics like mouse movements used to determine a visitor’s level of humanity. A “good” score is 1.0, while a bot interaction is scored 0.0.
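How a site acts on such a score is left to its administrators. The Python sketch below shows one possible policy; the 0.5 cutoff, the function name and the "allow" and "challenge" labels are assumptions chosen for the example, not part of Google’s announcement.

```python
def decide(score: float, threshold: float = 0.5) -> str:
    """Map a trust score between 0.0 (bot) and 1.0 (good) to an action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    # Scores at or above the cutoff pass; lower ones get an extra check.
    return "allow" if score >= threshold else "challenge"


print(decide(0.9))  # allow
print(decide(0.1))  # challenge
```

A stricter site might raise the threshold for sensitive actions such as log-ins and relax it for low-risk pages.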