Researcher uses Google’s speech tools to skewer Google reCAPTCHA

Google’s prove-you’re-a-human reCAPTCHA tests come in three types. (CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.)

  1. Image Challenge: when Google makes you select all the kitty pictures or whatever other images developers have had us click on to prove we’re real.
  2. Audio Challenge: when you need to enter numbers that are read out loud.
  3. Text Challenge: when you need to pick all the phrases that match a given category.

Now, if Google could only figure out how to keep researchers from using its own tools to skewer those challenges.

No. 1, the image challenge, was gamed about a year ago when researchers used Google’s own massive image search database in reverse, finding words to match an image, rather than images to match a word, to help them find images in a reCAPTCHA set that shared a particular characteristic.

Now, the audio challenge has purportedly fallen, and yet again, it stumbled on one of Google’s own services: this time, it was Google’s speech recognition API.

A security researcher identified only as East-Ee Security said on Monday that they’ve discovered what they’re calling a “logic vulnerability” that allows for easy bypass of Google’s reCAPTCHA v2 anywhere on the web.

The researcher came up with a way to automatically exploit that vulnerability. Dubbed ReBreakCaptcha, it works in these three stages:

  1. Challenge. Get to the right sort of reCAPTCHA page where an audio challenge is offered, and download it.
  2. Recognize. Convert the audio file to a suitable format and send it to Google’s Speech Recognition API.
  3. Verify. Validate the Speech Recognition result and paste it into the reCAPTCHA, as though a human had figured it out.
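
To make the shape of those three stages concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the researcher’s code: the `Stages` hooks, `looks_like_digits` and `run_once` are hypothetical names invented for illustration, and the assumption that a valid answer consists only of digits comes from the audio challenge’s description above (numbers read out loud).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Stages:
    """Hypothetical hooks, one per stage; real implementations are not shown."""
    fetch_audio: Callable[[], bytes]      # 1. Challenge: obtain the audio clip
    transcribe: Callable[[bytes], str]    # 2. Recognize: audio -> text
    submit: Callable[[str], bool]         # 3. Verify: enter the answer


def looks_like_digits(text: str) -> Optional[str]:
    """Return the transcript with whitespace removed if it is all ASCII digits."""
    compact = "".join(text.split())
    if compact.isascii() and compact.isdigit():
        return compact
    return None


def run_once(stages: Stages) -> bool:
    """Run the three stages once; False means the transcript was unusable
    or the submission was rejected."""
    audio = stages.fetch_audio()
    answer = looks_like_digits(stages.transcribe(audio))
    if answer is None:
        return False  # assumption: a caller would ask for a fresh challenge
    return stages.submit(answer)
```

Keeping each stage behind a plain callable means the flow can be exercised with stand-in functions, without touching a browser or a speech service.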

East-Ee Security posted proof-of-concept code on GitHub.

In order to work, ReBreakCaptcha needs to make sure it gets an audio challenge every time, since that’s the type of challenge it knows how to game. It’s able to do that because when you’re presented with a text challenge, the dialog box offers a “reload” button. ReBreakCaptcha just keeps clicking that Reload Challenge button until it gets the audio challenge.

Likewise, when presented with an image challenge, ReBreakCaptcha selects the microphone icon at the bottom of the dialog box to select an audio challenge instead.
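
The challenge-selection logic described above amounts to a small loop: reload on a text challenge, switch on an image challenge, stop on an audio challenge. The sketch below models only that decision logic. The `Challenge` enum, the `reload` and `switch_to_audio` callbacks and the `max_steps` cap are all assumptions made for illustration; the article describes the tool as simply reloading until an audio challenge appears, without mentioning a limit.

```python
from enum import Enum
from typing import Callable


class Challenge(Enum):
    IMAGE = "image"
    AUDIO = "audio"
    TEXT = "text"


def reach_audio(current: Challenge,
                reload: Callable[[], Challenge],
                switch_to_audio: Callable[[], Challenge],
                max_steps: int = 20) -> bool:
    """Return True once an audio challenge is showing, False if the
    (assumed) step limit runs out first."""
    for _ in range(max_steps):
        if current is Challenge.AUDIO:
            return True
        if current is Challenge.TEXT:
            current = reload()            # text challenge: ask for another one
        else:
            current = switch_to_audio()   # image challenge: pick the audio option
    return current is Challenge.AUDIO
```

Each callback returns whichever challenge type is showing afterwards, so the loop never assumes that a reload or a switch succeeded.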

The controls on the audio challenge page are to play the audio, type in the answer, or download the audio challenge as a file.

The download button comes in handy. ReBreakCaptcha downloads the audio, converts it to WAV format (as Google’s Speech Recognition API requires), then feeds it into Google’s Speech Recognition. What the service sends back is a string: perfect for copying and pasting into the audio challenge’s text input box.

All these steps are automated through a Python script that relies on a library named SpeechRecognition that has support for several engines and APIs, online and offline.
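
For readers unfamiliar with the SpeechRecognition library, the following sketch shows its basic file-transcription flow. It is not taken from the proof-of-concept: `is_pcm_wav` and `transcribe_wav` are hypothetical helper names, and the use of `recognize_google` (the library’s wrapper around Google’s web speech service) is an assumption about which engine call fits the description above. The WAV check uses only the standard library.

```python
import wave
from typing import Optional


def is_pcm_wav(path: str) -> bool:
    """Cheap sanity check: can the stdlib read this as a non-empty WAV file?"""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError):
        return False


def transcribe_wav(path: str) -> Optional[str]:
    """Return the recognized text for a WAV file, or None on any failure."""
    if not is_pcm_wav(path):
        return None
    # Third-party dependency, imported lazily: pip install SpeechRecognition
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)  # network call
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech unintelligible, or the service was unreachable
```

Because the library is imported only after the file passes the WAV check, a badly converted download is rejected before any network request is made.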

The point of reCAPTCHA challenges is to slow down bots (software robots), so a bot that can solve a CAPTCHA automatically defeats the whole object.

The reason to determine whether a visitor is human or bot is that bots do nefarious things, and they never get bored or tired when they’re doing them.

For example, bots harvest email addresses from contact or guestbook pages, scrape sites and reuse the content without permission on automatically generated doorway pages, take part in Distributed Denial of Service (DDoS) attacks, and automatically try to log into sites with reused passwords ripped off from breaches.

Of course, we saw reCAPTCHA fooled when researchers got around the image challenge with a success rate of 70% last April.

As it happens, Google’s been working on an even spiffier reCAPTCHA version, called Invisible reCAPTCHA, that won’t require us to click on anything at all. Rather, it will use advanced risk analysis technology that relies on clues as subtle as how a user (or a bot) moves the mouse in the brief moments before clicking the “I am not a robot” button to determine who’s human and who’s a bot.

But for now, while we wait for reCAPTCHA version 3 to come out, there’s apparently one more way to break version 2. East-Ee Security said that at the time it was posted, the vulnerability hadn’t yet been patched.

The researcher didn’t mention whether they’d reported the bug to Google. I’ve asked Google about that and about the status of a fix, and I’ll update the article if I hear back.