Three security researchers from Columbia University in New York recently published a paper with a rather dramatic-sounding title: ✔︎ I’m not a human: Breaking the Google reCAPTCHA.
As you can see, the title really does start with a check mark (tick), shown in green, and it’s a play on the Google reCAPTCHA interface, which shows a green check mark next to the words “I’m not a robot” once you’ve solved the puzzles it presents.
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and CAPTCHAs usually take the form of interactive puzzles based on fuzzy writing or blurry images (and even, in one humorous example, calculus problems).
The idea is that humans can solve CAPTCHAs with a modest effort and not too much irritation, but even fast computers can’t deal with well-designed CAPTCHAs reliably at all.
This is supposed to make it costly and complicated for cybercrooks to write programs that can rapidly register for hundreds of free email accounts, for instance, while at the same time making it not too annoying for legitimate users to sign up.
CAPTCHAs don’t have to be perfect, and if crooks (or security researchers) figure out how to trick them once in a while, the internet isn’t going to collapse.
Nevertheless, CAPTCHAs represent something of a security arms race.
They are, after all, devised to slow down crooks in order to protect online resources from deliberate, malicious overload, so a crook who can work around a CAPTCHA’s speed bumps can claw back the criminal advantage he originally enjoyed.
At this point, the CAPTCHA’s creators need to respond by adapting the CAPTCHA puzzles so that the crooks have to go back to solving them by hand.
The headline chosen for this paper is somewhat unfortunate, for all that it attracts your attention, because the researchers haven’t really “broken” anything.
Indeed, their research is much more general, and in many ways much more useful, than a simple “we found a bug, now fix it.”
The researchers investigated how crooks could try to speed up the rate at which they can attempt new CAPTCHAs; how to guess how the CAPTCHA process works behind the scenes in order to game the system; and how to use other online services to solve CAPTCHAs automatically more quickly than the designers thought possible.
For example, the researchers noticed that Google’s image-based reCAPTCHAs, which ask you to pick images with matching characteristics from a randomly-chosen set (e.g. “all pictures with street signs”), weren’t as random or as varied as you might think.
In an amusing irony, they were able to use Google’s own massive image search database in reverse, finding words to match an image, rather than images to match a word, to help them find images in a reCAPTCHA set that shared a particular characteristic.
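To make the idea concrete, here is a toy sketch in Python of the matching step only. It is not the researchers’ code: the `image_tags` dictionary is made-up data standing in for whatever words a reverse image search might return, and the helper names `words` and `select_matching` are hypothetical. No real search service is called.

```python
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Lower-case, split on whitespace, and crudely strip a plural 's'.

    This is a deliberately naive normaliser, just enough for the toy data.
    """
    return {
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in text.lower().split()
    }


def select_matching(image_tags: Dict[str, Set[str]], hint: str) -> List[str]:
    """Return the names of images whose tags share a word with the hint."""
    hint_words = words(hint)
    matches = []
    for name, tags in image_tags.items():
        tag_words: Set[str] = set()
        for tag in tags:
            tag_words |= words(tag)
        if tag_words & hint_words:
            matches.append(name)
    return sorted(matches)


# Made-up tags (assumption): in a real attack these would come from a
# reverse image lookup, which is not shown here.
image_tags = {
    "img1": {"street sign", "pole"},
    "img2": {"cat", "sofa"},
    "img3": {"stop sign", "road"},
    "img4": {"tree", "park"},
}

if __name__ == "__main__":
    print(select_matching(image_tags, "street signs"))
```

With this toy data, the hint “street signs” picks out `img1` and `img3`, because their tags share the words “street” or “sign” with the hint. The less varied the images and tags are, the better a crude match like this is likely to work.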
The results don’t sound particularly good: an accuracy rate of 70%, at an average of about 20 seconds per CAPTCHA, and with only a modest number of images in the test set.
However, this turns out to be about twice as fast as humans can solve CAPTCHAs in bulk, and computers don’t get bored or distracted.
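A quick back-of-the-envelope calculation shows what those numbers mean in bulk. The 20 seconds and 70% figures are the ones quoted above; the 40 seconds for a human is an assumption, inferred only from the “about twice as fast” comparison, and no human accuracy figure is assumed at all.

```python
SECONDS_PER_HOUR = 3600

# Figures quoted in the article for the automated attack.
AUTOMATED_SECONDS = 20.0
AUTOMATED_ACCURACY = 0.70

# Assumption: "about twice as fast as humans" implies roughly 40 seconds each.
HUMAN_SECONDS = 2 * AUTOMATED_SECONDS


def attempts_per_hour(seconds_per_captcha: float) -> float:
    """How many CAPTCHAs can be attempted in one hour at a given pace."""
    return SECONDS_PER_HOUR / seconds_per_captcha


def expected_solves_per_hour(seconds_per_captcha: float, accuracy: float) -> float:
    """Attempts per hour scaled by the fraction that are solved correctly."""
    return attempts_per_hour(seconds_per_captcha) * accuracy


if __name__ == "__main__":
    auto_attempts = attempts_per_hour(AUTOMATED_SECONDS)
    auto_solves = expected_solves_per_hour(AUTOMATED_SECONDS, AUTOMATED_ACCURACY)
    human_attempts = attempts_per_hour(HUMAN_SECONDS)
    print(f"Automated: {auto_attempts:.0f} attempts/hour, about {auto_solves:.0f} solved")
    print(f"Human (assumed pace): {human_attempts:.0f} attempts/hour")
```

Under those assumptions, one automated solver makes 180 attempts an hour and gets roughly 126 of them right, against 90 attempts an hour for a human working at the assumed pace.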
Pay-as-you-go CAPTCHA-busting services, typically based in developing countries with low wages and a plentiful supply of staff who are desperate for work, are already popular with the crooks, and good enough for criminal purposes…
…so the attacks in the paper could give crooks twice the CAPTCHA-busting performance for free.
And, of course, computer-based attacks only ever get faster, all else being equal.
According to the authors of the paper, Google has already taken on board some of the findings and improved its reCAPTCHA system, so the authors do indeed seem to have fulfilled their implicit goal of helping to make things better.
That’s a more satisfying result, and indeed a more practical outcome, than simply “breaking reCAPTCHA.”
It’s also a strong reminder of why set-and-forget is a dangerous way to approach security, and why you need to treat security as a journey, not a destination.
2 comments on “Solving Google reCAPTCHAs – without using humans”
Minor typo? “Pas-as-you-go”
I was a fan of the old reCAPTCHA systems that helped Google process and interpret book scans. I was sad when that became telling street signs apart because it lost the element of using human brainpower for some greater purpose, but when they finally introduced the “click this box” version, I was glad to see all those sometimes very difficult to read scrambled images go away. If improvements in usability like that continue to be made while continuously improving/accounting for the updates in criminal capabilities, then that’s the best of both worlds, I suppose.
Fixed the typo, thanks.