04/05/2012

ReCAPTCHA: Making the World an Accurately Known Place

If you’ve ever filled out a form on the web (and who hasn’t?), you’ve certainly seen a reCAPTCHA like the one here. What many people don’t realize is that by solving a reCAPTCHA you’re helping to preserve our history by digitizing books. As of the reCAPTCHA team’s 2008 paper in Science, over 440 million words had been identified. Google purchased reCAPTCHA in 2009 and put it to work on old editions of the NY Times and Google Books.

reCAPTCHA example

But digitizing isn’t just for words. Now it seems Google is adding street signs and house numbers to the list of items presented to the user for validation. According to Google, the images of numbers and street signs were added to the program several weeks ago as part of an experiment. Only recently, though, has the buzz begun to sweep the web, with many people speculating about the accuracy and security of using these images in reCAPTCHA.

Traditional reCAPTCHA displays two words. One is known to Google and the other isn’t. The user doesn’t know which word is which, and automated programs can’t reliably generate the answers. The unknown word comes from a scanned image of text that OCR software (software that translates images to text) could not read. By entering your best guess at the word, you’re helping Google identify it. Your answer is saved along with many others, and the answers are compared until statistical analysis yields enough confidence in what the actual word is.
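
To make the two-word idea concrete, here is a rough sketch in Python. It is not Google’s actual implementation. The function names and the vote and agreement thresholds are made up for illustration, and the real statistical analysis is surely more involved. The known word decides whether you get through the form, and guesses for the unknown word are pooled until enough of them agree.

```python
from collections import Counter


def passes_control(control_answer, control_truth):
    """The known (control) word decides whether the user gets through."""
    return control_answer.strip().lower() == control_truth.strip().lower()


def consensus(guesses, min_votes=3, min_share=0.75):
    """Return the agreed reading of the unknown word, or None if unsure.

    min_votes and min_share are illustrative thresholds, not real values.
    """
    # Normalize case and whitespace, and drop empty answers.
    normalized = [g.strip().lower() for g in guesses if g.strip()]
    if len(normalized) < min_votes:
        return None  # too few answers to trust any of them yet
    word, count = Counter(normalized).most_common(1)[0]
    # Accept the top answer only if enough users agree on it.
    return word if count / len(normalized) >= min_share else None


if __name__ == "__main__":
    # Guesses would only be pooled from users who got the control word right.
    if passes_control(" Overlooks", "overlooks"):
        print(consensus(["morning", "Morning", "morning ", "mourning"]))
```

In this sketch a single prankster typing nonsense barely moves the result, because one stray answer is outvoted by the users who agree.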

The issue with using an image of a street sign or house number is that it will be obvious to the user which item is already known and which one isn’t. Since only the known item is actually checked, an attacker can ignore the other one, meaning that the security is only half as effective. If reCAPTCHA becomes less secure, it will be less effective in stopping spam. This all assumes that every street sign or house number Google shows is the unknown half of the pair. No one knows that to be the case for sure.

Others, though, think that Google’s use of reCAPTCHA for map data is a blessing. Have you ever Googled an address and found that it was placed on the wrong side of the road, river, or town? If Google is able to refine its address database, the results of location searches should improve. In a way you, and every user, are helping make the world a more accurately known place.

Unless, of course, you’re one of those types who enters a wrong answer because, well, just because. It may get you through the form, but with the amount of data Google has at its fingertips, rest assured it won’t stop them from identifying the unknown number or word.

What do you think of Google’s new use of reCAPTCHA? Will the experiment go mainstream?