The weekly newsletter for Fed2 by ibgames

EARTHDATE: August 17, 2008

Official News page 8


REAL LIFE NEWS: RE-CAPTCHA PUTS SECURITY MEASURES TO WORK

by Hazed

You know when you want to buy tickets or something online, and before you can complete the purchase the website puts up an image of a word all wiggly and skewed, and you have to decipher it and type it back in to prove that you are a real human and not a bot trying to corner the market in seats for the latest hot show?

I find that so irritating! I hate the fact that I have to prove that I am a human. Here I'm trying to spend money, and it's yet one more barrier thrown up before I can complete the transaction. It makes me grumble, loudly. It's even worse when all I want to do is post a comment on a blog, or register to read something on a site - I usually decide that it's just not worth the bother, and give up.

Now I have discovered that one particular system is putting this security measure to work to solve a completely different problem - to help digitize old books and manuscripts. It's a really clever way of putting an annoyance to a very good use. It's called re-CAPTCHA and it's the work of computer scientists from Carnegie Mellon. Now, when computer users solve one of those stupid distorted-letters puzzles, they can also help turn the printed word into something a machine can read.

Here's how it works. The team led by Luis von Ahn have taken the basic security measure, which is called CAPTCHA, and bolted it onto the mechanisms used to digitize books and newspapers for archives. Old texts are digitized by photographically scanning the pages and then using optical character recognition (OCR) software to transform the text into bits and bytes. But when the paper has turned yellow and the ink has faded, the software sometimes can't recognize some words. In extreme cases, as many as one word out of five can fox the OCR software, according to tests. To get a human to transcribe the missing words by hand would be very expensive.

Conventional CAPTCHAs were developed at Carnegie Mellon anyway, so the researchers there are ideally placed to tinker with the system. They've done that by taking images of words from old texts that OCR systems had problems with. Then, when people decipher the image in order to do something on the web, the results are stored, and once enough people agree on what a distorted word is, it's taken to be correct and used in the digital version of the original text.
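
For the technically curious, the "once enough people agree" step can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' actual code: the function name consensus and the two thresholds (min_votes and min_share) are my own assumptions, since all we know is that a word is accepted when enough people give the same answer.

```python
from collections import Counter

def consensus(answers, min_votes=3, min_share=0.75):
    """Return the agreed word, or None if there is not enough agreement yet.

    min_votes and min_share are made-up thresholds for illustration only.
    """
    # Normalise answers so "Morning" and " morning" count as the same word.
    cleaned = [a.strip().lower() for a in answers if a.strip()]
    if not cleaned:
        return None
    # Find the most popular answer and how many people gave it.
    word, votes = Counter(cleaned).most_common(1)[0]
    if votes >= min_votes and votes / len(cleaned) >= min_share:
        return word
    return None

# Three out of four people agree, so the word is accepted.
print(consensus(["morning", "Morning", "morning ", "mourning"]))
# Only two answers, and they disagree, so the word stays undecided.
print(consensus(["morning", "mourning"]))
```

In this sketch a word that comes back as None would simply be shown to more people until the answers settle down.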

The re-CAPTCHA system has been running for over a year on thousands of websites, and now the researchers have concluded that the process is just as good as using humans to transcribe old text - it achieves better than 99% accuracy. More than 1.2 billion re-CAPTCHAs have been solved, and more than 440 million words have been deciphered, which is the equivalent of manually transcribing more than 17,600 books.
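
Those figures are easy to sanity-check. The snippet below is just back-of-the-envelope arithmetic on the numbers quoted above, which are all "more than" figures, so treat the results as rough. The variable names are mine.

```python
# Figures quoted in the article (each is a lower bound, "more than ...").
solved = 1_200_000_000   # re-CAPTCHAs solved
words = 440_000_000      # words deciphered
books = 17_600           # equivalent number of books

# Implied average length of a book, in words.
words_per_book = words // books
# Rough number of solved puzzles per deciphered word.
solves_per_word = round(solved / words, 1)

print(words_per_book)
print(solves_per_word)
```

So the book count assumes an average of about 25,000 words per book, and on these numbers it takes somewhere between two and three solved puzzles to pin down each word.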

Sounds great! For more details about this clever way of harnessing web user power, see http://recaptcha.net/.

