In the early 2000s – before Gmail, Zimbra, Zoho, and ProtonMail came along – Yahoo Mail was king of the hill. It was free and easy to use, you could pick almost any email address you wanted, and you could access it from anywhere on the planet. Everything was good, except for one problem: spam. Millions of fake email accounts were being created every day by spammers.
By using automated computer scripts, known as ‘bots’, the spammers were able to complete the Yahoo Mail sign-up form hundreds of times a minute, without any human intervention at all. Yahoo had no way of telling the resulting random email addresses apart from genuine accounts created by real, living humans, so the spammers could use them to sell you fake pills, or encourage you to wire money to a friendly Nigerian prince.
This was a real problem for Yahoo, and other free email providers. Spam filters were beginning to block emails from all Yahoo accounts, as the only way to combat the incessant spam. The company desperately needed a way to tell real humans apart from the spam-bots.
Yahoo did not know what to do, but Luis von Ahn, then a PhD student in computer science, had the answer. He created a simple task that was easy for humans, but really difficult for the bots: deciphering a string of distorted letters and numbers. Most importantly of all, computers – like the ones running Yahoo’s email sign-up form – would be able to tell, automatically, whether the test was being answered by a human, or by a bot.
Von Ahn called his invention a “Completely Automated Public Turing test to tell Computers and Humans Apart.” It may not be catchy, but its acronym is: CAPTCHA.
reCAPTCHA, the New York Times… and Google.
Within a week of Luis von Ahn creating CAPTCHA, Yahoo was using it in its sign-up process, and it was working brilliantly. Unfortunately, everybody hated it. Each test took up 10 seconds of a user’s time, when they could have been doing something else. So, von Ahn wondered, why not put those 10 seconds to good use?
Around the same time, the New York Times was setting out to digitise its 100+ year newspaper archive. Much of that was an automated process, but there were lots of words in the scanned pictures of newspapers that the computers couldn’t work out. Von Ahn’s solution: show those words, one at a time, to the 200 million people completing a CAPTCHA test each day, and let them do the reading.
Luis founded a company called reCAPTCHA to do this work and, when Google wanted to build the world’s biggest library of digital books, they bought reCAPTCHA from von Ahn for an undisclosed, but very large, sum of money.
Spammers hate reCAPTCHA even more than the rest of us do. Spam is big business, and reCAPTCHA is a threat to their income. Soon, they were paying ‘CAPTCHA farms’, low-paid workers in developing nations, to solve reCAPTCHAs for the spam bots. When Google realised this and launched an improved reCAPTCHA, v2, which asks users to choose ‘all the images with a car’ and similar, the spammers simply used off-the-shelf machine learning tools to beat the test.
The arms race had escalated, and the spammers were winning.
A new reCAPTCHA, v3.
Google’s answer is a new reCAPTCHA, v3. The first thing users will notice about this as it is rolled out online (which is still a work in progress) is that, well, they don’t notice it. Rather than asking users to complete a task or a test, reCAPTCHA v3 is invisible. Code in the webpage looks at what a user does on the website, and compares that to what other, real humans have done. reCAPTCHA then gives the user a score between 0.0 and 1.0 – the closer to 1.0, the more likely they are to be human rather than a robot – and sends this to the website. It’s then up to the site to decide what to do.
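To make that last step concrete, here is a minimal sketch of how a site might turn the score into a decision. It assumes the score has already been fetched from reCAPTCHA and runs from 0.0 (almost certainly a bot) to 1.0 (almost certainly a human). The function name, the two cut-off values and the three outcomes are all made up for illustration; every site would pick its own.

```python
def decide_action(score: float, allow_at: float = 0.7, block_below: float = 0.3) -> str:
    """Map a reCAPTCHA v3-style score to a site-specific action.

    The thresholds are illustrative assumptions, not values recommended
    by Google; a real site would tune them against its own traffic.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= allow_at:
        return "allow"      # looks human: let the request through
    if score < block_below:
        return "block"      # looks automated: reject the request
    return "challenge"      # unsure: ask for extra proof, e.g. an email check


# Example: three visitors with different scores
for visitor_score in (0.9, 0.5, 0.1):
    print(visitor_score, decide_action(visitor_score))
```

The middle ‘challenge’ band is a design choice: instead of turning away everyone who looks slightly odd, the site can ask borderline visitors for one more piece of evidence.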
Of course, sooner or later even this new, transparent reCAPTCHA v3 will be cracked by the spammers but, for now, its machine learning is enough to keep the internet a useful tool, and not a spam-filled wasteland.