CAPTCHA And Beyond: Defending Against Bad Bots

Bots are a growing threat. An estimated 20% of web traffic is now made up of bad bots, carrying out everything from distributed denial-of-service (DDoS) and credential-stuffing attacks to scraping data, publishing fake reviews, and skewing advertising and visitor metrics on websites. These malicious bots are increasingly sophisticated in their behavior, often making them indistinguishable from human users.

For example, bots can mimic human mouse behavior, simulating the way a genuine user might move their cursor while visiting a particular website. While these bots may be secretly (or not so secretly) carrying out brute-force attacks against online user accounts or mining data for competitive purposes, they can slip through security nets by appearing to be legitimate visitors going about their everyday activities.

The rise of malicious bots

With the rise of malicious bots online, security professionals have been searching for ways to block them, whether through methods such as CAPTCHA or less intrusive tools.

Given the potential havoc bots can wreak, it’s no surprise that people are eager to stop them in their tracks. For example, a credential-stuffing attack, in which stolen account credentials such as usernames or email addresses and passwords are tried against other services in a brute-force, trial-and-error fashion, can result in sensitive customer data being breached.

Meanwhile, a DDoS attack can bring down online services by overloading them with bot traffic. Either of these can be devastating to an organization. Other uses of bad bots, such as scraping data, may not be so immediately crippling, but can nonetheless cause real harm.

The good and bad of CAPTCHA

Fortunately, there are multiple bot detection methods out there, although some are more effective than others. Perhaps the most familiar method used to sort real users from fake ones is the CAPTCHA. Short for “Completely Automated Public Turing test to tell Computers and Humans Apart,” CAPTCHAs are tests designed to distinguish between bots and legitimate, flesh-and-blood human users.

First invented in the late 1990s, though only given their name in 2003, CAPTCHAs have become more advanced as bots have grown more sophisticated. The most common CAPTCHA types ask users to identify wiggly, stretched letters or numbers, while more recent variations ask users to click in specific areas or identify objects within images. In short, CAPTCHA systems have to stay one step ahead of the human behavior that automated bots can mimic in order to filter the fake users from the real ones.
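
To make this concrete, here is a minimal sketch of how a classic distorted-text challenge might be generated server-side. It assumes the third-party Python captcha package (the article does not name any particular tool), and the challenge length and output filename are placeholders.

```python
import random
import string

from captcha.image import ImageCaptcha  # third-party "captcha" package

def make_text_captcha(length: int = 5) -> tuple[str, str]:
    """Generate a distorted-text challenge image and return its answer."""
    # Random alphanumeric answer the user will be asked to retype.
    answer = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    image = ImageCaptcha(width=240, height=90)
    image.write(answer, "challenge.png")  # renders wiggly, stretched characters
    return answer, "challenge.png"

# The server keeps `answer` in the session and compares it with what
# the visitor types; only the image itself is ever sent to the client.
answer, path = make_text_captcha()
```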

Unfortunately, CAPTCHAs are not the most elegant solution when it comes to bot detection. That is because they necessarily impede the user experience, forcing users to answer questions before proceeding with whatever they might be doing. Users may be willing to put up with this once or twice during an internet session, but it is a tool that has to be used sparingly, which makes it ill suited to continuous, real-time monitoring of potential bad actors. CAPTCHAs have other weaknesses too. In their efforts to sort humans from bots, they can set tasks that even some humans find difficult to understand and complete. Certain CAPTCHA types are also not supported by every browser or device. In trying to block out bad users, they may end up blocking some good ones as well.

Alternative methods are available

Fortunately, other methods can be deployed as well. Device fingerprinting attempts to recognize unique devices by analyzing factors such as the user’s operating system, the specific web browser they are using, their IP address, and so on. Device fingerprinting makes it easier to inspect traffic, identify malicious devices and, ultimately, spot and stop bad bots before they inflict any damage.
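
As a rough illustration, a server-side fingerprint can be as simple as hashing a handful of request attributes. The sketch below assumes a Flask-style request object, and the particular headers chosen are illustrative; production fingerprinting combines far more signals.

```python
import hashlib

def fingerprint(request) -> str:
    """Derive a coarse device fingerprint from a Flask-style request.

    A sketch only: real products combine many more signals, such as
    TLS parameters, screen size, time zone, and installed fonts.
    """
    signals = [
        request.remote_addr,                         # client IP address
        request.headers.get("User-Agent", ""),       # browser / OS string
        request.headers.get("Accept-Language", ""),  # language preferences
        request.headers.get("Accept-Encoding", ""),  # supported encodings
    ]
    # Hashing the concatenated signals yields a stable, fixed-length
    # token that can be counted, rate-limited, or blocklisted.
    return hashlib.sha256("|".join(signals).encode()).hexdigest()
```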

Another method is the cookie challenge. When a classification algorithm flags a potential bot, a cookie challenge responds to the HTTP request by sending a cookie. Web browsers will normally store this cookie and resend it with subsequent requests. The majority of bots, however, do not support cookies and so will never send it back. This makes it a good way of determining whether or not you are dealing with a bot.
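
Here is a minimal sketch of the idea using Flask (an assumption; any web framework works the same way). The cookie name and value are placeholders, and a production system would use a signed, random token so bots cannot simply forge it.

```python
from flask import Flask, request, make_response

app = Flask(__name__)
CHALLENGE_COOKIE = "bot_challenge"  # hypothetical cookie name

@app.route("/")
def index():
    # If the client returned the challenge cookie, it is cookie-capable,
    # which most browsers are and most simple bots are not.
    if request.cookies.get(CHALLENGE_COOKIE) == "ok":
        return "Welcome back, cookie-capable client."

    # Otherwise, issue the challenge. A client that never resends the
    # cookie on subsequent requests is a likely bot.
    resp = make_response("Checking your browser...")
    resp.set_cookie(CHALLENGE_COOKIE, "ok", max_age=3600)
    return resp
```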

Yet another strategy is the JavaScript challenge. Similar to a cookie challenge, a JavaScript challenge responds to an HTTP request with a small script that the client must execute, for example to set a cookie or compute a token. Most web browsers run JavaScript, but the majority of bots do not. Lacking a JS engine, they fail to complete the challenge and thereby mark themselves out as suspect.
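
Again as a hedged sketch in Flask: the server withholds the real page until a script-derived token comes back. The js_token name and the trivial computation are illustrative only; real challenges use signed, per-session nonces and computations that are much harder to fake.

```python
from flask import Flask, request, make_response

app = Flask(__name__)

# Hypothetical challenge page: a browser with a working JS engine will
# compute the token, store it as a cookie, and reload; a bot without
# one will stall here and never see the real content.
CHALLENGE_PAGE = """
<script>
  document.cookie = "js_token=" + (6 * 7);  // expected answer: 42
  location.reload();
</script>
"""

@app.route("/")
def index():
    if request.cookies.get("js_token") == "42":
        return "JavaScript verified; serving the real page."
    # No valid token yet: serve the challenge script instead of content.
    return make_response(CHALLENGE_PAGE)
```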

These are just a few of the methods that can be used to deal with potential bad bots, nipping them in the bud before they become a problem. Methods such as these have the added benefit of not negatively impacting the user experience.

Going forward, bad bots are only going to become more of a problem for those who run and provide internet services and websites, and, by extension, for their users as well. Tools like CAPTCHAs certainly have their place, but they should be the last line of defense. Bots are getting smarter all the time. The ways we stop them need to get smarter too, without affecting the experience of good users along the way.