Most website visitors are bots. Our 2016 Bot Traffic Report, which draws on data from the past five years, shows that 50 percent of all site visitors are bots.
Despite their recent reputation in the media, web bots aren’t inherently bad. Only a little more than half of bot traffic is malicious, which means a large share of the bots visiting your site are actually helping it. Whether your site attracts 10 visitors a day or 100,000, there’s a mix of good bots indexing your website and bad bots trying to overwhelm it.
The good bots help collect information and monitor site metrics. Search engines like Google and Bing rely on bots to run simple, repetitive tasks at scale. These good bots are essential drivers of online activity for every kind of site, whether private, public, large or small.
Bad bots, on the other hand, such as Mirai, Nitol, Cyclone and Leet, are malware that wreaks havoc on computers and networks. They waste valuable resources, steal proprietary information, and flood websites with distributed denial of service (DDoS) attacks.
I recently hosted a webinar, “Three Strategies to Stay on Top of Bots on Your Website,” that looks at how you can apply different strategies to the bots visiting your site. Watch the video to see how you can take advantage of the beneficial bots and drop unwanted traffic.
Intelligent Bot Detection
Incapsula uses a layered approach to secure all web traffic. As traffic passes through our system, it is subject to access control, bot mitigation, a web application firewall and proprietary rules and policies.
In all, we use three approaches to address bad web bots.
One way to identify and mitigate bots is by performing static analysis of each HTTP request. By looking at the structure of a web request and its header information, and correlating that with what the bot claims to be, we can passively determine its true identity.
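As a rough illustration of the static approach, here is a minimal sketch (not Incapsula’s actual logic) of cross-checking a client’s claimed identity against its headers and network origin. The heuristics and function names are hypothetical; the one documented fact it leans on is that genuine Googlebot addresses reverse-resolve to googlebot.com or google.com and forward-resolve back to the same IP.

```python
import socket

def claims_to_be_googlebot(headers: dict) -> bool:
    return "googlebot" in headers.get("User-Agent", "").lower()

def reverse_dns_confirms_google(ip: str) -> bool:
    """Verify a claimed Googlebot via reverse DNS plus forward confirmation."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward confirmation
    except (socket.herror, socket.gaierror):
        return False
    return hostname.endswith((".googlebot.com", ".google.com")) and ip in forward_ips

def static_verdict(ip: str, headers: dict) -> str:
    """Flag clients whose claimed identity and actual request don't line up."""
    if claims_to_be_googlebot(headers) and not reverse_dns_confirms_google(ip):
        return "suspicious"   # claims to be a well-known crawler but isn't
    if "Mozilla" in headers.get("User-Agent", "") and "Accept-Language" not in headers:
        return "suspicious"   # real browsers almost always send Accept-Language
    return "unknown"
```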
Beyond the static approach is a behavioral analysis approach, where we look at the activity associated with a particular bot. This detects impersonator bots: attack bots that mask themselves as legitimate visitors to circumvent security solutions. Impersonator bots account for 24.3 percent of all bot traffic. Most of these bots will link themselves to a parent program like MSIE or Chrome, or to a well-known bot like the Google Search Crawler. If the bot’s behavior varies in any way from that of the parent program, the anomaly helps identify the client as suspicious.
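To make the behavioral idea concrete, here is a toy sketch, again not Incapsula’s actual model, that tracks per-client activity and flags clients that claim to be a browser yet never fetch static assets or sustain a request rate no human could. The thresholds and data structures are assumptions for illustration only.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class ClientActivity:
    timestamps: list = field(default_factory=list)
    asset_requests: int = 0   # images, CSS, JS fetched
    page_requests: int = 0    # HTML pages fetched

activity = defaultdict(ClientActivity)

def record_request(client_id: str, path: str) -> None:
    entry = activity[client_id]
    entry.timestamps.append(time.time())
    if path.endswith((".css", ".js", ".png", ".jpg", ".gif")):
        entry.asset_requests += 1
    else:
        entry.page_requests += 1

def behaves_like_claimed_browser(client_id: str) -> bool:
    entry = activity[client_id]
    recent = [t for t in entry.timestamps if t > time.time() - 60]
    if len(recent) > 120:     # more than 2 requests/second sustained for a minute
        return False
    if entry.page_requests >= 10 and entry.asset_requests == 0:
        return False          # real browsers normally fetch assets along with pages
    return True
```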
A more progressive way of addressing a bad bot is a challenge-based (or support-based) approach. Websites are equipped with proactive components that measure how a visitor interacts with different technologies. Using JavaScript, a runtime language that executes locally on the client’s machine, we can further inspect properties of the underlying device. In extreme circumstances, we can also use scrambled imagery like a CAPTCHA to actively challenge a user.
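As a minimal sketch of a JavaScript challenge, the server-side snippet below issues an inline script that sets a signed cookie and then verifies it on the next request; a client that can’t execute JavaScript never passes. The cookie name, secret handling and five-minute token window are assumptions, and a production implementation would also handle window rollover and replay.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"   # hypothetical server-side secret

def _token(client_id: str) -> str:
    # Tie the token to the client and a five-minute window (rollover ignored for brevity).
    msg = f"{client_id}:{int(time.time()) // 300}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def issue_challenge(client_id: str) -> str:
    """Return an HTML page whose inline JavaScript sets a signed cookie."""
    return f"""<html><body><script>
      document.cookie = "js_token={_token(client_id)}; path=/";
      location.reload();  // retry the original request, now carrying the cookie
    </script></body></html>"""

def challenge_passed(client_id: str, cookies: dict) -> bool:
    return hmac.compare_digest(cookies.get("js_token", ""), _token(client_id))
```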
The most effective way to identify and mitigate bots, however, is to combine these technologies in a multilateral approach. It’s best to analyze each visitor (every bot or human that comes to your website) and assign it a client application ID based on its combined characteristics from all three approaches.
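Pulling the three hypothetical signals above together, a combined classification might look something like this toy sketch; the verdict names and their priority order are illustrative, not Incapsula’s classification logic.

```python
def classify_client(static_v: str, behaves_like_browser: bool, challenge_ok: bool) -> str:
    """Toy combination of the three signals sketched above into one verdict."""
    if static_v == "suspicious" and not challenge_ok:
        return "block"       # lies about its identity and cannot run JavaScript
    if not behaves_like_browser:
        return "challenge"   # behavior deviates from the claimed parent program
    if static_v == "suspicious":
        return "monitor"     # inconsistent headers but otherwise well-behaved
    return "allow"
```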
Our client classification process uses all of the methods listed above.
Developing a Plan
In developing a plan to mitigate bad bots, you must also develop a plan to work effectively with good bots. This involves optimizing and accelerating traffic (i.e., adding no latency) and improving user experience by creating clean URLs that are readable and search-engine-friendly.
Once you’ve done that, the goal is to stop the bad bots from overtaking your website. The first step is to automatically block known bad bots (like vulnerability scanners, comment spammers and DDoS bots). Afterward, you can systematically add problematic clients to your “block list.” Clients to watch out for include developer tools, specific attack tools and content scrapers.
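As a simple sketch of that two-step blocking workflow, the snippet below checks a request’s user agent first against known bad signatures and then against a custom block list you maintain over time. The signature strings are examples only, not a complete or official list.

```python
# Example signatures only; maintain and extend these lists for your own site.
KNOWN_BAD_SIGNATURES = ("sqlmap", "nikto", "masscan")        # e.g. vulnerability scanners
CUSTOM_BLOCK_LIST = {"curl", "python-requests", "scrapy"}    # clients you've flagged yourself

def should_block(user_agent: str) -> bool:
    ua = user_agent.lower()
    if any(sig in ua for sig in KNOWN_BAD_SIGNATURES):        # step 1: known bad bots
        return True
    return any(client in ua for client in CUSTOM_BLOCK_LIST)  # step 2: your own block list
```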
Information Is the Key
Once you’ve implemented your strategy for managing known good and bad bots, it’s time to think more deeply about how to manage the unknown. An effective strategy for the unknown will often depend on the application you’re protecting, your organization’s risk tolerance, and the clients you typically expect. In any case, the first step is to arm yourself with as much information as possible.
To enhance the information you have about each client, you can use Header Enrichment via Incapsula Application Delivery Rules to provide your application, SIEM and BI tools with the additional context they need to make decisions.
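Here’s a rough sketch of what consuming those enriched headers might look like inside your application. The header names X-Client-Type and X-Client-Reputation are placeholders rather than Incapsula’s actual header names, so substitute whatever names you configure in your Application Delivery Rules.

```python
def handle_request(headers: dict) -> str:
    # Placeholder header names; use whatever names you configure in your rules.
    client_type = headers.get("X-Client-Type", "unknown")
    reputation = headers.get("X-Client-Reputation", "unknown")
    # Forward the same context to your SIEM/BI pipeline for later analysis.
    print({"client_type": client_type, "reputation": reputation})  # stand-in for a real export
    if client_type == "bad_bot":
        return "serve decoy content"
    return "serve normal content"
```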
Once you have the information needed to make informed decisions, you can use Incapsula Application Delivery Rules to manage bots with any of these three strategies (a rough sketch follows the list):
- Identify bad bots and redirect them to a special site or server with fake content
- Switch content based on Layer 7 parameters
- Implement behavior-based rules using injected HTTP headers
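The sketch below illustrates those three strategies using a hypothetical injected header named X-Bot-Verdict and a made-up decoy host. It isn’t Incapsula rule syntax; the same logic would be expressed in your Application Delivery Rules.

```python
def route_request(headers: dict, path: str) -> str:
    verdict = headers.get("X-Bot-Verdict", "unknown")   # hypothetical injected header
    # 1. Redirect identified bad bots to a decoy server with fake content.
    if verdict == "bad":
        return "redirect https://decoy.example.com" + path
    # 2. Content switching on Layer 7 parameters (here, the URL path plus the verdict).
    if path.startswith("/api/") and verdict == "good":
        return "route to api-pool"
    # 3. Behavior-based rule keyed on the injected header.
    if verdict == "suspicious":
        return "route to rate-limited-pool"
    return "route to default-pool"
```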
I cover these methods in the webinar recording “Three Strategies to Stay on Top of Bots on Your Website.” Find out how to protect your website and drive traffic to it. Let me know if you have questions in the comments area.
Try Imperva for Free
Protect your business for 30 days on Imperva.