What bot traffic is and how to stop it

liliya.georgieva · July 31, 2023, 3:25pm

Bot traffic describes any non-human traffic to a website or an app. The term bot traffic often carries a negative connotation, but in reality bot traffic isn’t necessarily good or bad; it all depends on the purpose of the bots.

Some bots are essential for useful services such as search engines and digital assistants (e.g. Siri, Alexa). Most companies welcome these sorts of bots on their sites.

Other bots can be malicious, for example those used for the purposes of credential stuffing, data scraping, and launching DDoS attacks. Even some of the more benign ‘bad’ bots, such as unauthorized web crawlers, can be a nuisance because they can disrupt site analytics and generate click fraud.

Using Tracking Rules, you can exclude traffic from certain sources by filtering on IP addresses or domains.

How can bot traffic be identified?

Marfeel can help to detect bot traffic. The following analytics anomalies are the hallmarks of bot traffic:

Sawtooth wave form: Bot traffic is often scheduled to run at a fixed frequency, which produces a repeating sawtooth pattern that can be identified visually in traffic charts.

Spike in traffic from an unexpected traffic source: A sudden spike in users from one particular traffic source, particularly Direct traffic, can be an indication of bot traffic.

Spike in traffic from an unexpected location: A sudden spike in users from one particular region, particularly a region that’s unlikely to have a large number of people who are fluent in the native language of the site, can be an indication of bot traffic.

Abnormally high pageviews: If a site undergoes a sudden, unprecedented and unexpected spike in pageviews, it’s likely that there are bots clicking through the site.
Surprisingly high or low session duration: Session duration, or the amount of time users stay on a website, should remain relatively steady. An unexplained increase in session duration could be an indication of bots browsing the site at an unusually slow rate. Conversely, an unexpected drop in session duration could be the result of bots that are clicking through pages on the site much faster than a human user would.
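
As a rough illustration of the "abnormally high pageviews" check above, the following Python sketch flags days whose pageviews sit far above the median of the period. The function name `flag_spikes` and the default multiplier of 3 are assumptions made for this example; they are not Marfeel settings, and a real threshold would need tuning per site.

```python
from statistics import median


def flag_spikes(daily_pageviews, threshold=3.0):
    """Return the indexes of days whose pageviews exceed threshold x the median.

    The median is used as the baseline because, unlike the mean, it is barely
    affected by the spikes we are trying to find.
    """
    if not daily_pageviews:
        return []
    baseline = median(daily_pageviews)
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return [
        day
        for day, views in enumerate(daily_pageviews)
        if views > threshold * baseline
    ]


# Illustrative numbers only, not real traffic data.
week = [100, 110, 95, 105, 900, 102, 98]
print(flag_spikes(week))  # → [4]
```

A flagged day is only a hint: a viral article produces the same shape, so the result should be cross-checked against traffic source and location as described above.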

How can bot traffic hurt analytics?

As mentioned above, unauthorized bot traffic can impact analytics metrics such as page views, bounce rate, session duration, geolocation of users, and conversions. These deviations in metrics can create a lot of frustration for the site owner; it is very hard to measure the performance of a site that’s being flooded with bot activity. Attempts to improve the site, such as A/B testing and conversion rate optimization, are also crippled by the statistical noise created by bots.

How to filter bot traffic from Marfeel

Marfeel supports discarding bot and synthetic traffic created by good citizen bots that identify themselves as bots via a user agent.
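
To make the idea of self-identifying bots concrete, here is a minimal Python sketch of user-agent matching. It is not Marfeel's implementation: the pattern list and the name `is_self_declared_bot` are hypothetical, and production systems typically rely on much larger, maintained lists.

```python
import re

# Hypothetical, non-exhaustive list of tokens that self-declared bots
# commonly include in their user agent. Not Marfeel's actual rule set.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|slurp|headless", re.IGNORECASE)


def is_self_declared_bot(user_agent):
    """Return True when the user agent string announces itself as a bot."""
    if not user_agent:
        return False  # a missing user agent is suspicious, but not self-declared
    return bool(BOT_UA_PATTERN.search(user_agent))


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36"
)
print(is_self_declared_bot(googlebot))  # → True
print(is_self_declared_bot(chrome))     # → False
```

This approach only catches bots that are honest about what they are. Malicious bots usually send a browser-like user agent, which is why the other techniques in this article are still needed.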

How to validate the source of an IP

When identifying the source of an IP address, you can determine whether an IP belongs to an ISP or a hosting/cloud provider (where bots are normally hosted) by using:

Command line tools like whois and host
Equivalent web versions such as DomainTools and IPLocation

Check the Organization field: names like AWS, Hetzner, OVH, etc., indicate that the IP is likely assigned to a server, which may point to a bot. If the organization is an ISP like AT&T, O2, Vodafone, Orange, etc., it is more probable that the requests are coming from real users.
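
The Organization check can be partly automated. The Python sketch below works on whois output you have already captured as text; it pulls out the organization line and compares it against two short keyword lists. The lists, the field names it looks for (`OrgName`, `org-name`, `Organization`) and the function names are assumptions for illustration. Whois output varies between registries, so expect to adapt the pattern.

```python
import re

# Assumed keyword lists, seeded from the examples in this article.
# "amazon" is added because AWS ranges are often registered under that name.
HOSTING_HINTS = ("aws", "amazon", "hetzner", "ovh")
ISP_HINTS = ("at&t", "o2", "vodafone", "orange")

# Assumed field names; registries label the organization differently.
ORG_FIELD = re.compile(
    r"^(?:OrgName|org-name|Organization)\s*:\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_org(whois_text):
    """Return the first organization value found in whois output, or None."""
    match = ORG_FIELD.search(whois_text)
    return match.group(1).strip() if match else None


def _mentions(name, hints):
    """True if any hint appears in name as a whole word."""
    return any(re.search(r"\b" + re.escape(h) + r"\b", name) for h in hints)


def classify_org(org):
    """Classify an organization name as 'hosting', 'isp' or 'unknown'."""
    if not org:
        return "unknown"
    name = org.lower()
    if _mentions(name, HOSTING_HINTS):
        return "hosting"
    if _mentions(name, ISP_HINTS):
        return "isp"
    return "unknown"


# Made-up whois excerpt for illustration; real output is much longer.
sample = "NetName:        EXAMPLE-NET\nOrgName:        Amazon Technologies Inc.\n"
print(classify_org(extract_org(sample)))  # → hosting
```

A result of "hosting" is a signal, not proof: VPN users and corporate proxies also appear under hosting providers, so combine it with the traffic anomalies described earlier before blocking.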

Once you’ve identified the offending IPs, you will have to block them using one of the solutions below.

How can websites manage bot traffic?

There are a number of tools that can help mitigate abusive bot traffic:

The first step to stopping or managing bot traffic to a website is to include a robots.txt file. This is a file that provides instructions for bots crawling the page, and it can be configured to prevent bots from visiting or interacting with a webpage altogether. But it should be noted that only good bots will abide by the rules in robots.txt; it will not prevent malicious bots from crawling a website.
The easiest and most effective way to stop bad bot traffic is with a bot management solution. A bot management solution can leverage intelligence and use behavioural analysis to stop malicious bots before they ever reach a website. Cloudflare Bot Management, DataDome Bot Protection Software, Fastly Bot Management or AWS WAF are some state-of-the-art products that proactively identify and stop bot abuse.
Alternatively, a rate limiting solution can detect and prevent bot traffic originating from a single IP address, although this will still overlook a lot of malicious bot traffic.
Implement CAPTCHA with Bot Detection: Deploying CAPTCHA challenges with bot detection mechanisms can help differentiate between human users and bots. This simple step can deter most automated attacks. Some options could be this or this.
Utilize Web Application Firewalls (WAFs): WAFs provide an additional layer of defense by inspecting incoming traffic and filtering out suspicious requests. They can block known bot IP addresses and patterns, reducing the risk of bot-related incidents. A WAF is commonly offered by your CDN provider (this and this), or you can install one on your web servers as an external module.
Monitor Traffic Patterns: Regularly monitoring website traffic patterns can help identify unusual spikes or suspicious activities indicative of bot presence. A Marfeel Operator can look at a site’s traffic and identify suspicious network requests, providing a list of IP addresses to be blocked by a filtering tool such as a WAF. This is a very labor-intensive process and still only stops a portion of the malicious bot traffic.
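
To show what the rate limiting option in the list above does in practice, here is a minimal sliding-window limiter keyed by IP address, written in Python. It is a sketch under simple assumptions (a single process, in-memory state, and a class name chosen for this example), not a replacement for the rate limiting features of a CDN or WAF.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most max_requests per IP within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True if the request is accepted, False if it should be rejected."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=10)
# 203.0.113.7 is a documentation-only address used as a stand-in.
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(results)  # → [True, True, True, False]
```

Because the limit is tracked per IP, a bot that spreads its requests over many addresses stays under the threshold, which is the limitation the list item above points out.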