17 December 2021

Useless bots and crawlers are a common nuisance for website owners. In particular, so-called SEO bots can be a pain: they have a habit of eating up all your website’s resources by making an excessive number of hits, and you get nothing in return. Some of these bots look for a robots.txt file before they start hitting your website, but that is of little help if your website is attacked by a bot you didn’t know about.

Blocking bots

You can quickly stop a bot in its tracks via your website’s .htaccess file. For instance, earlier today I found a bot called DataForSeoBot that was grinding a website to a halt. The bot used this user agent:

"Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)"

The following rule returns an error 403 (“forbidden”) if the user agent contains the (case-insensitive) string “dataforseobot”:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "dataforseobot" [NC]
RewriteRule "^.*$" - [F,L]

The rule is similar to the rules I used in the article about denying access to URLs. It again uses Apache’s mod_rewrite module. The main difference is that the rule matches a user agent (%{HTTP_USER_AGENT}) rather than a URL (%{REQUEST_URI}).
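
For comparison, a rule that matches a URL rather than a user agent could look something like the sketch below (the /secret/ path is just a made-up example):

RewriteEngine On
# "/secret/" is a hypothetical path used for illustration
RewriteCond %{REQUEST_URI} "^/secret/"
RewriteRule "^.*$" - [F,L]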

So, the rewrite condition checks if the user agent contains the string dataforseobot, and the NC flag makes the match case-insensitive. It is worth noting that the double quotes around the string are redundant in this example – you only need them if the string you want to match contains one or more spaces.
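
For example, to match a hypothetical bot that identifies itself as “Some Bot”, the quotes would be required because the pattern contains a space:

# "some bot" is a made-up user agent string with a space in its name
RewriteCond %{HTTP_USER_AGENT} "some bot" [NC]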

Next, the rewrite rule matches any string ("^.*$") and the F flag returns an error 403. The L flag tells Apache not to process any further rules in the .htaccess file.

Matching multiple user agents

You can use a simple regular expression to match multiple user agents. For instance, another naughty bot I encountered recently identified itself as “trendkite-akashic-crawler”. To match both DataForSeoBot and the Trendkite crawler you can use this rule:

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "(dataforseobot|trendkite-akashic)" [NC]
RewriteRule "^.*$" - [F,L]

Test your rules

As mentioned, the above rules return an error 403 if the user agent is matched. To check if your rules are working, you can therefore look for the user agent in your website’s access log. In the line below, you can see that the bot tried to access /foo.html and that Apache returned an error 403:

1.2.3.4 - - [13/Dec/2021:13:59:06 +0000] "GET /foo.html HTTP/1.1" 403 0 "-" "Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)"
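
If you have shell access, you can also grep the log for blocked requests. Assuming the access log lives at /var/log/apache2/access.log (the exact path varies per server), something along these lines should work:

$ grep -i "dataforseobot" /var/log/apache2/access.log | grep '" 403 '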

If you have access to cURL then you can also check your rules by spoofing the user agent:

$ curl -IL -A "Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)" http://example.com/
HTTP/1.1 403 Forbidden
...

The curl command uses three options:

  • -I returns just the server’s response headers, which include the status code. It doesn’t download the web page.
  • -L makes cURL follow any redirects, such as a redirect from HTTP to HTTPS.
  • -A specifies the user agent that is sent to the server. This is what allows you to spoof the user agent.
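
As a quick sanity check, the same request without the spoofed user agent should still go through as normal (assuming the page itself loads fine):

$ curl -IL http://example.com/
HTTP/1.1 200 OK
...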

robots.txt vs block rules

Ideally, a robots.txt file is all you need to tell bots whether or not they are allowed to crawl your website. In practice, this approach doesn’t really work. There are too many bots, and new bots are let loose all the time. Keeping track of them quickly becomes a full-time job.
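
For instance, to politely ask DataForSeoBot to stay away, you would add something like this to your robots.txt file, and then repeat the exercise for every new bot you come across:

User-agent: DataForSeoBot
Disallow: /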

There is the option to only allow specific bots. However, many bots check whether they are explicitly allowed or denied, and if they are not listed in the robots.txt file they follow whatever rule applies to the Googlebot. This effectively gives them carte blanche, as very few websites deny the almighty Googlebot. Plus, there are also plenty of bots that simply ignore robots.txt files.
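
For example, a robots.txt file that only welcomes the Googlebot might look like the sketch below; a bot that copies the Googlebot’s rules when it isn’t listed will simply ignore the catch-all Disallow:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /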

In short, blocking naughty bots is a sensible approach. The bots will still try to crawl your website, but they are always denied access. Returning an error 403 uses hardly any resources, so the bots can no longer slow down your website.