Yesterday I received an email saying that my site was down. I checked the server and it was indeed down: Apache was using an abnormally large amount of CPU. A resource-intensive website (the National Rail timetable) is hosted on the machine, but I didn’t expect usage this high.
I tried disabling the resource-intensive website and the other sites on the host became responsive again, but as soon as I re-enabled it the whole host went unresponsive once more, with a flood of requests hitting the site, asking for large numbers of timetables in the past.
I installed the QoS module (mod_qos), which soon showed that the number of connections to my server had climbed into the hundreds. I then set low limits (a maximum of 5 connections from a single IP, and a maximum of 32 concurrent requests to the timetable site), but that didn’t do much to make the server responsive again.
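For reference, the limits I set with mod_qos looked roughly like this (a sketch from memory; the /timetable/ path is a placeholder for wherever the timetable site actually lives, and these directives belong in the server or virtual host config rather than .htaccess):

<IfModule mod_qos.c>
    # Allow at most 5 concurrent connections from any single IP
    QS_SrvMaxConnPerIP 5
    # Allow at most 32 concurrent requests to the timetable site
    QS_LocRequestLimit /timetable/ 32
</IfModule>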
After checking the server log further, it was clear that the majority of requests came from ClaudeBot and Bytespider, with a minority coming from other bots. A quick web search showed that both are widely reported as abusive crawlers, so the best thing I could do was block them.
After blocking them in the Apache config, everything became responsive again. I will not hesitate to block bots in the future if they are known to be abusive.
For those who are interested, the configuration I added is below (if used in .htaccess, the Directory container isn’t needed; adjust it to match the scope you want to block):
# Deny bad bots
# Tag any request whose User-Agent contains these strings
BrowserMatchNoCase "Bytespider" bad_bot
BrowserMatchNoCase "ClaudeBot" bad_bot
<Directory />
    # Old-style access control (needs mod_access_compat on Apache 2.4)
    Order Deny,Allow
    Deny from env=bad_bot
</Directory>
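The Order/Deny directives use the old Apache 2.2 access-control syntax, which on Apache 2.4 is provided by mod_access_compat. If you would rather use the newer authorization directives on 2.4, a roughly equivalent block (untested on my setup) would be:

# Deny bad bots (Apache 2.4 authorization syntax)
BrowserMatchNoCase "Bytespider" bad_bot
BrowserMatchNoCase "ClaudeBot" bad_bot
<Directory />
    <RequireAll>
        # Allow everyone except requests tagged as bad_bot
        Require all granted
        Require not env bad_bot
    </RequireAll>
</Directory>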
P.S. The following table shows the number of requests made to the timetable website by various bots yesterday:
| Bot | Total requests in 1 day |
| --- | --- |
| ClaudeBot | 189965 |
| Bytespider | 18470 |
| Amazonbot | 14082 |
| MJ12bot | 4717 |
| SemrushBot | 3967 |
| bingbot | 3588 |
| Googlebot | 1567 |
| PetalBot | 454 |
| DotBot | 85 |