{"id":287,"date":"2024-09-30T00:45:54","date_gmt":"2024-09-30T00:45:54","guid":{"rendered":"https:\/\/miklcct.com\/wordpress\/?p=287"},"modified":"2024-09-30T01:11:09","modified_gmt":"2024-09-30T01:11:09","slug":"claudebot-and-bytespider-you-are-not-welcome","status":"publish","type":"post","link":"https:\/\/miklcct.com\/wordpress\/2024\/09\/30\/claudebot-and-bytespider-you-are-not-welcome\/","title":{"rendered":"ClaudeBot and Bytespider, you are not welcome"},"content":{"rendered":"\n<p>Yesterday I received an email saying that my site was down. I checked my server and it was indeed down, and the server was having abnormally high CPU usage from Apache. There was a resource-intensive website (the <a href=\"https:\/\/gbtt.uk\/\">National Rail timetable<\/a>) hosted on the machine but I didn&#8217;t expect high usage.<\/p>\n\n\n\n<p>I tried disabling the resource-intensive website and the other sites on the host became responsive again, but once I re-enabled it, the whole host went unresponsive again, with a large number of requests flooding the website, asking for many past timetables.<\/p>\n\n\n\n<p>I installed the QoS module and it soon showed that the number of connections to my server had reached the hundreds. I then set low limits (a maximum of 5 connections from a single IP, and a maximum of 32 concurrent requests for the timetable site) but that didn&#8217;t do much to make the server responsive again.<\/p>\n\n\n\n<p>After checking the server log further, I found that the majority of requests came from ClaudeBot and Bytespider, with a minority from other bots as well. These two bots are well known to be bad bots, as a web search shows, so the best thing I could do was to block them.<\/p>\n\n\n\n<p>After I blocked them in the Apache config, everything became responsive again. 
I will not hesitate to block bots again in the future if they are known to be bad.<\/p>\n\n\n\n<p>For those who are interested, the configuration I added is as follows (if used in .htaccess, the Directory container must be omitted, as it is not allowed there &#8211; adjust it as required for the scope of blocking):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Deny bad bots\nBrowserMatchNoCase \"Bytespider\" bad_bot\nBrowserMatchNoCase \"ClaudeBot\" bad_bot\n&lt;Directory \/&gt;\n    Order Deny,Allow\n    Deny from env=bad_bot\n&lt;\/Directory&gt;<\/code><\/pre>\n\n\n\n<p>P.S. The following table shows the number of requests to the timetable website from various bots yesterday:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td>Bot<\/td><td>Total requests in 1 day<\/td><\/tr><tr><td>ClaudeBot<\/td><td>189965<\/td><\/tr><tr><td>Bytespider<\/td><td>18470<\/td><\/tr><tr><td>Amazonbot<\/td><td>14082<\/td><\/tr><tr><td>MJ12bot<\/td><td>4717<\/td><\/tr><tr><td>SemrushBot<\/td><td>3967<\/td><\/tr><tr><td>bingbot<\/td><td>3588<\/td><\/tr><tr><td>Googlebot<\/td><td>1567<\/td><\/tr><tr><td>PetalBot<\/td><td>454<\/td><\/tr><tr><td>DotBot<\/td><td>85<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday I received an email saying that my site was down. I checked my server and it was indeed down, and the server was having abnormally high CPU usage from Apache. 
There was a resource-intensive website (the National Rail timetable) hosted on&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10],"tags":[],"class_list":["post-287","post","type-post","status-publish","format-standard","hentry","category-network"],"_links":{"self":[{"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/posts\/287","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/comments?post=287"}],"version-history":[{"count":3,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/posts\/287\/revisions"}],"predecessor-version":[{"id":292,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/posts\/287\/revisions\/292"}],"wp:attachment":[{"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/media?parent=287"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/categories?post=287"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miklcct.com\/wordpress\/wp-json\/wp\/v2\/tags?post=287"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}