Rate Limiting allows you to limit how many pages visitors and automated crawlers can access on your website per minute. Visitors who exceed the limits you have specified will temporarily have their access revoked and will receive a message saying that their access to your site has been temporarily limited and that they should try again in a few minutes.
Enable Rate Limiting and Blocking
This option lets you enable or disable the rate limiting and blocking features. If this option is OFF, all blocking, including the options below, is disabled.
How should we treat Google’s crawlers
Google crawlers are special. Usually, you want Google’s crawlers to visit and index your site without interruption and to have unlimited access to it. This option lets you ensure that Google is treated differently and given greater access than normal site visitors.
Verified Google crawlers have unlimited access to this site
If you would like to use a strict setting, you can set this to only give verified Google crawlers unlimited access to the site. This uses a reverse DNS lookup to verify that a visitor claiming to be a Google crawler is actually who they say they are. If a visitor arrives pretending to be Google by faking a Googlebot header, they won’t have unlimited access because they will fail the reverse lookup (PTR) test.
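The forward-confirmed reverse DNS check described above can be sketched as follows. This is an illustrative Python sketch, not Wordfence’s actual implementation; the function name and the injectable lookup parameters are assumptions made here so the logic is easy to test without live DNS.

```python
import socket

# Domains Google publishes for its crawlers' reverse-DNS hostnames.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip,
                          reverse_lookup=lambda ip: socket.gethostbyaddr(ip)[0],
                          forward_lookup=lambda host: socket.gethostbyname(host)):
    """Forward-confirmed reverse DNS: the IP's PTR record must point to a
    Google-owned hostname, and that hostname must resolve back to the IP."""
    try:
        host = reverse_lookup(ip)            # PTR lookup: IP -> hostname
    except OSError:
        return False
    if not host.endswith(GOOGLE_DOMAINS):    # hostname must be Google-owned
        return False
    try:
        return forward_lookup(host) == ip    # forward lookup must round-trip
    except OSError:
        return False
```

A spoofed Googlebot User-Agent fails this check because the visitor’s IP resolves (if at all) to a hostname outside Google’s domains.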
Anyone claiming to be Google has unlimited access
This option gives unlimited access to any visitor that has a Googlebot User-Agent header identifying them as a Google crawler. This will ensure that Google is never rate-limited on your website and can consume as much content as it likes. However, if a visitor claims to be Google by changing their User-Agent header to emulate Googlebot, they will also have unlimited access.
Treat Google like any other Crawler
We do not recommend this option unless your rate limiting settings are very loose (not strict). If you treat Google like any other crawler and you limit the number of requests per minute to a low number, you may temporarily block Google from crawling your site. Note that when someone is blocked, the default response is an HTTP 503 “Service Unavailable” status code. So if you do accidentally block Googlebot, you are telling it to “come back later” rather than “go away permanently,” and the damage to your search engine optimization is not as great as it might otherwise be.
If anyone’s requests exceed…
This is a global limit on all requests. If anyone breaks this limit, they will receive Wordfence’s HTTP 503 “Service Unavailable” response with a user-friendly explanation. If you have given Googlebot special treatment using the options above, this limit does not apply to Googlebot. In general, 240 requests per minute is a good global setting: it allows even fast (but friendly) crawlers to access your site without overloading it. That is 4 requests per second, a rate that crawlers like Bing can easily generate. If they try to crawl your site faster than that, they will receive an HTTP 503 response, which has the effect of telling them to slow down. Use the “throttle” option in most cases, which rate-limits crawlers and visitors rather than blocking them.
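A per-minute limit like this can be pictured as a sliding one-minute window of request timestamps per IP. The sketch below is a hypothetical Python illustration of that idea (the class and method names are assumptions), not how Wordfence stores its counters:

```python
import time
from collections import defaultdict, deque

class MinuteRateLimiter:
    """Sliding one-minute window per client IP (illustrative sketch)."""

    def __init__(self, limit_per_minute=240):
        self.limit = limit_per_minute
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if the request is within the limit; False means the
        caller should respond with HTTP 503."""
        now = time.monotonic() if now is None else now
        window = self.hits[ip]
        while window and now - window[0] >= 60.0:   # drop hits older than 60s
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

With the default of 240, a client averaging more than 4 requests per second over any one-minute span starts receiving refusals until its rate drops, which is the “throttle” behavior described above.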
If you see the message “Very strict. May cause false positives” then your settings are more strict than we recommend. Strict settings may cause legitimate users to get blocked, depending on how the site is built and whether plugins or themes cause additional requests for a single page view. This message applies to each type of rate limiting listed here.
If a crawler’s page views exceed…
If we detect that a visitor is not a human but a bot crawling your site, this limit will apply. It is very useful for limiting the amount of traffic robots can generate on your site. However, some good robots tend to crawl your site quickly, so 240 per minute is a good setting unless robots are overloading your site. Use the “throttle” option in most cases, which will simply rate-limit crawlers.
If a crawler’s pages not found (404s) exceed…
If your site is well-configured and well-designed, then you can set this as low as 30 per minute, or even 15 per minute. If a crawler is generating many resource “not found” (404 response status code) errors on a well-configured website, then it is usually not a friendly crawler. For example, it may be scanning your site for vulnerabilities, and you may want to block it. Setting this to 30 per minute and using the “block” rather than “throttle” option is an effective way to block IP addresses that are scanning your site for vulnerabilities.
However, if your site is not well designed or configured, it may experience many resource not found errors during the normal course of operations. For example, if your pages include many images that do not exist, they will generate a lot of 404 errors on your site. Those 404 errors can cause Wordfence to block the crawler that is crawling a page if they exceed the limit you have set. So before setting this to a low number and setting the action to “block”, make very sure that your site does not generate a lot of resource not found (404) errors during normal operations. One way to check is to look at your browser’s developer console, which typically displays 404 errors on a page in red.
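Another way to check is to tally 404 responses per IP per minute from your server’s access log before enabling the “block” action. The Python sketch below assumes the common combined log format; the regex and function name are illustrative assumptions, and your server’s log layout may differ.

```python
import re
from collections import Counter

# Simplified pattern for a combined-format access-log line:
# IP, identity, user, [timestamp], "request", status, size...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')

def count_404s_per_minute(log_lines):
    """Tally 404 responses per (IP, minute) so you can see whether normal
    traffic would trip a low 404 limit before choosing the 'block' action."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(3) == "404":
            ip, timestamp = m.group(1), m.group(2)
            minute = timestamp[:17]   # e.g. "10/Oct/2024:13:55"
            counts[(ip, minute)] += 1
    return counts
```

If ordinary visitors regularly show counts near your intended limit, raise the limit or fix the missing resources before switching from “throttle” to “block”.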
If a human’s page views exceed…
If we detect a visitor is human, then this limit will apply. What you set this limit to depends on your website. In general, we recommend you keep this high, especially if you are using AJAX on your website. 240 per minute is a healthy setting, unless you have many static pages with no AJAX and are sure that the normal traffic pattern that humans generate on your site is much lower.
If a human’s pages not found (404s) exceed…
If your site is well-configured and well-designed, then you can set this as low as 30 per minute, or even 15 per minute.
However, if your site is not well designed or configured, it may experience many resource not found (404) errors during the normal course of operations. For example, if your pages include many images that do not exist, they will generate a lot of 404 errors on your site. Those 404s can cause Wordfence to block the visitor who is viewing a page if they exceed the limit you have set. So before setting this to a low number and setting the action to “block”, make very sure that your site does not generate a lot of resource not found (404) errors during normal operations. One way to check is to look at your browser’s developer console, which typically displays 404 errors on a page in red.
How long is an IP address blocked when it breaks a rule
Remember that there are two different actions you can choose from when someone breaks a firewall rule.
You can “block” them, which immediately revokes their access to the site for a predetermined amount of time, defined by the setting “How long is an IP address blocked when it breaks a rule”.
Or, you can “throttle” them, which means that their site access will be temporarily blocked until they reduce their request frequency to below the limit that you have set.
The option “How long is an IP address blocked when it breaks a rule” controls how long an IP is blocked if you have set the action to “block”. We use a duration of between 5 minutes and one hour on our own production sites, which is enough time to limit the malicious activity an IP can engage in. The duration you set is entirely up to you. If you would like to be very aggressive, you can set the duration to 24 hours or longer, but it is important to note that IP addresses on the internet are dynamically assigned. If you block someone using a certain IP address, they may switch to a different IP address in a day or two, and a new user who is not engaging in malicious activity but who is now assigned the IP you blocked may be prevented from accessing your site if you have set this duration very long.
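The expiring-block behavior can be sketched as a small table of IP addresses with expiry timestamps. This is a minimal illustrative Python sketch, assuming a 5-minute default; the class name and structure are not Wordfence’s internals.

```python
import time

class BlockList:
    """Temporary IP block list; entries expire after block_seconds,
    mirroring the 'How long is an IP address blocked' setting (sketch)."""

    def __init__(self, block_seconds=300):      # 5 minutes by default
        self.block_seconds = block_seconds
        self.blocked_until = {}                 # ip -> expiry timestamp

    def block(self, ip, now=None):
        now = time.time() if now is None else now
        self.blocked_until[ip] = now + self.block_seconds

    def is_blocked(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self.blocked_until.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self.blocked_until[ip]          # expired; forget the IP
            return False
        return True
```

Because entries expire and are forgotten, a legitimate user who later inherits a previously blocked dynamic IP regains access once the duration elapses, which is why shorter durations are safer.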
Allowlisted 404 URLs
URLs in this allowlist that generate a 404 Not Found response status code will not be counted against the throttling rules used to limit crawlers. The default list includes site icons like “favicon.ico” or “apple-touch-icon.png” and retina versions of images matching the pattern “/*@2x.png”, which some browsers request even if they are not listed in your site’s HTML. Enter one URL per line. URLs should begin with a slash, such as “/favicon.ico”.
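Wildcard entries like “/*@2x.png” behave much like shell-style glob patterns. The Python sketch below illustrates that style of matching using the standard-library fnmatch module; Wordfence’s actual matcher may differ, and the function name and example pattern list are assumptions.

```python
from fnmatch import fnmatch

# Example allowlist in the style of the defaults; "*" matches any
# run of characters in the URL path.
ALLOWLISTED_404S = ["/favicon.ico", "/apple-touch-icon*.png", "/*@2x.png"]

def is_allowlisted_404(path, patterns=ALLOWLISTED_404S):
    """Return True if a 404 for this URL path should not count
    toward the crawler 404 limits."""
    return any(fnmatch(path, pattern) for pattern in patterns)
```

So a browser requesting a nonexistent “/logo@2x.png” would not push a visitor toward the 404 limit, while a probe for “/wp-config.php.bak” still counts.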