Rate Limiting lets you limit how many pages visitors and automated crawlers can access on your website per minute. If they exceed the limits you’ve specified, their access is temporarily revoked and they receive a message saying that their access to your site has been temporarily limited and that they should try again in a few minutes.
Enable Rate Limiting and Blocking
This option lets you enable or disable the rate limiting and blocking features. If this option is OFF, all blocking, including the options below, is disabled.
Immediately block fake Google crawlers:
If you are having a problem with people stealing your content and pretending to be Google as they crawl your site, you can enable this option, which will immediately block anyone pretending to be Google.
This option works by examining the visitor’s User-Agent HTTP header, which indicates which browser or crawler the visitor claims to be running. If it appears to be Googlebot, we do a reverse DNS lookup on the visitor’s IP address to verify that the IP actually belongs to Google. If the IP is not a Google IP and you have this option enabled, we block the visitor.
Be careful with this option, because we have had reports of it blocking real site visitors, especially (for some reason) legitimate visitors from Brazil. It’s possible, although we haven’t confirmed this, that some Internet service providers in Brazil use transparent proxies that replace their customers’ user-agent headers with a Googlebot header. It may also be that these providers run their own crawling activity, pretending to be Googlebot, from the same public IP addresses their customers use. Whatever the cause, the result is that if you enable this option you may block some real visitors.
How should we treat Google’s crawlers
Google crawlers are special. Usually you want Google’s crawlers to visit and index your site without interruption, and you want to ensure that they have unlimited access to your site. So we have created this option so that you can ensure that Google is treated differently and given greater access than normal site visitors.
Verified Google crawlers have unlimited access to this site
If you would like to use a strict setting, you can set this to only give verified Google crawlers unlimited access to the site. This uses a reverse DNS lookup to verify that a visitor claiming to be a Google crawler is actually who they say they are. If a visitor arrives pretending to be Google by faking a Googlebot header, they won’t have unlimited access because they will fail the reverse lookup (PTR) test.
Anyone claiming to be Google has unlimited access
This option gives unlimited access to any visitor that has a Googlebot User-Agent header identifying them as a Google crawler. This will ensure that Google is never rate-limited on your website and can consume as much content as it likes. However, if a visitor claims to be Google by changing their user-agent header to emulate Googlebot, they will also have unlimited access.
Treat Google like any other Crawler
We do not recommend this option unless you have very loose (not strict) settings in your rate limiting. If you treat Google like any other crawler and you are limiting the number of requests per minute to a low number, you may temporarily block Google from crawling your site. Note that the default HTTP response when someone is blocked is a 503 Service Unavailable response. So if you do accidentally block Googlebot, you are telling it to “come back later” rather than “go away permanently,” so the damage to your SEO is not as great as it might otherwise be.
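For illustration, a 503 response that tells a crawler to come back later looks roughly like the sketch below. This is a hypothetical Python example, not Wordfence’s code, and the 300-second Retry-After value is an arbitrary choice for the example:

```python
def rate_limited_response(retry_after_seconds=300):
    """Build a minimal HTTP 503 response telling the client to come back later."""
    return (
        "HTTP/1.1 503 Service Unavailable\r\n"
        f"Retry-After: {retry_after_seconds}\r\n"
        "Content-Type: text/html\r\n"
        "\r\n"
        "<p>Your access to this site has been temporarily limited. "
        "Please try again in a few minutes.</p>"
    )
```

The Retry-After header is what makes 503 a polite “slow down” rather than a permanent rejection: well-behaved crawlers wait and retry.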
If anyone’s requests exceed…
This is a global limit on all requests. If anyone breaks this limit, they will receive the Wordfence HTTP 503 “Service Unavailable” response with a user-friendly explanation. If you have given Googlebot special treatment using the options above, then this limit does not apply to Googlebot. In general, 240 per minute is a good global requests-per-minute setting, which allows even fast (but friendly) crawlers to access your site without overloading it. That is 4 requests per second, which crawlers like Bing can easily generate. If they try to crawl your site faster than that, they will receive an HTTP 503 response, which has the effect of telling them to slow down. Use the “throttle” option in most cases, which will rate limit rather than block crawlers and visitors.
If you see the message “Very strict. May cause false positives.” then your settings are more strict than we recommend. Strict settings may cause legitimate users to get blocked, depending on how the site is built and whether plugins or themes cause additional requests for a single pageview. This message applies to each type of rate limiting listed here.
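A per-IP, per-minute limit of the kind described above can be sketched with a sliding one-minute window. This is a simplified hypothetical illustration in Python, not Wordfence’s implementation; the names and structure are our own:

```python
import time
from collections import defaultdict, deque

LIMIT = 240          # requests per minute, the suggested global setting
WINDOW = 60.0        # window length in seconds

_hits = defaultdict(deque)   # per-IP timestamps of recent requests

def allow_request(ip, now=None):
    """Return True if the request is allowed, False if it should get a 503."""
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    while q and now - q[0] > WINDOW:     # drop hits older than one minute
        q.popleft()
    if len(q) >= LIMIT:
        return False                     # over the limit: respond with a 503
    q.append(now)
    return True
```

With the “throttle” action, a visitor who keeps requesting simply keeps receiving 503s until their rate falls back under the limit; with “block,” the first violation triggers a timed block instead.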
If a crawler’s page views exceed…
If we detect that a visitor is not a human but a bot crawling your site, then this limit will apply. This is very useful for limiting the amount of traffic robots can generate on your website. However, some good robots tend to crawl your site quickly, so 240 per minute is a good setting unless you’re having a problem with robots overloading your site. Use the “throttle” option in most cases, which will simply rate limit crawlers.
If a crawler’s pages not found (404s) exceed…
If your site is well-configured and designed, then you can set this as low as 30 per minute, or even 15 per minute. If a crawler is generating many page-not-found errors on a well-configured website, then they are usually up to no good. For example, they may be scanning your website for vulnerabilities, and you may want to block them. Setting this to 30 per minute and using the “block” rather than “throttle” option is an effective way to block IP addresses that are scanning your site for vulnerabilities.
However, please read the following caveat: If your site is NOT well designed or configured, then it may, during the normal course of operations, generate many page not found (404) errors. For example, if your web pages reference many images that don’t exist, then each pageview will generate a number of 404 errors. Those 404s can cause Wordfence to block a crawler that exceeds the limit you’ve set. So before setting this to a low number and setting the action to “block,” make very sure that you don’t get a lot of page not found (404) errors on your site during normal operations. One way to check is to open your browser’s developer console, which typically displays 404 errors in red.
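Counting 404s per IP within a one-minute window, and choosing an action once the limit is exceeded, might look like this simplified sketch (hypothetical Python, not Wordfence’s PHP implementation; names are our own):

```python
from collections import defaultdict

NOT_FOUND_LIMIT = 30        # 404s per minute, per the suggestion above
ACTION = "block"            # or "throttle"

_counts = defaultdict(int)  # per-IP 404 count within the current minute

def record_404(ip):
    """Count a 404 for this IP; return the action to take, if any."""
    _counts[ip] += 1
    if _counts[ip] > NOT_FOUND_LIMIT:
        return ACTION       # "block": start a timed block; "throttle": send a 503
    return None

def reset_window():
    """Called once a minute to start a fresh counting window."""
    _counts.clear()
```

The key design point from the text: this counter only makes sense if legitimate pageviews on your site generate few or no 404s, otherwise normal visitors will trip it.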
If a human’s page views exceed…
If we detect a visitor is human, then this limit will apply. What you set this limit to depends on your website. In general, we recommend you keep this high, especially if you are using AJAX on your website. 240 per minute is a healthy setting, unless you have many static pages with no AJAX and are sure that the normal traffic pattern that humans generate on your site is much lower.
If a human’s pages not found (404s) exceed…
If your site is well-configured and well-designed, then you can set this as low as 30 per minute, or even 15 per minute.
However, please read the following caveat: If your site is NOT well designed or configured, then it may, during the normal course of operations, generate many page not found (404) errors. For example, if your web pages reference many images that don’t exist, then each pageview will generate a number of 404 errors. Those 404s can cause Wordfence to block a visitor who exceeds the limit you’ve set. So before setting this to a low number and setting the action to “block,” make very sure that you don’t get a lot of page not found (404) errors on your site during normal operations. One way to check is to open your browser’s developer console, which typically displays 404 errors in red.
How long is an IP address blocked when it breaks a rule
Remember: there are two different actions you can choose from when someone breaks a firewall rule.
You can “block” them, which immediately removes access from the site for a predetermined amount of time, defined by this setting: “How long is an IP address blocked when it breaks a rule.”
Or, you can “throttle” them, which means that their site access will be temporarily blocked until they reduce their request frequency to below the limit you have set.
The option “How long is an IP address blocked when it breaks a rule” controls how long an IP is blocked if you have set the action to “block.” We use a duration of between 5 minutes and one hour on our own production websites. This is enough time to limit the malicious activity an IP can engage in. The duration you use is entirely up to you. If you would like to be very aggressive, you can set the duration to 24 hours or longer, but it is important to note that IP addresses are dynamically assigned on the Internet. If you block a certain IP address, the person using it may switch to a different IP in a day or two, and a new user who is not engaging in malicious activity may then be assigned the IP you blocked. If you have set this duration very long, that innocent user may be prevented from accessing your site.
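A timed block list of the kind described above can be sketched as follows. This is a hypothetical Python illustration, not Wordfence’s code; the 5-minute duration is simply the low end of the range mentioned in the text:

```python
BLOCK_SECONDS = 300        # "How long is an IP address blocked" (5 minutes here)

_blocked_until = {}        # IP -> time at which the block expires

def block_ip(ip, now):
    """Start (or restart) a timed block for this IP."""
    _blocked_until[ip] = now + BLOCK_SECONDS

def is_blocked(ip, now):
    """True while the IP's block is still in effect."""
    expiry = _blocked_until.get(ip)
    if expiry is None:
        return False
    if now >= expiry:               # block has expired; forget the IP
        del _blocked_until[ip]
        return False
    return True
```

Expiring blocks automatically is what limits the collateral damage the text warns about: once the duration passes, whoever holds that dynamically assigned IP next can reach the site again.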
Whitelisted 404 URLs
URLs in this whitelist that generate a 404 Not Found response will not be counted against the throttling rules that are used to limit crawlers. The default list includes site icons like “favicon.ico” or “apple-touch-icon.png” and retina versions of images following the pattern “/*@2x.png”, which some browsers request even if they are not listed in your site’s HTML. Enter one URL per line. URLs should begin with a slash, such as “/favicon.ico”.
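Matching a 404 URL against whitelist entries that may contain wildcards, such as “/*@2x.png”, can be sketched with Python’s fnmatch (a hypothetical illustration of glob-style matching, not Wordfence’s actual matching code):

```python
from fnmatch import fnmatch

# The default entries mentioned above; "/*@2x.png" is a wildcard pattern.
WHITELISTED_404S = ["/favicon.ico", "/apple-touch-icon.png", "/*@2x.png"]

def is_whitelisted_404(url_path):
    """True if a 404 for this path should not count against the crawler limits."""
    return any(fnmatch(url_path, pattern) for pattern in WHITELISTED_404S)
```

A whitelisted 404 is still returned to the browser as a 404; it simply isn’t added to the per-IP counts that the throttling rules check.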