Research: The Top 100 WordPress Page Not Found Errors and What They Mean
Analyzing requests to non-existent pages across a broad range of websites can yield revealing insights into common website misconfigurations, vulnerability scans by malicious and non-malicious actors, and other interesting behavior such as undocumented crawler activity.
As part of the Wordfence Security Network (WfSN), we monitor page not found errors (also called 404 errors) across WordPress.org websites that choose to participate. On WordPress sites protected by Wordfence, the WfSN automatically blocks IP addresses that are known to be currently engaged in malicious activity such as vulnerability scanning or brute force hacking.
You can see a visual representation of some of the data we aggregate in real-time on our home page.
For this project, we temporarily aggregated anonymous data on all page not found errors across all sites in the WfSN, covering 5 days within the past week. This included:
- 26,140,811 page not found reports.
- 13,549,207 unique URLs among those reports.
- Aggregated from across approximately 30,000 unique WordPress sites. This is a rough estimate because we did not store site identifiers.
We then grouped the results by URL, sorted them by how often each URL occurred, and removed any URLs that had any likelihood of identifying a source website. The results were surprising.
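The grouping and sorting step can be illustrated with a minimal sketch. This is not the code we ran. It assumes the 404 reports are already available as a plain list of requested paths, and the function name `top_not_found` and the sample data are hypothetical. The step that removes URLs which could identify a source website is not shown.

```python
from collections import Counter


def top_not_found(paths, n=100):
    """Return the n most frequently requested paths with their counts.

    `paths` is assumed to be an iterable of URL paths, one entry per
    page not found report.
    """
    counts = Counter(paths)
    # most_common sorts by descending count.
    return counts.most_common(n)


# Hypothetical sample of paths that returned a 404.
sample = [
    "/apple-touch-icon.png",
    "/apple-touch-icon.png",
    "/phpMyAdmin/scripts/setup.php",
    "/apple-touch-icon.png",
    "/HNAP1/",
    "/phpMyAdmin/scripts/setup.php",
]

for path, count in top_not_found(sample, n=3):
    print(count, path)
```

At the scale of millions of reports the same counting would more likely be done in a database or a streaming job, but the logic is the same: count per URL, then rank.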
You can find the full top 100 page-not-found URLs for WordPress on this Google Docs public spreadsheet. We have marked the interesting URLs in red. All entries have been annotated to describe what they are.
The pie chart below gives a graphical representation of the relative frequency of each URL within the top 100. This gives you a sense of how dominant the top URLs are.
We hypothesize that most of the security scans listed in the spreadsheet and below do not come from security auditing tools. An auditing tool probes each site for many vulnerabilities, so we would expect to see several other vulnerability scans with similarly high and roughly equal 404 counts, and we do not.
The highlights we found are:
- By far the most common 404 error is generated when an Apple or Android device visits a page and tries to find a bookmark icon. The most common URL requested when this happens is /apple-touch-icon.png, although there are several variants, and our annotated spreadsheet notes what each variant is. If you run a WordPress site, create an /apple-touch-icon.png file containing an icon that mobile devices can use as the bookmark icon for your site on the device home screen and elsewhere. Place it in your root website folder. More info here.
- By far the most common security-related page not found error comes from malicious bots scanning several different URLs for a file upload vulnerability in the CKEditor text editor. We saw a total of 45,841 vulnerability scans using 5 URL variants in the top 100. Summed across those 5 URLs, this scan ranks as the 7th most common page not found error for WordPress sites.
- Surprisingly, the second most common scan by malicious actors (bad bots, human hackers and the like) for a specific vulnerability targeted the “Geo Places” commercial theme. Version 4.x of this theme contains a shell upload vulnerability. We saw over 14,000 page-not-found errors generated by bad bots scanning for the URL: /wp-content/themes/GeoPlaces/monetize/general/
- The /author=1 scan is high on the list. This is caused by malicious bots scanning your site to discover your username. The newest version of Wordfence includes protection against this.
- The /groups/create 404 is probably caused by malicious scripts looking for a BuddyPress vulnerability.
- One of the most popular scans is a Joomla vulnerability scan in which bad robots look for a writeToFile.php vulnerability, so if you’re running Joomla, make sure it is up to date. The top URL is: /components/com_oziogallery/imagin/scripts_ralcr/filesystem/writeToFile.php
- We also saw over 5,000 scans for the tell-a-friend WordPress plugin, caused by spam bots trying to exploit tell-a-friend to send spam. An example of this plugin being exploited. The URL is: /wp-content/plugins/tell-a-friend/tell-a-friend.php
- The list also contains several variants of a scan for the phpMyAdmin setup.php script. If you have this script present and are running a vulnerable version, an attacker can use it to execute arbitrary PHP code on your system, giving them full access. A common URL for this is /phpMyAdmin/scripts/setup.php
- Another interesting URL is /th1s_1s_a_4o4.html, which Google uses to see how your site responds to 404 errors when it crawls your site.
- Another interesting page-not-found we saw in the top 100 is for the /HNAP1/ URL, which is actually an exploit used on D-Link routers. This allows an attacker to access administrative functions on a home D-Link router and presumably install packet sniffing software to monitor your web browsing. The exploit inadvertently causes 404 errors on websites that targeted victims are visiting. If you’re a home Wi-Fi user (most of us are), make sure your local router is running the latest firmware.
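If you want to spot these patterns in your own 404 logs, a simple substring match against the URLs above is enough to get started. The sketch below is illustrative only and is not how Wordfence detects scans. The `SIGNATURES` table and the `classify` function are hypothetical names, the labels are our own shorthand, and matching is case-insensitive on the assumption that scanners vary the case of paths.

```python
# Hypothetical signature table built from the URLs discussed above.
# Each entry is (lowercase substring to look for, human-readable label).
SIGNATURES = [
    ("apple-touch-icon", "mobile bookmark icon request"),
    ("/wp-content/themes/geoplaces/", "GeoPlaces theme scan"),
    ("/tell-a-friend/tell-a-friend.php", "tell-a-friend spam scan"),
    ("/scripts/setup.php", "phpMyAdmin setup.php scan"),
    ("writetofile.php", "Joomla writeToFile.php scan"),
    ("/hnap1", "D-Link HNAP request"),
    ("/th1s_1s_a_4o4.html", "Google 404 probe"),
    ("/groups/create", "possible BuddyPress scan"),
]


def classify(path):
    """Return a label for a 404 path, or 'unclassified' if nothing matches."""
    lowered = path.lower()
    for needle, label in SIGNATURES:
        if needle in lowered:
            return label
    return "unclassified"


for p in ["/HNAP1/", "/phpMyAdmin/scripts/setup.php", "/about-us"]:
    print(p, "->", classify(p))
```

Substring matching is crude and will produce false positives on sites that legitimately use similar paths, so treat the labels as a starting point for investigation rather than a verdict.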
You can check out the full list on this spreadsheet. This data is published under the Creative Commons Attribution license, which means you can republish the list and any part (or all) of this page, but you must credit us by providing a link to this blog post as the source. See below.
We encourage other security researchers to comment on the data we’re gathering and to build detection for these common vulnerability scans into their products to detect and thwart malicious actors. Update: Shortly after we posted this, Bob Rudis, co-author of Data Driven Security, posted a graphical breakdown by category of the data we’ve published. Post your own thoughts and analysis as a comment here.
Top 100 Page Not Found errors for WordPress by Wordfence is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://www.wordfence.com/blog/2014/05/top-100-page-not-found-errors-for-wordpress/.