Research: The Top 100 WordPress Page Not Found Errors and What They Mean

Rationale

Analyzing requests for non-existent pages across a broad range of websites can reveal common website misconfigurations, vulnerability scans by malicious and non-malicious actors, and other interesting behavior such as undocumented crawler activity.

Methodology 

We monitor page not found errors (also called 404 errors) across WordPress.org websites that choose to participate in the Wordfence Security Network (WfSN). The WfSN automatically blocks IPs known to be currently engaged in malicious activity, such as vulnerability scanning or brute force hacking, from accessing WordPress sites protected by Wordfence.

You can see a visual representation of some of the data we aggregate in real-time on our home page.

To facilitate this project, we temporarily aggregated anonymous data on all page not found errors across all sites in the WfSN over 5 days within the past week. This included:

  • 26,140,811 page not found reports.
  • Comprising 13,549,207 unique URLs.
  • Aggregated from across approximately 30,000 unique WordPress sites. This is a rough estimate because we did not store site identifiers.

We then sorted and grouped the results by the most commonly occurring URLs, and we removed any URLs that had any likelihood of identifying a source website. The results were surprising.
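The grouping and sorting step can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code we ran: the function name `top_not_found` and the sample paths are hypothetical, and the real aggregation also stripped any URL that could identify a site.

```python
from collections import Counter


def top_not_found(paths, n=100):
    """Return the n most frequently requested missing paths with their counts."""
    return Counter(paths).most_common(n)


# Hypothetical sample of 404 request paths (not real WfSN data).
sample = [
    "/apple-touch-icon.png",
    "/apple-touch-icon.png",
    "/phpMyAdmin/scripts/setup.php",
    "/apple-touch-icon.png",
    "/HNAP1/",
    "/phpMyAdmin/scripts/setup.php",
]

# Print the three most common paths, most frequent first.
for path, count in top_not_found(sample, n=3):
    print(f"{count:>3}  {path}")
```

Feeding the function the request paths from your own 404 log should give a ranked list comparable in shape to the spreadsheet described below.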

The Results

You can find the full top 100 page-not-found URLs for WordPress on this Google Docs public spreadsheet. We have marked the interesting URLs in red. All entries have been annotated to describe what they are.

The pie chart below gives a graphical representation of the relative frequency of each URL within the top 100. This gives you a sense of how dominant the top URLs are.

[Figure: pie chart of the relative frequency of each URL within the top 100]

We hypothesize that the majority of the security scans listed in the spreadsheet and below do not come from security auditing tools. A tool that scans for multiple vulnerabilities would produce several different scan URLs with roughly equal, and equally high, numbers of 404 occurrences, and we do not see that pattern in the data.

The highlights we found are:

  • By far the most common 404 error is generated when an Apple or Android device visits a page and tries to find a bookmark icon. The most common 404 generated when this happens is /apple-touch-icon.png, although there are several variants, and in our annotated spreadsheet we have noted what each variant is. If you run a WordPress site, create an /apple-touch-icon.png file containing an icon that mobile devices can use to display a bookmark for your site on the device home screen and elsewhere. Place it in your root website folder. More info here.
  • By far the most common security-related page not found error is generated by malicious bots scanning several different URLs for a file upload vulnerability in the CKEditor text editor. We saw a total of 45,841 vulnerability scans using 5 URL variants in the top 100. Summed together, the 5 URLs make this vulnerability scan the 7th most commonly occurring page not found error for WordPress sites.
  • Surprisingly, in second place when considering scans by malicious actors (bad bots, human hackers and the like) for a specific vulnerability was a scan for the “Geo Places” commercial theme. Version 4.x of this theme contains a shell upload vulnerability. We saw over 14,000 page-not-found errors generated as bad bots were scanning for the existence of the URL: /wp-content/themes/GeoPlaces/monetize/general/ 
  • The /author=1 scan is high on the list. This is caused by malicious bots scanning your site to discover your username. The newest version of Wordfence includes protection against this.
  • The /groups/create 404 is probably malicious scripts looking for a BuddyPress vulnerability.
  • One of the most popular scans is a Joomla vulnerability scan where bad robots are looking for a Joomla writeToFile.php vulnerability, so if you’re running Joomla, make sure it is up to date. The top URL is: /components/com_oziogallery/imagin/scripts_ralcr/filesystem/writeToFile.php
  • We also saw over 5,000 scans for the tell-a-friend WordPress plugin, which are caused by spam bots trying to exploit tell-a-friend to send spam. An example of this plugin being exploited. The URL is: /wp-content/plugins/tell-a-friend/tell-a-friend.php
  • There are also several variants in the list of a scan for the phpMyAdmin setup.php script. If you have this script present and are running a vulnerable version, an attacker can use this to execute arbitrary PHP code on your system, giving them full access. A common URL for this is /phpMyAdmin/scripts/setup.php
  • Another interesting URL is /th1s_1s_a_4o4.html, which Google uses to see how your site responds to 404 errors when it crawls your site.
  • Another interesting page-not-found we saw in the top 100 is for the /HNAP1/ URL which is actually an exploit used on D-Link routers. This allows an attacker to access administrative functions on a home D-Link router and presumably install packet sniffing software to monitor your web browsing. The exploit inadvertently causes 404 errors on websites that target victims are visiting. If you’re a home wifi user (most of us are) make sure your local router is current and running the latest firmware.
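The URLs quoted in the list above can be turned into a rough classifier for your own 404 log. The sketch below is a minimal illustration under stated assumptions: the `SIGNATURES` table and the `classify` function are hypothetical names, the labels are informal summaries of the annotations above, and matching on a case-insensitive path prefix is our simplification rather than a vetted detection rule.

```python
# Path prefixes taken from the URLs quoted in the list above.
# The labels are informal summaries, not official signature names.
SIGNATURES = [
    ("/apple-touch-icon", "mobile bookmark icon request"),
    ("/wp-content/themes/GeoPlaces/", "GeoPlaces theme scan"),
    ("/components/com_oziogallery/", "Joomla writeToFile.php scan"),
    ("/wp-content/plugins/tell-a-friend/", "tell-a-friend spam scan"),
    ("/phpMyAdmin/scripts/setup.php", "phpMyAdmin setup.php scan"),
    ("/th1s_1s_a_4o4.html", "Google 404 probe"),
    ("/HNAP1", "D-Link HNAP probe"),
    ("/groups/create", "possible BuddyPress scan"),
]


def classify(path):
    """Return an informal label for a 404 path, or "unclassified"."""
    lowered = path.lower()
    for prefix, label in SIGNATURES:
        # Assumption: compare case-insensitively so that differently
        # cased variants of the same path are grouped together.
        if lowered.startswith(prefix.lower()):
            return label
    return "unclassified"


print(classify("/apple-touch-icon-precomposed.png"))
print(classify("/HNAP1/"))
print(classify("/some/other/page"))
```

Prefix matching is deliberately loose so that variants such as the different apple-touch-icon file names fall under one label. A production rule set would need to be checked against real traffic for false positives.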

You can check out the full list on this spreadsheet. This data is published under the Creative Commons Attribution license, which means you can republish the list and any part (or all) of this page, but you must credit us by providing a link to this blog post as the source. See below.

We encourage other security researchers to comment on the data we’re gathering and to build detection for these common vulnerability scans into their products to thwart malicious actors. Update: Shortly after we posted this, Bob Rudis, co-author of Data Driven Security, posted a graphical breakdown by category of the data we’ve published. Post your own thoughts and analysis as a comment here.

Creative Commons License
Top 100 Page Not Found errors for WordPress by Wordfence is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at http://www.wordfence.com/blog/2014/05/top-100-page-not-found-errors-for-wordpress/.

Comments

23 Comments
  • Nice work Mark and team.
    Thank you for making this available.
    Should be required reading for WP admins.

    • Thanks Ed! We appreciate the feedback. Stay tuned for more.

      Regards,

      Mark.

  • Thanks for this list. Good info. The ../../../ entry is probably a directory traversal attack signature.

    Thanks
    Rick

    • Hi Rick,

      At first glance it looked like that to me too, but it's actually dashes, not double dots between those slashes.

      Regards,

      Mark.

  • Thanks so much for this! It's such a useful resource, it's clearly explained, and it helped me to understand what's been going on. Much appreciated!

    • Hi Julie,

      Thanks for that feedback - glad to hear it was clear and helpful.

      Regards,

      Mark.

  • I have noted that I get a 404 result for /-/-/-/-/-/-/-/-/-/-/ when I use the plugin "Rename wp-login.php" which "hides" the wp-login page.

    • Very interesting, thanks Ian, we'll investigate further.

      Regards,

      Mark.

      • I was going to say the same thing as Ian did, but he beat me to it. As soon as I installed the “Rename wp-login.php” plugin, I started seeing the "/-/-/-/-/-/-/-/-/-/-/" pattern on the 404 - File Not Found live traffic tab. The reason I think that this was caused by hiding the wp-login page is that I see this pattern both from the domain itself, e.g. http://www.MyDomain.com/-/-/-/-/-/-/-/-/-/-/ and when the bots are trying to register at http://www.MyDomain.com/wp-login.php?action=register/-/-/-/-/-/-/-/-/-/-/.

        As an aside, although many security experts say that they see no value in hiding the WordPress login page, it has made a huge difference in the security and performance of my blog. While I have never had my blog hacked, I was having a problem with MySQL crashing from the brute force attacks. In fact, I spent a couple of weeks repeatedly restarting MySQL before I found and installed the “Rename wp-login.php" plugin. After hiding the wp-login page (which I did four months ago), I no longer have problems with MySQL crashing. Of course, hiding the login page should only be used as one part of a security strategy because, by itself, it would be of little use in preventing a blog from getting hacked.

  • In order to block the "/author=1" scan, I updated to the latest version of Wordfence... but I can still hit my website's URL with "/author=1" appended and results were returned?

    Is there anything special I'm meant to be doing apart from checking off "Prevent discovery of usernames through '?/author=N' scans" in the options?

    Thanks in advance.

    • Hi Mike.

      Yes that's correct. With author=1 protection OFF your visitor will be redirected to a URL like /authors/yourname

      With author=1 protection ON they will instead be redirected to your site's home page.

      Please test this if you don't mind and confirm that's what you're seeing.

      Regards,

      Mark.

      • Hi Mark,

        I'm not able to get this to work. On or off, I get a 404 on /author=1. Am I missing something here?

        Very informative blog btw. Such a good, well supported and described product for WP admins.

        Gary

        • If your blog runs in a subdirectory or not from the root of your server then this would not have worked. We've introduced a fix in 5.0.7 which just went into beta. See the most recent blog post.

          Regards,

          Mark.

          • Thanks Mark. That is exactly my case, running WordPress from within a subdirectory. Works like a charm since upgrading to the latest version of Wordfence. Thanks for the quick response and of course for the great plugin! Appreciate it.

    • I also tested the /?author=1 trick and it did expose the name of that author.

      Prevent discovery of usernames through '?/author=N' scans is checked in my Login Security Options.

      I am using Version 5.0.6 of WordFence Security.

      I am using Customizr theme.

      • On a different web site http://ssgreenberg.name/PoliticsBlog/ that is not using Customizr, the Prevent discovery of usernames through ‘?/author=N’ scans works as you state it should.

      • I found that ?/author=1 is prevented, but /?author=1 is not prevented on the site with the Customizr theme.

        Either way I do it on the other site, it is prevented.

  • Great information, as usual.

    I'm wondering if there's any way to filter out the spam comments before they even hit Akismet. The plug-in is doing a great job of catching this stuff but I still have to go in and delete them.

    • Hi Ken,

      We work with or without Akismet and will do the same thing that Akismet does: We dump the bad comments into your spam folder if we detect that they contain anything malicious.

      Deleting comments is a bit too aggressive I think in the unlikely event there's a false positive.

      Regards,

      Mark.

  • Great work guys and good timing as I noted this black box WP vulnerability scanner today at wpscan.org. The botnet and DDoS'ing potential of a whole load of compromised WordPress sites is huge.