Got a persistent scraper? Here’s how to deal with them, permanently!

I received an email from one of my favorite Wordfence users who runs a popular blog and creates really great original content. She has a problem with a persistent scraper who comes in and steals her content within minutes of her publishing a new article.

Here’s how I’m suggesting she deal with the issue: [You must be running WordPress with Wordfence for this to work]

  • Don’t block by a single IP address, because the scraper can change IP addresses fairly easily.
  • Don’t block their country. Even though Wordfence has this feature, it’s no use if the scraper is in the same country as your readers.
  • Instead, watch your live traffic and note the visitor’s IP address.
  • Then click the “Block this Network” option that appears under their visit.
  • This will automatically run a “Whois” on the IP address and Wordfence will tell you which network the scraper is on.
  • In the Whois data you’ll see red links that show the network range and how many IP addresses are in that network. Click one of the red links to start blocking that network.
  • Click it and you’ll be taken to Advanced Blocking. [Make sure you don’t see a message at the top that says “Firewall is off”. If you do, go to your options and turn your firewall on.]
  • Now you’re ready to block their network. The range will already be filled in and you’ll see green text that tells you how many IP addresses are in the address range you’re going to block.
  • Now hit “Block visitors matching this pattern” and that’s it! You’ve blocked the scraper’s network. Now even if they change their IP address, they will be blocked as long as they’re coming from the same corporate network, or from the same home or business network provider.
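
To see why blocking a network range holds up better than blocking a single IP address, here is a minimal Python sketch of the idea. It is not Wordfence’s code. The function names are hypothetical, the “start - end” text format is an assumption about how a Whois range might be written, and the addresses come from ranges reserved for documentation.

```python
import ipaddress


def parse_range(inetnum: str):
    """Split a 'start - end' range string (assumed format) into two addresses."""
    start_text, end_text = (part.strip() for part in inetnum.split("-"))
    start = ipaddress.ip_address(start_text)
    end = ipaddress.ip_address(end_text)
    if start.version != end.version or start > end:
        raise ValueError(f"not a valid address range: {inetnum!r}")
    return start, end


def range_size(start, end) -> int:
    """Count how many addresses the range covers, both ends included."""
    return int(end) - int(start) + 1


def is_blocked(visitor_ip: str, start, end) -> bool:
    """Return True when the visitor's address falls inside the blocked range."""
    addr = ipaddress.ip_address(visitor_ip)
    return addr.version == start.version and start <= addr <= end


if __name__ == "__main__":
    # Documentation-only example range, not a real scraper's network.
    start, end = parse_range("203.0.113.0 - 203.0.113.255")
    print(range_size(start, end))                   # addresses covered by the block
    print(is_blocked("203.0.113.42", start, end))   # a different IP on the same network
    print(is_blocked("198.51.100.7", start, end))   # an IP on some other network
```

A scraper who moves from one address to another inside the same range still matches the block, which is the whole point of blocking by network.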

Using this method gives you a much better shot at blocking scrapers. If they’re using a provider like Comcast at home, they may decide instead to jump onto their cellphone and get your content that way, so keep a good lookout and block any other networks they appear to come from.

You can also use Advanced Blocking to block browser patterns combined with networks. So you could enter something like “*Gecko/20100101 Firefox/23.0*” without quotes. Note the asterisks on either side of that browser pattern. You can combine this with a network.
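
The asterisks matter because a browser’s full user-agent string contains more text than the fragment you type in. The Python sketch below illustrates that kind of wildcard matching. It assumes that an asterisk stands for “any run of characters” and that matching is case-sensitive. Both are assumptions made for illustration, not a description of how Wordfence implements it, and the function name is hypothetical.

```python
import re


def matches_browser_pattern(user_agent: str, pattern: str) -> bool:
    """Match a user-agent string against a pattern where '*' means any text."""
    # Escape everything literally, then let each '*' stand for any run of characters.
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    regex = re.compile(".*".join(literal_parts), re.DOTALL)
    # fullmatch: the pattern has to account for the entire user-agent string.
    return regex.fullmatch(user_agent) is not None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"
    print(matches_browser_pattern(ua, "*Gecko/20100101 Firefox/23.0*"))  # with asterisks
    print(matches_browser_pattern(ua, "Gecko/20100101 Firefox/23.0"))    # without asterisks
```

Without the leading asterisk, the pattern would have to match from the very start of the user-agent string, so the “Mozilla/5.0 ...” prefix would keep it from matching.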

Using IP blocking, country blocking and browser pattern blocking, combined with network blocking and the Whois facility that Wordfence provides, you can block someone very effectively.

Remember that you own the content that you create, so if you’d like to bring a case of copyright infringement against an individual, you can use the Wordfence “Whois” facility to find out who a particular visitor’s network provider is. Then you can work with your legal team to subpoena data from that provider demonstrating that the individual was using a certain IP address and taking certain actions on your site. In general I prefer a less litigious approach to resolving conflict, but you should know you have that data at your disposal.

Happy blogging!!

Mark Maunder – Wordfence creator and Feedjit Inc. CEO.

Comments

4 Comments
  • Hi Mark,

    Thanks for the heads up and details of how to deal with these people.

    I know a couple of people that'll be interested in reading this post as I read about them having this exact issue just this week.

    Excellent tutorial, thanks again Mark,
    Barry

  • Mark - I've used your technique above to try to block the networks of several persistent and pernicious scrapers/impersonators but all I get is the msg:
    Sorry, but no data for that IP or domain was found.
    Obviously, the offending scrapers/impersonators are smart enough to not be traceable. How do you handle these?

    • You can manually block the range of IPs you're seeing the hacker/scraper originate from.

      Regards,

      Mark.

  • When it runs the WHOIS search automatically, there are 3 red links:
    % Information related to
    % Abuse contact for
    inetnum

    The IP range is the same. Does it matter which red link we click? Can you elaborate on that in your article, simply telling us which one to click or that it doesn't matter?