OpenISearch is Even Worse Than SMBot

by Scott Allen

New Scraper Bot, OpenISearch, is Related to SMBot, but Even Worse
A new scraper bot with the User-Agent “OpenISearch/1.x (www.openisearch.com)” started slamming my web sites and my clients’ web sites just a few days ago. It originates from the same IP blocks (216.182.224.*-216.182.239.*) and the same server at Amazon Web Services (compute.amazonaws.com) as SMBot, which is owned and operated by Specific Media. OpenISearch started by hitting the robots.txt file over and over, several hundred times per hour. I have no idea why a bot with legitimate purposes would need to hit robots.txt more than once. (It clearly does not have legitimate purposes.) Then it started hitting various other pages on my sites, repeatedly. The first day, it hit one web site over 750 times!

I went to the URL given in the User-Agent, www.OpenISearch.com, and it’s nothing but a front. It claims to be “The Ultimate Search Engine” that will have “more results than all other search engines combined”. Wow, that’s a HUGE claim… Are they planning to unseat Google, Yahoo, and MSN? Right. And yet none of the links on the page even work; the site claims to be “Coming Soon”.

I can’t say with absolute certainty, but there is a very high probability that OpenISearch is either the little brother of SMBot or its replacement. After I blogged about SMBot, it completely quit hitting my sites, and OpenISearch picked up where it left off, slamming my sites. Here are some interesting things OpenISearch has in common with SMBot:

  1. OpenISearch has the same format for the User-Agent:
    • OpenISearch User-Agent: “OpenISearch/1.x (www.openisearch.com)”
    • SMBot User-Agent: “SMBot/1.1 (www.specificmedia.com)”
  2. OpenISearch and SMBot both come from the same IP blocks (216.182.224.*-216.182.239.*) and server at Amazon Web Services (compute.amazonaws.com).
  3. The two web sites have a very similar design style.
  4. The domains are both registered to “Domains by Proxy”.
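
If you would rather think of the shared IP range as a CIDR block (handy for firewall rules), here is a minimal Python sketch that converts it. It only assumes the range quoted above; the names FIRST, LAST, blocks_for_range and in_reported_range are made up for this illustration.

```python
import ipaddress

# Range quoted in the post: 216.182.224.* through 216.182.239.*
FIRST = ipaddress.IPv4Address("216.182.224.0")
LAST = ipaddress.IPv4Address("216.182.239.255")


def blocks_for_range(first, last):
    """Return the CIDR networks that exactly cover first..last (inclusive)."""
    return list(ipaddress.summarize_address_range(first, last))


def in_reported_range(ip):
    """True if the dotted-quad string ip falls inside the quoted range."""
    return FIRST <= ipaddress.IPv4Address(ip) <= LAST


# Show the CIDR form of the range.
print([str(net) for net in blocks_for_range(FIRST, LAST)])
```

The whole range collapses into one network, 216.182.224.0/20, so a single CIDR rule should cover it.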

The OpenISearch bot is being used to scrape complete web sites, repeatedly. In the last couple of days it hit several of my sites 500-750 times each in a single day! OpenISearch is using this bot to data-mine your site for information that can be used by their advertisers and advertising network. They are making a profit off your hard work! Without your permission, of course. Can someone say Digital Millennium Copyright Act violations? OpenISearch completely disregards the robots.txt standard as well.
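
To check how hard the bot is hitting your own site, you can count its requests per day in your access log. The following is a rough Python sketch, assuming your server writes the Apache "combined" log format. The function name hits_per_day and the sample log lines are invented for illustration and are not taken from my logs.

```python
import re
from collections import Counter

# Assumes the Apache "combined" log format; adjust for other formats.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def hits_per_day(lines, agent_prefix="OpenISearch"):
    """Count requests per day whose User-Agent starts with agent_prefix."""
    counts = Counter()
    prefix = agent_prefix.lower()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("agent").lower().startswith(prefix):
            counts[match.group("day")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE = [
    '216.182.230.17 - - [10/Oct/2007:13:55:36 -0700] "GET /robots.txt HTTP/1.1" '
    '200 120 "-" "OpenISearch/1.x (www.openisearch.com)"',
    '216.182.230.17 - - [10/Oct/2007:13:55:40 -0700] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "OpenISearch/1.x (www.openisearch.com)"',
    '10.0.0.5 - - [10/Oct/2007:13:56:01 -0700] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]

print(hits_per_day(SAMPLE))
```

On a real server you would pass in the lines of your access log file instead of SAMPLE.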

How to Ban OpenISearch and Keep Them from Data-Mining Your Web Site

All you need to do is add a few lines of code to your .htaccess file, and you’ll be blocking the OpenISearch bot in no time.

Place this near the top of your .htaccess file: RewriteEngine on
Then place this somewhere below it:

RewriteCond %{HTTP_USER_AGENT} ^OpenISearch [NC]
RewriteRule .* - [F]

SetEnvIf Remote_Addr ^216\.182\.2(2[4-9]|3[0-9])\. openisearch

<Files *>
Order Allow,Deny
Allow from all
Deny from env=openisearch
Deny from compute.amazonaws.com
</Files>
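
Before deploying the rules, it can help to check that the patterns match what you expect. Here is a small Python sketch that mirrors the two regular-expression checks. It uses a version of the address pattern with the dots escaped so that each one matches only a literal dot, and it assumes Python's re module treats these simple patterns the same way Apache's regex engine does. The function name would_block is hypothetical, and the sketch does not cover the reverse-DNS rule for compute.amazonaws.com.

```python
import re

# Address pattern for 216.182.224.* through 216.182.239.*, dots escaped.
ADDR_RE = re.compile(r"^216\.182\.2(2[4-9]|3[0-9])\.")

# User-Agent pattern; re.IGNORECASE plays the role of Apache's [NC] flag.
AGENT_RE = re.compile(r"^OpenISearch", re.IGNORECASE)


def would_block(remote_addr, user_agent):
    """True if either the address rule or the User-Agent rule matches."""
    return bool(ADDR_RE.search(remote_addr) or AGENT_RE.search(user_agent))


# Quick spot checks: print each case and whether it would be blocked.
for addr, agent in [
    ("216.182.224.5", "Mozilla/5.0"),
    ("216.182.240.5", "Mozilla/5.0"),
    ("10.0.0.1", "OpenISearch/1.x (www.openisearch.com)"),
]:
    print(addr, agent, would_block(addr, agent))
```

Addresses just outside the range, such as 216.182.223.* or 216.182.240.*, should come back unblocked unless the User-Agent matches.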

This .htaccess technique will also work to block other bad visitors to your web site. Feel free to leave comments.
