OpenISearch is Even Worse Than SMBot
by Scott Allen
New Scraper Bot, OpenISearch, is Related to SMBot, but Even Worse
A new scraper bot with the User-Agent “OpenISearch/1.x (www.openisearch.com)” started slamming my web sites and my clients’ web sites just a few days ago. It originates from the same IP blocks (216.182.224.*-216.182.239.*) and the same server at Amazon Web Services (compute.amazonaws.com) as SMBot, which is owned and operated by Specific Media. OpenISearch started by hitting the robots.txt file over and over, several hundred times per hour. I have no idea why a bot with legitimate purposes would need to hit robots.txt more than once. (It clearly does not have legitimate purposes.) Then it started hitting various other pages on my sites, repeatedly. The first day, it hit one web site over 750 times!
I went to the URL given in the User-Agent, www.OpenISearch.com, and it’s nothing but a front. It claims to be “The Ultimate Search Engine” that will have “more results than all other search engines combined”. Wow, that’s a HUGE claim… Are they planning to unseat Google, Yahoo, and MSN? Right. And yet none of the links on the page even work, and the site says it is “Coming Soon”.
I can’t say with absolute certainty, but there is a very high probability that OpenISearch is either the little brother of SMBot or its replacement. Since I blogged about SMBot, it has completely quit hitting my sites, and OpenISearch picked up where it left off, slamming my sites. Here are some interesting things OpenISearch has in common with SMBot:
- OpenISearch has the same format for the User-Agent:
  - OpenISearch User-Agent: “OpenISearch/1.x (www.openisearch.com)”
  - SMBot User-Agent: “SMBot/1.1 (www.specificmedia.com)”
- OpenISearch and SMBot both come from the same IP blocks (216.182.224.*-216.182.239.*) and server at Amazon Web Services (compute.amazonaws.com).
- The two web sites have a very similar design style.
- The domains are both registered to “Domains by Proxy”.
The OpenISearch bot is being used to scrape complete web sites, repeatedly. In the last couple of days it hit several of my sites 500-750 times each, in one day! OpenISearch is using this bot to data-mine your site for information that can be used by their advertisers and advertising network. They are making a profit off your hard work! Without your permission, of course. Can someone say Digital Millennium Copyright Act violations? OpenISearch completely disregards the robots.txt standard as well.
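If you want to check your own logs for the same pattern, the counting can be sketched in a few lines of Python. This is only a rough sketch: it assumes your server writes the Apache "combined" log format, the sample lines are made up for illustration, and `count_hits` is a hypothetical helper name, not part of any tool mentioned in this post.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the request is the first
# quoted field and the User-Agent is the last quoted field.
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def count_hits(lines, agent_prefix="OpenISearch"):
    """Return (total hits, Counter of hits per path) for one User-Agent prefix."""
    per_path = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        _ip, request, agent = match.groups()
        if not agent.lower().startswith(agent_prefix.lower()):
            continue
        parts = request.split()
        # The request field looks like: GET /path HTTP/1.1
        path = parts[1] if len(parts) >= 2 else request
        per_path[path] += 1
    return sum(per_path.values()), per_path


# Made-up sample lines, for illustration only.
sample = [
    '216.182.224.10 - - [02/Feb/2007:14:00:01 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "OpenISearch/1.x (www.openisearch.com)"',
    '216.182.224.10 - - [02/Feb/2007:14:00:02 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "OpenISearch/1.x (www.openisearch.com)"',
    '216.182.230.7 - - [02/Feb/2007:14:00:05 -0800] "GET /about/ HTTP/1.1" 200 5120 "-" "OpenISearch/1.x (www.openisearch.com)"',
    '10.0.0.5 - - [02/Feb/2007:14:00:09 -0800] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0"',
]

total, per_path = count_hits(sample)
print(total, per_path.most_common(5))
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.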
How to Ban OpenISearch and Keep Them from Data-Mining Your Web Site
All you need to do is add a few lines to your .htaccess file, and you’ll be blocking the OpenISearch bot in no time.
Place this near the top of your .htaccess file:
RewriteEngine on
Then place this somewhere below it:
# Return 403 Forbidden to any User-Agent that starts with "OpenISearch" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ^OpenISearch [NC]
RewriteRule .* - [F]
# Flag requests coming from 216.182.224.* through 216.182.239.*
SetEnvIf Remote_Addr ^216\.182\.2(2[4-9]|3[0-9])\. openisearch
# Deny flagged requests, plus any host that resolves to compute.amazonaws.com
<Files *>
Order Allow,Deny
Allow from all
Deny from env=openisearch
Deny from compute.amazonaws.com
</Files>
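Before relying on an address pattern like the one in the SetEnvIf line, it is worth checking that it covers exactly 216.182.224.* through 216.182.239.* and nothing else. Here is a rough Python sketch of that check. It assumes the dots in the pattern are meant as literal dots (so they are escaped here), it uses the fact that this range is the same as the 216.182.224.0/20 network, and the helper names are hypothetical.

```python
import ipaddress
import re

# Pattern modeled on the SetEnvIf rule, with the dots escaped
# (an assumption about the intended meaning).
PATTERN = re.compile(r"^216\.182\.2(2[4-9]|3[0-9])\.")

# 216.182.224.* through 216.182.239.* is the same range as this /20 network.
NETWORK = ipaddress.ip_network("216.182.224.0/20")


def regex_matches(ip):
    """True if the address string matches the SetEnvIf-style pattern."""
    return PATTERN.match(ip) is not None


def in_network(ip):
    """True if the address falls inside 216.182.224.0/20."""
    return ipaddress.ip_address(ip) in NETWORK


def disagreements(candidates):
    """Return the addresses where the regex and the network check differ."""
    return [ip for ip in candidates if regex_matches(ip) != in_network(ip)]


# Probe every possible third octet, at both ends of the fourth octet.
candidates = [
    f"216.182.{third}.{fourth}"
    for third in range(256)
    for fourth in (0, 255)
]

print(disagreements(candidates))
```

An empty list means the pattern and the network agree on every probed address. The same approach works for any other IP block you decide to ban.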
This .htaccess technique will also work to block other bad visitors to your web site. Feel free to leave comments.
Tags:
OpenISearch | SMBot | Specific Media | WebGeek
Published: 02.02.07 / 2pm
Category: .htaccess, Bad Bots, Data-Mining, Site Scrapers, Website Security