Ban SMBot - Specific Media is Data-Mining Your Site

by Scott Allen

Specific Media Data-Mines Web Sites…Does it Steal Content Too?
Specific Media has recently created a bot with the User-Agent “SMBot/1.1 (www.specificmedia.com)” that they are using to rip complete web sites, repeatedly. In the last couple of days it hit several of my sites 200-300 times each, attempting to download the entire site each time. Specific Media is using this bot to data-mine your site for information that can be used by their advertisers and advertising network. They are apparently making a profit off your hard work, without your permission of course. Can someone say Digital Millennium Copyright Act violations? SMBot completely disregards the robots.txt standard as well. At first their bot was crawling around without any user agent at all. (Fellow bot-hunter IncrediBILL has more info on SMBot.)

I went to Specific Media’s web site, www.SpecificMedia.com, and did a little reading. According to their site:

Our advanced algorithms categorize website content in 3,300+ categories…

Right. Beware whenever you see “advanced algorithms”. Specific Media claims to help advertisers target more accurately, while ignoring some ethical considerations. Now, to be fair, I’ve lived on both sides of this equation. Marketers do need channels to advertise through, and they do need research, but there is also an ethical responsibility when it comes to data mining and bombarding web sites with requests.

How to Ban SMBot and Keep Specific Media from Data-Mining Your Site
All you need to do is add a few lines of code to your .htaccess file, and their bot will be blocked from your site (for now).

Place this near the top of your .htaccess file: RewriteEngine on
Then place this somewhere below it:

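# Return 403 Forbidden to any client whose User-Agent starts with "SMBot" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} ^SMBot [NC]
RewriteRule .* - [F]

# Also refuse hosts whose reverse DNS name ends in compute.amazonaws.com (Amazon EC2).
<Files *>
Order Allow,Deny
Allow from all
Deny from compute.amazonaws.com
</Files>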

This .htaccess technique will also work to block other bad visitors to your web site. Feel free to leave comments.
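As a rough sketch of how the same approach extends to more than one bot, the conditions can be chained with the [OR] flag. The unanchored OpenISearch pattern is my assumption about how that bot identifies itself, and ExampleBadBot is a purely hypothetical placeholder, so check your own access logs for the real User-Agent strings before relying on this.

```apache
RewriteEngine on

# Each RewriteCond tests the User-Agent header. [NC] ignores case, and
# [OR] chains the conditions so a match on any one of them triggers the rule.
RewriteCond %{HTTP_USER_AGENT} ^SMBot [NC,OR]
# Assumption: the bot's User-Agent contains "OpenISearch" somewhere.
RewriteCond %{HTTP_USER_AGENT} OpenISearch [NC,OR]
# Hypothetical placeholder; replace with a real offender from your logs.
RewriteCond %{HTTP_USER_AGENT} ExampleBadBot [NC]
# "-" leaves the URL unchanged; [F] answers with 403 Forbidden.
RewriteRule .* - [F]
```

Note that the last RewriteCond must not carry [OR]; otherwise the chain has nothing to end on and the rule will not behave as intended.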

Specific Media / SMBot Update (02/03/2007)
There seem to be some new developments in the world of Specific Media and SMBot.

  • SMBot seems to have a little brother, OpenISearch, that’s even worse than SMBot.


