Ban SMBot - Specific Media is Data-Mining Your Site
by Scott Allen
Specific Media Data-Mines Web Sites… Does It Steal Content Too?
Specific Media has recently created a bot with the User-Agent “SMBot/1.1 (www.specificmedia.com)” that they are using to rip complete web sites, repeatedly. In the last couple of days it hit several of my sites 200-300 times each, attempting to download the entire site each time. Specific Media is using this bot to data-mine your site for information that can be used by their advertisers and advertising network. They are apparently making a profit off your hard work, without your permission of course. Can anyone say Digital Millennium Copyright Act violations? SMBot also completely disregards the robots.txt standard. At first their bot was crawling around without any user agent at all. (Fellow bot-hunter IncrediBILL has more info on SMBot.)
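For contrast, here is roughly how a crawler that does honor robots.txt behaves: it reads the file first and skips anything it is told to stay out of. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `may_fetch` helper and the `ExampleBot` user agent are hypothetical, and nothing here reflects how SMBot itself works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut SMBot out completely, let everyone else in.
ROBOTS_TXT = """\
User-agent: SMBot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler calling this before each request would skip the page.
    print(may_fetch("SMBot/1.1 (www.specificmedia.com)", "http://example.com/index.html"))  # False
    print(may_fetch("ExampleBot/1.0", "http://example.com/index.html"))  # True
```

The point of the complaint is that this check costs a well-behaved bot almost nothing, yet SMBot does not appear to perform it.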
I went to Specific Media’s web site, www.SpecificMedia.com, and did a little reading. According to their site:
Our advanced algorithms categorize website content in 3,300+ categories…
Right. Beware whenever you see “advanced algorithms”. Specific Media claims to help advertisers target more accurately while ignoring some ethical considerations. Now, to be fair, I’ve lived on both sides of this equation. Marketers do need channels to advertise through, and they do need research, but there is also an ethical responsibility when it comes to data mining and bombarding web sites.
How to Ban SMBot and Keep Specific Media from Data-Mining Your Site
All you need to do is add a few lines to your .htaccess file (this requires Apache with mod_rewrite enabled), and their bot will be blocked from your site (for now).
Place this near the top of your .htaccess file:
RewriteEngine on
Then place this somewhere below it:
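# Return 403 Forbidden to any User-Agent starting with "SMBot" (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} ^SMBot [NC]
RewriteRule .* - [F]
# Apache 2.2 access control: deny hosts whose reverse DNS ends in compute.amazonaws.com
<Files *>
Order Allow,Deny
Allow from all
Deny from compute.amazonaws.com
</Files>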
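To make the two rules above concrete, here is a rough Python model of which requests they would refuse. It is a sketch, not Apache itself: it assumes the visitor's reverse-DNS hostname is already known (Apache 2.2 performs a double-reverse DNS lookup for `Deny from` host names), and the function names are hypothetical.

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} ^SMBot [NC]
# "^" anchors the match at the start of the header; [NC] ignores case.
UA_PATTERN = re.compile(r"^SMBot", re.IGNORECASE)

# Mirrors: Deny from compute.amazonaws.com (Apache 2.2 partial domain name).
# Only whole trailing name components match, so host1.compute.amazonaws.com
# is denied but foocompute.amazonaws.com is not.
DENIED_DOMAIN = "compute.amazonaws.com"


def host_denied(hostname: str) -> bool:
    """Return True if hostname equals or ends in the denied domain."""
    # Assumes hostname is the already-verified reverse-DNS name of the client.
    host = hostname.lower().rstrip(".")
    return host == DENIED_DOMAIN or host.endswith("." + DENIED_DOMAIN)


def is_forbidden(user_agent: str, hostname: str) -> bool:
    """Return True if either rule would answer with 403 Forbidden."""
    return bool(UA_PATTERN.match(user_agent)) or host_denied(hostname)


if __name__ == "__main__":
    print(is_forbidden("SMBot/1.1 (www.specificmedia.com)", "crawler.example.net"))  # True
    print(is_forbidden("ExampleBot/1.0", "host1.compute.amazonaws.com"))  # True
    print(is_forbidden("Mozilla/5.0 (compatible; SMBot/1.1)", "crawler.example.net"))  # False
```

Note the `^` anchor in the last case: a User-Agent that merely contains "SMBot" somewhere in the middle slips past the rule. If the bot ever changes its User-Agent format, dropping the anchor (`SMBot` instead of `^SMBot`) is the broader option.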
This .htaccess technique will also work to block other bad visitors to your web site. Feel free to leave comments.
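If you want to block more than one bot this way, the `RewriteCond` lines can be chained with mod_rewrite's `OR` flag so that any single match triggers the `[F]` rule. A small Python helper that writes those lines might look like this; `build_block_rules` and `ExampleScraper` are hypothetical names, and the helper only accepts simple alphanumeric tokens so that no regex or config escaping is needed.

```python
import re

# Accept only simple tokens so no regex or config escaping is required.
_SAFE_TOKEN = re.compile(r"[A-Za-z0-9_-]+")


def build_block_rules(agent_prefixes: list[str]) -> str:
    """Return .htaccess lines that send 403 to any listed User-Agent prefix."""
    if not agent_prefixes:
        raise ValueError("need at least one user-agent prefix")
    for name in agent_prefixes:
        if not _SAFE_TOKEN.fullmatch(name):
            raise ValueError(f"unsupported characters in {name!r}")

    lines = ["RewriteEngine on"]
    last = len(agent_prefixes) - 1
    for i, name in enumerate(agent_prefixes):
        # Every condition except the last carries OR, so any one match is enough.
        flags = "[NC]" if i == last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} ^{name} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_block_rules(["SMBot", "ExampleScraper"]))
```

Called with `["SMBot", "ExampleScraper"]`, it prints the `RewriteEngine on` line, a `^SMBot [NC,OR]` condition, a `^ExampleScraper [NC]` condition, and the same `RewriteRule .* - [F]` line used for SMBot alone.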
Specific Media / SMBot Update (02/03/2007)
There seem to be some new developments in the world of Specific Media and SMBot.
- SMBot seems to have a little brother, OpenISearch, that’s even worse than SMBot.
Tags:
SMBot | Specific Media | bad bots | spam bots | bots | site rippers | site scrapers | webgeek
About This Entry
You’re currently reading “Ban SMBot - Specific Media is Data-Mining Your Site,” an entry on WebGeek
- Published:
- 01.05.07 / 5pm
- Category:
- .htaccess, Bad Bots, Data-Mining, Site Scrapers, Website Security
- Related Posts:
- OpenISearch is Even Worse Than SMBot
- Cyber-Surveillance and Internet Data-Mining
- SES NY Day 3 - The Conference So Far
- Web Site Security - Bot Traps
- SES NY Day 4 - Wrapping It Up, Along With Some Random Thoughts
- About Us:
- Hybrid6 Studios is a web design and SEO firm based in Los Angeles, CA.