Cyber-Surveillance and Internet Data-Mining
by Scott Allen
SpyBots Exist. Forget 007; It’s The Age of Double-0-BOT
In case you don’t yet believe it, there are data-mining bots and clandestine cyber-surveillance techniques being used on the internet. (Data-mining is information gathering on a large scale, usually in a short period of time.) There are secret bots out there gathering intelligence and info on you and your web site. Some have good intentions, and some bad. Many are bots written by dishonorable webmasters who scrape your site for content and make a buck by using your stolen content on their site. Some are marketing analytics companies who gather data about you and your company and sell it to advertisers for profit.
Here are a couple of direct quotes from Buzz Analytics’ web site:
Applying Intelligence Agency Techniques to Produce Tailored Competitive Intelligence by Scanning and Mining the Internet in over 30 Languages and Nations To Discover Hidden Trends, Charting Your Internet Mind Share and Buzz Index, Tracking On-Line Opinion and Issues, Listening In on Word of Mouth and Customer Generated Media — Blogs, Consumer Portals, Special Interest Sites, Political Cause Networks, On-Line News Services, and Archives. We Analyze What the Online Community and Your Customers are Saying About Your Company, Your Products, and Your Competition; then Fuse that Information Into Concise Reports or Encrypted Executive Dashboards Accessed Via Our Secure Portal.
The National Security and Homeland Defense applications of monitoring and mining the Internet, and tracking trends for specific content are obvious. Manual content analysis of media is a traditional intelligence technique that found wide application in World War II. The BuzzAnalytics advantage is that we have developed computerized processes based on machine intelligence and neural networks to automate the capture and analysis of targeted information. BuzzAnalytics employs intelligence professionals, who have Department of Defense clearances for access to classified information, to perform the analysis and deliver highly customized reports to our clients.
It’s a little disturbing, eh? And this is just one of many companies that provide this “service”.
A blog I read covered a similar topic, “Mining Messaging for Research Gold”. It talked about how market researchers are trying to spy on your instant messaging software. Is nothing sacred anymore? Here’s a quote from the article:
Measurement firm Nielsen’s recently-launched BuzzMetrics service, which aims to monitor social media spaces for the skinny on companies, products and memes, is contemplating just that.
Director of science and innovation Matthew Hurst wrote: “I’ve recently become aware of a new channel for expressing opinion: status messages on IM client buddy lists. I see in my Gmail page the message ‘Lenovo stinks!’.”
Hurst, an expert in writing software robots that crawl pages for specific juicy bits of information, is thinking out loud that the kinds of status messages we set when we’re “away”, “at lunch” or “gone to Tesco” could be the next frontier for companies keen to know what consumers think, even suggesting mining Second Life for avatar opinions.
This example relates to instant messaging, but you can see how there is a world of data-miners out there digging up info on you and your company. You need to protect and limit access to your site. There are a lot of methods involved in this, including, but not limited to: limiting visitor access via .htaccess, password protection, and installing server firewalls. But it gets much more complex than that. I’ll give you some quick tips to get you started.
Don’t Get Data-Mined:
Protect Your Web Site from Unauthorized Access
The first step is to know who is visiting your site. Either examine your server logs or install a good, secure PHP statistics script on your site. Know the user-agents and IP addresses of your visitors. This can be a complex topic, and going fully in-depth involves .htaccess and server-side scripting (PHP, ASP, ColdFusion, etc.).
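As a rough illustration of examining server logs, here is a minimal Python sketch that tallies requests per IP address and per user-agent. It assumes your server writes the common Apache "combined" log format; the sample log lines, the bot name ExampleScraperBot, and the function name summarize_visitors are made up for the example and are not from any real log.

```python
import re
from collections import Counter

# Assumed layout: Apache/NCSA "combined" log format. Adjust the pattern
# if your server is configured to log differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_visitors(lines):
    """Count requests per IP address and per user-agent string."""
    ips = Counter()
    agents = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ips[match.group("ip")] += 1
        agents[match.group("agent")] += 1
    return ips, agents


# Hypothetical sample data using reserved documentation IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [08/Jan/2007:18:00:01 -0800] "GET /index.php HTTP/1.1" '
    '200 5120 "-" "ExampleScraperBot/1.0"',
    '203.0.113.7 - - [08/Jan/2007:18:00:02 -0800] "GET /about.php HTTP/1.1" '
    '200 2048 "-" "ExampleScraperBot/1.0"',
    '192.0.2.44 - - [08/Jan/2007:18:01:15 -0800] "GET /index.php HTTP/1.1" '
    '200 5120 "http://www.example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    'this line is malformed and will be skipped',
]

if __name__ == "__main__":
    ip_counts, agent_counts = summarize_visitors(SAMPLE_LOG)
    for ip, count in ip_counts.most_common():
        print(ip, count)
    for agent, count in agent_counts.most_common():
        print(agent, count)
```

To run it against a real log, you would pass an open file object instead of SAMPLE_LOG. A visitor that requests many pages in a short time under an unfamiliar user-agent is a candidate for a closer look.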
The second step is to limit access to your site, and have your site react differently according to who the user is, especially potential threats. You will need to know some programming, or hire a web programmer familiar with web security issues, such as myself. I want to be clear that I am not advocating cloaking, which is a black-hat search engine optimization technique in which the content presented to the search engine spider is different from that presented to the users’ browser. This is strictly for security purposes. Your site should serve the same content to both legitimate users and Search Engines, but why serve your valuable content to hackers, data-miners, site scrapers, rogue bots and the like? You get my point.
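To make the idea of reacting differently to known threats concrete, here is a minimal Python sketch, not a complete security setup. It assumes you have already built your own lists of unwanted IP ranges and user-agent fragments; the entries shown, and the name decide_response, are hypothetical placeholders. Everyone who is not on a block list, including search engine spiders, gets the same normal response, so this is not cloaking.

```python
import ipaddress

# Hypothetical block lists. Replace these with entries taken from your
# own logs; the values here are placeholders from documentation ranges.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_AGENT_SUBSTRINGS = ["examplescraperbot"]


def decide_response(ip, user_agent):
    """Return 403 for a request matching a block rule, otherwise 200."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 403
    # User-agent strings are supplied by the client and can be faked,
    # so treat this check as a first filter rather than proof of identity.
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    return 200


if __name__ == "__main__":
    print(decide_response("203.0.113.7", "Mozilla/5.0"))
    print(decide_response("192.0.2.44", "ExampleScraperBot/1.0"))
    print(decide_response("192.0.2.44", "Mozilla/5.0"))
```

The same two checks can be expressed in .htaccess rules or in a PHP include at the top of each page; the Python form is only meant to show the logic.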
If you would like to explore some more information on the topic of Web Security, feel free to explore some of the following links:
- .Htaccess IP Banning - Block Bad Visitors
- User-Agents :: Cloak and Dagger for Web Sites (Part 1)
- Look Up IP Address Info (Web Security)
- How to Build Bot Traps in PHP - “Code Red! Unidentified Robots in Sector 17…”
- Web Site Security - Bot Traps
- Ban SMBot. Specific Media is Data-Mining Your Site!
- OpenISearch is Even Worse Than SMBot
- Setup a Secured WiFi Network
- Legality of Stealth Robots, Are They Trespassing?
- Evolving Stealth Bots
- How To Shut Down Scrapers the AUP Way.
- Cogent PSI and Sproose
If you are interested in having more comprehensive web site security installed on your site, please contact me, and we can discuss your specific needs.
Tags:
web security | website security | cyber-surveillance | WebGeek
About This Entry
You’re currently reading “Cyber-Surveillance and Internet Data-Mining,” an entry on WebGeek
- Published: 01.08.07 / 6pm
- Category: Bad Bots, Cyber Surveillance, Data-Mining, Site Scrapers, Website Security