Naughty Google - AdWords Bot Posing as IE

by Scott Allen

We all know that search engines occasionally check out a site or two in stealth mode. I think that’s shady, especially given Google’s motto of “Don’t Be Evil”. It becomes alarming when they do it consistently, using spoofed user-agents to mask their true identity. Recently we’ve seen some interesting log entries indicating that Google’s AdWords bot is posing as Internet Explorer. (I won’t explain all the details, but we know these requests came from the AdWords bot and that it was checking some of our clients’ AdWords campaigns.) Here’s the user-agent recorded in the logs, along with the IP addresses, dates and times:

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)

  • 72.14.194.27 - 2007-11-28 (Wed) 05:14:01
  • 66.249.84.66 - 2007-11-28 (Wed) 04:58:50
  • 66.249.85.65 - 2007-11-28 (Wed) 04:50:33
  • 72.14.193.66 - 2007-11-28 (Wed) 04:34:00

I highly doubt that Google staff are up at 4:30 AM to take a look at our AdWords campaigns. :) So that raises the question: why are Google bots crawling sites while posing as Internet Explorer, in violation of Google’s own policies? We don’t practice or advocate cloaking, so we have nothing to hide from Google, but in my opinion this is a shady practice.

Google Bots Misbehaving in Other Ways
Google describes an “official” method for validating its bots, a reverse/forward DNS verification, quoted below:

Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I don’t think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.

According to Matt Cutts, who authored the entry on the Google Blog, that is the “official way to authenticate Googlebot”.
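
For anyone who wants to script that check, here is a minimal Python sketch of the reverse-then-forward lookup described in the quote. The function name `verify_googlebot` and its `reverse`/`forward` parameters are my own invention for illustration; only the `.googlebot.com` suffix and the two-step lookup come from the quoted text. The resolvers are passed in so the logic can be exercised without touching real DNS.

```python
import socket

GOOGLEBOT_SUFFIX = ".googlebot.com"  # the domain named in the quoted method


def verify_googlebot(ip, reverse=None, forward=None, suffix=GOOGLEBOT_SUFFIX):
    """Return True only if `ip` passes the reverse + forward DNS check.

    `reverse` maps an IP to a host name and `forward` maps a host name to a
    list of IPs. Both default to the system resolver (hypothetical hooks,
    added here so the check can be tested offline).
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda name: socket.gethostbyname_ex(name)[2])

    # Step 1: reverse lookup. An address with no PTR record fails here.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: the name must sit inside the expected domain.
    if not host.endswith(suffix):
        return False

    # Step 3: forward lookup of that name must lead back to the same IP,
    # otherwise a spoofer controlling reverse DNS could fake step 1.
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Calling `verify_googlebot("66.249.66.1")` performs the same two lookups as the `host` commands in the quote. Any address with no reverse DNS entry at all is rejected at step 1.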

That being said, it becomes very difficult to confirm that it is in fact Google crawling your site when they don’t play by their own rules. Three of the four IP addresses listed above don’t have proper reverse DNS entries set up, so the “official” method would never authenticate them as any kind of Google bot. (We have a few other validation tricks up our sleeve that go beyond what Google recommends, but my point is that most webmasters who rely on the official method will fail to recognize these visits as Google’s.)

One more misbehavior is worth mentioning. Recently we have also found that, on some sites we manage, some of Google’s crawlers have been traversing areas that robots.txt restricts.
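
If you want to look for the same thing in your own logs, one rough approach is to run the requested paths through Python’s standard `urllib.robotparser` and list the ones your robots.txt disallows. This is only a sketch: the helper name `disallowed_hits` and the sample rules and paths are made up for illustration, and pulling the paths out of your log format is left to you.

```python
from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt, user_agent, paths):
    """Return the requested paths that robots.txt forbids for user_agent.

    robots_txt is the text of the site's robots.txt file; paths are the
    URL paths a crawler actually requested, taken from the access log.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical rules and log paths, for illustration only.
sample_rules = "User-agent: *\nDisallow: /private/\n"
sample_paths = ["/index.html", "/private/report.html"]
print(disallowed_hits(sample_rules, "Googlebot", sample_paths))
```

Any path this returns for a request you have already confirmed came from Google is a crawl that ignored your robots.txt.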

Bad Google!

 
