Duplicate Content Prevention: WWW vs. Non-WWW and .Htaccess
Many website owners have a technical SEO issue on their sites that they don’t even know about, and that, left uncorrected, can really hurt search engine rankings. It occurs when one or more pages can be accessed via several different URLs. [...]
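As a rough illustration of the kind of fix this post is about (a minimal sketch, not necessarily the code from the full article), a 301 redirect in .htaccess can send every non-WWW request to the WWW hostname. It assumes mod_rewrite is enabled, and example.com is a placeholder domain:

```apache
# Sketch only: example.com is a placeholder; assumes mod_rewrite is available.
RewriteEngine On
# Match requests whose Host header is the bare domain (case-insensitive).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Permanently redirect to the same path on the www hostname.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Picking the non-WWW form instead works just as well; what matters is that only one version answers directly and the other redirects to it.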
Improve Site Security and SEO with One Line of Code
I was recently doing research in Google for a new WordPress plugin we are developing. I was greeted with page after page of results that read like this:
The Google results show that many sites have their directory contents listed and ranked. This tells me that many, many site owners are using default server settings and unwisely revealing the contents of their directories. It is extremely important to hide your directory contents for two reasons: security and SEO. [...]
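The excerpt stops before naming the line, so treat the following as an assumption rather than the author’s exact code. One common single-line way to stop Apache from listing directory contents is to turn off the Indexes option in .htaccess:

```apache
# Assumed example: disable automatic directory listings.
# Requires that the server's AllowOverride setting permits Options in .htaccess.
Options -Indexes
```

With this in place, a request for a directory that has no index file gets a 403 Forbidden response instead of a file listing.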
Website Security: Hackers, Botnets, and LIBWWW-PERL
Recently, there has been a rash of automated hacker attacks, defacing websites across the globe that don’t employ adequate security measures. Earlier this week, several friends of mine had their sites hacked and defaced. Most of these attacks don’t come from experienced hackers — they come from script kiddies employing automated scripts and a network of compromised computers (botnets). [...]
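One way to turn away the automated scripts named in the title is to refuse any request whose User-Agent contains libwww-perl. This is a minimal sketch assuming mod_rewrite is enabled; it is not necessarily the rule used in the full post, and a script can get around it simply by sending a different User-Agent:

```apache
# Sketch only: assumes mod_rewrite is available.
RewriteEngine On
# Match any User-Agent containing "libwww-perl" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
# Serve nothing and answer with 403 Forbidden.
RewriteRule ^ - [F,L]
```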
Detect User-Agents: Cloak and Dagger for Web Sites – Part 2
“I’ve heard of User-Agents…”
In a previous post, I introduced you to User-Agents. Now let’s find out why you need to detect them, and how.
According to Wikipedia:
When Internet users visit a web site, a text string is generally sent to identify the user agent to the server. This forms part of the HTTP request, prefixed with User-agent: or User-Agent: and typically includes information such as the application name, version, host operating system, and language. [...]
.Htaccess Reference
Wikipedia used to have a very informative article about .htaccess. It unfortunately was deleted (against the objections of many, including myself) and merged with the Apache article. However, I had the foresight to archive the page. I felt that it was a great loss to the internet and the community of web developers because of the useful information it contained, so I’m re-posting the article here.
UPDATE 07/08/07 – The Wikipedia article has been reinstated. [...]
OpenISearch is Even Worse Than SMBot
New Scraper Bot, OpenISearch, is Related to SMBot, but Even Worse
A new scraper bot with the User-Agent: “OpenISearch/1.x (www.openisearch.com)” started slamming my web sites and my clients’ web sites just a few days ago. It originated from the same IP blocks (216.182.224.*-216.182.239.*) and the same server at Amazon Web Services (compute.amazonaws.com) as SMBot, which is owned and operated by Specific Media. [...]
Ban SMBot – Specific Media is Data-Mining Your Site
Specific Media Data-Mines Web Sites…Does it Steal Content Too?
Specific Media has recently created a bot with the User-Agent: “SMBot/1.1 (www.specificmedia.com)” that they are using to rip complete web sites, repeatedly. In the last couple of days it hit several of my sites 200-300 times each, attempting to download the entire site each time. Specific Media is using this bot to data-mine your site for information that can be used by their advertisers and advertising network. [...]
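A ban keyed on the User-Agent quoted above could look roughly like this. It is a sketch for Apache 2.4, not the code from the full post; the variable name bad_bot is made up for illustration, and it assumes mod_setenvif is loaded:

```apache
# Sketch only: "bad_bot" is a hypothetical variable name.
# Flag any request whose User-Agent contains "SMBot" (case-insensitive).
SetEnvIfNoCase User-Agent "SMBot" bad_bot

<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...but refuse requests that were flagged above.
    Require not env bad_bot
</RequireAll>
```

Flagged requests receive a 403 Forbidden response. More SetEnvIfNoCase lines can set the same variable for other bots.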
.Htaccess IP Banning – Block Bad Visitors
Increase your web site’s security by blocking bad visitors with .htaccess. If you have nuisance visitors, site scrapers, or spammers, you may want to add some lines of code to your .htaccess file that will block bad visitors by IP address or by blocks of IP addresses. Be careful, though, not to ban blocks of IPs carelessly, as you may end up banning potential customers or other valid site visitors. [...]
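To make the idea concrete, here is a hedged sketch of an IP ban using Apache 2.4 syntax. The addresses are reserved documentation placeholders, not real offenders, and the full post may use different directives:

```apache
# Sketch for Apache 2.4; the addresses are documentation placeholders.
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except one single address...
    Require not ip 192.0.2.15
    # ...and one whole block, written in CIDR notation.
    Require not ip 198.51.100.0/24
</RequireAll>
```

On Apache 2.2, the older Order, Allow and Deny directives do the same job. Either way, the narrower the block you ban, the less likely you are to lock out legitimate visitors.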
URL Rewriting – Search Engine Friendly URL’s – Part 2
You probably know by now that dynamic web sites face a challenge in search engine optimization because they often use dynamic URLs, with information carried from page to page in query strings (http://www.yourdomain.com/index.php?id1=value1&id2=value2&id3=value3).
Through proper use of .htaccess and mod_rewrite, you can turn your ugly dynamic URLs into search engine friendly (and user friendly) URLs. [...]
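As a minimal sketch of the idea, assuming a friendly URL layout of /value1/value2/value3 (a hypothetical scheme chosen for illustration, not taken from the full post), mod_rewrite can map the clean path back onto the query string internally:

```apache
# Sketch only: the three-segment URL layout is an assumption.
RewriteEngine On
# Leave requests for real files and directories alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# /value1/value2/value3 is served by index.php with the original parameters.
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/?$ index.php?id1=$1&id2=$2&id3=$3 [L]
```

Because there is no R flag, the rewrite happens inside the server and the visitor’s address bar keeps showing the friendly URL.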
Custom Error Pages and Friendly 404’s
Everyone hates those ugly white “404 File Not Found” pages that appear when they type in a non-existent URL or when the page has moved.
It is important to create your own custom error pages for a few reasons.
Number One, and this is a HUGE point…the UGLY ERROR PAGES MAKE PEOPLE GO AWAY! If you want to drive potential readers and customers away in droves, then fine, keep your default ugly white error pages. [...]
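Pointing Apache at a custom page takes one ErrorDocument line per status code in .htaccess. The file paths below are hypothetical placeholders for pages you would create yourself:

```apache
# Sketch only: /errors/404.html and /errors/500.html are placeholder paths.
# Use a local path, not a full http:// URL; with a full URL Apache
# redirects the visitor and the original 404 status is lost.
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
```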

