Search Engine Optimization (SEO): Robots.txt

After analyzing the domain name, general design, and URL format, my colleagues and I look at a potential client’s robots.txt and sitemap. This is helpful because it starts to give you an idea of how much (or how little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol for telling search engine crawlers which directories and files they may and may not access. The inclusion of this file gives you a rough hint of whether or not the developers of the given site made SEO a priority.
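To make the allow/disallow idea concrete, here is a minimal sketch that uses Python's standard urllib.robotparser module. The robots.txt content and the paths in it are made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directories are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A URL under a disallowed directory is off limits to every crawler.
blocked = parser.can_fetch("*", "http://www.example.com/admin/login.html")

# Anything not matched by a Disallow rule is allowed by default.
allowed = parser.can_fetch("*", "http://www.example.com/mammal/dogs/")

print(blocked, allowed)
```

For a quick look at a live site, the same parser can load the file directly with set_url() and read() instead of parse().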

Here is a cautionary tale. Bit.ly is a very popular URL shortening service. Due to its connections with Twitter.com, it is quickly becoming one of the most linked websites on the Web. One reason for this is its flexibility: it has a feature that lets users pick their own shortened URL.

For example, when linking to my website I might choose http://bit.ly/SexyMustache. Unfortunately, Bit.ly forgot to block certain URLs, and someone was able to claim http://bit.ly/robots.txt as a shortened URL. This opened up the possibility for that person to control how robots were allowed to crawl Bit.ly. Oops! This is a great example of why knowing even the basics of SEO is essential for web-based business owners.
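One way to guard against this kind of mistake is to check user-chosen names against a list of reserved paths before accepting them. The sketch below is an assumption about how such a check could look, not a description of how Bit.ly works; RESERVED_SLUGS and is_slug_allowed are hypothetical names.

```python
# Hypothetical list of paths a site needs to keep for itself.
RESERVED_SLUGS = frozenset({"robots.txt", "sitemap.xml", "favicon.ico"})


def is_slug_allowed(slug: str) -> bool:
    """Return True if a user-chosen short name does not collide with a reserved path."""
    # Normalize so that "/Robots.TXT" and "robots.txt" are treated the same.
    normalized = slug.strip().strip("/").lower()
    if not normalized:
        return False
    return normalized not in RESERVED_SLUGS


print(is_slug_allowed("SexyMustache"))
print(is_slug_allowed("robots.txt"))
```

The comparison is case-insensitive on purpose, since many web servers would otherwise let a differently capitalized name slip past the check.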

After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don’t spend a lot of time analyzing it (that comes later, if the owners of that website become clients); instead, I skim through it to see if I can glean any information about the setup of the site. A lot of times, it will quickly show me if the website has information hierarchy issues. Specifically, I am looking for how the URLs relate to each other. A good example of information hierarchy would be www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be www.example.com/node?type=6&kind=7. Notice in the bad example that the search engines can’t extract any semantic value from the URL. The sitemap can give you a quick idea of the URL formation of the website.
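The skim described above can be roughly automated. The following sketch assumes a sitemap in the standard sitemaps.org XML format and reports, for each URL, its path, how many path segments it has, and whether it relies on a query string. The sample sitemap and the summarize_urls helper are illustrative assumptions, built from the two example URLs above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap containing the good and bad URLs from the text.
# The XML declaration is left out to keep the string simple to parse.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc></url>
  <url><loc>http://www.example.com/node?type=6&amp;kind=7</loc></url>
</urlset>
"""


def summarize_urls(sitemap_xml: str) -> list[tuple[str, int, bool]]:
    """Return (path, number of path segments, uses query string) for each <loc>."""
    root = ET.fromstring(sitemap_xml)
    rows = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        parsed = urlparse((loc.text or "").strip())
        segments = [part for part in parsed.path.split("/") if part]
        rows.append((parsed.path, len(segments), bool(parsed.query)))
    return rows


for path, depth, has_query in summarize_urls(SAMPLE_SITEMAP):
    print(path, depth, has_query)
```

Deep, descriptive paths with no query string suggest a deliberate hierarchy, while shallow paths that depend on query parameters are the kind of URL a search engine can learn little from.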

Thursday, March 13, 2014
