Robots.txt File and SEO

When it comes to the robots.txt file and SEO, many site owners appear to overlook this great little text file. I look at 20-30 sites a day, and I can’t tell you how many don’t have a robots.txt file.

I understand if you’re an experienced SEO and you’re fully aware of the robots.txt file. More than likely it’s nothing new to you. But I don’t really blog for experienced SEOs; rather, I write with the everyday webmaster in mind. With that said, I do see experienced SEOs trying to control the bots through the meta tag, so maybe I’m writing for you as well.

For the other 98% of you, I’m sure you may be wondering what a robots.txt file is.

For the sake of disclosure, some of the information below was pulled from the handwritten notes I took while I was a Stomper member.

The robots.txt file was designed to tell search engine bots how to behave on your site. It tells them which parts of the site they may access and which they may not.

The first part of a robots.txt file is the User-agent line, which specifies the robot.

User-agent: Googlebot

You may also use the wildcard character ‘*’ to specify all robots.

User-agent: *

You can find user agent names in your own server logs by checking for requests to robots.txt. Most of the major search engines have names for their spiders.

Here is a small list:

Googlebot
Yahoo! Slurp
MSNBot
Netcraft
Ask
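If you would rather not read the logs by eye, here is a rough Python sketch that counts which user agents asked for robots.txt. It assumes your server writes the common "combined" access log format; the function name robots_txt_agents and the sample log lines are made up for illustration, so adjust the pattern to whatever your own logs look like.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the request line, the referrer
# and the user agent are each wrapped in double quotes.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robots_txt_agents(lines):
    """Count the user-agent strings seen on requests for /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Ignore any query string when comparing the requested path.
        if match and match.group("path").split("?")[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts

# Illustrative sample lines only, not real traffic.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2009:13:55:36 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2009:13:56:01 -0700] "GET /index.htm HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2009:14:02:10 -0700] "GET /robots.txt HTTP/1.1" 200 120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for agent, hits in robots_txt_agents(SAMPLE).most_common():
        print(hits, agent)
```

To run it against a real log, pass an open file instead of SAMPLE, for example robots_txt_agents(open("access.log")), where access.log stands in for your own log file.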

The second part of a robots.txt file consists of Disallow lines. These lines can specify files and/or directories. Each path starts with a slash and is relative to the root of your site. For example, if you want to instruct spiders not to download contact.php, you would enter:

Disallow: /contact.php

You can also specify directories:

Disallow: /cgi-bin

This will block spiders from your cgi-bin directory.

Disallow: /wp-admin/

This will block spiders from your WordPress admin directory.
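If you want to see how a set of rules like these would be read, here is a minimal sketch using the robots.txt parser that ships with Python (urllib.robotparser). The helper name is_allowed and the test paths are my own for illustration, and Python's parser is only one interpretation of the rules, so a real search engine may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Example rules built from the directives discussed above,
# with every path written with a leading slash.
RULES = """\
User-agent: *
Disallow: /contact.php
Disallow: /cgi-bin
Disallow: /wp-admin/
"""

def is_allowed(rules, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    # Hypothetical paths, chosen to hit each rule once.
    for path in ("/", "/contact.php", "/cgi-bin/form.cgi",
                 "/wp-admin/index.php", "/blog/"):
        print(path, is_allowed(RULES, "Googlebot", path))
```

Because the rules sit under User-agent: *, the answer is the same whichever bot name you pass in.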

Common Questions about the Robots.txt File

Q: Where do I place the robots.txt file?

A: The file should be placed in the root directory of your server. In other words, it goes in the same place as the index.htm file for your home page.
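Put another way, the robots.txt address is always the site's scheme and host followed by /robots.txt, no matter which page you start from. A small Python sketch of that idea (the function name robots_txt_url and the example.com address are just placeholders):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/post.htm?page=2"))
```

A robots.txt file tucked away in a subfolder such as /blog/ would not be found at that address.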

Q: What are some things that I would want to exclude from the robots?

A: Here are a few things I would exclude:

  • Any folder that is “off limits” to the public eye but is not password protected
  • Images
  • CGI-BIN

Creating your robots.txt file is not complicated. It can be made using Notepad or some other text editor. It’s super easy to make and will take less than five minutes if you follow these simple steps:

  1. Copy and paste the robots.txt file from my site at http://www.denverseoguy.com/robots.txt
  2. Browse to your web server folder and write down the folders you want to exclude
  3. Open the robots.txt file in Notepad
  4. Modify the Disallow lines in my robots.txt to reflect your folders
  5. Save the file and upload it to your server
  6. Validate the file
  7. If you need to make changes, do so and then repeat steps 4-6
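If you are comfortable running a script, the steps above can also be roughed out in Python. This is only a sketch under my own assumptions: build_robots_txt and blocked_paths are hypothetical helper names, the folder list is an example, and the check uses Python's built-in parser rather than any search engine's official validator.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(folders, user_agent="*"):
    """Write one Disallow line per folder under a single User-agent group."""
    lines = [f"User-agent: {user_agent}"]
    for folder in folders:
        # Normalise "cgi-bin", "/cgi-bin" and "/cgi-bin/" to "/cgi-bin/".
        lines.append("Disallow: /" + folder.strip("/") + "/")
    return "\n".join(lines) + "\n"

def blocked_paths(robots_txt, paths, user_agent="*"):
    """Return the paths that the rules tell user_agent not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

if __name__ == "__main__":
    # Example folders only; use the ones you wrote down in step 2.
    text = build_robots_txt(["cgi-bin", "/images/", "wp-admin"])
    with open("robots.txt", "w") as handle:
        handle.write(text)
    print(text)
    print(blocked_paths(text, ["/", "/cgi-bin/test.cgi", "/about.htm"]))
```

The script writes robots.txt to the current folder; you would still upload it to your server root and validate it there.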

Note: There is a right and a wrong way to control the bots. Many times you may see the snippet of code below in the meta tags of the page itself. This is the wrong way:

<meta name="robots" content="noindex,nofollow">

I see this a lot on sites that are giving search optimization advice. Now, if you’re experienced in optimization, then you should know that most reputable search engines overlook the meta tags, right?

For instructions on how to validate your robots.txt file, you can go to Google Webmaster/Site owner Help.

If you aren’t sure whether you can do this, please feel free to give me a shout and I’ll help you over the bumps.

Good luck!
