When it comes to the robots.txt file and SEO, many site owners appear to overlook this great little text file. I look at 20-30 sites a day, at least, and I can’t tell you how many don’t have a robots.txt file.
I understand if you’re an experienced SEO and you’re fully aware of the robots.txt file. More than likely it’s nothing new to you. But I don’t really blog for experienced SEOs; rather, I write with the everyday webmaster in mind. With that said, I do see experienced SEOs trying to control the bots through the meta tag, so maybe I’m writing for you as well.
For the other 98% of you, I’m sure you may be wondering what a robots.txt file is.
For the sake of disclosure, some of the information below was pulled from the handwritten notes that I took while I was a Stomper member.
The robots.txt file was designed to inform bots from the search engines how to behave on your site. It tells them what information they can have access to and what information they can’t.
The first part of a robots.txt file consists of a User-agent line, which specifies the robot that the rules below it apply to:
User-agent: Googlebot
You may also use the wildcard character ‘*’ to specify all robots.
User-agent: *
You can find user agent names in your own server logs by checking for requests to robots.txt. Most of the major search engines have names for their spiders.
Here is a small list:
Googlebot
Yahoo! Slurp
MSNBot
Netcraft
Ask
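If you want to pull those names out of your own logs, here is a rough Python sketch of the idea. It assumes your server writes the common "combined" access log format; the function name robots_txt_agents and the sample log lines (including the ExampleBot agent and the 192.0.2.x addresses) are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Matches the request and the final quoted user-agent field of a
# combined-format access log line (assumed format, adjust as needed).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(lines):
    """Count the user-agent strings that requested /robots.txt."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("path").split("?")[0] == "/robots.txt":
            counts[match.group("agent")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 -0700] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2008:13:56:01 -0700] "GET /index.htm HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '192.0.2.44 - - [10/Oct/2008:13:57:12 -0700] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    for agent, hits in robots_txt_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

Ordinary browsers rarely ask for robots.txt, so whatever shows up in this count is most likely a bot.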
The second part of a robots.txt file consists of Disallow lines. These lines can specify files and/or directories. Each path is written from the root of your site, so it starts with a slash. For example, if you want to instruct spiders not to download contact.php in your root directory, you would enter:
Disallow: /contact.php
You can also specify directories:
Disallow: /cgi-bin
This will block spiders from your cgi-bin directory.
Disallow: /wp-admin/
This will block spiders from your WordPress admin directory.
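To see how a bot would read rules like these, here is a small Python sketch that uses the standard library's urllib.robotparser module. The rule set is put together from the lines discussed above, with every path written from the site root; the helper name is_allowed and the test paths are my own for illustration. Keep in mind that real search engines may interpret edge cases a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Example rules assembled from the directives discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.php
Disallow: /cgi-bin
Disallow: /wp-admin/
"""


def is_allowed(user_agent, path, rules=ROBOTS_TXT):
    """Return True if the rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Illustrative paths only; substitute pages from your own site.
    for path in ("/", "/contact.php", "/cgi-bin/form.pl",
                 "/wp-admin/", "/blog/post.html"):
        verdict = "allowed" if is_allowed("Googlebot", path) else "blocked"
        print(path, verdict)
```

Because the rules sit under User-agent: *, they apply to Googlebot and to any other bot that has no section of its own.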
Common Questions about the Robots.txt File
Q: Where do I place the robots.txt file?
A: The file should be placed in the root directory of your server. In other words, it goes in the same place as the index.htm file for your home page.
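Put another way, bots look for the file at /robots.txt on your host name and nowhere else. A minimal Python sketch of that rule, using only the standard library (the function name robots_txt_url and the example.com address are just for illustration):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt location for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace everything else with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/post.html?page=2"))
```

So a robots.txt file tucked away in a subfolder such as /blog/ would not be picked up.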
Q: What are some things that I would want to exclude from the robots?
A: Here are a few things I would exclude:
- Any folders that are “off limits” to the public eye but are not password protected
- Images
- CGI-BIN
Creating your robots.txt file is not complicated. It can be made using Notepad or some other text editor. It’s super easy to make and will take less than five minutes if you follow these simple steps:
1. Copy and paste the robots.txt file from my site at http://www.denverseoguy.com/robots.txt
2. Browse to your web server folder and write down the folders you want to exclude
3. Open the robots.txt in Notepad
4. Modify the Disallow lines in my robots.txt to reflect your folders
5. Save the file and upload it to your server
6. Validate the file
7. If you need to make changes, do so and then repeat steps 4-6
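If you would rather not edit the file by hand, the steps above can be roughed out in Python. This is only a sketch under a few assumptions: the folder names are examples, the helper names build_robots_txt and blocked_paths are mine, and the check uses the standard library's urllib.robotparser, which is a quick sanity test and not a replacement for validating the file with the search engine's own tools.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(folders, user_agent="*"):
    """Build robots.txt text that disallows each folder for user_agent."""
    lines = ["User-agent: " + user_agent]
    for folder in folders:
        # Normalise to a root-relative path with a trailing slash,
        # e.g. "cgi-bin" and "/cgi-bin/" both become "/cgi-bin/".
        lines.append("Disallow: /" + folder.strip("/") + "/")
    return "\n".join(lines) + "\n"


def blocked_paths(robots_text, paths, user_agent="*"):
    """Return the paths that the given rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Example folder names; replace them with the ones you wrote down.
    text = build_robots_txt(["cgi-bin", "/wp-admin/", "images"])
    with open("robots.txt", "w", encoding="ascii", newline="\n") as handle:
        handle.write(text)
    print(text)
    print(blocked_paths(text, ["/images/logo.png", "/index.htm"]))
```

Run it, look over the robots.txt it writes, and then upload and validate the file as in steps 5 and 6.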
Note: There is a right and a wrong way to control the bots. Many times you may see the snippet of code below in the meta tags of the page itself. This is the wrong way:
<meta name="robots" content="noindex,nofollow">
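I see this a lot on sites that are giving search optimization advice. Now if you’re experienced in optimization then you should know that most reputable search engines overlook many meta tags, right? The tag also has to be repeated on every page, and a bot only sees it after it has already requested that page, whereas robots.txt covers the whole site from one file.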
For instructions on how to validate your robots.txt file, you can go to Google Webmaster/Site owner Help.
If you aren’t sure whether you can do this, please feel free to give me a shout and I’ll help you over the bumps.
Good luck!