Is your robots.txt file confusing you? Here is a great tool to validate its format!

I found this robots.txt validator while trying to figure out why my sites were getting crawled like crazy! I run three sites on one shared hosting account, and it turns out that the robots.txt file has to be in the root directory of the host. For example, say your default site is domain.bla, which points to the default root directory / (public_html/). If you also have other sites inside that root directory, such as public_html/domain2.bla/, the robots.txt file in the domain2.bla folder is ignored because it is not in the root directory. So you have to customize the robots.txt in the root directory to cover all of the sites under it.
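To make the root-directory approach concrete, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` to sanity-check a combined root robots.txt before uploading it. This is not the online validator mentioned above, just a local check. The rules, the folder names `/domain2.bla/` and `/domain3.bla/`, the `ExampleBot` user agent, and the `Crawl-delay` value are all hypothetical placeholders, not my actual file.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Hypothetical root robots.txt (public_html/robots.txt) for domain.bla.
# /domain2.bla/ and /domain3.bla/ are placeholder folders standing in for
# the other sites stored under the same root directory.
ROOT_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /domain2.bla/
Disallow: /domain3.bla/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROOT_ROBOTS_TXT)
    # can_fetch() only looks at the URL path, so these show which paths
    # under the root directory the rules allow or block.
    for url in (
        "http://domain.bla/index.html",
        "http://domain.bla/domain2.bla/page.html",
    ):
        print(url, "allowed" if rules.can_fetch("ExampleBot", url) else "blocked")
    # Crawl-delay value picked up from the "User-agent: *" group.
    print("crawl delay:", rules.crawl_delay("ExampleBot"))
```

Keep in mind that Crawl-delay is a nonstandard directive and some crawlers, including Googlebot, ignore it, so the Disallow rules are the more dependable part. Python's parser also applies the first matching rule in file order, which may not match how every search engine resolves overlapping rules.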

Doing this has greatly reduced how heavily my sites get crawled.

May God bless!


UPDATE!!! The author of the robots.txt validator has created a new tool. See here:


