Robots.txt for a WordPress Website
When you install WordPress, it comes with an auto-generated ‘robots.txt’ file, which is rather incomplete.
In my experience, this is the most appropriate robots.txt file (at a minimum) for a WordPress site installed directly in the root folder:
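As an illustration only, the sketch below shows one common shape for such a file for a root install. It is written as a Python string and checked with the standard library's urllib.robotparser, so you can see which paths end up blocked. The specific paths are assumptions, not a definitive list. The Allow lines come before the Disallow lines because Python's parser applies the first rule that matches, while Google applies the most specific (longest) match; in this order both reach the same decision.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt for WordPress installed in the web root.
# Allow lines come first so that first-match parsers (like Python's) and
# longest-match parsers (like Google's) reach the same decision.
ROOT_ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Allow: /wp-includes/css/
Allow: /wp-includes/js/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROOT_ROBOTS_TXT)
    for path in (
        "/hello-world/",
        "/wp-admin/options.php",
        "/wp-admin/admin-ajax.php",
        "/wp-includes/js/jquery/jquery.js",
        "/wp-includes/images/smilies/icon_smile.gif",
    ):
        print(path, "allowed" if rp.can_fetch("*", path) else "blocked")
```

Running it prints "allowed" or "blocked" for each sample path, which makes it easy to spot a rule that is broader than intended.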
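If WordPress is installed in a subfolder (for example /blog/), the robots.txt for the entire site must include WordPress-specific directives in addition to those for the rest of the site. Crawlers only read the robots.txt at the root of the domain, so a separate file placed inside /blog/ would be ignored.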
Under the general user-agent line (User-agent: *), add WordPress directives similar to the ones above, prefixed with the subfolder path (e.g. /blog/wp-admin/), alongside the directives that apply to the rest of the site:
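A small helper can generate that combined group. In this sketch the WordPress paths and the /cgi-bin/ rule standing in for "the rest of the site" are hypothetical examples. The point it illustrates is that every WordPress path gets the subfolder prefix, and everything stays in a single User-agent: * group, because some parsers (Python's included) only honor the first User-agent: * group they see.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical WordPress rules, written relative to the WordPress folder.
WP_ALLOW = ["/wp-admin/admin-ajax.php", "/wp-includes/css/", "/wp-includes/js/"]
WP_DISALLOW = ["/wp-admin/", "/wp-includes/", "/wp-content/plugins/"]


def site_robots_txt(site_disallow: list[str], wp_folder: str) -> str:
    """Return one 'User-agent: *' group holding the WordPress rules
    re-rooted under wp_folder (e.g. '/blog/') plus the site's own rules."""
    folder = wp_folder.strip("/")
    prefix = f"/{folder}" if folder else ""  # empty prefix for a root install
    lines = ["User-agent: *"]
    # Allow lines first, so first-match and longest-match parsers agree.
    lines += [f"Allow: {prefix}{path}" for path in WP_ALLOW]
    lines += [f"Disallow: {prefix}{path}" for path in WP_DISALLOW]
    lines += [f"Disallow: {path}" for path in site_disallow]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    robots_txt = site_robots_txt(site_disallow=["/cgi-bin/"], wp_folder="/blog/")
    print(robots_txt)
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # False: /blog/wp-admin/ is disallowed by the prefixed WordPress rule.
    print(parser.can_fetch("*", "/blog/wp-admin/edit.php"))
```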
Note: there may be CSS or JS files in other folders (e.g. in a plugin folder) that need to be allowed as well, since search engines fetch them to render your pages. Which ones depends on your installation.
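One way to find such files is to list the stylesheets and scripts a page actually loads and test each one against your rules. The sketch below does that with Python's standard html.parser and urllib.robotparser. It assumes you already have the page HTML and that the assets are served from the same host (files on another domain are governed by that domain's own robots.txt); the sample rules and the "gallery" plugin path are made up.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class AssetCollector(HTMLParser):
    """Collect the URLs of external stylesheets and scripts referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "script" and attr.get("src"):
            self.assets.append(attr["src"])
        elif tag == "link" and "stylesheet" in rel and attr.get("href"):
            self.assets.append(attr["href"])


def blocked_assets(page_html: str, robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return the CSS/JS URLs in page_html that robots_txt disallows for agent.
    Assumes every asset is served from the same host as the page."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    collector = AssetCollector()
    collector.feed(page_html)
    return [url for url in collector.assets if not rules.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nAllow: /wp-includes/js/\nDisallow: /wp-includes/\nDisallow: /wp-content/plugins/\n"
    page = (
        '<link rel="stylesheet" href="/wp-content/plugins/gallery/style.css?ver=1.2" />'
        '<script src="/wp-includes/js/jquery/jquery.js"></script>'
    )
    print(blocked_assets(page, robots))
```

Any URL it reports is a candidate for an extra Allow line.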
In addition, tag, category, archive and author pages need a robots “noindex” meta tag: at best they are lists of links to posts, and at worst they are duplicate content when all or part of each post is listed as well.
This is easily done with the All in One SEO Pack plugin (or a similar one), configured to add the robots “noindex” meta tag to those types of URLs.
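Once the plugin is configured, it is worth spot-checking the output. The sketch below is a hypothetical audit helper: it recognises archive URLs by WordPress's default tag, category and author bases plus date archives (if your site uses different bases, adjust the pattern), and reports whether a page's HTML carries a robots meta tag containing noindex, such as <meta name="robots" content="noindex,follow" />.

```python
import re
from html.parser import HTMLParser

# Default WordPress permalink bases (an assumption; adjust for your site).
# Date archives are /YYYY/, /YYYY/MM/ or /YYYY/MM/DD/, with optional page/N/.
ARCHIVE_PATH = re.compile(
    r"^/(?:tag|category|author)/"
    r"|^/\d{4}/(?:\d{2}/(?:\d{2}/)?)?(?:page/\d+/)?$"
)


class RobotsMetaReader(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())


def needs_noindex(path: str) -> bool:
    """True for tag, category, author and date archive URLs."""
    return ARCHIVE_PATH.search(path) is not None


def has_noindex(page_html: str) -> bool:
    """True if the page has a robots meta tag that includes noindex."""
    reader = RobotsMetaReader()
    reader.feed(page_html)
    return "noindex" in reader.directives


def missing_noindex(path: str, page_html: str) -> bool:
    """True when an archive page lacks the robots noindex meta tag."""
    return needs_noindex(path) and not has_noindex(page_html)
```

Note that the date pattern deliberately stops short of post URLs such as /2014/01/my-post/, which should stay indexable.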
Attachment pages are another set of pages that need to be noindexed. Short of manually modifying the script that creates the attachment page (and keeping that change up to date with every new version of WordPress or of your theme), there is no simple way to do this. The best policy is still not to create attachment pages in the first place: be careful how you insert your images and watch what you link them to, for example linking an image to the media file itself, or to nothing, rather than to its attachment page.
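To keep an eye on what your images link to, you could scan a post's HTML for images wrapped in links that do not point at an image file, which in WordPress usually means an attachment page (with plain permalinks, a ?attachment_id= URL). This is a rough, hypothetical checker built on the standard html.parser, and the list of image extensions is an assumption. It will also flag images you deliberately link to other posts, so treat its output as a list to review rather than a list of errors.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed set of image file extensions; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


class ImageLinkAuditor(HTMLParser):
    """Record hrefs of <a> elements that wrap an <img> but do not point at an image file."""

    def __init__(self) -> None:
        super().__init__()
        self._href = None  # href of the <a> element currently open, if any
        self.suspect_links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a":
            self._href = attr.get("href")
        elif tag == "img" and self._href and self._href not in self.suspect_links:
            # Compare only the URL path, so query strings don't hide the extension.
            if not urlparse(self._href).path.lower().endswith(IMAGE_EXTENSIONS):
                self.suspect_links.append(self._href)

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def image_links_to_review(post_html: str) -> list[str]:
    """Return links around images that likely lead to attachment pages."""
    auditor = ImageLinkAuditor()
    auditor.feed(post_html)
    return auditor.suspect_links
```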