A Side Note On WordPress, SEO, sitemap.xml and robots.txt

Last week, I learned that WordPress doesn’t ship with a default robots.txt.

  • This is the file that search engine crawlers read to see which resources and URL patterns they are allowed and not allowed to crawl; it’s step 1 in every search engine optimization (SEO) guide.

I guess I just stupidly assumed that it was included in WP. Anyway, I thought it only fair to tell everyone: if you are using WordPress and you care how your site shows up in search results, you should generate a robots.txt and a sitemap.xml.

Robots.txt?

Know that it’s important for search engines. Read this:

** NOTE: Not all web crawlers are guaranteed to read example.com/robots.txt; it serves as a guideline.

I Feel Dumb…

I feel like an idiot, and I should. The other day I just happened to search for “engfers” on Google, and the result that came back was my site with an indented sub-result that was some error from a file in the WP-Super-Cache plugin. I thought to myself, why is the plugins/ directory being crawled?

Needless to say, I shortly thereafter found Google’s Webmaster Tools to help rectify my situation. It’s a pretty nice web-app that allows you to remove content from Google’s search (which I then used).

I also noticed that the webmaster tools had sections for analyzing your robots.txt and sitemap.xml. Well, I was surprised to find out that this site didn’t have a robots.txt.

Most of you are probably thinking that I’m an idiot because that’s SEO 101. Well yes, it is; however, I didn’t realize that WordPress doesn’t ship with a default robots.txt! Don’t ask me why I didn’t see that before, because I don’t know. Nevertheless, I think WP should ship with a robots.txt that AT LEAST keeps plugins/ and wp-includes/ from being crawled.

Our Shiny, New robots.txt

There seem to be a billion and one SEO blogs out there; however, I was looking specifically for resources on a robots.txt optimized for WordPress.

I found a couple of articles and examples at askapache.com and an example from the WordPress.org Codex.

The final version of our robots.txt (http://www.engfers.com/robots.txt) was pulled from the WordPress Codex page.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

# Sitemap
Sitemap: http://www.engfers.com/sitemap.xml

**NOTE: This file must be at the ROOT of your web server!
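
If you want to sanity-check rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. It only uses a trimmed subset of the file above: that module does plain prefix matching, so the wildcard lines (such as /*?*) are left out and would need a different tool to test. The helper names build_parser and is_allowed, and the sample paths, are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Trimmed subset of the robots.txt above. Wildcard rules are omitted on
# purpose: urllib.robotparser only does simple prefix matching.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Allow: /wp-content/uploads

User-agent: ia_archiver
Disallow: /
"""


def build_parser(rules_text):
    """Parse robots.txt text held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser, user_agent, path):
    """Return True if user_agent may fetch path under the parsed rules."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    parser = build_parser(RULES)
    # Sample paths are illustrative, not real URLs on this site.
    checks = [
        ("Googlebot", "/wp-admin/options.php"),
        ("Googlebot", "/wp-content/uploads/photo.jpg"),
        ("Googlebot", "/2009/05/some-post/"),
        ("ia_archiver", "/2009/05/some-post/"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(parser, agent, path))
```

Keep in mind that this only tells you how Python's parser reads the rules; real crawlers may interpret the same file a little differently.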

Final Note: sitemap.xml

The big-daddy search engines like Google, Yahoo, Microsoft, etc. use your site’s sitemap.xml (example.com/sitemap.xml) to make it easier to crawl your website. It’s also a very important part of SEO; just do a bit of searching on it.

The final line in our robots.txt points to the sitemap:

Sitemap: http://www.engfers.com/sitemap.xml

For WordPress, use a plugin like the Google Sitemap Generator to have it automatically generate the sitemap for you.

+1: It will automatically regenerate the sitemap.xml whenever you publish a new article or page or edit an existing one. =)
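
If you are curious what a sitemap.xml actually contains, here is a rough sketch in Python that builds a tiny one by hand. This is not how the plugin works internally (I have not looked); it just shows the basic urlset/url/loc/lastmod layout from the sitemaps.org format. The function name build_sitemap and the example.com URLs and dates are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a minimal sitemap from (url, last_modified_date) pairs.

    Hypothetical helper for illustration; returns the XML as a string.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder pages; a real sitemap lists your own posts and pages.
    pages = [
        ("http://example.com/", "2009-05-01"),
        ("http://example.com/2009/05/some-post/", "2009-05-01"),
    ]
    print(build_sitemap(pages))
```

In practice you would let the plugin do this, since it also keeps the file up to date as you publish.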

7 Responses to A Side Note On WordPress, SEO, sitemap.xml and robots.txt

  1. Great article engfer! Most people and bloggers have never heard about robots.txt files, and that isn’t good for anyone.

    The newer WordPress versions show a default robots.txt “file” by using internal rewrites, which IMHO is not nearly as good as using an actual file. Keep it up..

  2. Great post, thank you. I really had no idea about how I should be using robots.txt files with WordPress, just assumed they had it done the best way for me.

  3. Hi. Your site displays incorrectly in Firefox, but content excellent! Thanks for your wise words :)

  4. Finally someone who can write a good blog ! . This is the kind of information that is useful to those want to increase their SERP’s. I loved your post and will be telling others about it. Subscribing to your RSS feed now. Thanks

  5. i have been using WordPress for 2 years but i still dont know how to do SEO using WordPress, is there an SEO pluggin for WordPress?.

  6. Nice looking blog, might I ask you what template you are running and how much it costs? I’ve been using cheap ones but can’t find one that I actually like.

  7. 3column2k — it’s free. but it’s old and probably needs to be updated for WP 2.8
