A Side Note On WordPress, SEO, sitemap.xml and robots.txt

Last week, I learned that WordPress doesn’t ship with a default robots.txt.

  • This is the file that search engine crawlers parse to see which resources and URL patterns they are allowed, and not allowed, to crawl; it’s step 1 in every search engine optimization (SEO) guide.

I guess I just stupidly assumed that it was included in WP. Anyway, I thought it only fair to tell everyone: if you are using WordPress and you care how your site shows up in search results, you should generate a robots.txt and a sitemap.xml.

Robots.txt?

Know that it’s important for search engines. Read this:

** NOTE: Not all web crawlers are guaranteed to read example.com/robots.txt; it serves as a guideline.

I Feel Dumb…

I feel like an idiot, and I should. The other day I just happened to search for “engfers” on Google, and the result that came back was my site with an indented sub-result that was some error from a file in the WP-Super-Cache plugin. I thought to myself, why is the plugins/ directory being crawled?

Needless to say, I shortly thereafter found Google’s Webmaster Tools to help rectify my situation. It’s a pretty nice web-app that allows you to remove content from Google’s search (which I then used).

I also noticed that the webmaster tools had sections for analyzing your robots.txt and sitemap.xml. Well, I was surprised to find out that this site didn’t have a robots.txt.

Most of you are probably thinking that I’m an idiot because that’s SEO 101. Well yes, it is; however, I didn’t realize that WordPress doesn’t ship with a default robots.txt! Don’t ask me why I didn’t see that before, because I don’t know. Nevertheless, I think WP should ship with a robots.txt that AT LEAST keeps plugins/ and wp-includes/ from being crawled.

Our Shiny, New robots.txt

There seem to be a billion and one SEO blogs out there; however, I was looking specifically for a robots.txt optimized for WordPress.

I found a couple of articles and examples at askapache.com and an example from the WordPress.org Codex.

The final version of our robots.txt (http://www.engfers.com/robots.txt) was pulled from the WordPress Codex page.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google Image
User-agent: Googlebot-Image
Disallow:
Allow: /*

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# Internet Archiver Wayback Machine
User-agent: ia_archiver
Disallow: /

# digg mirror
User-agent: duggmirror
Disallow: /

# Sitemap
Sitemap: http://www.engfers.com/sitemap.xml

**NOTE: This file must be at the ROOT of your web server!
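
If you want to sanity-check rules like these before uploading them, here is a rough sketch using Python’s standard urllib.robotparser module. It only covers a subset of the file above: as far as I know, that parser does plain prefix matching, so I left out the wildcard lines (like */feed), which it would not interpret the way Google does. The crawler name "ExampleBot" and the test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A subset of the rules above. Wildcard lines such as "Disallow: */feed" are
# left out on purpose: this parser is assumed to do prefix matching only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Allow: /wp-content/uploads

User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, path):
    """Return True if the rules let this crawler fetch the path."""
    return parser.can_fetch(user_agent, path)


# "ExampleBot" is a hypothetical crawler; the paths are illustrative only.
checks = [
    ("ExampleBot", "/wp-content/plugins/wp-super-cache/error.php"),
    ("ExampleBot", "/wp-content/uploads/photo.png"),
    ("ia_archiver", "/"),
]
for agent, path in checks:
    verdict = "allowed" if allowed(agent, path) else "blocked"
    print(agent, path, verdict)
```

Remember that this only tells you what a well-behaved crawler should do; nothing forces a crawler to obey the file.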

Final Note: sitemap.xml

The big-daddy search engines like Google, Yahoo, Microsoft, etc. use your site’s sitemap.xml (example.com/sitemap.xml) to make it easier to crawl your website. It’s also a very important point of SEO; just do a bit of searching on it.

The final line in our robots.txt points to the sitemap:

Sitemap: http://www.engfers.com/sitemap.xml

For WordPress, use a plugin like the Google Sitemap Generator to automatically generate the sitemap for you.

+1 = Moreover, it will automatically regenerate the sitemap.xml whenever you publish a new article or page or edit an existing one. =)
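
If you are curious what ends up inside a sitemap.xml, here is a minimal sketch in Python that builds one by hand. This is not how the plugin does it; it just shows the basic shape of the file (a urlset containing url entries, each with a loc and an optional lastmod) as defined by the sitemaps.org protocol. The example.com URLs, the dates, and the build_sitemap helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap string from (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real sitemap would list your own posts and pages.
sitemap_xml = build_sitemap([
    ("http://example.com/", "2009-06-01"),
    ("http://example.com/about/", "2009-05-20"),
])
print(sitemap_xml)
```

A plugin saves you from doing this yourself, and from remembering to redo it every time the site changes.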


7 Responses to “A Side Note On WordPress, SEO, sitemap.xml and robots.txt”


  1. AskApache

    Great article engfer! Most people and bloggers have never heard about robots.txt files, and that isn’t good for anyone.

    The newer WordPress versions show a default robots.txt “file” by using internal rewrites, which IMHO is not nearly as good as using an actual file. Keep it up..

  2. Market Leverage

    Great post, thank you. I really had no idea about how I should be using robots.txt files with WordPress, just assumed they had it done the best way for me.

  3. domains

    Hi. Your site displays incorrectly in Firefox, but content excellent! Thanks for your wise words :)

  4. Check my Pagerank

    Finally someone who can write a good blog ! . This is the kind of information that is useful to those want to increase their SERP’s. I loved your post and will be telling others about it. Subscribing to your RSS feed now. Thanks

  5. Matt

    i have been using WordPress for 2 years but i still dont know how to do SEO using WordPress, is there an SEO pluggin for WordPress?.

  6. dvd on wii

    Nice looking blog, might I ask you what template you are running and how much it costs? I’ve been using cheap ones but can’t find one that I actually like.

  7. engfer

    3column2k — it’s free. but it’s old and probably needs to be updated for WP 2.8
