The Ultimate WordPress Htaccess File?

Canonicalization is a serious problem for webmasters; just read this latest entry by Matt Cutts or this great post from John Andrews. However, telling webmasters that they should fix these issues isn’t enough: webmasters and bloggers need solutions.

In Search of the Ultimate Htaccess file

A couple of months ago Alister Cameron posted a simple .htaccess solution that handles the www and non-www versions of your URLs without needing a plugin.

At the time I suggested a couple of improvements, and mentioned I would post about it here on my blog, hopefully to help develop what could be looked on as the “Ultimate” .htaccess file for WordPress: something you could just drop in your root folder and be done with.
My motivation was running multiple niche websites using WordPress as a CMS, so I really wanted to avoid anything that would make the content look dated.

I am not an htaccess guru, and this is all cobbled together from code suggested by other people in various places.
Before using any of this code, make a backup of your existing .htaccess, and be prepared to copy it back if testing proves something is broken.

Let’s start off with the default .htaccess for WordPress once you turn on mod_rewrite for SEO-friendly URLs:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
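
For readers new to mod_rewrite, here is the same default block again with a comment on each directive. The comments are an interpretation added for illustration and are not part of what WordPress writes. As far as I know, WordPress manages whatever sits between the BEGIN and END markers and may regenerate it when permalink settings are saved, so custom rules are probably safest kept outside the markers.

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
# Only apply these rules when mod_rewrite is loaded
RewriteEngine On
# Treat the site root as the base for relative substitutions
RewriteBase /
# Skip requests that map to a real file ...
RewriteCond %{REQUEST_FILENAME} !-f
# ... or to a real directory
RewriteCond %{REQUEST_FILENAME} !-d
# Hand everything else to WordPress and stop processing rules for this pass
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```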

The first thing we want to do is get rid of the www if someone uses it. I know there are two schools of thought on whether URLs should have www by default or not. I prefer without, and never type www unless I can’t access a site without it (broken htaccess).

Secondly, we also want to get rid of trailing slash problems.

The base rules that Alister first suggested were:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.alistercameron\.com$ [NC]
RewriteRule ^(.*)$ http://www.alistercameron.com/$1 [R=301,L]

RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]
</IfModule>

However, we want this to be the Ultimate htaccess code, so we don’t want to have to enter the domain name. I am not sure whether this will work if you have multiple blogs in sub-folders.

In this code we are using HTTP_HOST rather than adding a URL manually to every .htaccess file you create. If you are setting up 50 blogs (niche marketers do things like this, and fill them with unique original content – not everyone creates splogs) then being able to use one default file is a major advantage.

# If subdomain www exists, remove it first
RewriteCond %{HTTP_HOST} ^www\.([^\.]+\.[^\.]+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
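
One limitation worth knowing: the pattern above only matches hosts with exactly two labels after the www, so www.example.com is redirected but www.example.co.uk is not. The sketch below is one possible way around that, plus a commented-out counterpart for people who prefer to keep the www. The hostnames in the comments are placeholders, and neither variant has been tested across hosting setups, so treat both as starting points.

```apache
# Variant A: strip www from any host, however many labels follow it
# (assumption: you never want a leading "www." on any hostname served here)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# Variant B: the opposite preference, add www when it is missing.
# Use this instead of Variant A, never together with it.
# Note it would also turn blog.example.com into www.blog.example.com.
# RewriteCond %{HTTP_HOST} !^www\. [NC]
# RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Both variants still avoid hard-coding the domain, which is the point of using HTTP_HOST in the first place.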

Can we improve on the trailing slashes code?

Possibly…

A while ago I was also reading a post over on Aaron Wall’s SEO Book blog. Within the comments were suggested improvements to the code Aaron had posted.
Finding the exact reference is a problem, as it wasn’t on this thread.

Searching on a phrase in the code these days only brings up a reference on Alister’s blog where I mentioned it in the comments, so I have no idea who to attribute this htaccess code to.

# If requested resource does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# and does not end with a period followed by a filetype
RewriteCond %{REQUEST_URI} !\..+$
# and does not end with a slash
RewriteCond %{REQUEST_URI} !/$
# then add a trailing slash and redirect
RewriteRule (.*) $1/ [R=301,L]
</IfModule>
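
A possible tightening of the trailing slash rule, offered as a sketch rather than a tested drop-in. It assumes the blog lives at the domain root and that a "filetype" means a short alphanumeric extension; both are assumptions made for illustration. The extension test escapes the dot and limits what may follow it, so a URL such as /2007.05/some-post (a hypothetical example) is still given a slash, while /files/report.zip is left alone.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# If requested resource does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# and the URL does not end in a short extension such as .zip or .html
RewriteCond %{REQUEST_URI} !\.[a-zA-Z0-9]{1,5}$
# and does not already end with a slash
RewriteCond %{REQUEST_URI} !/$
# then redirect to the same path with a trailing slash,
# using a root-relative target so the result does not depend on RewriteBase
RewriteRule ^(.*)$ /$1/ [R=301,L]
</IfModule>
```

The root-relative target is a design choice: a relative substitution combined with a redirect can depend on RewriteBase being set somewhere in the same .htaccess file, and writing /$1/ sidesteps that for a blog installed at the root. A blog in a sub-folder would need the folder name added to the target.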

I am not an htaccess guru, but this seems to take into account more potential situations, such as files to download.

If you put all this code together you end up with something like this:

<IfModule mod_rewrite.c>
RewriteEngine On
# If subdomain www exists, remove it first
RewriteCond %{HTTP_HOST} ^www\.([^\.]+\.[^\.]+)$ [NC]
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]

# If requested resource does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# and does not end with a period followed by a filetype
RewriteCond %{REQUEST_URI} !\..+$
# and does not end with a slash
RewriteCond %{REQUEST_URI} !/$
# then add a trailing slash and redirect
RewriteRule (.*) $1/ [R=301,L]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

So can anyone offer any constructive improvements?

If you offer improvements, please provide code samples and explain exactly why each one is an improvement so people can learn from it (as I said, I am a newbie at this).
Code can be entered using code tags in square brackets.


Comments

  1. says

    In my experience the use of [L,QSA] is often a life saver. It might not mean much, but think of the number of times a cruftless URL gets a ?foo=bar added by a plugin (such as a multipage post) and then doesn’t work…

    The solution is QSA (Query String Append). If something is going to insist on passing data by query string then you need to hand it over.

    Furthermore, query string handling is native to PHP via $_GET[], so passing the “fake” path data to the script means that PHP does not need to work so hard.

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?virtualpath=$1 [L,QSA]

    I have no idea if your platform can cope with this but one assumes it can. This worked with NucleusCMS – YMMV.

  2. says

    Andy

    Is there a reason why there are 2 blocks of code in this if you are using WordPress (as I am)?

    Also, I want the default to include the “www.” subdomain, as it is the norm for people to think of (how many business cards have you seen without it in the company web address?). It might be a good idea to include this in the code as comments for people like me who are not comfortable with messing with a .htaccess file (hint!)

    [BTW this is a 2nd posting as my original appeared to be truncated when I used a double quote mark. Is this a bug in the blog software?]

  3. says

    I am not sure what you are referring to. There are 5 blocks of code on the page that seem to display OK in both FF2.0+ and IE7

    The final block of code is the one I use all the time for my blogs now, although I did just hit a problem with my host deleting it for some reason, maybe by accident.

    It took me 30 seconds to fix the problem once diagnosed, because I only have one .htaccess file to worry about, for all of my sites.

  4. says

    I’m having problems posting comments for some reason when quoting the code so I’ll use the line numbers ;)

    Block 1, as I see it, is lines 1 to 15 and block 2 is 17 to the end. I was thinking of combining them for neatness.

    Also how do you force the www. to be prepended?

  5. says

    If you use WWW it takes up more space – someone can type in www and it will work, and just redirect to the shorter version.

    Alister has some usable code listed on his page. It doesn’t automatically set the host, but if you are not running lots of blogs, that isn’t a major issue.

  6. says

    Nice post. Here is what I am using to fight against canonicalization problems in WordPress (thanks to http://www.jimwestergreen.com for the code)

    Options +Indexes
    Options +FollowSymLinks
    RewriteEngine on
    RewriteCond %{HTTP_HOST} ^myblog\.com
    RewriteRule ^(.*)$ http://www.myblog.com/$1 [R=permanent,L]
    	
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_URI} !(.*)/$
    RewriteRule ^(.*)$ http://www.myblog.com/$1/ [L,R=301]

  7. says

    # If subdomain www exists, remove it first
    RewriteCond %{HTTP_HOST} ^www\.([^\.]+\.[^\.]+)$ [NC]
    RewriteRule ^(.*)$ http://%1/$1 [R=301,L] 

    The code above is for no-www.
    How do I code for yes-www? (I’m a newbie :P )

    I tried some code but it requires entering the URL.

  8. Vacation Rentals says

    Just curious why you want to get rid of the trailing slash? Trailing slashes are definitely the way to go. It takes longer for a server to render a page with them off. It’s one more iteration it has to go thru…

    • says

      now reworded – the code gets rid of trailing slash problems

      You will notice I don’t use trailing slashes on this site for canonical permalinks, because I have a specified document type.
      If you don’t have a specified document type, with folders being used as a permalink URL, then you would use a trailing slash – that works too.

  9. joomla guy says

    I’ve been racking my brain trying to figure out the best way to set up my htaccess file for about a week now, and nothing seemed to bear fruit. Just tried your recommended approach as outlined above and voilà, it works like a charm :)

    Thank you!

  10. rgopinath says

    Hi, I am cracking my head over here.
    I am still not able to set permalinks. I am using the Plesk control panel.

  11. Nokia 5530 Blog says

    I have the following .htaccess file, working fine at the site root:

    RewriteEngine On
    #if the requested filename does not exist (as a file or directory), then assume CAT_NAME_HERE/SUB_CAT_HERE
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^([^/]+)/([^/]+)/?$ /index.php?c=$1&area=$2 [L,QSA]
    RewriteRule ^([^/]+)/([^/]+)/([^/]+)/([^/]+)/?$ /index.php?c=$1&area=$2&d=$3&subarea=$4 [L,QSA]
    ###END################################…

    But now I have installed WordPress in a subfolder like http://www.website.com/worpress/ and with this .htaccess file the images in subfolders like /wp-admin/ do not display.

    Thank you for your time.
