Three .htaccess tips that can help your SEO

29th July 2011
Image Credit: lupinoduck

Many SEO issues come down to URL structure, URL patterns and / or the URL parameters in use. Since I’m currently restructuring websites for various clients, I thought I’d share some of the directives we use frequently and which make a lot of these problems go away. In addition to URL handling, I have also added some performance-related directives as well as a few access control tips.

What you should know up front: the following directives can only be used when running an Apache web server, no matter whether you’re on a Unix or a Windows box. However, there are similar modules for other web servers such as Microsoft’s IIS, lighttpd and nginx. You should also keep in mind that these directives can slow down your web server. Depending on the number of parallel requests coming in, a long list of rules to process for each request can bring your web server to its knees. So please use this stuff with care!

Talking about performance: these directives are usually placed in Apache’s so-called .htaccess (short for hypertext access) files. However, this can slow things down, because for each request the web server has to check for an .htaccess file in the requested directory and in every directory above it. So whenever you have access to httpd.conf (Apache’s main configuration file), I’d strongly recommend putting your directives there. That configuration is read just once (at start-up) and only re-read when you explicitly ask for it (e.g. by restarting or reloading the server). You can then deactivate .htaccess processing completely (AllowOverride None), which results in fewer directory and disk reads per request.
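As a minimal sketch of the httpd.conf approach, assuming the site lives in /var/www/html (a placeholder path, use your own document root): AllowOverride None tells Apache to stop looking for .htaccess files in that directory tree, and the directives that would otherwise sit in .htaccess go directly into the Directory block.

```apache
# httpd.conf (server configuration, read once at start-up)
# /var/www/html is a placeholder for your document root.
<Directory "/var/www/html">
    # Do not look for .htaccess files in this directory tree
    AllowOverride None

    # Directives that would otherwise live in .htaccess go here, e.g.:
    Redirect 301 /old/url.html http://www.domain.com/new/url.html
</Directory>
```

Keep in mind that changes to httpd.conf only take effect after Apache has been restarted or reloaded.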

And one final note: some directives can’t be used in .htaccess but only within httpd.conf. This depends on the directive itself (each one is only valid in certain contexts) and on the server’s AllowOverride setting. If a directive does not seem to work in your .htaccess file, try it in httpd.conf if you can. However, let’s get down to the real stuff, shall we?

1. Redirects

Yeah, what else, right? A lot of URL structure work is simply redirecting from A to B, for example when restructuring a site. Or maybe the content no longer exists but you want to keep inbound links and rankings and therefore decide to redirect. Good choice!

The easiest way to redirect from “old/url.html” to “new/url.html” would look like this:

Redirect 301 /old/url.html http://www.domain.com/new/url.html

Or let’s say, for whatever reason, you want to move away from “.html” as a file extension but use folders instead. You can still do that with a single line using the RedirectMatch directive:

RedirectMatch 301 ^/([a-zA-Z0-9]+)\.html$ http://www.domain.com/$1/

Sure, it looks a little more complex, but it’s just a simple pattern. Keep in mind that the example above only redirects HTML files in the root directory and only accepts file names consisting of letters (upper and lower case) and digits.
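If the HTML files also live in subdirectories, the pattern can be widened. The following is only a sketch under the assumption that your paths consist of letters, digits, slashes, underscores and hyphens; adjust the character class to whatever your URLs really contain.

```apache
# Sketch: also match .html files in subdirectories.
# The character class additionally allows "/", "_" and "-";
# "\." matches a literal dot instead of "any character".
RedirectMatch 301 ^/([a-zA-Z0-9/_-]+)\.html$ http://www.domain.com/$1/
```

With this rule, /products/shoes.html would be redirected to /products/shoes/. Be aware that it also catches /index.html (redirecting to /index/), so test it against your real URLs first.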

RedirectMatch is quite a powerful tool to play around with, however – sometimes you’d want to go one step further. For example when solving www vs. non-www issues. This would require the use of Apache’s mod_rewrite module. The necessary code would look like this:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
</IfModule>

The first and last lines ensure that the directives are only executed if the necessary module is present and loaded (you’ll get some not-so-nice errors otherwise). Line two activates Apache’s rewrite engine. Line three contains the condition (in this case: the hostname is anything other than www.domain.com) under which line four is executed. Line four then performs the redirect to the same path on the www hostname.
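If you prefer the version without www as your canonical hostname, the same idea works in the opposite direction. This is a sketch using the same placeholder domain as above; use only one of the two directions, otherwise the rules redirect to each other endlessly.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Only act on requests for the www hostname (case-insensitive)
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
# Redirect to the same path on the bare domain
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]
</IfModule>
```

Like the rule above, this assumes it is placed in an .htaccess file, where the matched path has no leading slash. In the main server configuration the path starts with a slash, so the target would need to be http://domain.com$1 instead.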

Please note: all rules explicitly request an HTTP 301 (permanent) redirect. If you don’t specify a status code, Apache performs an HTTP 302 (temporary) redirect, which is really bad for your SEO. So always, always specify the desired status code explicitly.

Another thing to keep in mind: so-called redirect chaining is a bad idea. Make sure you redirect directly from A to B, and not, for example, from A to B where B then redirects on to C. At some point search engines simply stop following those redirects. To prevent that, just take “the shortest way possible”.
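To illustrate the difference, here is a small sketch with three hypothetical URLs (a.html, b.html and c.html are made up for this example). The chained variant is commented out; the direct variant sends every old URL straight to the final destination.

```apache
# Chained (avoid): /a.html -> /b.html -> /c.html
# Redirect 301 /a.html http://www.domain.com/b.html
# Redirect 301 /b.html http://www.domain.com/c.html

# Direct: every old URL points straight at the final target
Redirect 301 /a.html http://www.domain.com/c.html
Redirect 301 /b.html http://www.domain.com/c.html
```

Whenever you add a new redirect, it is worth checking whether existing rules already point at the URL you are moving, and updating them as well.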

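And a quick tip for debugging: I really recommend using the RewriteLog feature when setting up new rules, it makes things so much easier! Note that RewriteLog only works in httpd.conf (not in .htaccess) and only up to Apache 2.2; from Apache 2.4 on, rewrite logging is configured through the LogLevel directive instead.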
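A minimal sketch of how rewrite logging can be switched on, assuming a log path of /var/log/apache2/rewrite.log (a placeholder, pick a location the server can write to). Both directives belong in the server or virtual host configuration, not in .htaccess.

```apache
# Apache 2.2 and earlier (server or virtual host configuration only):
RewriteLog "/var/log/apache2/rewrite.log"
# 0 disables logging, higher values log more detail
RewriteLogLevel 3

# Apache 2.4 and later: RewriteLog no longer exists; use LogLevel instead.
# Rewrite messages are then written to the regular error log.
# LogLevel alert rewrite:trace3
```

Higher log levels slow the server down noticeably, so switch logging off again once the rules work.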

2. Performance

A while ago I wrote a post on performance optimization and what you can do to improve your website’s performance, for users and / or for search engines. Some of those measures are taken on the server side.

Before we start, please make sure the necessary modules are loaded. Open httpd.conf: there need to be two lines looking like this, and they must not be commented out (i.e. they must NOT start with “#”):

LoadModule deflate_module modules/mod_deflate.so
LoadModule expires_module modules/mod_expires.so

To activate expires headers, use this:

FileETag MTime Size
ExpiresActive on
ExpiresDefault "access plus 86400 seconds"

This causes every file to be cached by the browser for 24 hours after it was requested. However, in most cases you want to be more specific and define the expiration per file type, for example different expiration times for images and CSS files:

ExpiresByType image/jpeg "access plus 604800 seconds"
ExpiresByType text/css "access plus 259200 seconds"

It’s plain and simple: just specify for how many seconds (counted from the time of access) each file type should be cached. To double-check that this works, look at the response headers: there should be an Expires header and a Cache-Control header whose max-age matches the number of seconds you configured.
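As a side note on the expiry rules, mod_expires also accepts durations in more readable units. The following sketch expresses the same values as the second-based rules (86400 seconds = 1 day, 604800 seconds = 1 week, 259200 seconds = 3 days) and wraps them in an IfModule block so that they are skipped if the module is not loaded.

```apache
<IfModule mod_expires.c>
    ExpiresActive on
    # Fallback for all types without their own rule: 1 day
    ExpiresDefault "access plus 1 day"
    # Same durations as the second-based rules
    ExpiresByType image/jpeg "access plus 1 week"
    ExpiresByType text/css "access plus 3 days"
</IfModule>
```

Which form you use is a matter of taste; the readable one is simply harder to get wrong when you change the values later.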

Another way to improve performance is to enable GZIP compression. This just means files will be compressed (aka gzipped) before they are sent to the browser. To do so, just set up an Apache output filter like this one:

AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

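Copy, paste and you’re done. Again, check the response headers: for a request that sends Accept-Encoding: gzip, the response should contain Content-Encoding: gzip and Vary: Accept-Encoding.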
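As a variation on the compression setup, the filter can be wrapped in an IfModule block so that the configuration still loads if mod_deflate is missing. This is only a sketch; the two extra MIME types (application/javascript and application/json) are additions that are not part of the list above.

```apache
<IfModule mod_deflate.c>
    # Compress common text-based responses
    AddOutputFilterByType DEFLATE text/html text/plain text/css
    # Additional types, not in the original list (assumption: your site serves them)
    AddOutputFilterByType DEFLATE application/javascript application/json
</IfModule>
```

On Apache 2.4 the AddOutputFilterByType directive is provided by mod_filter, so that module needs to be loaded as well.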

3. Controlling access, browsing & more

Most of the time you probably don’t want everybody to see what’s on your server. Directory listings should usually be turned off by default; if they are not, use this to disable them:

Options -Indexes

Sometimes you might find yourself in a situation where the website’s default file is not named “index.html” but perhaps “welcome.html”, “main.htm” or similar. In that case, requesting the root directory of the website won’t return a page, so you need to set up another file to be used as the default:

DirectoryIndex welcome.html
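DirectoryIndex also accepts several file names, which helps when different directories use different default files. A short sketch, using the file names mentioned above as examples:

```apache
# Try each name in order; the first file that exists is served
DirectoryIndex welcome.html main.htm index.html
```

Keep the list short, since Apache may have to check for each of these files on every directory request.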

To trigger a password prompt, for example to keep search engines (and users) out of a development environment, use the following directives:

AuthType Basic
AuthName "development area"
AuthUserFile /your/path/to/.htpasswd
AuthGroupFile /dev/null
Require valid-user

Keep in mind that you have to set up a user / password combination and put it into the .htpasswd file defined above. Here is a generator for this.
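If the .htpasswd file contains several accounts but only some of them should get in, access can be limited to named users instead of any valid user. A sketch, where “alice” and “bob” are placeholder account names that must exist in your .htpasswd file:

```apache
AuthType Basic
AuthName "development area"
AuthUserFile /your/path/to/.htpasswd
# Only these accounts may log in; "alice" and "bob" are placeholder names
Require user alice bob
```

All other accounts in the file are rejected, even with a correct password.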

Well, I hope you find one tip or another helpful. And please feel free to share your own tips in the comments.

Written By
Bastian Grimm is founder and CEO of Grimm Digital. He mainly works as online marketing consultant with a strong focus on organic search engine optimization (SEO). Grimm specializes in SEO strategy consulting, website assessments as well as large scale link building campaigns.