Three .htaccess tips that can help your SEO

29 July 2011 by Bastian Grimm


A lot of the time, SEO issues come down to URL structure, URL patterns and/or URL parameters. Since I’m currently restructuring websites for various clients, I thought I’d share some of the directives we use frequently and which make a lot of trouble just go away. In addition to URL handling, I’ve also added some performance-related directives as well as a few access control tips.

What you should know up front: the following directives can only be used when running an Apache web server, no matter whether you’re on a Unix or a Windows box. However, there are similar modules for other web servers like Microsoft’s IIS, lighttpd and nginx as well. You should also keep in mind that using these directives might slow down your web server. Depending on the number of parallel requests flying in, a long list of rules that has to be processed for every request can potentially kill your web server. So please make sure you really do use that stuff with care!

Talking about performance: usually the following directives are used within Apache’s so-called .htaccess files (short for hypertext access). However, doing so can slow things down, because for each request the web server has to check the directories along the requested path for an .htaccess file. So whenever you have access to httpd.conf (Apache’s main configuration file), I’d strongly recommend putting your directives there. That configuration is read just once, at start-up, and is only refreshed when you explicitly ask for it (e.g. by restarting or reloading the server). This can noticeably improve performance, because you can deactivate .htaccess processing completely, which results in fewer directory and disk reads.
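As a minimal sketch of what “deactivating .htaccess processing” can look like in httpd.conf: the directory path below is an assumption, so replace it with your own document root.

```apache
# httpd.conf (main server configuration)
# "/var/www/html" is an assumed document root; adjust it to your setup.
<Directory "/var/www/html">
    # Stop Apache from looking for .htaccess files in this tree
    AllowOverride None

    # Directives that used to live in .htaccess go here instead
    Options -Indexes
</Directory>
```

Changes made here only take effect after Apache has been restarted or reloaded.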

One final note: sometimes a directive can’t be used in .htaccess but only in httpd.conf. This depends on the context the directive is allowed in and on the web server’s configuration (the AllowOverride setting in particular). If a directive does not seem to work in your .htaccess file, try it in httpd.conf if you can. But let’s get down to the real stuff, shall we?

1. Redirects

Yeah, what else, right? Very true, a lot of the work around URL structure is redirecting from A to B, for example when restructuring a site. Or maybe the content no longer exists but you want to keep inbound links and rankings, and therefore decide to redirect. Good choice!

The easiest way to redirect from “old/url.html” to “new/url.html” would look like this:

Redirect 301 /old/url.html http://www.domain.com/new/url.html

Or let’s say, for whatever reason, you want to move away from “.html” as a file extension and use folders instead. You can still do that with a single line using the RedirectMatch directive:

RedirectMatch 301 ^/([a-zA-Z0-9]*)\.html$ http://www.domain.com/$1/

Sure, it looks a little more complex, but it’s just a simple regular expression. Keep in mind that the example above only redirects .html files in the root directory and only accepts file names consisting of letters (upper- and lower-case) and numbers.
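If your file names also contain hyphens or underscores, or live in sub-folders, the pattern can be widened. The following is only a sketch; the character class is an assumption and should be adapted to the URLs you actually have.

```apache
# Variation: also accept hyphens, underscores and nested folders,
# e.g. /blog/my-post.html -> http://www.domain.com/blog/my-post/
# "+" (instead of "*") requires at least one character before ".html".
RedirectMatch 301 ^/([a-zA-Z0-9/_-]+)\.html$ http://www.domain.com/$1/
```

Note that a pattern this broad also catches files such as /index.html, so test it against a list of real URLs before going live.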

RedirectMatch is quite a powerful tool to play around with, but sometimes you’ll want to go one step further, for example when solving www vs. non-www issues. This requires Apache’s mod_rewrite module. The necessary code looks like this:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.domain\.com$
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
</IfModule>

The first and last lines ensure that those commands will only be executed if the necessary module is present and has been loaded (you’ll get some not-so-nice errors otherwise). Line two activates Apache’s rewrite engine. Line three contains the condition: if the hostname is anything other than exactly www.domain.com, line four is executed. That line performs a redirect to the same path on www.domain.com, so a request without www ends up on the www version.
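A possible refinement, following the suggestion by g1smd in the comments: make the hostname optional in the condition so that requests without any Host header (pure HTTP/1.0) are left alone instead of being redirected over and over. This sketch assumes the rule sits in an .htaccess file in the document root; in httpd.conf the matched path starts with a slash, so the target would need to be http://www.domain.com$1 instead.

```apache
RewriteEngine on
# The optional group also matches an empty Host header,
# so such requests do not trigger the redirect.
RewriteCond %{HTTP_HOST} !^(www\.domain\.com)?$
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
```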

Please note: all rules explicitly request an HTTP 301 (permanent) redirect. If you leave the status code out, Apache defaults to an HTTP 302 (temporary) redirect, which tells search engines that the move is not permanent and is usually not what you want for SEO. So always, always make sure to specify the desired status code explicitly.

Another thing to keep in mind: so-called redirect chaining is probably a bad idea. Make sure you redirect your URL directly from A to B, and not, for example, from A to B where B then redirects on to C. At some point search engines just stop following those redirects. To prevent that, well, just take “the shortest way possible”.
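To illustrate with purely hypothetical URLs: suppose /a.html once moved to /b.html, and /b.html has now moved to /c.html. Rather than stacking the redirects, both old URLs can point straight at the final target.

```apache
# Chained (avoid): A -> B -> C
#   Redirect 301 /a.html http://www.domain.com/b.html
#   Redirect 301 /b.html http://www.domain.com/c.html

# Direct: every old URL goes to the final target in one hop
Redirect 301 /a.html http://www.domain.com/c.html
Redirect 301 /b.html http://www.domain.com/c.html
```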

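And as a quick tip for debugging: I really recommend using the RewriteLog feature when setting up new rules, as it makes things so much easier! Note that RewriteLog can only be set in the server or virtual host configuration, not in .htaccess, and that Apache 2.4 replaced it with per-module LogLevel settings.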
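A rough sketch of both variants; the log path and the verbosity level are assumptions, so pick whatever suits your server.

```apache
# Apache 2.2 and earlier: server or virtual host configuration only.
# The log file path is an assumption.
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4 and later: RewriteLog no longer exists;
# rewrite tracing is written to the error log instead.
# LogLevel alert rewrite:trace3
```

Rewrite logging is verbose and slows the server down, so switch it off again once the rules behave as expected.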

2. Performance

A while ago I wrote a post on performance optimization and what you can do to improve your website’s performance, for users and/or for search engines. Naturally, some of those measures live on the server side of things.

Before we start, please make sure you have loaded the necessary modules. Open httpd.conf: it needs to contain two lines like these, and they must not be commented out (i.e. they must NOT start with “#”):

LoadModule deflate_module modules/mod_deflate.so
LoadModule expires_module modules/mod_expires.so

To activate file expires headers, go for this:

FileETag MTime Size
ExpiresActive on
ExpiresDefault "access plus 86400 seconds"

This causes every file to be cached (per user) for 24 hours after it was accessed. However, in most cases you’ll want to be more specific and define expiration times per file type, for example different lifetimes for images and CSS files:

ExpiresByType image/jpeg "access plus 604800 seconds"
ExpiresByType text/css "access plus 259200 seconds"

It’s plain and simple: just specify the number of seconds (counted from the time of access) for which each file type should be cached. To double-check that this works, look at the response headers: there should be an Expires header and a Cache-Control header with a matching max-age value.
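If counting seconds feels error-prone, mod_expires also accepts interval keywords. The sketch below expresses the same lifetimes as above (604800 seconds is one week, 259200 seconds is three days); the extra PNG line is an assumption and only there to show how further types are added.

```apache
ExpiresActive on
# Same lifetimes as in the example above, written with interval keywords
ExpiresByType image/jpeg "access plus 1 week"
ExpiresByType text/css "access plus 3 days"
# Assumed additional type; list the ones your site actually serves
ExpiresByType image/png "access plus 1 week"
```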

Another thing that improves performance is enabling GZIP compression. This just means files will be compressed (aka g-zipped) before they are sent to the browser. To do so, just set up an Apache output filter like this one:

AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css application/x-javascript
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4\.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

Copy, paste and you’re done. Again, check the response headers: compressed responses carry a Content-Encoding: gzip header.

3. Controlling access, browsing & more

Most of the time you probably don’t want everybody to see what’s on your server. Directory listings should usually be turned off by default; if they are not, use this to disable them:

Options -Indexes

Sometimes you might find yourself in a situation where the website’s default file is not named “index.html” but perhaps “welcome.html”, “main.htm” or similar. In that case, requesting the root directory of the website won’t return a page, and you need to set up another file to be used as the default:

DirectoryIndex welcome.html
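DirectoryIndex also accepts several file names, which can help when different folders use different defaults. A small sketch using the file names mentioned above:

```apache
# Apache tries each name in order and serves the first file that exists.
DirectoryIndex welcome.html main.htm index.html
```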

To trigger a password prompt, for example to keep search engines (and users) out of a development environment, use the following directives:

AuthType basic
AuthName "development area"
AuthUserFile /your/path/to/.htpasswd
AuthGroupFile /dev/null
Require valid-user

Keep in mind that you have to set up a user/password combination and store it in the .htpasswd file defined above. Apache ships with the htpasswd command-line utility for this, and there are online generators as well.

Well, I hope you find one tip or another helpful. And please feel free to share your own tips in the comments.

AUTHORED BY:

Bastian Grimm is the founder and CEO of Grimm Digital. He mainly works as an online marketing consultant with a strong focus on organic search engine optimization (SEO). Grimm specializes in SEO strategy consulting, website assessments and large-scale link building campaigns.
  • http://twitter.com/brettpringle Brett Pringle

    Hi Bastian,

    Another issue commonly overlooked in .htaccess is uppercase vs. lowercase characters within URL structures. You can force any capitalisation to lowercase with the RewriteEngine:

    RewriteEngine On
    RewriteMap lc int:tolower
    RewriteCond %{REQUEST_URI} [A-Z]
    RewriteRule (.*) ${lc:$1} [R=301,L]

    • http://www.candleforex.com/ CandleForex

      Now that’s a tip I wasn’t aware of. Thanks!

  • http://www.grimm-digital.com/ Bastian Grimm

    Hi Brett,

    very true – great addition, thanks a lot! Will update the post later :)

    B.

  • Dave Ashworth

    With regard to caching and gzip compression, I’ve found that the following two sets of commands pretty much sort out all the issues highlighted by YSlow:

    SetOutputFilter DEFLATE
    Header unset ETag
    FileETag None

    and

    Header set Cache-Control "max-age=2419200, public"

    more on this here:
    http://www.returnondigital.com/blog/tips-to-improve-your-page-load-speed

    With regard to redirects, another useful command in relation to canonical tags is to redirect the index file:

    as
    http://www.domain.com
    and
    http://www.domain.com/index.php

    can be seen as two different pages, you should use the following:

    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
    RewriteRule ^index\.php$ http://www.domain.com/ [R=301,L]

  • http://www.probiotixfoods.com Modi

    Great tips Bastian, thanks for sharing!

    Page speed and Yslow very often suggest adding expires headers, caching css and image files as well as enabling gzip but many webmasters seem to struggle. Is there any way to achieve the same on IIS?

  • g1smd

    1a. Don’t mix Redirect/RedirectMatch directives (processed with mod_alias) and RewriteRule (processed with mod_rewrite) in the same site. Use RewriteRule for all of the rules. Failure to do so will result in previously rewritten internal paths being exposed as new URLs should mod_alias ever run after mod_rewrite.

    1b. Don’t wrap your RewriteRules in ifModule tags. You do NOT want mod_rewrite to silently fail. If rewriting fails you want to know about it – immediately.

    1c. In the non-www to www rule don’t redirect pure HTTP/1.0 requests. They do not send a Host header so the current rule as shown could create an infinite loop. Fix by using !^(www\.example\.com)?$ as the RegEx pattern.

    @Brett The Redirect target should include both protocol and canonical domain name. Additionally, using a rewritemap is not the most efficient way to force lower-case URLs.

    @Dave The index filename redirect is a good idea. To ensure that a redirect chain is NOT created, the index redirect MUST be listed before the non-www to www redirect.

    Do change the \.php bit to instead read \.(php|html?) or similar.

  • http://www.travelcenteruk.co.uk/ cheap flights

    Thanks for the tips. I can’t find .htaccess on my Windows server, so how can I do this on a Windows server?

  • ActualTecnologia