Apache Rewrite Directives

Last Amended: 14th November 2009

OK, so I've got my nice shiny VPS up and running using FreeBSD 6.2 and Apache 2.2, but it seems I still have a bit of tweaking to do. For now, I've got three problemettes to solve, and I want to do it using Apache Rewrite directives. Fortunately, I know my way around Apache reasonably well and I sure know all about them there Regular Expressions. But until now I've never had a need to play with Rewrite in any detail. And as is my wont, I'm documenting what I did and why I did it.

The Rewrite module is a powerful add-on for Apache and there is a lot, indeed more than a lot, that can be achieved with it. It is still quite arcane and can be difficult to learn, especially if you aren't already familiar with the terseness and line-noise voodoo that is the Unix Way of Life. In fact, the Rewrite module itself is relatively easy to understand; it's the Regular Expressions that most people are likely to slip up on. I would suggest, in traditional cook book manner, that you begin by searching the web for a page like this one and see if the solution can be easily adapted to your needs. After all, why re-invent the wheel if you don't need to?

Fortunately, my needs are quite simple and I don't need to use the more esoteric functions of the Rewrite module.

1. www to non-www rewrite

I don't want to have to constantly distinguish between www.zeltus.eu and zeltus.eu. Traditionally rewriting has been used to ADD the "www." at the start of a URL, but I want to remove it. Why? Well, for one, it's shorter and therefore more memorable. Secondly, there is no reason nowadays to go for the older traditional www.zeltus.eu format URLs. DNS will happily resolve the bare domain as a host name, so the "www." prefix buys me nothing. And this flexibility will almost certainly expand when IPv6 starts to become more prevalent.

So long as I am consistent in what I Rewrite to, then Search Engines like Google will happily optimise my statistics by only "seeing" a single domain URI, which should help increase my site's visibility. And finally, any APIs that require a licence key (Google Maps, for instance) won't get all stroppy over my using the wrong key for a domain, as thanks to rewriting I will only have one domain and hence only need one licence key.

In fact, it's likely that web addresses will be pretty much completely de-formatted in the coming years. ICANN have recently stated that they are considering allowing any format TLD name so long as the customer is prepared to pay. No more cyber-squatting and friendly URLs to boot!

2. Trailing /'s

The most trivial of all the problemettes I have is the trailing slash problem. Trivial in that the solution is easy. I do suspect however that this is a rule that might well prove its worth to me.

I have organised my web pages to make use of sub-directories. Now, although I rather hope most readers will reach me by clicking on Search Engine links and the like, there is always the possibility that the touch-typists out there will simply type my URL into the address bar. And there is a huge difference between, for instance, http://zeltus.eu/mre and http://zeltus.eu/mre/

In the former, the browser is asking my server for something in my web-root called mre, which the server treats as a file name (and no such file exists). In the latter, the browser has asked for the sub-directory mre, and the server answers with its index.html (again, yes, I know, it might be any of a number of index types as specified in my httpd.conf file). For the omission (or addition, if you will) of a single character, that is quite a difference in behaviour.

So I need a rule that checks to see if a trailing slash exists and, if not, adds one if necessary (i.e. if the final part of the supplied URL is a directory).

3. Hotlinking

Hotlinking can be a serious problem for some heavily used sites. Others, mostly portal type sites (Flickr is an excellent example), rely on it to operate correctly. But I want to defeat anyone trying to hotlink images from my website. I don't actually have any problems with hot-linkers, zeltus being a relatively undiscovered region of the web. But Prevention is always better than Cure.

I don't have any issues with someone typing a full image URI in their address bar, or downloading my images (I'd rather they didn't but I can't stop them) – but I am not going to donate any bandwidth that I pay for so they can link directly to my images.

Here's an example.

http://zeltus.eu/photos/Charlies_rear.jpg

Copy and paste this into your browser's address bar and it'll display the photograph in an otherwise blank page. This is expected behaviour and fine and dandy by me.

Or click this link –  Charlie's Rear  – and you'll get the same effect.

But include an <img> statement in your page that references this photo, and you will instead see  this  image. All thanks to a Rewrite directive.

Here is the test code, just to make copy'n'paste a little bit easier.

<img src="http://zeltus.eu/photos/Charlies_rear.jpg" alt="" />

There are plenty of other options I could have used, such as displaying an error page or otherwise being unhelpful to hot-linkers. But I figure displaying an image that advertises my own site is a pretty good compromise solution.

To continue...

I do have to be conscious of the rule ordering. I'm looking for a cascade effect so that each ruleset is expecting whatever the previous ruleset has generated.

I am also avoiding using .htaccess files. As far as I am concerned these are for specific directory (or file) settings and what I want is a web-wide solution set. There is also an overhead in using .htaccess files: the rules are parsed each time an access to that directory is made, whereas entries in httpd.conf are parsed just the once when Apache is restarted.

Of course, owning my own VPS means manipulating Apache is no problem for me – if all you've got is the choice of .htaccess files, then that's what you'll have to use.
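For anyone who is stuck with .htaccess, here is a rough sketch of how the first problemette (dropping the "www.") might look in a .htaccess file sitting in the web-root. This is not something I run myself. It assumes the host has allowed rewrite directives in .htaccess (AllowOverride FileInfo or better), and it matches the www. host name explicitly rather than using my negated test. The main thing to notice is that in .htaccess context the leading slash has already been stripped from the path before the pattern sees it.

```apache
# .htaccess in the web-root (a sketch, untested on my own server)
RewriteEngine On

# Only act when the request came in on the www. host name
RewriteCond %{HTTP_HOST} ^www\.zeltus\.eu$ [NC]

# No leading slash in the pattern here, unlike the httpd.conf version
RewriteRule ^(.*)$ http://zeltus.eu/$1 [R=301,L]
```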

Just as a reminder to myself, my httpd.conf file is located in /usr/local/etc/apache22. But I really must set up a userid or directory where I can create links to all the configuration files that are liberally splatted around my VPS.

Prerequisites

MAKE A BACKUP COPY OF httpd.conf BEFORE EDITING IT! Nothing will ruin your day more than ending up with a non-working httpd.conf file and no way to revert to the original. DAMHIKIJFKOK.

It is necessary that the mod_rewrite module is included in Apache's config file before any of this will work. It normally is, but in case you're unsure, you're looking for a line that looks something like this one...

LoadModule rewrite_module libexec/apache22/mod_rewrite.so

...somewhere in a (hopefully logical) section of httpd.conf

Also required is a specific call to start up the Rewrite engine. This is a once-only call (for each domain, if you have more than one).

RewriteEngine On

It will normally immediately precede the Rewrite rules themselves.
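While experimenting, it can help to have the rewrite engine tell you what it is doing. The sketch below shows where RewriteEngine sits inside a virtual host, plus the two Apache 2.2 logging directives. The log file path is a made-up example, not one from my server, and both logging directives were dropped in Apache 2.4, so treat this as 2.2-only.

```apache
<VirtualHost zeltus.eu>
    ServerName zeltus.eu

    # Switch the engine on for this virtual host
    RewriteEngine On

    # Apache 2.2 only: trace rule processing while testing.
    # The path is a hypothetical example; level 0 turns logging off,
    # and higher numbers are chattier (and slower).
    RewriteLog "/var/log/httpd-rewrite.log"
    RewriteLogLevel 3
</VirtualHost>
```

Remember to set the level back to 0, or remove both lines, once the rules behave.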

1. Solution - www to non-www rewrite

RewriteCond %{HTTP_HOST} !^zeltus\.eu$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://zeltus.eu/$1 [L,R=301]

I'll explain this line by line.

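RewriteCond %{HTTP_HOST} !^zeltus\.eu$ [NC]

If HTTP_HOST is NOT zeltus.eu, then whatever it is, it needs to be rewritten TO zeltus.eu. Note that HTTP_HOST holds only the host name from the request's Host: header, without any leading http://, which is why the pattern starts straight in with the domain.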

RewriteCond %{HTTP_HOST} !^$

A belt-and-braces check to ensure the string isn't empty, as it will be for an old HTTP/1.0 client that sends no Host: header. Without it, such a request would fail the first test and be redirected for no good reason.

RewriteRule ^/(.*) http://zeltus.eu/$1 [L,R=301]

If the preceding Conditionals have got us to here, we need to rewrite with the shorter site string. In httpd.conf the path being matched always starts with a slash, so the pattern consumes it and the substitution puts it back. I might have to adjust the flags when I come to string all these rules together. R=301 is a "Moved Permanently" return code; the default is to return 302, a temporary redirect. Browsers will follow either, but a 301 is generally what search engines take as the signal to treat the new address as the real one, which rather matters given my reasons for doing this. But WHEN I use WHAT flags and where has yet to be decided, at this stage.

Incidentally, the [R] flag is a REDIRECT. Not Recursion or Return, as some folks seem to think, but REDIRECT!
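For comparison, here is a narrower variation on the same idea. It is only a sketch and not what I ended up using: instead of redirecting everything that is NOT zeltus.eu, it redirects only the one host name I know I want rid of. The trailing comments show what I would expect for two sample requests.

```apache
# Redirect only the www. host name, leaving any other
# alias or IP-address request alone
RewriteCond %{HTTP_HOST} ^www\.zeltus\.eu$ [NC]
RewriteRule ^/(.*) http://zeltus.eu/$1 [L,R=301]

# Host: www.zeltus.eu, path /mre/index.html
#   -> 301 redirect to http://zeltus.eu/mre/index.html
# Host: zeltus.eu, path /mre/index.html
#   -> condition fails, request carries on untouched
```

The trade-off is that the negated form also catches requests made by IP address or by any other name pointing at the server, whereas this one lets them through. Note too that a Host: header carrying an explicit port (www.zeltus.eu:80) would not match this pattern as written.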

2. Solution - The trailing slash problem

RewriteCond    %{REQUEST_FILENAME}  -d
RewriteRule    ^(.+[^/])$ $1/  [R]

Almost trivial, this rule determines if the requested filename is in fact a directory. If it is, then append a trailing slash. The RE only matches when the final character is NOT a slash, saving the whole path in $1. Thus, if a trailing slash IS supplied, the pattern fails to match and the request passes through untouched; if it is missing, $1 is written back with the slash added. A pretty good example of how esoteric and obfuscated even simple REs (Regular Expressions) can be. But it is only a short learning curve coming to terms with the simple REs I have used here.
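One caveat worth sketching. According to the mod_rewrite documentation, when rules live in httpd.conf or a virtual host (rather than .htaccess), REQUEST_FILENAME has not yet been mapped to the filesystem and simply holds the same value as REQUEST_URI, so the -d test may not be looking where you think. A variation that builds the filesystem path by hand is shown below. It assumes the directory lives directly under the DocumentRoot (no Alias involved), and the [R=301,L] flags are my assumption rather than a tested setting.

```apache
# Test the real directory on disk: DocumentRoot + URL path
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d

# Path does not end in a slash, so redirect to the same path plus "/"
RewriteRule ^(.+[^/])$ $1/ [R=301,L]
```

It is also worth knowing that mod_dir's DirectorySlash directive, which is On by default, already issues this kind of redirect for real directories, so on a stock Apache 2.2 this rule may be belt-and-braces rather than strictly necessary.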

3. Solution - Hotlinking

RewriteCond %{HTTP_REFERER} . 
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?zeltus\.eu/ [NC]
RewriteCond %{HTTP_USER_AGENT} !(googlebot-image|msnbot|psbot|yahoo-mmcrawler) [NC]
RewriteRule \.(bmp|gif|jpe?g|png)$ /images/hotlink.$1 [L]

Again, let's briefly explain what's going on here. Firstly, there are a few Conditionals which determine if the Rule needs to be applied.

RewriteCond %{HTTP_REFERER} . 

This deals with blank referers. There are a number of reasons for this variable being blank but the most likely is that the user has made a direct reference to an image, as mentioned above. As I want to allow such calls, this line handles such a case and may be read as "if HTTP_REFERER is NOT blank".

RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?zeltus\.eu/ [NC]

If the referrer isn't my domain, then carry on checking. Note that I'll be amending this rule later on as, in this format, it allows for the "www." in the address that I intend to strip out with a preceding rule. The [NC] flag makes the match case-insensitive.

RewriteCond %{HTTP_USER_AGENT} !(googlebot-image|msnbot|psbot|yahoo-mmcrawler) [NC]

I'm quite happy to allow sites like images.google.com and so on to hook directly onto my images. I may have to add additional robot names as and when I get to hear about them.

RewriteRule \.(bmp|gif|jpe?g|png)$ /images/hotlink.$1 [L]

Finally, the Rule itself. If the URL ends in one of a known image type, and the preceding conditionals have allowed us to reach here, we've got a hot-linker! So replace the image call with one of the specially prepared hotlink images. I feel it is safest to return an image of the type requested, in case the browser takes exception. And if I've taken the trouble to create one version of an error image, it's easy enough to simply do a "save as" to create the other versions. Photoshop is sooo cool!
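For the record, the "be unhelpful" option I mentioned earlier would look something like the sketch below. It is not what I chose. The conditions are unchanged; only the Rule differs, refusing the request with a 403 Forbidden instead of swapping in a replacement image. The [NC] on the rule is my addition, so that .JPG is caught as well as .jpg.

```apache
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://([^.]+\.)?zeltus\.eu/ [NC]
RewriteCond %{HTTP_USER_AGENT} !(googlebot-image|msnbot|psbot|yahoo-mmcrawler) [NC]

# "-" means leave the URL alone; [F] answers with 403 Forbidden
RewriteRule \.(bmp|gif|jpe?g|png)$ - [F,NC]
```

This saves even the bandwidth of the replacement image, at the cost of a broken-image icon on the hot-linker's page and no free advertising for me.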

Completed Solution

<VirtualHost zeltus.eu>
    ServerName zeltus.eu
    ServerAlias www.zeltus.eu

    RewriteEngine On

    RewriteCond %{HTTP_HOST} !^zeltus\.eu [NC]
    RewriteCond %{HTTP_HOST} !^$
    RewriteRule ^/(.*) http://zeltus.eu/$1 [R]

    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^(.+[^/])$ $1/

    RewriteCond %{HTTP_REFERER} .
    RewriteCond %{HTTP_REFERER} !^http://zeltus\.eu/ [NC]
    RewriteCond %{HTTP_USER_AGENT} !(googlebot-image|msnbot|psbot|yahoo-mmcrawler) [NC]
    RewriteRule \.(bmp|gif|jpe?g|png)$ /images/hotlink.$1 [L]
    .
    .
    . 
</VirtualHost>

All three rulesets chained together, along with a minor adjustment of the flags. One thing many people do not realise is that the [R] flag turns a bare path into a fully-formed URL by, if necessary, prepending "http://" and the host name (plus the port, if it isn't the standard one). Note that I have also amended the hotlink rules in the third section, as by this point I KNOW I don't have a preceding "www." to worry about.

A quick call to "apachectl graceful" to restart Apache and, if no errors are thrown up, I can proceed to testing these rules. Ideally I'd do it on a dedicated test site, but this really is quite a simple update and I only have myself to answer to. Job done!

This is a relatively simple example of rewriting, but then, if you are a SysAdmin God, you'll already know all this and won't need this information. But I'm hoping this will be of use to some of you out there. And that once you are here, you'll stay and look at my other pages.

Whatever, this page is a helpful reminder to me as to what I did, when I did it and why.

Links

Googling for specific solutions is easy enough, but the source documentation for all of this is at...

 http://httpd.apache.org/docs/2.0/misc/rewriteguide.html 

and

 http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html 

 William Parker 
© 2017 All Rights Reserved