A while back, Mark Pilgrim wrote an essay about cruft-free URLs in Movable Type. WordPress, too, generates the mod_rewrite regular expressions needed for your .htaccess file, giving your weblog’s permalinks this sensible and hackable form:
/archives/%year%/%monthnum%/%day%/%postname%
You’ll notice there’s no file suffix in that URL. You might argue that some suffixes aren’t cruft — a .pdf suffix, for instance, is surely meaningful to someone wondering what they’re about to download. But in weblog posts, it’s irrelevant whether a page is served up from PHP or ASP or as plain HTML. The suffix is cruft. It’s beside the point to your readers, if you have any.
Decrufting can be managed easily on Apache with mod_rewrite and less easily on IIS — you’ll need an ISAPI DLL that mimics mod_rewrite. Either way you’ll need to understand a little bit about regular expressions.
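To give a feel for the kind of regular expression involved, here is a minimal sketch in Python rather than mod_rewrite syntax. It is not what WordPress actually generates: the pattern, the digit widths and the name parse_permalink are my own assumptions, chosen to match the /archives/%year%/%monthnum%/%day%/%postname% form above.

```python
import re

# Hypothetical pattern for /archives/%year%/%monthnum%/%day%/%postname%.
# The digit widths (4, 2, 2) are an assumption about how the dates are written.
PERMALINK = re.compile(
    r"^/archives/(?P<year>\d{4})/(?P<monthnum>\d{2})/(?P<day>\d{2})/(?P<postname>[^/]+)/?$"
)

def parse_permalink(path):
    """Return the captured parts as a dict, or None if the path does not match."""
    match = PERMALINK.match(path)
    return match.groupdict() if match else None

# An invented example path, used only to exercise the pattern.
print(parse_permalink("/archives/2005/03/14/cruft-free-urls"))
```

The named groups are what a rewrite rule would hand on to the real script as query-string values; anything that doesn’t fit the pattern simply falls through.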
Rewriting URLs to reflect an organisational ideal rather than messy file structures and query strings is mildly diverting. On one website I’ve turned this:
http://www.{blahblahblah}.com/biography.aspx?lang=en&id=123
Into this:
http://www.{blahblahblah}.com/en/biography/bertrandrussell
The second form is more ‘hackable’ by Jakob Nielsen’s lights and much more informative to human readers: it’s probably going to be a biography, in English, of Bertrand Russell. It’s also a cool URI in Tim Berners-Lee’s sense (cool URIs don’t change): we don’t have the .aspx suffix anymore, so we can change platforms whenever we wish. In fact, we already have.
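The mapping from the clean URL back to the crufty one can be sketched roughly like this. It is an illustration in Python, not the rule the site actually uses: the lookup table BIOGRAPHY_IDS, the function name to_internal and the two-letter language pattern are all assumptions, and the target is written in the old biography.aspx form only because that is the example given.

```python
import re
from urllib.parse import urlencode

# Hypothetical slug-to-id table; a real site would look this up in its database.
BIOGRAPHY_IDS = {"bertrandrussell": 123}

# Language code first, then the section, then the readable name.
CLEAN_URL = re.compile(r"^/(?P<lang>[a-z]{2})/biography/(?P<slug>[a-z0-9-]+)/?$")

def to_internal(path):
    """Translate a clean path into the internal query-string form, or None."""
    match = CLEAN_URL.match(path)
    if match is None:
        return None
    record_id = BIOGRAPHY_IDS.get(match.group("slug"))
    if record_id is None:
        return None  # unknown name: let the server answer with a 404
    query = urlencode({"lang": match.group("lang"), "id": record_id})
    return "/biography.aspx?" + query

print(to_internal("/en/biography/bertrandrussell"))
```

Because readers only ever see the left-hand side of this mapping, the right-hand side can be swapped for another platform without any public URL changing.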
Decrufting in Wales
On bilingual websites in Wales, we’ve always had the problem of the language indicator. In the bad old crufty days, a query string item would be used: …&lang=en…. Now that we’re all mod-rewriting, the language specifier forms part of the URL, as above. But where should it go in the new virtual path?
On my bilingual websites I make it the first item after the domain. That’s partly because I always use a language-specifier start page and it is the first choice a user makes, if they come through the front door. It’s also a significant division in content.
Where could the language specifier go in the example?
- {blahblahblah}.com/en/biography/bertrandrussell
- {blahblahblah}.com/biography/en/bertrandrussell
- {blahblahblah}.com/biography/bertrandrussell/en
There’s no sensible argument for the second option, so it’s between the first and the third.
At least one Welsh website has gone for the third option, but I always plump for the first. The third implies to me that a user might have arrived at the Bertrand Russell biography in any language and then chosen to read it in English. In fact, that isn’t how visitors use the website: a visitor will usually be consistent about their language choice, always browsing in English or always browsing in Welsh.
And what happens as you descend the virtual paths if you put the language selector at the end?
- {blahblahblah}.com/biography/en
- {blahblahblah}.com/biography/industry/en
- {blahblahblah}.com/biography/industry/coal/en
Like that? That doesn’t make too much sense because successive URLs don’t reflect the paths I’ve really taken through the website.
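One way to see the difference is to ‘hack’ each style of URL upwards by deleting trailing segments, as a curious reader might. The sketch below is only an illustration: the function names are mine, and I’m assuming the site uses en and cy as its two language codes.

```python
LANGUAGES = {"en", "cy"}  # assumed codes for English and Welsh

def truncations(path):
    """List the URLs you get by deleting trailing segments one at a time."""
    segments = [s for s in path.split("/") if s]
    return ["/" + "/".join(segments[:n]) for n in range(len(segments) - 1, 0, -1)]

def has_language(path):
    """True if any segment of the path is a known language code."""
    return any(segment in LANGUAGES for segment in path.split("/"))

language_first = truncations("/en/biography/industry/coal")
language_last = truncations("/biography/industry/coal/en")

# Language first: every shortened URL still says which language it is in.
print(language_first, all(has_language(p) for p in language_first))

# Language last: the first deletion throws the language away.
print(language_last, any(has_language(p) for p in language_last))
```

With the language at the front, every shortened URL is still a sensible page in the language the visitor chose. With it at the end, the site has to guess the language again at every level.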







