J. R. Boynton

URLs

The purpose of this essay is to set out, in one place, most of the usability and publishing issues surrounding URLs.

User Interface

The first issue is that the url is part of the user interface. It is one way for users to access information. That is, a user might remember and type a url, copy and paste it, or look at it to decide whether to click on a link, or even whether to trust the link. Users also email urls in plain text.

Seen this way, a url should be short enough that it doesn’t wrap in text-only email clients; the upper bound is probably 60 characters. Wrapping matters because some software will not recognize that the complete url spans two lines, and some would require two copy-and-pastes to get the url into a browser.
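The length rule can be checked mechanically. A minimal sketch, assuming Python’s standard textwrap module as a stand-in for a text-only client that hard-breaks long words (real clients vary), and using the essay’s rough 60-character estimate as the default width. The function name is hypothetical.

```python
import textwrap

# The essay's rough estimate, not a formal limit.
MAX_URL_LENGTH = 60


def survives_plain_text_wrap(url: str, width: int = MAX_URL_LENGTH) -> bool:
    """Return True if a client that hard-wraps at `width` keeps the url on one line.

    textwrap.wrap breaks words longer than `width` by default, which mimics
    the clients that split a long url across two lines.
    """
    return len(textwrap.wrap(url, width=width)) == 1
```

A short hostname-based url such as http://jobs.domain.com/ passes; anything longer than the width is split into several lines and fails.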

I contend that it’s better to use hostnames than subdirectories: jobs.domain.com rather than www.domain.com/jobs.

Large websites end up with the problem of ensuring that urls are unique and consistent. Some use the date, some use a meaningless id number, and some try to have meaningful filenames. The result is urls like /2001/07/21/columnist.html, /story?321, or /articles/essays/columnist/rant_on_microsoft.html.

I suggest urls that give a clue to users, both internal and external. For example, instead of a meaningless code (“321”), you could use a code that indicates the site’s structure mnemonically. I once devised a system with filenames like “lcitpro04.html”: “lc” and “it” represented sections, “pro” represented a specific article, and because articles were split across several files, each file got a numeric suffix. In many cases it would work to combine a code that site maintainers can easily understand and generate uniquely with a word or two that hints at the file’s content: “cb010721microsoft.html”. This might indicate a column by a columnist whose last name begins with B, published on a particular date (010721), primarily about Microsoft.
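A scheme like “cb010721microsoft.html” is easy to generate consistently. The sketch below is one possible reading of that example, not the author’s actual system: it assumes a one-letter section code (“c” for column), the author’s last-name initial, the date as YYMMDD, and a lowercase keyword. The function name and the exact field layout are hypothetical.

```python
import re
from datetime import date


def mnemonic_filename(section: str, author_initial: str,
                      published: date, keyword: str) -> str:
    """Build a short mnemonic filename such as 'cb010721microsoft.html'.

    Hypothetical layout: section code + author's last-name initial
    + publication date as YYMMDD + a keyword hinting at the content.
    """
    # Keep only lowercase letters and digits so the name stays url-safe.
    slug = re.sub(r"[^a-z0-9]", "", keyword.lower())
    return f"{section.lower()}{author_initial.lower()}{published:%y%m%d}{slug}.html"
```

For example, mnemonic_filename("c", "B", date(2001, 7, 21), "Microsoft") gives "cb010721microsoft.html". The code part stays unique as long as one columnist per initial publishes at most one column per day; the keyword is only a hint for readers.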

Some web software includes a session id in every url. I’ve seen websites with 120 urls on the front page, each hundreds of characters long. Sites like this have obviously given up on the usability of their urls entirely: no one will email those urls to a friend or link to them from a homepage. A large enough marketing budget or a lack of competition may make up for the broken usability, but it’s more likely that sites using such software will simply go out of business.
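To see how much of such a url is session noise, it helps to strip the session id and look at what remains. A minimal sketch using Python’s urllib.parse, assuming the session id travels either as a query parameter with one of a few example names or as a “;jsessionid=” path parameter, the form Java servlet containers use for url rewriting. The parameter list is illustrative, not exhaustive.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example session parameter names; real software uses many others.
SESSION_PARAMS = {"sessionid", "sid", "jsessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the url without session-id path or query parameters."""
    parts = urlsplit(url)
    # Drop a ";jsessionid=..." path parameter, case-insensitively.
    path = re.sub(r";jsessionid=[^/?#]*", "", parts.path, flags=re.IGNORECASE)
    # Keep every query parameter except the session ones.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    # Note: urlencode re-encodes the remaining values, which may normalize escapes.
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), parts.fragment))
```

Applied to http://www.domain.com/story;jsessionid=ABC123?id=321&sessionid=XYZ, this leaves http://www.domain.com/story?id=321, a url someone might actually email to a friend.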

URLs are abstract

The url is the location of a resource as seen from outside. The design deliberately allows a translation from the requested url to a file or other resource on the web server. Internally, the resource might be:

/cgi-bin/gx.cgi/SomeVeryLongJavaClassname?code=2001

Externally, that may not be a useful label for the resource. You could instead assign it a url such as /cb010721microsoft.html or /shoppingcart.html. The web server makes the translation in a url-rewriting phase of handling the http request.
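The rewrite phase amounts to a lookup from public url to internal resource before the application sees the request. A minimal sketch as Python WSGI middleware, assuming a hypothetical in-memory rewrite table that holds the essay’s example mapping; a production server would typically do this in its own configuration instead.

```python
from urllib.parse import urlsplit

# Hypothetical rewrite table: public url -> internal resource.
REWRITES = {
    "/cb010721microsoft.html": "/cgi-bin/gx.cgi/SomeVeryLongJavaClassname?code=2001",
}


def rewrite_middleware(app):
    """Wrap a WSGI app so it sees internal paths while users see public urls."""
    def wrapped(environ, start_response):
        target = REWRITES.get(environ.get("PATH_INFO", ""))
        if target is not None:
            internal = urlsplit(target)
            # Copy the environ instead of mutating the caller's dict.
            environ = dict(environ, PATH_INFO=internal.path,
                           QUERY_STRING=internal.query)
        return app(environ, start_response)
    return wrapped
```

Urls not in the table pass through unchanged. The public name can stay fixed even if the internal program is later replaced; only the table entry changes.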

Directory and filename as metadata

We have used directories (folders) to group files and to give us separate namespaces. Essentially, filenames and directories are metadata about the content. You could store a document as record 1121 of some table in an Oracle database, yet on the web server it could appear as /cb010721microsoft.html. The filename would be stored in a field of that database record, and if you want to organize your website into directories, the directory could be stored in another field.
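Storing the directory and filename as fields looks roughly like this. A minimal sketch, using Python’s built-in sqlite3 in place of Oracle; the table layout and function names are hypothetical.

```python
import sqlite3
from typing import Optional, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory table where directory and filename are plain fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE documents (
            id        INTEGER PRIMARY KEY,   -- internal record number
            directory TEXT NOT NULL,         -- e.g. '/' or '/lc/it/pro/'
            filename  TEXT NOT NULL,         -- e.g. 'cb010721microsoft.html'
            body      TEXT NOT NULL,
            UNIQUE (directory, filename)     -- each url maps to one record
        )""")
    return conn


def split_path(url_path: str) -> Tuple[str, str]:
    """Split '/lc/it/pro/04.html' into ('/lc/it/pro/', '04.html')."""
    directory, _, filename = url_path.rpartition("/")
    return directory + "/", filename


def add_document(conn: sqlite3.Connection, record_id: int, url_path: str, body: str) -> None:
    directory, filename = split_path(url_path)
    conn.execute("INSERT INTO documents (id, directory, filename, body) VALUES (?, ?, ?, ?)",
                 (record_id, directory, filename, body))


def lookup(conn: sqlite3.Connection, url_path: str) -> Optional[str]:
    """Return the body stored at a url path, or None if nothing is there."""
    row = conn.execute("SELECT body FROM documents WHERE directory = ? AND filename = ?",
                       split_path(url_path)).fetchone()
    return row[0] if row else None
```

Record 1121 is published at /cb010721microsoft.html only because its fields say so. The record number never appears in the url, and changing the fields changes where the document appears.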

/lcitpro04.html could have been /lc/it/pro/04.html. But why? For the scalability of your publishing software? Say you use Dreamweaver and have a thousand files in one directory. You get a small File Open dialog box in which you’re expected to highlight one file. That dialog box is not an effective tool for 1,000 files: there is too much scrolling, and you have to scroll through the same small window every time you open a file. With directories, if you edit /lc/it/pro/04.html, the next time you choose File Open you will still be in the /lc/it/pro directory, and it will be easy to select 05.html.
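The flat and nested forms carry the same information, so a publishing tool could store files one way and publish them the other. A minimal sketch, assuming a hypothetical fixed-width code: two letters for the section, two for the subsection, three for the article, and two digits for the part number. The essay gives only one example, so the widths are an assumption.

```python
import re

# Hypothetical fixed-width layout matching the 'lcitpro04.html' example.
FLAT_NAME = re.compile(r"^/([a-z]{2})([a-z]{2})([a-z]{3})(\d{2})\.html$")


def flat_to_nested(path: str) -> str:
    """'/lcitpro04.html' -> '/lc/it/pro/04.html'."""
    m = FLAT_NAME.match(path)
    if m is None:
        raise ValueError(f"not a flat mnemonic name: {path!r}")
    section, subsection, article, part = m.groups()
    return f"/{section}/{subsection}/{article}/{part}.html"


def nested_to_flat(path: str) -> str:
    """'/lc/it/pro/04.html' -> '/lcitpro04.html'."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not parts[3].endswith(".html"):
        raise ValueError(f"not a nested mnemonic path: {path!r}")
    return "/" + "".join(parts)
```

Because the two forms convert mechanically, the directory layout can serve the editing tool without dictating the public url.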

This is a fundamental flaw in current desktop GUI software. Part of the answer is to be willing to type. Unix software frequently expects you to type, but “autocompletes” for you, so you don’t type as much as you would otherwise.
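Autocompletion is what makes typing practical even in a directory of a thousand files. A minimal sketch of the behaviour Unix shells commonly give on Tab: collect the names that start with what you typed, then extend your input to their longest common prefix. The function name and sample filenames are hypothetical; os.path.commonprefix compares strings character by character.

```python
import os.path
from typing import Iterable, List, Tuple


def complete(prefix: str, names: Iterable[str]) -> Tuple[List[str], str]:
    """Return (matching names, prefix extended as far as all matches agree)."""
    matches = sorted(n for n in names if n.startswith(prefix))
    if not matches:
        return [], prefix
    return matches, os.path.commonprefix(matches)
```

Given lcitpro04.html, lcitpro05.html and lcitrev01.html, typing “lcitp” extends to “lcitpro0”, so only one more keystroke picks the file.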

Dependencies

It’s a bad idea to make urls depend on anything; you should keep the same url forever. Some software makes the url depend on the record number in the database, or on the name of the template used to turn the content into a webpage. Not only are these meaningless to users, they also remove flexibility. What if you start out using one template for a set of articles, then decide you want to use two or more different templates for them? You would have to change their urls.
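The difference between a stored url and a derived one shows up as soon as the template changes. A minimal sketch contrasting the two, with a hypothetical Article record, hypothetical template names, and a derived_url function standing in for software that builds urls from the template and record number.

```python
from dataclasses import dataclass


@dataclass
class Article:
    record_id: int  # internal database key; deliberately not part of the url
    template: str   # presentation choice; free to change at any time
    url: str        # assigned once at publication, stored, never derived


# Hypothetical templates: only the presentation depends on article.template.
TEMPLATES = {
    "column": "<div class='column'>{body}</div>",
    "feature": "<article class='feature'>{body}</article>",
}


def render(article: Article, body: str) -> str:
    return TEMPLATES[article.template].format(body=body)


def derived_url(article: Article) -> str:
    """The fragile approach: the url is computed from template and record number."""
    return f"/{article.template}/{article.record_id}.html"
```

Switching an article from “column” to “feature” changes derived_url, breaking every existing link, while the stored url field stays the same.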

Similarly, if urls depend on the site structure, as in /arts/newmedia/flash/typefaces.html, you won’t be able to change the structure. If you later decide you want newmedia to be a top-level category, it’s too late.

In fact, you need to track urls separately from the content associated with them. Then, if you delete or move the content, your publishing software can automatically put a message at the old url pointing to the new location, or saying that the content has been deleted and offering some alternatives.
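Tracking urls separately from content means keeping a record for every url ever published, even after its content is gone. A minimal sketch with a hypothetical UrlRegistry class. It maps the cases onto HTTP 301 Moved Permanently and 410 Gone, which is one reasonable choice; a plain message page, as the essay describes, would serve the same purpose.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class UrlRecord:
    content_id: Optional[int]       # None once the content has moved or been deleted
    moved_to: Optional[str] = None  # new location, if the content moved
    alternatives: List[str] = field(default_factory=list)


class UrlRegistry:
    """Keeps every published url, so old urls never simply vanish."""

    def __init__(self) -> None:
        self._records: Dict[str, UrlRecord] = {}

    def publish(self, url: str, content_id: Optional[int]) -> None:
        if url in self._records:
            raise ValueError(f"url already used: {url}")
        self._records[url] = UrlRecord(content_id)

    def move(self, old_url: str, new_url: str) -> None:
        record = self._records[old_url]
        self.publish(new_url, record.content_id)
        record.content_id, record.moved_to = None, new_url
        # Re-point earlier redirects so visitors never follow a chain of moves.
        for other in self._records.values():
            if other.moved_to == old_url:
                other.moved_to = new_url

    def delete(self, url: str, alternatives: Iterable[str] = ()) -> None:
        record = self._records[url]
        record.content_id = None
        record.alternatives = list(alternatives)

    def respond(self, url: str) -> Tuple[int, Dict[str, str], str]:
        """Return (status code, headers, message) for a request to `url`."""
        record = self._records.get(url)
        if record is None:
            return 404, {}, "Not found."
        if record.moved_to is not None:
            return 301, {"Location": record.moved_to}, f"This page has moved to {record.moved_to}."
        if record.content_id is None:
            hint = "Try: " + ", ".join(record.alternatives) if record.alternatives else ""
            return 410, {}, ("This page has been deleted. " + hint).strip()
        return 200, {}, f"<content #{record.content_id}>"
```

With this in place, promoting newmedia to a top-level category becomes a move from /arts/newmedia/flash/typefaces.html to /newmedia/flash/typefaces.html, and the old url keeps answering with a pointer to the new one.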




Copyright © 1998-2011 J. R. Boynton