Potty Little Details

Rewriting or Redirecting URLs

Fritz Onion (http://pluralsight.com/blogs/fritz/archive/2004/07/21/1651.aspx) has a great URL redirecting engine, very similar to the URL rewriting module written for DasBlog (http://www.sf.net/projects/dasblogce). There is often confusion between URL redirecting and URL rewriting.

Redirecting is the server’s way of informing the client that something has moved. For example, a browser requests http://www.computerzen.com and the server responds with an HTTP 302 status code and points the browser to http://www.hanselman.com/blog. The browser then has to request http://www.hanselman.com/blog itself, and receives an HTTP 200 status code indicating success. During this redirection, the URL in the browser’s address bar is updated, so the final successful URL is the one that ends up in the address bar.

Rewriting, on the other hand, occurs entirely on the server side; the browser requests a page only once and the URL displayed in the address bar doesn’t change. For example, if you type http://www.hanselman.com/blog/zenzoquincy.aspx into your browser’s address bar, you’ll get a page showing off my infant son. However, the file zenzoquincy.aspx doesn’t actually exist anywhere on the disk. The only page that does exist is permalink.aspx, the page that my blog engine uses to show all blog posts. The real page is permalink.aspx?guid=cee8aa6e-de46-43ad-8d27-e1c764df30f5. However, that unique post ID isn’t very memorable and certainly not any fun. When the blog engine I run, DasBlog, sees ZenzoQuincy.aspx requested, it looks in its data store to see whether the word “ZenzoQuincy” is associated with a unique blog post ID. If it is, DasBlog rewrites the requested URL on-the-fly, on the server side, and ASP.NET continues dispatching the request.
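
That lookup-and-rewrite step can be sketched in a few lines of C#. This is only an illustration, not DasBlog’s actual code: the class name TitleRewriter, the TryRewrite method, and the in-memory dictionary standing in for the data store are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: maps a friendly page name to the real permalink URL.
public static class TitleRewriter
{
    // Stand-in for the blog's data store (an assumption for this example).
    private static readonly Dictionary<string, string> PostIdsByTitle =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ZenzoQuincy", "cee8aa6e-de46-43ad-8d27-e1c764df30f5" }
        };

    // Returns true and the rewritten path when the requested page name
    // matches a known post title; otherwise returns the path unchanged.
    public static bool TryRewrite(string requestedPath, out string rewrittenPath)
    {
        // "/blog/zenzoquincy.aspx" -> "zenzoquincy"
        string title = Path.GetFileNameWithoutExtension(requestedPath);
        string postId;
        if (!string.IsNullOrEmpty(title) && PostIdsByTitle.TryGetValue(title, out postId))
        {
            // Keep the directory part ("/blog/") and swap in the real page.
            string directory = requestedPath.Substring(0, requestedPath.LastIndexOf('/') + 1);
            rewrittenPath = directory + "permalink.aspx?guid=" + postId;
            return true;
        }
        rewrittenPath = requestedPath;
        return false;
    }
}
```

Inside an ASP.NET module, the rewritten path would then be handed to HttpContext.RewritePath, and the browser would never see it.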

Together, URL redirecting and URL rewriting are powerful techniques for controlling the URL presented to the user and for maintaining your site’s permalinks. It is very important to most website content owners that their links remain permanent, hence “permalink.” Netiquette (Internet etiquette) dictates that if a URL does change, you at least provide a redirect to inform the browser automatically that the resource has moved. HTTP provides two common ways to tell the browser: the first is a temporary redirect, or 302, and the second is a permanent redirect, or 301.

To extend the example, my website uses a temporary redirect to send visitors from http://www.computerzen.com to http://www.hanselman.com/blog. It’s temporary because I might change the location of my blog at some point, pointing my top-level domain somewhere else. I use a permanent redirect for my blog’s RSS (Rich Site Summary) feed to inform aggregators and syndicators that I would prefer they always use a specific URL. When aggregators receive a 301, or permanent redirect, they know to update their own data and never visit the original URL again.
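
To make the difference concrete, here is a rough sketch of how an aggregator might treat the two status codes. The FeedSubscription class and its members are hypothetical, not taken from any real aggregator; the only behavior it assumes is the one described above, where a 301 updates the stored URL and a 302 does not.

```csharp
using System;

// Hypothetical sketch of how a feed aggregator might treat redirects.
public sealed class FeedSubscription
{
    // The URL the aggregator has on file and will request next time.
    public string StoredUrl { get; private set; }

    public FeedSubscription(string url)
    {
        StoredUrl = url;
    }

    // Returns the URL to fetch for this request. The stored URL is
    // updated only when the server says the move is permanent (301).
    public string HandleResponse(int statusCode, string location)
    {
        switch (statusCode)
        {
            case 301: // Moved Permanently: remember the new address.
                StoredUrl = location;
                return location;
            case 302: // Temporary: follow it, but keep the old address on file.
                return location;
            default:  // Not a redirect handled here: nothing to follow.
                return StoredUrl;
        }
    }
}
```

After a 302 from http://www.computerzen.com, StoredUrl still holds the original address, so the next visit starts there again. After a 301, the original URL is dropped for good.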

Fritz’s HttpModule uses a configuration section that pairs regular expressions matching target URLs with the destination URLs to redirect to. Fritz’s module, like most rewriting modules, uses regular expressions because they describe intent concisely. For example, ^/(fritz|aaron|keith|mike)/rss\.xml matches both /fritz/rss.xml and /mike/rss.xml. The destination URL uses a substitution expression like /blogs/$1/rss.aspx, where $1 is the text captured by the first group in parentheses, in this case “fritz” or “mike”.
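
Here is a small sketch of that match-and-substitute step using the .NET Regex class. It is not Fritz’s code; the RedirectRule class and GetDestination method are made up for illustration, and I’m assuming the target pattern is anchored with ^ as in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch of one redirect rule: a target pattern plus a destination.
public static class RedirectRule
{
    // Pattern and destination from the example in the text.
    private static readonly Regex Target =
        new Regex(@"^/(fritz|aaron|keith|mike)/rss\.xml");

    private const string Destination = "/blogs/$1/rss.aspx";

    // Returns the redirect destination, or null when the path does not match.
    public static string GetDestination(string path)
    {
        Match match = Target.Match(path);
        // Match.Result expands $1 using the groups captured by this match.
        return match.Success ? match.Result(Destination) : null;
    }
}
```

Calling RedirectRule.GetDestination("/fritz/rss.xml") yields /blogs/fritz/rss.aspx, while a path that doesn’t match, such as /scott/rss.xml, yields null and would be left alone.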

A simple hard-coded 301 redirect looks like this within ASP.NET:

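// "response" is the current HttpResponse, for example HttpContext.Current.Response.
response.StatusCode = 301;
response.Status = "301 Moved Permanently";
response.RedirectLocation = "http://www.hanselman.com/blog";
response.End(); // Stop processing and send the redirect to the client.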

Here’s an example of how DasBlog uses URL rewriting to service HTTP requests for files that don’t exist on the file system. Within my blog’s web.config file is a custom configuration section that includes regular expressions that are matched against the requested file. For example, the file http://www.hanselman.com/blog/rss.ashx doesn’t exist. There’s no handler for it, and the file doesn’t exist on disk. However, I’d like people to think of it as my main URL for the RSS XML content on my site. I’d like to easily change which service handles it internally with just a configuration change. I add this exception to my web.config custom section:

Note that it is mapped to “http://www.hanselman.com/blog/SyndicationService.asmx/GetRss” with {basedir} expanded. That URL isn’t nearly as friendly as rss.ashx, is it? Remember that the name rss.ashx isn’t special; it’s just unique. I picked it because the extension was already mapped within ASP.NET. It could have been something else like foo.bar, as long as the .bar extension was mapped to ASP.NET within the IIS configuration.

// Requires: System, System.Collections.Specialized, System.Configuration,
// System.Text.RegularExpressions, and System.Web.
private void HandleBeginRequest( object sender, EventArgs evargs )
{
    HttpApplication app = sender as HttpApplication;
    string requestUrl = app.Context.Request.Url.PathAndQuery;
    // Keys are match expressions; values are the destination URLs to map to.
    NameValueCollection urlMaps =
        (NameValueCollection)ConfigurationSettings.GetConfig("newtelligence.DasBlog.UrlMapper");
    for ( int loop=0;loop<urlMaps.Count;loop++)
    {
        string matchExpression = urlMaps.GetKey(loop);
        Regex regExpression = new Regex(matchExpression,RegexOptions.IgnoreCase|
            RegexOptions.Singleline|RegexOptions.CultureInvariant|
            RegexOptions.Compiled);
        Match matchUrl = regExpression.Match(requestUrl);
        if ( matchUrl != null && matchUrl.Success )
        {
            string mapTo = urlMaps[matchExpression];
            // Find each {token} in the destination URL.
            Regex regMap = new Regex("\\{(?<expr>\\w+)\\}");
            foreach( Match matchExpr in regMap.Matches(mapTo) )
            {
                Group urlExpr;
                string expr = matchExpr.Groups["expr"].Value;
                // Look up the group of the same name captured from the request URL.
                urlExpr = matchUrl.Groups[expr];
                if ( urlExpr != null )
                {
                    mapTo = mapTo.Replace("{"+expr+"}", urlExpr.Value);
                }
            }
            app.Context.RewritePath(mapTo);
        }
    }
}

It starts by getting the NameValueCollection of URL mappings from the web.config file. The regular expression for each potential match is run against the request URL, which is pulled from the current request’s Url.PathAndQuery. If an expression matches, each named group captured from the requested URL is substituted into its spot in the destination URL. For example, a {postid} token can be extracted from the request URL and reused in the destination. Any good URL rewriting engine has support for this in some fashion, whether by {token} or by numeric position such as $1.
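
To try the token substitution outside of ASP.NET, the same idea can be pulled into a plain function. This is a simplified sketch, not DasBlog’s code: the UrlMapper class and Map method are hypothetical. Unlike the handler above, it stops at the first matching rule and leaves any unknown {token} in place. The rss.ashx pattern in the usage note is my guess at what such a rule could look like.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, ASP.NET-free sketch of {token} substitution.
public static class UrlMapper
{
    // Finds {token} placeholders in a destination URL.
    private static readonly Regex TokenPattern = new Regex(@"\{(?<expr>\w+)\}");

    // Each rule pairs a match expression (Key) with a destination (Value).
    // Returns the mapped URL for the first matching rule, or null if none match.
    public static string Map(string requestUrl,
                             IEnumerable<KeyValuePair<string, string>> rules)
    {
        foreach (KeyValuePair<string, string> rule in rules)
        {
            Match urlMatch = Regex.Match(requestUrl, rule.Key,
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
            if (!urlMatch.Success)
            {
                continue;
            }

            // Replace each {token} with the same-named group from the request URL.
            return TokenPattern.Replace(rule.Value, token =>
            {
                Group captured = urlMatch.Groups[token.Groups["expr"].Value];
                return captured.Success ? captured.Value : token.Value;
            });
        }
        return null;
    }
}
```

With a rule whose match expression is (?<basedir>.*?)/rss\.ashx and whose destination is {basedir}/SyndicationService.asmx/GetRss, a request for /blog/rss.ashx maps to /blog/SyndicationService.asmx/GetRss.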

Check out the source code for DasBlog, or one of the other redirecting/rewriting modules I’ve mentioned, for more details and ideas on how you can create more “hackable” URLs for your application. Christopher Pietschmann has a nice VB version for ASP.NET 2.0 at http://pietschsoft.com/blog/post.aspx?postid=762.

Written by oneil

September 10, 2008 at 2:12 pm

Posted in ASP DOT NET
