Using ISAPI_Rewrite to canonicalize ASP.NET URLs and remove default.aspx
In the comments of my post on Google PageRanks, Jeff Atwood says:
[The existence of] Default.aspx is another reason to consider URL rewriting. A few of my rewrite rules relative to PR:
- I don't allow links to come in as codinghorror.com, I add the www. if it is not there.
- I remove index.html if it is present
This got me thinking, as it appears that there are quite a few ways to get to my home page.
- http://www.hanselman.com/blog/
- http://www.hanselman.com/blog/default.aspx
- http://www.hanselman.com/blog
- http://hanselman.com/blog/
- http://hanselman.com/blog/default.aspx
- http://hanselman.com/blog
- http://www.hanselman.com/blog/Default.aspx
- http://www.computerzen.com/blog/
- http://www.computerzen.com
- http://computerzen.com/blog/
- http://computerzen.com/
You get the idea...Heck, probably just by mentioning them I'm getting in trouble, right? The URI that dare not speak its name.
Anyway, let's start by assuming my home page is http://www.hanselman.com/blog/, and that includes the trailing slash. We know that if my browser requests http://www.hanselman.com/blog without the slash, it'll be told by the web server to try again with the slash, which is just wasteful.
Apache folks have mod_rewrite and love to remind ASP.NET/IIS folks about their awesomeness. Many sites rely on mod_rewrite for certain behaviors. It's really a fundamental part of the Apache experience. The IIS story becomes better in newer versions of IIS, but the easiest and most flexible way to handle these kinds of things is ISAPI_Rewrite.
Sure, one could create an HTTP Module for ASP.NET for some of this, but at some point you'll realize that you need to catch these requests WAY earlier. Now, ISAPI_Rewrite uses Regular Expressions, and now it's time for my oft-repeated favorite RegEx joke - get ready for it:
"So you've got a problem, and you want to use Regular Expressions to solve it. Now you've got two problems."
Thanks for indulging me. Yes, writing ISAPI_Rewrite stuff is freaking voodoo and I hate it. But once you've written the rules, they're done. Here's mine:
[ISAPI_Rewrite]
RewriteRule /blog/default\.aspx http\://www.hanselman.com/blog/ [I,RP]
RewriteCond Host: ^hanselman\.com
RewriteRule (.*) http\://www.hanselman.com$1 [I,RP]
RewriteCond Host: ^computerzen\.com
RewriteRule (.*) http\://www.hanselman.com$1 [I,RP]
RewriteCond Host: ^www.computerzen\.com
RewriteRule (.*) http\://www.hanselman.com/blog/ [I,RP]
These rules normalize (canonicalize), to the best of my ability, all the not-really-good URLs above. They'll send everyone to http://www.hanselman.com/blog/ and even take totally lame links like http://computerzen.com/blog/GooglePageRanksConsideredSubtle.aspx and make them "correct." The "I" means "case insensitive" and the "RP" means "Redirect Permanently" - an HTTP 301. If it were just "R" it'd be a 302. When you're testing with ISAPI_Rewrite, always start with "R" to do temporary redirects, because you don't get a second chance with a 301.
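To make the intent of those rules a bit more concrete, here is a minimal Python sketch of the same canonicalization idea. It is not how ISAPI_Rewrite works internally, and it does not reproduce the rules one for one. The function name canonicalize, the ALIAS_HOSTS set, and the choice to send a bare domain to /blog/ are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Canonical host taken from the post; the alias set is an assumption
# based on the host names listed earlier.
CANONICAL_HOST = "www.hanselman.com"
ALIAS_HOSTS = {"hanselman.com", "computerzen.com", "www.computerzen.com"}


def canonicalize(url):
    """Return the preferred form of a URL (illustrative helper only)."""
    scheme, host, path, query, fragment = urlsplit(url)

    # Send every alias host to the one canonical host.
    host = host.lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST

    # Drop a trailing default.aspx, ignoring case (like the "I" flag).
    path = re.sub(r"/default\.aspx$", "/", path, flags=re.IGNORECASE)

    # Assumption: a bare domain or a slash-less /blog means the blog home page.
    if path in ("", "/", "/blog"):
        path = "/blog/"

    # The query string is carried over untouched.
    return urlunsplit((scheme, host, path, query, fragment))


if __name__ == "__main__":
    for url in (
        "http://hanselman.com/blog",
        "http://www.hanselman.com/blog/Default.aspx",
        "http://computerzen.com/blog/GooglePageRanksConsideredSubtle.aspx",
    ):
        print(url, "->", canonicalize(url))
```

Under these assumptions, every home page variant in the list above comes out as http://www.hanselman.com/blog/, and deep links keep their path and only change host.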
So now, even if someone asks for http://www.hanselman.com/blog, they'll be told where to go (here's an HTTP conversation):
- GET /blog HTTP/1.1
- Heh, uh, get me /blog, m'kay?
- HTTP/1.1 301 Moved Permanently
Location: http://www.hanselman.com/blog/
- Don't ever come to me with that kind of crap again. Always go here, and stop bothering me. Seriously. Go over here http://www.hanselman.com/blog/
- GET /blog/ HTTP/1.1
- Gosh, sorrey (my browser is Canadian) get me /blog/ then.
And it was Good™.
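The server's half of that conversation can be sketched in a few lines of Python. This is only an illustration of the status codes involved, not what IIS or ISAPI_Rewrite actually runs. The respond function and its testing flag are hypothetical names. The flag mirrors the advice to start with a temporary redirect while you are still experimenting.

```python
from http import HTTPStatus

# The canonical home page from the post.
CANONICAL_HOME = "http://www.hanselman.com/blog/"


def respond(path, testing=False):
    """Hypothetical handler: return (status, headers) for a request path."""
    if path == "/blog":
        # While testing, answer with a temporary 302 instead of a 301,
        # since a permanent redirect is hard to take back.
        status = HTTPStatus.FOUND if testing else HTTPStatus.MOVED_PERMANENTLY
        return status, {"Location": CANONICAL_HOME}
    # Anything else is served normally in this sketch.
    return HTTPStatus.OK, {}


if __name__ == "__main__":
    for path in ("/blog", "/blog/"):
        status, headers = respond(path)
        print(path, int(status), status.phrase, headers.get("Location", ""))
```

Calling respond("/blog") gives a 301 with the Location header pointing at the slash version, and respond("/blog", testing=True) gives a 302 to the same place.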
This kind of control is useful in any public facing application or web site and one should take an hour or so and really think about their website's "public face." ISAPI_Rewrite can be a powerful component as part of a larger ASP.NET solution, especially one where Google Ranks do matter and hackable or "pretty" URLs are highly valued.
For us, in the banking industry, having nice URLs like http://www.foobank.com/banking/ or http://mobile.foobank.com makes everyone happy.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
It works really well. I love the fact that with the new version you can edit the .ini file on the fly and not have to restart the application pool.
I'll have to wait till tomorrow to withdraw all my foo.
Are you saying it should work like this?
http://www.hanselman.com/blog/?date=2007-02-01
I noticed this little subtlety after logging in to my blog. The header link was /foo/default.aspx?page=admin but would be redirected to /foo/default.aspx resulting in the admin bar not being shown.
Gee, I dunno, why don't you ask http://google.com ?
Oh wait, you can't. Because it redirects to http://www.google.com .
WORLD WIDE WEB BABY! DUBYA DUBYA DUBYA!
RewriteRule /(.*)/default\.aspx /$1 [I,R=301] takes me straight to http://foo/bar/ when clicking a link with href http://foo/bar/default.aspx?page=admin. One way to resolve this is either to use RewriteRule ^/(.*)/default\.aspx$ /$1 [I,R=301], which matches only URLs that consist of nothing but a vdir followed by default.aspx, or to append the query string as depicted above.
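The difference between the unanchored and the anchored pattern can be tried out with Python's re module. This is a rough sketch only: Python's regex engine is not the one ISAPI_Rewrite or IIRF uses, and it assumes the rule is matched against the path plus query string. The helper names and the third, query-preserving pattern are assumptions added for illustration.

```python
import re

# The two patterns from the comment above, plus a hypothetical third
# one that captures an optional query string so it can be re-appended.
UNANCHORED = re.compile(r"/(.*)/default\.aspx", re.IGNORECASE)
ANCHORED = re.compile(r"^/(.*)/default\.aspx$", re.IGNORECASE)
WITH_QUERY = re.compile(r"^/(.*)/default\.aspx(\?.*)?$", re.IGNORECASE)


def rewrite(pattern, url):
    """Mimic a '/$1' target: return the new URL, or None if no match."""
    match = pattern.search(url)
    return "/" + match.group(1) if match else None


def rewrite_keep_query(url):
    """Strip default.aspx but carry any query string over to the target."""
    match = WITH_QUERY.match(url)
    if match is None:
        return None
    return "/" + match.group(1) + "/" + (match.group(2) or "")


if __name__ == "__main__":
    admin = "/foo/default.aspx?page=admin"
    print(rewrite(UNANCHORED, admin))   # rule fires, query string is lost
    print(rewrite(ANCHORED, admin))     # rule does not fire
    print(rewrite_keep_query(admin))    # rule fires, query string survives
```

In this sketch the unanchored pattern turns the admin link into /foo and silently drops ?page=admin, the anchored one leaves the request alone, and the third variant produces /foo/?page=admin.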
That said, it will gain you nothing but awareness about the differences between ISAPI_Rewrite's and IIRF's RegEx engines.
Or with the default rewriter of asp.net 2.0?
I have the same problem, but I'm hosted on wh4l, so I cannot access IIS to install an ISAPI filter.
PS. You put the ™ initial on "And it was Good™" but I didn't get the reference. Please do elaborate :-)