Are Blog URLs important?
I had an interesting e-conversation with Rob Howard and Scott Watermasysk today. I had noticed recently that a number of blogs I'd visited had things like _2D00_ and similar codes in their URLs. There was a forum post a while back (July of 2006) that asked about things like hyphens getting encoded in some builds of CS2.1.
There were a bunch of posts on http://www.asp.net that had URLs like: .../2006/09/07/Startup-doesn_2700_t-always-mean-venture-capital... where the 2700 was a single quote or .../2006/09/05/Should-tags-be-moderated_3F00_... where 3F00 was a "?". The non-alphanumeric characters in these cases were being encoded in the URL as hex, apparently the bytes of their UTF-16 encoding in little-endian order, which is why a single quote (U+0027) shows up as 2700 rather than 0027. This was a bug in a beta of CS that was quickly fixed, but it got me thinking about URLs in blog engines and more generally. These particular URLs and their untidiness really irked me.
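To see what those codes stand for, here is a minimal Python sketch that turns them back into characters. It assumes the four hex digits are the two UTF-16 little-endian bytes of a single character, which fits the examples above (2700 gives a single quote) but is a guess from those examples, not something taken from the CS source. The name decode_escapes is hypothetical.

```python
import re

# Matches tokens such as _2700_ or _3F00_: four hex digits between underscores.
ESCAPE = re.compile(r"_([0-9A-Fa-f]{4})_")

def decode_escapes(slug: str) -> str:
    """Replace each _XXXX_ token with the character its two bytes encode.

    Assumption: XXXX is the UTF-16 little-endian byte pair of one character.
    """
    def replace(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(1))
        # errors="replace" keeps odd byte pairs from raising an exception.
        return raw.decode("utf-16-le", errors="replace")
    return ESCAPE.sub(replace, slug)

print(decode_escapes("Startup-doesn_2700_t-always-mean-venture-capital"))
print(decode_escapes("Should-tags-be-moderated_3F00_"))
```

This is deliberately naive: any four hex digits that happen to sit between underscores in a real title would be decoded too.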
Different Ways to Get to the Same Place
Personally I like URLs that use Pascal Casing. The one for this post, for example, is:
http://www.hanselman.com/blog/AreBlogURLsImportant.aspx
Although URLs are technically supposed to be case sensitive, and you used to see that a lot when URLs betrayed the underlying case-sensitivity of the file system, they aren't in our case. The only thing that would make it better, IMHO, is the removal of the .aspx extension. More on that later.
Years ago DasBlog had really lame URLs and Jeff Atwood picked on us. ;) Unfortunately, these URLs still live on in some comments pages within DasBlog.
We started using the blog title to generate the URL. This, of course, has problems when you change a title after a pile of folks link to the original URL, but unless you want the engine to keep track of every title a post has ever had and 301 to the "final URL," you've got a nasty problem. Anyway...
There are a number of options in DasBlog that affect your URLs, although DasBlog canonicalizes URLs internally and will always accept any of these formats without breaking existing links. That is, you can change your URL scheme and you won't be penalized.
There are options to use a + for a space, as well as including the date, so any of these are potentially valid:
- http://www.hanselman.com/blog/AreBlogURLsImportant.aspx
- http://www.hanselman.com/blog/Are+Blog+URLs+Important.aspx
- http://www.hanselman.com/blog/2007/04/06/AreBlogURLsImportant.aspx
- http://www.hanselman.com/blog/2007/04/06/Are+Blog+URLs+Important.aspx
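One way an engine could accept all four of these is to boil each incoming path down to a single lookup key. The following Python sketch shows the idea; it is not DasBlog's actual code, and canonical_key and its rules (drop the date, drop .aspx, drop pluses, ignore case) are assumptions made for illustration.

```python
import re

# Optional leading yyyy/mm/dd/ date segment.
DATE_PREFIX = re.compile(r"^\d{4}/\d{2}/\d{2}/")

def canonical_key(path: str) -> str:
    """Reduce any of the four URL styles to one lookup key (hypothetical rules)."""
    key = DATE_PREFIX.sub("", path)        # drop the date, if present
    if key.lower().endswith(".aspx"):
        key = key[: -len(".aspx")]         # drop the extension
    key = key.replace("+", "")             # pluses stand in for spaces
    return key.lower()                     # ignore case

# Paths are relative to /blog/ on the site.
variants = [
    "AreBlogURLsImportant.aspx",
    "Are+Blog+URLs+Important.aspx",
    "2007/04/06/AreBlogURLsImportant.aspx",
    "2007/04/06/Are+Blog+URLs+Important.aspx",
]
for v in variants:
    print(v, "->", canonical_key(v))
```

All four variants land on the same key, so the engine only has to store one entry per post no matter which style the site is configured to emit.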
Is One URL Format more Search Engine Friendly?
A number of folks have said they preferred hyphens over pluses, specifically that it helps Google. Rob mentioned during our email discussion:
The hyphens, however, are something you guys should investigate using for DasBlog. Search engines actually look at the URL for keywords. The hyphen is considered a word-break indicator, i.e. HelloWorld to Google appears as "HelloWorld" whereas Hello-World is "Hello" and "World". The underscore is also considered a word-break, but given less points.
I wasn't sure about this, and initially was skeptical, but he's right - mostly. However, it seems to matter less and less, as Google seems to have added some smarts.
If you Google for "happybirthdaytomiiwiireview" all-one-word, you'll get my post on the Nintendo Wii with the URL highlighted. You'll also get that post if you Google for "Happy Birthday to Mii" as a list of words, or as a phrase with surrounding quotes because it also happens to be the title.
ASIDE: Oddly, if you Google for the phrase with hyphens (which is odd, in itself) as in happy-birthday-to-mii you'll get fewer results than if you do it with quotes. Not that there's any reason to do that.
Notice in the screenshot below how the word "Mii" appears bold in the URL. Not in the title, in the URL. That implies to me that Google either cares about casing, in this case the Pascal Casing of my blog's URL, or that it picked "Mii" up as a fragment and really cares about fragments of things in URLs.
Let's see which it is. If we search for "Happy Birthday to Mi" with just one i in "Mii" - where "Mi" is a fragment of "Mii" - we don't see my post anywhere at all. That implies, to me at least, that Pascal Casing in a blog post's URL is likely as effective from Google's perspective in delimiting words as a hyphen is. So from a Search Engine Optimization (SEO) perspective, hyphens versus Pascal Casing versus whatever is pretty much a moo point.
Not moot, rather, "moo" like a cow's opinion. It just doesn't matter. It's moo.
So, pick the URL style that makes you feel good, I say.
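To make the hyphen-versus-Pascal-Casing comparison concrete, here is a small Python sketch of how a tokenizer could pull the same words out of either style. It says nothing about how Google actually works; both function names and the splitting rules are assumptions for illustration only.

```python
import re

def words_from_hyphens(slug: str) -> list[str]:
    """Split a hyphenated slug on its hyphens."""
    return [w.lower() for w in slug.split("-") if w]

def words_from_pascal(slug: str) -> list[str]:
    """Split a Pascal Cased slug at each capital letter."""
    # One capital letter followed by any lowercase letters or digits.
    return [w.lower() for w in re.findall(r"[A-Z][a-z0-9]*", slug)]

print(words_from_hyphens("happy-birthday-to-mii-wii-review"))
print(words_from_pascal("HappyBirthdayToMiiWiiReview"))
```

Both calls produce the same six words. Runs of capitals are the weak spot of this naive version: the "URLs" in this post's own URL would come out as "u", "r" and "ls".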
Many Options for URLs
Scott Water has used ISAPI_Rewrite to completely remove the .aspx extension from his site, and he has nice clean URLs like http://scottwater.com/blog/archive/url-rewriting-via-isapi-rewrite/. He also has nice "hackable" URLs like http://scottwater.com/blog/search/hanselman/ which is pretty sweet. You too can remove the .ASPX extension from your ASP.NET site using ISAPI_Rewrite.
Here's some example URL styles I've seen out there in Blog Land:
- Subtext - .../blog/archive/2007/02/11/Subtext_v1.9.4_quotWindwardquot_Edition_Released.aspx
- CS with ISAPI_Rewrite - .../blog/archive/twitter-for-windows/
- Typo - .../articles/2007/03/27/microsoft-technical-summit
- DasBlog - .../weblog/StringFormattingFun.aspx
- DasBlog with Dates - .../2007/03/27/Abschlussbericht+Zum+NET+Wintercamp+2007.aspx
- Radio Userland - .../2007/04/05/itsNotTheCoverOfRollingSto.html
- MovableType - .../blog/archives/000093.html
- Blogger - .../2007/04/mulan.html
- Drupal - .../node/133257
- Blogware - .../blog/_archives/2006/8/18/2242665.html
Yes, there's 1,000 blogging engines out there, each with its own URL style, and yes, this is not an exhaustive list.
The Trailing Slash in a/an URL and removing Technology from your URL
Note that in ScottWater's case, the URLs are lower-case and include the trailing /.
There's a lot of controversy about the Trailing Slash. I've always felt that the trailing slash implied we were visiting a directory, while no slash implied we were visiting a page. Simon Willison seems to advocate for the trailing slash as in his comment at http://jessey.net/archive/2004/05/31/rewritten/.
Personally, I like the trailing slash only for the home page of this blog and set it up that way earlier this year. At least I picked one, as these things matter.
What I'd really like to do is remove the Technology from my URLs. I could remove the .aspx extension from my blog's URLs by:
- Making it output Permalink URLs without .aspx
- Adding an ISAPI_Rewrite rule to add the .aspx before the request gets to ASP.NET
- Adding some magic dust in ASP.NET 1.1 or a Form Control Adapter in ASP.NET 2.0 to change the HTML FORM Action in the case of a Post Back.
Of course, I'd need to do this without invalidating all the existing permalinks out there. The idea being that once you've put a permalink out there, it's out there. Forever. Only Feed Readers and Search Bots will respect a 301 and update their record of those links. All that static HTML out there cares not about your pretty URLs.
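The routing decision behind that plan can be sketched in a few lines of Python. This is only a model of the logic, not ISAPI_Rewrite or ASP.NET code; the function name route and the choice to 301 old .aspx permalinks to the clean URL (rather than just serving them) are assumptions.

```python
from typing import Tuple

def route(path: str) -> Tuple[int, str]:
    """Return (status, target) for an incoming request path.

    301 means "redirect the client to target"; 200 means "serve target
    internally", the job the rewrite rule would do in front of ASP.NET.
    """
    if path.endswith("/"):
        # Home page or folder-style URL: leave it alone.
        return 200, path
    if path.lower().endswith(".aspx"):
        # Old permalink: still works, but points clients at the clean URL.
        return 301, path[: -len(".aspx")]
    # Clean URL: add the extension back before the request reaches ASP.NET.
    return 200, path + ".aspx"

print(route("/blog/AreBlogURLsImportant.aspx"))
print(route("/blog/AreBlogURLsImportant"))
```

In a real setup the internal rewrite must not be fed back through the redirect check, or the two rules would chase each other in a loop.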
It's probably too late for me, Dear Reader, but perhaps not for you and your URLs. Pick a scheme and be excited about it, for these are religious issues that will never be solved.
Conclusion
I don't think ScottWater will mind me quoting him directly from a private email, in this case, to end this blog post:
What I meant is that if the goal is SEO, nice URLs are well…nice, but there are way better things you can do, such as writing relevant content. - Scott Watermasysk
It's true! I should stop now.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
I had several painful experiences with websites which simply didn't work because of one f***n case. Whenever I see the upper case in URL, it means a big exclamation mark for me: "be careful, maybe you will need to type the case correctly!". I was always wondering why would anybody force me to be worried when there is absolutely no technical reason for it. I just don't get it (and it really doesn't matter that your site runs on Windows filesystem which is case insensitive; you still make me worry about case sensitivity).
On my websites, I follow this simple rule of thumb: always-use-dashes-and-lower-case. It has several obvious advantages:
- my visitors are 100% sure that the typed-in address will work
- this-kind-of-url is more readable than thiskindofurl (this is what you get if you convert PascalCasing into lower case - and users will do this when typing the URL in manually)
As a final note, another thing I don't understand is the trailing slash thing. I haven't read the debates and I never will because I design my website for end users - and they are definitely visiting PAGES, not DIRECTORIES. Again, it's that simple - this/makes/sense while this/does/not/.
Borek
"It just doesn't matter"
At least, from a user experience perspective it doesn't. Scott makes some great points about SEO and it strikes me as a little Orwellian that we all jump whatever direction Google tells us to (and that certainly includes me), but most human factors testing tells us that users almost never type URLs themselves any more or even look up at what they even are in the address box. They use some form of search instead. Case in point, this blog post. Never in a million years would I type this, but Google Reader told me it was here and I clicked on a 1-line description.
In the rare case where you want someone to remember a URL off of an advertisement or something similar, you typically give them a friendly URL that redirects to the "real" thing anyway. For example, at HP, we do things like hp.com/go/personal. That's about as friendly to type as you can get, but it takes you to a pretty ugly URL where the actual content resides.
To me, then, URLs matter for SEO and in certain advertising instances, but not for everyday browsing.
For content, ISAPI_Rewrite makes the trailing slash very easy to require. This allows me to have consistent URLs regardless of whether users enter the trailing slash or not.
#Fix missing slash char on folders
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http\://$1$2/ [I,RP]
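For readers who don't speak ISAPI_Rewrite, here is a rough Python equivalent of the rule above. It assumes the rule's pattern has to match the whole request path, and the function name slash_redirect is hypothetical; treat it as a sketch of the logic, not a drop-in replacement.

```python
import re
from typing import Optional

# Same pattern as the rule above: no "." or "?" anywhere, and no trailing "/".
NEEDS_SLASH = re.compile(r"[^.?]+[^.?/]")

def slash_redirect(host: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the path is fine as-is."""
    if NEEDS_SLASH.fullmatch(path):
        return f"http://{host}{path}/"
    return None

print(slash_redirect("scottwater.com", "/blog/search/hanselman"))
print(slash_redirect("scottwater.com", "/blog/search/hanselman/"))
print(slash_redirect("scottwater.com", "/blog/default.aspx"))
```

Excluding dots and question marks is what keeps real files such as default.aspx and anything with a query string from picking up a slash.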
MT generates what it calls an entry "basename" from the first title you give a post, and it generates the permalink from this basename. (The basename does not change even if the title does, which is one way to solve the issue you mention with retitling posts, although it has drawbacks of its own.) The permalink also includes, by default, year and month posted for the same reason you mention.
So for example a post titled "Teaching Kids to Program, Redux" posted in July 2006 gets an entry basename of teaching_kids_to_program_redux and a permalink of archives/2006/07/teaching_kids_to_program_redux.html. This permalink will never change, even if I retitle the post as "Penguins attack small children, film at 11".
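The behavior described here, where the basename is fixed once and the title can change freely afterwards, can be sketched in Python as follows. This is not Movable Type's real algorithm; make_basename, the Post class and the exact cleanup rules are assumptions chosen to reproduce the example above.

```python
import re

def make_basename(title: str) -> str:
    """Lowercase the title and turn each run of non-alphanumerics into "_"."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")

class Post:
    def __init__(self, title: str, year: int, month: int):
        self.title = title
        # Computed once from the first title and never updated afterwards.
        self.basename = make_basename(title)
        self.year = year
        self.month = month

    @property
    def permalink(self) -> str:
        return f"archives/{self.year}/{self.month:02d}/{self.basename}.html"

post = Post("Teaching Kids to Program, Redux", 2006, 7)
post.title = "Penguins attack small children, film at 11"  # retitle the post
print(post.permalink)
```

Because the permalink is built from the stored basename rather than the current title, retitling never breaks inbound links. The drawback is that the URL can drift out of sync with what the post is now called.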
The Wordpress/Typo method is pretty sweet, they just make sense.
I don't care what a url looks like as long as I don't have to type them in. Once a url is too long, I don't even notice it. Thank god for bookmarks, hyperlinks and rss readers.
Subtext now defaults to hyphens and all lower case.
I thank Scott for the URL format provided by .Text, and Phil for not taking it out of subText.
Seriously, I wouldn't even want to consider coming up with URLs for my posts, and as I tend to use obscene words in my titles I prefer to not have them used as URLs.
If I WERE forced into a naming scheme, I would have to say I prefer Pascal-With-Dashes-As-Separators.aspx
Yes, I also want to keep the .aspx. Not only for the .NET recognition but also because I think urls that lead to files or pages without extensions suck!
I also think Scott nailed it with the /year/month/day thing. Especially when I can drop the page portion of the URL and go to a list of all posts that day/month/year. And, as it is still a page, it should (and does, thanks Phil) still have the .aspx.
BTW: Love the new skin, Scott!
Regarding the urls, and speaking of Wordpress (which happens to be one of the most popular blog tools currently, like it or not)... Wordpress automatically converts blog posts to the dash notation. So a recent blog post I made titled "Our Money Or My Money" was published like this:
http://www.moneytreeplanning.com/2007/our-money-or-my-money/
So I think we'll be seeing a lot more urls in this format than any other. Just my opinion.
1) Search: happybirthdaytomiiwiireview - the search engine will trigger on the url as a whole word.
2) Search: happybirthdaytomiiwii - you get no search results.
3) Search: happybirthdaytomiiwii review - you get no search results.
4) Search: Happy Birthday to Mii - you get a result, but the word to is filtered out (see highlighting)
5) Search: happy birth to mii - you get the result but only birth is highlighted not day (even if the camel case has no capital d)
1 shows that the words you put in your url do matter.
2 and 3 show that the pascal case is not working
4 and 5 show that the search engine finds the article based on the title and content and that the highlighting of words in both the title and url is an independent process done in post processing.
The point of creating nice urls is not just about getting a high rank in a search engine, but also about getting the user (searcher) to actually click the link once found. If I see that the url contains the topic I searched for I am more likely to click the link (especially since it is highlighted).
SEO in its most basic form is just taking the information and presenting it in a way that makes it easy to find. So I would prefer the happy-birthday-to-mii-wii-review pattern since Google at least then indexes each individual word and not the whole happybirthdaytomiiwiireview. Storing this url independent of the title is also a sane strategy in most situations. This allows me to have a long dynamic title "only 5 days until the big opening show" (counting down) and a nice short url /big-opening-show .
AreBlogURLsImportant.aspx#commentstart would be better.