The Importance of Being UTF-8
Kevin Hammond at http://www.casadehambone.com/ wanted the title of his blog to be "Casa dé Hambone" - note the é. He's running DasBlog and saw "Casa d Hambone" - note the missing é.
I know this works fine in DasBlog because it's been internationalized since Day 1 - we've got 14 languages out of the box. He sent me his site.config file (that's where DasBlog stores its configuration) and I opened it in Notepad2.
Notice in the screenshot that this file is saved as ANSI/ASCII. This file was probably manually edited with a non-clever editor.
However, if you do a straight convert, of course, you'll lose data (and Notepad2 warns you of this fact). Notice what happens when I do a convert via File|Encoding:
This is one situation where the Windows Clipboard works great and can save you a hassle. I selected all, copied to the clipboard, changed the encoding, then pasted.
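If you'd rather do the same repair in code, here's a minimal Python sketch. It assumes the file was really saved in the Windows-1252 "ANSI" code page (I'm guessing at that; your system's code page may differ), and the function name ansi_to_utf8 is just something I made up for illustration. It also shows one way the é can vanish: the single ANSI byte for é isn't valid UTF-8, so a lenient UTF-8 reader can simply drop it. That's not necessarily exactly what happens inside .NET, but the symptom is the same.

```python
def ansi_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes using the encoding they were actually saved in,
    then re-encode the resulting text as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# The blog title as an "ANSI" editor would write it: e-acute is the single byte 0xE9.
ansi_bytes = "Casa d\u00e9 Hambone".encode("cp1252")

# Misreading those bytes as UTF-8 (skipping invalid bytes) loses the e-acute.
misread = ansi_bytes.decode("utf-8", errors="ignore")

# Decoding with the right source encoding first keeps it.
fixed = ansi_to_utf8(ansi_bytes)

print(repr(misread))
print(repr(fixed.decode("utf-8")))
```

The point is the order of operations: figure out what the bytes really are, turn them into text, and only then write them back out as UTF-8.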
Now we're cool. ASP.NET and .NET in general will almost always "do the right thing" if you're using UTF-8. You can certainly specify alternate encodings if you like when you're opening a file via code. We use the StreamReader internally and the docs say:
StreamReader defaults to UTF-8 encoding unless specified otherwise, instead of defaulting to the ANSI code page for the current system. UTF-8 handles Unicode characters correctly and provides consistent results on localized versions of the operating system.
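The same idea applies outside .NET. As a rough illustration in Python (not the StreamReader API, and the helper name write_then_read is hypothetical), you pass the encoding explicitly when opening a file instead of trusting whatever the platform default happens to be, since Python's open() can fall back to a locale-dependent encoding:

```python
import os
import tempfile


def write_then_read(text: str, encoding: str = "utf-8") -> str:
    """Write text to a temporary file and read it back,
    naming the encoding explicitly on both sides."""
    fd, path = tempfile.mkstemp(suffix=".config")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)


title = "Casa d\u00e9 Hambone"
round_tripped = write_then_read(title)
print(round_tripped == title)
```

As long as the writer and the reader agree on the encoding, the text round-trips. Trouble starts when one side assumes UTF-8 and the other assumes the system code page.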
Joel's got a good article I've pointed to before about Internationalization. I've also got some posts in my Internationalization/i18n category.
Changing to UTF-8 fixed Kevin's problem.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.