Converting from a String Representation of a Unicode Character back into a char
Hopefully Michael Kaplan will step in here and explain some edge case or just a general comment like "that's totally wrong, Scott" - but until he does:
A fellow emailed me this question:
I want to convert a string representation of a Unicode character back into a 'char' in .NET C#. Can you help?
i.e. "U+0041", which is hexadecimal for 65, which is ASCII for "A".
There's got to be a built in function(s) for this, and I just can't seem to find them?
To give you an idea, the pseudocode would be something like:
string s = "U+0041";
char c = new ?Unicode.Decoder.Decode?(s);
textBox1.Text = c.ToString();
Now, I have no idea why this gentleman would want to do this, but it's interesting enough. Here's what I came up with. I'm sure there's a better way.
// Just a reminder that you can use \u to escape Unicode in C#
char c = '\u0063';
Console.WriteLine(c.ToString());

// Here's how you'd go from a string to stuff like
// U+0053 U+0063 U+006f
string scott = "Scott and the letter c";
foreach (char s in scott)
{
    Console.Write("U+{0:x4} ", (int)s);
}

// Here's how I converted a string (assuming it starts with U+)
// containing the representation of a char
// back to a char
// Is there a built in, or cleaner way? Would this work in Chinese?
string maybeC = "U+0063";
int p = int.Parse(maybeC.Substring(2), System.Globalization.NumberStyles.HexNumber);
Console.WriteLine((char)p);
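On the "built in, or cleaner way" question, here is a minimal sketch, assuming .NET 2.0 or later, where char.ConvertFromUtf32 is available. It returns a string rather than a char, because code points above U+FFFF (which include some less common Chinese characters) need two UTF-16 chars. The names CodePointNotation and Parse are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class CodePointNotation
{
    // Hypothetical helper: turns "U+0063" into "c".
    public static string Parse(string notation)
    {
        if (notation == null || !notation.StartsWith("U+", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Expected a string that starts with U+.");

        int codePoint = int.Parse(notation.Substring(2),
                                  NumberStyles.HexNumber,
                                  CultureInfo.InvariantCulture);

        // Throws ArgumentOutOfRangeException for surrogate values
        // (U+D800 to U+DFFF) and for anything above U+10FFFF.
        return char.ConvertFromUtf32(codePoint);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("U+0063"));         // c
        Console.WriteLine(Parse("U+4E2D").Length);  // 1 (a single UTF-16 char)
        Console.WriteLine(Parse("U+20000").Length); // 2 (a surrogate pair)
    }
}
```

If a single char really is required, the result can be indexed with [0], but only when its Length is 1.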
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
So,
b = Text.Encoding.Unicode.GetBytes(s)
s = Text.Encoding.Unicode.GetString(b)
where s = string and b = array of bytes
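The round trip suggested in this comment, written out in C# as a minimal sketch. Note that it converts between a string and its UTF-16 bytes; it does not parse the "U+0041" notation from the original question. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Encoding.Unicode is UTF-16 little-endian.
    public static byte[] ToUtf16Bytes(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }

    public static string FromUtf16Bytes(byte[] b)
    {
        return Encoding.Unicode.GetString(b);
    }

    public static void Main()
    {
        byte[] b = ToUtf16Bytes("A");
        Console.WriteLine(BitConverter.ToString(b));   // 41-00
        Console.WriteLine(FromUtf16Bytes(b) == "A");   // True
    }
}
```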
// Completely untested
// String to Unicode code points
string scott = "Scott and the letter c";
int highbits = 0;
foreach (char ch in scott)
{
    int i = (int) ch;
    if (i < 0xD800 || i > 0xDFFF)
        Console.Write("U+{0:x4} ", i);
    else if (i < 0xDC00) // ... Surrogate high
        highbits = i - 0xD800;
    else // ... Surrogate low
        // Parentheses are needed: + binds tighter than << in C#
        Console.Write("U+{0:x6} ", (highbits << 10) + (i - 0xDC00) + 0x10000);
}
// Unicode code point to string
string codePoint = "U+12345";
int ordinal = int.Parse(codePoint.Substring(2), System.Globalization.NumberStyles.HexNumber);
if (ordinal < 0x10000)
    Console.WriteLine((char) ordinal);
else
{
    // Parentheses are needed: + binds tighter than >> and & in C#
    char high = (char) (((ordinal - 0x10000) >> 10) + 0xD800);
    char low = (char) (((ordinal - 0x10000) & 0x3FF) + 0xDC00);
    Console.WriteLine(new string(new char[] { high, low }));
}
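For comparison with the hand-rolled surrogate arithmetic in this comment, here is a minimal sketch of the string-to-code-points direction using framework helpers, assuming .NET 2.0 or later for char.ConvertToUtf32 and char.IsHighSurrogate. The names CodePointLister and Describe are made up for illustration.

```csharp
using System;
using System.Text;

public static class CodePointLister
{
    // Hypothetical helper: "Sc" becomes "U+0053 U+0063",
    // with one entry per code point rather than per char.
    public static string Describe(string text)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            // Reads one code point starting at index i.
            // Throws ArgumentException on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(text, i);

            // A surrogate pair occupies two chars, so skip the low half.
            if (char.IsHighSurrogate(text[i]))
                i++;

            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append("U+").Append(codePoint.ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe("Sc"));                          // U+0053 U+0063
        Console.WriteLine(Describe(char.ConvertFromUtf32(0x12345))); // U+12345
    }
}
```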
Encoding.Default.GetBytes(stUnicodeString)
to get back a byte array containing the non-Unicode character(s).