Internationalized Regular Expressions
UPDATE: There's more on Internationalized RegExs in this StackOverflow question.
I was trying to make a regular expression for use in client-side JavaScript (using a PeterBlum Validator) that allowed a series of special characters:
-'.,&#@:?!()$\/
Plus letters and numbers and whitespace:
\w\d\s
However, I mistakenly assumed that \w meant truly "word characters." It doesn't; in JavaScript it means [A-Za-z0-9_].
That sucks. What about José, when he wants to put his First Name into a form?
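A quick check in any JavaScript console makes the problem concrete. This is only a minimal sketch, and the helper name isWordOnly is made up for illustration:

```javascript
// Hypothetical helper: true when the input consists only of \w characters.
function isWordOnly(text) {
  return /^\w+$/.test(text);
}

console.log(isWordOnly("Jose"));   // true
console.log(isWordOnly("Jos_9"));  // true: digits and underscore count as \w
console.log(isWordOnly("José"));   // false: é is outside [A-Za-z0-9_]
```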
Well, I could do a RegEx that denies specific characters and allows all others, but I really just wanted to support Spanish, French, English, German, and any language that uses the general Latin Character Set.
So, here's what I have.
^[
ÀÈÌÒÙ àèìòù ÁÉÍÓÚ Ý áéíóúý
ÂÊÎÔÛ âêîôû ÃÑÕ ãñõ ÄËÏÖÜŸ
äëïöüÿ ¡¿çÇŒœ ßØøÅå ÆæÞþ
Ðð ""\w\d\s-'.,&#@:?!()$\/
]+$
Did I miss anything? (Ignore the whitespace for the purposes of this post's RegEx.)
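Since the whitespace in the pattern above is only there for layout, one way to keep it readable in real code is to assemble the expression from rows and strip the spaces. This is a sketch, not a definitive implementation: the names accentedRows, asciiPart, and latinText are invented here, the second Ÿ is assumed to be a typo for ÿ, and the hyphen has been moved to the front of the class to keep it unambiguous.

```javascript
// Rows of accented characters as laid out in the post; the spaces are layout only.
// Assumption: the second Ÿ in the post was meant to be ÿ.
const accentedRows = [
  "ÀÈÌÒÙ àèìòù ÁÉÍÓÚ Ý áéíóúý",
  "ÂÊÎÔÛ âêîôû ÃÑÕ ãñõ ÄËÏÖÜŸ",
  "äëïöüÿ ¡¿çÇŒœ ßØøÅå ÆæÞþ",
  "Ðð",
];

// Strip the layout whitespace; \s in the class below still allows real spaces.
const accented = accentedRows.join("").replace(/\s+/g, "");

// String.raw keeps the backslashes intact. The hyphen is placed first in the
// class so it can only be read as a literal, never as a range operator.
const asciiPart = String.raw`-"\w\d\s'.,&#@:?!()$\/`;

const latinText = new RegExp("^[" + asciiPart + accented + "]+$");

console.log(latinText.test("José Muñoz"));   // true
console.log(latinText.test("Zoë & Søren"));  // true
console.log(latinText.test("Дмитрий"));      // false: Cyrillic is not in the class
```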
It's lame that \w on the client side is fixed to ASCII and doesn't adapt to your browser's locale. This makes it difficult for your RegExes to have parity between the client and the server.
About Scott
Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.
I'll update the post.
From MSDN:
"Character classes are specified differently in matching expressions. Canonical regular expressions support Unicode character categories by default. ECMAScript does not support Unicode.
\w: Matches any word character. Equivalent to the Unicode character categories
[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]."
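For comparison, JavaScript engines that implement ECMAScript 2018 or later accept the same Unicode categories when the expression carries the u flag. A sketch, assuming such an engine (older browsers throw a SyntaxError on this pattern); the name unicodeWord is made up for the example:

```javascript
// The category list MSDN gives for \w, spelled out with Unicode property escapes.
// Requires the `u` flag and an ES2018+ engine.
const unicodeWord = /^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]+$/u;

console.log(unicodeWord.test("Jos\u00E9"));  // true: é (U+00E9) is a lowercase letter
console.log(unicodeWord.test("Łukasz"));     // true
console.log(unicodeWord.test("Jos\u00E9!")); // false: ! is not in any listed category
```

Note that the list has no category for combining marks, so a name typed with a separate combining accent would need to be normalized first (for example with String.prototype.normalize("NFC")).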
And what about &? Do regular expressions know about HTML entities? I guess the correct expression would be
^[-ÀÈÌÒÙ àèìòù ÁÉÍÓÚ Ý áéíóúý
ÂÊÎÔÛ âêîôû ÃÑÕ ãñõ ÄËÏÖÜŸ
äëïöüÿ ¡¿çÇŒœ ßØøÅå ÆæÞþ
Ðð ""\w\d\s'.,&;#@:?!()$\/
]+$
wouldn't it?
Regards,
Ralf
You should just do that...
BTW, which characters do you want to exclude?
I guess that's what makes those regular expressions hard to read :-)
Can anyone refer me to a working example of Scott's regex? When I try using it in a script, the script does not load in Firefox or IE. Here is my code:
// Is a proper name?
function proper() {
    // Only text inputs and textareas are validated.
    if (field.type == "text" || field.type == "textarea") {
        var regx = /^[ÀÈÌÒÙàèìòùÁÉÍÓÚÝáéíóúýÂÊÎÔÛâêîôûÃÑÕãñõÄËÏÖÜäëïöü¡¿çÇßØøÅåÆæÞþÐð""\w\d\s-'.,&#@:?!()$\/]+$/;
        // An empty value passes; anything else must match the whole pattern.
        if (field.value.length > 0 && !regx.test(field.value)) {
            alert('Not a proper name');
            return false;
        }
    }
    return true;
}
Note: The above function is part of a class and the 'field' variable is set when the class is instantiated.
Thank you,
Daniel
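One thing worth ruling out, offered as a guess rather than a diagnosis: if the script file is saved or served in a different encoding than the browser expects, the literal accented characters in the pattern can be misread. Below is a sketch of an ASCII-only variant that writes the same letters as \u escapes, so the file encoding no longer matters. The names properName and isProperName are made up for this example, the value is passed in as a parameter instead of read from 'field', and the ranges cover the Latin-1 letters (skipping × and ÷) plus ¡, ¿, Œ, œ, and Ÿ.

```javascript
// ASCII-only pattern: the hyphen comes first so it is read as a literal.
// \u00C0-\u00D6, \u00D8-\u00F6, \u00F8-\u00FF are the Latin-1 letters (À..ÿ)
// without the multiplication sign (\u00D7) and division sign (\u00F7).
var properName = /^[-"\w\s'.,&#@:?!()$\/\u00A1\u00BF\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0152\u0153\u0178]+$/;

// Hypothetical stand-in for the validator method: an empty value passes,
// anything else must match the whole pattern.
function isProperName(value) {
  return value.length === 0 || properName.test(value);
}

console.log(isProperName("Jos\u00E9 O'Brien")); // true
console.log(isProperName(""));                  // true
console.log(isProperName("<script>"));          // false: < and > are not allowed
```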