February 2004
Intermediate to advanced
544 pages
9h 55m
English
You may recall a time when the string was seemingly a very simple data type. Computing the length of a string or converting it to lowercase or uppercase was a trivial exercise. (However, your trivial solution almost certainly worked for only one particular language or locale.)
Well, no more. Unicode is considerably more complex than the strings of yore. When a character can occupy one byte or several, even an operation as simple as computing a string's length is no longer so simple. There are special cases such as the famous “Turkish I”: in Turkish, the ordinary capital letter I (U+0049) lowercases to a special dotless ı (U+0131) instead of the usual dotted i (U+0069). Changing the case of a string ...
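The sketch below illustrates both points in Python. It measures one Turkish word three ways (code points, UTF-8 bytes, UTF-16 code units), then lowercases it. Python's built-in `str.lower()` and `str.upper()` apply Unicode's locale-independent case mappings, so the Turkish rules are handled by two hypothetical helpers, `turkish_lower` and `turkish_upper`. These helpers are simplified assumptions for illustration, not a standard API. They ignore, for example, Unicode's context-sensitive rule for I followed by a combining dot above.

```python
# Simplified sketch: Python's str.lower()/str.upper() use locale-independent
# Unicode case mappings, so the Turkish dotted/dotless I rules are applied by
# hypothetical helper functions (not part of any standard library API).

def turkish_lower(s: str) -> str:
    """Lowercase s using Turkish rules for I (simplified sketch)."""
    # Turkish: I (U+0049) -> dotless ı (U+0131); dotted İ (U+0130) -> i (U+0069).
    s = s.replace("I", "\u0131").replace("\u0130", "i")
    return s.lower()


def turkish_upper(s: str) -> str:
    """Uppercase s using Turkish rules for i (simplified sketch)."""
    # Turkish: i (U+0069) -> dotted İ (U+0130).
    # Dotless ı (U+0131) already uppercases to I (U+0049) by default.
    s = s.replace("i", "\u0130")
    return s.upper()


word = "I\u011fd\u0131r"  # "Iğdır", a Turkish place name

# "Length" depends on what you count.
code_points = len(word)                            # 5 code points
utf8_bytes = len(word.encode("utf-8"))             # 7 bytes: ğ and ı take 2 each
utf16_units = len(word.encode("utf-16-le")) // 2   # 5 UTF-16 code units

print(code_points, utf8_bytes, utf16_units)  # 5 7 5
print(word.lower())              # iğdır  (default mapping: I -> i)
print(turkish_lower(word))       # ığdır  (Turkish mapping: I -> ı)
print(turkish_upper("i\u015f"))  # İŞ
```

The default mapping is not just "wrong for Turkish" but can also change a string's length: `"\u0130".lower()` yields `"i\u0307"` (i plus a combining dot above), two code points where the input had one.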