Internationalization
Don’t assume anything about the encoding or the environment. You and everyone you know might use the same setup, but once you distribute your work you’re likely to find a world of differences.
Use Unicode inside your program. Do any translation to and from other character sets at your interfaces to the outside world. See Chapter 6.
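Here is a minimal sketch in Python of the "Unicode inside, encodings only at the edges" pattern. The helper names `decode_input`, `encode_output`, and `shout` are hypothetical. The choice of Latin-1 for input and UTF-8 for output is also an assumption, picked only to show that the two boundaries can use different encodings.

```python
def decode_input(data: bytes, encoding: str) -> str:
    # Input boundary: external bytes become a Unicode string exactly once.
    return data.decode(encoding)


def encode_output(text: str, encoding: str) -> bytes:
    # Output boundary: the Unicode string becomes bytes only when it leaves.
    return text.encode(encoding)


def shout(text: str) -> str:
    # Core logic works on Unicode text only, so str.upper() can apply
    # Unicode case rules (for example, "ß" uppercases to "SS").
    return text.upper()


raw = b"Gr\xfc\xdfe"                       # "Grüße" as Latin-1 bytes from outside
text = decode_input(raw, "latin-1")        # now a Unicode string
out = encode_output(shout(text), "utf-8")  # back to bytes, in a different encoding
```

Nothing between `decode_input` and `encode_output` needs to know which encoding the outside world used.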
Outside the world of Unicode, you should assume little about character sets and nothing about the ord values of characters. Do not assume that the alphabetic characters have sequential ord values. The lowercase letters may come before or after the uppercase letters; the lowercase and uppercase may be interlaced so that both a and A come before b; the accented and other international characters may be interlaced so that ä comes before b.
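The following Python sketch makes these warnings concrete by comparing two character sets. It assumes Python's built-in `cp037` codec as a representative EBCDIC code page; other EBCDIC variants differ in details. The helper names are hypothetical.

```python
import string


def byte_value(ch: str, encoding: str) -> int:
    # The numeric ("ord") value of a character depends on the character set.
    return ch.encode(encoding)[0]


def letters_contiguous(encoding: str) -> bool:
    # True only if 'a'..'z' occupy 26 consecutive byte values.
    values = [byte_value(c, encoding) for c in string.ascii_lowercase]
    return values == list(range(values[0], values[0] + 26))


ascii_ok = letters_contiguous("ascii")    # ASCII: letters are contiguous
ebcdic_ok = letters_contiguous("cp037")   # EBCDIC: gaps after 'i' and 'r'

# Lowercase comes after uppercase in ASCII but before it in cp037.
lower_first_ascii = byte_value("a", "ascii") < byte_value("A", "ascii")
lower_first_ebcdic = byte_value("a", "cp037") < byte_value("A", "cp037")


def is_lower_letter(ch: str) -> bool:
    # Portable test: ask about the character's properties instead of
    # checking a numeric range such as "a" <= ch <= "z".
    return ch.isalpha() and ch.islower()
```

In cp037, `i` is 0x89 but `j` is 0x91, so a range check like `"a" <= ch <= "z"` on raw byte values would also accept the non-letter codes in between. Classifying by property avoids that trap.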
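Even within Unicode, most of those warnings hold. Within a single script, the codepoint order of many letters has nothing to do with their alphabetical order. For example, ä (U+00E4) has a higher codepoint than z (U+007A), so sorting by codepoint places it after every unaccented lowercase Latin letter.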
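The Python sketch below contrasts a naive codepoint sort with a crude accent-folding sort key. The folding key is an assumption made for illustration. It only approximates one convention (treating ä like a, as German dictionaries do), while Swedish, for instance, sorts ä after z. It is not a real collation algorithm such as the Unicode Collation Algorithm.

```python
import unicodedata

words = ["zebra", "\u00e4rger", "apfel"]   # "zebra", "ärger", "apfel"

# Naive sort: compares code points, so "ärger" lands after "zebra".
by_codepoint = sorted(words)


def fold_key(word: str) -> str:
    # Hypothetical helper: decompose to NFD, drop combining marks so
    # "ä" compares like "a", then casefold. A rough approximation only.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


by_folded = sorted(words, key=fold_key)
```

Here `by_codepoint` is `["apfel", "zebra", "ärger"]` and `by_folded` is `["apfel", "ärger", "zebra"]`. Which of these counts as "alphabetical" depends on the reader's language, which is where locales come in.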
If your program is to operate on a POSIX system (a rather large assumption), consult the perllocale manpage for more information about POSIX locales. Locales affect character sets and encodings, and date and time formatting, among other things. Proper use of locales will make your program a little bit more portable, or at least more convenient and native-friendly for non-English users. But be aware that locales and Unicode don’t mix well yet.
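Python's standard `locale` module wraps the same POSIX `setlocale` and `strxfrm` facilities that perllocale describes. The minimal sketch below uses them for locale-aware sorting. The helper name `collate` and the fallback to the `"C"` locale are assumptions. Also note that `setlocale` changes process-wide state, and results under any locale other than `"C"` depend on which locales are installed on the system.

```python
import locale


def collate(words, locale_name=""):
    # Sort using the collation rules of a POSIX locale.
    # locale_name="" asks for whatever the user's environment specifies;
    # if that locale is unavailable, fall back to the always-present "C" locale.
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    # strxfrm turns each string into a key that compares per LC_COLLATE.
    return sorted(words, key=locale.strxfrm)


native_order = collate(["b", "a", "c"])   # user's locale, or "C" as fallback
c_order = collate(["a", "B"], "C")        # "C" locale: plain codepoint order
```

Under the `"C"` locale, uppercase sorts before lowercase (`["B", "a"]`). A user's native locale will often interleave them instead, which is the "native-friendly" behavior the paragraph describes.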