Unicode and Programming Languages

Most recent programming languages either use Unicode as their internal character encoding or provide a way to use it. Many even allow Unicode characters in the syntax of the language itself, not just in comments and string literals.
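
As a small illustration, Python 3 is one language where both statements hold: its strings are sequences of Unicode code points, and non-ASCII letters are accepted in identifiers. The variable names and sample word below are made up for this sketch and are not taken from the text.

```python
# Illustrative only: the names and values here are hypothetical examples.

# Non-ASCII letters used directly in identifiers.
größe = 3
π_approx = 3.14159

# A string literal holding Cyrillic text.
word = "Юникод"

# Length is counted in code points, not bytes.
code_points = len(word)

# The same text takes two bytes per letter when encoded as UTF-8.
utf8_bytes = len(word.encode("utf-8"))
```

The difference between `code_points` and `utf8_bytes` shows that the string's internal view is independent of any particular byte encoding.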

The Unicode Identifier Guidelines

The Unicode standard gives guidelines for how programming languages (and other protocols such as XML) that want to use the full Unicode range in their identifiers can do so.[6] The basic idea is to extend the common definition of an identifier (a letter followed by zero or more letters or digits) to the full Unicode repertoire, and to do so in a way that allows for combining character sequences and invisible ...
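
The basic idea of "a letter followed by zero or more letters or digits", widened to Unicode, can be sketched with general categories from Python's `unicodedata` module. This is a simplified approximation under stated assumptions, not the standard's exact definition: the two category sets and the function name `is_identifier_sketch` are choices made for this example, and the sketch handles only letters, digits, and combining marks.

```python
import unicodedata

# Assumption: "letter" is approximated by the letter categories plus letter numbers.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Assumption: later characters may also be decimal digits or combining marks,
# so that a base letter followed by a combining accent is still accepted.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Nd", "Mn", "Mc"}


def is_identifier_sketch(text: str) -> bool:
    """Return True if text looks like 'letter (letter | digit | mark)*'."""
    if not text:
        return False
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(
        unicodedata.category(ch) in CONTINUE_CATEGORIES for ch in text[1:]
    )
```

Under these assumptions, `is_identifier_sketch("nai\u0308ve")` accepts an identifier spelled with a combining diaeresis, while `is_identifier_sketch("2x")` rejects a name that starts with a digit. Python's own `str.isidentifier()` applies the language's actual rules, which differ from this sketch (for example, it accepts the underscore).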
