UnicodeData.txt
The “nerve center” of the Unicode Standard is the UnicodeData.txt file, which contains most of the Unicode Character Database. As the database has grown, and as supplementary information has been added to the database, various pieces of it have been split out into separate files. Nevertheless, the most important parts of the standard continue to reside in UnicodeData.txt.
The designers of Unicode wanted the database to be as simple and universal as possible, so it's maintained as a plain ASCII text file (we'll gloss over the irony of having the Unicode Character Database stored in an ASCII text file). For ease of parsing, the file is semicolon-delimited. Each record in the database (i.e., the information pertaining ...
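As a rough illustration of the semicolon-delimited format described above, the following Python sketch splits one record into its fields. The function name `parse_record` and the dictionary keys are hypothetical, chosen only for this example. The sketch also assumes that the first three fields are the code point in hexadecimal, the character name, and the general category, which this passage does not itself state.

```python
def parse_record(line):
    """Split one semicolon-delimited record into its fields.

    Assumption: field 0 is the code point in hexadecimal, field 1 the
    character name, and field 2 the general category.
    """
    fields = line.rstrip("\n").split(";")
    return {
        "code_point": int(fields[0], 16),  # hex string -> integer
        "name": fields[1],
        "general_category": fields[2],
        "fields": fields,                  # every field, unparsed
    }


# Sample record for U+0041; empty fields appear as empty strings.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_record(sample)
print(hex(record["code_point"]), record["name"], record["general_category"])
```

Because the file is plain ASCII with a single delimiter, an ordinary string split is enough here; no CSV library or Unicode-aware decoding is needed.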