Unihan.txt
Finally, there's the Unihan.txt file. One of the most important scripts in Unicode is the set of Chinese characters (also called Han characters or CJK ideographs). Unicode 3.1 includes more than 70,000 Han characters, and these characters have additional properties beyond those assigned to other Unicode characters.
Chief among these properties are mappings to various source standards. Unicode defines the meanings of the Han characters by specifying exactly where they came from. This approach also lets you see which characters from which source standards were unified in Unicode. All of these mappings, plus a lot of other useful data, are found in Unihan.txt.
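To make the idea of source mappings concrete, here is a minimal Python sketch that reads Unihan-style lines and pulls out the source-standard fields for one character. It assumes the tab-separated layout used by the Unihan data (code point, field name, value, with `#` starting a comment line). The sample lines are illustrative and follow the formatting of recent releases, so the field names and values in the Unicode 3.1 file may look different; the helper names `parse_unihan` and `source_mappings` are hypothetical.

```python
from collections import defaultdict

# Illustrative sample in the style of Unihan data, not copied from a
# specific release. Each data line is: code point, field name, value,
# separated by tabs. Lines beginning with '#' are comments.
SAMPLE = (
    "# sample Unihan-style data\n"
    "U+4E00\tkIRG_GSource\tG0-523B\n"
    "U+4E00\tkIRG_JSource\tJ0-306C\n"
    "U+4E00\tkIRG_TSource\tT1-4421\n"
    "U+4E00\tkTotalStrokes\t1\n"
    "U+4E8C\tkIRG_GSource\tG0-367E\n"
)


def parse_unihan(lines):
    """Build {character: {field: value}} from Unihan-style lines."""
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split into at most three parts so values may contain tabs.
        code, field, value = line.split("\t", 2)
        char = chr(int(code[2:], 16))  # "U+4E00" -> the character itself
        table[char][field] = value
    return table


def source_mappings(table, char):
    """Return only the source-standard fields (assumed kIRG_*Source)."""
    fields = table.get(char, {})
    return {
        name: value
        for name, value in fields.items()
        if name.startswith("kIRG_") and name.endswith("Source")
    }


table = parse_unihan(SAMPLE.splitlines())
for name, value in sorted(source_mappings(table, "\u4e00").items()):
    print(name, value)
```

Keying the table by character rather than by field keeps all of a character's properties together, which matches how the file is usually consulted: look up one ideograph, then see every standard it was drawn from.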
For each Han character, the Unihan.txt file gives at ...