The UNIDATA Directory

The UNIDATA folder on the Unicode FTP/Web site is the official repository of the Unicode Character Database. Here's a quick rundown of what's in this directory:

  • ArabicShaping.txt groups Arabic and Syriac letters into categories depending on how they connect to their neighbors. The data in this file can be used to put together a minimally correct implementation of Arabic and Syriac character shaping.

  • BidiMirroring.txt is useful for implementing a rudimentary version of mirroring. For the characters whose glyphs in right-to-left text are supposed to be the mirror image of their glyphs in left-to-right text (the characters with the “mirrored” property), this file identifies those that ...
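As a rough illustration of how the data in these two files might be loaded, here is a minimal Python sketch. It assumes the semicolon-delimited layout used by current versions of the UCD, with `#` starting a comment. In ArabicShaping.txt it takes the code point in hex from the first field and the joining type (such as R for right-joining or D for dual-joining) from the third. In BidiMirroring.txt it takes a code point followed by the code point of its mirror-image character. The function names and the sample lines are hypothetical, modeled on entries in those files rather than copied from a specific release, so check the field layout against the actual files you download.

```python
from typing import Dict, Iterable, List

# Hypothetical sample lines in the ArabicShaping.txt layout:
# code point; schematic name; joining type; joining group
ARABIC_SAMPLE = """\
# illustrative excerpt, not a verbatim copy of the file
0627; ALEF; R; ALEF
0628; BEH; D; BEH
"""

# Hypothetical sample lines in the BidiMirroring.txt layout:
# code point; mirrored code point # character name
MIRROR_SAMPLE = """\
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""


def parse_ucd_lines(lines: Iterable[str]) -> List[List[str]]:
    """Split UCD-style lines into stripped fields, skipping comments and blanks."""
    records = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        records.append([field.strip() for field in data.split(";")])
    return records


def joining_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map each code point to its joining type (third field)."""
    return {int(rec[0], 16): rec[2] for rec in parse_ucd_lines(lines)}


def mirror_map(lines: Iterable[str]) -> Dict[int, int]:
    """Map each code point to the code point of its mirror-image character."""
    return {int(rec[0], 16): int(rec[1], 16) for rec in parse_ucd_lines(lines)}


def mirror_rtl_run(text: str, mirrors: Dict[int, int]) -> str:
    """Rudimentary mirroring: assumes the whole string is a right-to-left run
    and swaps every character that has an entry in the mirror map."""
    return "".join(chr(mirrors.get(ord(ch), ord(ch))) for ch in text)


if __name__ == "__main__":
    types = joining_types(ARABIC_SAMPLE.splitlines())
    print(types[0x0628])  # prints D
    mirrors = mirror_map(MIRROR_SAMPLE.splitlines())
    print(mirror_rtl_run("(a<b)", mirrors))  # prints )a>b(
```

Treating the whole string as one right-to-left run keeps the sketch short. A fuller implementation would apply the substitution only to characters that the bidirectional algorithm resolves to right-to-left embedding levels.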
