September 2002
Intermediate to advanced
896 pages
21h 3m
English
The Unicode 2.0 standard, in its discussion of the byte order mark, described how the mark could be used to tell not just whether a Unicode file had the proper byte order (endianness), but whether it was a Unicode file at all. The idea is that the sequence 0xFE 0xFF (in Latin-1, the lowercase Icelandic letter “thorn” followed by a lowercase y with a diaeresis, “þÿ”) would almost never be the first two characters of a normal ASCII/Latin-1 document. Therefore, you could look at something you knew was a text file and tell what it was: if the first two bytes were 0xFE 0xFF, it was Unicode; if the bytes were 0xFF 0xFE, it was byte-swapped Unicode; and if the bytes were anything else, it was whatever the default encoding for the system ...
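The two-byte check described above can be sketched in a few lines of Python. This is a minimal illustration, not the standard's own procedure: the helper names `sniff_encoding` and `decode_text` are hypothetical, and Latin-1 is assumed as a stand-in for "the default encoding for the system". It uses the `codecs.BOM_UTF16_BE` (`b'\xfe\xff'`) and `codecs.BOM_UTF16_LE` (`b'\xff\xfe'`) constants from Python's standard library.

```python
import codecs


def sniff_encoding(data: bytes, default: str = "latin-1") -> str:
    """Guess a text file's encoding from its first two bytes.

    `default` is an assumed placeholder for the system's default encoding.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # 0xFE 0xFF
        return "utf-16-be"  # Unicode in the expected (big-endian) order
    if data.startswith(codecs.BOM_UTF16_LE):  # 0xFF 0xFE
        return "utf-16-le"  # byte-swapped Unicode
    return default  # anything else: fall back to the default encoding


def decode_text(data: bytes, default: str = "latin-1") -> str:
    """Decode `data` using the sniffed encoding, dropping any BOM."""
    enc = sniff_encoding(data, default)
    if enc in ("utf-16-be", "utf-16-le"):
        data = data[2:]  # the BOM is a signature, not part of the text
    return data.decode(enc)


# Read as Latin-1, the big-endian BOM is the unlikely pair "þÿ".
print(b"\xfe\xff".decode("latin-1"))
print(decode_text(b"\xfe\xff\x00A"))  # big-endian Unicode "A"
print(decode_text(b"\xff\xfeA\x00"))  # byte-swapped Unicode "A"
```

The heuristic only works because "þÿ" is so improbable at the start of a Latin-1 document; a plain Latin-1 file that happened to begin with those two characters would be misidentified as Unicode.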