Unicode I/O

Internally, Unicode strings are represented as sequences of 16-bit integer character values.As in 8-bit strings, all characters are the same size, and most common string operations are simply extended to handle strings with a larger range of character values. However, whenever Unicode strings are converted to a stream of bytes, a number of issues arise. First, to preserve compatibility with existing software, it may be desirable to convert Unicode to an 8-bit representation compatible with software that expects to receive ASCII or other 8-bit data. Second, the use of 16-bit characters introduces problems related to byte ordering. For a Unicode character U+HHLL, little endian encoding places the low-order byte first, as in LL HH

Get Python Essential Reference, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.