Python: Essential Reference, Third Edition

Unicode I/O

Internally, Unicode strings are represented as sequences of 16-bit (UCS-2) or 32-bit (UCS-4) integer character values, depending on how Python is built. As with 8-bit strings, all characters are the same size, and most common string operations are simply extended to handle a larger range of character values. However, a number of issues arise whenever Unicode strings are converted to a stream of bytes. First, to preserve compatibility with existing software, it may be desirable to convert Unicode to an 8-bit representation that software expecting ASCII or other 8-bit data can accept. Second, the use of 16-bit or 32-bit characters introduces problems related to byte ordering. For the Unicode character U+HHLL ...
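The two issues named above (8-bit compatibility and byte ordering) can be seen in a short sketch. It is illustrative only and assumes a Python 3 interpreter, where `str` is the Unicode string type and `encode()` returns `bytes`; the passage itself distinguishes 8-bit strings from Unicode strings, so the names differ on older interpreters. The sample text and variable names are hypothetical.

```python
# Minimal sketch, assuming Python 3 (str is Unicode, encode() returns bytes).
text = "Jalape\u00f1o \u20ac"   # contains U+00F1 and U+20AC (hypothetical sample)

# 8-bit-friendly encoding: ASCII characters remain single bytes in UTF-8,
# while non-ASCII characters become multi-byte sequences.
utf8_bytes = text.encode("utf-8")

# A strict ASCII conversion cannot represent these characters; the
# "replace" error policy substitutes "?" instead of raising an error.
ascii_bytes = text.encode("ascii", "replace")

# Byte ordering: one 16-bit character value, two possible byte layouts.
euro = "\u20ac"
little = euro.encode("utf-16-le")   # low-order byte first:  AC 20
big = euro.encode("utf-16-be")      # high-order byte first: 20 AC

print(utf8_bytes)
print(ascii_bytes)
print(little.hex(), big.hex())
```

Decoding with the wrong byte order silently yields a different character rather than an error, which is why a reader of 16-bit data needs to know, or be told, which order the writer used.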


Python: Essential Reference, Third Edition by David Beazley
