Chapter 37. Unicode and Byte Strings
In Chapter 7, the Python string story was watered down on purpose to help you get started with the fundamentals. Now that you’ve learned the basics, this chapter moves on to extend them to include the full Unicode-text and binary-data string tales in Python.
This extension was more optional in earlier editions of this book because Unicode was an afterthought in Python 2.X. Python 3.X elevates it to required reading because its normal strings simply are Unicode. Still, how much you need to care about this topic depends in large part upon which of the following categories you fall into:
If you deal with non-ASCII Unicode text (for instance, in the context of internet content, internationalized applications, XML parsers, and some GUIs), you will find direct and seamless support for text encodings in both Python’s all-Unicode str object and its Unicode-aware text files.
If you deal with binary data (for example, in the form of image or audio files, network transfers, or packed data shared with lower-level tools), you will need to understand Python’s bytes object and its sharp distinction between text and binary data and files.
If you fall into neither of the prior two categories, you may be able to defer this topic and use strings as you did in Chapter 7: with the general str object, text files, and all the familiar string operations. Your strings will be encoded and decoded using your platform’s default Unicode encoding, but you won’t notice until, ...
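As a quick illustration of the str and bytes distinction described above, here is a minimal sketch for Python 3.X. The sample string and the variable names are illustrative assumptions, not taken from the chapter, and the platform default it reports will vary from machine to machine.

```python
import locale

# A str is a sequence of Unicode code points; '\xe4' is the character ä.
text = 'sp\xe4m'

# Encoding translates the str to raw bytes under a named encoding.
data = text.encode('utf-8')
print(type(text).__name__, len(text))   # str, counted in characters
print(type(data).__name__, len(data))   # bytes, counted in bytes

# Decoding translates the bytes back to a str.
restored = data.decode('utf-8')

# Text and binary data do not mix implicitly in 3.X.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The platform default used when no encoding is passed to open();
# its value depends on your system and settings.
print(locale.getpreferredencoding(False))
```

Because 'ä' occupies two bytes in UTF-8, the encoded form here is one byte longer than the character count of the original string. This is the kind of difference that stays invisible as long as your text is pure ASCII.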