Chapter 36. Unicode and Byte Strings
In the strings chapter in the core types part of this book (Chapter 7), I deliberately limited the scope to the subset of string topics that most Python programmers need to know about. Because the vast majority of programmers deal with simple forms of text like ASCII, they can happily work with Python’s basic str string type and its associated operations and don’t need to come to grips with more advanced string concepts. In fact, such programmers can largely ignore the string changes in Python 3.0 and continue to use strings as they may have in the past.
On the other hand, some programmers deal with more specialized types of data: non-ASCII character sets, image file contents, and so on. For those programmers (and others who may join them some day), in this chapter we’re going to fill in the rest of the Python string story and look at some more advanced concepts in Python’s string model.
Specifically, we’ll explore the basics of Python’s support for Unicode text—wide-character strings used in internationalized applications—as well as binary data—strings that represent absolute byte values. As we’ll see, the advanced string representation story has diverged in recent versions of Python:
Python 3.0 provides an alternative string type for binary data and supports Unicode text in its normal string type (ASCII is treated as a simple type of Unicode).
Python 2.6 provides an alternative string type for non-ASCII Unicode text and supports both simple text ...
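To make the Python 3.0 side of this split concrete, here is a minimal sketch, assuming it is run under Python 3.X. The variable names are illustrative only and are not taken from the book's own listings. It shows that ASCII and non-ASCII text share the normal str type, that binary data lives in a separate bytes type, and that encoding and decoding convert between the two.

```python
# A sketch for Python 3.X only; all names here are hypothetical examples.

text = 'spam'             # ASCII text: a simple kind of Unicode, type str
accented = 'sp\xe4m'      # non-ASCII character (a with diaeresis), still type str
data = b'\x00\x01spam'    # bytes literal: raw byte values for binary data

print(type(text), type(accented), type(data))

# Encoding translates text to bytes; decoding translates bytes back to text.
encoded = accented.encode('utf-8')
decoded = encoded.decode('utf-8')
print(len(accented), len(encoded))   # character count versus byte count

# Indexing a bytes object yields small integers, not one-character strings.
first_byte = data[0]
print(first_byte)

# The two types do not mix implicitly: concatenation raises TypeError.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

The non-ASCII character above occupies one position in the str object but two bytes once encoded as UTF-8, which is why the two lengths differ. Choosing UTF-8 here is just one reasonable assumption; other encodings produce different byte sequences for the same text.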