
Specific Built-in Types
|
21
is usually equivalent to a string method call such as:
res = X.replace('span', 'spam')
But the string method call form is preferred and quicker, and
string methods require no module imports. Note that the
string.join(list, delim) operation becomes a method of
the delimiter string
delim.join(list).
Unicode Strings
Python supports Unicode (wide) character strings, which
represent each character with 16 (or more) bits, not 8.
Literals
Unicode strings are written as u"string". Arbitrary Uni-
code characters can be written using a special escape
sequence,
\uHHHH, where HHHH is a four-digit hexadecimal
number from
0000 to FFFF. The traditional \xHH escape
sequence can also be used, and octal escapes can be used
for characters up to
+01FF, which is represented by \777.
Operations
Like normal strings, all immutable sequence operations
apply. Normal and Unicode string objects can be freely
mixed; combining 8-bit and Unicode strings always coerces
to Unicode, using the default ASCII encoding (e.g., the result
of
'a' + u'bc' is u'abc'). Mixed-type operations assume the
8-bit string contains 7-bit U.S. ASCII data (and raise an error
for non-ASCII characters). The built-in
str() and unicode()
functions can be used to convert between normal and Uni-
code strings. The
encode string method applies a desired
encoding scheme to Unicode strings. A handful of related
modules (e.g.,
codecs) and built-in