Plain strings are converted into Unicode strings either explicitly, with the unicode built-in, or implicitly, when you pass a plain string to a function that expects Unicode. In either case, the conversion is done by an auxiliary object known as a codec (for coder-decoder). A codec can also convert Unicode strings to plain strings, either explicitly, with the encode method of Unicode strings, or implicitly.
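As a minimal sketch of explicit conversion in both directions, the snippet below decodes a plain string into a Unicode string and encodes it back with a different codec. The sample text and the variable names are illustrative assumptions. It uses the decode method of plain strings rather than the unicode built-in (in Python 2, unicode(raw, 'utf-8') should give the same result) and writes the literals with b and u prefixes, so that the same lines also run on Python 3.

```python
# Explicit conversion in both directions, naming the codec each time.
raw = b'caf\xc3\xa9'             # plain string: the UTF-8 bytes for "cafe" with an acute e
text = raw.decode('utf-8')       # plain string -> Unicode string
latin1 = text.encode('latin-1')  # Unicode string -> plain string, using another codec

# Five bytes under UTF-8, four characters as Unicode, four bytes under Latin-1.
lengths = (len(raw), len(text), len(latin1))  # (5, 4, 4)
```

The byte count differs between the two plain strings because each codec represents U+00E9 differently: UTF-8 needs two bytes for it, Latin-1 only one.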
To identify a codec, pass the codec name to unicode or encode. When you pass no codec name, and for implicit conversion, Python uses a default encoding, normally 'ascii'. You can change the default encoding in the startup phase of a Python program, as covered in The site and sitecustomize Modules; see also setdefaultencoding in The sys Module. However, such a change is not a good idea for most “serious” Python code: it might too easily interfere with code in the standard Python libraries or third-party modules, which is written to expect the normal 'ascii' default.
Every conversion has a parameter errors, a string specifying how conversion errors are to be handled. The default is 'strict', meaning any error raises an exception. When errors is 'replace', the conversion replaces each character that causes an error with '?' in a plain-string result and with u'\ufffd' in a Unicode result. When errors is 'ignore', the conversion silently skips characters that cause errors. When errors is 'xmlcharrefreplace', the conversion replaces each character that causes an error with the XML character reference ...
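The effect of each errors value can be sketched with the 'ascii' codec and a single character that it cannot handle. The sample strings and variable names here are illustrative assumptions, and the literals carry b and u prefixes so that the snippet runs on Python 2.6 and later as well as on Python 3.

```python
text = u'caf\xe9'   # U+00E9 has no ASCII equivalent
raw = b'caf\xe9'    # byte 0xE9 is not valid ASCII

# 'strict' is the default: the first error raises an exception.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Encoding (Unicode string -> plain string) with the other handlers.
replaced_bytes = text.encode('ascii', 'replace')        # b'caf?'
ignored_bytes = text.encode('ascii', 'ignore')          # b'caf'
xml_bytes = text.encode('ascii', 'xmlcharrefreplace')   # b'caf&#233;'

# Decoding (plain string -> Unicode string) with 'replace' and 'ignore'.
replaced_text = raw.decode('ascii', 'replace')          # u'caf\ufffd'
ignored_text = raw.decode('ascii', 'ignore')            # u'caf'
```

Only 'xmlcharrefreplace' keeps enough information to recover the original character later; 'replace' and 'ignore' both lose it, which is why 'strict' is usually the safer choice while debugging.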