Fluent Python, 2nd Edition

by Luciano Ramalho
April 2022
Content level: Intermediate to advanced
1014 pages
23h 59m
English
O'Reilly Media, Inc.

Chapter 4. Unicode Text Versus Bytes

Humans use text. Computers speak bytes.

Esther Nam and Travis Fischer, “Character Encoding and Unicode in Python”1

Python 3 introduced a sharp distinction between strings of human text and sequences of raw bytes. Implicit conversion of byte sequences to Unicode text is a thing of the past. This chapter deals with Unicode strings, binary sequences, and the encodings used to convert between them.
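As a minimal sketch of that distinction (the sample word and the variable names are illustrative, not taken from the book), the snippet below encodes a str to bytes, decodes it back, and shows that Python 3 raises an error instead of converting between the two implicitly:

```python
# A str holds Unicode code points; bytes holds raw 8-bit values.
text = "caf\u00e9"                 # "café", with é as a single code point
data = text.encode("utf-8")        # str -> bytes

print(len(text))   # 4 code points
print(len(data))   # 5 bytes, because é needs two bytes in UTF-8
print(data)        # b'caf\xc3\xa9'

roundtrip = data.decode("utf-8")   # bytes -> str
print(roundtrip == text)           # True

# Python 3 refuses to mix the two types implicitly.
mixing_error = None
try:
    text + data
except TypeError as exc:
    mixing_error = exc
    print("TypeError:", exc)
```

The conversion is always explicit: encode goes from text to bytes, decode goes from bytes to text, and the encoding name has to match on both sides for the round trip to succeed.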

Depending on the kind of work you do with Python, you may think that understanding Unicode is not important. That’s unlikely, but anyway there is no escaping the str versus byte divide. As a bonus, you’ll find that the specialized binary sequence types provide features that the “all-purpose” Python 2 str type did not have.
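To give a rough idea of what those binary sequence types offer, here is a short sketch using only built-in types (the sample data and variable names are assumptions made for illustration):

```python
data = bytes("caf\u00e9", encoding="utf-8")   # immutable binary sequence

first = data[0]     # indexing returns an int in range(256): 99
piece = data[:1]    # slicing returns a bytes object: b'c'

buf = bytearray(data)      # mutable counterpart of bytes
buf[0] = ord("C")          # in-place change: bytearray(b'Caf\xc3\xa9')

view = memoryview(buf)     # shares the underlying memory, no copy made
view[1] = ord("A")         # writing through the view changes buf as well

hex_text = data.hex()              # '636166c3a9'
same = bytes.fromhex(hex_text)     # parse the hex digits back into bytes

print(first, piece, buf, hex_text, same == data)
```

Note that each item of a bytes or bytearray is an integer, while a slice is a binary sequence of the same type, even when the slice has length 1.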

In this chapter, we will visit the following topics:

  • Characters, code points, and byte representations

  • Unique features of binary sequences: bytes, bytearray, and memoryview

  • Encodings for full Unicode and legacy character sets

  • Avoiding and dealing with encoding errors

  • Best practices when handling text files

  • The default encoding trap and standard I/O issues

  • Safe Unicode text comparisons with normalization

  • Utility functions for normalization, case folding, and brute-force diacritic removal

  • Proper sorting of Unicode text with locale and the pyuca library

  • Character metadata in the Unicode database

  • Dual-mode APIs that handle str and bytes
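As a small preview of the normalization and case folding topics listed above, the sketch below uses the standard unicodedata module. The helper name same_text is hypothetical and chosen only for this illustration; it is not presented as the book's own utility function.

```python
import unicodedata

composed = "caf\u00e9"      # é as one code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by COMBINING ACUTE ACCENT (U+0301)

# Both render as "café", but the code point sequences differ.
print(composed == decomposed)   # False


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare after NFC normalization and case folding."""
    norm_a = unicodedata.normalize("NFC", a).casefold()
    norm_b = unicodedata.normalize("NFC", b).casefold()
    return norm_a == norm_b


print(same_text(composed, decomposed))        # True
print(same_text("Stra\u00dfe", "STRASSE"))    # True: ß case-folds to "ss"

# Character metadata from the Unicode database.
char_name = unicodedata.name("\u00e9")
print(char_name)   # LATIN SMALL LETTER E WITH ACUTE
```

NFC is used here as one reasonable choice of normalization form; other forms (NFD, NFKC, NFKD) trade off differently and may suit other comparisons better.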

What’s New in This Chapter

Support for Unicode in Python 3 has been comprehensive ...


ISBN: 9781492056348