Skip to Content
Fluent Python
book

Fluent Python

by Luciano Ramalho
August 2015
Intermediate
790 pages
18h 48m
English
O'Reilly Media, Inc.
Content preview from Fluent Python

Chapter 4. Text versus Bytes

Humans use text. Computers speak bytes.1

Esther Nam and Travis Fischer, Character Encoding and Unicode in Python

Python 3 introduced a sharp distinction between strings of human text and sequences of raw bytes. Implicit conversion of byte sequences to Unicode text is a thing of the past. This chapter deals with Unicode strings, binary sequences, and the encodings used to convert between them.

Depending on your Python programming context, a deeper understanding of Unicode may or may not be of vital importance to you. In the end, most of the issues covered in this chapter do not affect programmers who deal only with ASCII text. But even if that is your case, there is no escaping the str versus byte divide. As a bonus, you’ll find that the specialized binary sequence types provide features that the “all-purpose” Python 2 str type does not have.

In this chapter, we will visit the following topics:

  • Characters, code points, and byte representations

  • Unique features of binary sequences: bytes, bytearray, and memoryview

  • Codecs for full Unicode and legacy character sets

  • Avoiding and dealing with encoding errors

  • Best practices when handling text files

  • The default encoding trap and standard I/O issues

  • Safe Unicode text comparisons with normalization

  • Utility functions for normalization, case folding, and brute-force diacritic removal

  • Proper sorting of Unicode text with locale and the PyUCA library

  • Character metadata in the Unicode database ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

Fluent Python, 2nd Edition

Fluent Python, 2nd Edition

Luciano Ramalho

Publisher Resources

ISBN: 9781491946237Errata Page