In this chapter, you’ll learn many techniques for taming data. Most of them concern these built-in Python data types:
Text is the most familiar type of data to most readers, so we’ll begin with some of the powerful features of text strings in Python.
All of the text examples in this book thus far have been plain old ASCII. ASCII was defined in the 1960s, when computers were the size of refrigerators and only slightly better at performing computations. The basic unit of computer storage is the byte, which can store 256 unique values in its eight bits. For various reasons, ASCII used only 7 of those bits (128 unique values): 26 uppercase letters, 26 lowercase letters, 10 digits, some punctuation symbols, some spacing characters, and some nonprinting control codes.
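The arithmetic above can be checked with a short sketch. It uses only the standard `string` module; the variable names and the way the 128 code points are grouped (for example, counting code 127, DEL, as a control code and the space as the lone spacing character) are illustrative choices for this example, not part of the ASCII definition quoted above.

```python
import string

# A byte has eight bits; ASCII uses only seven of them.
byte_values = 2 ** 8    # 256 unique values
ascii_values = 2 ** 7   # 128 unique values

# Build every ASCII character from its code point, 0 through 127.
ascii_chars = [chr(code) for code in range(ascii_values)]

# Sort the characters into the groups named in the text.
uppercase = [c for c in ascii_chars if c in string.ascii_uppercase]
lowercase = [c for c in ascii_chars if c in string.ascii_lowercase]
digits = [c for c in ascii_chars if c in string.digits]
punctuation = [c for c in ascii_chars if c in string.punctuation]
# Assumption for this sketch: control codes are 0-31 plus 127 (DEL).
control = [c for c in ascii_chars if ord(c) < 32 or ord(c) == 127]
space = [c for c in ascii_chars if c == ' ']

groups = {
    'uppercase': uppercase,
    'lowercase': lowercase,
    'digits': digits,
    'punctuation': punctuation,
    'control': control,
    'space': space,
}

for name, members in groups.items():
    print(name, len(members))

# The groups should account for every one of the 128 values.
total = sum(len(members) for members in groups.values())
print('total', total)
```

Grouping by membership in the `string` constants keeps the example short; the point is only that the six groups together fill the 7-bit range exactly.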
Unfortunately, the world has more letters than ASCII provides. You could have a hot dog at a diner, but never a Gewürztraminer at a café. Many attempts have been made to add more letters and symbols, and you’ll see them at times. Here are just a couple of them:
Each of these uses all eight bits, but even that’s not enough, especially when you need non-European languages. Unicode is an ongoing international standard to define the characters of all the world’s languages, plus symbols from mathematics ...