Chapter 4. Introducing Python Object Types

This chapter begins our tour of the Python language. In an informal sense, in Python we do things with stuff.¹ “Things” take the form of operations like addition and concatenation, and “stuff” refers to the objects on which we perform those operations. In this part of the book, our focus is on that stuff, and the things our programs can do with it.

Somewhat more formally, in Python, data takes the form of objects—either built-in objects that Python provides, or objects we create using Python classes or external language tools such as C extension libraries. Although we’ll firm up this definition later, objects are essentially just pieces of memory, with values and sets of associated operations. As we’ll see, everything is an object in a Python script. Even simple numbers qualify, with values (e.g., 99), and supported operations (addition, subtraction, and so on).

Because objects are also the most fundamental notion in Python programming, we’ll start this chapter with a survey of Python’s built-in object types. Later chapters provide a second pass that fills in details we’ll gloss over in this survey. Here, our goal is a brief tour to introduce the basics.

The Python Conceptual Hierarchy

Before we get to the code, let’s first establish a clear picture of how this chapter fits into the overall Python picture. From a more concrete perspective, Python programs can be decomposed into modules, statements, expressions, and objects, as follows:

Programs are composed of modules.
Modules contain statements.
Statements contain expressions.
Expressions create and process objects.

The discussion of modules in Chapter 3 introduced the highest level of this hierarchy. This part’s chapters begin at the bottom—exploring both built-in objects and the expressions you can code to use them.

We’ll move on to study statements in the next part of the book, though we will find that they largely exist to manage the objects we’ll meet here. Moreover, by the time we reach classes in the OOP part of this book, we’ll discover that they allow us to define new object types of our own, by both using and emulating the object types we will explore here. Because of all this, built-in objects are a mandatory point of embarkation for all Python journeys.

Note

Traditional introductions to programming often stress its three pillars of sequence (“Do this, then that”), selection (“Do this if that is true”), and repetition (“Do this many times”). Python has tools in all three categories, along with some for definition—of functions and classes. These themes may help you organize your thinking early on, but they are a bit artificial and simplistic. Expressions such as comprehensions, for example, are both repetition and selection; some of these terms have other meanings in Python; and many later concepts won’t seem to fit this mold at all. In Python, the more strongly unifying principle is objects, and what we can do with them. To see why, read on.

Why Use Built-in Types?

If you’ve used lower-level languages such as C or C++, you know that much of your work centers on implementing objects—also known as data structures—to represent the components in your application’s domain. You need to lay out memory structures, manage memory allocation, implement search and access routines, and so on. These chores are about as tedious (and error-prone) as they sound, and they usually distract from your program’s real goals.

In typical Python programs, most of this grunt work goes away. Because Python provides powerful object types as an intrinsic part of the language, there’s usually no need to code object implementations before you start solving problems. In fact, unless you have a need for special processing that built-in types don’t provide, you’re almost always better off using a built-in object instead of implementing your own. Here are some reasons why:

Built-in objects make programs easy to write. For simple tasks, built-in types are often all you need to represent the structure of problem domains. Because you get powerful tools such as collections (lists) and search tables (dictionaries) for free, you can use them immediately. You can get a lot of work done with Python’s built-in object types alone.
Built-in objects are components of extensions. For more complex tasks, you may need to provide your own objects using Python classes or C language interfaces. But as you’ll see in later parts of this book, objects implemented manually are often built on top of built-in types such as lists and dictionaries. For instance, a stack data structure may be implemented as a class that manages or customizes a built-in list.
Built-in objects are often more efficient than custom data structures. Python’s built-in types employ already optimized data structure algorithms that are implemented in C for speed. Although you can write similar object types on your own, you’ll usually be hard-pressed to get the level of performance built-in object types provide.
Built-in objects are a standard part of the language. In some ways, Python borrows both from languages that rely on built-in tools (e.g., LISP) and languages that rely on the programmer to provide tool implementations or frameworks of their own (e.g., C++). Although you can implement unique object types in Python, you don’t need to do so just to get started. Moreover, because Python’s built-ins are standard, they’re always the same; proprietary frameworks, on the other hand, tend to differ from site to site.

In other words, not only do built-in object types make programming easier, but they’re also more powerful and efficient than most of what can be created from scratch. Regardless of whether you implement new object types, built-in objects form the core of every Python program.

Python’s Core Data Types

Table 4-1 previews Python’s built-in object types and some of the syntax used to code their literals—that is, the expressions that generate these objects.² Some of these types will probably seem familiar if you’ve used other languages; for instance, numbers and strings represent numeric and textual values, respectively, and file objects provide an interface for processing real files stored on your computer.

To some readers, though, the object types in Table 4-1 may be more general and powerful than what you are accustomed to. For instance, you’ll find that lists and dictionaries alone are powerful data representation tools that obviate most of the work you do to support collections and searching in lower-level languages. In short, lists provide ordered collections of other objects, while dictionaries store objects by key; both lists and dictionaries may be nested, can grow and shrink on demand, and may contain objects of any type.

Table 4-1. Built-in objects preview
Object type	Example literals/creation
Num bers	`1234`, `3.1415`, `3+4j`, `0b111`, `Decimal()`, `Fraction()`
Strin gs	`'spam'`, `"Bob's"`, `b'a\x01c'`, `u'sp\xc4m'`
Lis ts	`[1, [2, 'three'], 4.5]`, `list(range(10))`
Dict ionaries	`{'food': 'spam', 'taste': 'yum'}`, `dict(hours=10)`
Tup les	`(1, 'spam', 4, 'U')`, `tuple('spam')`, `namedtuple`
Fil es	`open('eggs.txt')`, `open(r'C:\ham.bin', 'wb')`
Set s	`set('abc')`, `{'a', 'b', 'c'}`
Other core types	Booleans, types, `None`
Program unit types	Functions, modules, classes (Part IV, Part V, Part VI)
Implementation-related types	Compiled code, stack tracebacks (Part IV, Part VII)

Also shown in Table 4-1, program units such as functions, modules, and classes—which we’ll meet in later parts of this book—are objects in Python too; they are created with statements and expressions such as def, class, import, and lambda and may be passed around scripts freely, stored within other objects, and so on. Python also provides a set of implementation-related types such as compiled code objects, which are generally of interest to tool builders more than application developers; we’ll explore these in later parts too, though in less depth due to their specialized roles.

Despite its title, Table 4-1 isn’t really complete, because everything we process in Python programs is a kind of object. For instance, when we perform text pattern matching in Python, we create pattern objects, and when we perform network scripting, we use socket objects. These other kinds of objects are generally created by importing and using functions in library modules—for example, in the re and socket modules for patterns and sockets—and have behavior all their own.

We usually call the other object types in Table 4-1 core data types, though, because they are effectively built into the Python language—that is, there is specific expression syntax for generating most of them. For instance, when you run the following code with characters surrounded by quotes:

>>> 'spam'

you are, technically speaking, running a literal expression that generates and returns a new string object. There is specific Python language syntax to make this object. Similarly, an expression wrapped in square brackets makes a list, one in curly braces makes a dictionary, and so on. Even though, as we’ll see, there are no type declarations in Python, the syntax of the expressions you run determines the types of objects you create and use. In fact, object-generation expressions like those in Table 4-1 are generally where types originate in the Python language.

Just as importantly, once you create an object, you bind its operation set for all time—you can perform only string operations on a string and list operations on a list. In formal terms, this means that Python is dynamically typed, a model that keeps track of types for you automatically instead of requiring declaration code, but it is also strongly typed, a constraint that means you can perform on an object only operations that are valid for its type.

We’ll study each of the object types in Table 4-1 in detail in upcoming chapters. Before digging into the details, though, let’s begin by taking a quick look at Python’s core objects in action. The rest of this chapter provides a preview of the operations we’ll explore in more depth in the chapters that follow. Don’t expect to find the full story here—the goal of this chapter is just to whet your appetite and introduce some key ideas. Still, the best way to get started is to get started, so let’s jump right into some real code.

Numbers

If you’ve done any programming or scripting in the past, some of the object types in Table 4-1 will probably seem familiar. Even if you haven’t, numbers are fairly straightforward. Python’s core objects set includes the usual suspects: integers that have no fractional part, floating-point numbers that do, and more exotic types—complex numbers with imaginary parts, decimals with fixed precision, rationals with numerator and denominator, and full-featured sets. Built-in numbers are enough to represent most numeric quantities—from your age to your bank balance—but more types are available as third-party add-ons.

Although it offers some fancier options, Python’s basic number types are, well, basic. Numbers in Python support the normal mathematical operations. For instance, the plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**) are used for exponentiation:

>>> 123 + 222                    # Integer addition
345
>>> 1.5 * 4                      # Floating-point multiplication
6.0
>>> 2 ** 100                     # 2 to the power 100, again
1267650600228229401496703205376

Notice the last result here: Python 3.X’s integer type automatically provides extra precision for large numbers like this when needed (in 2.X, a separate long integer type handles numbers too large for the normal integer type in similar ways). You can, for instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably shouldn’t try to print the result—with more than 300,000 digits, you may be waiting awhile!

>>> len(str(2 ** 1000000))       # How many digits in a really BIG number?
301030

This nested-call form works from inside out—first converting the ** result’s number to a string of digits with the built-in str function, and then getting the length of the resulting string with len. The end result is the number of digits. str and len work on many object types; more on both as we move along.

On Pythons prior to 2.7 and 3.1, once you start experimenting with floating-point numbers, you’re likely to stumble across something that may look a bit odd at first glance:

>>> 3.1415 * 2                   # repr: as code (Pythons < 2.7 and 3.1)
6.2830000000000004
>>> print(3.1415 * 2)            # str: user-friendly
6.283

The first result isn’t a bug; it’s a display issue. It turns out that there are two ways to print every object in Python—with full precision (as in the first result shown here), and in a user-friendly form (as in the second). Formally, the first form is known as an object’s as-code repr, and the second is its user-friendly str. In older Pythons, the floating-point repr sometimes displays more precision than you might expect. The difference can also matter when we step up to using classes. For now, if something looks odd, try showing it with a print built-in function call statement.

Better yet, upgrade to Python 2.7 and the latest 3.X, where floating-point numbers display themselves more intelligently, usually with fewer extraneous digits—since this book is based on Pythons 2.7 and 3.3, this is the display form I’ll be showing throughout this book for floating-point numbers:

>>> 3.1415 * 2                   # repr: as code (Pythons >= 2.7 and 3.1)
6.283

Besides expressions, there are a handful of useful numeric modules that ship with Python—modules are just packages of additional tools that we import to use:

>>> import math
>>> math.pi
3.141592653589793
>>> math.sqrt(85)
9.219544457292887

The math module contains more advanced numeric tools as functions, while the random module performs random-number generation and random selections (here, from a Python list coded in square brackets—an ordered collection of other objects to be introduced later in this chapter):

>>> import random
>>> random.random()
0.7082048489415967
>>> random.choice([1, 2, 3, 4])
1

Python also includes more exotic numeric objects—such as complex, fixed-precision, and rational numbers, as well as sets and Booleans—and the third-party open source extension domain has even more (e.g., matrixes and vectors, and extended precision numbers). We’ll defer discussion of these types until later in this chapter and book.

So far, we’ve been using Python much like a simple calculator; to do better justice to its built-in types, let’s move on to explore strings.

Strings

Strings are used to record both textual information (your name, for instance) as well as arbitrary collections of bytes (such as an image file’s contents). They are our first example of what in Python we call a sequence—a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative positions. Strictly speaking, strings are sequences of one-character strings; other, more general sequence types include lists and tuples, covered later.

Sequence Operations

As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string coded inside quotes (usually of the single variety), we can verify its length with the built-in len function and fetch its components with indexing expressions:

>>> S = 'Spam'           # Make a 4-character string, and assign it to a name
>>> len(S)               # Length
4
>>> S[0]                 # The first item in S, indexing by zero-based position
'S'
>>> S[1]                 # The second item from the left
'p'

In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on.

Notice how we assign the string to a variable named S here. We’ll go into detail on how this works later (especially in Chapter 6), but Python variables never need to be declared ahead of time. A variable is created when you assign it a value, may be assigned any type of object, and is replaced with its value when it shows up in an expression. It must also have been previously assigned by the time you use its value. For the purposes of this chapter, it’s enough to know that we need to assign an object to a variable in order to save it for later use.

In Python, we can also index backward, from the end—positive indexes count from the left, and negative indexes count back from the right:

>>> S[-1]                # The last item from the end in S
'm'
>>> S[-2]                # The second-to-last item from the end
'a'

Formally, a negative index is simply added to the string’s length, so the following two operations are equivalent (though the first is easier to code and less easy to get wrong):

>>> S[-1]                # The last item in S
'm'
>>> S[len(S)-1]          # Negative indexing, the hard way
'm'

Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal—anywhere that Python expects a value, we can use a literal, a variable, or any expression we wish. Python’s syntax is completely general this way.

In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:

>>> S                     # A 4-character string
'Spam'
>>> S[1:3]                # Slice of S from offsets 1 through 2 (not 3)
'pa'

Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means “give me everything in X from offset I up to but not including offset J.” The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 1 through 3 – 1) as a new string. The effect is to slice or “parse out” the two characters in the middle.

In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:

>>> S[1:]                 # Everything past the first (1:len(S))
'pam'
>>> S                     # S itself hasn't changed
'Spam'
>>> S[0:3]                # Everything but the last
'Spa'
>>> S[:3]                 # Same as S[0:3]
'Spa'
>>> S[:-1]                # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:]                  # All of S as a top-level copy (0:len(S))
'Spam'

Note in the second-to-last command how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you’ll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.

Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string) and repetition (making a new string by repeating another):

>>> S
'Spam'
>>> S + 'xyz'             # Concatenation
'Spamxyz'
>>> S                     # S is unchanged
'Spam'
>>> S * 8                 # Repetition
'SpamSpamSpamSpamSpamSpamSpamSpam'

Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism later in the book—in sum, the meaning of an operation depends on the objects being operated on. As you’ll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren’t constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you’ll learn more about it later on our tour.

Immutability

Also notice in the prior examples that we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python—they cannot be changed in place after they are created. In other words, you can never overwrite the values of immutable objects. For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go (as you’ll see later), this isn’t as inefficient as it may sound:

>>> S
'Spam'

>>> S[0] = 'z'             # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment

>>> S = 'z' + S[1:]        # But we can run expressions to make new objects
>>> S
'zpam'

Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists, dictionaries, and sets are not—they can be changed in place freely, as can most new objects you’ll code with classes. This distinction turns out to be crucial in Python work, in ways that we can’t yet fully explore. Among other things, immutability can be used to guarantee that an object remains constant throughout your program; mutable objects’ values can be changed at any time and place (and whether you expect it or not).

Strictly speaking, you can change text-based data in place if you either expand it into a list of individual characters and join it back together with nothing between, or use the newer bytearray type available in Pythons 2.6, 3.0, and later:

>>> S = 'shrubbery'
>>> L = list(S)                                     # Expand to a list: [...]
>>> L
['s', 'h', 'r', 'u', 'b', 'b', 'e', 'r', 'y']
>>> L[1] = 'c'                                      # Change it in place
>>> ''.join(L)                                      # Join with empty delimiter
'scrubbery'

>>> B = bytearray(b'spam')                          # A bytes/list hybrid (ahead)
>>> B.extend(b'eggs')                               # 'b' needed in 3.X, not 2.X
>>> B                                               # B[i] = ord(x) works here too
bytearray(b'spameggs')
>>> B.decode()                                      # Translate to normal string
'spameggs'

The bytearray supports in-place changes for text, but only for text whose characters are all at most 8-bits wide (e.g., ASCII). All other strings are still immutable—bytearray is a distinct hybrid of immutable bytes strings (whose b'...' syntax is required in 3.X and optional in 2.X) and mutable lists (coded and displayed in []), and we have to learn more about both these and Unicode text to fully grasp this code.

Type-Specific Methods

Every string operation we’ve studied so far is really a sequence operation—that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods—functions that are attached to and act upon a specific object, which are triggered with a call expression.

For example, the string find method is the basic substring search operation (it returns the offset of the passed-in substring, or −1 if it is not present), and the string replace method performs global searches and replacements; both act on the subject that they are attached to and called from:

>>> S = 'Spam'
>>> S.find('pa')                 # Find the offset of a substring in S
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ')       # Replace occurrences of a string in S with another
'SXYZm'
>>> S
'Spam'

Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results—because strings are immutable, this is the only way this can work. String methods are the first line of text-processing tools in Python. Other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:

>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')              # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']

>>> S = 'spam'
>>> S.upper()                    # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha()                  # Content tests: isalpha, isdigit, etc.
True

>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line.rstrip()                # Remove whitespace characters on the right side
'aaa,bbb,ccccc,dd'
>>> line.rstrip().split(',')     # Combine two operations
['aaa', 'bbb', 'ccccc', 'dd']

Notice the last command here—it strips before it splits because Python runs from left to right, making a temporary result along the way. Strings also support an advanced substitution operation known as formatting, available as both an expression (the original) and a string method call (new as of 2.6 and 3.0); the second of these allows you to omit relative argument value numbers as of 2.7 and 3.1:

>>> '%s, eggs, and %s' % ('spam', 'SPAM!')          # Formatting expression (all)
'spam, eggs, and SPAM!'

>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')    # Formatting method (2.6+, 3.0+)
'spam, eggs, and SPAM!'

>>> '{}, eggs, and {}'.format('spam', 'SPAM!')      # Numbers optional (2.7+, 3.1+)
'spam, eggs, and SPAM!'

Formatting is rich with features, which we’ll postpone discussing until later in this book, and which tend to matter most when you must generate numeric reports:

>>> '{:,.2f}'.format(296999.2567)                   # Separators, decimal digits
'296,999.26'
>>> '%.2f | %+05d' % (3.14159, −42)                 # Digits, padding, signs
'3.14 | −0042'

One note here: although sequence operations are generic, methods are not—although some types share some method names, string method operations generally work only on strings, and nothing else. As a rule of thumb, Python’s toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). Finding the tools you need among all these categories will become more natural as you use Python more, but the next section gives a few tips you can use right now.

Getting Help

The methods introduced in the prior section are a representative, but small, sample of what is available for string objects. In general, this book is not exhaustive in its look at object methods. For more details, you can always call the built-in dir function. This function lists variables assigned in the caller’s scope when called with no argument; more usefully, it returns a list of all the attributes available for any object passed to it. Because methods are function attributes, they will show up in this list. Assuming S is still the string, here are its attributes on Python 3.3 (Python 2.X varies slightly):

>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower',
'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex',
'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

You probably won’t care about the names with double underscores in this list until later in the book, when we study operator overloading in classes—they represent the implementation of the string object and are available to support customization. The __add__ method of strings, for example, is what really performs concatenation; Python maps the first of the following to the second internally, though you shouldn’t usually use the second form yourself (it’s less intuitive, and might even run slower):

>>> S + 'NI!'
'spamNI!'
>>> S.__add__('NI!')
'spamNI!'

In general, leading and trailing double underscores is the naming pattern Python uses for implementation details. The names without the underscores in this list are the callable methods on string objects.

The dir function simply gives the methods’ names. To ask what they do, you can pass them to the help function:

>>> help(S.replace)
Help on built-in function replace:

replace(...)
    S.replace(old, new[, count]) -> str

    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.

help is one of a handful of interfaces to a system of code that ships with Python known as PyDoc—a tool for extracting documentation from objects. Later in the book, you’ll see that PyDoc can also render its reports in HTML format for display on a web browser.

You can also ask for help on an entire string (e.g., help(S)), but you may get more or less help than you want to see—information about every string method in older Pythons, and probably no help at all in newer versions because strings are treated specially. It’s generally better to ask about a specific method.

Both dir and help also accept as arguments either a real object (like our string S), or the name of a data type (like str, list, and dict). The latter form returns the same list for dir but shows full type details for help, and allows you to ask about a specific method via type name (e.g., help on str.replace).

For more details, you can also consult Python’s standard library reference manual or commercially published reference books, but dir and help are the first level of documentation in Python.

Other Ways to Code Strings

So far, we’ve looked at the string object’s sequence operations and type-specific methods. Python also provides a variety of ways for us to code strings, which we’ll explore in greater depth later. For instance, special characters can be represented as backslash escape sequences, which Python displays in \xNN hexadecimal escape notation, unless they represent printable characters:

>>> S = 'A\nB\tC'            # \n is end-of-line, \t is tab
>>> len(S)                   # Each stands for just one character
5

>>> ord('\n')                # \n is one character coded as decimal value 10
10

>>> S = 'A\0B\0C'            # \0, a binary zero byte, does not terminate string
>>> len(S)
5
>>> S                        # Non-printables are displayed as \xNN hex escapes
'A\x00B\x00C'

Python allows strings to be enclosed in single or double quote characters—they mean the same thing but allow the other type of quote to be embedded without an escape (most programmers prefer single quotes). It also allows multiline string literals enclosed in triple quotes (single or double)—when this form is used, all the lines are concatenated together, and end-of-line characters are added where line breaks appear. This is a minor syntactic convenience, but it’s useful for embedding things like multiline HTML, XML, or JSON code in a Python script, and stubbing out lines of code temporarily—just add three quotes above and below:

>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc
"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc\n'

Python also supports a raw string literal that turns off the backslash escape mechanism. Such literals start with the letter r and are useful for strings like directory paths on Windows (e.g., r'C:\text\new').

Unicode Strings

Python’s strings also come with full Unicode support required for processing text in internationalized character sets. Characters in the Japanese and Russian alphabets, for example, are outside the ASCII set. Such non-ASCII text can show up in web pages, emails, GUIs, JSON, XML, or elsewhere. When it does, handling it well requires Unicode support. Python has such support built in, but the form of its Unicode support varies per Python line, and is one of their most prominent differences.

In Python 3.X, the normal str string handles Unicode text (including ASCII, which is just a simple kind of Unicode); a distinct bytes string type represents raw byte values (including media and encoded text); and 2.X Unicode literals are supported in 3.3 and later for 2.X compatibility (they are treated the same as normal 3.X str strings):

>>> 'sp\xc4m'                     # 3.X: normal str strings are Unicode text
'spÄm'
>>> b'a\x01c'                     # bytes strings are byte-based data
b'a\x01c'
>>> u'sp\u00c4m'                  # The 2.X Unicode literal works in 3.3+: just str
'spÄm'

In Python 2.X, the normal str string handles both 8-bit character strings (including ASCII text) and raw byte values; a distinct unicode string type represents Unicode text; and 3.X bytes literals are supported in 2.6 and later for 3.X compatibility (they are treated the same as normal 2.X str strings):

>>> print u'sp\xc4m'              # 2.X: Unicode strings are a distinct type
spÄm
>>> 'a\x01c'                      # Normal str strings contain byte-based text/data
'a\x01c'
>>> b'a\x01c'                     # The 3.X bytes literal works in 2.6+: just str
'a\x01c'

Formally, in both 2.X and 3.X, non-Unicode strings are sequences of 8-bit bytes that print with ASCII characters when possible, and Unicode strings are sequences of Unicode code points—identifying numbers for characters, which do not necessarily map to single bytes when encoded to files or stored in memory. In fact, the notion of bytes doesn’t apply to Unicode: some encodings include character code points too large for a byte, and even simple 7-bit ASCII text is not stored one byte per character under some encodings and memory storage schemes:

>>> 'spam'                        # Characters may be 1, 2, or 4 bytes in memory
'spam'
>>> 'spam'.encode('utf8')         # Encoded to 4 bytes in UTF-8 in files
b'spam'
>>> 'spam'.encode('utf16')        # But encoded to 10 bytes in UTF-16
b'\xff\xfes\x00p\x00a\x00m\x00'

Both 3.X and 2.X also support the bytearray string type we met earlier, which is essentially a bytes string (a str in 2.X) that supports most of the list object’s in-place mutable change operations.

Both 3.X and 2.X also support coding non-ASCII characters with \x hexadecimal and short \u and long \U Unicode escapes, as well as file-wide encodings declared in program source files. Here’s our non-ASCII character coded three ways in 3.X (add a leading “u” and say “print” to see the same in 2.X):

>>> 'sp\xc4\u00c4\U000000c4m'
'spÄÄÄm'

What these values mean and how they are used differs between text strings, which are the normal string in 3.X and Unicode in 2.X, and byte strings, which are bytes in 3.X and the normal string in 2.X. All these escapes can be used to embed actual Unicode code-point ordinal-value integers in text strings. By contrast, byte strings use only \x hexadecimal escapes to embed the encoded form of text, not its decoded code point values—encoded bytes are the same as code points, only for some encodings and characters:

>>> '\u00A3', '\u00A3'.encode('latin1'), b'\xA3'.decode('latin1')
('£', b'\xa3', '£')

As a notable difference, Python 2.X allows its normal and Unicode strings to be mixed in expressions as long as the normal string is all ASCII; in contrast, Python 3.X has a tighter model that never allows its normal and byte strings to mix without explicit conversion:

u'x' + b'y'            # Works in 2.X (where b is optional and ignored)
u'x' + 'y'             # Works in 2.X: u'xy'

u'x' + b'y'            # Fails in 3.3 (where u is optional and ignored)
u'x' + 'y'             # Works in 3.3: 'xy'

'x' + b'y'.decode()    # Works in 3.X if decode bytes to str: 'xy'
'x'.encode() + b'y'    # Works in 3.X if encode str to bytes: b'xy'

Apart from these string types, Unicode processing mostly reduces to transferring text data to and from files—text is encoded to bytes when stored in a file, and decoded into characters (a.k.a. code points) when read back into memory. Once it is loaded, we usually process text as strings in decoded form only.

Because of this model, though, files are also content-specific in 3.X: text files implement named encodings and accept and return str strings, but binary files instead deal in bytes strings for raw binary data. In Python 2.X, normal files’ content is str bytes, and a special codecs module handles Unicode and represents content with the unicode type.

We’ll meet Unicode again in the files coverage later in this chapter, but save the rest of the Unicode story for later in this book. It crops up briefly in a Chapter 25 example in conjunction with currency symbols, but for the most part is postponed until this book’s advanced topics part. Unicode is crucial in some domains, but many programmers can get by with just a passing acquaintance. If your data is all ASCII text, the string and file stories are largely the same in 2.X and 3.X. And if you’re new to programming, you can safely defer most Unicode details until you’ve mastered string basics.

Pattern Matching

One point worth noting before we move on is that none of the string object’s own methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book’s scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:

>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello    Python world')
>>> match.group(1)
'Python '

This example searches for a substring that begins with the word “Hello,” followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word “world.” If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes or colons, and is similar to splitting by an alternatives pattern:

>>> match = re.match('[/:](.*)[/:](.*)[/:](.*)', '/usr/home:lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')

>>> re.split('[/:]', '/usr/home:lumberjack')
['', 'usr', 'home', 'lumberjack']

Pattern matching is an advanced text-processing tool by itself, but there is also support in Python for even more advanced text and language processing, including XML and HTML parsing and natural language analysis. We’ll see additional brief examples of patterns and XML parsing at the end of Chapter 37, but I’ve already said enough about strings for this tutorial, so let’s move on to the next type.

Lists

The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable—unlike strings, lists can be modified in place by assignment to offsets as well as a variety of list method calls. Accordingly, they provide a very flexible tool for representing arbitrary collections—lists of files in a folder, employees in a company, emails in your inbox, and so on.

Sequence Operations

Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that the results are usually lists instead of strings. For instance, given a three-item list:

>>> L = [123, 'spam', 1.23]            # A list of three different-type objects
>>> len(L)                             # Number of items in the list
3

we can index, slice, and so on, just as for strings:

>>> L[0]                               # Indexing by position
123
>>> L[:-1]                             # Slicing a list returns a new list
[123, 'spam']

>>> L + [4, 5, 6]                      # Concat/repeat make new lists too
[123, 'spam', 1.23, 4, 5, 6]
>>> L * 2
[123, 'spam', 1.23, 123, 'spam', 1.23]

>>> L                                  # We're not changing the original list
[123, 'spam', 1.23]

Type-Specific Operations

Python’s lists may be reminiscent of arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint—the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:

>>> L.append('NI')                     # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']

>>> L.pop(2)                           # Shrinking: delete an item in the middle
1.23
>>> L                                  # "del L[2]" deletes from a list too
[123, 'spam', 'NI']

Here, the list append method expands the list’s size and inserts an item at the end; the pop method (or an equivalent del statement) then removes an item at a given offset, causing the list to shrink. Other list methods insert an item at an arbitrary position (insert), remove a given item by value (remove), add multiple items at the end (extend), and so on. Because lists are mutable, most list methods also change the list object in place, instead of creating a new one:

>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']

The list sort method here, for example, orders the list in ascending fashion by default, and reverse reverses it—in both cases, the methods modify the list directly.

Bounds Checking

Although lists have no fixed size, Python still doesn’t allow us to reference items that are not present. Indexing off the end of a list is always a mistake, but so is assigning off the end:

>>> L
[123, 'spam', 'NI']

>>> L[99]
...error text omitted...
IndexError: list index out of range

>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range

This is intentional, as it’s usually an error to try to assign off the end of a list (and a particularly nasty one in the C language, which doesn’t do as much error checking as Python). Rather than silently growing the list in response, Python reports an error. To grow a list, we call list methods such as append instead.

Nesting

One nice feature of Python’s core data types is that they support arbitrary nesting—we can nest them in any combination, and as deeply as we like. For example, we can have a list that contains a dictionary, which contains another list, and so on. One immediate application of this feature is to represent matrixes, or “multidimensional arrays” in Python. A list with nested lists will do the job for basic applications (you’ll get “...” continuation-line prompts on lines 2 and 3 of the following in some interfaces, but not in IDLE):

>>> M = [[1, 2, 3],               # A 3 × 3 matrix, as nested lists
         [4, 5, 6],               # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Here, we’ve coded a list that contains three other lists. The effect is to represent a 3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:

>>> M[1]                          # Get row 2
[4, 5, 6]

>>> M[1][2]                       # Get row 2, then get item 3 within the row
6

The first operation here fetches the entire second row, and the second grabs the third item within that row (it runs left to right, like the earlier string strip and split). Stringing together index operations takes us deeper and deeper into our nested-object structure.³

Comprehensions

In addition to sequence operations and list methods, Python includes a more advanced operation known as a list comprehension expression, which turns out to be a powerful way to process structures like our matrix. Suppose, for instance, that we need to extract the second column of our sample matrix. It’s easy to grab rows by simple indexing because the matrix is stored by rows, but it’s almost as easy to get a column with a list comprehension:

>>> col2 = [row[1] for row in M]             # Collect the items in column 2
>>> col2
[2, 5, 8]

>>> M                                        # The matrix is unchanged
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

List comprehensions derive from set notation; they are a way to build a new list by running an expression on each item in a sequence, one at a time, from left to right. List comprehensions are coded in square brackets (to tip you off to the fact that they make a list) and are composed of an expression and a looping construct that share a variable name (row, here). The preceding list comprehension means basically what it says: “Give me row[1] for each row in matrix M, in a new list.” The result is a new list containing column 2 of the matrix.

List comprehensions can be more complex in practice:

>>> [row[1] + 1 for row in M]                 # Add 1 to each item in column 2
[3, 6, 9]

>>> [row[1] for row in M if row[1] % 2 == 0]  # Filter out odd items
[2, 8]

The first operation here, for instance, adds 1 to each item as it is collected, and the second uses an if clause to filter odd numbers out of the result using the % modulus expression (remainder of division). List comprehensions make new lists of results, but they can be used to iterate over any iterable object—a term we’ll flesh out later in this preview. Here, for instance, we use list comprehensions to step over a hardcoded list of coordinates and a string:

>>> diag = [M[i][i] for i in [0, 1, 2]]      # Collect a diagonal from matrix
>>> diag
[1, 5, 9]

>>> doubles = [c * 2 for c in 'spam']        # Repeat characters in a string
>>> doubles
['ss', 'pp', 'aa', 'mm']

These expressions can also be used to collect multiple values, as long as we wrap those values in a nested collection. The following illustrates using range—a built-in that generates successive integers, and requires a surrounding list to display all its values in 3.X only (2.X makes a physical list all at once):

>>> list(range(4))                           # 0..3 (list() required in 3.X)
[0, 1, 2, 3]
>>> list(range(−6, 7, 2))                    # −6 to +6 by 2 (need list() in 3.X)
[−6, −4, −2, 0, 2, 4, 6]

>>> [[x ** 2, x ** 3] for x in range(4)]     # Multiple values, "if" filters
[[0, 0], [1, 1], [4, 8], [9, 27]]
>>> [[x, x / 2, x * 2] for x in range(−6, 7, 2) if x > 0]
[[2, 1, 4], [4, 2, 8], [6, 3, 12]]

As you can probably tell, list comprehensions, and relatives like the map and filter built-in functions, are too involved to cover more formally in this preview chapter. The main point of this brief introduction is to illustrate that Python includes both simple and advanced tools in its arsenal. List comprehensions are an optional feature, but they tend to be very useful in practice and often provide a substantial processing speed advantage. They also work on any type that is a sequence in Python, as well as some types that are not. You’ll hear much more about them later in this book.

As a preview, though, you’ll find that in recent Pythons, comprehension syntax has been generalized for other roles: it’s not just for making lists today. For example, enclosing a comprehension in parentheses can also be used to create generators that produce results on demand. To illustrate, the sum built-in sums items in a sequence—in this example, summing all items in our matrix’s rows on request:

>>> G = (sum(row) for row in M)              # Create a generator of row sums
>>> next(G)                                  # iter(G) not required here
6
>>> next(G)                                  # Run the iteration protocol next()
15
>>> next(G)
24

The map built-in can do similar work, by generating the results of running items through a function, one at a time and on request. Like range, wrapping it in list forces it to return all its values in Python 3.X; this isn’t needed in 2.X where map makes a list of results all at once instead, and is not needed in other contexts that iterate automatically, unless multiple scans or list-like behavior is also required:

>>> list(map(sum, M))                        # Map sum over items in M
[6, 15, 24]

In Python 2.7 and 3.X, comprehension syntax can also be used to create sets and dictionaries:

>>> {sum(row) for row in M}                  # Create a set of row sums
{24, 6, 15}

>>> {i : sum(M[i]) for i in range(3)}        # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}

In fact, lists, sets, dictionaries, and generators can all be built with comprehensions in 3.X and 2.7:

>>> [ord(x) for x in 'spaam']                # List of character ordinals
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'}                # Sets remove duplicates
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'}             # Dictionary keys are unique
{'p': 112, 'a': 97, 's': 115, 'm': 109}
>>> (ord(x) for x in 'spaam')                # Generator of values
<generator object <genexpr> at 0x000000000254DAB0>

To understand objects like generators, sets, and dictionaries, though, we must move ahead.

Dictionaries

Python dictionaries are something completely different (Monty Python reference intended)—they are not sequences at all, but are instead known as mappings. Mappings are also collections of other objects, but they store objects by key instead of by relative position. In fact, mappings don’t maintain any reliable left-to-right order; they simply map keys to associated values. Dictionaries, the only mapping type in Python’s core objects set, are also mutable: like lists, they may be changed in place and can grow and shrink on demand. Also like lists, they are a flexible tool for representing collections, but their more mnemonic keys are better suited when a collection’s items are named or labeled—fields of a database record, for example.

Mapping Operations

When written as literals, dictionaries are coded in curly braces and consist of a series of “key: value” pairs. Dictionaries are useful anytime we need to associate a set of values with keys—to describe the properties of something, for instance. As an example, consider the following three-item dictionary (with keys “food,” “quantity,” and “color,” perhaps the details of a hypothetical menu item?):

>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}

We can index this dictionary by key to fetch and change the keys’ associated values. The dictionary index operation uses the same syntax as that used for sequences, but the item in the square brackets is a key, not a relative position:

>>> D['food']              # Fetch value of key 'food'
'Spam'

>>> D['quantity'] += 1     # Add 1 to 'quantity' value
>>> D
{'color': 'pink', 'food': 'Spam', 'quantity': 5}

Although the curly-braces literal form does see use, it is perhaps more common to see dictionaries built up in different ways (it’s rare to know all your program’s data before your program runs). The following code, for example, starts with an empty dictionary and fills it out one key at a time. Unlike out-of-bounds assignments in lists, which are forbidden, assignments to new dictionary keys create those keys:

>>> D = {}
>>> D['name'] = 'Bob'      # Create keys by assignment
>>> D['job']  = 'dev'
>>> D['age']  = 40

>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}

>>> print(D['name'])
Bob

Here, we’re effectively using dictionary keys as field names in a record that describes someone. In other applications, dictionaries can also be used to replace searching operations—indexing a dictionary by key is often the fastest way to code a search in Python.

As we’ll learn later, we can also make dictionaries by passing to the dict type name either keyword arguments (a special name=value syntax in function calls), or the result of zipping together sequences of keys and values obtained at runtime (e.g., from files). Both the following make the same dictionary as the prior example and its equivalent {} literal form, though the first tends to make for less typing:

>>> bob1 = dict(name='Bob', job='dev', age=40)                      # Keywords
>>> bob1
{'age': 40, 'name': 'Bob', 'job': 'dev'}

>>> bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40]))    # Zipping
>>> bob2
{'job': 'dev', 'name': 'Bob', 'age': 40}

Notice how the left-to-right order of dictionary keys is scrambled. Mappings are not positionally ordered, so unless you’re lucky, they’ll come back in a different order than you typed them. The exact order may vary per Python, but you shouldn’t depend on it, and shouldn’t expect yours to match that in this book.

Nesting Revisited

In the prior example, we used a dictionary to describe a hypothetical person, with three keys. Suppose, though, that the information is more complex. Perhaps we need to record a first name and a last name, along with multiple job titles. This leads to another application of Python’s object nesting in action. The following dictionary, coded all at once as a literal, captures more structured information:

>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'jobs': ['dev', 'mgr'],
           'age':  40.5}

Here, we again have a three-key dictionary at the top (keys “name,” “jobs,” and “age”), but the values have become more complex: a nested dictionary for the name to support multiple parts, and a nested list for the jobs to support multiple roles and future expansion. We can access the components of this structure much as we did for our list-based matrix earlier, but this time most indexes are dictionary keys, not list offsets:

>>> rec['name']                         # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}

>>> rec['name']['last']                 # Index the nested dictionary
'Smith'

>>> rec['jobs']                         # 'jobs' is a nested list
['dev', 'mgr']
>>> rec['jobs'][-1]                     # Index the nested list
'mgr'

>>> rec['jobs'].append('janitor')       # Expand Bob's job description in place
>>> rec
{'age': 40.5, 'jobs': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith',
'first': 'Bob'}}

Notice how the last operation here expands the nested jobs list—because the jobs list is a separate piece of memory from the dictionary that contains it, it can grow and shrink freely (object memory layout will be discussed further later in this book).

The real reason for showing you this example is to demonstrate the flexibility of Python’s core data types. As you can see, nesting allows us to build up complex information structures directly and easily. Building a similar structure in a low-level language like C would be tedious and require much more code: we would have to lay out and declare structures and arrays, fill out values, link everything together, and so on. In Python, this is all automatic—running the expression creates the entire nested object structure for us. In fact, this is one of the main benefits of scripting languages like Python.

Just as importantly, in a lower-level language we would have to be careful to clean up all of the object’s space when we no longer need it. In Python, when we lose the last reference to the object—by assigning its variable to something else, for example—all of the memory space occupied by that object’s structure is automatically cleaned up for us:

>>> rec = 0                             # Now the object's space is reclaimed

Technically speaking, Python has a feature known as garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code. In standard Python (a.k.a. CPython), the space is reclaimed immediately, as soon as the last reference to an object is removed. We’ll study how this works later in Chapter 6; for now, it’s enough to know that you can use objects freely, without worrying about creating their space or cleaning up as you go.

Also watch for a record structure similar to the one we just coded in Chapter 8, Chapter 9, and Chapter 27, where we’ll use it to compare and contrast lists, dictionaries, tuples, named tuples, and classes—an array of data structure options with tradeoffs we’ll cover in full later.⁴

Missing Keys: if Tests

As mappings, dictionaries support accessing items by key only, with the sorts of operations we’ve just seen. In addition, though, they also support type-specific operations with method calls that are useful in a variety of common use cases. For example, although we can assign to a new key to expand a dictionary, fetching a nonexistent key is still a mistake:

>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> D['e'] = 99                      # Assigning new keys grows dictionaries
>>> D
{'a': 1, 'c': 3, 'b': 2, 'e': 99}

>>> D['f']                           # Referencing a nonexistent key is an error
...error text omitted...
KeyError: 'f'

This is what we want—it’s usually a programming error to fetch something that isn’t really there. But in some generic programs, we can’t always know what keys will be present when we write our code. How do we handle such cases and avoid errors? One solution is to test ahead of time. The dictionary in membership expression allows us to query the existence of a key and branch on the result with a Python if statement. In the following, be sure to press Enter twice to run the if interactively after typing its code (as explained in Chapter 3, an empty line means “go” at the interactive prompt), and just as for the earlier multiline dictionaries and lists, the prompt changes to “...” on some interfaces for lines two and beyond:

>>> 'f' in D
False

>>> if not 'f' in D:                           # Python's sole selection statement
       print('missing')

missing

This book has more to say about the if statement in later chapters, but the form we’re using here is straightforward: it consists of the word if, followed by an expression that is interpreted as a true or false result, followed by a block of code to run if the test is true. In its full form, the if statement can also have an else clause for a default case, and one or more elif (“else if”) clauses for other tests. It’s the main selection statement tool in Python; along with both its ternary if/else expression cousin (which we’ll meet in a moment) and the if comprehension filter lookalike we saw earlier, it’s the way we code the logic of choices and decisions in our scripts.

If you’ve used some other programming languages in the past, you might be wondering how Python knows when the if statement ends. I’ll explain Python’s syntax rules in depth in later chapters, but in short, if you have more than one action to run in a statement block, you simply indent all their statements the same way—this both promotes readable code and reduces the number of characters you have to type:

>>> if not 'f' in D:
        print('missing')
        print('no, really...')                 # Statement blocks are indented

missing
no, really...

Besides the in test, there are a variety of ways to avoid accessing nonexistent keys in the dictionaries we create: the get method, a conditional index with a default; the Python 2.X has_key method, an in work-alike that is no longer available in 3.X; the try statement, a tool we’ll first meet in Chapter 10 that catches and recovers from exceptions altogether; and the if/else ternary (three-part) expression, which is essentially an if statement squeezed onto a single line. Here are a few examples:

>>> value = D.get('x', 0)                      # Index but with a default
>>> value
0
>>> value = D['x'] if 'x' in D else 0          # if/else expression form
>>> value
0

We’ll save the details on such alternatives until a later chapter. For now, let’s turn to another dictionary method’s role in a common use case.

Sorting Keys: for Loops

As mentioned earlier, because dictionaries are not sequences, they don’t maintain any dependable left-to-right order. If we make a dictionary and print it back, its keys may come back in a different order than that in which we typed them, and may vary per Python version and other variables:

>>> D = {'a': 1, 'b': 2, 'c': 3}
>>> D
{'a': 1, 'c': 3, 'b': 2}

What do we do, though, if we do need to impose an ordering on a dictionary’s items? One common solution is to grab a list of keys with the dictionary keys method, sort that with the list sort method, and then step through the result with a Python for loop (as for if, be sure to press the Enter key twice after coding the following for loop, and omit the outer parenthesis in the print in Python 2.X):

>>> Ks = list(D.keys())                # Unordered keys list
>>> Ks                                 # A list in 2.X, "view" in 3.X: use list()
['a', 'c', 'b']

>>> Ks.sort()                          # Sorted keys list
>>> Ks
['a', 'b', 'c']

>>> for key in Ks:                     # Iterate though sorted keys
        print(key, '=>', D[key])       # <== press Enter twice here (3.X print)

a => 1
b => 2
c => 3

This is a three-step process, although, as we’ll see in later chapters, in recent versions of Python it can be done in one step with the newer sorted built-in function. The sorted call returns the result and sorts a variety of object types, in this case sorting dictionary keys automatically:

>>> D
{'a': 1, 'c': 3, 'b': 2}

>>> for key in sorted(D):
        print(key, '=>', D[key])

a => 1
b => 2
c => 3

Besides showcasing dictionaries, this use case serves to introduce the Python for loop. The for loop is a simple and efficient way to step through all the items in a sequence and run a block of code for each item in turn. A user-defined loop variable (key, here) is used to reference the current item each time through. The net effect in our example is to print the unordered dictionary’s keys and values, in sorted-key order.

The for loop, and its more general colleague the while loop, are the main ways we code repetitive tasks as statements in our scripts. Really, though, the for loop, like its relative the list comprehension introduced earlier, is a sequence operation. It works on any object that is a sequence and, like the list comprehension, even on some things that are not. Here, for example, it is stepping across the characters in a string, printing the uppercase version of each as it goes:

>>> for c in 'spam':
        print(c.upper())

S
P
A
M

Python’s while loop is a more general sort of looping tool; it’s not limited to stepping across sequences, but generally requires more code to do so:

>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1

spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!

We’ll discuss looping statements, syntax, and tools in depth later in the book. First, though, I need to confess that this section has not been as forthcoming as it might have been. Really, the for loop, and all its cohorts that step through objects from left to right, are not just sequence operations, they are iterable operations—as the next section describes.

Iteration and Optimization

If the last section’s for loop looks like the list comprehension expression introduced earlier, it should: both are really general iteration tools. In fact, both will work on any iterable object that follows the iteration protocol—pervasive ideas in Python that underlie all its iteration tools.

In a nutshell, an object is iterable if it is either a physically stored sequence in memory, or an object that generates one item at a time in the context of an iteration operation—a sort of “virtual” sequence. More formally, both types of objects are considered iterable because they support the iteration protocol—they respond to the iter call with an object that advances in response to next calls and raises an exception when finished producing values.

The generator comprehension expression we saw earlier is such an object: its values aren’t stored in memory all at once, but are produced as requested, usually by iteration tools. Python file objects similarly iterate line by line when used by an iteration tool: file content isn’t in a list, it’s fetched on demand. Both are iterable objects in Python—a category that expands in 3.X to include core tools like range and map. By deferring results as needed, these tools can both save memory and minimize delays.

I’ll have more to say about the iteration protocol later in this book. For now, keep in mind that every Python tool that scans an object from left to right uses the iteration protocol. This is why the sorted call used in the prior section works on the dictionary directly—we don’t have to call the keys method to get a sequence because dictionaries are iterable objects, with a next that returns successive keys.

It may also help you to see that any list comprehension expression, such as this one, which computes the squares of a list of numbers:

>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]

can always be coded as an equivalent for loop that builds the result list manually by appending as it goes:

>>> squares = []
>>> for x in [1, 2, 3, 4, 5]:          # This is what a list comprehension does
        squares.append(x ** 2)         # Both run the iteration protocol internally

>>> squares
[1, 4, 9, 16, 25]

Both tools leverage the iteration protocol internally and produce the same result. The list comprehension, though, and related functional programming tools like map and filter, will often run faster than a for loop today on some types of code (perhaps even twice as fast)—a property that could matter in your programs for large data sets. Having said that, though, I should point out that performance measures are tricky business in Python because it optimizes so much, and they may vary from release to release.

A major rule of thumb in Python is to code for simplicity and readability first and worry about performance later, after your program is working, and after you’ve proved that there is a genuine performance concern. More often than not, your code will be quick enough as it is. If you do need to tweak code for performance, though, Python includes tools to help you out, including the time and timeit modules for timing the speed of alternatives, and the profile module for isolating bottlenecks.

You’ll find more on these later in this book (see especially Chapter 21’s benchmarking case study) and in the Python manuals. For the sake of this preview, let’s move ahead to the next core data type.

Tuples

The tuple object (pronounced “toople” or “tuhple,” depending on whom you ask) is roughly like a list that cannot be changed—tuples are sequences, like lists, but they are immutable, like strings. Functionally, they’re used to represent fixed collections of items: the components of a specific calendar date, for instance. Syntactically, they are normally coded in parentheses instead of square brackets, and they support arbitrary types, arbitrary nesting, and the usual sequence operations:

>>> T = (1, 2, 3, 4)            # A 4-item tuple
>>> len(T)                      # Length
4

>> T + (5, 6)                   # Concatenation
(1, 2, 3, 4, 5, 6)

>>> T[0]                        # Indexing, slicing, and more
1

Tuples also have type-specific callable methods as of Python 2.6 and 3.0, but not nearly as many as lists:

>>> T.index(4)                  # Tuple methods: 4 appears at offset 3
3
>>> T.count(4)                  # 4 appears once
1

The primary distinction for tuples is that they cannot be changed once created. That is, they are immutable sequences (one-item tuples like the one here require a trailing comma):

>>> T[0] = 2                    # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment

>>> T = (2,) + T[1:]            # Make a new tuple for a new value
>>> T
(2, 2, 3, 4)

Like lists and dictionaries, tuples support mixed types and nesting, but they don’t grow and shrink because they are immutable (the parentheses enclosing a tuple’s items can often be omitted, as done here; in contexts where commas don’t otherwise matter, the commas are what actually builds a tuple):

>>> T = 'spam', 3.0, [11, 22, 33]
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'

Why Tuples?

So, why have a type that is like a list, but supports fewer operations? Frankly, tuples are not generally used as often as lists in practice, but their immutability is the whole point. If you pass a collection of objects around your program as a list, it can be changed anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity constraint that is convenient in programs larger than those we’ll write here. We’ll talk more about tuples later in the book, including an extension that builds upon them called named tuples. For now, though, let’s jump ahead to our last major core type: the file.

Files

File objects are Python code’s main interface to external files on your computer. They can be used to read and write text memos, audio clips, Excel documents, saved email messages, and whatever else you happen to have stored on your machine. Files are a core type, but they’re something of an oddball—there is no specific literal syntax for creating them. Rather, to create a file object, you call the built-in open function, passing in an external filename and an optional processing mode as strings.

For example, to create a text output file, you would pass in its name and the 'w' processing mode string to write data:

>>> f = open('data.txt', 'w')      # Make a new file in output mode ('w' is write)
>>> f.write('Hello\n')             # Write strings of characters to it
6
>>> f.write('world\n')             # Return number of items written in Python 3.X
6
>>> f.close()                      # Close to flush output buffers to disk

This creates a file in the current directory and writes text to it (the filename can be a full directory path if you need to access a file elsewhere on your computer). To read back what you just wrote, reopen the file in 'r' processing mode, for reading text input—this is the default if you omit the mode in the call. Then read the file’s content into a string, and display it. A file’s contents are always a string in your script, regardless of the type of data the file contains:

>>> f = open('data.txt')           # 'r' (read) is the default processing mode
>>> text = f.read()                # Read entire file into a string
>>> text
'Hello\nworld\n'

>>> print(text)                    # print interprets control characters
Hello
world

>>> text.split()                   # File content is always a string
['Hello', 'world']

Other file object methods support additional features we don’t have time to cover here. For instance, file objects provide more ways of reading and writing (read accepts an optional maximum byte/character size, readline reads one line at a time, and so on), as well as other tools (seek moves to a new file position). As we’ll see later, though, the best way to read a file today is to not read it at all—files provide an iterator that automatically reads line by line in for loops and other contexts:

>>> for line in open('data.txt'): print(line)

We’ll meet the full set of file methods later in this book, but if you want a quick preview now, run a dir call on any open file and a help on any of the method names that come back:

>>> dir(f)
[ ...many names omitted...
'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush',
'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable',
'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable',
'write', 'writelines']

>>>help(f.seek)
...try it and see...

Binary Bytes Files

The prior section’s examples illustrate file basics that suffice for many roles. Technically, though, they rely on either the platform’s Unicode encoding default in Python 3.X, or the 8-bit byte nature of files in Python 2.X. Text files always encode strings in 3.X, and blindly write string content in 2.X. This is irrelevant for the simple ASCII data used previously, which maps to and from file bytes unchanged. But for richer types of data, file interfaces can vary depending on both content and the Python line you use.

As hinted when we met strings earlier, Python 3.X draws a sharp distinction between text and binary data in files: text files represent content as normal str strings and perform Unicode encoding and decoding automatically when writing and reading data, while binary files represent content as a special bytes string and allow you to access file content unaltered. Python 2.X supports the same dichotomy, but doesn’t impose it as rigidly, and its tools differ.

For example, binary files are useful for processing media, accessing data created by C programs, and so on. To illustrate, Python’s struct module can both create and unpack packed binary data—raw bytes that record values that are not Python objects—to be written to a file in binary mode. We’ll study this technique in detail later in the book, but the concept is simple: the following creates a binary file in Python 3.X (binary files work the same in 2.X, but the “b” string literal prefix isn’t required and won’t be displayed):

>>> import struct
>>> packed = struct.pack('>i4sh', 7, b'spam', 8)     # Create packed binary data
>>> packed                                           # 10 bytes, not objects or text
b'\x00\x00\x00\x07spam\x00\x08'
>>>
>>> file = open('data.bin', 'wb')                    # Open binary output file
>>> file.write(packed)                               # Write packed binary data
10
>>> file.close()

Reading binary data back is essentially symmetric; not all programs need to tread so deeply into the low-level realm of bytes, but binary files make this easy in Python:

>>> data = open('data.bin', 'rb').read()              # Open/read binary data file
>>> data                                              # 10 bytes, unaltered
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8]                                         # Slice bytes in the middle
b'spam'
>>> list(data)                                        # A sequence of 8-bit bytes
[0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
>>> struct.unpack('>i4sh', data)                      # Unpack into objects again
(7, b'spam', 8)

Unicode Text Files

Text files are used to process all sorts of text-based data, from memos to email content to JSON and XML documents. In today’s broader interconnected world, though, we can’t really talk about text without also asking “what kind?”—you must also know the text’s Unicode encoding type if either it differs from your platform’s default, or you can’t rely on that default for data portability reasons.

Luckily, this is easier than it may sound. To access files containing non-ASCII Unicode text of the sort introduced earlier in this chapter, we simply pass in an encoding name if the text in the file doesn’t match the default encoding for our platform. In this mode, Python text files automatically encode on writes and decode on reads per the encoding scheme name you provide. In Python 3.X:

>>> S = 'sp\xc4m'                                          # Non-ASCII Unicode text
>>> S
'spÄm'
>>> S[2]                                                   # Sequence of characters
'Ä'

>>> file = open('unidata.txt', 'w', encoding='utf-8')      # Write/encode UTF-8 text
>>> file.write(S)                                          # 4 characters written
4
>>> file.close()

>>> text = open('unidata.txt', encoding='utf-8').read()    # Read/decode UTF-8 text
>>> text
'spÄm'
>>> len(text)                                              # 4 chars (code points)
4

This automatic encoding and decoding is what you normally want. Because files handle this on transfers, you may process text in memory as a simple string of characters without concern for its Unicode-encoded origins. If needed, though, you can also see what’s truly stored in your file by stepping into binary mode:

>>> raw = open('unidata.txt', 'rb').read()                 # Read raw encoded bytes
>>> raw
b'sp\xc3\x84m'
>>> len(raw)                                               # Really 5 bytes in UTF-8
5

You can also encode and decode manually if you get Unicode data from a source other than a file—parsed from an email message or fetched over a network connection, for example:

>>> text.encode('utf-8')                                   # Manual encode to bytes
b'sp\xc3\x84m'
>>> raw.decode('utf-8')                                    # Manual decode to str
'spÄm'

This is also useful to see how text files would automatically encode the same string differently under different encoding names, and provides a way to translate data to different encodings—it’s different bytes in files, but decodes to the same string in memory if you provide the proper encoding name:

>>> text.encode('latin-1')                                 # Bytes differ in others
b'sp\xc4m'
>>> text.encode('utf-16')
b'\xff\xfes\x00p\x00\xc4\x00m\x00'

>>> len(text.encode('latin-1')), len(text.encode('utf-16'))
(4, 10)

>>> b'\xff\xfes\x00p\x00\xc4\x00m\x00'.decode('utf-16')    # But same string decoded
'spÄm'

This all works more or less the same in Python 2.X, but Unicode strings are coded and display with a leading “u,” byte strings don’t require or show a leading “b,” and Unicode text files must be opened with codecs.open, which accepts an encoding name just like 3.X’s open, and uses the special unicode string to represent content in memory. Binary file mode may seem optional in 2.X since normal files are just byte-based data, but it’s required to avoid changing line ends if present (more on this later in the book):

>>> import codecs
>>> codecs.open('unidata.txt', encoding='utf8').read()     # 2.X: read/decode text
u'sp\xc4m'
>>> open('unidata.txt', 'rb').read()                       # 2.X: read raw bytes
'sp\xc3\x84m'
>>> open('unidata.txt').read()                             # 2.X: raw/undecoded too
'sp\xc3\x84m'

Although you won’t generally need to care about this distinction if you deal only with ASCII text, Python’s strings and files are an asset if you deal with either binary data (which includes most types of media) or text in internationalized character sets (which includes most content on the Web and Internet at large today). Python also supports non-ASCII file names (not just content), but it’s largely automatic; tools such as walkers and listers offer more control when needed, though we’ll defer further details until Chapter 37.

Other File-Like Tools

The open function is the workhorse for most file processing you will do in Python. For more advanced tasks, though, Python comes with additional file-like tools: pipes, FIFOs, sockets, keyed-access files, persistent object shelves, descriptor-based files, relational and object-oriented database interfaces, and more. Descriptor files, for instance, support file locking and other low-level tools, and sockets provide an interface for networking and interprocess communication. We won’t cover many of these topics in this book, but you’ll find them useful once you start programming Python in earnest.

Other Core Types

Beyond the core types we’ve seen so far, there are others that may or may not qualify for membership in the category, depending on how broadly it is defined. Sets, for example, are a recent addition to the language that are neither mappings nor sequences; rather, they are unordered collections of unique and immutable objects. You create sets by calling the built-in set function or using new set literals and expressions in 3.X and 2.7, and they support the usual mathematical set operations (the choice of new {...} syntax for set literals makes sense, since sets are much like the keys of a valueless dictionary):

>>> X = set('spam')                 # Make a set out of a sequence in 2.X and 3.X
>>> Y = {'h', 'a', 'm'}             # Make a set with set literals in 3.X and 2.7

>>> X, Y                            # A tuple of two sets without parentheses
({'m', 'a', 'p', 's'}, {'m', 'a', 'h'})

>>> X & Y                           # Intersection
{'m', 'a'}
>>> X | Y                           # Union
{'m', 'h', 'a', 'p', 's'}
>>> X - Y                           # Difference
{'p', 's'}
>>> X > Y                           # Superset
False

>>> {n ** 2 for n in [1, 2, 3, 4]}  # Set comprehensions in 3.X and 2.7
{16, 1, 4, 9}

Even less mathematically inclined programmers often find sets useful for common tasks such as filtering out duplicates, isolating differences, and performing order-neutral equality tests without sorting—in lists, strings, and all other iterable objects:

>>> list(set([1, 2, 1, 3, 1]))      # Filtering out duplicates (possibly reordered)
[1, 2, 3]
>>> set('spam') - set('ham')        # Finding differences in collections
{'p', 's'}
>>> set('spam') == set('asmp')      # Order-neutral equality ('spam'=='asmp' False)
True

Sets also support in membership tests, though all other collection types in Python do too:

>>> 'p' in set('spam'), 'p' in 'spam', 'ham' in ['eggs', 'spam', 'ham']
(True, True, True)

In addition, Python recently grew a few new numeric types: decimal numbers, which are fixed-precision floating-point numbers, and fraction numbers, which are rational numbers with both a numerator and a denominator. Both can be used to work around the limitations and inherent inaccuracies of floating-point math:

>>> 1 / 3                           # Floating-point (add a .0 in Python 2.X)
0.3333333333333333
>>> (2/3) + (1/2)
1.1666666666666665

>>> import decimal                  # Decimals: fixed precision
>>> d = decimal.Decimal('3.141')
>>> d + 1
Decimal('4.141')

>>> decimal.getcontext().prec = 2
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.33')

>>> from fractions import Fraction  # Fractions: numerator+denominator
>>> f = Fraction(2, 3)
>>> f + 1
Fraction(5, 3)
>>> f + Fraction(1, 2)
Fraction(7, 6)

Python also comes with Booleans (with predefined True and False objects that are essentially just the integers 1 and 0 with custom display logic), and it has long supported a special placeholder object called None commonly used to initialize names and objects:

>>> 1 > 2, 1 < 2                    # Booleans
(False, True)
>>> bool('spam')                    # Object's Boolean value
True

>>> X = None                        # None placeholder
>>> print(X)
None
>>> L = [None] * 100                # Initialize a list of 100 Nones
>>> L
[None, None, None, None, None, None, None, None, None, None, None, None,
None, None, None, None, None, None, None, None, ...a list of 100 Nones...]

How to Break Your Code’s Flexibility

I’ll have more to say about all of Python’s object types later, but one merits special treatment here. The type object, returned by the type built-in function, is an object that gives the type of another object; its result differs slightly in 3.X, because types have merged with classes completely (something we’ll explore in the context of “new-style” classes in Part VI). Assuming L is still the list of the prior section:

# In Python 2.X:
>>> type(L)                         # Types: type of L is list type object
<type 'list'>
>>> type(type(L))                   # Even types are objects
<type 'type'>

# In Python 3.X:
>>> type(L)                         # 3.X: types are classes, and vice versa
<class 'list'>
>>> type(type(L))                   # See Chapter 32 for more on class types
<class 'type'>

Besides allowing you to explore your objects interactively, the type object in its most practical application allows code to check the types of the objects it processes. In fact, there are at least three ways to do so in a Python script:

>>> if type(L) == type([]):         # Type testing, if you must...
        print('yes')

yes
>>> if type(L) == list:             # Using the type name
        print('yes')

yes
>>> if isinstance(L, list):         # Object-oriented tests
        print('yes')

yes

Now that I’ve shown you all these ways to do type testing, however, I am required by law to tell you that doing so is almost always the wrong thing to do in a Python program (and often a sign of an ex-C programmer first starting to use Python!). The reason why won’t become completely clear until later in the book, when we start writing larger code units such as functions, but it’s a (perhaps the) core Python concept. By checking for specific types in your code, you effectively break its flexibility—you limit it to working on just one type. Without such tests, your code may be able to work on a whole range of types.

This is related to the idea of polymorphism mentioned earlier, and it stems from Python’s lack of type declarations. As you’ll learn, in Python, we code to object interfaces (operations supported), not to types. That is, we care what an object does, not what it is. Not caring about specific types means that code is automatically applicable to many of them—any object with a compatible interface will work, regardless of its specific type. Although type checking is supported—and even required in some rare cases—you’ll see that it’s not usually the “Pythonic” way of thinking. In fact, you’ll find that polymorphism is probably the key idea behind using Python well.

User-Defined Classes

We’ll study object-oriented programming in Python—an optional but powerful feature of the language that cuts development time by supporting programming by customization—in depth later in this book. In abstract terms, though, classes define new types of objects that extend the core set, so they merit a passing glance here. Say, for example, that you wish to have a type of object that models employees. Although there is no such specific core type in Python, the following user-defined class might fit the bill:

>>> class Worker:
         def __init__(self, name, pay):          # Initialize when created
             self.name = name                    # self is the new object
             self.pay  = pay
         def lastName(self):
             return self.name.split()[-1]        # Split string on blanks
         def giveRaise(self, percent):
             self.pay *= (1.0 + percent)         # Update pay in place

This class defines a new kind of object that will have name and pay attributes (sometimes called state information), as well as two bits of behavior coded as functions (normally called methods). Calling the class like a function generates instances of our new type, and the class’s methods automatically receive the instance being processed by a given method call (in the self argument):

>>> bob = Worker('Bob Smith', 50000)             # Make two instances
>>> sue = Worker('Sue Jones', 60000)             # Each has name and pay attrs
>>> bob.lastName()                               # Call method: bob is self
'Smith'
>>> sue.lastName()                               # sue is the self subject
'Jones'
>>> sue.giveRaise(.10)                           # Updates sue's pay
>>> sue.pay
66000.0

The implied “self” object is why we call this an object-oriented model: there is always an implied subject in functions within a class. In a sense, though, the class-based type simply builds on and uses core types—a user-defined Worker object here, for example, is just a collection of a string and a number (name and pay, respectively), plus functions for processing those two built-in objects.

The larger story of classes is that their inheritance mechanism supports software hierarchies that lend themselves to customization by extension. We extend software by writing new classes, not by changing what already works. You should also know that classes are an optional feature of Python, and simpler built-in types such as lists and dictionaries are often better tools than user-coded classes. This is all well beyond the bounds of our introductory object-type tutorial, though, so consider this just a preview; for full disclosure on user-defined types coded with classes, you’ll have to read on. Because classes build upon other tools in Python, they are one of the major goals of this book’s journey.

And Everything Else

As mentioned earlier, everything you can process in a Python script is a type of object, so our object type tour is necessarily incomplete. However, even though everything in Python is an “object,” only those types of objects we’ve met so far are considered part of Python’s core type set. Other types in Python either are objects related to program execution (like functions, modules, classes, and compiled code), which we will study later, or are implemented by imported module functions, not language syntax. The latter of these also tend to have application-specific roles—text patterns, database interfaces, network connections, and so on.

Moreover, keep in mind that the objects we’ve met here are objects, but not necessarily object-oriented—a concept that usually requires inheritance and the Python class statement, which we’ll meet again later in this book. Still, Python’s core objects are the workhorses of almost every Python script you’re likely to meet, and they usually are the basis of larger noncore types.

Chapter Summary

And that’s a wrap for our initial data type tour. This chapter has offered a brief introduction to Python’s core object types and the sorts of operations we can apply to them. We’ve studied generic operations that work on many object types (sequence operations such as indexing and slicing, for example), as well as type-specific operations available as method calls (for instance, string splits and list appends). We’ve also defined some key terms, such as immutability, sequences, and polymorphism.

Along the way, we’ve seen that Python’s core object types are more flexible and powerful than what is available in lower-level languages such as C. For instance, Python’s lists and dictionaries obviate most of the work you do to support collections and searching in lower-level languages. Lists are ordered collections of other objects, and dictionaries are collections of other objects that are indexed by key instead of by position. Both dictionaries and lists may be nested, can grow and shrink on demand, and may contain objects of any type. Moreover, their space is automatically cleaned up as you go. We’ve also seen that strings and files work hand in hand to support a rich variety of binary and text data.

I’ve skipped most of the details here in order to provide a quick tour, so you shouldn’t expect all of this chapter to have made sense yet. In the next few chapters we’ll start to dig deeper, taking a second pass over Python’s core object types that will fill in details omitted here, and give you a deeper understanding. We’ll start off the next chapter with an in-depth look at Python numbers. First, though, here is another quiz to review.

Test Your Knowledge: Quiz

We’ll explore the concepts introduced in this chapter in more detail in upcoming chapters, so we’ll just cover the big ideas here:

Name four of Python’s core data types.
Why are they called “core” data types?
What does “immutable” mean, and which three of Python’s core types are considered immutable?
What does “sequence” mean, and which three types fall into that category?
What does “mapping” mean, and which core type is a mapping?
What is “polymorphism,” and why should you care?

Test Your Knowledge: Answers

Numbers, strings, lists, dictionaries, tuples, files, and sets are generally considered to be the core object (data) types. Types, None, and Booleans are sometimes classified this way as well. There are multiple number types (integer, floating point, complex, fraction, and decimal) and multiple string types (simple strings and Unicode strings in Python 2.X, and text strings and byte strings in Python 3.X).
They are known as “core” types because they are part of the Python language itself and are always available; to create other objects, you generally must call functions in imported modules. Most of the core types have specific syntax for generating the objects: 'spam', for example, is an expression that makes a string and determines the set of operations that can be applied to it. Because of this, core types are hardwired into Python’s syntax. In contrast, you must call the built-in open function to create a file object (even though this is usually considered a core type too).
An “immutable” object is an object that cannot be changed after it is created. Numbers, strings, and tuples in Python fall into this category. While you cannot change an immutable object in place, you can always make a new one by running an expression. Bytearrays in recent Pythons offer mutability for text, but they are not normal strings, and only apply directly to text if it’s a simple 8-bit kind (e.g., ASCII).
A “sequence” is a positionally ordered collection of objects. Strings, lists, and tuples are all sequences in Python. They share common sequence operations, such as indexing, concatenation, and slicing, but also have type-specific method calls. A related term, “iterable,” means either a physical sequence, or a virtual one that produces its items on request.
The term “mapping” denotes an object that maps keys to associated values. Python’s dictionary is the only mapping type in the core type set. Mappings do not maintain any left-to-right positional ordering; they support access to data stored by key, plus type-specific method calls.
“Polymorphism” means that the meaning of an operation (like a +) depends on the objects being operated on. This turns out to be a key idea (perhaps the key idea) behind using Python well—not constraining code to specific types makes that code automatically applicable to many types.

¹ Pardon my formality. I’m a computer scientist.

² In this book, the term literal simply means an expression whose syntax generates an object—sometimes also called a constant. Note that the term “constant” does not imply objects or variables that can never be changed (i.e., this term is unrelated to C++’s const or Python’s “immutable”—a topic explored in the section “Immutability”).

³ This matrix structure works for small-scale tasks, but for more serious number crunching you will probably want to use one of the numeric extensions to Python, such as the open source NumPy and SciPy systems. Such tools can store and process large matrixes much more efficiently than our nested list structure. NumPy has been said to turn Python into the equivalent of a free and more powerful version of the Matlab system, and organizations such as NASA, Los Alamos, JPL, and many others use this tool for scientific and financial tasks. Search the Web for more details.

⁴ Two application notes here. First, as a preview, the rec record we just created really could be an actual database record, when we employ Python’s object persistence system—an easy way to store native Python objects in simple files or access-by-key databases, which translates objects to and from serial byte streams automatically. We won’t go into details here, but watch for coverage of Python’s pickle and shelve persistence modules in Chapter 9, Chapter 28, Chapter 31, and Chapter 37, where we’ll explore them in the context of files, an OOP use case, classes, and 3.X changes, respectively.

Second, if you are familiar with JSON (JavaScript Object Notation)—an emerging data-interchange format used for databases and network transfers—this example may also look curiously similar, though Python’s support for variables, arbitrary expressions, and changes can make its data structures more general. Python’s json library module supports creating and parsing JSON text, but the translation to Python objects is often trivial. Watch for a JSON example that uses this record in Chapter 9 when we study files. For a larger use case, see MongoDB, which stores data using a language-neutral binary-encoded serialization of JSON-like documents, and its PyMongo interface.

Get Learning Python, 5th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Chapter 4. Introducing Python Object Types

The Python Conceptual Hierarchy

Note

Why Use Built-in Types?

Python’s Core Data Types

Numbers

Strings

Sequence Operations

Immutability

Type-Specific Methods

Getting Help

Other Ways to Code Strings

Unicode Strings

Pattern Matching

Lists

Sequence Operations

Type-Specific Operations

Bounds Checking

Nesting

Comprehensions

Dictionaries

Mapping Operations

Nesting Revisited

Missing Keys: if Tests

Sorting Keys: for Loops

Iteration and Optimization

Tuples

Why Tuples?

Files

Binary Bytes Files

Unicode Text Files

Other File-Like Tools

Other Core Types

How to Break Your Code’s Flexibility

User-Defined Classes

And Everything Else

Chapter Summary

Test Your Knowledge: Quiz

Test Your Knowledge: Answers

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly