Chapter 1. Primitives
Computer programs manipulate data. This chapter describes the simplest kinds of Python data and the simplest ways of manipulating them. An individual item of data is called a value. Every value in Python has a type that identifies the kind of value it is. For example, the type of 2 is int. You’ll get more comfortable with the concepts of types and values as you see more examples.
The Preface pointed out that Python is a multiparadigm programming language. The terms “type” and “value” come from traditional procedural programming. The equivalent object-oriented terms are class and object. We’ll mostly use the terms “type” and “value” early on, then gradually shift to using “class” and “object” more frequently. Although Python’s history is tied more to object-oriented programming than to traditional programming, we’ll use the term instance with both terminologies: each value is an instance of a particular type, and each object is an instance of a particular class.
Simple Values
Types for some simple kinds of values are an integral part of Python’s implementation. Four of these are used far more frequently than others: logical (Boolean), integer, float, and string. There is also a special no-value value called None.
When you enter a value in the Python interpreter, it prints it on the following line:
>>> 90
90
>>>
When the value is None, nothing is printed, since None means “nothing”:
>>> None
>>>
If you type something Python finds unacceptable in some way, you will see a multiline message describing the problem. Most of what this message says won’t make sense until we’ve covered some other topics, but the last line should be easy to understand and you should learn to pay attention to it. For example:
>>> Non
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    Non
NameError: name 'Non' is not defined
>>>
When a # symbol appears on a line of code (outside of a string), Python ignores it and the rest of the line. Text following the # is called a comment. Typically comments offer information about the code to aid the reader, but they can include many other kinds of text: a programmer’s notes to fix or investigate something, a reference (documentation entry, book title, URL, etc.), and so on. They can even be used to “comment out” code lines that are not working or are obsolete but still of interest. The code examples that follow include occasional comments that point out important details.
Booleans
There are only two Boolean values: True and False. Their type is bool. Python names are “case-sensitive,” so true is not the same as True:
>>> True
True
>>> False
False
Integers
There’s not much to say about Python integers. Their type is int, and they can have as many digits as you want. They may be preceded by a plus or minus sign. Separators such as commas or periods are not used:
>>> 14
14
>>> -1
-1
>>> 1112223334445556667778889990000000000000    # a very large integer!
1112223334445556667778889990000000000000
Warning
Python 2: A distinction is made between integers that fit within a certain (large) range and those that are larger; the latter are a separate type called long.
Integers can also be entered in hexadecimal notation, which uses base 16 instead of base 10. The letters A through F represent the hexadecimal digits 10 through 15. Hexadecimal notation begins with 0x. For example:
>>> 0x12     # (1 x 16) + 2
18
>>> 0xA40    # (10 x 16 x 16) + (4 x 16) + 0
2624
>>> 0xFF     # (15 x 16) + 15
255
The result of entering a hexadecimal number is still an integer; the only difference is in how you write it. Hexadecimal notation is used in a lot of computer-related contexts because each hexadecimal digit occupies one half-byte. For instance, colors on a web page can be specified as a set of three one-byte values indicating the red, green, and blue levels, such as FFA040.
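As a rough sketch of the half-byte point, the following pulls the three one-byte levels out of the color value just mentioned. The names color, red, green, and blue are illustrative only, and the // and % operators it relies on are described later in this chapter.

```python
# Split the web color FFA040 into its red, green, and blue levels.
color = 0xFFA040                  # two hex digits per one-byte level

red = color // 0x10000            # the two leftmost hex digits (FF)
green = color // 0x100 % 0x100    # the middle two hex digits (A0)
blue = color % 0x100              # the two rightmost hex digits (40)

print(red, green, blue)           # 255 160 64
print(int('FFA040', 16) == color) # int can also read hex text: True
```

Each level fits in one byte because two hexadecimal digits can express exactly the values 0 through 255.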
Floats
“Float” is an abbreviated version of the term “floating point,” which refers to a number that is represented in computer hardware in the equivalent of scientific notation. Such numbers consist of two parts: digits and an exponent. The exponent is adjusted so the decimal point “floats” to just after the first digit (or just before, depending on the implementation), as in scientific notation.
When Python prints a float in ordinary notation, the result always contains a decimal point and at least one digit after it:
>>> 2.5
2.5
You might occasionally see floats represented in a form of scientific notation, with the letter “e” separating the base from the exponent. When Python prints a number in scientific notation it will always have a single leading digit, optionally a decimal point followed by more digits, a + or - following the e, and finally an integer. On input, there can be more than one digit before the decimal point. Regardless of the form used when entering a float, Python will output very small and very large numbers using scientific notation. (The exact cutoffs are dependent on the Python implementation.) Here are some examples:
>>> 2e4                    # Scientific notation, but...
20000.0                    # within the range of ordinary floats.
>>> 2e-2
0.02
>>> .0001                  # Within the range of ordinary floats
0.0001                     # so printed as an ordinary float.
>>> .00001                 # An innocent-looking float that is
1e-05                      # smaller than the lower limit, so e.
>>> 1002003004005000.      # A float with many digits that is
1002003004005000.0         # smaller than the upper limit, so no e.
>>> 100200300400500060.    # Finally, a float that is larger than the
1.0020030040050006e+17     # upper limit, so printed with an e.
Strings
Strings are series of Unicode[5] characters. Their type is str. Many languages have a separate “character” type, but Python does not: a lone character is simply a string of length one. A string is enclosed in a pair of single or double quotes. Other than style preference, the main reason to choose one or the other kind of quote is to make it convenient to include the other kind inside a string.
If you want a string to span multiple lines, you must enclose it in a matched pair of three single or double quotes. Adding a backslash in front of certain characters causes those characters to be treated specially; in particular, '\n' represents a line break and '\t' represents a tab.
Warning
Python 2: Strings are composed of one-byte characters, not Unicode characters; there is a separate string type for Unicode, designated by preceding the string’s opening quote with the character u.
We will be working with strings a lot throughout this book, especially in representing DNA/RNA base and amino acid sequences. Here are the amino acid sequences for some unusually small bacterial restriction enzymes:[6]
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
>>> "AARHQGRGAPCGESFWHWALGADGGHGHAQPPFRSSRLIGAERQPTSDCRQSLQQSPPC"
'AARHQGRGAPCGESFWHWALGADGGHGHAQPPFRSSRLIGAERQPTSDCRQSLQQSPPC'
>>> """MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GSLFPGEELK
SLLKKTPDVV KAFPLLLAVR DESISLLD"""
'MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GSLFPGEELK\nSLLKKTPDVV KAFPLLLAVR DESISLLD'
>>> '''MWNSNLPKPN AIYVYGVANA NITFFKGSDI LSYETREVLL KYFDILDKDE RSLKNALKDL ENPFGFAPYI
RKAYEHKRNF LTTTRLKASF RPTTF'''
'MWNSNLPKPN AIYVYGVANA NITFFKGSDI LSYETREVLL KYFDILDKDE RSLKNALKDL ENPFGFAPYI\nRKAYEHKRNF LTTTRLKASF RPTTF'
There are three situations that cause input or output to begin on a new line:
You hit Return as you are typing inside a triple-quoted string.
You keep typing characters until they “wrap around” to the next line before you press Return.
The interpreter responds with a string that is too long to fit on one line.
Only the first one is “real.” The other two are simply the effect of output “line wrapping” like what you would see in text editors or email programs. In the second and third situations, if you change the width of the window the input and output strings will be “rewrapped” to fit the new width. The first case does not cause a corresponding line break when the interpreter prints the string; instead, the Return you typed becomes a '\n' in the string.
Normally, Python uses a pair of single quotes to enclose strings it prints. However, if the string contains single quotes (and no double quotes), it will use double quotes. It never prints strings using triple quotes; instead, the line breaks typed inside the string become '\n's.
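The display rules above can be checked with a short sketch. It assumes the built-in function repr, which is not covered in this chapter; repr returns the text the interpreter would show for a value. The name two_lines and the sample strings are made up for illustration.

```python
# A triple-quoted string with one Return typed inside it.
two_lines = """ACGT
TTGA"""

print(repr(two_lines))    # 'ACGT\nTTGA'  (the Return became \n)
print(len(two_lines))     # 9: eight letters plus one line break
print(repr("3' end"))     # "3' end"  (double quotes, since it contains ')
print(two_lines)          # print shows the actual line break instead
```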
Expressions
An operator is a symbol that indicates a calculation using one or more operands. The combination of the operator and its operand(s) is an expression.
Numeric Operators
A unary operator is one that is followed by a single operand. A binary operator is one that appears between two operands. It isn’t necessary to surround operators with spaces, but it is good style to do so. Incidentally, when used in a numeric expression, False is treated as 0 and True as 1.
Plus and minus can be used as either unary or binary operators:
>>> -1        # unary minus
-1
>>> 4 + 2
6
>>> 4 - 1
3
>>> 4 * 3
12
The power operator is ** (i.e., n raised to the power k is written n ** k):
>>> 2 ** 10
1024
There are three operators for the division of one integer by another: / produces a float, // (floor division) an integer with the remainder ignored, and % (modulo) the remainder of the floor division. The formal definition of floor division is “the largest integer not greater than the result of the division”:
>>> 11 / 4
2.75
>>> 11 // 4     # "floor" division
2
>>> 11 % 4      # remainder of 11 // 4
3
Warning
Python 2: The / operator performs floor division when both operands are ints, but ordinary division if one or both operands are floats.
Whenever one or both of the operands in an arithmetic expression is a float, the result will be a float:
>>> 2.0 + 1
3.0
>>> 12 * 2.5
30.0
>>> 7.5 // 2
3.0
Warning
While the value of floor division is equal to an integer value, its type may not be integer! If both operands are ints, the result will be an int, but if either or both are floats, the result will be a float that represents an integer.
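A quick way to see the difference is to ask for the type of each result. This is only a sketch; the names int_result and float_result are illustrative, and type here is the built-in that reports a value’s type.

```python
int_result = 11 // 4         # both operands are ints
float_result = 11.0 // 4     # one operand is a float

print(int_result, type(int_result))        # 2 <class 'int'>
print(float_result, type(float_result))    # 2.0 <class 'float'>
print(int_result == float_result)          # True: equal values, different types

# "Largest integer not greater than the result" matters for negatives:
print(-11 // 4, -11 % 4)                   # -3 1
```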
The result of an operation does not always print the way you might expect. Consider the following numbers:
>>> .009
0.009
>>> .01
0.01
>>> .029
0.029
>>> .03
0.03
>>> .001
0.001
So far, everything is as expected. If we subtract the first from the second and the third from the fourth, we should in both cases get the result .001. Typing in .001 also gives the expected result. However, typing in the subtraction operations does not:
>>> .03 - .029
0.0009999999999999974
>>> .01 - .009
0.0010000000000000009
Strange results like this arise from two sources:
For a given base, only some rational numbers have “exact” representations, that is, representations that terminate after a finite number of digits. The rest end in an infinitely repeating sequence of digits (e.g., 1/3 = 0.3333333... in base 10).
A computer stores numbers in a finite number of binary digits; a rational number may in fact have an exact binary representation, but one that would require more digits than are used.
Note
A rational number is one that can be expressed as a/b, where a and b are integers and b is not zero; the decimal-point expression of a rational number in a given number system either has a finite number of digits or ends with an infinitely repeating sequence of digits. There’s nothing wrong with the binary system: whatever base is used, some real numbers have exact representations and others don’t. Just as only some rational numbers have exact decimal representations, only some rational numbers have exact binary representations.
As you can see from the results of the two subtraction operations, the difference between the ideal rational number and its actual representation is quite small, but in certain kinds of computations the differences do accumulate.[7]
Here’s an early lesson in an extremely important programming principle: don’t trust what you see! Everything printed in a computing environment or by a programming language is an interpretation of an internal representation. That internal representation may be manipulated in ways that are intended to be helpful but can be misleading. In the preceding example, 0.009 in fact does not have an exact binary representation. In Python 2, it would have printed as 0.0089999999999999993, and 0.03 would have printed as 0.029999999999999999. The difference is that Python 3 implements a more sophisticated printing mechanism for rational numbers that makes some of them look as they would have had you typed them.
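One way to look behind the printed form is sketched below. It assumes two standard library tools that this chapter does not cover: decimal.Decimal, which shows the exact value stored for a float, and math.isclose, which compares floats within a tolerance. The names stored and difference are illustrative.

```python
from decimal import Decimal
import math

# Decimal(float) exposes the exact binary value, written out in decimal.
stored = Decimal(0.009)
print(stored)                             # close to, but not exactly, 0.009
print(stored == Decimal('0.009'))         # False

difference = 0.01 - 0.009
print(difference == 0.001)                # False: the tiny errors differ
print(math.isclose(difference, 0.001))    # True: equal within a tolerance
```

Comparing floats with a tolerance rather than with == is a common way to keep small representation errors from changing a program’s decisions.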
Logical Operations
Python, like other programming languages, provides operations on “truth values.” These follow the mathematical laws of Boolean logic. The classic Boolean operators are not, and, and or. In Python, those are written just that way rather than using special symbols:
>>> not True
False
>>> not False
True
>>> True and True
True
>>> True and False
False
>>> True or True
True
>>> True or False
True
>>> False and False
False
>>> False or True
True
Warning
The results of and and or operations are not converted to Booleans. For and expressions, the first operand is returned if it is false; otherwise, the second operand is returned. For or expressions, the first operand is returned if it is true; otherwise, the second operand is returned. For example:
>>> '' and 'A'
''                  # Not False: '' is a false value
>>> 0 and 1 or 2    # Read as (0 and 1) or 2
2                   # Not True: 2 is a true value
While confusing, this can be useful; we’ll see some examples later.
The operands of and and or can actually be anything. None, 0, 0.0, and the empty string, as well as the other kinds of “empty” values explained in Chapter 3, are considered False. Everything else is treated as True.
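The following sketch shows and and or returning one of their operands, together with the truth values of a few simple values. The names label, shown, sequence, and first are hypothetical, and the subscription sequence[0] is explained later in this chapter.

```python
label = ''                          # a made-up label that happens to be empty
shown = label or 'unnamed'          # '' is false, so or returns the second operand
print(repr(shown))                  # 'unnamed'

sequence = 'TATA'
first = sequence and sequence[0]    # 'TATA' is true, so and returns the second operand
print(repr(first))                  # 'T'

print(bool(None), bool(0), bool(0.0), bool(''))    # False False False False
print(bool('0'), bool(-1), bool(' '))              # True True True
```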
Note
To avoid repetition and awkward phrases, this book will use “true” and “false” in regular typeface to indicate values considered to be True and False, respectively. It will only use the capitalized words True and False in the code typeface when referring to those specific Boolean values.
There is one more logical operation in Python that forms a conditional expression. Written using the keywords if and else, it returns the value preceding the if when the condition (the expression between if and else) is true and the value following the else when it is false. We’ll look at some more meaningful examples a bit later, but here are a few trivial examples that show what conditional expressions look like:
>>> 'yes' if 2 - 1 else 'no'
'yes'
>>> 'no' if 1 % 2 else 'no'
'no'
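Here is a slightly less trivial sketch in which the two alternatives differ, so the effect of the condition is visible. The names count and noun are illustrative, and the example assigns names with =, which this chapter has not yet discussed.

```python
count = 1
noun = 'base' if count == 1 else 'bases'    # condition is true: value before the if
print(count, noun)                          # 1 base

count = 3
noun = 'base' if count == 1 else 'bases'    # condition is false: value after the else
print(count, noun)                          # 3 bases
```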
In addition to the Boolean operators, there are six comparison operators that return Boolean values: ==, !=, <, <=, >, and >=. These work with many different kinds of operands:
>>> 2 == 5 // 2
True
>>> 3 > 13 % 5
False
>>> 'one' < 'two'
True
>>> 'one' != 'one'
False
You may already be familiar with logical and comparison operations from other computer work you’ve done, if only entering spreadsheet formulas. If these are new to you, spend some time experimenting with them in the Python interpreter until you become comfortable with them. You will use them frequently in code you write.
String Operations
There are four binary operators that act on strings: in, not in, +, and *. The first three expect both operands to be strings. The last requires the other operand to be an integer. A one-character substring can be extracted with subscription and a longer substring by slicing. Both use square brackets, as we’ll see shortly.
String operators
The in and not in operators test whether the first string is a substring of the second one (starting at any position). The result is True or False:
>>> 'TATA' in 'TATATATATATATATATATATATA'
True
>>> 'AA' in 'TATATATATATATATATATATATA'
False
>>> 'AA' not in 'TATATATATATATATATATATATA'
True
A new string can be produced by concatenating two existing strings. The result is a string consisting of all the characters of the first operand followed by all the characters of the second. Concatenation is expressed with the plus operator:
>>> 'AC' + 'TG'
'ACTG'
>>> 'aaa' + 'ccc' + 'ttt' + 'ggg'
'aaaccctttggg'
A string can be repeated a certain number of times by multiplying it by an integer:
>>> 'TA' * 12
'TATATATATATATATATATATATA'
>>> 6 * 'TA'
'TATATATATATA'
Subscription
Subscription extracts a one-character substring of a string. Subscription is expressed with a pair of square brackets enclosing an integer-valued expression called an index. The first character is at position 0, not 1:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0]
'M'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1]
'N'
The index can also be negative, in which case the index is counted from the end of the string. The last character is at index -1:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-1]
'A'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5]
'D'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7 // 2]
'M'
As Figure 1-1 shows, starting at 0 from the beginning or end of a string, an index can be thought of as a label for the character to its right. The end of a string is the position one after the last element. If you are unfamiliar with indexing in programming languages, this is probably an easier way to visualize it than if you picture the indexes as aligned with the characters.
Attempting to extract a character before the first or after the last causes an error, as shown here:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
IndexError: string index out of range
The last line reports the nature of the error, while the next-to-last line shows the input that caused the error.
Slicing
Slicing extracts a series of characters from a string. You’ll use it often to clearly and concisely designate parts of strings. Figure 1-2 illustrates how it works.
The character positions of a slice are specified by two or three integers inside square brackets, separated by colons. The first index indicates the position of the first character to be extracted. The second index indicates where the slice ends. The character at that position is not included in the slice. A slice [m:n] would therefore be read as “from character m up to but not including character n.” (We’ll explore the use of the third index momentarily.) Here are a few slicing examples:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1:4]
'NKM'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[4:-1]
'DLVADVAEKTDLSKAKATEVIDAVF'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5:-4]
'D'
Either of the indexes can be positive, indicating “from the beginning,” or negative, indicating “from the end.” If neither of the two numbers is negative, the length of the resulting string is the difference between the second and the first. If either (or both) is negative, just add it to the length of the string to convert it to a nonnegative number.
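The conversion of negative indexes can be verified directly. The sketch below reuses the sequence from the examples; the names seq and length are illustrative.

```python
seq = 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
length = len(seq)                                   # 30

# [4:-1] selects the same characters as [4:length - 1], that is, [4:29].
print(seq[4:-1] == seq[4:length - 1])               # True
print(len(seq[4:-1]), (length - 1) - 4)             # 25 25

# [-5:-4] selects the same characters as [25:26].
print(seq[-5:-4] == seq[length - 5:length - 4])     # True
```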
What if the two numbers are the same? For example:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[5:5]
''
Since this reads as “from character 5 up to but not including character 5,” the result is an empty string. Now, what about character positions that are out of order—i.e., where the first character occurs after the second? This results in an empty string too:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-4:-6]
''
For subscription, the index must designate a character in the string, but the rules for slicing are less constraining.
When the slice includes the beginning or end of the string, that part of the slice notation may be omitted. Note that omitting the second index is not the same as providing -1 as the second index: omitting the second index says to go up to the end of the string, one past the last character, whereas -1 means go up to the penultimate character (i.e., up to but not including the last character):
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:8]
'MNKMDLVA'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[9:]
'VAEKTDLSKAKATEVIDAVFA'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[9:-1]
'VAEKTDLSKAKATEVIDAVF'
In fact, both indexes can be omitted, in which case the entire string is selected:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:]
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
Finally, as mentioned earlier, a slice operation can specify a third number, also following a colon. This number, known as a step, indicates how far to advance from each included character to the next, so a step of n skips n - 1 characters after each one that is included. When the third number is omitted, as it often is, the default is 1, meaning don’t skip any. Here’s a simple example:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0:9:3]
'MMV'
This example’s result was obtained by taking the first, fourth, and seventh characters from the string. The step can also be a negative integer. When the step is negative, the slice takes characters in reverse order. To get anything other than an empty string when you specify a negative step, the start index must be greater than the stop index:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[16:0:-4]
'SKDD'
Notice that the first character of the string is not included in this example’s results. The character at the stop index is never included. Omitting the second index so that it defaults to the beginning of the string (beginning, not end, because the step is negative) results in a string that does include the first character, assuming the step would select it. Changing the previous example to omit the 0 results in a longer string:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[16::-4]
'SKDDM'
Omitting the first index when the step is negative means start from the end of the string:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:25:-1]
'AFVA'
A simple but nonobvious slice expression produces a reversed copy of a string: s[::-1]. This reads as “starting at the end of the string, take every character up to and including the first, in reverse order”:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[::-1]
'AFVADIVETAKAKSLDTKEAVDAVLDMKNM'
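Steps are handy for sequences with a regular structure. As a sketch, the made-up nine-base string below holds three codons, and a step of 3 collects the bases at each codon position. The names dna, first_bases, second_bases, and third_bases are illustrative.

```python
dna = 'ATGGCCTAA'              # three made-up codons: ATG GCC TAA

first_bases = dna[0::3]        # first base of each codon
second_bases = dna[1::3]       # second base of each codon
third_bases = dna[2::3]        # third base of each codon
print(first_bases, second_bases, third_bases)    # AGT TCA GCA

print(dna[::-1])               # AATCCGGTA
print(dna[::-1][::-1] == dna)  # True: reversing twice restores the string
```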
Calls
We’ll look briefly at calls here, deferring details until later. A call is a kind of expression.
Function calls
The simplest kind of call invokes a function. A call to a function consists of a function name, a pair of parentheses, and zero or more argument expressions separated by commas. The function is called, does something, then returns a value. Before the function is called the argument expressions are evaluated, and the resulting values are passed to the function to be used as input to the computation it defines. An argument can be any kind of expression whose result has a type acceptable to the function. Those expressions can also include function calls.
Each function specifies the number of arguments it is prepared to receive. Most functions accept a fixed number—possibly zero—of arguments. Some accept a fixed number of required arguments plus some number of optional arguments. We will follow the convention used in the official Python documentation, which encloses optional arguments in square brackets. Some functions can even take an arbitrary number of arguments, which is shown by the use of an ellipsis.
Python has a fairly small number of “built-in” functions. Some of the more frequently used are:
len(arg)
Returns the number of characters in arg (although it’s actually more general than that, as will be discussed later)
print(args...[, sep=seprstr][, end=endstr])
Prints the arguments, of which there may be any number, separating each by a seprstr (default ' ') and omitting certain technical details such as the quotes surrounding a string, and ending with an endstr (default '\n')
input([string])
Prints string, if given, as a prompt, then returns the line the user types as a string
Warning
Python 2: print is a statement, not a function. There is no way to specify a separator. The only control over the end is that a final comma suppresses the newline.
Warning
Python 2: The input function’s name is raw_input.
Here are a few examples:
>>> len('TATA')
4
>>> print('AAT', 'AAC', 'AAG', 'AAA')
AAT AAC AAG AAA
>>> input('Enter a codon: ')
Enter a codon: CGC
'CGC'
>>>
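The optional sep and end arguments of print can be explored with a sketch like the following. To make the exact characters visible, it sends the output to an in-memory buffer using print’s file keyword argument and io.StringIO, neither of which is described in this chapter; the name buffer is illustrative.

```python
import io

buffer = io.StringIO()    # collects printed text so it can be inspected

# A dash between arguments, and '!' plus a newline at the end.
print('AAT', 'AAC', 'AAG', sep='-', end='!\n', file=buffer)

# Nothing between arguments and nothing at the end.
print('AAT', 'AAC', sep='', end='', file=buffer)

print(repr(buffer.getvalue()))    # 'AAT-AAC-AAG!\nAATAAC'
```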
Here are some common numeric functions in Python:
Types can be called as functions too. They take an argument and return a value of the type called. For example:
Here are some examples of these functions in action:
>>> str(len('TATA'))
'4'
>>> int(2.1)
2
>>> int('44')
44
>>> bool('')
False
>>> bool(' ')
True
>>> float(3)
3.0
Note
Using int is the only way to guarantee that the result of a division is an integer. As noted earlier, // is the floor division operator and results in a float if either operand is a float.
There is a built-in help facility for use in the Python interpreter. Until we’ve explored more of Python, much of what the help functions print will probably appear strange or even unintelligible. Nevertheless, the help facility is a useful tool even at this early stage. You can use either of these commands to access it:
help()
Enters the interactive help facility
help(x)
Prints information about x, which can be anything (a value, a type, a function, etc.); help for a type generally includes a long list of things that are part of the type’s implementation but not its general use, indicated by names beginning with underscores
Occasionally your code needs to test whether a value is an instance of a certain type; for example, it may do one thing with strings and another with numbers. You can do this with the following built-in function:
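Python’s built-in isinstance(value, type) performs this kind of test; the sketch below assumes that is the function intended here. The names item and kind are hypothetical.

```python
print(isinstance('TATA', str))    # True
print(isinstance(4, int))         # True
print(isinstance(4, float))       # False
print(isinstance(True, int))      # True: Boolean values also count as ints

# Doing one thing with strings and another with numbers:
item = 'CGC'
kind = 'text' if isinstance(item, str) else 'number'
print(kind)                       # text
```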
Method calls
Many different types of values can be supplied as arguments to Python’s built-in functions. Most functions, however, are part of the implementation of a specific type. These are called methods. Calling a method is just like calling a function, except that the first argument goes before the function name, followed by a period. For example, the method count returns the number of times its argument appears in the string that precedes it in the call. The following example returns 2 because the string 'DL' appears twice in the longer string:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.count('DL')
2
Except for having their first argument before the function name, calls to methods have the same features as calls to ordinary functions: optional arguments, indefinite number of arguments, etc. Here are some commonly used methods of the str type:
string1.count(string2[, start[, end]])
Returns the number of times string2 appears in string1. If start is specified, starts counting at that position in string1; if end is also specified, stops counting before that position in string1.
string1.find(string2[, start[, end]])
Returns the position of the first occurrence of string2 in string1; -1 means string2 was not found in string1. If start is specified, starts searching at that position in string1; if end is also specified, stops searching before that position in string1.
string1.startswith(string2[, start[, end]])
Returns True or False according to whether string1 starts with string2. If start is specified, uses that as the position at which to start the comparison; if end is also specified, stops comparing before that position in string1.
string1.strip([string2])
Returns a copy of string1 with all characters that appear in string2 removed from its beginning and end; if string2 is not specified, leading and trailing whitespace is removed.
string1.lstrip([string2])
Returns a copy of string1 with all characters that appear in string2 removed from its beginning; if string2 is not specified, leading whitespace is removed.
string1.rstrip([string2])
Returns a copy of string1 with all characters that appear in string2 removed from its end; if string2 is not specified, trailing whitespace is removed.
Here are some examples of method calls in action:
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL')
4
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL', 5)
14
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL', 5, 12)
-1
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.startswith('DL')
False
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.startswith('DL', 4)
True
The restriction enzyme with the amino acid sequence in these examples recognizes the site with the base sequence TCCGGA. It’s easy enough to find the first location in a DNA base sequence where this occurs:
>>> 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA'.find('TCCGGA')
27
>>>
If the recognition site did not occur in the sequence, find would have returned -1.
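The optional start argument makes it possible to look for later occurrences as well. In this sketch the DNA sequence from the example is simply repeated twice so that the site occurs more than once; the doubled sequence is made up for illustration, as are the names dna, site, first, second, and third.

```python
# The example sequence repeated twice, so the site appears two times.
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA' * 2
site = 'TCCGGA'

first = dna.find(site)                 # search from the beginning
second = dna.find(site, first + 1)     # resume just past the first match
third = dna.find(site, second + 1)     # no more matches

print(first, second, third)            # 27 74 -1
print(dna.count(site))                 # 2
```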
Compound Expressions
The examples of operator expressions that we’ve looked at thus far have had only a single operator. However, just as in traditional algebra, operators can be compounded in a series. For example:
>>> 2 * 3 + 4 - 1
9
This is read as “2*3 is 6, 6+4 is 10, and 10−1 is 9.” The story isn’t quite that simple, though. Consider the following example:
>>> 4 + 2 * 3 - 1
9
Reading from left to right, we’d have “4+2 is 6, 6*3 is 18, 18−1 is 17,” not 9. So why do we get 9 as the result? Programming languages incorporate operator precedence rules that determine the order in which operations in a series should be performed. Like most programming languages, Python performs multiplications and divisions first and then goes back and performs additions and subtractions.
You can indicate your intended interpretation of a sequence of operations by surrounding parts of an expression with parentheses. Everything inside a pair of parentheses will be evaluated completely before the result is used in another operation. For instance, parentheses could be used as follows to make the result of the preceding example be 17:
>>> (4 + 2) * 3 - 1
17
Comparisons can be combined to form “between” expressions:
>>> 1 < 4 < 6
True
>>> 2 <= 2 < 5
True
>>> 2 < 2 < 5
False
Strings can participate in sequences of operations:
>>> 'tc' in ('ttt' + 'ccc' + 'ggg' + 'aaa')
True
>>> 'tc' in 't' * 3 + 'c' * 3 + 'g' * 3 + 'a' * 3
True
The second variation demonstrates that * has a higher precedence than +, and + has a higher precedence than in. Don’t hesitate to use parentheses if you have any doubt about the interpretation of operation series.
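When in doubt, you can compare an expression with a fully parenthesized version of the reading you expect. The sketch below does this for the examples in this section; all of the names are illustrative.

```python
no_parens = 4 + 2 * 3 - 1              # 9
explicit = 4 + (2 * 3) - 1             # 9: the grouping Python applies
regrouped = (4 + 2) * 3 - 1            # 17: parentheses change the order
print(no_parens == explicit, regrouped)    # True 17

# A "between" expression behaves like two comparisons joined by and.
chained = 2 < 2 < 5
spelled_out = (2 < 2) and (2 < 5)
print(chained, spelled_out)            # False False

# * before +, and + before in:
bare = 'tc' in 't' * 3 + 'c' * 3
grouped = 'tc' in (('t' * 3) + ('c' * 3))
print(bare, grouped)                   # True True
```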
Here is a list of the operators mentioned in this chapter, ordered from highest precedence to lowest:
Tips, Traps, and Tracebacks
Tips
Statements and expressions
The results of and and or expressions are not converted to Booleans. For and expressions, the first operand is returned if it is false, and otherwise the second operand is returned. For or expressions, the first operand is returned if it is true, and otherwise the second operand is returned. For example, '' and 'A' evaluates to '', not False, while '' or 'A' evaluates to 'A', not True. We’ll see examples later of idioms based on this behavior.
Function calls are both expressions and statements.
Experiment with using the sep and end keyword arguments to print. They give you more control over your output. The default is to separate every argument by a space and end with a newline.
A method call is simply a function call with its first argument moved before the function name, followed by a period.
If you are ever in doubt about the order in which an expression’s operations are performed, use parentheses to indicate the ordering you want. Parentheses can sometimes help make the code more readable. They are never required when the default precedence already gives the order you want.
Running Python interactively
Start the Python interpreter from the command line[8] by typing python at a command prompt. Here are a few points to keep in mind:
If the initial message you see when Python starts indicates that its version number begins with a 2, exit and try typing python3. If that doesn’t work, try including a version number (e.g., python3.1 or python3.2).
If that doesn’t work, either you don’t have Python 3 installed or it’s not on the path used in your command-line environment. If you don’t know how to add it, find someone knowledgeable about the command-line environment in your operating system to help you resolve the problem.
The way to exit Python follows each platform’s usual conventions: Ctrl-D on Unix-based systems, Ctrl-Z (followed by Return) on Windows variants. You can also type quit().
In Unix and OS X shells, depending on how Python was installed, you may be able to edit the current line you are typing to Python and navigate back and forth in the history of inputs.[9] After you’ve typed at least one line to Python, try Ctrl-P or the up arrow. If that changes the input to what you typed previously, the editing capability is functioning. You can use Ctrl-N or the down arrow to move to a later line in the input history. Following are some editing operations that work on the current line:
- Ctrl-A
Go to the beginning of the line.
- Ctrl-E
Go to the end of the line.
- Ctrl-B or left arrow
Move one character to the left.
- Ctrl-F or right arrow
Move one character to the right.
- Backspace
Delete the preceding character.
- Ctrl-D
Delete the next character.
- Ctrl-K
Delete the rest of the line after the cursor.
- Ctrl-Y
“Yank” the last killed text into the line at the location of the cursor.
- Ctrl-_ (underscore)
Undo; can be repeated.
- Ctrl-R
Search incrementally for a preceding input line.
- Ctrl-S
Search incrementally for a subsequent input line.
- Return
Give the current line to the interpreter.
Similar functionality may be available when Python is run in a Windows command window.
Traps
The value of a floor division (//) equals an integer but has the type int only if both operands were ints; otherwise, the value is a float that prints with a 0 after the decimal point.
An operation with a float operand may produce a result very slightly more or very slightly less than its “true” mathematical equivalent.
Remember that the first element of a string is at index 0 and the last at -1.
The index in a string indexing expression must designate a character in the string: counting from the beginning, it must be greater than or equal to 0 and less than the length of the string; counting from the end, it must be no greater than -1 and no less than the negative of the length. (The restriction does not apply to slices.)
In a function call with more than one argument, every argument except the last must be followed by a comma. Usually omitting a comma will cause syntax errors, but in some situations you will accidentally end up with a syntactically correct expression that is not what you intended.
Omitting a right parenthesis that closes a function call’s argument list results in a syntax error message pointing to the line after the one containing the function call.
Function and method calls with no arguments must still be followed by (an empty pair of) parentheses. Failing to include them will not lead to a syntax error, because the value of the name of the function is the function itself, a legitimate value, but it will lead to very unexpected results and, often, runtime errors.
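Two of these traps can be demonstrated in a few lines. This is only a sketch: the names seq, with_call, without_call, and joined are illustrative, and callable is a built-in, not covered in this chapter, that reports whether a value can be called.

```python
seq = ' tata '
with_call = seq.strip()       # parentheses: the method is called
without_call = seq.strip      # no parentheses: the method itself, not its result

print(with_call == 'tata')        # True
print(without_call == 'tata')     # False: a method is not a string
print(callable(without_call))     # True: it is still waiting to be called

# A missing comma between two string arguments silently joins them.
joined = len('AAT' 'AAC')         # same as len('AATAAC')
print(joined)                     # 6
```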
Tracebacks
[5] Unicode characters occupy between one and four bytes each in memory, depending on several factors. See http://docs.python.org/3.1/howto/unicode.html for details (in particular, http://docs.python.org/3.1/howto/unicode.html#encodings). For general information about Unicode outside of Python, consult http://www.unicode.org/standard/WhatIsUnicode.html, http://www.unicode.org/standard/principles.html, and http://www.unicode.org/resources.
[6] Data for these examples was obtained from the “Official REBASE Homepage” site. Files in formats used by various applications can be downloaded from http://rebase.neb.com/rebase/rebase.files.html.
[7] A computer science field called “numerical analysis” provides techniques for managing the accumulation of such errors in complex or repetitive computations.
[8] Command line is a term that refers to an interactive terminal-like window: a Unix shell, OS X Terminal window, or Windows Command window. The command line prompts for input and executes the commands you type.
[9] If not, the Python you are using was built without the readline system (not Python) library. If you configured, compiled, and installed Python yourself, you probably know how to get the readline library, install it, and repeat the configure-compile-install process. If not, you will have no idea what any of this is about, and there probably isn’t anything you can do about it.