Bioinformatics Programming Using Python

Chapter 1. Primitives

Computer programs manipulate data. This chapter describes the simplest kinds of Python data and the simplest ways of manipulating them. An individual item of data is called a value. Every value in Python has a type that identifies the kind of value it is. For example, the type of 2 is int. You’ll get more comfortable with the concepts of types and values as you see more examples.

The Preface pointed out that Python is a multiparadigm programming language. The terms “type” and “value” come from traditional procedural programming. The equivalent object-oriented terms are class and object. We’ll mostly use the terms “type” and “value” early on, then gradually shift to using “class” and “object” more frequently. Although Python’s history is tied more to object-oriented programming than to traditional programming, we’ll use the term instance with both terminologies: each value is an instance of a particular type, and each object is an instance of a particular class.

Simple Values

Types for some simple kinds of values are an integral part of Python’s implementation. Four of these are used far more frequently than others: logical (Boolean), integer, float, and string. There is also a special no-value value called None.

When you enter a value in the Python interpreter, it prints it on the following line:

>>> 90
90
>>>

When the value is None, nothing is printed, since None means “nothing”:

>>> None
>>>

If you type something Python finds unacceptable in some way, you will see a multiline message describing the problem. Most of what this message says won’t make sense until we’ve covered some other topics, but the last line should be easy to understand and you should learn to pay attention to it. For example:

>>> Non
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    Non
NameError: name 'Non' is not defined
>>>

When a # symbol appears on a line of code, Python ignores it and the rest of the line. Text following the # is called a comment. Typically comments offer information about the code to aid the reader, but they can include many other kinds of text: a programmer’s notes to fix or investigate something, a reference (documentation entry, book title, URL, etc.), and so on. They can even be used to “comment out” code lines that are not working or are obsolete but still of interest. The code examples that follow include occasional comments that point out important details.

Booleans

There are only two Boolean values: True and False. Their type is bool. Python names are “case-sensitive,” so true is not the same as True:

>>> True
True
>>> False
False

Integers

There’s not much to say about Python integers. Their type is int, and they can have as many digits as you want. They may be preceded by a plus or minus sign. Separators such as commas or periods are not used:

>>> 14
14
>>> -1
-1
>>> 1112223334445556667778889990000000000000        # a very large integer!
1112223334445556667778889990000000000000

Warning

Python 2: A distinction is made between integers that fit within a certain (large) range and those that are larger; the latter are a separate type called long.

Integers can also be entered in hexadecimal notation, which uses base 16 instead of base 10. The letters A through F represent the hexadecimal digits 10 through 15. Hexadecimal notation begins with 0x. For example:

>>> 0x12                       # (1 x 16) + 2
18
>>> 0xA40                      # (10 x 16 x 16) + (4 x 16) + 0
2624
>>> 0xFF                       # (15 x 16) + 15
255

The result of entering a hexadecimal number is still an integer—the only difference is in how you write it. Hexadecimal notation is used in a lot of computer-related contexts because each hexadecimal digit occupies one half-byte. For instance, colors on a web page can be specified as a set of three one-byte values indicating the red, green, and blue levels, such as FFA040.
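As a rough sketch of the color example, the three one-byte levels packed into FFA040 can be pulled apart with the floor-division (//) and remainder (%) operators described later in this chapter. The names color, red, green, and blue are illustrative only.

```python
# Illustrative names; 0xFFA040 is the color value mentioned in the text.
color = 0xFFA040

red = color // 0x10000            # the top byte
green = color // 0x100 % 0x100    # the middle byte
blue = color % 0x100              # the bottom byte

print(red, green, blue)           # 255 160 64
print(0xFF == 255)                # True: same integer, different notation
```

Each pair of hexadecimal digits maps to exactly one byte, which is why the divisors are themselves round numbers in base 16.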

Floats

“Float” is an abbreviated version of the term “floating point,” which refers to a number that is represented in computer hardware in the equivalent of scientific notation. Such numbers consist of two parts: digits and an exponent. The exponent is adjusted so the decimal point “floats” to just after the first digit (or just before, depending on the implementation), as in scientific notation.

The written form of a float always contains a decimal point and at least one digit after it:

>>> 2.5
2.5

You might occasionally see floats represented in a form of scientific notation, with the letter “e” separating the base from the exponent. When Python prints a number in scientific notation it will always have a single digit before the decimal point, some number of digits following the decimal point, a + or - following the e, and finally an integer. On input, there can be more than one digit before the decimal point. Regardless of the form used when entering a float, Python will output very small and very large numbers using scientific notation. (The exact cutoffs are dependent on the Python implementation.) Here are some examples:

>>> 2e4                        # Scientific notation, but...
20000.0                        # within the range of ordinary floats.
>>> 2e-2
0.02
>>> .0001                      # Within the range of ordinary floats
0.0001                         # so printed as an ordinary float.
>>> .00001                     # An innocent-looking float that is
1e-05                          # smaller than the lower limit, so e.
>>> 1002003004005000.          # A float with many digits that is
1002003004005000.0             # smaller than the upper limit, so no e.
>>> 100200300400500060.        # Finally, a float that is larger than the
1.0020030040050006e+17         # upper limit, so printed with an e.

Strings

Strings are series of Unicode^[5] characters. Their type is str. Many languages have a separate “character” type, but Python does not: a lone character is simply a string of length one. A string is enclosed in a pair of single or double quotes. Other than style preference, the main reason to choose one or the other kind of quote is to make it convenient to include the other kind inside a string.

If you want a string to span multiple lines, you must enclose it in a matched pair of three single or double quotes. Adding a backslash in front of certain characters causes those characters to be treated specially; in particular, '\n' represents a line break and '\t' represents a tab.

Warning

Python 2: Strings are composed of one-byte characters, not Unicode characters; there is a separate string type for Unicode, designated by preceding the string’s opening quote with the character u.

We will be working with strings a lot throughout this book, especially in representing DNA/RNA base and amino acid sequences. Here are the amino acid sequences for some unusually small bacterial restriction enzymes:^[6]

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
>>> "AARHQGRGAPCGESFWHWALGADGGHGHAQPPFRSSRLIGAERQPTSDCRQSLQQSPPC"
'AARHQGRGAPCGESFWHWALGADGGHGHAQPPFRSSRLIGAERQPTSDCRQSLQQSPPC'
>>> """MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS
LFPGEELK
SLLKKTPDVV KAFPLLLAVR DESISLLD"""
'MKQLNFYKKN SLNNVQEVFS YFMETMISTN RTWEYFINWD KVFNGADKYR NELMKLNSLC GS
LFPGEELK\nSLLKKTPDVV KAFPLLLAVR DESISLLD'
>>> '''MWNSNLPKPN AIYVYGVANA NITFFKGSDI LSYETREVLL KYFDILDKDE RSLKNALKDL EN
PFGFAPYI RKAYEHKRNF LTTTRLKASF RPTTF'''
'MWNSNLPKPN AIYVYGVANA NITFFKGSDI LSYETREVLL KYFDILDKDE RSLKNALKDL EN\nPFGF
APYI RKAYEHKRNF LTTTRLKASF RPTTF'

There are three situations that cause input or output to begin on a new line:

You hit Return as you are typing inside a triple-quoted string.
You keep typing characters until they “wrap around” to the next line before you press Return.
The interpreter responds with a string that is too long to fit on one line.

Only the first one is “real.” The other two are simply the effect of output “line wrapping” like what you would see in text editors or email programs. In the second and third situations, if you change the width of the window the input and output strings will be “rewrapped” to fit the new width. The first case does not cause a corresponding line break when the interpreter prints the string—the Return you typed becomes a '\n' in the string.

Normally, Python uses a pair of single quotes to enclose strings it prints. However, if the string contains single quotes (and no double quotes), it will use double quotes. It never prints strings using triple quotes; instead, the line breaks typed inside the string become '\n's.
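The following sketch checks these quoting rules in a script rather than at the interpreter prompt. It relies on the built-in function repr, which is not covered in this chapter; it returns the same quoted form the interpreter prints. The names two_lines and same_text are illustrative.

```python
# A triple-quoted string keeps a line break typed inside it as '\n'.
two_lines = """MKQLNF
YKKN"""
same_text = 'MKQLNF\nYKKN'

print(two_lines == same_text)   # True
print(len(two_lines))           # 11: ten letters plus one line break

# repr() gives the quoted form that the interpreter would echo.
print(repr('ACGT'))             # 'ACGT'
print(repr("it's"))             # "it's"  (double quotes, because of the ')
```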

Expressions

An operator is a symbol that indicates a calculation using one or more operands. The combination of the operator and its operand(s) is an expression.

Numeric Operators

A unary operator is one that is followed by a single operand. A binary operator is one that appears between two operands. It isn’t necessary to surround operators with spaces, but it is good style to do so. Incidentally, when used in a numeric expression, False is treated as 0 and True as 1.

Plus and minus can be used as either unary or binary operators:

>>> -1                # unary minus
-1
>>> 4 + 2
6
>>> 4 - 1
3
>>> 4 * 3
12

The power operator is ** (i.e., n^k is written n ** k):

>>> 2 ** 10
1024

There are three operators for the division of one integer by another: / produces a float, // (floor division) an integer with the remainder ignored, and % (modulo) the remainder of the floor division. The formal definition of floor division is “the largest integer not greater than the result of the division”:

>>> 11 / 4
2.75
>>> 11 // 4           # "floor" division
2
>>> 11 % 4            # remainder of 11 // 4
3

Warning

Python 2: The / operator performs floor division when both operands are ints, but ordinary division if one or both operands are floats.

Whenever one or both of the operands in an arithmetic expression is a float, the result will be a float:

>>> 2.0 + 1
3.0
>>> 12 * 2.5
30.0
>>> 7.5 // 2
3.0

Warning

While the value of floor division is equal to an integer value, its type may not be integer! If both operands are ints, the result will be an int, but if either or both are floats, the result will be a float that represents an integer.
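A short sketch of how floor division and remainder behave, including with a negative operand, where “the largest integer not greater than the result” differs from simply dropping the fraction. It uses the built-in function type, which is not covered in this chapter, to display each result's type. The names quotient and remainder are illustrative.

```python
quotient = -11 // 4       # -11 / 4 is -2.75; the floor of that is -3
remainder = -11 % 4       # chosen so that quotient * 4 + remainder == -11

print(11 // 4, 11 % 4)                 # 2 3
print(quotient, remainder)             # -3 1
print(quotient * 4 + remainder)        # -11
print(7.5 // 2, 7.5 % 2)               # 3.0 1.5

print(type(11 // 4))                   # <class 'int'>
print(type(7.5 // 2))                  # <class 'float'>
```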

The result of an operation does not always print the way you might expect. Consider the following numbers:

>>> .009
0.009
>>> .01
0.01
>>> .029
0.029
>>> .03
0.03
>>> .001
0.001

So far, everything is as expected. If we subtract the first from the second and the third from the fourth, we should in both cases get the result .001. Typing in .001 also gives the expected result. However, typing in the subtraction operations does not:

>>> .03 - .029
0.0009999999999999974
>>> .01 - .009
0.0010000000000000009

Strange results like this arise from two sources:

For a given base, only some rational numbers have “exact” representations—that is, their decimal-point representations terminate after a finite number of digits. The rest end in an infinitely repeating sequence of digits (e.g., 1/3 = 0.3333333...).
A computer stores rational numbers in a finite number of binary digits; a rational number may in fact have an exact binary representation, but one that would require more digits than are available.

Note

A rational number is one that can be expressed as a/b, where a and b are integers and b is not zero; the decimal-point expression of a rational number in a given number system either has a finite number of digits or ends with an infinitely repeating sequence of digits. There’s nothing wrong with the binary system: whatever base is used, some real numbers have exact representations and others don’t. Just as only some rational numbers have exact decimal representations, only some rational numbers have exact binary representations.

As you can see from the results of the two subtraction operations, the difference between the ideal rational number and its actual representation is quite small, but in certain kinds of computations the differences do accumulate.^[7]

Here’s an early lesson in an extremely important programming principle: don’t trust what you see! Everything printed in a computing environment or by a programming language is an interpretation of an internal representation. That internal representation may be manipulated in ways that are intended to be helpful but can be misleading. In the preceding example, 0.009 in fact does not have an exact binary representation. In Python 2, it would have printed as 0.0089999999999999993, and 0.03 would have printed as 0.029999999999999999. The difference is that Python 3 implements a more sophisticated printing mechanism for floats that makes some of them look as they would have had you typed them.
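One practical consequence is that floats produced by arithmetic are usually compared with a tolerance rather than with ==. The sketch below is one way to do that, using math.isclose and decimal.Decimal from the standard library; neither is covered in this chapter, and the name difference is illustrative.

```python
import math
from decimal import Decimal

difference = .01 - .009

print(difference == .001)                # False: the two floats differ slightly
print(math.isclose(difference, .001))    # True: equal within a small tolerance

# Converting a float to Decimal displays the value that is actually stored,
# which is close to, but not exactly, nine thousandths.
print(Decimal(.009))
```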

Logical Operations

Python, like other programming languages, provides operations on “truth values.” These follow the mathematical laws of Boolean logic. The classic Boolean operators are not, and, and or. In Python, those are written just that way rather than using special symbols:

>>> not True
False
>>> not False
True
>>> True and True
True
>>> True and False
False
>>> True or True
True
>>> True or False
True
>>> False and False
False
>>> False or True
True

Warning

The results of and and or operations are not converted to Booleans. For and expressions, the first operand is returned if it is false; otherwise, the second operand is returned. For or expressions, the first operand is returned if it is true; otherwise, the second operand is returned. For example:

>>> '' and 'A'
''                       # Not False: '' is a false value
>>> 0 and 1 or 2         # Read as (0 and 1) or 2
2                        # Not True: 2 is a true value

While confusing, this can be useful; we’ll see some examples later.

The operands of and and or can actually be anything. None, 0, 0.0, and the empty string, as well as the other kinds of “empty” values explained in Chapter 3, are considered False. Everything else is treated as True.
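A common use of this behavior, sketched here with hypothetical names, is supplying a fallback when a value is empty: or returns its first operand if that operand is true and its second operand otherwise.

```python
name = ''                        # hypothetical empty input
label = name or 'unnamed'        # '' is false, so the second operand is returned
print(label)                     # unnamed

name = 'EcoRI'
print(name or 'unnamed')         # EcoRI: a non-empty string is true

print(0 and 1)                   # 0: the first operand is false, so it is returned
print(None or 0.0)               # 0.0: both are false, so the second is returned
print(bool(None or 0.0))         # False
```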

Note

To avoid repetition and awkward phrases, this book will use “true” and “false” in regular typeface to indicate values considered to be True and False, respectively. It will only use the capitalized words True and False in the code typeface when referring to those specific Boolean values.

There is one more logical operation in Python that forms a conditional expression. Written using the keywords if and else, it returns the value preceding the if when the condition is true and the value following the else when it is false. We’ll look at some more meaningful examples a bit later, but here are a few trivial examples that show what conditional expressions look like:

>>> 'yes' if 2 - 1 else 'no'
'yes'
>>> 'no' if 1 % 2 else 'yes'
'no'
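As a slightly less trivial sketch, a conditional expression can choose a label based on a computed value. The condition sits between if and else, and the value written before the if is the result when the condition is true. The names length, size, and parity are illustrative.

```python
length = len('MNKMDLVADVAEKTDLSKAKATEVIDAVFA')    # 30

size = 'short' if length < 50 else 'long'
print(size)                                       # short

# 30 % 2 is 0, which counts as false, so the value after else is used.
parity = 'odd' if length % 2 else 'even'
print(parity)                                     # even
```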

In addition to the Boolean operators, there are six comparison operators that return Boolean values: ==, !=, <, <=, >, and >=. These work with many different kinds of operands:

>>> 2 == 5 // 2
True
>>> 3 > 13 % 5
False
>>> 'one' < 'two'
True
>>> 'one' != 'one'
False

You may already be familiar with logical and comparison operations from other computer work you’ve done, if only entering spreadsheet formulas. If these are new to you, spend some time experimenting with them in the Python interpreter until you become comfortable with them. You will use them frequently in code you write.

String Operations

There are four binary operators that act on strings: in, not in, +, and *. The first three expect both operands to be strings. The last requires the other operand to be an integer. A one-character substring can be extracted with subscription and a longer substring by slicing. Both use square brackets, as we’ll see shortly.

String operators

The in and not in operators test whether the first string is a substring of the second one (starting at any position). The result is True or False:

>>> 'TATA' in 'TATATATATATATATATATATATA'
True
>>> 'AA' in 'TATATATATATATATATATATATA'
False
>>> 'AA' not in 'TATATATATATATATATATATATA'
True

A new string can be produced by concatenating two existing strings. The result is a string consisting of all the characters of the first operand followed by all the characters of the second. Concatenation is expressed with the plus operator:

>>> 'AC' + 'TG'
'ACTG'
>>> 'aaa' + 'ccc' + 'ttt' + 'ggg'
'aaaccctttggg'

A string can be repeated a certain number of times by multiplying it by an integer:

>>> 'TA' * 12
'TATATATATATATATATATATATA'
>>> 6 * 'TA'
'TATATATATATA'

Subscription

Subscription extracts a one-character substring of a string. Subscription is expressed with a pair of square brackets enclosing an integer-valued expression called an index. The first character is at position 0, not 1:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0]
'M'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1]
'N'

The index can also be negative, in which case the index is counted from the end of the string. The last character is at index −1:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-1]
'A'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5]
'D'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[7 // 2]
'M'

As Figure 1-1 shows, whether you count from 0 at the beginning or from -1 at the end of a string, an index can be thought of as a label for the character to its right. The end of a string is the position one after the last element. If you are unfamiliar with indexing in programming languages, this is probably an easier way to visualize it than if you picture the indexes as aligned with the characters.

Figure 1-1. Index positions in strings

Attempting to extract a character before the first or after the last causes an error, as shown here:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[50]
IndexError: string index out of range

The last line reports the nature of the error, while the next-to-last line shows the input that caused the error.

Slicing

Slicing extracts a series of characters from a string. You’ll use it often to clearly and concisely designate parts of strings. Figure 1-2 illustrates how it works.

Figure 1-2. String slicing

The character positions of a slice are specified by two or three integers inside square brackets, separated by colons. The first index indicates the position of the first character to be extracted. The second index indicates where the slice ends. The character at that position is not included in the slice. A slice [m:n] would therefore be read as “from character m up to but not including character n.” (We’ll explore the use of the third index momentarily). Here are a few slicing examples:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[1:4]
'NKM'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[4:-1]
'DLVADVAEKTDLSKAKATEVIDAVF'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-5:-4]
'D'

Either of the indexes can be positive, indicating “from the beginning,” or negative, indicating “from the end.” If neither of the two numbers is negative, the length of the resulting string is the difference between the second and the first. If either (or both) is negative, just add it to the length of the string to convert it to a nonnegative number.
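The conversion rule can be checked directly. In this sketch (the names enzyme and n are illustrative), each negative index is replaced by that index plus the length of the string, and the slice is unchanged:

```python
enzyme = 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'
n = len(enzyme)                                  # 30

# -5 + 30 is 25 and -4 + 30 is 26, so both slices select the same character.
print(enzyme[-5:-4] == enzyme[n - 5:n - 4])      # True
print(enzyme[4:-1] == enzyme[4:n - 1])           # True

# With nonnegative indexes, the length is the second minus the first.
print(len(enzyme[4:n - 1]) == (n - 1) - 4)       # True
```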

What if the two numbers are the same? For example:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[5:5]
''

Since this reads as “from character 5 up to but not including character 5,” the result is an empty string. Now, what about character positions that are out of order—i.e., where the first character occurs after the second? This results in an empty string too:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[-4:-6]
''

For subscription, the index must designate a character in the string, but the rules for slicing are less constraining.

When the slice includes the beginning or end of the string, that part of the slice notation may be omitted. Note that omitting the second index is not the same as providing −1 as the second index—omitting the second index says to go up to the end of the string, one past the last character, whereas −1 means go up to the penultimate character (i.e., up to but not including the last character):

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:8]
'MNKMDLVA'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[9:]
'VAEKTDLSKAKATEVIDAVFA'
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[9:-1]
'VAEKTDLSKAKATEVIDAVF'

In fact, both indexes can be omitted, in which case the entire string is selected:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:]
'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'

Finally, as mentioned earlier, a slice operation can specify a third number, also following a colon. This number, known as a step, indicates how far to advance after each character that is included: a step of n takes every nth character and skips the n - 1 characters in between. When the third number is omitted, as it often is, the default is 1, meaning don’t skip any. Here’s a simple example:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[0:9:3]
'MMV'

This example’s result was obtained by taking the first, fourth, and seventh characters from the string. The step can also be a negative integer. When the step is negative, the slice takes characters in reverse order. To get anything other than an empty string when you specify a negative step, the start index must be greater than the stop index:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[16:0:-4]
'SKDD'

Notice that the first character of the string is not included in this example’s results. The character at the stop index is never included. Omitting the second index so that it defaults to the beginning of the string—beginning, not end, because the step is negative—results in a string that does include the first character, assuming the step would select it. Changing the previous example to omit the 0 results in a longer string:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[16::-4]
'SKDDM'

Omitting the first index when the step is negative means start from the end of the string:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[:25:-1]
'AFVA'

A simple but nonobvious slice expression produces a reversed copy of a string: s[::-1]. This reads as “starting at the end of the string, take every character up to and including the first, in reverse order”:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'[::-1]
'AFVADIVETAKAKSLDTKEAVDAVLDMKNM'
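Steps are handy with DNA sequences, where bases are often read in groups of three. The sketch below uses a short made-up sequence (dna is an illustrative name, not data from the text) to pick out codon positions and to reverse the sequence.

```python
dna = 'ATGGCCAAGTAA'        # made-up 12-base sequence: ATG GCC AAG TAA

print(dna[0::3])            # AGAT: the first base of each codon
print(dna[2::3])            # GCGA: the third base of each codon
print(dna[3:6])             # GCC:  the second codon
print(dna[::-1])            # AATGAACCGGTA: the sequence reversed
print(dna[::-1][::-1] == dna)   # True: reversing twice restores the original
```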

Calls

We’ll look briefly at calls here, deferring details until later. A call is a kind of expression.

Function calls

The simplest kind of call invokes a function. A call to a function consists of a function name, a pair of parentheses, and zero or more argument expressions separated by commas. The function is called, does something, then returns a value. Before the function is called the argument expressions are evaluated, and the resulting values are passed to the function to be used as input to the computation it defines. An argument can be any kind of expression whose result has a type acceptable to the function. Those expressions can also include function calls.

Each function specifies the number of arguments it is prepared to receive. Most functions accept a fixed number—possibly zero—of arguments. Some accept a fixed number of required arguments plus some number of optional arguments. We will follow the convention used in the official Python documentation, which encloses optional arguments in square brackets. Some functions can even take an arbitrary number of arguments, which is shown by the use of an ellipsis.

Python has a fairly small number of “built-in” functions. Some of the more frequently used are:

len(arg): Returns the number of characters in arg (although it’s actually more general than that, as will be discussed later)
print(args...[, sep=seprstr][, end=endstr]): Prints the arguments, of which there may be any number, separating each by a seprstr (default ' ') and omitting certain technical details such as the quotes surrounding a string, and ending with an endstr (default '\n')

Warning

Python 2: print is a statement, not a function. There is no way to specify a separator. The only control over the end is that a final comma suppresses the newline.

input(string): Prompts the user by printing string, reads a line of input typed by the user (which ends when the Return or Enter key is pressed), and returns the line as a string

Warning

Python 2: The function’s name is raw_input.

Here are a few examples:

>>> len('TATA')
4
>>> print('AAT', 'AAC', 'AAG', 'AAA')
AAT AAC AAG AAA
>>> input('Enter a codon: ')
Enter a codon: CGC
'CGC'
>>>
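The sep and end keyword arguments of print are easiest to understand by trying them. This is a small sketch run as a script rather than typed at the interpreter prompt:

```python
print('AAT', 'AAC', 'AAG', sep=', ')    # AAT, AAC, AAG
print('AAT', 'AAC', 'AAG', sep='')      # AATAACAAG

# With end=' ' no newline is printed, so the next output continues the line.
print('first', end=' ')
print('second')                         # the line reads: first second
```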

Here are some common numeric functions in Python:

abs(value): Returns the absolute value of its argument
max(args...): Returns the maximum value of its arguments
min(args...): Returns the minimum value of its arguments

Types can be called as functions too. They take an argument and return a value of the type called. For example:

str(arg): Returns a string representation of its argument
int(arg): Returns an integer derived from its argument
float(arg): Returns a float derived from its argument
bool(arg): Returns False for None, zeros, empty strings, etc., and True otherwise; rarely used, because other types of values are automatically converted to Boolean values wherever Boolean values are expected

Here are some examples of these functions in action:

>>> str(len('TATA'))
'4'
>>> int(2.1)
2
>>> int('44')
44
>>> bool('')
False
>>> bool(' ')
True
>>> float(3)
3.0

Note

Using int is the only way to guarantee that the result of a division is an integer. As noted earlier, // is the floor operator and results in a float if either operand is a float.
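One detail worth checking when combining int with division: int discards the fractional part of a float, while // rounds down. The two agree for positive results but differ for negative ones, as this sketch with illustrative names shows.

```python
print(7.5 // 2)                 # 3.0: a float that represents an integer
print(int(7.5 // 2))            # 3:   an int

floored = int(-7.5 // 2)        # floor first (-4.0), then convert
truncated = int(-7.5 / 2)       # divide (-3.75), then drop the fraction
print(floored, truncated)       # -4 -3
```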

There is a built-in help facility for use in the Python interpreter. Until we’ve explored more of Python, much of what the help functions print will probably appear strange or even unintelligible. Nevertheless, the help facility is a useful tool even at this early stage. You can use either of these commands to access it:

help(): Enters the interactive help facility
help(x): Prints information about x, which can be anything (a value, a type, a function, etc.); help for a type generally includes a long list of things that are part of the type’s implementation but not its general use, indicated by names beginning with underscores

Occasionally your code needs to test whether a value is an instance of a certain type; for example, it may do one thing with strings and another with numbers. You can do this with the following built-in function:

isinstance(x, sometype): Returns True if x is an instance of the type (class) sometype, and False otherwise
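A brief sketch of isinstance, combined with a conditional expression to do one thing with strings and another with numbers. The names value and description are illustrative.

```python
value = 'TATA'

print(isinstance(value, str))      # True
print(isinstance(4, int))          # True
print(isinstance(4, float))        # False
print(isinstance(4.0, float))      # True

description = 'text' if isinstance(value, str) else 'number'
print(description)                 # text
```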

Method calls

Many different types of values can be supplied as arguments to Python’s built-in functions. Most functions, however, are part of the implementation of a specific type. These are called methods. Calling a method is just like calling a function, except that the first argument goes before the function name, followed by a period. For example, the method count returns the number of times its argument appears in the string that precedes it in the call. The following example returns 2 because the string 'DL' appears twice in the longer string:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.count('DL')
2

Except for having their first argument before the function name, calls to methods have the same features as calls to ordinary functions: optional arguments, indefinite number of arguments, etc. Here are some commonly used methods of the str type:

string1.count(string2[, start[, end]]): Returns the number of times string2 appears in string1. If start is specified, starts counting at that position in string1; if end is also specified, stops counting before that position in string1.
string1.find(string2[, start[, end]]): Returns the position of the first occurrence of string2 in string1; -1 means string2 was not found in string1. If start is specified, starts searching at that position in string1; if end is also specified, stops searching before that position in string1.
string1.startswith(string2[, start[, end]]): Returns True or False according to whether string1 starts with string2. If start is specified, uses that as the position at which to start the comparison; if end is also specified, stops searching before that position in string1.
string1.strip([string2]): Returns a string with all characters in string2 removed from its beginning and end; if string2 is not specified, all whitespace is removed.
string1.lstrip([string2]): Returns a string with all characters in string2 removed from its beginning; if string2 is not specified, all whitespace is removed.
string1.rstrip([string2]): Returns a string with all characters in string2 removed from its end; if string2 is not specified, all whitespace is removed.

Here are some examples of method calls in action:

>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL')
4
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL', 5)
14
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.find('DL', 5, 12)
-1
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.startswith('DL')
False
>>> 'MNKMDLVADVAEKTDLSKAKATEVIDAVFA'.startswith('DL', 4)
True
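The three strip methods are not exercised in the examples above, so here is a small sketch of them. The input strings are made up, and repr (a built-in function not covered in this chapter) is used so that leftover whitespace is visible in the output.

```python
raw = '  ACGTTGCA\n'                # made-up input with surrounding whitespace

print(repr(raw.strip()))            # 'ACGTTGCA'
print(repr(raw.lstrip()))           # 'ACGTTGCA\n'
print(repr(raw.rstrip()))           # '  ACGTTGCA'

# With an argument, the listed characters are removed instead of whitespace.
print('AAACGTAAA'.strip('A'))       # CGT
print('NNACGTNN'.rstrip('N'))       # NNACGT
```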

The restriction enzyme with the amino acid sequence in these examples recognizes the site with the base sequence TCCGGA. It’s easy enough to find the first location in a DNA base sequence where this occurs:

>>> 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA'.find('TCCGGA')
27
>>>

If the recognition site did not occur in the sequence, find would have returned −1.
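To look for a further occurrence, one option is to call find again with a start position just past the previous match. The sketch below does this for the same sequence; dna, site, first, and following are illustrative names.

```python
dna = 'AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA'
site = 'TCCGGA'

first = dna.find(site)                    # position of the first match
following = dna.find(site, first + 1)     # search again, starting after it

print(first, following)                   # 27 -1: there is no second match
print(dna[first:first + len(site)])       # TCCGGA: the slice at the match
print(site in dna, dna.count(site))       # True 1
```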

Compound Expressions

The examples of operator expressions that we’ve looked at thus far have had only a single operator. However, just as in traditional algebra, operators can be compounded in a series. For example:

>>> 2 * 3 + 4 - 1
9

This is read as “2*3 is 6, 6+4 is 10, and 10−1 is 9.” The story isn’t quite that simple, though. Consider the following example:

>>> 4 + 2 * 3 - 1
9

Reading from left to right, we’d have “4+2 is 6, 6*3 is 18, 18−1 is 17,” not 9. So why do we get 9 as the result? Programming languages incorporate operator precedence rules that determine the order in which operations in a series should be performed. Like most programming languages, Python performs multiplications and divisions first and then goes back and performs additions and subtractions.

You can indicate your intended interpretation of a sequence of operations by surrounding parts of an expression with parentheses. Everything inside a pair of parentheses will be evaluated completely before the result is used in another operation. For instance, parentheses could be used as follows to make the result of the preceding example be 17:

>>> (4 + 2) * 3 - 1
17

Comparisons can be combined to form “between” expressions:

>>> 1 < 4 < 6
True
>>> 2 <= 2 < 5
True
>>> 2 < 2 < 5
False

Strings can participate in sequences of operations:

>>> 'tc' in ('ttt' + 'ccc' + 'ggg' + 'aaa')
True
>>> 'tc' in 't' * 3 + 'c' * 3 + 'g' * 3 + 'a' * 3
True

The second variation demonstrates that * has a higher precedence than +, and + has a higher precedence than in. Don’t hesitate to use parentheses if you have any doubt about the interpretation of operation series.

Here is a list of the operators mentioned in this chapter, ordered from highest precedence to lowest:

Calls
Slicings
Subscriptions
Exponentiation (**)
Unary +, -
Multiplication, division, and remainder (*, /, //, %)
Addition and subtraction (+, -)
Comparisons (==, !=, <, <=, >, >=)
Membership (in, not in)
Boolean not (not)
Boolean and (and)
Boolean or (or)
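A few expressions, offered as a sketch, show consequences of this ordering that are easy to get wrong. Each comment gives the grouping Python applies; the variable names are illustrative.

```python
negated_square = -2 ** 2              # read as -(2 ** 2)
squared_negative = (-2) ** 2          # parentheses force the minus first
mixed = 2 + 3 * 4 ** 2                # 4 ** 2, then * 3, then + 2

print(negated_square)                 # -4
print(squared_negative)               # 4
print(mixed)                          # 50
print(not 1 == 2)                     # True: read as not (1 == 2)
print(True or False and False)        # True: read as True or (False and False)
print('tc' in 't' * 3 + 'c' * 3)      # True: read as 'tc' in (('t' * 3) + ('c' * 3))
```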

Tips, Traps, and Tracebacks

Tips

Don’t trust what you see! Everything printed out in a computing environment or by a programming language is an interpretation of an internal representation. The visible interpretation may not be what you anticipated, even though the internal representation is actually the result you expected.

Statements and expressions

The results of and and or expressions are not converted to Booleans. For and expressions, the first operand is returned if it is false, and otherwise the second operand is returned. For or expressions, the first operand is returned if it is true, and otherwise the second operand is returned. For example, '' and 'A' evaluates to '', not False, while '' or 'A' evaluates to 'A', not True. We’ll see examples later of idioms based on this behavior.
Function calls are both expressions and statements.
Experiment with using the sep and end keyword arguments to print. They give you more control over your output. The default is to separate every argument by a space and end with a newline.
A method call is simply a function call with its first argument moved before the function name, followed by a period.
If you are ever in doubt about the order in which an expression’s operations are performed, use parentheses to indicate the ordering you want. Parentheses can sometimes help make the code more readable. They are required only when you want an order other than the default one; extra parentheses are always allowed.
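Several of these tips are easy to try out. The sketch below uses illustrative values ('EcoRI', 'GAATTC', and 'gattaca' are arbitrary choices for this example) to show the operands returned by and and or, the effect of sep and end, and the correspondence between a method call and a function call.

```python
# and / or return one of their operands, not necessarily a Boolean.
and_result = '' and 'A'    # first operand is false, so it is the result
or_result = '' or 'A'      # first operand is false, so the second is the result

# Default behavior: arguments separated by a space, output ended by a newline.
print('EcoRI', 'GAATTC')

# Explicit separator and ending.
print('EcoRI', 'GAATTC', sep='\t', end=';\n')

# A method call and the equivalent function call, with the string
# moved into the position of first argument.
method_form = 'gattaca'.count('a')
function_form = str.count('gattaca', 'a')

print(repr(and_result), repr(or_result), method_form, function_form)
```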

Running Python interactively

Start the Python interpreter from the command line^[8] by typing python at a command prompt. Here are a few points to keep in mind:
- If the initial message you see when Python starts indicates that its version number begins with a 2, exit and try typing python3. If that doesn’t work, try including a version number (e.g., python3.1 or python3.2).
- If that doesn’t work, either you don’t have Python 3 installed or it’s not on the path used in your command-line environment. If you don’t know how to add it, find someone knowledgeable about the command-line environment in your operating system to help you resolve the problem.
- The way to exit Python follows each platform’s usual conventions: Ctrl-D on Unix-based systems, Ctrl-Z followed by Return on Windows variants. You can also type quit().
- In Unix and OS X shells, depending on how Python was installed, you may be able to edit the current line you are typing to Python and navigate back and forth in the history of inputs.^[9] After you’ve typed at least one line to Python, try Ctrl-P or the up arrow. If that changes the input to what you typed previously, the editing capability is functioning. You can use Ctrl-N or the down arrow to move to a later line in the input history. Following are some editing operations that work on the current line:
Ctrl-A
Go to the beginning of the line.
Ctrl-E
Go to the end of the line.
Ctrl-B or left arrow
Move one character to the left.
Ctrl-F or right arrow
Move one character to the right.
Backspace
Delete the preceding character.
Ctrl-D
Delete the next character.
Ctrl-K
Delete the rest of the line after the cursor.
Ctrl-Y
“Yank” the last killed text into the line at the location of the cursor.
Ctrl-_ (underscore)
Undo; can be repeated.
Ctrl-R
Search incrementally for a preceding input line.
Ctrl-S
Search incrementally for a subsequent input line.
Return
Give the current line to the interpreter.

Similar functionality may be available when Python is run in a Windows command window.

Traps

The value of a floor division (//) equals an integer but has the type int only if both operands were ints; otherwise, the value is a float that prints with a 0 after the decimal point.
An operation with a float operand may produce a result very slightly greater or very slightly less than its “true” mathematical value.
Remember that the first element of a string is at index 0 and the last at -1.
The index in a string indexing expression must be greater than or equal to the negative of the string’s length and less than the length of the string. (The restriction does not apply to slices.)
In a function call with more than one argument, every argument except the last must be followed by a comma. Usually omitting a comma will cause syntax errors, but in some situations you will accidentally end up with a syntactically correct expression that is not what you intended.
Omitting a right parenthesis that closes a function call’s argument list results in a syntax error message pointing to the line after the one containing the function call.
Function and method calls with no arguments must still be followed by (an empty pair of) parentheses. Failing to include them will not lead to a syntax error, because the value of the name of the function is the function itself—a legitimate value—but it will lead to very unexpected results and, often, runtime errors.
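The traps above can be reproduced in a few lines. This is a minimal sketch with illustrative values; the string 'gattaca' and the variable names are assumptions made for the example.

```python
# Floor division: the result is an int only if both operands are ints.
int_quotient = 7 // 2        # int
float_quotient = 7.0 // 2    # float with a 0 after the decimal point

# Float arithmetic can be very slightly off from the mathematical value.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)            # False on typical platforms
close_enough = abs(total - 0.3) < 1e-9    # compare with a small tolerance instead

# Index 0 is the first character and -1 the last; slices tolerate
# positions beyond the end of the string.
seq = 'gattaca'
first, last = seq[0], seq[-1]
tail = seq[4:100]

# A missing comma can leave a valid but different expression: adjacent
# string literals are joined into one string.
with_comma = max('gat', 'taca')      # larger of two strings
without_comma = max('gat' 'taca')    # largest character of 'gattaca'

# Without parentheses, the value is the method itself, not its result.
not_called = seq.upper
called = seq.upper()

print(int_quotient, float_quotient, total, first, last, tail)
print(with_comma, without_comma, called)
```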

Tracebacks

Representative error messages include:

NameError: name 'Non' is not defined: Python doesn’t recognize a name (more on this in the next chapter).
IndexError: string index out of range: For a string of length N, an index (i.e., the value between square brackets) must be in the range -N <= index < N.
SyntaxError: Python syntax violation.
ZeroDivisionError: /, //, or % with 0 as the second operand.
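The conditions behind IndexError and ZeroDivisionError can be confirmed with a short experiment. The sketch below relies on features this chapter has not introduced (def, try/except, loops, and the standard operator module), so treat it as a preview rather than something you are expected to write yet. The helper name index_is_valid and the string 'gatc' are hypothetical.

```python
import operator

def index_is_valid(s, i):
    """Return True if s[i] succeeds and False if it raises IndexError."""
    try:
        s[i]
    except IndexError:
        return False
    return True

seq = 'gatc'
n = len(seq)

# Try every index from -N-1 through N and keep the ones Python accepts.
valid = [i for i in range(-n - 1, n + 1) if index_is_valid(seq, i)]

# Each of /, //, and % fails when the second operand is 0.
failing = []
for symbol, operation in (('/', operator.truediv),
                          ('//', operator.floordiv),
                          ('%', operator.mod)):
    try:
        operation(1, 0)
    except ZeroDivisionError:
        failing.append(symbol)

print(valid)
print(failing)
```

The accepted indexes run from -N through N-1, matching the range given for IndexError above.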

^[5]Unicode characters occupy between one and four bytes each in memory, depending on several factors. See http://docs.python.org/3.1/howto/unicode.html for details (in particular, http://docs.python.org/3.1/howto/unicode.html#encodings). For general information about Unicode outside of Python, consult http://www.unicode.org/standard/WhatIsUnicode.html, http://www.unicode.org/standard/principles.html, and http://www.unicode.org/resources.

^[6]Data for these examples was obtained from the “Official REBASE Homepage” site. Files in formats used by various applications can be downloaded from http://rebase.neb.com/rebase/rebase.files.html.

^[7]A computer science field called “numerical analysis” provides techniques for managing the accumulation of such errors in complex or repetitive computations.

^[8]Command line is a term that refers to an interactive terminal-like window: a Unix shell, OS X Terminal window, or Windows Command window. The command line prompts for input and executes the commands you type.

^[9]If not, the Python you are using was built without the readline system (not Python) library. If you configured, compiled, and installed Python yourself, you probably know how to get the readline library, install it, and repeat the configure-compile-install process. If not, you will have no idea what any of this is about, and there probably isn’t anything you can do about it.


Bioinformatics Programming Using Python by Mitchell L Model
