Chapter 4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
Writing an implementation of the Fibonacci sequence is another step in the heroâs journey to becoming a coder. The Rosalind Fibonacci description notes that the genesis for the sequence was a mathematical simulation of breeding rabbits that relies on some important (and unrealistic) assumptions:
-
The first month starts with a pair of newborn rabbits.
-
Rabbits can reproduce after one month.
-
Every month, every rabbit of reproductive age mates with another rabbit of reproductive age.
-
Exactly one month after two rabbits mate, they produce a litter of the same size.
-
Rabbits are immortal and never stop mating.
The sequence always begins with the numbers 0 and 1. The subsequent numbers can be generated ad infinitum by adding the two immediately previous values in the list, as shown in Figure 4-1.
If you search the internet for solutions, youâll find dozens of different ways to generate the sequence. I want to focus on three fairly different approaches. The first solution uses an imperative approach where the algorithm strictly defines every step. The next solution uses a generator function, and the last will focus on a recursive solution. Recursion, while interesting, slows drastically as I try to generate more of the sequence, but it turns out the performance problems can be solved using caching.
You will learn:
-
How to manually validate arguments and throw errors
-
How to use a list as a stack
-
How to write a generator function
-
How to write a recursive function
-
Why recursive functions can be slow and how to fix this with memoization
-
How to use function decorators
Getting Started
The code and tests for this chapter are found in the 04_fib directory.
Start by copying the first solution to fib.py
:
$ cd 04_fib/ $ cp solution1_list.py fib.py
Ask for the usage to see how the parameters are defined.
You can use n
and k
, but I chose to use the names generations
and litter
:
$ ./fib.py -h usage: fib.py [-h] generations litter Calculate Fibonacci positional arguments: generations Number of generations litter Size of litter per generation optional arguments: -h, --help show this help message and exit
This will be the first program to accept arguments that are not strings. The Rosalind challenge indicates that the program should accept two positive integer values:
-
n
⤠40 representing the number of generations -
k
⤠5 representing the litter size produced by mate pairs
Try to pass noninteger values and notice how the program fails:
$ ./fib.py foo usage: fib.py [-h] generations litter fib.py: error: argument generations: invalid int value: 'foo'
You canât tell, but in addition to printing the brief usage and a helpful error message, the program also generated a nonzero exit value.
On the Unix command line, an exit value of 0
indicates success.
I think of this as âzero errors.â
In the bash
shell, I can inspect the $?
variable to look at the exit status of the most recent process.
For instance, the command echo Hello
should exit with a value of 0
, and indeed it does:
$ echo Hello Hello $ echo $? 0
Try the previously failing command again, and then inspect $?
:
$ ./fib.py foo usage: fib.py [-h] generations litter fib.py: error: argument generations: invalid int value: 'foo' $ echo $? 2
That the exit status is 2
is not as important as the fact that the value is not zero.
This is a well-behaved program because it rejects an invalid argument, prints a useful error message, and exits with a nonzero status.
If this program were part of a pipeline of data processing steps (such as a Makefile, discussed in Appendix A), a nonzero exit value would cause the entire process to stop, which is a good thing.
Programs that silently accept invalid values and fail quietly or not at all can lead to unreproducible results.
Itâs vitally important that programs properly validate arguments and fail very convincingly when they cannot proceed.
The program is very strict even about the type of number it accepts. The values must be integers. It will also repel any floating-point values:
$ ./fib.py 5 3.2 usage: fib.py [-h] generations litter fib.py: error: argument litter: invalid int value: '3.2'
All command-line arguments to the program are technically received as strings. Even though 5
on the command line looks like the number 5, itâs the character â5â. I am relying on argparse
in this situation to attempt to convert the value from a string to an integer. When that fails, argparse
generates these useful error messages.
Additionally, the program rejects values for the generations
and litter
parameters that are not in the allowed ranges.
Notice that the error message includes the name of the argument and the offending value to provide sufficient feedback to the user so you can fix it:
$ ./fib.py -3 2 usage: fib.py [-h] generations litter fib.py: error: generations "-3" must be between 1 and 40 $ ./fib.py 5 10 usage: fib.py [-h] generations litter fib.py: error: litter "10" must be between 1 and 5
The
generations
argument of-3
is not in the stated range of values.The
litter
argument of10
is too high.
Look at the first part of the solution to see how to make this work:
import argparse from typing import NamedTuple class Args(NamedTuple): """ Command-line arguments """ generations: int litter: int def get_args() -> Args: """ Get command-line arguments """ parser = argparse.ArgumentParser( description='Calculate Fibonacci', formatter_class=argparse.ArgumentDefaultsHelpFormatter) parser.add_argument('gen', metavar='generations', type=int, help='Number of generations') parser.add_argument('litter', metavar='litter', type=int, help='Size of litter per generation') args = parser.parse_args() if not 1 <= args.gen <= 40: parser.error(f'generations "{args.gen}" must be between 1 and 40') if not 1 <= args.litter <= 5: parser.error(f'litter "{args.litter}" must be between 1 and 5') return Args(generations=args.gen, litter=args.litter)
The
generations
field must be anint
.The
litter
field must also be anint
.The
gen
positional parameter is defined first, so it will receive the first positional value.The
type=int
indicates the required class of the value. Notice thatint
indicates the class itself, not the name of the class.The
litter
positional parameter is defined second, so it will receive the second positional value.Attempt to parse the arguments. Any failure will result in error messages and the program exiting with a nonzero value.
The
args.gen
value is now an actualint
value, so itâs possible to perform numeric comparisons on it. Check if it is in the acceptable range.Use the
parser.error()
function to generate an error and exit the program.Likewise check the value of the
args.litter
argument.Generate an error that includes information the user needs to fix the problem.
If the program makes it to this point, then the arguments are valid integer values in the accepted range, so return the
Args
.
I could check that the generations
and litter
values are in the correct ranges in the main()
function, but I prefer to do as much argument validation as possible inside the get_args()
function so that I can use the parser.error()
function to generate useful messages and exit the program with a nonzero value.
Remove the fib.py
program and start anew with new.py
or your preferred method for creating a program:
$ new.py -fp 'Calculate Fibonacci' fib.py Done, see new script "fib.py".
You can replace the get_args()
definition with the preceding code, then modify your main()
function like so:
def main() -> None: args = get_args() print(f'generations = {args.generations}') print(f'litter = {args.litter}')
Run your program with invalid inputs and verify that you see the kinds of error messages shown earlier. Try your program with acceptable values and verify that you see this kind of output:
$ ./fib.py 1 2 generations = 1 litter = 2
Run pytest
to see what your program passes and fails.
You should pass the first four tests and fail the fifth:
$ pytest -xv ========================== test session starts ========================== ... tests/fib_test.py::test_exists PASSED [ 14%] tests/fib_test.py::test_usage PASSED [ 28%] tests/fib_test.py::test_bad_generations PASSED [ 42%] tests/fib_test.py::test_bad_litter PASSED [ 57%] tests/fib_test.py::test_1 FAILED [ 71%] =============================== FAILURES ================================ ________________________________ test_1 _________________________________ def test_1(): """runs on good input""" rv, out = getstatusoutput(f'{RUN} 5 3') assert rv == 0 > assert out == '19' E AssertionError: assert 'generations = 5\nlitter = 3' == '19' E - 19 E + generations = 5 E + litter = 3 tests/fib_test.py:60: AssertionError ======================== short test summary info ======================== FAILED tests/fib_test.py::test_1 - AssertionError: assert 'generations... !!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!! ====================== 1 failed, 4 passed in 0.38s ======================
The first failing test. Testing halts here because of the
-x
flag.The program is run with
5
for the number of generations and3
for the litter size.The output should be
19
.This shows the two strings being compared are not equal.
The expected value was
19
.This is the output that was received.
The output from pytest
is trying very hard to point out exactly what went wrong.
It shows how the program was run and what was expected versus what was produced.
The program is supposed to print 19, which is the fifth number of the Fibonacci sequence when using a litter size of 3.
If you want to finish the program on your own, please jump right in.
You should use pytest
to verify that you are passing all the tests.
Also, run make test
to check your program using pylint
, flake8
, and mypy
.
If you want some guidance, Iâll cover the first approach I described.
An Imperative Approach
Figure 4-2 depicts the growth of the Fibonacci sequence. The smaller rabbits indicate nonbreeding pairs that must mature into larger, breeding pairs.
You can see that to generate any number after the first two, I need to know the two previous numbers. I can use this formula to describe the value of any position n of the Fibonacci sequence (F):
What kind of a data structure in Python would allow me to keep a sequence of numbers in order and refer to them by their position? A list. Iâll start off with F1 = 0 and F2 = 1:
>>> fib = [0, 1]
The F3 value is F2 + F1 = 1 + 0 = 1.
When generating the next number, Iâll always be referencing the last two elements of the sequence.
It will be easiest to use negative indexing to indicate a position from the end of the list.
The last value in a list is always at position -1
:
>>> fib[-1] 1
The penultimate value is at -2
:
>>> fib[-2] 0
I need to multiply this value by the litter size to calculate the number of offspring that generation created. To start, Iâll consider a litter size of 1:
>>> litter = 1 >>> fib[-2] * litter 0
I want to add these two numbers together and append the result to the list:
>>> fib.append((fib[-2] * litter) + fib[-1]) >>> fib [0, 1, 1]
If I do this again, I can see that the correct sequence is emerging:
>>> fib.append((fib[-2] * litter) + fib[-1]) >>> fib [0, 1, 1, 2]
I need to repeat this action generations
times.
(Technically it will be generations
-1 times because Python uses 0-based indexing.)
I can use Pythonâs range()
function to generate a list of numbers from 0
up to but not including the end value.
Iâm calling this function solely for the side effect of iterating a particular number of times and so donât need the values produced by the range()
function.
Itâs common to use the underscore (_
) variable to indicate oneâs intent to ignore a value:
>>> fib = [0, 1] >>> litter = 1 >>> generations = 5 >>> for _ in range(generations - 1): ... fib.append((fib[-2] * litter) + fib[-1]) ... >>> fib [0, 1, 1, 2, 3, 5]
This should be enough for you to create a solution that passes the tests. In the next section, Iâll cover two other solutions that highlight some very interesting parts of Python.
Solutions
All the following solutions share the same get_args()
shown previously.
Solution 1: An Imperative Solution Using a List as a Stack
Here is how I wrote my imperative solution. Iâm using a list as a kind of stack to keep track of past values. I donât need all the values, just the last two, but itâs pretty easy to keep growing the list and referring to the last two values:
def main() -> None: args = get_args() fib = [0, 1] for _ in range(args.generations - 1): fib.append((fib[-2] * args.litter) + fib[-1]) print(fib[-1])
Start with
0
and1
.Use the
range()
function to create the right number of loops.Append the next value to the sequence.
Print the last number of the sequence.
I used the _
variable name in the for
loop to indicate that I donât intend to use the variable. The underscore is a valid Python identifier, and itâs also a convention to use this to indicate a throwaway value. Linting tools, for instance, might see that Iâve assigned a variable some value but never used it, which would normally indicate a possible error. The underscore variable shows that I do not intend to use the value. In this case, Iâm using the range()
function purely for the side effect of creating the number of loops needed.
This is considered an imperative solution because the code directly encodes every instruction of the algorithm. When you read the recursive solution, you will see that the algorithm can be written in a more declarative manner, which also has unintended consequences that I must handle.
A slight variation on this would be to place this code inside a function Iâll call fib()
.
Note that itâs possible in Python to declare a function inside another function, as here Iâll create fib()
inside main()
.
I do this so I can reference the args.litter
parameter, creating a closure because the function is capturing the runtime value of the litter size:
def main() -> None: args = get_args() def fib(n: int) -> int: nums = [0, 1] for _ in range(n - 1): nums.append((nums[-2] * args.litter) + nums[-1]) return nums[-1] print(fib(args.generations))
Create a function called
fib()
that accepts an integer parametern
and returns an integer.This is the same code as before. Note this list is called
nums
so it doesnât clash with the function name.Use the
range()
function to iterate the generations. Use_
to ignore the values.The function references the
args.litter
parameter and so creates a closure.Use
return
to send the final value back to the caller.Call the
fib()
function with theargs.generations
parameter.
The scope of the fib()
function in the preceding example is limited to the main()
function.
Scope refers to the part of the program where a particular function name or variable is visible or legal.
I donât have to use a closure. Here is how I can express the same idea with a standard function:
def main() -> None: args = get_args() print(fib(args.generations, args.litter)) def fib(n: int, litter: int) -> int: nums = [0, 1] for _ in range(n - 1): nums.append((nums[-2] * litter) + nums[-1]) return nums[-1]
The
fib()
function must be called with two arguments.The function requires both the number of generations and the litter size. The function body is essentially the same.
In the preceding code, you see that I must pass two arguments to fib()
, whereas the closure required only one argument because the litter
was captured.
Binding values and reducing the number of parameters is a valid reason for creating a closure.
Another reason to write a closure is to limit the scope of a function.
The closure definition of fib()
is valid only inside the main()
function, but the preceding version is visible throughout the program.
Hiding a function inside another function makes it harder to test.
In this case, the fib()
function is almost the entire program, so the tests have already been written in tests/fib_test.py.
Solution 2: Creating a Generator Function
In the previous solution, I generated the Fibonacci sequence up to the value requested and then stopped; however, the sequence is infinite. Could I create a function that could generate all the numbers of the sequence? Technically, yes, but it would never finish, what with being infinite and all.
Python has a way to suspend a function that generates a possibly infinite sequence.
I can use yield
to return a value from a function, temporarily leaving the function later to resume at the same state when the next value is requested.
This kind of function is called a generator, and here is how I can use it to generate the sequence:
def fib(k: int) -> Generator[int, None, None]: x, y = 0, 1 yield x while True: yield y x, y = y * k, x + y
The type signature indicates the function takes the parameter
k
(litter size), which must be anint
. It returns a special function of the typeGenerator
which yieldsint
values and has no send or return types.I only ever need to track the last two generations, which I initialize to
0
and1
.Yield the
0
.Create an infinite loop.
Yield the last generation.
Set
x
(two generations back) to the current generation times the litter size. Sety
(one generation back) to the sum of the two current generations.
A generator acts like an iterator, producing values as requested by the code until it is exhausted.
Since this generator will only generate yield values, the send and return types are None
.
Otherwise, this code does exactly what the first version of the program did, only inside a fancy-pants generator function.
See Figure 4-3 to consider how the function works for two different litter sizes.
The type signature for Generator
looks a little complicated since it defines types for yield, send, and return.
I donât need to dive into it further here, but I recommend you read the docs on the typing
module.
Hereâs how to use this:
def main() -> None: args = get_args() gen = fib(args.litter) seq = [next(gen) for _ in range(args.generations + 1)] print(seq[-1])
The
fib()
function takes the litter size as an argument and returns a generator.Use the
next()
function to retrieve the next value from a generator. Use a list comprehension to do this the correct number of times to generate the sequence up to the requested value.Print the last number in the sequence.
The range()
is function different because the first version already had the 0
and 1
in place. Here I have to call the generator two extra times to produce those values.
Although I prefer the list comprehension, I donât need the entire list. I only care about the final value, so I could have written it like so:
def main() -> None: args = get_args() gen = fib(args.litter) answer = 0 for _ in range(args.generations + 1): answer = next(gen) print(answer)
Initialize the answer to
0
.Create the correct number of loops.
Get the value for the current generation.
Print the answer.
As it happens, itâs quite common to call a function repeatedly to generate a list, so there is a function to do this for us.
The itertools.islice()
function will âMake an iterator that returns selected elements from the iterable.â
Here is how I can use it:
def main() -> None: args = get_args() seq = list(islice(fib(args.litter), args.generations + 1)) print(seq[-1])
The first argument to
islice()
is the function that will be called, and the second argument is the number of times to call it. The function is lazy, so I uselist()
to coerce the values.Print the last value.
Since I only use the seq
variable one time, I could eschew that assignment.
If benchmarking proved the following to be the best-performing version, I might be willing to write a one-liner:
def main() -> None: args = get_args() print(list(islice(fib(args.litter), args.generations + 1))[-1])
Clever code is fun but can become unreadable.1 You have been warned.
Generators are cool but more complex than generating a list. They are the appropriate way to generate a very large or potentially infinite sequence of values because they are lazy, only computing the next value when your code requires it.
Solution 3: Using Recursion and Memoization
While there are many more fun ways to write an algorithm to produce an infinite series of numbers, Iâll show just one more using recursion, which is when a function calls itself:
def main() -> None: args = get_args() def fib(n: int) -> int: return 1 if n in (1, 2) \ else fib(n - 2) * args.litter + fib(n - 1) print(fib(args.generations))
Define a function called
fib()
that takes the number of the generation wanted as anint
and returns anint
.If the generation is
1
or2
, return1
. This is the all-important base case that does not make a recursive call.For all other cases, call the
fib()
function twice, once for two generations back and another for the previous generation. Factor in the litter size as before.Print the results of the
fib()
function for the given generations.
Hereâs another instance where I define a fib()
function as a closure inside the main()
function so as to use the args.litter
value inside the fib()
function. This is to close around args.litter
, effectively binding that value to the function. If I had defined the function outside the main()
function, I would have had to pass the args.litter
argument on the recursive calls.
This is a really elegant solution that gets taught in pretty much every introductory computer science class.
Itâs fun to study, but it turns out to be wicked slow because I end up calling the function so many times.
That is, fib(5)
needs to call fib(4)
and fib(3)
to add those values.
In turn, fib(4)
needs to call fib(3)
and fib(2)
, and so on.
Figure 4-4 shows that fib(5)
results in 14 function calls to produce 5 distinct values.
For instance, fib(2)
is calculated three times, but we only need to calculate it once.
To illustrate the problem, Iâll take a sampling of how long this program takes to finish up to the maximum n
of 40.
Again, Iâll use a for
loop in bash
to show you how I would commonly benchmark such a program on the command line:
$ for n in 10 20 30 40; > do echo "==> $n <==" && time ./solution3_recursion.py $n 1 > done ==> 10 <== 55 real 0m0.045s user 0m0.032s sys 0m0.011s ==> 20 <== 6765 real 0m0.041s user 0m0.031s sys 0m0.009s ==> 30 <== 832040 real 0m0.292s user 0m0.281s sys 0m0.009s ==> 40 <== 102334155 real 0m31.629s user 0m31.505s sys 0m0.043s
The jump from 0.29s for n
=30
to 31s for n
=40
is huge.
Imagine going to 50 and beyond.
I need to either find a way to speed this up or abandon all hope for recursion.
The solution is to cache previously calculated results.
This is called memoization, and there are many ways to implement this.
The following is one method.
Note you will need to import typing.Callable
:
def memoize(f: Callable) -> Callable: """ Memoize a function """ cache = {} def memo(x): if x not in cache: cache[x] = f(x) return cache[x] return memo
Define a function that takes a function (something that is callable) and returns a function.
Use a dictionary to store cached values.
Define
memo()
as a closure around the cache. The function will take some parameterx
when called.See if the argument value is in the cache.
If not, call the function with the argument and set the cache for that argument value to the result.
Return the cached value for the argument.
Return the new function.
Note that the memoize()
function returns a new function.
In Python, functions are considered first-class objects, meaning they can be used like other kinds of variablesâyou can pass them as arguments and overwrite their definitions.
The memoize()
function is an example of a higher-order function (HOF) because it takes other functions as arguments.
Iâll be using other HOFs, like filter()
and map()
, throughout the book.
To use the memoize()
function, I will define fib()
and then redefine it with the memoized version.
If you run this, you will see an almost instantaneous result no matter how high n
goes:
def main() -> None: args = get_args() def fib(n: int) -> int: return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1) fib = memoize(fib) print(fib(args.generations))
A preferred method to accomplish this uses a decorator, which is a function that modifies another function:
def main() -> None: args = get_args() @memoize def fib(n: int) -> int: return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1) print(fib(args.generations))
As fun as writing memoization functions is, it again turns out that this is such a common need that others have already solved it for us.
I can remove the memoize()
function and instead import the functools.lru_cache
(least-recently-used cache) function:
from functools import lru_cache
Decorate the fib()
function with the lru_cache()
function to get memoization with minimal distraction:
def main() -> None: args = get_args() @lru_cache() def fib(n: int) -> int: return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1) print(fib(args.generations))
Benchmarking the Solutions
Which is the fastest solution?
Iâve shown you how to use a for
loop in bash
with the time
command to compare the runtimes of commands:
$ for py in ./solution1_list.py ./solution2_generator_islice.py \ ./solution3_recursion_lru_cache.py; do echo $py && time $py 40 5; done ./solution1_list.py 148277527396903091 real 0m0.070s user 0m0.043s sys 0m0.016s ./solution2_generator_islice.py 148277527396903091 real 0m0.049s user 0m0.033s sys 0m0.013s ./solution3_recursion_lru_cache.py 148277527396903091 real 0m0.041s user 0m0.030s sys 0m0.010s
It would appear that the recursive solution using LRU caching is the fastest, but again I have very little dataâjust one run per program. Also, I have to eyeball this data and figure out which is the fastest.
Thereâs a better way.
I have installed a tool called hyperfine
to run each command many times and compare the results:
$ hyperfine -L prg ./solution1_list.py,./solution2_generator_islice.py,\ ./solution3_recursion_lru_cache.py '{prg} 40 5' --prepare 'rm -rf __pycache__' Benchmark #1: ./solution1_list.py 40 5 Time (mean ± Ï): 38.1 ms ± 1.1 ms [User: 28.3 ms, System: 8.2 ms] Range (min ⦠max): 36.6 ms ⦠42.8 ms 60 runs Benchmark #2: ./solution2_generator_islice.py 40 5 Time (mean ± Ï): 38.0 ms ± 0.6 ms [User: 28.2 ms, System: 8.1 ms] Range (min ⦠max): 36.7 ms ⦠39.2 ms 66 runs Benchmark #3: ./solution3_recursion_lru_cache.py 40 5 Time (mean ± Ï): 37.9 ms ± 0.6 ms [User: 28.1 ms, System: 8.1 ms] Range (min ⦠max): 36.6 ms ⦠39.4 ms 65 runs Summary './solution3_recursion_lru_cache.py 40 5' ran 1.00 ± 0.02 times faster than './solution2_generator_islice.py 40 5' 1.01 ± 0.03 times faster than './solution1_list.py 40 5'
It appears that hyperfine
ran each command 60-66 times, averaged the results, and found that the solution3_recursion_lru_cache.py
program is perhaps slightly faster.
Another benchmarking tool you might find useful is bench
, but you can search for other benchmarking tools on the internet that might suit your tastes more.
Whatever tool you use, benchmarking along with testing is vital to challenging assumptions about your code.
I used the --prepare
option to tell hyperfine
to remove the pycache directory before running the commands. This is a directory created by Python to cache bytecode of the program. If a programâs source code hasnât changed since the last time it was run, then Python can skip compilation and use the bytecode version that exists in the pycache directory. I needed to remove this as hyperfine
detected statistical outliers when running the commands, probably due to caching effects.
Testing the Good, the Bad, and the Ugly
For every challenge, I hope you spend part of your time reading through the tests.
Learning how to design and write tests is as important as anything else Iâm showing you.
As I mentioned before, my first tests check that the expected program exists and will produce usage statements on request.
After that, I usually give invalid inputs to ensure the program fails.
I would like to highlight the tests for bad n
and k
parameters.
They are essentially the same, so Iâll just show the first one as it demonstrates how to randomly select an invalid integer valueâone that is possibly negative or too high:
def test_bad_n(): """ Dies when n is bad """ n = random.choice(list(range(-10, 0)) + list(range(41, 50))) k = random.randint(1, 5) rv, out = getstatusoutput(f'{RUN} {n} {k}') assert rv != 0 assert out.lower().startswith('usage:') assert re.search(f'n "{n}" must be between 1 and 40', out)
Join the two lists of invalid number ranges and randomly select one value.
Select a random integer in the range from
1
to5
(both bounds inclusive).Run the program with the arguments and capture the output.
Make sure the program reported a failure (nonzero exit value).
Check the output begins with a usage statement.
Look for an error message describing the problem with the
n
argument.
I often like to use randomly selected invalid values when testing.
This partially comes from writing tests for students so that they wonât write programs that fail on a single bad input, but I also find it helps me to not accidentally code for a specific input value.
I havenât yet covered the random
module, but it gives you a way to make pseudorandom choices.
First, you need to import the module:
>>> import random
For instance, you can use random.randint()
to select a single integer from a given range:
>>> random.randint(1, 5) 2 >>> random.randint(1, 5) 5
Or use the random.choice()
function to randomly select a single value from some sequence.
Here I wanted to construct a discontiguous range of negative numbers separated from a range of positive numbers:
>>> random.choice(list(range(-10, 0)) + list(range(41, 50))) 46 >>> random.choice(list(range(-10, 0)) + list(range(41, 50))) -1
The tests that follow all provide good inputs to the program. For example:
def test_2(): """ Runs on good input """ rv, out = getstatusoutput(f'{RUN} 30 4') assert rv == 0 assert out == '436390025825'
These are values I was given while attempting to solve the Rosalind challenge.
The program should not fail on this input.
This is the correct answer per Rosalind.
Testing, like documentation, is a love letter to your future self. As tedious as testing may seem, youâll appreciate failing tests when you try to add a feature and end up accidentally breaking something that previously worked. Assiduously writing and running tests can prevent you from deploying broken programs.
Running the Test Suite on All the Solutions
Youâve seen that in each chapter I write multiple solutions to explore various ways to solve the problems.
I completely rely on my tests to ensure my programs are correct.
You might be curious to see how Iâve automated the process of testing every single solution.
Look at the Makefile and find the all
target:
$ cat Makefile .PHONY: test test: python3 -m pytest -xv --flake8 --pylint --mypy fib.py tests/fib_test.py all: ../bin/all_test.py fib.py
The all_test.py
program will overwrite the fib.py
program with each of the solutions before running the test suite. This could overwrite your solution. Be sure you commit your version to Git or at least make a copy before you run make all
or you could lose your work.
The following is the all_test.py
program that is run by the all
target.
Iâll break it into two parts, starting with the first part up to get_args()
.
Most of this should be familiar by now:
#!/usr/bin/env python3 """ Run the test suite on all solution*.py """ import argparse import os import re import shutil import sys from subprocess import getstatusoutput from functools import partial from typing import NamedTuple class Args(NamedTuple): """ Command-line arguments """ program: str quiet: bool # -------------------------------------------------- def get_args() -> Args: """ Get command-line arguments """ parser = argparse.ArgumentParser( description='Run the test suite on all solution*.py', formatter_class=argparse.ArgumentDefaultsHelpFormatter) parser.add_argument('program', metavar='prg', help='Program to test') parser.add_argument('-q', '--quiet', action='store_true', help='Be quiet') args = parser.parse_args() return Args(args.program, args.quiet)
The name of the program to test, which in this case is
fib.py
.A Boolean value of
True
orFalse
to create more or less output.The default type is
str
.The
action='store_true'
makes this a Boolean flag. If the flag is present the value will beTrue
, and it will beFalse
otherwise.
The main()
function is where the testing happens:
def main() -> None: args = get_args() cwd = os.getcwd() solutions = list( filter(partial(re.match, r'solution.*\.py'), os.listdir(cwd))) for solution in sorted(solutions): print(f'==> {solution} <==') shutil.copyfile(solution, os.path.join(cwd, args.program)) subprocess.run(['chmod', '+x', args.program], check=True) rv, out = getstatusoutput('make test') if rv != 0: sys.exit(out) if not args.quiet: print(out) print('Done.')
Get the current working directory, which will be the 04_fib directory if you are in that directory when running the command.
Find all the
solution*.py
files in the current directory.Both
filter()
andpartial()
are HOFs; Iâll explain them next.The filenames will be in random order, so iterate through the sorted files.
Copy the
solution*.py
file to the testing filename.Make the program executable.
Run the
make test
command, and capture the return value and output.See if the return value is not
0
.Exit this program while printing the output from the testing and returning a nonzero value.
Unless the program is supposed to be quiet, print the testing output.
Let the user know the program finishes normally.
In the preceding code, Iâm using sys.exit()
to immediately halt the program, print an error message, and return a nonzero exit value.
If you consult the documentation, youâll find you can invoke sys.exit()
with no arguments, an integer value, or a object like a string, which is what Iâm using:
exit(status=None, /) Exit the interpreter by raising SystemExit(status). If the status is omitted or None, it defaults to zero (i.e., success). If the status is an integer, it will be used as the system exit status. If it is another kind of object, it will be printed and the system exit status will be one (i.e., failure).
The preceding program also uses the functions filter()
or partial()
, which I havenât covered yet.
Both of these are HOFs.
Iâll explain how and why Iâm using them.
To start, the os.listdir()
function will return the entire contents of a directory, including files and directories:
>>> import os >>> files = os.listdir()
Thereâs a lot there, so Iâll import the pprint()
function from the pprint
module to pretty-print this:
>>> from pprint import pprint >>> pprint(files) ['solution3_recursion_memoize_decorator.py', 'solution2_generator_for_loop.py', '.pytest_cache', 'Makefile', 'solution2_generator_islice.py', 'tests', '__pycache__', 'fib.py', 'README.md', 'solution3_recursion_memoize.py', 'bench.html', 'solution2_generator.py', '.mypy_cache', '.gitignore', 'solution1_list.py', 'solution3_recursion_lru_cache.py', 'solution3_recursion.py']
I want to filter those to names that start with solution and end with .py.
On the command line, I would match this pattern using a file glob like solution*.py
, where the *
means zero or more of any character and the .
is a literal dot.
A regular expression version of this pattern is the slightly more complicated solution.*\.py
, where .
(dot) is a regex metacharacter representing any character, and *
(star or asterisk) means zero or more (see Figure 4-5).
To indicate a literal dot, I need to escape it with a backslash (\.
).
Note that itâs prudent to use the r-string (raw string) to enclose this pattern.
When the match is successful, a re.Match
object is returned:
>>> import re >>> re.match(r'solution.*\.py', 'solution1.py') <re.Match object; span=(0, 12), match='solution1.py'>
When a match fails, the None
value is returned.
I have to use type()
here because the None
value is not displayed in the REPL:
>>> type(re.match(r'solution.*\.py', 'fib.py')) <class 'NoneType'>
I want to apply this match to all the files returned by os.listdir()
.
I can use filter()
and the lambda
keyword to create an anonymous function.
Each filename in files
is passed as the name
argument used in the match.
The filter()
function will only return elements that return a truthy value from the given function, so those filenames that return None
when they fail to match are excluded:
>>> pprint(list(filter(lambda name: re.match(r'solution.*\.py', name), files))) ['solution3_recursion_memoize_decorator.py', 'solution2_generator_for_loop.py', 'solution2_generator_islice.py', 'solution3_recursion_memoize.py', 'solution2_generator.py', 'solution1_list.py', 'solution3_recursion_lru_cache.py', 'solution3_recursion.py']
You see that the re.match()
function takes two argumentsâa pattern and a string to match.
The partial()
function allows me to partially apply the function, and the result is a new function.
For example, the operator.add()
function expects two values and returns their sum:
>>> import operator >>> operator.add(1, 2) 3
I can create a function that adds 1
to any value, like so:
>>> from functools import partial >>> succ = partial(op.add, 1)
The succ()
function requires one argument, and will return the successor:
>>> succ(3) 4 >>> succ(succ(3)) 5
Likewise, I can create a function f()
that partially applies the re.match()
function with its first argument, a regular expression pattern:
>>> f = partial(re.match, r'solution.*\.py')
The f()
function is waiting for a string to apply the match:
>>> type(f('solution1.py')) <class 're.Match'> >>> type(f('fib.py')) <class 'NoneType'>
If you call it without an argument, you will get an exception:
>>> f() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: match() missing 1 required positional argument: 'string'
I can replace the lambda
with the partially applied function as the first argument to filter()
:
>>> pprint(list(filter(f, files))) ['solution3_recursion_memoize_decorator.py', 'solution2_generator_for_loop.py', 'solution2_generator_islice.py', 'solution3_recursion_memoize.py', 'solution2_generator.py', 'solution1_list.py', 'solution3_recursion_lru_cache.py', 'solution3_recursion.py']
My programming style leans heavily on purely functional programming ideas. I find this style to be like playing with LEGO bricksâsmall, well-defined, and tested functions can be composed into larger programs that work well.
Going Further
There are many different styles of programming, such as procedural, functional, object-oriented, and so forth.
Even within an object-oriented language like Python, I can use very different approaches to writing code.
The first solution could be considered a dynamic programming approach because you try to solve the larger problem by first solving smaller problems.
If you find recursive functions interesting, the Tower of Hanoi problem is another classic exercise.
Purely functional languages like Haskell mostly avoid constructs like for
loops and rely heavily on recursion and higher-order functions.
Both spoken and programming languages shape the way we think, and I encourage you to try solving this problem using other languages you know to see how you might write different solutions.
Review
Key points from this chapter:
-
Inside the
get_args()
function, you can perform manual validation of arguments and use theparser.error()
function to manually generateargparse
errors. -
You can use a list as a stack by pushing and popping elements.
-
Using
yield
in a function turns it into a generator. When the function yields a value, the value is returned and the state of the function is preserved until the next value is requested. Generators can be used to create a potentially infinite stream of values. -
A recursive function calls itself, and the recursion can cause serious performance issues. One solution is to use memoization to cache values and avoid recomputation.
-
Higher-order functions are functions that take other functions as arguments.
-
Pythonâs function decorators apply HOFs to other functions.
-
Benchmarking is an important technique for determining the best-performing algorithm. The
hyperfine
andbench
tools allow you to compare runtimes of commands over many iterations. -
The
random
module offers many functions for the pseudorandom selection of values.
1 As the legendary David St. Hubbins and Nigel Tufnel observed, âItâs such a fine line between stupid and clever.â
Get Mastering Python for Bioinformatics now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.