# Chapter 4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms

Writing an implementation of the Fibonacci sequence is another step in the hero's journey to becoming a coder. The Rosalind Fibonacci description notes that the genesis for the sequence was a mathematical simulation of breeding rabbits that relies on some important (and unrealistic) assumptions:

• The first month starts with a pair of newborn rabbits.

• Rabbits can reproduce after one month.

• Every month, every rabbit of reproductive age mates with another rabbit of reproductive age.

• Exactly one month after two rabbits mate, they produce a litter of the same size.

• Rabbits are immortal and never stop mating.

The sequence always begins with the numbers 0 and 1. The subsequent numbers can be generated ad infinitum by adding the two immediately previous values in the list, as shown in Figure 4-1.
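To make the rule concrete, here is a minimal sketch (not one of the chapter's solutions) that builds the classic sequence by summing the last two values:

```python
# A minimal sketch of the rule: each new value is the sum
# of the two values immediately before it
fib = [0, 1]
for _ in range(8):
    fib.append(fib[-1] + fib[-2])

print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```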

If you search the internet for solutions, you'll find dozens of different ways to generate the sequence. I want to focus on three fairly different approaches. The first solution uses an imperative approach where the algorithm strictly defines every step. The next solution uses a generator function, and the last will focus on a recursive solution. Recursion, while interesting, slows drastically as I try to generate more of the sequence, but it turns out the performance problems can be solved using caching.

You will learn:

• How to manually validate arguments and throw errors

• How to use a list as a stack

• How to write a generator function

• How to write a recursive function

• Why recursive functions can be slow and how to fix this with memoization

• How to use function decorators

# Getting Started

The code and tests for this chapter are found in the 04_fib directory. Start by copying the first solution to `fib.py`:

```\$ cd 04_fib/
\$ cp solution1_list.py fib.py```

Ask for the usage to see how the parameters are defined. You can use `n` and `k`, but I chose to use the names `generations` and `litter`:

```\$ ./fib.py -h
usage: fib.py [-h] generations litter

Calculate Fibonacci

positional arguments:
  generations  Number of generations
  litter       Size of litter per generation

optional arguments:
  -h, --help   show this help message and exit```

This will be the first program to accept arguments that are not strings. The Rosalind challenge indicates that the program should accept two positive integer values:

• `n` ≤ 40 representing the number of generations

• `k` ≤ 5 representing the litter size produced by mate pairs

Try to pass noninteger values and notice how the program fails:

```\$ ./fib.py foo
usage: fib.py [-h] generations litter
fib.py: error: argument generations: invalid int value: 'foo'```

You canât tell, but in addition to printing the brief usage and a helpful error message, the program also generated a nonzero exit value. On the Unix command line, an exit value of `0` indicates success. I think of this as âzero errors.â In the `bash` shell, I can inspect the `\$?` variable to look at the exit status of the most recent process. For instance, the command `echo Hello` should exit with a value of `0`, and indeed it does:

```\$ echo Hello
Hello
\$ echo \$?
0```

Try the previously failing command again, and then inspect `\$?`:

```\$ ./fib.py foo
usage: fib.py [-h] generations litter
fib.py: error: argument generations: invalid int value: 'foo'
\$ echo \$?
2```

That the exit status is `2` is not as important as the fact that the value is not zero. This is a well-behaved program because it rejects an invalid argument, prints a useful error message, and exits with a nonzero status. If this program were part of a pipeline of data processing steps (such as a Makefile, discussed in Appendix A), a nonzero exit value would cause the entire process to stop, which is a good thing. Programs that silently accept invalid values and fail quietly or not at all can lead to unreproducible results. It's vitally important that programs properly validate arguments and fail very convincingly when they cannot proceed.
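If you write validation by hand, the same well-behaved failure can be produced with `sys.exit()`: passing it a string prints the message to STDERR and exits with the nonzero value `1`. This is a sketch of the idea, not the chapter's code, which relies on `parser.error()` instead:

```python
import sys

def validate(n: int) -> None:
    """ Exit with an error message if n is out of range """
    if not 1 <= n <= 40:
        # sys.exit() with a string argument prints the message
        # to STDERR and exits with the nonzero value 1
        sys.exit(f'n "{n}" must be between 1 and 40')
```

The `parser.error()` function does essentially this, printing the usage plus the message and exiting with the value `2`.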

The program is very strict even about the type of number it accepts. The values must be integers, so it will also reject any floating-point values:

```\$ ./fib.py 5 3.2
usage: fib.py [-h] generations litter
fib.py: error: argument litter: invalid int value: '3.2'```

All command-line arguments to the program are technically received as strings. Even though `5` on the command line looks like the number 5, it's the character '5'. I am relying on `argparse` in this situation to attempt to convert the value from a string to an integer. When that fails, `argparse` generates these useful error messages.
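The conversion that `argparse` attempts with `type=int` is essentially Python's `int()` constructor, which sketches why '3.2' is rejected:

```python
# int() happily parses a string of digits...
print(int('5'))       # 5

# ...but raises ValueError on anything else, which argparse
# turns into the "invalid int value" error message
try:
    int('3.2')
except ValueError as err:
    print(err)        # invalid literal for int() with base 10: '3.2'
```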

Additionally, the program rejects values for the `generations` and `litter` parameters that are not in the allowed ranges. Notice that the error message includes the name of the argument and the offending value to provide sufficient feedback so the user can fix the problem:

```\$ ./fib.py -3 2
usage: fib.py [-h] generations litter
fib.py: error: generations "-3" must be between 1 and 40
\$ ./fib.py 5 10
usage: fib.py [-h] generations litter
fib.py: error: litter "10" must be between 1 and 5 ```

The `generations` argument of `-3` is not in the stated range of values.

The `litter` argument of `10` is too high.

Look at the first part of the solution to see how to make this work:

```import argparse
from typing import NamedTuple


class Args(NamedTuple):
    """ Command-line arguments """
    generations: int
    litter: int


def get_args() -> Args:
    """ Get command-line arguments """

    parser = argparse.ArgumentParser(
        description='Calculate Fibonacci',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('gen',
                        metavar='generations',
                        type=int,
                        help='Number of generations')

    parser.add_argument('litter',
                        metavar='litter',
                        type=int,
                        help='Size of litter per generation')

    args = parser.parse_args()

    if not 1 <= args.gen <= 40:
        parser.error(f'generations "{args.gen}" must be between 1 and 40')

    if not 1 <= args.litter <= 5:
        parser.error(f'litter "{args.litter}" must be between 1 and 5')

    return Args(generations=args.gen, litter=args.litter) ```

The `generations` field must be an `int`.

The `litter` field must also be an `int`.

The `gen` positional parameter is defined first, so it will receive the first positional value.

The `type=int` indicates the required class of the value. Notice that `int` indicates the class itself, not the name of the class.

The `litter` positional parameter is defined second, so it will receive the second positional value.

Attempt to parse the arguments. Any failure will result in error messages and the program exiting with a nonzero value.

The `args.gen` value is now an actual `int` value, so itâs possible to perform numeric comparisons on it. Check if it is in the acceptable range.

Use the `parser.error()` function to generate an error and exit the program.

Likewise check the value of the `args.litter` argument.

Generate an error that includes information the user needs to fix the problem.

If the program makes it to this point, then the arguments are valid integer values in the accepted range, so return the `Args`.

I could check that the `generations` and `litter` values are in the correct ranges in the `main()` function, but I prefer to do as much argument validation as possible inside the `get_args()` function so that I can use the `parser.error()` function to generate useful messages and exit the program with a nonzero value.

Remove the `fib.py` program and start anew with `new.py` or your preferred method for creating a program:

```\$ new.py -fp 'Calculate Fibonacci' fib.py
Done, see new script "fib.py".```

You can replace the `get_args()` definition with the preceding code, then modify your `main()` function like so:

```def main() -> None:
    args = get_args()
    print(f'generations = {args.generations}')
    print(f'litter = {args.litter}')```

Run your program with invalid inputs and verify that you see the kinds of error messages shown earlier. Try your program with acceptable values and verify that you see this kind of output:

```\$ ./fib.py 1 2
generations = 1
litter = 2```

Run `pytest` to see what your program passes and fails. You should pass the first four tests and fail the fifth:

```\$ pytest -xv
========================== test session starts ==========================
...
tests/fib_test.py::test_exists PASSED                             [ 14%]
tests/fib_test.py::test_usage PASSED                              [ 28%]
tests/fib_test.py::test_bad_n PASSED                              [ 42%]
tests/fib_test.py::test_bad_k PASSED                              [ 57%]
tests/fib_test.py::test_1 FAILED                                  [ 71%]

=============================== FAILURES ================================
________________________________ test_1 _________________________________

def test_1():
    """runs on good input"""

    rv, out = getstatusoutput(f'{RUN} 5 3')
    assert rv == 0
>       assert out == '19'
E       AssertionError: assert 'generations = 5\nlitter = 3' == '19'
E         - 19
E         + generations = 5
E         + litter = 3

tests/fib_test.py:60: AssertionError
======================== short test summary info ========================
FAILED tests/fib_test.py::test_1 - AssertionError: assert 'generations...
!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!
====================== 1 failed, 4 passed in 0.38s ======================```

The first failing test. Testing halts here because of the `-x` flag.

The program is run with `5` for the number of generations and `3` for the litter size.

The output should be `19`.

This shows the two strings being compared are not equal.

The expected value was `19`.

This is the output that was received.

The output from `pytest` is trying very hard to point out exactly what went wrong. It shows how the program was run and what was expected versus what was produced. The program is supposed to print 19, which is the fifth number of the Fibonacci sequence when using a litter size of 3. If you want to finish the program on your own, please jump right in. You should use `pytest` to verify that you are passing all the tests. Also, run `make test` to check your program using `pylint`, `flake8`, and `mypy`. If you want some guidance, I'll cover the first approach I described.

## An Imperative Approach

Figure 4-2 depicts the growth of the Fibonacci sequence. The smaller rabbits indicate nonbreeding pairs that must mature into larger, breeding pairs.

You can see that to generate any number after the first two, I need to know the two previous numbers. I can use this formula to describe the value of any position n of the Fibonacci sequence (F):

$F_n = F_{n-1} + F_{n-2}$

What kind of a data structure in Python would allow me to keep a sequence of numbers in order and refer to them by their position? A list. I'll start off with F1 = 0 and F2 = 1:

`>>> fib = [0, 1]`

The F3 value is F2 + F1 = 1 + 0 = 1. When generating the next number, I'll always be referencing the last two elements of the sequence. It will be easiest to use negative indexing to indicate a position from the end of the list. The last value in a list is always at position `-1`:

```>>> fib[-1]
1```

The penultimate value is at `-2`:

```>>> fib[-2]
0```

I need to multiply this value by the litter size to calculate the number of offspring that generation created. To start, I'll consider a litter size of 1:

```>>> litter = 1
>>> fib[-2] * litter
0```

I want to add these two numbers together and append the result to the list:

```>>> fib.append((fib[-2] * litter) + fib[-1])
>>> fib
[0, 1, 1]```

If I do this again, I can see that the correct sequence is emerging:

```>>> fib.append((fib[-2] * litter) + fib[-1])
>>> fib
[0, 1, 1, 2]```

I need to repeat this action `generations` times. (Technically it will be `generations`-1 times because Python uses 0-based indexing.) I can use Python's `range()` function to generate a list of numbers from `0` up to but not including the end value. I'm calling this function solely for the side effect of iterating a particular number of times and so don't need the values produced by the `range()` function. It's common to use the underscore (`_`) variable to indicate one's intent to ignore a value:

```>>> fib = [0, 1]
>>> litter = 1
>>> generations = 5
>>> for _ in range(generations - 1):
...     fib.append((fib[-2] * litter) + fib[-1])
...
>>> fib
[0, 1, 1, 2, 3, 5]```

This should be enough for you to create a solution that passes the tests. In the next section, I'll cover two other solutions that highlight some very interesting parts of Python.

# Solutions

All the following solutions share the same `get_args()` shown previously.

## Solution 1: An Imperative Solution Using a List as a Stack

Here is how I wrote my imperative solution. I'm using a list as a kind of stack to keep track of past values. I don't need all the values, just the last two, but it's pretty easy to keep growing the list and referring to the last two values:

```def main() -> None:
    args = get_args()

    fib = [0, 1]
    for _ in range(args.generations - 1):
        fib.append((fib[-2] * args.litter) + fib[-1])

    print(fib[-1]) ```

Start with `0` and `1`.

Use the `range()` function to create the right number of loops.

Append the next value to the sequence.

Print the last number of the sequence.

I used the `_` variable name in the `for` loop to indicate that I don't intend to use the variable. The underscore is a valid Python identifier, and it's also a convention to use this to indicate a throwaway value. Linting tools, for instance, might see that I've assigned a variable some value but never used it, which would normally indicate a possible error. The underscore variable shows that I do not intend to use the value. In this case, I'm using the `range()` function purely for the side effect of creating the number of loops needed.

This is considered an imperative solution because the code directly encodes every instruction of the algorithm. When you read the recursive solution, you will see that the algorithm can be written in a more declarative manner, which also has unintended consequences that I must handle.

A slight variation on this would be to place this code inside a function I'll call `fib()`. Note that it's possible in Python to declare a function inside another function, as here I'll create `fib()` inside `main()`. I do this so I can reference the `args.litter` parameter, creating a closure because the function is capturing the runtime value of the litter size:

```def main() -> None:
    args = get_args()

    def fib(n: int) -> int:
        nums = [0, 1]
        for _ in range(n - 1):
            nums.append((nums[-2] * args.litter) + nums[-1])
        return nums[-1]

    print(fib(args.generations)) ```

Create a function called `fib()` that accepts an integer parameter `n` and returns an integer.

This is the same code as before. Note this list is called `nums` so it doesnât clash with the function name.

Use the `range()` function to iterate the generations. Use `_` to ignore the values.

The function references the `args.litter` parameter and so creates a closure.

Use `return` to send the final value back to the caller.

Call the `fib()` function with the `args.generations` parameter.

The scope of the `fib()` function in the preceding example is limited to the `main()` function. Scope refers to the part of the program where a particular function name or variable is visible or legal.
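A quick sketch of the idea, using made-up names: the inner function's name is visible only inside the outer one:

```python
def outer() -> int:
    def inner() -> int:     # visible only inside outer()
        return 42
    return inner()

print(outer())  # 42

# Calling inner() out here would raise NameError: name 'inner' is not defined
```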

I donât have to use a closure. Here is how I can express the same idea with a standard function:

```def main() -> None:
    args = get_args()

    print(fib(args.generations, args.litter))


def fib(n: int, litter: int) -> int:
    nums = [0, 1]
    for _ in range(n - 1):
        nums.append((nums[-2] * litter) + nums[-1])

    return nums[-1]```

The `fib()` function must be called with two arguments.

The function requires both the number of generations and the litter size. The function body is essentially the same.

In the preceding code, you see that I must pass two arguments to `fib()`, whereas the closure required only one argument because the `litter` was captured. Binding values and reducing the number of parameters is a valid reason for creating a closure. Another reason to write a closure is to limit the scope of a function. The closure definition of `fib()` is valid only inside the `main()` function, but the preceding version is visible throughout the program. Hiding a function inside another function makes it harder to test. In this case, the `fib()` function is almost the entire program, so the tests have already been written in tests/fib_test.py.
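If you want the one-argument convenience of the closure without nesting the function, `functools.partial()` can bind the litter size to the standalone version. This is an alternative sketch, not one of the chapter's solutions:

```python
from functools import partial

def fib(n: int, litter: int) -> int:
    nums = [0, 1]
    for _ in range(n - 1):
        nums.append((nums[-2] * litter) + nums[-1])
    return nums[-1]

# Bind litter=3, producing a one-argument function like the closure version
fib3 = partial(fib, litter=3)
print(fib3(5))  # 19
```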

## Solution 2: Creating a Generator Function

In the previous solution, I generated the Fibonacci sequence up to the value requested and then stopped; however, the sequence is infinite. Could I create a function that could generate all the numbers of the sequence? Technically, yes, but it would never finish, what with being infinite and all.

Python has a way to suspend a function that generates a possibly infinite sequence. I can use `yield` to return a value from a function, temporarily leaving the function later to resume at the same state when the next value is requested. This kind of function is called a generator, and here is how I can use it to generate the sequence:

```def fib(k: int) -> Generator[int, None, None]:
    x, y = 0, 1
    yield x

    while True:
        yield y
        x, y = y * k, x + y ```

The type signature indicates the function takes the parameter `k` (litter size), which must be an `int`. It returns a generator of the type `Generator`, which yields `int` values and has no send or return types.

I only ever need to track the last two generations, which I initialize to `0` and `1`.

Yield the `0`.

Create an infinite loop.

Yield the last generation.

Set `x` (two generations back) to the current generation times the litter size. Set `y` (one generation back) to the sum of the two current generations.

A generator acts like an iterator, producing values as requested by the code until it is exhausted. Since this generator will only ever yield values, the send and return types are `None`. Otherwise, this code does exactly what the first version of the program did, only inside a fancy-pants generator function. See Figure 4-3 to consider how the function works for two different litter sizes.

The type signature for `Generator` looks a little complicated since it defines types for yield, send, and return. I don't need to dive into it further here, but I recommend you read the docs on the `typing` module.
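To see the suspension in action, you can drive the generator by hand with `next()`:

```python
from typing import Generator

def fib(k: int) -> Generator[int, None, None]:
    x, y = 0, 1
    yield x
    while True:
        yield y
        x, y = y * k, x + y

gen = fib(1)  # litter size of 1 gives the classic sequence
print([next(gen) for _ in range(6)])  # [0, 1, 1, 2, 3, 5]

gen = fib(3)  # litter size of 3
print([next(gen) for _ in range(6)])  # [0, 1, 1, 4, 7, 19]
```

Each call to `next()` resumes the function exactly where the last `yield` left off, with `x` and `y` intact.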

Hereâs how to use this:

```def main() -> None:
    args = get_args()
    gen = fib(args.litter)
    seq = [next(gen) for _ in range(args.generations + 1)]
    print(seq[-1]) ```

The `fib()` function takes the litter size as an argument and returns a generator.

Use the `next()` function to retrieve the next value from a generator. Use a list comprehension to do this the correct number of times to generate the sequence up to the requested value.

Print the last number in the sequence.

The `range()` call here is different because the first version already had the `0` and `1` in place. Here I have to call the generator two extra times to produce those values.

Although I prefer the list comprehension, I don't need the entire list. I only care about the final value, so I could have written it like so:

```def main() -> None:
    args = get_args()
    gen = fib(args.litter)
    answer = 0
    for _ in range(args.generations + 1):
        answer = next(gen)

    print(answer)```

Initialize the answer to `0`.

Create the correct number of loops.

Get the value for the current generation.

As it happens, it's quite common to call a function repeatedly to generate a list, so there is a function to do this for us. The `itertools.islice()` function will "Make an iterator that returns selected elements from the iterable." Here is how I can use it:

```def main() -> None:
    args = get_args()
    seq = list(islice(fib(args.litter), args.generations + 1))
    print(seq[-1]) ```

The first argument to `islice()` is the iterable, here the generator returned by `fib()`, and the second argument is the number of values to take. The function is lazy, so I use `list()` to coerce the values.

Print the last value.

Since I only use the `seq` variable one time, I could eschew that assignment. If benchmarking proved the following to be the best-performing version, I might be willing to write a one-liner:

```def main() -> None:
    args = get_args()
    print(list(islice(fib(args.litter), args.generations + 1))[-1])```

Clever code is fun but can become unreadable.1 You have been warned.

Generators are cool but more complex than generating a list. They are the appropriate way to generate a very large or potentially infinite sequence of values because they are lazy, only computing the next value when your code requires it.

## Solution 3: Using Recursion and Memoization

While there are many more fun ways to write an algorithm to produce an infinite series of numbers, I'll show just one more using recursion, which is when a function calls itself:

```def main() -> None:
    args = get_args()

    def fib(n: int) -> int:
        return 1 if n in (1, 2) \
            else fib(n - 2) * args.litter + fib(n - 1)

    print(fib(args.generations)) ```

Define a function called `fib()` that takes the number of the generation wanted as an `int` and returns an `int`.

If the generation is `1` or `2`, return `1`. This is the all-important base case that does not make a recursive call.

For all other cases, call the `fib()` function twice, once for two generations back and another for the previous generation. Factor in the litter size as before.

Print the results of the `fib()` function for the given generations.

Hereâs another instance where I define a `fib()` function as a closure inside the `main()` function so as to use the `args.litter` value inside the `fib()` function. This is to close around `args.litter`, effectively binding that value to the function. If I had defined the function outside the `main()` function, I would have had to pass the `args.litter` argument on the recursive calls.

This is a really elegant solution that gets taught in pretty much every introductory computer science class. It's fun to study, but it turns out to be wicked slow because I end up calling the function so many times. That is, `fib(5)` needs to call `fib(4)` and `fib(3)` to add those values. In turn, `fib(4)` needs to call `fib(3)` and `fib(2)`, and so on. Figure 4-4 shows that `fib(5)` results in 14 function calls to produce 5 distinct values. For instance, `fib(2)` is calculated three times, but we only need to calculate it once.

To illustrate the problem, I'll take a sampling of how long this program takes to finish up to the maximum `n` of 40. Again, I'll use a `for` loop in `bash` to show you how I would commonly benchmark such a program on the command line:

```\$ for n in 10 20 30 40;
> do echo "==> \$n <==" && time ./solution3_recursion.py \$n 1
> done
==> 10 <==
55

real	0m0.045s
user	0m0.032s
sys	0m0.011s
==> 20 <==
6765

real	0m0.041s
user	0m0.031s
sys	0m0.009s
==> 30 <==
832040

real	0m0.292s
user	0m0.281s
sys	0m0.009s
==> 40 <==
102334155

real	0m31.629s
user	0m31.505s
sys	0m0.043s```

The jump from 0.29s for `n`=`30` to 31s for `n`=`40` is huge. Imagine going to 50 and beyond. I need to either find a way to speed this up or abandon all hope for recursion. The solution is to cache previously calculated results. This is called memoization, and there are many ways to implement this. The following is one method. Note you will need to import `typing.Callable`:

```def memoize(f: Callable) -> Callable:
    """ Memoize a function """

    cache = {}

    def memo(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]

    return memo ```

Define a function that takes a function (something that is callable) and returns a function.

Use a dictionary to store cached values.

Define `memo()` as a closure around the cache. The function will take some parameter `x` when called.

See if the argument value is in the cache.

If not, call the function with the argument and set the cache for that argument value to the result.

Return the cached value for the argument.

Return the new function.

Note that the `memoize()` function returns a new function. In Python, functions are considered first-class objects, meaning they can be used like other kinds of variables: you can pass them as arguments and overwrite their definitions. The `memoize()` function is an example of a higher-order function (HOF) because it takes other functions as arguments. I'll be using other HOFs, like `filter()` and `map()`, throughout the book.
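For instance, here is a small sketch of `filter()` and `map()` each taking a function as their first argument:

```python
nums = [1, 2, 3, 4, 5]

# filter() takes a predicate function and keeps the values where it is truthy
evens = list(filter(lambda n: n % 2 == 0, nums))
print(evens)    # [2, 4]

# map() takes a function and applies it to every value
squares = list(map(lambda n: n * n, nums))
print(squares)  # [1, 4, 9, 16, 25]
```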

To use the `memoize()` function, I will define `fib()` and then redefine it with the memoized version. If you run this, you will see an almost instantaneous result no matter how high `n` goes:

```def main() -> None:
    args = get_args()

    def fib(n: int) -> int:
        return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1)

    fib = memoize(fib)

    print(fib(args.generations))```

Overwrite the existing `fib()` definition with the memoized function.

A preferred method to accomplish this uses a decorator, which is a function that modifies another function:

```def main() -> None:
    args = get_args()

    @memoize
    def fib(n: int) -> int:
        return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1)

    print(fib(args.generations))```

Decorate the `fib()` function with the `memoize()` function.
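The `@` syntax is sugar for reassigning the name, so the two versions are equivalent. Here is a tiny sketch with a made-up decorator to show the mechanics:

```python
from typing import Callable

def shout(f: Callable[[str], str]) -> Callable[[str], str]:
    """ A made-up decorator that uppercases the result """
    def wrapper(s: str) -> str:
        return f(s).upper()
    return wrapper

@shout                      # same as writing greet = shout(greet)
def greet(name: str) -> str:
    return f'hello, {name}'

print(greet('world'))  # HELLO, WORLD
```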

As fun as writing memoization functions is, it again turns out that this is such a common need that others have already solved it for us. I can remove the `memoize()` function and instead import the `functools.lru_cache` (least-recently-used cache) function:

`from functools import lru_cache`

Decorate the `fib()` function with the `lru_cache()` function to get memoization with minimal distraction:

```def main() -> None:
    args = get_args()

    @lru_cache()
    def fib(n: int) -> int:
        return 1 if n in (1, 2) else fib(n - 2) * args.litter + fib(n - 1)

    print(fib(args.generations))```

Memoize the `fib()` function via decoration with the `lru_cache()` function. Note that Python 3.6 requires the parentheses, but 3.8 and later versions do not.
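The wrapper that `lru_cache()` returns also exposes a `cache_info()` method, so you can verify the cache is doing the work. A sketch with a litter size of 1:

```python
from functools import lru_cache

@lru_cache()
def fib(n: int) -> int:
    return 1 if n in (1, 2) else fib(n - 2) + fib(n - 1)

print(fib(40))           # 102334155
print(fib.cache_info())  # the hit/miss counts show how few calls were needed
```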

# Benchmarking the Solutions

Which is the fastest solution? I've shown you how to use a `for` loop in `bash` with the `time` command to compare the runtimes of commands:

```\$ for py in ./solution1_list.py ./solution2_generator_islice.py \
./solution3_recursion_lru_cache.py; do echo \$py && time \$py 40 5; done
./solution1_list.py
148277527396903091

real	0m0.070s
user	0m0.043s
sys	0m0.016s
./solution2_generator_islice.py
148277527396903091

real	0m0.049s
user	0m0.033s
sys	0m0.013s
./solution3_recursion_lru_cache.py
148277527396903091

real	0m0.041s
user	0m0.030s
sys	0m0.010s```

It would appear that the recursive solution using LRU caching is the fastest, but again I have very little data, just one run per program. Also, I have to eyeball this data and figure out which is the fastest.

Thereâs a better way. I have installed a tool called `hyperfine` to run each command many times and compare the results:

```\$ hyperfine -L prg ./solution1_list.py,./solution2_generator_islice.py,\
./solution3_recursion_lru_cache.py '{prg} 40 5' --prepare 'rm -rf __pycache__'
Benchmark #1: ./solution1_list.py 40 5
Time (mean ± σ):      38.1 ms ±   1.1 ms    [User: 28.3 ms, System: 8.2 ms]
Range (min … max):    36.6 ms …  42.8 ms    60 runs

Benchmark #2: ./solution2_generator_islice.py 40 5
Time (mean ± σ):      38.0 ms ±   0.6 ms    [User: 28.2 ms, System: 8.1 ms]
Range (min … max):    36.7 ms …  39.2 ms    66 runs

Benchmark #3: ./solution3_recursion_lru_cache.py 40 5
Time (mean ± σ):      37.9 ms ±   0.6 ms    [User: 28.1 ms, System: 8.1 ms]
Range (min … max):    36.6 ms …  39.4 ms    65 runs

Summary
'./solution3_recursion_lru_cache.py 40 5' ran
1.00 ± 0.02 times faster than './solution2_generator_islice.py 40 5'
1.01 ± 0.03 times faster than './solution1_list.py 40 5'```

It appears that `hyperfine` ran each command 60-66 times, averaged the results, and found that the `solution3_recursion_lru_cache.py` program is perhaps slightly faster. Another benchmarking tool you might find useful is `bench`, but you can search for other benchmarking tools on the internet that might suit your tastes more. Whatever tool you use, benchmarking along with testing is vital to challenging assumptions about your code.

I used the `--prepare` option to tell `hyperfine` to remove the `__pycache__` directory before running the commands. This is a directory created by Python to cache the bytecode of the program. If a program's source code hasn't changed since the last time it was run, then Python can skip compilation and use the bytecode version that exists in the `__pycache__` directory. I needed to remove this as `hyperfine` detected statistical outliers when running the commands, probably due to caching effects.

# Testing the Good, the Bad, and the Ugly

For every challenge, I hope you spend part of your time reading through the tests. Learning how to design and write tests is as important as anything else I'm showing you. As I mentioned before, my first tests check that the expected program exists and will produce usage statements on request. After that, I usually give invalid inputs to ensure the program fails. I would like to highlight the tests for bad `n` and `k` parameters. They are essentially the same, so I'll just show the first one as it demonstrates how to randomly select an invalid integer value, one that is possibly negative or too high:

```def test_bad_n():
    """ Dies when n is bad """

    n = random.choice(list(range(-10, 0)) + list(range(41, 50)))
    k = random.randint(1, 5)
    rv, out = getstatusoutput(f'{RUN} {n} {k}')
    assert rv != 0
    assert out.lower().startswith('usage:')
    assert re.search(f'n "{n}" must be between 1 and 40', out) ```

Join the two lists of invalid number ranges and randomly select one value.

Select a random integer in the range from `1` to `5` (both bounds inclusive).

Run the program with the arguments and capture the output.

Make sure the program reported a failure (nonzero exit value).

Check the output begins with a usage statement.

Look for an error message describing the problem with the `n` argument.
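The `getstatusoutput()` function from the `subprocess` module is what captures both pieces of information at once. A quick sketch:

```python
from subprocess import getstatusoutput

# Returns a tuple of the exit status and the combined STDOUT/STDERR
rv, out = getstatusoutput('echo Hello')
print(rv)   # 0
print(out)  # Hello
```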

I often like to use randomly selected invalid values when testing. This partially comes from writing tests for students so that they won't write programs that fail on a single bad input, but I also find it helps me to not accidentally code for a specific input value. I haven't yet covered the `random` module, but it gives you a way to make pseudorandom choices. First, you need to import the module:

`>>> import random`

For instance, you can use `random.randint()` to select a single integer from a given range:

```>>> random.randint(1, 5)
2
>>> random.randint(1, 5)
5```

Or use the `random.choice()` function to randomly select a single value from some sequence. Here I wanted to construct a discontiguous range of negative numbers separated from a range of positive numbers:

```>>> random.choice(list(range(-10, 0)) + list(range(41, 50)))
46
>>> random.choice(list(range(-10, 0)) + list(range(41, 50)))
-1```

The tests that follow all provide good inputs to the program. For example:

```def test_2():
    """ Runs on good input """

    rv, out = getstatusoutput(f'{RUN} 30 4')
    assert rv == 0
    assert out == '436390025825' ```

These are values I was given while attempting to solve the Rosalind challenge.

The program should not fail on this input.

This is the correct answer per Rosalind.

Testing, like documentation, is a love letter to your future self. As tedious as testing may seem, you'll appreciate failing tests when you try to add a feature and end up accidentally breaking something that previously worked. Assiduously writing and running tests can prevent you from deploying broken programs.

# Running the Test Suite on All the Solutions

Youâve seen that in each chapter I write multiple solutions to explore various ways to solve the problems. I completely rely on my tests to ensure my programs are correct. You might be curious to see how Iâve automated the process of testing every single solution. Look at the Makefile and find the `all` target:

```$ cat Makefile
.PHONY: test

test:
	python3 -m pytest -xv --flake8 --pylint --mypy fib.py tests/fib_test.py

all:
	../bin/all_test.py fib.py```

The `all_test.py` program will overwrite the `fib.py` program with each of the solutions before running the test suite. This could overwrite your solution. Be sure you commit your version to Git or at least make a copy before you run `make all` or you could lose your work.

The following is the `all_test.py` program that is run by the `all` target. I'll break it into two parts, starting with the first part up to `get_args()`. Most of this should be familiar by now:

```#!/usr/bin/env python3
""" Run the test suite on all solution*.py """

import argparse
import os
import re
import shutil
import subprocess
import sys
from subprocess import getstatusoutput
from functools import partial
from typing import NamedTuple


class Args(NamedTuple):
    """ Command-line arguments """
    program: str
    quiet: bool


# --------------------------------------------------
def get_args() -> Args:
    """ Get command-line arguments """

    parser = argparse.ArgumentParser(
        description='Run the test suite on all solution*.py',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('program', metavar='prg', type=str,
                        help='Program to test')

    parser.add_argument('-q', '--quiet', action='store_true',
                        help='Minimize output')

    args = parser.parse_args()

    return Args(args.program, args.quiet)```

The name of the program to test, which in this case is `fib.py`.

A Boolean value of `True` or `False` to create more or less output.

The default type is `str`.

The `action='store_true'` makes this a Boolean flag. If the flag is present the value will be `True`, and it will be `False` otherwise.
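To see `action='store_true'` behaving in isolation, here is a small sketch of my own (the parser and flag names are illustrative, not the book's program):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-q', '--quiet', action='store_true')

# Without the flag, the value defaults to False
print(parser.parse_args([]).quiet)      # False

# With the flag present, the value is True
print(parser.parse_args(['-q']).quiet)  # True
```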

The `main()` function is where the testing happens:

```def main() -> None:
    args = get_args()
    cwd = os.getcwd()
    solutions = list(
        filter(partial(re.match, r'solution.*\.py'), os.listdir(cwd)))

    for solution in sorted(solutions):
        print(f'==> {solution} <==')
        shutil.copyfile(solution, os.path.join(cwd, args.program))
        subprocess.run(['chmod', '+x', args.program], check=True)
        rv, out = getstatusoutput('make test')
        if rv != 0:
            sys.exit(out)

        if not args.quiet:
            print(out)

    print('Done.')```

Get the current working directory, which will be the 04_fib directory if you are in that directory when running the command.

Find all the `solution*.py` files in the current directory.

Both `filter()` and `partial()` are HOFs; I'll explain them next.

The filenames will be in random order, so iterate through the sorted files.

Copy the `solution*.py` file to the testing filename.

Make the program executable.

Run the `make test` command, and capture the return value and output.

See if the return value is not `0`.

Exit this program while printing the output from the testing and returning a nonzero value.

Unless the program is supposed to be quiet, print the testing output.

Let the user know the program finishes normally.

In the preceding code, I'm using `sys.exit()` to immediately halt the program, print an error message, and return a nonzero exit value. If you consult the documentation, you'll find you can invoke `sys.exit()` with no arguments, an integer value, or an object like a string, which is what I'm using:

```exit(status=None, /)
Exit the interpreter by raising SystemExit(status).

If the status is omitted or None, it defaults to zero (i.e., success).
If the status is an integer, it will be used as the system exit status.
If it is another kind of object, it will be printed and the system
exit status will be one (i.e., failure).```
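You can verify all three behaviors from another process. The following sketch is my own, using `subprocess` to capture a child interpreter's exit status and stderr:

```python
import subprocess
import sys

def run(snippet):
    """ Run a Python snippet in a child process; return (exit code, stderr) """
    proc = subprocess.run([sys.executable, '-c', snippet],
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr.strip()

# No argument: exit status 0 (success)
print(run('import sys; sys.exit()'))        # (0, '')

# Integer argument: used as the exit status
print(run('import sys; sys.exit(3)'))       # (3, '')

# Any other object: printed to stderr, exit status 1
print(run('import sys; sys.exit("boom")'))  # (1, 'boom')
```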

The preceding program also uses the functions `filter()` and `partial()`, which I haven't covered yet. Both of these are HOFs. I'll explain how and why I'm using them. To start, the `os.listdir()` function will return the entire contents of a directory, including files and directories:

```>>> import os
>>> files = os.listdir()```

Thereâs a lot there, so Iâll import the `pprint()` function from the `pprint` module to pretty-print this:

```>>> from pprint import pprint
>>> pprint(files)
['solution3_recursion_memoize_decorator.py',
 'solution2_generator_for_loop.py',
 '.pytest_cache',
 'Makefile',
 'solution2_generator_islice.py',
 'tests',
 '__pycache__',
 'fib.py',
 'solution3_recursion_memoize.py',
 'bench.html',
 'solution2_generator.py',
 '.mypy_cache',
 '.gitignore',
 'solution1_list.py',
 'solution3_recursion_lru_cache.py',
 'solution3_recursion.py']```

I want to filter those to names that start with solution and end with .py. On the command line, I would match this pattern using a file glob like `solution*.py`, where the `*` means zero or more of any character and the `.` is a literal dot. A regular expression version of this pattern is the slightly more complicated `solution.*\.py`, where `.` (dot) is a regex metacharacter representing any character, and `*` (star or asterisk) means zero or more (see Figure 4-5). To indicate a literal dot, I need to escape it with a backslash (`\.`). Note that it's prudent to use the r-string (raw string) to enclose this pattern.

When the match is successful, a `re.Match` object is returned:

```>>> import re
>>> re.match(r'solution.*\.py', 'solution1.py')
<re.Match object; span=(0, 12), match='solution1.py'>```

When a match fails, the `None` value is returned. I have to use `type()` here because the `None` value is not displayed in the REPL:

```>>> type(re.match(r'solution.*\.py', 'fib.py'))
<class 'NoneType'>```
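One subtlety worth knowing, though the text doesn't depend on it: `re.match()` anchors the pattern only at the start of the string, so a name with a trailing suffix can still match. Adding `$` to the pattern, or using `re.fullmatch()`, requires the whole name to match:

```python
import re

# re.match() anchors at the start only, so the stray 'c' is ignored:
# the pattern still matches the 'solution1.py' prefix of 'solution1.pyc'
print(re.match(r'solution.*\.py', 'solution1.pyc'))

# Anchoring the end with $ rejects the name
print(re.match(r'solution.*\.py$', 'solution1.pyc'))  # None

# re.fullmatch() requires the entire string to match
print(re.fullmatch(r'solution.*\.py', 'solution1.pyc'))  # None
```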

I want to apply this match to all the files returned by `os.listdir()`. I can use `filter()` and the `lambda` keyword to create an anonymous function. Each filename in `files` is passed as the `name` argument used in the match. The `filter()` function will only return elements that return a truthy value from the given function, so those filenames that return `None` when they fail to match are excluded:

```>>> pprint(list(filter(lambda name: re.match(r'solution.*\.py', name), files)))
['solution3_recursion_memoize_decorator.py',
 'solution2_generator_for_loop.py',
 'solution2_generator_islice.py',
 'solution3_recursion_memoize.py',
 'solution2_generator.py',
 'solution1_list.py',
 'solution3_recursion_lru_cache.py',
 'solution3_recursion.py']```

You see that the `re.match()` function takes two arguments: a pattern and a string to match. The `partial()` function allows me to partially apply the function, and the result is a new function. For example, the `operator.add()` function expects two values and returns their sum:

```>>> import operator
>>> operator.add(1, 2)
3```

I can create a function that adds `1` to any value, like so:

```>>> from functools import partial
>>> succ = partial(operator.add, 1)```
The `succ()` function requires one argument, and will return the successor:

```>>> succ(3)
4
>>> succ(succ(3))
5```

Likewise, I can create a function `f()` that partially applies the `re.match()` function with its first argument, a regular expression pattern:

`>>> f = partial(re.match, r'solution.*\.py')`

The `f()` function is waiting for a string to apply the match:

```>>> type(f('solution1.py'))
<class 're.Match'>
>>> type(f('fib.py'))
<class 'NoneType'>```

If you call it without an argument, you will get an exception:

```>>> f()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: match() missing 1 required positional argument: 'string'```

I can replace the `lambda` with the partially applied function as the first argument to `filter()`:

```>>> pprint(list(filter(f, files)))
['solution3_recursion_memoize_decorator.py',
 'solution2_generator_for_loop.py',
 'solution2_generator_islice.py',
 'solution3_recursion_memoize.py',
 'solution2_generator.py',
 'solution1_list.py',
 'solution3_recursion_lru_cache.py',
 'solution3_recursion.py']```

My programming style leans heavily on purely functional programming ideas. I find this style to be like playing with LEGO bricks: small, well-defined, and tested functions can be composed into larger programs that work well.

# Going Further

There are many different styles of programming, such as procedural, functional, object-oriented, and so forth. Even within an object-oriented language like Python, I can use very different approaches to writing code. The first solution could be considered a dynamic programming approach because you try to solve the larger problem by first solving smaller problems. If you find recursive functions interesting, the Tower of Hanoi problem is another classic exercise. Purely functional languages like Haskell mostly avoid constructs like `for` loops and rely heavily on recursion and higher-order functions. Both spoken and programming languages shape the way we think, and I encourage you to try solving this problem using other languages you know to see how you might write different solutions.

# Review

Key points from this chapter:

• Inside the `get_args()` function, you can perform manual validation of arguments and use the `parser.error()` function to manually generate `argparse` errors.

• You can use a list as a stack by pushing and popping elements.

• Using `yield` in a function turns it into a generator. When the function yields a value, the value is returned and the state of the function is preserved until the next value is requested. Generators can be used to create a potentially infinite stream of values.
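For reference, one way such a generator might look, in the spirit of the chapter's second family of solutions (a sketch, not the book's exact code):

```python
from itertools import islice

def fib_gen():
    """ Yield the Fibonacci sequence indefinitely """
    a, b = 0, 1
    while True:
        yield a          # the function's state is preserved here
        a, b = b, a + b  # until the next value is requested

# islice() takes a finite slice of the potentially infinite stream
print(list(islice(fib_gen(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```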

• A recursive function calls itself, and the recursion can cause serious performance issues. One solution is to use memoization to cache values and avoid recomputation.
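The recomputation problem and its fix can be sketched with `functools.lru_cache`, the decorator used in one of the chapter's recursive solutions (this is an illustrative sketch, not the book's exact code):

```python
from functools import lru_cache

@lru_cache()  # the decorator caches each fib(n) so it is computed only once
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))                    # 832040
print(fib.cache_info().hits > 0)  # True: cached values were reused
```

Without the cache, `fib(30)` makes over a million recursive calls; with it, each value from 0 to 30 is computed exactly once.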

• Higher-order functions are functions that take other functions as arguments.

• Pythonâs function decorators apply HOFs to other functions.

• Benchmarking is an important technique for determining the best-performing algorithm. The `hyperfine` and `bench` tools allow you to compare runtimes of commands over many iterations.

• The `random` module offers many functions for the pseudorandom selection of values.

1 As the legendary David St. Hubbins and Nigel Tufnel observed, "It's such a fine line between stupid and clever."
