Chapter 1. Tetranucleotide Frequency: Counting Things
Counting the bases in DNA is perhaps the “Hello, World!” of bioinformatics. The Rosalind DNA challenge describes a program that will take a sequence of DNA and print a count of how many As, Cs, Gs, and Ts are found. There are surprisingly many ways to count things in Python, and I’ll explore what the language has to offer. I’ll also demonstrate how to write a well-structured, documented program that validates its arguments as well as how to write and run tests to ensure the program works correctly.
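To make the task concrete, here is a minimal sketch of two of the ways Python can count the bases in a string. The function names `count_with_str` and `count_with_counter` are hypothetical, chosen only for illustration; they are not taken from the solution programs in the repository.

```python
from collections import Counter


def count_with_str(dna: str) -> tuple:
    """Count each base using the built-in str.count() method."""
    return (dna.count('A'), dna.count('C'), dna.count('G'), dna.count('T'))


def count_with_counter(dna: str) -> tuple:
    """Count each base using collections.Counter."""
    counts = Counter(dna)
    # A Counter returns 0 for a missing key, so absent bases are safe to look up
    return (counts['A'], counts['C'], counts['G'], counts['T'])


print(count_with_str('ACCGGGTTTT'))      # prints (1, 2, 3, 4)
print(count_with_counter('ACCGGGTTTT'))  # prints (1, 2, 3, 4)
```

Both functions return the counts in the order A, C, G, T. The first scans the string once per base, while the second tallies every character in a single pass.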
In this chapter, you’ll learn:
- How to start a new program using new.py
- How to define and validate command-line arguments using argparse
- How to run a test suite using pytest
- How to iterate the characters of a string
- Ways to count elements in a collection
- How to create a decision tree using if/elif statements
- How to format strings
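Several of the techniques listed above (iterating the characters of a string, an if/elif decision tree, and string formatting) can be sketched together in a few lines. This is an illustrative sketch only, assuming uppercase input and a space-separated output format; the name `count_bases` is hypothetical, and this is not necessarily how solution1_iter.py is written.

```python
def count_bases(dna: str) -> str:
    """Return the counts of A, C, G, and T as a space-separated string."""
    count_a, count_c, count_g, count_t = 0, 0, 0, 0

    # Visit each character of the string in order
    for base in dna:
        # Decision tree: increment the matching counter, ignore anything else
        if base == 'A':
            count_a += 1
        elif base == 'C':
            count_c += 1
        elif base == 'G':
            count_g += 1
        elif base == 'T':
            count_t += 1

    # An f-string interpolates the four counts into the output
    return f'{count_a} {count_c} {count_g} {count_t}'


print(count_bases('ACCGGGTTTT'))  # prints "1 2 3 4"
```

Characters other than A, C, G, and T fall through every branch and are silently ignored in this sketch; a real program might prefer to reject them.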
Getting Started
Before you start, be sure you have read “Getting the Code and Tests” in the Preface. Once you have a local copy of the code repository, change into the 01_dna directory:
$ cd 01_dna
Here you’ll find several solution*.py programs along with tests and input data you can use to see if the programs work correctly.
To get an idea of how your program should work, start by copying the first solution to a program called dna.py:
$ cp solution1_iter.py dna.py
Now run the program with no arguments, or with the -h or --help flags.
It will print usage documentation (note that usage is ...