Bioinformatics Programming Using Python

by Mitchell L Model
December 2009
Intermediate to advanced
521 pages
English
O'Reilly Media, Inc.

Chapter 8. Structured Text

In Chapter 6, we took a very brief look at the csv module, which reads and writes lines of tab- or comma-separated values, with each line corresponding to one item in the file. We’ve also looked at a variety of ways to scan files for certain patterns of data, including str methods and regular expressions. Files in tab- or comma-separated values format, FASTA files, GenBank files, and many other file formats encountered in bioinformatics work are called flat files.[35] What is “flat” about them is that they are just text files: the data has no explicit structure beyond agreed-on conventions regarding special characters, blank lines, whitespace, etc. They can have introductory material before the data, other material after the data, several sets of data in one file, and so on.
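As a minimal sketch of the reading and writing the csv module handles, the following round-trips a small tab-separated table held in memory. The sample data and the `read_tsv`/`write_tsv` helpers are hypothetical; `csv.DictReader` and `csv.DictWriter` are standard-library classes. Note that every field comes back as a string, since a flat file carries no type information.

```python
import csv
import io

# Hypothetical sample data: one item per line, fields separated by tabs,
# with a header line naming the columns.
SAMPLE_TSV = "name\tchromosome\tlength\ngene1\t1\t1200\ngene2\t2\t850\n"

def read_tsv(handle):
    """Return one dict per data line, keyed by the fields of the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))

def write_tsv(handle, rows, fieldnames):
    """Write dicts back out as tab-separated lines, header line first."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

rows = read_tsv(io.StringIO(SAMPLE_TSV))
out = io.StringIO()
write_tsv(out, rows, ["name", "chromosome", "length"])
print(out.getvalue() == SAMPLE_TSV)  # True: the round trip is lossless
```

Passing an open file object (opened with `newline=""`, as the csv documentation recommends) in place of the `io.StringIO` objects works the same way.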

The opposite of “flat” in this context is structured. A structured text file contains elements, each of which can have attributes and/or “sub” or child elements. There can be different kinds of elements, and in general there are rules specifying what attributes and children each kind of element can have. The linear approaches for processing text files that we’ve seen so far are inadequate for structured files, essentially because such files are two-dimensional: elements nest inside other elements, so the meaning of a line depends on what encloses it. This chapter describes some ways to process structured files.
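To make the flat/structured contrast concrete, here is a sketch using the standard library's `xml.etree.ElementTree` (not necessarily the tool this chapter goes on to use) on a small, hypothetical XML document. The hypothetical `walk` function recurses into child elements, which is exactly the bookkeeping a line-by-line scan lacks: at every step it knows which `gene` element an `exon` belongs to.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical structured document: elements carry attributes
# and may contain child elements, nested to any depth.
DOC = """<genes>
  <gene id="g1" strand="+">
    <exon start="100" end="250"/>
    <exon start="400" end="520"/>
  </gene>
  <gene id="g2" strand="-">
    <exon start="900" end="1010"/>
  </gene>
</genes>"""

def walk(element, depth=0):
    """Yield (depth, tag, attributes) for an element and all its descendants."""
    yield depth, element.tag, dict(element.attrib)
    for child in element:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)

root = ET.fromstring(DOC)
for depth, tag, attrs in walk(root):
    print("  " * depth + tag, attrs)
```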

HTML

An obvious example of a structured file format is basic HTML. (We’ll ignore all the fancy stuff like JavaScript, frames, and so on.) ...
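HTML's nesting can be observed with the standard library's `html.parser.HTMLParser`, which calls a handler method for every start and end tag it encounters. The `OutlineParser` subclass below is a hypothetical sketch that records how deeply each element is nested; the sample page is made up.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag in HTML, so they must not change depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class OutlineParser(HTMLParser):
    """Record (depth, tag) for each start tag, showing how HTML elements nest."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in VOID_TAGS:
            self.depth += 1  # everything until the matching end tag is inside

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1

PAGE = """<html><body>
<h1>Sequences</h1>
<ul><li>seq1</li><li>seq2</li></ul>
</body></html>"""

parser = OutlineParser()
parser.feed(PAGE)
parser.close()
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

This depth counter assumes every non-void element has an explicit end tag. Real-world HTML often omits optional end tags such as `</li>` or `</p>`, so a more forgiving parser is needed for pages found in the wild.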

ISBN: 9780596804725