Credit: Alex Martelli, Magnus Lie Hetland, Terry Reedy
You need to read a text file (or any other iterable whose items are lines of text) paragraph by paragraph, where a "paragraph" is defined as a sequence of nonwhite lines (i.e., paragraphs are separated by lines made up exclusively of whitespace).
A generator is quite suitable for bunching up lines this way:
def paragraphs(lines, is_separator=str.isspace, joiner=''.join):
    paragraph = []
    for line in lines:
        if is_separator(line):
            # A separator ends the current paragraph, if there is one
            if paragraph:
                yield joiner(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    # Emit the last paragraph if the input did not end with a separator
    if paragraph:
        yield joiner(paragraph)

if __name__ == '__main__':
    text = 'a first\nparagraph\n\nand a\nsecond one\n\n'
    for p in paragraphs(text.splitlines(True)):
        print(repr(p))
Python doesn't directly support paragraph-oriented file reading,
but it's not hard to add such functionality. We define a "paragraph"
as the string formed by joining a nonempty sequence of nonseparator
lines, separated from any adjoining paragraphs by nonempty sequences
of separator lines. A separator line is one that satisfies the
predicate passed in as argument is_separator. (A predicate is a
function whose result is taken as a logical truth value, and we say a
predicate is satisfied when the predicate returns a result that is
true.) By default, a line is a separator if it is made up entirely of
whitespace characters (e.g., space, tab, newline).
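To make the roles of the two optional arguments concrete, here is a minimal sketch that passes a custom predicate and a custom joiner. The generator is repeated from the recipe only so that the example runs on its own. The predicate is_rule and the sample data are hypothetical, introduced purely for illustration.

```python
def paragraphs(lines, is_separator=str.isspace, joiner=''.join):
    # Same generator as in the recipe, repeated so this sketch is self-contained.
    paragraph = []
    for line in lines:
        if is_separator(line):
            if paragraph:
                yield joiner(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    if paragraph:
        yield joiner(paragraph)

def is_rule(line):
    # Hypothetical predicate: a line consisting only of dashes is a separator.
    stripped = line.strip()
    return bool(stripped) and set(stripped) == {'-'}

# Custom predicate: consecutive separator lines do not produce empty paragraphs.
lines = ['alpha\n', 'beta\n', '---\n', '-----\n', 'gamma\n']
dash_split = list(paragraphs(lines, is_separator=is_rule))

# Default predicate: a line of spaces and tabs counts as a separator.
blank_split = list(paragraphs(['a\n', ' \t\n', 'b\n']))

# Custom joiner: passing list yields each paragraph as a list of lines.
as_lists = list(paragraphs(['a\n', '\n', 'b\n', 'c\n'], joiner=list))
```

One consequence of the default worth keeping in mind: str.isspace returns a false result for the empty string, so if the lines have already had their trailing newlines stripped, an empty line is not treated as a separator. In that case a predicate such as lambda line: not line.strip() may be a better fit.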
The recipe's code is quite straightforward. ...