July 2002
Credit: Alex Martelli, Magnus Lie Hetland
You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by one or more empty lines).
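To make that definition concrete, here is a minimal sketch in modern Python 3, which is an assumption on our part since the recipe itself targets Python 2.1. The generator name `paragraphs` and the sample text are illustrative only and are not part of the original recipe.

```python
import io

def paragraphs(fileobj, separator="\n"):
    """Yield each paragraph as one string of consecutive non-separator lines.

    Hypothetical helper for illustration; not the recipe's own class.
    """
    # Make sure the separator ends with a newline so it can match whole lines.
    if not separator.endswith("\n"):
        separator += "\n"
    lines = []
    for line in fileobj:
        if line == separator:
            # A separator line closes the current paragraph, if there is one.
            if lines:
                yield "".join(lines)
                lines = []
        else:
            lines.append(line)
    # The last paragraph may not be followed by a separator line.
    if lines:
        yield "".join(lines)

sample = io.StringIO("first line\nsecond line\n\n\nthird line\n")
result = list(paragraphs(sample))
```

Consecutive separator lines collapse into a single paragraph break, so `result` holds two paragraphs here, each with its line endings intact.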
A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):
class Paragraphs:
    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible:
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)
        self.line_num = 0    # current index into self.seq (line number)
        self.para_num = 0    # current index into self (paragraph number)
        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n': separator += '\n'
        self.separator = separator
    def __getitem__(self, index):
        if index != self.para_num:
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator: break
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try: # Let's check if ...