Python Cookbook

by Alex Martelli, David Ascher
July 2002
Intermediate to advanced
608 pages
English
O'Reilly Media, Inc.

Processing Every Word in a File

Credit: Luther Blissett

Problem

You need to do something to every word in a file, similar to the foreach loop of csh.

Solution

This is best handled by two nested loops, one on lines and one on the words in each line:

for line in open(thefilepath).xreadlines():   # one line at a time
    for word in line.split():                   # split on runs of whitespace
        dosomethingwith(word)

This implicitly defines a word as a maximal sequence of non-whitespace characters, with words separated by runs of whitespace (just as the Unix program wc does). For other definitions of words, you can use regular expressions. For example:

import re
re_word = re.compile(r'[\w-]+')   # word characters (letters, digits, _) and hyphens

for line in open(thefilepath).xreadlines():
    for word in re_word.findall(line):
        dosomethingwith(word)

In this case, a word is defined as a maximal sequence of alphanumerics, underscores, and hyphens (\w matches the underscore as well as letters and digits).
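To see how the two definitions differ, here is a small Python 3 sketch that applies both split and the same [\w-]+ pattern to a single line. The sample line is made up for illustration. Note that in Python 3, \w matches Unicode letters by default, not just ASCII.

```python
import re

# Same pattern as the recipe: runs of word characters
# (letters, digits, underscore) and hyphens
re_word = re.compile(r'[\w-]+')

line = "Hello, world! x_y is well-known (v2)."

by_whitespace = line.split()        # punctuation stays attached to words
by_regex = re_word.findall(line)    # punctuation is dropped

print(by_whitespace)  # ['Hello,', 'world!', 'x_y', 'is', 'well-known', '(v2).']
print(by_regex)       # ['Hello', 'world', 'x_y', 'is', 'well-known', 'v2']
```

Whitespace splitting keeps trailing commas and parentheses as part of a word, which is usually what you want for a wc-style count but not for building a vocabulary.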

Discussion

For other definitions of words you will, of course, need different regular expressions. The outer loop, over all lines in the file, can also be written in many ways. The xreadlines method is good, but you can also use the list returned by the readlines method, the standard library module fileinput, or, in Python 2.2, simply:

for line in open(thefilepath):

which is simplest and fastest.
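The recipe targets Python 2.2; in Python 3 the xreadlines method no longer exists, and iterating the file object directly is the standard idiom. The following sketch compares three of the outer-loop options mentioned above on a temporary file. The helper names words_via_readlines, words_via_fileinput, and words_via_iteration are hypothetical, chosen for illustration only.

```python
import fileinput
import os
import tempfile

def words_via_readlines(path):
    # readlines() reads every line into a list before the loop starts
    with open(path) as f:
        lines = f.readlines()
    return [word for line in lines for word in line.split()]

def words_via_fileinput(paths):
    # fileinput chains the lines of one or more files into a single stream
    with fileinput.input(files=paths) as lines:
        return [word for line in lines for word in line.split()]

def words_via_iteration(path):
    # Iterating the file object reads lazily, one line at a time
    with open(path) as f:
        return [word for line in f for word in line.split()]

# Write a small throwaway file to try all three on
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one two\nthree  four\n")
    path = tmp.name

try:
    results = [
        words_via_readlines(path),
        words_via_fileinput([path]),
        words_via_iteration(path),
    ]
finally:
    os.remove(path)

for words in results:
    print(words)  # ['one', 'two', 'three', 'four'] each time
```

All three produce the same words; the difference is memory use. The readlines version holds the whole file in memory at once, while direct iteration and fileinput read one line at a time.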

In Python 2.2, it’s often a good idea to wrap such iterations in iterator objects, most commonly simple generators:

from __future__ import generators

def words_of_file(thefilepath):
    for line in open(thefilepath):
        for word in line.split():
            yield word

for word in words_of_file(thefilepath):
    dosomethingwith(word)
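In Python 3, generators are always available, so the from __future__ import line is unnecessary, although it is still accepted. Here is a minimal Python 3 sketch of the same generator, adapted under a few assumptions. It closes the file with a with statement. It also takes an optional compiled regular expression, so one generator covers both definitions of a word; the pattern parameter is a hypothetical addition, not part of the recipe. Finally, it feeds the words to collections.Counter, as one example of a consumer that never needs the full word list.

```python
import collections
import os
import re
import tempfile

def words_of_file(thefilepath, pattern=None):
    # Yields words one at a time, reading the file line by line.
    # pattern (hypothetical parameter) is an optional compiled regex;
    # without it, words are whitespace-separated, as with split().
    with open(thefilepath) as f:
        for line in f:
            if pattern is None:
                yield from line.split()
            else:
                yield from pattern.findall(line)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat, the hat\nthe end.\n")
    path = tmp.name

try:
    plain = collections.Counter(words_of_file(path))
    regex = collections.Counter(words_of_file(path, re.compile(r'[\w-]+')))
finally:
    os.remove(path)

print(plain['cat'], plain['cat,'])   # 0 1: the comma stays attached
print(regex['cat'], regex['the'])    # 1 3
```

Because the consumer only sees a stream of words, switching the definition of a word changes just the generator call, not the loop that uses it.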

This approach lets you ...


You might also like

Modern Python Cookbook - Second Edition
Steven F. Lott

Python Cookbook, 3rd Edition
David Beazley, Brian K. Jones

Publisher Resources

ISBN: 0596001673