Manipulating Files

Scripting languages were designed in part to help people do repetitive tasks quickly and simply. One common job for webmasters, system administrators, and programmers is to take a set of files, select a subset of them, manipulate that subset in some way, and write the results to one or more output files. (For example, in each file in a directory, find the last word of every other line that starts with something other than the # character, and print it along with the name of the file.) Special-purpose tools such as sed and awk were developed for this kind of task. We find that Python does the job just fine using very simple tools.
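The parenthetical example above can be sketched in a few lines. This is only one reading of the task, written for Python 3: the function name last_words is hypothetical, "every other line" is taken to mean the 1st, 3rd, 5th, and so on among the lines that do not start with #, and blank lines count as qualifying lines but produce no output.

```python
import os

def last_words(directory):
    """Yield (filename, last_word) pairs for every other non-# line.

    Assumption: "every other" means the 1st, 3rd, 5th, ... of the lines
    that do not start with '#', counted separately for each file.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        kept = 0  # number of lines seen so far that do not start with '#'
        with open(path, errors="replace") as f:
            for line in f:
                if line.startswith("#"):
                    continue
                kept += 1
                words = line.split()
                # Odd-numbered qualifying lines only; blank lines have no words.
                if kept % 2 == 1 and words:
                    yield name, words[-1]

# Possible use, printing each word next to its file name:
#     for name, word in last_words("."):
#         print(name, word)
```

Writing the loop as a generator keeps the selection logic separate from the printing, so the same function could just as easily feed an output file.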

Doing Something to Each Line in a File

The sys module is the most helpful one when it comes to reading an input file, parsing the text it contains, and processing it. Among its attributes are three file objects, called sys.stdin, sys.stdout, and sys.stderr. The names come from the notion of the three streams, called standard in, standard out, and standard error, which are used to connect command-line tools. Standard output (stdout) is used by every print statement. It’s a file object with all the output methods of file objects opened in write mode, such as write and writelines. The other often-used stream is standard in (stdin), which is also a file object, but with the input methods, such as read, readline, and readlines. For example, the following script counts all the lines in the ...
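As a rough illustration of these stream objects (a stand-in sketch in Python 3, not the script referred to above), the function below counts the lines on an input stream and writes the total to an output stream. The name report_line_count is hypothetical; the only calls relied on are iteration over a file object and its write method.

```python
import sys

def report_line_count(instream, outstream):
    """Count the lines in instream and write the total to outstream."""
    count = 0
    for _ in instream:  # file objects yield one line per iteration
        count += 1
    outstream.write("%d\n" % count)
    return count

# Used as a command-line filter, the call would be:
#     report_line_count(sys.stdin, sys.stdout)
```

Passing the streams in as arguments, instead of naming sys.stdin and sys.stdout inside the function, means the same code works on any open file and is easy to try out with in-memory file objects.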
