By Mark Lutz
http://www.python.org -- still the
official place to find all things Python.
| Features | Benefits |
|---|---|
| No compile or link steps | Rapid development cycle turnaround |
| No type declarations | Simpler, shorter, and more flexible programs |
| Automatic memory management | Garbage collection avoids bookkeeping code |
| High-level datatypes and operations | Fast development using built-in object types |
| Object-oriented programming | Code reuse, C++, Java, and COM integration |
| Embedding and extending in C | Optimization, customization, system "glue" |
| Classes, modules, exceptions | Modular "programming-in-the-large" support |
| A simple, clear syntax and design | Readability, maintainability, ease of learning |
| Dynamic loading of C modules | |
sys and os, before this chapter
moves on to larger system programming concepts. As I'm not
going to demonstrate every item in every built-in module, the first
thing I want to do is show you how to get more details on your own.
Officially, this task also serves as an excuse for introducing a few
core system scripting concepts -- along the way, we'll code
a first script to format documentation.
sys and os.
That's somewhat oversimplified; other standard modules belong
to this domain too (e.g., glob,
socket, thread,
time, fcntl), and some built-in
functions are really system interfaces as well (e.g.,
open). But sys and
os together form the core of Python's system
tools arsenal.
sys exports components
related to the Python interpreter itself (e.g.,
the module search path), and os contains variables
and functions that map to the operating system on which Python is
run. In practice, this distinction may not always seem clear-cut
(e.g., the standard input and output streams show up in
sys, but they are at least arguably tied to
operating system paradigms). The good news is that you'll soon
use the tools in these modules so often that their locations will be
permanently stamped on your memory.
The os module also attempts to provide a
portable programming interface to the underlying
operating system -- its functions may be implemented differently
on different platforms, but they look the same everywhere to Python
scripts. In addition, the os module exports a
nested submodule, os.path, that provides a
portable interface to file and directory processing tools.
The sys and os modules form the
core of much of Python's system-related toolset. Let's
now take a quick, interactive tour through some of the tools in these
two modules, before applying them in bigger examples.
sys includes both informational
names and functions that take action. For instance, its attributes
give us the name of the underlying operating system on which the
code is running, the largest possible integer on this machine, and the
version number of the Python interpreter running our code:
C:\...\PP2E\System>python
>>> import sys
>>> sys.platform, sys.maxint, sys.version
('win32', 2147483647, '1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)]')
>>>
>>> if sys.platform[:3] == 'win': print 'hello windows'
...
hello windows
sys.platform string as done here;
although most of Python is cross-platform, nonportable tools are
usually wrapped in if tests like the one here. For
instance, we'll see later that program launch and low-level
console interaction tools vary per platform today -- simply test
sys.platform to pick the right tool for the
machine your script is running on.
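As a rough sketch of that idea in present-day Python 3 syntax (an assumption; the listings in this text use the older print statement), the following wraps a platform test in a small function. The helper name pick_clear_command is hypothetical, chosen only for illustration; it returns the usual console-clearing command name for the platform it is given.

```python
import sys

def pick_clear_command(platform=sys.platform):
    """Return a console-clearing command name for a platform string.

    Hypothetical helper for illustration only; it is not part of the
    book's examples.
    """
    if platform[:3] == 'win':        # same slice test used in the text
        return 'cls'                 # Windows console command
    return 'clear'                   # common on Unix-like shells

print(sys.platform, '=>', pick_clear_command())
```

Passing the platform string in as an argument, rather than reading sys.platform inside the test, makes the branch easy to try out for machines other than the one you happen to be on.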
The sys module also lets us inspect the module
search path both interactively and within a Python program.
sys.path is a list of strings representing the
true search path in a running Python interpreter. When a module is
imported, Python scans this list from left to right, searching for
the module's file on each directory named in the list. Because
of that, this is the place to look to verify that your search path is
really set as intended.
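To check a search path from within a program rather than interactively, something like the following Python 3 sketch may help. The function name show_search_path and its limit parameter are assumptions made for this illustration; the list it reads, sys.path, is the real one described above.

```python
import sys

def show_search_path(limit=6):
    """Return the first few module search path entries, in search order."""
    return sys.path[:limit]

# Python scans these directories left to right on each import.
for position, directory in enumerate(show_search_path()):
    print(position, repr(directory))
```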
The sys.path list is simply initialized from your
PYTHONPATH setting plus system defaults, when the interpreter is
first started up. In fact, you'll notice quite a few
directories that are not on your PYTHONPATH if you inspect
sys.path.
os contains all the usual
operating-system calls you may have used in your C programs and shell
scripts. Its calls deal with directories, processes, shell variables,
and the like. Technically, this module provides
POSIX tools -- a portable standard for
operating-system calls -- along with platform-independent
directory processing tools as nested module
os.path. Operationally, os
serves as a largely portable interface to your computer's
system calls: scripts written with os and
os.path can usually be run on any platform
unchanged.
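A small Python 3 sketch of what "portable" means here: the os.path calls below build and take apart a pathname without ever naming the directory separator, so the same code should behave sensibly on Windows and Unix alike. The describe function is a hypothetical helper, not something from the book's examples.

```python
import os

def describe(path):
    """Split a pathname into (directory, base name, extension)."""
    directory, filename = os.path.split(path)
    stem, extension = os.path.splitext(filename)
    return directory, stem, extension

# os.path.join inserts whatever separator the current platform uses.
path = os.path.join('PP2E', 'System', 'whereami.py')
print(path)
print(describe(path))
```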
os module's source
code, you'll notice that it really just imports whatever
platform-specific system module you have on your computer (e.g.,
nt, mac,
posix). See the file os.py in
the Python source library directory -- it simply runs a
from ... import * statement to copy all names out of a
platform-specific module. By always importing os
instead of platform-specific modules, though, your scripts are mostly
immune to platform implementation differences.
os. If you inspect this module's attributes
interactively, you get a huge list of names that will vary per Python
release, will likely vary per platform, and isn't incredibly
useful until you've learned what each name means:
>>> import os
>>> dir(os)
['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_RDONLY', 'O_RDWR',
'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO',
'P_OVERLAY', 'P_WAIT', 'R_OK', 'UserDict', 'W_OK', 'X_OK', '_Environ',
'__builtins__', '__doc__', '__file__', '__name__', '_execvpe', '_exit',
'_notfound', 'access', 'altsep', 'chdir', 'chmod', 'close', 'curdir',
'defpath', 'dup', 'dup2', 'environ', 'error', 'execl', 'execle', 'execlp',
'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'fdopen', 'fstat',
'getcwd', 'getpid', 'i', 'linesep', 'listdir', 'lseek', 'lstat', 'makedirs',
'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen',
'putenv', 'read', 'remove', 'removedirs', 'rename', 'renames', 'rmdir',
'sep', 'spawnv', 'spawnve', 'stat', 'strerror', 'string', 'sys', 'system',
'times', 'umask', 'unlink', 'utime', 'write']
os.getcwd gives access to the directory from which
a script is started, and many file tools use its value implicitly.
sys.argv gives access to words typed on the
command line used to start the program that serve as script inputs.
os.environ provides an interface to names assigned
in the enclosing shell (or a parent program) and passed in to the
script.
sys.stdin, stdout, and
stderr export the three input/output streams that
are at the heart of command-line shell tools.
os.getcwd lets a script fetch the CWD
name explicitly, and os.chdir allows a script to
move to a new CWD.
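Here is one way these two calls might be combined, sketched in Python 3. The run_in_directory helper is hypothetical: it moves to a target directory, runs a caller-supplied function there, and then moves back, so the rest of the program never notices the change. The temporary directory is only a stand-in for a real working directory.

```python
import os
import tempfile

def run_in_directory(target, action):
    """Call action() with target as the CWD, then restore the old CWD."""
    original = os.getcwd()           # remember where we started
    os.chdir(target)                 # move to the new CWD
    try:
        return action()
    finally:
        os.chdir(original)           # always move back, even on errors

with tempfile.TemporaryDirectory() as scratch:
    print('inside :', run_in_directory(scratch, os.getcwd))
print('outside:', os.getcwd())
```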
python
dir1\dir2\file.py, the
CWD is the directory you were in when you typed this command, not
dir1\dir2. On the other hand, Python
automatically adds the identity of the script's home directory
to the front of the module search path, such that
file.py can always import other files in
dir1\dir2, no matter where it is run from. To
illustrate, let's write a simple script to echo both its CWD
and module search path:
C:\PP2ndEd\examples\PP2E\System>type whereami.py
import os, sys
print 'my os.getcwd =>', os.getcwd( ) # show my cwd execution dir
print 'my sys.path =>', sys.path[:6] # show first 6 import paths
raw_input( ) # wait for keypress if clicked
'')
to the front of the module search path, to designate the CWD (we met
the sys.path module search path earlier):
C:\PP2ndEd\examples\PP2E\System>set PYTHONPATH=C:\PP2ndEd\examples
C:\PP2ndEd\examples\PP2E\System>
The sys module is also where Python makes
available the words typed on the command line used to start a Python
script. These words are usually referred to as command-line
arguments, and show up in sys.argv, a
built-in list of strings. C programmers may notice its similarity to
the C "argv" array (an array of C strings). It's
not much to look at interactively, because no command-line arguments
are passed to start up Python in this mode:
>>> sys.argv
['']
argv
list for inspection.
import sys
print sys.argv
C:\...\PP2E\System>python testargv.py
['testargv.py']

C:\...\PP2E\System>python testargv.py spam eggs cheese
['testargv.py', 'spam', 'eggs', 'cheese']

C:\...\PP2E\System>python testargv.py -i data.txt -o results.txt
['testargv.py', '-i', 'data.txt', '-o', 'results.txt']
-i
data.txt means the -i
option's value is data.txt (e.g., an input
filename). Any words can be listed, but programs usually impose some
sort of structure on them.
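One simple way to impose such structure is sketched below in Python 3. The parse_options function is a hypothetical helper written for this illustration, and it assumes every word starting with a dash is followed by that option's value; real programs often need something more forgiving.

```python
import sys

def parse_options(words):
    """Map '-name value' pairs to a dict; collect other words in order."""
    options, others = {}, []
    position = 0
    while position < len(words):
        word = words[position]
        if word.startswith('-') and position + 1 < len(words):
            options[word] = words[position + 1]   # option plus its value
            position += 2
        else:
            others.append(word)                   # plain positional word
            position += 1
    return options, others

# sys.argv[0] is the script name, so skip it.
print(parse_options(sys.argv[1:]))
```

Run with the last command line shown above, this should map '-i' to 'data.txt' and '-o' to 'results.txt', with no leftover words.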
os.environ, a
Python dictionary-like object with one entry per variable setting in
the shell. Shell variables live outside the Python system; they are
often set at your system prompt or within startup files, and
typically serve as systemwide configuration inputs to programs.
os.environ by
the desired shell variable's name string (e.g.,
os.environ['USER']) is the moral equivalent of
adding a dollar sign before a variable name in most Unix shells
(e.g., $USER), using surrounding percent signs on
DOS (%USER%), and calling
getenv("USER") in a C program. Let's start
up an interactive session to experiment:
>>> import os
>>> os.environ.keys( )
['WINBOOTDIR', 'PATH', 'USER', 'PP2HOME', 'CMDLINE', 'PYTHONPATH', 'BLASTER',
'X', 'TEMP', 'COMSPEC', 'PROMPT', 'WINDIR', 'TMP']
>>> os.environ['TEMP']
'C:\\windows\\TEMP'
The keys method returns a list of variables
set, and indexing fetches the value of shell variable TEMP on
Windows. This works the same on Linux, but other variables are
generally preset when Python starts up. Since we know about
PYTHONPATH, let's peek at its setting within Python to verify
its content:
>>> os.environ['PYTHONPATH']
'C:\\PP2ndEd\\examples\\Part3;C:\\PP2ndEd\\examples\\Part2;C:\\PP2ndEd\\
examples\\Part2\\Gui;C:\\PP2ndEd\\examples'
>>>
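Because a variable may not be set on every machine, scripts often fetch shell variables with a fallback instead of plain indexing. The Python 3 sketch below shows one such approach; the lookup function and its default text are assumptions for illustration, while os.environ.get is the ordinary dictionary-style call.

```python
import os

def lookup(name, default='(not set)'):
    """Fetch a shell variable's value, or a default if it is missing."""
    return os.environ.get(name, default)

# These names come from the session above; they may not exist everywhere.
for name in ('PYTHONPATH', 'TEMP', 'USER'):
    print(name, '=>', lookup(name))
```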
The sys module is also the place where the standard
input, output, and error streams of your Python programs live:
>>> for f in (sys.stdin, sys.stdout, sys.stderr): print f
...
<open file '<stdin>', mode 'r' at 762210>
<open file '<stdout>', mode 'w' at 762270>
<open file '<stderr>', mode 'w' at 7622d0>
Because the print statement and raw_input
functions are really nothing more than user-friendly interfaces to
the standard output and input streams, they are similar to using
stdout and stdin in
sys directly:
>>> print 'hello stdout world'
hello stdout world
>>> sys.stdout.write('hello stdout world' + '\n')
hello stdout world
>>> raw_input('hello stdin world>')
hello stdin world>spam
'spam'
>>> print 'hello stdin world>',; sys.stdin.readline( )[:-1]
hello stdin world>eggs
'eggs'
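The equivalence of the first two calls can be checked mechanically. The sketch below assumes Python 3, where print is a function and raw_input has become input; the capture helper is hypothetical and simply collects whatever a function writes to the standard output stream.

```python
import contextlib
import io
import sys

def capture(action):
    """Run action() and return the text it wrote to standard output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):   # sys.stdout is the buffer here
        action()
    return buffer.getvalue()

via_print = capture(lambda: print('hello stdout world'))
via_write = capture(lambda: sys.stdout.write('hello stdout world' + '\n'))
print(via_print == via_write)
```

Both routes end up in the same stream, which is why redirecting sys.stdout catches the output of print as well.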
The open function is the
primary tool scripts use to access the files on the underlying
computer system. Since this function is an inherent part of the
Python language, you may already be familiar with its basic workings.
Technically, open gives direct access to the
stdio filesystem calls in the system's C
library -- it returns a new file object that is connected to the
external file, and has methods that map more or less directly to file
calls on your machine. The open function also provides a portable
interface to the underlying filesystem -- it works the same on
every platform Python runs on.
os), store objects away in files by key (modules
anydbm and shelve), and access
SQL databases. Most of these are larger topics addressed in Chapter 16. In this section, we take a brief tutorial
look at the built-in file object, and explore a handful of more
advanced file-related topics. As usual, you should consult the
library manual's file object entry for further details and
methods we don't have space to cover here.
The open function is all you
need to remember to process files in your scripts. The file object
returned by open has methods for reading data
(read, readline,
readlines), writing data
(write, writelines), freeing
system resources (close), moving about in the file
(seek), forcing data to be transferred out of
buffers (flush), fetching the underlying file
handle (fileno), and more. Since the built-in file
object is so easy to use, though, let's jump right in to a few
interactive examples.
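As a compact preview of several of those methods working together, here is a Python 3 sketch. The roundtrip function is a hypothetical helper that writes a few lines to a scratch file and reads them back two different ways; the scratch file comes from the standard tempfile module so nothing is left behind.

```python
import os
import tempfile

def roundtrip(lines):
    """Write lines to a scratch file, then read them back two ways."""
    handle, path = tempfile.mkstemp(suffix='.txt')
    os.close(handle)                 # we reopen the file by name below
    try:
        output = open(path, 'w')
        output.writelines(lines)     # write all the strings at once
        output.close()               # flush buffers, free resources

        source = open(path)
        first = source.readline()    # one line, including its '\n'
        source.seek(0)               # move back to the start of the file
        everything = source.readlines()
        source.close()
        return first, everything
    finally:
        os.remove(path)

print(roundtrip(['spam\n', 'eggs\n']))
```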
for loop, processing each file in turn. The
trick we need to learn here, then, is how to get such a directory
list within our scripts. There are at least three options: running
shell listing commands with os.popen, matching
filename patterns with glob.glob, and getting
directory listings with os.listdir. They vary in
interface, result format, and portability.
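To make the difference in result format concrete, the Python 3 sketch below compares two of those options on a throwaway directory. The list_python_files helper and the three file names are assumptions for illustration; note that glob.glob returns pathnames that include the directory, while os.listdir returns bare names that we must filter ourselves.

```python
import glob
import os
import tempfile

def list_python_files(directory):
    """Return (*.py paths from glob, *.py names from listdir), both sorted."""
    by_glob = sorted(glob.glob(os.path.join(directory, '*.py')))
    by_listdir = sorted(name for name in os.listdir(directory)
                        if name.endswith('.py'))
    return by_glob, by_listdir

with tempfile.TemporaryDirectory() as scratch:
    for name in ('a.py', 'b.py', 'notes.txt'):
        open(os.path.join(scratch, name), 'w').close()   # make empty files
    by_glob, by_listdir = list_python_files(scratch)
    print([os.path.basename(path) for path in by_glob])
    print(by_listdir)
```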
os.fork is called the child
process. In general, parents can make any number of children, and
children can create child processes of their own -- all forked
processes run independently and in parallel under the operating
system's control. It is probably simpler in practice than
theory, though; the Python script in Example 3-1
forks new child processes until you type a "q" at the
console.
# forks child processes until you type 'q'
import os

def child( ):
    print 'Hello from child', os.getpid( )
    os._exit(0)  # else goes back to parent loop

def parent( ):
    while 1:
        newpid = os.fork( )
        if newpid == 0:
            child( )
        else:
            print 'Hello from parent', os.getpid( ), newpid
        if raw_input( ) == 'q': break

parent( )
os module, are simply thin wrappers over standard
forking calls in the C library. To start a new, parallel process,
call the os.fork built-in function. Because this
function generates a copy of the calling program, it returns a
different value in each copy: zero in the child process, and the
process ID of the new child in the parent. Programs generally test
this result to begin different processing in the child only; this
script, for instance, runs the child function in
child processes only.
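The two return values can be seen in a noninteractive Python 3 sketch like the one below. It assumes a Unix-like platform (hence the hasattr guard), and the spawn_child helper is hypothetical: the child exits at once with a chosen status code, and the parent waits for it with os.waitpid and reports what it saw.

```python
import os

def spawn_child(exit_code=0):
    """Fork once; in the parent, return (child pid, child exit status)."""
    newpid = os.fork()
    if newpid == 0:
        os._exit(exit_code)                  # child: fork returned zero
    pid, status = os.waitpid(newpid, 0)      # parent: fork returned child pid
    return newpid, os.WEXITSTATUS(status)

if hasattr(os, 'fork'):                      # os.fork is not on Windows
    print(spawn_child(3))
```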
sys.exit function:
>>> sys.exit( ) # else exits on end of script
SystemExit exception. Because of this, we can
catch it as usual to intercept early exits and perform cleanup
activities; if uncaught, the interpreter exits as usual. For
instance:
C:\...\PP2E\System>python
>>> import sys
>>> try:
...     sys.exit( )              # see also: os._exit, Tk( ).quit( )
... except SystemExit:
...     print 'ignoring exit'
...
ignoring exit
>>>
SystemExit exception with a Python
raise statement is equivalent to calling
sys.exit. More realistically, a
try block would catch the exit exception raised
elsewhere in a program; the script in Example 3-11
exits from within a processing function.
def later( ):
    import sys
    print 'Bye sys world'
    sys.exit(42)
    print 'Never reached'

if __name__ == '__main__': later( )
sys.exit raises a Python exception, importers of
its function can trap and override its exit exception, or specify a
finally cleanup block to be run during program
exit processing:
C:\...\PP2E\System\Exits>python testexit_sys.py
Bye sys world

C:\...\PP2E\System\Exits>python
>>> from testexit_sys import later
>>> try:
...     later( )
... except SystemExit:
...     print 'Ignored...'
...
Bye sys world
Ignored...
>>>
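The finally variant mentioned above is not shown in that session, so here is a Python 3 sketch of it. The run_with_cleanup helper and the event list are assumptions for illustration; the later function mirrors the one in the example but skips the printing.

```python
import sys

def later():
    sys.exit(42)                     # raises SystemExit with code 42

def run_with_cleanup(function, log):
    """Call function, run cleanup on the way out, then trap the exit."""
    try:
        try:
            function()
        finally:
            log.append('cleanup')    # runs even while the exit propagates
    except SystemExit as exit_request:
        log.append('exit code %s' % exit_request.code)

events = []
run_with_cleanup(later, events)
print(events)
```

The cleanup entry lands in the list first, because the finally block runs while the exception is still on its way out to the enclosing except clause.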
os.popen calls
os.popen and simple files allow even more dynamic
communication -- data can be sent between programs at arbitrary
times, not only at program start and exit.
socket module, which lets us transfer data between
programs running on the same computer, as well as programs located on
remote networked machines.
os.pipe call. Pipes are
unidirectional channels that work something like a shared memory
buffer, but with an interface resembling a simple file on each of two
ends. In typical use, one program writes data on one end of the pipe,
and another reads that data on the other end. Each program only sees
its end of the pipes, and processes it using normal Python file
calls.
os.fork call to
make a copy of the calling process as usual (we met forks earlier in
this chapter). After forking, the original parent process and its
child copy speak through the two ends of a pipe created with
os.pipe prior to the fork. The
os.pipe call returns a tuple of two file
descriptors -- the low-level file identifiers we met
earlier -- representing the input and output sides of the pipe.
Because forked child processes get copies of
their parents' file descriptors, writing to the pipe's
output descriptor in the child sends data back to the parent on the
pipe created before the child was spawned.
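That sequence of steps might be sketched as follows in Python 3, again assuming a Unix-like platform where os.fork exists. The ask_child helper is hypothetical: it creates the pipe, forks, has the child write a message on its end, and has the parent read until the child's end is closed.

```python
import os

def ask_child(message):
    """Fork a child that sends message back to the parent over a pipe."""
    read_fd, write_fd = os.pipe()            # create the pipe before forking
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)                    # child only writes
        os.write(write_fd, message.encode())
        os._exit(0)                          # exiting closes the write end
    os.close(write_fd)                       # parent only reads
    with os.fdopen(read_fd) as incoming:     # wrap descriptor in a file object
        data = incoming.read()               # reads until end-of-file
    os.waitpid(pid, 0)                       # collect the finished child
    return data

if hasattr(os, 'fork'):                      # os.fork is not on Windows
    print(ask_child('Spam from child'))
```

Each side closes the descriptor it does not use; otherwise the parent's read would never see end-of-file, because a write end of the pipe would still be open in its own process.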
signal module that allows Python programs to
register Python functions as handlers for signal events. This module
is available on both Unix-like platforms and Windows (though the
Windows version defines fewer kinds of signals to be caught). To
illustrate the basic signal interface, the script in Example 3-20 installs a Python handler function for the
signal number passed in as a command-line argument.
##########################################################
# catch signals in Python; pass signal number N as a
# command-line arg, use a "kill -N pid" shell command
# to send this process a signal; most signal handlers
# restored by Python after caught (see network scripting
# chapter for SIGCHLD details); signal module available
# on Windows, but defines only a few signal types there;
##########################################################
import sys, signal, time

def now( ): return time.ctime(time.time( ))        # current time string

def onSignal(signum, stackframe):                 # python signal handler
    print 'Got signal', signum, 'at', now( )      # most handlers stay in effect

signum = int(sys.argv[1])
signal.signal(signum, onSignal)                   # install signal handler
while 1: signal.pause( )                          # wait for signals (or: pass)
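For a version that can be tried without a second shell window, the Python 3 sketch below has the process signal itself. It assumes Python 3.8 or later (for signal.raise_signal) and that it runs in the main thread, where handlers may be installed; the catch_one helper is hypothetical, and SIGINT is used only because it is defined on both Unix and Windows.

```python
import signal
import time

def catch_one(signum=signal.SIGINT):
    """Install a handler, send ourselves one signal, return what was caught."""
    received = []

    def on_signal(number, stackframe):           # python signal handler
        received.append((number, time.ctime()))  # record signal and time

    previous = signal.signal(signum, on_signal)  # install, remember old handler
    try:
        signal.raise_signal(signum)              # like "kill -N pid" on ourselves
    finally:
        signal.signal(signum, previous)          # put the old handler back
    return received

print(catch_one())
```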