Chapter 4. Persistence: Saving data to files

It is truly great to be able to process your file-based data. But what happens to your data when you’re done? Of course, it’s best to save your data to a disk file, which allows you to use it again at some later date and time. Taking your memory-based data and storing it to disk is what persistence is all about. Python supports all the usual tools for writing to files and also provides some cool facilities for efficiently storing Python data. So...flip the page and let’s get started learning them.

Programs produce data

It’s a rare program that reads data from a disk file, processes the data, and then throws away the processed data. Typically, programs save the data they process, display their output on screen, or transfer data over a network.

Before you learn what’s involved in writing data to disk, let’s process the data from the previous chapter to work out who said what to whom.

When that’s done, you’ll have something worth saving.

Open your file in write mode

When you use the open() BIF to work with a disk file, you can specify an access mode to use. By default, open() uses mode r for reading, so you don’t need to specify it. To open a file for writing, use mode w:
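The following is a minimal sketch of both modes. It writes to a hypothetical scratch file in a fresh temporary directory, and the file name is only illustrative:

```python
import os
import tempfile

# Hypothetical scratch file, kept in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sketch.txt")

out = open(path, "w")      # mode "w": open for writing, creating the file
out.write("Hello, file!\n")
out.close()

data = open(path)          # no mode given, so open() uses "r"
first_line = data.readline()
data.close()
print(first_line, end="")  # Hello, file!
```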

By default, the print() BIF uses standard output (usually the screen) when displaying data. To write data to a file instead, use the file argument to specify the data file object to use:
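Here is a sketch of print() sending its output to a file object instead of the screen. The file name and the two strings are placeholders:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "spoken.txt")  # hypothetical file name
lines_spoken = ["First line of dialogue", "Second line of dialogue"]  # placeholders

out = open(path, "w")
for line in lines_spoken:
    print(line, file=out)  # file= sends the output to out, not the screen
out.close()

check = open(path)
saved = check.read()
check.close()
print(saved, end="")
```

Each print() call still adds its usual newline, so the file ends up with one line per string.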

When you’re done, be sure to close the file to ensure all of your data is written to disk. This is known as flushing and is very important:
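Python buffers written data in memory and hands it to the operating system when the buffer is flushed, and close() does that flush for you. The following small sketch uses a hypothetical file name and makes the flush explicit before closing:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "flush_demo.txt")  # hypothetical name

out = open(path, "w")
print("buffered text", file=out)  # may still be sitting in Python's buffer
out.flush()                       # hand any buffered data to the OS right now
out.close()                       # close() flushes too, then releases the file
print(out.closed)                 # True

check = open(path)
saved = check.read()
check.close()
```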

Geek Bits

When you use access mode w, Python opens your named file for writing. If the file already exists, it is cleared of its contents, or clobbered. To append to a file, use access mode a. To open a file for reading and writing without clobbering it, use r+ (mode w+ also allows reading and writing, but it clobbers the file just as w does). If you open a file that does not already exist with mode w, a, or w+, it is first created for you, and then opened for writing.
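The following sketch exercises each mode against one hypothetical scratch file. Note that w+ clobbers just like w, while r+ keeps the existing contents (and, unlike the others, r+ raises an error if the file does not exist):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical scratch file


def write_with(p, mode, text):
    """Open p in the given mode, write text, and close it again."""
    f = open(p, mode)
    f.write(text)
    f.close()


def contents(p):
    """Return the whole file as one string."""
    f = open(p)
    text = f.read()
    f.close()
    return text


write_with(path, "w", "one\n")    # "w" creates the file
write_with(path, "w", "two\n")    # "w" again: "one" is clobbered
after_w = contents(path)          # "two\n"

write_with(path, "a", "three\n")  # "a" adds to the end
after_a = contents(path)          # "two\nthree\n"

f = open(path, "r+")              # read and write, existing data kept
existing = f.read()               # "two\nthree\n"
f.seek(0, os.SEEK_END)            # move to the end before writing
f.write("four\n")
f.close()
after_r_plus = contents(path)     # "two\nthree\nfour\n"

f = open(path, "w+")              # read and write, but the file is clobbered first
after_w_plus = f.read()           # "" (nothing left to read)
f.close()
```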

Brain Power

Consider the following carefully: what happens to your data files if the second call to print() in your code causes an IOError?

Files are left open after an exception!

When all you ever do is read data from files, getting an IOError is annoying, but rarely dangerous, because your data is still in your file, even though you might be having trouble getting at it.

It’s a different story when writing data to files: if you need to handle an IOError before a file is closed, your written data might become corrupted and there’s no way of telling until after it has happened.

Your exception-handling code is doing its job, but you now have a situation where your data could potentially be corrupted, which can’t be good.
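The following sketch reproduces the problem. It uses a hypothetical second_line() function that raises an IOError by hand to stand in for a real write failure during the second print():

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "half_written.txt")  # hypothetical name


def second_line():
    """Hypothetical stand-in for a second print() whose data causes an IOError."""
    raise IOError("simulated write failure")


out = None
try:
    out = open(path, "w")
    print("first line", file=out)
    print(second_line(), file=out)  # raises, so nothing below runs
    out.close()                     # skipped when the error occurs
except IOError:
    print("File error.")

was_closed = out.closed
print("Closed after the error?", was_closed)  # False: close() never ran
out.close()  # tidy up so the sketch doesn't leak the file
```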

What’s needed here is something that lets you run some code regardless of whether an IOError has occurred. In the context of your code, you’ll want to make sure the files are closed no matter what.

Extend try with finally

When you have a situation where code must always run no matter what errors occur, add that code to your try statement’s finally suite:

If no runtime errors occur, any code in the finally suite executes. Equally, if an IOError occurs, the except suite executes and then the finally suite runs.
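Here is a sketch of the same hypothetical failing write, this time with the close moved into a finally suite. Setting out to None first lets the finally suite tell whether open() itself succeeded:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "safe_write.txt")  # hypothetical name


def failing_line():
    """Hypothetical stand-in for data whose write raises an IOError."""
    raise IOError("simulated write failure")


out = None  # stays None if open() fails, so finally knows there is nothing to close
try:
    out = open(path, "w")
    print("first line", file=out)
    print(failing_line(), file=out)
except IOError:
    print("File error.")
finally:
    # Runs whether or not an IOError occurred.
    if out is not None:
        out.close()

print("Closed after the error?", out.closed)  # True
```

Because close() ran, the first line that was written before the error is safely in the file.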

No matter what, the code in the finally suite always runs.

By moving your file closing code into your finally suite, you are reducing the possibility of data corruption errors.

This is a big improvement, because you’re now ensuring that files are closed properly (even when write errors occur).

But what about those errors?

How do you find out the specifics of the error?

Knowing the type of error is not enough

When a file I/O error occurs, your code displays a generic “File Error” message. This is too generic. How do you know what actually happened?

Who knows?

It turns out that the Python interpreter knows...and it will give up the details if only you’d ask.

When an error occurs at runtime, Python raises an exception of the specific type (such as IOError, ValueError, and so on). Additionally, Python creates an exception object that it hands to your except suite. To use it, give it a name in your except clause with as, for example except IOError as err:.
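The following is a sketch of asking for those details. It tries to open a hypothetical file name inside a brand-new, empty directory, so the open is guaranteed to fail:

```python
import os
import tempfile

# A file name in a brand-new, empty directory, so it cannot exist (hypothetical name).
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")

try:
    data = open(missing)
except IOError as err:
    # err is the exception object; str(err) carries the details.
    print("File error: " + str(err))
    caught = err  # the name err is cleared when the except suite ends
```

In Python 3, IOError is another name for OSError, and a missing file raises its subclass FileNotFoundError, which is why except IOError still catches it.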

Let’s use IDLE to see how this works.

Of course, all this extra logic is starting to obscure the real meaning of your code.

Use with to work with files

Because the try/except/finally pattern is so common when it comes to working with files, Python includes a statement that abstracts away some of the details. The with statement, when used with files, can dramatically reduce the amount of code you have to write, because it removes the need for a finally suite to close a file that may or may not have been opened. Take a look:
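Here is a sketch of the with form, using a hypothetical file name. The try/except is still there to report errors, but the finally suite is gone:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical name

try:
    with open(path, "w") as out:
        print("first line", file=out)
        print("second line", file=out)
    # The with suite has ended, so the file is already closed,
    # and it would have been closed even if an exception had occurred.
    print(out.closed)  # True
except IOError as err:
    print("File error: " + str(err))
```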

When you use with, you no longer have to worry about closing any opened files, as the Python interpreter automatically takes care of this for you. The with version of the code is identical in function to the try/except/finally version. At Head First Labs, we know which approach we prefer.

Geek Bits

The with statement takes advantage of a Python technology called the context management protocol.
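File objects work with with because they implement the protocol's two methods, __enter__() and __exit__(), and their __exit__() closes the file. The following sketch uses a hypothetical Tracked class to make the protocol visible by recording when each method runs:

```python
class Tracked:
    """Hypothetical context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what gets bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # runs even if the with suite raised
        return False                # False: let any exception carry on


tracker = Tracked()
try:
    with tracker as t:
        t.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass

print(tracker.events)  # ['enter', 'body', 'exit']
```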

Default formats are unsuitable for files

Although your data is now stored in a file, it’s not really in a useful format. Let’s experiment in the IDLE shell to see what impact this can have.

Yikes! It would appear your list is converted to a large string by print() when it is saved. Your experimental code reads a single line of data from the file and gets all of the data as one large chunk of text...so much for your code saving your list data.
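The following is a small sketch of the experiment, with a hypothetical file name and placeholder list items. What comes back is a str, not a list:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "list_demo.txt")  # hypothetical name
spoken = ["Hello", "Goodbye"]  # placeholder data

with open(path, "w") as out:
    print(spoken, file=out)  # the whole list is written as one line

with open(path) as data:
    first = data.readline()

print(repr(first))  # "['Hello', 'Goodbye']\n"
print(type(first))  # <class 'str'>: a string, not a list
```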

What are your options for dealing with this problem?

Geek Bits

By default, print() displays your data in a format that mimics how your list data is actually stored by the Python interpreter. The resulting output is not really meant to be processed further... its primary purpose is to show you, the Python programmer, what your list data “looks like” in memory.

Parsing the data in the file is a possibility...although it’s complicated by all those square brackets, quotes, and commas. Writing the required code is doable, but it is a lot of code just to read back in your saved data.

Of course, if the data is in a more easily parseable format, the task would likely be easier, so maybe the second option is worth considering, too?

Brain Power

Can you think of a function you created from earlier in this book that might help here?

Why not modify print_lol()?

Recall your print_lol() function from Chapter 2, which takes any list (or list of lists) and displays it on screen, one line at a time. And nested lists can be indented, if necessary.

This functionality sounds perfect! Here’s your code from the module (last seen at the end of Chapter 2):

Amending this code to print to a disk file instead of the screen (known as standard output) should be relatively straightforward. You can then save your data in a more usable format.
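The module's listing is not reproduced here, so the following is not that code. It is a sketch that assumes print_lol() takes a list, an indent flag, and a nesting level, and adds a hypothetical fh argument naming the file object to print to (None meaning standard output):

```python
import io
import sys


def print_lol(the_list, indent=False, level=0, fh=None):
    """Print each item of a (possibly nested) list on its own line.

    fh is a hypothetical parameter: the file object to write to,
    or None for standard output.
    """
    if fh is None:
        fh = sys.stdout  # looked up at call time, so redirection still works
    for each_item in the_list:
        if isinstance(each_item, list):
            # Recurse into nested lists one level deeper.
            print_lol(each_item, indent, level + 1, fh)
        else:
            if indent:
                print("\t" * level, end="", file=fh)
            print(each_item, file=fh)


buf = io.StringIO()  # an in-memory stand-in for a disk file
print_lol(["a", ["b", "c"], "d"], indent=True, fh=buf)
print(repr(buf.getvalue()))  # 'a\n\tb\n\tc\nd\n'
```

Passing an open disk file as fh instead of the StringIO writes the same lines to that file.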

The Scholar’s Corner

Standard Output The default place where your code writes its data when the “print()” BIF is used. This is typically the screen. In Python, standard output is referred to as “sys.stdout” and is importable from the Standard Library’s “sys” module.
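A short sketch of that relationship follows. print() writes to sys.stdout when no file is given, so temporarily swapping sys.stdout (here with the standard library's contextlib.redirect_stdout) changes where a plain print() goes:

```python
import io
import sys
from contextlib import redirect_stdout

print("to the screen", file=sys.stdout)  # same as print("to the screen")

buffer = io.StringIO()
with redirect_stdout(buffer):  # temporarily replace sys.stdout
    print("captured")          # no file= needed: this goes to sys.stdout
print(repr(buffer.getvalue())) # 'captured\n'
```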

That’s a good point.

This problem is not unlike the problem from the beginning of the chapter, in that you’ve got lines of text in a disk file that you need to process, only now you have two files instead of one.

You know how to write the code to process your new files, but writing custom code like this is specific to the format that you’ve created for this problem. This is brittle: if the data format changes, your custom code will have to change, too.

Ask yourself: is it worth it?

Pickle your data

Python ships with a standard library module called pickle, which can save and load almost any Python data object, including lists.

Once you pickle your data to a file, it is persistent and ready to be read into another program at some later date/time:

You can, for example, store your pickled data on disk, put it in a database, or transfer it over a network to another computer.

When you are ready, reversing this process unpickles your persistent pickled data and recreates your data in its original form within Python’s memory:

Save with dump and restore with load

Using pickle is straightforward: import the required module, then use dump() to save your data and, some time later, load() to restore it. The only requirement when working with pickled files is that they have to be opened in binary access mode:
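Here is a sketch of a round trip, using a hypothetical file name and placeholder data. Note the b in both access modes:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.pickle")  # hypothetical name
original = ["a", "list", ["with", "a", "nested", "list"], 42]  # placeholder data

with open(path, "wb") as mysavedata:     # "wb": write, binary mode
    pickle.dump(original, mysavedata)

with open(path, "rb") as myrestoredata:  # "rb": read, binary mode
    restored = pickle.load(myrestoredata)

print(restored)              # ['a', 'list', ['with', 'a', 'nested', 'list'], 42]
print(restored == original)  # True
```

The restored list is a new object with the same contents, nested list included.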

What if something goes wrong?

If something goes wrong when pickling or unpickling your data, the pickle module raises an exception of type PickleError, usually one of its two subclasses: PicklingError when saving or UnpicklingError when restoring.
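The following sketch handles both kinds of failure. load_or_none() is a hypothetical helper, and the test file deliberately holds plain text rather than pickled data, which makes pickle.load() raise an UnpicklingError:

```python
import os
import pickle
import tempfile


def load_or_none(path):
    """Hypothetical helper: unpickle path, or return None if that fails."""
    try:
        with open(path, "rb") as data:
            return pickle.load(data)
    except IOError as err:
        print("File error: " + str(err))
    except pickle.PickleError as perr:
        print("Pickling error: " + str(perr))
    return None


# A file that holds text, not pickled data (hypothetical content).
bad = os.path.join(tempfile.mkdtemp(), "not_a_pickle.dat")
with open(bad, "w") as out:
    out.write("<html>this is not pickled data</html>")

print(load_or_none(bad))  # reports the pickling error, then prints None
```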

Generic file I/O with pickle is the way to go!

Python takes care of your file I/O details, so you can concentrate on what your code actually does or needs to do.

As you’ve seen, being able to work with, save, and restore data in lists is a breeze, thanks to Python. But what other data structures does Python support out of the box?

Let’s dive into Chapter 5 to find out.

Your Python Toolbox

You’ve got Chapter 4 under your belt and you’ve added some key Python techniques to your toolbox.

Python Lingo

  • “Immutable types” - data types in Python that, once assigned a value, cannot have that value changed.

  • “Pickling” - the process of saving a data object to persistent storage.

  • “Unpickling” - the process of restoring a saved data object from persistent storage.
