Chapter 4. List of Files: Functions, Modules & Files
Your code can’t live in a notebook forever. It wants to be free.
And when it comes to freeing your code and sharing it with others, a bespoke function is the first step, followed shortly thereafter by a module, which lets you organize and share your code. In this chapter, you’ll create a function directly from the code you’ve written so far, and in the process create a shareable module, too. You’ll immediately put your module to work as you process the Coach’s swim data with for loops, if statements, conditional tests, and the PSL (Python Standard Library). You’ll learn how to comment your functions, too (which is always a good idea). There’s lots to be done, so let’s get to it!
Cubicle Conversation
Sam: I’ve updated the Coach on the progress-to-date.
Alex: And is he happy?
Sam: To a point, yes. He’s thrilled things have started. However, as you can imagine, he’s only really interested in the final product, which for the Coach is the bar chart.
Alex: Which should be easy enough to do now that the most-recent notebook produces the data we need, right?
Mara: Well… sort of.
Alex: How come?
Mara: The current notebook, Times.ipynb, produces data for Darius swimming the 100m fly in the under 13 age group. But, there’s a need to perform the conversions and the average calculation for any swimmer’s file.
Alex: Sure, that’s easy: just replace the filename at the top of the notebook with another filename, then press the Run All button and—voila!—you’ve got your data.
Mara: And you think the Coach will be happy to do that?
Alex: Errr… I hadn’t thought about how the Coach is going to run this stuff.
Sam: We are heading in the right direction, in that we do need a mechanism that works with any swimmer’s filename. If that can be produced, we can then get on with creating code for the bar chart.
Alex: So we have a ways to go yet…
Mara: Yes, but not far. As you already mentioned, all the code we need is in the Times.ipynb notebook…
Alex: …which you don’t want to give to the Coach…
Mara: …well, not in its current form.
Alex: Then how?
Sam: We need a way to package-up the code so it can be used with any filename and accessed outside of the notebook…
Alex: Ah, but of course: we need a function!
Sam: Which gets us part of the way.
Mara: If the function is put inside a Python module it can be shared in lots of places.
Alex: Sounds good to me. Where do we start?
Mara: Let’s start by turning the existing notebook code into a function that we can call, then share.
You already have most of the code you need
But the code you need is currently in your Times.ipynb notebook.
When it comes to experimenting and creating code from scratch, nothing quite beats using a Jupyter notebook. However, when it comes to reusing and sharing your existing code, notebooks may not be the best choice (and, to be fair, notebooks weren’t designed with this activity in mind).
You can give anybody a copy of your notebook to run within their own Jupyter environment, and that’s a great use case. But, imagine you’re building an application that needs to use some of the code that currently resides in your notebook…
Note
In Appendix A we discuss an extension to Jupyter that can help with this requirement but, out of the box, sharing your notebook’s code can be tricky.
How do you share that code?
To share your notebook’s code, you need to create a function that contains your code, then share your function in a module. And you’ll do both in this chapter.
To get going, create a new, empty file in your Learning folder, then rename your file to swimclub.py:
How to create a function in Python
In addition to the actual code for the function, you need to think about the function’s signature. There are three things to keep in mind. You need to:
Think up a nice, meaningful name.
The code in the Times.ipynb notebook first processes the filename, then reads the file’s contents to extract the data required by the Coach. So let’s call this function read_swim_data. It’s a nice name, it’s a meaningful name… golly, it’s nearly perfect!
Decide on the number and names of any parameters.
Your new read_swim_data function takes a single parameter, which identifies the filename to use. Let’s call this parameter filename.
Indent your function’s code under a def statement.
The def keyword introduces the function, letting you specify the function’s name and any parameters. Any code indented under the def keyword is the function’s code block.
Note
It can be useful to think of “def” as shorthand for “define function.”
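As a minimal sketch of those three decisions, the skeleton below uses the name and parameter chosen above. The body is a placeholder of our own rather than the code from Times.ipynb: it only splits a filename into its parts, assuming the dash-separated naming seen in files such as Abi-10-50m-Back.txt (and Python 3.9 or later for removesuffix).

```python
def read_swim_data(filename):
    # Everything indented under "def" belongs to the function's code block.
    # Placeholder body (an assumption for illustration): split a name such as
    # "Abi-10-50m-Back.txt" into its four dash-separated parts.
    swimmer, age, distance, stroke = filename.removesuffix(".txt").split("-")
    return swimmer, age, distance, stroke


print(read_swim_data("Abi-10-50m-Back.txt"))  # ('Abi', '10', '50m', 'Back')
```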
Simply copying code is not enough
We went ahead and copied the code we think we need, adding it to our read_swim_data function. Here’s what the code looks like in VS Code for us:
Indeed they do, and well spotted.
This is VS Code telling you that your code is relying on values that have yet to be defined. Although the code is syntactically correct Python, it won’t run as those values are missing.
Those values are in the Times.ipynb notebook.
Be sure to copy all the code you need
Looking at the squiggly lines from the previous page, it’s clear FN, FOLDER, and statistics are all missing.
FOLDER and statistics are easy fixes. Simply add these two lines of code to the top of the swimclub.py file (outside the function):
If you’re following along, you’ll notice that the moment you type each of these lines of code into VS Code, the squiggly lines disappear.
Flushed with this success, you might be tempted to copy’n’paste the definition of the FN constant too, but that would be an error. Recall that in the Times.ipynb notebook, FN refers to one of the data files associated with Darius. If you continue to use FN, your new function will use that file and no other. The solution to this issue is to use the value passed into the read_swim_data function instead of the FN constant. That way the Coach can use your function to process any swimmer’s data file:
Arguments sent into a function are assigned to the parameter names defined in the function’s signature, whereas any results are sent back to the calling code by a return statement.
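To make the argument-to-parameter handoff concrete, here is a small runnable sketch. It is not the chapter’s swimclub.py: FOLDER points at a temporary folder created just for the demo, the filename Darius-13-100m-Fly.txt and the times in it are made up, and the one-line, comma-separated file layout is an assumption.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-in for the Coach's data folder, so the sketch runs anywhere.
FOLDER = Path(tempfile.mkdtemp())
(FOLDER / "Darius-13-100m-Fly.txt").write_text("1:27.95,1:21.07,1:30.96\n")


def read_swim_data(filename):
    # "filename" is assigned whatever argument the caller passes in,
    # so no hard-coded FN constant is needed.
    with open(FOLDER / filename) as file:
        times = file.readline().strip().split(",")
    return times  # the result travels back to the calling code


times = read_swim_data("Darius-13-100m-Fly.txt")
print(times)  # ['1:27.95', '1:21.07', '1:30.96']

shutil.rmtree(FOLDER)  # tidy up the demo folder
```

Calling the function with a different filename processes a different file, with no edits to the function itself.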
Update and save your code before continuing…
Before moving on to the next page, be sure to add the following line of code as the last line in your read_swim_data function within your swimclub.py file. Be careful to match the indentation of this line of code with the indentation used for all the other code in your function:
Having saved this change to your module’s code, you likely can’t wait to flip back to your Files.ipynb notebook to see what difference the change makes, right?
Neither can we, but… we hate to have to tell you that disappointment awaits us all.
Yes, it feels like there’s something seriously broken here…
In actual fact, it’s not Jupyter that’s causing the problem, it’s the Python interpreter. And (as weird as it might sound) things are meant to work this way.
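The behavior described here fits how import works: Python runs a module’s code the first time it is imported and hands back the already-loaded module on every later import, so edits saved afterwards go unnoticed. The sketch below uses a throwaway module called demo_module (a made-up name) in a temporary folder to show the effect, along with one standard remedy, importlib.reload. Restarting the notebook’s kernel is another.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_module.py"  # hypothetical module for the demo
    module_path.write_text("def answer():\n    return 'old'\n")
    sys.path.insert(0, tmp)

    import demo_module
    first = demo_module.answer()

    # Edit and save the module, as you would in VS Code.
    module_path.write_text("def answer():\n    return 'new version'\n")

    import demo_module  # already loaded, so this does NOT re-read the file
    second = demo_module.answer()

    importlib.reload(demo_module)  # explicitly re-run the module's code
    third = demo_module.answer()

    sys.path.remove(tmp)

print(first, second, third, sep=" | ")  # old | old | new version
```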
We think someone has some serious questions to answer.
Bask in the glory of your returned data
Let’s take another look at the data returned from your most recent invocation of your read_swim_data function:
Good eye. Well spotted, too.
This may not be the explanation you’re expecting here, but those parentheses are meant to be there.
Let’s dig into this a little so you can appreciate what’s going on. We’ve already grilled the import statement, so it’s Function’s turn now.
Functions return a tuple when required
When you call a function that looks like it’s going to return multiple results, think again, because it doesn’t. Instead, you get back a single tuple containing a collection of results, regardless of how many individual result values there are.
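A tiny example of our own (the min_max_mean function is hypothetical and unrelated to the swim data) shows the single tuple coming back, and how the caller can unpack it into separate names:

```python
def min_max_mean(values):
    # The commas build ONE tuple; parentheses are optional in a return statement.
    return min(values), max(values), sum(values) / len(values)


result = min_max_mean([4, 8, 6])
print(type(result))  # <class 'tuple'>
print(result)        # (4, 8, 6.0)

# The caller can unpack the tuple into individual variables.
lowest, highest, mean = min_max_mean([4, 8, 6])
print(lowest, highest, mean)  # 4 8 6.0
```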
That’s a great suggestion.
Not that we’re suggesting there’s a bit of mind reading going on here, but it is a little spooky we had the same idea…
A list of filenames would be nice.
Your read_swim_data function, part of the swimclub module, takes any swimmer’s filename and returns a tuple of results to you.
Note
We’ve only used one of the data files for Darius so far. Feel free to use any other filename from your “swimdata” folder and pass it to your “read_swim_data” function to confirm that your code works with any of the Coach’s data files. Remember: Jupyter Notebook *lives* to let you experiment when creating your code.
What’s needed now is the full list of filenames, which you should be able to get from your underlying operating system. As you can imagine, the PSL has you covered when it comes to doing this sort of thing.
Let’s get a list of the Coach’s filenames
When it comes to working with your operating system (whether you’re on Windows, macOS, or Linux), the PSL has you covered. The os module lets your Python code talk to your operating system in a platform-independent way, and you’ll now use the os module to grab a list of the files in the swimdata folder.
Be sure to follow along in your Files.ipynb notebook.
You want the names of the files in your swimdata folder, and the os module provides the handy-dandy listdir function to do just that. When you pass in the location of a folder, listdir returns a list of all the files it contains:
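A self-contained sketch of listdir in action follows. So that it runs without the Coach’s data, it builds a temporary folder holding two empty files (Darius-13-100m-Fly.txt is a made-up name). With your real data you would pass the location of your swimdata folder instead.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create two empty stand-in files for the demo.
    for name in ("Abi-10-50m-Back.txt", "Darius-13-100m-Fly.txt"):
        open(os.path.join(folder, name), "w").close()

    swim_files = os.listdir(folder)  # a plain list of name strings

print(len(swim_files))     # 2
print(sorted(swim_files))  # listdir's order is arbitrary, so sort for display
```

Note that listdir returns names only, not full paths, and makes no promise about their order.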
You’d be forgiven for expecting the swim_files list to contain 60 pieces of data. After all, there are 60 files in your folder. However, on our Mac, we were in for a shock when we double-checked how big swim_files is:
It’s time for a bit of detective work…
You were expecting your list of files to have 60 filenames, but the len BIF is reporting 61 items in your swim_files variable.
To begin working out what’s happening here, let’s first display the value of the swim_files list on screen:
What a great idea.
Let’s use the combo mambo to see what’s built into lists.
What can you do to lists?
Here’s the print dir combo mambo output for your swim_files list:
Yes, that’s a potential issue.
As the swimdata.zip file was initially created on a Mac, the .DS_Store file was automatically added to the ZIP archive. This type of OS-specific issue is often a concern.
Before moving on, it’s important to remove that unwanted filename from the swim_files list.
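One way to do this, sketched here on a short made-up list rather than the full set of filenames, is the list’s built-in remove method. The in check is our own precaution: remove raises a ValueError when the value is absent, which could happen on a system where .DS_Store was never unpacked.

```python
# A shortened stand-in for the real list of filenames.
swim_files = ["Abi-10-50m-Back.txt", ".DS_Store", "Darius-13-100m-Fly.txt"]

if ".DS_Store" in swim_files:       # guard against the file not being there
    swim_files.remove(".DS_Store")  # deletes the first matching item in place

print(len(swim_files))  # 2
print(swim_files)       # ['Abi-10-50m-Back.txt', 'Darius-13-100m-Fly.txt']
```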
That would be nice, wouldn’t it?
We could throw caution to the wind and dive into creating some bar charts, but it might be too soon for that.
Your read_swim_data function has worked so far, but can you be sure it’ll work for any swimmer’s file? Let’s spend a moment ensuring our read_swim_data function works as expected no matter the data file it’s presented with.
Is the issue with your data or your code?
Now that you’ve identified the offending file, let’s take a look at its contents to see if you can get to the root of the problem. Here’s the Abi-10-50m-Back.txt file opened in VS Code:
Here’s the line of code that is throwing the error. Can you see what the issue is?
An incorrect assumption is the problem.
Your code, as written, assumes every swim time conforms to the mins:secs.hundredths format, but this is clearly not the case with Abi’s 50m swim times, and this is why you’re getting that ValueError.
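The failure is easy to reproduce in isolation. The split_time helper below is hypothetical, as are both sample times. It assumes the conversion code splits each time on the colon first, which is one plausible shape for code that expects mins:secs.hundredths.

```python
def split_time(t):
    # Assumes the string always looks like mins:secs.hundredths.
    minutes, rest = t.split(":")           # fails when there is no colon
    seconds, hundredths = rest.split(".")
    return minutes, seconds, hundredths


print(split_time("1:27.95"))  # ('1', '27', '95')

try:
    split_time("41.50")       # no minutes, so split(":") yields one piece
except ValueError as err:
    print("ValueError:", err)
```

With no colon present, split returns a one-item list, and unpacking one item into two names is what raises the ValueError.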
Now that you know what the problem is, what’s the solution?
Cubicle Conversation
Sam: What are our options here?
Alex: We could fix the data, right?
Mara: How so?
Alex: We could preprocess each data file to make sure there’s no missing minutes, perhaps by prefixing a zero and a colon when the minutes are missing? That way, we won’t have to change any code.
Mara: That would work, but…
Sam: …it would be messy. Also, I’m not too keen on preprocessing all the files, as the vast majority won’t need to be changed, which feels like it might be wasteful.
Mara: And although, as a strategy, we wouldn’t have to change any existing code, we would have to create the code to do the preprocessing, perhaps as a separate utility.
Sam: Recall, too, that the data is in a fixed format, and that it’s generated by the Coach’s smart stopwatch. We really shouldn’t mess with the data, so let’s leave it as is.
Alex: So, we’re looking at changing our read_swim_data function, then?
Mara: Yes, I think that’s a better strategy.
Sam: Me, too.
Alex: So, what do we need to do?
Mara: We need to identify where in our code the changes need to be made…
Sam: …and what those code changes need to be.
Alex: OK, sounds good. So we’re going to take a closer look at our read_swim_data function so we can decide what code needs to change?
Mara: Yes, then we can use an if statement to make a decision based on whether or not the swim time currently being processed has a minute value.
Decisions, decisions, decisions
That’s what if statements do, day-in and day-out: they make decisions.
Yes, that is what’s needed here.
Let’s take a closer look at the two possible swim time formats.
First up, here is one of the times recorded for Darius in his file:
And here’s a time taken from Abi’s data:
It’s easy to spot the difference: Abi’s data doesn’t show any minutes. With this in mind, it’s possible to come up with a condition to check when making a decision. Can you work out what it is? (Hint: consider your BFF, the colon).
Let’s look for the colon “in” the string
If the colon appears in any swim time string, then the time has a minute value. Although strings come with lots of built-in methods, including methods that can perform a search, let’s not use any of these here. As searching is such a common requirement, Python provides the in operator. You’ve seen in before, with for:
Note
The “find” and “index” string methods both perform searching.
Using in with for identifies the sequence being iterated over. However, when in is used outside a loop it takes on searching powers. Consider these example uses of in:
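A few illustrative uses follow, with sample values of our own. Both roles of in appear: first naming the sequence a for loop iterates over, then acting as a search that evaluates to True or False.

```python
# "in" with "for": names the sequence being iterated over.
for ch in "abc":
    print(ch)

# "in" on its own: a membership test.
has_minutes = ":" in "1:27.95"
print(has_minutes)                 # True
print(":" in "41.50")              # False
print("Abi" in ["Abi", "Darius"])  # True (works on lists, too)
print("x" not in "swim")           # True ("not in" negates the test)
```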
We love the “in” keyword, too.
It’s a Python superpower.
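Putting the in test to work on the two swim time formats might look like the sketch below. The to_hundredths function is a hypothetical helper, not the code from swimclub.py, and the sample times are made up. It assumes the goal is to convert each time string into a count of hundredths of a second.

```python
def to_hundredths(t):
    if ":" in t:
        # Times such as "1:27.95" carry a minute value.
        minutes, rest = t.split(":")
    else:
        # Times such as "41.50" do not, so treat the minutes as zero.
        minutes, rest = 0, t
    seconds, hundredths = rest.split(".")
    return int(minutes) * 60 * 100 + int(seconds) * 100 + int(hundredths)


print(to_hundredths("1:27.95"))  # 8795
print(to_hundredths("41.50"))    # 4150
```

Only the colon handling differs between the two branches. The seconds and hundredths are split the same way in both cases, so that line sits after the if statement.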
We’re nearly there. One last edit.
Be sure to add the code shown above to your read_swim_data function within your swimclub module, and don’t forget to save your file.
Your swimclub.py code should be the same as on the next page.
Did you end up with 60 processed files?
You are likely feeling confident that your most recent code is processing all the files in your swimdata folder. We are, too. However, it is often nice to double-check these things. As always, there are any number of ways to do this, but let’s number the results from your for loop by adding an enumeration, starting from 1, to each line of output displayed on screen.
To do so, let’s use yet another BIF created for this very purpose called enumerate:
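Here is enumerate on a short stand-in list (the second filename is made up). Its optional second argument sets the starting number, which would otherwise be 0.

```python
swim_files = ["Abi-10-50m-Back.txt", "Darius-13-100m-Fly.txt"]

# enumerate pairs each item with a running count, here starting from 1.
for n, fn in enumerate(swim_files, 1):
    print(n, fn)

numbered = list(enumerate(swim_files, 1))
print(numbered)  # [(1, 'Abi-10-50m-Back.txt'), (2, 'Darius-13-100m-Fly.txt')]
```

If the final number printed matches the number of files you expect, every file made it through the loop.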
The Coach’s code is taking shape…
Your swimclub module is now ready. Given the name of a file that contains a collection of swim time strings, your new module can produce usable data. The Coach is expecting to see some bar charts created from this data, so let’s dive into implementing that functionality in the next chapter.
As always, you can move on after you’ve reviewed the chapter summary, then tried your hand at this chapter’s crossword.