Chapter 4. Lists and Iteration: Providing Some Structure
There’s more to data types than numbers, strings, and Booleans. So far you’ve been writing Python code using primitive types (those floats, integers, strings, and of course Booleans) with values like 3.14, 42, "hey, it’s my turn", and True. And you can do a lot with primitives, but at some point you’ll want to write code that deals with lots of data: say, all the items in a shopping cart, the names of all the notable stars, or an entire product catalog. For that we need a little more oomph. In this chapter we’re going to look at a new type, called a list, which can hold a collection of values. With lists, you’ll be able to provide some structure for your data, rather than just having a zillion variables floating around your code holding values. You’re also going to learn how to treat all those values as a whole as well as how to iterate over each item in a list using that for loop we mentioned in the last chapter. After this chapter, your ability to deal with data is going to grow and expand.
Can you help Bubbles-R-Us?
Check out the Bubbles-R-Us company. Their tireless research makes sure bubble wands and machines everywhere blow the best bubbles. Today they’re testing the “bubble factor” of several different formulations of their new bubble solution—that is, they’re testing how many bubbles can be blown with a given solution. Here’s their data:
Of course you want to get all this data into Python so you can write code to help analyze it. But that’s a lot of values. How are you going to construct your code to handle all these values?
How to represent multiple values in Python
You know how to represent single values like strings, numbers, and Booleans with Python, but how do you represent multiple values, like all the bubble factor scores from the 10 bubble solutions? To do that we use Python lists. A list is a Python data type that can hold many values. Here’s a Python list that holds all the bubble factor scores:
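As a minimal sketch, a list of ten scores could be written like this. The values are illustrative assumptions (only the 60 at index 0 is mentioned elsewhere in this chapter), not the company’s real data.

```python
# Ten illustrative bubble factor scores (made-up values for this sketch).
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]

print(scores)
```

The square brackets mark the start and end of the list, and commas separate the items.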
Note
Many programming languages call their ordered data type an array instead of a list.
Once you have your data in a list, you can access the individual scores when you need to. Each individual score, or item, has an index. Computer scientists like to number things starting at zero, so the first item has an index of 0. You can retrieve any item in the list using its index, like this:
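Here is a small sketch of index access, again assuming made-up score values. The variable names first_score and third_score are hypothetical.

```python
# Illustrative scores (assumed values, not real test data).
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]

first_score = scores[0]   # index 0 is the first item
third_score = scores[2]   # index 2 is the third item

print(first_score, third_score)
```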
How lists work
It looks like we might have some interesting work to do for Bubbles-R-Us, but before we start, let’s make sure we’ve got lists down, and rather than using bubble factor scores, let’s put another kind of value in our lists: strings, or even better, smoothie flavors! After we understand lists a bit better, we’ll get right back to helping Bubbles-R-Us.
So, once you have a bunch of values you want to group together, you can create a list that holds them, and then access those values in the list whenever you need them. Most often you’ll use lists when you want to group together similar things, like bubble factor scores, ice cream flavors, daytime temperatures, or even the answers to a set of true/false questions. Let’s look again at how to create a list, paying a little more attention to the syntax this time.
How to create a list
Let’s say you wanted to create a list that holds the name of a bunch of smoothies. Here’s how you’d do that:
As we already said, every item in a list resides at a location, or index. With the smoothies list, the first item, “coconut,” is at index 0; the second, “strawberry,” is at index 1; and so on. Here’s a conceptual look at how lists are stored:
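A sketch of the smoothies list, using the five flavors named in this chapter’s walk-through, with each item’s index noted in a comment:

```python
# index:       0           1            2          3            4
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

print(smoothies[0])  # the first item lives at index 0
print(smoothies[1])  # the second item lives at index 1
```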
How to access a list item
Each item in the list has an index, and that’s your key to both accessing and changing the values in a list. We’ve already seen how to access an item by starting with the list’s variable name and then adding on an index, surrounded by square brackets. You can use that notation anywhere you’d use a variable:
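For instance, a list item can sit on the right side of an assignment, inside a larger expression, or in a condition. This is only a sketch; the variable favorite and the printed messages are assumptions made for illustration.

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

favorite = smoothies[2]                  # assign an item to a variable
print('My favorite is ' + smoothies[2])  # use an item in an expression

if smoothies[0] == 'coconut':            # use an item in a condition
    print('Coconut is first on the list')
```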
Updating a value in the list
You can also change the value of an item in a list using its index:
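A quick sketch of an update follows. The starting value 'pineapple' is a hypothetical placeholder, chosen so the result matches the smoothies list used later in this chapter.

```python
# 'pineapple' is an assumed starting value for this sketch.
smoothies = ['coconut', 'strawberry', 'banana', 'pineapple', 'acai berry']

smoothies[3] = 'tropical'   # replace the item at index 3

print(smoothies)
```

The list keeps the same length; only the item at index 3 changes.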
How big is that list, anyway?
Say someone hands you a nice big list with important data in it. You know how to get what’s in the list, but you have no idea exactly how big it is (in other words, how many items it has). Luckily, Python provides a built-in function to tell you, called len. Here’s how you use the len function:
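A minimal sketch of len applied to the smoothies list (the variable name length is an assumption):

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

length = len(smoothies)   # number of items in the list
print(length)
```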
Accessing the last item in the list
Accessing the last item of a list is something you’ll do often when coding. Say you’ve got a list that holds the most recent scores of a sports game and you need to display the latest score. Or say you have a list of current wind speeds of an approaching hurricane, and you need to report the latest speeds. You get the point: lists often have data arranged with the latest, and often most important, values at the end (that is, at the largest index), so accessing the last item of the list is a common task.
The conventional way to do this, across many programming languages, is to use the length of the list as an index. But remember, lists are indexed starting at zero, so the index of the last item is actually one less than the length of the list. To get the last item of our smoothies list, we do this:
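Sketched with the smoothies list, that might look like this (the names length and last are assumptions):

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

length = len(smoothies)        # 5 items, so valid indices are 0 through 4
last = smoothies[length - 1]   # the last item is at index length - 1

print(last)
```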
Python makes this even easier
Finding the last item of a list is such a common task that Python actually provides an easier way to do it. Here’s how it works: you can use a negative index, starting at -1, to specify the items in a list in reverse order. So an index of -1 is the last item in the list, an index of -2 is the second to last, and so on.
Using Python’s negative indices
Let’s give Python’s negative indices a try. Let’s say we want to take the last three smoothies on our list and print them:
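One way to sketch it, assuming the five-item smoothies list from earlier:

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

print(smoothies[-1])   # last item
print(smoothies[-2])   # second to last
print(smoothies[-3])   # third to last
```

Note that smoothies[-1] and smoothies[len(smoothies) - 1] refer to the same item.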
there are no Dumb Questions
Q: Does the order of items in a list matter?
A: The list is an ordered data type. So, most of the time, it matters, but not always. In the Bubbles-R-Us scores list, the ordering matters a lot, because the index of the score in the list tells us which bubble solution got that score: bubble solution 0 got score 60, and that score is stored at index 0. If we mixed up the scores in the list, then we’d ruin the experiment! However, in other cases, the order may not matter. For instance, if you’re using a list just to keep track of grocery items you need to pick up, the order probably doesn’t matter much. So it really depends on how you’re using the list. You’ll probably find that ordering matters more often than not when you use a list. Python also has other data types (for instance, dictionaries and sets) that don’t organize their items by index. More on those later in the book.
Q: How many things can you put into a list?
A: Theoretically, as many as you want. Practically, however, the number is limited by things like the memory on your computer. Each list item takes up a little bit of space in memory and if you keep adding items to a list, eventually you’ll run out of memory. However, depending on the kind of items you’re putting in your list, the maximum number of items you can put into a list is probably in the many thousands or hundreds of thousands. Once you get into the millions there are other solutions (like databases) that are probably going to be more appropriate.
Q: Can you have a list without any elements?
A: Remember when we talked about empty strings? Yes, you can have empty lists too. In fact, you’ll see an example of using an empty list in this chapter. To create an empty list, just write:
empty_list = []
If you start with an empty list, you can add things to it later. We’ll see how shortly.
Q: So far we’ve seen strings and numbers in a list; can you put other things in lists too?
A: You can; in fact, you can put values from any Python type (including ones you haven’t seen yet) into a list.
Note
Or even another list!
Q: Can values in a list have different types, or do they all have to be the same?
A: There is no requirement in Python that all the values in a list be of the same type. We call lists with items of different types heterogeneous lists. Here’s one:
heterogenous = ['blue', True, 13.5]
Q: What happens if I try to access an item in a list that doesn’t exist?
A: You mean like you have a list of 10 items and you try to access the item at index 99? If you do that you’ll get a runtime error, like this:
IndexError: list index out of range
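If you’d like to see that error without stopping your program, here is a hedged sketch. It uses try/except, which this chapter hasn’t covered, and assumes illustrative score values.

```python
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]  # illustrative values

message = ''
try:
    print(scores[99])            # there is no item at index 99
except IndexError as error:
    message = str(error)         # keep the error text for printing
    print('IndexError:', message)
```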
Q: Okay, well, can I assign a new value to a list index that doesn’t exist?
A: No. You can assign a new value to an existing item, but you can’t assign a value to an index that doesn’t exist; if you try, you’ll get a runtime “out of range” error (an IndexError). Note that some languages do allow this, but not Python. In Python we first have to add a new item to the list instead.
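A small sketch of both halves of that answer. The three-item list is an assumption, and try/except is used only to keep the failed assignment from ending the program.

```python
smoothies = ['coconut', 'strawberry', 'banana']

message = ''
try:
    smoothies[3] = 'tropical'    # index 3 doesn't exist yet
except IndexError as error:
    message = str(error)
    print('IndexError:', message)

smoothies.append('tropical')     # add a new item to the end instead
print(smoothies)
```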
Meanwhile, back at Bubbles-R-Us...
Cubicle conversation
Judy: The first thing we need to do is display every score along with its solution number.
Joe: And the solution number is just the index of the score in the list, right?
Judy: Oh, yeah, that’s totally right.
Frank: Slow down a sec. So we need to take each score, print its index, which is the bubble solution number, and then print the corresponding score.
Judy: You’ve got it, and the score is just the corresponding item in the list.
Joe: So, for bubble solution #10, its score is just scores[10].
Judy: Right.
Frank: Okay, but there are a lot of scores. How do we write code to output all of them?
Judy: Iteration, my friend.
Frank: Oh, you mean like a while loop?
Judy: Right, we loop through all the values from zero to the length...oh, I mean the length minus one, of course.
Joe: This is starting to sound very doable. Let’s write some code; I think we know what we’re doing.
Judy: That works for me! Let’s do it, and then we’ll come back to the rest of the report.
How to iterate over a list
Your goal is to produce some output that looks like this:
We’ll do that by outputting the score at index 0, and then we’ll do the same for index 1, 2, 3, and so on, until we reach the last index in the list. You already know how to use a while loop; let’s see how we can use that to output all the scores:
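Here is a first-attempt sketch. It assumes made-up scores and a report line of the form "Bubble solution #0 score: 60"; the exact wording is an assumption. Because print puts a space between its arguments, this version prints a stray space after the #.

```python
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]  # illustrative values

i = 0
length = len(scores)
while i < length:                  # stop after the last index, length - 1
    print('Bubble solution #', i, 'score:', scores[i])
    i = i + 1                      # move on to the next index
```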
Note
And then we’ll show you a better way in a sec...
Fixing the output glitch
Let’s look at the print statement to identify where the extra space is coming from:
To fix this we could just do something like:
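One tempting fix is to glue the label and the number together with +, so print never gets the chance to add a space. The sketch below tries exactly that; the variable names are assumptions, and try/except is used only so you can see what goes wrong.

```python
i = 0
failed = False
try:
    bubble_string = 'Bubble solution #' + i   # joining a string and an integer
except TypeError as error:
    failed = True
    print('TypeError:', error)
```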
Really fixing the output glitch
Did you figure out where we went wrong? Well, you can’t concatenate a string to an integer. D’oh! But how do we change an integer into a string? Well, remember when we did the opposite? We changed a string into an integer using the int function. As it turns out, there is also a str function that does the opposite: give it an integer, and it will give you back a string representation of that integer.
Given that, we can rework our code like this:
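A sketch of the reworked loop, assuming illustrative scores and using the bubble_string variable to hold the label:

```python
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]  # illustrative values

i = 0
length = len(scores)
while i < length:
    bubble_string = 'Bubble solution #' + str(i)   # str turns the integer into a string
    print(bubble_string, 'score:', scores[i])
    i = i + 1
```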
Let’s get that into our code, only we’ll do it without the extra bubble_string variable. Instead, we’ll make our code more concise and add the call to str right in with the print arguments. Check out the Test Drive below for the changes.
The for loop, the preferred way to iterate over a list
So, you can use a while loop to iterate over your lists, but the preferred method is actually using a for loop. Think of the for loop as the while loop’s cousin: the two basically do about the same thing, except we typically use a while loop when we’re looping over some condition, and a for loop when we’re iterating over a sequence of values (like a list). Let’s return to our smoothies to see how we loop, or iterate, over a list with the for loop. After we’ve done that, we’ll nail down the Bubbles-R-Us code.
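As a sketch, a for loop over the smoothies list could look like this. The message text and the output variable are assumptions.

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

for smoothie in smoothies:            # smoothie takes each item in turn
    output = 'We serve ' + smoothie
    print(output)
```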
How the for loop works
Let’s execute the code above. The first time through the loop, the first item in the list smoothies is assigned to the variable smoothie. After that the body of the for loop is executed.
Next time through the loop, the next item, “strawberry,” in the list smoothies is assigned to the variable smoothie. After that the code block is executed.
The third time through the loop, the next item, “banana,” in the list smoothies is assigned to the variable smoothie. After that the code block of the for loop is executed.
And by now you can see the pattern: the fourth time through the loop, the next item, “tropical,” is assigned to the variable smoothie before we execute the code block.
And as you can guess at this point, the fifth, or last, time through the loop, the next item, “acai berry,” in the list smoothies is assigned to the variable smoothie. After that the code block of the for loop is executed for the last time.
Judy: Oh, you’re saying when we used a while loop we had the counter i, which we used for the score number and as an index to get the scores.
Frank: Exactly, and when we’re using a for statement, we just seem to have the item of the list. Where’s the index?
Judy: Uh, good question.
Joe (shouting from across the room): Guys, I did some research, there’s another way to use for. The way you’re talking about is great for sequences when you don’t care about an index, but you can use for with a range of indices to iterate through the bubble solutions.
Frank: Say what?
Joe: It’s almost easier to show you...
How the for loop works on a range of numbers
There’s another kind of sequence the for loop works on: a range of numbers. In fact, Python gives you a built-in function called range that you can use to generate different sequences of numbers. After you’ve generated a sequence of numbers, you can use the for loop to iterate through them.
Here’s how you generate a range from 0 to 4:
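A minimal sketch: range(5) produces the numbers 0 through 4 (it stops before 5). Wrapping it in list here is only a way to display its contents; the name numbers is an assumption.

```python
numbers = range(5)      # the sequence 0, 1, 2, 3, 4

print(list(numbers))    # convert to a list just to see the values
```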
You can combine range with for like this:
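For example (the printed message is an assumption):

```python
for i in range(5):                  # i takes the values 0, 1, 2, 3, 4
    print('Iterating through', i)
```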
So say you want to iterate through our smoothies and print the index of each. Here’s how you can do that:
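A sketch that pairs range with len so that i runs over every valid index of the list. The output wording is an assumption.

```python
smoothies = ['coconut', 'strawberry', 'banana', 'tropical', 'acai berry']

length = len(smoothies)
for i in range(length):             # i goes from 0 to length - 1
    print('Smoothie #', i, smoothies[i])
```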
Doing more with ranges
With a range you don’t have to create sequences from zero to some number; you can create all kinds of ranges of numbers. Here are a few examples:
Try a starting and ending number
Add a step size
Count backward
Or start from negative numbers
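The four variations above can be sketched like this. The specific numbers are assumptions chosen for illustration, and list is used only to show the values each range produces.

```python
start_end = list(range(5, 10))      # starting and ending number: 5 up to 9
with_step = list(range(3, 10, 3))   # step size of 3: 3, 6, 9
backward = list(range(10, 0, -1))   # negative step counts down: 10 to 1
negative = list(range(-10, 2))      # start below zero: -10 up to 1

print(start_end)
print(with_step)
print(backward)
print(negative)
```

In every case the ending number itself is left out of the sequence.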
there are no Dumb Questions
Q: Does range(5) just create a list, like [0, 1, 2, 3, 4]?
A: No, it doesn’t, although we can easily see how you’d think that. The reason it doesn’t is Python actually creates something a lot more efficient than a list. For now, though, it is fine to think of it that way; just know you can’t substitute range for a list in your code. Oh, and if you ever want to use range to create a list, you can do that like this:
list(range(5))
to create the list you mentioned in your question.
Q: You used a variable name called i. That doesn’t seem very good for readability. Why not index or smoothie_index or something like that?
A: Good catch. You’re right, the variable i may not be the most readable variable name, but when a variable is used as an index in an iteration, there is a long history of using variables like i, j, and k. The convention is so strong that programmers follow it almost blindly, and it might strike them as odd to use a longer variable name. So, we encourage you, for this exception, to use short variable names, and before long it will feel like second nature to you.
Putting it all together
Let’s now use our knowledge of ranges and the for loop to rework the while loop we previously wrote to generate the bubble solution numbers plus their scores.
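One possible rework, sketched with illustrative scores and an assumed report wording:

```python
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54]  # illustrative values

length = len(scores)
for i in range(length):
    # str(i) lets us join the number to the label with no stray space
    print('Bubble solution #' + str(i), 'score:', scores[i])
```

Compared with the while version, there is no counter to initialize or increment by hand.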
Test drive the bubble report
Type the new code in and save it in the file bubbles.py, and then give it a test run. Check out the brilliant report you just generated for the Bubbles-R-Us CEO.
Fireside Chats
Tonight’s talk: The WHILE and FOR loop answer the question “Who’s more important?”
The WHILE loop | The FOR loop |
---|---|
What, are you kidding me? Hello? I’m the general looping construct in Python. I don’t need a sequence or a range, as I can be used with any type of conditional. Did anyone notice I was taught first in this book? | I don’t appreciate that tone. |
And that’s another thing: have you noticed that the FOR loop has no sense of humor? I mean if we all had to do skull-numbing iteration all day, I guess we’d all be that way. | Cute. But have you noticed that 9 times out of 10, coders use FOR loops? |
Oh, I don’t think that could possibly be true. | Not to mention, doing iteration over, say, a list that has a fixed number of items with a WHILE loop is just a bad, clumsy practice. |
This book just said that FOR and WHILE loops are pretty much the same thing, so how could that be? | Ah, so you admit we’re more equal than you let on, huh? I’ll tell you why... When you use a WHILE loop you have to initialize your counter and increment it in separate statements. If, after lots of code changes, you accidentally moved or deleted one of these statements, well, then things could get ugly. But with a FOR loop, everything is packaged right in the FOR statement for all to see and with no chance of things getting changed or lost. |
Well, isn’t that nice and neat of you. Hey, most of the iteration I see doesn’t even include counters; it’s stuff like: while (input != ''): try that with a FOR loop! | So that’s all you got? You’re only better when you’ve got a condition to loop over? |
Not only better, prettier. | Oh, I didn’t realize this was a beauty contest. I’d argue people iterate over sequences way more than they write loops over general conditionals. |
Hey, I can iterate over a sequence too. | I think we’ve already covered that ground. Sure you can, but it’s, well, it ain’t pretty. Don’t forget I’m quite general too, I don’t just work on lists. |
Like what? | There are lots of sequences in Python. We’ve seen lists and ranges and strings, but there’s even more you can iterate over, like files, and quite a few other more advanced data types the readers haven’t even looked at in this book. |
I’m sure I can work with them too. | Perhaps, but, again, wouldn’t be pretty. Face it, when it comes to heavy-duty iteration, I’m designed for it. |
Oh sure, you’re the tough guy. Next time you need to iterate while a condition is True, don’t call me, and then we’ll see how heavy duty you are. | Likewise, don’t call me when you need to iterate over a sequence! |
Cubicle conversation continued...
Judy: Right, and the first thing we need to do is determine the total number of bubble tests. That’s easy; it’s just the length of the scores list.
Joe: Oh, right. We’ve got to find the highest score too, and then the solutions that have the highest score.
Judy: Yeah, that last one is going to be the toughest. Let’s work out finding the highest score first.
Joe: Sounds like a good place to start.
Judy: To do that I think we just need to maintain a highest score variable that keeps track as we iterate through the list. Here, let me write some Python-like pseudocode:
Joe: Oh nice; you did it with just a few lines added to our existing code.
Judy: Each time through the list we look to see if the current score is greater than high_score, and if so, that’s our new high score. Then, after the loop ends we just display the high score.
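Judy’s idea could be sketched in runnable Python like this. The scores are illustrative, the report wording is an assumption, and starting high_score at 0 assumes no score is negative.

```python
# Illustrative scores (made-up values for this sketch).
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54,
          48, 69, 34, 55, 51, 52, 44, 51, 69, 64]

high_score = 0                        # assumes scores are never negative
length = len(scores)
for i in range(length):
    print('Bubble solution #' + str(i), 'score:', scores[i])
    if scores[i] > high_score:        # found a new best so far
        high_score = scores[i]

print('Bubbles tests:', length)
print('Highest bubble score:', high_score)
```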
More than one? When we need to store more than one thing, what do we use? A list, of course. So, can we iterate through our existing scores list looking for scores that only match the highest score, and then add those to a new list that we can later display in the report? You bet we can, but to do that we’ll have to learn how to create a brand new, empty list, and then understand how to add new elements to it.
Building your own list, from scratch
Before we take on finishing this code, let’s get a sense for how to create a new list, and how to add items to it. You already know how to explicitly create a list with values, like this:
menu = ['Pizza', 'Pasta', 'Soup', 'Salad']
But you can also omit the initial items and just create an empty list:
Once you’ve created an empty list you can add new items with append, like this:
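A short sketch of both steps; the menu items are placeholder values.

```python
menu = []                 # start with an empty list

menu.append('Burger')     # each append adds one item to the end
menu.append('Sushi')

print(menu)
```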
Doing even more with lists
There’s a lot more you can do with lists, like insert new items, delete items, add lists together, and search for items in a list—here are a few examples to whet your appetite.
Delete an item from a list
Need to get rid of an item in a list? Python provides a built-in statement called del to do just that. Here’s how it works:
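A minimal sketch using the menu list from earlier:

```python
menu = ['Pizza', 'Pasta', 'Soup', 'Salad']

del menu[0]     # remove the item at index 0; later items shift down by one

print(menu)
```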
Add one list to another
Let’s say you have a list, and someone hands you another list and you want to add all those items to your list. No worries, here’s how you do that:
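One way to do this is the list’s extend behavior, sketched below. The extra items are placeholder values.

```python
menu = ['Pizza', 'Pasta', 'Soup', 'Salad']

menu.extend(['BBQ', 'Tacos'])   # add every item of the second list to menu

print(menu)
```

Unlike append, which would add the second list as a single item, extend adds its items one by one.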
There’s another way to combine lists as well—you can just add the lists together using the + operator, like this:
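A sketch with placeholder items:

```python
menu = ['Pizza', 'Pasta', 'Soup', 'Salad']

menu = menu + ['BBQ', 'Tacos']   # + builds a new list from the two lists

print(menu)
```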
Or insert items into your list
Let’s say you really need to add an item in the middle of your list. Use the insert function to do that.
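For example, with a placeholder item:

```python
menu = ['Pizza', 'Pasta', 'Soup', 'Salad']

menu.insert(1, 'Stir Fry')   # put the new item at index 1; the rest shift up

print(menu)
```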
As we said, we’ll be seeing even more list operations as the book progresses, but these are some good operations to get you started.
there are no Dumb Questions
Q: What happens if I insert an item after an index that doesn’t exist, like menu.insert(100, 'French Fries')?
A: If you try to insert an item beyond the end of your list, it will simply add the item in the last position in your list.
Q: What does the syntax mylist.append(value) actually mean? It looks similar to the random.randint(0,2) syntax we used in the last chapter.
A: Yes, they are related; both are an example of something we’ll get to later in the book: the use of functions and objects (actually we’ll make our use of terminology even more precise at that stage of the book). Now that all won’t mean a lot to you right now, but we’re going to see how data types, like lists, can provide their own special behavior to do things like append items. So, mylist.append is using the behavior append, which is provided by the list. For now, go with the syntax, and down the road you’ll better understand the true meaning behind it as we explore objects and functions.
Q: Well, why do we have menu.append and menu.insert, but del menu[0]? Why isn’t it menu.delete(0) or something similar? I thought Python was consistent?
A: It’s a very good question. It turns out the designers of Python thought common operations, like len and del, deserved a bit of special treatment. They also thought that, for example, len(menu) was more readable than menu.length(). The reasoning behind this has been debated at great length, but that’s the way it is in Python. And, as in the last question, you’re asking all the right things, and the madness behind the method will be clearer once we get to talking about functions and objects.
Judy: Yes, we’ll start with an empty list to hold the solutions with the highest scores, and add each solution that has that high score one at a time to it as we iterate through the scores list.
Frank: Great, let’s get started.
Judy: But hold on a second…I think we might need another loop.
Frank: We do? It seems like there should be a way to do that in our existing loop.
Judy: Actually, I’m sure we do. Here’s why: we have to know what the highest score is before we can find all the solutions that have that highest score. So we need two loops: one to find the highest score, which we’ve already written, and then a second one to find all the solutions that have that score.
Frank: Oh, I see. And in the second loop, we’ll compare each score to the highest score, and if it matches, we’ll add the index of the bubble solution score to the new list we’re creating for the solutions with the highest scores.
Judy: Exactly! Let’s do it.
Test drive the final report
Go ahead and add your code to generate the bubble solutions with the highest score to your code in bubbles.py and run another test drive. All of our code is shown below:
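Here is a hedged sketch of what the complete report could look like. The scores are illustrative values, chosen so that solutions #11 and #18 tie for the top score of 69, and the report wording is an assumption.

```python
# Illustrative scores (made-up values for this sketch).
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54,
          48, 69, 34, 55, 51, 52, 44, 51, 69, 64]

# First loop: print every score and track the highest one.
high_score = 0
length = len(scores)
for i in range(length):
    print('Bubble solution #' + str(i), 'score:', scores[i])
    if scores[i] > high_score:
        high_score = scores[i]

print('Bubbles tests:', length)
print('Highest bubble score:', high_score)

# Second loop: collect the index of every solution with that score.
best_solutions = []
for i in range(length):
    if high_score == scores[i]:
        best_solutions.append(i)

print('Solutions with the highest score:', best_solutions)
```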
And the winners are...
Bubble solutions #11 and #18 both have a high score of 69, so they are the best bubble solutions in this batch of test solutions!
So, what’s the job here? It’s to take the leading bubble solutions (that is, the ones with the highest bubble scores) and choose the lowest-cost one. Now, luckily, we’ve been given a costs list that mirrors the scores list. That is, the bubble solution score at index 0 in the scores list has the cost at index 0 in the costs list (.25), the bubble solution at index 1 in the scores list has a cost at index 1 in the costs list (.27), and so on. So, for any score you’ll find its cost in the costs list at the same index. Sometimes we call these parallel lists:
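A small sketch of parallel lists. Only the first two costs (.25 and .27) come from the text above; the remaining values, and the four-item length, are assumptions.

```python
scores = [60, 50, 60, 58]        # illustrative scores
costs = [.25, .27, .25, .25]     # costs[i] is the cost of the solution scored in scores[i]

i = 1                            # the same index works for both lists
print('Solution #' + str(i), 'score:', scores[i], 'cost:', costs[i])
```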
Judy: Well, we know the highest score already.
Frank: Right, but how do we use that? And we have these two lists, but how do we get those to work together?
Judy: I’m pretty sure either of us could write a simple for loop that goes through the scores list again and picks up the items that match the highest score.
Frank: Yeah, I could do that. But then what?
Judy: Anytime we hit a score that matches the highest score, we need to see if its cost is the lowest we’ve seen.
Frank: Oh, I see, so we’ll have a variable that keeps track of the index of the “lowest cost high score.” Wow, that’s a mouthful.
Judy: Exactly. And once we get through the entire list, whatever index is in that variable is the index of the item that not only matches the highest score, but has the lowest cost as well.
Frank: What if two items match in cost?
Judy: Hmm, we have to decide how to handle that. I’d say, whatever one we see first is the winner. Of course we could do something more complex, but let’s stick with that unless the CEO says differently.
Frank: This is complicated enough I think I want to sketch out some pseudocode before writing anything.
Judy: I agree; whenever you are managing indices of multiple lists things can get tricky. Let’s do that; in the long run I’m sure it will be faster to plan it first.
Frank: Okay, I’ll take a first stab at it…
Testing the most cost-effective solution
We should have everything coded below for the Bubbles-R-Us CEO. Check out the code and see how it matches the pseudocode, and then enter the new code into bubbles.py and give it another test run. All the code is shown below. When you’ve got a winning solution, check whether it matches ours.
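The sketch below shows one way the cost comparison might be coded. Both lists hold illustrative values (only the first two costs appear in this chapter), the variable names cost and most_effective are assumptions, and the starting cost of 100.0 assumes no solution costs that much.

```python
scores = [60, 50, 60, 58, 54, 54, 58, 50, 52, 54,
          48, 69, 34, 55, 51, 52, 44, 51, 69, 64]
costs = [.25, .27, .25, .25, .25, .25, .33, .31, .25, .29,
         .27, .22, .31, .25, .25, .33, .21, .25, .25, .25]

# Find the highest score, as before.
high_score = 0
for i in range(len(scores)):
    if scores[i] > high_score:
        high_score = scores[i]

# Among the top scorers, keep the index with the lowest cost.
cost = 100.0             # assumed to be higher than any real cost
most_effective = 0
for i in range(len(scores)):
    # Strictly less than: on a tie in cost, the first one seen wins.
    if scores[i] == high_score and costs[i] < cost:
        most_effective = i
        cost = costs[i]

print('Solution', most_effective,
      'is the most effective with a cost of', costs[most_effective])
```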
You’re right: we didn’t need to.
We could have found the lowest-cost solution from just the list in best_solutions, because that list is the result of already figuring out one or more solutions with the highest bubble scores. The only reason we didn’t was to keep things simple on our first attempt.
Others might be asking, though: what’s the difference? Who cares? It works! Well, it is all about the efficiency of the code. How much work is your code doing? And, for a list as small as ours, there really isn’t much of a difference; however, if you had a huge list of data you’d want to avoid iterating over it multiple times if you had a more efficient way. And we do.
To determine the lowest-cost solution (with the highest score), all we need to do is consider the solutions in the best_solutions list. Doing that is a little more complex, but not much.
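A sketch of that leaner approach follows. The costs are illustrative, and best_solutions is written out by hand here as an assumption; in the full program it would come from the earlier loop.

```python
costs = [.25, .27, .25, .25, .25, .25, .33, .31, .25, .29,
         .27, .22, .31, .25, .25, .33, .21, .25, .25, .25]
best_solutions = [11, 18]      # indices of the top-scoring solutions (assumed)

cost = 100.0                   # assumed to be higher than any real cost
most_effective = 0
for i in range(len(best_solutions)):
    index = best_solutions[i]  # an index into the costs list
    if costs[index] < cost:    # on a tie, the first one seen wins
        most_effective = index
        cost = costs[index]

print('Solution', most_effective,
      'is the most effective with a cost of', costs[most_effective])
```

This loop runs once per top-scoring solution (twice here) rather than once per score.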
COMPARE this code to the previous version; can you see the differences? Think through how each executes; can you see how much less work this version does to compute the most cost-effective solution? It’s worth some time to see the difference.