My story is a lot like yours, only more interesting ’cause it involves robots.
Bender, Futurama episode “30% Iron Chef”
The most common question I receive about Asyncio in Python 3 is this: “What is it, and what do I do with it?” The following story provides a backdrop for answering these questions. The central focus of Asyncio is how best to perform multiple tasks at the same time, and not just any tasks, but specifically tasks that involve waiting periods. The key insight required for this style of programming is that while you wait for one task to complete, work on other tasks can proceed.
The year is 2051, and you find yourself in the restaurant business! Automation, largely by robot workers, powers most of the economy, but it turns out that humans still enjoy going out to eat once in a while. In your restaurant, all the employees are robots; humanoid, of course, but unmistakably robots. The most successful manufacturer of robots is of course Threading Inc., and robot workers from this company have come to be called “ThreadBots.”
Except for this small robotic detail, your restaurant looks and operates like one of those old-time restaurants from, say, 2018. Your guests will be looking for that vintage experience. They want fresh food prepared from scratch. They want to sit at tables. They want to wait for their meals—but only a little. They want to pay at the end, and they sometimes even want to leave a tip, for old times’ sake, of course.
Naturally, being new to the robotic restaurant business, you do what every other restaurateur does, and you hire a small fleet of robots: one to greet new diners at the front desk (greetbot); one to wait tables and take orders (waitbot); one to do the cooking (chefbot); and one to manage the bar (winebot).
Hungry diners will arrive at the front desk, and will be greeted by your front-of-house greetbot. They are then directed to a table, and once they are seated, your waitbot will take their order. The waitbot will then take that order to the kitchen on a slip of paper (because you want to preserve that old-time experience, remember?). The chefbot will look up the order on the slip and begin preparing the food. The waitbot will periodically check whether the food is ready, and if so, will immediately take the dish to the customer’s table. When the guests are ready to leave, they return to the greetbot, who calculates the bill and graciously wishes them a pleasant evening.
You soon open your restaurant, and exactly as you had anticipated, your menu is a hit and you soon grow a large customer base. Your robot employees do exactly what they’re told, and they are perfectly good at the tasks you assigned them. Everything is going really well, and you really couldn’t be happier.
Over time, however, you do begin to notice some problems. Oh, it’s nothing truly serious. Just a few things that seem to go wrong. Every other robotic restaurant owner seems to have similar niggling problems. It is a little worrying that these problems seem to get worse the more successful you become.
Though rare, there are the occasional collisions that are very unsettling: sometimes, when a plate of food is ready in the kitchen, the waitbot will grab it before the chefbot has even let go of the plate! This usually shatters the plate and leaves a big mess. Chefbot cleans it up of course, but still, you’d think that these top-notch robots would know how to be a bit more synchronized with each other. This happens at the bar too: sometimes winebot will place a new drink order on the bar and waitbot will grab it before winebot has let go, resulting in broken glass and spilled Nederburg Cabernet Sauvignon!
Sometimes greetbot will seat new diners at exactly the same moment that waitbot has decided to clean what it thinks was an empty table. It’s pretty awkward for the diners! You’ve tried adding delay logic to the waitbot’s cleaning function, or delays to the greetbot’s seating function, but these don’t really help, because the collisions still occur. At least these are only rare events.
Well, these used to be rare events. Your restaurant got so popular that you’ve had to hire a few more threadbots. For very busy Friday and Saturday evenings, you’ve had to add a second greetbot and two extra waitbots. Unfortunately the hiring contracts for threadbots mean that you have to hire for a whole week, so this effectively means that for most of the quiet part of the week, you’re carrying three extra threadbots that you don’t really need.
The other resource problem, in addition to the extra cost, of course, is that it’s more work for you to deal with these extra threadbots. It was fine to keep tabs on just four bots, but now you’re up to seven! Keeping track of seven threadbots is a lot more work, and because your restaurant keeps getting more and more famous, you become worried about taking on even more threadbots. It’s going to become a full-time job just to keep track of what each threadbot is doing! Another thing: these extra threadbots are using up a lot more space inside your restaurant. It’s becoming a tight squeeze for your customers, what with all these robots zipping around. You’re worried that if you need to add even more bots, this space problem is going to get even worse. You want to use the space in your restaurant for customers, not threadbots!
The collisions have also become worse since you added more threadbots. Now, sometimes two waitbots take the exact same order from the same table at the same time. It’s as if they both noticed that the table was ready to order and moved in to take it, without noticing that the other waitbot was doing the exact same thing. As you can imagine, this results in duplicated food orders which causes extra load on the kitchen and increases the chance of collisions when picking up the ready plates. You’re worried that if you added more waitbots, this problem might get worse.
Then, during one very, very busy Friday night service, you have a singular moment of clarity: time slows, lucidity overwhelms you and you see a snapshot of your restaurant frozen in time. My threadbots are doing nothing! Not really nothing, to be fair, but they’re just…waiting.
Each of your three waitbots at different tables is waiting for one of the diners at their table to give their order. The winebot has already prepared 17 drinks, which are now waiting to be collected (it took only a few seconds), and is now waiting for a new drink order. One of the greetbots has greeted a new party of guests, told them they need to wait a minute to be seated, and is waiting for the guests to respond. The other greetbot, now processing a credit card payment for another guest who is leaving, is waiting for confirmation from the payment gateway device. Even the chefbot, who is currently cooking 35 meals, is not actually doing anything at this moment, but is simply waiting for one of the meals to finish cooking so that it can be plated up and handed over to a waitbot.
You realize that even though your restaurant is now full of threadbots, and you’re even considering getting more (with all the problems that entails), the ones that you currently have are not even being fully utilized.
The moment passes, but not the realization. You wait for weekend service to pass, and the first thing you do is add a data collection module to your threadbots. For each threadbot, you’re measuring how much time is spent waiting and how much is spent actively doing work. Over the course of the following week, the data is collected and then on Sunday evening you analyze the results. It turns out that even when your restaurant is at full capacity, the most hardworking threadbot is idle for about 98% of the time! The threadbots are so enormously efficient that they can perform any task in fractions of a second.
As an entrepreneur, this inefficiency really bugs you. You know that every other robotic restaurant owner is running their business the same as you, with many of the same problems. But, you think, slamming your fist on your desk, “There must be a better way!”
So the very next day, which is a quiet Monday, you try something very bold: you program a single threadbot to do all the tasks. Every time it begins to wait, even for a second, the threadbot will switch to the next task to be done, wherever that may be in the entire restaurant. It sounds incredible at face value, only one threadbot doing the work of all the others, but you’re confident that your calculations are correct. And besides, Monday is a very quiet day; even if something goes wrong, the impact will be small. For this new project, you call the bot “loopbot,” because it will loop over all the jobs in the restaurant.
The programming is more difficult than usual. It isn’t just that you have to program one threadbot with all the different tasks; you also have to program some of the logic for when to switch between tasks. But by this stage, you’ve had a lot of experience with programming these threadbots, so you manage to get it done.
Monday arrives, and you watch your loopbot like a hawk. It moves between stations in fractions of a second, checking whether there is work to be done. Not long after opening, the first guest arrives at the front desk. The loopbot shows up almost immediately, and asks whether the guest would like a table near the window or near the bar. And then, as the loopbot begins to wait, its programming tells it to switch to the next task, and it whizzes off! This seems like a dreadful error, but then you see that as the guest begins to say “window please,” the loopbot is back! It receives the answer and directs the guest to table 42. And off it goes again, checking for drinks orders, food orders, table cleanup, and arriving guests, over and over again.
Late Monday evening, you congratulate yourself on a remarkable success! You check the data collection module on the loopbot, and it confirms that even with a single threadbot doing the work of seven, the idle time was still around 97%! This result gives you the confidence to continue the experiment all through the rest of the week.
As the busy Friday service approaches, you reflect on the great success of your experiment. For service during a normal working week, you can easily manage the workload with a single loopbot. And there is another thing you’ve noticed: you don’t see any more collisions. It makes sense: since there is only one loopbot, it cannot get confused with itself. No more duplicate orders going to the kitchen, and no more confusion about when to grab a plate or drink.
Friday evening service begins, and as you had hoped, the single threadbot keeps up with all the customers and tasks, and service is proceeding even better than before. You imagine that you can take on even more customers now, and you don’t have to worry about having to bring on more threadbots. You think of all the money you’re going to save.
Unfortunately, something goes wrong: one of the meals, an intricate soufflé, has flopped! This has never happened before in your restaurant. You begin to study the loopbot more closely. It turns out that at one of your tables, there is a very chatty guest. This guest has come to your restaurant alone and keeps trying to make conversation with your loopbot, even sometimes holding your loopbot by the hand. When this happens, your loopbot is unable to dash off and attend to the ever-growing list of tasks elsewhere in your restaurant. This is why the kitchen produced its first flopped soufflé: your loopbot was unable to make it back to the kitchen to remove the soufflé from the oven, because it was held up by a guest.
Friday service finishes, and you head home to reflect on what you have learned. It’s true that the loopbot could still do all the work that was required on a busy Friday service; but on the other hand, your kitchen produced its very first spoiled meal, something that has never happened before. Chatty guests used to keep waitbots busy all the time, but that never affected the kitchen service at all.
All things considered, you ponder, it is still better to continue using a single loopbot. Those worrying collisions no longer occur, and there is much more space in your restaurant, space that you can use for more customers. But you realize something profound about the loopbot: it can be effective only if every task is short, or at least can be performed in a very short period of time. If any activity keeps the loopbot busy for too long, other tasks will begin to suffer neglect.
It is difficult to know in advance which tasks may take too much time. What if a guest orders a cocktail that requires very intricate preparation, much more than usual? What if a guest wants to complain about a meal at the front-desk, refuses to pay, and grabs the loopbot by the arm, preventing it from task-switching? You decide that instead of figuring out all of these issues up front, it is better to continue with the loopbot, record as much information as possible, and deal with any problems later as they arise.
More time passes.
Gradually, other restaurant owners notice your operation, and eventually they figure out that they too can get by, and even thrive, with only a single threadbot. Word spreads. Soon every single restaurant operates in this way, and it becomes difficult to remember that robotic restaurants ever operated with multiple threadbots at all.
In our story, each of the robot workers in the restaurant is a single thread. The key observation in the story is that the nature of the work in our restaurant involves a great deal of waiting, just as requests.get() is waiting for a response from a server.
In a restaurant, not much worker time is spent waiting when slow humans are doing the manual work; but when super-efficient, quick robots do the work, nearly all their time is spent waiting. In computer programming, the same is true when network programming is involved: CPUs do the “work,” and network I/O is the “waiting.” CPUs in modern computers are extremely fast, hundreds of thousands of times faster than network traffic, so CPUs running networking programs spend a great deal of time waiting.
The insight in the story is that programs can be written to explicitly direct the CPU to move between work tasks as necessary. While there is an improvement in economy (using fewer CPUs for the same work), the real advantage, compared to a threading (multi-CPU) approach, is the elimination of a whole class of race conditions.
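This idea can be sketched in a few lines of Python’s asyncio. The example is purely illustrative (the coroutine name, table numbers, and delays are invented here): each await marks a point where the single “loopbot,” the event loop, is free to switch to another task.

```python
import asyncio

async def take_order(table: int) -> str:
    # Simulate waiting for a diner to decide. While this coroutine
    # is suspended at the await, the event loop runs other tasks.
    await asyncio.sleep(0.01)
    return f"order from table {table}"

async def main() -> list[str]:
    # One event loop (the "loopbot") services all tables at once;
    # gather() runs the coroutines concurrently and keeps their order.
    return await asyncio.gather(*(take_order(t) for t in (1, 2, 3)))

results = asyncio.run(main())
print(results)
```

Because all three waits overlap, the whole run takes roughly one waiting period rather than three.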
It’s not all roses, however: as we found in the story, there are benefits and drawbacks to most technology solutions. The introduction of the loopbot solved a certain class of problems, but also introduced new problems, not least of which is that the restaurant owner had to learn a slightly different way of programming.
For I/O-bound workloads, there are exactly two reasons (only two!) to use async-based concurrency over thread-based concurrency:
Asyncio offers a safer alternative to preemptive multitasking (i.e., using threads), thereby avoiding the bugs, race conditions, and other non-deterministic dangers that frequently occur in non-trivial threaded applications.
Asyncio offers a simple way to support many thousands of simultaneous socket connections, including being able to handle many long-lived connections for newer technologies like websockets, or MQTT for internet-of-things applications.
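The second point can be sketched as follows; the connection count and the handler’s behavior are invented for illustration. Coroutines are cheap Python objects, so a single thread can wait on many thousands of them at once, where the same number of OS threads would be prohibitive.

```python
import asyncio

async def handle_connection(conn_id: int) -> int:
    # Stand-in for a long-lived connection (e.g., a websocket)
    # that spends nearly all of its time waiting.
    await asyncio.sleep(0.01)
    return conn_id

async def main() -> int:
    # Ten thousand concurrent "connections", all serviced by one thread.
    results = await asyncio.gather(
        *(handle_connection(i) for i in range(10_000))
    )
    return len(results)

count = asyncio.run(main())
print(count)
```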
Threading, as a programming model, is best suited to computational tasks that run most efficiently on multiple CPUs with shared memory for communication between the threads. For such tasks, multicore processing with shared memory is a necessary evil because the problem domain requires it.
Network programming is not one of those domains. The key insight is that network programming involves a great deal of “waiting for things to happen,” and because of this, we don’t need the operating system to efficiently distribute our tasks over multiple CPUs. Furthermore, we don’t need the risks that preemptive multitasking brings, such as race conditions when working with shared memory.
However, there is a great deal of misinformation about other supposed benefits of event-based programming models that just ain’t so. Here are a few:
“Asyncio will make my code blazing fast.”

Unfortunately, no. In fact, most benchmarks seem to show that threading solutions are slightly faster than their comparable Asyncio solutions. If the extent of concurrency itself is considered a performance metric, Asyncio does make it a bit easier to create very large numbers of concurrent socket connections, though. Operating systems often have limits on how many threads can be created, and this number is significantly lower than the number of socket connections that can be made. The OS limits can be changed, but it is certainly easier to do with Asyncio. And while we might expect that having many thousands of threads should incur extra context-switching costs that coroutines avoid, it turns out to be difficult to benchmark this in practice.1 No, speed is not the benefit of Asyncio in Python; if that’s what you’re after, try Cython instead!
“Asyncio makes threading redundant.”

Definitely not! The true value of threading lies in being able to write multi-CPU programs, in which different computational tasks can share memory. The numerical library numpy, for instance, already makes use of this by speeding up certain matrix calculations through the use of multiple CPUs, even though all the memory is shared. For sheer performance, there is no competitor to this programming model for CPU-bound computation.
“Asyncio removes the problems with the GIL.”

Again, no. It is true that Asyncio is not affected by the GIL,2 but only because the GIL affects multithreaded programs. The “problems” with the GIL that people refer to occur because it prevents true multicore parallelism when using threads. Since Asyncio is single-threaded (almost by definition), it is unaffected by the GIL, but it cannot benefit from multiple CPU cores either.3 It is also worth pointing out that in multithreaded code, the Python GIL can cause additional performance problems beyond those already mentioned: Dave Beazley presented a talk, “Understanding the Python GIL,” at PyCon 2010, and much of what is discussed in that talk remains true today.
“Asyncio prevents all race conditions.”

False. The possibility of race conditions is always present with any concurrent programming, regardless of whether threading or event-based programming is used. It is true that Asyncio can virtually eliminate a certain class of race conditions common in multithreaded programs, such as those involving intra-process access to shared memory. However, it doesn’t eliminate other kinds of race conditions, such as the interprocess races over shared resources that are common in distributed microservices architectures. You must still pay attention to how shared resources are being used. The main advantage of Asyncio over threaded code is that the points at which control of execution is transferred between coroutines are visible (because of the presence of await keywords), and thus it is much easier to reason about how shared resources are being accessed.
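That visibility can be shown with a small, contrived counter example (the names and counts are invented here). Between awaits, a coroutine can never be interrupted, so the read-modify-write below is safe without a lock; the equivalent threaded code could lose updates without one.

```python
import asyncio

counter = 0

async def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # There is no await inside this read-modify-write, so the
        # event loop cannot switch tasks mid-update: no lock needed.
        counter += 1
        # Control can be transferred only here, at an explicit await.
        await asyncio.sleep(0)

async def main() -> None:
    await asyncio.gather(increment(1000), increment(1000))

asyncio.run(main())
print(counter)
```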
“Asyncio makes concurrent programming easy.”

Ahem, where do I even begin?
The last myth is the most dangerous one. Dealing with concurrency is always complex, regardless of whether you’re using threading or Asyncio. When experts say, “Asyncio makes concurrency easier,” what they really mean is that Asyncio makes it a little easier to avoid certain kinds of truly nightmarish race-condition bugs; the kind that keep you up at night, and about which you tell other programmers in hushed tones over campfires, wolves howling in the distance.
Even with Asyncio, there is still a great deal of complexity to deal with. How will your application support health checks? How will you communicate with a database that may allow only a few connections, far fewer than your 5,000 socket connections to clients? How will your program terminate connections gracefully when it receives a signal to shut down? How will you handle (blocking!) disk access and logging? These are just a few of the many complex design decisions you will still have to make.
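For the database question, one common pattern is to cap concurrency with a semaphore. The numbers below are invented for illustration: many client handlers funnel through a gate that allows only five simulated “queries” at a time.

```python
import asyncio

async def query_db(sem: asyncio.Semaphore, request_id: int) -> int:
    # At most five "queries" run at once, no matter how many client
    # handlers are active; the rest queue up at the semaphore.
    async with sem:
        await asyncio.sleep(0.001)  # stand-in for a real query
        return request_id

async def main() -> int:
    sem = asyncio.Semaphore(5)
    results = await asyncio.gather(
        *(query_db(sem, i) for i in range(100))
    )
    return len(results)

handled = asyncio.run(main())
```

Every handler still completes; the semaphore only spreads the database load out over time.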
Application design will still be difficult, but the hope is that you will have an easier time reasoning about your application logic when you have only one thread to deal with.
1 Research in this area seems hard to find, but the numbers seem to be around 50 microseconds per threaded context-switch on Linux on modern hardware. To give a (very) rough idea: a thousand threads implies 50 ms total cost just for the context switching. It does add up, but it isn’t going to wreck your application either.
2 The global interpreter lock (GIL) makes the Python interpreter code (not your code!) thread-safe by locking the processing of each opcode; it has the unfortunate side effect of effectively pinning the execution of the interpreter to a single CPU, and thus preventing multicore parallelism.