Chapter 11. Batch Processes and Tasks
The cloud gives us unprecedented scale. It costs virtually nothing to spin up new application instances to accommodate demand, and once the dust has settled, it’s easy to scale down. This means that, as long as the work at hand lends itself to parallelization, we improve our efficiency with scale. Many problems are embarrassingly parallel; they require no coordination between nodes. Others may require some coordination. Both of these types of workloads are ideal for a cloud computing environment, because the computation can be scaled horizontally across multiple nodes. Still other workloads are inherently serial and cannot be spread across nodes in this way. In this chapter, we will look at a few different ways, both old and new, to ingest and process data using microservices.
Batch processing has a long history. The term refers to the idea that a program processes batches of input data at the same time. Historically, batch processing was a more efficient way of utilizing computing resources. The approach amortized the cost of a set of machines by scheduling interactive work for the windows when operators were using the machines, and noninteractive work for the evening hours, when the machines would otherwise be idle. Today, in the era of the cloud, with virtually infinite and ephemeral computing capacity, efficient machine utilization isn’t a particularly compelling reason to adopt batch processing.
Batch processing ...