Chapter 11. Batch Processes and Tasks

The cloud gives us unprecedented scale. It costs virtually nothing to spin up new application instances to accommodate demand, and once the dust has settled, it's easy to scale down. This means that, as long as the work at hand lends itself to parallelization, we improve our efficiency with scale. Many problems are embarrassingly parallel: they require no coordination between nodes. Others require some coordination. Both of these types of workloads are ideal for a cloud computing environment, which makes it easy to scale computation horizontally across multiple nodes; other workloads are inherently serial and gain little from additional machines. In this chapter, we will look at a few different ways, both old and new, to ingest and process data using microservices.

Batch Workloads

Batch processing has a long history. The term refers to a program that processes batches of input data at the same time. Historically, batch processing was a more efficient way of utilizing computing resources: it amortized the cost of a set of machines by reserving windows for interactive work, when operators were using the machines, and scheduling noninteractive work for the evening hours, when the machines would otherwise sit idle. Today, in the era of the cloud, with virtually infinite and ephemeral computing capacity, efficient machine utilization isn't a particularly compelling reason to adopt batch processing.
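To make the idea concrete, here is a minimal sketch of chunk-oriented batch processing in plain Java: records are grouped into fixed-size batches and each batch is handled as a unit, amortizing per-operation overhead (for example, one database round trip per batch instead of one per record). The class name `ChunkProcessor` and the chunk size are illustrative, not from this chapter; frameworks such as Spring Batch formalize the same pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (names are hypothetical): group records into
// fixed-size batches so each batch can be processed in one operation.
public class ChunkProcessor {

    // Splits the input into batches of at most chunkSize records each.
    static List<List<String>> partition(List<String> records, int chunkSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            batches.add(new ArrayList<>(
                records.subList(i, Math.min(i + chunkSize, records.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c", "d", "e");
        for (List<String> batch : partition(records, 2)) {
            // In a real job, this would write the whole batch at once.
            System.out.println("processing batch: " + batch);
        }
    }
}
```

Because each batch is independent, this structure also parallelizes naturally: batches can be dispatched to separate threads or nodes.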

Batch processing ...
