Chapter 8. Parallel Data Processing with Streams

Our world is overwhelmingly concurrent and parallel; we can almost always do more than one thing at once. As our programs are asked to solve more and more problems, data processing often benefits from running in parallel, too.

In Chapter 6, you learned about Streams as data processing pipelines built of functional operations. Now it’s time to go parallel!

In this chapter, you will learn about the importance of concurrency and parallelism, how and when to use parallel Streams, and when not to. Everything you learned in the previous two chapters about data processing with Streams also applies to parallel processing. That’s why this chapter concentrates on the differences and intricacies of parallel Streams.
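As a first taste of how little a pipeline changes when it goes parallel, here is a minimal sketch. The class and method names are hypothetical and chosen only for illustration; the sole difference between the two pipelines is the call to parallel(), a standard method of the JDK's Stream types.

```java
import java.util.stream.LongStream;

// Hypothetical example class, not taken from the book.
class ParallelSketch {

    // Sequential pipeline: sums the squares of 1..n on the calling thread.
    static long sumOfSquaresSequential(long n) {
        return LongStream.rangeClosed(1, n)
                         .map(x -> x * x)
                         .sum();
    }

    // Same pipeline, switched to parallel execution. The operations are
    // unchanged; only the execution mode of the Stream differs.
    static long sumOfSquaresParallel(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(x -> x * x)
                         .sum();
    }

    public static void main(String[] args) {
        long n = 1_000;
        // Both calls compute the same value; the operations are stateless
        // and the summation of long values is associative.
        System.out.println(sumOfSquaresSequential(n));
        System.out.println(sumOfSquaresParallel(n));
    }
}
```

Both methods return the same result here because the operations are stateless and side-effect free. Whether the parallel variant is actually faster depends on the workload and the data source, and for a range this small it may well be slower.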

Concurrency versus Parallelism

The terms parallelism and concurrency often get mixed up because the concepts are closely related. Rob Pike, one of the co-designers of the programming language Go, defined the terms nicely:

Concurrency is about dealing with a lot of things at once. Parallelism is about doing a lot of things at once. The ideas are, obviously, related, but one is inherently associated with structure, and the other is associated with execution. Concurrency is structuring things in a way that might allow parallelism to actually execute them simultaneously. But parallelism is not the goal of concurrency. The goal of concurrency is good structure and the possibility to implement execution ...
