Chapter 1. Introduction

“Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.”

Rob Pike

To truly appreciate something, it is crucial to know how it came to be, especially if we can discern the steps it took and the challenges it overcame along the way. This understanding not only highlights ongoing progress but also helps us comprehend its relevance. Java concurrency is no exception: it has come a long way since its inception, and it took a long time to evolve to its current state. If we want to understand its recent advances, such as virtual threads and structured concurrency in modern Java, we must first delve into that evolution. In this chapter we will give you an initial view of Java concurrency and then briefly trace how it evolved.

A Brief History of Threads in Java

Java was designed with concurrency in mind and was one of the first languages to provide built-in support for multi-threading. Over the years, Java’s concurrency capabilities have been improved and refined, hitting some potholes and learning some lessons along the way.

It began with basic synchronisation and thread management. Then came the introduction of the java.util.concurrent package in Java 5, which brought important new capabilities such as the Executor framework, locks, and concurrent collections. Next, with the introduction of the Fork/Join framework in Java 7, Java improved concurrency performance on multi-core processors. And most recently, with Project Loom, Java addressed the complexity and limitations of traditional thread-based concurrency through lightweight, user-mode threads and structured concurrency, ultimately aiming to make concurrent development simpler and more efficient.

Why does this evolution matter so much? As we uncover how Java’s concurrency story unfolds, we discover a relentless pursuit of greater efficiency and simplified programming in the face of ever-growing complexity. This narrative extends beyond just Java; it reflects the trajectory of software development and Java’s continued aspirations.

Let’s delve deeper into understanding this evolution and appreciate the strides made in Java Concurrency.

Java Is Made of Threads

Concurrency was an explicit design goal of Java: the language was released with built-in thread support, and that threading capability was a key differentiator at the time. It removed developers’ dependency on operating system-specific features to achieve concurrency.

In Java, a thread is the smallest unit of execution. It is an independent path of execution running within a program. Threads share the same address space, meaning they have access to the same variables and data structures of the program. However, each thread maintains its own program counter, stack, and local variables, enabling it to operate independently. This design also facilitates interaction among threads when necessary.
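To make this concrete, here is a minimal sketch (the class and field names are illustrative, not taken from this chapter’s examples): two threads read the same field on the shared heap, while each keeps its own local variable on its own stack.

public class SharedStateDemo {
    // Shared state: visible to every thread in the same address space
    static String greeting = "Hello from the shared heap";

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // Local variable: lives on this thread's own stack
            String threadName = Thread.currentThread().getName();
            System.out.println(threadName + " reads: " + greeting);
        };
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}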

However, Java’s threading model relies on the underlying operating system to schedule and execute threads. The operating system allocates CPU time to each thread, managing the transition between threads to ensure efficient execution. By distributing threads across multiple CPUs, the system can achieve true parallelism, enhancing the performance of concurrent applications (Figure 1-1).

Figure 1-1. Execution of Threads by Different CPUs

The more threads we have in a Java program, the more independent paths of execution we effectively create, giving us the ability to carry out several operations simultaneously. This is particularly beneficial for applications that require high levels of parallelism or concurrency, such as web servers, data processing pipelines, and real-time systems. By leveraging multiple threads, these applications can perform multiple tasks at the same time, improving throughput and responsiveness.

Note

Parallelism vs. concurrency: Parallelism and concurrency are often used interchangeably. However, they mean two different things. Parallelism entails doing more than one thing simultaneously, so multiple processing units, such as two or more CPU cores, are needed. Concurrency is about designing programs where portions of their operation can overlap – even if not always simultaneously. Think of parallelism as multiple workers building a house side-by-side, while concurrency is like a single chef juggling multiple dishes in the kitchen.

The notion of threads forms the very fabric of how the whole Java ecosystem is implemented and how it functions. It’s the bedrock on which many powerful features and tools are built, and capabilities that many programmers take for granted simply wouldn’t be possible without it. Whether it’s the garbage collection system, which tackles the problem of memory management in Java, or the simple act of printing ‘Hello, World!’, threads are working away in the background.

Here’s an example. Anyone who has written a Java program, even for the first time, will be familiar with the line most programming books start out with. It may seem like a single-threaded operation; however, the Java Virtual Machine (JVM) actually executes this code in a thread, commonly referred to as the “main thread.”

public class HelloWorld {
   public static void main(String[] args) {
       System.out.println("Hello, World!");
       // Displaying the thread that's executing the main method
       System.out.println("Executed by thread: " + Thread.currentThread().getName());
   }
}

When we run this program, the output would include the name of the thread executing the main method, which is typically “main.”

Output:

Hello, World!
Executed by thread: main

This example shows that even the simplest Java programs are already fundamentally threaded. The implication is profound: we, as Java developers, are harnessing the power of threads, whether we realize it or not, in virtually every part of our job – from running the most straightforward programs to relying on advanced JVM machinery such as garbage collection.

And so threads are a required element of Java – they’re part of what makes Java a language that can scale up to handle large databases and massive distributed systems.

Threads: The Backbone of the Java Platform

Java threads are integral to all layers of the Java platform, playing a crucial role in various aspects beyond just executing code. For instance, they’re the basis of exception handling, debugging, and profiling Java applications.

Exceptions and Threads

In Java, every thread has its own call stack that records all method invocations made during the thread’s lifetime. When an exception is raised, that thread’s call stack becomes a vital part of the diagnostic history – it shows the sequence of method invocations that led to the exception, helping developers trace the root cause of an issue. Here’s an example:

import java.sql.SQLException;
public class CallStackDemo {
   public static void main(String[] args) throws InterruptedException {
       Thread thread = new Thread(CallStackDemo::processOrder);
       thread.setName("mcj-thread");
       thread.start();
       thread.join();
   }
   static void processOrder()  {
       validateOrderDetails();
   }
   static void validateOrderDetails()  {
       checkInventory();
   }
   static void checkInventory()  {
       updateDatabase();
   }
   static void updateDatabase()  {
       try {
           throw new SQLException("Database connection error");
       } catch (SQLException e) {
           throw new InventoryUpdateException("Database Error: Unable to update inventory", e);
       }
   }
}
class InventoryUpdateException extends RuntimeException {
   public InventoryUpdateException(String message, Throwable cause) {
       super(message, cause);
   }
}

In this scenario, the database update operation fails, and the failure propagates back up the call stack. The mcj-thread runs processOrder, which calls validateOrderDetails, which calls checkInventory, which calls updateDatabase, which then throws the exception. We then get the following output in the console:

Exception in thread "mcj-thread" ca.bazlur.mcj.chap1.InventoryUpdateException: Database Error: Unable to update inventory
	at ca.bazlur.mcj.chap1.CallStackDemo.updateDatabase(CallStackDemo.java:33)
	at ca.bazlur.mcj.chap1.CallStackDemo.checkInventory(CallStackDemo.java:26)
	at ca.bazlur.mcj.chap1.CallStackDemo.validateOrderDetails(CallStackDemo.java:22)
	at ca.bazlur.mcj.chap1.CallStackDemo.processOrder(CallStackDemo.java:18)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.sql.SQLException: Database connection error
	at ca.bazlur.mcj.chap1.CallStackDemo.updateDatabase(CallStackDemo.java:31)
	... 4 more

This trace allows developers to inspect the call stack within a specific thread context (here, the mcj-thread where the exception occurred). This granularity is highly beneficial in the JVM, especially for debugging multithreaded applications where several threads may run concurrently on the same or different tasks. It helps us pinpoint issues faster, making debugging more focused and efficient and leading to quicker resolutions.

Debugger and Threads

How does the Java debugger figure out where exactly to pause the execution of a running application? The answer, again, is threading. By attaching itself to the various threads in the application, the debugger can select which individual thread to examine, or even change its state, while debugging an active application. This is fundamental to finding and fixing bugs in applications where several threads may be executing simultaneously at different stages and interacting in ways that need to be understood.

Each action can be ‘stepped into’, ‘stepped over’, or ‘stepped out of’ with respect to a thread, and the independent call stacks of the active threads can be inspected during a debugging session. The call stack tells you what happened before – what led a certain thread to a given point in the program and how it got into its current state – which is invaluable when you are trying to figure out what caused an unexpected exception or an erroneous manipulation of data.

Profiler and Threads

Threads are equally vital to understanding Java’s operational dynamics and play a crucial role in profiling. Just as threading facilitates pinpoint precision in debugging, it extends its utility to offering a granular perspective in profiling practices. In fact, threads are the backbone of performance analysis in Java. Profiling tools rely on threads to give you a detailed look at how your application is running. They use thread information to pinpoint slowdowns, troubleshoot tricky timing issues in multithreaded code, and help you find ways to make your application run faster. In short, threads are the key to understanding and improving the performance of your multithreaded Java applications.

Java threads thus play a significant role in diagnosis, debugging, and profiling by providing detailed insight into execution at the level of individual threads. Many threads are never touched directly by developers, yet they keep operating underneath and quietly support every Java application: garbage collection threads manage memory, JIT compiler threads improve performance, and the Signal Dispatcher, Finalizer, Reference Handler, VM, and service threads keep the JVM itself running smoothly and observable. It is fair to say that threads permeate every layer of the Java platform, and a Java developer needs to understand them.
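As a quick illustration of how many threads even a trivial JVM process already runs, here is a small sketch that lists every live thread. On a typical JVM the output includes names such as Reference Handler, Finalizer, and Signal Dispatcher alongside main, though the exact set varies by JVM version and vendor.

public class ListLiveThreads {
    public static void main(String[] args) {
        // Dump every live thread the JVM is currently running
        Thread.getAllStackTraces().keySet().stream()
              .map(t -> t.getName() + " (daemon=" + t.isDaemon() + ")")
              .sorted()
              .forEach(System.out::println);
    }
}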

The Genesis of Java 1.0 Threads

Java 1.0 was released in 1996 and came with built-in support for threads. This defining feature set it apart from many languages at that time. In these early days, you could create threads by either extending the Thread class or implementing the Runnable interface. For example:

// Using Thread class
class MyThread extends Thread {
    public void run() {
        System.out.println("Thread using Thread class.");
    }
}
// Using Runnable interface
class MyRunnable implements Runnable {
    public void run() {
        System.out.println("Thread using Runnable interface.");
    }
}

To start a thread, we can do the following:

MyThread myThread = new MyThread();
myThread.start();

Or

Thread thread = new Thread(new MyRunnable()); 
thread.start(); 

Starting Threads

Whichever way we choose to create a thread, we start it by invoking the start method on the Thread object. This is an essential step because calling start does more than just execute the run method; it also performs setup tasks like allocating system resources. The start method subsequently calls the run method in a new thread of execution.

Note

It’s crucial not to call the run method directly. Doing so will execute the run method in the calling thread, not in a new thread.
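A short sketch illustrates the difference (the printed thread names assume the JVM’s default naming):

public class StartVsRun {
    public static void main(String[] args) {
        Runnable task = () ->
                System.out.println("Running in: " + Thread.currentThread().getName());

        new Thread(task).run();   // runs in the calling thread, prints "main"
        new Thread(task).start(); // runs in a new thread, prints e.g. "Thread-1"
    }
}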

However, in modern Java applications, using the Executors framework is more efficient than manually creating threads through constructors. Executors abstract the process of thread management by maintaining a thread pool—a group of pre-created threads ready for executing tasks. This method significantly reduces the overhead associated with creating new threads.

When a task is submitted to an Executor, it’s placed in a queue. A thread from the pool then picks up the task from the queue and executes it. This setup simplifies concurrent programming and optimizes resource utilization by reusing threads.

Here’s a simple example demonstrating how to use an Executor to manage a thread pool:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class ExecutorExample {
    public static void main(String[] args) {
        try (ExecutorService executor = Executors.newFixedThreadPool(5)) {
            for (int i = 0; i < 10; i++) {
                final int taskId = i;
                executor.submit(() -> {
                    System.out.println("Executing task " + taskId 
                            + " in thread " + Thread.currentThread().getName());
                });
            }
        }
    }
}

Understanding the Hidden Costs of Threads

Many of today’s web applications run on a thread-per-request model, where a thread is assigned to each request – the thread manages the whole request/response life cycle. For instance, let’s consider the life cycle of a request on a typical web application running under a servlet container (Apache Tomcat, Jetty or any Java EE web server). A servlet container is software responsible for processing requests coming to the web application. When a request arrives at the servlet container, the container assigns the request to one of the threads in its thread pool to process the request. That thread fires up, in effect saying: “Hey, I got this request coming from the user; it’s mine now; I’ll take care of it.” The thread essentially takes responsibility for processing the whole request/response life cycle (Figure 1-2).

This is particularly helpful when we have large numbers of simultaneous connections because increasing concurrency, in turn, increases the throughput of the web application. In fact, this is a key feature of the thread-per-request model, which allows modern web applications to scale in order to serve a growing volume of requests efficiently.

Figure 1-2. ThreadPool Handling Servlet Requests

Amazingly, most modern operating systems can handle millions of concurrent connections, so it would seem reasonable to create yet more threads in order to improve throughput. More threads in the thread pool should indeed equate to more throughput, according to Little’s Law.1 However, even though the assumption appears reasonable, this line of reasoning is a bit slippery – and will not always produce the expected results.

Note

Throughput in web applications is the rate at which requests are processed and responses are delivered, typically measured in requests per second (RPS) or transactions per second (TPS). It indicates the application’s capacity to handle load, serving as a critical metric for performance, scalability, and resource utilization. Throughput can be calculated as:

Throughput = Number of requests processed / Unit of time

This serves as a guide for benchmarking performance, scalability, and demand planning, and it paves the way for the optimal use of resources that ensure a quality user experience.
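To make that relationship concrete with purely illustrative numbers (assumed for the example, not measured): if a server keeps 200 requests in flight at any moment and each request takes an average of 100 milliseconds, Little’s Law suggests a throughput of roughly 200 / 0.1 s = 2,000 requests per second – and, in theory, doubling the number of in-flight requests while latency stays constant would double the throughput.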

An important detail to bear in mind is the memory footprint of each thread: roughly 2 MiB2 of stack memory per thread, allocated outside the heap. This can quickly add up in large-scale applications handling thousands of concurrent connections – as a rough back-of-the-envelope figure, 10,000 threads at 2 MiB each is about 20 GiB of memory before the heap is even counted – so the aggregate footprint of threads can make a world of difference in the actual number of connections you can run. Additionally, if your application uses all physical memory, you’ll find yourself paging to disk, an operation dramatically slower than accessing RAM – reading from disk can be on the order of a thousand times slower. This alone can impact performance heavily.

Also, it is essential to understand that Java threads are actually a thin wrapper around the native threads provided by the host operating system. This means that the maximum number of threads you can create for any application is effectively limited by the native thread creation limit provided by the host operating system. Almost all operating systems have an upper limit on the number of threads that can be spawned. So, an application’s scalability potential is limited by this bottleneck.

Then there is the expense of context switching. Threads aren’t just a memory overhead; they are a CPU overhead too. Every time the operating system switches between threads, the context of the current thread must be saved and the context of the next thread restored. All that context switching costs CPU cycles, and on systems under high load this overhead can make a huge difference in performance. It is another under-appreciated cost of threads in your application.

How Many Threads Can You Create?

Now that we have discussed the costs associated with threads, let’s examine how to measure the limits of thread creation in your environment. By running a simple test program, you can determine the maximum number of threads your system can handle before running into issues. Here’s a skeleton code snippet to start:

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;
public class ThreadLimitTest {
    public static void main(String[] args) {
        var threadCount = new AtomicInteger(0);
        try {
            while (true) {
                var thread = new Thread(() -> {
                    threadCount.incrementAndGet();
                    LockSupport.park();
                });
                thread.start();
            }
        } catch (OutOfMemoryError error) {
            System.out.println("Reached thread limit: " + threadCount);
            error.printStackTrace();
        }
    }
}

Let’s execute this program to see how many threads can be created before running out of memory or hitting other system limitations.

Reached thread limit: 16363
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
	at java.base/java.lang.Thread.start0(Native Method)
	at java.base/java.lang.Thread.start(Thread.java:1526)
	at ca.bazlur.chapter0.ThreadLimitTest.main(ThreadLimitTest.java:15)

On my machine, I encountered an OutOfMemoryError after successfully creating 16,363 threads. This experiment highlights the point that there’s a finite limit to the number of threads one can create, a constraint that is largely dictated by the underlying operating system and the hardware. Hence, the limit can be different on your machine.

As you can see, while threads provide substantial benefits in web application scalability, they do come with associated costs. These costs, though sometimes less obvious, should be carefully considered to ensure optimal performance.

Resource Efficiency in High-Scale Applications

Today’s software applications are often expected to process large amounts of data and experience high volumes of incoming traffic simultaneously. These dynamics create a significant challenge for staying fiscally sound, especially when the cloud has become the default environment in which many businesses operate. Even the slightest degree of resource inefficiency can quickly turn into escalating costs. Since we have only a finite number of threads, it’s essential to use them carefully. But threads get blocked in practice, so they’re often not used effectively.

Consider the following example, where a method calculates a person’s credit score based on various factors:

public Credit calculateCredit(Long personId) {
    var person = getPerson(personId);
    var assets = getAssets(person);
    var liabilities = getLiabilities(person);
    importantWork();
    return calculateCredits(assets, liabilities);
}

The code snippet above shows a sequence of five method invocations, each happening one after the other. Suppose each would typically take 200 milliseconds to complete; the calculateCredit method would then take roughly 1 second to execute (5 × 200 milliseconds = 1000 milliseconds). The thread that invoked calculateCredit() is practically doing nothing for most of that time – it simply waits, one result after another, for the other method invocations to finish. This reveals a painful kind of inefficiency: the thread’s computational resources sit mostly unused because it is blocked, unable to perform other work that could run concurrently with the first calculation. It’s also a potential latency killer.

Dealing with the key challenge of achieving high thread efficiency – in particular, minimising or even eliminating the time that threads spend blocked – has driven many of Java’s evolutions over time, including support for asynchronous method invocations (allowing methods to run independently of the main program flow), thread pooling (reusing a fixed number of threads to perform tasks), and, more recently, modern reactive programming models (emphasizing asynchronous, event-driven interactions).

Let’s start by introducing the classical methods of threading, such as manually creating and managing individual threads, and gradually move toward what modern Java offers.

The Parallel Execution Strategy

Looking at our code snippet above, we can see that some method invocations – namely getAssets(), getLiabilities(), and importantWork() – have no interdependencies, so they can be dispatched to run in parallel. This parallel execution reduces the duration for which our main thread is blocked, even though we haven’t totally eliminated blocking.

Our parallel execution strategy reduces the main thread’s idle time: once the calculateCredit() method finishes, the thread is free to do other work sooner. For example:

Credit calculateCreditWithUnboundedThreads(Long personId) throws InterruptedException {
   var person = getPerson(personId);
   var assetsRef = new AtomicReference<List<Asset>>();
   var t1 = new Thread(() -> {
       var assets = getAssets(person);
       assetsRef.set(assets);
   });
   var liabilitiesRef = new AtomicReference<List<Liability>>();
   Thread t2 = new Thread(() -> {
       var liabilities = getLiabilities(person);
       liabilitiesRef.set(liabilities);
   });
   var t3 = new Thread(() -> importantWork());
   t1.start();
   t2.start();
   t3.start();
   t1.join();
   t2.join();
   var credit = calculateCredits(assetsRef.get(), liabilitiesRef.get());
   t3.join();
   return credit;
}

We’ve used AtomicReference from the standard Java concurrency package – a thread-safe holder for a single object reference. Since multiple threads write and read these results, using it ensures that the values set by the worker threads are safely visible to the thread that reads them afterwards.

To analyze performance improvements, it’s helpful to measure the execution time of methods. Here’s a utility class for this purpose:

import java.util.concurrent.Callable;

public class ExecutionTimer {
    public static <T> T measure(Callable<T> task) throws Exception {
        long startTime = System.nanoTime();
        T result = task.call();
        long endTime = System.nanoTime();
        long duration = (endTime - startTime) / 1_000_000;  // Convert to milliseconds
        System.out.println("Execution time: " + duration + " milliseconds");
        return result;
    }
}

This is a simplistic approach for understanding code execution. For robust benchmarking, consider using a microbenchmark harness (like JMH) to account for factors like JVM warmup, dead-code elimination, and other optimizations.

Assuming that each method has a 200-millisecond execution time, we can determine how much time the above code will require for completion.

Thread.sleep(200) can be used inside each helper method to simulate this execution time.
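Here’s a hedged sketch of how the helper methods inside CreditCalculatorService might be stubbed out for this measurement (the Person constructor and the returned values are assumptions made purely for illustration):

Person getPerson(Long personId) {
    sleep(200); // simulate a 200 ms lookup
    return new Person(personId);
}

List<Asset> getAssets(Person person) {
    sleep(200); // simulate a 200 ms remote call
    return List.of();
}

private static void sleep(long millis) {
    try {
        Thread.sleep(millis);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag
    }
}

With the helpers stubbed out this way, we can time the multi-threaded version: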

public static void main(String[] args) throws Exception {
   CreditCalculatorService service = new CreditCalculatorService();
   ExecutionTimer.measure(() -> service.calculateCreditWithUnboundedThreads(1L));
}

If we execute the above code, it will print something like this:

Execution time: 616 milliseconds

We’ve significantly improved performance by refactoring the credit calculation to employ multiple threads. The new version runs roughly 38% faster than the old one (about 600 milliseconds instead of 1000), which is noticeably faster for the user.

Though parallelization enhances performance by reducing the blocking time of the initiating thread, it comes with its own challenges, particularly around thread management and system resource utilization. Creating threads ad hoc, as demonstrated, provides no controlled mechanism for managing thread life cycles and resource allocation. Each thread is a heavyweight resource that consumes memory and processing power, so uncontrolled, ad hoc creation can lead to an overproliferation of threads, exhausting system resources and ultimately causing java.lang.OutOfMemoryError exceptions, crashes, and other runtime bottlenecks.

Introducing Executor Framework

To avoid the dangers of creating ad hoc threads, it’s better to use one of the Java concurrency frameworks, such as the ExecutorService, which provides a more structured way of working with threads.

Furthermore, an ExecutorService not only controls the number of concurrently running threads but also reuses threads efficiently by managing their life cycle for us, avoiding the overhead of repeatedly creating and destroying them. Let’s refactor the above code to use an ExecutorService:

private ExecutorService executor = Executors.newFixedThreadPool(10);

Credit calculateCreditWithExecutor(Long personId) throws ExecutionException, InterruptedException {
   var person = getPerson(personId);

   var assetsFuture = executor.submit(() -> getAssets(person));
   var liabilitiesFuture = executor.submit(() -> getLiabilities(person));
   executor.submit(() -> importantWork());

   return calculateCredits(assetsFuture.get(), liabilitiesFuture.get());
}

Now, let’s measure the execution time of the code using ExecutionTimer.measure() and compare the results to the previous implementation:

public static void main(String[] args) throws Exception {
   CreditCalculatorService service = new CreditCalculatorService();
   ExecutionTimer.measure(() -> service.calculateCreditWithExecutor(1L));
}

Execution time: 614 milliseconds

Interestingly, the execution time remains very similar to the previous version. The executors provide useful facilities for managing concurrent work in Java applications – they make it easier to create and manage threads, they provide possible speedups for suitable workloads, and they make the code easier to organize and keep clean. Now, let’s address the challenges that remain even with the Executor framework.

Remaining Challenges

The Executor framework brings significant improvements in resource management and asynchronous execution to Java applications. However, to maximize its benefits, it’s crucial to be aware of its limitations.

  1. Blocking on Future.get(): Despite introducing asynchrony via Future objects, the get() method call remains blocking. This means that while we’ve shifted the blocking from one place to another, we haven’t completely eliminated it.

  2. Potential for Cache Corruption: When tasks are submitted to a thread pool, they are executed by threads that may run on different CPU cores than the main thread. This could lead to scenarios where the cache states of the two cores become inconsistent, creating the potential for cache corruption.

    To improve data access speed, all modern CPUs have L-caches3. These caches store small chunks of data (the size varies with the hardware architecture) so that the next time the CPU needs that data, it can be retrieved from the cache instead of slower main memory. Such a chunk of data is called a cache line, and its use significantly improves overall performance.

    However, when multiple tasks run concurrently, there’s a chance that data used by two subsequent tasks resides in the same cache line. If both tasks run on the same CPU, there’s no need to go back to main memory, as the data is already cached – the second task runs faster than the first, a nice improvement. However, if the two tasks run on different CPUs, the second CPU has to go to main memory to load the data into its own cache.

    This frequent data reloading may negatively impact overall performance in frameworks like the Executor Framework, where tasks are distributed across CPUs.

  3. Lack of Composability: Code written against the Executor framework is still quite imperative in nature. There is nothing wrong with imperative-style code, but many developers find functional and declarative styles easier to read, compose, and maintain.

This leads back to our starting point: the natural question is, ‘What now?’ How do we solve these problems, and how can we use executors to write even more efficient and maintainable code in the future?

A Leap Towards Efficient Multithreading

The Executor framework is super helpful; however, as noted above, scenarios like heavy cache contention or complex task dependencies can be problematic. Java’s Fork/Join Pool addresses these challenges with a performance-focused design and specialized algorithms. This dedicated implementation of ExecutorService delivers a more sophisticated thread-pooling mechanism, enhancing performance and resource efficiency through intelligent task scheduling. Let’s delve into how it does so.

Cache Affinity and Task Distribution

Cache affinity refers to the idea of completing tasks in a way that leverages the locality of the CPU cache. If a task is executed on a particular CPU, its data gets loaded into the cache of that core. If other related tasks are executed on the same core, those other tasks can benefit from the data already cached there, reducing the memory access time and, thus, improving performance.

The Fork/Join Pool exploits this by trying to execute tasks on the same CPU core where they were created. This is especially helpful if the tasks frequently access the same shared data. By maintaining cache affinity, the Fork/Join Pool minimizes cache misses, which occur when the data needed by a task is not present in the cache and must be fetched from the main memory—a significantly slower process.

Consider, for instance, a set of parallel tasks working on a large array. If these tasks are all processing nearby slices of that array, it pays to have them run on the same core: the relevant data can stay in that core’s cache and be accessed quickly by the next task, instead of being reloaded from main memory over and over. This reduces the overhead associated with repeatedly loading data into the cache.

Work-Stealing Algorithm

The Fork/Join Pool also employs a dynamic work-stealing algorithm to balance the load among threads efficiently. Instead of a single shared task queue, each thread in a Fork/Join Pool has its own queue. When a thread runs out of work, it “steals” tasks from the tail of another thread’s queue. This dynamic work-stealing algorithm maximizes CPU usage, ensuring that no thread sits idle while tasks are to be done.

Figure 1-3. Work-Stealing in Fork/Join Pool

Figure 1-3 illustrates the work-stealing algorithm used in the Fork/Join Pool. When a thread completes its own tasks, it seeks out tasks from the tail end of another thread’s queue. This dynamic redistribution of tasks ensures that all threads remain productive, enhancing overall CPU utilization and minimizing idle time.

Example of Using Fork/Join Pool

The Fork/Join Pool simplifies the process of sending tasks to it, and it automatically distributes the tasks efficiently without requiring additional management. Here’s a basic example of how to use the Fork/Join Pool:

ForkJoinPool forkJoinPool = new ForkJoinPool();
forkJoinPool.submit(() -> {
    // your parallelized tasks here
}).join();

Notice how easy it is to submit tasks to a Fork/Join Pool. The framework automatically handles efficient work distribution, taking the burden of complex thread management off the developer.

Typically, the Fork/Join Pool is associated with divide-and-conquer algorithms, using the RecursiveAction and RecursiveTask classes to break work into smaller pieces. In this context, however, we are focusing on the Fork/Join Pool itself rather than the divide-and-conquer strategies used to create those tasks.
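Purely as a preview of what such a task looks like (a sketch only – the array, threshold, and class name are illustrative), a RecursiveTask splits its input until it is small enough to compute directly and then combines the partial results:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] numbers;
    private final int start, end;

    SumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;                       // small enough: compute directly
        }
        int mid = (start + end) / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // compute the right half, then combine
    }
}

Such a task would typically be run with something like new ForkJoinPool().invoke(new SumTask(data, 0, data.length)).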

I’ll discuss this in more detail in the following chapter.

Bringing Composability into Play with CompletableFuture

The Executor framework and the Fork/Join Pool offer performance improvements, but composing complex asynchronous workflows can still be cumbersome. To address this challenge, Java 8 introduced CompletableFuture, a class designed to streamline asynchronous programming by focusing on composability and ease of use.

Let’s look at how a composable and fluent API with CompletableFuture applies to the earlier example.

Composable and Fluent API

CompletableFuture transforms how we write asynchronous code, moving away from callback-heavy styles towards a declarative and functional approach. Its rich API empowers us to chain multiple asynchronous operations effortlessly, improving code readability and maintainability. Let’s revisit the previous example:

Credit calculateCreditWithCompletableFuture(Long personId) throws InterruptedException, ExecutionException {
   // Assumes static imports of CompletableFuture.runAsync and CompletableFuture.supplyAsync
   return runAsync(() -> importantWork())
           .thenCompose(aVoid -> supplyAsync(() -> getPerson(personId)))
           .thenCombineAsync(supplyAsync(() -> getAssets(getPerson(personId))),
                   (person, assets) -> calculateCredits(assets, getLiabilities(person)))
           .get();
}

This example demonstrates how CompletableFuture allows us to structure asynchronous workflows in a clear and concise manner. However, it is not without its disadvantages. Let’s explore the advantages and disadvantages of this API.

Advantages of Using CompletableFuture

CompletableFuture transforms how you structure asynchronous code with its fluent API. Instead of a tangle of nested callbacks or sprawling class hierarchies sprinkled with task objects, you end up with workflows defined as a series of transformations and intermediate computations. This improves the readability and maintainability of your code, and for projects where asynchronous operations are a fundamental part of the application’s logic, you’re less likely to leave bugs and maintenance nightmares behind when you go async. CompletableFuture builds on the foundations of the Fork/Join framework and its optimized work-stealing algorithms, so your application responds better under load and manages a set of concurrently executing requests more responsibly. Explicit, non-silent error handling gives you greater control over failure points. Moreover, the flexibility to customize thread pools empowers you to fine-tune resource allocation for specific workloads. Finally, while a blocking get() may ultimately be unavoidable, you can build long non-blocking stretches within your operation chains.
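As an example of the explicit error handling mentioned above, here is a minimal sketch; the failing fetchScore method and the fallback value of 0 are purely illustrative:

import java.util.concurrent.CompletableFuture;

public class FallbackDemo {
    static int fetchScore() {
        // Simulate a failing remote call
        throw new IllegalStateException("remote service unavailable");
    }

    public static void main(String[] args) {
        int score = CompletableFuture
                .supplyAsync(FallbackDemo::fetchScore)
                .exceptionally(ex -> {
                    System.err.println("Falling back, cause: " + ex);
                    return 0; // illustrative default when the async call fails
                })
                .join();
        System.out.println("Score: " + score);
    }
}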

Disadvantages and Limitations

Although CompletableFuture is a powerful addition to Java, it requires a shift in mindset if you are used to the imperative programming style. You may have to invest quite a bit of effort to learn this rich API before you can make meaningful use of it: you have to figure out the right methods and value-propagation techniques for each situation, which takes time and practice, and mistakes are easy to make along the way. For example, the get() method is still a blocking call; it is sometimes necessary, but if not used sparingly it can introduce blocking into a flow that would otherwise defeat the purpose of asynchronous programming. You can also run into trouble in multi-chain CompletableFuture scenarios where there is more than one dependency and errors must be propagated and checked in order to recover; if not designed carefully, this becomes a nightmare to debug. Finally, debugging code written in CompletableFuture’s chained style poses its own challenge in following the flow of control. If you are used to the default debugging tools in your IDE – stepping through the code, setting a breakpoint at a particular spot, inspecting the context of a statement – you will find this much harder here, because asynchronous code has an inverted and ambiguous execution flow: it doesn’t simply execute line by line.

Note

While the benefits of asynchronous programming, and of CompletableFuture’s rich API (along with alternatives such as the reactive stack, which we will discuss shortly), seem evident, we must still ask: are we ready for it? At the end of the day, my team and I have to manage it. Do we have the experience to preserve our application’s engineering principles? This should not be taken lightly, because asynchronous programming offers considerable gains but also affects the architecture of the whole application.

A Different Paradigm for Asynchronous Programming

Although CompletableFuture provides powerful tools, changing one’s mindset is not enough to address the remaining limitations (or to reach the higher levels of performance achievable in some cases). Reactive Programming introduces a paradigm focused on data streams, asynchronous event processing, and non-blocking operations. Frameworks like RxJava, Akka, Eclipse Vert.x, Spring WebFlux and others implement this paradigm in Java with rich toolsets.

For instance, let’s reimagine our previous calculateCredit() example using Spring WebFlux:

public Mono<Credit> calculateCredit(Long personId) {
    Mono<Void> importantWorkMono = Mono.fromRunnable(() -> importantWork());
    Mono<Person> personMono = Mono.fromSupplier(() -> getPerson(personId));
    Mono<Assets> assetsMono = Mono.fromSupplier(() -> getAssets(getPerson(personId)));
    return importantWorkMono.then(
            Mono.zip(personMono, assetsMono)
                .flatMap(tuple -> {
                    Person person = tuple.getT1();
                    Assets assets = tuple.getT2();
                    return Mono.just(calculateCredits(assets, getLiabilities(person)));
                })
    );
}

Looking at the code, it seems apparent that this is similar to the CompletableFuture approach. However, if you are unfamiliar with the reactive stack, this will indeed puzzle you, and you may need help comprehending it.

I will not explain reactive programming in this book, as that is not its goal; if you are interested, there are dedicated books on the reactive stack. Instead, let’s discuss some of the limitations of the reactive approach to make sense of why we need something different – namely virtual threads, which are the subject of this book.

Drawbacks of Using Reactive Frameworks

As the saying goes, “there’s no such thing as a free lunch,” and the same applies to Reactive Programming. Let’s outline some of the key challenges it presents:

  • Steep Learning Curve: Grasping the fundamentals of the Reactive Programming paradigm requires a significant mental shift for developers who are used to the imperative programming paradigm. Concepts like Observables, Observable Operators, Schedulers, and Backpressure, which are fundamental to reactive frameworks, require an investment of time and present a potentially steep learning curve compared to other concurrency models.

  • Increased Cognitive Load: The functional programming style used in reactive frameworks can be too much for full-time developers with long tenures of Imperative Programming or Object-Oriented Programming under their belts. Along with the heavy usage of lambdas, higher-order functions and functional composition, chains of operators and transformations can make the code itself harder to grasp initially and, thus, more challenging to maintain (at least in the short term) among large projects or teams where not all the members happen to have a similar amount of Reactive Programming experience.

  • Debugging Difficulties: Traditional debugging tools and techniques don’t always work well with reactive systems. For instance, chained operators lead to asynchronous execution where, if an error arises, the stack trace might provide insufficient context to find the root of the problem, and because events hop between threads and operators, it’s easy to lose track of what happened last. Specialized debugging tools, possibly provided as part of your reactive framework, might be required, which adds to the learning curve. For example, suppose a reactive pipeline fetches data from four different sources, transforms the four streams, and then combines the results. An error arising in the combine phase might produce a stack trace that points only to the combine operator, making it difficult to tell whether a particular upstream source or transformation is the root cause. Tracking the data flow becomes tricky in complex pipelines with dozens of operators and many asynchronous operations. In the traditional threading model, by contrast, debugging is relatively straightforward and comes at little cost.

  • Overcomplication Risk: The composability offered by reactive frameworks’ operator-based models, while very powerful, makes it possible to build something far more complex than a given business requirement or user interface element actually needs. Just as one might cook a chicken curry using every dish in the family silver, it is possible (and tempting) to solve every problem with something more complicated than necessary. Reactive frameworks therefore carry the risk of letting inexperienced programmers over-engineer their domain – a risk not unique to reactive frameworks, but one inherent in any powerful abstraction or library when used by those who do not know when to stop.

  • Potential Mismatch: Reactive Programming frameworks are best suited for scenarios involving significant asynchronous operations, event-driven data flows, or high-throughput data streams. In other situations – where there is little asynchronicity, the data flow isn’t inherently event-oriented, or the effort to learn and apply the framework isn’t justified – a simpler synchronous or request-response approach may be the better fit.

  • Vendor Lock-In: The essential ideas of Reactive Programming are undoubtedly transferable; however, each of the significant reactive frameworks has its own set of nuanced APIs. Picking one over another means you will be tied to that specific framework, which naturally makes it harder to change libraries later and can constrain flexibility down the line of the project’s lifecycle. This lock-in can also bring other implications, such as difficulty in finding developers familiar with the chosen framework, the potential for framework stagnation or lack of continued support, and challenges in migrating to a different framework if needed.

Having mentioned all of these, there is absolutely no denying that Reactive Programming frameworks have the potential to simplify the management of sophisticated asynchronous scenarios enormously. However, we must examine their trade-offs in detail to understand if the benefits outweigh the costs for the project we’re working on.

Revolutionizing Concurrency in Java

Java’s traditional concurrency mechanisms have served us well, but as concurrency has been applied to ever more complex and varied use cases, new challenges have emerged along the way. That’s why it is essential to explore new solutions.

The Promise of Virtual Threads

Now that we understand the shortcomings of our traditional approaches, we need to find a better way to fix those problems. This is precisely where Project Loom, a key step towards first-class modern concurrency in Java, comes into play. It introduced a new kind of thread called virtual threads, ushering in a new era of concurrency in Java. These are very lightweight threads and can be created on demand with next-to-no overhead compared to what is traditionally associated with thread creation. The virtual threads can be instantiated in the millions, unlocking new possibilities for highly concurrent and scalable applications.

Let’s discuss some of the characteristics of virtual threads.

Seamless Integration with Existing Codebases

One of the key strengths of Project Loom lies in its seamless compatibility with existing Java codebases. For instance, if your application already uses the Executor framework, all you have to do to take advantage of virtual threads is swap in Executors.newVirtualThreadPerTaskExecutor() where you currently create your ExecutorService. This ease of integration ensures a smooth transition to the new concurrency model.
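For example, here is a minimal sketch of that swap (the pool size being replaced and the task body are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadExecutorExample {
    public static void main(String[] args) {
        // Previously this might have been Executors.newFixedThreadPool(200)
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                int taskId = i;
                executor.submit(() ->
                        System.out.println("Task " + taskId + " on " + Thread.currentThread()));
            }
        } // closing the executor waits for the submitted tasks to finish
    }
}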

Virtual Threads and Platform Threads

Virtual threads ride on top of classical threads – also known as platform threads – and by default they are scheduled onto a Fork/Join pool of carrier threads, so they inherit the benefits of that more sophisticated thread pool. Virtual and platform threads thus work together to deliver strong resource utilization and scalability.

Intelligent Handling of Blocking Operations

One of the most exciting aspects of virtual threads is their intelligent handling of blocking operations. When a virtual thread encounters a blocking operation – such as a sleep or network Input/Output (I/O) – it automatically yields control back to the underlying platform thread. This allows the platform thread to continue executing other virtual threads, ensuring optimal utilization of available resources. Once the blocking operation is complete, the virtual thread simply resumes execution from where it left off, avoiding perceptible bottlenecks.
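A hedged sketch of this behavior: the snippet below starts 100,000 virtual threads that each block in a sleep – far beyond the platform-thread limit we hit earlier with ThreadLimitTest – yet it completes comfortably, because a blocked virtual thread releases its carrier platform thread for other work. (The thread count and sleep duration are illustrative.)

import java.util.ArrayList;
import java.util.List;

public class BlockingVirtualThreads {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // Each virtual thread blocks in sleep; while blocked it unmounts
            // from its carrier (platform) thread, freeing it for other work.
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("All " + threads.size() + " virtual threads finished");
    }
}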

Benefits of Embracing Virtual Threads

Before diving into the nitty-gritty of virtual threads in the next chapter, let’s take a quick look at how virtual threads revolutionize the way we handle concurrency in Java. Here are some of the key advantages:

  1. Resource Efficiency: Virtual threads are lightweight, and we can spawn millions without exhausting resources, permitting highly concurrent, scalable applications.

  2. Code Simplicity: With the ability to write blocking code without performance penalties, developers can write and maintain code in the straightforward, traditional imperative style. That means no additional learning curve and easier-to-maintain code.

  3. Optimal Utilization: During blocking operations, virtual threads relinquish control, ensuring that platform threads stay utilized, increasing CPU and application performance.

One of the biggest concerns for developers using concurrency in Java is the amount of specific detail they must learn to use it correctly. Project Loom represents a huge step forward for the style of concurrent code Java developers write: it lets them build highly concurrent and efficient applications at scale while still using patterns they are already familiar with.

In Closing

In this book, we will continue discussing virtual threads and explore more about this revolutionary technology in the next chapter.

As you delve deeper into virtual thread concepts, consider how they can transform your approach to designing concurrent applications. Virtual threads offer not just a new way to handle concurrency but also a more efficient, scalable, and intuitive method that aligns with modern computing demands.

1 Little’s Law is a key principle in queuing systems, including multi-threaded applications. It relates latency, concurrency, and throughput, which are crucial for computing performance. In the next chapter, we’ll discuss this in detail and demonstrate how higher concurrency leads to higher throughput.

2 In most Linux environments this is the default size, but it depends on the operating system and can be tweaked.

3 L-caches are the tiered levels of super-fast memory (L1, L2, L3) within a CPU that store frequently used data for quick access.
