Chapter 1. Introduction

This is a book about the art and science of Java performance.

The science part of this statement isn’t surprising; discussions about performance include lots of numbers and measurements and analytics. Most performance engineers have a background in the sciences, and applying scientific rigor is a crucial part of achieving maximum performance.

What about the art part? The notion that performance tuning is part art and part science is hardly new, but it is rarely given explicit acknowledgment in performance discussions. This is partly because the idea of “art” goes against our training. But what looks like art to some people is fundamentally based on deep knowledge and experience. It is said that any sufficiently advanced technology is indistinguishable from magic, and certainly a cell phone would look magical to a knight of the Round Table. Similarly, the work produced by a good performance engineer may look like art, but that art is really an application of deep knowledge, experience, and intuition.

This book cannot help with the experience and intuition part of that equation, but it can provide the deep knowledge—with the view that applying knowledge over time will help you develop the skills needed to be a good Java performance engineer. The goal is to give you an in-depth understanding of the performance aspects of the Java platform.

This knowledge falls into two broad categories. First is the performance of the Java Virtual Machine (JVM) itself: the way that the JVM is configured affects many aspects of a program’s performance. Developers who are experienced in other languages may find the need for tuning to be somewhat irksome, though in reality tuning the JVM is completely analogous to testing and choosing compiler flags during compilation for C++ programmers, or to setting appropriate variables in a php.ini file for PHP coders, and so on.

The second aspect is to understand how the features of the Java platform affect performance. Note the use of the word platform here: some features (e.g., threading and synchronization) are part of the language, and some features (e.g., string handling) are part of the standard Java API. Though important distinctions exist between the Java language and the Java API, in this case they will be treated similarly. This book covers both facets of the platform.

The performance of the JVM is based largely on tuning flags, while the performance of the platform is determined more by using best practices within your application code. For a long time, these were considered separate areas of expertise: developers code, and the performance group tests and recommends fixes for performance issues. That was never a particularly useful distinction—anyone who works with Java should be equally adept at understanding how code behaves in the JVM and what kinds of tuning are likely to help its performance. As projects move to a devops model, this distinction is starting to become less strict. Knowledge of the complete sphere is what will give your work the patina of art.

A Brief Outline

First things first, though: Chapter 2 discusses general methodologies for testing Java applications, including pitfalls of Java benchmarking. Since performance analysis requires visibility into what the application is doing, Chapter 3 provides an overview of some of the tools available to monitor Java applications.

Then it is time to dive into performance, focusing first on common tuning aspects: just-in-time compilation (Chapter 4) and garbage collection (Chapter 5 and Chapter 6). The remaining chapters focus on best-practice uses of various parts of the Java platform: memory use with the Java heap (Chapter 7), native memory use (Chapter 8), thread performance (Chapter 9), Java server technology (Chapter 10), database access (Chapter 11), and general Java SE API tips (Chapter 12).

Appendix A lists all the tuning flags discussed in this book, with cross-references to the chapter where they are examined.

Platforms and Conventions

While this book is about the performance of Java, that performance will be influenced by a few factors: the version of Java itself, of course, as well as the hardware and software platforms it is running on.

Java Platforms

This book covers the performance of the Oracle HotSpot Java Virtual Machine (JVM) and the Java Development Kit (JDK), versions 8 and 11. This is also known as Java, Standard Edition (SE). The Java Runtime Environment (JRE) is a subset of the JDK containing only the JVM, but since the tools in the JDK are important for performance analysis, the JDK is the focus of this book. As a practical matter, that means it also covers platforms derived from the OpenJDK repository of that technology, which includes the JVMs released from the AdoptOpenJDK project. Strictly speaking, the Oracle binaries require a license for production use, and the AdoptOpenJDK binaries come with an open source license. For our purposes, we’ll consider the two versions to be the same thing, which we’ll refer to as the JDK or the Java platform.1

These releases have gone through various bug fix releases. As I write this, the current version of Java 8 is jdk8u222 (version 222), and the current version of Java 11 is 11.0.5. It is important to use at least these versions (if not later), particularly in the case of Java 8. Early releases of Java 8 (through about jdk8u60) do not contain many of the important performance enhancements and features discussed throughout this book (particularly so with regard to garbage collection and the G1 garbage collector).

These versions of the JDK were selected because they carry long-term support (LTS) from Oracle. The Java community is free to develop its own support models but so far has followed the Oracle model. So these releases will be supported and available for quite some time: through at least 2023 for Java 8 (via AdoptOpenJDK; later via extended Oracle support contracts), and through at least 2022 for Java 11. The next long-term release is expected to be in late 2021.

For the interim releases, the discussion of Java 11 obviously includes features that were first made available in Java 9 or Java 10, even though those releases are unsupported both by Oracle and by the community at large. In fact, I’m somewhat imprecise when discussing such features; it may seem that I’m saying features X and Y were originally included in Java 11 when they may have been available in Java 9 or 10. Java 11 is the first LTS release that carries those features, and that’s the important part: since Java 9 and 10 aren’t in use, it doesn’t really matter when the feature first appeared. Similarly, although Java 13 will be out at the time of this book’s release, there isn’t a lot of coverage of Java 12 or Java 13. You can use those releases in production, but only for six months, after which you’ll need to upgrade to a new release (so by the time you’re reading this, Java 12 is no longer supported, and if Java 13 is supported, it will soon be replaced by Java 14). We’ll peek into a few features of these interim releases, but since those releases are not likely to be put into production in most environments, the focus remains on Java 8 and 11.

Other implementations of the Java Language Specification are available, including forks of the open source implementation. AdoptOpenJDK supplies one of these (Eclipse OpenJ9), and others are available from other vendors. Although all these platforms must pass a compatibility test in order to use the Java name, that compatibility does not always extend to the topics discussed in this book. This is particularly true of tuning flags. All JVM implementations have one or more garbage collectors, but the flags to tune each vendor’s GC implementation are product-specific. Thus, while the concepts of this book apply to any Java implementation, the specific flags and recommendations apply only to the HotSpot JVM.

A similar caveat applies across releases of the HotSpot JVM itself: flags and their default values change from release to release. The flags discussed here are valid for Java 8 (specifically, version 222) and 11 (specifically, 11.0.5). Later releases could slightly change some of this information. Always consult the release notes for important changes.

At an API level, different JVM implementations are much more compatible, though even then subtle differences might exist between the way a particular class is implemented in the Oracle HotSpot Java platform and an alternate platform. The classes must be functionally equivalent, but the actual implementation may change. Fortunately, that is infrequent, and unlikely to drastically affect performance.

For the remainder of this book, the terms Java and JVM should be understood to refer specifically to the Oracle HotSpot implementation. Strictly speaking, saying “The JVM does not compile code upon first execution” is wrong; some Java implementations do compile code the first time it is executed. But that shorthand is much easier than continuing to write (and read), “The Oracle HotSpot JVM…”

JVM tuning flags

With a few exceptions, the JVM accepts two kinds of flags: boolean flags, and flags that require a parameter.

Boolean flags use this syntax: -XX:+FlagName enables the flag, and -XX:-FlagName disables the flag.

Flags that require a parameter use this syntax: -XX:FlagName=something, meaning to set the value of FlagName to something. In the text, the value of the flag is usually rendered with something indicating an arbitrary value. For example, -XX:NewRatio=N means that the NewRatio flag can be set to an arbitrary value N (where the implications of N are the focus of the discussion).
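
For example, a command line combining both styles might look like this (the flags shown are real HotSpot flags, but the particular values and the application jar name are purely illustrative):

java -XX:+UseParallelGC -XX:NewRatio=3 -jar myapp.jar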

The default value of each flag is discussed as the flag is introduced. That default is often based on a combination of factors: the platform on which the JVM is running and other command-line arguments to the JVM. When in doubt, “Basic VM Information” shows how to use the -XX:+PrintFlagsFinal flag (by default, false) to determine the default value for a particular flag in a particular environment, given a particular command line. The process of automatically tuning flags based on the environment is called ergonomics.
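
As a quick illustration of inspecting an ergonomic default, you can print the final flag values and search for the flag of interest; on a Unix-style system, that looks like this (the exact output format varies by release):

java -XX:+PrintFlagsFinal -version | grep NewRatio

The matching output line shows the flag’s current value and whether it came from the default ergonomics or the command line.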

The JVM that is downloaded from Oracle and AdoptOpenJDK sites is called the product build of the JVM. When the JVM is built from source code, many builds can be produced: debug builds, developer builds, and so on. These builds often have additional functionality. In particular, developer builds include an even larger set of tuning flags so that developers can experiment with the most minute operations of various algorithms used by the JVM. Those flags are generally not considered in this book.

Hardware Platforms

When the first edition of this book was published, the hardware landscape looked different than it does today. Multicore machines were popular, but 32-bit platforms and single-CPU platforms were still very much in use. Other platforms in use today—virtual machines and software containers—were coming into their own. Here’s an overview of how those platforms affect the topics of this book.

Multicore hardware

Virtually all machines today have multiple cores of execution, which appear to the JVM (and to any other program) as multiple CPUs. Typically, each core is enabled for hyper-threading. Hyper-threading is the term that Intel prefers, though AMD (and others) use the term simultaneous multithreading, and some chip manufacturers refer to hardware strands within a core. These are all the same thing, and we’ll refer to this technology as hyper-threading.

From a performance perspective, the important thing about a machine is its number of cores. Let’s take a basic four-core machine: each core can (for the most part) process independently of the others, so a machine with four cores can achieve four times the throughput of a machine with a single core. (This depends on other factors about the software, of course.)

In most cases, each core will contain two hardware or hyper-threads. These threads are not independent of each other: the core can run only one of them at a time. Often, the thread will stall: it will, for example, need to load a value from main memory, and that process can take a few cycles. In a core with a single thread, the thread stalls at that point, and those CPU cycles are wasted. In a core with two threads, the core can switch and execute instructions from the other thread.

So our four-core machine with hyper-threading enabled appears as if it can execute instructions from eight threads at once (even though, technically, it can execute only four instructions per CPU cycle). To the operating system—and hence to Java and other applications—the machine appears to have eight CPUs. But all of those CPUs are not equal from a performance perspective. If we run one CPU-bound task, it will use one core; a second CPU-bound task will use a second core; and so on up to four: we can run four independent CPU-bound tasks and get our fourfold increase in throughput.
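
You can check the CPU count that the JVM sees with a one-line program; on our hypothetical four-core machine with hyper-threading enabled, this prints 8:

public class CpuCount {
    public static void main(String[] args) {
        // The number of CPUs visible to the JVM, including hyper-threads
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}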

If we add a fifth task, it will be able to run only when one of the other tasks stalls, which on average turns out to happen between 20% and 40% of the time. Each additional task faces the same challenge. So adding a fifth task adds only about 30% more performance; in the end, the eight CPUs will give us about five to six times the performance of a single core (without hyper-threading): four fully utilized cores plus four hyper-threads contributing roughly 30% each works out to about 4 + (4 × 0.3) ≈ 5.2 times the single-core throughput.

You’ll see this example in a few sections. Garbage collection is very much a CPU-bound task, so Chapter 5 shows how hyper-threading affects the parallelization of garbage collection algorithms. Chapter 9 discusses in general how to exploit Java’s threading facilities to best effect, so you’ll see an example of the scaling of hyper-threaded cores there as well.

Software containers

The biggest change in Java deployments in recent years is that they are now frequently deployed within a software container. That change is not limited to Java, of course; it’s an industry trend hastened by the move to cloud computing.

Two kinds of containers are important here. First is the virtual machine, which sets up a completely isolated copy of the operating system on a subset of the hardware on which the virtual machine is running. This is the basis of cloud computing: your cloud computing vendor has a data center with very large machines. These machines may have 128 cores or more, though for reasons of cost efficiency they are likely somewhat smaller. From the perspective of the virtual machine, that doesn’t really matter: the virtual machine is given access to a subset of that hardware. Hence, a given virtual machine may have two cores (and four CPUs, since they are usually hyper-threaded) and 16 GB of memory.

From Java’s perspective (and the perspective of other applications), that virtual machine is indistinguishable from a regular machine with two cores and 16 GB of memory. For tuning and performance purposes, you need only consider it in the same way.

The second container of note is the Docker container. A Java process running inside a Docker container doesn’t necessarily know it is in such a container (though it could figure it out by inspection), but the Docker container is just a process (potentially with resource constraints) within a running OS. As such, its isolation from other processes’ CPU and memory usage is somewhat different. As you’ll see, the way Java handles that differs between early versions of Java 8 (up until update 192) and later versions of Java 8 (and all versions of Java 11).

By default, a Docker container is free to use all of the machine’s resources: it can use all the available CPUs and all the available memory on the machine. That’s fine if we want to use Docker merely to streamline deployment of our single application on the machine (and hence the machine will run only that Docker container). But frequently we want to deploy multiple Docker containers on a machine and restrict the resources of each container. In effect, given our four-core machine with 16 GB of memory, we might want to run two Docker containers, each with access to only two cores and 8 GB of memory.
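
Restricting a container in that way takes only two standard Docker options (the image name here is hypothetical):

docker run --cpus=2 --memory=8g my-java-image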

Configuring Docker to do that is simple enough, but complications can arise at the Java level. Numerous Java resources are configured automatically (or ergonomically) based on the size of the machine running the JVM. This includes the default heap size and the number of threads used by the garbage collector, explained in detail in Chapter 5, and some thread pool settings, mentioned in Chapter 9.

If you are running a recent version of Java 8 (update version 192 or later) or Java 11, the JVM handles this as you would hope: if you limit the Docker container to use only two cores, the values that are ergonomically set from the machine’s CPU count will instead reflect the limit given to the Docker container.2 Similarly, heap and other settings that by default are based on the amount of memory on a machine will be based on any memory limit given to the Docker container.

In earlier versions of Java 8, the JVM has no knowledge of any limits that the container will enforce: when it inspects the environment to find out how much memory is available so it can calculate its default heap size, it will see all the memory on the machine (instead of, as we would prefer, the amount of memory the Docker container is allowed to use). Similarly, when it checks how many CPUs are available to tune the garbage collector, it will see all the CPUs on the machine, rather than the number of CPUs assigned to the Docker container. As a result, the JVM will run suboptimally: it will start too many threads and will set up too large a heap. Having too many threads will lead to some performance degradation, but the real issue here is the memory: the maximum size of the heap will potentially be larger than the memory assigned to the Docker container. When the heap grows to that size, the Docker container (and hence the JVM) will be killed.

In early Java 8 versions, you can set the appropriate values for the memory and CPU usage by hand. As we come across those tunings, I’ll point out the ones that will need to be adjusted for this situation, but it is better simply to upgrade to a later Java 8 version (or Java 11).
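
As a sketch of that manual tuning, given a container limited to two CPUs and 8 GB of memory, you might cap the heap below the container’s memory limit (leaving headroom for nonheap memory) and restrict the GC threads; the flags are real HotSpot flags, but the exact values here are only illustrative:

java -Xmx6g -XX:ParallelGCThreads=2 -jar myapp.jar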

Docker containers provide one additional challenge to Java: Java comes with a rich set of tools for diagnosing performance issues. These are often not available in a Docker container. We’ll look at that issue a little more in Chapter 3.

The Complete Performance Story

This book is focused on how to best use the JVM and Java platform APIs so that programs run faster, but many outside influences affect performance. Those influences pop up from time to time in the discussion, but because they are not specific to Java, they are not necessarily discussed in detail. The performance of the JVM and the Java platform is a small part of getting to fast performance.

This section introduces the outside influences that are at least as important as the Java tuning topics covered in this book. The Java knowledge-based approach of this book complements these influences, but many of them are beyond the scope of what we’ll discuss.

Write Better Algorithms

Many details about Java affect the performance of an application, and a lot of tuning flags are discussed. But there is no magical -XX:+RunReallyFast option.

Ultimately, the performance of an application is based on how well it is written. If the program loops through all elements in an array, the JVM will optimize the way it performs bounds checking of the array so that the loop runs faster, and it may unroll the loop operations to provide an additional speedup. But if the purpose of the loop is to find a specific item, no optimization in the world is going to make the array-based code as fast as a different version that uses a hash map.
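
As a minimal, self-contained sketch of that point, compare a linear scan of an array with a hash map lookup; the JVM can streamline the loop, but it cannot change the fact that the scan inspects elements one by one:

import java.util.HashMap;
import java.util.Map;

public class LookupExample {
    // Linear scan: bounds checks may be optimized away and the loop
    // unrolled, but the search still examines each element in turn.
    static int findIndex(String[] names, String target) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = { "alpha", "beta", "gamma" };

        // Hash map: a single hash computation locates the entry directly
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i], i);
        }

        System.out.println(findIndex(names, "gamma")); // O(n) scan
        System.out.println(positions.get("gamma"));    // O(1) lookup
    }
}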

A good algorithm is the most important thing when it comes to fast performance.

Write Less Code

Some of us write programs for money, some for fun, some to give back to a community, but all of us write programs (or work on teams that write programs). It is hard to feel like you’re making a contribution to a project by pruning code, and some managers still evaluate developers by the amount of code they write.

I get that, but the conflict here is that a small well-written program will run faster than a large well-written program. This is generally true for all computer programs, and it applies specifically to Java programs. The more code that has to be compiled, the longer it will take until that code runs quickly. The more objects that have to be allocated and discarded, the more work the garbage collector has to do. The more objects that are allocated and retained, the longer a GC cycle will take. The more classes that have to be loaded from disk into the JVM, the longer it will take for a program to start. The more code that is executed, the less likely that it will fit in the hardware caches on the machine. And the more code that has to be executed, the longer that execution will take.

I think of this as the “death by 1,000 cuts” principle. Developers will argue that they are just adding a very small feature and it will take no time at all (especially if the feature isn’t used). And then other developers on the same project make the same claim, and suddenly the performance has regressed by a few percent. The cycle is repeated in the next release, and now program performance has regressed by 10%. A couple of times during the process, performance testing may hit a certain resource threshold—a critical point in memory use, a code cache overflow, or something like that. In those cases, regular performance tests will catch that particular condition, and the performance team can fix what appears to be a major regression. But over time, as the small regressions creep in, it will be harder and harder to fix them.

I’m not advocating that you should never add a new feature or new code to your product; clearly benefits result from enhancing programs. But be aware of the trade-offs you are making, and when you can, streamline.

Oh, Go Ahead, Prematurely Optimize

Donald Knuth is widely credited with coining the term premature optimization, which is often used by developers to claim that the performance of their code doesn’t matter, and if it does matter, we won’t know that until the code is run. The full quote, if you’ve never come across it, is “We should forget about small efficiencies, say about 97% of the time; premature optimization is the root of all evil.”3

The point of this dictum is that in the end, you should write clean, straightforward code that is simple to read and understand. In this context, optimizing is understood to mean employing algorithmic and design changes that complicate program structure but provide better performance. Those kinds of optimizations indeed are best left undone until such time as the profiling of a program shows that a large benefit is gained from performing them.

What optimization does not mean in this context, however, is avoiding code constructs that are known to be bad for performance. Every line of code involves a choice, and if you have a choice between two simple, straightforward ways of programming, choose the better-performing one.

At one level, this is well understood by experienced Java developers (it is an example of their art, as they have learned it over time). Consider this code:

log.log(Level.FINE, "I am here, and the value of X is "
        + calcX() + " and Y is " + calcY());

This code does a string concatenation that is likely unnecessary, since the message won’t be logged unless the logging level is set quite high. If the message isn’t printed, unnecessary calls are also made to the calcX() and calcY() methods. Experienced Java developers will reflexively reject that; some IDEs will even flag the code and suggest it be changed. (Tools aren’t perfect, though: the NetBeans IDE will flag the string concatenation, but the suggested improvement retains the unneeded method calls.)

This logging code is better written like this:

if (log.isLoggable(Level.FINE)) {
    log.log(Level.FINE,
            "I am here, and the value of X is {0} and Y is {1}",
            new Object[]{calcX(), calcY()});
}

This avoids the string concatenation altogether (the message format isn’t necessarily more efficient, but it is cleaner), and there are no method calls or allocation of the object array unless logging has been enabled.

Writing code in this way is still clean and easy to read; it took no more effort than writing the original code. Well, OK, it required a few more keystrokes and an extra line of logic. But it isn’t the type of premature optimization that should be avoided; it’s the kind of choice that good coders learn to make.
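
Incidentally, on Java 8 and later, java.util.logging also provides Supplier-based methods that achieve the same laziness without an explicit guard; a minimal sketch:

// The lambda body runs only if FINE logging is enabled, so
// calcX() and calcY() are not called otherwise
log.fine(() -> "I am here, and the value of X is "
        + calcX() + " and Y is " + calcY());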

Don’t let out-of-context dogma from pioneering heroes prevent you from thinking about the code you are writing. You’ll see other examples of this throughout this book, including in Chapter 9, which discusses the performance of a benign-looking loop construct to process a vector of objects.

Look Elsewhere: The Database Is Always the Bottleneck

If you are developing standalone Java applications that use no external resources, the performance of that application is (mostly) all that matters. Once an external resource (a database, for example) is added, the performance of both programs is important. And in a distributed environment—say with a Java REST server, a load balancer, a database, and a backend enterprise information system—the performance of the Java server may be the least of the performance issues.

This is not a book about holistic system performance. In such an environment, a structured approach must be taken toward all aspects of the system. CPU usage, I/O latencies, and throughput of all parts of the system must be measured and analyzed; only then can we determine which component is causing the performance bottleneck. Excellent resources are available on that subject, and those approaches and tools are not specific to Java. I assume you’ve done that analysis and determined that it is the Java component of your environment that needs to be improved.

On the other hand, don’t overlook that initial analysis. If the database is the bottleneck (and here’s a hint: it is), tuning the Java application accessing the database won’t help overall performance at all. In fact, it might be counterproductive. As a general rule, when load is increased into a system that is overburdened, performance of that system gets worse. If something is changed in the Java application that makes it more efficient—which only increases the load on an already overloaded database—overall performance may actually go down. The danger is then reaching the incorrect conclusion that the particular JVM improvement shouldn’t be used.

This principle—that increasing load to a component in a system that is performing badly will make the entire system slower—isn’t confined to a database. It applies when load is added to a CPU-bound server, when more threads start accessing a lock that already has threads waiting for it, or in any number of other scenarios. An extreme example of this that involves only the JVM is shown in Chapter 9.

Optimize for the Common Case

It is tempting—particularly given the “death by 1,000 cuts” syndrome—to treat all performance aspects as equally important. But we should focus on the common use case scenarios. This principle manifests itself in several ways:

  • Optimize code by profiling it and focusing on the operations in the profile taking the most time. Note, however, that this does not mean looking at only the leaf methods in a profile (see Chapter 3).

  • Apply Occam’s razor to diagnosing performance problems. The simplest explanation for a performance issue is the most likely cause: a performance bug in new code is more likely than a configuration issue on a machine, which in turn is more likely than a bug in the JVM or operating system. Obscure OS or JVM bugs do exist, and as more credible causes for a performance issue are ruled out, it does become possible that somehow the test case in question has triggered such a latent bug. But don’t jump to the unlikely case first.

  • Write simple algorithms for the most common operations in an application. Say a program estimates a mathematical formula, and the user can choose whether to get an answer within a 10% margin of error or a 1% margin. If most users will be satisfied with the 10% margin, optimize that code path—even if it means slowing down the code that provides the 1% margin of error.
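
As a minimal, self-contained sketch of that last point, consider estimating square roots with Newton’s method, where fewer iterations return a coarser answer faster (the iteration counts here are purely illustrative):

public class Estimator {
    // Newton's method: each iteration refines the guess
    static double sqrtEstimate(double x, int iterations) {
        double guess = x / 2;
        for (int i = 0; i < iterations; i++) {
            guess = (guess + x / guess) / 2;
        }
        return guess;
    }

    public static void main(String[] args) {
        // Common case: a fast, rough answer is good enough
        System.out.println(sqrtEstimate(1000, 5));
        // Rare case: it is acceptable for the precise path to be slower
        System.out.println(sqrtEstimate(1000, 10));
    }
}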

Summary

Java has features and tools that make it possible to get the best performance from a Java application. This book will help you understand how best to use all the features of the JVM in order to end up with fast-running programs.

In many cases, though, remember that the JVM is a small part of the overall performance picture. A systemic approach to performance is required in Java environments where the performance of databases and other backend systems is at least as important as the performance of the JVM. That level of performance analysis is not the focus of this book—it is assumed due diligence has been performed to make sure that the Java component of the environment is the important bottleneck in the system.

However, the interaction between the JVM and other areas of the system is equally important—whether that interaction is direct (e.g., the best way to make database calls) or indirect (e.g., optimizing native memory usage of an application that shares a machine with several components of a large system). The information in this book should help solve performance issues along those lines as well.

1 Rarely, differences between the two exist; for example, the AdoptOpenJDK versions of Java contain new garbage collectors in JDK 11. I’ll point out those differences when they occur.

2 You can specify fractional values for CPU limits in Docker. Java rounds up all fractional values to the next highest integer.

3 There is some dispute over who said this originally, Donald Knuth or Tony Hoare, but it appears in an article by Knuth entitled “Structured Programming with goto Statements.” And in context, it is an argument for optimizing code, even if it requires inelegant solutions like a goto statement.
