Java Performance: The Definitive Guide

Chapter 1. Introduction

This is a book about the art and science of Java performance.

The science part of this statement isn’t surprising; discussions about performance include lots of numbers and measurements and analytics. Most performance engineers have a background in the sciences, and applying scientific rigor is a crucial part of achieving maximum performance.

What about the art part? The notion that performance tuning is part art and part science is hardly new, but it is rarely given explicit acknowledgment in performance discussions. This is partly because the idea of “art” goes against our training.

Part of the reason is that what looks like art to some people is fundamentally based on deep knowledge and experience. It is said that magic is indistinguishable from sufficiently advanced technologies, and certainly it is true that a cell phone would look magical to a knight of the Round Table. Similarly, the work produced by a good performance engineer may look like art, but that art is really an application of deep knowledge, experience, and intuition.

This book cannot help with the experience and intuition part of that equation, but its goal is to help with the deep knowledge—with the view that applying knowledge over time will help you develop the skills needed to be a good Java performance engineer. The goal is to give you an in-depth understanding of the performance aspects of the Java platform.

This knowledge falls into two broad categories. First is the performance of the Java Virtual Machine (JVM) itself: the way in which the JVM is configured affects many aspects of the performance of a program. Developers who are experienced in other languages may find the need for tuning to be somewhat irksome, though in reality tuning the JVM is completely analogous to testing and choosing compiler flags during compilation for C++ programmers, or to setting appropriate variables in a php.ini file for PHP coders, and so on.

The second aspect is to understand how the features of the Java platform affect performance. Note the use of the word platform here: some features (e.g., threading and synchronization) are part of the language, and some features (e.g., XML parsing performance) are part of the standard Java API. Though there are important distinctions between the Java language and the Java API, in this case they will be treated similarly. This book covers both facets of the platform.

The performance of the JVM is based largely on tuning flags, while the performance of the platform is determined more by using best practices within your application code. In an environment where developers code and a performance group tests, these are often considered separate areas of expertise: only performance engineers can tune the JVM to eke out every last bit of performance, and only developers worry about whether their code is written well. That is not a useful distinction: anyone who works with Java should be equally adept at understanding how code behaves in the JVM and what kinds of tuning are likely to help its performance. Knowledge of the complete sphere is what will give your work the patina of art.

A Brief Outline

First things first, though: Chapter 2 discusses general methodologies for testing Java applications, including pitfalls of Java benchmarking. Since performance analysis requires visibility into what the application is doing, Chapter 3 provides an overview of some of the tools available to monitor Java applications.

Then it is time to dive into performance, focusing first on common tuning aspects: just-in-time compilation (Chapter 4) and garbage collection (Chapter 5 and Chapter 6). The remaining chapters focus on best practice uses of various parts of the Java platform: memory use with the Java heap (Chapter 7), native memory use (Chapter 8), thread performance (Chapter 9), Java Enterprise Edition APIs (Chapter 10), JPA and JDBC (Chapter 11), and some general Java SE API tips (Chapter 12).

Appendix A lists all the tuning flags discussed in this book, with cross-references to the chapter where they are examined.

Platforms and Conventions

This book is based on the Oracle HotSpot Java Virtual Machine and the Java Platform, Standard Edition (Java SE), versions 7 and 8. Within versions, Oracle provides update releases periodically. For the most part, update releases provide only bug fixes; they never provide new language features or changes to key functionality. However, update releases do sometimes change the default value of tuning flags. Oracle will doubtless provide update releases that postdate publication of this book, which is current as of Java 7 update 40 and Java 8 (as of yet, there are no Java 8 update releases). When an update release provides an important change to JVM behavior, the update release is specified like this: 7u6 (Java 7 update 6).

Sections on Java Enterprise Edition (Java EE) are based on Java EE 7.

This book does not address the performance of previous releases of Java, though of course the current versions of Java build on those releases. Java 7 is a good starting point for a book on performance because it introduces a number of new performance features and optimizations. Chief among these is a new garbage collection (GC) algorithm called G1. (Earlier versions of Java had experimental versions of G1, but it was not considered production-ready until 7u4.) Java 7 also includes a number of new and enhanced performance-related tools to provide vastly increased visibility into the workings of a Java application. That progress in the platform is continued in Java 8, which further enhances the platform (e.g., by introducing lambda expressions). Java 8 offers a big performance advantage in its own right: Java 8 itself is much faster than Java 7 in several key areas.

There are other implementations of the Java Virtual Machine. Oracle has its JRockit JVM (which supports Java SE 6); IBM offers its own compatible Java implementation (including a Java 7 version). Many other companies license and enhance Oracle’s Java technology.

Java and the JVM are open source; anyone may participate in the development of Java by joining the project at http://openjdk.java.net. Even if you don’t want to actively participate in development, source code can be freely downloaded from that site. For the most part, everything discussed in this book is part of the open source version of Java.

Oracle also has a commercial version of Java, which is available via a support contract. That is based on the standard, open source Java platform, but it contains a few features that are not in the open source version. One feature of the commercial JVM that is important to performance work is Java Flight Recorder (see Java Flight Recorder).

Unless otherwise mentioned, all information in this book applies to the open source version of Java.

Although all these platforms must pass a compatibility test in order to be able to use the Java name, that compatibility does not always extend to the topics discussed in this book. This is particularly true of tuning flags. All JVM implementations have one or more garbage collectors, but the flags to tune each vendor’s GC implementation are product-specific. Thus, while the concepts of this book apply to any Java implementation, the specific flags and recommendations apply only to Oracle’s standard (HotSpot-based) JVM.

That caveat also applies to earlier releases of the HotSpot JVM: flags and their default values change from release to release. Rather than attempting to be comprehensive and cover a variety of now-outdated versions, the information in this book covers only Java 7 (up through 7u40) and Java 8 (the initial release only) JVMs. It is possible that later releases (e.g., a hypothetical 7u60) may slightly change some of this information. Always consult the release notes for important changes.

At an API level, different JVM implementations are much more compatible, though even then there might be subtle differences between the way a particular class is implemented in the Oracle HotSpot Java SE (or EE) platform and an alternate platform. The classes must be functionally equivalent, but the actual implementation may change. Fortunately, that is infrequent, and unlikely to drastically affect performance.

For the remainder of this book, the terms Java and JVM should be understood to refer specifically to the Oracle HotSpot implementation. Strictly speaking, saying “The JVM does not compile code upon first execution” is wrong; there are Java implementations that do compile code the first time it is executed. But that shorthand is much easier than continuing to write (and read) “The Oracle HotSpot JVM…”

JVM Tuning Flags

With a few exceptions, the JVM accepts two kinds of flags: boolean flags, and flags that require a parameter.

Boolean flags use this syntax: -XX:+FlagName enables the flag, and -XX:-FlagName disables the flag.

Flags that require a parameter use this syntax: -XX:FlagName=something, meaning to set the value of FlagName to something. In the text, the value of the flag is usually rendered with something indicating an arbitrary value. For example, -XX:NewRatio=N means that the NewRatio flag can be set to some arbitrary value N (where the implications of N are the focus of the discussion).

The default value of each flag is discussed as the flag is introduced. That default is often a combination of different factors: the platform on which the JVM is running and other command-line arguments to the JVM. When in doubt, Basic VM Information shows how to use the -XX:+PrintFlagsFinal flag (by default, false) to determine the default value for a particular flag in a particular environment given a particular command line. The process of automatically tuning flags based on the environment is called ergonomics.
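The value a flag has in a running JVM can also be read programmatically. The following is a minimal sketch, assuming a HotSpot-based JVM on which the com.sun.management.HotSpotDiagnosticMXBean interface is available; the FlagInspector class and its describe method are hypothetical names chosen for illustration, not part of any API.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper (an assumption for illustration): reports the current
// value of a JVM flag and how that value was set.
class FlagInspector {
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist
        VMOption option = bean.getVMOption(flagName);
        return option.getName() + "=" + option.getValue()
                + " (origin: " + option.getOrigin() + ")";
    }

    public static void main(String[] args) {
        // NewRatio is the flag used as an example in the text
        System.out.println(describe("NewRatio"));
    }
}
```

The reported origin indicates whether the value came from the built-in default, from ergonomics, or from the command line, so the output is likely to differ from one machine and command line to another.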

Java ergonomics is based on the notion that some machines are “client” class and some are “server” class. While those terms map directly to the compiler used for a particular platform (see Chapter 4), they apply to other default tunings as well. For example, the default garbage collector for a platform is determined by the class of a machine (see Chapter 5).

Client-class machines are any 32-bit JVM running on Microsoft Windows (regardless of the number of CPUs on the machine), and any 32-bit JVM running on a machine with one CPU (regardless of the operating system). All other machines (including all 64-bit JVMs) are considered server class.
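The rule above can be restated as a small predicate. This is only a sketch of the rule as worded in the text, not the logic the JVM itself uses to classify a machine; the MachineClass name and its parameters are hypothetical.

```java
// Hypothetical helper (an assumption for illustration): encodes the
// client-class rule exactly as stated in the text.
class MachineClass {
    /**
     * @param jvmBits 32 or 64: the bitness of the JVM, not of the operating system
     * @param windows true if the JVM is running on Microsoft Windows
     * @param cpus    number of CPUs on the machine
     */
    static boolean isClientClass(int jvmBits, boolean windows, int cpus) {
        if (jvmBits != 32) {
            return false; // all 64-bit JVMs are considered server class
        }
        // 32-bit JVM: client class on Windows, or on any single-CPU machine
        return windows || cpus == 1;
    }
}
```

For instance, isClientClass(32, true, 8) is true, while isClientClass(64, true, 1) is false.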

The JVM that is downloaded from Oracle and OpenJDK sites is called the “product” build of the JVM. When the JVM is built from source code, there are many different builds that can be produced: debug builds, developer builds, and so on. These builds often have additional functionality in them. In particular, developer builds include an even larger set of tuning flags so that developers can experiment with the most minute operations of various algorithms used by the JVM. Those flags are generally not considered in this book.

The Complete Performance Story

This book is focused on how to best use the JVM and Java platform APIs so that programs run faster, but there are many outside influences that affect performance. Those influences pop up from time to time in the discussion, but because they are not specific to Java, they are not necessarily discussed in detail. The performance of the JVM and the Java platform is a small part of getting to fast performance.

Here are some of the outside influences that are at least as important as the Java tuning topics covered in this book. The Java knowledge-based approach of this book complements these influences, but many of them are beyond the scope of what we’ll discuss.

Write Better Algorithms

There are a lot of details about Java that affect the performance of an application, and a lot of tuning flags are discussed. But there is no magical -XX:+RunReallyFast option.

Ultimately, the performance of an application is based on how well it is written. If the program loops through all elements in an array, the JVM will optimize the array bounds-checking so that the loop runs faster, and it may unroll the loop operations to provide an additional speedup. But if the purpose of the loop is to find a specific item, no optimization in the world is going to make the array-based code as fast as a different version that uses a HashMap.
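To make the contrast concrete, here is a minimal sketch of the two approaches. The ItemLookup class and its method names are hypothetical, and the sketch assumes the items are strings searched by equality.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example (an assumption for illustration): two ways to find
// the position of an item.
class ItemLookup {
    // Linear scan: examines up to names.length elements on every lookup.
    static int indexOfLinear(String[] names, String target) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    // Built once up front: maps each name to its first index in the array.
    static Map<String, Integer> buildIndex(String[] names) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            if (!index.containsKey(names[i])) {
                index.put(names[i], i);
            }
        }
        return index;
    }

    // Hash lookup: expected constant time, regardless of the array size.
    static int indexOfHashed(Map<String, Integer> index, String target) {
        Integer i = index.get(target);
        return i == null ? -1 : i;
    }
}
```

Building the map costs one pass over the array, so the hashed version pays off when the same data is searched repeatedly.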

A good algorithm is the most important thing when it comes to fast performance.

Write Less Code

Some of us write programs for money, some for fun, some to give back to a community, but all of us write programs (or work on teams that write programs). It is hard to feel like a contribution to the project is being made by pruning code, and there are still those managers who evaluate developers by the amount of code they write.

I get that, but the conflict here is that a small well-written program will run faster than a large well-written program. This is true in general of all computer programs, and it applies specifically to Java programs. The more code that has to be compiled, the longer it will take until that code runs quickly. The more objects that have to be allocated and discarded, the more work the garbage collector has to do. The more objects that are allocated and retained, the longer a GC cycle will take. The more classes that have to be loaded from disk into the JVM, the longer it will take for a program to start. The more code that is executed, the less likely that it will fit in the hardware caches on the machine. And the more code that has to be executed, the longer it will take.

One aspect of performance that can be counterintuitive (and depressing) is that the performance of every application can be expected to decrease over time—meaning over new release cycles of the application. Often, that performance difference is not noticed, since hardware improvements make it possible to run the new programs at acceptable speeds.

Think what it would be like to run the Windows Aero interface on the same computer that used to run Windows 95. My favorite computer ever was a Mac Quadra 950, but it couldn’t run Mac OS X (and if it did, it would be so very, very slow compared to Mac OS 7.5). On a smaller level, it may seem that Firefox 23.0 is faster than Firefox 22.0, but those are essentially minor release versions. With its tabbed browsing and synced scrolling and security features, Firefox is far more powerful than Mosaic ever was, but Mosaic can load basic HTML files located on my hard disk about 50% faster than Firefox 23.0.

Of course, Mosaic cannot load actual URLs from almost any popular website; it is no longer possible to use Mosaic as a primary browser. That is also part of the general point here: particularly between minor releases, code may be optimized and run faster. As performance engineers, that’s what we can focus on, and if we are good at our job, we can win the battle. That is a good and valuable thing; my argument isn’t that we shouldn’t work to improve the performance of existing applications.

But the irony remains: as new features are added and new standards adopted—which is a requirement to match competing programs—programs can be expected to get larger and slower.

I think of this as the “death by 1,000 cuts” principle. Developers will argue that they are just adding a very small feature and it will take no time at all (especially if the feature isn’t used). And then other developers on the same project make the same claim, and suddenly the performance has regressed by a few percent. The cycle is repeated in the next release, and now program performance has regressed by 10%. A couple of times during the process, performance testing may hit some resource threshold—a critical point in memory use, or a code cache overflow, or something like that. In those cases, regular performance tests will catch that particular condition and the performance team can fix what appears to be a major regression. But over time, as the small regressions creep in, it will be harder and harder to fix them.

I’m not advocating here that you should never add a new feature or new code to your product; clearly there are benefits as programs are enhanced. But be aware of the trade-offs you are making, and when you can, streamline.

Oh Go Ahead, Prematurely Optimize

Donald Knuth is widely credited with coining the term “premature optimization,” which is often used by developers to claim that the performance of their code doesn’t matter, and if it does matter, we won’t know that until the code is run. The full quote, if you’ve never come across it, is “We should forget about small efficiencies, say about 97% of the time; premature optimization is the root of all evil.”

The point of this dictum is that in the end, you should write clean, straightforward code that is simple to read and understand. In this context, “optimizing” is understood to mean employing algorithmic and design changes that complicate program structure but provide better performance. Those kinds of optimizations indeed are best left undone until such time as the profiling of a program shows that there is a large benefit from performing them.

What optimization does not mean in this context, however, is avoiding code constructs that are known to be bad for performance. Every line of code involves a choice, and if there is a choice between two simple, straightforward ways of programming, choose the more performant one.

At one level, this is well understood by experienced Java developers (it is an example of their art, as they have learned it over time). Consider this code:

log.log(Level.FINE, "I am here, and the value of X is "
        + calcX() + " and Y is " + calcY());

This code does a string concatenation that is likely unnecessary, since the message won’t be logged unless the logging level is set quite high. If the message isn’t printed, then unnecessary calls are also made to the calcX() and calcY() methods. Experienced Java developers will reflexively reject that; some IDEs (such as NetBeans) will even flag the code and suggest it be changed. (Tools aren’t perfect, though: NetBeans will flag the string concatenation, but the suggested improvement retains the unneeded method calls.)

This logging code is better written like this:

if (log.isLoggable(Level.FINE)) {
    log.log(Level.FINE,
            "I am here, and the value of X is {0} and Y is {1}",
            new Object[]{calcX(), calcY()});
}

This avoids the string concatenation altogether (the message format isn’t necessarily more efficient, but it is cleaner), and there are no method calls or allocation of the object array unless logging has been enabled.
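The effect of the guard can be observed directly by counting how often the expensive method runs. The following is a minimal sketch, assuming java.util.logging with the logger level set to INFO so that FINE messages are discarded; the GuardedLoggingDemo class and its call counter are hypothetical. Note that java.util.logging substitutes parameters using java.text.MessageFormat-style placeholders such as {0}.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demonstration (an assumption for illustration): counts calls
// to calcX() with and without the isLoggable() guard.
class GuardedLoggingDemo {
    static final Logger log = Logger.getLogger("GuardedLoggingDemo");
    static int calls = 0; // number of times calcX() has been invoked

    static {
        log.setLevel(Level.INFO); // FINE messages are not loggable
    }

    static int calcX() {
        calls++;
        return 42;
    }

    // The argument is evaluated before log() can check the level,
    // so calcX() always runs.
    static void unguarded() {
        log.log(Level.FINE, "X is " + calcX());
    }

    // calcX() runs only if the message would actually be logged.
    static void guarded() {
        if (log.isLoggable(Level.FINE)) {
            log.log(Level.FINE, "X is {0}", calcX());
        }
    }
}
```

With the level at INFO, calling unguarded() increments the counter even though nothing is logged, while calling guarded() leaves it unchanged.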

Writing code in this way is still clean and easy to read; it took no more effort than writing the original code. Well, OK, it required a few more keystrokes and an extra line of logic. But it isn’t the type of premature optimization that should be avoided; it’s the kind of choice that good coders learn to make. Don’t let out-of-context dogma from pioneering heroes prevent you from thinking about the code you are writing.

We’ll see other examples of this throughout this book, including in Chapter 9, which discusses the performance of a benign-looking loop construct to process a Vector of objects.

Look Elsewhere: The Database Is Always the Bottleneck

If you are developing standalone Java applications that use no external resources, the performance of that application is (mostly) all that matters. Once an external resource—a database, for example—is added, then the performance of both programs is important. And in a distributed environment, say with a Java EE application server, a load balancer, a database, and a backend enterprise information system, the performance of the Java application server may be the least of the performance issues.

This is not a book about holistic system performance. In such an environment, a structured approach must be taken toward all aspects of the system. CPU usage, I/O latencies, and throughput of all parts of the system must be measured and analyzed; only then can it be determined which component is causing the performance bottleneck. There are a number of excellent resources on that subject, and those approaches and tools are not really specific to Java. I assume you’ve done that analysis and determined that it is the Java component of your environment that needs to be improved.

The performance of the database is the example used in this section, but any part of the environment may be the source of a performance issue.

I once faced an issue where a customer was installing a new version of an application server, and testing showed that the requests sent to the server took longer and longer over time. Applying Occam’s Razor (see the next tip) led me to consider all aspects of the application server that might be causing the issue.

After those were ruled out, the performance issue remained, and there was no backend database on which to place the blame. The next most likely issue, therefore, was the test harness, and some profiling determined that the load generator, Apache JMeter, was the source of the regression: it was keeping every response in a list, and when a new response came in, it processed the entire list in order to calculate the 90th percentile response time (if that term is unfamiliar, see Chapter 2).
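The pattern described here is easy to write by accident. The following sketch is a hypothetical reconstruction of the general pattern, not JMeter's actual code: the NaivePercentileTracker class is invented for illustration, and it assumes the nearest-rank definition of a percentile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reconstruction (NOT JMeter's code): every response is kept,
// and the whole list is reprocessed each time a new response arrives.
class NaivePercentileTracker {
    private final List<Long> responseTimes = new ArrayList<>();

    // Called once per response; the cost of each call grows with the
    // number of responses already recorded.
    long recordAndGet90th(long millis) {
        responseTimes.add(millis);
        List<Long> sorted = new ArrayList<>(responseTimes);
        Collections.sort(sorted); // O(n log n) work on every single response
        // Nearest-rank method: rank = ceil(0.9 * n), in integer arithmetic
        int rank = (9 * sorted.size() + 9) / 10;
        return sorted.get(rank - 1);
    }
}
```

Because each call copies and sorts everything seen so far, the time spent per response keeps rising over a long test run, which would produce the kind of steadily growing response times described above.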

Performance issues can be caused by any part of the entire system where an application is deployed. Common case analysis says to consider the newest part of the system first (which is often the application in the JVM), but be prepared to look at every possible component of the environment.

On the other hand, don’t overlook that initial analysis. If the database is the bottleneck (and here’s a hint: it is), then tuning the Java application accessing the database won’t help overall performance at all. In fact, it might be counterproductive. As a general rule, when load is increased on a system that is overburdened, performance of that system gets worse. If something is changed in the Java application that makes it more efficient (which only increases the load on an already-overloaded database), overall performance may actually go down. The danger there is then reaching the incorrect conclusion that the particular JVM improvement shouldn’t be used.

This principle—that increasing load to a component in a system that is performing badly will make the entire system slower—isn’t confined to a database. It applies when load is added to an application server that is CPU-bound, or if more threads start accessing a lock that already has threads waiting for it, or any of a number of other scenarios. An extreme example of this that involves only the JVM is shown in Chapter 9.

Optimize for the Common Case

It is tempting—particularly given the “death by 1,000 cuts” syndrome—to treat all performance aspects as equally important. But focus should be given to the common use case scenarios.

This principle manifests itself in several ways:

Optimize code by profiling it and focusing on the operations in the profile taking the most time. Note, however, that this does not mean looking at only the leaf methods in a profile (see Chapter 3).
Apply Occam’s Razor to diagnosing performance problems. The simplest explanation for a performance issue is the most conceivable cause: a performance bug in new code is more likely than a configuration issue on a machine, which in turn is more likely than a bug in the JVM or operating system. Obscure bugs do exist, and as more credible causes for a performance issue are ruled out, it does become possible that somehow the test case in question has triggered such a latent bug. But don’t jump to the unlikely case first.
Write simple algorithms for the most common operations in an application. Take the case of a program that estimates some mathematical formula, where the user can decide if she wants an answer within a 10% margin of error, or a 1% margin. If most users will be satisfied with the 10% margin, then optimize that code path—even if it means slowing down the code that provides the 1% margin of error.

Summary

Java 7 and 8 introduce a number of new features and tools that make it even easier to get the best possible performance from a Java application. This book should help you understand how best to use all the features of the JVM in order to end up with fast-running programs.

In many cases, though, remember that the JVM is a small part of the overall performance picture. A systemic, system-wide approach to performance is required in Java environments where the performance of databases and other backend systems is at least as important as the performance of the JVM. That level of performance analysis is not the focus of this book—it is assumed the due diligence has been performed to make sure that the Java component of the environment is the important bottleneck in the system.

However, the interaction between the JVM and other areas of the system is equally important—whether that interaction is direct (e.g., the best way to use JDBC) or indirect (e.g., optimizing native memory usage of an application that shares a machine with several components of a large system). The information in this book should help solve performance issues along those lines as well.

Java Performance: The Definitive Guide by Scott Oaks