Java Generics and Collections

Chapter 1. Introduction

Generics and collections work well with a number of other new features introduced in the latest versions of Java, including boxing and unboxing, a new form of loop, and functions that accept a variable number of arguments. We begin with an example that illustrates all of these. As we shall see, combining them is synergistic: the whole is greater than the sum of its parts.

Taking that as our motto, let’s do something simple with sums: put three numbers into a list and add them together. Here is how to do it in Java with generics:

List<Integer> ints = Arrays.asList(1,2,3);
int s = 0;
for (int n : ints) { s += n; }
assert s == 6;

You can probably read this code without much explanation, but let’s touch on the key features. The interface List and the class Arrays are part of the Collections Framework (both are found in the package java.util). The type List is now generic; you write List<E> to indicate a list with elements of type E. Here we write List<Integer> to indicate that the elements of the list belong to the class Integer, the wrapper class that corresponds to the primitive type int. Boxing and unboxing operations, used to convert from the primitive type to the wrapper class, are automatically inserted. The static method asList takes any number of arguments, places them into an array, and returns a new list backed by the array. The new loop form, foreach, is used to bind a variable successively to each element of the list, and the loop body adds these into the sum. The assertion statement (introduced in Java 1.4), is used to check that the sum is correct; when assertions are enabled, it throws an error if the condition does not evaluate to true.

Here is how the same code looks in Java before generics:

List ints = Arrays.asList( new Integer[] {
  new Integer(1), new Integer(2), new Integer(3)
} );
int s = 0;
for (Iterator it = ints.iterator(); it.hasNext(); ) {
  int n = ((Integer)it.next()).intValue();
  s += n;
}
assert s == 6;

Reading this code is not quite so easy. Without generics, there is no way to indicate in the type declaration what kind of elements you intend to store in the list, so instead of writing List<Integer>, you write List. Now it is the coder rather than the compiler who is responsible for remembering the type of the list elements, so you must write the cast to (Integer) when extracting elements from the list. Without boxing and unboxing, you must explicitly allocate each object belonging to the wrapper class Integer and use the intValue method to extract the corresponding primitive int. Without functions that accept a variable number of arguments, you must explicitly allocate an array to pass to the asList method. Without the new form of loop, you must explicitly declare an iterator and advance it through the list.

By the way, here is how to do the same thing with an array in Java before generics:

int[] ints = new int[] { 1,2,3 };
int s = 0;
for (int i = 0; i < ints.length; i++) { s += ints[i]; }
assert s == 6;

This is slightly longer than the corresponding code that uses generics and collections, is arguably a bit less readable, and is certainly less flexible. Collections let you easily grow or shrink the size of the collection, or switch to a different representation when appropriate, such as a linked list or hash table or ordered tree. The introduction of generics, boxing and unboxing, foreach loops, and varargs in Java marks the first time that using collections is just as simple, perhaps even simpler, than using arrays.

Now let’s look at each of these features in a little more detail.

Generics

An interface or class may be declared to take one or more type parameters, which are written in angle brackets and should be supplied when you declare a variable belonging to the interface or class or when you create a new instance of a class.

We saw one example in the previous section. Here is another:

List<String> words = new ArrayList<String>();
words.add("Hello ");
words.add("world!");
String s = words.get(0)+words.get(1);
assert s.equals("Hello world!");

In the Collections Framework, class ArrayList<E> implements interface List<E>. This trivial code fragment declares the variable words to contain a list of strings, creates an instance of an ArrayList, adds two strings to the list, and gets them out again.

In Java before generics, the same code would be written as follows:

List words = new ArrayList();
words.add("Hello ");
words.add("world!");
String s = ((String)words.get(0))+((String)words.get(1))
assert s.equals("Hello world!");

Without generics, the type parameters are omitted, but you must explicitly cast whenever an element is extracted from the list.

In fact, the bytecode compiled from the two sources above will be identical. We say that generics are implemented by erasure because the types List<Integer>, List<String>, and List<List<String>> are all represented at run-time by the same type, List. We also use erasure to describe the process that converts the first program to the second. The term erasure is a slight misnomer, since the process erases type parameters but adds casts.

Generics implicitly perform the same cast that is explicitly performed without generics. If such casts could fail, it might be hard to debug code written with generics. This is why it is reassuring that generics come with the following guarantee:

Cast-iron guarantee: the implicit casts added by the compilation of generics never fail.

There is also some fine print on this guarantee: it applies only when no unchecked warnings have been issued by the compiler. Later, we will discuss at some length what causes unchecked warnings to be issued and how to minimize their effect.

Implementing generics by erasure has a number of important effects. It keeps things simple, in that generics do not add anything fundamentally new. It keeps things small, in that there is exactly one implementation of List, not one version for each type. And it eases evolution, since the same library can be accessed in both nongeneric and generic forms.

This last point is worth some elaboration. It means that you don’t get nasty problems due to maintaining two versions of the libraries: a nongeneric legacy version that works with Java 1.4 or earlier, and a generic version that works with Java 5 and 6. At the bytecode level, code that doesn’t use generics looks just like code that does. There is no need to switch to generics all at once—you can evolve your code by updating just one package, class, or method at a time to start using generics. We even explain how you may declare generic types for legacy code. (Of course, the cast-iron guarantee mentioned above holds only if you add generic types that match the legacy code.)

Another consequence of implementing generics by erasure is that array types differ in key ways from parameterized types. Executing

new String[size]

allocates an array, and stores in that array an indication that its components are of type String. In contrast, executing:

new ArrayList<String>()

allocates a list, but does not store in the list any indication of the type of its elements. In the jargon, we say that Java reifies array component types but does not reify list element types (or other generic types). Later, we will see how this design eases evolution (see Chapter 5) but complicates casts, instance tests, and array creation (see Chapter 6).

Generics Versus Templates Generics in Java resemble templates in C++. There are just two important things to bear in mind about the relationship between Java generics and C++ templates: syntax and semantics. The syntax is deliberately similar and the semantics are deliberately different.

Syntactically, angle brackets were chosen because they are familiar to C++ users, and because square brackets would be hard to parse. However, there is one difference in syntax. In C++, nested parameters require extra spaces, so you see things like this:

List< List<String> >

In Java, no spaces are required, and it’s fine to write this:

List<List<String>>

You may use extra spaces if you prefer, but they’re not required. (In C++, a problem arises because >> without the space denotes the right-shift operator. Java fixes the problem by a trick in the grammar.)

Semantically, Java generics are defined by erasure, whereas C++ templates are defined by expansion. In C++ templates, each instance of a template at a new type is compiled separately. If you use a list of integers, a list of strings, and a list of lists of string, there will be three versions of the code. If you use lists of a hundred different types, there will be a hundred versions of the code—a problem known as code bloat. In Java, no matter how many types of lists you use, there is always one version of the code, so bloat does not occur.

Expansion may lead to more efficient implementation than erasure, since it offers more opportunities for optimization, particularly for primitive types such as int. For code that is manipulating large amounts of data—for instance, large arrays in scientific computing—this difference may be significant. However, in practice, for most purposes the difference in efficiency is not important, whereas the problems caused by code bloat can be crucial.

In C++, you also may instantiate a template with a constant value rather than a type, making it possible to use templates as a sort of “macroprocessor on steroids” that can perform arbitrarily complex computations at compile time. Java generics are deliberately restricted to types, to keep them simple and easy to understand.

Boxing and Unboxing

Recall that every type in Java is either a reference type or a primitive type. A reference type is any class, interface, or array type. All reference types are subtypes of class Object, and any variable of reference type may be set to the value null. As shown in the following table, there are eight primitive types, and each of these has a corresponding library class of reference type. The library classes are located in the package java.lang.

Primitive	Reference
`byte`	`Byte`
`short`	`Short`
`int`	`Integer`
`long`	`Long`
`float`	`Float`
`double`	`Double`
`boolean`	`Boolean`
`char`	`Character`

Conversion of a primitive type to the corresponding reference type is called boxing and conversion of the reference type to the corresponding primitive type is called unboxing.

Java with generics automatically inserts boxing and unboxing coercions where appropriate. If an expression e of type int appears where a value of type Integer is expected, boxing converts it to new Integer(e) (however, it may cache frequently occurring values). If an expression e of type Integer appears where a value of type int is expected, unboxing converts it to the expression e.intValue(). For example, the sequence:

List<Integer> ints = new ArrayList<Integer>();
ints.add(1);
int n = ints.get(0);

is equivalent to the sequence:

List<Integer> ints = new ArrayList<Integer>();
ints.add(Integer.valueOf(1));
int n = ints.get(0).intValue();

The call Integer.valueOf(1) is similar in effect to the expression new Integer(1), but may cache some values for improved performance, as we explain shortly.

Here, again, is the code to find the sum of a list of integers, conveniently packaged as a static method:

public static int sum (List<Integer> ints) {
  int s = 0;
  for (int n : ints) { s += n; }
  return s;
}

Why does the argument have type List<Integer> and not List<int>? Because type parameters must always be bound to reference types, not primitive types. Why does the result have type int and not Integer? Because result types may be either primitive or reference types, and it is more efficient to use the former than the latter. Unboxing occurs when each Integer in the list ints is bound to the variable n of type int.

We could rewrite the method, replacing each occurrence of int with Integer:

public static Integer sumInteger(List<Integer> ints) {
  Integer s = 0;
  for (Integer n : ints) { s += n; }
  return s;
}

This code compiles but performs a lot of needless work. Each iteration of the loop unboxes the values in s and n, performs the addition, and boxes up the result again. With Sun’s current compiler, measurements show that this version is about 60 percent slower than the original.

Look Out for This! One subtlety of boxing and unboxing is that == is defined differently on primitive and on reference types. On type int, it is defined by equality of values, and on type Integer, it is defined by object identity. So both of the following assertions succeed using Sun’s JVM:

List<Integer> bigs = Arrays.asList(100,200,300);
assert sumInteger(bigs) == sum(bigs);
assert sumInteger(bigs) != sumInteger(bigs); // not recommended

In the first assertion, unboxing causes values to be compared, so the results are equal. In the second assertion, there is no unboxing, and the two method calls return distinct Integer objects, so the results are unequal even though both Integer objects represent the same value, 600.We recommend that you never use == to compare values of type Integer. Either unbox first, so == compares values of type int, or else use equals to compare values of type Integer.

A further subtlety is that boxed values may be cached. Caching is required when boxing an int or short value between–128 and 127, a char value between '\u0000' and '\u007f', a byte, or a boolean; and caching is permitted when boxing other values. Hence, in contrast to our earlier example, we have the following:

List<Integer> smalls = Arrays.asList(1,2,3);
assert sumInteger(smalls) == sum(smalls);
assert sumInteger(smalls) == sumInteger(smalls);  // not recommended

This is because 6 is smaller than 128, so boxing the value 6 always returns exactly the same object. In general, it is not specified whether boxing the same value twice should return identical or distinct objects, so the inequality assertion shown earlier may either fail or succeed depending on the implementation. Even for small values, for which == will compare values of type Integer correctly, we recommend against its use. It is clearer and cleaner to use equals rather than == to compare values of reference type, such as Integer or String.

Foreach

Here, again, is our code that computes the sum of a list of integers.

List<Integer> ints = Arrays.asList(1,2,3);
int s = 0;
for (int n : ints) { s += n; }
assert s == 6;

The loop in the third line is called a foreach loop even though it is written with the keyword for. It is equivalent to the following:

for (Iterator<Integer> it = ints. iterator(); it.hasNext(); ) {
  int n = it.next();
  s += n;
}

The emphasized code corresponds to what was written by the user, and the unemphasized code is added in a systematic way by the compiler. It introduces the variable it of type Iterator<Integer> to iterate over the list ints of type List<Integer>. In general, the compiler invents a new name that is guaranteed not to clash with any name already in the code. Note that unboxing occurs when the expression it.next() of type Integer is assigned to the variable n of type int.

The foreach loop can be applied to any object that implements the interface Iterable<E> (in package java.lang), which in turn refers to the interface Iterator<E> (in package java.util). These define the methods iterator, hasNext, and next, which are used by the translation of the foreach loop (iterators also have a method remove, which is not used by the translation):

interface Iterable<E> {
  public Iterator<E> iterator();
}
interface Iterator<E> {
  public boolean hasNext();
  public E next();
  public void remove();
}

All collections, sets, and lists in the Collections Framework implement the Iterable<E> interface; and classes defined by other vendors or users may implement it as well.

The foreach loop may also be applied to an array:

public static int sumArray(int[] a) {
  int s = 0;
  for (int n : a) { s += n; }
  return s;
}

The foreach loop was deliberately kept simple and catches only the most common case. You need to explicitly introduce an iterator if you wish to use the remove method or to iterate over more than one list in parallel. Here is a method that removes negative elements from a list of doubles:

public static void removeNegative(List<Double> v) {
  for (Iterator<Double> it = v.iterator(); it.hasNext();) {
    if (it.next() < 0) it.remove();
  }
}

Here is a method to compute the dot product of two vectors, represented as lists of doubles, both of the same length. Given two vectors, u₁, … , u_n and v₁, … , v_n, it computes u₁ * v₁> + … + u_n * v_n:

public static double dot(List<Double> u, List<Double> v) {
  if (u.size() != v.size())
    throw new IllegalArgumentException("different sizes");
  double d = 0;
  Iterator<Double> uIt = u.iterator();
  Iterator<Double> vIt = v.iterator();
  while (uIt.hasNext()) {
    assert uIt.hasNext() && vIt.hasNext();
    d += uIt.next() * vIt.next();
  }
  assert !uIt.hasNext() && !vIt.hasNext();
  return d;
}

Two iterators, uIt and vIt, advance across the lists u and v in lock step. The loop condition checks only the first iterator, but the assertions confirm that we could have used the second iterator instead, since we previously tested both lists to confirm that they have the same length.

Generic Methods and Varargs

Here is a method that accepts an array of any type and converts it to a list:

class Lists {
  public static <T> List<T> toList(T[] arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }
}

The static method toList accepts an array of type T[] and returns a list of type List<T>, and does so for any type T. This is indicated by writing <T> at the beginning of the method signature, which declares T as a new type variable. A method which declares a type variable in this way is called a generic method. The scope of the type variable T is local to the method itself; it may appear in the method signature and the method body, but not outside the method.

The method may be invoked as follows:

List<Integer> ints = Lists.toList(new Integer[] { 1, 2, 3 });
List<String> words = Lists.toList(new String[] { "hello", "world" });

In the first line, boxing converts 1, 2, 3 from int to Integer.

Packing the arguments into an array is cumbersome. The vararg feature permits a special, more convenient syntax for the case in which the last argument of a method is an array. To use this feature, we replace T[] with T… in the method declaration:

class Lists {
  public static <T> List<T> toList(T... arr) {
    List<T> list = new ArrayList<T>();
    for (T elt : arr) list.add(elt);
    return list;
  }
}

Now the method may be invoked as follows:

List<Integer> ints = Lists.toList(1, 2, 3);
List<String> words = Lists.toList("hello", "world");

This is just shorthand for what we wrote above. At run time, the arguments are packed into an array which is passed to the method, just as previously.

Any number of arguments may precede a last vararg argument. Here is a method that accepts a list and adds all the additional arguments to the end of the list:

public static <T> void addAll(List<T> list, T... arr) {
  for (T elt : arr) list.add(elt);
}

Whenever a vararg is declared, one may either pass a list of arguments to be implicitly packed into an array, or explicitly pass the array directly. Thus, the preceding method may be invoked as follows:

List<Integer> ints = new ArrayList<Integer>();
Lists.addAll(ints, 1, 2);
Lists.addAll(ints, new Integer[] { 3, 4 });
assert ints.toString().equals("[1, 2, 3, 4]");

We will see later that when we attempt to create an array containing a generic type, we will always receive an unchecked warning. Since varargs always create an array, they should be used only when the argument does not have a generic type (see Array Creation and Varargs).

In the preceding examples, the type parameter to the generic method is inferred, but it may also be given explicitly, as in the following examples:

List<Integer> ints = Lists.<Integer>toList();
List<Object> objs = Lists.<Object>toList(1, "two");

Explicit parameters are usually not required, but they are helpful in the examples given here. In the first example, without the type parameter there is too little information for the type inference algorithm used by Sun’s compiler to infer the correct type. It infers that the argument to toList is an empty array of an arbitrary generic type rather than an empty array of integers, and this triggers the unchecked warning described earlier. (The Eclipse compiler uses a different inference algorithm, and compiles the same line correctly without the explicit parameter.) In the second example, without the type parameter there is too much information for the type inference algorithm to infer the correct type. You might think that Object is the only type that an integer and a string have in common, but in fact they also both implement the interfaces Serializable and Comparable. The type inference algorithm cannot choose which of these three is the correct type.

In general, the following rule of thumb suffices: in a call to a generic method, if there are one or more arguments that correspond to a type parameter and they all have the same type then the type parameter may be inferred; if there are no arguments that correspond to the type parameter or the arguments belong to different subtypes of the intended type then the type parameter must be given explicitly.

When a type parameter is passed to a generic method invocation, it appears in angle brackets to the left, just as in the method declaration. The Java grammar requires that type parameters may appear only in method invocations that use a dotted form. Even if the method toList is defined in the same class that invokes the code, we cannot shorten it as follows:

List<Integer> ints = <Integer>toList(); // compile-time error

This is illegal because it will confuse the parser.

Methods Arrays.asList and Collections.addAll in the Collections Framework are similar to toList and addAll shown earlier. (Both classes are in package java.util.) The Collections Framework version of asList does not return an ArrayList, but instead returns a specialized list class that is backed by a given array. Also, its version of addAll acts on general collections, not just lists.

Assertions

We clarify our code by liberal use of the assert statement. Each occurrence of assert is followed by a boolean expression that is expected to evaluate to true. If assertions are enabled and the expression evaluates to false, an AssertionError is thrown, including an indication of where the error occurred. Assertions are enabled by invoking the JVM with the -ea or -enableassertions flag.

We only write assertions that we expect to evaluate to true. Since assertions may not be enabled, an assertion should never have side effects upon which any nonassertion code depends. When checking for a condition that might not hold (such as confirming that the arguments to a method call are valid), we use a conditional and throw an exception explicitly.

To sum up, we have seen how generics, boxing and unboxing, foreach loops, and varargs work together to make Java code easier to write, having illustrated this through the use of the Collections Framework.

Get Java Generics and Collections now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Java Generics and Collections by

Chapter 1. Introduction

Generics

Boxing and Unboxing

Foreach

Generic Methods and Varargs

Assertions

Don’t leave empty-handed

It’s yours, free.

Check it out now on O’Reilly