Collections

Collections are a fundamental idea in programming. Applications frequently need to keep track of many related things, like a group of employees or a set of images. To support the concept of many at a fundamental level, of course, Java includes the concept of arrays. Since a one-dimensional array has a fixed length, arrays are awkward for sets of things that grow and shrink over the lifetime of an application. Ever since SDK 1.0, the Java platform has had two handy classes for keeping track of sets. The java.util.Vector class represents a dynamic list of objects, and the java.util.Hashtable class is a set of key/value pairs. The Java 2 platform introduces a more comprehensive approach to collections called the Collections Framework. The Vector and Hashtable classes still exist, but they are now a part of the framework.

If you work with dictionaries or associative arrays in other languages, you should understand how useful these classes are. If you are someone who has worked in C or another static language, you should find collections to be truly magical. They are part of what makes Java powerful and dynamic. Being able to work with lists of objects and make associations between them is an abstraction from the details of the types. It lets you think about the problems at a higher level and saves you from having to reproduce common structures every time you need them.

The Collections Framework is based around a handful of interfaces in the java.util package. These interfaces are divided into two hierarchies. The first hierarchy descends from the Collection interface. This interface (and its descendants) represents a box that holds other objects. The second hierarchy is based on the Map interface, which represents a group of key/value pairs.

The Collection Interface

The mother of all collections is an interface appropriately named Collection. It serves as a box that holds other objects, its elements. It doesn’t specify whether duplicate objects are allowed or whether the objects will be ordered in some way. These kinds of details are left to child interfaces. Nevertheless, the Collection interface does define some basic operations:

public boolean add (Object o )

This method adds the supplied object to this collection. If the operation succeeds, this method returns true. If the object already exists in this collection and the collection does not permit duplicates, false is returned. Furthermore, some collections are read-only. These collections will throw an UnsupportedOperationException if this method is called.

public boolean remove (Object o )

This method removes the supplied object from this collection. Like the add( ) method, this method returns true if the object is removed from the collection. If the object doesn’t exist in this collection, false is returned. Read-only collections throw an UnsupportedOperationException if this method is called.

public boolean contains (Object o )

This method returns true if the collection contains the specified object.

public int size ( )

Use this method to find the number of elements in this collection.

public boolean isEmpty ( )

This method returns true if there are no elements in this collection.

public Iterator iterator ( )

Use this method to examine all the elements in this collection. This method returns an Iterator, which is an object that you can use to step through the collection’s elements. We’ll talk more about iterators in the next section.

As a special convenience, the elements of a collection can be placed into an array using the following methods:

public Object[] toArray ( )
public Object[] toArray (Object[] a )

These methods return an array that contains all the elements in this collection. The second version of this method returns an array of the same type as the array a.

Remember, these methods are common to every Collection implementation. Any class that implements Collection or one of its child interfaces will have these methods.

A Collection is a dynamic array; it can grow to accommodate new items. For example, a List is a kind of Collection that implements a dynamic array. You can insert and remove elements at arbitrary positions within a List. Collections work directly with the type Object, so we can use Collections with instances of any class.[34] We can even put different kinds of Objects in a Collection together; the Collection doesn’t know the difference.

As you might guess, this is where things get tricky. To do anything useful with an Object after we take it back out of a Collection, we have to cast it back (narrow it) to its original type. This can be done safely in Java because the cast is checked at runtime. Java throws a ClassCastException if we try to cast an object to the wrong type. However, this need for casting means that your code must remember types or methodically test them with instanceof. That is the price we pay for having a completely dynamic collection class that operates on all types.

You might wonder if you can implement Collection to produce a class that works on just one type of element in a type-safe way. Unfortunately, the answer is no. We could implement Collection’s methods to make a Collection that rejects the wrong type of element at runtime, but this does not provide any new compile time, static type safety. In C++, templates provide a safe mechanism for parameterizing types by restricting the types of objects used at compile time. For a glimpse at Java language work in this area, see http://www.math.luc.edu/pizza/gj.

Iterators

What does the java.util.Iterator interface do? An Iterator is an object that lets you step through another object’s data.[35]

public Object next ( )

This method returns the next element of the associated Collection.

public boolean hasNext ( )

This method returns true if you have not yet stepped through all of the Collection’s elements. In other words, it returns true if you can call next( ) to get the next element.

The following example shows how you could use an Iterator to print out every element of a collection:

public void printElements(Collection c, PrintStream out) {
    Iterator iterator = c.iterator( );

    while (iterator.hasNext( ))
        out.println(iterator.next( ));
}

Finally, Iterator offers the ability to remove an element from a collection:

public void remove ( )

This method removes the last object returned from next( ) from the associated Collection. Not all iterators implement remove( ). It doesn’t make sense to be able to remove an element from a read-only collection, for example. If element removal is not allowed, an UnsupportedOperationException is thrown from this method. If you call remove( ) before first calling next( ), or if you call remove( ) twice in a row, you’ll get an IllegalStateException.

Collection Flavors

The Collection interface has two child interfaces: Set represents a collection in which duplicate elements are not allowed, and List is a collection whose elements have a specific order.

Set has no methods besides the ones it inherits from Collection. It does, however, enforce the rule that duplicate elements are not allowed. If you try to add an element that already exists in a Set, the add( ) method will return false.

SortedSet adds only a few methods to Set. As you call add() and remove( ) , the set maintains its order. You can retrieve subsets (which are also sorted) using the subSet() , headSet(), and tailSet() methods. The first() , last(), and comparator( ) methods provide access to the first element, the last element, and the object used to compare elements (more on this later).

The last child interface of Collection is List . The List interface adds the ability to manipulate elements at specific positions in the list:

public void add (int index , Object element )

This method adds the given object at the supplied list position. If the position is less than zero or greater than the list length, an IndexOutOfBoundsException will be thrown. The element that was previously at the supplied position and all elements after it will be moved up by one index position.

public void remove (int index )

This method removes the element at the supplied position. All subsequent elements will move down by one index position.

public void get (int index )

This method returns the element at the given position.

public void set (int index , Object element )

This method changes the element at the given position to be the supplied object.

The Map Interface

The Collections Framework also includes the concept of a Map, which is a collection of key/value pairs. Another way of looking at a map is that it is a dictionary, similar to an associative array. Maps store and retrieve elements with key values; they are very useful for things like caches and minimalist databases. When you store a value in a map, you associate a key object with that value. When you need to look up the value, the map retrieves it using the key.

The java.util.Map interface specifies a map that, like Collection, operates on the type Object. A Map stores an element of type Object and associates it with a key, also of type Object. In this way, we can index arbitrary types of elements using arbitrary types as keys. As with Collection, casting is generally required to narrow objects back to their original type after pulling them out of a map.

The basic operations are straightforward:

public Object put (Object key , Object value )

This method adds the specified key/value pair to the map. If the map already contains a value for the specified key, the old value is replaced.

public Object get (Object key )

This method retrieves the value corresponding to key from the map.

public Object remove (Object key )

This method removes the value corresponding to key from the map.

public int size ( )

Use this method to find the number of key/value pairs in this map.

You can retrieve all the keys or values in the map:

public Set keySet ( )

This method returns a Set that contains all of the keys in this map.

public Collection values ( )

Use this method to retrieve all of the values in this map. The returned Collection can contain duplicate elements.

Map has one child interface, SortedMap . SortedMap maintains its key/value pairs in sorted order according to the key values. It provides subMap() , headMap( ), and tailMap( ) methods for retrieving sorted map subsets. Like SortedSet, it also provides a comparator( ) method that returns an object that determines how the map keys are sorted. We’ll talk more about this later.

Implementations

Up until this point, we’ve talked only about interfaces. But you can’t instantiate interfaces. The Collections Framework includes useful implementations of the collections interfaces. These implementations are listed in Table 9.7, according to the interface they implement.

Table 9-7. Collections Framework Implementation Classes

Interface

Implementation

Set

HashSet

SortedSet

TreeSet

List

ArrayList, LinkedList, Vector

Map

HashMap, Hashtable

SortedMap

TreeMap

ArrayList offers good performance if you add to the end of the list frequently, while LinkedList offers better performance for frequent insertions and deletions. Vector has been around since SDK 1.0; it’s now retrofitted to implement the List methods. Vector offers the advantage (and overhead) of synchronized methods, which is essential for multithreaded access. The old Hashtable has been updated so that it now implements the Map interface. It also has the advantage and overhead of synchronized operations. As you’ll see, there are other, more general ways to get synchronized collections.

Hashcodes and key values

If you’ve used a hashtable before, you’ve probably guessed that there’s more going on behind the scenes with maps than we’ve let on. An element in a hashtable is not associated with a key strictly by the key object’s identity, but rather by the key’s contents. This allows keys that are equivalent to access the same object. By “equivalent,” we mean those objects that compare true with equals( ). So, if you store an object in a Hashtable using one object as a key, you can use any other object that equals( ) tells you is equivalent to retrieve the stored object.

It’s easy to see why equivalence is important if you remember our discussion of strings. You may create two String objects that have the same text in them but that come from different sources in Java. In this case, the == operator will tell you that the objects are different, but the equals( ) method of the String class will tell you that they are equivalent. Because they are equivalent, if we store an object in a Hashtable using one of the String objects as a key, we can retrieve it using the other.

Since Hashtables have a notion of equivalent keys, what does the hashcode do? The hashcode is like a fingerprint of the object’s data content. The Hashtable uses the hashcode to store the objects so that they can be retrieved efficiently. The hashcode is nothing more than a number (an integer) that is a function of the data. The number always turns out the same for identical data, but the hashing function is intentionally designed to generate as random a number as possible for different combinations of data. That is, a very small change in the data should produce a big difference in the number. It is unlikely that two similar data sets will produce the same hashcode.

A Hashtable really just keeps a number of lists of objects, but it puts objects into the lists based on their hashcode. So when it wants to find the object again, it can look at the hashcode and know immediately how to get to the appropriate list. The Hashtable still might end up with a number of objects to examine, but the list should be short. For each object it finds, it does the following comparison to see if the key matches:

if ((keyHashcode == storedKeyHashcode) && key.equals(storedKey))
    return object;

There is no prescribed way to generate hashcodes. The only requirement is that they be somewhat randomly distributed and reproducible (based on the data). This means that two objects that are not the same could end up with the same hashcode by accident. This is unlikely (there are 2^32 possible integer values); moreover, it doesn’t cause a problem, because the Hashtable ultimately checks the actual keys, as well as the hashcodes, to see if they are equal. Therefore, even if two objects have the same hashcode, they can still co-exist in the hashtable.

Hashcodes are computed by an object’s hashCode( ) method, which is inherited from the Object class if it isn’t overridden. The default hashCode( ) method simply assigns each object instance a unique number to be used as a hashcode. If a class does not override this method, each instance of the class will have a unique hashcode. This goes along well with the default implementation of equals( ) in Object, which only compares objects for identity using ==.

You must override equals( ) in any classes for which equivalence of different objects is meaningful. Likewise, if you want equivalent objects to serve as equivalent keys, you need to override the hashCode( ) method, as well, to return identical hashcode values. To do this, you need to create some suitably complex and arbitrary function of the contents of your object. The only criterion for the function is that it should be almost certain to return different values for objects with different data, but the same value for objects with identical data.

Slam Dunking with Collections

The java.util.Collections class is full of handy static methods that operate on Sets and Maps. (It’s not the same as the java.util.Collection interface, which we’ve already talked about.) Since all the static methods in Collections operate on interfaces, they will work regardless of the actual implementation classes you’re using. This is pretty powerful stuff.

The Collections class includes these methods:

public static Collection synchronizedCollection (Collection c )
public static Set synchronizedSet (Set s )
public static List synchronizedList (List list )
public static Map synchronizedMap (Map m )
public static SortedSet synchronizedSortedSet (SortedSet s )
public static SortedMap synchronizedSortedMap (SortedMap m )

These methods create synchronized, thread-safe versions of the supplied collection. This is useful if you’re planning to access the collection from more than one thread. We’ll talk a more about this later in this chapter.

Furthermore, you can use the Collections class to create read-only versions of any collection:

public static Collection unmodifiableCollection (Collection c )
public static Set unmodifiableSet (Set s )
public static List unmodifiableList (List list )
public static Map unmodifiableMap (Map m )
public static SortedSet unmodifiableSortedSet (SortedSet s )
public static SortedMap unmodifiableSortedMap (SortedMap m )

Sorting for Free

Collections includes other methods for performing common operations like sorting. Sorting comes in two varieties:

public static void sort (List list )

This method sorts the given list. You can use this method only on lists whose elements implement the java.lang.Comparable interface. Luckily, many classes already implement this interface, including String, Date, BigInteger, and the wrapper classes for the primitive types (Integer, Double, etc.).

public static void sort (List list , Comparator c )

Use this method to sort a list whose elements don’t implement the Comparable interface. The supplied java.util.Comparator does the work of comparing elements. You might, for example, write an ImaginaryNumber class and want to sort a list of them. You would then create a Comparator implementation that knew how to compare two imaginary numbers.

Collections gives you some other interesting capabilities, too. If you’re interested in finding out more, check out the min( ), max( ), binarySearch( ), and reverse( ) methods.

A Thrilling Example

Collections is a bread-and-butter topic, which means it’s hard to make exciting examples about it. The example in this section reads a text file, parses all its words, counts the number of occurrences, sorts them, and writes the results to another file. It will give you a good feel for how to use collections in your own programs.

//file: WordSort.java
import java.io.*;
import java.util.*;

public class WordSort {
  public static void main(String[] args) throws IOException {
    // get the command-line arguments
    if (args.length < 2) {
      System.out.println("Usage: WordSort inputfile outputfile");
      return;
    }
    String inputfile = args[0];
    String outputfile = args[1];
    

/* Create the word map. Each key is a word and each value is an
 * Integer that represents the number of times the word occurs
 * in the input file.
 */

    Map map = new TreeMap( );
    
    // read every line of the input file
    BufferedReader in =
        new BufferedReader(new FileReader(inputfile));
    String line;

    while ((line = in.readLine( )) != null) {
      // examine each word on the line
      StringTokenizer st = new StringTokenizer(line);
      while (st.hasMoreTokens( )) {
        String word = st.nextToken( );
        Object o = map.get(word);
        // if there's no entry for this word, add one
        if (o == null) map.put(word, new Integer(1));
        // otherwise, increment the count for this word
        else {
          Integer count = (Integer)o;
          map.put(word, new Integer(count.intValue( ) + 1));
        }
      }
    }
    in.close( );
    
    // get the map's keys and sort them
    List keys = new ArrayList(map.keySet( ));
    Collections.sort(keys);

    // write the results to the output file
    PrintWriter out = new PrintWriter(new FileWriter(outputfile));
    Iterator iterator = keys.iterator( );
    while (iterator.hasNext( )) {
      Object key = iterator.next( );
      out.println(key + " : " + map.get(key));
    }
    out.close( );
  }
}

Suppose, for example, that you have an input file named Ian Moore.txt:

Well it was my love that kept you going
Kept you strong enough to fall
And it was my heart you were breaking
When he hurt your pride

So how does it feel
How does it feel
How does it feel
How does it feel

You could run the example on this file using the following command line:

java WordSort "Ian Moore.txt" count.txt

The output file, count.txt, looks like this:

And : 1
How : 3
Kept : 1
So : 1
Well : 1
When : 1
breaking : 1
does : 4
enough : 1
fall : 1
feel : 4
going : 1
he : 1
heart : 1
how : 1
hurt : 1
it : 6
kept : 1
love : 1
my : 2
pride : 1
strong : 1
that : 1
to : 1
was : 2
were : 1
you : 3
your : 1

The results are case-sensitive: “How” is recorded separately from “how”. You could modify this behavior by converting words to all lowercase after retrieving them from the StringTokenizer:

String word = st.nextToken().toLowerCase( );

Thread Safety and Iterators

If a collection will be accessed by more than one thread (see Chapter 8), things get a little tricky. Operations on the collection should be fast, but on the other hand, you need to make sure that different threads don’t step on each other’s toes with respect to the collection.

The Collections class provides methods that will create a thread-safe version of any Collection. There are methods for each subtype of Collection. The following example shows how to create a thread-safe List:

List list = new ArrayList( );
List syncList = Collections.synchronizedList(list);

Although synchronized collections are thread-safe, the Iterators returned from them are not. This is an important point. If you obtain an Iterator from a collection, you should do your own synchronization to ensure that the collection does not change as you’re iterating through its elements. You can do this with the synchronized keyword:

synchronized(syncList) {
    Iterator iterator = syncList.iterator( );
    // do stuff with the iterator here
}

WeakHashMap: An Interesting Variation

WeakHashMap is an especially interesting collection. In some respects, it looks and behaves like a HashMap . What’s interesting is that the key values in WeakHashMap are allowed to be harvested by the garbage collector.

WeakHashMap makes use of weak references. As you’ll recall, objects in Java are garbage-collected when there are no longer any references to them. A weak reference is a special kind of reference that doesn’t prevent the garbage collector from cleaning up the referenced object. Weak references and their siblings, soft references and phantom references, are implemented in the java.lang.ref package. We won’t go into detail here; just the concept of a weak reference is important.

Why is WeakHashMap useful? It means you don’t have to remove key/value pairs from a Map when you’re finished with them. Normally if you removed all references to a key object in the rest of your application, the Map would still contain a reference and keep the object “alive.” WeakHashMap changes this; once you remove references to a key object in the rest of the application, the WeakHashMap lets go of it too.



[34] In C++, where classes don’t derive from a single Object class that supplies a base type and common methods, the elements of a collection would usually be derived from some common collectable class. This forces the use of multiple inheritance and brings its associated problems.

[35] If you’re familiar with earlier versions of Java, it will help you to know that Iterator is the successor to Enumeration, kind of an Enumeration on steroids.

Get Learning Java now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.