Collections
are a
fundamental idea in programming. Applications frequently need to keep
track of many related things, like a group of employees or a set of
images. To support the concept of many at a
fundamental level, of course, Java includes the concept of
arrays. Since a one-dimensional array has
a fixed length, arrays are awkward for sets of things that grow and
shrink over the lifetime of an application. Ever since SDK 1.0, the
Java platform has had two handy classes for keeping track of sets.
The
java.util.Vector
class represents a dynamic list of
objects, and the
java.util.Hashtable
class is a set of key/value pairs. The
Java 2 platform introduces a more comprehensive approach to
collections called the
Collections Framework. The
Vector
and Hashtable
classes
still exist, but they are now a part of the framework.
If you work with dictionaries or associative arrays in other languages, you should understand how useful these classes are. If you are someone who has worked in C or another static language, you should find collections to be truly magical. They are part of what makes Java powerful and dynamic. Being able to work with lists of objects and make associations between them is an abstraction from the details of the types. It lets you think about the problems at a higher level and saves you from having to reproduce common structures every time you need them.
The Collections Framework is based around a handful of interfaces in
the java.util
package. These interfaces are
divided into two hierarchies. The first hierarchy descends from the
Collection
interface. This interface (and its
descendants) represents a box that holds other objects. The second
hierarchy is based on the Map
interface, which
represents a group of key/value pairs.
The mother of all collections is an interface appropriately named
Collection
. It serves as a box that holds other
objects, its elements. It doesn’t specify
whether duplicate objects are allowed or whether the objects will be
ordered in some way. These kinds of details are left to child
interfaces. Nevertheless, the Collection
interface
does define some basic
operations:
-
public boolean
add
(Object
o
)
This method adds the supplied object to this collection. If the operation succeeds, this method returns
true
. If the object already exists in this collection and the collection does not permit duplicates,false
is returned. Furthermore, some collections are read-only. These collections will throw anUnsupportedOperationException
if this method is called.-
public boolean
remove
(Object
o
)
This method removes the supplied object from this collection. Like the
add( )
method, this method returnstrue
if the object is removed from the collection. If the object doesn’t exist in this collection,false
is returned. Read-only collections throw anUnsupportedOperationException
if this method is called.-
public boolean
contains
(Object
o
)
This method returns
true
if the collection contains the specified object.-
public int
size
( )
Use this method to find the number of elements in this collection.
-
public boolean
isEmpty
( )
This method returns
true
if there are no elements in this collection.-
public Iterator
iterator
( )
Use this method to examine all the elements in this collection. This method returns an
Iterator
, which is an object that you can use to step through the collection’s elements. We’ll talk more about iterators in the next section.
As a special convenience, the elements of a collection can be placed into an array using the following methods:
public Object[]
toArray
( )
|
public Object[]
toArray
(Object[]
a
)
|
These methods return an array that contains all the elements in this
collection. The second version of this method returns an array of the
same type as the array a
.
Remember, these methods are common to every
Collection
implementation. Any class that
implements Collection
or one of its child
interfaces will have these methods.
A Collection
is a dynamic array; it can
grow to accommodate new items. For example,
a List
is a kind of Collection
that implements a dynamic array. You can insert and remove elements
at arbitrary positions within a List
.
Collection
s work directly with the type
Object
, so we can use
Collection
s with instances of any class.[34] We
can even put different kinds of Object
s in a
Collection
together; the
Collection
doesn’t know the difference.
As you might guess, this is where things get tricky. To do anything
useful with an Object
after we take it back out of
a Collection
,
we have to
cast
it back (narrow it) to its original type. This can be done safely in
Java because the cast is checked at runtime. Java throws a
ClassCastException
if we try to cast an object to the wrong type. However, this need for
casting means that your code must remember types or methodically test
them with
instanceof
. That is
the price we pay for having a completely dynamic collection class
that operates on all types.
You might wonder if you can implement Collection
to produce a class that works on just one type of element in a
type-safe way. Unfortunately, the answer is no. We could implement
Collection
’s methods to make a
Collection
that rejects the wrong type of element
at runtime, but this does not provide any new compile time, static
type safety. In C++, templates provide a safe mechanism for
parameterizing types by restricting the types of objects used at
compile time. For a glimpse at Java language work in this area, see
http://www.math.luc.edu/pizza/gj.
What
does the
java.util.Iterator
interface do? An
Iterator
is an object that lets you step through
another object’s data.[35]
-
public Object
next
( )
This method returns the next element of the associated Collection.
-
public boolean
hasNext
( )
This method returns
true
if you have not yet stepped through all of theCollection
’s elements. In other words, it returnstrue
if you can callnext( )
to get the next element.
The following example shows how you could use an
Iterator
to print out every element of a
collection:
public void printElements(Collection c, PrintStream out) { Iterator iterator = c.iterator( ); while (iterator.hasNext( )) out.println(iterator.next( )); }
Finally, Iterator
offers the ability to remove an
element from a collection:
-
public void
remove
( )
This method removes the last object returned from
next( )
from the associatedCollection
. Not all iterators implementremove( )
. It doesn’t make sense to be able to remove an element from a read-only collection, for example. If element removal is not allowed, anUnsupportedOperationException
is thrown from this method. If you callremove( )
before first callingnext( )
, or if you callremove( )
twice in a row, you’ll get anIllegalStateException
.
The Collection
interface has two child interfaces:
Set
represents a collection in which
duplicate elements are not allowed, and
List
is a collection whose elements have a
specific order.
Set
has no methods besides the ones it inherits
from Collection
. It does, however, enforce the
rule that duplicate elements are not allowed. If you try to add an
element that already exists in a Set
, the
add( )
method
will return false.
SortedSet
adds only a few methods to
Set
. As you call add()
and
remove( )
, the
set maintains its order. You can retrieve subsets (which are also
sorted) using the
subSet()
,
headSet()
, and tailSet()
methods. The
first()
,
last()
, and comparator( )
methods provide access to the first element, the last element, and
the object used to compare elements (more on this later).
The last child interface of Collection
is
List
. The List
interface adds the ability to manipulate elements at specific
positions in the list:
-
public void
add
(int
index
, Object
element
)
This method adds the given object at the supplied list position. If the position is less than zero or greater than the list length, an
IndexOutOfBoundsException
will be thrown. The element that was previously at the supplied position and all elements after it will be moved up by one index position.-
public void
remove
(int
index
)
This method removes the element at the supplied position. All subsequent elements will move down by one index position.
-
public void
get
(int
index
)
This method returns the element at the given position.
-
public void
set
(int
index
, Object
element
)
This method changes the element at the given position to be the supplied object.
The
Collections Framework also includes
the concept of a Map
, which is a
collection of key/value
pairs.
Another way of looking at a map is that it is a dictionary, similar
to an associative array. Maps store and retrieve elements with key
values; they are very useful for things like caches and minimalist
databases. When you store a value in a map, you associate a key
object with that value. When you need to look up the value, the map
retrieves it using the key.
The java.util.Map
interface specifies a map that, like Collection
,
operates on the type Object
. A
Map
stores an element of type
Object
and associates it with a key, also of type
Object
. In this way, we can index arbitrary types
of elements using arbitrary types as keys. As with
Collection
,
casting
is generally required to narrow objects back to their original type
after pulling them out of a map.
The basic operations are straightforward:
-
public Object
put
(Object
key
, Object
value
)
This method adds the specified key/value pair to the map. If the map already contains a value for the specified key, the old value is replaced.
-
public Object
get
(Object
key
)
This method retrieves the value corresponding to
key
from the map.-
public Object
remove
(Object
key
)
This method removes the value corresponding to
key
from the map.-
public int
size
( )
Use this method to find the number of key/value pairs in this map.
You can retrieve all the keys or values in the map:
-
public Set
keySet
( )
This method returns a
Set
that contains all of the keys in this map.-
public Collection
values
( )
Use this method to retrieve all of the values in this map. The returned
Collection
can contain duplicate elements.
Map
has one child interface,
SortedMap
. SortedMap
maintains its key/value pairs in sorted order according to the key
values. It provides
subMap()
, headMap( )
, and
tailMap( )
methods for retrieving sorted map
subsets. Like SortedSet
, it also provides a
comparator( )
method that returns an object that
determines how the map keys are sorted. We’ll talk more about
this later.
Up until this point, we’ve talked only about interfaces. But you can’t instantiate interfaces. The Collections Framework includes useful implementations of the collections interfaces. These implementations are listed in Table 9.7, according to the interface they implement.
Table 9-7. Collections Framework Implementation Classes
Interface |
Implementation |
---|---|
|
|
|
|
|
|
|
|
|
|
ArrayList
offers good performance if you add to
the end of the
list frequently, while
LinkedList
offers better performance for frequent
insertions and deletions.
Vector
has been around since SDK 1.0;
it’s now retrofitted to implement the List
methods. Vector
offers the advantage (and
overhead) of
synchronized
methods, which is
essential for multithreaded access. The old
Hashtable
has been updated so that it now
implements the Map
interface. It also has the
advantage and overhead of synchronized operations. As you’ll
see, there are other, more general ways to get synchronized
collections.
If
you’ve used a hashtable before, you’ve probably guessed
that there’s more going on behind the scenes with maps than
we’ve let on. An element in a hashtable is not associated with
a key strictly by the key object’s identity, but rather by the
key’s contents. This allows keys that
are equivalent to access the same object.
By “equivalent,” we mean those objects that compare
true
with equals( )
. So, if you
store an object in a Hashtable
using one object as
a key, you can use any other object that equals( )
tells you is equivalent to retrieve the stored object.
It’s easy to see why
equivalence
is important if you remember our discussion of strings. You may
create two String
objects that have the same text
in them but that come from different sources in Java. In this case,
the ==
operator will tell you that the
objects are different, but the equals( )
method of
the String
class will tell you that they are
equivalent. Because they are equivalent, if we store an object in a
Hashtable
using one of the
String
objects as a key, we can retrieve it using
the other.
Since Hashtable
s have a notion of equivalent keys,
what does the hashcode do? The hashcode is like a fingerprint of the
object’s data content. The Hashtable
uses
the hashcode to store the objects so that they can be retrieved
efficiently. The hashcode is nothing more than a number (an integer)
that is a function of the data. The number always turns out the same
for identical data, but the hashing function is intentionally
designed to generate as random a number as possible for different
combinations of data. That is, a very small change in the data should
produce a big difference in the number. It is unlikely that two
similar data sets will produce the same hashcode.
A Hashtable
really just keeps a number of lists of
objects, but it puts objects into the lists based on their hashcode.
So when it wants to find the object again, it can look at the
hashcode and know immediately how to get to the appropriate list. The
Hashtable
still might end up with a number of
objects to examine, but the list should be short. For each object it
finds, it does the following comparison to see if the key matches:
if ((keyHashcode == storedKeyHashcode) && key.equals(storedKey)) return object;
There is no prescribed way to generate hashcodes. The only
requirement is that they be somewhat randomly distributed and
reproducible (based on the data). This means that two objects that
are not the same could end up with the same hashcode by accident.
This is unlikely (there are 2^32 possible integer values); moreover,
it doesn’t cause a problem, because the
Hashtable
ultimately checks the actual keys, as
well as the hashcodes, to see if they are equal. Therefore, even if
two objects have the same hashcode, they can still co-exist in the
hashtable.
Hashcodes are computed by an object’s hashCode( )
method, which is inherited from the Object
class
if it isn’t overridden. The default hashCode( )
method simply assigns each object instance a unique
number to be used as a hashcode. If a class does not override this
method, each instance of the class will have a unique hashcode. This
goes along well with
the default implementation of
equals( )
in Object
, which only
compares objects for identity using ==
.
You must override equals( )
in any classes for
which equivalence of different objects is meaningful. Likewise, if
you want equivalent objects to serve as equivalent keys, you need to
override the hashCode( )
method, as well, to
return identical hashcode values. To do this, you need to create some
suitably complex and arbitrary function of the contents of your
object. The only criterion for the function is that it should be
almost certain to return different values for objects with different
data, but the same value for objects with identical
data.
The
java.util.Collections
class is full of handy static methods
that operate on Set
s and Map
s.
(It’s not the same as the
java.util.Collection
interface, which we’ve
already talked about.) Since all the static methods in
Collections
operate on interfaces, they will work
regardless of the actual implementation classes you’re using.
This is pretty powerful stuff.
The Collections
class includes these
methods:
public static Collection
synchronizedCollection
(Collection
c
)
|
public static Set
synchronizedSet
(Set
s
)
|
public static List
synchronizedList
(List
list
)
|
public static Map
synchronizedMap
(Map
m
)
|
public static SortedSet
synchronizedSortedSet
(SortedSet
s
)
|
public static SortedMap
synchronizedSortedMap
(SortedMap
m
)
|
These methods create synchronized, thread-safe versions of the supplied collection. This is useful if you’re planning to access the collection from more than one thread. We’ll talk a more about this later in this chapter.
Furthermore, you can use the Collections
class to
create read-only versions of any collection:
public static Collection
unmodifiableCollection
(Collection
c
)
|
public static Set
unmodifiableSet
(Set
s
)
|
public static List
unmodifiableList
(List
list
)
|
public static Map
unmodifiableMap
(Map
m
)
|
public static SortedSet
unmodifiableSortedSet
(SortedSet
s
)
|
public static SortedMap
unmodifiableSortedMap
(SortedMap
m
)
|
Collections
includes other
methods for performing common operations
like sorting. Sorting comes in two varieties:
-
public static void
sort
(List
list
)
This method sorts the given list. You can use this method only on lists whose elements implement the
java.lang.Comparable
interface. Luckily, many classes already implement this interface, includingString
,Date
,BigInteger
, and the wrapper classes for the primitive types (Integer
,Double
, etc.).-
public static void
sort
(List
list
, Comparator
c
)
Use this method to sort a list whose elements don’t implement the
Comparable
interface. The suppliedjava.util.Comparator
does the work of comparing elements. You might, for example, write anImaginaryNumber
class and want to sort a list of them. You would then create aComparator
implementation that knew how to compare two imaginary numbers.
Collections
gives you some other interesting
capabilities, too. If you’re interested in finding out more,
check out the min( )
, max( )
,
binarySearch( )
, and reverse( )
methods.
Collections is a bread-and-butter topic, which means it’s hard to make exciting examples about it. The example in this section reads a text file, parses all its words, counts the number of occurrences, sorts them, and writes the results to another file. It will give you a good feel for how to use collections in your own programs.
//file: WordSort.java import java.io.*; import java.util.*; public class WordSort { public static void main(String[] args) throws IOException { // get the command-line arguments if (args.length < 2) { System.out.println("Usage: WordSort inputfile outputfile"); return; } String inputfile = args[0]; String outputfile = args[1]; /* Create the word map. Each key is a word and each value is an * Integer that represents the number of times the word occurs * in the input file. */ Map map = new TreeMap( ); // read every line of the input file BufferedReader in = new BufferedReader(new FileReader(inputfile)); String line; while ((line = in.readLine( )) != null) { // examine each word on the line StringTokenizer st = new StringTokenizer(line); while (st.hasMoreTokens( )) { String word = st.nextToken( ); Object o = map.get(word); // if there's no entry for this word, add one if (o == null) map.put(word, new Integer(1)); // otherwise, increment the count for this word else { Integer count = (Integer)o; map.put(word, new Integer(count.intValue( ) + 1)); } } } in.close( ); // get the map's keys and sort them List keys = new ArrayList(map.keySet( )); Collections.sort(keys); // write the results to the output file PrintWriter out = new PrintWriter(new FileWriter(outputfile)); Iterator iterator = keys.iterator( ); while (iterator.hasNext( )) { Object key = iterator.next( ); out.println(key + " : " + map.get(key)); } out.close( ); } }
Suppose, for example, that you have an input file named Ian Moore.txt:
Well it was my love that kept you going Kept you strong enough to fall And it was my heart you were breaking When he hurt your pride So how does it feel How does it feel How does it feel How does it feel
You could run the example on this file using the following command line:
java WordSort "Ian Moore.txt" count.txt
The output file, count.txt, looks like this:
And : 1 How : 3 Kept : 1 So : 1 Well : 1 When : 1 breaking : 1 does : 4 enough : 1 fall : 1 feel : 4 going : 1 he : 1 heart : 1 how : 1 hurt : 1 it : 6 kept : 1 love : 1 my : 2 pride : 1 strong : 1 that : 1 to : 1 was : 2 were : 1 you : 3 your : 1
The results are case-sensitive: “How” is recorded
separately from “how”. You could modify this behavior by
converting words to all lowercase after retrieving them from
the
StringTokenizer
:
String word = st.nextToken().toLowerCase( );
If a collection will be accessed by more than one thread (see Chapter 8), things get a little tricky. Operations on the collection should be fast, but on the other hand, you need to make sure that different threads don’t step on each other’s toes with respect to the collection.
The Collections
class provides methods that will
create a thread-safe version of any Collection
.
There are methods for each subtype of Collection
.
The following example shows how to create a thread-safe
List
:
List list = new ArrayList( ); List syncList = Collections.synchronizedList(list);
Although synchronized collections are thread-safe, the
Iterator
s returned from them are not. This is an
important point. If you obtain an Iterator
from a
collection, you should do your own synchronization to ensure that the
collection does not change as you’re iterating through its
elements. You can do this with the synchronized
keyword:
synchronized(syncList) { Iterator iterator = syncList.iterator( ); // do stuff with the iterator here }
WeakHashMap
is an especially interesting
collection. In some respects, it looks and behaves like a
HashMap
. What’s interesting is that the
key values in WeakHashMap
are allowed to be
harvested by the garbage collector.
WeakHashMap
makes use of weak
references. As you’ll
recall, objects in Java are garbage-collected when there are no
longer any references to them. A weak reference
is a special kind of reference that doesn’t prevent the garbage
collector from cleaning up the referenced object. Weak references and
their siblings, soft references and
phantom references, are implemented in the
java.lang.ref
package. We won’t go into
detail here; just the concept of a weak reference is important.
Why is WeakHashMap
useful? It means you
don’t have to remove key/value pairs from a
Map
when you’re finished with them. Normally
if you removed all references to a key object in the rest of your
application, the Map
would still contain a
reference and keep the object “alive.”
WeakHashMap
changes this; once you remove
references to a key object in the rest of the application, the
WeakHashMap
lets go
of it too.
[34] In C++, where classes don’t derive from a single
Object
class that supplies a base type and common
methods, the elements of a collection would usually be derived from
some common collectable class. This forces the use of multiple
inheritance and brings its associated problems.
[35] If you’re familiar
with earlier versions of Java, it will help you to know that
Iterator
is the successor to
Enumeration
, kind of an
Enumeration
on steroids.
Get Learning Java now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.