Chapter 1. Interfaces
This book presents three topics:
- Data structures
Starting with the structures in the Java Collections Framework (JCF), you will learn how to use data structures like lists and maps, and you will see how they work.
- Analysis of algorithms
I present techniques for analyzing code and predicting how fast it will run and how much space (memory) it will require.
- Information retrieval
To motivate the first two topics, and to make the exercises more interesting, we will use data structures and algorithms to build a simple web search engine.
Here’s an outline of the order of topics:
We’ll start with the
List
interface and you will write classes that implement this interface two different ways. Then we’ll compare your implementations with the Java classesArrayList
andLinkedList
.Next I’ll introduce tree-shaped data structures and you will work on the first application: a program that reads pages from Wikipedia, parses the contents, and navigates the resulting tree to find links and other features. We’ll use these tools to test the “Getting to Philosophy” conjecture (you can get a preview by reading http://thinkdast.com/getphil).
We’ll learn about the
Map
interface and Java’sHashMap
implementation. Then you’ll write classes that implement this interface using a hash table and a binary search tree.Finally, you will use these classes (and a few others I’ll present along the way) to implement a web search engine, including a crawler that finds and reads pages, an indexer that stores the contents of web pages in a form that can be searched efficiently, and a retriever that takes queries from a user and returns relevant results.
Let’s get started.
Why Are There Two Kinds of List?
When people start working with the Java Collections Framework, they are sometimes confused about ArrayList
and LinkedList
. Why does Java provide two implementations of the List
interface? And how should you choose which one to use? I will answer these questions in the next few chapters.
I’ll start by reviewing interface
s and the classes that implement them, and I’ll present the idea of “programming to an interface”.
In the first few exercises, you’ll implement classes similar to ArrayList
and LinkedList
, so you’ll know how they work, and we’ll see that each of them has pros and cons. Some operations are faster or use less space with ArrayList
; others are faster or smaller with LinkedList
. Which one is better for a particular application depends on which operations it performs most often.
Interfaces in Java
A Java interface
specifies a set of methods; any class that implements this interface
has to provide these methods. For example, here is the source code for Comparable
, which is an interface
defined in the package java.lang
:
public interface Comparable<T> { public int compareTo(T o); }
This interface
definition uses a type parameter, T
, which makes Comparable
a generic type. In order to implement this interface
, a class has to
Specify the type
T
refers to, andProvide a method named
compareTo
that takes an object as a parameter and returns anint
.
For example, here’s the source code for java.lang.Integer
:
public final class Integer extends Number implements Comparable<Integer> { public int compareTo(Integer anotherInteger) { int thisVal = this.value; int anotherVal = anotherInteger.value; return (thisVal<anotherVal ? -1 : (thisVal==anotherVal ? 0 : 1)); } // other methods omitted }
This class extends Number
, so it inherits the methods and instance variables of Number
; and it implements Comparable<Integer>
, so it provides a method named compareTo
that takes an Integer
and returns an int
.
When a class declares that it implements an interface
, the compiler checks that it provides all methods defined by the interface
.
As an aside, this implementation of compareTo
uses the “ternary operator”, sometimes written ?:
. If you are not familiar with it, you can read about it at http://thinkdast.com/ternary.
List Interface
The Java Collections Framework (JCF) defines an interface
called List
and provides two implementations, ArrayList
and LinkedList
.
The interface
defines what it means to be a List
; any class that implements this interface
has to provide a particular set of methods, including add
, get
, remove
, and about 20 more.
ArrayList
and LinkedList
provide these methods, so they can be used interchangeably. A method written to work with a List
will work with an ArrayList
, LinkedList
, or any other object that implements List
.
Here’s a contrived example that demonstrates the point:
public class ListClientExample { private List list; public ListClientExample() { list = new LinkedList(); } private List getList() { return list; } public static void main(String[] args) { ListClientExample lce = new ListClientExample(); List list = lce.getList(); System.out.println(list); } }
ListClientExample
doesn’t do anything useful, but it has the essential elements of a class that encapsulates a List
; that is, it contains a List
as an instance variable. I’ll use this class to make a point, and then you’ll work with it in the first exercise.
The ListClientExample
constructor initializes list
by instantiating (that is, creating) a new LinkedList
; the getter method called getList
returns a reference to the internal List
object; and main
contains a few lines of code to test these methods.
The important thing about this example is that it uses List
whenever possible and avoids specifying LinkedList
or ArrayList
unless it is necessary. For example, the instance variable is declared to be a List
, and getList
returns a List
, but neither specifies which kind of list.
If you change your mind and decide to use an ArrayList
, you only have to change the constructor; you don’t have to make any other changes.
This style is called interface-based programming, or more casually, “programming to an interface” (see http://thinkdast.com/interbaseprog). Here we are talking about the general idea of an interface, not a Java interface
.
When you use a library, your code should only depend on the interface, like List
. It should not depend on a specific implementation, like ArrayList
. That way, if the implementation changes in the future, the code that uses it will still work.
On the other hand, if the interface changes, the code that depends on it has to change, too. That’s why library developers avoid changing interfaces unless absolutely necessary.
Exercise 1
Since this is the first exercise, we’ll keep it simple. You will take the code from the previous section and swap the implementation; that is, you will replace the LinkedList
with an ArrayList
. Because the code programs to an interface, you will be able to swap the implementation by changing a single line and adding an import
statement.
Start by setting up your development environment. For all of the exercises, you will need to be able to compile and run Java code. I developed the examples using Java SE Development Kit 7. If you are using a more recent version, everything should still work. If you are using an older version, you might find some incompatibilities.
I recommend using an interactive development environment (IDE) that provides syntax checking, auto-completion, and source code refactoring. These features help you avoid errors or find them quickly. However, if you are preparing for a technical interview, remember that you will not have these tools during the interview, so you might also want to practice writing code without them.
If you have not already downloaded the code for this book, see the instructions in “Working with the Code”.
In the directory named code
, you should find these files and directories:
build.xml
is an Ant file that makes it easier to compile and run the code.lib
contains the libraries you’ll need (for this exercise, just JUnit).src
contains the source code.
If you navigate into src/com/allendowney/thinkdast
, you’ll find the source code for this exercise:
ListClientExample.java
contains the code from the previous section.ListClientExampleTest.java
contains a JUnit test forListClientExample
.
Review ListClientExample
and make sure you understand what it does. Then compile and run it. If you use Ant, you can navigate to the code
directory and run ant ListClientExample
.
List is a raw type. References to generic type List<E> should be parameterized.
To keep the example simple, I didn’t bother to specify the type of the elements in the List
. If this warning bothers you, you can fix it by replacing each List
or LinkedList
with List<Integer>
or LinkedList<Integer>
.
Review ListClientExampleTest
. It runs one test, which creates a ListClientExample
, invokes getList
, and then checks whether the result is an ArrayList
. Initially, this test will fail because the result is a LinkedList
, not an ArrayList
. Run this test and confirm that it fails.
NOTE: This test makes sense for this exercise, but it is not a good example of a test. Good tests should check whether the class under test satisfies the requirements of the interface; they should not depend on the details of the implementation.
In the ListClientExample
, replace LinkedList
with ArrayList
. You might have to add an import
statement. Compile and run ListClientExample
. Then run the test again. With this change, the test should now pass.
To make this test pass, you only had to change LinkedList
in the constructor; you did not have to change any of the places where List
appears. What happens if you do? Go ahead and replace one or more appearances of List
with ArrayList
. The program should still work correctly, but now it is “overspecified”. If you change your mind in the future and want to swap the interface again, you would have to change more code.
In the ListClientExample
constructor, what happens if you replace ArrayList
with List
? Why can’t you instantiate a List
?
Get Think Data Structures now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.