Chapter 4. The Document Management System
The Challenge
After successfully implementing an advanced Bank Statements Analyzer for Mark Erbergzuck you decide to run some errands—including going to an appointment with your dentist. Dr. Avaj has run her practice successfully for many years. Her happy patients retain their white teeth well into old age. The downside of such a successful practice is that every year more and more patient documents get generated. Every time she needs to find a record of prior treatment, her assistants spend longer and longer searching their filing cabinets.
She realizes that it’s time to automate the process of managing these documents and keeping track of them. Luckily, she has a patient who can do that for her! You are going to help by writing software for her that manages these documents and enables her to find the information that will allow her practice to thrive and grow.
The Goal
In this chapter you’ll be learning about a variety of different software development principles. Key to the design of managing documents is an inheritance relationship, which means extending a class or implementing an interface. In order to do this the right way you’ll get to understand the Liskov Substitution Principle, named after famed computer scientist Barbara Liskov.
Your understanding of when to use inheritance will get fleshed out with a discussion of the “Composition over Inheritance” principle.
Finally, you’ll extend your knowledge of how to write automated test code by understanding what makes a good and maintainable test. Now that we’ve spoiled the plot of this chapter, let’s get back to understanding what requirements Dr. Avaj has for the Document Management System.
Note
If at any point you want to look at the source code for this
chapter, you can look at the package com.iteratrlearning.shu_book.chapter_04
in the book’s code repository.
Document Management System Requirements
A friendly cup of tea with Dr. Avaj has revealed that she has the documents that she wants to manage as files on her computer. The Document Management System needs to be able to import these files and record some information about each file that can be indexed and searched. There are three types of documents that she cares about:
- Reports
-
A body of text detailing some consultation of operation on a patient.
- Letters
-
A text document that gets sent to an address. (You’re probably familiar with these already, come to think of it.)
- Images
-
The dental practice often records x-rays or photos of teeth and gums. These have a size.
In addition, all documents need to record the path to the file that is being managed and what patient the document is about. Dr. Avaj needs to be able to search these documents, and query whether each of the attributes about a different type of document contains certain pieces of information; for example, to search for letters where the body contains “Joe Bloggs.”
During the conversation, you also established that Dr. Avaj might wish to add other types of documents in the future.
Fleshing Out the Design
When approaching this problem, there are lots of big design choices to make and modeling approaches that we could take. These choices are subjective, and you’re welcome to try to code up a solution to Dr. Avaj’s problem before or after reading this chapter. In “Alternative Approaches” you can see the reasons why we avoid different choices and the overarching principles behind them.
One good first step to approaching any program is to start with test-driven development (TDD), which is what we did when writing the book’s sample solution. We won’t be covering TDD until Chapter 5, so let’s begin with thinking about the behaviors that your software needs to perform and incrementally fleshing out the code that implements these behaviors.
The Document Management System should be able to import documents on request
and add them into its internal store of documents. In order to fulfill this
requirement, let’s create the DocumentManagementSystem
class and add two
methods:
void importFile(String path)
-
Takes a path to a file that our user wants to import to the Document Management System. As this is a public API method that might take input from users in a production system, we take our path as a
String
rather than relying on a more type-safe class likejava.nio.Path
orjava.io.File
. List<Document> contents()
-
Returns a list of all the documents that the Document Management System currently stores.
You’ll notice that contents()
returns a list of some Document
class.
We’ve not said what this class entails yet, but it’ll reappear in due course.
For now, you can pretend that it’s an empty class.
Importers
A key characteristic of this system is that we need to be able to import documents of different types. For the purposes of this system you can rely on the files’ extensions in order to decide how to import them, since Dr. Avaj has been saving files with very specific extensions. All her letters have the .letter extension, reports have .report, and .jpg is the only image format used.
The simplest thing to do would be to just throw all the code for the importing mechanism into a single method, as shown in Example 4-1.
Example 4-1. Switch of extension example
switch
(
extension
)
{
case
"letter"
:
// code for importing letters.
break
;
case
"report"
:
// code for importing reports.
break
;
case
"jpg"
:
// code for importing images.
break
;
default
:
throw
new
UnknownFileTypeException
(
"For file: "
+
path
);
}
This approach would have solved the problem in question but would be hard to
extend. Every time you want to add another type of file that gets
processed you would need to implement another entry in the switch
statement.
Over time this method would become intractably long and hard to read.
If you keep your main class nice and simple and split out different
implementation classes for importing different types of documents, then it’s
easy to locate and understand each importer in isolation.
In order to support different document types, an Importer
interface is
defined. Each Importer
will be a class that can import a different type of file.
Now that we know we need an interface to import the files, how should we represent
the file that is going to be imported? We have a couple of different options:
use a plain String
to represent the path of the file, or use a class that represents
a file, like java.io.File
.
You could make the case that we should apply the principle of strong typing here:
take a type that represents the file and reduce the scope for errors versus using a String
.
Let’s take that approach and use a java.io.File
object as the parameter
in our Importer
interface to represent the file being imported, as shown in Example 4-2.
Example 4-2. Importer
interface
Importer
{
Document
importFile
(
File
file
)
throws
IOException
;
}
You might be asking, Why don’t you use a File
for
the public API of DocumentManagementSystem
as well then? Well, in the case
of this application, our public API would probably be wrapped up in some kind of
user interface, and we aren’t sure what form that is taking files in. As a
result we kept things simple and just used a String
type.
The Document Class
Let’s also define the Document
class at this point in time. Each document will have
multiple attributes that we can search on. Different documents have different
types of attributes. We have several different options that we can consider the
pros and cons of when defining the Document
.
The first and simplest way to represent a document would be to use a
Map<String, String>
, which is a map from attribute names to values associated
with those attributes. So why not just pass a Map<String, String>
around
through the application? Well, introducing a domain class to model a single
document is not just drinking the OOP Koolaid, but also provides a series of
practical improvements in application maintability and readability.
For a start, the value of giving concrete names to components within an application cannot be overstated. Communication is King! Good teams of software developers use a Ubiquitous Language to describe their software. Matching the vocabulary that you use within the code of your application to the vocabulary that you use to talk to clients like Dr. Avaj makes things a lot easier to maintain. When you have a conversation with a colleague or client you will invariably need to agree upon some common language with which to describe different aspects of the software. By mapping this to the code itself, it makes it really easy to know what part of the code to change. This is called discoverability.
Note
The term Ubiquitous Language was coined by Eric Evans and originates in Domain Driven Design. It refers to the use of a common language that is clearly degined and shared between both developers and users.
Another principle that should encourage you to introduce a class to model
a document is strong typing. Many people use this term to refer to the nature of
a programming language, but here we’re talking about the more practical use
of strong typing in implementing your software. Types allow us to restrict the
way in which data is used. For example, our Document
class is immutable: once
it has been created you can’t change, or mutate, any of its attributes. Our
Importer
implementations create the documents; nothing else modifies them. If
you ever see a Document
with an error in one of its attributes, you can narrow
the source of the bug down to the specific Importer
that created the Document
.
You can also infer from the immutability that it’s possible to index or cache any
information associated with the Document
and you know that it will be correct
forever, since documents are immutable.
Another design choice that developers might consider when modeling their
Document
would be make the Document
extend HashMap<String, String>
.
At first that seems great because the HashMap
has all the functionality you
need to model a Document
. However, there are several reasons why this is a
bad choice.
Software design is often as much about restricting functionality
that is undesirable as it is about building things that you do want. We would
have instantly thrown away the aforementioned benefits from immutability
by allowing anything in the application to modify the Document
class if
it were just a subclass of HashMap
. Wrapping the collection also gives us
an opportunity to give more meaningful names to the methods, instead of, for example, looking
up an attribute by calling the get()
method, which doesn’t really mean anything!
Later on we’ll go into more detail about inheritance versus composition, because this is
really a specific example of that discussion.
In short, domain classes allow us to name a concept and restrict the possible
behaviors and values of this concept in order to improve discoverability and
reduce the scope for bugs. As a result, we’ve chosen to model the Document
as shown in Example 4-3. If you’re wondering why it isn’t public
like most interfaces, this is discussed later in “Scoping and Encapsulation Choices”.
Example 4-3. Document
public
class
Document
{
private
final
Map
<
String
,
String
>
attributes
;
Document
(
final
Map
<
String
,
String
>
attributes
)
{
this
.
attributes
=
attributes
;
}
public
String
getAttribute
(
final
String
attributeName
)
{
return
attributes
.
get
(
attributeName
);
}
}
One final thing to note about Document
is that it has a package-scoped
constructor. Often Java classes make their constructor public
, but this can
be a bad choice as it allows code anywhere in your project to create
objects of that type. Only code in the Document Management System should
be able to create Documents
, so we keep the constructor package scoped
and restrict access to only the package that the Document Management System
lives in.
Attributes and Hierarchical Documents
In our Document
class we used Strings
for attributes. Doesn’t this go against
the principle of strong typing? The answer here is yes and no. We are storing
attributes as text so that they can be searched through a text-based search.
Not only that, but we want to ensure that all attributes are created in a very
generic form that is independent of the Importer
that created them. Strings
aren’t a bad choice as such in this context. It should be noted that passing
Strings
around throughout an application in order to represent information is
often considered a bad idea. In contrast with something being strongly typed,
this is termed stringly typed!
In particular, if more complicated use was being made of the attribute values, then having different attribute types parsed out would be useful. For example, if we wanted to be able to find addresses within a certain distance or images with a height and width less than a certain size, then having strongly typed attributes would be a boon. It would be a lot easier to make comparisons with a width value that is an integer. In the case of this Document Management System, however, we simply don’t need that functionality.
You could design the Document Management System with a class hierarchy for
Documents
that models the Importer
hierarchy. For example, a ReportImporter
imports instances of the Report
class that extends the Document
class. This
passes our basic sanity check for subclassing. In other words, it allows you to
say a Report
is a Document
and it makes sense as a sentence. We chose not
to go down that direction, however, as the right way to model classes in an OOP
setting is to think in terms of behavior and data.
The documents are all modeled very generically in terms of named attributes, rather than specific fields that exist within different subclasses. Additionally, as far as this system is concerned, documents have very little behavior associated with them. There was simply no point in adding a class hierarchy here when it provided no benefit. You might think that this statement in and of itself is a little arbitrary, but it informs us of another principle: KISS.
You learned about the KISS principle in Chapter 2. KISS means that designs are better if they are kept simple. It’s often very hard to avoid unnecessary complexity, but it’s worth trying hard to do so. Whenever someone says, “we might need X” or “it would be cool if we also did Y,” just say No. Bloated and complex designs are paved with good intentions around extensibility and code that is a nice-to-have rather than must-have.
Implementing and Registering Importers
You can implement the Importer
interface to look up different types of files.
Example 4-4 shows the way that images are imported. One of the
great things about Java’s core library is that it provides a lot of built-in
functionality right out of the box. Here we read an image file using the
ImageIO.read
method and then extract the width and height of the image from
the resulting BufferedImage
object.
Example 4-4. ImageImporter
import
static
com
.
iteratrlearning
.
shu_book
.
chapter_04
.
Attributes
.*;
class
ImageImporter
implements
Importer
{
@Override
public
Document
importFile
(
final
File
file
)
throws
IOException
{
final
Map
<
String
,
String
>
attributes
=
new
HashMap
<>();
attributes
.
put
(
PATH
,
file
.
getPath
());
final
BufferedImage
image
=
ImageIO
.
read
(
file
);
attributes
.
put
(
WIDTH
,
String
.
valueOf
(
image
.
getWidth
()));
attributes
.
put
(
HEIGHT
,
String
.
valueOf
(
image
.
getHeight
()));
attributes
.
put
(
TYPE
,
"IMAGE"
);
return
new
Document
(
attributes
);
}
}
Attribute names are constants defined in the Attributes
class. This avoids bugs
where different importers end up using different strings for the same attribute
name; for example, "Path"
versus "path"
. Java itself doesn’t have a direct concept
of a constant as such, Example 4-5 shows the commonly used idiom. This
constant is public
because we want to be able to use it from different importers,
though you may well have a private
or package
scoped constant instead. The use
of the final
keyword ensures that it can’t be reassigned to and static
ensures
that there is only a single instance per class.
Example 4-5. How to define a constant in Java
public
static
final
String
PATH
=
"path"
;
There are importers for all three different types of files and you will see the
other two implemented in “Extending and Reusing Code”. Don’t worry, we’re not
hiding anything up our sleeves. In order to be able to use the Importer
classes
when we import files, we also need to register the importers to look them up.
We use the extension of the file that we want to import as the key of the Map
, as shown in Example 4-6.
Example 4-6. Registering the importers
private
final
Map
<
String
,
Importer
>
extensionToImporter
=
new
HashMap
<>();
public
DocumentManagementSystem
()
{
extensionToImporter
.
put
(
"letter"
,
new
LetterImporter
());
extensionToImporter
.
put
(
"report"
,
new
ReportImporter
());
extensionToImporter
.
put
(
"jpg"
,
new
ImageImporter
());
}
Now that you know how to import documents, we can implement search. We won’t be
focusing on the most efficient way to implement searching of documents here
since we’re not trying to implement Google, just get the information to Dr. Avaj
that she requires. A conversation with Dr. Avaj revealed that she wanted to be
able to look up information about different attributes of a Document
.
Her requirements could be met by just being able to find subsequences within
attribute values. For example, she might want to search for documents that have
a patient called Joe, and with Diet Coke in the body. We thus devised a very
simple query language that consisted of a series of attribute name and
substring pairs separated by commas. Our aforementioned query would be written
as "patient:Joe,body:Diet Coke"
.
Since the search implementation keeps things simple rather than trying to be
highly optimized, it just does a linear scan over all the documents recorded in
the system and tests each one against the query. The query String
that is
passed to the search
method is parsed into a Query
object that can then
be tested against each Document
.
The Liskov Substitution Principle (LSP)
We’ve talked about a few specific design decisions related to classes—for
example, modeling different Importer
implementations with classes, and why we
didn’t introduce a class hierarchy for the Document
class and why we didn’t
just make Document
extend HashMap
. But really there’s a broader principle
at stake here, one that allows us to generalize these examples into an approach
that you can use in any piece of software. This is called the Liskov
Substitution Principle (LSP) and it helps us understand how to subclass and
implement interfaces correctly. LSP forms the L of the SOLID principles that
we’ve been referring to throughout this book.
The Liskov Substitution Principle is often stated in these very formal terms, but is actually a very simple concept. Let’s demystify some of this terminology. If you hear type in this context, just think of a class or an interface. The term subtype means establish a parent-to-child relationship between types; in other words, extend a class or implement an interface. So informally you can think of this as meaning that child classes should maintain the behavior they inherit from their parents. We know, we know—it sounds like an obvious statement, but we can be more specific and split out LSP into four distinct parts:
- Preconditions cannot be strengthened in a subtype
-
A precondition establishes the conditions under which some code will work. You can’t just assume what you’ve written will work anyway, anyhow, anywhere. For example, all our
Importer
implementations have the precondition that the file being imported exists and is readable. As a result, theimportFile
method has validation code before anyImporter
is invoked, as can be seen in Example 4-7.Example 4-7. importFile definition
public
void
importFile
(
final
String
path
)
throws
IOException
{
final
File
file
=
new
File
(
path
);
if
(!
file
.
exists
())
{
throw
new
FileNotFoundException
(
path
);
}
final
int
separatorIndex
=
path
.
lastIndexOf
(
'.'
);
if
(
separatorIndex
!=
-
1
)
{
if
(
separatorIndex
==
path
.
length
())
{
throw
new
UnknownFileTypeException
(
"No extension found For file: "
+
path
);
}
final
String
extension
=
path
.
substring
(
separatorIndex
+
1
);
final
Importer
importer
=
extensionToImporter
.
get
(
extension
);
if
(
importer
==
null
)
{
throw
new
UnknownFileTypeException
(
"For file: "
+
path
);
}
final
Document
document
=
importer
.
importFile
(
file
);
documents
.
add
(
document
);
}
else
{
throw
new
UnknownFileTypeException
(
"No extension found For file: "
+
path
);
}
}
LSP means that you can’t require any more restrictive preconditions than your parent required. So, for example, you can’t require your document to be smaller than 100KB in size if your parent should be able to import any size of document.
- Postconditions cannot be weakened in a subtype
-
This might sound a bit confusing because it reads a lot like the first rule. Postconditions are things that have to be true after some code has run. For example, after
importFile()
has run, if the file in question is valid it must be in the list of documents returned bycontents()
. So if the parent has some kind of side effect or returns some value, then the child must do so as well. - Invariants of the supertype must be preserved in a subtype
-
An invariant is something that never changes, like the ebb and flow of the tides. In the context of inheritance, we want to make sure that any invariants that are expected to be maintained by the parent class should also be maintained by the children.
- The History Rule
-
This is the hardest aspect of LSP to understand. In essence, the child class shouldn’t allow state changes that your parent disallowed. So, in our example program we have an immutable
Document
class. In other words, once it has been instantiated you can’t remove, add, or alter any of the attributes. You shouldn’t subclass thisDocument
class and create a mutableDocument
class. This is because any user of the parent class would expect certain behavior in response to calling methods on theDocument
class. If the child were mutable, it could violate callers’ expectations about what calling those methods does.
Alternative Approaches
You could have taken a completely different approach when it comes to designing the Document Management System. We’ll take a look at some of these alternatives now as we think they are instructive. None of the choices could be considered wrong as such, but we do think the chosen approach is best.
Making Importer a Class
You could have chosen to make a class hierarchy for importers, and have a class
at the top for the Importer
rather than an interface. Interfaces and classes
provide a different set of capabilities. You can implement multiple interfaces,
while classes can contain instance fields and it’s more usual to have method
bodies in classes.
In this case the reason to have a hierarchy is to enable different importers to be used. You’ve already heard about our motivation for avoiding brittle class-based inheritance relationships, so it should be pretty clear that using interfaces is a better choice here.
That’s not to say that classes wouldn’t be a better choice elsewhere. If you want to model a strong is a relationship in your problem domain that involves state or a lot of behavior, then class-based inheritance is more appropriate. It’s just not the choice we think is most appropriate here.
Scoping and Encapsulation Choices
If you have taken the time to peruse the code you might notice that the
Importer
interface, its implementations, and our Query
class are all
package scoped. Package scope is the default scope, so if you see a class file
with class Query
at the top you know it’s package scoped, and if it says public
class Query
it’s public scoped. Package scoping means that other classes
within the same package can see or have access to the class, but no one
else can. It’s a cloaking device.
A strange thing about the Java ecosystem is that even though package scope is
the default scope, whenever we’ve been involved in software development projects
there are always more public
-scoped classes than package-scoped ones. Perhaps
the default should have been public
all along, but either way package scope
is a really useful tool. It helps you encapsulate these kinds of design
decisions. A lot of this section has commented on the different choices that
are available to you around designing the system, and you may want to refactor
to one of these alternative designs when maintaining the system. This would
be harder if we leaked details about this implementation outside of the package in
question. Through diligent use of package scoping you can stop classes outside
of the package making so many assumptions about that internal design.
We think it’s also worth reiterating that this is simply a justification and explanation of these design choices. There’s nothing inherently wrong with making other choices listed in this section—they may work out to be more appropriate depending on how the application evolves over time.
Extending and Reusing Code
When it comes to software, the only constant is change. Over time you may want to add features to your product, customer requirements may change, and regulations could force you alter your software. As we alluded to earlier, there may be more documents that Dr. Avaj would like to add to our Document Management System. In fact, when we first came to showcase the software that we’ve written for her she immediately realized that invoicing clients was something that she also wanted to keep track of in this system. An invoice is a document with a body and an amount and has an .invoice extension. Example 4-8 shows an example invoice.
Example 4-8. Invoice example
Dear Joe Bloggs Here is your invoice for the dental treatment that you received. Amount: $100 regards, Dr Avaj Awesome Dentist
Fortunately for us, all of Dr. Avaj’s
invoices are in the same format. As you can see, we need to extract an
amount of money from this, and the amount line starts with the Amount:
prefix.
The person’s name is at the beginning of the letter on a line with the prefix
Dear
. In fact, our system implements a general method of finding the suffix
of a line with a given prefix, shown in Example 4-9. In this example,
the field lines
has already been initialized with the lines of the file that
we’re importing. We pass this method a prefix
—for example, “Amount:”—and it
associates the rest of the line, the suffix, with a provided attribute name.
Example 4-9. addLineSuffix definition
void
addLineSuffix
(
final
String
prefix
,
final
String
attributeName
)
{
for
(
final
String
line:
lines
)
{
if
(
line
.
startsWith
(
prefix
))
{
attributes
.
put
(
attributeName
,
line
.
substring
(
prefix
.
length
()));
break
;
}
}
}
We in fact have a similar concept when we try to import a letter. Consider the
example letter presented in Example 4-10. Here you can extract the name of the
patient by looking for a line starting with Dear
. Letters also have addresses
and bodies of text that you want to extract from the contents of the text file.
Example 4-10. Letter example
Dear Joe Bloggs 123 Fake Street Westminster London United Kingdom We are writing to you to confirm the re-scheduling of your appointment with Dr. Avaj from 29th December 2016 to 5th January 2017. regards, Dr Avaj Awesome Dentist
We also have a similar problem when it comes to importing patient reports.
Dr. Avaj’s reports prefix the name of the patient with Patient:
and have a body
of text to include, just like letters. You can see an example of a report in
Example 4-11.
Example 4-11. Report example
Patient: Joe Bloggs On 5th January 2017 I examined Joe's teeth. We discussed his switch from drinking Coke to Diet Coke. No new problems were noted with his teeth.
So one option here would be to have all three text-based importers implement the same method to find the suffixes of text lines with a given prefix that was listed in Example 4-9. Now if we were charging Dr. Avaj based on the number of lines of code that we had written, this would be a great strategy. We could triple the amount of money that we would make for basically the same work!
Sadly (or maybe not so sadly, given the aforementioned incentives), customers rarely pay based on the number of lines of code produced. What matters are the requirements that the customer wants. So we really want to be able to reuse this code across the three importers. In order to reuse the code we need to actually have it live in some class. You have essentially three options to consider, each with pros and cons:
-
Use a utility class
-
Use inheritance
-
Use a domain class
The simplest option to start with is to create a utility class. You could
call this ImportUtil
. Then every time you wanted to have a method that needs
to be shared between different importers it could go in this utility class. Your
utility class would end up being a bag of static methods.
While a utility class is nice and simple, it’s not exactly the pinnacle of
object-oriented programming. The object-oriented style involves having concepts
in your application be modeled by classes. If you want to create a thing, then
you invoke new Thing()
for whatever your thing is. Attributes and behavior
associated with the thing should be methods on the Thing
class.
If you follow this principle of modeling real-world objects as classes, it does genuinely make it easier to understand your
application because it gives you a structure and maps a mental model of your
domain onto your code. You want to alter the way that letters are imported?
Well then edit the LetterImporter
class.
Utility classes violate this expectation and often end up turning into bundles of procedural code with no single responsibility or concept. Over time, this can often lead to the appearance of a God Class in our codebase; in other words, a single large class that ends up hogging a lot of responsibility.
So what should you do if you want to associate this behavior to a concept?
Well, the next most obvious approach might be to use inheritance. In this approach
you would have the different importers extend a TextImporter
class. You could
then place all the common functionality on that class and reuse it in subclasses.
Inheritance is a perfectly solid choice of design in many circumstances. You’ve already seen the Liskov Substitution Principle and how it puts constraints on the correctness of our inheritance relationship. In practice, inheritance is often a poor choice when the inheritance fails to model some real-world relationship.
In this case, a TextImporter
is an Importer
and we can
ensure that our classes follow the LSP rules, but it doesn’t really seem like a
strong concept to work with. The issue with inheritance relationships that don’t
correspond to real-world relationships is that they tend to be brittle. As your
application evolves over time you want abstractions that evolve with the
application rather than against it. As a rule of thumb, it’s a bad idea to introduce
an inheritance relationship purely to enable code reuse.
Our final choice is to model the text file using a domain class. To use this
approach we would model some
underlying concept and build out our different importers by invoking methods on
top of the underlying concept. So what’s the concept in question here? Well,
what we’re really trying to do is manipulate the contents of a text file, so
let’s call the class a TextFile
. It’s not original or creative, but that’s
the point. You know where the functionality for manipulating text files lies,
because the class is named in a really dead simple manner.
Example 4-12 shows the definition of the class and its fields.
Note that this isn’t a subclass of a Document
because a document shouldn’t
be coupled to just text files—we may import binary files like images as well.
This is just a class that models the underlying concept of a text file and has
associated methods for extracting data from text files.
Example 4-12. TextFile definition
class
TextFile
{
private
final
Map
<
String
,
String
>
attributes
;
private
final
List
<
String
>
lines
;
// class continues ...
This is the approach that we pick in the case of importers. We think this allows us to model our problem domain in a flexible way. It doesn’t tie us into a brittle inheritance hierarchy, but still allows us to reuse the code. Example 4-13 shows how to import invoices. The suffixes for the name and amount are added, along with setting the type of the invoice to be an amount.
Example 4-13. Importing invoices
@Override
public
Document
importFile
(
final
File
file
)
throws
IOException
{
final
TextFile
textFile
=
new
TextFile
(
file
);
textFile
.
addLineSuffix
(
NAME_PREFIX
,
PATIENT
);
textFile
.
addLineSuffix
(
AMOUNT_PREFIX
,
AMOUNT
);
final
Map
<
String
,
String
>
attributes
=
textFile
.
getAttributes
();
attributes
.
put
(
TYPE
,
"INVOICE"
);
return
new
Document
(
attributes
);
}
You can also see another example of an importer that uses the TextFile
class
in Example 4-14. No need to worry about how TextFile.addLines
is implemented;
you can see an explanation of that in Example 4-15.
Example 4-14. Importing letters
@Override
public
Document
importFile
(
final
File
file
)
throws
IOException
{
final
TextFile
textFile
=
new
TextFile
(
file
);
textFile
.
addLineSuffix
(
NAME_PREFIX
,
PATIENT
);
final
int
lineNumber
=
textFile
.
addLines
(
2
,
String:
:
isEmpty
,
ADDRESS
);
textFile
.
addLines
(
lineNumber
+
1
,
(
line
)
->
line
.
startsWith
(
"regards,"
),
BODY
);
final
Map
<
String
,
String
>
attributes
=
textFile
.
getAttributes
();
attributes
.
put
(
TYPE
,
"LETTER"
);
return
new
Document
(
attributes
);
}
These classes weren’t first written like this, though. They evolved into their
current state. When we started coding up the Document Management System, the
first text-based importer, the LetterImporter
, had all of its text
extraction logic written inline in the class. This is a good way to start.
Trying to seek out code to reuse often results in inappropriate abstractions.
Walk before you run.
As we started writing the ReportImporter
it become increasingly apparent that
a lot of the text extraction logic could be shared between the two importers, and
that really they should be written in terms of method invocations upon some
common domain concept that we have introduced here—the TextFile
. In fact, we
even copy and pasted the code that was to be shared between the two classes to
begin with.
That isn’t to say that copy and pasting code is good—far from it. But it’s
often better to duplicate a little bit of code when you start writing some
classes. Once you’ve implemented more of the application, the right
abstraction—e.g., a TextFile
class will become apparent. Only when you know a
little bit more about the right way to remove duplication should you go down
the route of removing the duplication.
In Example 4-15 you can see how the TextFile.addLines
method
was implemented. This is common code used by different Importer
implementations.
Its first argument is a start
index, which tells you which line number to start on.
Then there’s an isEnd
predicate that is applied to the line and returns true
if we’ve
reached the end of the line. Finally, we have the name of the attribute that we’re going to associate
with this value.
Example 4-15. addLines definition
int
addLines
(
final
int
start
,
final
Predicate
<
String
>
isEnd
,
final
String
attributeName
)
{
final
StringBuilder
accumulator
=
new
StringBuilder
();
int
lineNumber
;
for
(
lineNumber
=
start
;
lineNumber
<
lines
.
size
();
lineNumber
++)
{
final
String
line
=
lines
.
get
(
lineNumber
);
if
(
isEnd
.
test
(
line
))
{
break
;
}
accumulator
.
append
(
line
);
accumulator
.
append
(
"\n"
);
}
attributes
.
put
(
attributeName
,
accumulator
.
toString
().
trim
());
return
lineNumber
;
}
Test Hygiene
As you learned in Chapter 2, writing automated tests has a lot of benefits in terms of software maintainability. It enables us to reduce the scope for regressions and understand which commit caused them. It also enables us to refactor our code with confidence. Tests aren’t a magic panacea, though. They require that we write and maintain a lot of code in order to get these benefits. As you know, writing and maintaining code is a difficult proposition, and many developers find that when they first start writing automated tests that they can take a lot of developer time.
In order to solve the problem of test maintainability you need to get to grips with test hygiene. Test hygiene means to keep your test code clean and ensure that it is maintained and improved along with your codebase under test. If you don’t maintain and treat your tests, over time they will become a burden on your developer productivity. In this section you’ll learn about a few key points that can help to keep tests hygienic.
Test Naming
The first thing to think about when it comes to tests is their naming. Developers can get highly opinionated about naming—it’s an easy topic to talk about a lot because everyone can relate to it and think about the problem. We think the thing to remember is that there’s rarely a clear, really good name for something, but there are many, many, bad names.
The first test we wrote for the Document Management System was testing that
we import a file and create a Document
. This was written before we had
introduced the concept of an Importer
and weren’t testing Document
-specific
attributes. The code is in Example 4-16.
Example 4-16. Test for importing files
@Test
public
void
shouldImportFile
()
throws
Exception
{
system
.
importFile
(
LETTER
);
final
Document
document
=
onlyDocument
();
assertAttributeEquals
(
document
,
Attributes
.
PATH
,
LETTER
);
}
This test was named shouldImportFile
. The key driving principles when it
comes to test naming are readability, maintainability, and acting as executable
documentation. When you see a report of a test class being run, the names
should act as statements that document what functionality works and what does
not. This allows a developer to easily map from application behavior to a test
that asserts that this behavior is implemented. By reducing the impedence
mismatch between behavior and code, we make it easier for other developers to
understand what is happening in the future. This is a test that confirms that the
document management
system imports a file.
There are lots of naming anti-patterns, however. The worst anti-pattern is to
name a test something completely nondescript—for example, test1
. What on earth
is test1
testing? The reader’s patience? Treat people who are reading your code
like you would like them to treat you.
Another common test naming anti-pattern is just named after a concept or a noun—for example, file
or document
. Test names should describe
the behavior under test, not a concept. Another test naming anti-pattern is to
simply name the test after a method that is invoked during testing, rather than
the behavior. In this case the test might be named importFile
.
You might ask, by naming our test shouldImportFile
haven’t we committed this
sin here? There’s some merit to the accusation, but here we’re just describing
the behavior under test. In fact, the importFile
method is tested by various
tests; for example, shouldImportLetterAttributes
,
shouldImportReportAttributes
, and shouldImportImageAttributes
. None of those
tests are called importFile
—they are all describing more specific
behaviors.
OK, now you know what bad naming looks like, so what is good test naming? You should follow three rules of thumb and use them to drive test naming:
- Use domain terminology
-
Align the vocabulary used in your test names with that used when describing the problem domain or referred by the application itself.
- Use natural language
-
Every test name should be something that you can easily read as a sentence. It should always describe some behavior in a readable way.
- Be descriptive
-
Code will be read many times more often than it is written. Don’t skimp on spending more time thinking of a good name that’s descriptive up front and easier to understand later down the line. If you can’t think of a good name, why not ask a colleague? In golf, you win by putting in the fewest shots. Programming isn’t like that; shortest isn’t necessarily best.
You can follow the convention used in the DocumentManagementSystemTest
of
prefixing test names with the word “should,” or choose not to; that’s merely a matter of
personal preference.
Behavior Not Implementation
If you’re writing a test for a class, a component, or even a system, then you
should only be testing the public behavior of whatever is being tested. In
the case of the Document Management System, we only have tests for the behavior
of our public API in the form of DocumentManagementSystemTest
. In this test
we test the public API of the DocumentManagementSystem
class and thus the
whole system. The API can be seen in Example 4-17.
Example 4-17. Public API of the DocumentManagementSystem class
public
class
DocumentManagementSystem
{
public
void
importFile
(
final
String
path
)
{
...
}
public
List
<
Document
>
contents
()
{
...
}
public
List
<
Document
>
search
(
final
String
query
)
{
...
}
}
Our tests should only invoke these public API methods and not try to inspect the internal state of the objects or the design. This is one of the key mistakes made by developers that leads to hard-to-maintain tests. Relying on specific implementation details results in brittle tests because if you change the implementation detail in question, the test can start to fail even if the behavior is still working. Take a look at the test in Example 4-18.
Example 4-18. Test for importing letters
@Test
public
void
shouldImportLetterAttributes
()
throws
Exception
{
system
.
importFile
(
LETTER
);
final
Document
document
=
onlyDocument
();
assertAttributeEquals
(
document
,
PATIENT
,
JOE_BLOGGS
);
assertAttributeEquals
(
document
,
ADDRESS
,
"123 Fake Street\n"
+
"Westminster\n"
+
"London\n"
+
"United Kingdom"
);
assertAttributeEquals
(
document
,
BODY
,
"We are writing to you to confirm the re-scheduling of your appointment\n"
+
"with Dr. Avaj from 29th December 2016 to 5th January 2017."
);
assertTypeIs
(
"LETTER"
,
document
);
}
One way of testing this letter-importing functionality would have been to write the test as a
unit test on the LetterImporter
class. This would have looked fairly similar:
importing an example file and then making an assert about the result returned
from the importer. In our tests, though, the mere existence of the
LetterImporter
is an implementation detail. In “Extending and Reusing Code”,
you saw numerous other alternative choices for laying out our importer code. By
laying out our tests in this manner, we give ourselves the choice to refactor
our internals to a different design without breaking our tests.
So we’ve said that relying on the behavior of a class relies on using the public
API, but there’s also some parts of the behavior that aren’t usually restricted
just through making methods public or private. For example, we might not want
to rely on the order of documents being being returned from the contents()
method. That isn’t a property that’s restricted by the public API of the
DocumentManagementSystem
class, but simply something that you need to be careful
to avoid doing.
A common anti-pattern in this regard is exposing otherwise private state through a getter or setter in order to make testing easier. You should try to avoid doing this wherever possible as it makes your tests brittle. If you have exposed this state to make testing superficially easier, then you end up making maintaining your application harder in the long run. This is because any change to your codebase that involves changing the way this internal state is represented now also requires altering your tests. This is sometimes a good indication that you need to refactor out a new class that can be more easily and effectively tested.
Don’t Repeat Yourself
“Extending and Reusing Code” extensively discusses how we can remove duplicate
code from our application and where to place the resulting code. The exact
same reasoning around maintenance applies equally to test code. Sadly, developers
often simply don’t bother to remove duplication from tests in the same way as
they would for application code. If you take a look at Example 4-19 you’ll see a test that repeatedly makes asserts about the different
attributes that a resulting Document
has.
Example 4-19. Test for importing images
@Test
public
void
shouldImportImageAttributes
()
throws
Exception
{
system
.
importFile
(
XRAY
);
final
Document
document
=
onlyDocument
();
assertAttributeEquals
(
document
,
WIDTH
,
"320"
);
assertAttributeEquals
(
document
,
HEIGHT
,
"179"
);
assertTypeIs
(
"IMAGE"
,
document
);
}
Normally you would have to look up the attribute name for every attribute and
assert that it is equal to an expected value. In the case of the tests here, this is
a common enough operation that a common method, assertAttributeEquals
, was
extracted with this logic. Its implementation is shown in
Example 4-20.
Example 4-20. Implementing a new assertion
private
void
assertAttributeEquals
(
final
Document
document
,
final
String
attributeName
,
final
String
expectedValue
)
{
assertEquals
(
"Document has the wrong value for "
+
attributeName
,
expectedValue
,
document
.
getAttribute
(
attributeName
));
}
Good Diagnostics
Tests would be no good if they didn’t fail. In fact, if you’ve never seen a test fail how do you know if it’s working at all? When writing tests the best thing to do is to optimize for failure. When we say optimize, we don’t mean make the test run faster when it fails—we mean ensure that it is written in a way that makes understanding why and how it failed as easy as possible. The trick to this is good diagnostics.
By diagnostics we mean the message and information that gets printed out when a test fails. The clearer this message is about what has failed, the easier it is to debug the test failure. You might ask why even bother with this when a lot of the time Java tests are run from within modern IDEs that have debuggers built in? Well, sometimes tests may be run within continuous integration environments, and sometimes they may be from the command line. Even if you’re running them within an IDE it is still helpful to have good diagnostic information. Hopefully, we’ve convinced you of the need for good diagnostics, but what do they look like in code?
Example 4-21 shows a method that asserts that the system only contains a
single document. We will explain the hasSize()
method in a little bit.
Example 4-21. Test that the system contains a single document
private
Document
onlyDocument
()
{
final
List
<
Document
>
documents
=
system
.
contents
();
assertThat
(
documents
,
hasSize
(
1
));
return
documents
.
get
(
0
);
}
The simplest type of assert that JUnit offers us is assertTrue()
, which will
take a boolean value that it expects to be true. Example 4-22 shows how we
could have just used assertTrue
to implement the test. In this case the value
is being checked to equal 0
so that it will fail the shouldImportFile
test
and thus demonstrate the failure diagnostics. The problem with this is that we
don’t get very good diagnostics—just an AssertionError
with no information
in the message shown in Figure 4-1. You don’t know what failed, and you don’t know
what values were being compared. You know nothing, even if your name isn’t Jon
Snow.
Example 4-22. assertTrue example
assertTrue
(
documents
.
size
()
==
0
);
The most commonly used assertion is assertEquals
, which takes two values and
checks they are equal and is overloaded to support primitive values. So here
we can assert that the size of the documents
list is 0
, as shown in
Example 4-23. This produces a slightly better diagnostic as shown in
Figure 4-2, you know that
the expected value was 0
and the actual value was 1
, but it still doesn’t
give you any meaningful context.
Example 4-23. assertEquals example
assertEquals
(
0
,
documents
.
size
());
The best way of making an assert about the size itself is to use a matcher for asserting the collection size as this provides the most descriptive diagnostics. Example 4-24 has our example written in that style and demonstrates the output as well. As Figure 4-3 shows, this is much clearer as to what went wrong without you needing to write any more code.
Example 4-24. assertThat example
assertThat
(
documents
,
hasSize
(
0
));
What is going on here is that JUnit’s assertThat()
is being used. The method assertThat()
takes a value as its first parameter and a Matcher
as its second. The
Matcher
encapsulates the concept of whether a value matches some property and
also its associated diagnostics. The hasSize
matcher is statically imported
from a Matchers
utility class that contains a bundle of different matchers
and checks that the size of a collection is equal to its parameter. These
matchers come from the Hamcrest library, which is a very
commonly used Java library that enables cleaner testing.
Another example of how you can build better diagnostics was shown in
Example 4-20. Here an assertEquals
would have given us the
diagnostic for the attribute’s expected value and actual value. It wouldn’t
have told us what the name of the attribute was, so this was added into the
message string to help us understand failure.
Testing Error Cases
One of the absolute worst and most common mistakes to make when writing software is only to test the beautiful, golden, happy path of your application—the code path that is executed when the sun is shining on you and nothing goes wrong. In practice lots of things can go wrong! If you don’t test how your application behaves in these situations, you’re not going to end up with software that will work reliably in a production setting.
When it comes to importing documents into our Document Management System there are a couple of error cases that might happen. We might try to import a file that doesn’t exist or can’t be read, or we might try to import a file that we don’t know how to extract text from or read.
Our DocumentManagementSystemTest
has a couple of tests, shown in
Example 4-25, that test these two scenarios. In both cases we try to import a
path file that will expose the problem. In order to make an assert about the
desired behavior we use the expected =
attribute of JUnit’s @Test
annotation. This enables you to say Hey listen, JUnit, I’m expecting this test
to throw an exception, it’s of a certain type.
Example 4-25. Testing for error cases
@Test
(
expected
=
FileNotFoundException
.
class
)
public
void
shouldNotImportMissingFile
()
throws
Exception
{
system
.
importFile
(
"gobbledygook.txt"
);
}
@Test
(
expected
=
UnknownFileTypeException
.
class
)
public
void
shouldNotImportUnknownFile
()
throws
Exception
{
system
.
importFile
(
RESOURCES
+
"unknown.txt"
);
}
You may want an alternative behavior to simply throwing an exception in the case of an error, but it’s definitely helpful to know how to assert that an exception is thrown.
Constants
Constants are values that do not change. Let’s
face it—they are one of the few well-named concepts when it comes to computer
programming. The Java programming language doesn’t use an explicit const
keyword like C++ does, but conventionally developers create static
field
fields in order to represent constants. Since many tests consist of
examples of how a part of your computer program should be used, they often
consist of many constants.
It’s a good idea when it comes to constants that have some kind of nonobvious
meaning to give them a proper name that can be used within tests. We do that
extensively through the DocumentManagementSystemTest
, and in fact, have a block
at the top dedicated to declaring constants, shown in Example 4-26.
Example 4-26. Constants
public
class
DocumentManagementSystemTest
{
private
static
final
String
RESOURCES
=
"src"
+
File
.
separator
+
"test"
+
File
.
separator
+
"resources"
+
File
.
separator
;
private
static
final
String
LETTER
=
RESOURCES
+
"patient.letter"
;
private
static
final
String
REPORT
=
RESOURCES
+
"patient.report"
;
private
static
final
String
XRAY
=
RESOURCES
+
"xray.jpg"
;
private
static
final
String
INVOICE
=
RESOURCES
+
"patient.invoice"
;
private
static
final
String
JOE_BLOGGS
=
"Joe Bloggs"
;
Takeaways
-
You recognized the different trade-offs between different implementation approaches.
-
You understood several principles that drive the design of software.
-
You were introduced to the Liskov Substitution Principle as a way to think about inheritance.
-
You learned about situations where inheritance wasn’t appropriate.
Iterating on You
If you want to extend and solidify the knowledge from this section you could try one of these activities:
-
Take the existing sample code and add an implementation for importing prescription documents. A prescription should have a patient, a drug, a quantity, a date, and state the conditions for taking a drug. You should also write a test that checks that the prescription import works.
-
Try implementing the Game of Life Kata.
Completing the Challenge
Dr. Avaj is really pleased with your Document Management System and she now uses it extensively. Her needs are effectively met by the features because you drove your design from her requirements toward application behavior and into your implementation details. This is a theme that you will return to when TDD is introduced in the next chapter.
Get Real-World Software Development now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.