First, let’s be absolutely clear about what automated testing is not: it is not running tests that are automatically created or generated by software that scans and analyzes your source code. We would call this automatic testing. While automatic testing may have its place (we have not yet been convinced that it offers any value), it is not the kind of testing we are talking about here. In automated testing, we are interested in checking the code’s correct behavior, something that a code analysis tool cannot do.
Automated testing provides you with the ability to execute a suite of tests at the push of a button (or via a single command). These tests are crafted manually, usually by the same developers who created the code being tested. Often the boilerplate code that represents the skeleton of a test (or set of tests) will be automatically generated, but never the actual test code itself.
A test suite is a collection of related tests. The tests in a suite are executed, one at a time, by a piece of software known as a test harness. There are many freely available, open source test harnesses. The most widely used are probably those in the xUnit family (JUnit, NUnit, CppUnit, HttpUnit, etc.). A given test harness is usually specific to a particular programming language or environment.
As the test harness cycles through the tests, it typically does several other things as well. Before each individual test, the harness calls the test suite’s setup routine to initialize the environment in which the test will run. Next, it runs the test and records whether it succeeded or failed (it may also collect details describing any failure). Finally, it calls the suite’s teardown routine to clean up after the test.
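The setup, run, teardown cycle described above can be sketched with Python's standard `unittest` module, which follows the xUnit design. This is only an illustration: the `StackTest` class, the `events` list, and the `run_suite` helper are names invented for this example, and the list exists solely to record the order in which the harness calls each routine.

```python
import unittest

events = []  # records the order in which the harness calls each routine


class StackTest(unittest.TestCase):
    def setUp(self):
        # Called by the harness before every individual test.
        events.append("setUp")
        self.stack = []

    def tearDown(self):
        # Called by the harness after every individual test.
        events.append("tearDown")
        self.stack.clear()

    def test_push(self):
        events.append("test_push")
        self.stack.append(1)
        self.assertEqual(self.stack, [1])

    def test_starts_empty(self):
        events.append("test_starts_empty")
        self.assertEqual(self.stack, [])


def run_suite():
    """Load the suite, run it, and return the collected results."""
    events.clear()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StackTest)
    result = unittest.TestResult()  # stores successes, failures, and errors
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    print(events)
    print("all passed:", outcome.wasSuccessful())
```

Because each test gets a fresh call to `setUp`, the second test sees an empty stack regardless of what the first test did to it.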
The test harness will then report the results of all the tests back to the initiator of the test run. Sometimes this is simply a console log, but often the test harness is invoked by a GUI application that will display the results graphically.
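As a rough sketch of the console-log case, the example below runs a tiny suite with Python's `unittest.TextTestRunner` and captures the report it would normally print. The test class and the `run_and_report` helper are hypothetical, and one test fails on purpose so that the report has a failure to describe.

```python
import io
import unittest


def run_and_report():
    """Run a two-test suite and return (result, console_log)."""

    # Defined locally so that ordinary test discovery does not pick up
    # the deliberate failure below.
    class ArithmeticTest(unittest.TestCase):
        def test_addition(self):
            self.assertEqual(2 + 2, 4)

        def test_deliberate_failure(self):
            # Fails on purpose so the report has something to show.
            self.assertEqual(2 + 2, 5)

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArithmeticTest)
    stream = io.StringIO()  # capture the console log instead of printing it
    runner = unittest.TextTestRunner(stream=stream, verbosity=2)
    result = runner.run(suite)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, log = run_and_report()
    print(log)
    for test, traceback_text in result.failures:
        print("failed:", test.id())
```

The `result.failures` list pairs each failed test with its traceback, which is the kind of detail a GUI runner would turn into a clickable link.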
In the GUI case, while the tests are running, you will usually see a green progress bar that grows toward 100% as each test is completed. This progress bar will stay green as long as all the tests are succeeding. However, as soon as a single test fails, the progress bar turns red and stays red as the remaining tests are run. Other parts of the display will tell the developer the details of the failures (how many, the names of the failed tests, etc.). When a GUI test runner is integrated with an IDE, failures are usually displayed as clickable links that will open the editor to the code line where the failure occurred.
Only very small codebases would have a single suite of tests. Typically, tests are organized into a collection of test suites where each suite contains a set of related tests. For example, for unit tests in an object-oriented language, you will usually create a separate test suite for every class (or group of closely related classes). Each test in a suite for a particular class would test a single method of that class.
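One possible layout for the suite-per-class convention is sketched below in Python's `unittest`. The `Account` and `Ledger` classes are hypothetical stand-ins for application code; each gets its own suite, each test exercises a single method, and `build_all_suites` (also an invented name) gathers the suites into one collection.

```python
import unittest


class Account:
    """Hypothetical class under test."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class Ledger:
    """Second hypothetical class, tested by its own suite."""

    def __init__(self):
        self.entries = []

    def record(self, amount):
        self.entries.append(amount)

    def total(self):
        return sum(self.entries)


class AccountTest(unittest.TestCase):
    """Suite for Account: one test per method."""

    def test_deposit(self):
        account = Account(100)
        account.deposit(50)
        self.assertEqual(account.balance, 150)

    def test_withdraw(self):
        account = Account(100)
        account.withdraw(30)
        self.assertEqual(account.balance, 70)
        with self.assertRaises(ValueError):
            account.withdraw(1000)


class LedgerTest(unittest.TestCase):
    """Suite for Ledger: one test per method."""

    def test_record(self):
        ledger = Ledger()
        ledger.record(5)
        self.assertEqual(ledger.entries, [5])

    def test_total(self):
        ledger = Ledger()
        ledger.record(5)
        ledger.record(7)
        self.assertEqual(ledger.total(), 12)


def build_all_suites():
    """Combine the per-class suites into a single collection."""
    loader = unittest.defaultTestLoader
    return unittest.TestSuite([
        loader.loadTestsFromTestCase(AccountTest),
        loader.loadTestsFromTestCase(LedgerTest),
    ])
```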
Using behavior-driven development (BDD) as another example, each test suite would test a single feature of the application or system, with each individual test checking a single behavior of that feature.
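The same harness can be organized along BDD lines. In the sketch below, which uses plain `unittest` rather than a dedicated BDD framework, the suite corresponds to one feature (a hypothetical shopping `Cart`) and each test is named for a single behavior of that feature.

```python
import unittest


class Cart:
    """Hypothetical feature under test: a shopping cart."""

    def __init__(self):
        self.items = {}

    def add(self, name, price):
        self.items[name] = price

    def remove(self, name):
        self.items.pop(name, None)

    def total(self):
        return sum(self.items.values())


class ShoppingCartFeature(unittest.TestCase):
    """One suite per feature; each test checks one behavior."""

    def setUp(self):
        # Given an empty cart
        self.cart = Cart()

    def test_empty_cart_has_zero_total(self):
        # Then the total is zero
        self.assertEqual(self.cart.total(), 0)

    def test_adding_an_item_increases_the_total(self):
        # When an item is added
        self.cart.add("book", 30)
        # Then the total reflects its price
        self.assertEqual(self.cart.total(), 30)

    def test_removing_an_item_decreases_the_total(self):
        # When an item is added and another is removed
        self.cart.add("book", 30)
        self.cart.add("pen", 5)
        self.cart.remove("book")
        # Then only the remaining item counts toward the total
        self.assertEqual(self.cart.total(), 5)
```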
You can normally run automated tests in two different ways. First, developers can manually run a subset of the tests while coding. Second, the scripted build can run the full complement of tests automatically.
As developers write code, they will periodically run the subset of tests that check the part of the application in which they are working. This helps them ensure that they haven’t unexpectedly broken something. If a test fails, it is usually very easy to find the cause, since only a small amount of code has been changed since the same test ran successfully. Also, just before the developers check their changes into the source control repository, they usually run the full set of tests to make sure they haven’t broken something in other parts of the system.
If test-first development (TFD) is being used, the test for new or changed code is written before the implementing code. This means that, initially, the test will fail (because the change hasn’t yet been implemented), and the goal is to refine the application’s code until the test succeeds. There is more on TFD later in this chapter.
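The fail-first, then-pass rhythm of TFD can be illustrated with a small sketch. Everything here is hypothetical: `slugify` is an invented function, `slugify_stub` stands for the code before the change is implemented, and `make_suite` and `passes` are helpers that let the same test run against both versions.

```python
import unittest


def make_suite(slugify):
    """Build the test suite against a given implementation."""

    class SlugifyTest(unittest.TestCase):
        def test_spaces_become_hyphens(self):
            self.assertEqual(slugify("Lean Software"), "lean-software")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)


def slugify_stub(text):
    # Step 1: the test exists, but the behavior is not implemented yet.
    raise NotImplementedError


def slugify(text):
    # Step 2: the implementation is refined until the test succeeds.
    return "-".join(text.lower().split())


def passes(implementation):
    """Run the suite against an implementation and report success."""
    result = unittest.TestResult()
    make_suite(implementation).run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("before implementation:", passes(slugify_stub))
    print("after implementation:", passes(slugify))
```

In everyday practice there would be a single `slugify` that evolves over time; the two versions are kept side by side here only so that both stages can be run in one script.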
In any case, the script that builds the application will normally run the full set of tests automatically. This helps ensure that the application or system as a whole is relatively bug-free at all times.
When continuous integration is used (Chapter 5), the application is built on a regular basis, usually at least once a day and often many times a day. For very large applications or systems, running the full set of automated tests can take a very long time. In this case, it is common to select a subset of tests to run as part of the daily or intraday builds, and to run the full set of tests only nightly or weekly.
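One way to select such a subset, sketched here under several assumptions, is to mark long-running tests and filter them out of the frequent builds. The `slow` decorator, the `select_tests` helper, and the `"intraday"` and `"nightly"` build names are all invented for this example; real projects often rely on their harness's own tagging or filtering features instead.

```python
import unittest


def slow(test_method):
    """Hypothetical marker: flag a test as too slow for intraday builds."""
    test_method.slow = True
    return test_method


class ReportTest(unittest.TestCase):
    def test_header(self):
        self.assertEqual("report".title(), "Report")

    @slow
    def test_full_export(self):
        # Stands in for a long-running test.
        self.assertEqual(sum(range(1000)), 499500)


def iter_tests(suite):
    """Yield individual test cases from a possibly nested suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def is_slow(test):
    """Check whether the test's method carries the slow marker."""
    method_name = test.id().rsplit(".", 1)[-1]
    return getattr(getattr(test, method_name), "slow", False)


def select_tests(build_kind):
    """Return the full suite for nightly builds, the fast subset otherwise."""
    everything = unittest.defaultTestLoader.loadTestsFromTestCase(ReportTest)
    if build_kind == "nightly":
        return everything
    return unittest.TestSuite(
        test for test in iter_tests(everything) if not is_slow(test)
    )
```

A build script could then call `select_tests("intraday")` on every check-in and `select_tests("nightly")` once a day.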