Chapter 4. Evaluation of Training Programs
Systematic and Systemic Evaluation of Training Programs
Measure vs. evaluate
mea·sure \ˈme-zhər, ˈmā-\ vt
to choose or control with cautious restraint
to ascertain the measurements of
to estimate or appraise by a criterion
(Merriam-Webster’s Collegiate Dictionary, 2003, p. 769)
eval·u·ate \i-ˈval-yə-ˌwāt, -yü-ˌāt\ vt
to determine or fix the value of
to determine the significance, worth, or condition of usually by careful appraisal and study
(Merriam-Webster’s Collegiate Dictionary, 2003, p. 432)
Evaluation
Evaluation is considered a form of applied research, but the two differ slightly:
Applied research is aimed at producing generalizable knowledge relevant
to providing a solution to a general problem.
Evaluation focuses on collecting specific information relevant to a particular
evaluation object, or “evaluand” (Guba and Lincoln, 1981,
cited in Worthen, Sanders, and Fitzpatrick, 1997).
Specifically,
Evaluation is disciplined inquiry to make a judgment about the worth
of the evaluation objects (such as instructional programs).
Evaluation produces information that is used to make decisions.
One example of evaluation is a study of the effectiveness of a new safety
training program in reducing work-related injuries in an organization. If the
results reveal that the work-related injury rate fell by 12 percent
after the training program was implemented, the organization might recognize
the worth of the training program and decide to continue to provide it to
employees annually. The organization would still need to consider that other
factors might have influenced the results.
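The arithmetic behind a figure such as "reduced by 12 percent" can be sketched in a few lines. The injury rates below are hypothetical numbers chosen only to reproduce the 12 percent in the example, and the function name percent_reduction is an illustrative assumption, not something from the study design itself.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop from the baseline rate to the later rate."""
    if before <= 0:
        raise ValueError("the baseline rate must be positive")
    return (before - after) / before * 100


# Hypothetical figures: injuries per 100 employees per year
rate_before = 5.0  # before the safety training program
rate_after = 4.4   # after the safety training program

reduction = percent_reduction(rate_before, rate_after)
print(f"Injury rate reduced by {reduction:.0f} percent")
```

A calculation like this only describes the change. It does not show that the training caused it, which is why other possible influences still have to be considered.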
Evaluation is conducted through various measurement processes, using
quantitative methods, qualitative methods, or a combination of both.
Systematic and systemic
This chapter introduces Donald Kirkpatrick’s four-level model of evaluation,
which will help practitioners understand the systematic and systemic
approaches to evaluating the effectiveness of training programs. This chapter
also provides an overview of the main concepts in measurement and
evaluation, and describes methods of constructing evaluation instruments.
Foundations of Instructional and Performance Technology
Donald Kirkpatrick’s Four-Level Model of Evaluation
Four levels of evaluating training programs
About a half century ago, Donald Kirkpatrick, now professor emeritus at
the University of Wisconsin, was working on his Ph.D. at the University
of Wisconsin. He decided that his doctoral dissertation would focus on
evaluating a supervisory training program. He came up with the idea of
“measuring participants’ reaction to the program, the amount of learning that
took place, the extent of their change in behavior after they returned to their
jobs, and any final results that were achieved by participants after they
returned to work” (Kirkpatrick, 1996b, p. 55). The practice of using four
levels in evaluating training programs came out of his work. The levels
are (1) reaction, (2) learning, (3) behavior, and (4) results. Although he
originally called them “steps” in the article published in the November 1959
issue of Training and Development, “levels” is more widely used to refer to
his evaluation model (Kirkpatrick, 1996b).
Kirkpatrick (1978, 1996a) explains that there are three reasons for evaluating
training programs and that his evaluation model is useful for any of the three:
1. To know how to improve future programs
2. To determine whether to continue or discontinue the program
3. To justify the existence of the training program or department
Each level of Kirkpatrick’s evaluation model requires one to elicit different
information on the effectiveness of a training program:
1. Reaction: Did the participants like the training program?
2. Learning outcomes: Did they learn what they were supposed to learn? How
much did they learn?
3. Behavioral change: Did they change their on-the-job behavior?
4. Results on the organization: Did their knowledge and changed behavior
positively affect the organization in terms of resulting in increased
production, improved quality, decreased costs, etc.? (Kirkpatrick and
Kirkpatrick, 2005a, 2005b)
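For practitioners who keep evaluation plans in a script or spreadsheet, the four levels and their guiding questions can be held in a small data structure. The following is only a minimal sketch. The names Level, KIRKPATRICK_LEVELS, and evaluation_checklist are hypothetical and are not part of Kirkpatrick's model, and the question wording is condensed from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of the four-level model (illustrative structure only)."""
    number: int
    name: str
    question: str


# Questions condensed from the four-level list in the text
KIRKPATRICK_LEVELS = (
    Level(1, "Reaction", "Did the participants like the training program?"),
    Level(2, "Learning", "Did they learn what they were supposed to learn, and how much?"),
    Level(3, "Behavior", "Did they change their on-the-job behavior?"),
    Level(4, "Results", "Did their knowledge and changed behavior positively affect the organization?"),
)


def evaluation_checklist(levels=KIRKPATRICK_LEVELS):
    """Return one checklist line per level, ordered from level 1 to level 4."""
    ordered = sorted(levels, key=lambda level: level.number)
    return [f"Level {lv.number} ({lv.name}): {lv.question}" for lv in ordered]


for line in evaluation_checklist():
    print(line)
```

Keeping the level number as an explicit field makes the level 1 to level 4 ordering easy to enforce, whatever order the entries are typed in.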
Kirkpatrick’s four-level model of evaluation not only helps practitioners measure the
effectiveness of a training program from level 1 to level 4 in a systematic way,
but also encourages them to project its systemic impact on long-term outcomes
as well as short-term outcomes.
Kirkpatrick (1996a) explains that “trainers must begin with desired results
and then determine what behavior is needed to accomplish them. . . . The four
levels of evaluation are considered in reverse. First, we evaluate reaction.
Then we evaluate learning, behavior, and results” (p. 26). In other words, the
