11
Dynamic Programming
11.1 Introduction
Dynamic programming is an optimization technique in which a complex optimization problem is divided into a number of stages (or subproblems), with a policy decision to be taken at each stage. The stages are solved sequentially, one by one. The stages generally represent a time-varying phenomenon, such as the amount of inventory in a store; dynamic programming thus refers to the planning of a time-varying system. The decision at each stage is taken using the state information associated with that stage and must be suitably linked with the next stage. The dimensionality of the problem increases with the number of states. The series of best policy decisions taken at the stages is referred to as the optimal policy of the optimization problem. The principle of optimality in dynamic programming states that the optimal policy for the remaining stages is independent of the policy decisions taken in the previous stages; that is, whatever the current state, the remaining decisions must themselves form an optimal policy from that state onward. Typically in dynamic programming, the optimal decision pertaining to the last stage is taken first; one then moves backward to the next stage, and the process is continued until the first stage is reached. The technique of dynamic programming was developed by Richard Bellman in the 1950s. The method is used to solve a number of problems in different areas (Edwin and Gruber 1971; George 1963; Leondes and Smith 1970). The method, though easy to implement, has a serious drawback: the complexity of the problem increases with an increase in the number of variables. This is frequently referred to as the curse of dimensionality in dynamic programming. This chapter discusses aspects of deterministic and probabilistic dynamic programming.
11.2 Deterministic Dynamic Programming
In dynamic programming, when the current policy decision and the state completely determine the state of the next stage, it is called deterministic dynamic programming. Let the state at stage n be denoted by s_n. The policy decision x_n transforms this state to s_{n+1} at the next stage, n + 1. The function f*_{n+1}(s_{n+1}) is the optimal value of the objective function for the remaining stages, to which the contribution made by the decision x_n is to be added (Figure 11.1). This gives the combined contribution f_n(s_n, x_n). This function is optimized with respect to x_n to give

    f*_n(s_n) = f_n(s_n, x*_n)

The procedure is repeated by moving back one stage.
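The backward recursion just described can be sketched in code. The stage structure, cost table, and function names below are illustrative assumptions, not taken from the text; the sketch simply optimizes each state's decision against the already-computed optimal values of the next stage.

```python
# A minimal sketch of the backward recursion
# f*_n(s_n) = min over x_n of [stage cost + f*_{n+1}(s_{n+1})],
# assuming each decision x names the next-stage state it leads to.

def backward_dp(stages, cost, final_values):
    """stages: list of lists of states, one list per stage.
    cost[(s, x)]: immediate cost of deciding x in state s,
                  where x is also the state reached at the next stage.
    final_values: dict mapping last-stage states to their f* values."""
    f = dict(final_values)   # f* for the last stage
    policy = {}
    # Move backward from stage N-1 down to stage 1
    for n in range(len(stages) - 2, -1, -1):
        new_f = {}
        for s in stages[n]:
            # Optimize over all feasible decisions x from state s
            best_x, best_val = None, float("inf")
            for x in stages[n + 1]:
                if (s, x) in cost:
                    val = cost[(s, x)] + f[x]
                    if val < best_val:
                        best_x, best_val = x, val
            new_f[s] = best_val
            policy[s] = best_x
        f = new_f
    return f, policy
```

For a tiny three-stage instance with states A; B, C; D and costs A-B = 1, A-C = 4, B-D = 5, C-D = 1, the recursion returns f*(A) = 5 with the decision A → C, since 4 + 1 beats 1 + 5.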
Let us take an example to explain the concept of dynamic programming. A person in a remote place A has to reach city I to withdraw money from an ATM. Though he has the option to select different paths to reach his goal, he is interested in finding the path with the minimum distance to be covered. The intermediate villages where he can change his path are B, C, D, E, F, G, and H. The distances between the villages are given in Figure 11.2. Before using dynamic programming, let us follow the greedy approach of always selecting the shortest leg to the next village. From village A, the minimum distance is 3, to village C. From village C, the minimum distance is 6, to village F. Continuing in this way, the total distance traveled is 16 and the path is

A → C → F → G → I
FIGURE 11.1
Structure of deterministic dynamic programming. (Diagram: the state S_n and decision x_n at stage n lead to the state S_{n+1} at stage n + 1; the stage contribution is f_n(S_n, x_n) and the optimal value for the remaining stages is f*_{n+1}(S_{n+1}).)
FIGURE 11.2
Distance (not to scale) between the villages. (Network diagram: villages A through I connected by roads, with the distance marked on each road.)
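Applying the backward recursion to a staged network like the one in Figure 11.2 can be sketched as follows. Only the distances A-C = 3 and C-F = 6 are stated in the text; the remaining distances and the exact stage grouping below are placeholder assumptions, since the figure's values are not recoverable here. With these assumed numbers the dynamic-programming answer happens to beat the greedy path, which is the point of the example.

```python
# Illustrative shortest-path DP for a staged network like Figure 11.2.
# Only A-C = 3 and C-F = 6 come from the text; every other distance
# and the stage grouping are assumptions for the sake of the sketch.

dist = {
    ('A', 'B'): 4, ('A', 'C'): 3, ('A', 'D'): 5,
    ('B', 'E'): 5, ('B', 'F'): 7,
    ('C', 'E'): 4, ('C', 'F'): 6,
    ('D', 'E'): 3, ('D', 'F'): 4,
    ('E', 'G'): 3, ('E', 'H'): 5,
    ('F', 'G'): 4, ('F', 'H'): 3,
    ('G', 'I'): 4, ('H', 'I'): 5,
}
stages = [['A'], ['B', 'C', 'D'], ['E', 'F'], ['G', 'H'], ['I']]

# Backward recursion: f*(s) = min over next-stage states x of d(s, x) + f*(x)
f = {'I': 0}
nxt = {}
for n in range(len(stages) - 2, -1, -1):
    for s in stages[n]:
        best = min(((dist[(s, x)] + f[x], x) for x in stages[n + 1]),
                   key=lambda t: t[0])
        f[s], nxt[s] = best

# Recover the optimal path by following the stored decisions forward
path, s = ['A'], 'A'
while s != 'I':
    s = nxt[s]
    path.append(s)
# With the assumed distances: total 14 along A -> C -> E -> G -> I,
# shorter than the greedy path's 16
print(f['A'], path)
```

Note that the recursion evaluates each village only once, whereas enumerating all paths from A to I would repeat work for every shared sub-path.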