80 Nonlinear PDEs: A Bit of Theory
Letting $\varepsilon \to 0$, we obtain that the reverse of inequality (4.11) also holds, which yields Bellman's principle:
$$u(t,x) = \sup_{\alpha \in A_{t,t+h}} \mathbb{E}^{\mathbb{Q}}\left[u\left(t+h, X^{\alpha}_{t+h}\right) \,\middle|\, X^{\alpha}_t = x\right] \tag{4.12}$$
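The following is a minimal sketch of what (4.12) asserts in the simplest possible case. It uses a hypothetical discrete-time toy model that does not come from the text: $X_{s+1} = X_s + a_s \xi_{s+1}$, with $\xi_{s+1} = \pm 1$ equiprobable under $\mathbb{Q}$, control values in $A = \{1, 2\}$, horizon $T = 3$, and a made-up terminal payoff $g(x) = \mathbf{1}_{|x| \le 1}$. The value function is computed by one-step backward induction. The right-hand side of (4.12) with $h = 2$ is then computed by brute force over every adapted two-step strategy, where the first action is fixed at time $t$ and the second may depend on the first coin flip. The names `g`, `u` and `best_two_step_value` are illustrative only. Exact rational arithmetic (`fractions.Fraction`) lets the two sides be compared with `==`.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

# Hypothetical toy model (an assumption, not from the text):
#   X_{s+1} = X_s + a_s * xi_{s+1},  xi = +1 or -1 with Q-probability 1/2,
#   a_s chosen in A, terminal value u(T, x) = g(x).
A = (1, 2)
T = 3
HALF = Fraction(1, 2)


def g(x):
    """Hypothetical non-convex payoff: 1 if |x| <= 1, else 0."""
    return Fraction(1) if abs(x) <= 1 else Fraction(0)


@lru_cache(maxsize=None)
def u(t, x):
    """Value function computed by one-step backward induction from g."""
    if t == T:
        return g(x)
    return max(HALF * (u(t + 1, x + a) + u(t + 1, x - a)) for a in A)


def best_two_step_value(t, x):
    """sup over adapted controls on [t, t+2] of E^Q[u(t+2, X_{t+2}) | X_t = x].

    A strategy is a first action a0 (known at time t) plus one second action
    for each outcome of xi_{t+1}: a_up after +1, a_down after -1.
    """
    best = None
    for a0, a_up, a_down in product(A, repeat=3):
        value = HALF * (
            HALF * (u(t + 2, x + a0 + a_up) + u(t + 2, x + a0 - a_up))
            + HALF * (u(t + 2, x - a0 + a_down) + u(t + 2, x - a0 - a_down))
        )
        best = value if best is None else max(best, value)
    return best


# Bellman's principle (4.12) with h = 2: both sides agree exactly.
for x0 in range(-3, 4):
    assert best_two_step_value(0, x0) == u(0, x0)
print(u(0, 0))
```

The brute-force search grows exponentially with $h$. This growth is exactly what the dynamic programming principle avoids: it reduces the optimization over whole strategies to a one-step optimization at each node.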
Note that this result can be generalized to any stopping time $\tau$ in $\mathcal{T}_{tT} \equiv \{\tau \in [t, T]\ \mathbb{Q}\text{-a.s.}\}$:
$$u(t,x) = \sup_{\alpha \in A_{t,T}} \mathbb{E}^{\mathbb{Q}}\left[u\left(\tau, X^{\alpha}_{\tau}\right) \,\middle|\, X^{\alpha}_t = x\right] \tag{4.13}$$
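A small discrete-time example can also sketch (4.13) with a genuinely random horizon. The model is hypothetical and not taken from the text: $X_{s+1} = X_s + a\,\xi_{s+1}$ with $\xi_{s+1} = \pm 1$ equiprobable under $\mathbb{Q}$, $a \in A = \{1, 2\}$, horizon $T = 4$, and terminal payoff $g(x) = \mathbf{1}_{|x| \le 1}$. As a further assumption, $\tau$ is the first time $s \ge t$ at which $|X_s| \ge 2$, capped at $T$. The event $\{\tau = s\}$ is decided by the path up to time $s$, so $\tau$ is a stopping time in $\mathcal{T}_{tT}$. The function `stopped_value` (a hypothetical name) maximizes $\mathbb{E}^{\mathbb{Q}}[u(\tau, X_\tau)]$ over controls by backward induction on the stopped problem. The check confirms that it reproduces $u$ at every node tested.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy model (an assumption, not from the text):
#   X_{s+1} = X_s + a * xi,  xi = +1 or -1 with Q-probability 1/2, a in A.
A = (1, 2)
T = 4
BARRIER = 2  # hypothetical: tau = first s >= t with |X_s| >= BARRIER, capped at T
HALF = Fraction(1, 2)


def g(x):
    """Hypothetical terminal payoff: 1 if |x| <= 1, else 0."""
    return Fraction(1) if abs(x) <= 1 else Fraction(0)


@lru_cache(maxsize=None)
def u(t, x):
    """Value function by one-step backward induction from g at time T."""
    if t == T:
        return g(x)
    return max(HALF * (u(t + 1, x + a) + u(t + 1, x - a)) for a in A)


@lru_cache(maxsize=None)
def stopped_value(s, x):
    """sup over controls of E^Q[u(tau, X_tau) | X_s = x] for the hitting time tau."""
    if s == T or abs(x) >= BARRIER:
        # tau has occurred at time s: collect u(tau, X_tau).
        return u(s, x)
    # tau has not occurred yet: keep optimizing the control one step ahead.
    return max(
        HALF * (stopped_value(s + 1, x + a) + stopped_value(s + 1, x - a))
        for a in A
    )


# Equation (4.13): stopping at tau and collecting u(tau, X_tau) gives back u(t, x).
for t in range(T + 1):
    for x in range(-3, 4):
        assert stopped_value(t, x) == u(t, x)
```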
and, in a discrete-time setting, it reads
$$u(t,x) = \sup_{\alpha_t \in A} \mathbb{E}^{\mathbb{Q}}\left[u\left(t+1, X^{\alpha}_{t+1}\right) \,\middle|\, X^{\alpha}_t = x\right] \tag{4.14}$$
This relation is known as the dynamic programming equation. From Equation
(4.12), we can now sketch the derivation of the HJB equation.
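In practice, a relation of the form (4.14) is solved backward from the terminal condition $u(T, \cdot) = g$. At each node, one evaluates the one-step conditional expectation for every admissible control value and keeps the largest. Below is a minimal sketch under hypothetical assumptions that are not from the text. The state moves by $\pm a$ with probability $1/2$ each under $\mathbb{Q}$, with $a \in A = \{1, 2\}$. The horizon is $T = 3$, and the payoff is $g(x) = \mathbf{1}_{|x| \le 1}$. Besides the value, the sketch returns a maximizer $a^*(t, x)$, which is a feedback (Markov) optimal control for this toy model. The names `g`, `u` and `optimal_control` are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy model (an assumption, not from the text):
#   X_{t+1} = X_t + a * xi,  xi = +1 or -1 with Q-probability 1/2 each,
#   control value a chosen from A at each step, terminal value u(T, x) = g(x).
A = (1, 2)
T = 3


def g(x: int) -> Fraction:
    """Hypothetical non-convex payoff: pays 1 if |x| <= 1, else 0."""
    return Fraction(1) if abs(x) <= 1 else Fraction(0)


def one_step(t: int, x: int, a: int) -> Fraction:
    """E^Q[u(t + 1, X_{t+1}) | X_t = x] when control value a is used at time t."""
    return Fraction(1, 2) * (u(t + 1, x + a) + u(t + 1, x - a))


@lru_cache(maxsize=None)
def u(t: int, x: int) -> Fraction:
    """Value function from the dynamic programming equation (4.14)."""
    if t == T:
        return g(x)
    # u(t, x) = sup_{a in A} E^Q[u(t + 1, X_{t+1}) | X_t = x]
    return max(one_step(t, x, a) for a in A)


def optimal_control(t: int, x: int) -> int:
    """A maximizer a*(t, x) of the one-step sup (ties go to the first a in A)."""
    return max(A, key=lambda a: one_step(t, x, a))


print(u(0, 0), optimal_control(0, 0))  # prints "3/4 1"
```

In this example, the smaller step is optimal at the root because it keeps more probability mass inside $\{|x| \le 1\}$. Memoizing `u` makes the cost proportional to the number of reachable nodes rather than the number of paths.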
4.4.4 Formal derivation of the HJB PDE
Let us take an arbitrary constant control $\alpha_s = a$, with $a \in A$, during the interval $[t, t+h]$. From (4.12), we get
$$u(t,x) \ge \mathbb{E}^{\mathbb{Q}}\big[u\big(t+h, X^{a}_{t+h}\big) \,\big|\, X^{a}_t = ...