$$p(e_k \mid s) \propto p(e_k, s),$$
which we can actually split into two separate pieces using the chain rule of probability:
$$p(e_k, s) = p(s_{k+1}, s_{k+2}, \cdots, s_n \mid e_k, s_1, s_2, \cdots, s_k)\, p(e_k, s_1, s_2, \cdots, s_k)$$
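To put concrete numbers on these symbols, here's a minimal brute-force sketch on a toy two-state HMM (the model, its parameters, and the observation sequence are all made up for illustration, and the enumeration is exponential, so it only works at toy scale). It computes $p(e_k, s_1, \cdots, s_n)$ and $p(e_k, s_1, \cdots, s_k)$ by summing over hidden paths, then reads off the remaining conditional piece as their ratio, which is just the definition of conditional probability:

```python
import itertools
import numpy as np

# A toy two-state HMM; all numbers here are made up for illustration.
# e_t is the hidden state and s_t the observation, each in {0, 1}.
pi = np.array([0.6, 0.4])      # p(e_1)
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])     # T[i, j] = p(e_{t+1} = j | e_t = i)
O = np.array([[0.9, 0.1],
              [0.3, 0.7]])     # O[i, o] = p(s_t = o | e_t = i)

s = (0, 1, 1, 0)               # an arbitrary observation sequence, n = 4
n, k = len(s), 2               # we query the hidden state at time k

def joint(e_seq, s_seq):
    """p(e_1..e_m, s_1..s_m) under the HMM factorization."""
    p = pi[e_seq[0]] * O[e_seq[0], s_seq[0]]
    for t in range(1, len(e_seq)):
        p *= T[e_seq[t - 1], e_seq[t]] * O[e_seq[t], s_seq[t]]
    return p

def marginal(m, ek):
    """p(e_k = ek, s_1..s_m), by brute-force summation over hidden paths."""
    return sum(joint(e_seq, s[:m])
               for e_seq in itertools.product([0, 1], repeat=m)
               if e_seq[k - 1] == ek)

for ek in (0, 1):
    whole = marginal(n, ek)    # p(e_k, s_1, ..., s_n)
    second = marginal(k, ek)   # p(e_k, s_1, ..., s_k) -- the second piece
    first = whole / second     # p(s_{k+1}, ..., s_n | e_k, s_1, ..., s_k) -- the first piece
    print(f"e_k = {ek}: {whole:.6f} = {first:.6f} * {second:.6f}")

# Normalizing p(e_k, s) over e_k recovers the posterior p(e_k | s):
# the proportionality constant we dropped above is just 1 / p(s).
w = np.array([marginal(n, ek) for ek in (0, 1)])
print("p(e_k | s) =", w / w.sum())
```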
This split looks fruitless, but we can actually forget about $s_1, \cdots, s_k$ in the first probability, because the variables are D-Separated. I won't discuss D-Separation too much, but because we're asserting the Markov assumption in our model, we can effectively drop them. Once we condition on $e_k$, the observations that precede it tell us nothing more about the ones that follow: