
58 STATISTICAL LEARNING AND DATA SCIENCE
• If both $(y_1, \dots, y_{i-1}, 1)$ and $(y_1, \dots, y_{i-1}, 0)$ are strongly permitted, then
$p(y_i = 1 \mid y_1 \dots y_{i-1}) = 0.5$ and $p(y_i = 0 \mid y_1 \dots y_{i-1}) = 0.5$.
Again we can calculate the Shannon entropy $\bar{H}(Y_l)$ of the binary process. But now
$$\bar{H}(Y_l) \le \log_2 \Delta^S(X_l),$$
since the number of different strongly permitted sequences is not greater than $\Delta^S(X_l)$ and they are not, in general, equally probable.
Then
$$
\bar{H}(y_i \mid y_1 \dots y_{i-1}) =
\begin{cases}
0 & \text{if } x_i \notin EB(S(S_0, X_{i-1}, Y_{i-1})) = B_{i-1}, \\
1 & \text{if } x_i \in EB(S(S_0, X_{i-1}, Y_{i-1})) = B_{i-1}.
\end{cases}
$$
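The two facts above (each step contributes one bit when both continuations are permitted and zero bits otherwise, and the total entropy is bounded by the logarithm of the number of permitted sequences) can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the set `permitted` is a hypothetical example, and "strongly permitted" is modeled simply as "is a prefix of some sequence in `permitted`", which may be narrower than the definition used in the text. The function names are likewise introduced here for illustration.

```python
import math

def sequence_probabilities(permitted):
    """Probability of each full sequence under the binary process:
    a factor 0.5 at every step where both continuations are permitted,
    a factor 1 where the continuation is forced.
    ASSUMPTION: 'permitted' prefixes are the prefixes of the given set."""
    prefixes = {seq[:k] for seq in permitted for k in range(len(seq) + 1)}
    probs, branches = {}, {}
    for seq in permitted:
        p, n_branch = 1.0, 0
        for i in range(len(seq)):
            head = seq[:i]
            both = (head + (0,)) in prefixes and (head + (1,)) in prefixes
            if both:
                p *= 0.5        # conditional entropy of this step is 1 bit
                n_branch += 1
            # otherwise the step is forced and contributes 0 bits
        probs[seq] = p
        branches[seq] = n_branch
    return probs, branches

def shannon_entropy(probs):
    """Shannon entropy in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical toy set of permitted label sequences of length 3.
permitted = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)}

probs, branches = sequence_probabilities(permitted)
H = shannon_entropy(probs)
# Sum of the per-step conditional entropies, averaged over the process.
expected_branches = sum(probs[s] * branches[s] for s in permitted)
bound = math.log2(len(permitted))

print(H, expected_branches, bound)
```

In this toy case the entropy equals the expected number of branching steps and stays strictly below the logarithm of the number of sequences, because the four sequences are not equally probable.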
Averaging these relations over all $X_l$ (as independent sample sequences) and using $P(B_i) \ge c(S)$, we have
H(S, l) ≥
E
[
¯