July 2017
Intermediate to advanced
360 pages
8h 26m
English
Imagine that you need to design a spam-filtering algorithm, starting from the following initial (oversimplified) classification based on two parameters:
| Parameter | Spam emails (X1) | Regular emails (X2) |
|---|---|---|
| p1 - Contains > 5 blacklisted words | 80 | 20 |
| p2 - Message length < 20 characters | 75 | 25 |
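If each row of the table is read as raw message counts, the simplest estimate of how strongly a parameter points to spam is its relative frequency within that row. The sketch below assumes exactly that: frequency-based estimates with no smoothing. The dictionary layout and the function name `row_frequencies` are illustrative only and are not taken from the original text.

```python
from fractions import Fraction

# Counts from the table: parameter -> (spam emails X1, regular emails X2)
COUNTS = {
    "p1": (80, 20),  # contains > 5 blacklisted words
    "p2": (75, 25),  # message length < 20 characters
}


def row_frequencies(counts):
    """Estimate P(X1 | p) and P(X2 | p) as relative frequencies within each row."""
    result = {}
    for param, (spam, regular) in counts.items():
        total = spam + regular
        # Exact fractions avoid floating-point noise in small hand-checked examples
        result[param] = (Fraction(spam, total), Fraction(regular, total))
    return result


if __name__ == "__main__":
    for param, (p_spam, p_regular) in row_frequencies(COUNTS).items():
        print(f"P(X1|{param}) = {float(p_spam):.2f}, P(X2|{param}) = {float(p_regular):.2f}")
```

With these counts, p1 is associated with spam in 80% of its messages and p2 in 75%.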
We have collected 200 email messages (X); for simplicity, we consider p1 and p2 mutually exclusive. We need to find a couple of probabilistic hypotheses (expressed in terms of p1 and p2) to determine:
We also assume the conditional independence of both terms (meaning that hp1 and hp2 contribute jointly to spam in the ...
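Because p1 and p2 are treated as mutually exclusive, the four table cells partition the 200 collected messages, so the parameter priors, the class priors and the class-conditional likelihoods can all be read directly off the counts. As a sanity check (a sketch under the assumption that probabilities are plain relative frequencies, not the book's own derivation), Bayes' theorem should then recover the same value of P(X1 | p1) as the direct row frequency. The helper names `prior_param`, `prior_class`, `likelihood` and `posterior` are hypothetical.

```python
from fractions import Fraction

# Counts from the table; p1 and p2 are treated as mutually exclusive,
# so each of the collected messages falls in exactly one cell.
COUNTS = {"p1": {"X1": 80, "X2": 20}, "p2": {"X1": 75, "X2": 25}}
TOTAL = sum(sum(row.values()) for row in COUNTS.values())  # 200 messages


def prior_param(p):
    """P(p): fraction of all messages showing parameter p."""
    return Fraction(sum(COUNTS[p].values()), TOTAL)


def prior_class(c):
    """P(c): fraction of all messages in class c (X1 = spam, X2 = regular)."""
    return Fraction(sum(row[c] for row in COUNTS.values()), TOTAL)


def likelihood(p, c):
    """P(p | c): fraction of class-c messages that show parameter p."""
    class_total = sum(row[c] for row in COUNTS.values())
    return Fraction(COUNTS[p][c], class_total)


def posterior(c, p):
    """P(c | p) via Bayes' theorem: P(p | c) * P(c) / P(p)."""
    return likelihood(p, c) * prior_class(c) / prior_param(p)


if __name__ == "__main__":
    print("P(p1) =", prior_param("p1"), " P(X1) =", prior_class("X1"))
    print("P(p1|X1) =", likelihood("p1", "X1"))
    print("P(X1|p1) =", posterior("X1", "p1"))
```

Here P(p1) = 1/2, P(X1) = 155/200 = 31/40 and P(p1 | X1) = 80/155 = 16/31, so Bayes' theorem gives (16/31)(31/40)/(1/2) = 4/5, the same 80% obtained by reading the p1 row directly.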