In this recipe, we will load a dataset that records how many people walk on a street, the weather (rainy or sunny), and a hidden state variable (whether the station is closed or open). The objective is to determine the most likely status of the station, given that we can't observe it directly:
- We first load the dataset. It contains the following columns: ID, people, state, and weather. The state variable (flagging whether the station is open or closed) would typically be hidden or unavailable, but we keep it here so that we can compare our model's predictions against the true values:
data = read.csv("./subway_data.csv")
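Before going further, it can help to confirm that the file was read as expected. The following is only a minimal sketch: it builds a small toy data frame as a stand-in for subway_data.csv (the values and the "open"/"closed" and "sunny"/"rainy" encodings are assumptions made for illustration, not the real data), writes it to a temporary file, and then runs the same kind of checks you could run on the real dataset.

```r
# Toy stand-in for subway_data.csv. The values and the category labels
# below are hypothetical; the real file may encode them differently.
toy <- data.frame(
  ID      = 1:6,
  people  = c(120, 15, 98, 10, 134, 22),
  state   = c("open", "closed", "open", "closed", "open", "closed"),
  weather = c("sunny", "rainy", "sunny", "sunny", "rainy", "rainy")
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Same loading call as in the recipe, pointed at the toy file.
data <- read.csv(path)

# Stop early if any column the recipe relies on is missing.
expected <- c("ID", "people", "state", "weather")
stopifnot(all(expected %in% names(data)))

# Inspect column types, then cross-tabulate state against weather.
str(data)
table(data$state, data$weather)

# Average number of people per state; a large gap suggests the counts
# carry information about the hidden state.
aggregate(people ~ state, data = data, FUN = mean)
```

On the real file, replace `path` with "./subway_data.csv" and drop the toy data frame; the checks themselves do not depend on the toy values.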
- We create an extra column called weather_sunny, which will flag whether the weather was sunny or not (this is a standard dummy ...