How to do it...

In this recipe, we will load a dataset containing how many people walk on a street, the weather (rainy or sunny), and a hidden variable, state (whether the station is closed or open). The objective is to infer the most likely state of the station from the other columns, given that we can't observe it directly:

  1. We first load the dataset. It contains the following columns: ID, people, state, and weather. The state variable (flagging whether the station is open or closed) would typically be hidden/unavailable, but we have it here so we can compare our model's output against the true values:
data = read.csv("./subway_data.csv")
  2. We create an extra column called weather_sunny, which will flag whether the weather was sunny or not (this is a standard dummy ...
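
The two steps above can be sketched on a small stand-in data frame. This is only an illustration under stated assumptions: the rows below are made up, and the "sunny"/"rainy" and "open"/"closed" labels are guesses at how subway_data.csv encodes those columns, so adjust the comparison if the real file uses different values.

```r
# Hypothetical stand-in for subway_data.csv. The column names follow the
# recipe; the values and the text labels are invented for illustration.
data <- data.frame(
  ID      = 1:6,
  people  = c(120, 35, 140, 28, 110, 40),
  state   = c("open", "closed", "open", "closed", "open", "closed"),
  weather = c("sunny", "rainy", "sunny", "sunny", "rainy", "rainy"),
  stringsAsFactors = FALSE
)

# Dummy variable: 1 when the weather is sunny, 0 otherwise.
# Assumes the weather column holds the literal string "sunny".
data$weather_sunny <- as.integer(data$weather == "sunny")

# Check the structure and confirm the flag lines up with the weather column.
str(data)
print(table(weather = data$weather, weather_sunny = data$weather_sunny))
```

With the real file, the same assignment would be applied to the data frame returned by read.csv, after checking the distinct values of the weather column (for example with unique(data$weather)).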
