November 2019
Intermediate to advanced
346 pages
9h 36m
English
Because the dataset is large, even importing all of it is computationally intensive. For this reason, we begin in Step 1 by specifying a subset of the dataset's features, the ones we consider most promising, and by recording their data types so that we don't have to convert them later. We then read the data into a data frame in Step 2. In Steps 3 and 4, we sort the data by date, since the problem requires predicting events in the future, and then drop the date column, since we will not use it further. In the next two steps, we perform a train-test split that respects temporal progression, so that the model is fitted on earlier rows and tested on later ones. We then instantiate, fit, and test a random forest classifier in Steps 8 and 9. Depending on the application, ...
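The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the recipe's actual code: the column names (`event_date`, `feature_a`, `feature_b`, `unused_column`, `label`), the synthetic in-memory CSV, the 80/20 split ratio, and the forest size are all assumptions made so that the example runs on its own with pandas, NumPy, and scikit-learn.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the real dataset: a small shuffled CSV held in memory.
rng = np.random.default_rng(0)
n_rows = 200
feature_b = rng.random(n_rows)
raw = pd.DataFrame({
    "event_date": pd.date_range("2019-01-01", periods=n_rows, freq="D").strftime("%Y-%m-%d"),
    "feature_a": rng.integers(0, 5, n_rows),
    "feature_b": feature_b,
    "unused_column": rng.random(n_rows),
    "label": (feature_b > 0.5).astype(int),
}).sample(frac=1, random_state=0)  # shuffle rows so that sorting by date matters
csv_text = raw.to_csv(index=False)

# Select a subset of columns and declare their dtypes up front,
# so nothing has to be converted after loading.
columns = ["event_date", "feature_a", "feature_b", "label"]
dtypes = {"feature_a": "int8", "feature_b": "float32", "label": "int8"}

# Read only the selected columns into a data frame.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=columns,
    dtype=dtypes,
    parse_dates=["event_date"],
)

# Sort chronologically, then drop the date column, which is not used as a feature.
df = df.sort_values("event_date").drop(columns=["event_date"]).reset_index(drop=True)

# Temporal train-test split: the earliest 80% of rows train, the latest 20% test.
split = int(0.8 * len(df))
train, test = df.iloc[:split], df.iloc[split:]
X_train, y_train = train.drop(columns=["label"]), train["label"]
X_test, y_test = test.drop(columns=["label"]), test["label"]

# Instantiate, fit, and evaluate a random forest classifier.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Slicing the sorted frame by position, rather than shuffling it with a random split, is what keeps later events out of the training set. On a real file, the `io.StringIO` object would be replaced by the path to the CSV.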