Exercises
Consider trying to learn the next stock price from several past stock prices. Suppose you have 20 years of daily stock data. Discuss the effects of various ways of turning your data into training and testing data sets. What are the advantages and disadvantages of the following approaches?
Take the even-numbered points as your training set and the odd-numbered points as your test set.
Randomly select points into training and test sets.
Divide the data in two, where the first half is for training and the second half for testing.
Divide the data into many small windows of several past points and one prediction point.
Figure 13-17 depicts a distribution of "false" and "true" classes. The figure also shows several potential places (a, b, c, d, e, f, g) where a threshold could be set.
Figure 13-17. A Gaussian distribution of two classes, "false" and "true"
Draw the points a–g on an ROC curve.
If the "true" class is poisonous mushrooms, at which letter would you set the threshold?
How would a decision tree split this data?
Refer to Figure 13-1.
Draw how a decision tree would approximate the true curve (the dashed line) with three splits (here we seek a regression, not a classification model).
Tip
The "best" split for a regression takes the average value of the data values contained in the leaves that result from the split. The output values of a regression-tree fit thus look like a staircase. ...
Get Learning OpenCV now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.