In previous chapters we used a variety of measures to compare models or to judge how well a model performed its task. In this chapter we examine best practices for judging forecast accuracy, with emphasis on concerns specific to time series data.
For those new to time series forecasting, the most important thing to understand is that standard cross-validation usually isn’t advisable. You cannot build training, validation, and testing data sets by drawing random samples of the data for each of these categories in a time-agnostic way.
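As a minimal sketch of a time-aware alternative, the following Python splits a single time-ordered series into contiguous blocks, so that every training observation precedes every validation observation, which in turn precedes every test observation. The function name `chronological_split` and its parameters are hypothetical, chosen only for illustration, and the toy data stands in for real observations.

```python
def chronological_split(series, n_val, n_test):
    """Split a time-ordered sequence into contiguous train/val/test blocks.

    Assumes `series` is already sorted from oldest to newest.
    """
    n_train = len(series) - n_val - n_test
    if n_train <= 0:
        raise ValueError("not enough observations left for training")
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]
    return train, val, test


# Toy data: the values double as time indices (0 is oldest, 19 is newest).
observations = list(range(20))
train, val, test = chronological_split(observations, n_val=3, n_test=3)

print(train[-1], val, test)
```

Because the blocks are slices of the ordered data rather than random draws, no observation used for fitting comes after one used for evaluation.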
But it’s even trickier than that. You need to think about how different data samples relate to one another in time even if they appear independent. For example, suppose you are working on a time series classification task, so that you have many samples of separate time series, where each is its own data point. It could be tempting to think that in this case you can randomly assign time series to training, validation, and testing, but you would be wrong to follow through on such an idea. The problem with this approach is that it won’t mirror how you would use your model: in practice, your model will be trained on earlier data and tested on later data.
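The classification case above can be sketched in the same spirit: assign whole series to training, validation, or testing according to when they occurred rather than at random. The `Sample` record, the `split_by_time` function, and the cutoff dates below are all hypothetical. Dropping series that straddle a cutoff is one possible design choice assumed here, not the only reasonable one.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record: one labeled time series with its time span.
Sample = namedtuple("Sample", ["start", "end", "label"])


def split_by_time(samples, val_start, test_start):
    """Assign whole series to train/val/test by time span, not at random."""
    train, val, test = [], [], []
    for s in samples:
        if s.end < val_start:
            train.append(s)   # ends before the validation period begins
        elif s.start >= val_start and s.end < test_start:
            val.append(s)     # lies entirely inside the validation period
        elif s.start >= test_start:
            test.append(s)    # begins in the test period
        # Series that straddle a cutoff are dropped (an assumption).
    return train, val, test


samples = [
    Sample(date(2020, 1, 1), date(2020, 3, 31), "a"),
    Sample(date(2020, 4, 1), date(2020, 6, 30), "b"),
    Sample(date(2020, 6, 1), date(2020, 8, 31), "a"),  # straddles the cutoff
    Sample(date(2020, 7, 1), date(2020, 9, 30), "b"),
    Sample(date(2020, 10, 1), date(2020, 12, 31), "a"),
]
train, val, test = split_by_time(
    samples, val_start=date(2020, 7, 1), test_start=date(2020, 10, 1)
)
print(len(train), len(val), len(test))
```

With this arrangement, every series the model is fit on ends before any series it is evaluated on begins.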
You don’t want future information to leak into your model because that’s not how it will play out in a real-world modeling situation. This in turn means that the forecast error you measure during testing will be lower than the error you see in production because ...