In the literature on machine learning and statistical analysis, the overwhelming focus is on model performance in terms of accuracy. While accuracy should usually be the primary concern when evaluating a model, computational performance sometimes matters tremendously, particularly in the face of large data sets or models deployed widely to serve large populations of client applications.
Time series data sets can grow so large that an analysis can't be done at all, or can't be done properly, because it is too demanding of the available computing resources. In such cases, many organizations treat their options as follows:
Scale up computing resources (expensive and often wasteful, both economically and environmentally).
Do the project badly (not enough hyperparameter tuning, not enough data, etc.).
Don’t do the project.
None of these options is satisfying, particularly when you are just starting out with a new data set or a new analytical technique. It can be frustrating not to know whether your failures result from poor data, an overly difficult problem, or a lack of resources. In this chapter we will explore some workarounds that expand your options when you face very demanding analyses or very large data sets.
This chapter is designed to guide you through some considerations of how to lessen the computing resources needed to train a particular model or use it for inference. For the most ...