Before discussing how to partition the data, let us define the “training set”, “development set” and “test set”.
- Training set: The set of examples used to train a machine learning algorithm. Training uses this data to find the ‘optimal’ weights for the model/classifier. Typically, the majority of the available data goes to the training set.
- Development (dev)/validation set: The portion of the data used to evaluate the model/classifier at intermediate stages of training. This set is used to fine-tune hyperparameters and to compare model architectures and configurations. It is used during the development of the model, not in the final model evaluation.
- Test set: Once the ...
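
As a rough illustration of the three subsets defined above, the sketch below shuffles a dataset once and slices it into training, dev and test portions. The function name `split_dataset`, the 80/10/10 fractions and the fixed seed are assumptions made for this example only; they are not taken from the text, and the right proportions depend on how much data is available.

```python
import random


def split_dataset(examples, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle examples and split them into (train, dev, test) lists.

    The default fractions are illustrative assumptions. Whatever is left
    after the training and dev slices becomes the test set.
    """
    if train_frac <= 0 or dev_frac < 0 or train_frac + dev_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(examples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)

    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Hypothetical dataset of 100 integer "examples".
train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

Shuffling before slicing keeps each subset free of any ordering present in the original data, and because the three slices do not overlap, no example appears in more than one subset.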