Classifying unseen test data

The classic supervised machine-learning classification task is to train a classifier on labeled training instances and then apply it to unseen test instances. The key requirement is that the training and test datasets match exactly in the number of attributes, their types, their names, and, for nominal attributes (including a nominal class attribute), their sets of values.
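The compatibility requirement can be illustrated with a small, dependency-free sketch that compares the @attribute declarations of two ARFF files. It is not the recipe's code: the class name ArffHeaderCheck and its methods are hypothetical, and the parsing is deliberately simplified (it assumes one declaration per line and no quoted names containing spaces). Within Weka itself, Instances.equalHeaders performs a check of this kind on loaded datasets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Weka.
class ArffHeaderCheck {

    /** Returns the @attribute declarations of an ARFF text, in order, with whitespace removed. */
    static List<String> attributeDeclarations(String arff) {
        List<String> declarations = new ArrayList<>();
        for (String line : arff.split("\\R")) {
            String trimmed = line.trim();
            // The header ends where the @data section begins.
            if (trimmed.regionMatches(true, 0, "@data", 0, 5)) {
                break;
            }
            // ARFF keywords are case-insensitive, so match @attribute ignoring case.
            if (trimmed.regionMatches(true, 0, "@attribute", 0, 10)) {
                // Simplifying assumption: dropping all whitespace makes
                // "{yes, no}" and "{yes,no}" compare as equal.
                declarations.add(trimmed.substring(10).replaceAll("\\s+", ""));
            }
        }
        return declarations;
    }

    /** True when both files declare the same attributes, in the same order, with the same nominal values. */
    static boolean sameHeader(String trainArff, String testArff) {
        return attributeDeclarations(trainArff).equals(attributeDeclarations(testArff));
    }

    public static void main(String[] args) {
        String train = "@relation weather\n"
                + "@attribute temperature numeric\n"
                + "@attribute play {yes,no}\n"
                + "@data\n"
                + "70,yes\n";
        // Same attributes; only the relation name and spacing differ.
        String compatibleTest = "@relation weather-test\n"
                + "@attribute temperature numeric\n"
                + "@attribute play {yes, no}\n"
                + "@data\n"
                + "65,no\n";
        // The nominal values are listed in a different order, so the headers do not match.
        String incompatibleTest = "@relation weather-test\n"
                + "@attribute temperature numeric\n"
                + "@attribute play {no,yes}\n"
                + "@data\n"
                + "65,no\n";

        System.out.println(sameHeader(train, compatibleTest));
        System.out.println(sameHeader(train, incompatibleTest));
    }
}
```

The comparison is order-sensitive on purpose: a test file that lists the same nominal values in a different order is treated as a mismatch, which is the strict reading of "exactly the same" above.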

Getting ready

In Weka, a training dataset and a testing dataset can differ in one key way. The @DATA section of a testing ARFF file can look similar to the @DATA section of a training ARFF file. It can have attribute values and class labels as ...
