The classification pattern

This design pattern explores the implementation of classification using Pig and Mahout.

Background

Classification is one of the core concepts of predictive analytics; it is a technique in which data is labeled into categories or groups according to its characteristics. Classification is a simplified way to make decisions based on the data and its attributes. For example, in a survey questionnaire, we choose an appropriate answer or select a particular check box for a given question. Here, we are making a decision from a finite group of choices (check boxes or answers) provided to us. Sometimes, the number of choices can be as small as two (yes/no). In these cases, classification uses specific information on the input data ...

Get Pig Design Patterns now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.