3 Feature Selection and Machine Learning Models for High-Dimensional Data: State-of-the-Art

G. Manikandan1* and S. Abirami2

1Department of Computer Science & Engineering, College of Engineering Guindy, Anna University, Chennai, Tamil Nadu, India

2Department of Information Science and Technology, College of Engineering Guindy, Anna University, Chennai, India

Abstract

Technological developments in various domains generate large amounts of data with millions of samples/instances and features. Data from areas such as bioinformatics, text mining, and microarray analysis are commonly represented as high-dimensional feature vectors, and prediction on this kind of data is a difficult task in pattern recognition, bioinformatics, statistical analysis, and machine learning. High-dimensional data increases both the computational time and the space complexity of processing. In general, most pattern recognition and machine learning techniques are designed for low-dimensional data and do not address the issues of high-dimensional data. To address these issues, feature selection (FS) plays a vital role: it selects a subset of features from the large number of features in high-dimensional data, which yields a simpler model and higher classification accuracy. Also, the FS process focuses on reducing the dimensionality of the data by removing the irrelevant and ...
