9 Impact of Data Pre-Processing in Information Retrieval for Data Analytics

Huma Naz¹*, Sachin Ahuja², Rahul Nijhawan¹ and Neelu Jyothi Ahuja¹

¹ School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India

² Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India

Abstract

In recent years, data-driven decision making has emerged as a main focus of research, owing to the wide availability and extensive use of data-driven approaches. The accuracy of such studies depends on the quality of the data available for the research. To improve model performance, researchers adopt diverse data pre-processing techniques. This chapter provides insight into the application of data pre-processing techniques and their effects on information retrieval, considering a few chosen problems that involve huge amounts of data. It covers the major issues that must be dealt with before any data analysis process begins. The chapter consists of two sections that highlight the need for data pre-processing. To establish this need and to study the effect of pre-processing on the results achieved, three machine learning algorithms (decision tree, Naive Bayes, and artificial neural network) were applied to four diverse datasets. The results show that high accuracy, as well as better data quality, is attained after the application of data ...

Get Machine Intelligence, Big Data Analytics, and IoT in Image Processing now with the O’Reilly learning platform.