11 An Effective Machine Learning Approach to Model Healthcare Data

Shaila H. Koppad1*, S. Anupama Kumar2 and Mohan Kumar3

1 Manager, Capgemini Technology Services India Limited, Bengaluru, Karnataka, India

2 Department of MCA, RV College of Engineering, Bengaluru, Karnataka, India

3 Department of Pulmonary, Sapthagiri Institute of Medical Sciences and Research Centre, Bengaluru, Karnataka, India

Abstract

India ranks second in world population, and its continued growth sends an alarming signal to healthcare professionals. As the population grows, health-related complications become more common, and technological developments have to provide solutions to the resulting challenges. The health updates published by the Government of India in 2017 highlight a rise in non-communicable diseases. This research work focuses on one such disease, Chronic Obstructive Pulmonary Disease (COPD), a major cause of death worldwide. A major challenge faced by the healthcare industry is gathering clinical data held in various formats, integrating it, and analysing it for the betterment of patients. The COPD data was collected from heterogeneous sources in diverse formats. Several data storage formats were evaluated against different manipulations and operations, and the ORC file format was found to be efficient. The preprocessed data is stored in ORC format in Hive, which was found to be very efficient in terms of scalability and high availability. The different machine learning ...
