5 Data Leakage and Data Wrangling in Machine Learning for Medical Treatment

P.T. Jamuna Devi¹* and B.R. Kavitha²

¹J.K.K. Nataraja College of Arts and Science, Komarapalayam, Tamilnadu, India

²Vivekanandha College of Arts and Science, Elayampalayam, Tamilnadu, India


Healthcare and the life sciences now produce huge amounts of real-time data through enterprise resource planning (ERP) systems. Managing this volume of data is difficult, and as the threat of data leakage by insiders grows, companies are turning to security measures such as digital rights management (DRM) and data loss prevention (DLP) to avert it. Consequently, leakage-prevention systems have also become more diverse, and preventing data leakage has become more challenging. Machine learning methods process important data by developing algorithms and sets of rules that deliver the required outcomes to employees. Deep learning extracts features automatically, retaining the features vital to solving the problem. This reduces the burden on employees of selecting features explicitly when working with unsupervised, semi-supervised, and supervised healthcare data. Finding data leakage in advance and correcting for it is an essential part of improving the definition of a machine learning problem. Many forms of leakage are subtle and are best identified by attempting to extract features and train state-of-the-art algorithms on the problem. Data wrangling and data leakage ...
