15 Pehchaan: Analysis of the ‘Aadhar Dataset’ to Facilitate a Smooth and Efficient Conduct of the Upcoming NPR

Soumyadev Mukherjee, Harshit Anand, Nishan Acharya, Subham Char, Pritam Ghosh and Minakhi Rout*

School of Computer Engineering, Kalinga Institute of Industrial Technology (Deemed to be) University, Bhubaneswar, Odisha, India

Abstract

The Government of India has sanctioned Rs. 3,941.35 crore for maintaining the National Population Register (NPR). The NPR records the “usual residents” of the nation. A “usual resident” is any individual who has stayed in an area for the past six months or plans to stay there for the next six months. “Aadhar” is an authentic 12-digit identity number that can be obtained voluntarily by people who reside in the nation or by individuals who hold Indian passports, subject to their demographic and biometric information. Analyzing the “Aadhar Dataset” and drawing meaningful insights from it should help facilitate a smoother conduct of the upcoming NPR. In this research, “Hadoop” is used solely to store and process large amounts of semi-structured data, and our proposed work therefore relies on it to process the data gathered. The input data is processed using MapReduce, and the result is finally loaded into the Hadoop Distributed File System (HDFS).
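The MapReduce step mentioned above can be illustrated with a minimal pure-Python sketch of the map, shuffle, and reduce phases. It is not the authors' job and does not use Hadoop itself. The record layout (state, gender, enrolment count), the sample rows, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in the assumed layout "state,gender,count".
# These are made-up values, not taken from the real dataset.
SAMPLE_ROWS = [
    "Odisha,F,3",
    "Odisha,M,2",
    "Bihar,F,4",
    "malformed line",
    "Bihar,M,1",
]


def map_phase(lines):
    """Emit (state, count) pairs, skipping rows that do not parse."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3 or not parts[2].isdigit():
            continue  # tolerate semi-structured or damaged input
        state, _gender, count = parts
        yield state, int(count)


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each state."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines):
    """Chain the three phases into one in-memory job."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for state, total in sorted(run_job(SAMPLE_ROWS).items()):
        print(f"{state}\t{total}")
```

In an actual Hadoop deployment the framework performs the shuffle across nodes and writes the reducer output to HDFS. Here everything runs in memory, so only the data flow is shown.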

Keywords: National Population Register, Aadhar, identity crisis, UIDAI, big data, Hadoop, MapReduce, HDFS, Tableau ...

Get Machine Learning Approach for Cloud Data Analytics in IoT now with the O’Reilly learning platform.