NoSQL Web Crawler Application
Ganesh Chandra Deka Directorate General of Training, Ministry of Skill Development & Entrepreneurship, New Delhi, India
Abstract
With the advent of Web technology, the Web is full of unstructured data called Big Data. However, these data are not easy to collect, access, and process at large scale. Web Crawling is an optimization problem. Site-specific crawling of various social media platforms, e-Commerce websites, Blogs, News websites, and Forums is a requirement for various business organizations to answer a search quarry from webpages. Indexing of huge number of webpage requires a cluster with several petabytes of usable disk. Since the NoSQL databases are highly scalable, use of NoSQL database ...