Chapter Three

NoSQL Web Crawler Application

Ganesh Chandra Deka, Directorate General of Training, Ministry of Skill Development & Entrepreneurship, New Delhi, India

Abstract

With the advent of Web technology, the Web has filled with unstructured data, often described as Big Data. However, these data are not easy to collect, access, and process at large scale. Web crawling is an optimization problem. Site-specific crawling of social media platforms, e-commerce websites, blogs, news websites, and forums is a requirement for business organizations that need to answer search queries from webpages. Indexing a huge number of webpages requires a cluster with several petabytes of usable disk. Since NoSQL databases are highly scalable, use of NoSQL database ...

From A Deep Dive into NoSQL Databases: The Use Cases and Applications.