Focused Web Crawling
Contents
The Opportunities and Challenges of Mining the Web
Topic Hierarchies for Focused Crawling
Preamble
Focused web crawling sits one level above the other techniques discussed in this book. It is an integrated technology that combines two base technologies: classification and web analytics. Together, these capabilities support more complex decision making and yield information that is more specific to its intended use. Figure 15.1 shows a complete analysis system that “feeds” automatically on Internet data and outputs information that can be used to make ...
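To make the combination of classification and link-based crawling concrete, the following is a minimal sketch, not the system shown in Figure 15.1. It assumes a hypothetical in-memory "web" (`TOY_WEB`) instead of real HTTP requests, a simple keyword-overlap score (`relevance`) standing in for a trained classifier, and a priority queue in which each link inherits the score of the page that linked to it, standing in for web analytics. All names, the topic terms, and the threshold are illustrative assumptions.

```python
import heapq

# Hypothetical toy web: URL -> (page text, outgoing links). No network access.
TOY_WEB = {
    "a": ("text mining and web crawling basics", ["b", "c"]),
    "b": ("focused crawling uses text classification", ["d"]),
    "c": ("recipes for chocolate cake", ["e"]),
    "d": ("classification of web pages by topic", []),
    "e": ("more cake and cookie recipes", []),
}

# Assumed topic vocabulary; a real system would use a trained classifier.
TOPIC_TERMS = {"crawling", "classification", "mining", "web", "topic"}


def relevance(text):
    """Return the fraction of words in `text` that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in TOPIC_TERMS for w in words) / len(words)


def focused_crawl(seed, web, threshold=0.2, max_pages=10):
    """Visit pages in order of expected relevance; prune off-topic branches."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    relevant = []
    while frontier and len(relevant) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        score = relevance(text)
        if score < threshold:
            continue  # off-topic page: keep nothing and do not follow its links
        relevant.append(url)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                # A link inherits the relevance score of the page it came from.
                heapq.heappush(frontier, (-score, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("a", TOY_WEB))
```

In this toy graph the crawl starting at "a" keeps the on-topic pages and never fetches "e", because the only page linking to it ("c") is classified as off-topic. That pruning is what distinguishes a focused crawler from an exhaustive one.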