Building an Intelligent Web: Theory and Practice

by Pawan Lingras, Rajendra Akerkar
March 2010
Content level: Intermediate to advanced
326 pages
12h 25m
English
Jones & Bartlett Learning
CHAPTER 7 Web Content Mining
site. For example, if a robot visits a site called http://www.myweb.ca/, it should first
check for http://www.myweb.ca/robots.txt. If this document exists, the robot should
parse it looking for records such as
User-agent: *
Disallow: /
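Together, these two records tell every robot (User-agent: *) not to retrieve any
document from the website (Disallow: /). An empty Disallow value would instead
permit retrieval of all documents.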
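
The check described above can be sketched with Python's standard urllib.robotparser module. This is only an illustration under stated assumptions, not the authors' implementation: the robot name "MyRobot" and the page index.html are hypothetical, and the two records are hard-coded rather than downloaded so that the sketch runs without network access.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

SITE = "http://www.myweb.ca/"  # example site used in the text

# A robot looks for robots.txt at the root of the site.
robots_url = urljoin(SITE, "/robots.txt")

# The records from the text, hard-coded here (assumption: a real robot
# would download them from robots_url, e.g. with parser.read()).
records = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser(robots_url)
parser.parse(records)

# "MyRobot" and "index.html" are hypothetical names for illustration.
allowed = parser.can_fetch("MyRobot", urljoin(SITE, "index.html"))
print(robots_url, allowed)
```

With these records, can_fetch reports that the page may not be retrieved, since Disallow: / covers every path on the site.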
A site can have only a single “/robots.txt” file. Moreover, the file cannot be in any of
the user directories. A robot will never look for robots.txt appearing anywhere except
at the root of a web document hierarchy, such as http://www.m


Publisher Resources

ISBN: 9780763741372