Pulling the required data out of a semistructured document involves several tasks. The following are the basic tasks used in scraping:
- Searching a semistructured document: A particular element, or every element of a specific type, can be accessed using its tag name and its tag attributes, such as class.
- Navigating within a semistructured document: We can navigate through a web document to pull different types of data in four ways: navigating down, navigating sideways, navigating up, and navigating back and forth. Each of these is covered in detail later in this chapter.
- Modifying a semistructured document: By modifying the tag name or the