November 2016
Beginner to intermediate
687 pages
15h 31m
English
Let's talk about some more item post-processing. Scrapy also lets you define an item pipeline: a sequence of components that every scraped item passes through after the spider yields it. Keeping post-processing out of the spider in this way makes for a clean, methodical program design.
We need to build our own item pipeline when we want to post-process scraped items, for example to remove noise or convert case, or when we want to derive values from the item, such as calculating age from a date of birth (DOB) or the discounted price from the original price. At the end, we might also want to dump the items into a separate file.
The way to achieve this will be as follows:
settings.py:
ITEM_PIPELINES = { 'myproject.pipeline.CleanPipeline': ...
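To make the idea concrete, here is a minimal sketch of what pipeline classes along these lines might look like. Only the name CleanPipeline comes from the settings entry above; DerivePipeline, JsonLinesDumpPipeline, the item fields (name, price, discount_pct, dob), and the output filename are assumptions made for illustration. The sketch relies only on the hooks Scrapy calls on a pipeline component: process_item(item, spider), plus the optional open_spider and close_spider.

```python
import json
from datetime import date


class CleanPipeline:
    """Removes noise (extra whitespace) and normalizes case."""

    def process_item(self, item, spider):
        # Collapse runs of whitespace, then title-case the name.
        item["name"] = " ".join(item["name"].split()).title()
        return item  # returning the item passes it to the next pipeline


class DerivePipeline:
    """Derives new values from existing fields (hypothetical field names)."""

    def __init__(self, today=None):
        # `today` is injectable so the age calculation is reproducible.
        self.today = today or date.today()

    def process_item(self, item, spider):
        # Discounted price from the original price and a percentage.
        item["discount_price"] = round(
            item["price"] * (100 - item["discount_pct"]) / 100, 2
        )
        # Age from DOB: subtract one year if the birthday has not occurred yet.
        dob = item["dob"]
        not_yet = (self.today.month, self.today.day) < (dob.month, dob.day)
        item["age"] = self.today.year - dob.year - int(not_yet)
        return item


class JsonLinesDumpPipeline:
    """Dumps each item to a file, one JSON object per line."""

    def __init__(self, path="items.jl"):  # assumed output filename
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # default=str serializes values such as dates that JSON lacks.
        self.file.write(json.dumps(dict(item), default=str) + "\n")
        return item
```

Each class would be registered in ITEM_PIPELINES with an integer priority; Scrapy runs the components in ascending order of that number, so the cleaning step should get a lower value than the steps that depend on cleaned data.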