Scrapy setup and the application code
Scrapy is a Python library used to extract content from web pages or to crawl the pages linked to a given web page (see the Web crawlers (or spiders) section of Chapter 4, Web Mining Techniques, for more details). To install the library, type the following in the terminal:
sudo pip install Scrapy
Install the executable in the bin folder:
sudo easy_install scrapy
From the movie_reviews_analyzer_app folder, we initialize our Scrapy project as follows:
scrapy startproject scrapy_spider
This command will create the following tree inside the scrapy_spider folder:
├── __init__.py
├── items.py
├── pipelines.py
├── settings.py
├── spiders
│   ├── __init__.py
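As a quick sanity check after running the command, a short script can confirm that the files listed in the tree are actually present. This is only a minimal sketch: the helper name missing_project_files and the EXPECTED list are hypothetical, and the path scrapy_spider/scrapy_spider assumes that startproject nests a package folder with the same name as the project.

```python
from pathlib import Path

# Entries taken from the tree listed above, relative to the package folder.
EXPECTED = [
    "__init__.py",
    "items.py",
    "pipelines.py",
    "settings.py",
    "spiders/__init__.py",
]


def missing_project_files(package_dir):
    """Return the expected entries that are absent from package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    # Assumed location of the generated package; adjust if your layout differs.
    missing = missing_project_files("scrapy_spider/scrapy_spider")
    print("missing:", missing if missing else "none")
```

Run it from the movie_reviews_analyzer_app folder. An empty result suggests the project skeleton was generated as described.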
The pipelines.py and items.py files manage how ...