Scrapy setup and the application code

Scrapy is a Python library used to extract content from web pages or to crawl the pages linked from a given web page (see the Web crawlers (or spiders) section of Chapter 4, Web Mining Techniques, for more details). To install the library, type the following in the terminal:

sudo pip install Scrapy 

Then install the scrapy executable in the bin folder:

sudo easy_install scrapy
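
Before going further, we may want to confirm that both the library and its executable are visible from the current environment. The following is a minimal sketch using only the Python standard library; the helper name scrapy_install_status is our own and is not part of Scrapy:

```python
import importlib.util
import shutil


def scrapy_install_status():
    """Report whether the scrapy package and its command-line tool are visible."""
    return {
        # True when 'import scrapy' would find the package
        "package_importable": importlib.util.find_spec("scrapy") is not None,
        # True when a 'scrapy' executable is found on the PATH
        "executable_on_path": shutil.which("scrapy") is not None,
    }


if __name__ == "__main__":
    print(scrapy_install_status())
```

If either value is False, the corresponding installation step above probably did not complete for the Python interpreter or shell in use.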

From the movie_reviews_analyzer_app folder, we initialize our Scrapy project as follows:

scrapy startproject scrapy_spider

This command will create the following tree inside the scrapy_spider folder:

├── __init__.py
├── items.py
├── pipelines.py
├── settings.py
└── spiders
    └── __init__.py
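
To check that the project was generated with the layout listed above, we can compare the folder against the expected file names. This is only a sketch: the helper missing_project_files is hypothetical, and the path passed to it at the bottom is an assumption about where the generated package sits relative to the current directory, so adjust it to your own setup.

```python
from pathlib import Path

# Files taken from the tree listed above, relative to the generated package folder.
EXPECTED_FILES = (
    "__init__.py",
    "items.py",
    "pipelines.py",
    "settings.py",
    "spiders/__init__.py",
)


def missing_project_files(package_dir):
    """Return the expected files that are absent from package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location of the generated package; change it if yours differs.
    print(missing_project_files("scrapy_spider/scrapy_spider"))
```

An empty list means every file from the tree was found; any names printed are the ones still missing.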

The pipelines.py and items.py files manage how ...
