May 2017
Beginner to intermediate
220 pages
5h 2m
English
The type of technology used to build a websitewill affect how we crawl it. A useful tool to check the kind of technologies a website is built with is the module detectem, which requires Python 3.5+ and Docker. If you don't already have Docker installed, follow the instructions for your operating system at https://www.docker.com/products/overview. Once Docker is installed, you can run the following commands.
docker pull scrapinghub/splashpip install detectem
This will pull the latest Docker image from ScrapingHub and install the package via pip. It is recommended to use a Python virtual environment (https://docs.python.org/3/library/venv.html) or a Conda environment (https://conda.io/docs/using/envs.html) ...
Read now
Unlock full access