Learn Python by Building Data Science Applications

by Philipp Kats, David Katz

August 2019

Beginner

482 pages

12h 56m

English

Packt Publishing

Beyond Beautiful Soup

In this example, we used the BS4 library to parse static HTML. Beautiful Soup is invaluable for dealing with occasionally messy HTML, but it is only a parser: for large-scale scraping and for dynamic pages, it simply won't suffice on its own. For production scraping in large quantities, perhaps on a regular basis, it is a good idea to use the Scrapy (https://scrapy.org/) package. Scrapy is an entire framework for downloading HTML, parsing it, extracting data, and then storing that data. One of its killer features is that it runs asynchronously: while it is waiting for one page to load, it can automatically switch to processing another. Because of that, Scrapy's scrapers are significantly faster on large lists of websites. ...
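To make the asynchronous idea concrete, here is a minimal sketch that uses only Python's standard asyncio module. It is not Scrapy's API (Scrapy is built on the Twisted networking engine), and nothing here touches the network: the page names, the delays, and the helpers fetch, crawl, and timed_crawl are all hypothetical, with asyncio.sleep standing in for the time spent waiting on a server.

```python
import asyncio
import time

# Hypothetical pages mapped to simulated load times in seconds.
# No real network requests are made anywhere in this sketch.
PAGES = {"page-a": 0.3, "page-b": 0.2, "page-c": 0.1}


async def fetch(name, delay):
    # asyncio.sleep stands in for waiting on a slow server. While this
    # coroutine is suspended, the event loop is free to run the others.
    await asyncio.sleep(delay)
    return f"<html>{name}</html>"


async def crawl(pages):
    tasks = [fetch(name, delay) for name, delay in pages.items()]
    # gather runs the coroutines concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(*tasks)


def timed_crawl(pages):
    # Run the whole crawl and report how long it took end to end.
    start = time.perf_counter()
    results = asyncio.run(crawl(pages))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_crawl(PAGES)
    print(len(results), "pages in", round(elapsed, 2), "seconds")
```

Because the three waits overlap, the total time should be close to the slowest single page rather than the sum of all three delays. A framework such as Scrapy applies the same principle to real requests and adds scheduling, retries, and storage on top.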

Publisher Resources

ISBN: 9781789535365
