Learn Python by Building Data Science Applications

by Philipp Kats, David Katz
August 2019
Beginner
482 pages
12h 56m
English
Packt Publishing

Quality control

As we mentioned already, this data has plenty of issues: web pages differ widely in structure and offer different sets of information, formatted differently. Cleaning all of it will take another chapter (and indeed, that's what we'll do in Chapter 11, Data Cleaning and Manipulation). It is good practice, however, to perform some basic quality control now, verifying that every page has a minimal set of required properties and that none of them is null. We could also add other checks, ensuring, for example, that the additional fields are not empty for at least a significant share of the pages.
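The two kinds of checks described above could be sketched as follows. This is only a minimal illustration, not the book's implementation: the field names (`url`, `name`, `date`, `location`), the sample records, and the 50% threshold are all assumptions made up for the example, and the records are assumed to be plain dictionaries, one per scraped page.

```python
from typing import Any

# Hypothetical field names; the real dataset's properties may differ.
REQUIRED_FIELDS = ("url", "name")
OPTIONAL_FIELDS = ("date", "location")


def is_missing(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_required(record: dict) -> list:
    """Return the required fields that are absent or empty in one record."""
    return [f for f in REQUIRED_FIELDS if is_missing(record.get(f))]


def fill_rates(records: list) -> dict:
    """Return the share of records in which each optional field is filled."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in OPTIONAL_FIELDS}
    return {
        f: sum(not is_missing(r.get(f)) for r in records) / total
        for f in OPTIONAL_FIELDS
    }


def quality_report(records: list, min_fill_rate: float = 0.5) -> dict:
    """Combine both checks: per-record failures and sparse optional fields."""
    failures = {}
    for index, record in enumerate(records):
        missing = missing_required(record)
        if missing:
            failures[index] = missing
    rates = fill_rates(records)
    sparse = [f for f, rate in rates.items() if rate < min_fill_rate]
    return {"failures": failures, "fill_rates": rates, "sparse_fields": sparse}


# Made-up sample records, for illustration only.
records = [
    {"url": "https://example.com/a", "name": "A", "date": "1942", "location": "X"},
    {"url": "https://example.com/b", "name": "", "date": None, "location": "Y"},
    {"url": "https://example.com/c", "name": "C", "date": None},
]

report = quality_report(records)
print(report["failures"])       # records that fail the required-field check
print(report["sparse_fields"])  # optional fields filled in too few records
```

Keeping the required-field check per record and the fill-rate check per dataset mirrors the distinction in the text: the first is a hard condition on every page, while the second only has to hold for a significant share of them.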

The approach we'll be using is two-fold. First, ...


Publisher Resources

ISBN: 9781789535365