November 2019
Intermediate to advanced
346 pages
9h 36m
English
We begin our recipe for blocking unwanted ads by importing the dataset. The data used in this recipe has already been feature-engineered for us. In Step 2, we import the data into a data frame. Looking at the data, we see that it consists of 1,558 numerical features and an ad or non-ad label.

The features encode the geometry of the image, sentences in the URL, the URL of the image, alt text, anchor text, and words near the anchor text. Our goal is to predict whether an image is an advertisement (ad) or not (non-ad). We proceed to clean our data by dropping rows with missing values in Steps 3 and 4. Generally, it may make ...
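
The import and cleaning steps described above can be sketched as follows. This is a minimal illustration, not the recipe's actual code. It assumes the data is a headerless CSV whose last column is the label, that missing values are marked with `?`, and that the labels are spelled `ad.` and `nonad.`. The tiny in-memory sample and the `load_and_clean` helper are hypothetical stand-ins for the real file with its 1,558 feature columns.

```python
import io

import pandas as pd

# Hypothetical miniature sample: 3 feature columns plus a label column.
# The "?" missing-value marker and the "ad."/"nonad." label spellings
# are assumptions made for this sketch.
SAMPLE_CSV = """125,125,1.0,ad.
57,468,8.2105,ad.
?,?,?,nonad.
60,468,7.8,nonad.
33,?,?,nonad.
"""


def load_and_clean(source):
    """Read the data into a data frame and drop rows with missing values."""
    # Treat "?" as NaN so that pandas can detect the missing entries.
    df = pd.read_csv(source, header=None, na_values=["?"], skipinitialspace=True)
    # Drop every row that has at least one missing value.
    return df.dropna(axis=0, how="any").reset_index(drop=True)


df = load_and_clean(io.StringIO(SAMPLE_CSV))

# Separate the numerical features from the label and encode ad as 1, non-ad as 0.
features = df.iloc[:, :-1]
labels = (df.iloc[:, -1] == "ad.").astype(int)

print(df.shape)
```

To run the same sketch on a file on disk, pass its path to `load_and_clean` in place of the `io.StringIO` object. Dropping rows is the simplest way to handle missing values, at the cost of discarding the remaining information in those rows.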