March 2020
Beginner to intermediate
352 pages
8h 40m
English
Having preprocessed the dataset, let's do some sanity checking using descriptive statistics techniques.
We can implement this as shown here:
dfs.info()
The output of the preceding code is as follows:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37554 entries, 1 to 78442
Data columns (total 6 columns):
subject 37367 non-null object
from 37554 non-null object
date 37554 non-null datetime64[ns, UTC]
to 36882 non-null object
label 36962 non-null object
thread 37554 non-null object
dtypes: datetime64[ns, UTC](1), object(5)
memory usage: 2.0+ MB
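The non-null counts in this output also show where values are missing: subject, to, and label each have fewer than 37,554 non-null entries (187, 672, and 592 fewer, respectively). As a minimal sketch of how to read those gaps directly, the following uses a small toy DataFrame with the same column names. The rows are invented for illustration and are not taken from the actual email dataset.

```python
import pandas as pd

# Toy stand-in for the email DataFrame; all values here are invented.
dfs = pd.DataFrame(
    {
        "subject": ["Hello", None, "Invoice"],
        "from": ["a@example.com", "b@example.com", "c@example.com"],
        "date": pd.to_datetime(
            ["2020-01-01 10:00", "2020-01-02 11:30", "2020-01-03 09:15"],
            utc=True,
        ),
        "to": ["me@example.com", None, "me@example.com"],
        "label": ["inbox", "sent", None],
        "thread": ["t1", "t2", "t3"],
    }
)

# info() prints the index range, per-column non-null counts, and dtypes.
dfs.info()

# isna().sum() counts missing values per column, which equals the
# total row count minus the non-null count reported by info().
missing = dfs.isna().sum()
print(missing[missing > 0])
```

Filtering to `missing > 0` keeps only the columns that need attention, which is often easier to scan than the full info() listing.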
We will learn more about descriptive statistics in Chapter 5, Descriptive Statistics. Note that the dataset holds 37,554 emails, each described by six columns: subject, from, ...