December 2018
Beginner to intermediate
682 pages
18h 1m
English
Most DataFrames will not have columns of booleans like our movie dataset. The most straightforward way to produce a boolean Series is to apply a condition to one of the columns using a comparison operator. In step 2, we use the greater-than operator to test whether the duration of each movie is more than two hours (120 minutes). Steps 3 and 4 calculate two important quantities from a boolean Series: its sum and its mean. These calculations work because Python treats False and True as 0 and 1 in arithmetic, so the sum is the number of True values and the mean is their share of the total.
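As a minimal sketch of the idea above, the following builds a boolean Series from a comparison and then takes its sum and mean. The small DataFrame and its durations are made up for illustration and are not the movie dataset used in the recipe; only the name movie_2_hours is borrowed from the text.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; these durations are invented.
movie = pd.DataFrame(
    {"duration": [95, 130, 121, 120, 88]},
    index=["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
)

# Comparing a column to a scalar yields a boolean Series.
movie_2_hours = movie["duration"] > 120

# True counts as 1 and False as 0, so the sum is the number of True values...
n_long = movie_2_hours.sum()       # 2 of the 5 made-up movies exceed 120 minutes

# ...and the mean is the fraction of True values.
frac_long = movie_2_hours.mean()   # 2 / 5 = 0.4
```

Note that a movie lasting exactly 120 minutes is not counted, because the comparison is strictly greater than.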
You can confirm for yourself that the mean of a boolean Series is the proportion of True values (multiply by 100 to express it as a percentage). To do this, call the value_counts method with the normalize parameter set to True, which returns the relative frequency of each value:
>>> movie_2_hours.value_counts(normalize=True) ...
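The same check can be sketched on invented data, where the expected result is easy to verify by hand. The durations and the names is_long, dist, and share_true are assumptions for illustration only and do not come from the recipe.

```python
import pandas as pd

# Made-up durations in minutes, not taken from the movie dataset.
durations = pd.Series([95, 130, 121, 120, 88])
is_long = durations > 120

# Relative frequency of each distinct value (False and True).
dist = is_long.value_counts(normalize=True)

# Read off the share of True values; default to 0.0 if True never occurs.
share_true = dist.to_dict().get(True, 0.0)

# The share of True values matches the mean of the boolean Series.
matches_mean = abs(share_true - is_long.mean()) < 1e-12
```

Converting to a dictionary before the lookup is just one way to fetch the value for True; it sidesteps any ambiguity about indexing a Series with a boolean label.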