- Read in the diamonds dataset, and output the first five rows:
>>> diamonds = pd.read_csv('data/diamonds.csv')
>>> diamonds.head()
- Before we begin analysis, let's change the cut, color, and clarity columns into ordered categorical variables:
>>> cut_cats = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
>>> color_cats = ['J', 'I', 'H', 'G', 'F', 'E', 'D']
>>> clarity_cats = ['I1', 'SI2', 'SI1', 'VS2', 'VS1', 'VVS2', 'VVS1', 'IF']
>>> diamonds['cut'] = pd.Categorical(diamonds['cut'], categories=cut_cats, ordered=True)
>>> diamonds['color'] = pd.Categorical(diamonds['color'], categories=color_cats, ordered=True)
>>> diamonds['clarity'] ...
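To see what the ordered conversion buys us, here is a minimal sketch on a small made-up frame rather than the real diamonds file. The `toy` DataFrame and its values are assumptions for illustration only; the category list is the one used for `cut` above.

```python
import pandas as pd

# Hypothetical toy data standing in for the diamonds 'cut' column.
cut_cats = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
toy = pd.DataFrame({'cut': ['Ideal', 'Fair', 'Premium', 'Good', 'Very Good', 'Ideal']})
toy['cut'] = pd.Categorical(toy['cut'], categories=cut_cats, ordered=True)

# Sorting follows the declared category order, not alphabetical order.
sorted_cuts = toy['cut'].sort_values().tolist()

# Ordered categoricals can be compared against a category label.
at_least_premium = int((toy['cut'] >= 'Premium').sum())

# min and max are defined because the categories are ordered.
lowest, highest = toy['cut'].min(), toy['cut'].max()

print(sorted_cuts)
print(at_least_premium, lowest, highest)
```

With plain strings, sorting would place 'Ideal' before 'Premium', and a comparison such as `>= 'Premium'` would be alphabetical rather than quality-based. Declaring the order up front means later sorts, comparisons, and plot axes should follow worst-to-best quality.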