July 2018
Intermediate to advanced
474 pages
13h 37m
English
This section will walk through the steps to analyze the movie ratings in the MovieLens database:
mainDF.describe('rating_1').show()
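For a numeric column, describe() reports five summary statistics: count, mean, stddev (the sample standard deviation), min, and max. The following is a minimal plain-Python sketch of those same statistics, included only to make the output easier to read. The ratings list is hypothetical sample data, not values from MovieLens, and the summary dictionary is an illustrative name rather than part of the recipe.

```python
import statistics

# Hypothetical sample ratings for illustration only (not MovieLens data)
ratings = [4.0, 3.5, 5.0, 3.0, 4.0, 2.5]

# The five statistics that describe() reports for a numeric column
summary = {
    "count": len(ratings),
    "mean": statistics.mean(ratings),
    "stddev": statistics.stdev(ratings),  # sample standard deviation
    "min": min(ratings),
    "max": max(ratings),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

On the real data, the min and max rows are a quick check that every rating falls inside the expected rating scale.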
import matplotlib.pyplot as plt
%matplotlib inline
mainDF.select('rating_1').toPandas().hist(figsize=(16, 6), grid=True)
plt.title('Histogram of Ratings')
plt.show()
mainDF.groupBy(['rating_1']).agg({'rating_1':'count'})\
    .withColumnRenamed('count(rating_1)', 'Row Count').orderBy(["Row Count"],ascending=False)\
    .show()
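The groupBy step counts how many rows share each rating value and lists the most frequent ratings first. A rough plain-Python equivalent of that logic, assuming a small hypothetical list of ratings in place of the rating_1 column, might look like this:

```python
from collections import Counter

# Hypothetical sample ratings for illustration only (not MovieLens data)
ratings = [4.0, 3.5, 5.0, 3.0, 4.0, 2.5, 4.0, 3.5]

# Count rows per distinct rating, like groupBy(...).agg({'rating_1': 'count'})
counts = Counter(ratings)

# Order by count, largest first, like orderBy(["Row Count"], ascending=False)
rows = counts.most_common()

for rating, row_count in rows:
    print(f"{rating:>6} {row_count:>9}")
```

Read alongside the histogram, this frequency table shows which rating values dominate the dataset.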