Count by occupation

We count the number of users in each occupation.

The following steps build the occupation DataFrame, populate a list from it, and display the result using Matplotlib.

  1. Get user_data.
  2. Extract the occupation counts by calling groupby("occupation") and then count() on the result.
  3. Extract a list of ("occupation", "count") tuples from the list of rows.
  4. Create NumPy arrays of the values for x_axis and y_axis.
  5. Create a plot of type bar.
  6. Display the chart.

The complete code listing follows:

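user_data = get_user_data()
# Count users per occupation and collect the resulting rows on the driver
user_occ = user_data.groupby("occupation").count().collect()
user_occ_len = len(user_occ)
user_occ_list = []
# range(user_occ_len) visits every row; range(0, user_occ_len - 1) would skip the last one
for i in range(user_occ_len):
    element = user_occ[i]
    # Row is a tuple, so element.count would resolve to tuple.count; call __getattr__ to read the column
    count = element.__getattr__('count')
...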
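The listing above needs a running Spark session and the get_user_data() helper. As a rough sketch of the same counting and plotting steps that can be tried without Spark, the following uses collections.Counter in place of groupby("occupation").count(). The sample data and the function names count_by_occupation and plot_occupation_counts are hypothetical, chosen only for illustration, and the alphabetical sort is an assumption made to keep the axis order stable.

```python
from collections import Counter

# Hypothetical sample standing in for the occupation column of user_data.
sample_occupations = ["student", "engineer", "student", "writer", "engineer", "student"]


def count_by_occupation(occupations):
    """Return (occupation, count) tuples sorted alphabetically by occupation."""
    counts = Counter(occupations)
    return sorted(counts.items())


def plot_occupation_counts(pairs):
    """Draw a bar chart of the counts (NumPy and Matplotlib are imported lazily)."""
    import numpy as np
    import matplotlib.pyplot as plt

    # Split the tuples into the two axes, as in step 4.
    x_axis = np.array([name for name, _ in pairs])
    y_axis = np.array([count for _, count in pairs])

    # Bars are placed at integer positions and labelled with the occupation names.
    positions = np.arange(len(x_axis))
    plt.bar(positions, y_axis)
    plt.xticks(positions, x_axis, rotation=30)
    plt.ylabel("count")
    plt.show()


occupation_counts = count_by_occupation(sample_occupations)
```

Calling plot_occupation_counts(occupation_counts) then displays the chart. The counting is kept separate from the plotting so that the tuples can be inspected before anything is drawn.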
