Machine Learning with Spark - Second Edition by Nick Pentreath, Manpreet Singh Ghotra, Rajdeep Dua

Count by occupation

We count the number of users in each occupation.

The following steps build the occupation DataFrame, collect its rows into a list, and plot the result with Matplotlib.

  1. Get user_data.
  2. Compute the count per occupation by calling groupby("occupation") and then count() on the result.
  3. Extract a list of ("occupation", "count") tuples from the list of rows.
  4. Create NumPy arrays x_axis and y_axis from the occupations and their counts.
  5. Create a bar plot.
  6. Display the chart.
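Steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not the book's listing: the helper names rows_to_pairs and plot_occupation_counts, the sample rows, and the output file name are all assumptions made for this example. The sample dictionaries stand in for the rows that user_data.groupby("occupation").count().collect() would return, so the sketch runs without a Spark session. The plotting helper assumes NumPy and Matplotlib are installed and saves the chart to a file instead of opening a window.

```python
def rows_to_pairs(rows):
    """Step 3: turn collected rows into (occupation, count) tuples.

    Each row only needs to support row["occupation"] and row["count"],
    which PySpark Row objects and plain dicts both do.
    """
    return [(row["occupation"], row["count"]) for row in rows]


def plot_occupation_counts(pairs, path="occupations.png"):
    """Steps 4 to 6: build the axis arrays and draw a bar chart.

    Hypothetical helper; NumPy and Matplotlib are imported here so the
    rest of the sketch works even where they are not installed.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt
    import numpy as np

    x_axis = np.array([name for name, _ in pairs])
    y_axis = np.array([count for _, count in pairs])
    pos = np.arange(len(x_axis))  # one bar position per occupation

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.bar(pos, y_axis)
    ax.set_xticks(pos)
    ax.set_xticklabels(x_axis, rotation=30)
    ax.set_ylabel("Number of users")
    fig.savefig(path)
    plt.close(fig)


# Made-up stand-in for user_data.groupby("occupation").count().collect()
sample_rows = [
    {"occupation": "student", "count": 3},
    {"occupation": "engineer", "count": 2},
    {"occupation": "doctor", "count": 1},
]
pairs = rows_to_pairs(sample_rows)
```

Calling plot_occupation_counts(pairs) would then write the bar chart to occupations.png; in an interactive session, plt.show() could be used in place of saving to a file.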

The complete code listing follows:

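    user_data = get_user_data()
    # Count users per occupation and bring the rows back to the driver
    user_occ = user_data.groupby("occupation").count().collect()
    user_occ_len = len(user_occ)
    user_occ_list = []
    # range(user_occ_len) visits every row, including the last one
    for i in range(user_occ_len):
        element = user_occ[i]
        # Index by field name; element.count would resolve to the tuple method
        count = element["count"]
        ...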
