Appendix A

About

This appendix helps students carry out the activities in the course. It lists the detailed steps students need to perform to complete each activity and meet the course objectives.

Lesson 1: Introduction to Spark Distributed Processing

Activity 1: Statistical Operations on Books

Solution:

  1. Open the file you've used for the exercise (book_analysis_act_b1.py in this case).
  2. Define a function named statistics, and import the operator and statistics modules inside it:

    def statistics(book):
       import operator
       import statistics

  3. Next, get the average word length. Use the function mentioned in the Prerequisites section of this activity. Print the word length.

       # average
       print(book.take(2))
       avg = ...
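
The average in step 3 can be illustrated with a minimal plain-Python sketch. It is not the helper from the Prerequisites section, which is not reproduced here; the name average_word_length and the sample word list are hypothetical, and the input is assumed to be an iterable of word strings rather than a Spark RDD.

```python
def average_word_length(words):
    """Return the mean length of the words in an iterable of strings.

    Hypothetical helper for illustration; not the activity's own function.
    """
    # Imported inside the function, mirroring the style used in step 2.
    import statistics

    lengths = [len(word) for word in words]
    if not lengths:
        raise ValueError("no words to average")
    return statistics.mean(lengths)


# Made-up sample data, standing in for the words of a book.
sample_words = ["spark", "is", "fast"]
avg = average_word_length(sample_words)
print(avg)
```

If book is an RDD whose elements are individual words (an assumption; the source does not show how book is built), the same quantity could be computed on the cluster with book.map(len).mean().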
