June 2020
Intermediate to advanced
382 pages
11h 39m
English
Apache Spark is an open source framework for solving complex distributed problems. It follows a divide-and-conquer strategy: it splits a problem into subproblems, processes them independently of each other, and then combines the partial results. We will demonstrate this with a simple example of counting words in a list.
Let's assume that we have the following list of words:
wordsList = ["python", "java", "ottawa", "news", "java", "ottawa"]
We want to calculate the frequency of each word in this list. To do so efficiently, we will apply the divide-and-conquer strategy.
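To make the idea concrete before bringing in Spark, here is a minimal sketch of the same strategy in plain Python. It is not Spark code: the helper names split_into_chunks, count_chunk, and merge_counts, and the choice of three chunks, are assumptions made for illustration only. Each chunk is counted on its own, and the partial counts are then merged.

```python
from collections import Counter
from functools import reduce

wordsList = ["python", "java", "ottawa", "news", "java", "ottawa"]

def split_into_chunks(items, num_chunks):
    """Divide: split the list into contiguous chunks of roughly equal size."""
    # Ceiling division, with a floor of 1 so an empty list does not give a zero step.
    size = max(1, -(-len(items) // num_chunks))
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_chunk(chunk):
    """Conquer: count the words of one chunk, independently of the others."""
    return Counter(chunk)

def merge_counts(left, right):
    """Combine: add two partial counts together."""
    return left + right

# Hypothetical choice: three chunks, standing in for three workers.
chunks = split_into_chunks(wordsList, 3)
partial_counts = [count_chunk(chunk) for chunk in chunks]
word_counts = dict(reduce(merge_counts, partial_counts, Counter()))

print(word_counts)
```

Because count_chunk never looks outside its own chunk, the per-chunk counts could in principle be computed on separate machines, which is the property a distributed framework relies on.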
The implementation of divide-and-conquer is shown in the following ...