We will use data from two sources:
- Reddit worldnews
- Dow Jones Industrial Average (DJIA)
The Getting started section that follows has two clear goals:
- Moving our development environment from the local, Spark shell-centered setup we used previously into a virtual appliance. This requires setting up the prerequisite resources first.
- Being able to spin up a brand-new Spark cluster running inside the virtual appliance, a capability that the first goal implies.