October 2018
Beginner to intermediate
348 pages
10h
English
Spark provides a SQL interface to a NoSQL Cassandra database for running ad hoc tasks, such as generating business reports on the fly, analyzing data, debugging, and finding data patterns. This chapter gives a brief overview of the Spark architecture. Spark stands out among the available tools for its ease of installation, its large community, and its support from Hadoop for data warehousing. The chapter also discusses different installation methods, including a custom all-in-one Docker image that bundles Apache Cassandra, a monitoring stack, and Spark with PySpark, SparkR, and Jupyter, along with their dependencies. The Docker image has several flags that can be enabled based on the use case or toolset to test locally ...