January 2020
Intermediate to advanced
312 pages
10h 22m
English
Chapter 7. Processing truly big datasets with Hadoop and Spark
Table 7.1. A comparison of compression formats available for use out of the box with Hadoop
Chapter 9. PageRank with map and reduce in PySpark
Table 9.1. Differences between the RDD’s .reduce, .fold, and .aggregate methods
Chapter 10. Faster decision-making with machine learning and PySpark
Table 10.1. A subset of mushroom data for a small decision tree classifier
Table 10.2. Five randomly seeded decision trees for an example random forest
Chapter 11. Large datasets in the cloud with Amazon Web Services and S3