January 2020
Intermediate to advanced
312 pages
10h 22m
English
Chapter 2. Accelerating large dataset work: Map and parallel computing
Chapter 3. Function pipelines for mapping complex transformations
Chapter 4. Processing large datasets with lazy workflows
Chapter 5. Accumulation operations with reduce
Chapter 6. Speeding up map and reduce with advanced parallelization
Chapter 7. Processing truly big datasets with Hadoop and Spark
Chapter 8. Best practices for large data with Apache Streaming and mrjob
Chapter 9. PageRank with map and reduce in PySpark
Chapter 10. Faster decision-making with machine learning ...