June 2023
Intermediate to advanced
276 pages
6h 45m
English
In the previous chapter, we learned about machine learning with Amazon SageMaker, a powerful service for creating, testing, and tuning machine learning algorithms. In this chapter, we will learn about Amazon EMR (Elastic MapReduce), which is essentially a managed Hadoop cluster. Hadoop is a powerful framework for the massively parallel processing of data. By distributing both storage and computation across many nodes, it makes it practical to query petabytes of data efficiently using commodity hardware. Hadoop is a community project made up of a large ecosystem of plug-and-play components. That ecosystem includes libraries for machine learning, such as Mahout and Spark ML. In this chapter, we will walk ...