Summary
We explored the evolution of the Hadoop and MapReduce frameworks and discussed YARN, HDFS concepts, and HDFS reads and writes, as well as their key features and challenges. Then we discussed the evolution of Apache Spark, why it was created in the first place, and the value it brings to the challenges of big data analytics and processing.
Finally, we took a peek at the various components of Apache Spark, namely Spark Core, Spark SQL, Spark Streaming, Spark GraphX, and Spark ML. We also looked at PySpark and SparkR as a means of integrating Python and R code with Apache Spark.
Now that we have seen the big data analytics space, the evolution of the Hadoop distributed computing platform, and the eventual development of ...