Good question, glad you asked. Spark was built for distributed cluster computing, so the same code scales from a single machine to a large cluster without changes. The word general in its "general engine" description fits Spark well: it refers to the many and varied ways you can use it.
You can use it for ETL data processing, machine learning modeling, graph processing, stream data processing, and SQL and structured data processing. It is a boon for analytics in a distributed computing world.
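As a quick illustration of that generality, here is a minimal PySpark sketch that mixes an ETL-style load with a SQL query over the same data. The file name (sales.csv) and columns (region, amount) are hypothetical, chosen just for the example:

```python
from pyspark.sql import SparkSession

# Start a Spark session; the same code runs locally or on a cluster.
spark = SparkSession.builder.appName("generality-demo").getOrCreate()

# ETL-style step: load a CSV file into a DataFrame
# (the file and its columns are hypothetical).
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# SQL step: expose the DataFrame as a temporary view and query it like a table.
sales.createOrReplaceTempView("sales")
totals = spark.sql(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
totals.show()

spark.stop()
```

The same DataFrame could just as easily feed a machine learning pipeline or a streaming job, which is the point of calling Spark a general engine.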
It has APIs for multiple programming languages such as Java, Scala, Python, and R. It operates mostly in memory, which is where most of its speed advantage over Hadoop MapReduce comes from. For analytics, Python and R are the popular programming languages.
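That in-memory operation is visible in the API itself: you can ask Spark to cache a dataset after it is first computed, so later actions reuse it from memory instead of recomputing it. A minimal sketch, using a small synthetic dataset for the demo:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# A small synthetic DataFrame, built only for this demonstration.
df = spark.range(1_000_000)

# Ask Spark to keep the data in memory once it has been computed.
df.cache()

print(df.count())  # first action: computes the data and fills the cache
print(df.count())  # second action: served from memory, no recomputation

spark.stop()
```

With MapReduce, each pass over the data would typically go back to disk; caching like this is a large part of why iterative workloads such as machine learning run so much faster on Spark.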