Apache Spark is widely used in the big data community. Here are some of the most prominent reasons for using it in big data modeling and computation:
- Speed: Processing speed matters when working with large datasets. Spark can run computations up to 100 times faster than Hadoop MapReduce [2] in memory, or 10 times faster on disk.
- Accessibility: Spark was designed to be highly accessible, offering simple APIs in Python, Java, Scala, and SQL, along with rich built-in libraries. It also integrates with other big data tools, including Hadoop clusters and data sources such as Cassandra [3].
- Platform support: Apache Spark was built to run on Hadoop and Mesos, standalone, ...