5 IoT’s Data Processing Using Spark
Ankita Bansal¹* and Aditya Atri²
¹Netaji Subhas University of Technology, Delhi, India
²Netaji Subhas Institute of Technology, Delhi, India
Abstract
Big Data refers to large volumes of structured and unstructured data that cannot be processed using traditional database methods and therefore requires efficient frameworks and software techniques. One well-known system for Big Data processing is Spark. Hadoop’s MapReduce technology was used for batch processing in cluster computing; Spark was introduced to make such processing faster. Spark has its own processing engine, which can use Hadoop’s distributed file storage as well as cloud storage. Spark’s APIs conform to the type of data and the processing it requires, and Spark provides functionalities and tools for query processing, graph processing, and machine learning algorithms. Spark SQL, the query-processing component of the Spark framework, plays an important role and supports the storage of large datasets on the cloud. Spark also performs operations on input data taken from many different data sources, and it uses built-in functions to create and maintain DataFrames.
Keywords: RDD, DataFrames, datasets, Spark SQL, SQLContext, Hive tables, JSON, Parquet files, data sources, Hadoop, MapReduce, cloud, Big Data, Spark, cluster computing, Spark API
5.1 Introduction
In this chapter, we will start with the basics of three ...