May 2018
Beginner to intermediate
384 pages
10h 19m
English
HDFS and MapReduce (MR) are the storage and compute engines at the core of Hadoop. Implementing parallel processing applications directly against them is complex and error-prone. Apache Pig provides a wrapper around parallel processing jobs on Hadoop and makes it easier to process large datasets through a simple, high-level programming interface, the Pig Latin scripting language. Tasks and actions written with Pig are parallelized automatically on the underlying Hadoop cluster. In the context of cyber security, Pig can be used to implement complex parallel data aggregation and anomaly detection tasks, and to prepare training data for supervised learning when the CI protection application leverages machine learning algorithms.
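To make the aggregation and anomaly detection idea concrete, here is a minimal single-process sketch in Python. It is not Pig code and does not use any Pig API; it only mirrors the kind of logic a Pig script would express with a `GROUP ... BY` followed by `COUNT`, which Pig would then run in parallel on the cluster. The event records, the IP addresses, the function names, and the z-score threshold are all assumptions made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical security events as (source_ip, event_type) pairs.
# Nine sources generate 2 events each; one source generates 40.
EVENTS = [(f"10.0.0.{i}", "login_failed") for i in range(1, 10) for _ in range(2)]
EVENTS += [("10.0.0.99", "login_failed")] * 40


def count_by_source(events):
    """Count events per source IP (in spirit: GROUP events BY source_ip, then COUNT)."""
    return Counter(ip for ip, _ in events)


def flag_anomalies(counts, z_threshold=2.0):
    """Return the sources whose event count lies more than z_threshold
    population standard deviations above the mean count."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All sources behave identically, so nothing stands out.
        return []
    return sorted(ip for ip, n in counts.items() if (n - mu) / sigma > z_threshold)


if __name__ == "__main__":
    counts = count_by_source(EVENTS)
    print(flag_anomalies(counts))
```

The flagged sources could then be labeled and exported as training rows for a supervised model. On a real cluster the grouping step is the part Pig would distribute; the thresholding rule here is just one simple choice among many.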