May 2017
Beginner to intermediate
596 pages
15h 2m
English
Apache Pig is a platform, originally developed at Yahoo, for accessing and processing large datasets stored on HDFS. Pig has two components: the Pig Latin language and the runtime environment that executes Pig Latin scripts.
A Pig job abstracts away the complexity of MapReduce: it launches the MapReduce jobs in the background and executes them sequentially. Besides MapReduce, Pig jobs can also be executed on Apache Tez and Apache Spark. Pig gives the Hadoop ecosystem a data flow capability, hiding the details of ETL-like processing from the user. It allows us to extract a large dataset from HDFS and then apply the necessary functions (such ...