Hadoop—An Introduction
    Unique Features of Hadoop
    Big Data and Hadoop
    A Typical Scenario for Using Hadoop
    Traditional Database Systems
    Data Lake
    Big Data, Data Science and Hadoop
Cluster Computing and Hadoop Clusters
    Cluster Computing
    Hadoop Clusters
Hadoop Components and the Hadoop Ecosphere
What Do Hadoop Administrators Do?
    Hadoop Administration—A New Paradigm
    What You Need to Know to Administer Hadoop
    The Hadoop Administrator’s Toolset
Key Differences between Hadoop 1 and Hadoop 2
    Architectural Differences
    High-Availability Features
    Multiple Processing Engines
    Separation of Processing and Scheduling
    Resource Allocation in Hadoop 1 and Hadoop 2
Distributed Data Processing: MapReduce and Spark, Hive and Pig
    MapReduce
    Apache Spark
    Apache Hive
    Apache Pig
Data Integration: Apache Sqoop, Apache Flume and Apache Kafka
Key Areas of Hadoop Administration
    Managing the Cluster Storage
    Allocating the Cluster Resources
    Scheduling Jobs
    Securing Hadoop Data
Summary