Big Data Forensics – Learning Hadoop Investigations by Joe Sremack

Chapter 4. Collecting Hadoop Distributed File System Data

The Hadoop Distributed File System (HDFS) is the primary source of evidence in a Hadoop forensic investigation. Whether Hadoop data is used in Hive, HBase, or a custom Java application, the data is stored in HDFS, so forensic evidence can be collected from HDFS itself. Investigators can take one of two collection approaches: collect HDFS data from the host operating system, or collect it directly from Hadoop.
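To make the second approach (collecting directly from Hadoop) more concrete, the following is a minimal Python sketch rather than the book's own procedure. It assumes the standard `hdfs` command-line client is installed and on the `PATH` of the collection machine, and it uses the Hadoop shell command `hdfs dfs -copyToLocal`. The function names and any HDFS or local paths passed to them are hypothetical, chosen only for illustration. After copying, the sketch records a SHA-256 hash for each collected file so the copy can be verified later.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Dict, List


def build_hdfs_copy_command(hdfs_path: str, local_dir: str) -> List[str]:
    """Build the argument list for copying an HDFS path to local storage.

    Uses the Hadoop shell command `hdfs dfs -copyToLocal <src> <localdst>`.
    """
    return ["hdfs", "dfs", "-copyToLocal", hdfs_path, local_dir]


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_collected_files(local_dir: str) -> Dict[str, str]:
    """Map each collected file (path relative to local_dir) to its SHA-256 hash."""
    root = Path(local_dir)
    return {
        str(path.relative_to(root)): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def collect_from_hadoop(hdfs_path: str, local_dir: str) -> Dict[str, str]:
    """Copy hdfs_path into local_dir, then hash everything that was collected.

    Assumption: the `hdfs` client is configured to reach the target cluster.
    """
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_hdfs_copy_command(hdfs_path, local_dir), check=True)
    return hash_collected_files(local_dir)
```

A call such as `collect_from_hadoop("/user/example", "/evidence/case01")` (both paths hypothetical) would return a dictionary of relative file paths and hashes. This sketch covers only collection through Hadoop; collecting from the host operating system involves different tooling and is not shown here.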

The advantage of collecting from HDFS is that investigators can collect much more data than they can from a data analysis layer or an application layer. Some potentially relevant data can only be collected through HDFS. This includes metadata, configuration files, user files that were not imported into an ...
