Big Data Forensics – Learning Hadoop Investigations

Book Description

Perform forensic investigations on Hadoop clusters with cutting-edge tools and techniques

About This Book

  • Identify, collect, and analyze Hadoop evidence forensically
  • Learn about Hadoop's internals and Big Data file storage concepts
  • A step-by-step guide to help you perform forensic analysis using freely available tools

Who This Book Is For

This book is meant for statisticians and forensic analysts with basic knowledge of digital forensics; no prior knowledge of Big Data forensics is required. If you are an IT professional, law enforcement professional, legal professional, or a student interested in Big Data and forensics, this book is a hands-on guide for learning how to conduct Hadoop forensic investigations. Each topic and step in the forensic process is described in accessible language.

What You Will Learn

  • Understand Hadoop internals and file storage
  • Collect and analyze Hadoop forensic evidence
  • Perform complex forensic analysis for fraud and other investigations
  • Use state-of-the-art forensic tools
  • Conduct interviews to identify Hadoop evidence
  • Create compelling presentations of your forensic findings
  • Understand how Big Data clusters operate
  • Apply advanced forensic techniques in an investigation, including file carving, statistical analysis, and more

In Detail

Big Data forensics is a type of digital investigation that involves the identification, collection, and analysis of evidence from large-scale Big Data systems. Hadoop is one of the most popular Big Data solutions, and forensically investigating a Hadoop cluster requires specialized tools and techniques. With the explosion of Big Data, forensic investigators need to be prepared to analyze the petabytes of data stored in Hadoop clusters. Understanding Hadoop's operational structure and performing forensic analysis with court-accepted tools and best practices will help you conduct a successful investigation.

Discover how to perform a complete forensic investigation of large-scale Hadoop clusters using the same tools and techniques employed by forensic experts. This book begins by taking you through the process of forensic investigation and the pitfalls to avoid. It will walk you through Hadoop's internals and architecture, and you will discover what types of information Hadoop stores and how to access that data. You will learn to identify Big Data evidence using techniques to survey a live system and interview witnesses. After setting up your own Hadoop system, you will collect evidence using techniques such as forensic imaging and application-based extractions. You will analyze Hadoop evidence using advanced tools and techniques to uncover events and statistical information. Finally, data visualization and evidence presentation techniques are covered to help you properly communicate your findings to any audience.

Style and approach

This book is a complete guide that follows every step of the forensic analysis process in detail. You will be guided through each key topic and step necessary to perform an investigation. Hands-on exercises are presented throughout the book, and technical reference guides and sample documents are included for real-world use.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Big Data Forensics – Learning Hadoop Investigations
    1. Table of Contents
    2. Big Data Forensics – Learning Hadoop Investigations
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the color images of this book
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Starting Out with Forensic Investigations and Big Data
      1. An overview of computer forensics
        1. The forensic process
          1. Identification
          2. Collection
          3. Analysis
          4. Presentation
        2. Other investigation considerations
          1. Equipment
          2. Evidence management
          3. Investigator training and certification
          4. The post-investigation process
      2. What is Big Data?
        1. The four Vs of Big Data
        2. Big Data architecture and concepts
      3. Big Data forensics
        1. Metadata preservation
        2. Collection methods
        3. Collection verification
      4. Summary
    9. 2. Understanding Hadoop Internals and Architecture
      1. The Hadoop architecture
        1. The components of Hadoop
        2. The Hadoop Distributed File System
        3. The Hadoop configuration files
        4. Hadoop daemons
      2. Hadoop data analysis tools
        1. Hive
        2. HBase
        3. Pig
      3. Managing files in Hadoop
        1. File permissions
        2. Trash
        3. Log files
        4. File compression and splitting
        5. Hadoop SequenceFile
        6. The Hadoop archive files
        7. Data serialization
        8. Packaged jobs and JAR files
      4. The Hadoop forensic evidence ecosystem
      5. Running Hadoop
        1. LightHadoop
        2. Amazon Web Services
        3. Loading Hadoop data
          1. Importing sample data for testing
      6. Summary
    10. 3. Identifying Big Data Evidence
      1. Identifying evidence
      2. Locating sources of data
        1. Compiling data requirements
        2. Reviewing the system architecture
        3. Interviewing staff and reviewing the documentation
        4. Assessing data viability
        5. Identifying data sources in noncooperative situations
        6. Data collection requirements
        7. Data source identification
        8. Structured and unstructured data
        9. Data collection types
          1. In-house or third-party collection
            1. The types of data to request
            2. The data collection request
          2. An investigator-led collection
      3. The chain of custody documentation
      4. Summary
    11. 4. Collecting Hadoop Distributed File System Data
      1. Forensically collecting a cluster system
      2. Physical versus remote collections
      3. HDFS collections through the host operating system
        1. Imaging the host operating system
        2. Imaging a mounted HDFS partition
        3. Targeted collection from a Hadoop client
      4. The Hadoop shell command collection
        1. Collecting HDFS files
        2. HDFS targeted data collection
        3. Hadoop Offline Image and Edits Viewers
      5. Collection via Sqoop
      6. Other HDFS collection approaches
      7. Summary
    12. 5. Collecting Hadoop Application Data
      1. Application collection approaches
        1. Backups
        2. Query extractions
        3. Script extractions
        4. Software extractions
      2. Validating application collections
      3. Collecting Hive evidence
        1. Loading Hive data
        2. Identifying Hive evidence
        3. Hive backup collection
        4. Hive query collection
          1. Hive query control totals
        5. Hive metadata and log collection
        6. The Hive script collection
      4. Collecting HBase evidence
        1. Loading HBase data
        2. Identifying HBase evidence
        3. The HBase backup collection
        4. The HBase query collection
        5. HBase collection via scripts
        6. HBase control totals
        7. HBase metadata and log collection
      5. Collecting other Hadoop application data and non-Hadoop data
      6. Summary
    13. 6. Performing Hadoop Distributed File System Analysis
      1. The forensic analysis process
        1. Forensic analysis goals
        2. Forensic analysis concepts
        3. The challenges of forensic analysis
          1. Anti-forensic techniques
          2. Data encryption
      2. Analysis preparation
      3. Analysis
        1. Keyword searching and file and data carving
          1. Bulk Extractor
          2. Autopsy
        2. Metadata analysis
          1. File activity timeline analysis
          2. Other metadata analysis
        3. The analysis of deleted files
        4. HDFS data extraction
          1. Hex editors
        5. Cluster reconstruction
        6. Configuration file analysis
          1. Linux configuration files
          2. Hadoop configuration files
          3. Hadoop application configuration files
        7. Log file analysis
      4. Summary
    14. 7. Analyzing Hadoop Application Data
      1. Preparing the analysis environment
      2. Pre-analysis steps
        1. Loading data
          1. Preload data transformations
        2. Data surveying
        3. Transforming data
          1. Transforming nonrelational data
      3. Analyzing data
        1. The analysis approach
          1. Types of investigation
        2. Analysis techniques
          1. Isolating known facts and events
          2. Grouping and clustering
          3. Histograms
          4. The time series analysis
            1. Measuring change over time
          5. Anomaly detection
            1. Rule-based analysis
            2. Duplication analysis
            3. Benford's law
            4. Aggregation analysis
            5. Plotting outliers on a timeline
          6. Analyzing disparate data sets
          7. Keyword searching
        3. Validating the findings
        4. Documenting the findings
      4. Summary
    15. 8. Presenting Forensic Findings
      1. Types of reports
        1. Sample reports
          1. Internal investigation report
          2. Affidavit and declaration
          3. Expert report
      2. Developing the report
        1. Explaining the process
        2. Showing the findings
        3. Using exhibits or appendices
      3. Testimony and other presentations
      4. Summary
    16. Index