Practical Hadoop Security

Book description

Practical Hadoop Security is an excellent resource for administrators who are planning a production Hadoop deployment and want to secure their clusters. In this detailed guide to the security options and configuration within Hadoop itself, author Bhushan Lakhe takes you through a comprehensive, hands-on study of how to implement defined security within a Hadoop cluster.

You will start with a detailed overview of all the security options available for Hadoop, including popular extensions like Kerberos and OpenSSH, and then delve into a hands-on implementation of user security (with illustrated code samples), using both in-the-box features and security extensions implemented by leading vendors.

No security system is complete without a monitoring and tracing facility, so Practical Hadoop Security next walks you through audit logging and monitoring technologies for Hadoop, and provides ready-to-use implementation and configuration examples, again with illustrated code samples.

The book concludes with the most important aspect of Hadoop security: encryption. Both types of encryption, for data in transit and for data at rest, are discussed at length, along with leading open source projects that integrate directly with Hadoop at no licensing cost.

Practical Hadoop Security:

  • Explains the importance of security, auditing, and encryption within a Hadoop installation
  • Describes how the leading players have incorporated these features within their Hadoop distributions and provided extensions
  • Demonstrates how to set up and use these features to your benefit and make your Hadoop installation secure without impacting performance or ease of use

Table of contents

    1. Cover
    2. Title
    3. Copyright
    4. Dedication
    5. Contents at a Glance
    6. Contents
    7. About the Author
    8. About the Technical Reviewer
    9. Acknowledgments
    10. Introduction
    11. Part I: Introducing Hadoop and Its Security
      1. Chapter 1: Understanding Security Concepts
        1. Introducing Security Engineering
          1. Security Engineering Framework
          2. Psychological Aspects of Security Engineering
          3. Introduction to Security Protocols
        2. Securing a Program
          1. Non-Malicious Flaws
          2. Malicious Flaws
        3. Securing a Distributed System
          1. Authentication
          2. Authorization
          3. Encryption
        4. Summary
      2. Chapter 2: Introducing Hadoop
        1. Hadoop Architecture
          1. HDFS
          2. Inherent Security Issues with HDFS Architecture
          3. Hadoop’s Job Framework using MapReduce
          4. Inherent Security Issues with Hadoop’s Job Framework
          5. Hadoop’s Operational Security Woes
        2. The Hadoop Stack
          1. Main Hadoop Components
        3. Summary
      3. Chapter 3: Introducing Hadoop Security
        1. Starting with Hadoop Security
          1. Introducing Authentication and Authorization for HDFS
          2. Authorization
          3. Real-World Example for Designing Hadoop Authorization
          4. Fine-Grained Authorization for Hadoop
        2. Securely Administering HDFS
          1. Using Hadoop Logging for Security
          2. Monitoring for Security
          3. Tools of the Trade
        3. Encryption: Relevance and Implementation for Hadoop
          1. Encryption for Data in Transit
          2. Encryption for Data at Rest
        4. Summary
    12. Part II: Authenticating and Authorizing Within Your Hadoop Cluster
      1. Chapter 4: Open Source Authentication in Hadoop
        1. Pieces of the Security Puzzle
        2. Establishing Secure Client Access
          1. Countering Spoofing with PuTTY’s Host Keys
          2. Key-Based Authentication Using PuTTY
          3. Using Passphrases
        3. Building Secure User Authentication
          1. Kerberos Overview
          2. Installing and Configuring Kerberos
          3. Preparing for Kerberos Implementation
          4. Implementing Kerberos for Hadoop
        4. Securing Client-Server Communications
          1. Safe Inter-process Communication
          2. Encrypting HTTP Communication
          3. Securing Data Communication
        5. Summary
      2. Chapter 5: Implementing Granular Authorization
        1. Designing User Authorization
          1. Call the Cops: A Real-World Security Example
          2. Determine Access Groups and their Access Levels
          3. Implement the Security Model
          4. Access Control Lists for HDFS
        2. Role-Based Authorization with Apache Sentry
          1. Hive Architecture and Authorization Issues
          2. Sentry Architecture
          3. Implementing Roles
        3. Summary
    13. Part III: Audit Logging and Security Monitoring
      1. Chapter 6: Hadoop Logs: Relating and Interpretation
        1. Using Log4j API
          1. Loggers
          2. Appenders
          3. Layout
          4. Filters
        2. Reviewing Hadoop Audit Logs and Daemon Logs
          1. Audit Logs
          2. Hadoop Daemon Logs
        3. Correlating and Interpreting Log Files
          1. What to Correlate?
          2. How to Correlate Using Job Name?
        4. Important Considerations for Logging
          1. Time Synchronization
          2. Hadoop Analytics
          3. Splunk
        5. Summary
      2. Chapter 7: Monitoring in Hadoop
        1. Overview of a Monitoring System
          1. Simple Monitoring System
          2. Monitoring System for Hadoop
        2. Hadoop Metrics
          1. The jvm Context
          2. The dfs Context
          3. The rpc Context
          4. The mapred Context
          5. Metrics and Security
          6. Metrics Filtering
          7. Capturing Metrics Output to File
        3. Security Monitoring with Ganglia and Nagios
          1. Ganglia
          2. Monitoring HBase Using Ganglia
          3. Nagios
          4. Nagios Integration with Ganglia
          5. The Nagios Community
        4. Summary
    14. Part IV: Encryption for Hadoop
      1. Chapter 8: Encryption in Hadoop
        1. Introduction to Data Encryption
          1. Popular Encryption Algorithms
          2. Applications of Encryption
        2. Hadoop Encryption Options Overview
        3. Encryption Using Intel’s Hadoop Distro
          1. Step-by-Step Implementation
          2. Special Classes Used by Intel Distro
        4. Using Amazon Web Services to Encrypt Your Data
          1. Deciding on a Model for Data Encryption and Storage
          2. Encrypting a Data File Using Selected Model
        5. Summary
    15. Part V: Appendices
      1. Appendix A: Pageant Use and Implementation
        1. Using Pageant
        2. Security Considerations
      2. Appendix B: PuTTY and SSH Implementation for Linux-Based Clients
        1. Using SSH for Remote Access
      3. Appendix C: Setting Up a KeyStore and TrustStore for HTTP Encryption
        1. Create HTTPS Certificates and KeyStore/TrustStore Files
        2. Adjust Permissions for KeyStore/TrustStore Files
      4. Appendix D: Hadoop Metrics and Their Relevance to Security
    16. Index

Product information

  • Title: Practical Hadoop Security
  • Author(s): Bhushan Lakhe
  • Release date: December 2014
  • Publisher(s): Apress
  • ISBN: 9781430265450