Apache Accumulo for Developers

Book Description

Discover how to build Accumulo, Hadoop, and ZooKeeper clusters from scratch on both Windows and Linux. This book's example-based approach teaches through clear instructions and real-world exercises.

  • Shows you how to build Accumulo, Hadoop, and ZooKeeper clusters from scratch on both Windows and Linux
  • Gives you hands-on knowledge of running Accumulo on the Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure cloud platforms
  • Packed with practical examples to enable you to manipulate Accumulo with ease

In Detail

Accumulo is a sorted, distributed key/value store designed to handle large amounts of data. It is robust and scalable, which makes it well suited to real-time data storage. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, ZooKeeper, and Thrift.

Apache Accumulo for Developers is your guide to building an Accumulo cluster, whether single-node or multi-node, on-site or in the cloud. Accumulo has been proven to handle petabytes of data with cell-level security and real-time analysis, and this step-by-step guide shows you how to take full advantage of that power.

Apache Accumulo for Developers looks at the process of setting up three systems (Hadoop, ZooKeeper, and Accumulo) and configuring, monitoring, and securing them.

You will learn to connect Accumulo to both Hadoop and ZooKeeper. You will also learn how to monitor the cluster (single-node or multi-node) to find any performance bottlenecks, and then integrate it with Amazon EC2, Google Cloud Platform, Rackspace, and Windows Azure. When integrating with these cloud platforms, we will focus on scripting as well.

You will also learn to troubleshoot clusters with monitoring tools, and use Accumulo cell-level security to secure your data.

Table of Contents

  1. Apache Accumulo for Developers
    1. Table of Contents
    2. Apache Accumulo for Developers
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Building an Accumulo Cluster from Scratch
      1. Necessary requirements
      2. Setting up Cygwin
      3. Setting up Hadoop
        1. SSH configuration
          1. Creating a Hadoop user
          2. Generating an SSH key for the Hadoop user
        2. Installing Hadoop
        3. Configuring Hadoop
          1. core-site.xml
          2. mapred-site.xml
          3. hdfs-site.xml
          4. hadoop-env.sh
        4. Preparing the Hadoop filesystem
        5. Starting the Hadoop cluster
        6. Multi-node configurations
          1. The NameNode website
          2. The JobTracker website
          3. The TaskTracker website
      4. Setting up ZooKeeper
        1. Installing ZooKeeper
        2. Configuring ZooKeeper
        3. Starting ZooKeeper
      5. Setting up and configuring Accumulo
        1. Installing Accumulo
        2. Configuring Accumulo
      6. Starting the Accumulo cluster
        1. The Accumulo website
      7. Connecting to the Accumulo cluster using Java
      8. Summary
    9. 2. Monitoring and Managing Accumulo
      1. Monitoring
        1. Setting up Ganglia
          1. Configuring Ganglia
        2. Setting up the Graylog2 server
          1. Logging using Graylog2
        3. Setting up Nagios
        4. Hadoop
          1. NameNode web interface
          2. Finding the logfiles
          3. How does Accumulo store files in Hadoop?
          4. Live, dead, and decommissioning nodes
        5. Accumulo
        6. Monitoring a system's overview
      2. Elasticity
      3. Failover
      4. Resource management
      5. Summary
    10. 3. Integrating Accumulo into Various Cloud Platforms
      1. Amazon EC2
        1. Prerequisites for Amazon EC2
        2. Creating Amazon EC2 Hadoop and ZooKeeper cluster
        3. Setting up Accumulo
      2. Google Cloud Platform
        1. Prerequisites for Google Cloud Platform
        2. Creating the project
        3. Installing the Google gcutil tool
          1. Configuring credentials
          2. Configuring the project
        4. Creating the firewall rules
        5. Creating the cluster
          1. Hadoop
          2. ZooKeeper
          3. Accumulo
        6. Deleting the cluster
      3. Rackspace
        1. Configuration
        2. Network
      4. Windows Azure
        1. Prerequisites
        2. Creating the cluster
          1. Hadoop
          2. ZooKeeper
          3. Accumulo
        3. Deleting the cluster
      5. Summary
    11. 4. Optimizing Accumulo Performance
      1. Prerequisites
      2. Hadoop performance
        1. Baseline
        2. Tuning
          1. Tuning parameters for mapred-default.xml
        3. HDFS
          1. Tuning parameters for mapred-site.xml
          2. Tuning parameters for hdfs-site.xml
      3. ZooKeeper performance
        1. ZooKeeper overview
      4. Accumulo performance
        1. Tuning parameters for accumulo-site.xml
        2. Accumulo overview
        3. Accumulo's performance summary
          1. Tables
          2. Comparing bulk ingest versus batch write
          3. Accumulo examples
      5. Summary
    12. 5. Security
      1. Visibility
        1. Creating an Accumulo user
        2. Creating tables in Accumulo
        3. How does visibility work?
      2. Security expression
        1. Writing a Java client
      3. Authorization
      4. User authorizations
      5. Handling secure authorization
      6. Query Services Layer
      7. Summary
    13. A. Accumulo Command References
    14. B. Hadoop Command References
    15. C. ZooKeeper Command References
    16. Index