Learning Hunk

Book Description

Visualize and analyze your Hadoop data using Hunk

About This Book

  • Explore your data in Hadoop and NoSQL data stores
  • Create and optimize your reporting experience with advanced data visualizations and data analytics
  • A comprehensive developer's guide that helps you create outstanding analytical solutions efficiently

Who This Book Is For

If you are a Hadoop developer who wants to build efficient real-time Operational Intelligence solutions based on Hadoop deployments or various NoSQL data stores using Hunk, this book is for you. Some familiarity with Splunk is assumed.

What You Will Learn

  • Deploy and configure Hunk on top of Cloudera Hadoop
  • Create and configure Virtual Indexes for datasets
  • Make your data presentable using the wide variety of data visualization components and knowledge objects
  • Design a data model using Hunk best practices
  • Add more flexibility to your analytics solution via extended SDK and custom visualizations
  • Discover data using MongoDB as a data source
  • Integrate Hunk with AWS Elastic MapReduce to improve scalability

In Detail

Hunk is the big data analytics platform that lets you rapidly explore, analyze, and visualize data in Hadoop and NoSQL data stores. It provides a single, fluid user experience, designed to show you insights from your big data without the need for specialized skills, fixed schemas, or months of development. Hunk goes beyond typical data analysis methods and gives you the power to rapidly detect patterns and find anomalies across petabytes of raw data.

This book focuses on exploring, analyzing, and visualizing big data in Hadoop and NoSQL data stores with this full-featured big data analytics platform.

You will begin by learning the Hunk architecture and the Hunk Virtual Index before moving on to analyzing and visualizing data using the Splunk Search Processing Language (SPL). Next, you will meet Hunk Apps, which can easily integrate with NoSQL data stores such as MongoDB or Sqrrl. You will also discover Hunk knowledge objects, build a semantic layer on top of Hadoop, and explore data using the friendly user interface of Hunk Pivot. You will connect MongoDB and explore data in the data store. Finally, you will go through report acceleration techniques and analyze data in the AWS Cloud.

Style and approach

A step-by-step guide starting right from the basics and deep diving into the more advanced and technical aspects of Hunk.

Downloading the example code for this book: You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to receive the code files.

Table of Contents

  1. Learning Hunk
    1. Table of Contents
    2. Learning Hunk
    3. Credits
    4. About the Authors
    5. About the Reviewer
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    8. 1. Meet Hunk
      1. Big data analytics
      2. The big problem
      3. The elegant solution
        1. Supporting SPL
        2. Intermediate results
      4. Getting to know Hunk
        1. Splunk versus Hunk
      5. Hunk architecture
        1. Connecting to Hadoop
        2. Advance Hunk deployment
        3. Native versus virtual indexes
          1. Native indexes
          2. Virtual index
        4. External result provider
        5. Computation models
          1. Data streaming
          2. Data reporting
          3. Mixed mode
        6. Hunk security
          1. One Hunk user to one Hadoop user
          2. Many Hunk users to one Hadoop user
          3. Hunk user(s) to the same Hadoop user with different queues
      6. Setting up Hadoop
        1. Starting and using a virtual machine with CDH5
          1. SSH user
          2. MySQL
      7. Starting the VM and cluster in VirtualBox
      8. Big data use case
        1. Importing data from RDBMS to Hadoop using Sqoop
        2. Telecommunications – SMS, Call, and Internet dataset from dandelion.eu
        3. Milano grid map
        4. CDR aggregated data import process
        5. Periodical data import from MySQL using Sqoop and Oozie
        6. Problems to solve
      9. Summary
    9. 2. Explore Hadoop Data with Hunk
      1. Setting up Hunk
        1. Extracting Hunk to a VM
          1. Setting up Hunk variables and configuration files
          2. Running Hunk for the first time
        2. Setting up a data provider and virtual index for CDR data
          1. Setting up a connection to Hadoop
          2. Setting up a virtual index for data stored in Hadoop
          3. Accessing data through a virtual index
      2. Exploring data
        1. Creating reports
          1. The top five browsers report
          2. Top referrers
          3. Site errors report
        2. Creating alerts
        3. Creating a dashboard
      3. Controlling security with Hunk
        1. The default Hadoop security
        2. One Hunk user to one Hadoop user
      4. Summary
    10. 3. Meeting Hunk Features
      1. Knowledge objects
        1. Field aliases
        2. Calculated fields
        3. Field extractions
        4. Tags
        5. Event type
        6. Workflow actions
        7. Macros
        8. Data model
          1. Add auto-extracting fields
          2. Adding GeoIP attributes
          3. Other ways to add attributes
      2. Introducing Pivot
      3. Summary
    11. 4. Adding Speed to Reports
      1. Big data performance issues
      2. Hunk report acceleration
        1. Creating a virtual index
        2. Streaming mode
        3. Creating an acceleration search
          1. What's going on in Hadoop?
        4. Report acceleration summaries
          1. Reviewing summary details
          2. Managing report accelerations
      3. Hunk accelerations limits
      4. Summary
    12. 5. Customizing Hunk
      1. What we are going to do with the Splunk SDK
        1. Supported languages
        2. Solving problems
        3. REST API
        4. The implementation plan
        5. The conclusion
      2. Dashboard customization using Splunk Web Framework
        1. Functionality
      3. A description of time-series aggregated CDR data
        1. Source data
        2. Creating a virtual index for Milano CDR
        3. Creating a virtual index for the Milano grid
        4. Creating a virtual index using sample data
      4. Implementation
        1. Querying the visualization
        2. Downloading the application
        3. Custom Google Maps
          1. Page layout
          2. Linear gradients and bins for the activity value
      5. Custom map components
        1. Other components
      6. The final result
      7. Summary
    13. 6. Discovering Hunk Integration Apps
      1. What is Mongo?
        1. Installation
        2. Installing the Mongo app
        3. Mongo provider
        4. Creating a virtual index
          1. Inputting data from the recommendation engine backend
          2. Data schemas
          3. Data mechanics
      2. Counting by shop in a single collection
      3. Counting events in all collections
        1. Counting events in shops for observed days
      4. Summary
    14. 7. Exploring Data in the Cloud
      1. An introduction to Amazon EMR and S3
        1. Amazon EMR
          1. Setting up an Amazon EMR cluster
        2. Amazon S3
          1. S3 as a data provider for Hunk
          2. The advantages of EMR and S3
      2. Integrating Hunk with EMR and S3
        1. Method 1: BYOL
          1. Setting up the Hunk AMI
          2. Adding a license
          3. Configuring the data provider
          4. Configuring a virtual index
          5. Setting up a provider and virtual index in the configuration file
          6. Exploring data
        2. Method 2: Hunk–hourly pricing
          1. Provisioning a Hunk instance using the Cloud formation template
          2. Provisioning a Hunk instance using the EC2 Console
      3. Converting Hunk from an hourly rate to a license
      4. Summary
    15. Index