Pentaho for Big Data Analytics

Book Description

With your knowledge of Java and this guide, you can take the analysis of your big data to new levels using Pentaho. Covers all the essential tools, techniques, tips, and tricks in one handy volume.

  • A guide to using Pentaho Business Analytics for big data analysis
  • Learn Pentaho’s visualization and reporting tools with practical examples and tips
  • Precise insights into churning big data into meaningful knowledge with Pentaho

In Detail

Pentaho accelerates the realization of value from big data with the most complete solution for big data analytics and data integration. The real power of big data analytics is the abstraction between data and analytics. Data can be distributed across the cluster in various formats, and the analytics platform should be able to talk to heterogeneous data stores and fetch the filtered data to enrich its value.

Pentaho for Big Data Analytics is a practical, hands-on guide with clear, step-by-step exercises for using Pentaho to take advantage of big data systems, where data beats algorithms. It gives you a good grounding in Pentaho Business Analytics’ capabilities.

This book looks at the key ingredients of the Pentaho Business Analytics platform. We will see how to prepare the Pentaho BI environment and get to grips with the big data ecosystem through Hadoop and Pentaho MapReduce. The book provides a clear guide to the essential tools of Pentaho Business Analytics, building familiarity with both the design tools for setting up reports and the visualization tools necessary for complete data analysis.

Table of Contents

  1. Pentaho for Big Data Analytics
    1. Table of Contents
    2. Pentaho for Big Data Analytics
    3. Credits
    4. About the Authors
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. The Rise of Pentaho Analytics along with Big Data
      1. Pentaho BI Suite – components
        1. Data
        2. Server applications
        3. Thin Client Tools
        4. Design tools
      2. Edge over competitors
      3. Summary
    9. 2. Setting Up the Ground
      1. Pentaho BI Server and the development platform
      2. Prerequisites/system requirements
      3. Obtaining Pentaho BI Server (Community Edition)
      4. The JAVA_HOME and JRE_HOME environment variables
      5. Running Pentaho BI Server
      6. Pentaho User Console (PUC)
      7. Pentaho Action Sequence and solution
      8. The JPivot component example
      9. The message template component example
      10. The embedded HSQLDB database server
      11. Pentaho Marketplace
      12. Saiku installation
      13. Pentaho Administration Console (PAC)
      14. Creating data connections
      15. Summary
    10. 3. Churning Big Data with Pentaho
      1. An overview of Big Data and Hadoop
        1. Big Data
        2. Hadoop
      2. The Hadoop architecture
        1. The Hadoop ecosystem
        2. Hortonworks Sandbox
      3. Pentaho Data Integration (PDI)
        1. The Pentaho Big Data plugin configuration
      4. Importing data to Hive
      5. Putting a data file into HDFS
      6. Loading data from HDFS into Hive (job orchestration)
      7. Summary
    11. 4. Pentaho Business Analytics Tools
      1. The business analytics life cycle
      2. Preparing data
        1. Preparing BI Server to work with Hive
        2. Executing and monitoring a Hive MapReduce job
      3. Pentaho Reporting
      4. Data visualization and dashboard building
        1. Creating a layout using a predefined template
        2. Creating a data source
        3. Creating a component
      5. Summary
    12. 5. Visualization of Big Data
      1. Data visualization
      2. Data source preparation
        1. Repopulating the nyse_stocks Hive table
        2. Pentaho's data source integration
        3. Consuming PDI as a CDA data source
      3. Visualizing data using CTools
        1. Visualizing trends using a line chart
        2. Interactivity using a parameter
        3. Multiple pie charts
        4. Waterfall charts
      4. CSS styling
      5. Summary
    13. A. Big Data Sets
      1. Freebase
      2. U.S. airline on-time performance
      3. Amazon public data sets
    14. B. Hadoop Setup
      1. Hortonworks Sandbox
        1. Setting up the Hortonworks Sandbox
        2. Hortonworks Sandbox web administration
      2. Transferring a file using secure FTP
      3. Preparing Hive data
      4. The nyse_stocks sample data
    15. Index