Book description
Your all-in-one resource for using Pentaho with MySQL for Business Intelligence and Data Warehousing
Open-source Pentaho provides business intelligence (BI) and data warehousing solutions at a fraction of the cost of proprietary solutions. Now you can take advantage of Pentaho for your business needs with this practical guide written by two major participants in the Pentaho community.
The book covers all components of the Pentaho BI Suite. You'll learn to install, use, and maintain Pentaho-and find plenty of background discussion that will bring you thoroughly up to speed on BI and Pentaho concepts.
Of all available open source BI products, Pentaho offers the most comprehensive toolset and is the fastest growing open source product suite
Explains how to build and load a data warehouse with Pentaho Kettle for data integration/ETL, manually create JFree (pentaho reporting services) reports using direct SQL queries, and create Mondrian (Pentaho analysis services) cubes and attach them to a JPivot cube browser
Review deploying reports, cubes and metadata to the Pentaho platform in order to distribute BI solutions to end-users
Shows how to set up scheduling, subscription and automatic distribution
The companion Web site provides complete source code examples, sample data, and links to related resources.
Table of contents
- Copyright
- About the Author
- Credits
- Acknowledgments
- Introduction
-
I. Getting Started with Pentaho
- 1. Quick Start: Pentaho Examples
- 2. Prerequisites
- 3. Server Installation and Configuration
-
4. The Pentaho BI Stack
- 4.1. Pentaho BI Stack Perspectives
- 4.2. The Pentaho Business Intelligence Server
- 4.3. Desktop Programs
- 4.4. Pentaho Enterprise Edition and Community Edition
- 4.5. Creating Action Sequences with Pentaho Design Studio
- 4.6. Summary
-
II. Dimensional Modeling and Data Warehouse Design
- 5. Example Business Case: World Class Movies
-
6. Data Warehouse Primer
- 6.1. Why Do You Need a Data Warehouse?
- 6.2. The Big Debate: Inmon Versus Kimball
- 6.3. Data Warehouse Architecture
- 6.4. Data Warehouse Challenges
- 6.5. Data Warehouse Trends
- 6.6. Summary
-
7. Modeling the Business Using Star Schemas
- 7.1. What Is a Star Schema?
- 7.2. Querying Star Schemas
- 7.3. The Bus Architecture
- 7.4. Design Principles
- 7.5. Handling Dimension Changes
- 7.6. Advanced Dimensional Model Concepts
- 7.7. Summary
-
8. The Data Mart Design Process
- 8.1. Requirements Analysis
-
8.2. Data Analysis
- 8.2.1. Data Profiling
-
8.2.2. Using eobjects.org DataCleaner
- 8.2.2.1. Adding Profile Tasks
- 8.2.2.2. Adding Database Connections
- 8.2.2.3. Doing an Initial Profile
- 8.2.2.4. Working with Regular Expressions
- 8.2.2.5. Profiling and Exploring Results
- 8.2.2.6. Validating and Comparing Data
- 8.2.2.7. Using a Dictionary for Column Dependency Checks
- 8.2.2.8. Alternative Solutions
- 8.3. Developing the Model
- 8.4. Data Modeling with Power*Architect
- 8.5. Building the WCM Data Marts
- 8.6. Summary
-
III. ETL and Data Integration
-
9. Pentaho Data Integration Primer
- 9.1. Data Integration Overview
-
9.2. Getting Started with Spoon
- 9.2.1. Launching the Spoon Application
- 9.2.2. A Simple "Hello, World!" Example
- 9.2.3. Checking Consistency and Dependencies
-
9.2.4. Working with Database Connections
- 9.2.4.1. JDBC and ODBC Connectivity
- 9.2.4.2. Creating a Database Connection
- 9.2.4.3. Testing Database Connections
- 9.2.4.4. How Database Connections Are Used
- 9.2.4.5. A Database-Enabled "Hello, World!" Example
- 9.2.4.6. Database Connection Configuration Management
- 9.2.4.7. Generic Database Connections
- 9.3. Summary
-
10. Designing Pentaho Data Integration Solutions
-
10.1. Generating Dimension Table Data
- 10.1.1. Using Stored Procedures
-
10.1.2. Loading a Simple Date Dimension
- 10.1.2.1. CREATE TABLE dim_date: Using the Execute SQL Script Step
- 10.1.2.2. Missing Date and Generate Rows with Initial Date: The Generate Rows Step
- 10.1.2.3. Days Sequence: The Add Sequence Step
- 10.1.2.4. Calculate and Format Dates: The Calculator Step
- 10.1.2.5. The Value Mapper Step
- 10.1.2.6. Load dim_date: The Table Output Step
- 10.1.3. More Advanced Date Dimension Features
- 10.1.4. Loading a Simple Time Dimension
- 10.1.5. Loading the Demography Dimension
-
10.2. Loading Data from Source Systems
-
10.2.1. Staging Lookup Values
- 10.2.1.1. The stage_lookup_data Job
- 10.2.1.2. The START Job Entry
- 10.2.1.3. Transformation Job Entries
- 10.2.1.4. Mail Success and Mail Failure
- 10.2.1.5. The extract_lookup_type and extract_lookup_value Transformations
- 10.2.1.6. The stage_lookup_data Transformation
- 10.2.1.7. Check If Staging Table Exists: The Table Exists Step
- 10.2.1.8. The Filter rows Step
- 10.2.1.9. Create Staging Table: Executing Dynamic SQL
- 10.2.1.10. The Dummy Step
- 10.2.1.11. The Stream Lookup Step
- 10.2.1.12. Sort on Lookup Type: The Sort Rows Step
- 10.2.1.13. Store to Staging Table: Using a Table Output Step to Load Multiple Tables
-
10.2.2. The Promotion Dimension
- 10.2.2.1. Promotion Mappings
- 10.2.2.2. Promotion Data Changes
- 10.2.2.3. Synchronization Frequency
- 10.2.2.4. The load_dim_promotion Job
- 10.2.2.5. The extract_promotion Transformation
- 10.2.2.6. Determining Promotion Data Changes
- 10.2.2.7. Saving the Extract and Passing on the File Name
- 10.2.2.8. Picking Up the File and Loading the Extract
-
10.2.1. Staging Lookup Values
- 10.3. Summary
-
10.1. Generating Dimension Table Data
-
11. Deploying Pentaho Data Integration Solutions
-
11.1. Configuration Management
- 11.1.1. Using Variables
- 11.1.2. Using JNDI Connections
-
11.1.3. Working with the PDI Repository
- 11.1.3.1. Creating a PDI Repository
- 11.1.3.2. Connecting to the Repository
- 11.1.3.3. Automatically Connecting to a Default Repository
- 11.1.3.4. The Repository Explorer
- 11.1.3.5. Managing Repository User Accounts
- 11.1.3.6. How PDI Keeps Track of Repositories
- 11.1.3.7. Upgrading an Existing Repository
- 11.2. Running in the Deployment Environment
- 11.3. Summary
-
11.1. Configuration Management
-
9. Pentaho Data Integration Primer
-
IV. Business Intelligence Applications
-
12. The Metadata Layer
- 12.1. Metadata Overview
- 12.2. Pentaho Metadata Features
- 12.3. Creation and Maintenance of Metadata
- 12.4. Summary
-
13. Using the Pentaho Reporting Tools
- 13.1. Reporting Architecture
- 13.2. Web-Based Reporting
- 13.3. Practical Uses of WAQR
-
13.4. Pentaho Report Designer
- 13.4.1. The PRD Screen
- 13.4.2. Report Structure
- 13.4.3. Report Elements
- 13.4.4. Creating Data Sets
- 13.4.5. Adding and Using Parameters
- 13.4.6. Layout and Formatting
- 13.4.7. Alternate Row Colors: Row Banding
- 13.4.8. Grouping and Summarizing Data
- 13.4.9. Adding Charts and Graphs
- 13.4.10. Working with Subreports
- 13.4.11. Publishing and Exporting Reports
- 13.5. Summary
-
14. Scheduling, Subscription, and Bursting
- 14.1. Scheduling
- 14.2. Background Execution and Subscription
- 14.3. Bursting
- 14.4. Summary
-
15. OLAP Solutions Using Pentaho Analysis Services
- 15.1. Overview of Pentaho Analysis Services
-
15.2. MDX Primer
- 15.2.1. Cubes, Dimensions, and Measures
- 15.2.2. Hierarchies, Levels, and Members
- 15.2.3. Cube Family Relationships
-
15.2.4. MDX Query Syntax
- 15.2.4.1. Basic MDX Query
- 15.2.4.2. Axes: ON ROWS and ON COLUMNS
- 15.2.4.3. Looking at a Part of the Data
- 15.2.4.4. Dimension on Only One Axis
- 15.2.4.5. More MDX Examples: a Simple Cube
- 15.2.4.6. The FILTER Function
- 15.2.4.7. The ORDER Function
- 15.2.4.8. Using TOPCOUNT and BOTTOMCOUNT
- 15.2.4.9. Combining Dimensions: The CROSSJOIN Function
- 15.2.4.10. Using NON EMPTY
- 15.2.4.11. Working with Sets and the WITH Clause
- 15.2.4.12. Using Calculated Members
-
15.3. Creating Mondrian Schemas
- 15.3.1. Getting Started with Pentaho Schema Workbench
- 15.3.2. Using the Schema Editor
-
15.3.3. Creating and Editing a Basic Schema
- 15.3.3.1. Basic Schema Editing Tasks
- 15.3.3.2. Creating a Cube
- 15.3.3.3. Choosing a Fact Table
- 15.3.3.4. Adding Measures
- 15.3.3.5. Adding Dimensions
- 15.3.3.6. Adding and Editing Hierarchies and Choosing Dimension Tables
- 15.3.3.7. Adding Hierarchy Levels
- 15.3.3.8. Associating Cubes with Shared Dimensions
- 15.3.3.9. Adding the DVD and Customer Dimensions
- 15.3.3.10. XML Listing
- 15.3.4. Testing and Deployment
- 15.3.5. Schema Design Topics We Didn't Cover
- 15.4. Visualizing Mondrian Cubes with JPivot
- 15.5. Enhancing Performance Using the Pentaho Aggregate Designer
- 15.6. Summary
- 16. Data Mining with Weka
-
17. Building Dashboards
- 17.1. The Community Dashboard Framework
- 17.2. CDF Concepts and Architecture
- 17.3. Example: Customers and Websites Dashboard
- 17.4. Summary
-
12. The Metadata Layer
Product information
- Title: Pentaho® Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL®
- Author(s):
- Release date: August 2009
- Publisher(s): Wiley
- ISBN: 9780470484326
You might also like
video
Fighting Churn Customer Behavior: Load the event data into PostgreSQL and run analytics
In this video, Carl Gold goes through the milestone number 1: Customer Behavior: Load the event …
book
Pentaho Data Integration Beginner's Guide
Extract, Transform, and Load (ETL) is the essence of data integration and this book shows you …
book
Pentaho® Kettle Solutions: Building Open Source ETL Solutions with Pentaho Data Integration
A complete guide to Pentaho Kettle, the Pentaho Data lntegration toolset for ETL This practical book …
book
Pentaho Data Integration Cookbook Second Edition
The premier open source ETL tool is at your command with this recipe-packed cookbook. Learn to …