Implementing an IBM InfoSphere BigInsights Cluster using Linux on Power

Book Description

This IBM® Redbooks® publication demonstrates and documents how to implement and manage an IBM PowerLinux™ cluster for big data. It focuses on hardware management; operating system provisioning; application provisioning; cluster readiness checks; monitoring of the hardware, the operating system, IBM InfoSphere® BigInsights™, IBM Platform Symphony®, IBM Spectrum™ Scale (formerly IBM GPFS™), and applications; and performance tuning. This publication shows that IBM PowerLinux clustering solutions (hardware and software) deliver significant value to clients who need cost-effective, highly scalable, and robust solutions for big data and analytics workloads.

This book explains how to use IBM Platform Cluster Manager to manage PowerLinux big data clusters that run IBM InfoSphere BigInsights, Spectrum Scale, and Platform Symphony. It documents how to set up and manage a big data cluster on PowerLinux servers, how to customize application and programming solutions, and how to tune applications to exploit IBM hardware architectures. Throughout, it applies the architectural technologies and software solutions available from IBM to help solve challenging technical and business problems.

This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective Linux on IBM Power Systems™ solutions that help uncover insights in clients' data, so that clients can act to optimize business results, product development, and scientific discoveries.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. IBM Redbooks promotions
  4. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  5. Chapter 1. Introduction to the solution
    1. 1.1 InfoSphere BigInsights
      1. 1.1.1 The advantages of InfoSphere BigInsights
      2. 1.1.2 What practical problems can be solved with InfoSphere BigInsights
    2. 1.2 Linux on IBM Power Systems
      1. 1.2.1 IBM Power Systems
      2. 1.2.2 Linux for big data
    3. 1.3 IBM Platform Symphony
    4. 1.4 IBM Spectrum Scale-FPO (formerly GPFS-FPO)
    5. 1.5 IBM Platform Cluster Manager
  6. Chapter 2. Reference architecture
    1. 2.1 The big data environment reference architecture
    2. 2.2 Hardware architecture for InfoSphere BigInsights clusters on POWER
      1. 2.2.1 General architecture
      2. 2.2.2 The POSHv2 architecture
    3. 2.3 Software architecture for InfoSphere BigInsights clusters
      1. 2.3.1 Cluster management software
  7. Chapter 3. Installation
    1. 3.1 Linux on Power Systems
      1. 3.1.1 Operating system installation
      2. 3.1.2 Operating system prerequisites setup for InfoSphere BigInsights
    2. 3.2 InfoSphere BigInsights
      1. 3.2.1 Installation
    3. 3.3 Platform Cluster Manager - Advanced Edition
      1. 3.3.1 Planning a system configuration
      2. 3.3.2 Performing the installation
      3. 3.3.3 Managing and configuring the IBM Power Systems nodes
      4. 3.3.4 Provisioning templates
      5. 3.3.5 Adding nodes to Platform Cluster Manager - Advanced Edition
  8. Chapter 4. Design considerations
    1. 4.1 Important factors for sizing an InfoSphere BigInsights cluster
      1. 4.1.1 Scalability
      2. 4.1.2 Availability
    2. 4.2 Customizing the predefined configurations
    3. 4.3 IBM Spectrum Scale (formerly GPFS) considerations
    4. 4.4 High availability considerations
      1. 4.4.1 Designing for high availability
    5. 4.5 Throughput and bandwidth considerations
      1. 4.5.1 The data network
      2. 4.5.2 Administrative/management network
    6. 4.6 Data volumes considerations
    7. 4.7 Security, user authentication, and edge nodes
      1. 4.7.1 Security preferred practices in non-relational data stores
      2. 4.7.2 Securing IBM Spectrum Scale
      3. 4.7.3 Securing data storage and transaction logs
    8. 4.8 Impact of use cases in design
      1. 4.8.1 Workload characteristics definition
  9. Chapter 5. Solution customization
    1. 5.1 IBM Elastic Storage Server
    2. 5.2 IBM Platform Symphony MapReduce
    3. 5.3 File system: Spectrum Scale / HDFS (architectural changes when not using Spectrum Scale)
    4. 5.4 InfoSphere BigInsights high availability
      1. 5.4.1 IBM Spectrum Scale
      2. 5.4.2 HDFS
    5. 5.5 Security: InfoSphere BigInsights user authentication
      1. 5.5.1 Using flat file security
      2. 5.5.2 Using LDAP security
  10. Chapter 6. Cluster management
    1. 6.1 Managing nodes in a Platform Cluster Manager - Advanced Edition environment
      1. 6.1.1 Adding nodes to a Platform Cluster Manager - Advanced Edition environment
      2. 6.1.2 Removing nodes from a Platform Cluster Manager - Advanced Edition environment (including the monitored node entry)
      3. 6.1.3 Monitoring a Platform Cluster Manager - Advanced Edition managed node
    2. 6.2 Managing InfoSphere BigInsights cluster nodes within Platform Cluster Manager - Advanced Edition
      1. 6.2.1 Creating a cluster template
      2. 6.2.2 Creating a cluster from a cluster template
  11. Chapter 7. Tuning
    1. 7.1 Tuning IBM InfoSphere BigInsights
      1. 7.1.1 Tuning at the operating system level
      2. 7.1.2 Tuning Hadoop
      3. 7.1.3 Tuning BigSQL
    2. 7.2 Tuning IBM Spectrum Scale (formerly GPFS)
      1. 7.2.1 Tuning at the operating system level
      2. 7.2.2 Tuning the Spectrum Scale daemon (formerly the GPFS daemon)
      3. 7.2.3 Configuring the Spectrum Scale file system
    3. 7.3 Tuning the Platform Symphony MapReduce framework
  12. Appendix A. Integration and configuration for IBM Spectrum Scale, Hadoop, and IBM Platform Symphony
    1. Test cluster description
    2. Installation and configuration of IBM Spectrum Scale (formerly GPFS)
    3. Configuration of the IBM Spectrum Scale File Placement Optimizer Hadoop Connector
    4. Installing and configuring IBM Java and Apache Hadoop
    5. Running a Hadoop MapReduce job
    6. Installing and configuring IBM Platform Symphony V7.1
    7. Running Hadoop MapReduce jobs on Platform Symphony
  13. Appendix B. Scripts
    1. Postscripts that are defined in the InfoSphere BigInsights cluster template
    2. Postscripts that are defined in the image profile
  14. Appendix C. BigData Enablement and Administration Toolkit introduction
    1. BigData Enablement and Administration Toolkit overview
    2. Big data solution deployment with BigData Enablement and Administration Toolkit
  15. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  16. Back cover