AI and Big Data on IBM Power Systems Servers

Book Description

Abstract

As big data becomes ubiquitous, businesses are wondering how they can best use it to gain insight into their most important business questions. Using machine learning (ML) and deep learning (DL) in big data environments can identify historical patterns and build artificial intelligence (AI) models that can help businesses to improve customer experience, add services and offerings, identify new revenue streams or lines of business (LOBs), and optimize business or manufacturing operations. The power of AI for predictive analytics is being harnessed across all industries, so it is important that businesses familiarize themselves with the tools and techniques that are available for integration with their data lake environments.

In this IBM® Redbooks® publication, we cover the best practices for deploying and integrating some of the best AI solutions on the market, including:


  • IBM Watson Machine Learning Accelerator (see note for product naming)

  • IBM Watson Studio Local

  • IBM Power Systems™

  • IBM Spectrum™ Scale

  • IBM Data Science Experience (IBM DSX)

  • IBM Elastic Storage™ Server

  • Hortonworks Data Platform (HDP)

  • Hortonworks DataFlow (HDF)

  • H2O Driverless AI


We map out all the integrations that are possible with our different AI solutions and how they can integrate with your existing or new data lake. We also walk you through some of our client use cases and show you how some of the industry leaders are using Hortonworks, IBM PowerAI, and IBM Watson Studio Local to drive decision making. We also advise you on your deployment options, when to use a GPU, and why you should use the IBM Elastic Storage Server (IBM ESS) to improve storage management. Lastly, we describe how to integrate IBM Watson Machine Learning Accelerator and Hortonworks with or without IBM Watson Studio Local, how to access real-time data, and how to handle security.

    Note: IBM Watson Machine Learning Accelerator is the new product name for IBM PowerAI Enterprise.

    Note: Hortonworks merged with Cloudera in January 2019. The new company is called Cloudera. References to Hortonworks as a business entity in this publication now refer to the merged company. Product names beginning with Hortonworks continue to be marketed and sold under their original names.

    Table of Contents

    1. Front cover
    2. Figures
    3. Tables
    4. Examples
    5. Notices
      1. Trademarks
    6. Preface
      1. Authors
      2. Now you can become a published author, too!
      3. Comments welcome
      4. Stay connected to IBM Redbooks
    7. Chapter 1. Solution overview
      1. 1.1 Introduction
        1. 1.1.1 Types of AI
        2. 1.1.2 What is a data lake
      2. 1.2 Artificial intelligence solutions
        1. 1.2.1 IBM PowerAI
        2. 1.2.2 IBM Watson Machine Learning Accelerator
        3. 1.2.3 IBM Watson Studio Local
        4. 1.2.4 H2O Driverless AI
        5. 1.2.5 IBM PowerAI Vision
        6. 1.2.6 Key differences among the popular AI solutions
      3. 1.3 Data platforms
        1. 1.3.1 Apache Hadoop and Hortonworks
        2. 1.3.2 IBM Spectrum Scale
        3. 1.3.3 IBM Elastic Storage Server
        4. 1.3.4 IBM Spectrum Scale HDFS Transparency Connector 
      4. 1.4 When to use a GPU
      5. 1.5 Hortonworks Data Platform GPU support
        1. 1.5.1 Native GPU support
        2. 1.5.2 GPU discovery
        3. 1.5.3 GPU isolation and monitoring
        4. 1.5.4 GPU scheduling
        5. 1.5.5 Hortonworks Data Platform 3 and YARN container with GPU support
      6. 1.6 Linux on Power
      7. 1.7 Client use cases
        1. 1.7.1 Large European investment bank
        2. 1.7.2 Asian job-hunting services company
        3. 1.7.3 Large bank
        4. 1.7.4 South American IT services provider
        5. 1.7.5 Governmental agency
        6. 1.7.6 European IT services company
    8. Chapter 2. Integration overview
      1. 2.1 Architecture overview
        1. 2.1.1 Infrastructure stack
      2. 2.2 System configurations
        1. 2.2.1 IBM Watson Machine Learning Accelerator configuration
        2. 2.2.2 IBM Watson Studio Local configurations
        3. 2.2.3 Configuring an HDP system
        4. 2.2.4 Configuring a proof of concept
        5. 2.2.5 Conclusion
      3. 2.3 Deployment options
        1. 2.3.1 Deploying IBM Watson Studio Local in stand-alone mode or with IBM Watson Machine Learning Accelerator
        2. 2.3.2 Using the Hadoop Integration service versus using an Apache Livy connector
        3. 2.3.3 Deploying H2O Driverless AI in stand-alone mode or within IBM Watson Machine Learning Accelerator
        4. 2.3.4 Running Spark jobs
      4. 2.4 IBM Watson Machine Learning Accelerator and Hortonworks Data Platform
      5. 2.5 IBM Watson Studio Local with Hortonworks Data Platform
      6. 2.6 IBM Watson Studio Local with IBM Watson Machine Learning Accelerator
      7. 2.7 IBM Spectrum Scale and Hadoop Integration
        1. 2.7.1 Information Lifecycle Management
      8. 2.8 Security
        1. 2.8.1 Datalake security
        2. 2.8.2 IBM Watson Machine Learning Accelerator security with Hadoop
        3. 2.8.3 IBM Watson Studio Local security
        4. 2.8.4 IBM Spectrum Scale security
    9. Chapter 3. Integrating new data
      1. 3.1 Data ingestion overview
      2. 3.2 Types of data ingestion
      3. 3.3 Options for data ingestion
      4. 3.4 Using data connectors to work with external data sources
      5. 3.5 How integration improves the artificial intelligence models
    10. Chapter 4. Integration details
      1. 4.1 Integrating IBM Watson Machine Learning Accelerator with Hortonworks
        1. 4.1.1 Running remote Spark jobs with Livy
        2. 4.1.2 Accessing the Hadoop data from IBM Watson Machine Learning Accelerator
      2. 4.2 Integrating IBM Watson Studio Local, IBM Watson Machine Learning Accelerator, and Hadoop clusters
      3. 4.3 Integrating IBM Watson Studio Local with Hortonworks Data Platform
    11. Chapter 5. Accessing real-time data
      1. 5.1 Hortonworks DataFlow
        1. 5.1.1 Planning a Hortonworks DataFlow installation
      2. 5.2 Apache NiFi
        1. 5.2.1 Adding the Apache NiFi service to an HDF cluster
        2. 5.2.2 Working with Apache NiFi
        3. 5.2.3 Integrating Apache NiFi with a data science tool environment
      3. 5.3 Apache Storm
        1. 5.3.1 Working with Apache Storm
        2. 5.3.2 Integrating Apache Storm with a data science tool environment
      4. 5.4 Apache Spark Streaming
        1. 5.4.1 Working with Apache Spark Streaming
      5. 5.5 Apache Kafka Streams
      6. 5.6 Integrating streaming tools with data science tools
        1. 5.6.1 Application overview
        2. 5.6.2 Configuring a Kafka topic
        3. 5.6.3 Starting a streamer by using the nmap-ncat utility
        4. 5.6.4 Running the Spark Streaming engine
        5. 5.6.5 Merging data by using NiFi and saving it on HDFS
        6. 5.6.6 Conclusion of the data stream integration proof of concept
    12. Appendix A. Additional information
      1. System topology
      2. Software levels
    13. Appendix B. Installing an IBM Watson Machine Learning Accelerator notebook
      1. Customizing a notebook package
      2. Adding a notebook package
      3. Creating a Spark Instance Group with a notebook
      4. Creating notebooks for users
      5. Testing notebooks
      6. Conclusion and additional information
    14. Related publications
      1. IBM Redbooks
      2. Online resources
      3. Help from IBM
    15. Back cover

    Product Information

    • Title: AI and Big Data on IBM Power Systems Servers
    • Author(s): Rafael Freitas de Lima, Ivaylo B. Bozhinov, Scott Vetter, Anto A John, Ahmed Mashhour, James Van Oosten, Fernando Vermelho, Allison White
    • Release date: March 2019
    • Publisher(s): IBM Redbooks
    • ISBN: 9780738457512