Data Accelerator for AI and Analytics

Book description

This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It provides details about data orchestration and how to address typical challenges that customers face when dealing with large and ever-growing amounts of data for data analytics. While the amount of data increases steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner.

This paper provides a solution that addresses these needs: Data Accelerator for AI and Analytics (DAAA). A proof of concept (PoC) is described in detail.

This paper focuses on the functions that are provided by the Data Accelerator for AI and Analytics solution, which simplifies the daily work of data scientists and system administrators. This solution helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and associated data management.

Table of contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. Data orchestration in enterprise data pipelines
    1. 1.1 Introduction
    2. 1.2 Overview
    3. 1.3 Sample use case: Building the correct training and validation data set
  5. Chapter 2. Data Accelerator for AI and Analytics supporting data orchestration
    1. 2.1 Generic components
      1. 2.1.1 Data layer
      2. 2.1.2 High-performance storage with a smart data cache layer
      3. 2.1.3 Compute cluster layer
      4. 2.1.4 Data catalog layer
      5. 2.1.5 Interfaces between the layers
    2. 2.2 Proof of concept environment
      1. 2.2.1 Red Hat OpenShift V4.5.9 cluster
      2. 2.2.2 IBM Spectrum Scale V5.1.0 storage cluster
      3. 2.2.3 IBM ESS storage cluster
      4. 2.2.4 Capacity tier storage
      5. 2.2.5 IBM Spectrum Discover V2.0.2+ metadata catalog
      6. 2.2.6 IBM Spectrum LSF Workload Manager
      7. 2.2.7 Description of the Audi Autonomous Driving Dataset
  6. Chapter 3. Data Accelerator for AI and Analytics use cases
    1. 3.1 Generic workflow
      1. 3.1.1 Provisioning phase
      2. 3.1.2 Analytic usage phase
    2. 3.2 Triggering an analytic job by using an integrated development environment
    3. 3.3 Workload manager starts an analytics job
    4. 3.4 New data ingest triggers an analytics job
    5. 3.5 The layer on top of workload triggers
  7. Chapter 4. Planning for Data Accelerator for AI and Analytics
    1. 4.1 Security and data access rights considerations
    2. 4.2 Data layer
      1. 4.2.1 Network-attached storage (NAS) Filer
      2. 4.2.2 Cloud object storage
      3. 4.2.3 IBM Spectrum Archive Enterprise Edition Tape
    3. 4.3 High-performance storage with smart data cache layer
      1. 4.3.1 IBM ESS 3000 and IBM Spectrum Scale
    4. 4.4 Compute cluster layer
      1. 4.4.1 IBM Spectrum LSF
      2. 4.4.2 Compute cluster
    5. 4.5 Data catalog layer
      1. 4.5.1 IBM Spectrum Discover
  8. Chapter 5. Deployment considerations for Data Accelerator for AI and Analytics
    1. 5.1 Data layer
      1. 5.1.1 Network-attached storage (NAS) Filer
      2. 5.1.2 IBM Cloud Object Storage
      3. 5.1.3 IBM Spectrum Archive Enterprise Edition Tape
    2. 5.2 High-performance storage with smart data cache layer
      1. 5.2.1 IBM ESS 3000 and IBM Spectrum Scale
    3. 5.3 Compute cluster layer
      1. 5.3.1 IBM Spectrum LSF
      2. 5.3.2 IBM Spectrum Scale storage cluster
      3. 5.3.3 Compute cluster
    4. 5.4 Data catalog layer
      1. 5.4.1 IBM Spectrum Discover
    5. 5.5 The Data Accelerator for AI and Analytics interface glue code
  9. Appendix A. Code samples
  10. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  11. Back cover

Product information

  • Title: Data Accelerator for AI and Analytics
  • Author(s): Simon Lorenz, Gero Schmidt, TJ Harris, Mike Knieriemen, Nils Haustein, Abhishek Dave, Venkateswara Puvvada, Christof Westhues
  • Release date: January 2021
  • Publisher(s): IBM Redbooks
  • ISBN: 9780738459325