Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions

Book description

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges in managing this deluge of unstructured data, such as the following:

  • Pinpointing and activating relevant data for large-scale analytics
  • Lacking the fine-grained visibility that is needed to map data to business priorities
  • Removing redundant, obsolete, and trivial (ROT) data
  • Identifying and classifying sensitive data

IBM® Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, both on premises and in the cloud. This software enables organizations to make better business decisions and to gain and maintain a competitive advantage.

IBM Spectrum® Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create a competitive advantage and speed critical research.

This IBM Redbooks® publication presents several use cases that are focused on artificial intelligence (AI) solutions with IBM Spectrum Discover. This book helps storage administrators and technical specialists plan and implement AI solutions by using IBM Spectrum Discover and several other IBM Storage products.

Table of contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. IBM Spectrum Discover overview
    1. 1.1 Introduction
    2. 1.2 Extensible platform for data oversight
    3. 1.3 Benefits
    4. 1.4 IBM Spectrum Discover use cases
      1. 1.4.1 Large-scale analytics/artificial intelligence/machine learning
      2. 1.4.2 Data and storage optimization use case
      3. 1.4.3 Data governance
      4. 1.4.4 Data management
    5. 1.5 Architecture
      1. 1.5.1 Role-based access control
      2. 1.5.2 Data source connections
      3. 1.5.3 GUI
      4. 1.5.4 Reports
    6. 1.6 A deeper look at metadata
      1. 1.6.1 Cataloging metadata
      2. 1.6.2 Enriching metadata
      3. 1.6.3 Policies and user-defined metadata
      4. 1.6.4 IBM Spectrum Discover Application Catalog and Software Development Kit
      5. 1.6.5 Data movement with IBM Spectrum Discover
    7. 1.7 Deployment patterns
    8. 1.8 Overview of the use cases in the book
  5. Chapter 2. Generic imagery use cases
    1. 2.1 Categorizing medical imaging data with content-search capability
      1. 2.1.1 Metadata provided in DICOM Files
      2. 2.1.2 Using CONTENTSEARCH policy to extract DICOM metadata
      3. 2.1.3 Exploring DICOM files by using IBM Spectrum Discover
    2. 2.2 Extracting metadata from LIDAR imagery by using custom applications
      1. 2.2.1 Using the Point Data Abstraction Library with LIDAR imagery
      2. 2.2.2 User-defined tagging for the metadata from PDAL
      3. 2.2.3 Creating a policy to collect tag values from PDAL
      4. 2.2.4 Using the metadata to identify locations of interest
    3. 2.3 Organizing training data sets for artificial intelligence
      1. 2.3.1 Extracting training set metadata
      2. 2.3.2 Visual exploration of data in the transmission_inspect project
    4. 2.4 Summary
  6. Chapter 3. AI pipeline that uses IBM Spectrum Discover
    1. 3.1 Introduction to AI pipeline
      1. 3.1.1 Ingest
      2. 3.1.2 Curate
      3. 3.1.3 Inspect
      4. 3.1.4 Generate
    2. 3.2 AI pipeline by using IBM Spectrum Discover
      1. 3.2.1 Ingest
      2. 3.2.2 Curation
      3. 3.2.3 Analysis
      4. 3.2.4 Helper application
      5. 3.2.5 Query and inference
      6. 3.2.6 Generate
      7. 3.2.7 Reports
      8. 3.2.8 Pipeline orchestration
    3. 3.3 Summary and value proposition
  7. Chapter 4. Using artificial intelligence in medical imaging: JFR Challenge
    1. 4.1 Introduction and overview
      1. 4.1.1 Context of AI in medical imaging
      2. 4.1.2 The JFR Challenge
      3. 4.1.3 Managing complex medical data
      4. 4.1.4 Use case description
    2. 4.2 Use case products
    3. 4.3 Benefits
      1. 4.3.1 Unified data sources
      2. 4.3.2 CONTENTSEARCH policies
      3. 4.3.3 Capabilities extension
      4. 4.3.4 Data copy
      5. 4.3.5 API interfacing
    4. 4.4 Use case architecture
    5. 4.5 Implementation steps
      1. 4.5.1 AI inference service
      2. 4.5.2 Unified data sources
      3. 4.5.3 Training
      4. 4.5.4 Inference
      5. 4.5.5 New model release
    6. 4.6 Online resources
    7. 4.7 Summary
  8. Chapter 5. IBM Spectrum Discover integration with IBM Spectrum Archive Enterprise Edition
    1. 5.1 Use cases introduction and overview
    2. 5.2 Benefits
    3. 5.3 Products involved
    4. 5.4 IBM Spectrum Discover integration with IBM Spectrum Scale and IBM Spectrum Archive EE architecture
      1. 5.4.1 Data view of migration status with IBM Spectrum Discover
      2. 5.4.2 Data movement with IBM Spectrum Discover
    5. 5.5 Implementation key points
    6. 5.6 Sample use cases
      1. 5.6.1 Data Governance use case: Data staging for high-performance processing
      2. 5.6.2 Data Optimization use case: Data migration to tape for cost-efficient archiving
    7. 5.7 Online resources
    8. 5.8 Summary
  9. Appendix A. IBM Spectrum Scale, IBM Spectrum Archive, and IBM Tape libraries product details
    1. IBM Spectrum Scale overview
    2. IBM Spectrum Archive overview
    3. IBM tape technologies overview
  10. Appendix B. Additional material
    1. Locating the GitHub material
    2. Cloning the GitHub material
  11. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  12. Back cover

Product information

  • Title: Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions
  • Author(s): Ivaylo B. Bozhinov, Isom Crawford Jr., Joseph Dain, Mathias Defiebre, Maxime Deloche, Kiran Ghag, Vasfi Gucer, Xin Liu, Abeer Selim, Gauthier Siri, Christopher Vollmar
  • Release date: October 2020
  • Publisher(s): IBM Redbooks
  • ISBN: 9780738459134