IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage

Book description

This IBM® Redpaper publication provides a comprehensive overview of the IBM Spectrum® Discover metadata management software platform. We give a detailed explanation of how the product creates, collects, and analyzes metadata.

Several in-depth use cases show examples of analytics, governance, and optimization. We also provide step-by-step information to install and set up the IBM Spectrum Discover trial environment.

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges in managing this deluge of unstructured data, such as:


  • Pinpointing and activating relevant data for large-scale analytics
  • Lacking the fine-grained visibility that is needed to map data to business priorities
  • Removing redundant, obsolete, and trivial (ROT) data
  • Identifying and classifying sensitive data

IBM Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, both on premises and in the cloud. This software enables organizations to make better business decisions and to gain and maintain a competitive advantage.

IBM Spectrum Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.

Table of contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. IBM Spectrum Discover overview
    1. 1.1 Introduction
    2. 1.2 At a high level
    3. 1.3 Architecture
      1. 1.3.1 Role-based access control
      2. 1.3.2 Data source connections
      3. 1.3.3 Cataloging metadata
      4. 1.3.4 Enriching metadata
      5. 1.3.5 Graphical user interface
      6. 1.3.6 Reports
    4. 1.4 Environment requirements
      1. 1.4.1 The foundation
      2. 1.4.2 Deployment models
      3. 1.4.3 Recommended single node trial requirements
  5. Chapter 2. Metadata essentials
    1. 2.1 Metadata collection
      1. 2.1.1 System metadata and scans
      2. 2.1.2 User-defined metadata
    2. 2.2 Metadata management and exploration
      1. 2.2.1 Searching and reporting
      2. 2.2.2 Temperature tag
      3. 2.2.3 SizeRange and TimeSinceAccess tags
  6. Chapter 3. Sample use cases
    1. 3.1 Storage optimization
      1. 3.1.1 Gaining insight into unstructured data
      2. 3.1.2 Mapping data to business priorities
      3. 3.1.3 Reducing storage operation expenditures
    2. 3.2 Data governance
      1. 3.2.1 Use case scenario
      2. 3.2.2 Data stewardship with IBM Spectrum Discover
      3. 3.2.3 Documenting the various PII components
      4. 3.2.4 Identifying regular expressions for the PII components
      5. 3.2.5 Creating tags to identify files or objects that include PII
      6. 3.2.6 Creating policies to identify files or objects that include PII
      7. 3.2.7 Defining and scheduling regular reports for governance
      8. 3.2.8 Summary
    3. 3.3 Healthcare and life sciences use cases
      1. 3.3.1 Variant Call Format use case
      2. 3.3.2 Digital Imaging and Communications in Medicine use case
    4. 3.4 Summary
  7. Chapter 4. Deep inspection and the AI pipeline
    1. 4.1 Overview
    2. 4.2 Collecting metadata by using deep inspection
      1. 4.2.1 Defining the tag
      2. 4.2.2 Implementing and starting the deep inspection agent
      3. 4.2.3 Defining and running the deep inspection policy
    3. 4.3 Data wrangling with IBM Spectrum Discover
  8. Appendix A. Installing and setting up IBM Spectrum Discover
    1. A.1 Free 90-day trial download
    2. A.2 Creating Data Source Connections
    3. A.3 LDAP/Active directory
    4. A.4 Backing up IBM Spectrum Discover
  9. Related publications
    1. Online resources
    2. Help from IBM
  10. Back cover

Product information

  • Title: IBM Spectrum Discover: Metadata Management for Deep Insight of Unstructured Storage
  • Author(s): Joseph Dain, Norman Bogard, Isom Crawford Jr., Mathias Defiebre, Larry Coyne
  • Release date: October 2019
  • Publisher(s): IBM Redbooks
  • ISBN: 9780738457864