Cataloging Unstructured Data in IBM Watson Knowledge Catalog with IBM Spectrum Discover

by Joseph Dain, Abeer Selim, Anil Patil, Christopher Vollmar, Flavio de Rezende, Frank Greco, Frank N. Lee, Isom Crawford Jr., Ivaylo B. Bozhinov, Joanna Wong, Joshua Blumert, Larry Coyne
August 2020
Intermediate to advanced
108 pages
2h 21m
English
IBM Redbooks

Overview

This IBM® Redpaper publication explains how IBM Spectrum® Discover integrates with the IBM Watson® Knowledge Catalog (WKC) component of IBM Cloud® Pak for Data (IBM CP4D), making the enriched catalog content in IBM Spectrum Discover, along with the associated data, available in WKC and IBM CP4D. From an end-to-end IBM solution point of view, IBM CP4D and WKC provide data governance, collaboration, artificial intelligence (AI), and analytics tools. IBM Spectrum Discover complements these features by adding support for unstructured data on large-scale file and object storage systems, both on premises and in the cloud.

Many organizations struggle to manage unstructured data. Common challenges include:

  • Pinpointing and activating relevant data for large-scale analytics, machine learning (ML), and deep learning (DL) workloads.
  • Lacking the fine-grained visibility that is needed to map data to business priorities.
  • Removing redundant, obsolete, and trivial (ROT) data and identifying data that can be moved to a lower-cost storage tier.
  • Identifying and classifying sensitive data as it relates to various compliance mandates, such as the General Data Protection Regulation (GDPR), the Payment Card Industry Data Security Standard (PCI-DSS), and the Health Insurance Portability and Accountability Act (HIPAA).

This paper describes how IBM Spectrum Discover provides seamless integration of data in IBM Storage with IBM Watson Knowledge Catalog (WKC). Features include:

  • Event-based cataloging and tagging of unstructured data across the enterprise.
  • Automatically inspecting and classifying over 1000 unstructured data types, including genomics-specific and imaging-specific file formats.
  • Automatically registering assets with WKC based on IBM Spectrum Discover search and filter criteria, and using those assets in IBM CP4D.
  • Enforcing data governance policies in WKC in IBM CP4D based on insights from IBM Spectrum Discover, and using assets in IBM CP4D.

Several in-depth use cases illustrate the integration with examples from healthcare, life sciences, and financial services.

IBM Spectrum Discover integration with WKC enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of data. The integration improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.

Publisher Resources

ISBN: 9780738459028