CHAPTER 5
Data Lifecycle and Lineage

INTRODUCTION

Data in business has a definite life span and follows a defined lifecycle. The data lifecycle (DLC) is the entire period of time that data exists in the organization, and it encompasses every stage that the data goes through. Managing the DLC means managing business data effectively and efficiently from end to end, with appropriate policies and procedures at each stage. A thorough understanding and analysis of the DLC is needed for the following reasons:

  1. Data is a reflection of reality in business. Just as a business changes as it evolves, its data also changes, and the status of the data changes at different stages of the DLC. A good understanding of the data is essential, and the DLC provides that understanding.
  2. It identifies the key stakeholders and the stages of the data lifecycle at which they add value to the data.
  3. It exposes the risks involved in data management, from origination to archival and purging.

In this regard, the data lifecycle can be viewed from two main perspectives: the business-enabled DLC and the IT-enabled DLC.

BUSINESS-ENABLED DLC STAGES

The DLC spans the entire period of time that data exists in the IT landscape of the organization. From a business perspective, most data goes through eight key stages: origination, capture, validation, processing, distribution, aggregation, interpretation, and consumption. The following sections discuss each of these eight business-enabled DLC stages.
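To make the ordering of the eight stages concrete, the following minimal Python sketch models them as an ordered enumeration. The names BusinessDLCStage and next_stage are hypothetical and introduced only for illustration, and the strictly linear progression from one stage to the next is a simplifying assumption. In practice, data may revisit or skip stages.

```python
from enum import IntEnum
from typing import Optional


class BusinessDLCStage(IntEnum):
    """The eight business-enabled DLC stages, in the order listed in the text."""
    ORIGINATION = 1
    CAPTURE = 2
    VALIDATION = 3
    PROCESSING = 4
    DISTRIBUTION = 5
    AGGREGATION = 6
    INTERPRETATION = 7
    CONSUMPTION = 8


def next_stage(stage: BusinessDLCStage) -> Optional[BusinessDLCStage]:
    """Return the stage that follows `stage`, or None after the last stage.

    Assumption (not from the text): stages are traversed strictly in order.
    """
    if stage is BusinessDLCStage.CONSUMPTION:
        return None
    return BusinessDLCStage(stage + 1)


if __name__ == "__main__":
    # Walk the lifecycle from the first stage to the last and print each name.
    current: Optional[BusinessDLCStage] = BusinessDLCStage.ORIGINATION
    while current is not None:
        print(current.name.lower())
        current = next_stage(current)
```

An ordered enumeration is used here so that stages can be compared directly (for example, to check whether a record has already passed validation), which is one simple way to track data status across the DLC.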

Origination

In most cases, business data originates in an ...
