Chapter 4. Information
We’ve discussed the metadata repositories for data. In this chapter, we’ll dive into the metadata repositories for information. The difference between the two is that metadata repositories for information are richer in human interpretation, making them serve purposes that demand a higher level of intellectual—not technical—abstraction.
We’ll mainly cover metadata repositories for regulatory purposes, for retention of records and information throughout their lifecycles, and for information security and data protection. Lastly, we’ll examine metadata repositories for business processes.
But first, let’s look into what metadata for information is all about.
Why metadata repositories for information?
There are two reasons why metadata repositories for information exist, namely data and knowledge:
-
Data interpreted by humans becomes information
-
Information also exceeds data and contains condensed knowledge
Consequently, this information is managed through metadata repositories designed for information.
Data, when interpreted by humans, transforms into information. The primary reason metadata repositories for information exist is that data sometimes needs to be examined within a broader context than metadata repositories for data alone can provide. This broader context includes, for example, information security, data protection, and industry-specific regulations.
Information is also condensed knowledge. The second reason metadata repositories for information exist is to manage knowledge. Knowledge is ultimately stored in Records and Information Management Systems, where it is archived for regulatory purposes. Additionally, the elements within the repositories that handle information security and data protection represent knowledge independent of any direct relationship with data in the IT landscape, as you will see.
Records and Information Management Systems
With Records and Information Management Systems (RIMS), your company obtains the capability to manage the lifecycle of the records and information it produces. A record is a document or set of data that serves as proof of something. This means that not every document or piece of data produced is considered a record; draft documents or the temperature logs of meeting rooms, for example, are typically not considered records. Study reports from an R&D department, contracts with a supplier, and logs that control factors such as temperature in production facilities all constitute records. Records management is the assurance that these records are properly handled throughout their lifecycle, meaning the period during which they must be stored within the organization (Figure 4-1).
Imagine you’re an employee in a pharmaceutical company. Your company must, for instance, be able to defend in a court of law that its medicine was not the cause of a patient’s death, even though the patient had used the medicine your company produced. In order for your company’s lawyers to defend the case properly, they need evidence. In this case, the proof is the clinical studies showing no reported side effects similar to the one that caused the patient’s death, which led the patient’s family to sue your company.
The FDA mandates that the pharmaceutical sector retain specific types of data for the life of a product plus 35 years, aligning with the typical lifespan of a patient even after the product is no longer on the market. Consider the length of time involved: you have to store records and information for that long! That is what a RIMS does, and it requires thorough processes to uphold.
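To make that arithmetic concrete, here is a minimal Python sketch that computes when such a retention period ends. It assumes, for illustration only, that the retention clock starts on the date the product is withdrawn from the market. The names `add_years` and `retention_end` are hypothetical. In practice, retention schedules are defined in your RIMS policy, not in code like this.

```python
from datetime import date

# Assumption for this sketch: records are kept for 35 years after the product
# leaves the market, as described in the text above.
RETENTION_YEARS_AFTER_PRODUCT_LIFE = 35


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 February becomes 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only 29 February can fail here, when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def retention_end(product_withdrawn: date) -> date:
    """Earliest date on which records tied to the product may be disposed of."""
    return add_years(product_withdrawn, RETENTION_YEARS_AFTER_PRODUCT_LIFE)


print(retention_end(date(2030, 6, 30)))  # 2065-06-30
```

Handling the 29 February edge case explicitly matters when periods span decades. Over 35 years, a record created on a leap day is almost certain to hit a non-leap target year.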
Note
In general, heavily regulated industries like pharma, petrochemicals, and finance tend to have more finely structured and maintained RIMSs than more loosely regulated businesses such as tourism and hospitality. This distinction arises from the stringent regulations imposed on these sectors, where authorities require strict compliance regarding what must be provable at a later point in time.
Records and Information Management is a field structured by international standards. Unlike the management of many other metadata repositories, there is clear guidance on how to manage a RIMS. Within ISO, Technical Committee 46, Subcommittee 11 (ISO/TC 46/SC 11) defines the standards for records and information management, with ISO 15489 as a central standard. Other relevant standards include ISO 9001 and the FDA’s 21 CFR Part 11.¹
The Organizational Aspect of RIMS
Records and information management departments manage records in various ways, depending on where the records are in their lifecycle. As depicted in Figure 4-2, the Records and Information Management department must map all records throughout the company. These records must be under control, meaning they should be identified, assigned ownership, and subjected to proper retention procedures. Furthermore, records may be managed directly by the Records and Information Management department at the end of the lifecycle (Figure 4-2).
You should remember that a RIMS does not necessarily reflect the IT landscape directly at the metadata level. This is because a RIMS manages data across its entire lifecycle, which usually outlasts the applications that hold the data. Consider this: would you keep an application running for the entire lifecycle of the data it contains, spanning, for instance, 65 years? In many cases, that period corresponds to the product’s lifespan plus an additional 35 years. Of course, that will not happen. At a certain point, the data will be transferred from the application where it resides to a storage solution. Figure 4-2 depicts the mapping of records throughout the entire lifecycle, ultimately managed by the Records and Information Management department.
If your company faces a lawsuit, it’s the Records and Information Management department that issues a legal hold. A legal hold means that you identify all records relevant to the lawsuit and “freeze” them. Normal actions in the course of the information lifecycle, such as sharing records between departments or deleting them when their retention period expires, are put on hold. The records are proof of actions, and when needed, they must be kept safe.
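The following minimal Python sketch shows how a legal hold might interact with retention, assuming a simplified record model. A hold freezes lifecycle actions such as sharing or deletion, even when the retention period has already expired. The `Record` class, the function names, and the record IDs are hypothetical and are not taken from any RIMS product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    record_id: str
    retention_end: date       # date after which the record may normally be deleted
    legal_hold: bool = False  # set when the record is frozen for a lawsuit


class LegalHoldError(Exception):
    """Raised when a lifecycle action is attempted on a frozen record."""


def place_legal_hold(records: list, relevant_ids: set) -> int:
    """Freeze every record relevant to the lawsuit; return how many were newly frozen."""
    frozen = 0
    for record in records:
        if record.record_id in relevant_ids and not record.legal_hold:
            record.legal_hold = True
            frozen += 1
    return frozen


def check_action(record: Record, action: str) -> None:
    """Block normal lifecycle actions (e.g. 'share', 'delete') while a hold is active."""
    if record.legal_hold:
        raise LegalHoldError(f"cannot {action} {record.record_id}: record is under legal hold")


def may_dispose(record: Record, today: date) -> bool:
    """A record may be deleted only if its retention has ended and it is not on hold."""
    return today >= record.retention_end and not record.legal_hold


records = [Record("STUDY-001", date(2020, 1, 1)), Record("MEMO-042", date(2020, 1, 1))]
place_legal_hold(records, {"STUDY-001"})
print(may_dispose(records[0], date(2024, 1, 1)))  # False: frozen despite expired retention
print(may_dispose(records[1], date(2024, 1, 1)))  # True
```

The hold is modeled as a flag that overrides the retention check rather than as a change to the retention date. Lifting the hold therefore restores the original schedule without any recalculation.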
A RIMS is likely to group records in terms of their confidentiality. Confidentiality denotes the degree of secrecy associated with a particular record, determining the number of employees authorized to access it.
To provide a more in-depth understanding of the content within the RIMS, let’s continue with the pharma example, illustrating the types of records it can contain:
Physical or digital documents:
-
Employee Contracts
-
Legal Agreements
-
Research Studies
-
Strategy documentation
-
Memos
-
And more
Categorized data, grouped in bigger chunks:
-
Clinical data
-
Financial data
-
Lab results
-
Building monitoring
-
And more
RIMS as a Metadata Repository
A RIMS represents records: physical and digital documents, as well as data grouped in larger chunks (Figure 4-3).
Keep in mind that a RIMS depicts not only the company’s IT landscape but also an analog reality of records stored in archives. It also encompasses dedicated long-term data storage solutions designed specifically for records and information management. This is because, in the course of the information lifecycle, data grouped as records may be moved from the running IT landscape into long-term storage. There, access is expensive but storage is cheap. Since records are kept for regulatory compliance, they are used infrequently: records are only consulted in relation to lawsuits or during inspections. You may also encounter scenarios where data is printed into physical documents because physical form is the cheapest option for long-term storage (Figure 4-4).
Tip
Data can be categorized as ‘hot’ or ‘cold’ based on its frequency of use. Think of Records and Information Management as the overseeing process until the data reaches the point of becoming ‘ice cold’. To find out more about this topic, check out Figure 6-9 in Fundamentals of Data Engineering by Joe Reis and Matt Housley.
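As a rough illustration of the hot/cold distinction, the Python sketch below classifies data by the time since it was last accessed. It uses recency as a simple stand-in for frequency of use. The thresholds are invented for this example; real tiering policies vary by organization and storage platform.

```python
from datetime import date

# Hypothetical thresholds, in days since the data was last accessed.
HOT_MAX_IDLE_DAYS = 30
COLD_MAX_IDLE_DAYS = 730


def temperature(last_accessed: date, today: date) -> str:
    """Return 'hot', 'cold' or 'ice cold' based on how long the data has been idle."""
    idle_days = (today - last_accessed).days
    if idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if idle_days <= COLD_MAX_IDLE_DAYS:
        return "cold"
    return "ice cold"


print(temperature(date(2024, 1, 1), date(2024, 1, 15)))  # hot
print(temperature(date(2020, 1, 1), date(2024, 1, 1)))   # ice cold
```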
Main driver for RIMS and vendors
RIMS is implemented for regulatory purposes that are both industry generic and specific. The overarching purpose is retention and legal hold, with industry-specific regulations determining the duration for which data must be retained (Figure 4-5).
Some of the vendors in this category are Castlepoint Systems, Granicus, and OpenKM.
Information Security Management System
Today, the discipline of Information Security Management System (ISMS) is closely associated with the ISO/IEC 27001:2022 standard, which pertains to information security, cybersecurity, and privacy protection.
The ISO 27001 standard, together with its related standards (collectively known as the 27000 series), delivers a framework for information security. It’s essential to recognize that information security extends beyond the confines of the IT landscape; it encompasses more than just cybersecurity. Information security focuses on three types of assets, an asset being anything of value to the company. These are:
- Intangible assets
-
These assets encompass intellectual property, insider knowledge, and even rumors.
- IT Landscape assets
-
Assets in the IT landscape encompass everything from hardware to software, including on-premise server rooms, cables, laptops, applications, and more.
- Tangible assets
-
Tangible assets include physical objects and buildings, as well as individuals who possess extraordinary expert knowledge or hold a special status within the company, either as high-ranking employees or public figures.
At the core of the ISO 27000-series is the creation of an ISMS, or Information Security Management System. This system conducts ongoing risk assessments of identified assets based on the following criteria:
- Threats
-
The potential threats that the company’s assets face.
- Vulnerabilities
-
The weaknesses that determine how likely the assets are to be hit by the threats.
- Impact
-
The order of magnitude of the potential damage if a threat were to materialize against an asset.
- Mitigation
-
The measures that are put in place to prevent the threat.
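ISO 27001 leaves the choice of risk assessment method to the organization. A common convention, assumed here rather than prescribed by the standard, scores likelihood and impact on 1-to-5 scales and multiplies them. The Python sketch below applies that convention. It treats vulnerability as what drives likelihood and records a mitigated likelihood to compute residual risk. The class, fields, and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskAssessment:
    asset: str
    threat: str
    likelihood: int  # 1 (rare) to 5 (almost certain); reflects the asset's vulnerabilities
    impact: int      # 1 (negligible) to 5 (severe); size of the damage if the threat materializes
    mitigation: str = ""
    mitigated_likelihood: Optional[int] = None  # likelihood once the mitigation is in place

    def __post_init__(self) -> None:
        for score in (self.likelihood, self.impact, self.mitigated_likelihood):
            if score is not None and not 1 <= score <= 5:
                raise ValueError("scores must be between 1 and 5")

    def inherent_risk(self) -> int:
        """Risk before mitigation."""
        return self.likelihood * self.impact

    def residual_risk(self) -> int:
        """Risk after mitigation; equals the inherent risk if nothing is mitigated."""
        if self.mitigated_likelihood is None:
            return self.inherent_risk()
        return self.mitigated_likelihood * self.impact


phishing = RiskAssessment(
    asset="Email accounts", threat="Credential phishing",
    likelihood=4, impact=4,
    mitigation="Multi-factor authentication", mitigated_likelihood=2,
)
print(phishing.inherent_risk(), phishing.residual_risk())  # 16 8
```

Keeping inherent and residual risk side by side shows how much a mitigation actually buys. This is the comparison the CISO revisits in each Plan-Do-Check-Act cycle.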
The organizational aspect of ISMS
ISMS is overseen by a Chief Information Security Officer (CISO), who may have a team performing daily operations. Managing the Information Security department is an ongoing process, defined in the ISO 27000-series as:
-
Plan
-
Do
-
Check
-
Act
These steps initiate the ISMS by establishing it (Plan), implementing it (Do), monitoring it (Check), and maintaining it (Act). This cycle is then repeated annually to progressively reduce risk throughout the organization.
ISMS as a metadata repository
The ISMS is a repository that lists assets, the risks associated with them, and the mitigating actions to avoid those risks. As such, the ISMS works on the basis of classifying data in terms of its confidentiality. Essentially, the ISMS is a tool that helps companies evaluate how serious it would be to lose control of their various pieces of information.
In the context of this book, the focal point of the ISMS isn’t solely the mitigation efforts led by the CISO. While mitigation is the end result, the key lies in how the CISO acquires the ability to mitigate risks in the first place. This critical aspect of the ISMS is known as the asset inventory, and that is what I’d like you to focus on. The asset inventory is simply the list of assets, collected by the CISO and their team.
The asset inventory lists IT assets in the IT landscape as well as tangible and intangible assets. All risk assessments and mitigations performed by the CISO are carried out against the asset inventory. Figure 4-6 illustrates a high-level diagram of an ISMS.
One specific element inherent to the asset inventory is the concept of a risk owner. The risk owner is a person in the organization who has been given ownership of a certain risk. The concept of a risk owner is often challenging to manage and is therefore translated into a more tangible form through the identification of specific assets. Accordingly, risk owners become asset owners.
At this point, let’s delve into the contents of the ISMS asset inventory, illustrated in Figure 4-6. Intangible assets include intellectual assets, rumors, and more. IT landscape assets comprise infrastructure, applications, machinery, laptops, and phones. Tangible assets encompass persons, objects, and buildings.
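One way to picture the asset inventory is as a typed list of assets, each with an owner, since risk owners become asset owners. The Python sketch below is a hypothetical data model, not the schema of any ISMS product. It groups assets by the three categories described above and flags assets that nobody owns yet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AssetType(Enum):
    INTANGIBLE = "intangible"      # e.g. intellectual assets, rumors
    IT_LANDSCAPE = "it_landscape"  # e.g. infrastructure, applications, laptops, phones
    TANGIBLE = "tangible"          # e.g. persons, objects, buildings


@dataclass
class Asset:
    name: str
    asset_type: AssetType
    owner: Optional[str] = None  # the risk owner, expressed as the owner of this asset


def unowned_assets(inventory: list) -> list:
    """Names of assets with no owner, and therefore no one accountable for their risks."""
    return [asset.name for asset in inventory if asset.owner is None]


def assets_by_type(inventory: list) -> dict:
    """Group asset names by category."""
    grouped = {asset_type: [] for asset_type in AssetType}
    for asset in inventory:
        grouped[asset.asset_type].append(asset.name)
    return grouped


inventory = [
    Asset("Patent portfolio", AssetType.INTANGIBLE, owner="Head of Legal"),
    Asset("ERP system", AssetType.IT_LANDSCAPE, owner="CFO"),
    Asset("Headquarters building", AssetType.TANGIBLE),
]
print(unowned_assets(inventory))  # ['Headquarters building']
```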
What happens if the CISO creates the asset inventory, without taking into account other metadata repositories for the IT landscape? At the very best, a lot of time is wasted reproducing what should already exist. At its worst—and this is the most likely—an alternative depiction of the IT landscape has been created, resulting in uncertainty of what metadata repositories are correct.
Tip
How to align metadata repositories is discussed in Chapter 11.
You should populate the asset inventory in the ISMS, especially when dealing with assets related to the IT landscape, by referencing the metadata repositories for data.
You should reflect on the entries in those repositories and interpret the data in the context of information security. That gives you the information that should be grouped as assets in the asset inventory, at a slightly higher level of abstraction than the assets for data. This is an empirical approach to building your ISMS, and we will discuss it in more depth in Chapter 8.
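The empirical approach can be sketched in Python as a grouping step. Dataset-level entries exported from a metadata repository for data are rolled up into application-level IT assets for the asset inventory. Several things are assumptions made for this example: the export format, the confidentiality scale, and the rule that an asset inherits the highest confidentiality of its datasets (a high-water-mark rule).

```python
from collections import defaultdict

# Hypothetical confidentiality scale, ordered from least to most confidential.
LEVELS = ["public", "internal", "confidential", "strictly confidential"]

# Hypothetical export from a metadata repository for data: one entry per dataset.
catalog = [
    {"dataset": "customers", "application": "CRM", "confidentiality": "confidential"},
    {"dataset": "campaigns", "application": "CRM", "confidentiality": "internal"},
    {"dataset": "payroll", "application": "HR system", "confidentiality": "strictly confidential"},
]


def derive_it_assets(entries: list) -> dict:
    """Roll datasets up into application-level assets with the highest confidentiality found."""
    assets = defaultdict(lambda: {"datasets": [], "confidentiality": LEVELS[0]})
    for entry in entries:
        asset = assets[entry["application"]]
        asset["datasets"].append(entry["dataset"])
        if LEVELS.index(entry["confidentiality"]) > LEVELS.index(asset["confidentiality"]):
            asset["confidentiality"] = entry["confidentiality"]
    return dict(assets)


for name, asset in derive_it_assets(catalog).items():
    print(name, asset["confidentiality"], asset["datasets"])
# CRM confidential ['customers', 'campaigns']
# HR system strictly confidential ['payroll']
```

Because the assets are derived from the data repository rather than typed in by hand, rerunning the grouping after the landscape changes keeps the asset inventory aligned with it.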
Warning
In most companies, the asset inventory is an information security threat in itself, because it is imprecise, non-empirical, and does not reflect reality. This results from the business’s unwillingness to take ownership, which leads to too little dialogue with the CISO. But neither the CISO nor the business is to blame, and you can change this for the better. I’ll discuss this further in Chapter 6 and give you concrete advice on how to circumvent this situation.
Main driver for the ISMS and vendors
The ISMS is driven by compliance: implementing and managing an ISMS is required in order to comply with information security standards, in particular the ISO 27000 series (Figure 4-7). However, companies can choose not to follow strict information security practices. Doing so will cost them business opportunities, but, unlike the regulations behind the RIMS discussed above, it will not result in fines.
Some of the vendors in this category are OneTrust and ComplyCloud.
Data Protection Repository
The General Data Protection Regulation (GDPR) came into effect in 2018 and pioneered modern privacy regulation. GDPR created a European standard that has inspired similar regulations around the world, the best known being the California Consumer Privacy Act (CCPA). Just as with information security, data protection extends beyond the IT landscape, in the sense that it also deals with information stored on physical media.
Note
For the sake of simplicity, I will use GDPR terminology in the following discussions. You can easily translate it to similar regulations.
The heart of data protection is protecting everyone’s private life, and that means all of us. As technology becomes more and more ambient in our lives, it captures increasingly precise data about us. To prevent your data from being used against your will, it needs to be protected. Hence the need for GDPR and all similar regulations.
You protect data first by analyzing how it is processed, and then by evaluating the impact of the risks of that processing. You do this in a Data Protection Impact Assessment (DPIA). Among other things, a DPIA must contain:
-
a systematic description of the envisaged processing operations and the purposes of the processing
-
an assessment of the necessity and proportionality of the processing operations in relation to the purposes
I don’t want you to focus on the actual activity of data protection; that is not the purpose of this book. Instead, take a closer look at the phrasing in the first bullet: a systematic description of the envisaged processing operations. And there you have it: to protect data, you need to map how data is processed. That map is built into every single DPIA, which makes the collection of DPIAs a metadata repository containing a process map of your company. I call this repository a Data Protection Repository.
This is where the fundamentals of metadata come into play. What authority does the process map in the DPR actually have? How is it aligned with other metadata repositories, if at all? And if these repositories are not aligned, where is the truth?
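Here is a minimal Python sketch showing how a collection of DPIAs adds up to a process map. It assumes each DPIA records its processing operations as flows of personal data from one system or process to another. The `DPIA` and `ProcessingOperation` classes and the example systems are hypothetical. Taking the union of all flows yields the DPR's implicit process map, which is the map whose authority the questions above call into doubt.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingOperation:
    source: str            # system or process the personal data comes from
    target: str            # system or process the personal data flows to
    data_categories: frozenset
    purpose: str


@dataclass
class DPIA:
    name: str
    operations: list = field(default_factory=list)


def process_map(dpr: list) -> dict:
    """Merge the processing operations of all DPIAs into one adjacency map."""
    graph = {}
    for dpia in dpr:
        for op in dpia.operations:
            graph.setdefault(op.source, set()).add(op.target)
            graph.setdefault(op.target, set())  # make sure sinks appear as nodes too
    return graph


dpr = [
    DPIA("Newsletter", [
        ProcessingOperation("Web signup form", "CRM", frozenset({"email"}), "marketing"),
        ProcessingOperation("CRM", "Email service", frozenset({"email", "name"}), "marketing"),
    ]),
    DPIA("Payroll", [
        ProcessingOperation("HR system", "Payroll provider",
                            frozenset({"salary", "bank account"}), "salary payment"),
    ]),
]
print(sorted(process_map(dpr)["CRM"]))  # ['Email service']
```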
Note
You should know that the Data Protection Repository is not an established concept but my own. A Data Protection Repository (DPR) stores Data Protection Impact Assessments (DPIAs).
The DPIAs are used to answer Data Subject Access Requests (DSARs). A DSAR is a request through which you exercise your right to know what data a given company or organization holds about you. It also covers your rights to have that data corrected, to understand how it is processed, and to have it deleted.
Organizational aspect of DPR
Data protection in a company is performed by a Data Protection Officer (DPO), who is typically a lawyer. The DPO builds the DPR by examining data processing throughout the company and documenting it in DPIAs. Typically, this is done early, when new IT projects are launched. However, when GDPR first came into effect, the assessment had to be performed on the existing landscape.
When a citizen or a customer submits a DSAR to a company, it lands with the DPO, who under GDPR must generally answer it within one month of receipt. The DPO has to:
-
Register the DSAR
-
Verify the identity of the data subject making the DSAR
-
Understand the DSAR
-
Match the DSAR with data processing (by consulting the DPIAs in the DPR)
-
Collect the data
-
Inform, erase, correct, or provide the data
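Two of the steps above can be sketched in Python: matching the DSAR with data processing, and keeping track of the response deadline. The sketch assumes a simplified DPR in which each DPIA lists the systems involved and the types of data subjects concerned. `DPIAEntry` and the function names are hypothetical. The deadline follows a common reading of GDPR's one-month limit: the response is due on the corresponding date of the following month, or on that month's last day if no such date exists. Extensions are ignored.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass
class DPIAEntry:
    name: str
    systems: set             # systems in which the processing takes place
    data_subject_types: set  # e.g. {"customer"}, {"employee"}


def one_month_after(received: date) -> date:
    """Corresponding day of the next month, clamped to that month's last day."""
    year = received.year + received.month // 12
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def systems_to_search(subject_type: str, dpr: list) -> set:
    """Collect the systems named by every DPIA that concerns this type of data subject."""
    systems = set()
    for dpia in dpr:
        if subject_type in dpia.data_subject_types:
            systems |= dpia.systems
    return systems


dpr = [
    DPIAEntry("Newsletter", {"CRM", "Email service"}, {"customer"}),
    DPIAEntry("Payroll", {"HR system"}, {"employee"}),
]
print(one_month_after(date(2024, 1, 31)))          # 2024-02-29
print(sorted(systems_to_search("customer", dpr)))  # ['CRM', 'Email service']
```

The search is only as good as the DPR. A system that processes customer data but is missing from every DPIA is never searched.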
I don’t want you to think more deeply about the actions listed above. Instead, I want you to focus on the fact that the foundation for performing them may be wrong. I have personally witnessed DSARs being handled, and then seen the same kind of unwanted, harmful data processing repeated after the data subjects had been told it wouldn’t be. This is very common in a large enterprise context. It happens because the DPR is not correct: it does not really reflect the IT landscape. The DPO has not done a sloppy job. The task is immense, and the changes to the IT landscape are so numerous that a manually maintained DPR cannot keep up. The typical reaction to this kind of problem is that the DPO works harder, enforces more confrontational data governance, and holds intrusive hard talks with business departments to improve the data quality in the DPR. I can guarantee you that it will not work.
Instead, the DPR needs to coordinate with the other metadata repositories, contribute to them, and benefit from them. I will discuss this in Chapter 11. You can browse FIGURE to see specifically how the processing of data is depicted at various levels of depth in the EAM, DMDB, DC, and ISMS, and how the DPR plays into this powerful constellation.
The DPR as a Metadata Repository
A Data Protection Repository is an inventory that lists data and the way data is processed in a company. It focuses on the sensitivity of data, meaning the degree to which data is personal. The DPR is first built by performing DPIAs of the IT landscape and of physical media, and is subsequently used to answer DSARs. You can see a high-level view of a Data Protection Repository in Figure 4-8.
Note
An alternative to DPRs is the Privacy Information Management System (PIMS), in which users can themselves determine how their data should be used. This can be a sort of DSAR-as-a-service. An example of such software is MineOS.
In the context of this book, neither the outcome of the DPIA nor that of the DSAR is what matters. What matters is that the DPIA constructs a process landscape with data in it. That process landscape may match, or more likely not match, other process maps in the organization. Other metadata repositories contain their own process landscapes. If they don’t match the one defined in the DPR, your company has multiple, contradictory process landscapes at play. This will cause great confusion until you address it. I advise on this in Chapter 6.
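Detecting contradictory process landscapes can start as a simple set comparison. In the Python sketch below, each repository's process map is reduced to a set of (source, target) flows. The repository contents are hypothetical. In practice, the hard part is agreeing on shared names for systems and processes so that the flows are comparable at all.

```python
def compare_landscapes(dpr_flows: set, other_flows: set) -> dict:
    """Split (source, target) flows into those only in the DPR, only in the other repository, and in both."""
    return {
        "only_in_dpr": dpr_flows - other_flows,
        "only_in_other": other_flows - dpr_flows,
        "in_both": dpr_flows & other_flows,
    }


# Hypothetical process maps from the DPR and from a BPMS.
dpr_flows = {("Web signup form", "CRM"), ("CRM", "Email service")}
bpms_flows = {("Web signup form", "CRM"), ("CRM", "Analytics platform")}

result = compare_landscapes(dpr_flows, bpms_flows)
print(result["only_in_other"])  # {('CRM', 'Analytics platform')}
```

A flow that appears only in the other repository, such as the CRM feeding an analytics platform here, is a candidate for undocumented processing of personal data. A flow that appears only in the DPR may describe processing that no longer exists.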
Main driver for the DPR and a comment about vendors
The purpose for implementing DPRs is regulation focused on data protection around the world, such as GDPR and CCPA (Figure 4-9).
You will not find dedicated vendors for this category. Most ISMS tools contain a data protection component that helps you manage regulations such as GDPR. However, in many organizations this is simply done with spreadsheets and documents.
Business Process Management Software
You use Business Process Management Software (BPMS) to formalize business processes. Business processes describe how your company carries out its tasks. A BPMS allows business processes to be mined, managed, and automated. Mined means that processes are discovered and made understandable. Managed means that, on that basis, you can bring them under control. Automated means that the BPMS lets you automate manual tasks strategically. Altogether, this contributes to smoother and faster execution of business processes.
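Process mining tools discover processes from event logs. One basic building block many of them use is the directly-follows relation: how often one activity is immediately followed by another within the same case. The Python sketch below computes it from a hypothetical order-handling log. It illustrates the idea only and does not reflect how any particular BPMS implements discovery.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity), already ordered by time within each case.
event_log = [
    ("order-1", "Receive order"), ("order-1", "Check credit"), ("order-1", "Ship goods"),
    ("order-2", "Receive order"), ("order-2", "Check credit"), ("order-2", "Reject order"),
    ("order-3", "Receive order"), ("order-3", "Check credit"), ("order-3", "Ship goods"),
]


def directly_follows(log: list) -> Counter:
    """Count how often activity A is immediately followed by activity B in the same case."""
    traces = defaultdict(list)
    for case_id, activity in log:
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


for (a, b), n in sorted(directly_follows(event_log).items()):
    print(f"{a} -> {b}: {n}")
# Check credit -> Reject order: 1
# Check credit -> Ship goods: 2
# Receive order -> Check credit: 3
```

The branch after "Check credit" already reveals a decision point. In a modeled process, that decision would appear as a gateway.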
The BPMS must visualize business processes so that it is easier to discover and understand them. Furthermore, the BPMS has no direct link to the IT Landscape per se, it is only when a certain process is carried out with the help of IT that the connection exists—which is almost always the case. However, the BPMS is intended to depict processes, not the IT landscape (Figure 4-10).
Business process management is not an organizationally formalized activity like records management, data protection and information security.
However, you can set up structured team activities for business process management by following the general guidance found in ISO 9000. The BPMS can also be part of a larger, more complex system: the Quality Management System (QMS), which is typically run by a QA department. I’ll discuss BPMS as part of QMS in Chapter ?.
Business processes can be modeled using a standard notation, such as Business Process Model and Notation (BPMN) published by the Object Management Group. You can see an example of a business process model in Figure 4-11.
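BPMN models are usually exchanged as XML. The Python sketch below embeds a minimal, hand-written BPMN 2.0 process with no diagram layout. It uses the standard library's `xml.etree.ElementTree` to follow the process's sequence flows from the start event. It assumes a single linear path with no gateways, so it shows the notation's structure rather than serving as a general BPMN interpreter. The process itself is invented for this example.

```python
import xml.etree.ElementTree as ET

BPMN_NS = {"bpmn": "http://www.omg.org/spec/BPMN/20100524/MODEL"}

# Hypothetical invoice-approval process, reduced to the semantic BPMN elements.
BPMN_XML = """<bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL"
                  id="defs" targetNamespace="http://example.com/bpmn">
  <bpmn:process id="approve_invoice" isExecutable="false">
    <bpmn:startEvent id="start" name="Invoice received"/>
    <bpmn:task id="check" name="Check invoice"/>
    <bpmn:task id="approve" name="Approve invoice"/>
    <bpmn:endEvent id="end" name="Invoice approved"/>
    <bpmn:sequenceFlow id="f1" sourceRef="start" targetRef="check"/>
    <bpmn:sequenceFlow id="f2" sourceRef="check" targetRef="approve"/>
    <bpmn:sequenceFlow id="f3" sourceRef="approve" targetRef="end"/>
  </bpmn:process>
</bpmn:definitions>"""


def ordered_steps(xml_text: str) -> list:
    """Follow sequence flows from the start event; assumes one linear path (no gateways)."""
    process = ET.fromstring(xml_text).find("bpmn:process", BPMN_NS)
    # Map element ids to their names (sequence flows have no name and are skipped).
    names = {el.get("id"): el.get("name") for el in process if el.get("name") is not None}
    next_step = {flow.get("sourceRef"): flow.get("targetRef")
                 for flow in process.findall("bpmn:sequenceFlow", BPMN_NS)}
    current = process.find("bpmn:startEvent", BPMN_NS).get("id")
    steps = [names[current]]
    while current in next_step:
        current = next_step[current]
        steps.append(names[current])
    return steps


print(ordered_steps(BPMN_XML))
# ['Invoice received', 'Check invoice', 'Approve invoice', 'Invoice approved']
```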
You should consider the level of granularity in business processes. Business processes depict, very concretely, the various actions performed, and the BPMS generates visualizations of these processes automatically or semi-automatically. From that point on, once you have the processes mapped in your BPMS, you can use it to:
-
Perform major IT transformations, such as ERP migration
-
Identify weak processes and make them stronger
-
Identify slow processes and make them faster
-
Automate processes using AI, RPA, etc.
-
And more!
You should not think of business processes as the value stream or value chain of your company. Value streams or value chains provide a high-level overview of a company’s activities without delving into the detailed layers of actions carried out by employees, machines, or robots, which is precisely the level of detail captured by business processes. Accordingly, business processes are carried out both inside the IT landscape and as purely human processes, as depicted in Figure 4-12.
Main driver for the BPMS
The BPMS is mainly a tool used in the context of IT operations (Figure 4-6). It can streamline the application landscape and facilitate data migration from systems being phased out to newly introduced systems, thereby delivering the numerous benefits discussed above (Figure 4-13).
One of the vendors in this category is SAP Signavio.
The Ideal Setup of Information Metadata Repositories
As in Chapter 3, we have covered a set of metadata repositories in this chapter, this time focusing on metadata repositories for information. Now, let’s consolidate them. Figure 4-14 illustrates all the metadata repositories for information in a diagram, with the BPMS at the top and the RIMS at the bottom.
Let’s quickly run through the repositories in the diagram.
The BPMS depicts business processes. Business processes are deep, detailed depictions of the processes carried out by humans, machines, and robots in your company, and they are visualized in the BPMS. The BPMS can be used to rationalize the IT landscape, transforming it and making it stronger and faster.
The DPR describes how personal data is processed. It’s the repository put in place to handle regulations such as GDPR and CCPA. These regulations oblige companies to transparently declare how they process personal data and how they manage requests to change this processing. The DPR is managed by the DPO.
The ISMS manages the information security of the company, covering not only cybersecurity but also the security of tangible and intangible assets. All information security risks must be listed, evaluated in terms of severity, and then mitigated. The ISMS is managed by the CISO.
The RIMS allows you to manage the lifecycle of your company’s records. It depicts all records, physical and digital, that a company produces, and it can issue a legal hold on records that may be used in a lawsuit. The RIMS is managed by a Records and Information Management department.
Note
All metadata repositories for information interact less directly with the IT landscape, adding a layer of interpretation that metadata repositories for data do not need. Metadata repositories for information also look beyond the IT landscape, for example by describing conversations (BPMS), rumors (ISMS), and physical paper documents (DPR and RIMS).
In Figure 4-15, you can see an example of concrete technologies that deliver a BPMS, DPR, ISMS, and RIMS.
Note
Note that Figure 4-14 does not portray a likely reality. For a more accurate depiction, delve into Chapter 6 and study the figures presented within it.
Overlapping Peripheral Capabilities for Information Metadata Repositories
Metadata repositories for information overlap more often than not.
Technologies performing information security management will, for example, offer a data protection component and vice versa, since they share methodology to protect data and information. Likewise, a technologically refined RIMS can push its peripheral capabilities towards information security and data protection as it handles confidentiality and sensitivity as a natural continuation of its overall purpose of assessing and managing retention of records and information.
The BPMS is different in nature from the other repositories discussed in this chapter, as it does not serve a regulatory purpose and should not primarily be used for such purposes. However, the BPMS depicts processes, and as such, it overlaps with the DPR.
Some of the purposes of the RIMS can be served by the DPR, as the latter handles retention specifically for sensitive data. Another purpose of the RIMS, assessing confidentiality, can be handled by the ISMS. The overlapping peripheral capabilities can be seen in Figure 4-16.
Summary
In this chapter we have looked at metadata repositories for information. Here are the takeaways:
-
Metadata repositories for information interact less directly with your IT landscape by adding a layer of interpretation to it.
-
Metadata repositories for information look outside the IT landscape, toward physical media and abstract ideas
-
The majority of metadata repositories for information are motivated by regulations.
-
A BPMS provides you with an overview of your business processes
-
The DPR is the repository describing how personal data is processed
-
The ISMS contains an asset inventory listing all confidential assets
-
The RIMS manages the life cycle of records and information.
-
Legal holds can be issued by the RIMS
-
A DPR and ISMS are often found in the same technology
-
The DPR, ISMS and RIMS overlap
-
Like the DPR, the RIMS describes sensitivity and handles retention
-
As with the ISMS, the RIMS assigns levels of confidentiality
In this chapter, we explored examples of metadata repositories designed for managing information. Numerous other such repositories exist.
In the next chapter, we will take a closer look at metadata repositories for knowledge.