Chapter 31
Integrity Management, Data Provenance, and Cloud Services
31.1 Overview
In this chapter, we will discuss integrity management for cloud services. Integrity
includes several aspects. In the database world, integrity includes concurrency con-
trol and recovery as well as enforcing integrity constraints. For example, when mul-
tiple transactions are executed at the same time, the consistency of the data has to
be ensured. When a transaction aborts, it has to be ensured that the database is
recovered from the failure into a consistent state. Integrity constraints are rules that
have to be satisfied by the data. The rules include “salary value has to be positive”
and “age of an employee cannot decrease over time.” More recently, integrity has
included data quality, data provenance, data currency, real-time processing, and
fault tolerance.
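
The two example rules above differ in kind: the salary rule can be checked against a single state of the data, while the age rule compares the old value with the new one. The following is a minimal sketch of that distinction, not a description of any particular database system. The names `Employee`, `ConstraintViolation`, `check_state`, `check_transition`, and `update` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, replace


class ConstraintViolation(Exception):
    """Raised when an update would violate an integrity constraint."""


@dataclass(frozen=True)
class Employee:
    name: str
    salary: float
    age: int


def check_state(emp: Employee) -> None:
    # State constraint: depends only on the current values.
    if emp.salary <= 0:
        raise ConstraintViolation("salary value has to be positive")


def check_transition(old: Employee, new: Employee) -> None:
    # Transition constraint: compares the old value with the new one.
    if new.age < old.age:
        raise ConstraintViolation("age of an employee cannot decrease over time")


def update(old: Employee, **changes) -> Employee:
    """Return the updated record, or raise and leave the old record untouched."""
    new = replace(old, **changes)
    check_state(new)
    check_transition(old, new)
    return new
```

Calling `update(emp, age=emp.age - 1)` or `update(emp, salary=-1.0)` raises `ConstraintViolation`, and because the record is immutable the original value is never modified by a rejected update.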
In this chapter, we discuss the aspects of integrity for cloud services as well as
implementing integrity management as cloud services. For example, how do we
ensure the integrity of the data and the processes? How do we ensure that data
quality is maintained? The organization of this chapter is as follows. In Section
31.2, we discuss the aspects of integrity, data quality, and provenance. In particular,
integrity aspects will be discussed in Section 31.2.1. Data quality and provenance
will be discussed in Section 31.2.2. Detecting security threats and misuse with
data provenance will be discussed in Section 31.2.3. Cloud services and integrity
management are discussed in Section 31.3. In particular, data integrity and prov-
enance as cloud services will be discussed in Section 31.3.1. Data integrity for cloud
services will be discussed in Section 31.3.2. This chapter concludes with Section
31.4. The aspects of integrity are illustrated in Figure 31.1.
31.2 Integrity, Data Quality, and Provenance
31.2.1 Aspects of Integrity
There are many aspects of integrity. For example, concurrency control, recovery,
data accuracy, meeting real-time constraints, data quality, data
provenance, fault tolerance, and integrity constraint enforcement are all aspects
of integrity management. This is illustrated in Figure 31.1. In this section, we will
examine each aspect of integrity.
Concurrency control: In data management, concurrency control is about trans-
actions that are executed at the same time and ensuring the consistency of the data.
Therefore, transactions have to obtain locks or utilize time stamps to ensure that
the data are left in a consistent state when multiple transactions attempt to access
the data at the same time. Extensive research has been carried out on concurrency-
control techniques for transaction management both in centralized and in distrib-
uted environments [BERN87].
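
As a rough illustration of the lock-based approach, the sketch below runs several concurrent "transactions" that move money between two accounts and guards each transfer with a lock so that the total balance stays consistent. It assumes a single coarse lock over all data, which is far simpler than the locking protocols used by real database systems. The names `Bank`, `transfer`, and `run_demo` are hypothetical.

```python
import threading


class Bank:
    def __init__(self, balances):
        self._balances = dict(balances)
        # Assumption: one coarse lock guards every account.
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # Holding the lock makes the check and both writes one atomic step
        # with respect to every other transfer.
        with self._lock:
            if self._balances[src] < amount:
                return False  # reject rather than leave a negative balance
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self._balances.values())


def run_demo(n_threads=4, n_transfers=500):
    bank = Bank({"a": 1000, "b": 1000})

    def worker(i):
        # Even-numbered workers move money a -> b, odd-numbered b -> a.
        src, dst = ("a", "b") if i % 2 == 0 else ("b", "a")
        for _ in range(n_transfers):
            bank.transfer(src, dst, 1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bank.total()
```

Every transfer either applies both writes or neither, so the consistency condition here (the sum of the two balances) holds no matter how the threads interleave.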
Data recovery: When transactions abort before they complete execution, the
database should be recovered to a consistent state such as its state before the trans-
action started execution. Several recovery techniques have been proposed to ensure
the consistency of the data.
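
One way to picture recovery to the state before the transaction started is an undo log: each write first records the previous value, and an abort replays those records in reverse. The sketch below is an in-memory illustration under that assumption only. It does not write the log to durable storage, which a real recovery technique would need in order to survive crashes. The names `Database` and `Transaction` are hypothetical.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)


class Transaction:
    _MISSING = object()  # marks a key that did not exist before the write

    def __init__(self, db):
        self.db = db
        self.undo_log = []  # (key, before-image) pairs in write order

    def write(self, key, value):
        # Record the before-image first, then apply the change.
        before = self.db.data.get(key, self._MISSING)
        self.undo_log.append((key, before))
        self.db.data[key] = value

    def rollback(self):
        # Undo in reverse order so the earliest before-image is applied last.
        for key, before in reversed(self.undo_log):
            if before is self._MISSING:
                self.db.data.pop(key, None)
            else:
                self.db.data[key] = before
        self.undo_log.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            self.rollback()        # abort: restore the pre-transaction state
        else:
            self.undo_log.clear()  # commit: the before-images are not needed
        return False               # let the exception propagate to the caller
```

Used as `with Transaction(db) as t: ...`, any exception raised inside the block aborts the transaction and leaves `db.data` as it was before the block started.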
Data authenticity: When the data are delivered to the user, their authenticity has
to be ensured. at is, the user should get the accurate data and the data should not
be tampered with. We have conducted research on ensuring authenticity of XML
data during third-party publishing [BERT04].
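
A simple way to detect tampering in transit is to attach a keyed hash to the data and have the recipient recompute it. The sketch below uses HMAC-SHA256 from the Python standard library. It is a swapped-in illustration, not the third-party publishing scheme of [BERT04]: it assumes the data owner and the user share a secret key, and the function names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, data: bytes) -> str:
    """Return a hex authentication tag for `data` under the shared `key`."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, tag: str) -> bool:
    """Return True only if `data` still matches the tag it was delivered with."""
    # compare_digest performs a constant-time comparison of the two tags.
    return hmac.compare_digest(sign(key, data), tag)
```

The owner would deliver `data` together with `sign(key, data)`. A user who holds the same key calls `verify` and rejects the data if it returns `False`, which happens when either the data or the tag was altered.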
[Figure: "Aspects of integrity" with four labeled components: data quality and provenance; concurrency control and recovery; integrity of the agents; integrity of the websites.]
Figure 31.1 Aspects of integrity.
