Principles of Data Integration

Book Description

Principles of Data Integration is the first comprehensive textbook on data integration, covering theoretical principles and implementation issues as well as current challenges raised by the semantic web and cloud computing. The book offers a range of data integration solutions, enabling you to focus on what is most relevant to the problem at hand. Readers will also learn how to build their own algorithms and implement their own data integration applications.

Written by three of the most respected experts in the field, this book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout to explain the concepts.

This text is an ideal resource for database practitioners in industry, including data warehouse engineers, database system designers, data architects/enterprise architects, database researchers, statisticians, and data analysts; students in data analytics and knowledge discovery; and other data professionals working at the R&D and implementation levels.

  • Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand
  • Enables you to build your own algorithms and implement your own data integration applications

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Preface
  7. 1. Introduction
    1. 1.1 What Is Data Integration?
    2. 1.2 Why Is It Hard?
    3. 1.3 Data Integration Architectures
    4. 1.4 Outline of the Book
    5. Bibliographic Notes
  8. Part I: Foundational Data Integration Techniques
    1. 2. Manipulating Query Expressions
      1. 2.1 Review of Database Concepts
      2. 2.2 Query Unfolding
      3. 2.3 Query Containment and Equivalence
      4. 2.4 Answering Queries Using Views
      5. Bibliographic Notes
    2. 3. Describing Data Sources
      1. 3.1 Overview and Desiderata
      2. 3.2 Schema Mapping Languages
      3. 3.3 Access-Pattern Limitations
      4. 3.4 Integrity Constraints on the Mediated Schema
      5. 3.5 Answer Completeness
      6. 3.6 Data-Level Heterogeneity
      7. Bibliographic Notes
    3. 4. String Matching
      1. 4.1 Problem Description
      2. 4.2 Similarity Measures
      3. 4.3 Scaling Up String Matching
      4. Bibliographic Notes
    4. 5. Schema Matching and Mapping
      1. 5.1 Problem Definition
      2. 5.2 Challenges of Schema Matching and Mapping
      3. 5.3 Overview of Matching and Mapping Systems
      4. 5.4 Matchers
      5. 5.5 Combining Match Predictions
      6. 5.6 Enforcing Domain Integrity Constraints
      7. 5.7 Match Selector
      8. 5.8 Reusing Previous Matches
      9. 5.9 Many-to-Many Matches
      10. 5.10 From Matches to Mappings
      11. Bibliographic Notes
    5. 6. General Schema Manipulation Operators
      1. 6.1 Model Management Operators
      2. 6.2 Merge
      3. 6.3 ModelGen
      4. 6.4 Invert
      5. 6.5 Toward Model Management Systems
      6. Bibliographic Notes
    6. 7. Data Matching
      1. 7.1 Problem Definition
      2. 7.2 Rule-Based Matching
      3. 7.3 Learning-Based Matching
      4. 7.4 Matching by Clustering
      5. 7.5 Probabilistic Approaches to Data Matching
      6. 7.6 Collective Matching
      7. 7.7 Scaling Up Data Matching
      8. Bibliographic Notes
    7. 8. Query Processing
      1. 8.1 Background: DBMS Query Processing
      2. 8.2 Background: Distributed Query Processing
      3. 8.3 Query Processing for Data Integration
      4. 8.4 Generating Initial Query Plans
      5. 8.5 Query Execution for Internet Data
      6. 8.6 Overview of Adaptive Query Processing
      7. 8.7 Event-Driven Adaptivity
      8. 8.8 Performance-Driven Adaptivity
      9. Bibliographic Notes
    8. 9. Wrappers
      1. 9.1 Introduction
      2. 9.2 Manual Wrapper Construction
      3. 9.3 Learning-Based Wrapper Construction
      4. 9.4 Wrapper Learning without Schema
      5. 9.5 Interactive Wrapper Construction
      6. Bibliographic Notes
    9. 10. Data Warehousing and Caching
      1. 10.1 Data Warehousing
      2. 10.2 Data Exchange: Declarative Warehousing
      3. 10.3 Caching and Partial Materialization
      4. 10.4 Direct Analysis of Local, External Data
      5. Bibliographic Notes
  9. Part II: Integration with Extended Data Representations
    1. 11. XML
      1. 11.1 Data Model
      2. 11.2 XML Structural and Schema Definitions
      3. 11.3 Query Language
      4. 11.4 Query Processing for XML
      5. 11.5 Schema Mapping for XML
      6. Bibliographic Notes
    2. 12. Ontologies and Knowledge Representation
      1. 12.1 Example: Using KR in Data Integration
      2. 12.2 Description Logics
      3. 12.3 The Semantic Web
      4. Bibliographic Notes
    3. 13. Incorporating Uncertainty into Data Integration
      1. 13.1 Representing Uncertainty
      2. 13.2 Modeling Uncertain Schema Mappings
      3. 13.3 Uncertainty and Data Provenance
      4. Bibliographic Notes
    4. 14. Data Provenance
      1. 14.1 The Two Views of Provenance
      2. 14.2 Applications of Data Provenance
      3. 14.3 Provenance Semirings
      4. 14.4 Storing Provenance
      5. Bibliographic Notes
  10. Part III: Novel Integration Architectures
    1. 15. Data Integration on the Web
      1. 15.1 What Can We Do with Web Data?
      2. 15.2 The Deep Web
      3. 15.3 Topical Portals
      4. 15.4 Lightweight Combination of Web Data
      5. 15.5 Pay-as-You-Go Data Management
      6. Bibliographic Notes
    2. 16. Keyword Search
      1. 16.1 Keyword Search over Structured Data
      2. 16.2 Computing Ranked Results
      3. 16.3 Keyword Search for Data Integration
      4. Bibliographic Notes
    3. 17. Peer-to-Peer Integration
      1. 17.1 Peers and Mappings
      2. 17.2 Semantics of Mappings
      3. 17.3 Complexity of Query Answering in PDMS
      4. 17.4 Query Reformulation Algorithm
      5. 17.5 Composing Mappings
      6. 17.6 Peer Data Management with Looser Mappings
      7. Bibliographic Notes
    4. 18. Integration in Support of Collaboration
      1. 18.1 What Makes Collaboration Different
      2. 18.2 Processing Corrections and Feedback
      3. 18.3 Collaborative Annotation and Presentation
      4. 18.4 Dynamic Data: Collaborative Data Sharing
      5. Bibliographic Notes
    5. 19. The Future of Data Integration
      1. 19.1 Uncertainty, Provenance, and Cleaning
      2. 19.2 Crowdsourcing and “Human Computing”
      3. 19.3 Building Large-Scale Structured Web Databases
      4. 19.4 Lightweight Integration
      5. 19.5 Visualizing Integrated Data
      6. 19.6 Integrating Social Media
      7. 19.7 Cluster- and Cloud-Based Parallel Processing and Caching
  11. Bibliography
  12. Index