Provenance

Book Description

The World Wide Web is now deeply intertwined with our lives, and has become a catalyst for a data deluge, making vast amounts of data available online at the click of a button. With Web 2.0, users are no longer passive consumers, but active publishers and curators of data. Hence, from science to food manufacturing, from data journalism to personal well-being, from social media to art, there is a strong interest in provenance: a description of what influenced an artifact, a data set, a document, a blog, or any resource on the Web and beyond. Provenance is a crucial piece of information that can help a consumer judge whether something can be trusted. It is no longer seen as a curiosity in art circles, but is regarded as pragmatically, ethically, and methodologically crucial for our day-to-day data manipulation and curation activities on the Web. Following the recent publication of the PROV standard for provenance on the Web, which the two authors actively helped shape in the Provenance Working Group at the World Wide Web Consortium, this Synthesis lecture is a hands-on introduction to PROV aimed at Web and linked data professionals. By means of recipes, illustrations, a website at www.provbook.org, and tools, it guides practitioners through a variety of issues related to provenance: how to generate provenance, publish it on the Web, make it discoverable, and utilize it. Equipped with this knowledge, practitioners will be in a position to develop novel applications that can bring openness, trust, and accountability. Table of Contents: Preface / Acknowledgments / Introduction / A Data Journalism Scenario / The PROV Ontology / Provenance Recipes / Validation, Compliance, Quality, Replay / Provenance Management / Conclusion / Bibliography / Authors' Biographies / Index

Table of Contents

  1. Cover
  2. Half title
  3. Copyright
  4. Title
  5. Abstract
  6. Dedication
  7. Contents
  8. Preface
  9. Acknowledgments
  10. 1 Introduction
    1. 1.1 The Case for Provenance
    2. 1.2 A Definition of Provenance
    3. 1.3 Provenance and the Web Architecture
    4. 1.4 The W3C PROV Standard
    5. 1.5 Online Extensions
  11. 2 A Data Journalism Scenario
    1. 2.1 Scenario: The employment report
      1. 2.1.1 Characters
      2. 2.1.2 Story Creation and Publication
      3. 2.1.3 Crunching Data
      4. 2.1.4 Reusing the Story
    2. 2.2 Provenance Use Cases
      1. 2.2.1 Quality Assessment
      2. 2.2.2 Compliance
      3. 2.2.3 Cataloging
      4. 2.2.4 Replay
    3. 2.3 A Brief Introduction to Expressing Provenance
    4. 2.4 Summary
  12. 3 The PROV Ontology
    1. 3.1 Overview
    2. 3.2 Qualified Relation Patterns
    3. 3.3 Data Flow View
      1. 3.3.1 Entity
      2. 3.3.2 Derivation
      3. 3.3.3 Revision
      4. 3.3.4 Quotation
      5. 3.3.5 Primary Source
    4. 3.4 Process Flow View
      1. 3.4.1 Activity
      2. 3.4.2 Generation
      3. 3.4.3 Usage
      4. 3.4.4 Invalidation
      5. 3.4.5 Start
      6. 3.4.6 End
      7. 3.4.7 Communication
    5. 3.5 Responsibility View
      1. 3.5.1 Agent
      2. 3.5.2 Attribution
      3. 3.5.3 Association
      4. 3.5.4 Delegation
    6. 3.6 Alternates View
      1. 3.6.1 Specialization
      2. 3.6.2 Alternate
    7. 3.7 Bundles
    8. 3.8 Miscellaneous
      1. 3.8.1 Collection and Membership
      2. 3.8.2 Refined Derivation
      3. 3.8.3 Further Properties
    9. 3.9 Ontology Structure
    10. 3.10 Summary
  13. 4 Provenance Recipes
    1. 4.1 Modeling
      1. 4.1.1 Iterative Modeling
      2. 4.1.2 Identify, Identify, Identify!
      3. 4.1.3 From Data Flow to Activities
      4. 4.1.4 Plan for Revisions
      5. 4.1.5 Modeling Update and Other Destructive Activities
      6. 4.1.6 Modeling Message Passing
      7. 4.1.7 Modeling Parameters
      8. 4.1.8 Introduce the Environment
      9. 4.1.9 Modeling Sub-activities
    2. 4.2 Organizing
      1. 4.2.1 Stitch Provenance Together
      2. 4.2.2 Use Content-Negotiation when Exposing Provenance
      3. 4.2.3 Bundle Up and Provide Attribution to Provenance
      4. 4.2.4 Embedding Provenance in HTML
      5. 4.2.5 Embedding Provenance in Other Media
      6. 4.2.6 When all Else Fails, add Provenance to HTTP Headers
      7. 4.2.7 Embedding Provenance in Bundles: Self-Referential Bundles
      8. 4.2.8 When Displaying Provenance, Adopt Conventional Layout
    3. 4.3 Collecting
      1. 4.3.1 Use Structured Logs to Collect Provenance
      2. 4.3.2 Collect in a Local Form, Expose as PROV
    4. 4.4 Anti-patterns
      1. 4.4.1 Activity but No Derivation
      2. 4.4.2 Association but No Attribution
      3. 4.4.3 Specify Responsibility First, What a prov:Agent is Will Follow
    5. 4.5 Summary
  14. 5 Validation, Compliance, Quality, Replay
    1. 5.1 Validation Use Cases
    2. 5.2 Principles of Validation
      1. 5.2.1 Events and Their Ordering
      2. 5.2.2 Simultaneous Events
      3. 5.2.3 Nested Intervals and Specialization
      4. 5.2.4 Use Cases Revisited
    3. 5.3 Utilizing Provenance
      1. 5.3.1 Provenance-Based Compliance
      2. 5.3.2 Provenance-Based Quality Assessment
      3. 5.3.3 Provenance-Based Cataloging
      4. 5.3.4 Provenance-Based Replaying
    4. 5.4 Implementation Techniques for Provenance Analysis
      1. 5.4.1 Finding Ancestors
      2. 5.4.2 Deep Traversal
      3. 5.4.3 Pattern Detection for Policy Compliance
      4. 5.4.4 Time Comparison
      5. 5.4.5 Trust-Based Filtering
      6. 5.4.6 Finding External Ancestor Resources
      7. 5.4.7 Replay Technique
    5. 5.5 Summary
  15. 6 Provenance Management
    1. 6.1 Exposing Provenance
      1. 6.1.1 Embedding Provenance in HTML with RDFa
      2. 6.1.2 Provenance Services
    2. 6.2 Provenance Management Tools
      1. 6.2.1 ProvToolbox
      2. 6.2.2 ProvPy
      3. 6.2.3 provconvert and ProvTranslator
      4. 6.2.4 ProvStore
      5. 6.2.5 ProvValidator
      6. 6.2.6 Browser PROV Extractor
      7. 6.2.7 ProvVis: Interactive Visualizations for PROV
    3. 6.3 Provenance Management on www.provbook.org
      1. 6.3.1 Directories
      2. 6.3.2 URI Schemes for Entities, Agents, and Activities
      3. 6.3.3 The PROV Book Ontology
      4. 6.3.4 Data Journalism Provenance
      5. 6.3.5 Exposing Provenance
    4. 6.4 Summary
  16. 7 Conclusion
    1. 7.1 Toward Provenance Self Certification: A Checklist
    2. 7.2 Applying Provenance in the Wild
    3. 7.3 Open Issues
      1. 7.3.1 Provenance Enabling Systems
      2. 7.3.2 Fundamentals of Provenance
      3. 7.3.3 Provenance Analytics
      4. 7.3.4 Securing Provenance
    4. 7.4 Final Words
  17. Bibliography
  18. Authors’ Biographies
  19. Index