Data Stewardship for Open Science

Book description

Data Stewardship for Open Science: Implementing FAIR Principles was written to make scientists, funders, and innovators in all disciplines, and at all stages of their professional activities, broadly aware of the need for, the complexity of, and the challenges associated with open science, modern science communication, and data stewardship. The FAIR principles serve as a guide throughout the text. The book aims to leave experimentalists consciously incompetent about data stewardship and motivated to respect data stewards as representatives of a new profession, while possibly motivating others to consider a career in the field.

The ebook, available at no additional cost when you buy the paperback, will be updated roughly every six months (provided that significant updates are needed or available). Readers will have the opportunity to contribute material towards these updates, and to develop their own data management plans, via the free Data Stewardship Wizard.

Table of contents

  1. Cover
  2. Half Title
  3. Title Page
  4. Copyright Page
  5. Table of Contents
    1. List of Figures
    2. Preface
    3. Author
    4. 1 Introduction
      1. 1.1 Data Stewardship for Open Science
      2. 1.2 Introduction by the Author
      3. 1.3 Definitions and Context
      4. 1.4 The Lines of Thinking
      5. 1.5 The Basics of Good Data Stewardship
    5. 2 Data Cycle Step 1: Design of Experiment
      1. 2.1 Is There Pre-Existing Data?
      2. 2.2 Will You Use Pre-Existing Data (Including OPEDAS)?
      3. 2.3 Will You Use Reference Data?
      4. 2.4 Where is it Available?
      5. 2.5 What Format?
      6. 2.6 Is the Data Resource Versioned?
      7. 2.7 Will You Be Using Any Existing (Non-Reference) Datasets?
      8. 2.8 Will Owners of that Data Work With You on this Study?
      9. 2.9 Is Reconsent Needed?
      10. 2.10 Do You Need to Harmonize Different Sources of OPEDAS?
      11. 2.11 What/How/Who Will Integrate Existing Data?
        1. 2.11.1 Will you need to add data from the literature?
        2. 2.11.2 Will you need text-mining?
        3. 2.11.3 Do you need to integrate or link to a different type of data?
      12. 2.12 Will Reference Data Be Created?
        1. 2.12.1 What will the IP be like?
        2. 2.12.2 How will you maintain it?
      13. 2.13 Will You Be Storing Physical Samples?
        1. 2.13.1 Where will information about samples be stored?
        2. 2.13.2 Will your data and samples be added to an existing collection?
      14. 2.14 Will You Be Collecting Experimental Data?
      15. 2.15 Are There Data Formatting Considerations?
        1. 2.15.1 What is the volume of the anticipated dataset?
        2. 2.15.2 What data formats do the instruments yield?
        3. 2.15.3 What preprocessing is needed?
          1. 2.15.3.1 Are there ready-to-use workflows?
          2. 2.15.3.2 What compute is needed?
        4. 2.15.4 Will you create images?
      16. 2.16 Are There Potential Issues Regarding Data Ownership and Access Control?
        1. 2.16.1 Who needs access?
        2. 2.16.2 What level of data protection is needed?
          1. 2.16.2.1 Is the collected data privacy sensitive?
          2. 2.16.2.2 Is your institute’s security sufficient for storage?
    6. 3 Data Cycle Step 2: Data Design and Planning
      1. 3.1 Are You Using Data Types Used by Others, Too?
        1. 3.1.1 What format(s) will you use for the data?
      2. 3.2 Will You Be Using New Types of Data?
        1. 3.2.1 Are there suitable terminology systems?
        2. 3.2.2 Do you need to develop new terminology systems?
        3. 3.2.3 How will you describe your data format?
      3. 3.3 How Will You Be Storing Metadata?
        1. 3.3.1 Did you consider how to monitor data integrity?
        2. 3.3.2 Will you store licenses with the data?
      4. 3.4 Method Stewardship
        1. 3.4.1 Is all software for steps in your workflow properly maintained?
      5. 3.5 Storage (How Will You Store Your Data?)
        1. 3.5.1 Storage capacity planning
          1. 3.5.1.1 Will you be archiving data for long-term preservation?
          2. 3.5.1.2 Can the original data be regenerated?
          3. 3.5.1.3 If your data changes over time, how frequently do you do backups?
        2. 3.5.2 When is the data archived?
        3. 3.5.3 Re-use considerations: Will the archive need to be online?
        4. 3.5.4 Will workflows need to be run locally on the stored data?
          1. 3.5.4.1 Is there budget to enable supported reuse by others (collaboration/coauthorship)?
        5. 3.5.5 How long does the data need to be kept?
        6. 3.5.6 Will the data be understandable after a long time?
        7. 3.5.7 How frequently will you archive data?
      6. 3.6 Is There (Critical) Software in the Workspace?
      7. 3.7 Do You Need the Storage Close to Compute Capacity?
      8. 3.8 Compute Capacity Planning
        1. 3.8.1 Determine needs in memory/CPU/IO ratios
    7. 4 Data Cycle Step 3: Data Capture (Equipment Phase)
      1. 4.1 Where Does the Data Come From? Who Will Need the Data?
      2. 4.2 Capacity and Harmonisation Planning
        1. 4.2.1 Will you use non-equipment data capture (e.g., questionnaires, free text)?
          1. 4.2.1.1 Case report forms?
    8. 5 Data Cycle Step 4: Data Processing and Curation
      1. 5.1 Workflow development
        1. 5.1.1 Will you be running a bulk/routine workflow?
      2. 5.2 Choose the workflow engine
        1. 5.2.1 Who are the customers that use your workflows?
        2. 5.2.2 Can workflows be run remotely?
        3. 5.2.3 Can workflow decay be managed?
        4. 5.2.4 Verify workflows repeatedly on the same data
      3. 5.3 Workflow running
      4. 5.4 Tools and data directory (for the experiment)
    9. 6 Data Cycle Step 5: Data Linking and ‘Integration’
      1. 6.1 What approach will you use for data integration?
      2. 6.2 Will you make your output semantically interoperable?
      3. 6.3 Will you use a workflow or tools for database access or conversion?
    10. 7 Data Cycle Step 6: Data Analysis, Interpretation
      1. 7.1 Will you use static or dynamic (systems) models?
      2. 7.2 Machine-learning?
      3. 7.3 Will you be building kinetic models?
      4. 7.4 How will you make sure the analysis is best suited to answer your biological question?
      5. 7.5 How will you ensure reproducibility?
        1. 7.5.1 Will this step need significant storage and compute capacity?
        2. 7.5.2 How are you going to interpret your data?
        3. 7.5.3 How will you document your interpretation steps?
      6. 7.6 Will you be doing (automated) knowledge-discovery?
    11. 8 Data Cycle Step 7: Information and Insight Publishing
      1. 8.1 How much will be open data/access?
      2. 8.2 Who will pay for open access data publishing?
      3. 8.3 Legal issues
        1. 8.3.1 Where to publish?
      4. 8.4 What technical issues are associated with HPR?
        1. 8.4.1 What service will be offered around your data?
        2. 8.4.2 Submit to an existing database?
        3. 8.4.3 Will you run your own access web service for data?
        4. 8.4.4 How and where will you be archiving/cataloguing?
      5. 8.5 Will you still publish if the results are negative?
        1. 8.5.1 Data as a publishable unit?
        2. 8.5.2 Will you publish a narrative?
    12. Bibliography
    13. Index

Product information

  • Title: Data Stewardship for Open Science
  • Author(s): Barend Mons
  • Release date: March 2018
  • Publisher(s): Chapman and Hall/CRC
  • ISBN: 9781315351148