Blockchain-enabled open science framework
A new approach to solving the reproducibility crisis.
Open dissemination of scientific research and data is a prerequisite for solving the reproducibility crisis. Low reproducibility affects many disciplines, but its impact is especially severe in preclinical research and development. Downstream clinical R&D on drug candidates and targets often proves wasteful because the published preclinical results cannot be replicated (Freedman 2015, Begley 2012). Clinical research has similar problems: drug trials tend to be published with mostly positive data supporting their claims and little negative data (Ioannidis 2005). This article focuses mainly on preclinical research outcomes.
The lack of replicability can be attributed to two broad causes: a lack of consensus on the protocols used in research labs, and unequal access to the tools and equipment required to perform experiments. There are numerous methods for solving any given problem, and researchers in a domain often have their own versions of protocols that work optimally in their lab. This creates obvious problems for independent verification of results and claims. Moreover, not every lab has access to high-end equipment, so many researchers must substitute lower-end techniques, which may reduce overall accuracy. To solve these problems and standardize research practice, the scientific community needs to reach a consensus on specific protocols while also providing generalized methods with enough flexibility to allow minor substitutions.
Investors specializing in the life sciences are interested in investing at the preclinical stage in search of higher returns, but they often require some level of screening before making an investment. Traditionally, startups seeking investment have had to partner with established institutions and undergo thorough due diligence. Once those partnerships are established, investors are more willing to consider an investment. Currently, there is no independent vetting mechanism that lets biopharma companies demonstrate their ability to translate early research into promising drug candidates.
In this article, I propose making the process of commercializing preclinical research more reproducible and transparent by building it on a blockchain. The blockchain would serve as the communication layer for carrying out peer review and publicly reporting the results. The program is discussed in detail in a later section. Let us begin by reviewing three major initiatives currently in place to enhance reproducibility.
Reproducibility Project: Cancer Biology (RP:CB)
RP:CB is a collaboration between the Center for Open Science and Science Exchange (led by CEO Elizabeth Iorns) that reexamines the most impactful publications in cancer biology from 2010 to 2012 and attempts to replicate them (Errington 2014). The replications are being conducted by expert members of the Science Exchange network, in accordance with standards from the Center for Open Science. The results of these experiments will be made publicly available through the open-access journal eLife. Each high-impact study under consideration will go through two stages of review, and the results of each stage will be published in a new format. The first phase will culminate in a replication plan for the study, in the form of a Registered Report; the protocols proposed in this report will have been reviewed by the scientific community before any experiments start. The second phase is the replication study itself, which will be carried out by labs in the Science Exchange network using the protocols specified in the Registered Report. In the end, both reports will be peer-reviewed by eLife reviewers and made available online, along with all the methods and data (Mourouzis 2015).
Minimum Publishing Standards
Print-based journals used to limit the space allocated to each section of a paper, so investigators prioritized reporting results and making claims over detailing their protocols. This hurt the methods section in particular, where attention to detail is essential, because it provides the instructions other researchers need to replicate an experiment. Most journals have now moved online, and space is no longer an issue. However, even with supplemental materials, the quality of reported methods and procedures has not improved significantly.
In some cases, investigators choose not to disclose the details of their protocols and cover the techniques used in their experiments only vaguely, keeping the exact procedure a trade secret. While there might be some value in doing so, it certainly does not help reproducibility. BioMed Central has sought to standardize the reporting of procedures by providing a checklist of minimum standards that must be met before a paper can be published. The checklist outlines specific criteria that can serve as a guideline for writing the entire methods section. If all the requirements are followed, the methods section should contain enough detail to support replication.
“Publish or perish” is still largely a reality of academia. Recognizing this, BioMed Central has launched new article types intended to enhance reproducibility. These support articles are full publications, which ensures that authors receive credit for all their work: not just the main research paper, but also protocols (as registered reports), case studies, and data notes. The online platform for the journal also links the main research paper to related articles and resources, such as the data notes or protocols published alongside it.
Data Discovery Index
The National Institutes of Health (NIH) recognizes both the crucial importance of reproducibility in the life sciences and the practical complexities of experimentation that contribute to its absence. Many other factors are involved, but we will focus on two that the NIH identified: the lack of incentives to publish negative data or other relevant experimental data, and poor training in the kind of experimental design that lends itself to high replicability. Let’s discuss potential solutions to both issues.
Researchers typically report only a portion of the data generated by their experiments, because their goal is to support the claims made in a publication. As a result, scientists often leave out negative data and other relevant data points from their experiments. Cloud-based data-sharing services like figshare have recently become more popular as a way for researchers to upload additional data accompanying their publications, but adoption remains low. A larger data set would allow for more robust conclusions and statistical analyses. In addition, negative data can be very useful in identifying unpromising avenues and pitfalls to avoid.
Data often goes missing because, once a publication is live, investigators have no motivation to dedicate additional time and resources to uploading supplementary data. The NIH has proposed a solution to this problem: the Data Discovery Index, which would allow researchers to upload unpublished data sets (Collins 2014). If other researchers used this data in their own work, the original data set could be cited. This would create a new measure of scientific contribution, letting researchers earn additional citations for making experimental data available.
The second problem the NIH pointed out is insufficient training in experimental design that lends itself to replicable studies. To remedy this, the NIH is releasing training modules that help researchers understand common pitfalls to avoid when designing experiments.
In this article, I propose a new vetting and disclosure mechanism for research labs that want to commercialize their research and attract private investment by demonstrating rigorous preclinical reproducibility. My hope is that such an effort will promote reasonable disclosure of research by private entities (such as biotech startups) and help prevent colossal failures like Theranos. The program would be built on the blockchain, an open and publicly available ledger, and would enable blockchain-based tracking of research.
My Proposition: Use the Blockchain
How can the blockchain enable research data tracking and communication? The answer is to use the blockchain as a backend “file-drawer” that stores links. These links are simply references that in turn point to the real data sources, and storing only links on the blockchain has several advantages. The first advantage is that data written to the blockchain is tamper-evident, meaning that any attempt to modify it can be detected. The blockchain relies on cryptographic hashes and digital signatures to maintain the integrity of its records, so any information anchored to it can be verified. The second advantage is that any data stored in this manner becomes very easy to share publicly. To understand why, consider an analogy. Imagine a room full of packages, each with a sticker label. We can create a catalog of all the stickers, and once we have that catalog, it is very easy to show someone what a particular package contains. In the same sense, once links are stored on the blockchain, we can easily and publicly share metadata describing the information each link points to. If the catalog were electronic, we could search through it digitally; similarly, we would need to develop frontend applications to search the blockchain for links.
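A minimal Python sketch of the “file-drawer” idea follows, assuming a single-process, append-only log in which each entry stores a link, some descriptive metadata, and the SHA-256 hash of the previous entry. The names `LinkEntry` and `LinkLedger`, and the example URLs, are hypothetical. A real blockchain adds distributed consensus and digital signatures, which are left out here; the sketch only shows why hash chaining makes the catalog tamper-evident and how a simple search over metadata might look.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class LinkEntry:
    """One ledger record: a link to external data plus descriptive metadata."""
    url: str
    metadata: dict
    prev_hash: str
    entry_hash: str = ""

    def compute_hash(self) -> str:
        # Hash a canonical JSON encoding (sorted keys) so identical content
        # always yields the same SHA-256 digest.
        payload = json.dumps(
            {"url": self.url, "metadata": self.metadata, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LinkLedger:
    """Hypothetical append-only, hash-chained catalog of links.

    Single process only: no network, consensus, or digital signatures.
    """

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[LinkEntry] = []

    def append(self, url: str, metadata: dict) -> LinkEntry:
        # Each new entry commits to the hash of the entry before it.
        prev = self.entries[-1].entry_hash if self.entries else self.GENESIS
        entry = LinkEntry(url=url, metadata=metadata, prev_hash=prev)
        entry.entry_hash = entry.compute_hash()
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; an edited or reordered entry breaks the chain.
        prev = self.GENESIS
        for entry in self.entries:
            if entry.prev_hash != prev or entry.compute_hash() != entry.entry_hash:
                return False
            prev = entry.entry_hash
        return True

    def search(self, keyword: str) -> list[LinkEntry]:
        # Toy stand-in for a "frontend" search: case-insensitive match on metadata values.
        kw = keyword.lower()
        return [
            e for e in self.entries
            if any(kw in str(v).lower() for v in e.metadata.values())
        ]


if __name__ == "__main__":
    ledger = LinkLedger()
    ledger.append("https://example.org/lab-a/protocol.pdf",
                  {"type": "protocol", "title": "Kinase assay"})
    ledger.append("https://example.org/lab-a/raw-data.csv",
                  {"type": "data", "title": "Kinase assay raw data"})
    print(ledger.verify())                 # True
    print(len(ledger.search("kinase")))    # 2
    ledger.entries[0].metadata["title"] = "Edited after the fact"
    print(ledger.verify())                 # False
```

The sketch detects edited and reordered entries, but not the silent removal of the newest entries; on a real blockchain, replication across many independent nodes is part of what guards against that.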
So what’s the practical advantage of a blockchain? The blockchain is a secure, tamper-evident, and verifiable mechanism for storing metadata. Public or private labs, early-stage startups, and other private entities looking to commercialize their research can use this mechanism to report thorough due diligence on their basic science. We can rely on the blockchain as a data structure to maintain the integrity of the reported information. Developing novel modalities, or identifying and optimizing new scaffolds, often takes longer than planned, and biotech startups moving toward commercialization can use the blockchain to provide a reliable, consistent history of updates. These updates, and the platform behind them, can become a sophisticated system for providing public disclosures to potential investors or funding bodies. Moreover, startups often must already be vetted through partnerships before raising venture funding, and the blockchain can be used to partially automate the due diligence and review process. The reported methods, supplementary data, figures, and experimental design can be referenced on the blockchain and shared with scientists from the partnering firm, who can analyze them for evidence of reproducibility and for consistency with their own research standards. Blockchain-based histories and disclosure reviews could become the gold standard for vetting research labs and private entities before they raise venture funding.
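Storing only links leaves one gap: the blockchain can show that a link was not altered, but not that the file behind the link is still the one originally shared. A common remedy, added here as an assumption rather than taken from the proposal, is to record a cryptographic digest (here SHA-256) next to each link, so that reviewers at a partnering firm can check what they received. The `Disclosure` record and the `anchor` and `review` functions are hypothetical names in a minimal Python sketch, and the file contents are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    """Hypothetical on-chain record: where a file lives and its digest when anchored."""
    url: str
    sha256: str


def anchor(url: str, content: bytes) -> Disclosure:
    # The lab computes the digest of the exact bytes it is sharing.
    return Disclosure(url=url, sha256=hashlib.sha256(content).hexdigest())


def review(record: Disclosure, received: bytes) -> bool:
    # The partner-firm reviewer re-hashes what they downloaded from the link
    # and compares it with the digest recorded at anchoring time.
    return hashlib.sha256(received).hexdigest() == record.sha256


if __name__ == "__main__":
    # Placeholder text, not real experimental methods.
    methods = b"Placeholder methods section, version 1."
    record = anchor("https://example.org/startup-x/methods.txt", methods)
    print(review(record, methods))                                       # True
    print(review(record, methods.replace(b"version 1", b"version 2")))   # False
```

Because SHA-256 is collision-resistant, any change to the shared file, however small, almost certainly produces a different digest, so the reviewer's check fails.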
A comprehensive blockchain-based solution is beyond the scope of this article. However, a major source of innovation would be the applications built on top of the blockchain that rely on its verifiable data storage. One example is a decentralized reputation system, built on top of the blockchain, that rewards groups with high publication standards. The published documents could also become a primary source for journalists, who otherwise rely on press releases. I speculate that once more labs and biotech startups join such an effort, different tiers of review and reporting will emerge that benefit journalists and the general public. This mechanism could help well-funded startups and companies disclose their progress publicly, while stealth companies could appoint a reviewer to provide an executive summary of the review process. Guidelines based on the blockchain could reduce the likelihood of another disaster like Theranos by requiring reasonable peer review and scrutiny.
Validating basic science in an independent, public, and steadily improving manner demonstrates a research lab’s strong commitment to the quality of its preclinical research. Labs may need private funding to accelerate the development of drug candidates after obtaining promising animal-model data. Releasing updates publicly not only improves the research being done but also boosts private investors’ confidence in funding independently verified research. This commitment to reproducibility can become a cornerstone of public-private partnerships between pharmaceutical companies and academic research labs.
Carlota Perez, in her wonderful book Technological Revolutions and Financial Capital, describes technological revolutions as unfolding in two distinct phases: the installation phase, when a technology is introduced to the market and the required infrastructure is built, and the deployment phase, when the technology has been widely adopted and next-generation applications are built on top of it. An entire file system built on the blockchain, containing metadata descriptors that are easy to share, is an example of a scientific disclosure application deployed with the blockchain as a file-linking service. The blockchain is currently in its installation phase, and this article presents a use case for the coming deployment era.
- Begley, C.G. and Ellis, L.M., 2012. Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), pp.531-533.
- Collins, F.S. and Tabak, L.A., 2014. NIH plans to enhance reproducibility. Nature, 505(7485), p.612.
- Errington, T.M., Iorns, E., Gunn, W., Tan, F.E., Lomax, J. and Nosek, B.A., 2014. An open investigation of the reproducibility of cancer biology research. eLife, 3, p.e04333.
- Freedman, L.P., Cockburn, I.M. and Simcoe, T.S., 2015. The economics of reproducibility in preclinical research. PLoS Biol, 13(6), p.e1002165.
- Ioannidis, J.P., 2005. Why most published research findings are false. PLoS Med, 2(8), p.e124.
- Mourouzis, I. and Pantos, C., 2015. The Biomedical Data Journal in the New Era of Open Access to Scientific Information. Biomed Data J, 1(1).