Clinical trial data sharing has the potential to accelerate scientific progress and, ultimately, to improve public health. Effective data sharing reduces duplication of effort, may reduce patient exposure to unnecessary drugs and interventions, and helps shape the design of trials in the development phase. Most importantly, sharing clinical trial data is an ethical imperative: study participants put themselves at risk to contribute to science, and if their data is not shared, their contributions are not adequately leveraged for future science.Reference Bauchner1 In 2015, the Institute of Medicine released a consensus study report that laid a solid foundation for the ecosystem of data sharing policies and platforms.Reference Mello2 Yet researchers are only beginning to leverage the benefits of secondary analyses and of aggregating clinical trial data to generate new insights.
Trial data shared through mechanisms like Vivli, a global data sharing platform launched in 2018,3 are a rich resource that could help the research community accelerate the understanding, investigation, and validation of biomarkers. In what follows, we outline how we see data sharing platforms playing this role. Ultimately, our vision is that as adoption and use of clinical trial data sharing platforms become more commonplace, broader sharing of trial data will drive more rigorous validation of biomarkers and, potentially, greater biomarker adoption.
What Are Data Sharing Platforms? Why Are They Needed?
Historically, researchers have at times struggled to make their data freely available. The reasons include fear of competition from other researchers, the lack of a user-friendly mechanism for sharing data, and resource constraints. To fill this gap, a number of platforms were launched to encourage sharing of clinical trial data and to extract greater value from it. The existing platforms vary in approach and construction: some are website portals that list available trials; others are data repositories with the underlying infrastructure to collect, track, deposit, and archive data for long-term storage. Some platforms also include sophisticated analytics or provide a secure virtual research environment within which all research must be conducted.
Data sharing platforms have also diverged in how they provide access. Some platforms are considered “open access,” meaning that the data can be downloaded simply upon signing a legal agreement. Others are considered “managed access,” meaning that secondary research teams may need to meet additional requirements to gain access (e.g., submitting a specific scientific hypothesis or showing that the team has the expertise needed to analyze the data properly).
Vivli is one of these new clinical trial data sharing platforms. Built on Microsoft's Azure Cloud, Vivli's mission is to facilitate scientific sharing and reuse of clinical research data, allowing biomedical researchers to safely and securely combine individual participant-level data (IPD) from completed clinical trials. Vivli's platform includes a data repository, a robust search engine, and a secure research environment.
As with other platforms that have emerged to enable safe and effective data sharing, Vivli seeks to balance data security with maximizing the utility of completed trials, in order to best honor the contributions of trial participants. As the types of analysis enabled by Vivli and similar platforms continue to evolve, researchers will be better able to detect nuances in the data that may not have been evident initially, including nuances that bear on the use and understanding of biomarkers.
Benefits From Sharing and Accessing Biomarker Data Across Studies
Data sharing platforms such as Vivli,Reference Bierer4 ClinicalStudyDataRequest.com (CSDR),Reference Rockhold, Nisen and Freeman5 and the Yale University Open Data Access (YODA) ProjectReference Ross6 differ in several important ways from government-run trial registries such as ClinicalTrials.gov and the EudraCT database. These large government registries promote trial registration and results disclosure and contain a wealth of information about clinical trials. However, they contain only aggregate, or summary-level, data.Reference Zarin and Tse7 Such summary-level data can be useful for meta-analyses that pool population-level effects across a group of trials, but many important clinical and public health questions can be answered only with IPD.
For example, IPD allows an analyst to standardize or redefine outcomes across studies, check analysis assumptions (e.g., normality), explore covariates and treatment-covariate interactions, conduct new subgroup analyses, model prognostic or diagnostic data, or model heterogeneity within and between studies.Reference Debray, Moons and van Valkenhoef8 IPD is particularly valuable for aggregating studies with time-to-event outcomes such as survival. Nevitt et al. report an average of 105 IPD meta-analyses published annually from 2009 to 2015, an increase over prior years,Reference Nevitt9 reflecting growing recognition of the value of IPD over summary-level data for these types of secondary analysis.
Sharing IPD is particularly beneficial for biomarker discovery and validation. As shown in Figure 1, if trials begin to incorporate a new biomarker and the IPD from these studies are shared, pooled analyses across the trials can help validate the biomarker's applicability across broader clinical circumstances. The biomarker can also be refined further: it may turn out to apply only to certain subgroups or to be modulated by covariates, and more robust prediction models can be built. If the biomarker is widely validated and shown to be robust and useful, whether for regulatory or clinical endpoints, it can even become a Common Data Element (CDE), a common term or concept used in clinical research to refer to a standardized study variable, which disease-specific experts agree should be part of any data collection effort in their domain.
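To make the contrast with summary-level data concrete, the following minimal Python sketch shows a two-stage IPD meta-analysis under simplifying assumptions. Each hypothetical trial supplies participant rows of (arm, baseline, follow-up); the outcome is redefined identically in every trial as change from baseline, which summary tables often do not permit; each trial's mean difference is estimated; and the estimates are pooled with a fixed-effect inverse-variance model together with the I² heterogeneity statistic. The data layout, values, and function names are illustrative only, and a real analysis would typically also consider random-effects models and covariate adjustment.

```python
import math
from statistics import mean, variance

# Hypothetical IPD rows: (arm, baseline, follow_up). Shared datasets use
# sponsor-specific layouts that would first need to be mapped to this shape.
Record = tuple[str, float, float]


def change_from_baseline(records: list[Record], arm: str) -> list[float]:
    """Redefine the outcome identically in every trial: follow-up minus baseline."""
    return [follow_up - baseline for a, baseline, follow_up in records if a == arm]


def trial_effect(records: list[Record]) -> tuple[float, float]:
    """Stage 1: mean difference (treated minus control) and its standard error."""
    treated = change_from_baseline(records, "treated")
    control = change_from_baseline(records, "control")
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, se


def pool_fixed_effect(effects: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Stage 2: inverse-variance fixed-effect estimate, its SE, and I^2 (percent)."""
    weights = [1 / se**2 for _, se in effects]
    total = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / total
    # Cochran's Q and I^2 quantify between-trial heterogeneity.
    q = sum(w * (d - pooled) ** 2 for w, (d, _) in zip(weights, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, math.sqrt(1 / total), i2


# Two hypothetical trials whose raw measurements sit on different baselines.
trial_a = [("treated", 10, 9), ("treated", 10, 8), ("treated", 10, 7),
           ("control", 10, 10), ("control", 10, 11), ("control", 10, 9)]
trial_b = [("treated", 12, 11), ("treated", 12, 10), ("treated", 12, 9),
           ("control", 12, 12), ("control", 12, 13), ("control", 12, 11)]
pooled, pooled_se, i2 = pool_fixed_effect([trial_effect(trial_a), trial_effect(trial_b)])
```

Because both hypothetical trials show the same change-from-baseline effect despite different baselines, the pooled estimate is -2 with no measured heterogeneity; with summary tables reporting only raw follow-up means, that harmonized outcome could not be reconstructed.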

Figure 1. Vision of Virtuous Cycle of Sharing and Advancement of Biomarker Science Using the Vivli Platform
At present, there are many CDEs that are not necessarily well-validated biomarkers. The NIH CDE resource portal, for example, contains over 26,000 CDEs.10 Unfortunately, very few of these CDEs are routinely used, in part because there is little systematic evidence for (or against) them. However, if a biomarker CDE has been fully and transparently validated and refined using shared IPD from multiple studies, then uptake may be higher, especially if the community norm in that disease area supports that biomarker CDE.
Once recognized and used as a CDE, a biomarker would be more likely to be routinely incorporated as an outcome in future clinical trial protocols. As these trials are completed, their IPD may be shared externally through data sharing platforms, at which point researchers accessing the data benefit from those studies' use of biomarker CDEs. Datasets that incorporate CDEs are much more easily harmonized and integrated, since the explicit aim of CDEs is to enable the combination of data across studies and to improve data integrity. Consider HbA1c levels and CD4 counts, two surrogate biomarkers that are widely used to direct patient care toward improved outcomes in diabetes and HIV/AIDS, respectively: these established biomarkers can be precisely measured and are widely adopted. Trials that include these measurements as outcomes are easier to integrate and compare than, for example, a diabetes trial that does not use HbA1c levels as an outcome.
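As a small illustration of why a shared data element eases harmonization, the sketch below maps two hypothetical sponsor variables, one reporting HbA1c in NGSP percent and one in IFCC mmol/mol, onto a single element expressed in NGSP percent. The variable names, unit labels, and record layout are invented for the example and are not real CDE identifiers; the unit conversion uses the NGSP/IFCC master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152. When trials already collect the same element in the same unit, this mapping step becomes trivial.

```python
# Hypothetical unit labels and sponsor variable names, for illustration only;
# they are not real CDE identifiers or any sponsor's actual data dictionary.
NGSP_PERCENT = "ngsp_percent"
IFCC_MMOL_MOL = "ifcc_mmol_per_mol"

SPONSOR_VARIABLES = {
    "HBA1C_PCT": NGSP_PERCENT,    # sponsor A reports HbA1c in percent
    "hba1c_mmol": IFCC_MMOL_MOL,  # sponsor B reports HbA1c in mmol/mol
}


def to_ngsp_percent(value: float, unit: str) -> float:
    """Convert an HbA1c value to NGSP percent."""
    if unit == NGSP_PERCENT:
        return value
    if unit == IFCC_MMOL_MOL:
        # NGSP/IFCC master equation: NGSP (%) = 0.09148 * IFCC (mmol/mol) + 2.152
        return 0.09148 * value + 2.152
    raise ValueError(f"unknown unit: {unit}")


def harmonize(record: dict) -> tuple[str, float]:
    """Map one sponsor record to (participant id, HbA1c in NGSP percent)."""
    for variable, unit in SPONSOR_VARIABLES.items():
        if variable in record:
            return record["subject_id"], round(to_ngsp_percent(record[variable], unit), 1)
    raise KeyError("record has no recognized HbA1c variable")


harmonized = [
    harmonize({"subject_id": "A-001", "HBA1C_PCT": 7.0}),
    harmonize({"subject_id": "B-017", "hba1c_mmol": 53}),
]
```

Both hypothetical participants end up with the same harmonized value of 7.0%, even though one sponsor recorded 53 mmol/mol.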
The ability to combine trial data from different platforms or repositories is still at a nascent stage. Yet bridging platforms to combine data is critically important, because the data required for an analysis often does not reside in a single repository or platform. Governance rules or security concerns frequently prevent clinical trial data from leaving the confines of a particular platform, so bridging or connecting platforms is a potential way to integrate disparate datasets.
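One generic way to combine data that cannot leave its host platform is to move summaries rather than participant rows. The sketch below assumes that each platform will run a small, agreed-upon computation and release only a count, a mean, and a sum of squared deviations for one variable; it then merges those summaries exactly using the parallel variance update of Chan and colleagues. This illustrates the general idea only and is not a description of how Vivli or any other platform actually bridges repositories; a real deployment would also need governance approval and disclosure controls on what may be released.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    n: int        # number of participants contributing a value
    mean: float   # mean of the variable on this platform
    m2: float     # sum of squared deviations from that mean


def summarize_locally(values: list[float]) -> Summary:
    """Runs inside a single platform; only the returned Summary is released."""
    n = len(values)
    m = sum(values) / n
    return Summary(n, m, sum((v - m) ** 2 for v in values))


def combine(a: Summary, b: Summary) -> Summary:
    """Merge two summaries exactly (parallel variance update)."""
    n = a.n + b.n
    delta = b.mean - a.mean
    return Summary(
        n=n,
        mean=a.mean + delta * b.n / n,
        m2=a.m2 + b.m2 + delta**2 * a.n * b.n / n,
    )


# Hypothetical biomarker values held on two separate platforms.
platform_summaries = [summarize_locally([1.0, 2.0, 3.0]),
                      summarize_locally([4.0, 5.0, 6.0, 7.0])]
combined = reduce(combine, platform_summaries)
combined_variance = combined.m2 / (combined.n - 1)  # sample variance of all values
```

The merged mean and variance match what would be computed if all seven values sat in one place, even though no individual value left its platform.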
Vivli was created with these harmonization challenges in mind. The Vivli platform therefore includes (1) a robust centralized search engine, driven by detailed metadata, for locating relevant datasets across platforms and repositories; (2) harmonized data sharing and use agreements that can serve a broad set of stakeholders; and (3) secure hosting and long-term archiving of data for stakeholders who may lack the resources to host and store the data themselves. Although biomarker data is just one of the data types that Vivli hosts and makes available for secondary research, we believe these features make the platform well suited to facilitating biomarker research. Moreover, future platform development is envisioned to allow Vivli users to share and request genomics and imaging data collected in trials, which could further enhance researchers' ability to integrate certain classes of biomarkers.
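A deliberately simplified sketch can show what a metadata-driven search across repositories involves: each trial is described by a small metadata record, and a query filters those records by condition and outcome regardless of which repository holds the underlying data. The schema, identifiers, and substring-matching rule below are hypothetical and are not Vivli's actual metadata model or search interface; a production search engine would more likely rely on controlled vocabularies than on substring matches.

```python
from dataclasses import dataclass, field


@dataclass
class TrialMetadata:
    trial_id: str      # hypothetical identifier
    repository: str    # platform or repository that holds the IPD
    condition: str
    outcomes: list[str] = field(default_factory=list)


def search(catalog: list[TrialMetadata], condition: str, outcome: str) -> list[TrialMetadata]:
    """Case-insensitive substring match on condition and on any listed outcome."""
    cond, out = condition.casefold(), outcome.casefold()
    return [
        t for t in catalog
        if cond in t.condition.casefold()
        and any(out in o.casefold() for o in t.outcomes)
    ]


# Hypothetical catalog spanning two repositories.
catalog = [
    TrialMetadata("TRIAL-001", "repository-a", "Breast cancer", ["Overall survival", "Tumor response"]),
    TrialMetadata("TRIAL-002", "repository-b", "Metastatic breast cancer", ["Objective tumor response"]),
    TrialMetadata("TRIAL-003", "repository-a", "Type 1 diabetes", ["HbA1c"]),
]
hits = search(catalog, "breast cancer", "tumor response")
```

The query returns the two breast cancer trials from different repositories in a single result list, which is the practical point of searching metadata centrally.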
Accelerating Biomarker Research
How do data sharing platforms add value to accelerate or advance the study of biomarkers? Consider a hypothetical case: an oncologist wishes to evaluate the significance of tumor response as a biomarker for patients with breast cancer. Through Vivli, the oncologist can search for breast cancer trials that report this outcome, submit a proposal for the data, and, once the proposal is approved, access the data through Vivli's secure research environment. The oncologist can then analyze the IPD from these trials, conducting a rigorous analysis with both trial-level data and IPD to evaluate whether tumor response is a valid prognostic marker, a trial-level surrogate, or both.
Before data sharing platforms like Vivli were available, secondary researchers needed considerable time and effort to access the IPD required to answer these kinds of questions about biomarkers. They either had to have privileged access to IPD from a set of trials or had to contact trial investigators or sponsors individually. Data sharing platforms can thus offer new, and far more efficient, opportunities for researchers to explore biomarker data across multiple trials and sponsors. This increased opportunity to reuse individual participant data correspondingly increases the value of participants' contributions to science and thereby promotes a more ethical research enterprise.
The power of IPD platforms for biomarker research is not merely hypothetical. The large-scale, multi-stakeholder Accelerating Medicines Partnership Parkinson's Disease (AMP PD), a coalition formed by the NIH and composed of pharmaceutical companies, the FDA, and the Michael J. Fox Foundation, was launched in January 2018 to give the Parkinson's disease community a knowledge portal of shared individual patient data. This rich resource will enable Parkinson's disease researchers to drive biomarker discovery and validate therapeutic targets.Reference Bakkar, Boehringer and Bowser11 The oncology community has also capitalized on data sharing platforms through mechanisms such as CancerLinQ, a platform implemented by ASCO that contains real-world data from oncology patients,Reference Sledge, Miller and Hauser12 and Project Data Sphere,Reference Green13 a data sharing platform focused on oncology data.
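The trial-level half of such an evaluation can be sketched briefly. Assuming each pooled trial yields a treatment effect on the candidate surrogate (for example, a log odds ratio for tumor response) and on the clinical endpoint (for example, a log hazard ratio for overall survival), the sketch below computes an unweighted R² from regressing endpoint effects on surrogate effects; values near 1 suggest that the effect on the surrogate predicts the effect on the endpoint across trials. This is a simplification of the meta-analytic surrogate-evaluation framework, which typically weights trials by size and fits bivariate models, and the individual-level (prognostic) association would be assessed separately within trials using the IPD. The function name and effect values are hypothetical.

```python
from statistics import mean


def trial_level_r2(surrogate_effects: list[float], endpoint_effects: list[float]) -> float:
    """Unweighted R^2 of endpoint effects regressed on surrogate effects, one pair per trial."""
    if len(surrogate_effects) != len(endpoint_effects) or len(surrogate_effects) < 3:
        raise ValueError("need at least three trials with paired effects")
    mx, my = mean(surrogate_effects), mean(endpoint_effects)
    sxx = sum((x - mx) ** 2 for x in surrogate_effects)
    syy = sum((y - my) ** 2 for y in endpoint_effects)
    if sxx == 0 or syy == 0:
        raise ValueError("effects must vary across trials")
    sxy = sum((x - mx) * (y - my) for x, y in zip(surrogate_effects, endpoint_effects))
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return sxy**2 / (sxx * syy)


# Hypothetical per-trial effects: log odds ratio (tumor response), log hazard ratio (survival).
surrogate = [0.20, 0.45, 0.70, 0.90, 1.10]
endpoint = [-0.05, -0.15, -0.25, -0.30, -0.40]
r2 = trial_level_r2(surrogate, endpoint)
```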
Digital Biomarkers
An exciting new area for clinical trial innovation is the development of digital biomarkers, defined colloquially as physiological and behavioral data collected via digital sensors and other tools that explain, influence, or predict health-related outcomes. This broader definition differs from the one used in the regulatory arena for pharmaceuticals, where a biomarker is defined as an objective, quantifiable characteristic of biological processes and explicitly not a measure of how a person “feels, functions, or survives” (which are clinical endpoints).
Although the science of digital biomarkers is still in its infancy, data sharing is already showing value in this area. For example, the Parkinson's Disease Digital Biomarker DREAM Challenge is benchmarking sensor data from accelerometers and gyroscopes to develop digital biomarker signatures of Parkinsonian gait, tremor, and other movements for diagnosis or disease monitoring. We believe that data sharing can accelerate the discovery, validation, and adoption of these biomarkers, just as it can for the biomarkers described above.
Another example is the emergence of novel diabetes biomarkers made possible by the increasing availability of continuous glucose monitors (CGMs). Rather than being limited to average blood sugar as reflected in HbA1c, CGM data allows measurement of “time in target (blood sugar) range,” which is being considered as a new biomarker. The type 1 diabetes community is very active, and initiatives such as the Tidepool Big Data Donation Project aim to collect de-identified CGM data for this and other biomarker development. As clinical trials collect more physiological and behavioral data using digital tools, the opportunities for, and value of, data sharing in digital biomarker development are substantial.
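To give a flavor of how raw sensor streams become candidate digital biomarker features, the sketch below estimates the dominant oscillation frequency of an accelerometer trace with a direct discrete Fourier transform. Tremor frequency is one commonly studied movement feature, and Parkinsonian rest tremor is often described as falling roughly in the 4 to 6 Hz range. The signal here is synthetic, and the function is a hypothetical, deliberately naive O(n²) illustration rather than the method used in the DREAM Challenge; practical pipelines would use an FFT library, windowing, and validated feature definitions.

```python
import cmath
import math


def dominant_frequency(signal: list[float], sample_rate_hz: float) -> float:
    """Frequency (Hz) of the strongest non-DC component, via a direct DFT."""
    n = len(signal)
    mu = sum(signal) / n
    centered = [x - mu for x in signal]  # remove the constant (e.g., gravity) offset
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive frequency bins up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n  # bin spacing is sample_rate / n


# Synthetic accelerometer trace: a 5 Hz oscillation on a gravity offset,
# sampled at 50 Hz for 4 seconds (200 samples).
fs = 50.0
trace = [0.3 * math.sin(2 * math.pi * 5.0 * t / fs) + 9.81 for t in range(200)]
peak_hz = dominant_frequency(trace, fs)
```

With 200 samples at 50 Hz the frequency bins are 0.25 Hz apart, so the synthetic 5 Hz oscillation lands exactly on one bin and is recovered as the peak.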
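The difference between an average and a time-based measure can be shown in a few lines. The sketch below computes time in range from a list of CGM readings using the 70 to 180 mg/dL target range adopted in international consensus recommendations. The readings are hypothetical, and the function assumes evenly spaced readings with no sensor gaps, an assumption that real CGM analyses must check.

```python
from statistics import mean


def time_in_range(glucose_mg_dl: list[float], low: float = 70, high: float = 180) -> float:
    """Percentage of CGM readings within [low, high] mg/dL.

    Assumes evenly spaced readings with no gaps, so each reading represents
    an equal slice of time.
    """
    if not glucose_mg_dl:
        raise ValueError("no CGM readings")
    in_range = sum(1 for g in glucose_mg_dl if low <= g <= high)
    return 100 * in_range / len(glucose_mg_dl)


# Two hypothetical patients with the same average glucose but very different control.
steady = [140, 135, 145, 140]
swinging = [60, 220, 60, 220]
for name, readings in [("steady", steady), ("swinging", swinging)]:
    print(name, mean(readings), time_in_range(readings))
```

Both hypothetical patients average 140 mg/dL, yet the first spends all of the recorded time in range and the second none of it, a distinction an average-based measure cannot capture.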
Conclusion
We believe we are only at the beginning of using data sharing platforms for biomarker research. Platforms that integrate multiple data sources that are currently siloed have the potential to greatly enrich the biomarker knowledge base, aggregating and making accessible rich data streams that only a few years ago would have been unwieldy, if not impossible, to access. In our view, the future of biomarker discovery and rigorous validation therefore looks highly promising.