Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine. Discussion Framework for Clinical Trial Data Sharing: Guiding Principles, Elements, and Activities. Washington (DC): National Academies Press (US); 2014 Jan 22.

Discussion Framework for Clinical Trial Data Sharing: Guiding Principles, Elements, and Activities.

DATA SHARING ELEMENTS AND ACTIVITIES

The following sections define key elements of data and data sharing activities, such as the type(s) of data to be shared; provider(s) and recipient(s) of shared data; and whether and when data are disclosed publicly, with or without restrictions, or exchanged privately among parties. This section then describes a selected set of data sharing activities. The purpose of outlining potential data sharing activities is to provide a heuristic approach to organizing the work of the committee throughout the course of the study, including information gathering and discussions in public sessions. In its final report, the committee will present findings on the benefits, risks, and burdens associated with each of these data sharing activities and suggest strategies and practical approaches to facilitate responsible data sharing. The data sharing activities noted in this discussion should not be interpreted as a conceptual framework that would necessarily lead to particular findings, conclusions, or recommendations.

What Types of Data Could Be Shared

Current Practices in Data Disclosure

During the course of a clinical trial, different types of data are collected, transformed into analyzable datasets to address specific research questions, and used to generate various publications and reports for different audiences (Drazen, 2002) (see Figure 1).

FIGURE 1. Data flow from participant to analyzed data and reporting. NOTE: See Appendix C for more detail about each element of the figure.

Publication in peer-reviewed scientific journals is currently the primary method for sharing clinical trial data with the scientific and medical communities, as well as the public (often through media coverage of published findings). These publications, however, contain only a small subset of the data collected, produced, and analyzed in the course of a trial (Doshi et al., 2013; Zarin, 2013). Scientific journal articles generally contain a brief summary of the trial background, research question(s), methodology, results, figures and tables, and discussion.

Clinical trial sponsors seeking regulatory approval from authorities such as the FDA and the EMA must submit detailed CSRs (discussed below) and IPD as required, which together form the basis of the marketing application for a product. In trials that are not conducted for regulatory approval of a product, detailed CSRs may or may not be prepared (Doshi et al., 2012; Teden, 2013).

Beyond the selected clinical trial data that are disclosed in journal publications, IPD and more detailed clinical datasets have not been routinely shared with the broader scientific community or the public. Some sponsors in both industry and academia have shared IPD and summary data reports upon request and on a case-by-case basis (Rathi et al., 2012). There are a number of initiatives at the NIH to share clinical trial data from the time of publication (Immune Tolerance Network, 2013; NHLBI, 2007). Recently, proposals have been put forth for more proactive sharing of both CSRs and IPD, and several plans have been announced or implemented (EMA, 2013; Krumholz and Ross, 2011; Kuntz, 2013; Loder, 2013; Nisen and Rockhold, 2013; PhRMA and EFPIA, 2013; YODA, 2013).

Many discussions of clinical trial data sharing thus far have not been specific regarding which of many possible clinical trial data elements or datasets might be shared. This framework articulates more specific definitions and descriptions of data that might be shared to help facilitate more focused discussions among the public and various stakeholders.

Data

The committee has identified various types of clinical trial data (containing differing levels of detail) that might be included in a data sharing activity (see Figure 1).

Raw data

Sometimes called source data, raw data are observations about individual participants. These data might be collected specifically for the study protocol, or collected as part of routine care and then used by the investigators. At the source, data might be in the form of measurements of participant characteristics such as weight, blood pressure, or heart rate, and can be associated with the baseline (or initial) visit or subsequent follow-up visits. Raw data might also include a baseline description of the participant's medical history, physical exam information, clinical laboratory results (e.g., serum lipid values, hemoglobin levels), whole exome or genome sequences, or imaging (e.g., X-ray, magnetic resonance imaging). Depending on the trial, demographics, clinical data, and other appropriate raw source information are entered into case report forms. Some data must be abstracted and/or interpreted for the purposes of the protocol, for example, reading the X-ray for tumor size or evaluating the electrocardiogram for evidence of a heart attack. Data might also include assessments by clinical study staff or adjudication committees to determine whether specific clinical end points or adverse events in the participant's profile (e.g., heart attack, death) meet protocol-specified criteria. In addition to “traditional” clinical trial data, other types of health data are increasingly being collected, including self-reported measures (e.g., quality of life), quantified sensor data (e.g., readings from remote monitoring devices, including smartphone apps and geolocation data), consumer genomics data (e.g., from companies like 23andMe), and community-level self-reported data (e.g., from sites like PatientsLikeMe).

Data entry into the database

After collection (and assessment, abstraction, or adjudication as appropriate), source data typically must be entered into an organized data management system (i.e., database) for further evaluation and processing. Data typically undergo a process of cleaning, quality assurance, and quality control to detect inconsistent, incomplete, or inaccurate entries, and to confirm that the data were collected and evaluated according to the protocol and that they match the source data. This process continues throughout the course of the trial as data are collected.
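The checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the plausibility ranges, and the `check_entry` helper are hypothetical, and a real trial would take its rules from the protocol and its data management plan.

```python
from typing import Optional

# Hypothetical plausibility ranges; a real trial would define these in its protocol.
PLAUSIBLE_RANGES = {
    "weight_kg": (20.0, 300.0),
    "systolic_bp_mmhg": (50.0, 260.0),
    "heart_rate_bpm": (20.0, 250.0),
}


def check_entry(entry: dict, source: Optional[dict] = None) -> list:
    """Return a list of problems found in one database entry."""
    problems = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = entry.get(field)
        if value is None:
            # Incomplete entry: the field was never filled in.
            problems.append(f"{field}: missing")
            continue
        if not (low <= value <= high):
            # Likely inaccurate entry: value falls outside the plausible range.
            problems.append(f"{field}: {value} outside {low}-{high}")
        if source is not None and source.get(field) != value:
            # Inconsistent entry: database value does not match the source record.
            problems.append(f"{field}: entered {value}, source has {source.get(field)}")
    return problems


# Example: a keying error (720 instead of 72) and a field that was never entered.
entry = {"weight_kg": 720.0, "systolic_bp_mmhg": 128.0}
source = {"weight_kg": 72.0, "systolic_bp_mmhg": 128.0, "heart_rate_bpm": 64.0}
for problem in check_entry(entry, source):
    print(problem)
```

Because cleaning continues throughout the trial, a check of this kind would typically be rerun each time new entries arrive rather than once at the end.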

Analyzable dataset

Typically, after data are entered in computerized form, new variables are mathematically generated to serve as the basis for later analyses. These variables are sometimes called “derived” variables. For example, patient age might not be entered directly, but calculated by subtracting the birthdate from the date of a given clinic visit. “Treatment response” could be derived as a mathematical comparison of lesion sizes recorded from two images. After the trial is declared complete, the edited and cleaned data are moved into an analyzable data file and locked (i.e., no further changes may be made). If the study is blinded (or masked), then after the database is locked, the treatment code file is merged with the analyzable data file, and the data are unblinded to the investigators. Some or all of this now-unblinded analyzable dataset will then be used for data analyses.
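As a minimal sketch of the steps in this paragraph, the following Python computes two derived variables and then merges a treatment code file into a locked dataset. The function names, the record layout, and the choice of percent change as the "mathematical comparison" of lesion sizes are all assumptions made for illustration; the framework does not prescribe any of them.

```python
from datetime import date


def age_at_visit(birthdate: date, visit: date) -> int:
    """Derived variable: age in completed years on the visit date."""
    had_birthday = (visit.month, visit.day) >= (birthdate.month, birthdate.day)
    return visit.year - birthdate.year - (0 if had_birthday else 1)


def percent_change(baseline_mm: float, followup_mm: float) -> float:
    """Derived variable: relative change in lesion size between two images."""
    return 100.0 * (followup_mm - baseline_mm) / baseline_mm


def unblind(locked_rows: list, treatment_codes: dict) -> list:
    """Merge the treatment code file into the locked dataset by participant ID.

    New rows are built so the locked rows themselves are left unchanged.
    """
    return [{**row, "arm": treatment_codes[row["participant_id"]]} for row in locked_rows]


# Hypothetical locked analyzable dataset with one participant.
locked = [
    {
        "participant_id": "P001",
        "age": age_at_visit(date(1960, 6, 15), date(2013, 3, 1)),
        "lesion_change_pct": percent_change(40.0, 30.0),
    }
]
codes = {"P001": "intervention"}  # hypothetical treatment code file
unblinded = unblind(locked, codes)
print(unblinded)
```

Keeping the treatment codes in a separate mapping until the final merge mirrors the blinding described above: nothing in `locked` reveals the assigned arm.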

A statistical analysis plan (SAP) is finalized before the trial is completed and unblinded; the SAP drives the initial analyses of the analyzable dataset. The trial protocol should contain both a basic SAP and a more detailed SAP that includes the analyses and any interim analyses to be conducted, as well as the statistical methods that will be used, as determined by the protocol. The full SAP includes, for example, plans for analysis of baseline descriptive data, adherence to the intervention, primary and secondary outcomes, definition of adverse events and serious adverse events, as well as the comparison of these measures across interventions for pre-specified subgroups. The analysis may be very extensive and can result in several dozen tables and figures.

For many clinical trials, the SAP-defined analysis might not use all of the data available in the analyzable dataset. Moreover, peer-reviewed journal publications of clinical trials generally draw on only part of the analyzable dataset. Supplemental data are often collected to permit exploration of ancillary questions not directly related to the primary purpose of the protocol, and researchers might conduct exploratory and post hoc analysis not defined in the SAP to answer additional questions.

Reports Generated from Data

Box 3 lists some of the types of reports that are commonly generated from the analyzable dataset. In addition to preparing peer-reviewed journal publications describing the primary and major secondary outcomes specified in the protocol, trial sponsors prepare various additional reports, including results summaries for registries, lay summaries, and clinical study reports.

BOX 3. Types of Summary Reports

  • Lay-language summary
  • Registry results summary (e.g., for ClinicalTrials.gov)

Publications

Several scientific journal publications are commonly derived from the analyses driven by the SAP and from post hoc analyses. Typically, a primary publication will address the primary and possibly the leading secondary outcome measures specified in the protocol. The primary publication would also include the baseline measures to demonstrate participant comparability between intervention arms and comparisons of any adverse events of major interest or frequency. Subsequent journal publications might address in more detail a specific aspect of the primary analysis that was not included in the primary publication or analyze outcomes in particular prespecified subgroups of participants. Ideally, each journal publication should have a specific dataset corresponding exclusively to the data used to generate the tables and figures in the publication (which would be a subset of the full analyzable dataset). Each specific dataset is typically stored in separate sets of data files to document the data used for each journal publication.

Registry results summary and lay-language summary

Many clinical trials are subject to a requirement that their results be reported to one or more registries, in formats specified by the particular registry (for example, results of trials of FDA-regulated products must be reported to ClinicalTrials.gov in the United States). These summaries are publicly available on the registry website and are generally limited to major outcomes and adverse events.

A lay-language summary is a brief, nontechnical overview written for the general public and trial participants. Lay summaries of the clinical trial protocol are often required by institutional review boards (IRBs) to assist nonscientist members of the IRB in the protocol review and approval process; however, the preparation of lay-language summaries of clinical trial results is uncommon. A recent study suggests that trial participants would value such summaries and that provision of lay-language results summaries to participants is feasible (Getz et al., 2012).

Clinical Study Report (CSR)

When a clinical trial is submitted to regulatory agencies as part of an application for marketing approval of an intervention or approval of a new indication, trial sponsors usually submit a detailed CSR. Specifications for CSRs were defined by the ICH and adopted by the FDA, the EMA, and the Japanese Ministry of Health, Labor, and Welfare in an effort to simplify the application process for new interventions globally (ICH, 1995). According to the FDA guidance, the CSR is an

integrated full report of an individual study of any therapeutic, prophylactic or diagnostic agent … conducted in patients. The clinical and statistical description, presentation, and analyses are integrated into a single report incorporating tables and figures into the main text of the report or at the end of the text, with appendices containing such information as the protocol, sample case report forms, investigator-related information, information related to the test drugs/investigational products including active control/comparators, technical statistical documentation, related publications, patient data listings, and technical statistical details such as derivations, computations, analyses, and computer output. (FDA, 1996, p. 1)

Although a CSR contains mainly summary data and summary tables and graphs, it also usually contains considerable additional information (often thousands of pages), including, as described in the definition above, numerous large appendixes. Supplemental information can include detailed narratives describing individual participants. In some instances, the CSR and/or its appendixes might include identifiable participant or commercially confidential information or other protected health information or intellectual property. A CSR synopsis (i.e., executive summary) is sometimes drafted to accompany a full CSR. Some of the supporting clinical trials included in a regulatory submission do not directly contribute to the evaluation of effectiveness of the intervention; for these studies, sponsors may be permitted to submit an abbreviated CSR (FDA, 1999). Some CSRs may also be redacted before they are shared to remove any commercially confidential information and personally identifiable information.

Metadata and Additional Documentation

For researchers to make use of clinical trial data that are shared with them (e.g., to perform confirmatory analyses or carry out exploratory analyses), they need further information or metadata (i.e., “data about the data”) in addition to the data elements or datasets described above. Box 4 summarizes some of the metadata that might be needed to facilitate full use of shared data. Supporting documentation critical to interpretation of shared clinical trial data includes the full protocol, manual of operations, consent form, case report forms, and the SAP.

BOX 4. Metadata and Additional Documentation to Support the Use of Shared Clinical Trial Data

  • Clinical trial registration number and dataset (available through ClinicalTrials.gov and other registries)
  • Full protocol (e.g., all outcomes, study structure), including (more...)

The trial protocol describes the trial rationale; the eligibility and exclusion criteria for participants; the primary and secondary hypotheses and the corresponding primary and secondary outcome measures; the methods used to gather and adjudicate adverse events; other measures intended to evaluate the intervention; and a full description of the intervention and how it is administered. While a trial protocol provides the overall experimental design, a detailed manual of operations describes how the trial was conducted. A copy of the template for the informed consent form describes what participants agreed to, what hypotheses were included, and the additional purposes for which their data might be used. Case report forms capture precisely what measures were made, and at what time points during the trial, as defined in the protocol. The SAP sets out how each data element was analyzed, what specific statistical method was used for each analysis, and how adjustments were made for testing multiple variables. If some analysis methods required critical assumptions, data users would need to understand how those assumptions were verified. For key analyses, full use of the shared data would be aided by providing the computer software and version used, as well as the statistical programming code for the statistical software used for each analysis.

Summaries and reports (e.g., publications from the trial, lay-language summary of trial results, trial registry results summary, CSR synopsis) would also help shared-data recipients understand and make the most efficient use of shared data.

There are variations in how these data elements are defined and in the terminology used to describe them. In its future deliberations, the committee will seek clarity and consistency in use of the terms. The final report will include discussion of how the information in the analyzable dataset, CSR, and IPD differ and which types of analyses, either confirmatory or exploratory, require which level of data sharing.

Who Are the Providers of Shared Data

Data are generated at almost every step in the clinical research process, from the initial collection of baseline participant data to the analysis of the analyzable dataset. Different individuals or organizations hold or control the data at different times during the course of the trial (e.g., laboratory technicians, investigators, database administrators, statisticians, DSMBs, sponsors, and regulatory agencies). A data holder may or may not have legal authority to share the data with others. In contrast, an individual or organization with the authority to share the data might not have physical possession of the data at a particular time. Thus, responsibility for providing data for sharing might need to be coordinated among several entities. Potential entities that are likely to be data providers include (but are not limited to) the following:

  • Individual participants in a clinical trial (the initial “providers” of data to researchers). Some participants might hold data to the extent that they self-generate and transmit the data (from self-quantifying devices), retain copies of the data, or receive information from investigators. Participants could, in turn, share their information with organizations that aggregate data from many participants (e.g., disease advocacy groups, research platforms such as PatientsLikeMe, Reg4ALL, or Sage Bionetworks' Bridge).
  • Clinical trial funders (e.g., government, industry, foundations, or advocacy organizations).
  • Contract research organizations that collect source data from participants on behalf of sponsors.
  • Principal investigators or their institutions.
  • Site principal investigators of a multisite trial or their institutions.
  • The data or biostatistics coordinating center or the institution hosting the center.
  • Regulatory agencies to which data are submitted.
  • Systematic reviewers and guideline developers.

Potential users of shared data will need to know whom to contact to obtain the data they seek, who owns the data, who controls the data, and where they can get answers to questions that will inevitably arise about the dataset. As such, data providers might have an ongoing resource role, beyond simply sending data and metadata to recipients. The final report will include analysis of the advantages and disadvantages of various actors having responsibility for providing data to be shared.

Who Are the Recipients of Shared Data

Many individuals and entities could be recipients of shared data from clinical trials. These include (but are not limited to) the following:

  • Individuals participating in the trial, as a part of the agreement for participation, for a variety of reasons described above (e.g., trust, transparency, respect, engagement).
  • Researchers seeking to reanalyze a study or explore new scientific questions.
  • The institutions supporting the researchers.
  • Funding agencies (e.g., government, private sector).
  • IRBs or scientific peer review committees reviewing a new study of the same or a similar intervention in order to have a more comprehensive safety profile of the intervention.
  • The DSMB or DMC for another clinical trial, whose decision to recommend continuing or stopping that trial can be informed by the results of a completed trial that has not yet been or will never be published.
  • Educators requesting to use a dataset for teaching purposes (e.g., in a biostatistics class).
  • A disease advocacy group seeking to advance research.
  • Prospective plaintiffs or attorneys seeking information that could be used in current or future litigation.
  • Competitors of the industry sponsor of the intervention studied in the trial.
  • Members of the media.
  • Interested members of the public.

Potential data recipients are interested in different types of data for potentially very different purposes.

When Might Clinical Trial Data Be Shared

Data might be shared at various points in the timeline of a clinical trial, for instance,

  • After publication of the primary results of the clinical trial in a peer-reviewed journal.
  • After discontinuation of development of an intervention by a sponsor.
  • After completion or early termination of a clinical trial.
  • After regulatory approval of a new intervention or a new indication for the intervention.
  • Following the occurrence of serious adverse events.
  • Earlier, at the discretion of the data provider or generator.

The final report will include consideration of the advantages and disadvantages associated with disclosure of clinical trial data at various time points. The timing of data sharing could have very important implications that need serious consideration. For example, timing of release of the data will have consequences for the interests of the clinical trial team (e.g., the impact of data sharing on the timing of further publications from the data). Timing is also of great interest to the trial sponsor (e.g., relative to the timing of securing intellectual property rights or regulatory approval).

How Might Data Be Shared

A variety of models for clinical trial data sharing have been proposed, planned, or implemented (select examples are summarized in Box 5). The types of data that are shared differ across the models. Proposed models of data sharing have generally imposed some sort of restriction on the sharing of data that could directly or potentially identify trial participants, as well as data that reveal commercially confidential information (CCI) or trade secrets or might result in inaccurate analysis. Access to clinical trial data in current models ranges from essentially full access to de-identified data to fully restricted or no access.

BOX 5. Set of Clinical Trial Data Sharing Activities

Open Access: A data sharing program or system in which data are made broadly available to the public through an open-access website. Data might be aggregated from multiple sources (i.e., more than one institution, (more...)

Open Access

In an open or public access model, data are made available, at a defined time, to any party who seeks them, for any purpose. For example, the EMA has announced that it will release, to any data requester that is a known entity to the agency, both summary and participant-level data (excluding, for example, personally identifiable data and information the EMA deems to be CCI) immediately after a regulatory decision about a new drug (Eichler et al., 2013; EMA, 2013; Immport, 2013; Immune Tolerance Network, 2013; NHLBI, 2007).

Controlled Access

In some models of data sharing, access is restricted to specific classes of user or for specific purposes. Requestors might need to demonstrate that they meet specified eligibility criteria. Some models require only the name and contact information of the requestor, while others require information about the proposed use of the requested data or how the data will be analyzed. Some models might also impose conditions relating to whether the data generators would receive credit in publications.

In some cases, the actual data are not provided to the requestor. Instead, data holders might run specific data analyses for approved requestors and deliver to the requestors only the results of the requested analyses. In another model, recipients receive credentials to access and run queries on the data, but are not able to download or obtain copies of the data.
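The first arrangement in this paragraph, in which the data holder runs an analysis and hands back only the results, can be sketched as follows. Everything named here (the `_HELD_DATA` records, the `APPROVED_REQUESTORS` set, and the `run_approved_analysis` function) is hypothetical and stands in for whatever approval process and analysis environment a real data holder would operate.

```python
from statistics import mean

# Participant-level records held by the data holder; never returned to requestors.
_HELD_DATA = [
    {"arm": "intervention", "systolic_bp_mmhg": 124.0},
    {"arm": "intervention", "systolic_bp_mmhg": 130.0},
    {"arm": "control", "systolic_bp_mmhg": 138.0},
    {"arm": "control", "systolic_bp_mmhg": 142.0},
]

# Hypothetical list of requestors whose analysis plans have been approved.
APPROVED_REQUESTORS = {"requestor-001"}


def run_approved_analysis(requestor_id: str, variable: str) -> dict:
    """Compute a per-arm count and mean for an approved requestor.

    Only the aggregate results leave this function; the underlying rows do not.
    """
    if requestor_id not in APPROVED_REQUESTORS:
        raise PermissionError("requestor has not been approved")
    by_arm = {}
    for row in _HELD_DATA:
        by_arm.setdefault(row["arm"], []).append(row[variable])
    return {arm: {"n": len(values), "mean": mean(values)} for arm, values in by_arm.items()}


results = run_approved_analysis("requestor-001", "systolic_bp_mmhg")
print(results)
```

The design choice worth noting is that the requestor interacts only with the function's return value, so the question of what a requestor may learn reduces to the question of which analyses the holder agrees to run.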

Data sharing can also take place indirectly, through a “trusted intermediary” or “honest broker,” who either negotiates the conditions for data sharing (with the data provider retaining control over the data and its release) or takes full control of the data and brokers both the conditions for data release and the delivery to recipients (Mello et al., 2013). Trusted intermediaries might also accept and facilitate data analysis queries from secondary investigators, as mentioned above. The use of a trusted intermediary raises a number of issues, including selection, administration, funding, and compensation.

A controlled-access data sharing model could implicitly or explicitly address issues such as specification of data that will be shared; any categories of data that are to be specifically restricted; and how to address risks of breaches of confidentiality, attempts to re-identify trial participants, data retention periods, scientifically inappropriate analyses, and the need for timely responses to data requests.

To obtain shared data under a controlled-access model, a recipient typically must execute a data use agreement (DUA). Conditions in the DUA might include

  • prohibitions on re-identification or contact of individual trial participants;
  • requirements to acknowledge the providers of the data in any publications resulting from the shared data;
  • requirements to send copies of submitted manuscripts and publications to the trial investigators or study sponsor;
  • restrictions on further sharing of the data with additional parties or using the data for purposes other than originally proposed;
  • assignment of intellectual property rights for discoveries from the shared data;
  • requirements to publish or post findings from the data; and
  • requirements to notify industry sponsors of the trial of any findings that raise safety concerns.

There might also be limits on the length of time that a recipient may use or access shared data (ADNI, 2013; Harvard University MRCT, 2013; Nisen and Rockhold, 2013; PhRMA and EFPIA, 2013; YODA, 2013).
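For readers who think in terms of data structures, a few of the DUA conditions listed above could be recorded as in the sketch below. The class, its field names, and the `access_permitted` check are illustrative assumptions only; actual agreements are legal documents, and no particular program is known to encode them this way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataUseAgreement:
    """Hypothetical record of selected DUA conditions for one recipient."""
    recipient: str
    approved_purpose: str
    access_expires: date                      # limit on how long data may be used
    may_reshare: bool = False                 # restriction on further sharing
    must_acknowledge_providers: bool = True   # acknowledgment in publications
    must_report_safety_findings: bool = True  # notify sponsor of safety concerns


def access_permitted(dua: DataUseAgreement, purpose: str, today: date) -> bool:
    """Allow access only for the originally proposed purpose and before expiry."""
    return purpose == dua.approved_purpose and today <= dua.access_expires


dua = DataUseAgreement(
    recipient="Example University",
    approved_purpose="reanalysis of primary outcome",
    access_expires=date(2015, 12, 31),
)
print(access_permitted(dua, "reanalysis of primary outcome", date(2015, 6, 1)))
print(access_permitted(dua, "new subgroup analysis", date(2015, 6, 1)))
```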

Selected Set of Clinical Trial Data Sharing Activities

The possible models and approaches to clinical trial data sharing, and the purposes motivating that sharing, are extensive. The rationale, benefits, risks, and burdens associated with one particular data sharing activity could differ from those of another data sharing activity, depending on the data elements and parties involved.

To stimulate public comments and to provide a heuristic organizational structure for its work, the committee has described a selected set of data sharing activities as examples of the types of arrangements or approaches under which clinical trial data might be shared (see Box 5). To derive this selected set, the committee reviewed a range of existing and proposed data sharing activities and distilled them into four conceptual categories that represent broad “families” of activities that have key features in common. No single proposed or enacted data sharing activity has all of the characteristics in the familial description, and different data sharing activities in the same category can have important differences. The categories therefore are derived from existing and proposed data sharing activities, but do not describe any specific one of them. To the extent there are redundancies in characteristics within the models, the committee will address them or take them into account in its forthcoming analysis.

Each activity is described according to the type(s) of data that could be shared; provider(s) of data; recipients of data; timing of data sharing; conditions or qualifications for access; and conditions of data use. The descriptive characteristics are an illustrative but not exhaustive list. Detailed descriptions and, particularly, conclusions and recommendations regarding those descriptive characteristics or strategies and approaches for sharing will be included in the final report.

Copyright 2014 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK253383
