NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
National Academy of Medicine; The Learning Health System Series; Grossmann C, Chua PS, Ahmed M, et al., editors. Sharing Health Data: The Why, the Will, and the Way Forward. Washington (DC): National Academies Press (US); 2022.
Sharing Health Data: The Why, the Will, and the Way Forward.
Interviewees: Melissa Haendel, PhD, Co-director; Chris Chute, DrPH, MD, MPH, Co-director; and Andrea Volz, Communications Manager
ABSTRACT
The National COVID Cohort Collaborative (N3C) was rapidly established in spring 2020 as an open science partnership between the 60 Clinical and Translational Science Awards (CTSA) Program hub sites, the National Center for Data to Health (CD2H), multiple distributed clinical data research networks, and other partner organizations (Haendel et al., 2021; NCATS, 2021; CNDH, 2021). Born from the urgency to understand COVID-19, the N3C endeavors to improve the accessibility and efficiency of a large COVID-19 clinical data set while demonstrating a novel approach to sharing patient-level data and enabling individual researchers to use the data for approved projects. When COVID-19 emerged, CD2H leaders had been working on the harmonization of different common data models (see Box 2) already in use by the research community (Weeks and Pardee, 2019). The pandemic galvanized the CTSA community, and the N3C leadership accelerated progress toward the launch of a cloud-based consortium for aggregating institution-level data in a research enclave. Leaders’ deep familiarity with collaboration challenges (including expedient human subjects review, aligned incentives for sharing, governance, and data privacy/security requirements) enabled them to quickly gain cooperation from research collaborators, the cloud computing host, funding agency, and journal editors. The result is an active partnership and dynamic data enclave containing data for approximately 9.3 million patients, including more than 3.2 million COVID-positive cases, as of November 2021. The N3C publications and data insights also continue to grow rapidly.
BACKGROUND
The N3C is anchored by its data enclave, a secure, cloud-based platform housing individual-level clinical data from its contributing partners (all based in the U.S.). Partner organizations, such as the National Patient-Centered Clinical Research Network (PCORnet®) and TriNetX, provide access to structured data from electronic health records (EHRs) across the country that can be queried to answer questions related to COVID-19. The N3C developed a comprehensive list of demographic and clinical data elements to create a research registry of patients who have been tested for or diagnosed with COVID-19, augmented with data on treatment and outcomes. Data are mapped to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (Haendel et al., 2021). Researchers identify questions of mutual interest via meetings and during weekly presentation forums and can form dedicated workstreams based on collective expertise. The N3C website provides detailed information about various domain teams that have organized to address particular topics, including how to participate via Slack channels and other forums (N3C, 2021a). Potential collaborators have access to onboarding documents, descriptions of domain teams, and other resources. This team science approach leverages complementary capabilities and domain expertise in disciplines such as informatics, epidemiology, biostatistics, data science, and a range of clinical specialties (e.g., cardiology, pulmonology, nephrology, neurology), which is particularly important given the progress in understanding COVID-19 as an illness with varying short- and long-term effects on different organ systems. The enclave is notable for its dynamic nature; data partners contribute new patient records an average of twice per week. Updates on the refreshed data are made available on N3C’s website and the National Institutes of Health (NIH) website. 
Nonetheless, the enclave is dependent on the availability of “local” data and documentation in EHRs and/or claims data, which may be incomplete or inconsistent from one health system to the next.
DESCRIPTION
Access to the N3C data enclave leverages best practices in collaborative data stewardship, privacy, and security, including institutional- and user-level permissions, two-factor authentication, compliance with all federal provisions for protecting data (e.g., Federal Information Security Management Act, HIPAA), and review of the nature and appropriateness of a given data request (Haendel et al., 2021). This is balanced by measures that enhance efficiency, namely the creation of a single institutional review board (IRB) to review all requests to query the N3C research data. Details regarding the protocol approved by the Johns Hopkins University IRB are publicly available, as are the extensive data specifications and software tools for data visualization and constructing efficient queries. In addition to creating a limited dataset and de-identified dataset, a subgroup of N3C researchers has developed a unique synthetic dataset, composed of data that are computationally derived from the limited dataset and resemble patient information statistically, but are not actual patient data.
The data use request process for the N3C data enclave (as of April 2021) is briefly outlined here (N3C, 2021b). The requirements support data security, consistency, and continual reinforcement of the trust fabric across N3C, and are consistent across the limited, de-identified, and synthetic datasets, except as noted below.
- An institution-level data use agreement (DUA) must be executed as a prerequisite.
- Users must have completed required training in NIH information security and protection of human participants.
- Users register for an account to access the enclave, verifying that they have a tool in place to complete two-factor authentication in order to access data.
- Users then complete a data use request form, specifying the nature and scope of the research question and justifying the level of detail needed in the data.
- Users supply documentation of IRB approval as part of the data use request process for limited and de-identified data (not applicable for the synthetic dataset).
- A Data Access Committee reviews and approves or declines each request. Approved requests are valid for one year, and training/support is provided as needed.
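The prerequisites listed above can be sketched as a simple checklist. This is a minimal illustration only, not N3C software: the class, field, and function names, as well as the tier labels, are hypothetical and were chosen for this example. The Data Access Committee's review is a human judgment and is not modeled; the sketch only checks the documented prerequisites and computes the one-year validity window of an approved request.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Dataset tiers named in the text; the string labels are illustrative.
IRB_REQUIRED_TIERS = {"limited", "de-identified"}
ALL_TIERS = IRB_REQUIRED_TIERS | {"synthetic"}


@dataclass
class DataUseRequest:
    """Hypothetical record of one user's request; not an N3C schema."""
    dataset_tier: str
    institution_dua_executed: bool
    training_completed: bool
    two_factor_ready: bool
    request_form_submitted: bool
    irb_approval_documented: bool = False


def missing_prerequisites(req: DataUseRequest) -> list[str]:
    """Return the unmet requirements, in the order the text lists them."""
    if req.dataset_tier not in ALL_TIERS:
        raise ValueError(f"unknown dataset tier: {req.dataset_tier!r}")
    checks = [
        (req.institution_dua_executed, "institution-level DUA"),
        (req.training_completed, "NIH security and human participants training"),
        (req.two_factor_ready, "two-factor authentication"),
        (req.request_form_submitted, "data use request form"),
    ]
    # IRB documentation applies to limited and de-identified data only.
    if req.dataset_tier in IRB_REQUIRED_TIERS:
        checks.append((req.irb_approval_documented, "IRB approval documentation"))
    return [label for ok, label in checks if not ok]


def approval_expiry(approved_on: date) -> date:
    """One year after approval; a Feb 29 approval falls back to Feb 28."""
    try:
        return approved_on.replace(year=approved_on.year + 1)
    except ValueError:
        return approved_on.replace(year=approved_on.year + 1, day=28)
```

For example, a request for the synthetic dataset with the first four items complete returns an empty list from `missing_prerequisites`, while the same request against the limited dataset reports the missing IRB documentation.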
Tools to support cohort exploration, including data views and analyses of de-identified data, are offered to CTSA researchers as well as citizen scientists, upholding the N3C’s intent to maximize transparency and inclusivity, while preserving privacy and security.
Overall stewardship is provided by NIH’s National Center for Advancing Translational Sciences (NCATS), which funds the N3C (Bennett et al., 2021). A Steering Committee approves activities and assures alignment of working groups, committees, and cores with the overall N3C goals. Much of the governance is focused on the data enclave, as described above. Primary governance documents include the Attribution and Publication Guidelines, the Community Guiding Principles, and the User Code of Conduct. This approach balances scientific autonomy with the creation of an open, respectful environment that encourages collaboration on this extraordinary health challenge. For instance, the Community Guiding Principles are partnership, inclusivity, transparency, reciprocity, accountability, and security.
While the N3C was an offshoot of the CTSA, a longstanding initiative that emphasizes collaboration, building the will for the N3C’s creation was not automatic. Recognizing the need to garner rapid cooperation in a team science environment, leaders of N3C sought to identify champions from the CTSA hubs, the funding agency, and other community affiliates who could serve as ambassadors for the importance of this work. In addition, the N3C leaders did not stipulate that data users had to also be data contributors. This was an important aspect of gaining buy-in, as was the early and explicit plan to recognize all contributors as manuscript authors. In many academic institutions, participation on a publication is a key type of “currency” that supports promotion and tenure and may be particularly significant for early career investigators. Thus, N3C balances the goals of a very large team science consortium and the needs and values of individual investigators in academia. The leadership recognized the recurring tension in academia regarding research productivity and aimed to produce high-impact papers that recognize all of the contributors. To this end, a consortium authorship model was developed to address the objective of recognizing the vast number of contributors to any given N3C paper.
The CTSA program is a prominent and prestigious feature for many academic institutions. The ability to foster both intra- and inter-institutional collaboration has been a hallmark of the CTSAs for more than two decades and has helped launch careers in clinical and translational science for hundreds of scientists. Nonetheless, in the interviews that informed this case study, N3C leaders describe this as a “social engineering experiment,” in that it engenders a new level of openness and data sharing. The complexity of COVID-19 has helped collaborators recognize the importance of a diverse team with specialized expertise that could range from acute kidney injury to the Python programming language to pharmacokinetics. This team science approach is also intended to foster higher caliber research outputs, in that strong multidisciplinary teams and high-quality data can yield higher impact papers in leading journals. Participants in the N3C are encouraged to get involved via a prominent link on the N3C home page, either by joining existing collaborative groups listed on the N3C website, or self-organizing around topics of interest to create new “domain teams,” which could range from a specific clinical topic (COVID-19 outcomes among people with diabetes) to increasingly broad and cross-cutting issues (impact of the pandemic in rural communities, or genomics and COVID-19). This rapid growth presented early challenges, especially in resource management and communication.
A unique challenge early in the N3C’s formation was developing the DUA between the NIH and the contributing institutions. The research data reside on a platform funded by the NIH, and the N3C itself is not a formal legal entity; it is simply a funded project. As such, NCATS is the fiduciary agent, holding the data and operating in accord with pertinent federal rules. Consequently, the Data Access Committee is composed exclusively of federal officials; N3C community members cannot participate. Progress and successful execution of the DUA were facilitated by the urgent need for this research platform, and a strong partnership between the NIH and NCATS leadership and the principal investigators of the CD2H initiative (which incubated the N3C). A related challenge was establishing a single IRB for the N3C. Johns Hopkins assumed that responsibility, and the logistics of linking other IRBs and applications were greatly eased by the SmartIRB infrastructure (smartirb.org). This obviated the need for each data-contributing organization to write, submit, and review its own IRB application, instead ceding this regulatory requirement to a central IRB.
In the progenitor publication, Health Data Sharing to Support Better Health Outcomes: Building a Foundation of Stakeholder Trust, barriers cited by researchers and research oversight leaders centered on pace, process, and price of accessing data; data latency; and variability of IRB requirements. The urgency of the pandemic spurred partners to organize quickly and address issues related to rapid availability of high-quality, curated data as well as research oversight needs. In the time since the N3C leaders were interviewed, N3C has grown to 31 domain teams and contains data from more than 9.3 million patients, including more than 3.2 million COVID-19 cases. Nearly 200 institutions have signed a Data Transfer Agreement, signaling their willingness to contribute data to the enclave once the data are harmonized to the Common Data Model. While the data enclave is the centerpiece, the N3C architects also describe the N3C as “a collaborative research community committed to the rapid generation and dissemination of knowledge for the public good, and to the advancement of COVID-19 science.”
FUTURE DIRECTIONS
Team science can offer an uneven value proposition, insofar as large, complex consortia can become unwieldy or bureaucratic and can present political or communication-related challenges. The N3C leaders noted that one of their goals was to show that building something of this magnitude can be done rapidly and without significant friction. Though it took a pandemic to attenuate many of the traditional “pain points” in research (variable interpretations of the same protocol by multiple IRBs, lag time to attain research-ready data), it has also shown that science can move much faster and have a more immediate impact on health care. It will be critical to hold the gains in this regard, preserving both efficiency and data quality in collaborative research without reverting to pre-pandemic “business as usual” practices that could slow overall progress. Results from N3C studies will inform how COVID-19 is treated in both the short- and long-term (Bennett et al., 2021). The progress of the N3C to date demonstrates that both the philosophical and technical milestones of this initiative can serve as a blueprint for accelerating research, as well as implementation of findings in clinical practice.