Developing Methods to Link Patient Records across Data Sets That Preserve Patient Privacy

Structured Abstract

Background:

Patient-centered outcomes research (PCOR) relies on access to researchable health care data for a broad spectrum of patients. Payer-stakeholders, such as Anthem, can use longitudinal patient-level claims data from their general membership to identify patients for engagement in PCOR opportunities. We hypothesized that the contributions of payer-stakeholder member engagement to PCOR initiatives could expand patient-centered registry participation as evaluated through linkage with health plan data.

To design and test a model for improving PCOR engagement and data integration between patient-centered registry self-reported data and health plan administrative claims data, we developed privacy-preserving record linkage (PPRL) methods to determine overlapping membership in the National Patient-Centered Clinical Research Network (PCORnet), patient-powered research networks (PPRNs), and a health plan research network (HPRN). We queried the resulting overlapping data to identify and validate claims-based computable phenotypes and tested different outreach approaches to engage health plan members in patient-centered registry participation.

Our research advanced the methodology standards for data linkage within a distributed research network (DRN) as well as the PCORI methodology domains "Describe data linkage plans" and "Requirements for the design and features of data networks." We developed methods that improved the capacity of DRNs by ensuring the appropriate privacy and confidentiality of network participants. In addition, we engaged a payer-stakeholder data repository, the HealthCore Integrated Research Environment, to conduct population-based patient identification of potential research participants.

Objective:

This research aimed to develop and test linkage and validation methods to identify potential participants for PCOR opportunities. We assessed the feasibility of linking and using PPRN membership and health plan data to confirm self-reported diagnosis (aim 1) and to contact health plan members to participate in patient-centered registries (aim 2).

The objectives of aim 1 were to test PPRL processes between PPRN members' and health plan enrollees' data and to measure disease-specific confirmation rates for particular health conditions. The confirmation rate is a validation measure that compares health plan administrative data with patient-reported disease status, as indicated by participation in a disease-specific patient-centered registry.

Aim 2 sought to quantify health plan members' registration rates in any of 4 disease-specific PPRNs following 2 common payer-initiated outreach methods for inviting member participation: mail and email.

We also conducted structured interviews with PPRN representatives to gauge their interest in health plan member outreach and to assess patient understanding of the need for data linkage with health plans to close gaps in evidence for the PPRN disease conditions of interest.

Methods:

Aim 1 developed PPRL methods for the anonymous linkage of overlapping members using secure HIPAA-compliant, 1-way, cryptographic hash functions. A cryptographic hash function is a mathematical algorithm that converts a string of text into an irreversible hash text string. For the linked data from the 4 PPRNs—the American BRCA Outcomes and Utilization of Testing PPRN, Arthritis Patient Partnership With Comparative Effectiveness Researchers, Multiple Sclerosis (MS) PPRN, and Vasculitis PPRN—we compared self-reported diagnoses by PPRN members with claims-based computable phenotypes to validate agreement between the 2 data sources on the knowledge of disease status.

Aim 2 identified enrolled members who met the diagnostic criteria for computable phenotypes but were not registered in a PPRN. We then randomly assigned members to 2 groups (outreach by mail or email) and quantified new PPRN registrants by employing PPRL methods.

In a separate study, we conducted structured interviews with representatives of PPRNs to understand their interest in and willingness to participate in relevant research sponsored by health plans.

Results:

Aim 1:

Data identifiers for 21 616 PPRN members were converted through PPRL into hashed identifiers, and 4487 (21%) were linked to health plan data. Of the linked cohort, 3548 (16%) PPRN members were commercially insured, with health plan membership eligible for inclusion in the aim 1 analysis. Irrespective of enrollment duration, confirmation rates (the agreement between patient self-reported disease status as indicated through PPRN membership and health plan administrative record of disease status) were determined. The confirmation rates for breast or ovarian cancer, rheumatoid or psoriatic arthritis or psoriasis, MS, and vasculitis PPRNs were 72%, 50%, 75%, and 67%, respectively, which increased to 91%, 67%, 93%, and 80%, respectively, when limiting the cohort to those with continuous health plan enrollment ≥5 years.

Aim 2:

A total of 14 571 members were randomly assigned to each outreach method (email or regular mail). Invitations were sent to 13 834 (94.9%) mail group members and 10 205 (70.0%) email group members. A significantly larger proportion of the mail group (n = 78; 0.54%; 95% CI, 0.42%-0.67%) registered in PPRNs relative to the email group (n = 24; 0.16%; 95% CI, 0.11%-0.25%; P < 0.001). Registrants had more comorbidities and greater medical system use, especially emergency department visits (52.0% vs 42.5%; P = 0.053), compared with nonregistrants.

PPRN structured interviews:

We conducted 9 structured interviews with PPRN members to discuss health plan outreach and data linkage. This qualitative approach explored perceptions about HPRN collaboration with PPRNs, HPRN outreach, data linkage, and patient privacy and sought to identify opportunities for HPRNs to engage patients.

Conclusions:

We demonstrated the feasibility of linking and using PPRN membership and health plan data to confirm self-reported diagnosis. Our outreach initiative to invite health plan members to register in PPRNs was modestly effective. Responses to regular mail slightly outperformed responses to email outreach. Our structured interviews identified opportunities for HPRNs to engage in PCOR.

Limitations:

Our data were limited to a commercially insured population. We were constrained in overlap identification by the accuracy of recorded PPRN member-level identifiers. In addition, patient views reflected the perceptions of a population already heavily engaged in PPRN research.

This report contains information that is proprietary to PCORI and HealthCore, Inc. HealthCore, Inc. is not liable or in any way responsible for any written and/or verbal changes to the report content as originally designed and presented hereinafter, without regard to origin.

Portions of this report are a pre-copyedited, author-produced version of an article accepted for publication in the Journal of the American Medical Informatics Association following peer review. The version of record [Agiro A, Chen X, Eshete B, et al. Data linkages between patient-powered research networks and health plans: a foundation for collaborative research. J Am Med Inform Assoc. 2019;26(7):594-602. doi: 10.1093/jamia/ocz012] is available online at https://doi.org/10.1093/jamia/ocz012.

This reprint is available in free-to-view only with the permission of the American Medical Informatics Association and Oxford University Press.

© The Author(s) 2019. All rights reserved; no part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of Oxford University Press and/or Oxford Publishing Limited (“OPL”) in respect of the underlying rights, or as expressly permitted by law.

For permissions, please email journals.permissions@oup.com

Background

The National Patient-Centered Clinical Research Network (PCORnet) has 3 complementary components: (1) health system data in the form of electronic health records from clinical data research networks (CDRNs); (2) communities of patients who provide patient-contributed data from patient-powered research networks (PPRNs); and (3) health plan data in the form of administrative claims data from health plan research networks (HPRNs).1 Data from all 3 sources are necessary for some types of patient-centered comparative effectiveness research. Gaps between these disparate sources impede the effective collaboration in targeted data-sharing that is so important for patient-centered outcomes research (PCOR).

Our research focused on discovering ways to identify and close the gaps that impede PCOR through the use of data linkage between PCORnet PPRNs and HPRNs. The linkage methodology employed here specifically addressed the PCORI methodology standard "Requirements for the design and features of data networks." To ensure the appropriate privacy and confidentiality of research participants, we built privacy-preserving record linkage (PPRL) between PPRNs and an HPRN within a distributed research network.

This research requires the use of computable phenotypes, which the National Institutes of Health Collaboratory Living Textbook of Pragmatic Clinical Trials defines as a clinical condition, characteristic, or set of clinical features that can be determined solely from the secondary data sources.1 In aim 1, we conducted data linkage to validate the computable phenotypes of PPRNs. This linkage provided a mechanism by which patient self-reported disease status could confirm the computable phenotype outcome in the health plan administrative data by connecting the patient-centered registry to the health plan data. The linkage methodology was applied in aim 2 to assess the outcomes from a plan to contact and engage health plan membership in PPRN research. The specific outcome of interest was the rate at which patients with a disease joined a disease-specific PPRN following direct outreach by the health plan, as assessed through data linkage (ie, matching record in a PPRN data set). In another important segment of this research, we conducted a small qualitative study consisting of semistructured telephone interviews with PPRN-identified patient leaders. The primary goal was to gain insights and assess the impact of health plan outreach on PPRN participation.

Aim 1

Develop PPRL methods for the anonymous linkage and computable phenotype confirmation.

Linking digital patient data from diverse sources to health plans' administrative claims data helps pinpoint opportunities to reach out to patients about participating in PCOR, such as drug safety and clinical effectiveness research.2-4 PPRNs are a valuable source of patient-generated information. Diagnosis code–based algorithms, called computable phenotypes, developed by PPRNs can be used to query and engage health plans' claims data to identify members with specific diagnoses who hold simultaneous membership in both PPRNs and health plans for research opportunities. This study is an example of such engagement between PPRNs and 14 health plans and has the potential to explore improvements in methodologies for integrating longitudinal payer claims data into the PCORnet environment. The objectives were (1) to develop and test a PPRL method for linking data from 4 disease-specific PPRNs with enrollee membership information from 14 health plans; and (2) to measure patient overlap—specifically, the overlap in membership among 4 PPRNs and a nationwide health plan—and confirmation rates showing that the overlapped population for the disease-specific PPRN patient registry was in agreement with the administrative diagnostic claims data from health plan records. We also assessed self-selection bias when patients join and participate in PPRN research. In addition, we analyzed the duration of uninterrupted health plan coverage needed to improve the confirmation rate, which is the agreement between claims-based diagnoses and PPRN self-reported diagnoses. The longer a member is enrolled in a health plan, the greater the opportunity for the health plan to have a record in administrative claims for the condition of interest. These approaches could contribute novel additions to the literature on PCOR from health plans.

Aim 2

Engage health plan members.

Employing linkage methodologies developed in aim 1, we conducted our aim 2 study to evaluate 2 routine payer-initiated outreach methods for inviting prospective candidates to participate in PPRNs: regular US Postal Service (USPS) mail (mail group) and internet-based email (email group). The main objective was to evaluate the primary outcome of this outreach effort: the number of health plan members who registered to participate in any of the 4 PPRNs of interest evaluated by data linkage.

PPRN Patient Structured Interview

Health plans, or payers, are uniquely positioned to perform population-based patient identification and address key methodologic and operational needs for PCOR by providing longitudinal data on patients.3 However, the potential role of payer stakeholders such as Anthem in supporting PCOR initiatives is poorly understood.

Our research sought to understand the sharing of longitudinal patient-level data across research networks. We focused on patients' PPRN involvement and their interactions with health plans to identify evidence that both health plan outreach and the linkage of health plan data with patient-reported data from PPRN data repositories would be valuable to patient-centered evidence development. Both of these data resources possess rich information about patients' conditions and disease experience. The PPRN data contain patient-reported outcomes and provide a way to engage PPRN members directly in research opportunities. The HPRN data contain longitudinal cross–health system diagnostic, procedure, and medication use vital to pragmatic patient follow-up. In addition, health plans have the ability to inform their HPRN membership about PCOR opportunities. Such linkages could be used to close gaps in disease evidence, improve longitudinal patient follow-up, and add to available methodologies to improve data integrity.

This qualitative research used 60-minute semistructured telephone interviews to explore PPRN-identified patients' roles and experiences in PPRNs and their potential involvement with the HPRNs. Our research also solicited their opinions on the need for longitudinal data follow-up to address research of mutual interest to PPRNs and HPRNs. Collaboration of HPRNs with PPRNs is necessary for establishing reliable models for future PPRNs and engaging with payer stakeholders for computable phenotype validation. It could also lead to the formation of research cohorts for longitudinal analysis beyond the current capabilities of the PCORnet environment.4 Finally, we explored best practices for identifying and expanding patient involvement in PPRNs. Health plan patients may otherwise be missed when relying on traditional outreach methods.

Multi-stakeholder engagement, the development of a data linkage methodology, and structured interviews with key patients from the PPRNs have led to successfully addressing methodology gaps in distributed research. Our work highlights the need for patient engagement in methods development for PPRL and health plan outreach for PCOR.

Participation of Patients and Other Stakeholders

We engaged patients as stakeholders in the development of our methodology research. From the outset of our proposal, we directly engaged patient-representative leadership from the initial set of 7 proposed PPRNs. We sought input on perceptions of data linkage, health plan outreach, and health plan engagement in patient-powered research.

To better understand the patient perspective, during the proposal stage we conferred with a close personal friend of the principal investigator, an individual who has muscular dystrophy. We discussed feelings and attitudes about health plan engagement in research and identified solutions to key barriers, such as avoiding interference with the patient-provider relationship and reinforcing respect for privacy and confidentiality in all engagements. We also directly engaged PPRN patient leadership in the proposal development stage. In our development process, we discussed forming the methodology to conduct data linkage with PPRN leadership and patient representatives. We assured them that the linkage of PPRN data with HPRN data would not affect patients' health plan membership. We held multiple one-on-one meetings with each PPRN, including their patient leadership. In addition, early in the award period we hosted a dinner discussion and project overview at the PCORI annual meeting with PPRN patients and leadership. This engagement proved invaluable in building relationships between PPRNs and an HPRN for PPRL activities and health plan member outreach.

In aim 2, we tested 2 patient engagement strategies: (1) invitation for PPRN participation with direct USPS mailers by the payer and (2) invitation for PPRN participation via email. This work contributed to the methodology of patient engagement by a payer stakeholder, a health plan that has patient-oriented relationships with its membership and routinely engages its membership in clinical care.

Finally, we conducted PPRN patient engagement workshops and structured interviews at 3 notable points: (1) before IRB approval to discuss comments from the PCORI and PCORnet communities on the proposed research; (2) following the validation of the computable phenotypes to address any concerns about conducting patient outreach; and (3) following the pragmatic study to assist in the evaluation of the engagement strategies to better understand the results across the heterogeneous PPRNs. Midway through the study, we conducted a consumer panel survey through our survey vendor, ORC International. The rationale was to try to understand the attitudes and likely responses of health care consumers who receive informational outreach from their health insurance provider. This panel consisted of prescreened respondents who were willing to participate in surveys. They were recruited according to standards published by the European Society for Opinion and Marketing Research and the Market Research Society. The survey participants were asked 6 questions about their sex and age, whether they have a medical condition (chosen from a predetermined list), their receptivity to their health insurance plan engaging them about their medical condition, and their willingness to participate in clinical research. In this portion of the study, we engaged with a representative from each PPRN to better understand how patients who participate in PPRN research view HPRN outreach for research purposes. We also addressed issues related to the need for interactions among HPRNs, CDRNs, and PPRNs in the performance of patient-oriented research. The specific objectives were to explore the following:

  • Patients' perceptions of and experiences with health plan research
  • The need for data linkage to close gaps in evidence
  • Patients' data linkage concerns
  • Opportunities for further patient participation in health plan research

Although this was essentially methodology research, we sought to assess patient engagement throughout the process given the concerns about PPRL and health plan outreach. Patient engagement directly enhanced our research and informed the development of PPRL.

Methods

We developed formal study protocols for each aim before conducting the study and specified plans for quantitative data analysis that correspond to major aims. At each stage, we engaged one-on-one with individuals who represented the population of interest, PPRN leadership, and our payer stakeholder to support the research. A manuscript for aim 1 has been published, a manuscript for aim 2 has been drafted, and we plan to disseminate the structured interview findings. Appendix A outlines our ongoing dissemination plan. In addition, as part of our ongoing work with PCORnet, we have broadly disseminated the findings of data linkage and PPRL methodologies across PCORnet. Reports were guided by formal reporting guidelines, the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network for reporting of qualitative research studies,5 and a recent publication on guidance for reporting data linkage.6

Aim 1

Develop PPRL methods for the anonymous linkage and computable phenotype confirmation.

Research Design

This descriptive study used the HealthCore Integrated Research Environment (HIRE) to identify overlapping members among 4 disease-specific PPRNs and 14 geographically dispersed commercial health plans. Overlapping membership means simultaneous membership in both a PPRN of interest and an Anthem health plan. This overlapped population provides a data resource for evaluating the agreement between patient-reported disease status, as indicated by PPRN participation, and disease status as derived from health plan administrative data. The HIRE is a repository of longitudinal patient-level administrative claims data for approximately 60 million health plan enrollees and is broadly representative of the US commercially insured population.7 The study design did not use any primary data collection. We specifically addressed the data linkage in distributed research through a secondary data analysis of membership in a PPRN with membership in an HPRN. We analyzed the overlap in membership by developing PPRL methods to address the confirmation rates of computable phenotypes in an administrative claims data environment. The research directly assessed the data quality, completeness, and plausibility across a distributed research network, PCORnet. The design specifically addressed building a linkage methodology to assess the adequacy of the health plan as a data source.

Research Conduct

This study received IRB approval; this aim was nonexperimental and received a determination of nonhuman participants research. Researchers accessed a limited, deidentified data set, and all data were handled in strict compliance with applicable privacy rules under HIPAA. We had initially proposed linkage with 7 PPRNs to address 7 computable phenotypes. Instead, we conducted linkages with 4 PPRNs but still evaluated 7 computable phenotypes. One PPRN wanted us to share our membership privacy-preserved identifiers, which would have prevented evaluation of the computable phenotype confirmation rate. Another PPRN did not have sufficient consent from the PPRN members to conduct PPRL.

Data Sources and Data Sets

HealthCore Integrated Research Environment

As a repository of longitudinal patient-level administrative claims data for approximately 60 million enrollees, the HIRE enables us to conduct a range of real-world research, including retrospective database studies, medical record review studies, and prospective site-based studies, as well as cross-sectional and longitudinal patient and provider surveys. This environment offers an innovative framework for linking patient survey data with administrative claims data. The HIRE has been used to validate numerous clinical coding algorithms.8-12 The breadth of data in the HIRE served as a reliable source of member information when developing the PPRL methodology and in validating PPRN computable phenotypes.

Patient-powered research networks

The study population consisted of members from the 4 disease-specific PPRNs, which are managed by patient-governance groups and are a part of PCORnet:

  • The American BRCA Outcomes and Utilization of Testing (ABOUT) Network (https://www.aboutnetwork.org/) includes men and women aged 18 years and older. Members may have a known genetic variant (within their family) or a personal or family history of breast, ovarian, or related cancers.13
  • ArthritisPower (https://arthritispower.creakyjoints.org) concentrates on musculoskeletal and inflammatory skin conditions (focused on arthritis or psoriasis) and operates a nationwide research registry network of patients diagnosed with rheumatoid arthritis (RA), psoriatic arthritis (PsA), and spondyloarthritis (eg, PsA and ankylosing spondylitis) as well as a variety of other rheumatic conditions.14
  • The iConquerMS PPRN (https://www.iconquerms.org) specializes in multiple sclerosis (MS) and is working to establish a community of 20 000 participants. People with MS and other stakeholders enroll on the network's portal, which facilitates collection of patients' demographics, MS history, and patient-reported outcome (PRO) data in addition to ongoing interactions and communications with the network's members.15
  • The Vasculitis PPRN (V-PPRN; https://www.vpprn.org) has more than 2500 members enrolled in clinical studies investigating multiple types of vasculitis.16

We initially targeted 7 PPRNs to create a representative sample and ensure a diversity of PPRN types that covered rare and common diseases as well as networks of patients and clinical researchers. Our goal was to develop a methodology broadly relevant across PPRNs, health plans, and potentially CDRNs. We were unable to work with NephCure Kidney Network (NKN) and ImproveCareNow because their consent language was insufficient to allow for deidentified linkage. In addition, the Health eHeart Alliance wanted us to send them the hashed identifiers of our eligible population, which would have prevented us from determining the confirmation rate in our data environment (Anthem HIRE).

Analytic and Evaluative Approach

Linkage methodology

Our anonymous linkage methods were built on a secure, HIPAA-compliant hash algorithm (double-salted SHA-256 hash function) so that we could conduct PPRL between the HIRE and PPRN databases.17-20 A complete presentation of the mathematical algorithms behind a hash function is beyond the scope of this report. A cryptographic hash function maps data of arbitrary size to a bit string of fixed size, called a hash (Figure 1). Note in the figure that even a small change, such as Haynes to Haines, results in a vastly different hash value. Encryption and hashing are 2 common ways to exchange data securely for anonymous linkages. Encryption is the process of encoding information so that it can be accessed only by an authorized entity using a password or security key (Figure 2). Hashing is a 1-way function that maps information to an anonymous identifier. Because the process is not reversible, hashing is highly secure. Hashing was also appropriate because the PPRNs and the health plan elected not to exchange encrypted, fully identifiable patient information, which could be decrypted and inadvertently expose patient-identifying information during data transfer. With simple encryption, someone who obtains the password could recover the plain text of the identifiable patient information. We therefore applied the cryptographic hash function and then encrypted the hash values for transfer. This approach prevented disclosure of personally identifiable information because the hash could not be reversed to expose identifiers between the HPRN and PPRN.
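The 1-way behavior described above can be illustrated with a minimal Python sketch. The salt values and the way the two salts wrap the input are assumptions made for illustration only; the report does not publish its salts or the exact double-salting construction.

```python
import hashlib

# Hypothetical salts; the real values and layout are not given in the report.
SALT_A = "example-salt-a"
SALT_B = "example-salt-b"

def hash_identifier(text: str) -> str:
    """Return the hex SHA-256 digest of the text wrapped in two salts (assumed layout)."""
    salted = f"{SALT_A}|{text}|{SALT_B}".encode("utf-8")
    return hashlib.sha256(salted).hexdigest()

h1 = hash_identifier("Haynes")
h2 = hash_identifier("Haines")
print(h1)
print(h2)

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(h1, h2))
print(differing)
```

Changing one letter of the surname yields an unrelated digest, and the digest alone does not reveal the name. The same input always produces the same digest, which is what makes exact matching on hashed values possible.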

Figure 1. Example of cryptographic hash function.

Figure 2. Example of encryption.

The anonymous linkage of the HIRE and PPRN data networks is similar to an approach formulated by Weber et al that complied with the HIPAA minimum privacy policies and precluded the full exchange of identifiers and the use of Social Security numbers, both of which are required in other sophisticated linkage algorithms.21 Therefore, only minimal essential information was included: patients' whole first and whole last names, date of birth, and sex. Exact matches on whole first and whole last names, sex, and date of birth were used in a deterministic fashion to establish linkage. The study used 2 software implementations of the anonymous linkage algorithm, one in Microsoft SQL Server (Structured Query Language) and one in the Java programming language, both easily implementable in the PPRN data environments.
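A deterministic linkage of this kind can be sketched as follows. This is an illustration in Python rather than the study's SQL Server or Java code, and the salt, the field order, the date format, and the normalization step (trimming and uppercasing) are all assumptions; the records are invented.

```python
import hashlib

SALT = "shared-secret-salt"  # hypothetical; both parties would need the same value

def normalize(value: str) -> str:
    """Trim and uppercase a field so trivial formatting differences do not block a match (assumed step)."""
    return value.strip().upper()

def linkage_key(first: str, last: str, dob: str, sex: str) -> str:
    """Hash first name, last name, date of birth, and sex into one anonymous identifier."""
    fields = [normalize(first), normalize(last), dob, normalize(sex)]
    payload = SALT + "|" + "|".join(fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Invented records: (first name, last name, date of birth, sex)
pprn_records = [("Ann", "Haynes", "1970-01-31", "F"), ("Bo", "Lee", "1982-06-15", "M")]
plan_records = [("ann ", "HAYNES", "1970-01-31", "f"), ("Ann", "Haines", "1970-01-31", "F")]

# Each party hashes its own records; only the hashes are compared.
pprn_hashes = {linkage_key(*r) for r in pprn_records}
plan_hashes = {linkage_key(*r) for r in plan_records}

linked = pprn_hashes & plan_hashes
print(len(linked))  # 1: only the Ann Haynes records agree on all four fields
```

Because matching is exact, a single spelling difference (Haynes versus Haines) prevents linkage, which is consistent with the limitation noted in the abstract about the accuracy of recorded identifiers.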

The population of linked members could contain insured members with Medicaid coverage or members with state, federal, and local government employee coverage whose data could not be used for research. Therefore, our analysis for aim 1 is limited to commercially insured members for whom permission to use data for research is available. We cannot characterize such excluded members or report on their sample sizes.

Self-reported diagnoses by PPRN members (the denominator) were compared with claims-based computable phenotypes (the numerator) to generate confirmation rates (percentages) across varying durations of health plan coverage (any, or ≥5 years). The confirmation rate is an assessment of the agreement between patient self-report and the administrative claims diagnostic record. Only self-reported diagnosis was used because not all PPRNs collected other patient information (eg, care from the relevant specialists) or other types of data such as immunosuppressant medication use (eg, biologic therapy). Only a flag indicating a disease of interest was transmitted from the PPRN along with the anonymous hashed identifier. The PPRNs employ multiple data management strategies to organize their registry databases. The anonymous linkage between the HIRE and each PPRN enabled identification of PPRN-specific overlap in membership and validation of computable phenotypes against self-reported diagnoses by PPRN members. All patients' data in each of the 4 PPRNs were hashed, irrespective of geography or whether patients reported that they were commercially insured or any other features.
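The confirmation rate calculation can be sketched as a simple proportion. The record layout and the sample values below are invented for illustration and do not come from the study data.

```python
from dataclasses import dataclass

@dataclass
class LinkedMember:
    self_reported: bool    # disease flag transmitted by the PPRN
    phenotype_met: bool    # claims-based computable phenotype satisfied
    years_enrolled: float  # continuous health plan enrollment, in years

def confirmation_rate(members, min_years=0.0):
    """Share of self-reported cases that the claims-based phenotype confirms."""
    denominator = [m for m in members
                   if m.self_reported and m.years_enrolled >= min_years]
    if not denominator:
        return None  # no eligible members, so the rate is undefined
    confirmed = sum(m.phenotype_met for m in denominator)
    return confirmed / len(denominator)

# Invented toy cohort
members = [
    LinkedMember(True, True, 6.0),
    LinkedMember(True, False, 1.0),
    LinkedMember(True, True, 2.5),
    LinkedMember(True, False, 0.5),
    LinkedMember(True, True, 7.0),
]

print(confirmation_rate(members))       # 0.6 for any enrollment duration
print(confirmation_rate(members, 5.0))  # 1.0 for at least 5 years of enrollment
```

In this toy cohort the rate rises when the denominator is restricted to members with longer enrollment, the same direction of effect the report describes.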

Computable Phenotypes

We obtained the computable phenotypes or coding algorithms directly from PPRN leadership. These algorithms were developed by the PPRNs, with patient and clinician engagement in the formation of the PPRNs. Both broad and strict confirmation rates for computable phenotypes were calculated for PPRN members who were successfully linked with administrative claims data and specific codes. The rationale for using both broad and strict definitions is to showcase that stringent logic is needed to accurately identify the patient population of interest from claims.

Broad definition computable phenotypes required at least 1 diagnosis specific to the condition of interest for that PPRN, regardless of whether the diagnosis appeared in the primary diagnosis position on the claim. Diagnoses were based on medical claims from any treatment setting, including inpatient hospitalization, emergency department (ED) services, outpatient services, or office visits. Strict definition computable phenotypes relied on more stringent requirements to reduce misclassification and varied across the cohorts. In the ABOUT PPRN, strict computable phenotypes required at least 2 diagnoses in any position, as found in medical claims that were 30 days apart in the office visit setting.22 Strict computable phenotypes for members of the iConquerMS PPRN required at least 3 claims for MS diagnosis–related hospitalizations or MS diagnosis–related outpatient/ED visits in any diagnosis position or MS-related prescription fills in any combination that were no more than 365 days apart.23 In the ArthritisPower PPRN, strict computable phenotypes required at least 2 diagnoses in medical outpatient claims from a specialist, such as a dermatologist for psoriasis or a rheumatologist for other relevant conditions, and age at diagnosis.24 Strict computable phenotypes in the V-PPRN were constructed from a combination of diagnosis codes; physician specialty (ie, rheumatology, immunology, nephrology, otorhinolaryngology or pulmonology, cardiology or vascular surgery); or the use of immunosuppressant medications.25 Detailed computable phenotypes can be found in the supplemental material of the published manuscript. As an example, the computable phenotype for MS was simply the presence of an International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis code of 340 or International Statistical Classification of Diseases, Tenth Revision (ICD-10) code of G35.
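The difference between the broad and strict MS definitions can be sketched in code. The claim layout is hypothetical, and the strict rule is read here as "at least 3 qualifying events falling within one 365-day window," which is one interpretation of the text rather than the published algorithm; the supplemental material of the published manuscript remains the authoritative source.

```python
from datetime import date

MS_CODES = {"340", "G35"}  # ICD-9-CM 340 and ICD-10 G35, as given in the text

def is_ms_claim(diagnosis_codes):
    """True if an MS code appears in any diagnosis position on the claim."""
    return any(code in MS_CODES for code in diagnosis_codes)

def broad_ms(claims):
    """Broad definition: at least 1 claim carrying an MS diagnosis code."""
    return any(is_ms_claim(codes) for _, codes in claims)

def strict_ms(event_dates, min_events=3, window_days=365):
    """Strict definition (assumed reading): min_events qualifying events within window_days."""
    dates = sorted(event_dates)
    for i in range(len(dates) - min_events + 1):
        if (dates[i + min_events - 1] - dates[i]).days <= window_days:
            return True
    return False

# Invented claims: (service date, diagnosis codes in any position)
claims = [
    (date(2016, 1, 10), ["401.9", "340"]),
    (date(2016, 5, 2), ["G35"]),
    (date(2016, 11, 20), ["G35"]),
]
ms_dates = [d for d, codes in claims if is_ms_claim(codes)]

print(broad_ms(claims))     # True: at least one MS-coded claim
print(strict_ms(ms_dates))  # True: three MS events within 365 days
print(strict_ms([date(2015, 1, 1), date(2016, 3, 1), date(2017, 6, 1)]))  # False: events too spread out
```

A member with three widely spaced MS claims satisfies the broad definition but not this reading of the strict one, which is the kind of misclassification the stricter logic is meant to reduce.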

Statistical analysis

Descriptive statistics were used to establish patient counts and evaluate demographic and clinical characteristics, including age, sex, region, comorbidities (measured by the Deyo/Charlson Comorbidity Index [DCCI]26), medical and pharmacy use, and costs during the health insurance coverage periods January 2006 through July 2017. Differences in demographic and clinical characteristics were compared using t tests for continuous variables and χ2 tests for categorical variables. Because the confirmation rate is a proportion, we determined exact 95% CIs for confirmation rates from the binomial distribution. To determine the minimum duration of uninterrupted health plan coverage needed to attain a satisfactory confirmation rate of 70%, we conducted a Poisson regression with offsets.
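The report does not give the regression specification. One common form of a Poisson model with an offset, shown here only as an assumed sketch, takes Y_d as the number of confirmed members among N_d linked members with d years of uninterrupted coverage; the symbols β0, β1, and d* are introduced for illustration.

```latex
\log \mathbb{E}[Y_d] = \log N_d + \beta_0 + \beta_1 d
\quad\Longrightarrow\quad
\frac{\mathbb{E}[Y_d]}{N_d} = \exp(\beta_0 + \beta_1 d),
\qquad
d^{*} = \frac{\log(0.70) - \beta_0}{\beta_1} \quad (\beta_1 > 0)
```

Under this assumed form, log N_d is the offset, the modeled confirmation rate rises with duration when β1 is positive, and d* is the shortest duration at which the modeled rate reaches 70%.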

Aim 2

Engage health plan members.

Research Design

In this descriptive study, health plan enrollee data were queried from the HIRE to identify members who met strict definition computable phenotypes between December 1, 2017, and February 28, 2018, but who were not simultaneously registered in any of the 4 disease-specific PPRNs of interest, identified in aim 1.

We tested 2 primary health plan patient outreach strategies: (1) invitations to participate in a PPRN sent by the payer through direct mail and (2) invitations sent by email. Health plans have 2 primary modes of member communication (USPS mail and email), so these options represent appropriate comparators relevant to the health plan decision-maker in determining how to engage membership in PPRN research opportunities. An example of the letter invitation is provided in Appendix B. The primary outcome of interest was registration in a disease-specific PPRN, defined as receipt of an electronically signed consent form. If we did not receive consent from a targeted patient within 3 months of the initial outreach, the patient was considered not to have enrolled in the PPRN. The study setting included the HIRE for member identification and the PPRNs' web portals for obtaining electronic consent. The population eligible for engagement consisted of active Anthem health plan members with data curated in the HIRE who were identified with the validated computable phenotype of interest and who were not already members of the PPRN. We randomly assigned members with both USPS addresses and email addresses to 1 of the outreach strategies and used HealthCore's standard operating procedures governing outreach to members on behalf of Anthem health plans. A vendor with longstanding relationships with HealthCore was engaged to operationalize the outreach and associated logistics.
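The engagement plan describes the assignment as 1:1 and stratified by the computable phenotype of interest for each PPRN. A minimal sketch of such an allocation follows. The function name, the seed, and the choice to give the email arm the extra member in odd-sized strata are assumptions made for illustration, not the study's documented procedure.

```python
import random
from collections import defaultdict


def stratified_assign(members, seed=2018):
    """Assign members 1:1 to 'mail' or 'email' within each stratum.

    members: iterable of (member_id, stratum) pairs.
    Returns a dict mapping member_id to its assigned arm.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_stratum = defaultdict(list)
    for member_id, stratum in members:
        by_stratum[stratum].append(member_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)       # random order within the stratum
        half = len(ids) // 2   # odd strata: the extra member goes to email (assumption)
        for position, member_id in enumerate(ids):
            assignment[member_id] = "mail" if position < half else "email"
    return assignment


# Hypothetical members: 5 flagged for one PPRN phenotype, 4 for another.
members = [(f"A{i}", "ABOUT") for i in range(5)] + [(f"M{i}", "iConquerMS") for i in range(4)]
arms = stratified_assign(members)
print(sorted(arms.items()))
```

Because allocation is balanced within each stratum, arm sizes can differ by at most 1 per stratum.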

Research conduct

This prospective randomized study received IRB approval. Researchers accessed a limited set of member data of direct relevance to the study's objectives. All research materials and processes complied with the applicable privacy policies specified in HIPAA. We modified our approach for our initially proposed pragmatic evaluation of outreach to providers and patients because of an inability to directly engage providers from Anthem-specific health plans on specific conditions of interest. In addition, based on our membership outreach efforts with the PCORI-funded ADAPTABLE (Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-term Effectiveness) trial, we ensured that we maximized engagement by increasing the outreach sample size to include all PPRN-eligible members who had both email and USPS mail addresses. We then randomly assigned them to assess the effectiveness of the modality of health plan engagement.

Data Sources and Data Sets

PPRNs of interest

Like all other PPRNs, the 4 disease-specific PPRNs in this study (listed in aim 1) are managed by patient-governance groups and are part of PCORnet within PCORI.

HealthCore Integrated Research Environment

We used the HIRE, described in aim 1. Notably, the HIRE contains more than 12 million currently active health plan members. This data source provides an ideal environment from which to characterize PPRN participation in the engagement strategies. The data set contained patient information that we used to compare the demographic and clinical characteristics of patients who participated with those of patients who did not respond to the patient recruitment material. The completeness of these administrative data was expected to be nondifferential between the groups randomly assigned to the different patient recruitment modalities.

Analytical and Evaluative Approach

Engagement plan

The study setting had 2 main components: the HIRE data repository for patient identification and the PPRNs' web portals for recording electronic consent. The population eligible for randomization and outreach consisted of currently active members in the 14 HPRNs, curated in the HIRE database, who were identified based on the validated strict definition computable phenotype but who were not current members of 1 of the 4 PPRNs. All eligible members with USPS and email addresses were randomly (1:1) assigned to the mail or email group stratified by computable phenotype of interest for each PPRN. An experienced vendor with an established business relationship with HealthCore was contracted to plan the logistics and then to schedule and conduct the outreach activities to the identified health plan members using the assigned modality. The first outreach to members occurred on April 23, 2018, and a follow-up reminder was sent to members who did not respond 1 month later, on May 23, 2018. Representative content consisted of both the initial and reminder communications. New PPRN registrants were quantified for the period April 23, 2018, to August 31, 2018, using PPRL of overlapping health plan and PPRN members with secure, HIPAA-compliant, 1-way, cryptographic hash functions described in a previous publication.27
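The linkage procedure itself is documented in the cited publication.27 As a rough illustration of 1-way hashed matching in general, the sketch below normalizes a few identifiers, converts them to keyed SHA-256 tokens, and intersects the token sets from 2 sites. The choice of identifier fields, the normalization rules, the use of HMAC with a shared secret key, and all names are assumptions; they are not the study's protocol.

```python
import hashlib
import hmac
import unicodedata


def normalize(first, last, dob, sex):
    """Reduce identifiers to uppercase ASCII letters and digits so that
    trivial formatting differences do not change the token."""
    cleaned = []
    for part in (first, last, dob, sex):
        ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode("ascii")
        cleaned.append("".join(ch for ch in ascii_part.upper() if ch.isalnum()))
    return "|".join(cleaned)


def token(record, key):
    """1-way keyed hash (HMAC-SHA256) of the normalized identifiers."""
    message = normalize(*record).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def linked_tokens(site_a, site_b, key):
    """Tokens present at both sites; only tokens, never identifiers, are compared."""
    return {token(r, key) for r in site_a} & {token(r, key) for r in site_b}


# Hypothetical records: (first name, last name, date of birth, sex).
plan = [("Ana", "García", "1970-01-02", "F"), ("Bob", "Lee", "1980-05-06", "M")]
registry = [(" ana", "GARCIA", "1970-01-02", "f"), ("Cy", "Young", "1990-01-01", "M")]
shared_key = b"hypothetical-shared-secret"

print(len(linked_tokens(plan, registry, shared_key)))  # number of overlapping members
```

Each site would compute tokens locally and exchange only the hashes, so overlap can be counted without either party seeing the other's identifiers.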

Outcome

The primary outcome of interest was registration in a PPRN, defined as the receipt of an electronically signed consent form for patient participation in 1 of the 4 participating PPRNs. If a patient's registration was not received and confirmed by August 31, 2018, then the patient was recorded as not having the primary outcome: registering in a PPRN.

Statistical analysis

We used descriptive statistics to establish member counts and to evaluate demographic and clinical characteristics, including age, sex, region, baseline comorbidities (measured by the DCCI),26 and baseline medical and pharmacy use during all available health insurance coverage periods (January 2006 through February 2018). Registration rates were calculated as simple proportions, and the exact 95% CIs for all registration rates were determined using binomial random variables. We used the intention-to-treat approach and analyzed our data as randomized; hence, we have included members whose email communication could not be delivered (emails bounced back). As a sensitivity analysis, we also used the as-treated approach to test the robustness of our result among those to whom the email/mail communication was delivered successfully. A conventional α = 0.05 was used to interpret statistical significance. Statistical analyses were performed using the SAS Enterprise Guide 7.1 (SAS Institute).
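An exact binomial interval is commonly computed with the Clopper-Pearson method; the report does not name the method, so that choice is an assumption here. The sketch below finds the interval limits by bisection on the binomial distribution function, using only the Python standard library rather than SAS, and applies it to the intention-to-treat counts reported in the Results (78 of 14 571 for mail and 24 of 14 574 for email). The function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def bisect_decreasing(f, lo=0.0, hi=1.0, iterations=100):
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events among n members."""
    # Lower limit: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_decreasing(
        lambda p: alpha / 2 - (1.0 - binom_cdf(x - 1, n, p)))
    # Upper limit: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


for label, x, n in (("mail", 78, 14571), ("email", 24, 14574)):
    low, high = exact_ci(x, n)
    print(f"{label}: {100 * x / n:.2f}% ({100 * low:.2f}%-{100 * high:.2f}%)")
```

The printed intervals can be compared with those reported for Table 7.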

PPRN Patient Structured Interview

Research Design

In the structured interviews portion of this study, we engaged a representative from each PPRN. This qualitative research study consisted of semistructured telephone interviews conducted by a trained facilitator and using a discussion guide developed by HealthCore. The interviews were audio-recorded and transcribed for analysis purposes.

Research Conduct

This prospective, qualitative, semistructured interview study obtained IRB approval. Verbal informed consent was obtained during the semistructured telephone interviews. All research materials and processes complied with the applicable privacy policies specified in HIPAA. Our semistructured interviews adhered to PCORI standards for qualitative methods and documented the data to be collected, strategies for data collection (semistructured interviews), and how the data were collected in compliance with privacy and confidentiality practices.

Data Sources and Data Sets

Patient population

PPRN patient representatives who were willing to participate in research opportunities were selected for interviews. They were identified from 7 PPRNs (listed below), contacted about their study participation by their PPRN leadership, and asked to contact HealthCore's survey vendor to schedule an interview. This group was selected as a purposive sample to ensure a depth of experience reflective of the PPRNs of interest. The PPRN representatives were experienced research participants who served in patient leadership positions within PPRN research activities.

Participating PPRNs included the following:

  • ABOUT Network
  • ArthritisPower
  • DuchenneConnect Patient-Report Registry
  • Health eHeart Alliance
  • iConquerMS PPRN
  • NKN
  • V-PPRN

Patient outreach

The selected representatives were notified of the study and their participation through a prenotification mailed letter/email sent by their PPRN leadership. The prenotification letter/email informed the patient about the study, how and why the patient had been identified, and what participation entailed. The outreach included a phone number for the patient to contact HealthCore's survey vendor to schedule the 60-minute telephone interview and a hyperlink for contacting HealthCore to obtain more information about the study. Participants did not receive direct compensation for completing an interview because they were already compensated for their general participation in research opportunities as part of their PPRN membership.

The study specifically targeted patients from PPRNs who were willing to participate in a 60-minute telephone interview. A maximum of 10 telephone interview sessions (60 minutes each) was planned.

Analytical and Evaluative Approach

Semistructured interview process

HealthCore's preferred survey vendor managed the data-collection activities. The vendor was provided with the minimum amount of patient health information or other identifying information necessary to execute the study. The survey vendor's acumen, together with our internal expertise in qualitative research, supported the trustworthiness and credibility of the qualitative research. To ensure accuracy and completeness, the interviews were audio-recorded and transcribed. Judith Stephenson, a principal scientist at HealthCore with more than 50 years of qualitative and quantitative research experience, was engaged in all aspects of the design, data analysis, and interpretation in this study.

The vendor scheduled all interviews following calls that patient representatives made in response to the prenotification letter or email they received from their PPRN leadership. Patients who opted not to participate were thanked for their time and were not contacted about the study again.

At the beginning of each interview, participants were administered a brief questionnaire that covered demographic characteristics. The facilitator then led the 60-minute discussion by asking open-ended questions from the interview discussion guide (Appendix C). The guide served as a road map and memory aid, with prompts for the facilitator. The same interview guide was used for all interviews. The interview discussions were audio-recorded; all participants were informed of the audio recording during the introduction. The vendor provided HealthCore with audio recordings and transcripts of the interview sessions as well as a summary report of interview findings.

Analysis and evaluation

The audio recordings were transcribed, and a descriptive thematic content analysis using standard qualitative methods was conducted. The qualitative analysis pooled responses for each question or topic from all discussions to categorize results and identify common themes. This work resulted in a descriptive content analysis of participants' responses that we used to develop the major themes and supportive quotes and to create a text summary analysis.28,29

Results

Our main methodology development successfully implemented PPRL in both aims. This record linkage provided the opportunity to evaluate the confirmation rates of computable phenotypes across 4 PPRNs (aim 1) and evaluate a pragmatic trial of health plan member engagement in 4 PPRNs (aim 2). Finally, we evaluated patient perceptions of data linkage and health plan engagement in patient-powered research.

Aim 1

Develop PPRL methods for the anonymous linkage and computable phenotype confirmation.

PPRN and HIRE Matching

Because informed consent to allow data linkages is mandatory for membership in ArthritisPower and V-PPRN, all member data in these 2 PPRNs were available for linkage. In contrast, 60% and 87% of iConquerMS and ABOUT Network member data, respectively, were available for linkage because membership in these 2 PPRNs is not tied to mandatory informed consent. At the time of this analysis, data for 21 616 PPRN members were available to be hashed, including 5665 members from ABOUT Network, 11 343 members from ArthritisPower, 2509 members from iConquerMS, and 2099 members from V-PPRN. Of these, 4487 (21%) of the members were linked to the 14 health plans, including 1435 (25%) ABOUT members, 2166 (19%) ArthritisPower members, 543 (22%) iConquerMS members, and 343 (16%) V-PPRN members. A total of 3546 (16%) PPRN members were commercially insured and had at least 1 day of medical coverage (Table 1). A total of 684 (3%) of all PPRN members had at least 5 years of uninterrupted medical insurance enrollment.

Table 1. Patient Counts for PPRN-HIRE Matching.

Patient Characteristics

Compared with the reference group (ie, health plan members who were not linked with PPRNs but who met the broad definition computable phenotype; Table 2), PPRN members linked to health plans were younger (mean [SD] age, 48 [11.6] years vs 56 [16.5] years); more likely to be female (92% vs 76%); and less likely to reside in the northeastern United States (18% vs 23%; P < 0.001). In general, PPRN members linked to health plans were less likely than the reference group to have more comorbid conditions and less likely to have higher medical and pharmacy use. These apparent differences suggest some bias in patient selection when joining PPRNs.

Table 2. Summary of PPRN Member Characteristics.

Relative to the reference group, PPRN members who met the claims-based broad definition computable phenotype after linkage to health plans were younger (mean [SD] age, 50 [11.2] years vs 56 [16.5] years); more likely to be female (90% vs 76%); and less likely to reside in the northeastern United States (20% vs 23%; P < 0.001). These PPRN members were less likely to have more comorbid conditions and had higher pharmacy use than the reference group, but they had similar levels of medical use. These apparent differences suggest that research participants are systematically different from nonparticipants because only PPRN members who meet the broad definition computable phenotype are eligible for collaborative research with health plans.

The average duration of uninterrupted health plan coverage was 3 years (SD, 2.9 years) for PPRN members linked with health plans. Our Poisson regression analysis showed that the duration of uninterrupted health plan coverage needed to achieve a claims-based broad definition confirmation rate of 70% depended on how common or rare the condition was among commercially insured members, who were much younger compared with Medicare-insured members. In general, a duration of 5 years of continuous health plan enrollment would be needed, although that figure could be as low as 3 years for MS.

Broad Definition Confirmation Rates

Irrespective of the duration of coverage, the confirmation rate for breast or ovarian cancer (ABOUT Network) was 72% (95% CI, 68%-76%). Confirmation rates increased with ≥5 years of longitudinal health plan coverage: For breast or ovarian cancer, the confirmation rate was 91% (95% CI, 82%-96%). The confirmation rate for breast cancer was 66% (95% CI, 61%-70%) and for ovarian cancer 68% (95% CI, 55%-79%) for any duration. For breast cancer only, the confirmation rate at 5 years increased to 90% (95% CI, 81%-96%). For ovarian cancer only, the confirmation rate increased to 100% (95% CI, 72%-100%). In the ArthritisPower PPRN, the confirmation rate for RA, PsA, or psoriasis for patients with any duration of health plan enrollment (ie, no minimum duration of coverage) was 50% (95% CI, 49%-53%). For RA, the confirmation rate was 52% (95% CI, 48%-56%). The confirmation rate was 52% (95% CI, 43%-60%) for PsA and 47% (95% CI, 38%-55%) for psoriasis, as shown in Table 3.

Table 3. Confirmation Rates of Claims-based Diagnosis as a Percentage of Self-reported Diagnosis From PPRN Members Regardless of Duration of Insurance Coverage.

For members with ≥5 years of health plan enrollment, self-reported diagnoses as indicated through participation in disease-specific patient-centered registries for RA, PsA, or psoriasis were confirmed in claims at 67% (95% CI, 60%-73%). The confirmation rates for RA and PsA increased to 67% (95% CI, 59%-74%) and 67% (95% CI, 47%-83%), respectively. The confirmation rate for psoriasis increased to 79% (95% CI, 54%-94%), as shown in Table 4. For MS (iConquerMS), the confirmation rate was 75% (95% CI, 71%-79%) for any duration, and at ≥5 years of health plan enrollment the rate for MS increased to 93% (95% CI, 87%-97%). The confirmation rate for vasculitis (V-PPRN) was 67% (95% CI, 59%-74%), and at ≥5 years of health plan enrollment, the rate increased to 80% (95% CI, 67%-90%).

Table 4. Confirmation Rates of Claims-based Diagnosis as a Percentage of Self-reported Diagnosis for PPRN Members with ≥5 Years of Uninterrupted Insurance Coverage.

Strict Definition Confirmation Rates

For any duration of health plan enrollment, 60% (95% CI, 55%-64%) of ABOUT Network members were confirmed in claims using the strict definition. The strict definition confirmation rate for breast cancer in claims was 58% (95% CI, 53%-63%), and for ovarian cancer alone the strict definition confirmation rate was 63% (95% CI, 50%-75%). Confirmation rates increased with ≥5 years of longitudinal health plan coverage: For breast or ovarian cancer, the strict definition confirmation rate in claims was 90% (95% CI, 80%-96%). Using the strict definition, the confirmation rate for breast cancer alone was 89% (95% CI, 79%-95%); for ovarian cancer alone, the strict definition confirmation rate in claims was 91% (95% CI, 59%-100%). In the ArthritisPower PPRN, the strict definition confirmation rate for RA, PsA, or psoriasis was 35% (95% CI, 32%-38%) for members with any duration of health plan coverage. For RA, the strict definition confirmation rate was 37% (95% CI, 33%-41%); the strict definition confirmation rate for psoriasis was lower, at 16% (95% CI, 10%-23%), as shown in Table 3. The strict definition confirmation rate for RA, PsA, or psoriasis for members with ≥5 years of enrollment was 58% (95% CI, 51%-65%). The strict definition confirmation rates for RA and PsA increased to 59% (95% CI, 51%-67%) and 47% (95% CI, 28%-66%), respectively. The strict definition confirmation rate for psoriasis was 47% (95% CI, 25%-71%), as shown in Table 4. For MS (iConquerMS), the strict definition confirmation rate was 73% (95% CI, 68%-77%) for any duration, and at ≥5 years the rate for MS increased to 92% (95% CI, 86%-96%). The strict definition confirmation rate for vasculitis (V-PPRN) was 42% (95% CI, 35%-49%) for any duration; at ≥5 years, it increased to 51% (95% CI, 37%-65%).

Aim 2

Engage health plan members.

Study Population

A total of 14 571 health plan members were randomly assigned to the mail group and 14 574 to the email group. As the randomization was stratified by computable phenotype of interest for each PPRN, 6777 members for ABOUT Network, 6489 for ArthritisPower, 1180 for iConquerMS, and 125 for V-PPRN were randomly assigned to the mail group. Correspondingly, 6778 members for ABOUT Network, 6490 for ArthritisPower, 1180 for iConquerMS, and 126 for V-PPRN were randomly assigned to the email group. Overall, 728 (5.0%) of the randomly assigned members targeted to receive USPS mail and 823 (5.7%) targeted to receive email had do-not-contact requests, as shown in Table 5. Undeliverable addresses were considerably fewer for the USPS mail group (9 [0.1%]) than for the email group (3546 [24.3%]). As a result, invitations were delivered to 13 834 (94.9%) members in the mail group and 10 205 (70.0%) members in the email group.

Table 5. Study Sample.

Health Plan Member Demographics and Clinical Characteristics

The mean (SD) ages were similar for the mail group (50.8 [10.5] years) and email group (50.7 [10.5] years; P = 0.494). More than two-thirds of the members were in the 45 to 64 years age group, with 10 073 (69.1%) members in the mail group and 10 010 (68.7%) in the email group. Women constituted 78.4% and 78.5% of the mail and email groups, respectively. Residential location was predominantly urban (80.8% in the mail group and 81.3% in the email group), with smaller proportions (18.3% and 17.9%, respectively) in rural areas (P = 0.27). Slightly larger proportions of members were located in the Midwest (37.6% and 38.4%) than in the West (30.4% and 30.0%) and the South (24.0% and 24.0%) in the mail and email groups, respectively (P = 0.37). The 2 groups had similar health status, as reflected by DCCI total and severity scores, and comparable medical use history, as shown in Table 6.

Table 6. Characteristics of Health Plan Members Invited to Join a PPRN.

Health Plan Member Registration by Randomization Group

As shown in Table 7, a significantly larger proportion of members from the mail group (n = 78 [0.54%]; 95% CI, 0.42%-0.67%) registered in 1 of the PPRNs relative to the email group (n = 24 [0.16%]; 95% CI, 0.11%-0.25%; P < 0.0001). After excluding members with undelivered email or mail and excluding those added to the do-not-contact list, we observed that the mail group members (n = 78 [0.56%]; 95% CI, 0.45%-0.70%) remained more likely to register in 1 of the PPRNs than the email group members (n = 23 [0.23%]; 95% CI, 0.14%-0.34%; P < 0.0001).

Table 7. Rate of Health Plan Member Registration by Randomization Group.

Characteristics of PPRN Registrants vs Nonregistrants

The proportion of female registrants was greater than that of female nonregistrants (87.3% vs 78.4%, respectively; P = 0.03). Members who registered had numerically higher mean (SD) DCCI scores than those who did not (2.5 [2.79] vs 2.2 [2.61]), but the difference was not statistically significant at α = 0.05 (P = 0.076). PPRN registrants had marginally greater medical use compared with nonregistrants, including ED visits (52.0% vs 42.5%; P = 0.05), as shown in Table 8. Mail and email registrants had similar baseline characteristics (see the table included in Appendix D).

Table 8. Characteristics of Health Plan Members Who Registered in a PPRN vs Those Who Did Not.

The heterogeneity across PPRNs is presented in Table 9. The mail group had statistically significantly higher registration rates than the email group in ABOUT Network and ArthritisPower. The numbers were too small to draw conclusions for V-PPRN and iConquerMS PPRN.

Table 9. Engagement Heterogeneity Across PPRNs.

PPRN Patient Structured Interview

A total of 9 PPRN patient representatives were contacted, and all 9 participated. The interviews each lasted 60 minutes and took place between September 14 and November 2, 2019. Of the 9 participants, 6 were female; all 9 were white, non-Hispanic; 8 had at least a bachelor's degree; 8 had been involved with their PPRN for at least 3 to 5 years; and 5 had a family history of the PPRN-related disease (see Table 10).

Table 10. PPRN Participant Demographic Characteristics.

Four themes were identified from the interviews: (1) the value of the HPRNs and collaboration, (2) HPRN outreach, (3) data linkage and patient privacy, and (4) opportunities for HPRNs. Subthemes were also identified and illustrated with patient quotes.

Theme 1: Value of HPRNs and Collaboration

The extent of current HPRN collaboration and value varied by PPRN. Some PPRN participants were aware of PPRN collaboration, although others said that their PPRN had not participated in any collaboration. For some participants, perceptions of the overall value of collaboration varied and depended on their current role, personal participation, and exposure within the PPRN. Newer PPRN members and those who were no longer participating as board members saw less personal opportunity and were less sure of the importance of collaboration: “After you cycle off, you almost feel like you're not the hot toy anymore. You're kind of just shoved in the closet and that the kids forgot about you.”

The value of collaboration with health plans was seen as providing a broader view of the patient. Participants felt that health plans were able to access data not otherwise available, and putting all the information together allowed a broader view of the varying aspects of patient care: “You can really get a 360 view of what's happening to the patient that is so much richer than what you can get if you only get 1 of the 3 types of data.”

Health plans were also seen as providing value by helping with patient recruitment and acting as a resource for clinical trials. This work was viewed as especially important when the study population was hard to reach or where specific subsets of patients with rare diseases were needed: “The health plan has access to the data, as opposed to relying on patients' memory or recollections. It brings higher-quality data for study.” Another patient stated:

It's honestly a win-win for everybody. I think if the health plans are able to work with us to identify issues that are of interest to the community … as far as research and they're able to conduct that, not only does it inform the health plans, but it also serves to benefit the patient community, the people who are part of the PPRN… I think there is a tremendous amount of value there.

An important barrier to collaboration is lack of trust with HPRNs:

There is this huge concern and ongoing distrust of health plans in general, a dislike of the process that's put in place by health plans with regards to clinical care and coverage of services that then makes people less likely to trust them when their name is on research.

For some participants, this distrust extended to a concern that research data could be used to limit access and to establish protocols that allow health plans to further deny needed treatment and services. For other participants, knowing which health plan to work with and knowing how to be sure that the health plan's motives were in the patient's best interest was another barrier. Still others mentioned logistics as being a barrier, including lack of platform connectivity and the difficulties involved in collaborating with multiple payers when dealing with the complex coverage of specific disease states.

One participant said that plans provided through employers faced additional trust and privacy concerns because a breach of confidentiality could affect employment. Another participant added that funding was limited and that as contracts expired, additional collaborations were on hold until further funding was assured. In what is described as an uncertain political climate, participants were concerned with the future of protections for preexisting conditions and the Genetic Information Nondiscrimination Act of 2008 (GINA). Limited knowledge about how these protections work contributes to fears about how patients' lives can be affected.

Participants believe that barriers can be overcome by increasing trust with the health plan provider and by addressing issues mentioned earlier, such as assuring privacy and how data are used. Trust in the health plan also increases when health plans align with known and trusted entities, such as physicians or support groups: “We see time and time again when we survey people that it really is their health care provider who they trust and turn to with regard to participation in research.”

Participants often used the word transparency, citing a current lack of transparency about how data are used, how benefits are determined, and how treatment protocols are established. Better transparency also includes sharing more information about how data access is determined.

Participants felt that educating PPRNs about the benefits of collaborations was also important; such benefits included better access to patients' own histories or advancing research for future generations. Health plans are often the scapegoat when things are not working well within a system. It is important for health plans to demonstrate that research will be used to better the overall community of patients and “benefit society as a whole” rather than “research being used to deny them access to care.”

Theme 2: HPRN Outreach

Participants felt that most patients may be skeptical about outreach from their health plans. Personally, participants were agreeable to outreach, primarily because they work closely with organizations that collaborate regularly and understand the benefits of outreach. They suspected, though, that the average patient would have concerns and questions if approached by their health plan to participate in research and would ask, “Why does my health plan want me to do this?” Another patient stated the following:

I am more informed than many, many families or many individuals that are dealing with this type of a condition, and not to be arrogant about it, I guess I would look at it quite different than an individual who may be less informed.

Lack of a meaningful relationship with health plans is a key contributor to the skepticism. Often, relationships with health plans are short term, changing annually, and developing trust with a plan in a short amount of time is not perceived as a wise time investment because there is no perpetuity. Health plans are also known to put their own interests first by restricting access, increasing premiums, and denying coverage. These actions do not convey the idea of working together and benefitting study participants. As one participant stated, “I am self-employed, so I change health plans every year, so I have no emotional relationship to my insurance company.”

Participants agreed that there was no clear preference for the form of communication from a health plan, although email was mentioned slightly more often. For 1 participant, USPS mail was thought to be more secure and less likely to be a scam. However, other participants said that mail was mostly junk and would most likely end up in the trash before being looked at. Participants thought interacting with a health plan through social media was rare and limited to populations that were comfortable with social media: “Social media may be the initial contact…. Raising awareness about what is happening.” Another said, “People are used to getting mail at home … but [it] needs to be generic so as not to violate privacy.”

Multiple forms of communication may be needed to get patients' attention. After getting their attention, educating them enough to encourage participation is another challenge. Initial awareness of an outreach effort could be through email, mail, or even social media but would serve only as an introduction to the topic. More importantly, the follow-up must convey credibility and importance and may even need to be an in-person meeting or phone call.

Participants noted an uptick in interest in their PPRN when outreach efforts came through a trusted source, such as an existing patient group, foundation, or speaker. When tracking how patients learn about potential PPRN opportunities or patient registries, some participants stated that mailings from insurers received more attention when the mailing referenced a specific disease:

I know we were very happy with the mailing that went out for the PPRN. We did see some increase in numbers. We saw some people that actually picked out that box, ‘How did you hear about us’ and they put in that they'd gotten a mailing from their insurer.

Theme 3: Data Linkage and Patient Privacy

Data linkage leads to concerns about privacy. According to participants, the value of sharing or linking data is viewed similarly to the value of collaboration but with the added concern of a data breach. Some participants said that the level of concern varies depending on the information shared and that being able to control what is shared and understanding the specifics of how it is shared, how it is used, and how it is protected are key:

There's a lot of concerns that the laws are going to change and they're going to deny me coverage or they're going to put me in a high-risk pool. Until we can stabilize health insurance coverage and guarantee that people will have coverage regardless, affordable coverage, I think we're still going to run up against some of these issues.

Data that contain specific treatment information; prescribing information; treatment algorithms; diagnoses; and test results, including genetic testing, are considered to be the most useful but also the most concerning from a privacy standpoint. Less specific, more subjective information, such as quality-of-life data, was considered less concerning and potentially less useful.

Some participants talked about the logistics of having different platforms communicating with each other because of the many issues involved with making sure that the systems used are able to merge data:

Getting different systems to talk to each other is trying to get Apple and Windows and Linux to all talk to each other. It's worse than a family reunion. Good luck. They were all built independently, they're all built with different protections, and you've got to break that protection to share the data, which makes you vulnerable to attack.

The prevalence of data breaches and hacking was worrisome. Lack of privacy in situations where privacy has been promised (eg, Facebook) caused participants to wonder whether they could eventually be identified. Their fears about the potential consequences of identification ranged from discrimination and being denied life insurance to being classified as high risk and being denied coverage.

However, because of the value in linking information, participants were willing to explore ideas to assure privacy and reduce concerns. One important idea is making study consent forms easy to understand. A shorter, easier-to-read format would make critical points easier for study patients to understand than current consent forms, which were described as “10-30 pages long” with language only for the “most sophisticated of lawyers.”

Granting study participants control over what is and is not shared was also important to some participants. Consent needs to be specific, and study participants must know exactly what information will be shared and how, because sharing specific information may affect their decision to participate. The most important issue for participants was assurance that their data would be deidentified, with the deidentification clearly explained, so that their privacy would be protected even in the event of a data breach: “I need to know for sure it's anonymous. I need to know about unprotected data breaches and that if the data is subpoenaed, it would not be shared.”

Theme 4: Opportunities for HPRNs

HPRNs can better serve patients by continuing to be involved. For study participants, being involved meant looking at things from a patient perspective and helping PPRNs identify important research issues: “We have questions that aren't being answered and that can probably be more easily answered if you involved us. We have questions you haven't thought of.”

Once these issues are identified, HPRNs can assist with “the harmonization of care practices” by sharing their knowledge and awareness of programs, trials, and other resources that are available. HPRNs have resources for funding that are not accessible to PPRNs, giving them a great deal more data. Giving PPRNs access to these data can enable PPRNs to share much-needed data with patients.

Only after it is shown that HPRN involvement facilitates PCOR and results in sharing information with patients that helps them make informed decisions and improve outcomes will the goal of increasing trust take a step forward:

We're really in this era where patient-focused or patient-centered research is sort of the catch phrase right now. That's why we have PPRNs. One of the big demands from that is that patients want to know what were the results, what happened, whether it be good or bad…. Depending on what type of research it was, if you can align with a PPRN or a patient advocacy group that represents that community, you've got to build an audience there to disseminate the results from that research. Ultimately, it could make a difference in health care decisions or outcomes.

Study participants were positive about having the opportunity to share their thoughts about their PPRN roles and hoped for continuing and additional research that could possibly reach out to more members of the PPRNs. However, some participants were concerned about the future of their PPRN because of lack of funding. Three of the participants reported that their PPRN currently faced an uncertain future because of lack of funding, which made them uncertain about what would happen to the member registries and projects in progress:

I wish that they could figure out a way structurally, financially, whatever, to ensure the longevity of the initial premise of this network, collaborative network, with all the different players, whether it be a CDRN or a PPRN. I think that we're all crucial to the formula, but especially the patients…. I just hear it during our weekly reports. I'm like, ‘Wait a second. What happened here? We really worked hard, and we got this registry going, and now they're going to hang us out to dry?’ I know it's changing landscapes and leadership and that type of thing, but that's just it—we're starting to feel like we're being abandoned, and that's upsetting. It could change again. Everything evolves.

Discussion

This is the first PCORnet study to develop and employ PPRL methods to identify overlapping membership across a distributed research network. This methodological advance was a critical demonstration for PCORnet in providing a foundation for future data linkage development. For this study, the data linkage was critical to assessing the agreement between patient self-reported disease status and administrative health plan claims and to assessing the impact of health plan member engagement in patient-centered registry research opportunities. The methods developed in this research help close evidence gaps and enhance PCORnet's ability to conduct data linkage. For these linkages, we validated secondary data sources—specifically, administrative claims data, which we used to identify eligible PPRN members for engagement in PPRN research. We directly engaged representatives of the populations of interest and other relevant stakeholders throughout the process. Such engagements included one-on-one discussions with PPRNs, an innovation meeting with PPRN leadership at a PCORI annual meeting, and our structured interviews with PPRN patient leadership.

Since the inception of this project, PCORnet has evolved and adopted PPRL technology. Takeaways from this project drove discussions with PCORnet on the performance of linkages across networks. These data linkage takeaways have informed additional data linkage activities across PCORnet, including data linkages between PCORnet's CDRNs and HPRNs. We have continued to collaborate with PPRNs in health plan member engagement, which led to funded work with the Inflammatory Bowel Disease Partners PPRN. The takeaways from our aim 2 work helped shape that outreach from a health plan.

Aim 1

Develop PPRL methods for the anonymous linkage and computable phenotype confirmation.

This study examined overlapping membership among 4 disease-specific PPRNs and 14 health plans. From more than 20 000 individual patient records obtained and hashed with PPRL, 16% to 25% were successfully linked to health plans across the PPRNs. We demonstrated that it was possible to securely link and confirm patient-generated data from a PPRN by using clinician-generated data (eg, diagnosis) from health plans' administrative claims. The open-source PPRL processes we used represent scalable, low-cost options for other PPRNs and registries and are devoid of key restrictions such as end-user licensing and other costly impositions. Restrictions such as end-user licensing, even for open-source programs, may require time and resources for drafting and implementing agreements. This study took advantage of the scalability of this privacy-preserving approach because no exchange of sensitive data fields such as Social Security numbers was required, nor was there any exchange of protected health information. In addition, scalability was facilitated by the freedom to select from different programming languages, including the more established SQL and Java.
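As a rough illustration of the kind of hash-based deterministic linkage described above, the following Python sketch hashes normalized identifiers on each side with a keyed hash (HMAC-SHA-256) and then links records by comparing only the hash values. The identifier fields (first name, last name, date of birth), the normalization rule, the shared key, the function names, and the toy records are all assumptions made for illustration; this is not the study's implementation or identifier set.

```python
import hashlib
import hmac

# Hypothetical secret shared by both data partners (assumption for this sketch).
SHARED_KEY = b"example-shared-key"


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits so formatting differences do not block a match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def hash_record(first: str, last: str, dob: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed SHA-256 digest (hex) of the normalized identifiers."""
    message = "|".join(normalize(v) for v in (first, last, dob)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def linkage_rate(registry_hashes, plan_hashes) -> float:
    """Share of distinct registry hashes that also appear among the plan hashes."""
    registry = set(registry_hashes)
    if not registry:
        return 0.0
    return len(registry & set(plan_hashes)) / len(registry)


# Made-up records; each side hashes locally and shares only the digests.
registry_hashes = [
    hash_record("Ana", "Lopez", "1980-02-14"),
    hash_record("Sam", "Lee", "1975-11-03"),
]
plan_hashes = [
    hash_record("ANA", " Lopez ", "1980-02-14"),  # same person, different formatting
    hash_record("Kim", "Park", "1990-07-21"),
]

print(linkage_rate(registry_hashes, plan_hashes))  # 0.5
```

In this sketch only the digests would leave each organization, which mirrors the point that no Social Security numbers or protected health information need to be exchanged.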

Considering PPRN membership the silver-standard disease label compared with claims data, our findings support the concept that simpler and broader phenotype definitions may be just as good as stricter ones (the gold standard being medical records data). This study also demonstrated that claims-derived confirmation rates increased in direct proportion to the duration of health plan enrollment. Improved confirmation of PPRN conditions was found in patients with longer uninterrupted insurance coverage with a health plan. For commercially insured members, ≥5 years of continuous health plan enrollment may be required to assess the adequate level of claims-based confirmation rates for self-reported diagnosis. Therefore, linking PPRN membership to health plan data may yield greater value when the health plan databases have greater local market share and longer longitudinal coverage.

Nonetheless, the process of linking data across different networks is fraught with challenges. Governance,30 security, and privacy issues, although exceptionally challenging now, may become even more difficult with future cyberthreats.2,3 Lapses in any of these areas could seriously interfere with member trust, which is essential for PCOR initiatives. To understand and address the data linkage trust issue on the member side, we conducted in-depth interviews of PPRN participants. In addition, we intend to produce an educational video on the PPRL process and disseminate our results to PPRN members.

Currently, health plan enrollees do have access to their own claims data through health plan member portals. Similarly, participants of PPRNs have access to their self-reported data contributed to the PPRN. Linking PPRN data with health plan claims helps build a foundation for future research. The HIRE repository affords researchers an important strength—large local market penetration and longitudinal follow-up that may not be available in other large health plan data repositories. The researchers in this study accessed only a limited set of deidentified data, but information curated in the HIRE allows for full member identification and can be used to communicate with and recruit members for studies. It is ideally suited for pragmatic clinical trials.

Study Limitations

Members in the HIRE and the PPRNs were identified by diagnosis and procedure codes. Administrative claims may have coding inaccuracies, resulting in outcome misclassifications and over- or underestimation of study samples. Claims may have incomplete clinical data capture, which interferes with estimates. Deterministic matching, or matching that requires strict direct string text matching, is overly conservative and may not link in situations with spelling or naming differences (eg, Robert in the HIRE data but Bob in the PPRN data). Deterministic matching is not as flexible as more sophisticated probabilistic matching algorithms, but those algorithms come at the cost of reduced member trust because they require disclosure and full exchange of identifiers, either multiple nonunique identifiers31 or sensitive identifiers such as Social Security number. Because PPRNs rely on patient trust for sustainability, deterministic matching through PPRL is a reasonable approach.
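The Robert/Bob limitation can be made concrete with a short Python sketch. It uses a plain SHA-256 digest purely for illustration; the hashing scheme, the field layout, the helper name, and the surname and birth date are hypothetical and are not taken from the study. The point is only that two spellings of the same person's name yield unrelated hash values, so an exact comparison of hashes cannot link them, whereas differences that normalization removes (case, stray spaces) do not matter.

```python
import hashlib


def hashed_id(first: str, last: str, dob: str) -> str:
    """Hash a trimmed, lowercased identifier string (illustrative scheme only)."""
    text = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical person recorded three ways.
plan_value = hashed_id("Robert", "Smith", "1970-01-01")      # health plan spelling
registry_value = hashed_id("Bob", "Smith", "1970-01-01")     # registry nickname
same_spelling = hashed_id("robert ", "SMITH", "1970-01-01")  # formatting differs only

print(plan_value == registry_value)  # False: the nickname breaks the exact match
print(plan_value == same_spelling)   # True: normalization absorbs case and spacing
```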

This process did not have internal validation through manual or other review of matches using fully identifiable data exchange because that was not possible in a PPRL approach, which is not a reversible process. A hash value cannot be traced back to the original identifying text string of identifiers without a specific key. It will be instructive for future research efforts to explore fully identifiable record linkage and compare the results with PPRL. Such research, however, will be challenging given the legal, governance, data security, and trust hurdles, including the possible need to obtain informed consent from all members of health plans and PPRNs alike. Currently, health plans do not have the legal authorization to require enrollees to consent to fully identifiable data linkage. Further, it is doubtful that an IRB or privacy board has jurisdiction to authorize waiver of informed consent for the transfer of the fully identifiable information of 60 million health plan members to PPRNs for internal validation of data linkages.

All enrollees in the HIRE, and hence the linked PPRN members, have commercial health insurance coverage, including Medicare Advantage. As a result, enrollment rates were affected by health plan turnover, which occurs for many reasons, including members switching health plans. In addition, because the study population was commercially insured, these results may not be readily generalizable to Medicaid members, who may have differing levels of access to health care resources, have insufficient educational preparation to understand and participate in the offerings of PPRN-type initiatives, or be precluded from participation because of logistic or geographic considerations. The challenges of obtaining state-by-state government permissions to include Medicaid data for research are not limited to PCORnet but are also a feature of other, similar efforts, such as Sentinel.32

Data were hashed and linkage attempted regardless of what type of insurance the PPRN member reported. It is probable that restricting the linkable sample to members who self-reported coverage from 1 of the 14 health plans studied would have greatly improved the linkage rates beyond the 16% to 25% we observed. To maximize our potential to capture the population that linked with Anthem coverage, we linked regardless of knowledge of what insurance coverage a PPRN member had. Limiting this pool to those who self-reported Anthem coverage would have missed those who might not have self-reported an insurance status or who reported their insurance status incorrectly. Finally, although some self-reported PPRN conditions, such as breast cancer and vasculitis, are highly specific, the confirmation rate of self-reported RA, for example, in the absence of additional data is relatively low.33 For this reason, requiring additional information from members to improve the specificity of self-reported conditions (eg, for RA or PsA, active care from a rheumatologist plus use of disease-modifying antirheumatic drugs or biologic therapy) increases the specificity of the condition and likely would have increased the confirmation rates beyond the 67% we observed in claims data for this study.

Aim 2

Engage health plan members.

Recruitment of health plan members for studies in health care research can be challenging, and studies generally struggle to achieve sample sizes with sufficient power, despite large expenditures of time and resources.34 This study examined the registration rates of health plan members who were invited to register and participate in organizations that focus on health conditions directly applicable to them—participation that could lead to enrollment in clinical studies. Our results, not unlike those generally reported for recruitment in actual clinical trials, were modest at best.

The main outcome in this study, registration in PPRNs after a 3-month outreach period, was substantially lower than 1% for the mail group (0.54%) and the email group (0.16%). Results from other outreach activities appear consistent with our finding: Direct mail outperformed email outreach. In our PCORnet ADAPTABLE work, we achieved a 0.8% (1549/185 150) portal response rate, as measured by members who entered a trial code expressing interest in the study. Only 355 people signed consent (0.2%). As this was a pragmatic clinical trial that required an intervention, engagement might have been lower compared with simply engaging with a patient population eligible to join a PPRN research community. No previous data for the results of PPRN outreach by mail vs email appear to be readily available, but responses to mail outreach in other sectors appear to be stronger than in our study. In a 2015 report on response rates from 485 industry respondents contacted by direct mail, the Direct Marketing Association (DMA) reported a response rate of 3.5%, which is roughly 6.5 times the mail registration rate in our study (0.54%). For emails, however, the DMA reported only a 0.1% response, which is lower than the 0.16% email registration rate in our study.35 To be sure, there is little basis for comparing our results with those of the DMA survey. The study samples are likely quite different, and the DMA reported response rates, whereas our results were for actual registration. Nonetheless, in both our study and the DMA report, direct mail outperformed email. These results suggest that although email is less costly, it seems to be less effective at eliciting responses than direct mail. Email can, however, be beneficial when used in combination with direct mail, especially if supported by telephone outreach, for which the DMA reports a response rate of 9% to 10%.35
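The comparisons in this paragraph reduce to simple ratios. The short Python sketch below recomputes them from the figures quoted in the text; the variable and function names are illustrative, and the ratios are only as comparable as the underlying measures (response vs registration) allow.

```python
# Figures quoted in the text (rates are percentages).
mail_registration = 0.54    # this study, mail group
email_registration = 0.16   # this study, email group
dma_mail_response = 3.5     # DMA report, direct mail
dma_email_response = 0.1    # DMA report, email
adaptable_invited = 185_150
adaptable_responders = 1549
adaptable_consented = 355


def pct(numerator: int, denominator: int) -> float:
    """Express a count as a percentage of a total."""
    return 100 * numerator / denominator


adaptable_response_pct = pct(adaptable_responders, adaptable_invited)
adaptable_consent_pct = pct(adaptable_consented, adaptable_invited)
dma_vs_study_mail = dma_mail_response / mail_registration
study_mail_vs_email = mail_registration / email_registration
study_vs_dma_email = email_registration / dma_email_response

print(round(adaptable_response_pct, 2))  # 0.84
print(round(adaptable_consent_pct, 2))   # 0.19
print(round(dma_vs_study_mail, 1))       # 6.5
print(round(study_mail_vs_email, 1))     # 3.4
print(round(study_vs_dma_email, 1))      # 1.6
```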

One contributor to the lower email results is that email outreach is more likely than direct mail to encounter undeliverable addresses. This finding may reflect the ease of abandoning or changing an email address based on evolving preferences for internet service providers or email hosting services vs changing a mailing address, which typically necessitates a physical move. Do-not-contact requests may reflect personal values and preferences more than anything inherent in the outreach method; we found that the rate of do-not-contact requests was similar for direct mail and email.

As might be expected, health plan members who registered appeared to have more comorbidities based on DCCI scores in addition to the main targeted diseases of interest and greater use of medical services, especially ED visits. Also not unexpectedly, women were represented in larger proportions across the PPRNs and, likely, in the registration rates. This finding could reflect the greater prevalence of women in certain disease types, including MS, breast cancer, and ovarian cancer.

As reflected by the relatively poor registration rates in this study and for outreach efforts in general, 1 urgent goal among health researchers is to find ways to improve registration rates. In fact, more important than unrewarded efforts and costs are the inaccuracies that may result from assessing unrepresentative samples based on high levels of nonparticipants, who are essential for a full and complete understanding of key research questions. Edwards et al in a Cochrane Database review reported on USPS and email outreach for 481 and 32 eligible studies, respectively, to explain response rates.32 Response odds at least doubled with the addition of monetary incentives, recorded delivery (registered mail), and teasers on the envelope suggesting a benefit upon opening, among other approaches. The authors noted that the odds of response also increased with nonmonetary incentives, heterogeneity, personalized letters, handwritten addresses, postal stamped (not machine franked) return envelopes, and assurance of confidentiality. They noted that questions of a sensitive nature reduced the odds of response. For the 27 studies assessing email outreach, response odds increased more than 50% based on nonmonetary incentives, heterogeneity, shorter communications document, and inclusion of a statement that others had responded, among others. Responses improved by a third with the use of a lottery and immediate notification of the results.32 Aim 2 and future patient engagement efforts could benefit from the aforementioned features (eg, incentives, personalized contents) and clinician involvement to improve the success rate of member outreach.

To be sure, our findings have implications for outreach efforts for both PPRNs and health plans. Effective engagement of members is both critical and challenging, even daunting. Determining the most effective methods for member engagement requires a solid understanding of risk factors that affect patients' health and the non–health care factors that affect how they seek care and respond to health plans and providers. Active health plan members who are still in the workforce, as in this study, may be hardest to engage because they often perceive themselves at lower risk for a medical event, expect clear explanations of expected outcomes, understand how success is defined, and likely have the key impediment: They lack the time to respond to outreach letters and emails.36

Limitations

The results of aim 2 of this study might have been limited by incomplete outreach efforts, as outlined by Edwards et al,32 which could have resulted in efforts to engage members who were predisposed to decline invitations to participate. Part of the reason is that although members in the HIRE repository and across the PPRNs were identified through diagnosis and procedure codes, they are likely still in the workforce and have only limited time to respond to outreach letters, let alone participate in PPRN-type activities. Dillman et al discuss additional methodological implications for internet, phone, and mail outreach in mixed-mode surveys.37 Development of these methodologies was beyond the scope of our methods project given our focus on the development of linkage methods. Secondary data (administrative claims in this study) can be affected by coding inaccuracies, resulting in outcome misclassifications and inaccurate estimation of potential participation populations. Claims primarily facilitate billing and other financial transactions; when repurposed for research, they may have only incomplete data sets. This study employed deterministic matching, linkage methodologies, and hashing to identify overlapping members in PPRNs and health plans. Although highly effective, this approach has notable limitations that this team of authors discussed in a companion publication.27 In addition, the PPRNs might have been concurrently engaging in their own recruitment efforts, which could have affected the study results. Furthermore, the population in this study was commercially insured, precluding ready generalizability of these results to different member populations, such as those insured through Medicare and Medicaid.

PPRN Patient Structured Interview

The PPRN representatives who participated in this research were knowledgeable about their specific disease state. Their level of satisfaction with their PPRNs was high, although several were concerned about the future, as PPRN funding is ending for some and future funding is questionable. Most felt that there was value in PPRN and HPRN collaboration because it would enable a more holistic view of the patient and increase access to otherwise hard-to-reach patient populations.

Participants identified the biggest barriers to health plan collaboration as a distrust of health plan motives for gathering information, patients' short-term relationships with health plans (many change annually), a lack of transparency, and fear of restriction or reduced access to benefits as a result of the information gathered.

Strategies that can encourage health plan collaboration include aligning with trusted partners such as physicians or existing patient groups, being transparent about motives, sharing how information will be used, and reassuring patients that the collaboration results will not lessen or restrict access to benefits. Other factors that affect potential collaboration are the uncertain political climate and the unknown future of laws governing preexisting conditions and GINA.

Barriers to collaboration affect outreach efforts from health plans. Participants feel that addressing issues of trust, transparency, and overall depth of the relationship will increase the likelihood of a response from any type of outreach, although outreach is challenging through typical channels because it can be difficult to create attention by using email or regular mail. Initial outreach may need to be through social media or in person, perhaps even through existing patient groups or foundations to raise patient awareness, with direct follow-up through other channels. Also, the current climate encourages frequent changes in health plan providers, thus making it inadvisable to invest time in a relationship that will become irrelevant in the short term.

Disease-specific information from health plans is uncommon and unexpected for those with rare diseases. PPRN participants do not expect a health plan to provide the depth of information that would be useful to them. Overall, most agree that health plan access to experts is limited, and they expect the information provided to be basic, less patient specific, and not providing any new knowledge.

The goal of linking data is to have a better all-around picture of patient health. This view would be especially beneficial if patients had access to the linked data so that they could access their own medical history. This would also make data access easier for others who are researching patients within a particular disease state. However, privacy is also a concern. PPRNs believe that having the whole story in 1 place increases the risk of the data being used against patients to deny them treatment or coverage in the future.

There are limitations to the structured interview portion of our study. As a qualitative study, this research may be thought of as hypothesis generating and descriptive, and we sought to obtain a heterogeneity of perspectives across PPRNs. In addition, we interviewed only a small number of PPRN representatives. Furthermore, interviewees' memories could have been subject to recall bias. Future work will need to seek the overall opinions of health plan engagement in patient-oriented research from a broader PPRN patient population. The goal of the current work was to address patient leadership within the PPRNs as a key future component in disseminating the value of health plan contribution to evidence generation.

Overall, our structured interviews with patients and our methodology in PPRL for health plans and patient registries highlight a willingness to work together to improve PCOR. Health plans are increasingly engaging membership in these activities, as evidenced by PCORnet's ADAPTABLE trial.38

Conclusions

Aim 1

Develop PPRL methods for the anonymous linkage and computable phenotype confirmation.

This study demonstrated the ability to successfully use PPRL between health plan member–level data and PPRN membership data. Higher claims-based confirmation rates of PPRN conditions were found for self-reported diagnoses from members with longer uninterrupted insurance coverage in the same health plan. The strict-definition computable phenotypes analyzed in this study can be used to identify health plan members who do not currently belong to PPRNs so that they can be invited to join PPRNs and engage in related research opportunities while potentially shortening study recruitment timelines and reducing research costs.

Aim 2

Engage health plan members.

Our goal in aim 2 of this study—to invite health plan enrollees diagnosed with specific diseases of interest to register with disease-based PPRNs—was modestly successful. The study showed that outreach by regular USPS mailing outperformed email outreach, confirming other studies of outreach by direct mail. This study relied on the ability to successfully link health plan member–level data and PPRN membership data while ensuring the safety and security of member information through anonymous linkage to capture registration rates. Additional research is needed to test different outreach modes, including telephone outreach and other, more direct ways of capturing registration in PPRNs.

PPRN Patient Structured Interview

Our structured interviews with PPRN participants helped us better understand how those who participate in PPRN research view outreach from an HPRN and provided unique insight into issues related to interactions among HPRNs, CDRNs, and PPRNs in the conduct of patient-oriented research within PCORnet. The themes identified highlight the value of the HPRNs and collaboration; the ability of HPRNs to engage in patient outreach; data linkage and patient privacy concerns; and opportunities for HPRNs to engage in PCOR.

References

1.
Richesson RL, Hammond WE, Nahm M, et al. Electronic Health Records-Based Phenotyping. NIH Collaboratory Living Textbook of Pragmatic Clinical Trials. Published June 27, 2014. Accessed April 22, 2020. https://rethinkingclinicaltrials.org/resources/ehr-phenotyping
2.
Fleurence RL, Beal AC, Sheridan SE, Johnson LB, Selby JV. Patient-powered research networks aim to improve patient care and health research. Health Aff (Millwood). 2014;33(7):1212-1219. [PubMed: 25006148]
3.
Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc. 2014;21(4):578-582. [PMC free article: PMC4078292] [PubMed: 24821743]
4.
West SL, Johnson W, Visscher W, Kluckman M, Qin Y, Larsen A. The challenges of linking health insurer claims with electronic medical records. Health Informatics J. 2014;20(1):22-34. [PubMed: 24550563]
5.
O'Brien BC, Harris IB, Beckman TJ, Reed DA, Cook DA. Standards for reporting qualitative research: a synthesis of recommendations. Acad Med. 2014;89(9):1245-1251. [PubMed: 24979285]
6.
Rivera DR, Gokhale MN, Reynolds MW, et al. Linking electronic health data in pharmacoepidemiology: appropriateness and feasibility. Pharmacoepidemiol Drug Saf. 2020;29(1):18-29. [PubMed: 31950565]
7.
Wasser T, Wu B, Ycas J, Tunceli O. Applying weighting methodologies to a commercial database to project US Census demographic data. Am J Accountable Care. Accessed June 20, 2017. http://www.ajmc.com/journals/ajac/2015/2015-vol3-n3/applying-weighting-methodologies-to-a-commercial-database-to-project-us-census-demographic-data/p-2
8.
Bullano MF, Kamat S, Willey VJ, Barlas S, Watson DJ, Brenneman SK. Agreement between administrative claims and the medical record in identifying patients with a diagnosis of hypertension. Med Care. 2006;44(5):486-490. [PubMed: 16641668]
9.
Funch D, Holick C, Velentgas P, et al. Algorithms for identification of Guillain-Barre syndrome among adolescents in claims databases. Vaccine. 2013;31(16):2075-2079. [PubMed: 23474311]
10.
Lo Re V 3rd, Haynes K, Goldberg D, et al. Validity of diagnostic codes to identify cases of severe acute liver injury in the US Food and Drug Administration's Mini-Sentinel Distributed Database. Pharmacoepidemiol Drug Saf. 2013;22(8):861-872. [PMC free article: PMC4409951] [PubMed: 23801638]
11.
Wahl PM, Rodgers K, Schneeweiss S, et al. Validation of claims-based diagnostic and procedure codes for cardiovascular and gastrointestinal serious adverse events in a commercially-insured population. Pharmacoepidemiol Drug Saf. 2010;19(6):596-603. [PubMed: 20140892]
12.
Wahl PM, Terrell DR, George JN, et al. Validation of claims-based diagnostic codes for idiopathic thrombotic thrombocytopenic purpura in a commercially-insured population. Thromb Haemost. 2010;103(6):1203-1209. [PubMed: 20352159]
13.
ABOUT Patient-Powered Research Network (ABOUT Network). Accessed June 25, 2017. [Link no longer active] http://pcornet.org/patient-powered-research-networks/pprn8-university-of-south-florida/
14.
ArthritisPower (ARthritis Partnership with Comparative Effectiveness Researchers). Patient-Centered Outcomes Research Institute. Accessed April 22, 2020. https://www.pcori.org/research-results/2015/ar-power-arthritis-partnership-comparative-effectiveness-researchers-pprn
15.
iConquerMS: a patient-powered research network. Patient-Centered Outcomes Research Institute. Accessed April 22, 2020. https://www.pcori.org/research-results/2015/iconquerms-participant-powered-research-network
16.
What is the Vasculitis Patient-Powered Research Network (V-PPRN)? Rare Diseases Clinical Research Network. Accessed April 22, 2020. https://www.rarediseasesnetwork.org/cms/vcrc/Research/VPPRN
17.
Secure Hash Standard (SHS). National Institute of Standards and Technology. Published March 6, 2012. Accessed April 22, 2020. https://www.nist.gov/publications/secure-hash-standard-shs
18.
Kijsanayotin B, Speedie SM, Connelly DP. Linking patients' records across organizations while maintaining anonymity. AMIA Annu Symp Proc. 2007:1008. [PubMed: 18694107]
19.
Mushlin AB, Bell C, Brown C, et al. Mini-Sentinel: anonymous linking of distributed databases. Published September 24, 2013. Accessed April 22, 2020. https://www.sentinelinitiative.org/sentinel/data/complementary-data-sources/anonymous-linking-distributed-databases
20.
Brown AP, Borgs C, Randall SM, Schnell R. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets. BMC Med Inform Decis Mak. 2017;17(1):83. [PMC free article: PMC5465525] [PubMed: 28595638]
21.
Weber SC, Lowe H, Das A, Ferris T. A simple heuristic for blindfolded record linkage. J Am Med Inform Assoc. 2012;19(e1):e157-e161. doi: 10.1136/amiajnl-2011-000329 [PMC free article: PMC3392854] [PubMed: 22298567] [CrossRef]
22.
Whyte JL, Engel-Nitz NM, Teitelbaum A, Gomez Rey G, Kallich JD. An evaluation of algorithms for identifying metastatic breast, lung, or colorectal cancer in administrative claims data. Med Care. 2015;53(7):e49-57. [PubMed: 23524464]
23.
Mercuri E, Muntoni F. Muscular dystrophies. Lancet. 2013;381(9869):845-860. [PubMed: 23465426]
24.
Kim SY, Servi A, Polinski JM, et al. Validation of rheumatoid arthritis diagnoses in health care utilization data. Arthritis Res Ther. 2011;13(1):R32. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3241376/ [PMC free article: PMC3241376] [PubMed: 21345216]
25.
Sreih AG, Annapureddy N, Springer J, et al. Development and validation of case-finding algorithms for the identification of patients with anti-neutrophil cytoplasmic antibody-associated vasculitis in large healthcare administrative databases. Pharmacoepidemiol Drug Saf. 2016;25(12):1368-1374. [PMC free article: PMC5135635] [PubMed: 27804171]
26.
Quan H, Sundararajan V, Halfon P, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130-1139. [PubMed: 16224307]
27.
Agiro A, Chen X, Eshete B, et al. Data linkages between patient-powered research networks and health plans: a foundation for collaborative research. J Am Med Inform Assoc. 2019;26(7):594-602. [PMC free article: PMC7647185] [PubMed: 30938759]
28.
Willis GB. Analysis of the Cognitive Interview in Questionnaire Design. Oxford University Press; 2015.
29.
Moser A, Korstjens I. Series: Practical guidance to qualitative research. Part 3: Sampling, data collection and analysis. Eur J Gen Pract. 2018;24(1):9-18. [PMC free article: PMC5774281] [PubMed: 29199486]
30.
Consortium PCP, Daugherty SE, Wahba S, Fleurence R. Patient-powered research networks: building capacity for conducting patient-centered clinical outcomes research. J Am Med Inform Assoc. 2014;21(4):583-586. [PMC free article: PMC4078295] [PubMed: 24821741]
31.
Curtis LH, Brown J, Platt R. Four health data networks illustrate the potential for a shared national multipurpose big-data network. Health Aff (Millwood). 2014;33(7):1178-1186. [PubMed: 25006144]
32.
Edwards PJ, Roberts I, Clarke MJ, et al. Methods to increase response to postal and electronic questionnaires. Cochrane Database Syst Rev. 2009(3):MR000008. doi: 10.1002/14651858.MR000008.pub4 [PMC free article: PMC8941848] [PubMed: 19588449] [CrossRef]
33.
Mikuls TR, Saag KG, Criswell LA, et al. Mortality risk associated with rheumatoid arthritis in a prospective cohort of older women: results from the Iowa Women's Health Study. Ann Rheum Dis. 2002;61(11):994-999. [PMC free article: PMC1753931] [PubMed: 12379522]
34.
So R, Shinohara K, Aoki T, Tsujimoto Y, Suganuma AM, Furukawa TA. Effect of recruitment methods on response rate in a web-based study for primary care physicians: factorial randomized controlled trial. J Med Internet Res. 2018;20(2):e28. doi: 10.2196/jmir.8561 [PMC free article: PMC5824098] [PubMed: 29422450] [CrossRef]
35.
Response rate, by select direct media. Marketing Charts. Accessed April 22, 2020. https://www.marketingcharts.com/featured-53645/attachment/dma-response-rate-for-select-media-apr2015
36.
37.
Dillman DA, Smyth JD, Christian LM. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. 4th ed. John Wiley & Sons; 2014.
38.
Shi Q, Shambhu S, Marshall A, et al. Role of health plan administrative claims data in participant recruitment for pragmatic clinical trials: An Aspirin Dosing: A Patient-centric Trial Assessing Benefits and Long-term Effectiveness (ADAPTABLE) example. Clin Trials. 2020:1740774520902989. doi: 10.1177/1740774520902989 [PubMed: 32009464] [CrossRef]

Acknowledgments

The authors would like to thank Bernard B. Tulsi, senior medical writer at HealthCore, who provided writing and editorial support for this report. Jessee Young and David (Marc) Cram, senior developers at HealthCore, created and tested the hashing algorithm. We thank our HealthCore team (Mark Cziraky, Elaine Rose-Kennedy, Sonali Shambu, Andrea DeVries, Dianna Hayden, Zhengzheng Jiang, Michael Mack, Amanda Marshall, Mark Paullin, Rebecca Merkh, Gurvaneet Sahota, Tracey Quimbo, and Jennifer Ostertag-Stretch); our iConquerMS colleagues (William Tulskie, Leonid Kagan, Kenneth Buetow); and our ArthritisPower colleagues (Lang Chen, Shou Yang, Robert Matthews) for programming and administrative support. We also thank our patient representatives: Sue Friedman and Marleah Dean Kruzel from ABOUT Network, Kelly V. Clayton from ArthritisPower, Laura Kolaczkowski from iConquerMS PPRN, Debbe McCall from Health eHeart Alliance, Kent Bressler from NephCure, and George Casey from V-PPRN. We also thank Kelly Gavigan (data scientist) from Global Healthy Living Foundation.

This work was funded by a Patient-Centered Outcomes Research Institute (PCORI) contract for “A Model for Improving Patient Engagement and Data Integration with PCORnet Patient-Powered Research Networks and Payer Stakeholders” (ME-1503-28785). ArthritisPower was supported through a PCORI award (PPRN-1306-04811). ABOUT Network (American BRCA Outcomes and Utilization of Testing) was supported through a PCORI award (PPRN-1306-04846). iConquerMS (Multiple Sclerosis) was supported through a PCORI award (PPRN-1306-04704). V-PPRN (Vasculitis Patient-Powered Research Network) was supported through a PCORI award (PPRN-1306-04758).

Research reported in this report was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (#ME-1503-28785). Further information available at: https://www.pcori.org/research-results/2015/developing-methods-link-patient-records-across-data-sets-preserve-patient

Appendices

Appendix A.

Dissemination Plan (PDF, 105K)

Appendix B.

Example Member Mailer (PDF, 231K)

Appendix E.

Glossary of Terms (PDF, 102K)

Institution Receiving Award: HealthCore, Inc.
Original Project Title: A Model for Improving Patient Engagement and Data Integration with PCORnet Patient-Powered Research Networks and Payer Stakeholders
PCORI ID: ME-1503-28785

Suggested citation:

Haynes K, Agiro A, Chen X, et al. (2020). Developing Methods to Link Patient Records across Data Sets That Preserve Patient Privacy. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/06.2020.ME.150328785

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors or Methodology Committee.

Copyright © 2020. HealthCore, Inc. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/)

Bookshelf ID: NBK593580; PMID: 37535796; DOI: 10.25302/06.2020.ME.150328785
