NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Leavy MB, Schur C, Kassamali FQ, et al. Development of Harmonized Outcome Measures for Use in Patient Registries and Clinical Practice: Methods and Lessons Learned [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2019 Feb.

Key Project Activities

Activities for each of the clinical areas were phased in over the project period. While the process followed was essentially the same for each of the clinical areas, some refinements were made over the course of implementation to address the specific needs and preferences of each workgroup and to reflect the project team’s improved understanding of the most effective approaches.

Selection of Clinical Areas

The first step in the process was to select the five clinical areas that would serve as the focus of the workgroups and for which standardized data libraries would be developed. In order to start the harmonization process quickly, AHRQ and the project team selected atrial fibrillation (AFib) as the initial clinical area at the project outset. AFib affects between 2.7 and 6.1 million people nationwide, resulting in more than 750,000 hospitalizations, contributing to 130,000 deaths, and costing the U.S. more than $6 billion each year.10 Several registries focused on cardiovascular disease exist at the local, regional, and national levels and are well established as resources for quality improvement and research, although they collect different data. Additionally, several consensus-based efforts aimed at harmonization and standardization exist, providing a foundation for the work of this project. Furthermore, effective February 8, 2016, CMS issued a Coverage with Evidence Development decision, which stated that percutaneous left atrial appendage (LAA) closure therapy would be covered for patients with non-valvular atrial fibrillation under specific conditions. Among these is the noteworthy requirement that the “patient is enrolled in, and the treating physician team is participating in a prospective national registry,”11 thus offering a timely opportunity to test the ability of the OMF to support harmonization of outcome measures within the context of a patient registry.

The selection process for the remaining four clinical areas was iterative, with the project team conducting background research on specific conditions, discussing potential areas with knowledgeable stakeholders, and presenting the results of these efforts to AHRQ staff leads and representatives from the National Library of Medicine (NLM) and the U.S. Food and Drug Administration (FDA) collaborating with AHRQ on this project. This process, described below, took place over the first eight months of the project, in a phased approach, and resulted in the selection of asthma, depression, non-small cell lung cancer (NSCLC), and lumbar spondylolisthesis (see Appendix 1 for more information about each clinical area).

Considerations for Selection

In selecting the remaining four clinical areas, the project team sought to identify a varied set of conditions in terms of the following dimensions:

  • Patient populations affected, including prevalence in a range of high priority populations, e.g., women, children, and minorities as well as persons living in rural and urban areas, and persons with both public and private health coverage;
  • Significant disease burden with respect to prevalence and spending; and
  • Multiple treatment modalities and care provided by multiple specialties.

Other criteria used to assess the suitability of clinical areas for selection included:

  • Number and maturity of existing registries collecting patient outcomes along with the extent of overlapping outcome measures within identified registries; and
  • Prior attempts at registry and/or outcome harmonization, as identified in the literature review completed in the base year of the contract and through discussions with stakeholders.

Compiling Information on Potential Clinical Areas

In order to inform the selection process, the project team compiled information on each of the clinical areas under consideration. While a more thorough search was undertaken once clinical areas were selected (as described in the next section of this report), initial work in identifying registries was conducted to inform the selection process. At this stage, the project team focused on the first two steps in the process described below, obtaining a preliminary count of the number of registries and examining their distribution in terms of number of enrolled patients, purpose, and source of funding.

Information on the factors of interest was assembled in a series of tables for presentation to AHRQ, with the goal of choosing clinical areas representing a broad group of populations and practice modalities to fully test the applicability and flexibility of the OMF. Overall, we identified and presented information on 21 clinical areas prior to final selections being made. (See Appendix 1 for a listing of the clinical areas considered and the relevant information compiled.)

For specific clinical areas, we engaged with experts in the field to learn about ongoing harmonization and standardization efforts and to gather perspectives on the utility of undertaking a new harmonization effort in those areas. For example, based on work completed by OMERACT (Outcome Measures in Rheumatology) related to rheumatoid arthritis, we contacted the American College of Rheumatology; in consultation with members of its quality measures subcommittee, we concluded that the extensive work already completed in the area obviated the need for another effort.

During this period, the team also conducted two Open Door Forum (ODF) webinars presenting an overview of the project, identifying the first few clinical areas selected, and soliciting input on the overall process as well as on additional conditions to target. There were approximately 50 attendees across the two webinars; several organizations provided recommendations and were instrumental in final selections.

Identifying Registries Within Clinical Areas

Once the clinical areas were selected, the project team focused on identifying and reviewing existing and newly launched patient registries. The objective was to identify all registries in a specific clinical area collecting information on patient outcomes. The project team established the following inclusion and exclusion criteria to evaluate registries:

  • Currently collects data or is planning to begin collecting data within one year
  • Enrolls patients in the United States
  • Meets the following definition of a patient registry:
    • An organized system that uses observational study methods to collect uniform data (clinical and other) to evaluate specified outcomes for a population defined by a particular disease, condition, or exposure and that serves one or more pre-determined scientific, clinical, or policy purposes.2

Registries were excluded if they did not collect patient outcomes (e.g., registries designed solely to track vaccination status). This process is summarized in Table 1.

Table 1. Steps in registry identification process.

The finalized list of registries for each clinical area was used for the next step in the process, recruiting participants for the registry workgroup.

Engaging Registries and Other Stakeholders

The workgroups for each of the five clinical areas were developed with several goals related to size, composition of registry and stakeholder representatives, and diversity in expertise and perspectives. In particular, the project team aimed to have each workgroup total 20 to 25 members, with 10 to 15 members representing registries and 5 to 10 individuals representing Federal agencies, payers, EHR vendors, health systems, patients, healthcare accreditation associations, provider associations and clinical societies, and pharmaceutical and device manufacturers. In addition to registry and stakeholder participants, the team recruited chairs or co-chairs and clinical consultants for each clinical area to help guide the workgroups.

The project team recruited registry workgroup members using the list of eligible registries described above to identify the principal investigators (PIs) or directors of the registries and contacted them using a standard email invitation and follow-up phone calls as needed. In each of the clinical areas, the list of registries was first prioritized internally, and the subset of registries deemed most relevant were targeted for outreach and recruitment. In general, the PI or a designee participated as the registry representative. Registry members contributed critical information including the definitions of outcome measures used in their registries, the data elements that comprise the outcome measures, and the data definitions for the various data elements used to calculate their outcome measures.

Stakeholder organizations were identified through a combination of recommendations from AHRQ and independent searches for organizations working to improve, advocate for, or support research and practice in the clinical areas; patient representatives, in particular, were targeted for recruitment because they sit at the nexus of patient-centered care. The executive staff of these organizations were invited to join the workgroup or to specify a designee. The project team focused on diversity of organization types and perspectives when recruiting stakeholder members. Stakeholders who agreed to participate attended the first and last workgroup meetings to provide their perspectives and to ensure the harmonized measures were useful and applicable across the learning health system. Table 2 below summarizes the rationale for including various stakeholder types in the harmonization efforts and describes how they use registry data within the learning health system.

Table 2. Stakeholder participation rationale.

The workgroup chairs helped shape session agendas, reviewed meeting materials, and helped moderate discussions during the workgroup meetings. Clinical consultants were selected based on their areas of expertise. For example, in NSCLC, both a thoracic surgeon and a medical oncologist were engaged as clinical consultants to ensure that different clinical perspectives were represented during the harmonization work. The clinical consultants helped review the outcome measures submitted by the participating registries and map them to the OMF. The consultants also provided input on outcome measures used in the minimum measure set and guided development and refinement of definitions and other materials to support the workgroups.

The numbers of registry and stakeholder participants for the five workgroups are shown in Table 3 below. There were between 12 and 15 registries represented in each workgroup, while the number of stakeholder organizations participating varied from 8 to 16. A listing of specific registries and organizations is provided in Appendix 2. In some cases, more than one representative from a registry or stakeholder organization participated in the workgroup meetings.

Table 3. Registry and stakeholder organization participation, by clinical area.

The registries participating in each clinical area workgroup represented a wide range of interests, including academic, industry (both pharmaceutical and device), Federal, and societies and associations, with purposes ranging from clinical to quality to patient experience to surveillance. Overall, most of the registries represented were focused in the United States, with a few international participants. Recruitment of international registries, while not a specific priority, was a challenge due to travel logistics and/or time zone differences for remote participation, particularly with NSCLC, which had many internationally-based registries. Appendix 2 lists all participating registries and stakeholders, including their representatives.

Convening Registry and Stakeholder Workgroups

The workgroups were convened for a series of five meetings to develop and refine a Minimum Measure Set (MMS), a set of measures intended to serve as the core set of recommended measures for registry data collection, and to discuss and harmonize outcome measure definitions. As shown in Table 4, the series of meetings generally followed the pattern of three virtual meetings (Meetings 1, 3, and 4) as well as two in-person meetings (Meetings 2 and 5) held in Washington, D.C.* Additionally, preparation and debrief conference calls were conducted for each of the five meetings with the chairs/co-chairs, clinical consultants, moderators, and project team to facilitate an efficient work process. Some of the activities for these conference calls included preparing relevant meeting materials, developing question prompts and reviewing key discussion topics, identifying potential informational gaps, and resolving any potential points of contention.

Table 4. Overview of workgroup meeting sequence.

Between meetings, virtual activities were conducted using a combination of web-based surveys and a cloud-based collaboration tool (Codigital). The web-based survey tool was consistently used in all five clinical areas to prioritize measures for inclusion in the MMS and for collecting other types of feedback. In addition to these surveys, Codigital was used to refine measure definitions in all workgroups, except for depression where the workgroup conducted all of its harmonization activities during the course of their meetings. Codigital is a real time, cloud-based tool for groups to generate and refine ideas, where specific questions or topics are posted and individual, anonymous responses are submitted for the group to view, edit, and rank. Because meeting time is limited and definition reconciliation requires substantial thought, the Codigital tool provided an additional opportunity for individuals to deliberate on definitions and interact with each other in a continuous, iterative manner.

Identification and Categorization of Outcome Measures

Following the organization of the workgroups, the project team identified and collected outcome measures for categorization in the OMF and potential harmonization as part of the MMS. The first step in the harmonization process, summarized in Figure 2, was to review the outcome measures submitted by the registries participating in the workgroups.

Figure 2. Harmonization methodology overview (flow chart): (1) collected outcome measures from registries and other relevant efforts; (2) categorized measures using the OMF; (3) built proposed minimum measure set; (4) harmonized definitions for measures in the minimum set; (5) identified characteristics to support risk adjustment; (6) produced final standardized library.

Collection of Outcome Measures

Registries participating in the workgroups were required to share their outcome measures. Specifically, for each outcome measure, registry representatives were asked to provide: (1) the outcome measure definition, (2) the data elements that comprise the outcome measure, and (3) the data definitions for each of the data elements used to calculate the outcome measure. For example, some registry representatives provided the outcome measures as defined by a study protocol, along with the study case report form and the accompanying data dictionary. In some cases, multiple discussions were needed to obtain the necessary details. For example, in atrial fibrillation, one registry provided ‘major bleeding’ as an outcome measure. Upon follow-up, the registry provided a reference to the published consensus definition of ‘major bleeding’ used in the registry. However, during further discussion, the registry clarified that it had modified the definition for feasibility purposes. Across the clinical areas, the project team found that stakeholders had varying levels of understanding of the purpose and requirements of the project, and individual conversations were most effective for obtaining the necessary documents.

While the member registries’ outcome measures served as a starting point to build the MMS, the project team conducted additional research to identify other relevant outcome measures and measure definitions that may not have been submitted by workgroup members. In some cases, measures were not used by any of the participating registries; in other cases, the project team identified a measure produced by another harmonization effort. This additional research allowed the project team to ensure as complete a set of outcome measures as possible for each workgroup, as well as to build upon other harmonization efforts.

Additional sources of measures were identified through discussions with registry representatives, clinical consultants, and workgroup chairs, and through environmental scans of ClinicalTrials.gov, Google Scholar, peer-reviewed journals, the Core Outcome Measures in Effectiveness Trials (COMET) Initiative database,12 and other relevant organizations and associations. These additional sources included consensus documents for broadly accepted definitions, findings from outcome measure-focused workgroups (e.g., the Asthma Outcome Workgroup sponsored by the NIH13), established value-based care models (e.g., the CMS oncology care model14), measures produced by the International Consortium for Health Outcomes Measurement (ICHOM),15 and endorsed quality measures (e.g., those listed in the National Quality Forum (NQF) database16).

Comparison of Outcome Measures Across Sources

The project team organized the outcome measures submitted by participating registries and those from other sources into a spreadsheet in order to (1) more easily compare measures across sources to identify similar or overlapping measures, and (2) sort the measures into the OMF categories. For each measure, the measure title, method of measurement (e.g., use of a validated instrument), timeframe, measure definition, reference and/or registry, and the numerator and denominator (when relevant) were compared. Additionally, the project team, sometimes with the help of the clinical consultant, sorted and placed the measures into the appropriate OMF categories (survival, clinical response, events of interest, patient reported, resource utilization, and experience of care). When measures could be classified into multiple categories, which most commonly occurred with patient-reported and clinical response outcomes, they were brought to the workgroups for resolution.
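The organization step described above can be sketched in code. The snippet below is a minimal illustration only; the measure entries and their category assignments are hypothetical examples, not the project's actual data, and the six category labels are the OMF categories named in the text.

```python
from collections import defaultdict

# The six OMF outcome categories named in the report.
OMF_CATEGORIES = [
    "survival", "clinical response", "events of interest",
    "patient reported", "resource utilization", "experience of care",
]

# Hypothetical measure rows, as they might appear in the comparison spreadsheet.
measures = [
    {"title": "All-cause mortality", "source": "Registry A", "category": "survival"},
    {"title": "Major bleeding", "source": "Registry B", "category": "events of interest"},
    {"title": "30-day readmission", "source": "Registry A", "category": "resource utilization"},
]

def group_by_category(measures):
    """Sort measures into OMF categories for side-by-side comparison."""
    grouped = defaultdict(list)
    for m in measures:
        if m["category"] not in OMF_CATEGORIES:
            raise ValueError(f"Unknown OMF category: {m['category']}")
        grouped[m["category"]].append(m["title"])
    return dict(grouped)

print(group_by_category(measures))
```

In practice, measures that fit more than one category (most often patient-reported versus clinical response) were not resolved mechanically but brought to the workgroups, as noted above.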

Within each clinical area, some registries collected similar or overlapping measures, for which harmonization was needed. However, many of the outcome measures collected through this effort were only captured in one or a small number of registries (as discussed further in the chapter below on Challenges and Lessons Learned). As a result, prioritization was necessary to focus the workgroup activities on the most clinically relevant and broadly applicable measures.

Prior to each workgroup’s second meeting, registry participants rated the patient and/or clinical relevance of each outcome measure concept and suggested any missing measure concepts that should be included in the MMS; this feedback was provided through a web-based survey designed by the project team. For the first clinical area (AFib), we used a 5-point Likert scale, and then transitioned to a 7-point Likert scale for the remaining workgroups, which provided more granularity in the results. This virtual activity yielded valuable information on which measures to include in the MMS; however, because of the subjectivity of the ratings, we did not rely solely on the results of the virtual activity. Instead, we used the ratings to frame the discussions of the MMS during subsequent workgroup meetings, so measures may have been included or dropped based on these discussions. Additionally, rather than asking about specific tools or scales, we asked workgroup members about measure concepts, which made it easier to focus discussions on the utility of a measure and to avoid debate (at least initially) about specific measurement tools.
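The rating step lends itself to a simple aggregation. The sketch below is illustrative only: the measure concepts and 7-point Likert scores are invented, and the report does not state how ratings were summarized, so a mean-based ranking is an assumption.

```python
from statistics import mean

# Hypothetical ratings: each measure concept maps to the 7-point Likert
# scores submitted by individual registry participants.
ratings = {
    "Stroke": [7, 6, 7, 5, 6],
    "Quality of life": [5, 6, 4, 6, 5],
    "Hospitalization": [4, 3, 5, 4, 4],
}

def rank_concepts(ratings):
    """Rank measure concepts by mean rating, highest first, to frame
    the workgroup discussion of the proposed minimum measure set."""
    return sorted(ratings, key=lambda c: mean(ratings[c]), reverse=True)

print(rank_concepts(ratings))  # highest-rated concept first
```

As the text notes, such rankings only framed discussion; measures could still be added or dropped during the meetings themselves.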

The project team, in collaboration with the clinical consultant and workgroup chairs, used the ratings to develop a proposed MMS that served as a starting point for discussion during each workgroup’s second meeting. During these discussions, we found that there were often measures that participants wanted to include but that were not widely applicable. Therefore, the team began grouping the measures into “minimum measures” and “supplemental measures”; this was meant to lessen the burden on registries by reducing the overall number of measures in the MMS and presenting the supplemental measures as optional, “nice to have” items.

Refining Minimum Measure Sets and Harmonization

Identifying the MMS and harmonizing the measures in the MMS was an iterative process that occurred over approximately five months and involved multiple workgroup meetings and virtual activities, using web-based surveys and a cloud-based collaboration tool (Codigital). The web-based survey tool was consistently used in all five clinical areas to prioritize measures for inclusion in the MMS, as discussed above, and for collecting other types of feedback. These topic-specific surveys between meetings were used to incorporate flexibility into the overall process, address problem areas as they arose during discussions, and capture workgroup input on issues requiring additional thought. For example, in the depression workgroup, a survey was used to solicit recommendations of depression-specific characteristics to be included in the OMF. In NSCLC, a survey was used to prioritize patient-reported domains and instruments. In addition to these surveys, Codigital was used by each workgroup to refine measure definitions, except for the depression workgroup as previously noted. For example, the asthma workgroup modified the exacerbation definition during the second meeting, but some questions remained following the meeting. A Codigital activity showing the proposed definition was sent to the workgroup after the second meeting, and workgroup members edited the definition and added comments with new ideas; the revised definition and comments were discussed at the following meeting until the group reached consensus.

Defining Participant, Provider, and Disease Characteristics

The first OMF domain describes characteristics of the participant, disease, and provider that are important for fully defining an outcome measure. These characteristics may be used to define the relevant patient population or to support appropriate risk adjustment. For each of the five clinical areas, the project team developed an initial list of participant, disease, and provider characteristics, which were then reviewed and refined through workgroup meetings and/or virtual activities. For asthma and depression, a web-based survey was used to refine the proposed characteristics. The results of these virtual activities were discussed by the group during subsequent meetings. In lumbar spondylolisthesis, characteristics were discussed during the second meeting and no additional virtual activity was required. The use of meeting time versus a virtual activity to define characteristics depended on the time allocation of meetings and the degree to which there was a pre-existing consensus surrounding characteristics in each clinical area. Although the project team encouraged workgroup members to limit inclusion of characteristics to those for which there is evidence in the peer-reviewed literature documenting their correlation with outcomes, there was some variation across the groups in the level of evidence justifying inclusion.

Defining Treatments

The second domain of the OMF describes treatment types and treatment intent. In general, it is critical to consider treatment options for two main reasons. First, understanding what types of treatments are included informs the outcome measures included in the MMS, as the measures need to be relevant and related to those treatment options. Second, the intent of each individual treatment may vary. Understanding the rationale and intent of a selected treatment is critical for choosing the appropriate outcomes for assessment. Additionally, introducing new treatment modalities could necessitate revisions to the outcomes of interest in a given clinical area.

Data Element Development: Measure Definitions and Data Element Descriptions

As a final step in the harmonization process, clinical informaticists mapped the narrative definitions (generated by the workgroups) to standardized terminologies to produce a Library of Common Data Definitions. Standardizing the definitions of the components that make up the harmonized outcome measures is important so that users can understand the level of comparability between measures across different systems and studies.

Development of Standardized Terminologies

The registry and stakeholder workgroups focused on harmonizing the narrative definitions of outcome measures. While use of a harmonized narrative definition has the potential to improve the comparability of information collected in different registries, narrative definitions still allow for inconsistency in data collection, particularly when data are abstracted from existing systems, such as EHRs. To improve consistency and reduce the burden of implementation, narrative definitions produced by the workgroups were translated into standardized terminologies to facilitate capture within an EHR. The project team’s clinical informaticists worked with clinical experts to map the narrative definitions to standardized terminologies, such as ICD-10, SNOMED, and LOINC.

For each measure, the recommended reporting period, initial population for measurement, outcome-focused population, and data criteria and value sets were defined. EHR data often will not contain all the requisite components of an outcome definition that would allow for the computational confirmation of that outcome. The approach used for this project was to gather the clinician’s assertion of an outcome condition and as much supporting evidence as possible, so that even where the expression logic cannot computationally confirm an outcome, some structured evidence might still be available.

Relationships between events raise a challenge because relationships are often not directly asserted in an EHR. Thus, where possible, relationships have been inferred based on time stamps and intervals. Where this is not possible (e.g., cause of death), the logic requires an asserted relationship.
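Inferring a relationship from time stamps can be illustrated with a short sketch. This is not the project's expression logic; the 30-day window, event names, and dates below are illustrative assumptions.

```python
from datetime import datetime, timedelta

def occurred_within(event_time, reference_time, window_days):
    """Infer a relationship between two EHR events when none is asserted:
    treat the event as related if it falls on or within `window_days`
    after the reference event. The window length is an assumption."""
    delta = event_time - reference_time
    return timedelta(0) <= delta <= timedelta(days=window_days)

# Hypothetical events: a procedure followed two weeks later by a bleeding event.
procedure = datetime(2018, 3, 1, 9, 0)
bleed = datetime(2018, 3, 15, 14, 30)
print(occurred_within(bleed, procedure, window_days=30))  # True under this window
```

Where no time-based inference is defensible (e.g., cause of death), the logic instead requires an explicitly asserted relationship, as the text notes.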

For each outcome, the following were defined:

  • An object representing the outcome condition itself: In many cases, the only structured data will be an assertion of an outcome, with all the supporting evidence being present in the narrative.
  • Fast Healthcare Interoperability Resources (FHIR) resources for evidence for the outcome: These include labs, diagnostic imaging, etc.
  • FHIR resources for additional relevant events: These might include procedures, encounters, etc.
  • Temporal aspects for all events: These allow for inferred relationships.
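The four components above can be pictured as a single structured record. The sketch below is a simplified, hypothetical illustration in plain Python, not actual FHIR resources or the project's schema; real records would carry coded values (e.g., SNOMED or ICD-10 codes) rather than display strings alone.

```python
# Illustrative outcome record: the clinician's assertion of the outcome
# condition, FHIR-style evidence resources, related events, and the
# temporal fields (dates/timestamps) that allow relationships to be inferred.
outcome_record = {
    "condition": {  # assertion of the outcome itself
        "code": {"system": "http://snomed.info/sct", "display": "Major bleeding"},
        "assertedDate": "2018-03-15",
    },
    "evidence": [  # labs, diagnostic imaging, etc.
        {"resourceType": "Observation",
         "display": "Hemoglobin drop >= 2 g/dL",
         "effectiveDateTime": "2018-03-15T14:30:00Z"},
    ],
    "relatedEvents": [  # procedures, encounters, etc.
        {"resourceType": "Procedure",
         "display": "LAA closure",
         "performedDateTime": "2018-03-01T09:00:00Z"},
    ],
}
```

The design mirrors the approach described above: even when the expression logic cannot computationally confirm the outcome from the evidence, the asserted condition and whatever structured evidence exists are still captured.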

Leveraging Existing Resources

A key goal of this project was to leverage existing resources and build connections across initiatives, where possible. To support that goal, the existing common data elements and value sets were used whenever possible. Existing common data elements and value sets were identified through review of four sources, as shown in Table 5.

Table 5. Sources of existing common data elements and value sets.

Each website has a specific, unique purpose, and data representations vary, so while there are some direct comparisons with similar use cases, there are also important differences in terms of both data structures and use cases. For example, eCQMs are based on the NQF’s Quality Data Model, expressed as HL7 Quality Reporting Document Architecture (QRDA) templates, whereas this project is based on FHIR version 1.8.0 objects. In addition, VSAC does not currently provide intensionally defined value sets, making comparison more difficult. For this project, comparisons were done based on enumerated lists. Results of the comparisons were documented in the narrative document for each Library of Common Data Definitions, and existing common data elements and value sets were used where appropriate.
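Comparing enumerated lists amounts to straightforward set operations. The sketch below is illustrative only: the particular ICD-10 codes chosen and the overlap metric are assumptions, not the project's documented comparison procedure.

```python
def compare_value_sets(ours, existing):
    """Compare two enumerated value sets (e.g., a project-defined set
    versus an existing VSAC set) by their code lists."""
    ours, existing = set(ours), set(existing)
    shared = ours & existing
    return {
        "shared": sorted(shared),
        "only_ours": sorted(ours - existing),
        "only_existing": sorted(existing - ours),
        "overlap": len(shared) / len(ours | existing),  # Jaccard similarity
    }

# Hypothetical comparison using ICD-10 atrial fibrillation codes.
result = compare_value_sets({"I48.0", "I48.1", "I48.2"},
                            {"I48.1", "I48.2", "I48.91"})
print(result["overlap"])  # 0.5
```

A summary like this, listing shared and non-shared codes, is the kind of result that could be recorded in the narrative document accompanying each library.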

Public Comment

As a final step in the overall harmonization process, the Libraries of Common Data Definitions for each clinical area were posted to the AHRQ website for four-week public comment periods. Public comments submitted through the AHRQ website were reviewed and the respective libraries revised as appropriate.

Footnotes

* In two of the clinical areas, AFib and NSCLC, the second and third meetings had to be reversed due to logistical challenges, so that Meeting 3 was in person and Meeting 2 was virtual.