Testing Two Patient Surveys for Diagnosing Rare Genetic Conditions

Structured Abstract

Background:

Patients with rare diseases may be the best source of information on their phenotypes (their physical features). The premise of our study is that “self-phenotyping” will be an accurate source of patient data and will empower patients, which may be particularly beneficial to the rare disease population. ClinGen, an NIH-funded resource, developed a patient self-phenotyping survey, GenomeConnect (https://www.genomeconnect.org/), which asks patient-friendly questions that are mapped to a small subset of Human Phenotype Ontology (HPO) terms (https://hpo.jax.org/app/). An alternative method for self-phenotyping is for patients to generate HPO terms for their condition directly. Because most HPO terms are medical and often unfamiliar to patients, the Monarch Initiative translated approximately a third of the HPO terms into layperson language (the other terms were too clinical to have a lay-friendly translation). This layperson version of HPO is a much larger set of HPO terms than those mapped to the GenomeConnect survey. However, the layperson HPO has not yet been incorporated into patient-centered applications, and neither the GenomeConnect survey nor the layperson HPO has been validated to effectively inform diagnosis in patients with rare diseases.

Objectives:

The overall objective was to understand how best to support patients providing descriptions of their own phenotypes. The specific objectives were to (1) develop a layperson HPO-based tool for patients to use for self-phenotyping (Phenotypr); (2) validate the GenomeConnect-mapped HPO profiles and layperson HPO profiles computationally through the Monarch Initiative's algorithms (https://monarchinitiative.org/) to determine their diagnostic utility relative to gold-standard clinical profiles; and (3) test the GenomeConnect survey and Phenotypr tool in participants with diagnosed genetic diseases to determine which was better at identifying the clinical diagnosis and which survey participants preferred.

Methods:

To evaluate the diagnostic utility of the layperson HPO and GenomeConnect survey, we created derived disease profiles for every known Mendelian disease in the Monarch corpus (∼7600 diseases, encoded using the Mondo disease terminology). These disease profiles were subsets of the gold-standard Monarch profiles and were based on the GenomeConnect survey mappings to HPO and the layperson HPO subset of HPO. We compared each of the derived profiles against the corpus of Monarch gold-standard profiles and determined the degree to which the given disease was detected as the closest match.

We offered enrollment to participants aged ≥18 years, or the parent/guardian of a child aged <18 years, who were diagnosed with a genetic disease. We enrolled participants from the Boston Children's Hospital (BCH) genetics clinic, the BCH Down syndrome program, GenomeConnect, a 16p research registry, and the BCH Manton Center for Orphan Diseases Research. Participants were randomly assigned to the GenomeConnect survey, Phenotypr survey (the layperson HPO-based tool that we developed), or both. We compared the derived HPO phenotypic profile from each patient with the corresponding Phenotypr survey HPO profile and/or the GenomeConnect survey HPO profile (depending on which survey they completed, or if they completed both), as well as against the clinical-grade, gold-standard HPO profiles developed by Monarch. Multiple semantic similarity methods were used to determine which survey generated HPO terms that best matched the derived HPO profile and clinical gold-standard profiles. Our primary end point was a comparison of the similarity scores between the patient-derived HPO profiles from the GenomeConnect or Phenotypr survey and the simulated HPO profiles overall to see which survey, GenomeConnect or Phenotypr, yielded higher similarity scores on average and a tighter distribution compared with the simulated HPO profiles. We conducted qualitative interviews with a subset of participants to determine which survey participants preferred.

Results:

  • Objective A (aim 1). We developed Phenotypr, a layperson HPO-based software application, which uses an autocompletion format (the user starts typing and a menu of options appears starting with the typed letters). We adapted the preexisting GenomeConnect survey, which uses a multiple-choice format, for the study.
  • Objective B (aim 1). For 7344 known Mendelian diseases in the Monarch corpus, the diagnostic power (ie, the capability of returning the correct diagnosis) of layperson HPO profiles was greater than that of the GenomeConnect HPO profiles. This was measured by simulating patient responses for each disease, running them through a disease classifier, and plotting a receiver operating characteristic (ROC) curve (ie, the true-positive rate vs the false-positive rate). The layperson subset had an area under the ROC curve (AUROC) of 0.95, and the GenomeConnect subset had an AUROC of 0.91, suggesting that the layperson subset can better distinguish the correct disease.
  • Objective C (aim 2). We offered enrollment to 1061 individuals and enrolled 282 (26.5%). Participants completing the GenomeConnect survey generally had a higher median similarity score to their derived disease profile than did those completing the Phenotypr survey; therefore, the GenomeConnect survey was more accurate. However, the Phenotypr survey had a tighter distribution of scores for respondents who completed both surveys and was therefore more precise. We conducted 17 qualitative interviews, and participants generally preferred the GenomeConnect multiple-choice format over the autocomplete Phenotypr format.

Conclusions:

Both the GenomeConnect and Phenotypr surveys were useful in obtaining phenotype data directly from patients. The GenomeConnect survey was more accurate, and the Phenotypr survey was more precise. The use of the layperson HPO, developed as Phenotypr (objective A), could effectively inform diagnosis if performed accurately by patients (objective B). Phenotypr has a large vocabulary of layperson HPO terms and uses an autocomplete method. GenomeConnect maps to tenfold fewer HPO terms and uses a multiple-choice format. Future work will involve taking the strengths of each mode of survey to develop a combined approach to patient self-phenotyping (objective C).

Limitations:

The number of individuals with any given diagnosis was small, making comparisons between the 2 surveys for any given diagnosis challenging. In addition, the overall variety of types of diseases (eg, the heterogeneity of phenotypic profiles) was limited. These tools should be tested in larger and more-diverse populations and with patients with unknown diagnoses.

Background

Patients with an undiagnosed disease (a disease that is usually rare and for which the cause is unknown) spend a great deal of time on the web and in patient communities where they describe their signs and symptoms and try to find other patients with similar features that might provide a clue as to their diagnosis. These self-descriptions are often quite granular but are not expressed in a computer-recognizable form, and they are rarely used in clinical or informatics contexts. Patients with an undiagnosed disease also often go from clinician to clinician, each with their own subspecialty, who may not communicate with each other. Yet, a complete understanding of all aspects of the features that characterize a patient's condition (phenotyping) is vital to make a diagnosis and critical to inform the genetic analysis that may lead to the answer. The Human Phenotype Ontology (HPO) is a vocabulary of phenotypic abnormalities encountered in human disease (https://hpo.jax.org/app/) and was developed to facilitate “deep phenotyping,” whereby phenotypic findings (a phenotypic profile) are captured.1 The resulting “HPO profile” can be used to assist with identifying the most probable candidate disease, as well as with matching patients with similar phenotypes/genotypes as part of the Matchmaker Exchange,2,3 an online platform that connects researchers and clinicians to identify novel Mendelian disease genes using HPO terms (https://www.matchmakerexchange.org/).

Although the ease of analyzing genomic data has improved greatly,4 the collection of phenotypic data has not become more standardized or less expensive.5 One approach to phenotyping, that taken by the NIH Undiagnosed Diseases Program (UDP)6-8 and the expanded Undiagnosed Diseases Network (UDN), is to bring the patient to a medical center for comprehensive phenotyping by different subspecialists.6-8 The output of these evaluations includes a list of HPO terms that the clinical team puts together based on the findings from the evaluation and that characterize the patient's phenotypes. On the other hand, patients may be the only persons with all of the information about their phenotype. In addition, patients may have phenotypes that they know about but that do not become evident during clinical evaluations, either because clinicians do not ask about them or because the phenotypes are not easily observed in a clinical examination (eg, lack of production of tears).

GenomeConnect (https://www.genomeconnect.org/),9 conceived by ClinGen (an NIH-funded resource), developed a self-phenotyping survey whereby patients answer questions that describe their phenotypes, and the responses map to a small set of HPO terms. However, the GenomeConnect survey had not been validated to ensure that the HPO terms generated by the survey accurately reflect the patients' phenotype for diagnostic use, a vital step to demonstrate that patients' production of structured phenotypic data is accurate. An alternative method for self-phenotyping is for patients to generate HPO terms for their condition directly. Because most HPO terms are medical terms that are unfamiliar to patients, the Monarch Initiative10 (https://monarchinitiative.org/; Melissa Haendel, PhD, is a principal investigator [PI]) developed a layperson version of HPO, whereby a subset of the HPO was “translatable” into terms a layperson/patient would be more familiar with or more likely to use. The utility of the layperson version of HPO in self-phenotyping was unknown, as it had not been tested in patients.

The goal of this study was to make patient self-phenotyping as robust as possible, using the GenomeConnect survey or the layperson version of HPO, to generate HPO terms that accurately describe a patient's phenotype. The long-term objective is to integrate self-phenotyping into the evaluation of patients to facilitate clinical diagnostics and research. The potential impact of the research is that self-phenotyping may be a valuable addition to family history and diagnostic testing in identifying a genetic diagnosis for people who are not able to receive comprehensive phenotyping. It may save time and money, while empowering patients to take their care into their own hands. The specific aims of this PCORI-funded study were the following: (1A) to develop an application for patients to self-phenotype that uses layperson HPO terms; (1B) to computationally validate (ie, determine the diagnostic capability of) both the GenomeConnect survey and the layperson HPO software application (which we developed as Phenotypr); and (2) to validate patient-centered phenotyping in patients with diagnosed rare diseases.

Patient and Stakeholder Engagement

Stakeholders

We chose stakeholders who represented those impacted by the project. We chose 2 patient stakeholders who were advocates for individuals with rare and undiagnosed diseases; a genetics clinician and a researcher who would benefit from having patient self-phenotyping data; and a stakeholder from GenomeConnect who helped develop the GenomeConnect self-phenotyping survey that we used for the proposal.

Patient stakeholders:

  • Matt Might, PhD, director of the Hugh Kaul Personalized Medicine Institute at the University of Alabama Birmingham, president of NGLY1.org, and patient advocate/advisor for the UDN. Dr Might is the father of the first patient diagnosed with NGLY1 deficiency.
  • Ms Amy Clugston started Syndromes Without A Name USA (SWAN USA), an organization that offers support, information, and advice to families of children living with undiagnosed conditions.

Researcher and clinician:

  • Dr Alan Beggs, director of the Boston Children's Hospital (BCH) Manton Center for Orphan Disease Research, was the genetics researcher stakeholder.
  • Dr Olaf Bodamer, clinical chief of the BCH Division of Genetics and Genomics, was the clinical genetics stakeholder.

GenomeConnect:

  • Ms Erin Riggs, a board-certified genetic counselor and co-investigator of the ClinGen project, was a co-investigator of this proposal and a stakeholder representing GenomeConnect. Ms Riggs serves on the groups that developed the GenomeConnect Body Systems self-phenotyping survey evaluated in this study and oversees the GenomeConnect online community.

All stakeholder partners gave consent, through written communication, to list their names in this report.

Stakeholder Activities

Planning the Study

Ms Riggs, Dr Beggs, Dr Bodamer, and Dr Might were involved in planning the study and advised us on the study design and implementation.

Conducting the Study

Research decisions were made with equal input from each member of the stakeholder team. Particular attention was given to patient stakeholder input in areas that directly interface with patients or their families. We convened 2 in-person stakeholder meetings, 1 per year, and we had monthly stakeholder conference calls with the research team. Dr Might and Ms Clugston advised us on the implementation of the self-phenotyping surveys. Ms Riggs worked with us on validation of the GenomeConnect survey.

Perceived Impact on the Research

The patient stakeholders helped us refine the questions that we wanted to answer and the approaches to analyze the data to answer those questions. They advised us on how patients might use the surveys. Their input was important in making the surveys more user friendly. The clinician/researcher stakeholders helped us consider how the surveys would be implemented in the clinic and in research. The GenomeConnect stakeholder was instrumental in developing the online GenomeConnect survey and in relaying our suggestions to the GenomeConnect group.

NOTE: For the rest of the report, “we” refers to the study team and not the stakeholders.

Methods

Aim 1A Methods: Develop an Application for Patients to Self-phenotype That Uses Layperson HPO Terms

The Monarch Initiative, an international consortium to advance the diagnosis of rare diseases, developed a layperson version of HPO, but they had not developed an instrument for patients to use the layperson HPO for self-phenotyping. In the Innovation & Digital Health Accelerator at BCH, Dr Catherine Brownstein led the development of a layperson HPO online software survey application called Phenotypr for patient self-phenotyping. The GenomeConnect Body Systems survey (called the “GenomeConnect survey” in this report) was previously developed by ClinGen for patients to self-phenotype, but the online platform was not comparable with the Phenotypr platform. Given that a goal of our study was to compare Phenotypr with GenomeConnect in participants with genetic diseases to determine which survey was better at identifying the clinical diagnosis and which one the participants preferred (aim 2 of this study), it was necessary to develop an implementation of the GenomeConnect survey that was similar to Phenotypr. Both online surveys underwent iterative development in response to our stakeholders' suggestions and cognitive interviews. The output of both online surveys is the HPO terms for the participant's phenotype based on their responses to the survey.

Phenotypr Survey Development

Considerations for Phenotypr's design and development included the following:

  • Supporting autocomplete for layperson HPO terms. Autocompletion is a software function in which the user starts typing and a menu of options appears starting with the typed letters. The user can then choose which option best matches the word/phrase they were typing, making it easier to complete (a minimal sketch of this behavior follows this list).
  • Partitioning the autocomplete options into anatomically specific sections.
  • Allowing users to choose layperson HPO terms or medical HPO terms by including both sets of terms as autocompletion menu options.
  • Including a place for users to add terms (eg, phenotypes, features) they may have forgotten in the anatomical sections.
  • Generating and returning to the participant the list of HPO terms that they could download.
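
To make the autocomplete behavior concrete, below is a minimal Python sketch of prefix matching over a handful of layperson HPO labels. The term list, ranking, and function names are illustrative only and are not the Phenotypr implementation, which also handles synonyms, substring matches, and medical HPO labels.

```python
# Minimal sketch of prefix-based autocompletion over layperson HPO labels.
# The term list and ranking are illustrative only; the production Phenotypr
# application may also match substrings, synonyms, and medical labels.

LAY_TERMS = {
    "HP:0001263": "Developmental delay",
    "HP:0001250": "Seizure",
    "HP:0000421": "Nosebleeds",
    "HP:0002240": "Enlarged liver",
}

def autocomplete(prefix: str, terms: dict, limit: int = 10) -> list:
    """Return (HPO id, label) pairs whose label starts with the typed prefix."""
    prefix = prefix.strip().lower()
    matches = [
        (hpo_id, label)
        for hpo_id, label in terms.items()
        if label.lower().startswith(prefix)
    ]
    # Shorter labels first, as a crude proxy for the quickest useful match.
    matches.sort(key=lambda pair: len(pair[1]))
    return matches[:limit]

if __name__ == "__main__":
    print(autocomplete("se", LAY_TERMS))   # [('HP:0001250', 'Seizure')]
```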

GenomeConnect Survey Development

The GenomeConnect survey was being used by the GenomeConnect registry, but it had not been validated in patients. To compare the GenomeConnect survey with the Phenotypr application, we needed to provision a comparable computer interface for the GenomeConnect registry survey version, so we developed a customized survey presentation platform that was similar to the Phenotypr survey. We also collaborated with the GenomeConnect team to update a few questions and their HPO mappings to ensure precise mappings for each survey question.

Aim 1B Methods: Computationally Validate Both the GenomeConnect Survey and the Layperson HPO Application

Overview of Aim 1B

To computationally validate (ie, determine the diagnostic capability of) the HPO terms generated by the GenomeConnect survey, we “answered” the GenomeConnect survey questions for each known Mendelian disease and recorded the derived HPO profiles that were generated (see Figure 1 in the “Deriving and Simulating Patient Profiles” section for an example). Similarly, we derived layperson HPO profiles for each known Mendelian disease by slimming each disease profile to only the terms in the layperson HPO.

Figure 1. Wilms Tumor Phenotype Comparison Example.

HPO terms are organized hierarchically in different classes and subclasses.1 The HPO also provides a set of disease-to-phenotype annotations that are curated and ingested from online sources: Orphanet (a portal for rare diseases, https://www.orpha.net/consor/cgi-bin/index.php), OMIM (Online Mendelian Inheritance in Man, an online catalog of human genes and genetic disorders; https://omim.org/), and DECIPHER (DatabasE of genomIC variation and Phenotype in Humans using Ensembl Resources, an interactive web-based database incorporating tools to aid the interpretation of genomic variants; https://decipher.sanger.ac.uk/). Using the relational information between different HPO terms and disease-to-phenotype annotations, semantic similarity measures can be used to compare the HPO phenotypic profiles generated by patients through the GenomeConnect survey with those that would have been generated by clinicians against the Monarch gold-standard profiles in a clinical context. Semantic similarity and probabilistic measures leverage the node-based hierarchical structure of the ontology to make comparisons between graphs, such as different phenotypic profiles (which are nodes in a graph that represent a patient or a disease). The use of semantic similarity measures and probabilistic models in medical applications has increased over the past years, leading to a variety of algorithms to calculate semantic similarities between phenotypic profiles.11-14
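
As a concrete (toy) illustration of how the ontology structure supports such comparisons, the sketch below compares two small phenotypic profiles by the Jaccard overlap of their ancestor closures in a made-up hierarchy. The terms, hierarchy, and scoring are illustrative and far simpler than the measures actually used (see “Semantic Similarity Calculations” below).

```python
# Toy illustration of graph-based profile comparison: two phenotypic profiles
# are compared by the overlap of their ancestor closures in a tiny, made-up
# hierarchy (the real HPO has tens of thousands of terms and multiple parents).

TOY_PARENTS = {
    "abdominal pain": ["abdominal symptom"],
    "kidney inflammation": ["abnormal kidney"],
    "membranoproliferative glomerulonephritis": ["kidney inflammation"],
    "abnormal kidney": ["phenotypic abnormality"],
    "abdominal symptom": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}

def ancestors(term, parents):
    """All ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def profile_jaccard(profile_a, profile_b, parents):
    """Jaccard similarity of the ancestor closures of two term sets."""
    closure_a = set().union(*(ancestors(t, parents) for t in profile_a))
    closure_b = set().union(*(ancestors(t, parents) for t in profile_b))
    return len(closure_a & closure_b) / len(closure_a | closure_b)

clinical = ["membranoproliferative glomerulonephritis", "abdominal pain"]
layperson = ["kidney inflammation", "abdominal pain"]
print(round(profile_jaccard(clinical, layperson, TOY_PARENTS), 2))   # 0.83
```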

The basis of this analysis was a comparison of derived phenotypic profiles with the Monarch gold-standard profiles. Our goal was to determine whether the layperson portion of the HPO was diagnostically informative via patient self-phenotyping. The Monarch gold-standard corpus is constructed by creating HPO phenotypic profiles for each disease and merging information from equivalent diseases from OMIM and Orphanet. We simulated patients for each of the 7344 known Mendelian diseases by completing the GenomeConnect survey (eg, using the mappings to HPO for the survey question answers) or using the layperson HPO directly. The result of the GenomeConnect survey and Phenotypr was a derived HPO profile of the disease. Because each of the 7344 known Mendelian diseases has a curated HPO profile developed by the Monarch Initiative, we determined, for each disease, whether the resulting phenotypic profiles from the 2 self-phenotyping methods (the GenomeConnect survey and Phenotypr) identify that given disease after comparing against all 7344 diseases curated with the gold-standard HPO profiles. The results from the analysis allowed us to improve the GenomeConnect survey and Phenotypr to ensure that the output HPO profiles from the 2 surveys were as informative as possible when used in downstream applications that use HPO terms to make clinical diagnoses.

We analyzed the clinical utility of the GenomeConnect and Phenotypr layperson-derived profiles using a probabilistic model for computational disease diagnosis. The goal of this analysis was to evaluate areas in which the layperson and GenomeConnect subsets could be enhanced to aid diagnostic utility. To this end, we proposed the use of a classifier described by Bauer et al15 and named the Bayesian Ontology Query Algorithm (BOQA), which we call “Bayes.” The classifier takes a list of HPO terms as input and then outputs a list of diseases and their probability distribution. The classifier assumes that the phenotypes are caused by 1 disease.
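
The sketch below is a deliberately simplified, illustrative single-disease classifier in the spirit of BOQA: it scores each candidate disease by how well its annotated terms explain the observed terms and normalizes the scores into a probability distribution. The annotations and error rates are made up, and the real algorithm additionally propagates annotations through the HPO hierarchy.

```python
# Highly simplified, illustrative single-disease classifier in the spirit of
# BOQA: each candidate disease is scored by how well its annotated terms
# explain the observed terms, and scores are normalized into a probability
# distribution. The real algorithm also models term activation through the
# HPO hierarchy; the annotations and rates below are made up.

DISEASE_ANNOTATIONS = {
    "disease A": {"HP:0001250", "HP:0001263"},
    "disease B": {"HP:0001250", "HP:0002240"},
    "disease C": {"HP:0000421"},
}

def classify(observed, annotations, alpha=0.01, beta=0.1):
    """Return P(disease | observed terms) under a naive 'one disease' model.

    alpha: probability of reporting a term the disease does not explain.
    beta:  probability of missing a term the disease does explain.
    """
    observed = set(observed)
    scores = {}
    for disease, annotated in annotations.items():
        likelihood = 1.0
        for term in observed | annotated:
            if term in observed and term in annotated:
                likelihood *= (1 - beta)          # explained and reported
            elif term in observed:
                likelihood *= alpha               # reported but unexplained
            else:
                likelihood *= beta                # expected but not reported
        scores[disease] = likelihood
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}   # posteriors sum to 1

print(classify({"HP:0001250", "HP:0001263"}, DISEASE_ANNOTATIONS))
```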

Summary of Analyses

Definition of key exposures

For each known Mendelian disease, we created derived “perfect” HPO profiles generated by completing the GenomeConnect survey based on the existing mapping or by using the layperson HPO.

Definition of end points

We defined end points as the degree of similarity of the derived profiles from the 2 methods to the gold-standard disease phenotypic profiles within Monarch.

Deriving and Simulating Patient Profiles

For each survey, we created a derived disease phenotypic profile as described above. Figure 1 illustrates the derived profiles for the GenomeConnect survey and the Phenotypr survey for Wilms tumor (Mondo Disease Ontology [Mondo]:0006058). Terms fell into 1 of 2 categories:

  • Exactly matched (whether because the term was lay-accessible [eg, “abdominal pain”] or precisely translatable [eg, “increased urine output” translates to “polyuria”]); or
  • Imprecisely matched (eg, “membranoproliferative glomerulonephritis” is a clinical term and is more specific than “kidney inflammation”)

For each survey, we also simulated 20 patients each for 7344 rare diseases, generating 146 880 simulated profiles per survey. To create each simulated patient, we started with the derived profile based on the available HPO terms in the survey by constraining the patient profile to HPO terms that had a layperson or GenomeConnect translation. We then computationally and randomly created synthetic profiles from the derived phenotypic profiles by omitting terms, adding imprecision, and adding noise (defined as terms not associated with the disease). Terms that were omitted were picked randomly by the computer. Terms that were randomly selected to become less precise were replaced by their parent term. For example, if a disease was annotated to the term “distal lower limb amyotrophy,” we would select the term “lower limb amyotrophy” for the simulated profile. Noise was added by selecting random phenotypes in the survey subset not annotated to the target disease. We chose the number of terms selected for each category based on the size of the derived profile. Ten percent of the terms were made less precise, 20% of the terms were omitted, and 23% of the terms in the final profile were noise. For the simulated profiles based on the gold-standard phenotypic profiles, we increased these parameters because the Phenotypr and GenomeConnect derivation process naturally added omissions and imprecision. For these gold-standard simulations, 30% of the terms were made less precise, 40% of the terms were omitted, and 23% of the terms in the final profile were noise. We ensured that all simulated profiles were unique.
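
A minimal sketch of this perturbation procedure is shown below, using the proportions described above; the parent-term lookup and survey term pool are placeholders rather than the actual simulation code.

```python
import random

# Minimal sketch of the perturbation described above: starting from a derived
# disease profile, omit a fraction of terms, make a fraction less precise by
# replacing them with a parent term, and pad the result with noise terms drawn
# from the survey subset. The `parents` and `survey_terms` inputs are
# placeholders for the real HPO parent lookup and survey term pools.

def simulate_patient(profile, parents, survey_terms,
                     p_omit=0.20, p_imprecise=0.10, p_noise=0.23, rng=random):
    terms = list(profile)
    rng.shuffle(terms)

    # 1. Omit a fraction of the terms.
    keep = terms[int(round(p_omit * len(terms))):]

    # 2. Replace a fraction of the remaining terms with their parent term.
    n_imprecise = int(round(p_imprecise * len(profile)))
    for i in range(min(n_imprecise, len(keep))):
        keep[i] = parents.get(keep[i], keep[i])

    # 3. Add noise so that ~p_noise of the final profile is unrelated terms.
    n_noise = int(round(p_noise * len(keep) / (1 - p_noise)))
    candidates = [t for t in survey_terms if t not in profile]
    keep.extend(rng.sample(candidates, min(n_noise, len(candidates))))
    return set(keep)
```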

Semantic Similarity Calculations

To run the Bayes algorithm, we used OWLSim3, a Java package that contains a collection of semantic similarity metrics and probabilistic approaches to phenotype similarity and disease classification (https://github.com/monarch-initiative/owlsim-v3). We began with full sets of derived phenotypic profiles for each instrument and the sets of simulated patients as described previously. We iterated over the phenotypic profiles as input for Bayes and stored the rank and probability of the correct disease match. The disease “rank” refers to the position of the reference disease in the prioritized list of candidate diseases suggested by matching the derived or simulated phenotypic profiles against the gold-standard disease profiles. Diseases given the same score or probability were given the same rank. Using these data, we plotted a receiver operating characteristic (ROC) curve using the Scikit-learn and Matplotlib Python packages.
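
A minimal sketch of this plotting step, using the same scikit-learn and Matplotlib functions on made-up labels and scores, is shown below; the actual inputs were the classifier outputs for all simulated profiles.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

# Minimal sketch of the ROC analysis: for each (profile, candidate disease)
# pair we keep the classifier's probability and a label indicating whether the
# candidate is the profile's true disease, then plot TPR vs FPR.
# `y_true` and `y_score` below are tiny made-up vectors for illustration.

y_true = [1, 0, 0, 1, 0, 1, 0, 0]            # 1 = candidate is the true disease
y_score = [0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.05]  # classifier probability

fpr, tpr, _ = roc_curve(y_true, y_score)
print(f"AUROC = {auc(fpr, tpr):.2f}")

plt.plot(fpr, tpr, label=f"AUROC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False-positive rate")
plt.ylabel("True-positive rate")
plt.legend()
plt.show()
```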

Sensitivity Analysis

We hypothesized that the 2 sets of phenotypic profiles, those from the GenomeConnect and Phenotypr surveys, were biased toward certain categories of disease with more or fewer phenotypic features of certain types that are more readily observable. For example, musculoskeletal disorders may be disproportionately represented in the layperson HPO terms because these phenotypes are more easily observed and described in lay terms, whereas liver disease may be underrepresented because these phenotypes may not be as “visible” to the patient. To measure whether types of diseases were over- or underrepresented in each phenotypic subset of HPO terms, we performed an enrichment and depletion analysis. We collected 11 113 diseases and classes of disease that had at least 1 phenotype term (noting, as per above, that it was beyond the scope of this PCORI-funded study to evaluate each disease for precision or completeness of the gold-standard profiles). For each disease, we generated a contingency table of the number of phenotypes in the subset of the phenotypic profile associated with that disease, the number of phenotypes in the subset not associated with the disease, the number of phenotypes not in the subset associated with the disease, and the number of phenotypes not in the subset not associated with the disease. For each disease and disease category, we performed a Fisher exact test to characterize the distribution of phenotypic terms across our disease phenotypic profiles and adjusted the P values with Bonferroni correction.
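
A minimal sketch of this test for a single disease, using made-up counts, is shown below; the actual analysis repeated this for all 11 113 diseases and disease classes before adjusting the P values.

```python
from scipy.stats import fisher_exact

# Minimal sketch of the enrichment/depletion test for one disease: build the
# 2x2 contingency table of phenotype counts in/out of the HPO subset and
# in/out of the disease profile, run a Fisher exact test, and Bonferroni-adjust
# across all diseases tested. The counts below are made up for illustration.

in_subset_in_disease = 12
in_subset_not_disease = 4745      # rest of the layperson subset
not_subset_in_disease = 30
not_subset_not_disease = 9000     # rest of the HPO

table = [[in_subset_in_disease, in_subset_not_disease],
         [not_subset_in_disease, not_subset_not_disease]]
odds_ratio, p_value = fisher_exact(table)

n_tests = 11113                   # number of diseases/classes tested
p_adjusted = min(1.0, p_value * n_tests)   # Bonferroni correction
print(odds_ratio, p_value, p_adjusted)
```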

In performing this analysis, we assumed that the gold-standard HPO phenotypic profiles are adequately and accurately annotated. To test this assumption, we proposed a random sampling of 100 diseases. For each disease, we reviewed the literature and web resources and determined whether the HPO profiles were accurate and complete. As a pilot, we worked with a physician familiar with neurofibromatosis type 1 (NF1) to perform a review of current resources on this phenotypic profile. We determined that web resources have varying degrees of accuracy and provenance, and we decided that the originally proposed portion of the proposal that aimed to fill the gap in phenotypic terms on disease gold-standard phenotypic profiles could not be completed with the resources allocated.

Data Sources and Data Sets

GenomeConnect HPO mapping

We obtained the mapping from GenomeConnect survey questions to HPO terms from ClinGen (see a sample survey at https://clinicalgenome.org/site/assets/files/3469/genomeconnect_health_survey_-_2019.pdf). This subset includes 215 HPO terms.

Layperson subset

We used the 2018-10-09 version of the HPO OWL file to obtain the subset of terms with at least 1 layperson synonym (http://purl.obolibrary.org/obo/hp/releases/2018-10-09/hp.owl). This version of the HPO contains 4757 terms with at least 1 layperson synonym.
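
A sketch of one way to extract such a subset is shown below. It assumes the corresponding OBO release and the obonet package rather than the OWL file actually used; the release URL and the synonym-type check are assumptions rather than the study's extraction code.

```python
import obonet

# Illustrative sketch of pulling the layperson subset. The study used the
# 2018-10-09 OWL release; this sketch assumes the corresponding OBO release
# can be read with obonet and keeps terms that carry at least one synonym
# tagged with the "layperson" synonym type.

graph = obonet.read_obo(
    "http://purl.obolibrary.org/obo/hp/releases/2018-10-09/hp.obo"  # assumed URL
)

layperson_terms = {
    term_id: data.get("name")
    for term_id, data in graph.nodes(data=True)
    if any("layperson" in synonym for synonym in data.get("synonym", []))
}
# The report cites 4757 terms with at least 1 layperson synonym in this release.
print(len(layperson_terms))
```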

HPO phenotype to disease associations

Gold-standard annotations were generated from the HPO phenotype annotation data file, HPO version 2018-10-09, and Mondo disease ontology version 2018-10-26. Diseases marked as equivalent in Mondo were merged, including their annotations. Diseases that did not have an equivalent OMIM identifier were removed. This resulted in 7344 diseases and 109 053 annotations.

Aim 2 Methods: Validate Patient-Centered Phenotyping in Patients With Diagnosed Rare Diseases

Overview of Aim 2

Before administering the GenomeConnect survey and the Phenotypr survey, we conducted cognitive interviews to test the understandability and ease of use of both surveys. Feedback was incorporated into the final revisions of the surveys. The GenomeConnect and Phenotypr surveys were evaluated in individuals, or parents/guardians of individuals, with diagnosed known rare genetic diseases. The resulting HPO profiles generated from each survey were compared with the derived phenotypic profiles created for each known disease. We also determined, for both methods, whether the participant-provided HPO profiles identified each participant's diagnosed disease based on the Monarch gold-standard expert-curated annotations, and we evaluated the limits of phenotypic variability within a disease that can still lead to the correct diagnosis in those who have a known diagnosis. Throughout the study, we conducted qualitative interviews in which participants critiqued the tools, provided feedback, and told us whether they thought they might use their results, and if so, how.

Cognitive Interviews

Participant selection

We enrolled a population similar to that of the target population for evaluation of the GenomeConnect and Phenotypr surveys.

Inclusion criteria:

  • Aged ≥18 years
  • Diagnosed with a rare genetic disease OR the parent/guardian of a child aged <18 years diagnosed with a rare genetic disease
  • The individual with the rare genetic disease was living

Exclusion criteria:

  • Non-English speaking
  • Geneticist or had a background in HPO
  • Intellectual disability preventing completion of a survey
Recruitment

We asked our patient stakeholders to identify individuals who fit our inclusion/exclusion criteria who might be interested in participating. Potential participants contacted us directly. We sent them a consent form by email or paper mail and requested that they call the research coordinator once the consent form was received so that the research coordinator could review the consent form with them by phone. In the event that the participant sent back the consent form without completing a verbal consent call, the research coordinator called the participant and obtained verbal consent. The participants signed and returned the form by mail.

Interviews

The research coordinator conducted interviews by phone. The purpose of the interviews was to test the understandability and ease of use of both surveys. During the interview, each participant was asked to take the online survey, and then the research coordinator went back and asked them about each survey section, including what their thoughts were when answering, whether any issues were encountered, whether any questions were problematic, and how they thought the problems identified could be solved. We also addressed the language, comprehensibility, answer retrieval, and relevance of each question. The research coordinator took extensive notes. After the interview, we sent the interviewee an email with a $45 gift card as a token of appreciation.

Analytical and Evaluative Approach

For each survey, we conducted interviews until we saw themes that needed to be addressed and made changes in the survey to make it clearer and easier to use. We conducted subsequent interviews using the revised survey and continued the process until we did not detect more themes that needed to be addressed.

Assign Participants to Either the GenomeConnect Survey or the Phenotypr Survey

Research design and conduct
Participant selection

Total enrollment was expected to be 264 participants. The plan was for only the first 20 participants to complete both surveys, as we felt that it would be too burdensome to ask all participants to complete both surveys. The rest of the participants were randomly assigned, in a 1:1 ratio, to complete either the GenomeConnect or Phenotypr survey.

Initial enrollment plan (BCH genetics clinic)

Offer enrollment to 700 patients for an expected enrollment rate of 30%, or 210 participants.

  • The first 20 participants complete both surveys.
  • The remaining 190 participants complete 1 of the 2 surveys (95 participants per survey).
  • We targeted enrollment to participants with more-common genetic diseases in order to enroll 5 to 30 individuals with the same diagnosis.
Initial enrollment plan (Manton Center for Orphan Disease Research at BCH)

Offer enrollment to 90 patients for an expected enrollment rate of 60%, or 54 participants. All 54 were expected to complete 1 of the 2 surveys (27 participants per survey).

Changes to the protocol

Due to lower-than-expected enrollment, we made the following changes:

  • Offered enrollment to anyone seen in the genetics clinic over the past 1 to 3 years, starting with the past year and expanding to up to 2 years if needed (IRB Amendment 6, January 26, 2019).
  • Added enrollment sites: patient registries and clinics across BCH and participants enrolled in GenomeConnect and other external groups who had a rare disease and had agreed to be contacted about other research projects.

Final inclusion criteria:

  • Aged ≥18 years
  • The individual with the rare genetic disease was alive
  • Diagnosed with a rare genetic disease OR the parent/guardian of a child aged <18 years diagnosed with a rare genetic disease from:

    Manton Center – molecular diagnosis of the disease was confirmed.

    BCH clinics – genetic disease confirmed with the primary geneticist. (Confirmation of the diagnosis with genetic testing was not an inclusion criterion, as many genetic diagnoses are clinical and not confirmed with genetic testing.)

    Patient registries – had a genetic disease.

    GenomeConnect – had a genetic disease.

Exclusion criteria:

  • Non-English speaking
  • Geneticist by occupation or had a background in HPO
  • Intellectual disability preventing completion of a survey

Recruitment

Genetics clinic

We used i2b2 (Informatics for Integrating Biology and the Bedside) at BCH to determine the number of patients with each disease seen in the past 3 years. Patients were ascertained from the electronic health record (EHR) by the informatics team at BCH. We reviewed the notes from the genetics clinic and the laboratory studies to identify patients with a genetic diagnosis. We collected the patient's diagnosis, medical record number, address, email address (if available), and date of birth. Before offering enrollment, we emailed the primary geneticist to confirm the genetic diagnosis and to find out whether there were circumstances preventing us from contacting the patient. Our IRB required us to initially contact the family by paper mail to give them the opportunity to opt out of further contact about the study. We mailed a letter that explained the study. The letter contained our email address and phone contact information, and we asked them to contact us if they did not want to learn more about the study. We offered enrollment to those who did not contact us within 2 weeks of the letter being sent.

Process for offering enrollment

We sent the link to the survey by email or by paper mail (if we did not have an email address). Because the response rate was lower among those to whom we sent the link by paper mail, and we had email addresses for a majority of the potential participants, we changed our recruitment to contact only those with an available email address. The list of emails was provided to the Innovation & Digital Health Accelerator team staff, who sent emails with individualized links inviting participation. If the survey was not completed within a week, we sent a second request. Just before the end of enrollment, we sent a final (third) request to anyone who had not responded. We also posted flyers about the study in the genetics clinic.

Manton Center for Orphan Disease Research

The Manton Center genetic counselor formulated a list of Manton participants who had a genetic diagnosis and had agreed to be re-contacted by email. The research coordinator emailed a letter explaining the study and containing our email address and phone contact, and we asked them to contact us if they did not want to learn more about the study. We offered enrollment to those who did not contact us within 2 weeks of the letter being sent. Recruitment was conducted as described previously in the “Process for offering enrollment” subsection.

Patient registries and clinics across BCH divisions and departments

We partnered with the Down syndrome registry and the 16p registry. Collaborators overseeing the patient registry sent an email about the study to eligible participants and/or posted a notice about the study on their registry Facebook page; our contact email was included in the email and notice. Interested individuals contacted us by email, and we confirmed the diagnosis with them.

GenomeConnect/other external groups of interest

Ms Riggs of GenomeConnect and the genetic counselor for the Beggs Research Laboratory sent an informational email with our email contact information to individuals with a genetic diagnosis who were participants in GenomeConnect or the Beggs Laboratory research studies, respectively. Interested individuals contacted us by email, and we confirmed their diagnosis with them.

Survey Distribution

Initial plan

The goal was to compare the 2 surveys in (1) the same participant who completed both surveys; (2) participants with the same diagnosis, of whom some completed 1 survey and some completed the other; and (3) 2 participants with different diseases but with enough phenotypic similarities to make the comparison meaningful. To achieve this, patients were assigned to 1 of 2 methods, as follows:

  • The first group of potential enrollees would be offered both surveys until 20 who had completed both had enrolled. However, we found that the enrollment rates were similar (if not better) in those who were offered both surveys. In addition, after completing several qualitative interviews, we recognized the value of having the same person complete both surveys.
  • Subsequent enrollment from the genetics clinic targeted patients with common genetic diseases to enroll 5 to 30 individuals with the same diagnosis, allowing us to compare the 2 methods in participants with the same diagnosis. However, we found that we could not guarantee that within any given diagnosis there would be at least 1 person who completed the GenomeConnect survey and 1 who completed the Phenotypr survey. Also, after expanding our enrollment to anyone seen in the genetics clinic, we had a much larger group of diseases for which there was only 1 participant.
  • For participants with a newly identified or very rare disorder (the only one in the study with that condition), we planned to match participants based on the phenotypic similarities of their diagnosed diseases and create “aggregate” matched pairs. However, because not everyone to whom we offered enrollment would enroll, we could not guarantee that both members of a prospectively created aggregate pair would enroll; thus, creating matched pairs before enrollment was not feasible.
Changes to the recruitment plan

We significantly increased the number of individuals to whom we sent both surveys. We did not prospectively create aggregate pairs.

Data Collection

The GenomeConnect survey and the Phenotypr survey were administered via an external web interface. Each user was assigned a computer-generated unique ID. Once the data were properly entered, we captured them in our secure databases (for BCH, this was Research Electronic Data Capture [REDCap]) for analysis.16 More specifically, we created a cron job on the server that ran a script to export data from our local database into BCH REDCap. The script ran every morning at 8 am to export the data from the local database and enter them into the appropriate fields in REDCap. Before data export, a project was created in REDCap to reflect all the questions that were asked on the external web form. REDCap administrators added participants along with unique IDs and emails. These IDs were pulled into the local database to match the participants (to make the export process error free). REDCap provided a secure way to analyze data, create participant records, and manage users.
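
A minimal sketch of what such a nightly export script could look like is shown below; the database schema, field names, endpoint URL, and token are hypothetical, and the actual script and REDCap project differ.

```python
import json
import sqlite3

import requests

# Minimal sketch of the nightly export step: read survey responses from a
# local database and push them to REDCap through its record-import API.
# The schema, field names, URL, and token below are hypothetical placeholders.

REDCAP_URL = "https://redcap.example.org/api/"       # hypothetical endpoint
API_TOKEN = "REPLACE_WITH_PROJECT_TOKEN"

def export_to_redcap(db_path="surveys.db"):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT participant_id, question_id, answer FROM responses"
    ).fetchall()
    conn.close()

    # Reshape rows into one REDCap record per participant.
    records = {}
    for participant_id, question_id, answer in rows:
        records.setdefault(participant_id, {"record_id": participant_id})
        records[participant_id][question_id] = answer

    response = requests.post(REDCAP_URL, data={
        "token": API_TOKEN,
        "content": "record",
        "action": "import",
        "format": "json",
        "type": "flat",
        "data": json.dumps(list(records.values())),
    })
    response.raise_for_status()

if __name__ == "__main__":
    export_to_redcap()
```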

Incentives

After survey completion, we sent the participant an email with a gift card as a token of appreciation ($15 for those who completed 1 survey and $30 for those who completed 2 surveys).

Analytical and Evaluative Approach

Groups of respondents
  • Group 1. Participant completed both surveys. We alternated which survey was assigned first so that equal numbers were assigned the GenomeConnect survey and the Phenotypr survey first.
  • Group 2. Participants completed only 1 survey: there was at least 1 person with a given diagnosis who took the GenomeConnect survey and 1 who took the Phenotypr survey. Participants in the genetics clinic were purposely assigned within a diagnosis to each method in order to assign equal numbers to each method and were matched as much as possible by sex, race, ethnicity, and age.
  • Group 3. Participants who completed only 1 survey: there was only 1 person with the diagnosis OR everyone with a diagnosis, by chance, took the same survey even though we assigned participants within a diagnosis to each method.
  • Overall. Participants who completed 1 survey (combined groups 2 and 3).
Aggregated pairs and cohort analyses

Given that we were unable to match participants to create “aggregate” matched pairs prospectively, we explored the possibility of creating “cohorts” of participants with similar diseases at the end of the study. To generate cohorts, we performed single-linkage hierarchical clustering of the participants based on their phenotypes. After a review by 2 geneticists (including Dr Ingrid Holm), we concluded that many of the disease groupings were not medically meaningful. Instead, in our analysis, we approached all those with a single rare disease as 1 cohort.
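
A minimal sketch of this clustering step is shown below: each participant is represented as a binary vector of HPO terms, pairwise Jaccard distances are computed, and single-linkage hierarchical clustering is applied. The matrix, distance metric, and cut-off are illustrative assumptions, not the study's actual feature encoding.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Minimal sketch of the cohort-building step: participants as binary vectors
# over HPO terms, Jaccard distances, single-linkage clustering, and a distance
# cut-off to form cohorts. All values below are made up for illustration.

# rows = participants, columns = presence/absence of an HPO term
phenotype_matrix = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
], dtype=bool)

distances = pdist(phenotype_matrix, metric="jaccard")
tree = linkage(distances, method="single")
cohorts = fcluster(tree, t=0.6, criterion="distance")
print(cohorts)   # e.g. [1 1 2 2]: two candidate cohorts of similar participants
```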

Summary of Analyses

Key exposures

We defined key exposures as completion of the Phenotypr or GenomeConnect survey.

Primary end point

For each method (GenomeConnect and Phenotypr), we calculated the similarity scores between the patient-derived HPO profiles and the simulated HPO profiles developed in aim 1 to create an HPO profile for each survey. The similarity score is a measure of how similar 2 profiles are: the more similar 2 profiles are, the higher their similarity score. This analysis allowed us to determine how similar the participant-generated profiles were to the simulated profile (ie, if the survey was completed perfectly for that disease). Thus, our primary end point (PE) was an assessment of the survey's performance in the real (participant) vs ideal (simulated) world and how well participants could fill out the surveys relative to their theoretical maximum. The initial primary plan was to compare the means within each disease and within the aggregate pairs. However, we did not have enough participants with any given disease to compare within each disease and were unable to create aggregate matched pairs (see “Aggregated pairs and cohort analyses” subsection above). We revised the plan to compare similarity scores overall (among those who completed 1 survey) and within groups 1 to 3 (see “Groups of respondents” subsection above).

Secondary end points

For the secondary end points, we compared similarity scores between the patient-derived HPO profiles and the Monarch gold-standard disease HPO profiles. The Monarch disease HPO profiles are the “gold standard” (most accurate and used for clinical diagnostics currently), as the simulated profiles may have noise because they are the HPO profiles generated if a patient completed the survey perfectly. This analysis is of clinical significance because the clinician could potentially use the patient's HPO profile in combination with their own phenotyping and other data to perform analytics to identify which condition the patient has. For these analyses, we also determined a rank, that is, how far down the list is the actual disease when one uses the patient HPO profiles to identify the disease; this is an important consideration for the clinical utility of patient-generated HPO terms.

Initial secondary end points plan

We did not carry out the following secondary end points, as we did not have enough participants with any given diagnosis and did not create aggregate pairs. Our plan was to do the following:

  • Determine the distribution of similarity scores within each disease and aggregate pair.
  • Determine the degree of sensitivity to missing data.
  • Determine how well participants were able to specify their HPO profiles above random noise.
Revised secondary end points plan

Our plan involved the following:

  • For each method, determine the similarity between the patient-derived HPO profiles and the Monarch gold-standard disease HPO profiles. This was as initially planned.
  • Compare paired ranks to see which method identifies the correct clinical disease “sooner” going down the ranked list of diseases. For each method, the patient-derived HPO profiles generate a list of diseases in rank order of how well the profiles match each disease using the Monarch gold-standard disease profiles. The closer the patient's disease is to the top of the list (lower rank number), the more accurate the HPO profile is. This was a new secondary end point.
  • Determine whether the information from taking both surveys improved the ability to identify the disorder by aggregating the information from both surveys into 1 profile for those who answered both surveys (group 1) and assessing how well the combined profile matched the gold-standard profile. This was a new secondary end point.
  • For those who completed both surveys, assess whether survey order mattered by determining whether the HPO profile from the second survey that the participant completed matched the gold-standard profile better than the first survey they completed, suggesting that experience with any survey improves self-phenotyping over time. This was a new secondary end point.
  • To test if variability of disease mattered, we determined whether the methods provide higher similarity scores and lower ranks for diseases with less-variable phenotypes (eg, Down syndrome and NF1) than for diseases with more-variable phenotypes (eg, 16p deletion/duplication syndromes). This was a new secondary end point.
Plan for covariates

Our plan had been to adjust the analysis for covariates, including cohort (genetics clinic or the Manton Center), genetic diagnosis, and age. However, the groups did not separate out into the cohorts, we did not have enough individuals with most diagnoses, and there was no good rationale to adjust for age because the gold-standard annotations already take age of onset of phenotypic features into account. Thus, we did not use covariates.

Plans for missing data

HPO terms are generated based on the report of signs and symptoms AND the report of a lack of signs and symptoms. For both GenomeConnect and Phenotypr, we may not know if the data are missing or if the patient does not have that symptom. We planned to examine the probability distribution of randomized missing HPO terms at different levels of the hierarchy and to use these aberrant profiles for the comparison of patient profiles. However, we had too many diagnoses to conduct this analysis.

Analytic Approaches

The goal of this study was to compare the GenomeConnect and Phenotypr surveys as aids to patients with rare diseases who wish to contribute to the process of diagnosing their condition.

HPO profiles generated

Participant-derived HPO profiles were generated for each participant and for the survey(s) (GenomeConnect and/or Phenotypr) they completed:

  • PRO-1. Monarch gold-standard profile for each disease
  • PRO-2. Simulated GenomeConnect survey HPO profile for each disease created in aim 1
  • PRO-3. Simulated Phenotypr survey HPO profile for each disease generated in aim 1
  • PRO-4. Participant-generated HPO profiles using the GenomeConnect survey
  • PRO-5. Participant-generated HPO profiles using the Phenotypr survey
Profiles from our original plan that we did not generate
  • PRO-4a and PRO-5a: simulated GenomeConnect (PRO-4a) and Phenotypr (PRO-5a) survey HPO profile for each disease generated in aim 1 with randomized missing annotations as per above. Due to the heterogeneity in our participants, we did not perform this analysis.
  • PRO-6 and PRO-7: 1000 randomized simulated GenomeConnect (PRO-6) and Phenotypr HPO profiles (PRO-7) with the same information content distribution as the patient diagnoses in the 2 cohorts.
Semantic similarity scores, posterior probabilities, and ranks

The basis for the statistical comparisons was semantic similarity scores (comparison of the similarity of 2 sets of terms), posterior probabilities, and ranks (the position of the reference disease in the prioritized list of candidate diseases). To calculate the semantic similarity between phenotypic profiles, we used PhenoDigm, an algorithm that uses both information content (Resnik similarity) and Jaccard similarity to compute similarity.17 PhenoDigm is used to compute phenotypic similarity between patients and candidate disease genes in Exomiser, a variant prioritization tool.10,18,19 To run PhenoDigm, we used a Python library written for this project (https://github.com/monarch-initiative/hpo-survey-analysis).
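
The sketch below illustrates, in a simplified form, how such a score can combine an information-content (Resnik-like) component with a Jaccard component using best-match averaging. The toy ancestor sets and information-content values are made up, and the actual PhenoDigm/OWLSim implementation also normalizes against the theoretical best match for the profile.

```python
import math

# Simplified, illustrative PhenoDigm-style score: for each query term take the
# best-matching disease term, scoring a pair by the geometric mean of (a) the
# information content of the most informative common ancestor and (b) the
# Jaccard overlap of the ancestor sets, then average the best matches.
# The toy ancestor sets and IC values below are made up.

ANCESTORS = {          # each term's ancestors, including itself
    "t1": {"t1", "a", "root"},
    "t2": {"t2", "a", "root"},
    "t3": {"t3", "b", "root"},
}
IC = {"t1": 5.0, "t2": 5.0, "t3": 5.0, "a": 3.0, "b": 3.0, "root": 0.0}

def pair_score(term_a, term_b):
    common = ANCESTORS[term_a] & ANCESTORS[term_b]
    mica_ic = max(IC[t] for t in common)                       # Resnik component
    jaccard = len(common) / len(ANCESTORS[term_a] | ANCESTORS[term_b])
    return math.sqrt(mica_ic * jaccard)

def phenodigm_like(query_profile, disease_profile):
    best = [max(pair_score(q, d) for d in disease_profile) for q in query_profile]
    return sum(best) / len(best)

print(round(phenodigm_like(["t1", "t3"], ["t2", "t3"]), 2))   # 1.73
```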

In addition, we analyzed the data using the multiclass naive Bayes classifier (BOQA; see “Overview of Aim 1B” section above), which we call Bayes.15 This classifier assumes that the phenotypes are caused by a single disease. Therefore, the posterior probabilities for all diseases causing the phenotype sum to 1, in comparison with PhenoDigm, where the similarity scores for different profiles are computed independently. To run the Bayes algorithm, we used OWLSim3, a Java package that contains a collection of semantic similarity metrics and probabilistic approaches to phenotype similarity and disease classification (https://github.com/monarch-initiative/owlsim-v3).

For each survey, we repeated the analyses over the participant-generated phenotypic profiles as input and stored the rank and either the similarity score (PhenoDigm) or posterior probability (Bayes) of the diagnosed disease. The disease rank is the position of the participant's diagnosis in the prioritized list of candidate diseases corresponding to a profile. Diseases given the same similarity score or posterior probability were given the same rank. Note that for some diseases, there may be many diseases at the same rank based on phenotype similarity due to the general nature of the HPO term(s), or based on there being very few or very many terms (eg, diseases with only “intellectual disability” as a phenotype). For GenomeConnect, phenotypes that mapped to questions in which participants answered “no” or “I'm not sure” were not included in this analysis, as the lack of information did not contribute to our analysis.

Differences between PhenoDigm and Bayes

PhenoDigm provides a measure of similarity between 2 phenotypic profiles. It takes as input 2 profiles, for example, measuring similarity between a patient profile and a Monarch gold-standard disease profile, and outputs a score. Alternatively, Bayes uses the Monarch gold standard to compute prior probabilities, and then given a patient profile, it classifies the phenotypes as being caused by a disease by computing posterior probabilities that each disease is present conditional on the phenotype, with the assumption that the patient only has 1 disease. Bayes was not used for SIM 22 and 23 because a comparison operation is not available given that Bayes is a classifier rather than a calculation of similarity. The similarity scores/ranks derived for the analyses are shown in Table 1.

Table 1. Similarity Scores/Ranks Derived for the Analyses.

Analysis of Primary End Point

The primary intent of this study was to compare the GenomeConnect and Phenotypr surveys as aids to patients with rare diseases who wish to contribute to making a diagnosis for their condition. The goal of the PE was to assess each survey's performance in the real (participant) vs ideal (simulated) world and assess how well participants could fill out the surveys relative to their theoretical maximum. The measure of effectiveness is the similarity of the patient-derived HPO profiles (from the survey responses) to the simulated HPO profiles (derived in aim 1). We compared the similarity scores of the simulated HPO profiles (derived in aim 1B) with the patient-derived HPO profiles drawn from responses to the GenomeConnect (SIM 22) and Phenotypr (SIM 23) surveys. We compared them overall and within each group (see above) to determine which method yields (1) higher similarity scores and (2) a tighter distribution of scores across patient-derived profiles.

Statistical methods
Group 1 and group 2

For group 1 (both surveys) and group 2 (1 survey, with both survey types completed within a diagnosis), similarity scores cannot be considered independent. Therefore, we needed to use either a paired test OR the estimation of the variance, which ultimately influences P values, to take this clustering of observations into account. In addition, because the assumption of a normal distribution is violated for the similarity scores based on preliminary analyses from aim 1, we used nonparametric tests.

  • Given that the distributions of the similarity scores were skewed (meaning, they were not normal), the median is a better measure of central tendency than is the mean. To determine statistical differences in the median similarity scores between the methods, we used a quantile regression predicting the median and accounted for clustering of the paired observations.
  • To determine if the distribution of scores from one method is tighter, we used the Pitman test for equality of variances, which is a paired test that allows us to account for clustering of observations.
Overall analyses (all who did 1 survey, ie, groups 2 and 3) and group 3 (1 survey, with only 1 survey type completed within a diagnosis)

Respondents completed only 1 survey and are independent.

  • To determine statistical differences between the median similarity scores, we used a quantile regression (as described above).
  • To determine if the distribution of 1 method is tighter, we tested for the equality of variances between the 2 methods using a Levene test (a minimal sketch of these tests appears below).
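
Below is a minimal sketch of these tests on made-up similarity scores: a median (quantile) regression on survey method, a Levene test for independent respondents, and a Pitman-Morgan-style paired variance test. Clustered standard errors are omitted from the sketch, whereas the study additionally accounted for the clustering of paired observations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import levene, pearsonr

# Minimal sketch of the tests described above, run on made-up similarity
# scores. Clustered standard errors for the quantile regression are omitted.

rng = np.random.default_rng(0)
scores_gc = rng.normal(0.70, 0.10, size=60)   # GenomeConnect similarity scores
scores_ph = rng.normal(0.68, 0.05, size=60)   # Phenotypr similarity scores

df = pd.DataFrame({
    "score": np.concatenate([scores_gc, scores_ph]),
    "method": ["GenomeConnect"] * 60 + ["Phenotypr"] * 60,
})

# Median (quantile) regression: does the median score differ by method?
median_fit = smf.quantreg("score ~ method", df).fit(q=0.5)
print(median_fit.params)

# Levene test for equality of variances (independent respondents).
print(levene(scores_gc, scores_ph))

# Pitman(-Morgan)-style test for equality of variances with paired data:
# correlate the pairwise sums with the pairwise differences (the two score
# vectors are treated as paired here purely for illustration).
print(pearsonr(scores_gc + scores_ph, scores_gc - scores_ph))
```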

Analysis of Revised Secondary End Points 1 to 5

Secondary end point 1

The goal of secondary end point 1 (SE1) was, for each method, to determine the similarity between the patient-derived HPO profiles and the Monarch gold-standard disease HPO profiles. We used the same statistical methods as we used for the PE because the analyses were the same. The differences from the PE were that (1) patient profiles were compared with the Monarch gold-standard disease HPO profiles; and (2) we used 2 algorithms (PhenoDigm and Bayes). We compared SIM 24 (GenomeConnect) with SIM 25 (Phenotypr) and had 1 set of similarity scores for each algorithm.

Secondary end point 2

The goal of secondary end point 2 (SE2) was to compare paired ranks to see which method identifies the correct clinical disease sooner going down the ranked list of diseases. We derived the rank order scores from the similarity scores (RANK-14 compared with RANK-15); the ranks were calculated using both the PhenoDigm and Bayes algorithms. We determined which method (GenomeConnect or Phenotypr) (1) yielded, on average, a lower rank; and (2) had a higher percentage of ranks of ≤10. In addition, ranks provide an opportunity to compare the 2 algorithms to determine if one performs better in matching patient-derived profiles to either simulated or Monarch gold-standard HPO profiles. The similarity scores produced by the 2 algorithms are very different in scale, so they cannot be compared directly.

We used the following statistical methods:

  • Group 1 (both surveys) and group 2 (1 survey, with both survey types completed within a diagnosis). The same considerations apply for SE2 as in the PE.
    We compared GenomeConnect and Phenotypr as follows:

    To determine statistical differences between the median ranks between the methods, we used quantile regression as in the PE analysis (above).

    To determine which method yields more respondents where the correct clinical disease is identified at rank ≤10, we generated a dichotomous variable that indicated if this was the case for a given respondent and then used the method/survey type as an independent variable in a logistic regression predicting this dichotomous variable accounting for the clustering of respondents.

    We compared the PhenoDigm and Bayes algorithms as follows:

    To determine which algorithm (PhenoDigm or Bayes) is superior regarding median rank, we fit a quantile regression within each survey type with the algorithm as an independent variable accounting for the clustering of respondents.

    To determine which algorithm is superior in predicting if the clinical disease was correctly identified at rank ≤10, we used a logistic regression with the algorithm as the independent variable, accounting for the clustering of respondents.

  • Overall analyses (all who did 1 survey, ie, groups 2 and 3) and group 3 (1 survey, only 1 survey type completed within a diagnosis). Respondents completed 1 survey and are independent.
    We compared GenomeConnect and Phenotypr as follows:

    To determine statistical differences between the median ranks between the methods, we used a quantile regression predicting the median.

    To determine which method yielded more respondents where the clinical disease was correctly identified at rank ≤10, we used a Fisher exact test.

    We compared PhenoDigm and Bayes algorithms as follows:

    To determine which algorithm is superior with regard to median rank, we fit a quantile regression within each survey type with the algorithm as an independent variable.

    To determine which algorithm is superior in terms of predicting if a disease is correctly identified in the first 10 ranks, we used a Fisher exact test.
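
As a minimal illustration of the rank-based comparisons just described, the following Python sketch dichotomizes ranks at ≤10, fits a logistic regression with cluster-robust standard errors for the paired groups, and applies a Fisher exact test for the independent groups. The data frame and its values are hypothetical stand-ins, not study data.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical ranks of the correct clinical disease for each completed survey.
    ranks_df = pd.DataFrame({
        "respondent_id": [1, 1, 2, 2, 3, 3, 4, 5, 6, 7],
        "is_phenotypr":  [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
        "rank":          [4, 12, 25, 8, 7, 15, 3, 40, 60, 9],
    })
    # Dichotomize: was the correct disease among the top 10 candidates?
    ranks_df["top10"] = (ranks_df["rank"] <= 10).astype(int)

    # Paired/clustered case (groups 1 and 2): logistic regression on the survey-type
    # indicator with standard errors clustered by respondent.
    logit_fit = smf.logit("top10 ~ is_phenotypr", ranks_df).fit(
        disp=0, cov_type="cluster", cov_kwds={"groups": ranks_df["respondent_id"]}
    )
    print(np.exp(logit_fit.params["is_phenotypr"]))  # odds ratio, Phenotypr vs GenomeConnect

    # Independent case (group 3 and the overall one-survey analysis), illustrated on the
    # same toy data: Fisher exact test on the 2 x 2 table of survey type by top-10 status.
    table = pd.crosstab(ranks_df["is_phenotypr"], ranks_df["top10"])
    odds_ratio, p_value = stats.fisher_exact(table.values)
    print(odds_ratio, p_value)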

Secondary end point 3

The goal of secondary end point 3 (SE3) was to determine whether the information from completing both surveys improved the ability to identify the disorder. For group 1 (both surveys), we determined if the aggregate information from both surveys, on average, yields higher similarity scores and a tighter distribution than does GenomeConnect or Phenotypr alone (SIM 26 compared with SIM 24, and SIM 26 compared with SIM 25). We also determined whether the aggregate information identifies the disease at a rank of ≤10 compared with each method alone (RANK 16 compared with RANK 14, and RANK 16 compared with RANK 15). We used both the PhenoDigm and Bayes algorithms.
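
The aggregation itself can be read simply as pooling the HPO terms a respondent produced on the 2 surveys; the sketch below assumes a set union of the term sets (an assumption for illustration, not a description of the study pipeline) and uses illustrative HPO identifiers.

    # Hypothetical HPO term sets from one respondent's two surveys (IDs are illustrative).
    genomeconnect_terms = {"HP:0001250", "HP:0000750"}
    phenotypr_terms = {"HP:0001250", "HP:0004322", "HP:0001263"}

    # One simple reading of an "aggregate profile": the union of the terms from both
    # surveys, scored against the simulated and gold-standard profiles like any other profile.
    aggregate_profile = genomeconnect_terms | phenotypr_terms
    print(sorted(aggregate_profile))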

Statistical methods

We had to account for clustering (as described above) as follows:

  • To determine statistical differences between the median similarity scores for the aggregate single profiles, we used a quantile regression.
  • To determine if the distribution of the aggregate profiles is tighter than the single method, we used a Pitman test for equality of variances.
  • To determine statistical differences between the median ranks for the aggregate profiles compared with the single profiles, we used quantile regression.
  • To determine if the aggregate yields more respondents where the clinical disease was correctly identified at rank ≤10 than the single profiles, we performed logistic regressions (as described above) using the aggregate profiles compared with the single profile as an independent variable.
Secondary end point 4

The goal of secondary end point 4 (SE4) was to determine, for those who completed both surveys, whether survey order mattered, suggesting that experience with any survey improves self-phenotyping over time. For group 1 (both surveys) respondents, we determined which survey type they completed first. For each algorithm and survey type, we compared similarity scores and rank (SIM 22, SIM 23, SIM 24, SIM 25, RANK 14, and RANK 15) based on survey order.

Statistical methods

Because we compared respondents based on survey order, we did not need to account for clustering.

  • To determine statistical differences between the median similarity scores by survey order, we used a quantile regression using survey order as an independent variable.
  • To determine statistical differences between the median ranks by survey order, we used the same methodology, quantile regression.
Secondary end point 5

The goal of secondary end point 5 (SE5) was to test whether variability of disease mattered. We assessed whether results were on average better for diseases with less variability within the disease (Down syndrome and NF1) than for those with more variability within the disease (16p deletion/duplication syndromes). To maximize the sample size, in addition to respondents who only completed 1 survey, we included the first survey completed by respondents who completed both surveys. Within each algorithm and survey type, we compared SIM 22, SIM 23, SIM 24, SIM 25, RANK 14, and RANK 15. Respondents in these analyses completed only 1 survey and were considered independent. We used the same statistical methods as for SE4 (see “Secondary end point 4” subsection above).

Statistical Methods Relevant to All End Points

  • The final P values for all of the statistical tests were adjusted for multiple comparisons using the Benjamini-Hochberg adjustment because the Bonferroni adjustment is generally too conservative.
  • We calculated effect sizes to determine when differences in methods were substantial and to report the absolute value of effect sizes. We used the following effect sizes:

    For quantile regression, we standardized the continuous variables included so that they had a mean of 0 and a variance of 1. The resulting regression coefficients can be interpreted as effect sizes across all quantile regressions. We report absolute values that range from 0 to 1; higher numbers represent stronger effects.

    For logistic regression, we used standardized coefficients, which can be interpreted as odds ratios and serve as effect sizes across all logistic regressions. Odds ratios range from 0 to infinity. The closer to 1, the weaker the effect; the closer to 0 or infinity, the stronger the effect.

    For 2 × 2 contingency tables, where Fisher exact tests were used to assess the association between 2 variables, we also used odds ratios.

  • Effect sizes for equality of variance tests are not specifically developed, so they are not reported.
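
As a minimal illustration of the Benjamini-Hochberg adjustment noted in the first bullet above, the following Python sketch applies it to a hypothetical set of P values; the values shown are not the study's.

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    # Illustrative raw P values from a family of tests (not the study's actual values).
    raw_p = np.array([0.001, 0.004, 0.012, 0.030, 0.045, 0.20, 0.56])

    # Benjamini-Hochberg controls the false discovery rate and is less conservative
    # than Bonferroni, which divides the significance threshold by the number of tests.
    reject, p_adjusted, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
    print(list(zip(raw_p, p_adjusted.round(3), reject)))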

Postenrollment Qualitative Interviews

Participant selection

Initially, everyone who completed either 1 or both surveys was offered the chance to complete an interview. About halfway through the interview timeline, we realized that the interviews of people who completed both surveys yielded more helpful feedback, so we subsequently offered participation only to those who completed both surveys.

Inclusion criteria

Participants completed the GenomeConnect survey, the Phenotypr survey, or both.

Recruitment

In the email that we sent to participants who completed the survey, we invited them to email us if they were interested in completing an interview. The consent process was the same as for the cognitive interviews (see “Cognitive Interviews” subsection under “Overview of Aim 2”).

Interviews

The research coordinator called the participant using Zoom (https://bostonchildrens.zoom.us/). We developed an interview guide with questions regarding usability and familiarity on the part of the participant. We started with some general questions about the survey that the participant completed, including what they thought the purpose of the survey was, the value of the survey, how they thought the survey could be used, and if they thought the survey would have helped in their journey to obtain a diagnosis. We then asked participants if the survey was user-friendly, how it could have been changed to be easier to use, what was challenging or frustrating about the survey, the time it took to complete the survey, if they liked the look and feel of the survey, how completing the survey made them feel, and whether they had any additional concerns. If they completed both surveys, we asked these questions for both surveys. We asked the interviewee for permission to record the interview, and if they agreed, the interview was audio recorded and transcribed by the research coordinator. The research coordinator also took extensive notes. After the interview, we sent the interviewee an email with a $30 gift card as a token of appreciation.

Analytical and evaluative approach

The interview transcripts and the research coordinator's notes were reviewed by the research coordinator, who identified themes, and by the PI. The research coordinator and the PI then reconciled any discrepancies to create a catalog of the most prevalent themes. Data were analyzed using descriptive statistics, and recurring patterns were identified.

Results

Aim 1A Results: Develop an Application for Patients to Self-phenotype That Uses Layperson HPO Terms

Overview

The HPO application Phenotypr (http://body.phenotypr.com/phenotypr/#/terms-of-use) first asks patients to select all of the categories of signs and symptoms that apply to them. Once they choose the categories, the next page asks them to start typing in their signs and symptoms. Using an autocomplete format, any layperson HPO term or standard HPO term that includes that symptom appears in a list as the participant starts typing, and the participant chooses which term fits their symptom best.

The GenomeConnect Body Systems survey (https://body.phenotypr.com/phenotypr/genome-connect) asks if the participant has issues with 1 body system (organ system) at a time. If the participant clicks “yes” or “I am not sure,” the application takes them to a screen that has many signs and symptoms specific to that body system to choose from. The participant can click on as many of the signs and symptoms in that body system as they want and then move to the next body system.

The essential differences between Phenotypr and GenomeConnect are that (1) Phenotypr requires participants to enter a symptom to generate a list to autocomplete (ie, the user starts typing, and a menu of options starting with the typed letters appears, from which the user chooses the best option), whereas GenomeConnect provides a multiple-choice format; and (2) Phenotypr, because it uses the layperson HPO directly, includes a much greater number and higher specificity of HPO terms, whereas GenomeConnect uses a predefined set of answer choices that are mapped to many fewer HPO terms. Both instruments use lay language.

Phenotypr Survey

Phenotypr is a freely available tool that uses the layperson HPO, developed to allow patients to record layperson HPO phenotype terms for their condition. Phenotypr consists of an administrative tool for updating ontology versions, user and administrative permissions, and support for alternative implementations; a front-facing public site; and a relational database for securely housing the data. Autocomplete features were implemented by processing the HPO and using a search engine (https://github.com/monarch-initiative/hpo-plain-index). The user interface was implemented as a single-page application (https://github.com/monarch-initiative/phenotypr-body).

When the participant clicks on the link in the email, they see the opening page with a short introduction explaining Phenotypr (Figure 2). Participants then select the body systems that are affected by their condition (Figure 3). This step is intended to minimize the list of possibilities shown by the autofill feature. Participants are then invited to start entering their signs and symptoms (Figure 4).

Figure 2. Opening Page of Phenotypr.

Figure 3. Phenotypr Page to Select the Body System(s) Affected by the Condition.

Figure 4. Phenotypr Page to Enter Symptoms, Aided by Autofill.

Once the participant is done, they are asked if they found every symptom that they wanted to list or if they would like to enter more signs and symptoms (Figure 5). If the participant indicates that they did not find every symptom, they are then invited to continue filling out Phenotypr for any body system. This is necessary because sometimes a phenotype may appear to fit in more than 1 body system. For example, the HPO code for “autism” is in “behavioral/psychiatric,” but a layperson may select “brain/nervous system” (Figure 6).

Figure 5. Phenotypr Page to Add More Symptoms.

Figure 6. Phenotypr Page to Continue Filling Out Survey.

Participants are then asked to fill out basic demographic information (Figure 7). Finally, participants are given the opportunity to provide any information they feel was missed (Figure 8). Participants are then able to provide feedback (Figure 9).

Figure 7. Phenotypr Demographics Page.

Figure 8. Phenotypr Page to Provide Additional Information.

Figure 9. Phenotypr Page to Provide Feedback.

At the end of the survey, participants are given the HPO terms for the phenotypes they entered (Figure 10). Participants are given the opportunity to download the results as a PDF (Figure 11).

Figure 10. Phenotypr Page Where Participants Give Feedback and Can Download HPO Terms.

Figure 11. Phenotypr Downloaded HPO Terms.

GenomeConnect Survey

For the GenomeConnect survey, participants undergo a similar workflow: they click on the link in the email and see the opening page with a short introduction explaining GenomeConnect (Figure 12). In contrast to Phenotypr, there are specific questions to answer (Figure 13).

Figure 12. Initial GenomeConnect Page.

Figure 13. Example of Questions to Answer on the GenomeConnect Survey.

Participants fill in demographic questions (Figure 14). They are given the opportunity to enter additional symptoms or information (Figure 15).

Figure 14. GenomeConnect Demographic Questions.

Figure 15. Opportunity to Enter Additional Symptoms or Information on GenomeConnect.

Aim 1A Outcomes: Develop an Application for Patients to Self-phenotype That Uses Layperson HPO Terms

The results from this analysis, and the improvements made to the GenomeConnect survey and Phenotypr, ensured that the output HPO profiles were sufficiently informative, based on ontology metrics, and would be useful in downstream informatics applications that use HPO terms for diagnostics.

Aim 1B Results: Computationally Validate Both the GenomeConnect Survey and the Native HPO Application

Evaluation of Simulated Profiles

We measured the performance of the Bayes classifier given the 3 input simulations (the gold-standard simulations, layperson simulated, and GenomeConnect simulated) by plotting a ROC curve and recording the area under the ROC curve (AUROC) for each input set (Figure 16). This approach measures the classifier's ability to differentiate between diseases given 3 sets of input data. The full set of HPO terms (gold-standard simulations) performed the best. The layperson subset performed second best, and the GenomeConnect subset performed the worst, but with an AUROC of 0.913, which is considered good for many classification problems. These results make sense, as the lay HPO subset is approximately a third of the full HPO, whereas the GenomeConnect HPO subset is <2%; as such, diseases described with increasingly fewer phenotypes are expected to match gold-standard diseases with less precision. The percentage of matches in the top 10 for the lay HPO subset was 61%, and the percentage of matches in the top 10 for the GenomeConnect subset was 25%.
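
As a minimal sketch of this kind of AUROC evaluation in Python: the labels and scores below are hypothetical stand-ins for whether a candidate disease is the correct disease and the classifier's posterior probability for that candidate; this is not the study's evaluation code.

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    # Illustrative data: for each (simulated profile, candidate disease) comparison,
    # y_true marks whether the candidate is the correct disease and y_score is the
    # classifier's posterior probability (or similarity score) for that candidate.
    y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
    y_score = np.array([0.92, 0.35, 0.10, 0.67, 0.55, 0.05, 0.22, 0.80, 0.41, 0.13])

    auroc = roc_auc_score(y_true, y_score)             # area under the ROC curve
    fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points for plotting the curve
    print(round(auroc, 3))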

Figure 16. Comparison of Simulated Patients With the Full HPO Terminology (Gold Standard Simulated), the Layperson Subset, and the GenomeConnect Subset Tested on 7344 Diseases.

Based on the results, we provided feedback to the developer of HPO (https://hpo.jax.org/app/) to add new layperson synonyms to terms for diseases that performed the worst in our simulation. As a result, 5 layperson synonyms were added to the HPO. Given that the GenomeConnect simulation performed well, we did not suggest adding any new questions or multiple-choice terms. Although adding more terms would allow patients to better describe their signs and symptoms, we felt that it was not clear enough from the simulation which terms to prioritize without overwhelming the user with questions and options and without knowing which disease groups would be targeted in aim 2.

Sensitivity Analysis

We performed Fisher exact tests on 11 113 diseases and classes of disease for the layperson and GenomeConnect subsets of HPO terms to test for enrichment and depletion (eg, whether there were specific types of diseases that were over- or underrepresented) and corrected P values with Bonferroni correction (Tables 2a and 2b). For the layperson analysis, we tested each disease for the hypothesis that the proportion of lay phenotypes associated with that disease is higher than the proportion of lay terms associated with all other diseases. The P value therefore measures the probability of observing a proportion at least this extreme if the null hypothesis (that the same proportion of lay terms is annotated to the disease and to every other disease) were true. Low P values (eg, <.05) are an indication of evidence against the null hypothesis. We performed the same test for each disease comparing terms in the GenomeConnect subset and terms not in GenomeConnect. The GenomeConnect subset was enriched for 7480 diseases and classes of disease (based on a significant P < .01). The layperson subset was enriched for 4708 diseases and classes of disease. This suggests that many diseases are significantly enriched, relative to all other diseases, with terms that a layperson could provide (via either survey), and therefore could potentially be matched more correctly by patients' phenotyping. The GenomeConnect subset of HPO terms was most enriched for vascular, neoplastic, and conjunctival diseases. The layperson subset was most enriched for bone, odontologic, and connective tissue diseases. Neither the layperson nor GenomeConnect subset had significantly depleted disease associations. The layperson subset had 2 borderline-significant depleted diseases before P value correction: sick sinus syndrome 2 and skeletal muscle neoplasms (uncorrected P = .051 and .061, respectively). Disease classes enriched with layperson terms have more phenotypes with an observable physical manifestation, such as phenotypes related to development, for example, short stature, cleft lip, and a short ring finger. Vascular disorders are predominantly enriched in the GenomeConnect subset. This is likely because GenomeConnect asks several questions about primary and secondary phenotypes related to vascular disorders, such as, “What specific types of blood/bleeding issues have you had?” and “What specific heart or blood vessel problems have you had?”
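
The per-disease test can be sketched as follows; the 2 × 2 counts, which contrast lay vs non-lay HPO annotations for one disease against all other diseases, are hypothetical.

    from scipy.stats import fisher_exact

    # Illustrative counts for one disease (not actual study values):
    #   rows    = annotated to this disease vs annotated to all other diseases
    #   columns = term is in the layperson subset vs not in the layperson subset
    table = [[18, 7],          # this disease:   18 lay terms, 7 non-lay terms
             [4200, 5600]]     # other diseases: 4200 lay terms, 5600 non-lay terms

    # One-sided test of the hypothesis that the lay-term proportion is higher
    # for this disease than for all other diseases.
    odds_ratio, p_value = fisher_exact(table, alternative="greater")

    # Bonferroni correction across all tested diseases/classes (11 113 in the study).
    n_tests = 11113
    p_bonferroni = min(p_value * n_tests, 1.0)
    print(odds_ratio, p_value, p_bonferroni)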

Table 2a. Top 10 Disease Terms Overrepresented in the HPO Subsets: Layperson Subset Enrichment.

Table 2b. Top 10 Disease Terms Overrepresented in the HPO Subsets: GC Subset Enrichment.

Aim 2 Results: Validate Patient-Centered Phenotyping in Patients With Diagnosed Rare Diseases

Cognitive Interviews

We conducted 13 cognitive interviews, 6 with participants who took the GenomeConnect survey and 7 with those who took the Phenotypr survey. Based on feedback we obtained from 5 participants who took the GenomeConnect survey, we rephrased some questions and added directions on the demographics page to specify that we are collecting demographic information about the person with the genetic disease. Participants also identified technical issues, which were corrected.

The primary feedback from 5 participants who took the Phenotypr survey was that they were confused that no medical terminology appeared. We added medical terms and ranked the suggested options by relevance. When a user enters text, Phenotypr first searches the layperson HPO terms. If no layperson terms are found, the application searches all terms. We also received feedback that it was not clear to participants when they had completed the survey, so we added a final note to thank them for their participation.
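
The revised lookup behavior (layperson terms first, then all terms) can be sketched as follows. This is a simplified in-memory illustration with a small, hypothetical set of terms; the production tool queries a search index built from the full HPO (https://github.com/monarch-initiative/hpo-plain-index) rather than a Python dictionary.

    # Minimal sketch of the "layperson first, then all terms" lookup described above.
    LAY_TERMS = {
        "HP:0000750": "Speech delay",
        "HP:0001250": "Seizure",
        "HP:0004322": "Short stature",
    }
    ALL_TERMS = {
        **LAY_TERMS,
        "HP:0001263": "Global developmental delay",
        "HP:0002376": "Developmental regression",
    }

    def suggest(query: str, limit: int = 10) -> list[tuple[str, str]]:
        """Return (HPO id, label) pairs matching the query, preferring lay terms."""
        q = query.strip().lower()
        lay_hits = [(hpo_id, label) for hpo_id, label in LAY_TERMS.items() if q in label.lower()]
        if lay_hits:
            return lay_hits[:limit]
        # Fall back to the full vocabulary (including clinical labels) when no lay term matches.
        return [(hpo_id, label) for hpo_id, label in ALL_TERMS.items() if q in label.lower()][:limit]

    print(suggest("delay"))    # matches the lay term "Speech delay" first
    print(suggest("regress"))  # no lay match, so the clinical "Developmental regression" is returned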

After making these changes, we conducted 2 more interviews with participants who took the GenomeConnect survey and 1 with a participant who took the Phenotypr survey, and there were no issues. The surveys were then implemented.

Assess the GenomeConnect and Phenotypr Surveys in Patients With a Diagnosed Rare Disease

Enrollment tables and creation of final analysis data set

We offered enrollment to 1061 individuals, and 283 of them enrolled, for an enrollment rate of 26.7%. We sent 659 individuals 1 survey, and 154 participants completed it (23.4%). The completion rate was higher for the GenomeConnect survey (28.0% [92/329]) than for the Phenotypr survey (18.8% [62/330]). We offered 402 individuals both surveys, and 129 participants completed at least 1 (32.1%).

Note that the enrolled participant, the respondent, is the person completing the survey. In most cases, the respondent is a parent completing the survey in reference to their child who is the person who has the condition. Occasionally, the respondent is completing the survey about themselves, as they have the condition.

During data cleaning, we realized that 4 participants completed 2 surveys, 1 for themselves and 1 for their child under the same unique ID. It was impossible to disentangle the 2 individual surveys, so these cases were deleted from the data set. Three of these respondents completed both surveys, and 1 completed the GenomeConnect survey. Most individuals invited to complete both surveys did so (101/126 [80.2%]). Nearly 17% (21/126 [16.7%]) completed only the GenomeConnect survey, and 3.2% (4/126) completed only the Phenotypr survey. Furthermore, there were 2 individuals with 2 distinct genetic diseases for which separate similarity indices could be generated. One of these individuals completed the Phenotypr survey, and the other completed the GenomeConnect survey. In the analyses, we included both similarity scores for each individual. Thus, our final sample was 281 surveys and 279 respondents.

For 23 respondents, we could not calculate similarity scores because either the genetic condition did not have an OMIM or Mondo mapping or no phenotypes were annotated to the condition. One respondent answered all questions in the negative, and the algorithms use only positive phenotypes. These 24 respondents were dropped from our analyses. The final data set for analyses therefore includes 257 diseases reported by 255 respondents. Of the 255 respondents, 89 answered both surveys, 63 answered Phenotypr, and 103 answered GenomeConnect (see Figure 17).

Figure 17. Flowchart of Recruitment, Enrollment, Survey Completion, and the Final Data Set for Analysis.

The distribution of respondents by race and sex and by ethnicity and sex for the final analysis sample is displayed in Table 3.

Table 3. Final Data Set of Respondents by Race/Ethnicity and Sex.

Table 4 shows the demographic characteristics of the final data set of respondents by survey type. Although we offered the survey to all parents regardless of race, ethnicity, or sex, the respondents were predominantly White and female. Most surveys were completed by the parent of a child with the condition, and the female predominance was because the mother was generally the parent who completed the survey for their child. Table 5 shows the sex distribution of the person with the genetic disease. We did not collect race data on the person with the genetic disease; because the affected person was generally related to the respondents, we presume the race and ethnicity are correlated.

Table 4. Demographic Characteristics of Final Data Set of Respondents by Survey Group.

Table 5. Sex of the Individual With the Genetic Disease.

Correction for multiple testing

We conducted 102 statistical significance tests. Using the Benjamini-Hochberg methodology to adjust for multiple testing, we found that the new critical P values were <.015. We used “ns” (“not significant”) to denote values that would be statistically significant at P < .05 but are no longer statistically significant after adjustment for multiple testing.

Similarity and ranks

Univariate descriptive statistics for similarity scores and ranks are shown in Table 6.

Table 6. Univariate Descriptive Statistics for Similarity Scores and Ranks.

Results for Primary End Point

The PE is the comparison of the similarity scores between the simulated HPO profiles (derived in aim 1B) and the patient-derived HPO profiles for GenomeConnect (SIM 22) and Phenotypr (SIM 23) to determine which method yields (1) higher similarity scores and (2) a tighter distribution. We performed analyses for the overall group of those who completed 1 survey (participants in groups 2 and 3; Tables 7-9), group 1 (both surveys; Tables 10-12), group 2 (1 survey, with both survey types completed within a diagnosis; Tables 13-15), and group 3 (1 survey, only 1 survey type completed within a diagnosis; Tables 16-18). For each group, the first table is the descriptive statistics, the second is the analysis of similarity scores for the surveys compared with the simulated HPO profiles, and the third assesses the equality of the distribution of the scores for the surveys compared with the simulated HPO profiles.

Table 7. Descriptive Statistics – Overall Analysis PE.

Table 9. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Overall Analysis PE.

Table 10. Descriptive Statistics – Group 1 PE.

Table 11. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Group 1 PE.

Table 12. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 1 PE.

Table 13. Descriptive Statistics – Group 2 PE.

Table 14. Quantile Regression Results to Compare Median Similarity Scores Between Methods – Group 2 PE.

Table 15. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 2 PE.

Table 16. Descriptive Statistics – Group 3 PE.

Table 17. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Group 3 PE.

Table 18. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 3 PE.

Note that for Table 8 and all tables showing the estimated median difference, this number is determined by subtracting the GenomeConnect score from the Phenotypr score; thus, the estimated median difference is negative if the GenomeConnect similarity score is higher (higher score means more similar) than the Phenotypr score.

Table 8. Quantile Regression Results to Compare Median Similarity Scores Between Methods – Overall Analysis PE.

Summary of results for PE

For the overall analysis (all who completed 1 survey, ie, groups 2 and 3) and group 1 (respondents who completed both surveys), the median similarity score between the GenomeConnect survey HPO profiles and the simulated HPO profiles was significantly higher than the similarity scores between the Phenotypr survey HPO profiles and the simulated HPO profiles, demonstrating that the GenomeConnect survey HPO profiles were closer to the simulated profiles. For group 1 only, the distribution of similarity scores between the Phenotypr survey HPO profiles and the simulated HPO profiles was significantly tighter than the distribution of similarity scores between the GenomeConnect survey HPO profiles and the simulated HPO profiles, demonstrating that the Phenotypr profiles have less variability. There were no differences for either outcome for groups 2 (1 survey, with both survey types completed within a diagnosis) or 3 (1 survey, with only 1 survey type completed within a diagnosis).

Secondary End Point 1

The goal of SE1 was, for each method, to determine the similarity between the patient-derived HPO profiles and the Monarch gold-standard disease HPO profiles (unlike the PE, where we compared the patient-derived HPO profiles with the simulated HPO profiles derived in aim 1B). We performed analyses for the overall group (all who completed 1 survey, ie, groups 2 and 3; Tables 19-21), group 1 (both surveys; Tables 22-24), group 2 (1 survey, with both survey types completed within a diagnosis; Tables 25-27), and group 3 (1 survey, with only 1 survey type completed within a diagnosis; Tables 28-30). For each group of 3 tables, the first table is the descriptive statistics, the second is the analysis of similarity scores for the surveys compared with the Monarch gold-standard HPO profiles, and the third assesses the equality of the distribution of the scores for the surveys compared with the Monarch gold-standard HPO profiles.

Table 19. Descriptive Statistics – Overall SE1.

Table 20. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Overall Analysis SE1.

Table 21. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Overall Analysis SE1.

Table 22. Descriptive Statistics – Group 1 SE1.

Table 23. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Group 1 SE1.

Table 24. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 1 SE1.

Table 25. Descriptive Statistics – Group 2 SE1.

Table 26. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Group 2 SE1.

Table 27. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 2 SE1.

Table 28. Descriptive Statistics – Group 3 SE1.

Table 29. Quantile Regression Results to Compare Median Similarity Scores Between the Methods – Group 3 SE1.

Table 30. Equality-of-Variance Test to Determine if the Distribution Is Different Between the Methods – Group 3 SE1.

Summary of results of SE1

For each method (GenomeConnect survey and Phenotypr survey), we determined the similarity between the patient-derived HPO profiles and the Monarch gold-standard disease HPO profiles. For group 1 (both surveys) only, using the Bayes algorithm, the median similarity scores between the Monarch gold-standard HPO profiles and the GenomeConnect survey HPO profiles were significantly higher than the similarity scores between the Monarch gold-standard HPO profiles and the Phenotypr survey HPO profiles, demonstrating that the GenomeConnect profiles were closer to the gold-standard profiles. For group 2 (1 survey, with both survey types completed within a diagnosis) only, using the Bayes algorithm, the distribution of similarity scores between the Monarch gold-standard HPO profiles and the GenomeConnect survey HPO profiles was significantly tighter than that between the Monarch gold-standard HPO profiles and the Phenotypr survey HPO profiles, demonstrating that the GenomeConnect profiles have less variability. There were no differences for any of the other analyses, and there were no differences when the PhenoDigm algorithm was used.

Secondary End Point 2

We compared the ranks (the position of the reference disease in the prioritized list of candidate diseases) to see which survey identifies the correct clinical disease sooner when going down the ranked list of diseases. For patients taking the survey, the closer their disease is to the top of the list (lower rank number), the more accurate the HPO profile is. We also compared the Bayes and PhenoDigm algorithms to explore differences in their performance. We note here that when the HPO terms used are very general, or when very few or very many terms are used to describe a disease, many diseases can sometimes be listed at the same rank. We performed analyses for the overall group (all who completed 1 survey, ie, groups 2 and 3; Tables 31-35), group 1 (both surveys; Tables 36-40), group 2 (1 survey, with both survey types completed within a diagnosis; Tables 41-45), and group 3 (1 survey, with only 1 survey type completed within a diagnosis; Tables 46-50). For each group of 5 tables, the first table is the descriptive statistics; the second compares ranks between survey types; the third assesses ranks of ≤10 between survey types; the fourth compares ranks between the 2 algorithms, Bayes and PhenoDigm, used to calculate the similarity scores; and the fifth assesses ranks of ≤10 between the 2 algorithms.

Table 31. Descriptive Statistics – Overall SE2.

Table 32. Quantile Regression Results to Compare Ranks Between the Methods – Overall SE2.

Table 33. Fisher Exact Test to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Methods – Overall SE2.

Table 34. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between the 2 Algorithms – Overall SE2.

Table 35. Fisher Exact Test to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between the Algorithms – Overall SE2.

Table 36. Descriptive Statistics – Group 1 SE2.

Table 37. Quantile Regression Results to Compare Ranks Between Methods – Group 1 SE2.

Table 38. Logistic Regression Test to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Methods – Group 1 SE2.

Table 39. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between the 2 Algorithms – Group 1 SE2.

Table 40. Logistic Regression to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Algorithms – Group 1 SE2.

Table 41. Descriptive Statistics – Group 2 SE2.

Table 42. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between Methods – Group 2 SE2.

Table 43. Logistic Regression to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Methods – Group 2 SE2.

Table 44. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between Algorithms – Group 2 SE2.

Table 45. Logistic Regression to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between the Algorithms – Group 2 SE2.

Table 46. Descriptive Statistics – Group 3 SE2.

Table 47. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between Algorithms – Group 3 SE2.

Table 48. Fisher Exact Test to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Methods – Group 3 SE2.

Table 49. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between Algorithms – Group 3 SE2.

Table 50. Fisher Exact Test to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between Methods – Group 3 SE2.

Note that for the following tables, the lower the median score, the better the rank, the closer the disease was to the top of the prioritized list of candidate diseases, and the sooner the correct clinical disease was identified.

Summary of results of SE2

A comparison to see which survey identifies the correct clinical disease (the disease that the respondent had) sooner when going down the ranked list of diseases showed that Phenotypr yielded significantly lower (ie, better) ranks for group 1 (both surveys) using Bayes and group 2 (1 survey, with both survey types completed within a diagnosis) using PhenoDigm. This was not surprising given that Phenotypr uses a greater proportion of the HPO. However, the group 1 results were conflicting because the GenomeConnect survey yielded significantly lower ranks using PhenoDigm. Thus, we can make no conclusions about which survey is superior for this end point. We believe these results may have been inconclusive due to the small sample size (n) in each group.

A comparison of algorithms (Bayes and PhenoDigm) to explore differences in their performance showed that Bayes yielded lower rank scores than did PhenoDigm for the GenomeConnect survey (overall group and group 3), the Phenotypr survey (group 2), and both surveys (group 1).

Secondary End Point 3

For group 1 (both surveys), we compared aggregate scores (the information from both surveys aggregated into 1 profile) with the scores for each survey alone. Tables 51 to 53 compare the similarity scores: Table 51 shows the descriptive statistics, Table 52 compares similarity scores, and Table 53 assesses the equality of distribution. Tables 54 to 56 compare the ranks: Table 54 shows the descriptive statistics, Table 55 compares the ranks, and Table 56 assesses ranks of ≤10.

Table 51. Descriptive Statistics for the Similarity Scores – SE3.

Table 52. Quantile Regression Results to Compare Median Similarity Scores Between Each Method Alone and the Aggregate Profile – SE3.

Table 53. Equality-of-Variance Test Results to Determine if the Distribution of the Aggregate Profiles Is Tighter Than With Either Method Alone – SE3.

Table 54. Descriptive Statistics for the Rank Scores – SE3.

Table 55. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between the Aggregate Profiles and Either Method Alone – SE3.

Table 56. Logistic Regression to Determine if the Proportions of Respondents With a Rank of ≤10 Are Significantly Different Between the Aggregate Profiles and Either Method Alone – SE3.

Summary of results for SE3

We compared aggregate profiles (aggregating the information from both surveys into 1 profile for those who answered both surveys [group 1]) with GenomeConnect or Phenotypr alone. Using the PhenoDigm algorithm, aggregate profiles generally performed better, with higher similarity scores (ie, aggregate profiles were closer to the simulated profiles) and lower ranks (ie, aggregate profiles identified the disease sooner), than either GenomeConnect or Phenotypr alone. When using the Bayes algorithm, we found that both GenomeConnect and Phenotypr alone performed better, with higher similarity scores and tighter distributions than those of the aggregate profiles. Because the results differed depending on the algorithm used, we cannot draw conclusions about the performance of the aggregate profiles compared with either survey alone.

Secondary End Point 4

For group 1 (completed both surveys), we investigated whether the order in which the respondent took the 2 surveys mattered. Table 57 shows the descriptive statistics of the similarity scores using the simulated profiles, and Table 58 shows them using the Monarch gold-standard profiles. Table 59 compares survey order using simulated profiles, and Table 60 compares it using the Monarch gold-standard profiles. Table 61 shows the descriptive statistics for ranks. Table 62 shows assessment of the ranks.

Table 57. Descriptive Statistics of Similarity Scores Based on Simulated Profiles – SE4.

Table 58. Descriptive Statistics of Similarity Scores Based on Monarch Gold-Standard Profiles – SE4.

Table 59. Quantile Regression Results to Compare Median Similarity Scores, Using the Simulated HPO Profiles, Between Methods by Survey Order – SE4.

Table 60. Quantile Regression Results to Compare Median Similarity Scores, Using the Monarch Gold-Standard Disease HPO Profiles, Between Methods by Survey Order – SE4.

Table 61. Descriptive Statistics Ranks – SE4.

Table 62. Quantile Regression Results to Compare Ranks Between Methods by Survey Order – SE4.

Summary of results for SE4

For those who completed both surveys, survey order did not matter.

Secondary End Point 5

These analyses compare diseases that have less variability in phenotypes with diseases that have more variability in phenotypes. Tables 63 to 65 use similarity scores: Table 63 shows the descriptive statistics, Table 64 compares the methods using simulated HPO profiles, and Table 65 does the same using the Monarch gold-standard HPO profiles. Tables 66 and 67 use ranks: Table 66 shows the descriptive statistics, and Table 67 compares the ranks.

Table 63. Descriptive Statistics Similarity Scores – SE5.

Table 64. Quantile Regression Results to Compare Median Similarity Scores Between Methods and Simulated HPO Profiles for Variable Diseases Compared With Nonvariable Diseases – SE5.

Table 65. Quantile Regression Results to Compare Median Similarity Scores Between Methods and Monarch Gold-Standard HPO Profiles for Variable Diseases Compared With Nonvariable Diseases – SE5.

Table 66. Descriptive Statistics Ranks – SE5.

Table 67. Quantile Regression Results to Determine if There Is a Significant Difference in Ranks Between Variable Diseases and Nonvariable Diseases Using GC or PHN – SE5.

Summary of results for SE5

These analyses compared diseases with less-variable phenotypes with diseases with more-variable phenotypes. There was no difference in the similarity scores between less-variable and more-variable diseases; however, less-variable diseases ranked significantly lower than more-variable diseases. This suggests that diseases with a more consistent phenotype may have been more consistently described by the participants.

Summary of Results

Primary end point

Using the similarity scores between the survey-derived HPO profiles and the simulated HPO profiles, for respondents who took only 1 survey, and for those who took both surveys, the GenomeConnect survey had a higher median similarity score than did the Phenotypr survey, and was thus more accurate. The Phenotypr survey had a tighter distribution of scores for respondents who completed both surveys, so it was more precise.

Secondary end point 1

Using the similarity scores between the surveys and the Monarch gold-standard HPO profiles, for respondents who completed both surveys, using the Bayes algorithm, GenomeConnect had a higher median similarity score than did the Phenotypr survey, and was thus more accurate. For matched pairs (group 2), the GenomeConnect survey also had a tighter distribution than did the Phenotypr survey, so it was more precise.

Secondary end point 2

There was no clear difference in ranks (ie, how far down the list of diseases to get to the disease of the respondent) between the GenomeConnect survey and the Phenotypr survey, and the results using Bayes and PhenoDigm were at times conflicting.

  • In the overall analysis, the Phenotypr survey yielded a greater percentage of ranks of ≤10 (ie, the respondent's condition was in the first 10 candidate conditions) than did GenomeConnect.
  • The Bayes algorithm generally resulted in lower rank scores than did PhenoDigm and resulted in a greater percentage of ranks of ≤10 for group 1 only.
Secondary end point 3

The performance of the aggregated profiles depended on which algorithm was used to determine the similarity score. In general, with the PhenoDigm algorithm, the aggregate profiles performed better; with the Bayes algorithm, the GenomeConnect survey and the Phenotypr survey alone performed better.

Secondary end point 4

Survey order did not matter.

Secondary end point 5

Similarity scores were no different between diseases with more or less phenotypic variability. However, less-variable diseases generally ranked lower (ie, better) than did more-variable diseases.

Postenrollment Qualitative Interviews

Data Collection

We conducted 17 interviews: 5 with individuals who completed the GenomeConnect survey, 5 with individuals who completed the Phenotypr survey, and 7 with individuals who completed both surveys.

Themes Identified

  • General

    The surveys will be helpful to patients/clinicians to input signs and symptoms and see suggested diagnoses.

    It was satisfying to see all signs and symptoms listed in 1 place.

    The length of both surveys was fine.

  • Phenotypr

    The typing aspect of Phenotypr was difficult, and a list would be preferred.

    It was not always clear under which organ system to put a symptom for Phenotypr.

    The Phenotypr language was very clinical and difficult to understand at times.

  • GenomeConnect

    GenomeConnect was a bit too broad (ie, terms were too general) and not granular enough.

    The layout and structure of GenomeConnect was more streamlined and manageable.

Plans for Reporting Qualitative Analyses

We plan to report on the qualitative analysis as part of our publication on the results for aim 2.

Discussion

The premise of our study was that self-phenotyping may be an accurate and comprehensive source of data on and by patients that could empower them. Self-phenotyping is not intended to lead to a diagnosis on its own, but rather could be used as a tool for clinicians and patients as part of the diagnostic process, which includes the clinical evaluation, family history, laboratory work, and so forth. Although some work has gone into assessing the potential role of self-reported health data in complementing EHR data,20 little has been done to assess the role and value of self-phenotyping in informing clinical care or research. In addition, self-phenotyping surveys, such as the GenomeConnect survey or the layperson version of HPO, have not been tested to determine if they are accurate. Further, the layperson version of HPO had not been tested as a self-phenotyping method in patients. The primary goal of this study was to determine if patients could effectively use the layperson version of the HPO to self-phenotype at a level that could be clinically useful, which we defined as leading to a survey-derived HPO profile that was similar to the HPO profile of the disease. We aimed to compare use of the full lay HPO in a new patient-centered application, Phenotypr (objective A, aim 1), with a whole-body survey provided by GenomeConnect.

To address the question, “Can patients generate useful HPO-based data for use in disease diagnosis?” in aim 1, we first created synthetic lay-subset profiles of each disease in the gold-standard corpus of HPO annotations and then compared these profiles using semantic similarity approaches to determine “how close” the lay versions of the disease profiles would be to the correct clinical disease when compared against all diseases. This result from aim 1 (objective B) is essentially the upper limit of what a patient might be able to achieve if they documented their phenotypic profile perfectly using the full availability of the lay HPO. The results from the similarity analysis for these synthetic lay profiles showed that indeed, patients could theoretically generate useful profiles.

In aim 1, we then compared simulated survey responses to the Monarch gold-standard disease profile with phenotype annotations to determine how close the simulated profiles were to the gold-standard profiles. The results of the simulation showed that Phenotypr is capable of generating phenotypic profiles closer to the gold standard than GenomeConnect. This was expected, given that the layperson subset contains 4757 terms in comparison with the 215 terms mapped by GenomeConnect. We concluded that synthetic lay profiles are effective at identifying the correct clinical disease, with 61% of simulated Phenotypr surveys and 25% of simulated GenomeConnect surveys ranking the clinical disease in the top 10 (Figure 16). We note that ties are often based on using either very general HPO terms or using very few or very many terms to describe any given disease. This can lead to accurate but not very precise comparisons and ranking.

In aim 2, we tested the GenomeConnect survey and the Phenotypr survey (the layperson HPO survey developed in aim 1) in participants with diagnosed genetic diseases in order to determine which survey performed better and which survey was preferred by participants (aim 2, objective C). The comparison with the simulated profiles allowed us to see how participants did compared with the best they could theoretically do if the survey was perfectly completed. Although the comparison with the Monarch gold standard gave us a comparison that was more clinically relevant, given that the clinician uses the equivalent comparison to reach a candidate diagnosis, the primary goal of this study was to compare the surveys themselves and assess how well participants could complete the surveys relative to their theoretical maximum.

It should be noted that we made some changes in our analysis plan because we did not have enough participants with any given disease and were thus unable to create “matched pairs.” For the PE, the initial plan was to compare the mean similarity scores within each disease and within the aggregate pairs. We revised the plan to compare similarity scores overall (among those who completed 1 survey) and within 3 groups. We did not carry out 3 of the secondary end points, which were to determine (1) the distribution of similarity scores within each disease and aggregate pair, (2) the degree of sensitivity to missing data, and (3) how well participants were able to specify their HPO profiles above random noise. Instead, we (1) compared the GenomeConnect survey with the Phenotypr survey to see which one could identify the correct clinical disease sooner when going down the ranked list of diseases; (2) determined whether the information from taking both surveys improved the ability to identify the disorder; (3) assessed whether survey order mattered (for those who took both surveys); and (4) tested whether the degree of variability of the phenotypes for a disease mattered.

For our PE, we found that the GenomeConnect survey had a higher median similarity score compared with the simulated HPO profiles than did the Phenotypr survey, so GenomeConnect is more accurate; however, the Phenotypr survey had a tighter distribution of scores, so it is more precise. We found smaller differences between the 2 surveys compared with the Monarch gold standard, suggesting that they performed similarly in a clinically relevant diagnostic setting.

A key comparison is the accuracy of the surveys in identifying the correct condition. We used the rank scores to address this issue. We found a tendency for Phenotypr to be better than GenomeConnect at including the correct clinical disease in the list of the top 10 diseases generated by participants' responses to the surveys. For many of the analyses, we compared 3 groups of respondents: those who took both surveys, matched pairs (within a diagnosis, 1 respondent completed GenomeConnect and the other completed Phenotypr), and a group where only 1 survey type was taken per diagnosis. This was necessary to make a comparison but may have decreased our power to see differences, as we divided our cohort into several smaller groups. Based on the finding that Phenotypr seemed to be better at identifying the disease sooner, we conclude that Phenotypr might be more useful for clinicians to garner useful phenotypic data for diagnostic use from patients. However, despite this difference, most of the time the patient's disease was not in the top 10 diseases generated by the patient's responses to the surveys. This suggests that more work is needed to refine the self-phenotyping strategy as well as to validate how to best use patient phenotyping in the context of clinical phenotyping.

The largest conceptual difference between the GenomeConnect and Phenotypr survey approaches is that GenomeConnect has a multiple-choice format, which our participants in general preferred. However, Phenotypr was more granular and able to generate a much larger number of terms for a given disease HPO profile. The responses to the qualitative interviews suggest that this granularity was overwhelming for some participants. We hypothesize that the multiple-choice format of GenomeConnect is easier for participants to use. Phenotypr requires users to recall phenotypes, placing the burden on the user to both remember and start typing a term with lexical similarity to one of the lay or clinical HPO terms. Many participants provided text in the additional signs and symptoms free-text field, suggesting they were unable to find all of their phenotypes (signs and symptoms). We also hypothesize that it is easier for users to select more-general terms in the HPO. For example, it is easier to say that a participant or family member experiences “behavioral abnormalities” than that they experience “obsessive-compulsive behavior.” In addition, participants found the Phenotypr language to be clinical and difficult to understand at times. This was likely because both layperson HPO terms and medical HPO terms were included in the terms from which respondents had to choose.

Although our study advances methods for self-phenotyping, the results do not suggest a clear-cut preferred method for self-phenotyping for patients and caregivers. We conclude that a survey model that combines elements of both surveys might be ideal: for example, a multiple-choice format (like GenomeConnect) with more-granular choices (like Phenotypr).

Potential for the Results to Advance Methods and Improve the Validity, Trustworthiness, and Usefulness of Findings

In this study, we validated self-phenotyping methods computationally (aim 1) and in a patient cohort (aim 2). We have been able to demonstrate that both instruments (GenomeConnect and Phenotypr) can enable patients to provide critical phenotype information to the clinician. Ongoing work would leverage the cognitively preferred method of multiple-choice selection within a survey and the more precise free-association model used when autocompleting the lay HPO terms directly.

Strengths

There are many strengths to our study. We have created a novel tool, Phenotypr, and developed a new platform for GenomeConnect for patients to perform self-phenotyping, and we implemented the tools in the largest study of such tools of which we are aware. We have shown that both tools are useful in obtaining phenotype data directly from patients for clinical use, with the complementary strengths and weaknesses of both surveys suggesting the development of hybrid models and/or improvements to such instruments in the future.

Limitations

The number of different diagnoses in our patient cohort was large. This meant that the robustness with respect to disease diversity was high and therefore representative of the real-world disease heterogeneity. However, it also meant that making within-disease comparisons was challenging due to the very small cohort size for most diseases. We had originally planned to create subcohorts based on disease similarity to compare the 2 instruments. However, due to low recruitment, we expanded our inclusion criteria and thus did not have matched cohorts of participants; we were unable to compare the 2 surveys within diagnoses effectively, because we had too many different diagnoses and too few participants with the same diagnoses. Further, participants with diagnoses are more likely to have researched their condition and to have a better understanding of the clinical terms. These surveys need to be evaluated in an undiagnosed cohort to understand how they perform in a clinical setting to inform disease diagnosis. These cohort limitations are simply the reality of working on rare diseases in general, and the diverse cohort we evaluated in the end provided us with a reasonable spectrum of end points. We favored breadth over depth for any one disease because breadth better reflects how the tools would be used clinically.

Another significant limitation was the difference in the results based on the 2 algorithms (Bayes and PhenoDigm) against which we compared patient-generated phenotypic profiles. Although both algorithms returned values with an exponential-logarithmic distribution, in reviewing the distribution of posterior probabilities returned by the Bayes algorithm, we noticed a steep drop-off between high and very small probabilities in comparison with the distribution of PhenoDigm scores. This may explain the opposing results when comparing variance between surveys. On closer inspection of the ranking results, we discovered that the Bayes algorithm produces large numbers of ties (diseases with the same posterior probability) when the input is only 1 to 2 phenotypes and those phenotypes are commonly annotated to diseases in the Monarch gold standard. For example, a participant with a particular chromosomal syndrome (16p13.11 microduplication) completed both the GenomeConnect and Phenotypr surveys; for GenomeConnect, they provided the phenotypes “behavioral abnormality” and “autistic behavior,” and for Phenotypr, they provided “behavioral abnormality” and “autism with high cognitive abilities.” Although the Bayes algorithm ranked the correct disease that included these phenotypes 14th and 18th, respectively, in both cases more than 175 diseases that included these phenotypes were ranked between 1 and 10, with ties occurring between ranks 3 and 10. In another case, a participant reported only a single phenotype, “abnormality of the nervous system.” The Bayes algorithm ranked the correct disease that included this phenotype sixth; however, 1393 diseases that included this phenotype were ranked between 1 and 6.
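To illustrate how such ties can make a rank look deceptively good, the following is a minimal sketch, not the study's actual pipeline; the disease names and posterior probabilities are hypothetical, and the ranking convention shown counts only strictly better scores, so every disease tied with the correct disease shares the same optimistic rank.

```python
# Minimal sketch: how ties among posterior probabilities make a rank look optimistic.
# All disease names and probabilities here are hypothetical illustrations.

def rank_with_ties(scores: dict, target: str):
    """Return (best_rank, n_tied): best_rank counts only strictly better scores;
    n_tied counts how many diseases share the target's score (including itself)."""
    target_score = scores[target]
    better = sum(1 for s in scores.values() if s > target_score)
    tied = sum(1 for s in scores.values() if s == target_score)
    return better + 1, tied

# Hypothetical posteriors for a query with a single, commonly annotated phenotype.
posteriors = {
    "disease_A": 0.020,
    "disease_B": 0.015,
    "correct_disease": 0.012,
    "disease_C": 0.012,   # ties with the correct disease
    "disease_D": 0.012,   # ties with the correct disease
}

best_rank, n_tied = rank_with_ties(posteriors, "correct_disease")
print(best_rank, n_tied)  # 3 3: a nominal rank of 3, but shared with 2 other diseases
```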

This factor also affects the simulation results that used rank as a measure. It would not affect the ROC curve, because we used the posterior probability, rather than rank, to evaluate the performance of the classifier for each simulation. Given that most participants provided more than 2 phenotypes, we do not think this affected the significance of the results. However, for future work, we propose adding a penalty for ties, either an average tie penalty (4 diseases tied for first would each be given a rank of 2.5, the mean of the tied positions) or a maximum tie penalty (4 diseases tied for first would each be given a rank of 4). In addition, the tools could be made to suggest that patients add more phenotypes where possible. We note that the tools are intended to provide clinical experts with a set of phenotypes to enhance with additional clinical phenotypes for use in diagnostic tools, so although the ties are an issue, they do not diminish the value of the information provided to the diagnostician. Finally, some diseases simply have only 1 or a few phenotypes, and for these, comparing profiles may make sense only from a disease-exclusion perspective rather than one of profile matching.
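The 2 tie penalties correspond to standard ranking conventions. The sketch below, using SciPy's rankdata on hypothetical posterior probabilities (not the study's implementation), shows how the average and maximum conventions would score a 4-way tie for first place.

```python
# Minimal sketch of the two proposed tie penalties, using standard ranking
# conventions from SciPy; the posterior probabilities below are hypothetical.
import numpy as np
from scipy.stats import rankdata

posteriors = np.array([0.9, 0.9, 0.9, 0.9, 0.4, 0.1])  # 4 diseases tied for first

# rankdata ranks ascending, so negate the scores to rank the best disease first.
avg_rank = rankdata(-posteriors, method="average")  # average tie penalty
max_rank = rankdata(-posteriors, method="max")      # maximum tie penalty

print(avg_rank)  # the 4 tied diseases each get rank 2.5 (mean of positions 1-4)
print(max_rank)  # the 4 tied diseases each get rank 4 (last of the tied positions)
```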

Another limitation is that the participants knew their diagnosis at the time they completed the surveys, potentially leading to more accurate phenotyping than if they had not known their diagnosis and had been solely dependent on the signs and symptoms they experienced. Because all of our participants knew their diagnosis, it will be important to test the tools in undiagnosed patients during the diagnostic odyssey to see how the tools perform.

Finally, although we offered the surveys regardless of race, ethnicity, or sex, our population was not very diverse: respondents were primarily White, and most were women (mothers).

Recommendations for Further Research

The tools developed in this project are not yet ready for clinical diagnostic use. Future work should focus on developing a survey that combines the best of both GenomeConnect and Phenotypr (ie, patient-friendly and granular). Such a survey should be tested in a more diverse, undiagnosed disease population to determine whether it leads to improved diagnostic rate or efficiency and to greater patient satisfaction regarding active participation in the diagnostic odyssey; both outcomes are the ultimate goals of self-phenotyping. Validation of the tools and a greater understanding of how patient phenotyping and clinician phenotyping can best be used together are needed. To address the lack of racial and ethnic diversity, future research should reach out specifically to underrepresented populations. It would also be useful to define the literacy or education level needed to use these instruments. Finally, studies to determine whether patients find self-phenotyping useful or informative could be conducted, for example, by surveying patients after they complete the survey or by conducting interviews.

Conclusions

Computationally, better phenotypic profiles were generated with Phenotypr than with GenomeConnect. When participants completed the surveys, they preferred the GenomeConnect survey format. The GenomeConnect survey was more accurate; however, the Phenotypr survey was more precise. This suggests that a hybrid approach that provides a familiar tool format but access to richer HPO terms may be warranted. Such tools could be used to improve and accelerate diagnostic pipelines and promote collaboration and patient engagement with clinical caregivers and diagnosticians.

References

1.
Köhler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014;42(Database issue):D966-D974. doi:10.1093/nar/gkt1026 [PMC free article: PMC3965098] [PubMed: 24217912] [CrossRef]
2.
Mungall CJ, Washington NL, Nguyen-Xuan J, et al. Use of model organism and disease databases to support matchmaking for human disease gene discovery. Hum Mutat. 2015;36(10):979-984. [PMC free article: PMC5473253] [PubMed: 26269093]
3.
Philippakis AA, Azzariti DR, Beltran S, et al. The Matchmaker Exchange: a platform for rare disease gene discovery. Hum Mutat. 2015;36(10):915-921. [PMC free article: PMC4610002] [PubMed: 26295439]
4.
Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MB. The real cost of sequencing: higher than you think! Genome Biol. 2011;12(8):125. [PMC free article: PMC3245608] [PubMed: 21867570]
5.
Hunter L. Computational challenges of mass phenotyping. Pac Symp Biocomput. 2013:454-455. [PMC free article: PMC5927555] [PubMed: 23424150]
6.
Dias C, Sincan M, Cherukuri PF, et al. An analysis of exome sequencing for diagnostic testing of the genes associated with muscle disease and spastic paraplegia. Hum Mutat. 2012;33(4):614-626. [PMC free article: PMC3329376] [PubMed: 22311686]
7.
Gahl WA, Tifft CJ. The NIH Undiagnosed Diseases Program: lessons learned. JAMA. 2011;305(18):1904-1905. [PubMed: 21558523]
8.
Tifft CJ, Adams DR. The National Institutes of Health Undiagnosed Diseases Program. Curr Opin Pediatr. 2014;26(6):626-633. [PMC free article: PMC4302336] [PubMed: 25313974]
9.
Kirkpatrick BE, Riggs ER, Azzariti DR, et al. GenomeConnect: matchmaking between patients, clinical laboratories, and researchers to improve genomic knowledge. Hum Mutat. 2015;36(10):974-978. [PMC free article: PMC4575269] [PubMed: 26178529]
10.
Shefchek KA, Harris NL, Gargano M, et al. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res. 2020;48(D1):D704-D715. doi:10.1093/nar/gkz997 [PMC free article: PMC7056945] [PubMed: 31701156] [CrossRef]
11.
Köhler S. Improved ontology-based similarity calculations using a study-wise annotation model. Database (Oxford). 2018;2018:bay026. doi:10.1093/database/bay026 [PMC free article: PMC5868182] [PubMed: 29688377] [CrossRef]
12.
Köhler S, Schulz MH, Krawitz P, et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet. 2009;85(4):457-464. [PMC free article: PMC2756558] [PubMed: 19800049]
13.
Cheng J, Cline M, Martin J, et al. A knowledge-based clustering algorithm driven by Gene Ontology. J Biopharm Stat. 2004;14(3):687-700. [PubMed: 15468759]
14.
Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence. IJCAI; 1995:448-453.
15.
Bauer S, Köhler S, Schulz MH, Robinson PN. Bayesian ontology querying for accurate and noise-tolerant semantic searches. Bioinformatics. 2012;28(19):2502-2508. [PMC free article: PMC3463114] [PubMed: 22843981]
16.
Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research Electronic Data Capture (REDCap)–a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377-381. [PMC free article: PMC2700030] [PubMed: 18929686]
17.
Smedley D, Oellrich A, Köhler S, et al. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database (Oxford). 2013;2013:bat025. doi:10.1093/database/bat025 [PMC free article: PMC3649640] [PubMed: 23660285] [CrossRef]
18.
Bone WP, Washington NL, Buske OJ, et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genet Med. 2016;18:608-617. [PMC free article: PMC4916229] [PubMed: 26562225]
19.
Smedley D, Jacobsen JO, Jager M, et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat Protoc. 2015;10(12):2004-2015. [PMC free article: PMC5467691] [PubMed: 26562621]
20.
Fort D, Wilcox AB, Weng C. Could patient self-reported health data complement EHR for phenotyping? AMIA Annu Symp Proc. 2014;2014:1738-1747. [PMC free article: PMC4419899] [PubMed: 25954446]

Acknowledgment

Research reported in this report was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (ME-1511-33184). Further information available at: https://www.pcori.org/research-results/2017/testing-two-patient-surveys-diagnosing-rare-genetic-conditions

Appendix

Terms of Reference (PDF, 119K)

Institution Receiving Award: Boston Children's Hospital
Original Project Title: Realization of a Standard of Care for Rare Diseases Using Patient-Engaged Phenotyping
PCORI ID: ME-1511-33184

Suggested citation:

Holm IA, Haendel M. (2021). Testing Two Patient Surveys for Diagnosing Rare Genetic Conditions. Patient-Centered Outcomes Research Institute (PCORI). https://doi.org/10.25302/07.2021.ME.151133184

Disclaimer

The views, statements, and opinions presented in this report are solely the responsibility of the author(s) and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute® (PCORI®), its Board of Governors, or its Methodology Committee.

Copyright © 2021. Boston Children's Hospital. All Rights Reserved.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits noncommercial use and distribution provided the original author(s) and source are credited. (See https://creativecommons.org/licenses/by-nc-nd/4.0/.)

Bookshelf ID: NBK604851; PMID: 39008648; DOI: 10.25302/07.2021.ME.151133184
