MedGen Help Documentation

MedGen is NCBI's portal to information about conditions and phenotypes related to Medical Genetics. Terms from the NIH Genetic Testing Registry (GTR), UMLS, HPO, Orphanet, ClinVar and other sources are aggregated into concepts, each of which is assigned a unique identifier and a preferred name and symbol. The core content of the record may include names, identifiers used by other databases, mode of inheritance, clinical features, and map location of the loci affecting the disorder. The concept identifier (CUI) is used to aggregate information about that concept, similar to the way NCBI Gene serves as a gateway to gene-related information. MedGen provides links to such resources as:

  • Genetic tests registered in the NIH Genetic Testing Registry (GTR)
  • GeneReviews
  • ClinVar
  • OMIM
  • Related genes
  • Disorders with similar clinical features
  • Medical and research literature
  • Practice guidelines
  • Consumer resources
  • Ontologies such as HPO and ORDO

Links to the GTR, GeneReviews, and Practice Guidelines are based on curation by NCBI staff. Other data feeds are automated, but reviewed by NCBI staff and informed by feedback from the community.

We welcome your comments and suggestions.

MedGen Quick Start

Queries can be submitted by entering any term, by selecting from the list of suggested terms, or by using the Limits or Advanced functions to ask more complex questions.

Query strategies

Table 1. Some query strategies

Purpose | Example (try the link) | Comments
Find information about a condition by name | achondroplasia | As you type, names of genetic disorders used in the GTR will be provided. If you do not make a selection from the menu that appears under the search box as you type, your query is processed by looking for a match on a word or phrase. * is used as the wild card, and that wild card can be used only at the end of a word.
Find information about a condition based on the causative gene | LMNB1[gene] | If you enter a gene symbol followed by [gene], the diseases caused by or with some association to that gene will be retrieved.
Find information about a condition based on a clinical feature | short stature[clinical features] | If you enter the name of the feature followed by [clinical features], the diseases with that feature will be retrieved.
Find information about a condition based on an identifier from another database | 273800 | For most databases, the query can be entered as you see it on the web page. For more detailed examples, see below.

How to find information

MedGen uses the same type of query interface you may be familiar with if you use PubMed or Gene or the sequence databases. In other words, when you enter a term or phrase of interest in the query box, what you enter will be processed to retrieve records that contain or have some relationship to the word(s) you entered. The information is also organized into information categories or fields, so that queries can be constructed that retrieve records only if the term of interest occurs in that field. If you know the name of the field, you can enter that field name yourself. Otherwise you can use the Limits page or the Advanced page to help you build your query.

Simple queries

Just type your search term, and press enter or click on the Search button to the right of the query box.

Query by database identifier

MedGen is searchable by identifiers established by several resources for disorders or findings.  The field in which these identifiers are indexed is "Source ID".  Note the hints below.

Resource | Type of identifier | Note | Sample query
OMIM | disorder | Current MIM numbers are searchable in MedGen. If an integer is entered without the field qualifier [mim], it will be searched both as a MIM number and as an identifier (MedGen UID). | 192600
OMIM | phenotypic series | A phenotypic series ID can be used as a simple query. Phenotypic series data are added to MedGen interactively, so the data are not yet complete. | PS192600
HPO | disorders and findings | Current and alternative IDs from HPO are maintained in MedGen. Searching by an alternative ID will direct the user to the current record of that HPO concept. | HP:0012444
Orphanet | disorders and findings | A query for an identifier from Orphanet must be constructed as orphanet_12345, not ORPHA12345. | orphanet_180
MeSH | disorders and findings | Records retrieved based on a query by MeSH identifier are based on relationships supplied by UMLS. | D015211
NCI | disorders | Records retrieved based on a query by NCI identifier are based on relationships supplied by UMLS. | C2910
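
The Orphanet hint above can be applied mechanically before a query is submitted. The following is a minimal sketch, not part of MedGen itself: the function name orphanet_query is hypothetical, and it assumes incoming identifiers look like ORPHA12345, ORPHA:12345, or the already-correct orphanet_12345.

```python
import re

# Accepts "ORPHA180", "ORPHA:180", or "orphanet_180" (case-insensitive).
# These input shapes are an assumption made for illustration.
_ORPHANET_ID = re.compile(r"(?:orphanet|orpha)[:_]?(\d+)", re.IGNORECASE)


def orphanet_query(identifier: str) -> str:
    """Rewrite an Orphanet identifier into the orphanet_<number> query form."""
    match = _ORPHANET_ID.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not an Orphanet identifier: {identifier!r}")
    return f"orphanet_{match.group(1)}"


if __name__ == "__main__":
    print(orphanet_query("ORPHA180"))
```

The returned string can be entered in the query box as a simple query.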

Using Limits

Click on Limits in the grey query bar. There you can make selections to restrict your query by chromosome, relationships to other NCBI databases, types of content, and/or sources of the terms.

Using Advanced/MedGen Advanced Search Builder

Advanced is very useful when you want to construct a query that combines several concepts. The interface allows you to browse for terms that may be anywhere in the database (All Fields) or in a particular field (selected from the menu of field names). Each search term can be combined with the others by the choice of AND, OR or NOT provided to the left of the next query term. Documentation of the mechanics of using Advanced Search Builder is provided here.

As an example of using the MedGen Advanced Search Builder, try the following steps to identify disorders that have been coded to have the clinical features of cleft palate and learning disability.

  1. Click on Advanced in the query bar at the top of the page.
  2. In the menu labeled All Fields, scroll down and select Clinical Features.
  3. Enter cleft in the box at the right, and click on Show index list to review the terms indexed in the clinical features field that start with cleft.
  4. Select cleft palate. You may have to scroll down. The terms are listed alphabetically.
  5. In the next open line, make certain AND is displayed in the first menu.
  6. In the menu labeled All Fields, scroll down and select Clinical Features.
  7. Enter learning in the box at the right, and click on Show index list to review the terms indexed in the clinical features field that start with learning. Select learning disability.
  8. Process your query by clicking on Search.

Note: If you want to combine several terms for a search, and some are alternative terms (connected by OR) the order in which you enter terms does make a difference. For example, "cleft palate"[Clinical Features] AND "learning disability"[Clinical Features] OR "developmental delay"[Clinical Features] is processed differently from "learning disability"[Clinical Features] OR "developmental delay"[Clinical Features] AND "cleft palate"[Clinical Features] . See the general documentation about how to use the Edit or History function on the Advanced page to control your search.
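
One way to make the intended grouping explicit is to build the query string programmatically. The sketch below is illustrative only: the helper names field_term, any_of, and all_of are hypothetical, and it assumes that MedGen, like other Entrez databases, accepts parentheses to group terms.

```python
def field_term(value: str, field: str) -> str:
    """Quote a phrase and attach a field qualifier."""
    return f'"{value}"[{field}]'


def any_of(*terms: str) -> str:
    """Group alternative terms with OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Group required terms with AND inside parentheses."""
    return "(" + " AND ".join(terms) + ")"


cleft = field_term("cleft palate", "Clinical Features")
alternatives = any_of(
    field_term("learning disability", "Clinical Features"),
    field_term("developmental delay", "Clinical Features"),
)

# Cleft palate together with either of the two alternative features.
query = all_of(cleft, alternatives)

if __name__ == "__main__":
    print(query)
```

With the grouping written out, the result no longer depends on the order in which the terms were entered.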

Table 2. Fields in MedGen

Name | Abbreviation | Scope and explanation
Chromosome | [chr], [chromosome] | The chromosome(s) contributing to the disorder, based on the genes or alleles that have been identified. A search including a term like 5[chr] will restrict results to conditions mapping to chromosome 5.
Clinical features | [clinfeat], [clinical features] | Clinical features associated with a disorder. These are built from annotation data supplied by the Human Phenotype Ontology (HPO), or features from OMIM as processed by UMLS. Images from Elements of Morphology mapped to HPO terms are displayed as available.
Definition | [defn], [definition] | Terms occurring in the definition of the disorder. Definitions are provided from vocabularies registered in UMLS, from GeneReviews, CPIC, Medical Genetics Summaries, or more...
Exact title | [exact title] | The preferred name of any record. This field is used to improve retrieval based on the full name of any record. If the name is made of more than one word, performance is improved by surrounding the query in quotes, e.g. "ectodermal dysplasia"
Filter | [sb], [filter] | Terms in this field can be used to find records that fall into certain categories, such as all records for which a gene relationship has been reported.
Gene Full Name | [gene full name] | Full names of genes that have some relationship to a phenotype. The full names used to support this search are limited to current official names from the HUGO Gene Nomenclature Committee (HGNC).
Gene Name | [gene] | Gene symbols that have some relationship to a phenotype. The symbols used to support this search are limited to current or previous official symbols from the HGNC. If there is no official symbol from the HGNC, then NCBI Gene's preferred symbol is used. The data supporting the gene-phenotype relationship are built primarily from OMIM, GTR, NHGRI GWAS catalog, etc.
Guideline title | [Guideline title] | Titles of guidelines associated with a record in MedGen. Both the full title and terms in the title are represented in the index. This field is useful to find records associated with a practice guideline, position statement, or recommendation based on the title of that document.
Keyword | [keyword] | For a limited number of records, a key word may be added to the indexed terms to facilitate retrieval.
MIM | [mim] | The MIM (Mendelian Inheritance in Man) number associated with a record. See OMIM.
Mode of inheritance | [mode of inheritance] | The mode of inheritance for records that are heritable disorders.
Modification date | [moddate] | The date any element in the record was last modified.
Properties | [prop], [property] | Terms in the property field are standards used to categorize records in MedGen. The full list of properties, and their definitions, are provided in this document.
SNOMED CT CUI | [SNOMED CT CUI] | The concept identifier(s) (CUI) used by SNOMED CT for terms associated with the MedGen record.
Source ID | [Source ID] | The identifier used by the source of the term in MedGen, e.g. MeSH, OMIM, SNOMED CT. This field also includes the concept identifiers (CUI) from UMLS and identifiers created by NCBI (format CN + 6 digits) when no CUI was associated. If a CUI is established, the previous NCBI identifier is retained in this field for indexing.
Text word | [text] | Any word in a MedGen record.
Title | [title] | Any word in the title of a MedGen record.
UID | [uid] | The integer identifier that corresponds to the CUI. This integer and the CUI exist in a 1:1 relationship, i.e. if a CUI for a concept is updated for any reason, the UID will change as well. A partial history of CUI modifications is provided on ClinVar's FTP site.
Vocabulary | [vocab], [vocabulary] | Abbreviations of the vocabularies used explicitly in MedGen. gtr: GTR, hpo: Human Phenotype Ontology, msh: MeSH, nci: NIH National Cancer Institute, omim: Online Mendelian Inheritance in Man, snomedct: SNOMED CT

Application programming interface (API)

Data from MedGen can be retrieved programmatically via several application programming interfaces (API). These include:

E-utilities and Entrez Direct

As part of NCBI's Entrez system, MedGen can be accessed by E-utilities, both via web services and a UNIX command line as Entrez Direct. Documentation for these utilities is available from NCBI's BookShelf. To apply the general cases in the documentation to the specifics of MedGen, please note that MedGen currently supports only esearch and esummary.

The content of MedGen's document summary is the same as the XML displayed by selecting XML as a Display Setting. Please note that, to conform to the requirements of NCBI's Entrez system, the ConceptMeta element contains many nested elements whose tags are represented in an ASCII-encoded (escaped) form, so that the complex structure can be processed as a text string. Thus, depending on how you access the data, you may want to consider a global replacement of these encoded values.
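
The global replacement mentioned above can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of MedGen's exact format: it assumes the encoded tags use standard XML/HTML character entities (such as &lt; and &gt;), and the sample string, including its Names and Name elements, is invented for the example.

```python
import html
import xml.etree.ElementTree as ET


def decode_concept_meta(encoded: str) -> ET.Element:
    """Unescape entity-encoded markup and parse it as XML.

    The decoded text is wrapped in a single root element so that a
    fragment with several top-level elements still parses.
    """
    decoded = html.unescape(encoded)
    return ET.fromstring(f"<ConceptMeta>{decoded}</ConceptMeta>")


# Hypothetical encoded content; real ConceptMeta content will differ.
sample = "&lt;Names&gt;&lt;Name&gt;Example condition&lt;/Name&gt;&lt;/Names&gt;"

if __name__ == "__main__":
    root = decode_concept_meta(sample)
    print([name.text for name in root.iter("Name")])
```

If the encoded values turn out to use a different scheme, only the html.unescape call would need to change.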

Function | Example
Use esearch to find unique identifiers of records of interest | http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=medgen&term=charcot&retmax=500
Use esummary to retrieve the document summary of one of the identifiers retrieved in the previous query | http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=medgen&id=
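
The two calls in the table can be chained from a script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, https is used in place of http, and the parsing assumes the standard esearch XML layout in which identifiers appear as Id elements inside IdList.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 500) -> str:
    """Build an esearch URL for the medgen database."""
    params = {"db": "medgen", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def esummary_url(uids: list[str]) -> str:
    """Build an esummary URL for one or more MedGen UIDs."""
    params = {"db": "medgen", "id": ",".join(uids)}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


def parse_uids(esearch_xml: str) -> list[str]:
    """Extract the UIDs from an esearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]


def search_medgen(term: str, retmax: int = 20) -> list[str]:
    """Run esearch over the network and return the matching UIDs."""
    with urlopen(esearch_url(term, retmax)) as response:
        return parse_uids(response.read().decode("utf-8"))


if __name__ == "__main__":
    uids = search_medgen("charcot", retmax=5)
    print(uids)
    if uids:
        print(esummary_url(uids[:1]))
```

Only search_medgen touches the network; the URL builders and the parser can be exercised offline.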

Data in MedGen can be accessed by URLs. The result may be a single record, or a set of records that satisfy the query. These examples are a sample only; any field or property listed in the previous section can be used to build a URL by combining the base URL with the term and its field qualifier.

The base url is: http://www.ncbi.nlm.nih.gov/medgen/

which also takes you to MedGen's home page.

Add the value in the Qualifier column in the table below to construct the final URL.

Table 3. Representative URLs

Type | Qualifier | Example
CUI | C0010674 | https://www.ncbi.nlm.nih.gov/medgen/C0010674
Clinical feature | Hypoplasia of the ulna[clinical features] | https://www.ncbi.nlm.nih.gov/medgen/?term=%22Hypoplasia+of+the+ulna%22%5BClinical+Features%5D
Symbol of the gene that contributes to the phenotype | FGFR3[gene] | https://www.ncbi.nlm.nih.gov/medgen/?term=fgfr3%5Bgene%5D
Identifier used by a source, e.g. OMIM 188400 | 188400[sourceid] | https://www.ncbi.nlm.nih.gov/medgen/?term=188400%5Bsourceid%5D
Preferred name | cystic fibrosis[exacttitle] | https://www.ncbi.nlm.nih.gov/medgen/?term=cystic+fibrosis%5Bexacttitle%5D
Pharmacogenetic phenotypes | pharmacogenetic[keyword] | https://www.ncbi.nlm.nih.gov/medgen?term=%22pharmacogenetic%22%5BKeyword%5D
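
The percent-encoding seen in these URLs does not have to be written by hand. A minimal sketch follows, assuming the https form of the base URL used in the examples; the function names record_url and query_url are hypothetical.

```python
from urllib.parse import quote, quote_plus

MEDGEN_BASE = "https://www.ncbi.nlm.nih.gov/medgen/"


def record_url(concept_id: str) -> str:
    """URL of a single record, addressed by its concept identifier."""
    return MEDGEN_BASE + quote(concept_id)


def query_url(term: str) -> str:
    """URL of a query; quotes, brackets, and spaces are encoded."""
    return MEDGEN_BASE + "?term=" + quote_plus(term)


if __name__ == "__main__":
    print(record_url("C0010674"))
    print(query_url('"Hypoplasia of the ulna"[Clinical Features]'))
    print(query_url("fgfr3[gene]"))
```

quote_plus encodes spaces as + and brackets as %5B and %5D, which matches the encoding used in the example URLs.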

Using the query results

MedGen search results are provided in the Summary format, in order of relevance and with 20 items per page. You can change the display format as well as the number of the items per page by opening the Display Settings menu at the upper left and resetting the values.

Options for Display Settings

Format option | Description
Summary | The default summary display. Includes the title, a brief description/definition when available, a report of the UID, the ConceptID, and the type of record, and links to GTR, Genes, OMIM, and GeneReviews when available. The source of the definition is indicated in square brackets at the end. Details about each source are provided here.
Summary (text) | The text of the summary (no links). Includes the title, a brief description/definition when available, the (semantic) type of record, the identifier for that semantic type, the Concept ID, and the integer ID.
UI List | The list of integer IDs for the records returned by a query.
XML | The document summary with elements of ConceptId, Title, Definition, SemanticId, SemanticType, Suppressed, ConceptMeta, ModificationDate, and Merged. ConceptMeta contains structured information about related identifiers and terms.

To display the full record of an item, click on the title of the summary (available only from the Summary display option).

Using the full report page

The full report page provides descriptive information defining a record, and links to sites with related information. This page may be divided into several sections; not all records will have content in all sections and each section can be collapsed or expanded for ease of navigation.

Names and identifiers

This section summarizes names and identifiers associated with the record. If the record is a disorder with a known genetic basis, the gene or genes are reported, along with mode of inheritance and cytogenetic location (Figure 1).

Figure 1

Names and Identifiers

The preferred name and its preferred acronym (in parentheses) are provided at the top of the page. Under that are the identifiers assigned to the record, namely the MedGen integer identifier and the alphanumeric Concept ID. Please note that the MedGen integer identifier and the alphanumeric Concept ID are maintained on a 1:1 basis, so there is no advantage to storing one over the other.

When the concept ID has an integer at the second position, as in this example, the identifier is provided by UMLS. If the concept ID starts with CN, the concept is not from the current release of UMLS but was generated by NCBI. With a new release of UMLS, the CN identifier may be replaced with a concept identifier from UMLS. The history of these changes as they relate to GTR and ClinVar is available by FTP at ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/ConceptID_history.txt.
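
The rule for telling the two kinds of identifier apart is simple enough to check in code. The sketch below applies only the rule stated above (CN prefix for NCBI, C followed by a digit for UMLS); the function name concept_id_origin is hypothetical, and CN123456 is a made-up identifier used for illustration.

```python
def concept_id_origin(concept_id: str) -> str:
    """Classify a concept ID as issued by UMLS or generated by NCBI."""
    if concept_id.startswith("CN"):
        return "NCBI"
    if len(concept_id) > 1 and concept_id[0] == "C" and concept_id[1].isdigit():
        return "UMLS"
    return "unknown"


if __name__ == "__main__":
    for cid in ("C0010674", "CN123456"):
        print(cid, concept_id_origin(cid))
```

Because a CN identifier may later be replaced by one from UMLS, a stored classification should be treated as valid only for the release in which it was computed.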

Of all the terms available for a Concept ID, MedGen selects the preferred one based on preferred names from these sources, listed in order of precedence:

  1. Names subject to curatorial review by the NIH Genetic Testing Registry (GTR) and ClinVar
  2. Preferred name from the Human Phenotype Ontology
  3. NCI thesaurus, for cancer-related terms (semantic type of Neoplastic Process)
  4. SNOMEDCT_US (which is used as a first priority for names from UMLS when available)
  5. Orphanet
  6. UMLS
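
Selection by order of precedence can be illustrated as a first-match lookup. This is a simplified sketch, not MedGen's actual implementation: the source labels, the function name preferred_name, and the sample names are all hypothetical, and the restriction of the NCI thesaurus to cancer-related terms is not modeled.

```python
from typing import Optional

# Source labels in order of precedence; the labels are illustrative.
PRECEDENCE = ["GTR/ClinVar", "HPO", "NCI", "SNOMEDCT_US", "Orphanet", "UMLS"]


def preferred_name(candidates: dict[str, str]) -> Optional[str]:
    """Return the name from the highest-precedence source that has one."""
    for source in PRECEDENCE:
        if source in candidates:
            return candidates[source]
    return None


if __name__ == "__main__":
    names = {"UMLS": "name from UMLS", "HPO": "name from HPO"}
    print(preferred_name(names))
```

Here the HPO name is chosen because HPO ranks above UMLS in the list.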

Synonyms

Alternate terms are derived from the vocabularies being used in MedGen that belong to the same concept.

Modes of inheritance

Report of one or more modes of inheritance for this disorder.  These data are integrated from Orphanet or association files from the Human Phenotype Ontology (HPO). The values that may be reported are documented here.

SNOMED CT

Terms from SNOMED CT for this concept, and their identifiers. In this example, the identifiers are all the same, but in some cases UMLS will include terms from different SNOMED CT concepts within the same UMLS concept; thus providing the identifiers from SNOMED CT will make it clear when that has occurred.

Gene

The symbols of genes reported to contribute to a disorder. Each symbol anchors a link to the record in NCBI's Gene database. The complete list of Gene-MedGen relationships is provided on Gene's FTP site.

Cytogenetic location

Cytogenetic locations associated with a disorder. These are based on the location of each gene with a reported relationship to the disorder.

OMIM

The MIM condition number corresponding to this record.

HPO

The HPO number corresponding to this record.

Disease characteristics and additional descriptions

This section of the full report provides a brief description or definition of the concept, along with attribution for the primary source of that description. More details about each source are provided here. If, as in this example (Figure 2), content is provided from GeneReviews, the section also provides links to specific sections of the GeneReviews chapter and the list of authors.

Figure 2


Clinical features

A record about a condition may include a section describing the features of the condition. These data are provided from either the Human Phenotype Ontology (HPO) or OMIM. The first five features are displayed, with an option to view the full list.

Figure 3


If you hover over any term in the list, a pop-up (Figure 4) will display the definition of that term, and provide links to either the full report in MedGen for that feature (Feature record), or other conditions reported to have that feature (Search on this feature).

Figure 4


Images depicting clinical features are displayed, as available from Elements of Morphology e.g., Crumpled ear.

Term hierarchy

The hierarchies (Figure 5) are constructed based on relationships reported for each concept as direct or indirect links between terms from vocabulary sources. Concepts in the GTR hierarchy (Figure 5-A) are displayed alongside any available links to Clinical tests, Research tests, OMIM, or GeneReviews. Each term is linked to the full report for that concept.

Figure 5


Practice Guidelines

This section (Figure 6- A) is provided via curation by staff at NCBI. If we have missed any, please contact us to provide the information.

Recent clinical studies

This section (Figure 6) is generated when the web page is displayed. It uses the title of the record as a query and processes that title as if it had been submitted as a clinical query to PubMed (PubMed Clinical Queries), with display of the results in the Clinical Study Categories section, selecting Diagnosis with broad scope. Additional filters restrict the results to English-language publications about humans and exclude the comment and letter publication types. The results are presented in 5 subsections: Etiology (B), Diagnosis (C), Therapy (D), Prognosis (E), and Clinical prediction guides (F).

Figure 6


Recent systematic reviews

This section (Figure 6-G) is generated when the web page is displayed. It uses the title of the record as a query and processes that title as if it had been submitted as a clinical query to PubMed (PubMed Clinical Queries), with display of the results in the Systematic Reviews section. Additional filters restrict the results to English-language publications about humans and exclude the comment and letter publication types.

Discovery panel at the right

Table of contents and page navigation

The table of contents (Figure 7) summarizes the sections that are available in the record being displayed.  Each term is hyperlinked to support quick navigation to that section.  When you are within the page, the Go to link at the top of any section header (e.g. Figure 2) provides a pull down menu with quick links to other sections.

NIH Genetic Testing Registry (GTR)

The NIH Genetic Testing Registry section (Figure 7) lists the types of genetic tests that are reported in the GTR for the condition or phenotype, according to the primary method being used.  The section provides links to all tests, or tests using the method category being listed.

Figure 7


Outreach and support

This section enumerates and provides links to several sources that may provide information for consumers.

Figure 8


Reviews

The Reviews section (Figure 9- A) provides links based on curation by NCBI staff.

Figure 9


These links (Figure 9- B) point to other NCBI databases, and are computed by NCBI's query retrieval system.

Recent activity

A list of your recent queries and retrieval sets.  This function is provided by many NCBI databases.

Display settings

On the full report page, three format options are provided.

Format | Description
Full report | The default full display.
Full report (text) | Despite its name, this option displays the text of the document summary for the report, not the text of the full report.
XML | The XML for the document summary for this report.

Building the database

MedGen is built by aggregating data from multiple resources. The primary foundation is concepts provided from UMLS without restriction, with updates according to the UMLS release cycle (May and November). Other sources also provide terms, as summarized in the Data Sources table below.

The data model of MedGen is patterned on that of UMLS: sets of terms thought to be equivalent are grouped by a concept unique identifier (CUI). That equivalency depends on the term and its definition, and on the type of concept it represents. For example, the same term may be both the name of a disorder and a description of a clinical feature. Because those are different types of terms, they may be assigned different CUI values.

Terms from sources not in scope for UMLS may be integrated into concepts already created by UMLS, or new 'concepts' may be created.  You can tell the difference because concepts from UMLS begin with the letter C followed by numerals. Concepts from MedGen's processing start with CN. 

Note: Identifiers generated by MedGen may be retired if UMLS generates a concept that corresponds to one initiated by MedGen. The history of those changes is reported as MedGen_CUI_history.txt.

The automated aggregation of terms in MedGen can be reviewed by staff of GTR and ClinVar. If staff members or external users question a mapping or identify a gap, curators will review data sources and the data flow to identify a solution. In some cases, staff members may establish records to seed the automated processing.

Data Sources for terms

Representation from UMLS is not comprehensive. Content is selected for MedGen based both on vocabulary source (to ensure coverage and to respond to requests from you, our user community) and on the categories of terms within that vocabulary source (semantic type). In other words, even though MedGen includes terms from MeSH, it does not include terms from such categories as Subheadings or Technology and Food and Beverages. Vocabulary sources used from UMLS include MeSH, NCI, OMIM and SNOMED CT US.

Selected sources with frequency of update

MedGen is updated daily, but not all data sources update that often. Updates may include adding terms, or adding connections to related concepts in other databases.

Source | Frequency | Comments
ClinVar | Daily | A subset of terms provided by submitters to ClinVar. Terms are reviewed by NCBI staff before being released to MedGen. Thus a condition reported to ClinVar may not be represented in MedGen.
GeneReviews | Daily | Terms may be reviewed by GTR staff. Definitions are added based on the MIM number relationship.
GTR | Daily | Terms provided by those registering tests in the NIH Genetic Testing Registry (GTR).
Human Phenotype Ontology (HPO) | Within a month of any release from HPO | A primary source for clinical features of Mendelian disorders. MedGen uses the mapping of preferred terms from HPO to CUI provided by UMLS. Until one is available, MedGen assigns a CUI starting with CN. Thus the CUI used in MedGen for HPO-specific data may change with updates from UMLS.
Medical Genetics Summaries | When published to the NCBI BookShelf | Definitions are submitted based on CUI.
Mondo | Monthly (by release from Mondo) | Terms used by Mondo, their identifiers, their definitions, and the mappings between Mondo IDs and IDs from GARD, OMIM Phenotypic Series and Orphanet.
OMIM | Daily | Terms from OMIM are processed from both UMLS (which releases information twice a year) and daily updates directly from OMIM. The direct updates from OMIM are also used as a foundation for reporting gene-disease relationships. The CUI assigned to records defined by MIM numbers may change with updates from UMLS, i.e. when a MedGen-generated CUI is replaced with one from UMLS.
OMIM phenotypic series | Releases from Mondo | Concepts represented by OMIM's phenotypic series are integrated into MedGen as part of releases from Mondo.
ORDO | Releases from ORDO and Mondo | Terms and definitions from the Orphanet Rare Disease ontology (ORDO) are processed into MedGen automatically based on mappings established by Mondo. Concepts include the modes of inheritance characteristic of any disorder.
UMLS | Twice a year | Representation of terms from UMLS is restricted to a subset of vocabulary sources and categories of terms (semantic types). Vocabulary sources included in the UMLS data flow include GARD, MeSH, NCI, OMIM, and SNOMED CT. The categories in UMLS used by MedGen are processed as properties and include Congenital Abnormality, findings, Molecular Function, Pathologic Function, Disease or Syndrome, Mental or Behavioral Dysfunction, Pharmacologic Substance, sign or symptom, Anatomical Abnormality, and Neoplastic process.
Elements of Morphology | Unscheduled | Images from Elements of Morphology: Human Malformation Terminology mapped to HPO terms are displayed on Clinical feature records, as available.

Sources of definitions

For a complete listing of the sources of definitions used by MedGen, please refer to the Sources of definitions page.

Sources of relationships between disorders and their clinical features

OMIM, based on data from UMLS, represented as 'has_manifestation' in the relationships file (MGREL).

Human Phenotype Ontology, based on data from HPO and UMLS, represented as 'has_manifestation' in the relationships file (MGREL).


Last updated: 2022-04-04T17:23:22Z