Names of clinical features, conditions, genes, proteins, and variants used in ClinVar, GTR, and MedGen

Over the years, a condition, gene, protein, or variant may have been described by a variety of terms. For disorders with a genetic basis, drug responses, and other conditions of interest in medical genetics, the Genetic Testing Registry (GTR) and ClinVar aggregate comparable terms, assign each set of comparable terms a stable identifier, and select one term from each set as the preferred name. These terms are then integrated with the subset of UMLS terms that are provided without restriction, and with other ontologies, to generate the terms that support MedGen. ClinVar, GTR, and MedGen all support searching for and displaying both preferred and alternate names. The full XML extract of ClinVar also includes the sources of the alternate names. GTR and ClinVar share the database infrastructure for curating names of genes, proteins, variants, and conditions.

ClinVar, the Genetic Testing Registry (GTR), Gene, and MedGen use the same preferred term for the same concept.

Standard terms for other categories of data are documented as Authorities used in ClinVar.

Diseases and other phenotypes

Integration of names

Names are integrated from multiple sources; see the list of Data Sources for terms in MedGen.

Descriptions and definitions of disorders

GTR and MedGen display definitions or descriptions of disorders from multiple sources. For a complete listing of the sources of definitions, please refer to the Sources of definitions page.

Genes and proteins

The authority for symbols and full names for human genes is the HUGO Gene Nomenclature Committee (HGNC). If an official symbol has not yet been assigned, the preferred name in NCBI's Gene database is used in ClinVar and GTR.

Proteins are named based on UniProtKB/Swiss-Prot, in accordance with the NCBI RefSeq practice.

Variants

ClinVar calculates a name, or title, for every variant (VCV) record.

Curated names

  • A curated name takes precedence over an automated name.
  • Curated names may come from an authoritative source, like CPIC, or from curation by NCBI staff.
    • An example of a curated name from an authoritative source is CYP2C19*1.
    • An example of a curated name from curation by NCBI staff is UGT1A1*28.
  • There are only a few variants with curated names.

Sequence variants

For the purpose of naming, most variants in ClinVar are considered "sequence variants", in that we can calculate an HGVS expression for them.

The format for the preferred name for sequence variants is based on a nucleotide HGVS expression, with the addition of gene symbol and protein change (p.) when available.

  • [accession.version](gene symbol):[nucleotide change] (protein change)
    • e.g. NM_007294.4(BRCA1):c.5588A>G (p.Tyr1863Cys)
  • [accession.version](gene symbol):[nucleotide change] if there is no p. HGVS expression
    • e.g. NM_007294.4(BRCA1):c.*1363A>T
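
The name format above can be sketched in a few lines of Python. This is only an illustration of the pattern, not ClinVar's own code; the function names and the regular expression are assumptions introduced here, and the pattern is deliberately loose (it does not validate HGVS syntax).

```python
import re
from typing import Optional

def format_sequence_variant_name(accession: str,
                                 gene: Optional[str],
                                 nucleotide_change: str,
                                 protein_change: Optional[str] = None) -> str:
    """Build [accession.version](gene symbol):[nucleotide change] (protein change).

    Hypothetical helper: gene symbol and protein change are added only
    when they are available.
    """
    name = accession
    if gene:
        name += f"({gene})"
    name += f":{nucleotide_change}"
    if protein_change:
        name += f" ({protein_change})"
    return name

# Loose pattern for splitting a name back into its parts (an assumption,
# not an official grammar).
NAME_RE = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"   # e.g. NM_007294.4
    r"(?:\((?P<gene>[^)]+)\))?"            # optional (gene symbol)
    r":(?P<nucleotide_change>\S+)"         # e.g. c.5588A>G
    r"(?: \((?P<protein_change>p\.[^)]+)\))?$"  # optional (p. change)
)

def parse_sequence_variant_name(name: str) -> Optional[dict]:
    """Return the parts of a name as a dict, or None if it does not match."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None

print(format_sequence_variant_name("NM_007294.4", "BRCA1",
                                   "c.5588A>G", "p.Tyr1863Cys"))
print(parse_sequence_variant_name("NM_007294.4(BRCA1):c.*1363A>T"))
```

Both documented examples round-trip through these helpers; for the second one, the parsed protein change is simply absent.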

To choose the accession number/HGVS expression for the name, we use the order of precedence below.

At each step, if there is more than one choice because of overlapping genes, we use the HGVS for a transcript from the submitted gene.

  1. a MANE transcript
    • The MANE Select transcript is used most often (>95% of variants in ClinVar)
    • The MANE Plus Clinical transcript is used if the molecular consequence of the variant on that transcript is more severe than on MANE Select
  2. the NM transcript annotated on the RefSeqGene, where the variant is in the coding region
  3. the NM transcript annotated on the RefSeqGene, where the variant is in the 5' UTR, an intron, or the 3' UTR
    • locations in the 5' UTR and introns take precedence over locations in the 3' UTR
  4. the submitted NM or NR transcript
    • we use our mapping to the most recent version of the NM or NR where possible
  5. genomic HGVS on an NC (chromosome) or NW (contig) when there is no submitted transcript and the variant is not in a gene
    • we give precedence to an NC or NW from the most recent assembly
  6. HGVS expressions on other accessions are used when the previous options are not available
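
As a rough illustration of the order of precedence above, the selection can be modeled as picking the lowest-numbered tier and then preferring a transcript from the submitted gene. The sketch below is an assumption-laden simplification, not ClinVar's implementation: the `Candidate` class, the `tier` field, and `choose_hgvs` are hypothetical names; each candidate is assumed to have already been assigned its tier (1 to 6); the sub-rules (MANE Plus Clinical versus MANE Select, 5' UTR and intron versus 3' UTR, most recent assembly) are not modeled; and the accessions and gene symbols in the usage example are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    hgvs: str                   # candidate HGVS expression
    tier: int                   # 1 (MANE transcript) .. 6 (other accessions)
    gene: Optional[str] = None  # gene of the transcript, if any

def choose_hgvs(candidates: List[Candidate],
                submitted_gene: Optional[str] = None) -> Optional[Candidate]:
    """Pick one candidate: lowest tier first, then the submitted gene."""
    if not candidates:
        return None
    best_tier = min(c.tier for c in candidates)
    tied = [c for c in candidates if c.tier == best_tier]
    # Overlapping genes: prefer a transcript from the submitted gene.
    if submitted_gene is not None:
        from_submitted = [c for c in tied if c.gene == submitted_gene]
        if from_submitted:
            tied = from_submitted
    return tied[0]

# Placeholder data for illustration only.
candidates = [
    Candidate("NC_000000.0:g.100A>T", tier=5),
    Candidate("NM_000001.1(GENEA):c.10A>T", tier=1, gene="GENEA"),
    Candidate("NM_000002.1(GENEB):c.25A>T", tier=1, gene="GENEB"),
]
chosen = choose_hgvs(candidates, submitted_gene="GENEB")
print(chosen.hgvs)
```

Keeping the tier as an explicit field separates the question of which rule a candidate satisfies from the question of which candidate wins, so the tie-breaking step stays the same at every tier.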

You can download a list of the HGVS expressions ClinVar calculates for each VariationID and AlleleID.

More information is available about HGVS expressions in ClinVar.

Copy number variants

For the purpose of naming, variants in ClinVar are considered copy number variants when the variant type is either copy number gain or copy number loss.

  • the format for the preferred name is [assembly] [cytogenetic band](chr:start-stop)x[copy_number]
    • e.g. GRCh38/hg38 2q22.1-22.3(chr2:136937358-146681810)x1
    • we give precedence to the location on the most recent assembly
    • copy number is computed relative to the normal copy number, so for a variant on chr X observed in a male (normally one copy), a copy number gain could be two copies (x2).
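
The copy number variant name format can likewise be sketched as a formatter and a loose parser. The function names and the regular expression are assumptions made for illustration; they are not part of ClinVar and do not validate assembly names or cytogenetic bands.

```python
import re
from typing import Optional

def format_cnv_name(assembly: str, band: str, chrom: str,
                    start: int, stop: int, copy_number: int) -> str:
    """Build [assembly] [cytogenetic band](chr:start-stop)x[copy_number]."""
    return f"{assembly} {band}(chr{chrom}:{start}-{stop})x{copy_number}"

# Loose pattern for the format above (an assumption, not an official grammar).
CNV_RE = re.compile(
    r"^(?P<assembly>\S+) "
    r"(?P<band>[^(]+)"
    r"\(chr(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)\)"
    r"x(?P<copy_number>\d+)$"
)

def parse_cnv_name(name: str) -> Optional[dict]:
    """Return the parts of a CNV name as a dict, or None if it does not match."""
    match = CNV_RE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("start", "stop", "copy_number"):
        parts[key] = int(parts[key])
    return parts

name = format_cnv_name("GRCh38/hg38", "2q22.1-22.3", "2",
                       136937358, 146681810, 1)
print(name)
print(parse_cnv_name(name))
```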

Exceptions

Our naming rules do not account for all kinds of variants. Variants that the rules do not cover are named "single allele" or "multiple alleles" by default, until a better name can be curated or calculated.

Exceptions are a small percentage of ClinVar variants. These exceptions include:

  • variants with variant type "complex"
  • variants with only a protein description
  • variants with only a cytogenetic location and no sequence location

Last updated: 2023-01-13T21:48:59Z