Instructions for ClinVar submission spreadsheets
This page provides general information about filling in ClinVar's submission spreadsheet. The sections of this page primarily correspond to the tabs on the worksheet, but we also give special attention to submitting information about disease/phenotype and your interpretation of clinical significance.
Not every column is described here; instructions are also included in each column in the spreadsheet itself.
- Checklist for faster processing
- Variants
- Condition
- Clinical significance
- Evidence
- Citations
- Merge submissions
- Delete submissions
- Columns added by ClinVar
Checklist for faster processing
Help your ClinVar curator process your submission faster!
- Provide data in all required fields
- Validate your HGVS expressions with VariantValidator or Mutalyzer
- Do not modify the column headers
- Do not modify the cell validation; use allowed values when there is a list
- Check the instructions for each column on the spreadsheet for correct format and separators for a list
Variants
ClinVar welcomes submissions of variants interpreted as homozygotes, haplotypes and compound heterozygotes, and we offer the following guidance:
- if each variant has been interpreted independently, please submit the variants separately to ClinVar. Distinct interpretations for each variant may be more useful to those using data in ClinVar.
- For example, if you observed two variants in compound heterozygosity and you determined that they are pathogenic for an autosomal recessive disease, submit each variant on a separate row as a pathogenic variant for that disease.
- The submission for each variant can note that it was observed with the other variant and the mode of inheritance.
- if multiple variants have been interpreted together because you cannot interpret them independently, submit them on one row as the appropriate combination, either as a haplotype or as a compound heterozygote.
- if you are submitting a variant that was observed in a homozygote, do not submit the variant as an HGVS expression for the homozygote, e.g. c.[105C>A];[105C>A]
- submit the single variant c.105C>A
- provide the clinical significance based on the single variant's contribution to disease
- fill in the column "mode of inheritance" if possible
- if you provide aggregate evidence on the Variant tab, fill in the column "Number of homozygotes"
- if you provide individual evidence on the CaseData tab, enter "homozygote" in the column "Zygosity"
Sequence variants
HGVS expressions
- Check that your HGVS expressions are valid with VariantValidator or Mutalyzer.
- On the lite spreadsheet template, enter the HGVS expression in the 'HGVS' name column.
- On the full spreadsheet template, enter the accession.version number in the 'Reference sequence' column and the c./g. portion of the HGVS expression in the 'HGVS' column.
- We only accept NCBI RefSeq accession numbers as the reference sequence due to technical constraints (namely, that we do not have alignment datasets for GenBank accessions).
- Do not include the p. HGVS expression in these columns. It may be provided in the 'Alternate designations' column instead.
- If you have information on multiple nucleotide changes that result in the same protein change, submit each nucleotide change on a separate row.
- Spreadsheets with examples of valid HGVS expressions that ClinVar accepts and invalid HGVS expressions and corresponding error messages are available.
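For the full spreadsheet template, the split between the 'Reference sequence' and 'HGVS' columns described above can be sketched as follows. This is a minimal illustration, not ClinVar's own validation: the function name `split_hgvs` and the RefSeq pattern are assumptions made for this example, and the sketch does not replace checking the expression with VariantValidator or Mutalyzer.

```python
import re

# Assumed shape of an NCBI RefSeq accession.version, e.g. NM_000492.3:
# two uppercase letters, an underscore, digits, a dot, and a version number.
REFSEQ_RE = re.compile(r"^[A-Z]{2}_\d+\.\d+$")


def split_hgvs(expression):
    """Split 'NM_000492.3:c.1521_1523delCTT' into (reference sequence, change)."""
    reference, sep, change = expression.strip().partition(":")
    if not sep:
        raise ValueError("expected 'accession.version:change'")
    if not REFSEQ_RE.match(reference):
        raise ValueError(f"not a RefSeq accession.version: {reference!r}")
    # The instructions ask for the c./g. portion only; p. expressions go in
    # 'Alternate designations' instead.
    if not change.startswith(("c.", "g.")):
        raise ValueError("only the c. or g. portion belongs in the 'HGVS' column")
    return reference, change


reference, change = split_hgvs("NM_000492.3:c.1521_1523delCTT")
print(reference)  # NM_000492.3
print(change)     # c.1521_1523delCTT
```

Expressions that carry a gene symbol in parentheses after the accession are not handled by this sketch.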
Chromosome coordinates
- Use the full spreadsheet template (SubmissionTemplate.xlsx).
- Note that you can delete any columns that are not required by ClinVar to make the spreadsheet simpler to use.
- Provide the chromosome in the 'Chromosome' column.
- Provide the first and last positions of the variant in the 'Start' and 'Stop' columns.
- For variants of 15 or fewer nt, provide the reference and alternate alleles in the 'Reference allele' and 'Alternate allele' columns.
- Use VCF-style notation with an anchor nucleotide, e.g. ref AT alt T
- Do not fill in the 'Variant type' column; we will calculate this for you.
- For variants of more than 15 nt, fill in the 'Variant type' column.
- Do not fill in the 'Reference allele' and 'Alternate allele' columns.
- Spreadsheets with examples of valid cases with chromosome coordinates that ClinVar accepts and invalid cases and corresponding error messages are available.
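The 15 nt rule above decides which of the allele and type columns to fill. A minimal sketch of that decision, assuming you already know the size of the variant in nucleotides; the function name `allele_columns` is hypothetical, and "deletion" is used only as an example 'Variant type' value.

```python
MAX_ALLELE_NT = 15  # threshold stated in the instructions


def allele_columns(size_nt, ref=None, alt=None, variant_type=None):
    """Return the allele/type columns to fill for a variant of the given size."""
    if size_nt <= MAX_ALLELE_NT:
        if not ref or not alt:
            raise ValueError("provide reference and alternate alleles for variants of 15 nt or fewer")
        # 'Variant type' is left blank; ClinVar calculates it for short variants.
        return {"Reference allele": ref, "Alternate allele": alt}
    if not variant_type:
        raise ValueError("provide 'Variant type' for variants of more than 15 nt")
    # The allele columns are left blank for longer variants.
    return {"Variant type": variant_type}


# VCF-style notation with an anchor nucleotide: deletion of T after A.
print(allele_columns(1, ref="AT", alt="T"))
print(allele_columns(200, variant_type="deletion"))
```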
Structural variants
- Use the full spreadsheet template (SubmissionTemplate.xlsx).
- Note that you can delete any columns that are not required by ClinVar to make the spreadsheet simpler to use.
- Provide the chromosome in the 'Chromosome' column.
- Provide the type of variant in the 'Variant type' column.
- For deletions and duplications that are detected by array, use "copy number loss" and "copy number gain", rather than "deletion" and "duplication".
- If the exact coordinates (to basepair resolution) of the variant call are known, fill in the 'Start' and 'Stop' columns.
- If only the minimal region is known, use inner_start and inner_stop.
- If only the maximum region is known, use outer_start and outer_stop.
- Otherwise, use outer_start (lower value) and inner_start (upper value) to define the interval in which the call begins. Likewise, use inner_stop (lower value) and outer_stop (upper value) to define the interval in which the call ends.
- Provide the observed copy number in the 'Copy number' column
- Provide the expected copy number in the 'Reference copy number' column
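The choice among the 'Start'/'Stop', inner, and outer coordinate columns can be sketched as below. This is an illustration under stated assumptions: the function name and its arguments (`exact`, `minimal`, `maximal`, each a `(start, stop)` pair) are hypothetical, and the "otherwise" case is read as the situation where both the minimal and the maximal region are known.

```python
def sv_coordinate_columns(exact=None, minimal=None, maximal=None):
    """Pick the coordinate columns to fill for a structural variant."""
    if exact is not None:
        # Coordinates known to basepair resolution.
        return {"Start": exact[0], "Stop": exact[1]}
    if minimal is not None and maximal is not None:
        # The call begins somewhere in [outer_start, inner_start] and
        # ends somewhere in [inner_stop, outer_stop].
        if not (maximal[0] <= minimal[0] <= minimal[1] <= maximal[1]):
            raise ValueError("minimal region must lie inside maximal region")
        return {
            "outer_start": maximal[0],
            "inner_start": minimal[0],
            "inner_stop": minimal[1],
            "outer_stop": maximal[1],
        }
    if minimal is not None:
        return {"inner_start": minimal[0], "inner_stop": minimal[1]}
    if maximal is not None:
        return {"outer_start": maximal[0], "outer_stop": maximal[1]}
    raise ValueError("no coordinates provided")


print(sv_coordinate_columns(minimal=(150, 180), maximal=(100, 200)))
```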
Other considerations for structural variants:
- a structural variant is often interpreted but not for a specific disease. You have a few options:
- You may enter "not provided" as the 'Preferred condition name'
- If you are also providing observed phenotypes in the 'Clinical features' column, you may enter "See cases" as the 'Preferred condition name' so that users know there is no asserted condition but there is phenotype information about the case
- See the Condition section for more options.
- You may indicate the gene(s) affected by the variant, but it is not required.
- ClinVar will calculate the genes that are affected by the variant.
- ClinVar also displays the results of ClinGen's dosage sensitivity curation to flag genes that are known to cause a phenotype when there is a loss or gain of one copy of the gene.
Cytogenetic variants
- Processing for cytogenetic variants is under discussion.
- Please contact us if you have cytogenetic variants to submit.
Somatic variants
- indicate "somatic" for 'Allele origin'
- we do not have a recommendation for terms for the interpretation of somatic variants; please use the options for clinical significance
- if the variant is interpreted for its effect on a tumor's response to a drug, consider "drug response" as the clinical significance. More details on this type of submission are in the next section on pharma variants.
- contact us if you have feedback about types of evidence we should support for somatic variants
Pharma variants
- if the variant is a haplotype
- provide the set of variants in the haplotype as an HGVS expression on a single row on the Variant tab
- if there is a star allele name for the haplotype, provide that name for 'Official allele name'
- use "drug response" for 'Clinical significance'
- in 'Explanation if clinical significance is other or drug response' provide a short description of the type of response, such as "poor metabolizer" or "likely responsive"
- to provide the drug and the condition for which the drug is used, see the table in the Condition section, "You interpreted the variant for its effect on a drug response"
- if you want to indicate the clinical significance of a variant for a disease and also describe its effect on a drug response, provide that information as two rows on the Variant tab
- one row for the interpretation of pathogenicity for the disease
- a second row for the interpretation of a drug response
Haplotypes
An interpretation may be submitted to ClinVar for a set of variants in cis, i.e. a haplotype.
- This should be done only if the combination of variants is important for the interpretation. For example, some star alleles for pharma variants are defined by several SNPs observed in cis.
- An interpretation for the haplotype is not necessary when the haplotype exists because a clinically important variant is always seen in combination with another variant, e.g. due to a founder effect. In that case, submit your interpretation of the clinically important variant.
Genotypes
An interpretation for a genotype may be submitted to ClinVar. However, ClinVar is a variant-level database, not a case-level (or patient) database; thus an interpretation should be provided for each individual variant whenever possible.
- if you observed two variants in compound heterozygosity and you determined that they are pathogenic for an autosomal recessive disease, submit each variant separately as a pathogenic variant for that disease. Distinct interpretations for each variant may be more useful to those using data in ClinVar. The submission for each variant can note that it was observed with the other variant and the mode of inheritance can be indicated.
- if you observed a single variant in homozygosity, do not submit the variant as an HGVS expression for the homozygote, e.g. c.[105C>A];[105C>A]. Instead, submit the single variant, e.g. c.105C>A, and provide the zygosity for the individual.
There are a few cases where an interpretation for a genotype is appropriate for submission to ClinVar:
- some practice guidelines are written based on the combination of variants seen in an individual. When the professional society who provides the practice guideline submits data to ClinVar, it may describe the genotype for each interpretation.
- in some cases, a combination of variants in trans causes a different phenotype than expected. For example, two pathogenic CFTR variants in trans may be expected to cause cystic fibrosis, but they may actually cause a less severe phenotype such as congenital bilateral absence of the vas deferens.
Autoclassified variants
If submitting a variant that was autoclassified as benign:
- in the “Comment on clinical significance” field, note that the variant was autoclassified and filtered, not subjected to a comprehensive review
- do not provide assertion criteria for autoclassified variants; assertion criteria imply that the variant was subjected to a comprehensive review
Variation identifiers
Optionally, you can provide an identifier used for the variant in one of the following databases: OMIM (allelic variant ID), dbSNP (rs number), dbVar (submitter or region identifier), or COSMIC ID. Use the format database_name:database_identifier. For example:
- OMIM:611101.0001
- dbSNP:rs104894321
- dbVar:nsv491743 or dbVar:essv12345
- COSMIC:COSM13027
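A quick pre-submission check of the database_name:database_identifier format might look like the following sketch. The function name `parse_variation_id` is hypothetical, and the check covers only the prefix and the presence of an identifier, not whether the identifier exists in the named database.

```python
# Database names taken from the examples above.
ALLOWED_DATABASES = {"OMIM", "dbSNP", "dbVar", "COSMIC"}


def parse_variation_id(value):
    """Split 'dbSNP:rs104894321' into (database name, identifier)."""
    database, sep, identifier = value.strip().partition(":")
    if not sep or not identifier:
        raise ValueError("expected database_name:database_identifier")
    if database not in ALLOWED_DATABASES:
        raise ValueError(f"unexpected database name: {database!r}")
    return database, identifier


print(parse_variation_id("OMIM:611101.0001"))  # ('OMIM', '611101.0001')
```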
Alternate designations
Optionally, you can provide other names for the variant. This is most important for variants that have legacy names that cannot be automatically calculated from current data, such as "deltaF508" or names that use old numbering systems. While you can provide other valid HGVS expressions in this column, such as the p. description, it is not necessary because ClinVar calculates other valid HGVS expressions for genomic DNA, cDNA, and protein automatically. Examples of useful alternate names include:
- deltaF508 - common name for NM_000492.3:c.1521_1523delCTT
- Z allele - common name for NM_001127701.1:c.1096G>A
- 3120G->A - legacy name for NM_000492.3:c.2988G>A
Condition
We strongly encourage using database identifiers to represent standard terms; this facilitates comparison of data among submitters.
The table below describes options for providing different kinds of disease and phenotype information.
Your data | Spreadsheet | Column | Tabs | Comment |
---|---|---|---|---|
You interpreted the variant for a single disorder, diagnostic term, or broad disease category | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | Condition ID type; Condition ID value | Full: Variant; Lite: Variant, ExpEvidence | |
You interpreted the variant for multiple disorders that were observed together in the same individual or group of individuals. | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | Condition ID type; Condition ID value; Condition uncertainty | Full: Variant; Lite: Variant, ExpEvidence | |
You interpreted the variant for multiple disorders which occur in different individuals. | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | Condition ID type; Condition ID value | Full: Variant; Lite: Variant, ExpEvidence | Provide distinct interpretations for each disorder by submitting the variant on multiple rows, one row per condition |
You interpreted the variant for multiple disorders but you are uncertain which is correct. | Full (SubmissionTemplate.xlsx) only | Condition ID type; Condition ID value; Condition uncertainty | Full: Variant | |
You are describing clinical features/phenotypes observed in an individual with the variant. | Full (SubmissionTemplate.xlsx) only | Clinical features | Variant - to describe phenotypes observed in a group of individuals. CaseData - to describe phenotypes observed in the specific individual represented on that row. | Provide the list of phenotypes separated with semi-colons, as either: You can provide clinical features whether or not there is a disorder for the interpretation. See below for how to indicate that there is no disorder for the interpretation. |
You interpreted the variant for its effect on a drug response | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | Condition ID type and Condition ID value, or Preferred condition name | Full: Variant; Lite: Variant, ExpEvidence | The condition should be provided as drug name + response, e.g. Warfarin response |
List of clinical findings that define a novel syndrome | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | See comment | See comment | If you are proposing a name for the syndrome |
Diagnostic term AND indication for testing | Full (SubmissionTemplate.xlsx) only | Condition ID type; Condition ID value; Indication | Variant, CaseData | |
No diagnostic term, only indication for testing | Full (SubmissionTemplate.xlsx) only | Preferred condition name; Indication | Variant, CaseData | |
No specific disorder, because you interpreted the variant as benign, likely benign, or uncertain significance for multiple disorders | Full (SubmissionTemplate.xlsx) and lite (SubmissionTemplateLite.xlsx) | Preferred condition name | | |
- View our tutorial on selecting MedGen terms for ClinVar and GTR submissions.
- Consult our gene-disease list for examples of relationships curated by authoritative groups.
- Contact us.
Clinical significance
If you do not see an appropriate value of clinical significance, e.g. for a pseudodeficiency allele, please submit "other" as the clinical significance and also provide your value of clinical significance as "Explanation if clinical significance is other or drug response". You may also consider providing more information in the "Comment on clinical significance" explaining how and why your laboratory reports the variant.
Assertion score
Submitters are encouraged to provide documentation describing the criteria they use to classify variants, referred to as assertion criteria. If your assertion criteria include a point-based scoring system, the final score, or point value, may be submitted as the Assertion score (Variant tab). The ACMG/ClinGen CNV Guidelines, 2019 is an example of a point-based scoring system for variant classification. If an Assertion score is provided, Assertion criteria must also be provided.
Assertion criteria and review status
We ask submitters to provide documentation describing the criteria they use to classify variants, which we call "assertion criteria".
- Read the requirements for assertion criteria.
- If you have questions about assertion criteria, contact us.
How to meet the requirements for the "criteria provided, single submitter" review status
Assertion criteria are provided through the ClinVar Submission Portal as either:
- a citation ID (PubMed ID, PubMedCentral ID, Bookshelf, or DOI) or
- an electronic document (a Word document or PDF)
See How to provide assertion criteria in your submission
A submission must reference one set of assertion criteria only, and these criteria must apply to every variant classification in the submission.
A review status of "criteria provided, single submitter" requires this documentation AND either supporting evidence for each classification OR a public contact for your organization. Supporting evidence in the submission includes:
- One or more of these columns on the Variant tab:
- comment on clinical significance
- citations on clinical significance
- citations or URLs that cannot be represented in clinical significance citations column
- comment on evidence
- citations on evidence
- number of individuals with variant
- number of families with variant
- number of families with segregation observed
- number of homozygotes
- number of heterozygotes
- number of compound heterozygotes
- number of hemizygotes
- evidence citations
- OR data on the CaseData tab
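The rule above combines three conditions: assertion criteria AND (supporting evidence OR a public contact). A minimal sketch of that logic follows. It assumes each Variant row is held as a dictionary keyed by the column descriptions listed above; the function names are hypothetical and the keys may not match the exact header text in the spreadsheet.

```python
# Column descriptions copied from the list above; treat them as placeholder keys.
EVIDENCE_COLUMNS = (
    "comment on clinical significance",
    "citations on clinical significance",
    "citations or URLs that cannot be represented in clinical significance citations column",
    "comment on evidence",
    "citations on evidence",
    "number of individuals with variant",
    "number of families with variant",
    "number of families with segregation observed",
    "number of homozygotes",
    "number of heterozygotes",
    "number of compound heterozygotes",
    "number of hemizygotes",
    "evidence citations",
)


def _filled(value):
    """True when a cell holds something other than blank text."""
    return value is not None and str(value).strip() != ""


def has_supporting_evidence(variant_row, case_data_rows=()):
    """One or more evidence columns filled on the Variant tab, OR any CaseData rows."""
    on_variant_tab = any(_filled(variant_row.get(col)) for col in EVIDENCE_COLUMNS)
    return on_variant_tab or len(case_data_rows) > 0


def meets_single_submitter_requirements(has_assertion_criteria, variant_row,
                                        case_data_rows=(), has_public_contact=False):
    """Assertion criteria AND (supporting evidence OR public contact)."""
    evidence = has_supporting_evidence(variant_row, case_data_rows)
    return has_assertion_criteria and (evidence or has_public_contact)


print(meets_single_submitter_requirements(True, {"number of homozygotes": 2}))  # True
print(meets_single_submitter_requirements(True, {}))                            # False
```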
Evidence
ClinVar requires that you provide some evidence for your interpretation.
The evidence is provided as observations, one of:
- individuals with the variant described in aggregate
- each case with the variant described individually
- functional/experimental evidence
Aggregate observations
- You can provide a single row on the Variant tab that represents your interpretation and all individuals in whom you observed the variant.
- only provide data in a column if it applies to everyone in the group, for example:
- provide an age range rather than one specific age
- "mixed gender group" if the group includes males and females
- only provide ethnicity if all individuals in the group are of the same ethnicity
- only provide zygosity if all individuals in the group have the same zygosity
- Alternatively you can provide multiple rows on the Variant tab that represent the same interpretation but different aggregate observations.
- You can aggregate by whichever variable is relevant to your data.
- Some examples:
- one row for affected individuals and a second row for unaffected individuals
- one row for single heterozygotes and a second row for homozygotes
- one row for age ≤20, a second row for age 21-40, and a third row for age ≥41
- You may also combine fields for more specific aggregates, such as affected homozygotes on one row and unaffected single heterozygotes on a second row
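The rule "only provide data in a column if it applies to everyone in the group" can be sketched with two small helpers. The helper names and the dictionary keys (`zygosity`, `age`) are made up for this example and are not spreadsheet column names.

```python
def shared_value(values):
    """Return the value common to every individual, or None if they differ."""
    distinct = set(values)
    return distinct.pop() if len(distinct) == 1 else None


def age_bounds(ages):
    """Return (youngest, oldest) so a range can be reported instead of one age."""
    return min(ages), max(ages)


group = [
    {"zygosity": "homozygote", "age": 34},
    {"zygosity": "homozygote", "age": 12},
    {"zygosity": "single heterozygote", "age": 57},
]
# Zygosity differs within the group, so the column would be left blank.
print(shared_value(p["zygosity"] for p in group))  # None
print(age_bounds([p["age"] for p in group]))       # (12, 57)
```

If a column such as zygosity matters to your interpretation, splitting the group into one row per value keeps that information.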
Case observations
If you are able to submit information about each case in whom the variant was observed:
- Provide the variant interpretation on the Variant tab
- Fill in one row per variant/condition interpretation
- Provide a value for LinkingID; this represents your variant/condition interpretation
- Do not fill in the columns in the sections for "Details of test and individuals tested", "Details of testing results", or "Methods"
- Provide your observations on the CaseData tab
- Each case that supports the variant/condition interpretation has its own row on CaseData, so there may be one or multiple CaseData rows for each Variant row
- The LinkingID links one row on the Variant tab to one or multiple rows on the CaseData tab
- One case may be described multiple times on CaseData. e.g. if you submit distinct interpretations for both variants from a compound heterozygous individual:
- each variant is provided on a separate row on the Variant tab
- the case data is provided on two rows on the CaseData tab, once for each variant interpretation, represented by the LinkingID
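The relationship between the Variant and CaseData tabs is one-to-many through LinkingID. A minimal consistency check, assuming each tab has been read into a list of dictionaries; the function name, the LinkingID values, and the "Individual" key are hypothetical.

```python
from collections import defaultdict


def cases_by_linking_id(variant_rows, case_rows):
    """Group CaseData rows under the Variant row that their LinkingID points to."""
    known = {row["LinkingID"] for row in variant_rows}
    grouped = defaultdict(list)
    for case in case_rows:
        linking_id = case.get("LinkingID")
        if linking_id not in known:
            raise ValueError(f"CaseData row has no matching Variant row: {linking_id!r}")
        grouped[linking_id].append(case)
    return dict(grouped)


# A compound heterozygous individual described once per variant interpretation.
variants = [{"LinkingID": "V1"}, {"LinkingID": "V2"}]
cases = [
    {"LinkingID": "V1", "Individual": "case-001"},
    {"LinkingID": "V2", "Individual": "case-001"},
    {"LinkingID": "V1", "Individual": "case-002"},
]
links = cases_by_linking_id(variants, cases)
print(len(links["V1"]), len(links["V2"]))  # 2 1
```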
Functional evidence
Research laboratories may generate experimental evidence to support the functional consequence of a variant. This type of evidence is submitted on the full spreadsheet template, on the FunctionalEvidence tab.
This tab is to be used *only* to submit your own research results. If you are submitting interpretations from another method such as clinical testing and you would like to note that there is functional evidence for the interpretation, please include that evidence on the Variant or CaseData tab, as appropriate, as one or more citations or as a comment.
- on the Variant tab
- define the variant, as described above.
- provide a LinkingID to link this row on the Variant tab to one or more rows on the FunctionalEvidence tab
- If you are describing the functional consequence of the variant, but not the clinical significance for a patient:
- enter "not provided" for 'Preferred condition name'
- alternatively, you may provide a disease name if there is a clear gene-disease relationship that prompted the development of your functional assay
- enter "not provided" for 'Clinical significance'
- enter a value for 'Functional consequence' based on your experimental results
- optionally you may enter a 'Comment on functional consequence' to provide more details
- On the FunctionalEvidence tab
- enter each experimental observation on a separate row
- multiple experiments may be submitted for the same variant interpretation
- use the LinkingID to link multiple experiments on this tab to the corresponding row on the Variant tab
- choose the appropriate 'Collection method'
- choose the appropriate 'Allele origin'; "not applicable" may be the best option
- choose the appropriate 'Affected status'; "not applicable" may be the best option
- describe the functional assay in the 'Method' column, including a description of any scale or scoring system used, and the standard error of the assay if appropriate
- report the result of the assay for the specific variant in the 'Result' column; a longer description of the effect should be provided in the 'Comment on functional consequence', described above
- optionally you may cite a publication that describes the assay in the 'Methods citations' column
Collection method
Collection method describes the setting in which the variant interpretation is made.
It's important for users of ClinVar data to understand if it was collected:
- as part of clinical testing with very standardized classification
- as part of research where the classification may be standardized or more experimental
- from the literature which may be out of date
Collection method | Use |
---|---|
clinical testing | For variants that were interpreted as part of clinical genetic testing, or as part of a large volume research study in which results compliant with CLIA, ISO, GLP, or an equivalent accreditation body are routinely returned to research subjects. Interpretation may be guided by the literature, but the number of individuals tested is reported only from the direct testing. |
research | For variants that were interpreted as part of a research project but results are not routinely returned to research subjects and do not meet the requirements for clinical testing above. This is a general term to use when other more specific methods do not apply. |
case-control | For variants gathered in a research setting but results are not routinely returned to research subjects and do not meet the requirements for clinical testing above. This term is for research projects specifically to compare alleles observed in cases and controls (without data about segregation). |
in vitro | For variants that were interpreted as part of an in vitro research project, such as experiments performed in cell culture, but results are not routinely returned to research subjects and do not meet the requirements for clinical testing above. This value is only used on the full spreadsheet template, on the FunctionalEvidence tab for experimental evidence. |
in vivo | For variants that were interpreted as part of an in vivo research project, such as a mouse model, but results are not routinely returned to research subjects and do not meet the requirements for clinical testing above. This value is only used on the full spreadsheet template, on the FunctionalEvidence tab for experimental evidence. |
reference population | For variants gathered in a research setting but results are not routinely returned to research subjects and do not meet the requirements for clinical testing above. This term is used for baseline studies of a population group of apparently unaffected individuals to assess allele frequencies. |
provider interpretation | For variants that were interpreted by a clinician, and for variants submitted by clinicians or researchers who are reinterpreting clinical test results. |
phenotyping only | For variants that are submitted to ClinVar to provide individual observations with detailed phenotype data, such as submissions from clinicians or patient registries, without an interpretation from the submitter. The interpretation from the testing laboratory may be provided in a separate field. |
literature only | For variants extracted from published literature with interpretation as reported in the citation. No additional curation has been performed by the submitter; the interpretation is from the publication(s) only. This method is used by third parties, not the authors of the paper. To report results from your own paper, use one of the other collection methods, such as "research". |
curation | For variants that were not directly observed by the submitter, but were interpreted by curation of multiple sources, including clinical testing laboratory reports, publications, private case data, and public databases. |
Allele Origin
Allele origin may generally describe whether the variant was germline or somatic. Alternatively, it may be a more specific description of germline, if you are providing case data or if it is part of your aggregate data.
- allowed values for allele origin are listed in the table below
- for some display and search purposes, submissions with allele origins other than "somatic" are grouped into "germline", but ClinVar retains the specific term you provide on your submission
Allele origin | Use | Grouping for search and display |
---|---|---|
Germline | Any germline variant | germline |
Somatic | Any somatic variant | somatic |
De novo | A germline variant that was not inherited | germline |
Maternal | A germline variant inherited from the mother | germline |
Paternal | A germline variant inherited from the father | germline |
Inherited | A germline variant that was inherited | germline |
Unknown | Allele origin is unknown | germline |
Biparental | A variant present on both chromosomes and inherited from both parents. | germline |
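The grouping in the table reduces to one rule: every allowed value other than "somatic" is grouped as "germline". A small sketch of that mapping follows; the function name is hypothetical, and the case-insensitive comparison is a convenience of this example rather than documented ClinVar behavior.

```python
# Allowed values from the table above, lowercased for comparison.
ALLELE_ORIGINS = {
    "germline", "somatic", "de novo", "maternal",
    "paternal", "inherited", "unknown", "biparental",
}


def display_group(allele_origin):
    """Return the search/display grouping for a submitted allele origin."""
    term = allele_origin.strip().lower()
    if term not in ALLELE_ORIGINS:
        raise ValueError(f"not an allowed allele origin: {allele_origin!r}")
    return "somatic" if term == "somatic" else "germline"


print(display_group("De novo"))  # germline
print(display_group("Somatic"))  # somatic
```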
Affected status
Affected status refers to the condition for the interpretation, i.e.
- whether all the individuals in that aggregate are affected or unaffected for the interpreted condition
- whether each case is affected or unaffected for the interpreted condition
- if you do not know whether the individuals are affected with that specific condition, provide "unknown" for affected status
Note that you can also provide information about the clinical features that were observed in individuals, whether or not they relate to the condition for the interpretation. See the Condition section for how to provide clinical features.
Control data
ClinVar does accept control data, for example, data from a specific ethnic group for variants that are reported elsewhere to be pathogenic.
- Provide "no" for 'Affected status'
- Provide "case-control" for 'Collection method'
- Provide descriptive information for the individuals in the control group, e.g. 'Population Group/Ethnicity'
Citations
Citations from PubMed, PubMed Central and Bookshelf, or a DOI, may be added in several fields. For example, you can add citations that you identified while researching the clinical significance of the variant in the column "Clinical significance citations". Examples of each format, i.e. database source:identifier, include:
- PMID:123456
- PMCID:PMC3385229
- NBK:1535
- doi:10.21037/atm-21-2913
Merge submissions
Occasionally you may find that you have more than one submission for the same variant and condition. For example, you may have used different HGVS expressions for the same variant on different submissions.
To merge submissions, submit an update:
- include all the data that you would like to retain on the resulting record
- on the Variant tab
- enter the accession number that should be retained in the 'ClinVarAccession' column
- in the 'Novel or update' column, enter "update"
- in the 'Replaces ClinVarAccessions' column, enter the other accession number(s) that will not be retained. Note that this accession number will no longer be displayed but it will be searchable in ClinVar.
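The three Variant tab columns involved in a merge can be sketched as a small helper. The function name is hypothetical, the accession numbers are made-up placeholders, and the separator between multiple replaced accessions is passed in explicitly because it should be taken from the instructions on that column in the spreadsheet.

```python
def merge_update_row(retained_accession, replaced_accessions, separator):
    """Build the merge-related cells for one row on the Variant tab."""
    if not replaced_accessions:
        raise ValueError("list at least one accession to be replaced")
    if retained_accession in replaced_accessions:
        raise ValueError("the retained accession cannot also be replaced")
    return {
        "ClinVarAccession": retained_accession,
        "Novel or update": "update",
        "Replaces ClinVarAccessions": separator.join(replaced_accessions),
    }


# Placeholder accessions for illustration only.
row = merge_update_row("SCV000000001", ["SCV000000002"], separator=";")
print(row["Replaces ClinVarAccessions"])  # SCV000000002
```

The remaining columns on the row still need all the data you want to retain on the resulting record.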
Delete submissions
Rarely you may need to delete a submission. For example, you may determine that the variant itself was miscalled. When possible, you should update your submission (e.g. to change the interpretation, the condition, etc.) or merge it into a redundant submission instead of deleting.
Note that deleting your submission means that it is removed from the web display and from future ClinVar files (e.g. XML and VCF files). However, it remains archived in past files and it is retrievable on the web if a user queries with the appropriate RCV accession number.
To delete a submission, submit an update using the full submission template:
- use the Deletes tab
- indicate the SCV accession to be deleted in the ClinVarAccession column
- you have the option to provide a comment for public display indicating why the interpretation is being deleted
Columns added by ClinVar
Some columns of data are added to the submission spreadsheet during pre-submission validation.
Error
The error column is added so that the submitter knows which rows have errors and cannot be processed by ClinVar. Contact us at clinvar@ncbi.nlm.nih.gov if you receive an error message that you don’t understand.
ncbi_cluster_key
The ncbi_cluster_key column is added so that it can be used in ClinVar’s downstream processing. You may see this column in the spreadsheet that we return to you if you validate and stop the process to fix all errors before submission. You do not need to do anything with the data in this column.