Sequence Processing at NCBI

Sequences are checked for a number of issues before they are accepted for GenBank. You will be notified during submission processing if your sequences have any of the issues listed below. If you have questions, please write to gb-admin@ncbi.nlm.nih.gov and include your submission number.

Error List

Trimmed Vector

Sequences with terminal vector (or adaptor, linker, etc.) contamination are trimmed to remove the contaminating sequence. Sequences are checked for vector via BLAST search of your sequences against our vector and UniVec databases. Although these matches can have other explanations, contamination is a possible cause. To perform a BLAST search against the vector database, go to VecScreen.

Removed Vector

Sequences with internal vector matches or sequences that match vector across the length of the sequence are removed. Sequences are checked for vector via BLAST search of your sequences against our vector and UniVec databases. To perform a BLAST search against the vector database, go to VecScreen.

Trimmed Ends and Ambiguous Sequences

Terminal runs of N, and sequence ends with a high percentage of ambiguous bases, are trimmed. Sequences with more than 50% ambiguous bases are removed. Please trim or remove low-quality sequence before submitting to GenBank.
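
If you want to catch these problems before uploading, the Python sketch below trims leading and trailing runs of N and computes the fraction of ambiguous bases. It assumes that any character other than A, C, G, or T counts as an ambiguity and applies the 50% threshold to the sequence as submitted. The function names are illustrative; this is not NCBI's trimming code, and it does not reproduce the trimming of ambiguity-rich ends described above.

```python
# Assumption: any character other than A, C, G or T (e.g. N, R, Y) is ambiguous.
UNAMBIGUOUS = set("ACGT")


def trim_terminal_ns(seq: str) -> str:
    """Upper-case the sequence and strip leading/trailing runs of N."""
    return seq.upper().strip("N")


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions that are not A, C, G or T (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base not in UNAMBIGUOUS for base in seq) / len(seq)


def would_be_removed(seq: str, max_fraction: float = 0.5) -> bool:
    """True if more than max_fraction of the submitted sequence is ambiguous."""
    return ambiguous_fraction(seq) > max_fraction


print(trim_terminal_ns("NNNACGTTGCANN"))  # ACGTTGCA
print(would_be_removed("ACGTNNNNNN"))      # True (60% ambiguous)
```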

Removed Short Sequences

Short sequences (<150 bp) are automatically removed from your submission. Unassembled sequences from next-generation sequencing platforms should be submitted to the NCBI Sequence Read Archive (SRA).

Removed Long Sequences

Sequences longer than the known length of the complete viral genome or segment are automatically removed from your submission.
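
Both length rules can be checked before submission with a sketch like the one below. The 150 bp minimum comes from the text above; the maximum is a hypothetical, caller-supplied parameter, because you must provide the known length of the complete genome or segment for your virus.

```python
from typing import Optional

MIN_LENGTH = 150  # sequences shorter than 150 bp are removed


def length_problem(seq: str, max_genome_length: Optional[int] = None) -> Optional[str]:
    """Return "short" or "long" if the length would trigger removal, else None.

    max_genome_length is a hypothetical, caller-supplied value: the known
    length of the complete genome or segment for your virus.
    """
    if len(seq) < MIN_LENGTH:
        return "short"
    if max_genome_length is not None and len(seq) > max_genome_length:
        return "long"
    return None
```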

Unusual Sequences

The sequence lacks at least one type of nucleotide (A, T, G, or C). It is highly unusual for a sequence not to contain all four nucleotides. If your sequence is missing one of them, it is likely an artifact or low-quality sequence and should be removed from the submission.
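
A missing nucleotide type can be detected with a simple set difference. This sketch assumes DNA sequence (T rather than U).

```python
def missing_nucleotides(seq: str) -> list:
    """Return the A/C/G/T bases that never occur in seq, in sorted order."""
    return sorted(set("ACGT") - set(seq.upper()))


print(missing_nucleotides("AACCGGAACCGG"))  # ['T']
```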

Contamination Upstream

Sequence contains extra nucleotides upstream of the consensus 5' end sequence of Influenza viruses. Trim any linker from the 5' end.

Contamination Downstream

Sequence contains extra nucleotides downstream of the consensus 3' end sequence of Influenza viruses. Trim any linker from the 3' end.
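
If you know the consensus terminal sequences for your segments, a sketch like the following trims anything upstream of the 5' motif and downstream of the 3' motif. The motifs are caller-supplied placeholders (the consensus sequences are not listed on this page), matching is exact, and an end is left unchanged when its motif is not found. This is a simplification for checking your own data, not the check NCBI runs.

```python
def trim_to_termini(seq: str, five_prime_motif: str, three_prime_motif: str) -> str:
    """Keep seq from the first 5' motif through the last 3' motif after it.

    Both motifs are hypothetical, caller-supplied strings. Matching is exact
    and case-insensitive; the original letter case of seq is preserved.
    """
    upper = seq.upper()
    start = upper.find(five_prime_motif.upper())
    if start == -1:
        start = 0  # 5' motif not found: leave the 5' end untouched
    hit = upper.rfind(three_prime_motif.upper(), start)
    end = len(seq) if hit == -1 else hit + len(three_prime_motif)
    return seq[start:end]


print(trim_to_termini("ttGGGAAACCCtt", "GGG", "CCC"))  # GGGAAACCC
```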

Reverse Complement

The input sequence is the reverse complement of the coding sequence. Please reverse complement the sequence before submitting it.
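
Reverse complementing is a mechanical transformation. The sketch below also maps IUPAC ambiguity codes to their complements (for example, R to Y and Y to R) so that ambiguous bases survive the conversion; S, W, and N are their own complements and pass through unchanged.

```python
# IUPAC complement table; characters not listed (e.g. S, W) are left as-is.
COMPLEMENT = str.maketrans(
    "ACGTRYKMBVDHNacgtrykmbvdhn",
    "TGCAYRMKVBHDNtgcayrmkvbhdn",
)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCRN"))  # NYGCAT
```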

CDS Has Stop Codon

The predicted coding region contains an internal stop codon. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.
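
To look for internal stops yourself, translate the predicted coding region and look for a stop (*) before the final codon. This sketch assumes the standard genetic code (NCBI translation table 1) and that the input begins at the first base of the start codon; any codon containing an ambiguous base is translated as X. The function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate complete codons in frame 1; ambiguous codons become X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def internal_stop_positions(cds: str) -> list:
    """1-based codon positions of stop codons that occur before the final codon."""
    protein = translate(cds)
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]


print(internal_stop_positions("ATGTAAGCCTAA"))  # [2]
```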

CDS Has Frameshift

The predicted coding region contains a frameshift (insertions/deletions based on alignments with other sequences). This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Insertion of Nucleotides

The predicted coding region contains an insertion based on alignment. This may indicate errors in the nucleotide sequence or assembly, or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Deletion of Nucleotides

The predicted coding region contains a deletion based on alignment. This generally indicates errors in the nucleotide sequence or assembly, or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Wrong Exon Number

The sequence of the predicted coding region contains a large insertion or deletion, or is misassembled. Please upload the corrected sequences.

Mutation at Start

Sequence contains a mutation where the start codon of the coding region should be located. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

Mutation at End

Sequence contains a mutation where the stop codon of the coding region should be located. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.
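
Both the start and stop checks can be approximated by inspecting the first and last codons of the predicted coding region. This sketch assumes an in-frame coding region that begins with ATG and ends with TAA, TAG, or TGA; viruses that use alternative start codons or stop-codon readthrough would need a different rule. The problem labels simply echo the error names on this page.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def terminal_codon_problems(cds: str) -> list:
    """List start/stop problems for an in-frame coding region (standard code)."""
    cds = cds.upper()
    problems = []
    if cds[:3] != "ATG":
        problems.append("mutation at start")
    if cds[-3:] not in STOP_CODONS:
        problems.append("mutation at end")
    return problems


print(terminal_codon_problems("ACGGCCCAA"))  # ['mutation at start', 'mutation at end']
```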

Splice Site Not Found

Sequence contains a mutation at the point where the splice consensus (GT/AG) should be located. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.
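
Given intron coordinates from your own alignment (a hypothetical input; this page does not say how they are obtained), checking the GT/AG consensus reduces to comparing the first and last two bases of the intron. Coordinates in this sketch are 1-based and inclusive.

```python
def splice_sites_ok(seq: str, intron_start: int, intron_end: int) -> bool:
    """Check the GT...AG consensus for an intron at 1-based inclusive coordinates.

    intron_start and intron_end are hypothetical, caller-supplied values.
    """
    intron = seq.upper()[intron_start - 1:intron_end]
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Exon "ATG", intron "GTAAGTTTCAG" (positions 4-14), exon "GCC".
print(splice_sites_ok("ATGGTAAGTTTCAGGCC", 4, 14))  # True
```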

Peptide Frameshift

The predicted coding region contains a frameshift (insertions/deletions based on alignment) in a mature peptide. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Peptide Overlap

The predicted coding region contains a deletion resulting in the overlap of two mature peptides. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

Peptide Separated

The predicted coding region contains an insertion resulting in mature peptides that are not contiguous. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

Mature Peptide Not Found

An expected mature peptide cannot be found. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

No Annotation

Sequence similarity is too low to add annotation. Verify that you have chosen the correct source on the Submission Type page and that the sequence is correct. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Gene Not Found

No gene can be found on this sequence based on sequence similarity. Verify that you have chosen the correct source virus on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Protein Not Found

No coding region can be found on this sequence based on sequence similarity. Verify that you have chosen the correct source on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

No BLAST Hits Found

No sequence similarity is detected with other viral genomes. Verify that you have chosen the correct source virus on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Coding Capacity

No coding region can be found on this sequence based on sequence similarity. Verify that you have chosen the correct source virus on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

No Type Info

No sequence similarity is detected with other viral genomes. Verify that you have chosen the correct source virus on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Wrong Segment

Sequence does not match known Influenza segments. Only Influenza A, B, or C sequences should be submitted with this tool. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

System Problem

An internal error has occurred in the program. If you receive this error and no other errors in your submission, write to gb-admin@ncbi.nlm.nih.gov and include the Submission ID.

Undefined Error

An internal error has occurred in the program. If you receive this error and no other errors in your submission, write to gb-admin@ncbi.nlm.nih.gov and include the Submission ID.

Indefinite Annotation

Sequence similarity is too low to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Indefinite Annotation Start

Sequence similarity is too low near the start of a coding region to add confident annotation, often as a result of a frameshift. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Indefinite Annotation End

Sequence similarity is too low near the stop of a coding region to add confident annotation, often as a result of a frameshift. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Unexpected Length

The length of the predicted complete coding region is not a multiple of 3. This indicates a possible reading frame shift within the coding region and may be the result of errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.
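
The length rule itself is simple arithmetic. In this sketch, a nonzero remainder means the net length of any insertions and deletions inside the coding region is not a multiple of 3; the check cannot tell you where the shift occurred.

```python
def unexpected_length(cds: str) -> bool:
    """True when a complete coding region (start through stop codon) is not a multiple of 3."""
    return len(cds) % 3 != 0


print(unexpected_length("ATGGCCTAA"))   # False (9 nt)
print(unexpected_length("ATGGCCCTAA"))  # True (10 nt, remainder 1)
```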

Peptide Adjacency Problem

The predicted mature peptides are not contiguous. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

Peptide Translation Problem

Sequence similarity is too low near the stop to add confident annotation. This generally indicates errors in the nucleotide sequence. Please upload the corrected sequences.

Low Feature Similarity Start

Sequence similarity is too low near the start of a predicted feature to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Low Feature Similarity End

Sequence similarity is too low near the end of a predicted feature to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Low Feature Similarity

Sequence similarity is too low at the site of a predicted feature to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Low Similarity Start

Sequence similarity is too low near the start of the sequence to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Low Similarity End

Sequence similarity is too low near the end of the sequence to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Low Similarity

Sequence similarity is too low to add confident annotation. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

No Features Annotated

Sequence aligns to a region of the viral genome that contains no annotation. Please contact gb-admin@ncbi.nlm.nih.gov with an explanation.

Duplicate Regions

Multiple regions of the sequence are similar to the same region of the model sequence. This indicates a misassembly of the sequence. Please upload the corrected sequences.

Misassembled

Sequence contains similarity on both the plus and minus strands compared to the model sequence. VADR may list this error as Indefinite Annotation. This indicates the sequence is misassembled. Please upload the corrected sequences.

Discontinuous Similarity

Sequence appears to be out of order compared to the model sequence. This indicates the sequence is misassembled. Please upload the corrected sequences.

Unexpected Divergence

Sequence is not similar enough to the model sequence to confidently assign annotation. Verify that you have chosen the correct source on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Biased Sequence

Unusual sequence composition, such as highly repetitive regions, was detected within the alignment to the model sequence. Verify that you have chosen the correct source on the Submission Type page and that the sequence is correct.

Incorrect Specified Subgroup

Sequence is not similar enough to the model sequence. Verify that you have chosen the correct source virus and/or genogroup on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Incorrect Specified Group

Sequence is not similar enough to the model sequence. Verify that you have chosen the correct source virus on the Submission Type page. If you are submitting other types of sequences, use a different GenBank submission tool and annotate the appropriate features in that submission.

Low Coverage

Sequence alignment to its best match covers less of the sequence than expected. This generally indicates errors in the nucleotide sequence or insufficient trimming of low quality sequence ends. Please upload the corrected sequences.

Last updated: 2019-11-07T18:22:43Z