What is a Third Party Annotation:inferential (TPA:inferential) Sequence?

Important update
Beginning in January 2025, TPA-Exp and TPA-Inf submission types will no longer be accepted as new submissions. Please see INSDC TPA Announcement for more information.

What is a Third Party Annotation:inferential (TPA:inferential) Sequence?

TPA:inferential: A database of sequences annotated by inference, where the source molecule or its product(s) have not been the subject of direct experimentation.

A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank databases. It can be genomic or mRNA sequence, and can be assembled or derived from primary genomic and/or mRNA sequences. These sequences are submitted to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that include the annotation of existing nucleotide sequences in the primary sequence database.

Examples of TPA:inferential

CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data with reported wet-lab experimental evidence for a homologous molecule but no direct wet-lab experimental evidence. The reported experimental evidence must have been generated by the submission group and must be published in a peer-reviewed journal.
CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data in addition to novel sequencing with no wet-lab experimental evidence. If the novel sequence was only used to bridge two pieces of sequence, there must be reported wet-lab experimental evidence for a homologous molecule.
Sequence and annotation covered in a review paper or discussion section, where wet-lab experimental evidence is reported, but not generated by the TPA submitter. The experimental evidence should be reported directly in the review paper or be from a paper by the author of the review paper.
Annotation of non-coding genes and transcripts that have no wet-lab experimental evidence for their existence and/or function but are submitted as part of a study. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank. For an example of this type of study see PubMed 14681587. The annotations cannot be generated by an annotation program such as tRNAscan.
Annotation of pseudogenes with no wet-lab experimental evidence, when submitted as part of a study that includes sequences of functional homologs of the pseudogene. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
Annotation of pseudogenes that are not part of a gene study but for which experimental evidence exists. An example of experimental work done to support the description of a pseudogene can be found in PubMed 15908099.
A sequence submitted as part of a collection of annotated members of a gene family, where wet-lab experimental evidence does not exist for the annotation. One or more members of the set should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
A sequence representing an assembled genome or naturally occurring plasmid that includes features with assigned gene symbols or product identifiers, where the annotated features may be a mix of experimentally and inferentially determined data.

How Do TPA:inferential Sequence Records Differ from TPA:experimental and Other GenBank/EMBL/DDBJ Records?

The display of a TPA record is similar to other Collaboration records, but includes the following:

Keywords: TPA; Third Party Annotation; TPA:inferential.
The label 'TPA_inf:' at the beginning of each Definition Line.
PRIMARY field providing the base pair spans of the primary sequences that contribute to the TPA sequence.

Other Features and References are similar to those displayed in other GenBank/EMBL/DDBJ records.
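The three display markers listed above can be checked mechanically in a GenBank-style flat-file record. The following is a minimal sketch, not an official parser: the function name `tpa_inferential_markers` and the sample record (including the accessions XX000000 and YY000000.1) are hypothetical and made up for illustration, and the keyword check looks only for the TPA:inferential keyword.

```python
def tpa_inferential_markers(flatfile_text):
    """Report which TPA:inferential display markers appear in one flat-file record."""
    fields = {}
    current = None
    for line in flatfile_text.splitlines():
        if line.startswith("//"):  # end-of-record terminator
            break
        if line and not line[0].isspace():
            # Top-level field names start in the first column.
            parts = line.split(None, 1)
            current = parts[0]
            fields.setdefault(current, []).append(parts[1] if len(parts) > 1 else "")
        elif current is not None:
            # Indented lines continue the most recent field.
            fields[current].append(line.strip())

    definition = " ".join(fields.get("DEFINITION", []))
    keywords_raw = " ".join(fields.get("KEYWORDS", [])).rstrip(".")
    keywords = {k.strip() for k in keywords_raw.split(";") if k.strip()}

    return {
        "definition_prefix": definition.startswith("TPA_inf:"),
        "keyword": "TPA:inferential" in keywords,
        "primary_field": "PRIMARY" in fields,
    }


# Hypothetical record, invented for this example only.
SAMPLE = """\
LOCUS       XX000000                 12 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  TPA_inf: Hypothetical example gene, mRNA.
KEYWORDS    TPA; Third Party Annotation; TPA:inferential.
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-12                YY000000.1         1-12
ORIGIN
        1 atgcatgcat gc
//
"""

markers = tpa_inferential_markers(SAMPLE)
print(markers)
```

A record that lacks any one of the three markers yields `False` for that entry, which makes it easy to see which part of the display is missing.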

An example of a TPA:inferential submission is BK000554.

TPA sequence records are shared by all three Collaboration databases and can be found using typical search methods in EntrezNuc and EntrezProt (e.g., submitter name, gene/protein name, Accession Number, etc.).

How to Submit TPA Sequence Data

Sequences can be submitted to the TPA database using BankIt:

BankIt
- Check 'No' to answer the question 'Is This Primary Sequence Data?'.
- Input the list of Accession Numbers of all the primary sequences used to assemble or derive the submitted sequence.
- Provide an explanation of all experimental evidence.
- Complete the standard submission process, being sure to annotate all new descriptive information (CDS, protein name, gene name, etc.) for the TPA sequence.
- The sequence submission will be labeled as a TPA sequence and will be processed accordingly.
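
The BankIt steps above can be treated as a checklist to run over a draft before filling in the web form. The sketch below is only an illustration of that idea: BankIt is a web form, not a Python API, and `TpaSubmissionDraft`, its field names, and `checklist_problems` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TpaSubmissionDraft:
    """Hypothetical container for the information the BankIt steps ask for."""
    is_primary_sequence_data: bool
    primary_accessions: List[str] = field(default_factory=list)
    evidence_explanation: str = ""
    annotated_features: List[str] = field(default_factory=list)


def checklist_problems(draft: TpaSubmissionDraft) -> List[str]:
    """Return one message per checklist step the draft does not yet satisfy."""
    problems = []
    if draft.is_primary_sequence_data:
        problems.append("Answer 'No' to 'Is This Primary Sequence Data?'.")
    if not draft.primary_accessions:
        problems.append("List the Accession Numbers of all primary sequences used.")
    if not draft.evidence_explanation.strip():
        problems.append("Provide an explanation of all experimental evidence.")
    if not draft.annotated_features:
        problems.append("Annotate the new descriptive information (CDS, protein name, gene name, etc.).")
    return problems


# Hypothetical draft; the accession is a placeholder, not a real record.
draft = TpaSubmissionDraft(
    is_primary_sequence_data=False,
    primary_accessions=["YY000000.1"],
    evidence_explanation="Evidence reported for a homologous molecule.",
    annotated_features=["CDS", "gene"],
)
print(checklist_problems(draft))
```

An empty list means every step in the checklist has an answer; it says nothing about whether the submission meets the TPA:inferential criteria themselves.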

What should not be submitted to TPA:inferential

Sequences with annotation supported by experimental evidence. See TPA:experimental.
Synthetic constructs such as cloning vectors that use well characterized, publicly available genes, promoters, or terminators; these should be submitted as synthetic sequences for GenBank.
Microsatellites and related types of repeat regions.
New sequence that updates or changes existing sequence data from another submitter; these should be submitted as new sequences for GenBank.
Annotation that has arisen from an automated tool, such as GeneMark, tRNAscan, or ORF finder, where no further evidence, experimental or otherwise, is presented for the annotation.
Annotation from in vivo, in vitro, or in silico experimentation that will not be submitted for publication in a peer-reviewed journal.
