Beginning in January 2025, TPA-Exp and TPA-Inf submission types will no longer be accepted as new submissions. Please see INSDC TPA Announcement for more information.
What is a Third Party Annotation:inferential (TPA:inferential) Sequence?
TPA:inferential: A database of sequences annotated by inference, where the source molecule or its product(s) have not been the subject of direct experimentation.
A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank databases. It can be genomic or mRNA sequence, and can be assembled or derived from primary genomic and/or mRNA sequences. These sequences are submitted to DDBJ/EMBL/GenBank as part of the process of publishing biological experiments that include the annotation of existing nucleotide sequences in the primary sequence database.
Examples of TPA:inferential
- CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data with reported wet-lab experimental evidence for a homologous molecule but no direct wet-lab experimental evidence. The reported experimental evidence must have been generated by the submission group and must be published in a peer-reviewed journal.
- CDS and related annotation applied to a sequence derived from existing genomic and/or mRNA primary data in addition to novel sequencing with no wet-lab experimental evidence. If the novel sequence was only used to bridge two pieces of sequence, there must be reported wet-lab experimental evidence for a homologous molecule.
- Sequence and annotation covered in a review paper or discussion section, where wet-lab experimental evidence is reported, but not generated by the TPA submitter. The experimental evidence should be reported directly in the review paper or be from a paper by the author of the review paper.
- Annotation of non-coding genes and transcripts that have no wet-lab experimental evidence for their existence and/or function but are submitted as part of a study. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank. For an example of this type of study, see PubMed 14681587. The annotations cannot be generated by an annotation program such as tRNAscan.
- Annotation of pseudogenes with no wet-lab experimental evidence, when submitted as part of a study that includes sequences of functional homologs of the pseudogene. One or more of the study's sequences should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
- Annotation of pseudogenes that are not part of a gene study but for which experimental evidence exists. An example of experimental work done to support the description of a pseudogene can be found in PubMed 15908099.
- A sequence submitted as part of a collection of annotated members of a gene family, where wet-lab experimental evidence does not exist for the annotation. One or more members of the set should be supported by experimental evidence and be in TPA:experimental or DDBJ/EMBL/GenBank.
- A sequence representing an assembled genome or naturally occurring plasmid that includes features with assigned gene symbols or product identifiers, where the annotated features may be a mix of experimentally and inferentially determined data.
How Do TPA:inferential Sequence Records Differ from TPA:experimental and Other GenBank/EMBL/DDBJ Records?
The display of a TPA record is similar to other Collaboration records, but includes the following:
- Keywords: TPA; Third Party Annotation; TPA:inferential.
- The label 'TPA_inf:' at the beginning of each Definition Line.
- PRIMARY field providing the base pair spans of the primary sequences that contribute to the TPA sequence.
Other Features and References are similar to those displayed in other GenBank/EMBL/DDBJ records.
An example of a TPA:inferential record is BK000554.
TPA sequence records are shared by all three Collaboration databases and can be found using typical search methods in EntrezNuc and EntrezProt (e.g., submitter name, gene/protein name, or Accession Number).
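The three display markers listed above (the TPA:inferential keyword, the 'TPA_inf:' Definition Line label, and the PRIMARY field) can be checked programmatically. The following is a minimal sketch in Python, assuming the record is available as GenBank flat-file text in which top-level field names start in the first column and continuation lines are indented. The function names, the sample record, and the accessions XX000000 and YY000000 are hypothetical and purely illustrative; they do not reproduce any real record.

```python
def parse_top_level_fields(flatfile_text):
    """Collect top-level flat-file fields into a dict of field name -> text.

    Assumption: a field name starts in column 1; indented lines continue
    the most recent field. Repeated fields keep only the last occurrence.
    """
    fields = {}
    current = None
    for line in flatfile_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):  # end-of-record marker
            break
        if not line[0].isspace():
            name, _, rest = line.partition(" ")
            current = name
            fields[current] = rest.strip()
        elif current is not None:
            fields[current] += " " + line.strip()
    return fields


def tpa_inferential_markers(flatfile_text):
    """Report which of the three TPA:inferential display markers are present."""
    fields = parse_top_level_fields(flatfile_text)
    raw_keywords = fields.get("KEYWORDS", "").rstrip(".")
    keywords = [k.strip() for k in raw_keywords.split(";")]
    return {
        "keyword": "TPA:inferential" in keywords,
        "definition_label": fields.get("DEFINITION", "").startswith("TPA_inf:"),
        "primary_field": "PRIMARY" in fields,
    }


# Hypothetical record, written only to exercise the checks above.
SAMPLE = """\
LOCUS       XX000000                 120 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  TPA_inf: Hypothetical example transcript, complete cds.
ACCESSION   XX000000
KEYWORDS    TPA; Third Party Annotation; TPA:inferential.
PRIMARY     TPA_SPAN            PRIMARY_IDENTIFIER PRIMARY_SPAN        COMP
            1-120               YY000000.1         1-120
//
"""

print(tpa_inferential_markers(SAMPLE))
```

A record that lacks any of the three markers yields False for the corresponding key, which makes it easy to flag records that are not displayed as TPA:inferential.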
How to Submit TPA Sequence Data
Sequence can be submitted to the TPA database using BankIt:
- BankIt
- Check 'No' to answer the question 'Is This Primary Sequence Data?'.
- Enter the list of Accession Numbers of all the primary sequences used to assemble or derive the submitted sequence.
- Provide an explanation of all experimental evidence.
- Complete standard submission process, being sure to annotate all new descriptive information (CDS, protein name, gene name, etc) for the TPA sequence.
- Sequence submission will be labeled as a TPA sequence and will be processed accordingly.
What Should Not Be Submitted to TPA:inferential?
- Sequences with annotation supported by experimental evidence. See TPA:experimental
- Synthetic constructs such as cloning vectors that use well characterized, publicly available genes, promoters, or terminators; these should be submitted as synthetic sequences for GenBank.
- Microsatellites and related types of repeat regions
- New sequence that updates or changes existing sequence data from another submitter; this should be submitted as a new sequence for GenBank.
- Annotation that has arisen from an automated tool, such as GeneMark, tRNAscan, or ORF finder, where no further evidence, experimental or otherwise, is presented for the annotation.
- Annotation from in vivo, in vitro, or in silico experimentation that will not be submitted for publication in a peer-reviewed journal.