Important update
Beginning in January 2025, TPA-Exp and TPA-Inf submission types will no longer be accepted as new submissions. Please see INSDC TPA Announcement for more information.

TPA Frequently Asked Questions

What is the difference between TPA:experimental and TPA:inferential?
What is the difference between TPA and GenBank?
What is the difference between TPA and RefSeq?
What is primary sequence?
Should pseudogenes be submitted to TPA?
What constitutes an experiment?
Should phylogenetic or population studies be submitted to TPA?
Can a complete genome created from sequences in the database or improved annotation of a complete genome be submitted to TPA?
Can gene families from a single organism or across multiple organisms be submitted to TPA?
Can I update my TPA record with additional sequence and/or annotation?
Can a TPA:inferential record be changed to a TPA:experimental record?
Can annotation of existing data be appropriate elsewhere?

What is the difference between TPA:experimental and TPA:inferential?

Sequence records in the TPA:experimental database are supported directly by experimental evidence, whereas sequence data and annotation in the TPA:inferential database are supported indirectly by experimental evidence.

What is the difference between TPA and GenBank?

The TPA database consists of sequences that are derived or assembled from primary genomic and/or mRNA sequences already represented in the DDBJ/EMBL/GenBank databases. These sequence records include new annotation that is directly or indirectly supported by experimental evidence and has been published in a peer-reviewed scientific journal.

The GenBank archival sequence database includes publicly available DNA and RNA sequences submitted from individual laboratories and large-scale sequencing projects. GenBank accession numbers are assigned to these submitted sequences. Submitted sequence data are exchanged between NCBI's GenBank, the EMBL Nucleotide Sequence Database (EMBL), and the DNA Data Bank of Japan (DDBJ) to achieve comprehensive coverage. As an archival database, GenBank can be redundant for some loci. GenBank sequence records are owned by the original submitter and cannot be altered by a third party.

The major difference between TPA and GenBank is that GenBank records represent nucleotide sequences that have been directly determined by the submitter, whereas TPA records represent nucleotide sequences built from this primary sequence data with new annotation that is directly or indirectly supported by experimental evidence.

What is the difference between TPA and RefSeq?

Both TPA and RefSeq sequences are derived from primary sequence data found in the DDBJ/EMBL/GenBank databases of the International Nucleotide Sequence Database Collaboration (INSDC). However, while RefSeq annotation is based on additional information available in the literature and/or on automated computational methods, TPA annotation must be supported, directly or indirectly, by experimental evidence from the submitter.

What is primary sequence?

Primary nucleotide sequences are sequences that have been experimentally determined by their submitter. To be used to build a TPA sequence submission, a primary sequence must currently be publicly available in the DDBJ/EMBL/GenBank, SRA, Trace Archive, or Whole Genome Shotgun (WGS) sequence databases.

RefSeq sequences are not considered primary sequence because they are themselves derived from primary sequence. For example, NCBI accession numbers that begin with the prefix XM_ (mRNA) or XR_ (non-coding transcript) are model reference sequences produced by NCBI's Genome Annotation project.
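The prefix rule can be sketched as a quick screen run over a list of candidate primary accessions before a submission is prepared. This is only an illustration: the function names are hypothetical, and the prefix list contains just the two prefixes named in this FAQ, so it is not a complete list of RefSeq prefixes. An accession that passes this screen is not thereby confirmed as primary sequence.

```python
# Prefixes named in this FAQ as model RefSeq records (not primary sequence).
# Assumption: illustrative only; other RefSeq prefixes are deliberately omitted.
MODEL_REFSEQ_PREFIXES = ("XM_", "XR_")


def is_model_refseq(accession: str) -> bool:
    """Return True if the accession starts with a model RefSeq prefix."""
    return accession.strip().upper().startswith(MODEL_REFSEQ_PREFIXES)


def split_primaries(accessions):
    """Separate candidate primary accessions from model RefSeq accessions."""
    usable = [a for a in accessions if not is_model_refseq(a)]
    rejected = [a for a in accessions if is_model_refseq(a)]
    return usable, rejected
```

Calling `split_primaries` on a mixed list returns the accessions that pass the screen and those that were rejected, so the rejected ones can be replaced with the underlying primary records.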

Should pseudogenes be submitted to TPA?

Though pseudogenes are common throughout the genomes of organisms, it is often difficult to prove that a sequence truly represents a pseudogene. Therefore, experimentally supported pseudogenes are acceptable only for TPA:inferential. An example of experimental work done to support the description of a pseudogene can be found in PubMed: 15908099. In addition, pseudogenes may be submitted as part of a study that includes TPA:experimental and/or DDBJ/EMBL/GenBank functional homologs as comparison sequences.

What constitutes an experiment?

The TPA database was created as a repository for annotations that are derived as a result of wet-bench experiments based on existing nucleotide sequence deposited in the DDBJ/EMBL/GenBank databases.

Supporting evidence for valid TPA sequences includes using the nucleotide sequence information to study expression of the RNA in various tissues and to study promoter elements. Conceptual translations have been used for a variety of protein studies, including GST-fusion analyses and enzyme kinetics. These are just some examples of the studies that can be done.

Computational studies on their own do not constitute experimental evidence and must be accompanied by biological experiments that support the new annotation.

Should phylogenetic or population studies be submitted to TPA?

Phylogenetic or population studies that describe a collection of annotated members of a gene family may be submitted as TPA:inferential, provided that at least one member of the set is supported by experimental evidence determined by the submitter of the set. The supporting evidence submission should qualify for either TPA:experimental or DDBJ/EMBL/GenBank.

Can a complete genome created from sequences in the database or improved annotation of a complete genome be submitted to TPA?

A sequence record representing a complete genome can be submitted to TPA:inferential provided the annotated features have been assigned gene symbols or product identifiers. An entire genome's sequence should not be submitted to TPA if only some of the annotation has been improved; only the sequence relevant to the new annotation should be submitted.

Can gene families from a single organism or across multiple organisms be submitted to TPA?

Phylogenetic and BLAST (similarity) analyses on their own are not sufficient to support new annotation for either TPA:inferential or TPA:experimental. However, if at least one member of the set is supported by experimental evidence determined by the submitter of the set, the set may be submitted to TPA:inferential. The supporting evidence submission should qualify for either TPA:experimental or DDBJ/EMBL/GenBank.

Can I update my TPA record with additional sequence and/or annotation?

To update an existing record with additional sequence and/or annotation, the following requirements should be met:

Any new sequence must be covered by primary sequence. If the current primaries do not cover this additional sequence, new primary accession numbers must be provided.
Any new annotation must be covered in the existing publication or a new peer-reviewed publication.
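The first requirement amounts to an interval coverage check: every base of the updated sequence should fall inside at least one span contributed by a primary record. The sketch below shows one way to find the uncovered regions. It is a hypothetical helper, not an NCBI tool, and it assumes spans are given as 1-based inclusive (start, end) coordinates on the TPA sequence.

```python
def uncovered_regions(length, primary_spans):
    """Return 1-based inclusive (start, end) regions not covered by any span.

    length: length of the updated TPA sequence.
    primary_spans: iterable of (start, end) coordinates on the TPA sequence,
        one per stretch covered by a primary record (assumed input format).
    """
    gaps = []
    next_uncovered = 1  # first position not yet known to be covered
    for start, end in sorted(primary_spans):
        if start > next_uncovered:
            # Nothing covers the positions between the previous spans and this one.
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= length:
        # The tail of the sequence has no supporting primary span.
        gaps.append((next_uncovered, length))
    return gaps
```

An empty result means the existing primaries already cover the updated sequence; any region returned would need a new primary accession number.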

Can a TPA:inferential record be changed to a TPA:experimental record?

For a sequence to move from TPA:inferential to TPA:experimental, wet-bench experimental work that supports the annotation presented in the TPA record must be performed and published in a peer-reviewed journal.

Can annotation of existing data be appropriate elsewhere?

The NCBI RefSeq group welcomes expert comment, critique, and advice from others in the scientific community. Many individuals and groups contribute to RefSeq at the organism, gene, or gene family level. If you are interested, please contact the NCBI RefSeq group at refseq-admin@ncbi.nlm.nih.gov or see RefSeq.
