What is dbEST?
dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms. A brief account of the history of human ESTs in GenBank is available (Trends Biochem. Sci. 20:295-6;1995). Also, consult the special "Genome Directory" issue of Nature (vol. 377, issue 6547S, 28 September 1995).
About ESTs
Expressed Sequence Tags (ESTs) are short (usually <1000 bp), single-pass sequence reads from mRNA (cDNA). Typically they are produced in large batches. They represent a snapshot of genes expressed in a given tissue and/or at a specific developmental stage. They are tags (some coding, others not) of expression for the cDNA library of interest.
Additional information about ESTs can be found in: Boguski MS, Lowe TM, Tolstoshev CM. 1993. dbEST--database for "expressed sequence tags." Nat Genet 4(4):332-333.
Most EST projects generate large numbers of sequences. These are commonly submitted to GenBank and dbEST as batches of dozens to thousands of entries, with shared citation, submitter and source information. To improve the efficiency of the submission process for this type of data, we have designed a streamlined submission process using tbl2asn.
dbEST is reserved for single-pass reads. Assembled sequences should not be submitted to dbEST. GenBank accepts assembled EST submissions through the TSA (Transcriptome Shotgun Assembly) division. Please contact gb-admin@ncbi.nlm.nih.gov for more information about submitting EST assemblies to TSA. The individual reads which make up the assembly should be submitted to dbEST or the Short Read Archive (SRA) prior to the submission of the assemblies.
NOTE: Sequences derived from "next generation" sequencing platforms should be submitted to the Short Read Archive (SRA). For information, contact sra@ncbi.nlm.nih.gov.
The following should not be included in EST submissions: mitochondrial sequences, rRNA, viral sequences, vector, contaminant, and low-quality sequences. Vector and linker regions should be identified using VecScreen and removed prior to submission.
How to submit data
Beginning in 2019, you may submit to dbEST using tbl2asn. Of the input files listed below, only the first two (the template file and the FASTA file) are required to generate an EST submission; the feature table is optional.
- Template file containing a text ASN.1 Submit-block object (suffix .sbt)
- Nucleotide sequence data in FASTA format (suffix .fsa). Each sequence in the FASTA file should include [organism=Genus species] [tech=EST] [moltype=mRNA]. The SeqID will be used as the clone value. You may use other relevant source modifiers as well, including tissue-type, dev-stage, etc., as in [tissue-type=cardiac muscle], [dev-stage=adult], [strain=ABC123], etc.
- Features are not required, but you may include a feature table with no more than a single misc_feature for each sequence, with "similar to ..." in the note. No other features may be applied to ESTs.
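The definition-line format described above can be sketched in Python as follows. This is only an illustration, not an NCBI tool: the function names, the clone name "clone001", the placeholder sequence, and the 60-character line width are all assumptions made for the example.

```python
def format_defline(seq_id, organism, **modifiers):
    """Build a FASTA definition line with bracketed source modifiers.

    seq_id is used as the clone value, as described in the text.
    """
    parts = [f">{seq_id}", f"[organism={organism}]", "[tech=EST]", "[moltype=mRNA]"]
    for name, value in modifiers.items():
        # Python keyword names cannot contain hyphens, so tissue_type
        # is written out as tissue-type.
        parts.append(f"[{name.replace('_', '-')}={value}]")
    return " ".join(parts)


def format_record(seq_id, sequence, organism, width=60, **modifiers):
    """Return one FASTA record: the definition line plus wrapped sequence lines."""
    lines = [format_defline(seq_id, organism, **modifiers)]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical clone name and placeholder sequence, for illustration only.
    record = format_record(
        "clone001",
        "ACGT" * 20,
        "Homo sapiens",
        tissue_type="cardiac muscle",
        dev_stage="adult",
    )
    print(record, end="")
```

Writing one such record per read into ESTs.fsa gives a file in the shape the second bullet describes.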
Once you have generated your template.sbt and ESTs.fsa files, the command line for running tbl2asn will look something like this:
tbl2asn -t template.sbt -i ESTs.fsa -a s -V v
The submission file generated in this example would be ESTs.sqn.
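For batch work, the same command can be launched from a script. The following is a minimal sketch that uses exactly the arguments shown above; the helper names are hypothetical, and the script only prints the command if tbl2asn or either input file is missing.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(template, fasta):
    """Return the tbl2asn argument list used in the example above."""
    return ["tbl2asn", "-t", str(template), "-i", str(fasta), "-a", "s", "-V", "v"]


def expected_output(fasta):
    """Name of the submission file for the example: the input name with .sqn."""
    return Path(fasta).with_suffix(".sqn")


if __name__ == "__main__":
    template, fasta = Path("template.sbt"), Path("ESTs.fsa")
    cmd = build_command(template, fasta)
    ready = shutil.which("tbl2asn") and template.exists() and fasta.exists()
    if not ready:
        # Nothing is executed unless the tool and both inputs are present.
        print("Not running; the command would be:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
        print("Submission file:", expected_output(fasta))
```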
A more thorough description of the possible command line arguments is available on the tbl2asn page.
You may also include BioProject or BioSample information, if the ESTs are part of a larger, ongoing project.
Please send your EST submission (.sqn file) to gb-sub@ncbi.nlm.nih.gov.
Any questions may be sent to info@ncbi.nlm.nih.gov or gb-admin@ncbi.nlm.nih.gov.
Other ways to access dbEST
EST sequences are included in the EST division of GenBank, available from NCBI through E-Utilities.
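As a rough illustration of E-Utilities access, the sketch below builds an ESearch URL without sending any request. Two details are assumptions not stated on this page: that EST records are searched in the "nuccore" database, and that they can be restricted with the "gbdiv_est[PROP]" filter. Check the E-Utilities documentation before relying on either.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(organism, retmax=20):
    """Build an ESearch URL for EST-division records from one organism.

    The database name and the gbdiv_est[PROP] filter are assumptions
    made for this example.
    """
    term = f"{organism}[Organism] AND gbdiv_est[PROP]"
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("Homo sapiens"))
```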
ESTs received through the end of 2018 are also available by anonymous FTP in the /repository/dbEST directory at ftp.ncbi.nlm.nih.gov.
ESTs received after 2018 are not included in this format.