Prokaryotic Genome Annotation Examples

Figure 1: Sample FASTA-formatted sequence

>contig001 [organism=Escherichia coli] [strain=HTE831]
tagagcaaaaaatagacattttaatggcgctaatcatacaaggaaggaataataacactg
acatggatacatccacttaatctacatttgcttattcctatcttgactatatctatatcc
[etc.]
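
The definition line in Figure 1 carries the sequence identifier followed by bracketed `[key=value]` source modifiers. As an illustration only, the following minimal Python sketch splits such a line into its parts. The function name `parse_defline` is hypothetical and is not part of any NCBI tool.

```python
import re


def parse_defline(defline):
    """Split a FASTA definition line into a sequence ID and its modifiers."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    # The sequence ID is the first whitespace-delimited token after '>'.
    seq_id = defline[1:].split(None, 1)[0]
    # Each modifier has the form [key=value]; values may contain spaces.
    modifiers = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", defline))
    return seq_id, modifiers


seq_id, modifiers = parse_defline(
    ">contig001 [organism=Escherichia coli] [strain=HTE831]"
)
print(seq_id)     # contig001
print(modifiers)  # {'organism': 'Escherichia coli', 'strain': 'HTE831'}
```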

Figure 2: Feature table format

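This mock example of a feature table file includes three protein-coding genes (one of them partial at its 3' end), two tRNA genes (one of them a pseudogene), one rRNA gene, a misc_feature, and a repeat_region.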

Note that the relative order of the features in the file does not matter. The misc_feature and repeat_region features have no corresponding gene feature, and so have no locus_tag.

See the flatfile view of this file in Figure 3.

>Feature contig001
63574  65173   gene
            locus_tag       Ngs_17131
63574  65173   CDS
            product hypothetical protein
            protein_id  gnl|ncbi|Ngs_17131
            inference   similar to DNA sequence:INSD:AY123455.2
102492  101261  gene
            locus_tag       Ngs_3038
            gene    ftsW
102492  101261  CDS
            product flippase
            protein_id  gnl|ncbi|Ngs_3038
112616  >113646  gene
            locus_tag       Ngs_2945
112616  >113646  CDS
            product bifunctional methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase
            EC_number       1.5.1.5
            EC_number       3.5.4.9
            experiment  Western blot
            protein_id  gnl|ncbi|Ngs_2945
101    180 gene
            locus_tag       Ngs_10111
            gene    trnL
101  180 tRNA
            product tRNA-Leu
45111  45190   gene
            locus_tag       Ngs_10112
            pseudo
45111    45190   tRNA
            product tRNA-Xxx
2103   400 gene
            locus_tag       Ngs_11232
2103    400 rRNA
            product 16S ribosomal RNA
60101    60567   misc_feature
            note    similar to ABC transporters
43027   43136   repeat_region
            mobile_element  transposon:Tn22
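
To make the layout of Figure 2 concrete, here is a minimal Python sketch that reads feature table text and converts each coordinate pair into a flatfile-style location string. It rests on several assumptions: columns are split on any whitespace (real .tbl files are tab-delimited), every feature has a single interval, and a partial marker on a complementary-strand feature moves to the opposite end of the location string. The names `to_flatfile_location` and `parse_feature_table` are hypothetical and do not belong to any NCBI tool.

```python
def to_flatfile_location(start, stop):
    """Turn the two coordinate columns of a table line into a location string."""
    def split_marker(token):
        # A leading '<' or '>' marks a partial end; the rest is the position.
        partial = token[0] in "<>"
        return partial, int(token[1:] if partial else token)

    five_partial, start_pos = split_marker(start)
    three_partial, stop_pos = split_marker(stop)

    if start_pos <= stop_pos:
        low, high = start_pos, stop_pos
        low_partial, high_partial = five_partial, three_partial
        template = "{}"
    else:
        # Start greater than stop: the feature lies on the complementary strand.
        low, high = stop_pos, start_pos
        low_partial, high_partial = three_partial, five_partial
        template = "complement({})"

    span = f"{'<' if low_partial else ''}{low}..{'>' if high_partial else ''}{high}"
    return template.format(span)


def parse_feature_table(text):
    """Parse feature table text into a list of feature dictionaries."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(">Feature"):
            continue
        if line[0].isspace():
            # Indented line: a qualifier that belongs to the preceding feature.
            parts = line.split(None, 1)
            value = parts[1].strip() if len(parts) > 1 else True
            features[-1]["qualifiers"].append((parts[0], value))
        else:
            # Coordinate line: start, stop, feature key (single interval only).
            start, stop, key = line.split()
            features.append({
                "key": key,
                "location": to_flatfile_location(start, stop),
                "qualifiers": [],
            })
    return features


sample = """>Feature contig001
102492  101261  gene
            locus_tag       Ngs_3038
            gene    ftsW
45111  45190   gene
            locus_tag       Ngs_10112
            pseudo
"""
for feature in parse_feature_table(sample):
    print(feature["key"], feature["location"], feature["qualifiers"])
# gene complement(101261..102492) [('locus_tag', 'Ngs_3038'), ('gene', 'ftsW')]
# gene 45111..45190 [('locus_tag', 'Ngs_10112'), ('pseudo', True)]
```

Qualifiers are kept as a list of pairs rather than a dictionary because a feature may repeat a qualifier, as the two EC_number lines above do.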

Figure 3: GenBank flatfile

This is part of the flatfile view of the .sqn file made from the .fsa file (Fig. 1) and .tbl file (Fig. 2).

     source          1..116100
                     /organism="Escherichia coli"
                     /mol_type="genomic DNA"
                     /strain="HTE831"
                     /db_xref="taxon:562"
     gene            101..180
                     /gene="trnL"
                     /locus_tag="Ngs_10111"
     tRNA            101..180
                     /gene="trnL"
                     /locus_tag="Ngs_10111"
                     /product="tRNA-Leu"
     gene            complement(400..2103)
                     /locus_tag="Ngs_11232"
     rRNA            complement(400..2103)
                     /locus_tag="Ngs_11232"
                     /product="16S ribosomal RNA"
     repeat_region   43027..43136
                     /mobile_element="transposon:Tn22"
     gene            45111..45190
                     /locus_tag="Ngs_10112"
                     /pseudo
     tRNA            45111..45190
                     /locus_tag="Ngs_10112"
                     /product="tRNA-OTHER"
                     /pseudo
     repeat_region   56408..56558
                     /mobile_element="transposon:Tn22"
     misc_feature    60101..60567
                     /note="similar to ABC transporters"
     gene            63574..65173
                     /locus_tag="Ngs_17131"
     CDS             63574..65173
                     /locus_tag="Ngs_17131"
                     /inference="similar to DNA sequence:INSD:AY123455.2"
                     /codon_start=1
                     /product="hypothetical protein"
                     /translation="MQSTQSKSDRSSMHRGPLLLCAVMVVLVTLPEQINARMAFEKLT
                     DFDFPGNTYYSVKNLSLYECQGWCREEADCQAAAFSFVVNPLSPSQETHCQLQNDSSA
                     ANPSAAPQRSANMYYMIKLQLRSENVCHRPWSFERVPNKVIRGLDNALIYTSTKEACL
                     SACLNERRFVCRSVEYDYNNMKCVLSDSDRRSSGQFVQLVDAQGTDYFENLCLKPAQA
                     CKNNRSFGNSQKMGVSEEKVAQYVGLHYYTDKELQVTSESACRLACEIESEFLCRSFL
                     ALAVTCALMILLYISTLFCYYMKKWMQPHKIVA"
     gene            complement(101261..102492)
                     /gene="ftsW"
                     /locus_tag="Ngs_3038"
     CDS             complement(101261..102492)
                     /gene="ftsW"
                     /locus_tag="Ngs_3038"
                     /codon_start=1
                     /product="flippase"
                     /translation="MRMRGRRLLPIILSLLLIVLLSLCYFSNHLRDSSQSRKNGFLLH
                     LPLETKRNPSNPNTPLSNLLNLTDFHYLLASNVCRKAKRELLAVLIVTSYAGHDALRS
                     AHRQAIPQSKLEEMGLRRVFLLAALPSREHFISQDQLASEQNRFGDLLQGNFIEDYRN
                     LSYKHVMGLKWVSEECKKQAKFIIKLDDDIIYDVFHLRRYLETLEVREPGLATSSTLL
                     SGYVLDAKPPIRLRANKWYVSKKEYPQALYPAYLSGWLYVTNVPTAERIVAEAERMSF
                     FWIDDTWLTGVVRTRLGIPLERHNDWFSANAEFIDCCVRDLKKHNYECEYSVGPNGGD
                     DRLLVEFLHNVEKCYFDECVKRPVGKSLKETCLAAAKSRPPKHGFPEIKALRLR"
     gene            112616..>113646
                     /locus_tag="Ngs_2945"
     CDS             112616..>113646
                     /locus_tag="Ngs_2945"
                     /EC_number="3.5.4.9"
                     /EC_number="1.5.1.5"
                     /experiment="Western blot"
                     /codon_start=1
                     /product="bifunctional methylenetetrahydrofolate dehydrogenase/methenyltetrahydrofolate cyclohydrolase"
                     /translation="MESITFGVLTISDTCWQEPEKDTSGPILRQLIGETFANTQVIGN
                     IVPDEKDIIQQELRKWIDREELRVILTTGGTGFAPRDVTPEATRQLLEKECPQLSMYI
                     TLESIKQTQYAALSRGLCGIAGNTLILNLPGSEKAVKECFQTISALLPHAVHLIGDDV
                     SLVRKTHAEVQGSAQKSHICPHKTGTGTDSDRNSPYPMLPVQEVLSIIFNTVQKTANL
                     NKILLEMNAPVNIPPFRASIKDGYAMKSTGFSGTKRVLGCIAAGDSPNSLPLAEDECY
                     KINTGAPLPLEADCVVQVEDTKLLQLDKNGQESLVDILVEPQAGLDVRPVGYDLSTND
                     RIFPALDPSPVVVKSLLASVGNRLILSKPKVAIVSTGSELCSPRNQLTPGKIFDSNTT
                     MLTELLVYFGFNCMHTCVLSDSFQRTKESLLELFEVVDFVICSGGVSMGDKDFVKSVL
                     EDLQFRIHCGRVNIKPGKPMTFASRKDKYFFGLPGNPVSAFVTFHLFALPAIRFAAGW
                     DRCKCSLSVLNVKLLNDFSLDSRPEFVRASVISKSGELYASVNGNQISSRLQSIVGAD
                     VLINLPARTSDRPLAKAGEIFPASVLRFDFISKYE"
ORIGIN
        1 tagagcaaaa aatagacatt ttaatggcgc taatcataca aggaaggaat aataacactg
       61 acatggatac atccacttaa tctacatttg cttattccta tcttgactat atctatatcc
       [etc.]
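
The ORIGIN block above shows the same bases as Figure 1, regrouped into 60 bases per line in blocks of 10, each line prefixed with the position of its first base. A minimal Python sketch of that regrouping follows. The function name `format_origin` is hypothetical, and the 9-character width of the position column is inferred from the two lines shown here, not taken from a format specification.

```python
def format_origin(sequence, line_width=60, block_width=10):
    """Lay out a nucleotide sequence like the ORIGIN block of a flatfile."""
    seq = sequence.lower()
    lines = ["ORIGIN"]
    for offset in range(0, len(seq), line_width):
        chunk = seq[offset:offset + line_width]
        # Break the line into space-separated blocks of block_width bases.
        blocks = [chunk[i:i + block_width] for i in range(0, len(chunk), block_width)]
        # The leading number is the 1-based position of the line's first base.
        lines.append(f"{offset + 1:>9} " + " ".join(blocks))
    return "\n".join(lines)


fasta_lines = [
    "tagagcaaaaaatagacattttaatggcgctaatcatacaaggaaggaataataacactg",
    "acatggatacatccacttaatctacatttgcttattcctatcttgactatatctatatcc",
]
origin = format_origin("".join(fasta_lines))
print(origin)
```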
Last updated: 2019-01-22T22:40:35Z