Figure 5.1. A generalized flow chart of genome annotation.

FB: feedback from gene identification for correction of sequencing errors, primarily frameshifts. General database search: searching sequence databases (typically, NCBI NR) for sequence similarity, usually using BLAST. Specialized database search: searching domain databases, such as Pfam, SMART, and CDD, for conserved domains; genome-oriented databases, such as COGs, for identification of orthologous relationships and refined functional prediction; metabolic databases, such as KEGG, for metabolic pathway reconstruction; and, possibly, other databases. Statistical gene prediction: use of methods such as GeneMark or Glimmer to predict protein-coding genes. Prediction of structural features: prediction of signal peptides, transmembrane segments, coiled-coil domains, and other features in putative proteins.
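As a rough illustration of what the gene prediction step starts from, the sketch below scans all six reading frames of a DNA string for open reading frames (ATG to the next in-frame stop codon). This is a deliberately naive stand-in and not the statistical models used by GeneMark or Glimmer; the function names, the `min_codons` parameter, and the coordinate convention are assumptions made for this example only.

```python
# Naive six-frame ORF scan. Illustrative only: real gene finders such as
# GeneMark or Glimmer use statistical models of coding sequence, not this rule.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end) tuples for ATG...stop ORFs.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. `min_codons` (hypothetical parameter) counts
    the start and stop codons.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [('+', 2, 2, 17)]
```

In a real pipeline, candidate genes from such a step would then be passed to the database searches described in the legend; an ORF interrupted by an unexpected frame change is the kind of signal the FB step would use to flag a possible sequencing error.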

From: Chapter 5, Genome Annotation and Analysis

Sequence - Evolution - Function: Computational Approaches in Comparative Genomics.
Koonin EV, Galperin MY.
Boston: Kluwer Academic; 2003.
Copyright © 2003, Kluwer Academic.

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.