CoreGenes: a computational tool for identifying and cataloging "core" genes in a set of small genomes

BMC Bioinformatics. 2002 Apr 24;3:12. doi: 10.1186/1471-2105-3-12.

Abstract

Background: Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases comprising DNA sequence, gene and genome data. Lower operating costs and heightened interest following early discoveries in genomics are also contributing to the accumulation of these data sets. A major challenge is to analyze and mine the data in these databases, especially whole genomes. Computational tools that examine genomes globally are needed for this data mining.

Results: CoreGenes is a global, Java-based interactive data mining tool that identifies and catalogs a "core" set of genes from two to five small whole genomes simultaneously. CoreGenes performs hierarchical and iterative BLASTP analyses using one genome as a reference and another as a query. Each subsequent query genome is compared against the newly generated "consensus." These iterations yield a matrix of related genes from the set of genomes, e.g., viruses, mitochondria and chloroplasts. Currently the software is limited to small genomes on the order of 330 kilobases or less.
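
The iterative reference-versus-query scheme described above can be illustrated with a minimal Python sketch. It is not the CoreGenes implementation. The names core_genes and toy_score, the 3-mer overlap score used in place of a real BLASTP score, and the 0.5 threshold are all assumptions made for illustration. As a further simplification, each query protein is compared against the reference protein that anchors a consensus row, not against a rebuilt consensus sequence.

```python
from typing import Callable, Dict, List

Genome = Dict[str, str]  # protein id -> amino-acid sequence


def toy_score(a: str, b: str) -> float:
    """Hypothetical stand-in for a BLASTP score: fraction of shared 3-mers."""
    kmers_a = {a[i:i + 3] for i in range(len(a) - 2)}
    kmers_b = {b[i:i + 3] for i in range(len(b) - 2)}
    if not kmers_a or not kmers_b:
        return 0.0
    return len(kmers_a & kmers_b) / min(len(kmers_a), len(kmers_b))


def core_genes(
    reference: Genome,
    queries: List[Genome],
    score: Callable[[str, str], float] = toy_score,
    threshold: float = 0.5,  # assumed cutoff, not a value from the paper
) -> List[List[str]]:
    """Return rows of related protein ids, one column per genome."""
    # The abstract describes two to five genomes: one reference, 1-4 queries.
    if not 1 <= len(queries) <= 4:
        raise ValueError("expected between one and four query genomes")

    # Each row starts as a single reference protein; the rows together
    # play the role of the running "consensus".
    rows = [[pid] for pid in reference]
    for query in queries:
        kept = []
        for row in rows:
            ref_seq = reference[row[0]]
            # Best-scoring query protein for this consensus row, if any.
            best_id = max(
                query, key=lambda qid: score(ref_seq, query[qid]), default=None
            )
            if best_id is not None and score(ref_seq, query[best_id]) >= threshold:
                kept.append(row + [best_id])
        # Rows with no match in this genome drop out of the consensus.
        rows = kept
    return rows
```

Each surviving row corresponds to one line of the resulting gene matrix: a reference protein followed by its best match in every query genome. A real pipeline would replace toy_score with scores parsed from BLASTP output.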

Conclusion: A computational tool, CoreGenes, has been developed to analyze small whole genomes globally. Putatively essential "core" genes and their BLAST score-related data are displayed as a table with links to GenBank for further data on the genes of interest. This web resource is available at http://pumpkins.ib3.gmu.edu:8080/CoreGenes or http://www.bif.atcc.org/CoreGenes.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chloroplasts / genetics*
  • Computational Biology / methods*
  • Databases, Genetic
  • Genes / genetics
  • Genes, Essential / genetics
  • Genes, Plant
  • Genes, Viral / genetics
  • Genome*
  • Genome, Plant
  • Genome, Viral
  • Genomics / methods
  • Internet
  • Mitochondria / genetics*
  • Programming Languages
  • Software Design
  • Software Validation
  • Software*
  • Viral Structural Proteins / genetics

Substances

  • Viral Structural Proteins