About this database
The rapid growth in genomic sequencing has brought a corresponding flood of data into the protein databases. Many strain-specific genomes
are now being sequenced (for example, Streptococcal genomes). The result
can be an overwhelming amount of data to look through when running BLAST similarity searches. To
reduce the processing burden and to present a broader taxonomic view, the concise protein
database was constructed.
All proteins from complete genomes are compared with each other by BLAST (all against all). Protein clusters
are constructed from the results such that mutual top BLAST hits are combined into a single cluster (each protein
in a given cluster must have all other proteins in the cluster as top BLAST hits).
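The clustering rule above can be illustrated with a minimal Python sketch. It is not the pipeline actually used to build the database: the function name, the input format (a mapping from each protein to its set of top BLAST hits), and the greedy seed-and-extend strategy are all assumptions made for illustration. The only part taken from the description is the membership rule, that every protein in a cluster must have all other members as top hits.

```python
def mutual_top_hit_clusters(top_hits):
    """Greedily group proteins so that every pair in a cluster is a mutual top hit.

    top_hits: dict mapping a protein ID to the set of its top BLAST hits
    (hypothetical input format, assumed for this sketch).
    """
    assigned = set()
    clusters = []
    for seed in sorted(top_hits):
        if seed in assigned:
            continue
        cluster = [seed]
        for cand in sorted(top_hits[seed]):
            if cand == seed or cand in assigned:
                continue
            # A candidate joins only if it and every current member
            # list each other among their top hits.
            if all(
                cand in top_hits.get(member, set())
                and member in top_hits.get(cand, set())
                for member in cluster
            ):
                cluster.append(cand)
        # A cluster needs at least two proteins.
        if len(cluster) >= 2:
            clusters.append(cluster)
            assigned.update(cluster)
    return clusters


# Toy data with hypothetical protein IDs.
example_hits = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},  # A does not list D, so D stays unclustered
    "E": {"F"},
    "F": {"E"},
}
example_clusters = mutual_top_hit_clusters(example_hits)
```

With this toy input, A, B, and C form one cluster and E and F form another, while D remains unclustered because its hit to A is not reciprocated.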
Clusters may span a large taxonomic branch (kingdom) or may reside at a specific node (family, genus,
species, etc.). A cluster may contain many proteins or as few as two. From this entire set of clusters, only the genus-specific clusters
are used for this database. From the proteins in
each genus-level cluster, one is chosen at random as the representative in the Concise Microbial Protein BLAST database; only this representative is searched by BLAST queries. The other
proteins in the cluster are automatically linked to this representative and also appear in the search results, but without a BLAST score or E-value, because
they are not themselves compared with the query. All proteins that do not belong
to a genus-level cluster are also added to the database for completeness.
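As a rough sketch of the selection step described above, the following Python function picks one random representative per genus-level cluster, links the remaining members to it, and adds every unclustered protein as its own entry. The function name, the data structures, and the `seed` parameter are hypothetical and exist only to make the example reproducible.

```python
import random


def build_concise_set(genus_clusters, all_proteins, seed=None):
    """Return a dict mapping each searchable protein to its linked proteins.

    genus_clusters: iterable of collections of protein IDs (assumed format).
    all_proteins: iterable of every protein ID from the complete genomes.
    """
    rng = random.Random(seed)
    concise = {}
    clustered = set()
    for cluster in genus_clusters:
        members = sorted(cluster)
        # One randomly selected member represents the whole cluster.
        representative = rng.choice(members)
        concise[representative] = [m for m in members if m != representative]
        clustered.update(members)
    # Proteins outside any genus-level cluster are kept for completeness.
    for protein in all_proteins:
        if protein not in clustered:
            concise[protein] = []
    return concise


# Toy data with hypothetical protein IDs.
toy_clusters = [{"P1", "P2", "P3"}, {"P4", "P5"}]
toy_proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
toy_concise = build_concise_set(toy_clusters, toy_proteins, seed=0)
```

Here six proteins reduce to three searchable entries: one representative for each of the two clusters, plus the unclustered protein P6.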
The result is faster processing and reduced load on the database. The broader taxonomic view also helps eliminate some
of the redundancy that arises when many proteins from closely related organisms appear in BLAST results.
Query Page
Queries can be either protein or nucleotide sequences, which are searched with the blastp and blastx programs, respectively. Accessions, GIs, or sequences in FASTA
format can be entered in the query box.
Default parameters are set below the query box. The expect threshold is set low, which
reduces the number of BLAST hits returned. Information on each parameter is available by clicking on its name.
Results Page
The results page differs from the one typically returned for BLAST results, although a link is provided
to view the results in standard format.
The query is shown along with its length and the number of hits for total proteins and for the proteins represented by the genus-level clusters.
Results are returned in a collapsed table format. Genus-level clusters are marked with a plus (+) sign
at each level and can be expanded. The table is sortable by organism name and by BLAST score. Because only one protein from each genus-level cluster is searched when a query is submitted,
the other proteins in the cluster have neither a BLAST score nor an E-value.
The table shows the organism name, protein name, accession, length, locus_tag, BLink, BL2seq, score, and E-value.
Links to the taxonomy (organism name), protein (protein name, accession), and gene (locus_tag) Entrez databases are available
for each protein. The BLink link shows precomputed BLAST results for that specific protein, while the BL2seq link
runs a BLAST comparison of the query against that specific protein in the table.
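The collapsed table can be pictured with a small Python sketch. It models only a subset of the columns (organism, accession, score, E-value), and the `Row` class, `sort_rows`, and `render` are hypothetical names invented for this illustration, not part of the actual results page. It assumes that every top-level row (a cluster representative or an unclustered protein) carries a score, while linked cluster members do not.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Row:
    organism: str
    accession: str
    score: Optional[float] = None   # None for linked cluster members
    evalue: Optional[float] = None  # None for linked cluster members
    members: List["Row"] = field(default_factory=list)


def sort_rows(rows, by="score"):
    """Sort top-level rows by BLAST score (highest first) or organism name."""
    if by == "score":
        return sorted(rows, key=lambda r: r.score, reverse=True)
    if by == "organism":
        return sorted(rows, key=lambda r: r.organism)
    raise ValueError("by must be 'score' or 'organism'")


def render(rows, expanded=frozenset()):
    """Return text lines; '+' marks a row that hides linked cluster members."""
    lines = []
    for row in rows:
        marker = "+" if row.members else " "
        lines.append(f"{marker} {row.organism}\t{row.accession}\t{row.score}\t{row.evalue}")
        if row.accession in expanded:
            for member in row.members:
                # Linked members were not searched, so no score or E-value.
                lines.append(f"    {member.organism}\t{member.accession}\t-\t-")
    return lines


# Toy rows with placeholder organism names and accessions.
toy_rows = [
    Row("Organism A", "ACC1", score=50.0, evalue=1e-5,
        members=[Row("Organism A2", "ACC2")]),
    Row("Organism B", "ACC3", score=80.0, evalue=1e-10),
]
by_score = sort_rows(toy_rows, by="score")
collapsed = render(by_score)
expanded_view = render(by_score, expanded={"ACC1"})
```

Expanding the row for ACC1 adds one indented line for its linked member, shown with dashes in place of the score and E-value.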
BLAST Help
Help and information on BLAST are available from the main BLAST page.
Microbial genomes can be searched here.