Relationship of SARS-CoV to other pathogenic RNA viruses explored by tetranucleotide usage profiling

BMC Bioinformatics. 2003 Sep 20:4:43. doi: 10.1186/1471-2105-4-43. Epub 2003 Sep 20.

Abstract

Background: The exact origin of the agent that causes Severe Acute Respiratory Syndrome (SARS) is still an open question. The genomic sequence relationship of SARS-CoV with 30 different single-stranded RNA (ssRNA) viruses of various families was studied using two non-standard approaches. Both approaches began with the vectorial profiling of the tetra-nucleotide usage pattern V for each virus. In approach one, a distance measure between the vectors V, based on the correlation coefficient, was devised to construct a relationship tree by the neighbor-joining algorithm. In approach two, a multivariate factor analysis was performed to derive the embedded tetra-nucleotide usage patterns. These patterns were subsequently used to classify the selected viruses.
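The profiling step in approach one can be illustrated with a minimal sketch. It assumes that V is the vector of relative frequencies of all 256 overlapping tetra-nucleotides and that the distance is 1 minus the Pearson correlation coefficient; the abstract does not give the exact normalization or distance formula, so both are assumptions, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 tetra-nucleotide words over the RNA alphabet, in a fixed order.
WORDS = ["".join(p) for p in product("ACGU", repeat=4)]


def tetra_profile(seq):
    """Return the relative frequency of each overlapping 4-mer (assumed form of V)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    counts = dict.fromkeys(WORDS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetra-nucleotide")
    return [counts[w] / total for w in WORDS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


def correlation_distance(x, y):
    """Assumed distance: 0 for perfectly correlated profiles, up to 2 for anti-correlated ones."""
    return 1.0 - pearson(x, y)
```

A pairwise matrix of such distances over all genomes is the kind of input a neighbor-joining implementation expects.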

Results: Both approaches yielded relationship outcomes that are consistent with the known virus classification. They also indicated that the genomes of RNA viruses from the same family conform to a specific pattern of word usage. Based on the correlation of the overall tetra-nucleotide usage patterns, the Transmissible Gastroenteritis Virus (TGV) and the Feline CoronaVirus (FCoV) are closest to SARS-CoV. Surprisingly, the RNA viruses that do not go through a DNA stage also displayed a remarkable discrimination against the CpG and UpA di-nucleotides (z = -77.31 and -52.48, respectively) and selection for UpG and CpA (z = 65.79 and 49.99, respectively). Potential factors influencing these biases are discussed.
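Di-nucleotide bias of this kind can be explored with a simple observed-to-expected ratio. The sketch below is not the z statistic reported above, whose formula the abstract does not give; it is the common odds ratio f(XY) / (f(X) f(Y)), where values below 1 suggest under-representation and values above 1 over-representation. The function name is hypothetical.

```python
from collections import Counter

BASES = "ACGU"


def dinucleotide_odds_ratios(seq):
    """Return observed/expected frequency ratios for each di-nucleotide.

    Expected frequency assumes independent bases: f(X) * f(Y).
    Di-nucleotides whose expected frequency is zero are omitted.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    mono = Counter(c for c in seq if c in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES  # skip ambiguous bases
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no valid di-nucleotide")
    ratios = {}
    for x in BASES:
        for y in BASES:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            if expected > 0:
                ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios
```

Comparing `ratios["CG"]` and `ratios["UA"]` with `ratios["UG"]` and `ratios["CA"]` for a genome gives a rough, unnormalized view of the bias described in the results.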

Conclusion: The study of genomic word usage is a powerful method to classify RNA viruses. The congruence of the relationship outcomes with the known classification indicates that there exist phylogenetic signals in the tetra-nucleotide usage patterns that are most prominent in the replicase open reading frames.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cattle
  • Coronavirus 229E, Human / genetics
  • Coronavirus, Bovine / genetics
  • Dinucleotide Repeats / genetics
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data
  • Gene Expression Regulation, Viral / genetics*
  • Genome, Viral
  • Hemorrhagic Disease Virus, Rabbit / genetics
  • Humans
  • Microsatellite Repeats / genetics*
  • Multivariate Analysis
  • Open Reading Frames / genetics
  • Phylogeny
  • RNA Viruses / genetics*
  • RNA Viruses / pathogenicity*
  • RNA, Viral / biosynthesis
  • RNA, Viral / genetics
  • Rabbits
  • Severe acute respiratory syndrome-related coronavirus / genetics*
  • Severe acute respiratory syndrome-related coronavirus / pathogenicity*

Substances

  • RNA, Viral