Tree-based disease classification using protein data

Proteomics. 2003 Sep;3(9):1673-7. doi: 10.1002/pmic.200300520.

Abstract

Reliable and precise classification of diseases is essential for successful diagnosis and treatment. By applying mass spectrometry to clinical specimens, scientists may identify protein variations among diseases and use this information to improve diagnosis. In this paper, we propose a novel procedure to classify disease status based on protein data from mass spectrometry. Our tree-based algorithm consists of three steps: projection, selection, and classification tree construction. The projection step projects the observations from all specimens onto the same basis so that the projected data have fixed coordinates. Thus, for each specimen, we obtain a large vector of 'coefficients' on the same basis. The selection step reduces the data by condensing the large vector from the projection step into a much lower-dimensional informative vector. Finally, using these reduced vectors, we apply recursive partitioning to construct an informative classification tree. This method has been successfully applied to protein data provided by the Department of Radiology and Chemistry at Duke University.
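
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the basis, the selection criterion, or the splitting rule, so the cosine basis, the class-mean separation score, and the Gini impurity used here are all assumptions chosen for illustration. The function names (`resample`, `project`, `select`, `grow`, `predict`, `demo`) and the synthetic spectra are hypothetical as well.

```python
import math
import random


def resample(mz, intensity, grid):
    """Linearly interpolate one spectrum (sorted m/z) onto a shared grid."""
    out, j = [], 0
    for g in grid:
        while j < len(mz) - 2 and mz[j + 1] < g:
            j += 1
        t = (g - mz[j]) / (mz[j + 1] - mz[j])
        t = min(max(t, 0.0), 1.0)  # hold the end values outside the measured range
        out.append(intensity[j] + t * (intensity[j + 1] - intensity[j]))
    return out


def project(signal, n_basis):
    """Step 1: coefficients of the signal on a fixed cosine basis (assumed basis)."""
    n = len(signal)
    return [sum(s * math.cos(math.pi * k * (i + 0.5) / n) for i, s in enumerate(signal))
            for k in range(n_basis)]


def select(coefs, labels, keep):
    """Step 2: keep the coefficients whose class means differ most (assumed rule)."""
    def score(j):
        groups = [[c[j] for c, y in zip(coefs, labels) if y == cls] for cls in (0, 1)]
        means = [sum(g) / len(g) for g in groups]
        within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
        return abs(means[0] - means[1]) / math.sqrt(within / len(coefs) + 1e-12)
    return sorted(range(len(coefs[0])), key=score, reverse=True)[:keep]


def gini(labels):
    p = sum(labels) / len(labels) if labels else 0.0
    return 2 * p * (1 - p)


def grow(rows, labels, depth=2):
    """Step 3: recursive partitioning on Gini impurity, binary labels 0/1."""
    majority = int(2 * sum(labels) > len(labels))  # ties go to class 0
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= cut]
            right = [y for r, y in zip(rows, labels) if r[j] > cut]
            impurity = len(left) * gini(left) + len(right) * gini(right)
            if best is None or impurity < best[0]:
                best = (impurity, j, cut)
    if best is None:  # no feature varies, so no split is possible
        return majority
    _, j, cut = best
    go_left = [r[j] <= cut for r in rows]
    left = grow([r for r, g in zip(rows, go_left) if g],
                [y for y, g in zip(labels, go_left) if g], depth - 1)
    right = grow([r for r, g in zip(rows, go_left) if not g],
                 [y for y, g in zip(labels, go_left) if not g], depth - 1)
    return (j, cut, left, right)


def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are (feature, cut, left, right)."""
    while isinstance(tree, tuple):
        j, cut, left, right = tree
        tree = left if row[j] <= cut else right
    return tree


def synthetic_spectrum(label, rng):
    """Toy spectrum on its own irregular m/z axis; class 1 has a taller peak at m/z 50."""
    mz = sorted(rng.uniform(0, 100) for _ in range(rng.randint(60, 80)))
    height = 3.0 if label else 0.5
    return mz, [1.0 + height * math.exp(-((x - 50) / 5) ** 2) + rng.gauss(0, 0.05)
                for x in mz]


def demo(seed=0):
    rng = random.Random(seed)
    grid = list(range(5, 96))  # shared coordinates for every specimen

    def features(label):
        return project(resample(*synthetic_spectrum(label, rng), grid), n_basis=16)

    train_y = [i % 2 for i in range(20)]
    test_y = [i % 2 for i in range(10)]
    train_x = [features(y) for y in train_y]
    test_x = [features(y) for y in test_y]
    kept = select(train_x, train_y, keep=3)  # 16 coefficients reduced to 3

    def reduce(row):
        return [row[j] for j in kept]

    tree = grow([reduce(r) for r in train_x], train_y)

    def accuracy(xs, ys):
        return sum(predict(tree, reduce(x)) == y for x, y in zip(xs, ys)) / len(ys)

    return tree, accuracy(train_x, train_y), accuracy(test_x, test_y)
```

Calling `demo()` returns the fitted tree together with the training and held-out accuracy on the toy data. The point of the sketch is the data flow: spectra recorded on different m/z axes become comparable only after projection onto shared coordinates, and the tree is grown on the reduced vectors rather than on the raw spectra.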

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computational Biology / methods
  • Databases, Protein
  • Diagnosis, Computer-Assisted
  • Disease / classification*
  • Humans
  • Mass Spectrometry / methods*
  • Proteins / classification*
  • Proteins / metabolism
  • Retrospective Studies

Substances

  • Proteins