Assessing local influence in principal component analysis with application to haematology study data

Stat Med. 2007 Jun 15;26(13):2730-44. doi: 10.1002/sim.2747.

Abstract

High-dimensional data are often encountered in medical and health studies. Principal component analysis (PCA) is a commonly used technique for reducing such data to a few components that retain most of the information in the original data. However, PCA is known to be very sensitive to abnormal observations, so it is essential to assess this sensitivity. In this paper, assessments of local influence based on the generalized influence function are developed under the case-weights and additive perturbation schemes, along with a discussion of the perturbation scheme and the generalized influence function approach. When different variables of the data are perturbed, the directions of the largest joint local influence for the eigenvalues are all the same. Moreover, these directions are completely determined by the score values of the observations, for which an approximate cut-off point is given. The proposed methods are applied to a set of haematology study data for illustration. The results add new insights into finding influential observations in the studied data set.
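
The link between component scores and eigenvalue sensitivity can be illustrated with a minimal sketch. It does not reproduce the paper's generalized influence function or its cut-off point. Instead it assumes the classical empirical influence function of a covariance eigenvalue under case-weight perturbation, which for observation i and component k is the squared score minus the eigenvalue. The function names and the use of the divisor n for the covariance matrix are assumptions made for this illustration.

```python
import numpy as np

def pca_scores(X):
    """Return eigenvalues (descending), eigenvectors, and component scores."""
    Xc = X - X.mean(axis=0)                # centre each variable
    n = X.shape[0]
    S = Xc.T @ Xc / n                      # covariance with divisor n (assumption)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                  # score of observation i on component k
    return eigvals, eigvecs, scores

def eigenvalue_influence(X):
    """Classical empirical influence of each observation on each eigenvalue.

    Entry (i, k) is scores[i, k]**2 - eigvals[k]; it depends on the data
    only through the score values.
    """
    eigvals, _, scores = pca_scores(X)
    return scores**2 - eigvals

if __name__ == "__main__":
    # Synthetic data (not the haematology data): one observation is moved
    # far out along the first variable to act as an abnormal case.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    X[0] = [20.0, 0.0, 0.0]
    infl = eigenvalue_influence(X)
    print("most influential case for the first eigenvalue:",
          int(np.argmax(np.abs(infl[:, 0]))))
```

Because the covariance here uses the divisor n, the influence values for each eigenvalue average to zero over the observations, so a case stands out only when its squared score is large relative to the eigenvalue.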

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Health Status
  • Health Surveys
  • Hematology / statistics & numerical data*
  • Humans
  • Inhalation Exposure / statistics & numerical data
  • Paint / analysis
  • Principal Component Analysis*