Disease risk estimation by combining case-control data with aggregated information on the population at risk

Biometrics. 2015 Mar;71(1):114-121. doi: 10.1111/biom.12256. Epub 2014 Oct 28.

Abstract

We propose a novel statistical framework that supplements case-control data with summary statistics on the population at risk for a subset of risk factors. Our approach first forms two unbiased estimating equations, one based on the case-control data and the other on both the case data and the summary statistics, and then optimally combines them into a single estimating equation that is used for estimation. The proposed method is computationally simple and more efficient than standard approaches based on case-control data alone. We also establish asymptotic properties of the resulting estimator and investigate its finite-sample performance through simulation. As a substantive application, we use the proposed method to investigate risk factors for endometrial cancer, drawing on data from a recently completed population-based case-control study and on summary statistics from the Behavioral Risk Factor Surveillance System, the Population Estimates Program of the US Census Bureau, and the Connecticut Department of Transportation.
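The idea of optimally combining two unbiased estimating equations can be illustrated with a minimal sketch. This is not the authors' estimator: the toy Poisson model, the two moment-based estimating functions, and the name `combined_estimate` are assumptions chosen only for illustration. The sketch uses the standard result that, for stacked unbiased estimating functions g with derivative matrix D and covariance V, the combination D^T V^{-1} g is optimal among linear combinations.

```python
import numpy as np

def combined_estimate(x, theta0, n_iter=50, tol=1e-10):
    """Toy example (hypothetical, not the paper's model).

    Estimates a Poisson mean theta from two unbiased estimating functions:
        g1 = x - theta                    (first moment)
        g2 = x**2 - theta - theta**2      (second moment)
    and solves the combined equation D^T V^{-1} mean(g) = 0 by Gauss-Newton.
    """
    theta = float(theta0)
    for _ in range(n_iter):
        # n x 2 matrix of the two estimating functions evaluated at theta
        g = np.column_stack([x - theta, x**2 - theta - theta**2])
        gbar = g.mean(axis=0)
        # Empirical covariance of the stacked estimating functions
        V = np.cov(g, rowvar=False)
        # Derivative of each estimating function with respect to theta
        D = np.array([-1.0, -1.0 - 2.0 * theta])
        # Optimal weights V^{-1} D
        w = np.linalg.solve(V, D)
        # Gauss-Newton step for the combined equation w^T gbar = 0
        step = (w @ gbar) / (w @ D)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Simulated data under the assumed toy model
rng = np.random.default_rng(0)
x = rng.poisson(3.0, size=5000).astype(float)
print(combined_estimate(x, theta0=x.mean()))
```

In this sketch the weights are re-estimated from the data at each iteration. In the setting described in the abstract, the two equations would instead come from the case-control data and from the case data together with the population summary statistics.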

Keywords: Aggregated information; Estimating equation; Spatial epidemiology; Spatial point process.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Case-Control Studies*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Endometrial Neoplasms / epidemiology*
  • Epidemiologic Methods
  • Female
  • Humans
  • Information Storage and Retrieval / methods
  • Models, Statistical*
  • Prevalence
  • Risk Assessment / methods*