Assessing optimal neural network architecture for identifying disease-associated multi-marker genotypes using a permutation test, and application to calpain 10 polymorphisms associated with diabetes

B V North; D Curtis; P G Cassell; G A Hitman; P C Sham

doi:10.1046/j.1469-1809.2003.00030.x

Ann Hum Genet. 2003 Jul;67(Pt 4):348-56. doi: 10.1046/j.1469-1809.2003.00030.x.

Authors

B V North¹, D Curtis, P G Cassell, G A Hitman, P C Sham

Affiliation

¹ Academic Department of Psychiatry, Barts and The London Queen Mary's School of Medicine and Dentistry, London E1 1BB, UK. b.v.north@qmul.ac.uk

PMID: 12914569

Abstract

Biallelic markers, such as single nucleotide polymorphisms (SNPs), provide greater information for localising disease loci when treated as multilocus haplotypes, but often haplotypes are not immediately available from multilocus genotypes in case-control studies. An artificial neural network allows investigation of association between disease phenotype and tightly linked markers without requiring haplotype phase and without modelling any evolutionary history for the disease-related haplotypes. The network assesses whether marker haplotypes differ between cases and controls to the extent that classification of disease status based on multi-marker genotypes is achievable. The network is "trained" to "recognise" affection status based on supplied marker genotypes, and then for each multi-marker genotype it produces outputs which aim to approximate the associated affection status. Next, the genotypes are permuted relative to affection status to produce many random datasets and the process of training and recording of outputs is repeated. The extent to which the ability to predict affection for the real dataset exceeds that for the random datasets measures the statistical significance of the association between multi-marker genotype and affection. This permutation test performs well with simulated case-control datasets, particularly when major gene effects are present. We have explored the effects of systematically varying different network parameters in order to identify their optimal values. We have applied the permutation test to 4 SNPs of the calpain 10 (CAPN10) gene typed in a case-control sample of subjects with type 2 diabetes, impaired glucose tolerance, and controls. We show that the neural network produces more highly significant evidence for association than do single marker tests corrected for the number of markers genotyped. 
The use of a permutation test could potentially allow conditional analyses which could incorporate known risk factors alongside marker genotypes. Permuting only the marker genotypes relative to affection status and these risk factors would allow the contribution of the markers to disease risk to be independently assessed.
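
The permutation procedure described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation. The classifier is a single logistic unit with no hidden layer, standing in for the paper's neural network, whose architecture is not given in this record. The squared-error score, the 0/1/2 genotype coding, and the names train_and_score and permutation_p_value are all hypothetical. Shuffling the affection labels is used here as an equivalent of permuting genotypes relative to affection status.

```python
import math
import random


def _output(w, b, x):
    """Logistic output of the single unit for one multi-marker genotype."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))


def train_and_score(genotypes, status, epochs=40, lr=0.1):
    """Train the stand-in network and return its total squared error.

    genotypes: list of per-subject lists, one 0/1/2 code per marker (assumed coding).
    status:    list of 0/1 affection labels.
    Lower error means the outputs approximate affection status more closely.
    Training is deterministic (zero initial weights, fixed sample order).
    """
    w = [0.0] * len(genotypes[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(genotypes, status):
            grad = _output(w, b, x) - y  # cross-entropy gradient at the pre-activation
            b -= lr * grad
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return sum((_output(w, b, x) - y) ** 2 for x, y in zip(genotypes, status))


def permutation_p_value(genotypes, status, n_perm=99, seed=0):
    """Empirical p-value: how often a permuted dataset is fitted at least as well."""
    rng = random.Random(seed)
    observed = train_and_score(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        if train_and_score(genotypes, shuffled) <= observed:
            hits += 1
    # Count the real dataset among the permutations so the p-value is never 0.
    return (hits + 1) / (n_perm + 1)


# Toy usage with made-up data: 20 cases, 20 controls, 4 markers,
# where only the first marker is associated with affection status.
toy_rng = random.Random(1)
toy_status = [1] * 20 + [0] * 20
toy_genotypes = [[2 if s else 0] + [toy_rng.randint(0, 2) for _ in range(3)]
                 for s in toy_status]
toy_p = permutation_p_value(toy_genotypes, toy_status)
print("empirical p-value:", toy_p)
```

The conditional analysis suggested in the abstract would change only the shuffling step: known risk factors would stay paired with affection status, and only the marker genotypes would be permuted relative to that pair.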

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Calpain / genetics
Case-Control Studies
Computer Simulation
Diabetes Mellitus, Type 2 / genetics
Genetic Testing / methods*
Genotype
Haplotypes / genetics*
Humans
India
Neural Networks, Computer*
Polymorphism, Genetic / genetics*

Substances

Calpain
calpain 10