The current multistep carcinogenesis models of colon cancer do not fully capture the genetic heterogeneity of the disease, which is further complicated by the presence of both passenger and driver genetic alterations. The aim of the present study was to identify, in the context of this significant heterogeneity, additional genes functionally related to colon cancer development. Methods: High-throughput copy number and gene expression data from 36 microsatellite-stable sporadic colon cancers, resected from patients of a single institution and characterized for mutations in APC, KRAS and TP53 and for loss of 18q, were analyzed. Genes whose expression correlated with the underlying copy number pattern were selected, and their association with the above-listed mutations and with overall survival was evaluated. Results: Gain of 20q was strongly associated with TP53 mutation, and overall survival with alterations on 7p, 8p, 13q, 18q and 20q. An association with 18q loss and gain of 8q24 was also observed. New candidate genes with a potential role in colon cancer are PLCG1 on 20q, DBC1 on 8q21 and NDRG1 on 8q24. In addition, an unexpected pattern of loss and mutability was found in the region upstream of the KRAS gene. Conclusions: By integrating copy number alterations with gene expression and mutations in colon cancer-associated genes, we have developed a strategy that identifies previously known molecular features as well as additional players in the molecular landscape of colon cancer.
Overall design: A total of 48 sporadic colon cancer samples were analyzed by Affymetrix Mapping 250K Nsp SNP Arrays, and 36 of them were also analyzed by Affymetrix Human Exon 1.0 ST Array [transcript (gene) version]. Short summary: Expression data were correlated with copy number data to identify genes whose expression was induced by copy number changes. Gene dosage candidates were then evaluated for their association with the mutation status of APC, KRAS and TP53, loss of heterozygosity of 18q, and overall survival. Long summary: Raw intensity .CEL files of the SNP arrays were processed with the Chromosome Copy Number Analysis Tool (v.1.5.6; Affymetrix, Santa Clara, CA) to identify chromosomal gains and losses. Forty-eight normal samples from the HapMap project, supplied by Affymetrix, were used as an unpaired reference set [http://www.affymetrix.com/support/technical/sample_data/500k_data.affx]. All genomic coordinates of the SNP array probes were mapped to the Human Mar. 2006 assembly of the UCSC Genome Browser. Raw intensity .CEL files of the exon arrays were processed with the Robust Multi-array Average (RMA) implementation of Affymetrix Power Tools (v.1.8.6) using the core set of features (22011 probesets). All plots and analysis steps following this processing were done using the R programming language version 2.9.0 and Bioconductor packages. Statistically significant segments of aberration in the copy number data were identified using the default parameters of the KC-SMART algorithm and 1000 permutations (KCsmart v.2.2.0). Genes within each aberrant segment were identified and annotated using biomaRt v.2.0.0. The list of genes was further annotated using the cancerGenes resource, the Cancer Gene Census and the list of breast and colon CAN genes reported in Wood et al., The genomic landscapes of human breast and colorectal cancers, Science 2007;318:1108-1113.
Gene dosage effects across the 36 samples for which both gene expression and copy number data were available were assessed by evaluating the Spearman correlation between the raw continuous copy number (log-ratios) and expression (log-intensity) for the genomic region surrounding each gene falling within the segments identified by KC-SMART. Before this step, to reduce the total number of correlation tests, we filtered the gene expression dataset by removing all entries having no Gene Symbol annotation (leaving 17291 probesets) and then removing the half of the remaining dataset exhibiting the lowest variance, resulting in 8645 probesets. The Category package version 2.10.0 was used to apply a linear model-based test to detect enrichment of systematically high correlation in specific chromosomal bands, taking into consideration the hierarchical structure of the bands. Gene expression class comparisons and survival analysis of the selected gene dosage candidates were performed using two-sample t-tests and Cox proportional hazards regression, respectively. P-value adjustment of the correlation tests and the gene expression associations was performed using the step-up false discovery rate (FDR) controlling procedure of Benjamini and Hochberg. Raw data, supplementary methods and figures are supplied as reproducible documentation (Sweave file), available from the Web Link.
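The filtering, correlation and multiple-testing steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than the R/Bioconductor code used for the study, and it is not the authors' implementation: the function names (`ranks`, `spearman`, `top_variance_half`, `bh_adjust`) and the toy values are assumptions made for illustration only. The p-values passed to `bh_adjust` are assumed to come from the correlation tests, which the sketch does not compute.

```python
from statistics import mean, pvariance


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def top_variance_half(expr):
    """Keep the half of the probesets with the highest variance across samples.

    `expr` maps a probeset identifier to its list of log-intensities.
    Integer division reproduces the count in the text: 17291 // 2 == 8645.
    """
    keep = len(expr) // 2
    ordered = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return {g: expr[g] for g in ordered[:keep]}


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjustment; returns values in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example with hypothetical values for four samples.
copy_number = [-0.4, 0.0, 0.3, 0.8]   # log-ratios for one genomic region
expression = [6.1, 6.5, 7.0, 7.9]     # log-intensities for the gene in it
rho = spearman(copy_number, expression)
```

In R, the corresponding calls would typically be `cor(..., method = "spearman")` and `p.adjust(..., method = "BH")`; the sketch only spells out what those steps compute.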
Clinical information for each sample contains: gender (F=female, M=male), age (years), dimension (cm), stage (Dukes' I-IV), status (0=Alive, 1=Dead), survival (months), apc (0=wt, 1=mut), kras (0=wt, 1=mut), tp53 (0=wt, 1=mut), chr18qloh (0=no LOH, 1=LOH).
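As an illustration of how the coded clinical fields listed above might be decoded before analysis, the following minimal Python sketch maps each code to its label. The field names and codes are taken from the list above; the `decode` helper, the assumption that values arrive as strings, and the example record are hypothetical and do not describe any actual sample.

```python
# Code-to-label mappings for the categorical fields listed above.
CODES = {
    "gender": {"F": "female", "M": "male"},
    "status": {"0": "alive", "1": "dead"},
    "apc": {"0": "wt", "1": "mut"},
    "kras": {"0": "wt", "1": "mut"},
    "tp53": {"0": "wt", "1": "mut"},
    "chr18qloh": {"0": "no LOH", "1": "LOH"},
}

# Numeric fields and their units: age (years), dimension (cm), survival (months).
NUMERIC = {"age", "dimension", "survival"}


def decode(record):
    """Translate one sample's raw string values into labels and numbers."""
    out = {}
    for key, raw in record.items():
        if key in CODES:
            out[key] = CODES[key][raw]
        elif key in NUMERIC:
            out[key] = float(raw)
        else:
            out[key] = raw  # e.g. stage, kept as given
    return out


# Hypothetical record, for illustration only.
example = decode({"gender": "F", "age": "67", "stage": "III", "tp53": "1"})
```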