Background and Aims: RNA biomarkers derived from sloughed enterocytes would provide an ideal, non-invasive method for early detection of colorectal cancer (CRC) and precancerous adenomas. To realize this goal, a highly reliable method to isolate preserved human RNA from stool samples is needed. Here we develop a protocol to identify RNA biomarkers associated with CRC and assess the use of these biomarkers for non-invasive disease screening.
Methods: Stool samples were collected from 454 patients prior to a colonoscopy. A nucleic acid extraction protocol was developed to isolate human RNA from 330 stool samples and transcript abundances were estimated by microarray analysis. This 330-patient cohort was split into a training set of 265 individuals to develop a machine learning model and a testing set of 65 individuals to determine the model’s ability to detect colorectal neoplasms.
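The 265/65 partition described above can be illustrated with a short sketch. The record does not say how individuals were assigned to each set, so the seeded random shuffle, the function name `split_cohort`, and the placeholder sample IDs are all assumptions made for illustration.

```python
import random


def split_cohort(sample_ids, n_train, seed=0):
    """Shuffle sample IDs reproducibly and split them into training and testing sets."""
    if not 0 < n_train < len(sample_ids):
        raise ValueError("n_train must be between 1 and len(sample_ids) - 1")
    shuffled = list(sample_ids)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 330 stool samples (hypothetical names).
cohort = [f"sample_{i:03d}" for i in range(1, 331)]
train_ids, test_ids = split_cohort(cohort, n_train=265)
print(len(train_ids), len(test_ids))  # prints: 265 65
```

Fixing the seed keeps the partition reproducible, so the same 65 individuals stay held out every time the model is retrained.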
Results: Analysis of the transcriptome from 265 individuals identified 200 transcript clusters as differentially expressed (p<0.03). These transcripts were used to build a Support Vector Machine (SVM)-based model to classify the 65 individuals in the testing set. This SVM algorithm attained a 95% sensitivity for precancerous adenomas and a 65% sensitivity for CRC (stage I-IV). The machine learning algorithm attained a specificity of 59% for healthy individuals and an overall accuracy of 72.3%.
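For readers unfamiliar with the reported metrics, the sketch below shows how per-class sensitivity, specificity, and overall accuracy are commonly computed from true labels and binary model calls. The function names, the "positive"/"negative" call encoding, and the eight toy labels are assumptions for illustration only; they are not data or code from this project.

```python
def sensitivity(y_true, y_pred, disease_label):
    """Fraction of individuals with `disease_label` that the model called positive."""
    calls = [pred for truth, pred in zip(y_true, y_pred) if truth == disease_label]
    if not calls:
        raise ValueError(f"no individuals labelled {disease_label!r}")
    return sum(pred == "positive" for pred in calls) / len(calls)


def specificity(y_true, y_pred, healthy_label="normal"):
    """Fraction of healthy individuals that the model called negative."""
    calls = [pred for truth, pred in zip(y_true, y_pred) if truth == healthy_label]
    if not calls:
        raise ValueError(f"no individuals labelled {healthy_label!r}")
    return sum(pred == "negative" for pred in calls) / len(calls)


def accuracy(y_true, y_pred, healthy_label="normal"):
    """Fraction of all individuals whose call matches their disease status."""
    correct = sum(
        (truth == healthy_label) == (pred == "negative")
        for truth, pred in zip(y_true, y_pred)
    )
    return correct / len(y_true)


# Toy labels (not project data): 3 normal, 2 adenoma, 3 cancer.
y_true = ["normal", "normal", "normal", "adenoma", "adenoma", "cancer", "cancer", "cancer"]
y_pred = ["negative", "positive", "negative", "positive", "positive", "positive", "negative", "positive"]

print(sensitivity(y_true, y_pred, "adenoma"))  # 2 of 2 adenomas called positive
print(sensitivity(y_true, y_pred, "cancer"))   # 2 of 3 cancers called positive
print(specificity(y_true, y_pred))             # 2 of 3 normals called negative
print(accuracy(y_true, y_pred))                # 6 of 8 calls correct
```

As a simple arithmetic check, an overall accuracy of 72.3% on a 65-person testing set is consistent with 47 correct calls (47/65 ≈ 0.723); the record itself does not state this count.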
Conclusions: We developed an RNA-based neoplasm detection model that is sensitive for CRC and precancerous adenomas. The model allows for non-invasive assessment of tumors and could potentially be used to provide clinical guidance for individuals within the screening population for colorectal cancer.
Overall design: Total RNA was isolated from 338 stool samples and the transcriptome was assessed using the Affymetrix Human Transcriptome Array 2.0. Differentially expressed genes were identified using the transcript fold change between healthy and diseased individuals. These transcriptomes were used to build a machine learning algorithm to classify individuals as diseased or not diseased.
Reference samples are 'normal,' cases are 'adenomas' or 'cancer'.
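The fold-change comparison between reference ('normal') and case samples can be sketched as follows. This assumes expression values are already on a log2 scale, as is typical for processed microarray data, so the fold change is a difference of group means; the function name and the toy values are hypothetical and do not come from the dataset.

```python
from statistics import mean


def log2_fold_change(case_values, reference_values):
    """Difference of group means for log2-scale expression values (case minus reference)."""
    return mean(case_values) - mean(reference_values)


# Toy log2 intensities for a single transcript cluster (not project data).
reference = [6.0, 6.5, 5.5]  # 'normal' samples
cases = [7.5, 8.0, 7.0]      # 'adenomas' or 'cancer' samples

lfc = log2_fold_change(cases, reference)
print(lfc)       # log2 fold change
print(2 ** lfc)  # the same change expressed as a linear-scale ratio
```

A positive value indicates higher abundance in cases than in reference samples; a negative value indicates lower abundance.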
Accession | PRJNA388922; GEO: GSE99573 |
Data Type | Transcriptome or Gene expression |
Scope | Multiisolate |
Organism | Homo sapiens [Taxonomy ID: 9606]; Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo; Homo sapiens |
Submission | Registration date: 1-Jun-2017; Griffith, McDonnell Genome Institute, Washington University School of Medicine |
Relevance | Medical |
Project Data:
Resource Name | Number of Links |
---|---|
GEO DataSets | 1 |
GEO Data Details:
Parameter | Value |
---|---|
Data volume, Spots | 23836774 |
Data volume, Processed Mbytes | 547 |
Data volume, Supplementary Mbytes | 6797 |