Series GSE40522
Status Public on Aug 31, 2012
Title ENCODE PSU Hardison RnaSeq
Project Mouse ENCODE
Organism Mus musculus
Experiment type Expression profiling by high throughput sequencing
Summary These data were generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison). If you have questions about the Genome Browser track associated with these data, contact ENCODE.
Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics infers the function of a DNA sequence from the changes in the behavior or structure of a cell or organism that result when that sequence is altered. Biochemical approaches monitor histone modification states, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary: comparisons among homologous DNA sequences find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These segments are inferred to be under negative or positive selection, respectively, and are interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function.
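To make the evolutionary reasoning concrete, the sketch below compares the substitutions observed in a segment with the number expected from the local neutral rate. It is an illustration only, assuming a simple Poisson model of neutral change; the function names and the 0.01 threshold are hypothetical, and this is not presented as the method used by the mouse ENCODE project.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        total += term
    return min(total, 1.0)

def classify_segment(observed: int, length: int, neutral_rate: float,
                     alpha: float = 0.01) -> str:
    """Hypothetical classifier: compare observed substitutions in a segment of
    `length` sites with the count expected at `neutral_rate` substitutions/site."""
    expected = neutral_rate * length
    p_slow = poisson_cdf(observed, expected)            # P(X <= observed)
    p_fast = 1.0 - poisson_cdf(observed - 1, expected)  # P(X >= observed)
    if p_slow < alpha:
        return "slower than neutral (negative selection)"
    if p_fast < alpha:
        return "faster than neutral (positive selection)"
    return "consistent with neutral change"

# A 200-site segment where the local neutral rate predicts 60 substitutions.
print(classify_segment(20, 200, 0.3))  # → slower than neutral (negative selection)
print(classify_segment(58, 200, 0.3))  # → consistent with neutral change
```

Real conservation analyses use phylogenetic models across many species; the Poisson tail test here only shows the "slower or faster than expected" logic in its simplest form.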
The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results to those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with moderate breadth. Assays identical to those used in the human ENCODE project are performed in mouse cell types that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and to examine the extent to which these overlap with DNA sequences under negative selection. This will reveal the relative contributions of DNA whose function is preserved across mammals and DNA whose function is specific to one species.
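The overlap question raised above can be sketched as a base-pair intersection between two interval sets. This is a minimal illustration, assuming both sets (conserved epigenetic features and segments under negative selection) are already in the same assembly coordinates on a single chromosome, as 0-based, half-open intervals; `merge` and `overlap_fraction` are hypothetical helpers, not part of any ENCODE pipeline.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or abut."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlap_fraction(features: List[Interval], constrained: List[Interval]) -> float:
    """Fraction of feature bases that also lie in constrained segments."""
    a, b = merge(features), merge(constrained)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first; the other may still overlap more.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    total = sum(end - start for start, end in a)
    return shared / total if total else 0.0

# Two 100-bp features, each half covered by one constrained segment.
print(overlap_fraction([(100, 200), (300, 400)], [(150, 350)]))  # → 0.5
```

Merging first keeps overlapping input intervals from being counted twice, so the result is a true fraction of feature bases.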
One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including both protein-coding and noncoding transcripts. These genome-wide compilations of transcripts, or transcriptomes, are primary determinants of how cells function, respond and differentiate, both through the proteins translated from coding transcripts and through the regulatory activity of noncoding transcripts. Noncoding RNAs regulate gene expression through diverse mechanisms, ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to fine-tuning transcription from specific genes, e.g., via RNAi.
Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types, and to understand the regulation of transcriptomes by establishing the relationships between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to assess the conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation.

For data usage terms and conditions, please refer to and
Overall design Cells were grown according to the approved ENCODE cell culture protocols (
Total RNA was extracted from 5-10 million cells using TRIzol reagent. This was followed by mRNA selection, fragmentation and cDNA synthesis, which were performed as described previously (Mortazavi et al., 2009). Double-stranded cDNA samples were processed for library construction for Illumina sequencing, using the Illumina ChIP-seq Sample Preparation Kit.
Strand-specific libraries were generated in the same way, with two modifications described previously (Parkhomchuk et al., 2009). First, dUTP was used in place of dTTP during second-strand cDNA synthesis, labeling the second strand. Second, during library preparation and before the PCR amplification step, the dUTP-labeled cDNA was treated with uracil-N-glycosylase (UNG) to remove uracil from the second strand; the DNA was then heated to cleave the second strand at the resulting abasic sites.
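One practical consequence of the dUTP method is that only the first-strand cDNA survives to be sequenced, so the alignment strand of a read reveals the strand of the original RNA. The sketch below assumes the common convention that TopHat calls `fr-firststrand` (read 1, or the only read of a single-end run, is the reverse complement of the RNA; read 2 has the RNA's orientation); this record does not state which library-type setting was used. It reads the standard SAM FLAG bits, and the function name is hypothetical.

```python
# Standard SAM FLAG bits (SAM format specification).
FLAG_PAIRED = 0x1    # template has multiple segments (paired-end)
FLAG_REVERSE = 0x10  # read aligned to the reverse strand
FLAG_READ2 = 0x80    # last segment in the template (read 2)

def transcript_strand(flag: int) -> str:
    """Infer the strand of the source RNA for a dUTP (first-strand) library.

    Assumes the fr-firststrand convention: read 1 (or a single-end read) is the
    reverse complement of the RNA, while read 2 has the RNA's orientation.
    """
    aligned = "-" if flag & FLAG_REVERSE else "+"
    is_read2 = bool(flag & FLAG_PAIRED) and bool(flag & FLAG_READ2)
    if is_read2:
        return aligned
    # Read 1 and single-end reads: flip the alignment strand.
    return "+" if aligned == "-" else "-"

# Mates 99 and 147 of one fragment agree: both point to a minus-strand transcript.
for flag in (0, 16, 99, 147):
    print(flag, transcript_strand(flag))
```

For the non-directional libraries in this series the same flags carry no strand information, so a function like this would apply only to the strand-specific samples.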
Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples are considered biological replicates.
Sequencing was done on the Illumina Genome Analyzer IIx and the Illumina HiSeq 2000. FASTQ files for the resulting sequence reads (single-end and paired-end, directional and non-directional) were moved to a data library in Galaxy, and tools implemented in Galaxy were used for further processing via workflows (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010). Data processing was also performed on the CyberSTAR high-performance computing system at Penn State. The reads were mapped to the mouse genome (mm9 assembly) using TopHat (Langmead et al., 2009; Trapnell et al., 2009). Signal tracks were created using BEDTools (Quinlan et al., 2010) and SAMtools (Li et al., 2009).
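Signal tracks of this kind are, in essence, per-base read depth reported as runs of constant coverage (the bedGraph layout that `bedtools genomecov -bg` produces). The sketch below illustrates that idea with a sweep over block start/end events. It assumes each spliced alignment has already been split into its aligned blocks (0-based, half-open) on one chromosome; `coverage_bedgraph` is a hypothetical helper, not the exact BEDTools/SAMtools workflow used for this series.

```python
from typing import Dict, List, Optional, Tuple

def coverage_bedgraph(chrom: str,
                      blocks: List[Tuple[int, int]]) -> List[Tuple[str, int, int, int]]:
    """Collapse aligned blocks (0-based, half-open) on one chromosome into
    bedGraph rows (chrom, start, end, depth), skipping zero-coverage gaps and
    joining adjacent runs that share the same depth."""
    # +1 where a block starts, -1 where it ends; net change per position.
    events: Dict[int, int] = {}
    for start, end in blocks:
        events[start] = events.get(start, 0) + 1
        events[end] = events.get(end, 0) - 1

    rows: List[Tuple[str, int, int, int]] = []
    depth = 0
    prev: Optional[int] = None
    for pos in sorted(events):
        # Emit the run [prev, pos), which is covered at the current depth.
        if prev is not None and depth > 0:
            if rows and rows[-1][2] == prev and rows[-1][3] == depth:
                rows[-1] = (chrom, rows[-1][1], pos, depth)
            else:
                rows.append((chrom, prev, pos, depth))
        depth += events[pos]
        prev = pos
    return rows

# Two overlapping reads, then one read after a gap.
for row in coverage_bedgraph("chr1", [(0, 10), (5, 15), (20, 25)]):
    print(*row, sep="\t")
```

Sorting the event positions costs O(n log n) for n blocks, and the output stays compact because runs of equal depth are merged rather than listed base by base.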
Contributor(s) Hardison R, Paulson R, Bodine D, Weiss M, Mishra T, Keller C, Giardine B, Taylor J
Citation missing
BioProject PRJNA66167
Submission date Aug 31, 2012
Last update date May 15, 2019
Contact name ENCODE DCC
Organization name ENCODE DCC
Street address 300 Pasteur Dr
City Stanford
State/province CA
ZIP/Postal code 94305-5120
Country USA
Platforms (2)
GPL11002 Illumina Genome Analyzer IIx (Mus musculus)
GPL13112 Illumina HiSeq 2000 (Mus musculus)
Samples (17)
GSM995525 PSU_RnaSeq_MEP_paired-end (superseded by GSE90218)
GSM995526 PSU_RnaSeq_MEL_DMSO_2.0pct_single (superseded by GSE93476)
GSM995527 PSU_RnaSeq_G1E-ER4_diffProtD_14hr_paired-end (superseded by GSE90211)
SRA SRP015338

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size File type/resource
GSE40522_RAW.tar 3.3 Gb TAR (of BIGWIG)
SRA Run Selector
Raw data are available in SRA
Processed data provided as supplementary file
