In metazoans, thousands of DNA replication origins (Oris) are activated to replicate DNA at each cell cycle. Although their timing of activation is better understood, their genomic organization and their genetic nature remain elusive. Here, we identified Oris by nascent strand (NS) purification and characterized their common features by performing a genome-wide analysis in both Drosophila and mouse cell lines. We show that in both species most CpG islands (CGI) contain Oris, although methylation is nearly absent in Drosophila, suggesting that this epigenetic mark is not crucial for defining the initiation event. Moreover, nascent strands at the borders of CGIs show a striking bimodal distribution, suggestive of a dual initiation event. We also found that Oris contain a unique nucleotide skew around NS peaks, characterized by G/T and C/A over-representation at 5’ and 3’ of the Ori sites, respectively. GC-rich elements are detected, which are good predictors of Oris, in both mouse and Drosophila, suggesting that common sequence features are part of metazoan Oris. In the heterochromatic chromosome 4 of Drosophila, Oris are strongly correlated with HP1 binding sites. At the chromosome level, we show that Oris are organized in Ori-rich and -poor regions that co-localize with early and late replicating domains, respectively. Finally, the genome-wide analysis was coupled with a DNA combing analysis of the in-vivo spacing of replication origins. The results suggest that Oris are in large excess, and organized in groups of site-specific but flexible origins that define replicons, where a single origin is used in each group. This organization provides both site specificity and Ori firing flexibility in each replicon, allowing possible adaptation of DNA replication to environmental cues and cell fates.
Overall design:
Cells:
MEFs were derived from 13.5-day mouse embryos and used at passage 4 or 5. P19 cells and the ES cell line CGR8 are pluripotent embryonic mouse cells. Kc cells are embryonic Drosophila cells. Two independent biological samples were used for Drosophila, four for P19 cells, three for ES cells and three for MEFs. For control Drosophila Kc G2/M-arrested cells, cells were synchronized in prometaphase by incubation with 4 mM thymidine overnight, release into fresh medium for 4 hours, and incubation with nocodazole overnight.
Nascent Strand preparation:
Dividing cells (2.5-5 × 10^8) were lysed in DNAzol (Invitrogen) and digested with 200 μg/ml proteinase K. DNA was precipitated with ethanol and resuspended in 2 ml TEN20. Denatured genomic DNA was loaded onto a 30 ml neutral 5 to 30% sucrose gradient prepared in TEN300. Fractions corresponding to 0.5-2.5 kb were pooled, phosphorylated with T4 polynucleotide kinase (PNK) and digested with lambda exonuclease. NS were subjected to one or two further cycles of T4 PNK phosphorylation and lambda exonuclease digestion. Finally, NS were purified using the CyScribe GFX Purification Kit and amplified using the WGAII kit (Sigma). Hybridization, washing and scanning of microarrays were done by Nimblegen Service Laboratory.
Microarray Design:
Drosophila melanogaster samples were hybridized using 2.1M Nimblegen microarrays (Design ID 6262). These tiling arrays contain in total 2,164,511 oligonucleotide probes representing the non-repetitive regions of the Drosophila melanogaster genome (chromosome 2L, 2R, 3L, 3R, 4 and X; Flybase release 4.3).
All the processed data (GSM720962, GSM721024) were generated using the BDGP/Flybase release 4 of the Drosophila melanogaster genome assembly (UCSC dm2, April 2004).
Mouse samples were hybridized using the Nimblegen 389K tiling arrays (Design ID 4095) which cover 60.4 Mb of non-repetitive DNA sequences in chromosome 11 (56.6-117 Mb). All the processed data (GSM718701, GSM718735, GSM718744) were generated using the UCSC mm8 (NCBI Build 36, February 2006) of the Mus musculus genome assembly.
Data normalization and processed data compilation:
Experimental (Cy5) and control (Cy3) signal intensities quantified and provided by Nimblegen were converted into log2 ratios (log2 (Cy5/Cy3)). The Lowess normalization method was applied to eliminate intensity-dependent variations in dye bias (Yang et al. 2002). A sliding median window with a length of 5 oligonucleotide probes was used to smooth the signal. The mode (m) and the median absolute deviation (s) of the normalized log2 ratios were computed. Assuming that the normal distribution specified by m and s covered the entire background noise (non-significant signals), a p-value was computed for each probe and adjusted with the false discovery rate (FDR) correction (Benjamini and Hochberg 1995). To create processed data, normalized log2 ratios of replicate samples were combined by averaging the values at the corresponding genomic positions, and the corrected p-values were combined using Fisher's method, which is based on a Chi-Square distribution (Fisher 1932).
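The processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to generate the deposited data. The Lowess step is omitted, all function names are hypothetical, and several details are assumptions not stated in the record: the window shrinks at the ends of the probe series, the test is one-sided (enrichment in the upper tail), and the caller supplies the background centre and scale. In the usage note the median stands in for the mode, and the MAD is multiplied by 1.4826 to approximate a standard deviation.

```python
import math
from statistics import median


def log2_ratios(cy5, cy3):
    """Per-probe log2(Cy5/Cy3) from raw intensities."""
    return [math.log2(a / b) for a, b in zip(cy5, cy3)]


def sliding_median(values, window=5):
    """Smooth with a centred sliding median.

    Assumption: the window is truncated at both ends of the series.
    """
    half = window // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def upper_tail_pvalues(values, center, scale):
    """One-sided p-values P(X >= v) under a normal background N(center, scale^2)."""
    return [0.5 * math.erfc((v - center) / (scale * math.sqrt(2.0)))
            for v in values]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_combine(pvalues):
    """Combine k p-values with Fisher's method.

    X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees
    of freedom; for even degrees of freedom the upper tail has the closed
    form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(pvalues)
    # Floor at a tiny value to avoid log(0); this floor is an assumption.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

For one replicate, a plausible call order is `log2_ratios`, then `sliding_median`, then `upper_tail_pvalues` with `center = median(smoothed)` and `scale = 1.4826 * median(abs(v - center) for v in smoothed)`, then `benjamini_hochberg`. Across replicates, the log2 ratios at each position are averaged and the adjusted p-values at that position are passed to `fisher_combine`.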