SRX1388732: GIAB Illumina 6kb matepair-HG002_NA24385_son_NIST_Stanford_Illumina_6kb_matepair_MPHG002-23100077
1 ILLUMINA (Illumina HiSeq 2500) run: 215.4M spots, 53.8G bases, 23Gb downloads

Design: Library Prep: Mate Pair libraries were generated using the Nextera Mate Pair Sample Preparation Kit (Illumina, Cat# FC-132-1001). Briefly, 4 µg of high molecular weight genomic DNA from the NIST Reference Materials (or from Coriell for the Asian parents) was fragmented to about 7 kb in a 400 µL tagmentation reaction containing 12 µL of Tagment Enzyme at 55°C for 30 minutes. The tagmented DNA fragments were purified with the Zymo Genomic DNA Clean & Concentrator™ Kit (Zymo Research, Cat# D4010). The gap in the tagmented DNA was filled with a Strand Displacement Polymerase in a 200 µL strand displacement reaction at 20°C for 30 minutes. DNA was then purified with AMPure XP Beads (0.5x vol, Beckman Coulter, Cat# A63880) and size-selected by 0.6% agarose gel electrophoresis in 0.5x TBE buffer. The 6-9 kb fragments were excised from the gel and DNA was recovered using a Zymoclean™ Large Fragment DNA Recovery Kit (Zymo Research, Cat# D4045). Up to 600 ng of DNA was then circularized overnight at 30°C with Circularization Ligase in a 300 µL reaction. After overnight circularization, the uncircularized linear DNA was removed by Exonuclease digestion. Both DNA Ligase and Exonuclease were inactivated by heat treatment and the addition of Stop Ligation Buffer. Circularized DNA was then sheared to smaller fragments (300-1000 bp) using a Covaris S2 with a T6 (6x32 mm) glass tube (Covaris, Part# 520031 and 520042) under these conditions: Intensity of 8, Duty Cycle of 20%, Cycles Per Burst of 200, Time of 40 sec, Temperature of 6-8°C. The sheared DNA fragments that contain the biotinylated junction adapter are mate pair fragments. These fragments were isolated by binding to Dynabeads M-280 Streptavidin Magnetic Beads (Invitrogen, Part# 112-05D) in Bead Bind Buffer. The unbiotinylated molecules in solution are unwanted genomic fragments, which were removed through a series of washes. All downstream reactions were carried out on the beads, and the beads were washed between successive reactions.
The sheared DNA was first end-repaired to generate blunt ends, followed by an A-Tailing reaction to add a single “A” nucleotide to the 3’ ends of the blunt fragments. Then the Illumina T-tailed indexing adapters were ligated to the A-tailed fragments. The adapter-ligated fragments were PCR amplified [98°C/1 min, 11 cycles of (98°C/10 sec, 60°C/30 sec, 72°C/30 sec), 72°C/5 min, 4°C/hold] to generate the final library. The amplified library was purified using AMPure XP Beads (0.67x vol) and eluted in Resuspension Buffer. The size distribution of the library was determined by running a sample on an Agilent Technologies 2100 Bioanalyzer. Library concentration was measured with the Qubit dsDNA HS Assay Kit (Life Technologies, Cat# Q32851).
Sequencing: Pooled Mate-Pair libraries were sequenced on an Illumina HiSeq 2500 in Rapid mode (v1) with 2x101 bp paired-end reads. The loading concentration was 9.5 pM. This initial run was for library QC purposes prior to high-throughput sequencing. The Mate-Pair libraries were also sequenced on an Illumina HiSeq 2500 in high output mode (v4) with 2x125 bp paired-end reads. Libraries were sequenced on individual lanes (not pooled). The template loading concentration for each lane was adjusted based on the cluster density from the QC run. Two replicate flowcells were sequenced simultaneously, each with 6 lanes of mate-pair libraries.
Bioinformatics: To assess duplication rate, coverage, and insert size of the mate-pair libraries, reads were stripped of adapter sequences. Read pairs were removed if the sequence of one or both mates was shorter than 20 bp after adapter stripping, or if the adapter sequence was at the beginning rather than the end of a read (indicating the read inserts were likely to be in inward-facing F/R orientation rather than the expected outward-facing R/F orientation). Reads were then mapped to the hg19 reference genome using “bwa mem” (Li 2013) with default settings, and duplicates were marked using samblaster (Faust 2014).
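The read-pair filtering rule described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used for this submission: the adapter string, the exact-match search, the treatment of "at the beginning" as position 0, and the function names strip_adapter and classify_pair are all assumptions introduced here.

```python
from typing import Optional, Tuple

# Assumed junction adapter sequence; the record does not state which
# sequence was searched for, so treat this value as a placeholder.
ADAPTER = "CTGTCTCTTATACACATCT"
MIN_LEN = 20  # minimum mate length after stripping, per the text


def strip_adapter(read: str, adapter: str = ADAPTER) -> Tuple[str, Optional[int]]:
    """Cut the read at the first exact adapter match.

    Returns the kept prefix and the match position (None if no match).
    Real trimmers also handle mismatches and partial adapters.
    """
    pos = read.find(adapter)
    if pos == -1:
        return read, None
    return read[:pos], pos


def classify_pair(read1: str, read2: str,
                  adapter: str = ADAPTER, min_len: int = MIN_LEN) -> str:
    """Return 'keep', 'adapter_at_start', or 'too_short' for a read pair."""
    for read in (read1, read2):
        trimmed, pos = strip_adapter(read, adapter)
        if pos == 0:
            # Adapter at the beginning of the read: likely F/R orientation.
            return "adapter_at_start"
        if len(trimmed) < min_len:
            return "too_short"
    return "keep"


if __name__ == "__main__":
    clean = "ACGT" * 25
    print(classify_pair(clean, clean))
    print(classify_pair("A" * 10 + ADAPTER + "C" * 30, clean))
    print(classify_pair(ADAPTER + "A" * 60, clean))
```

A pair is dropped as soon as either mate fails a rule, which matches the "one or both mates" wording in the text.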
The high rate of PCR duplicates (close to 50% in some libraries) resulted in lower-than-expected sequence coverage (13–17x average across all sequenced genomic positions). A more relevant metric for mate-pair data is the physical coverage, which measures the number of inferred fragments that cover a particular genomic position (including both the sequenced ends and the unsequenced genomic region between the ends). Because the empirical insert size average was between 6 and 7 kb per individual, the physical coverage of the genome was quite high (>400x per individual). BAMs were stripped of duplicate reads to reduce file size, but the full data are available in fastq format.
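The difference between sequence coverage and physical coverage can be illustrated with a short calculation. This is a rough sketch under stated assumptions: the genome size (3.1e9 bp) and mean insert size (6,500 bp, the midpoint of the reported 6–7 kb range) are values chosen here for illustration, and the spot count is the raw total for this single run, before adapter filtering, mapping, or duplicate removal. The results are therefore not directly comparable to the per-individual figures quoted above.

```python
def sequence_coverage(n_pairs: int, read_length: int, genome_size: float) -> float:
    """Bases actually sequenced (two mates per pair) per genomic position."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs: int, mean_insert_size: float, genome_size: float) -> float:
    """Fragments spanning a position, counting the unsequenced middle too."""
    return n_pairs * mean_insert_size / genome_size


N_PAIRS = 215_352_024   # spots reported for run SRR2832448
READ_LENGTH = 125       # 2x125 bp reads, per the sequencing description
MEAN_INSERT = 6_500     # assumed midpoint of the 6-7 kb range
GENOME_SIZE = 3.1e9     # assumed approximate human genome size in bp

seq_cov = sequence_coverage(N_PAIRS, READ_LENGTH, GENOME_SIZE)
phys_cov = physical_coverage(N_PAIRS, MEAN_INSERT, GENOME_SIZE)

print(f"raw sequence coverage: {seq_cov:.1f}x")
print(f"raw physical coverage: {phys_cov:.1f}x")
print(f"physical / sequence ratio: {phys_cov / seq_cov:.0f}")
```

The ratio between the two depends only on the insert size and the read length (insert / (2 x read length)), which is why long-insert mate-pair libraries reach high physical coverage from modest sequencing depth.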
Submitted by: NCBI
Study: A public-private-academic consortium, Genome-in-a-Bottle (GIAB), hosted by NIST to develop reference materials and standards for clinical sequencing
The Genome in a Bottle Consortium (www.genomeinabottle.org) is a collaboration between NIST, FDA, NCBI, other government agencies, academic sequencing groups, sequencing technology developers, and clinical laboratories. A principal motivation for this consortium is to develop widely accepted reference materials and accompanying performance metrics to provide a strong scientific foundation for the development of regulations and professional standards for clinical sequencing. NIST is developing large batches of human genome DNA from several cell lines for NIST Reference Materials (RMs), which will be characterized by the Consortium for homogeneity, stability, and sequence with as many sequencing technologies and library preparation methods as possible. Information from these datasets will be integrated to form a high-confidence set of genotype calls, which can be used by clinical and research laboratories to understand the performance of their sequencing and bioinformatics methods. NCBI is serving as the DCC and repository for the raw sequencing reads, mapped reads, genotypes, and other details for each sample on a dedicated FTP site (ftp://ftp-trace.ncbi.nih.gov/giab/ftp/). The pilot sample is NA12878 (HG001), and NIST received over 8,000 aliquots in April 2013, which will initially be distributed to partners in the Consortium to assist in characterization, and later will be distributed by NIST as Reference Material 8398, likely in March or April 2015. Samples from an Ashkenazim trio (son HG002-NA24385-huAA53E0, father HG003-NA24149-hu6E4515, and mother HG004-NA24143-hu8E87A9) and a Han Chinese trio (son HG005-NA24631-hu91BD69, father NA24694-huCA017E, and mother NA24695-hu38168C) from the Personal Genome Project (PGP) are also candidate NIST reference materials and are currently being characterized. The Ashkenazim trio will be available both as NIST RMs 8391 (son only) and 8392 (entire trio). Only the son of the Asian trio will be a NIST RM (8393).
DNA and cell lines for all samples are also available from Coriell, but the NIST RMs are from a single homogenized batch of DNA, so there may be small differences between the samples at Coriell and the NIST RMs. Details about the NIST Reference Materials, data, and future plans are at https://sites.stanford.edu/abms/content/giab-reference-materials-and-data. When the NIST RMs are available, they can be purchased from NIST at http://www.nist.gov/srm/, where a Report of Investigation describing the DNA will also be available.
Sample: NIST HG002 NA24385
SAMN03283347 • SRS817069
Organism: Homo sapiens
Library:
Name: HG002_NA24385_son_NIST_Stanford_Illumina_6kb_matepair_MPHG002-23100077
Instrument: Illumina HiSeq 2500
Strategy: WGS
Source: GENOMIC
Selection: size fractionation
Layout: PAIRED
Spot descriptor:
forward, reverse (starting at base 126)

Runs: 1 run, 215.4M spots, 53.8G bases, 23Gb
Run         # of Spots   # of Bases  Size  Published
SRR2832448  215,352,024  53.8G       23Gb  2015-10-27

ID: 1972454
