Sample GSM1620447
Status Public on May 21, 2015
Title PS034_R2_1_480
Sample type SRA
 
Source name mixture of U87 human glioma cells and MCF10a human breast cancer cells
Organism Homo sapiens
Characteristics cell type: mixture of U87 human glioma cells and MCF10a human breast cancer cells
Growth protocol Cells were cultured in DMEM supplemented with 10% FBS.
Extracted molecule polyA RNA
Extraction protocol RNA was extracted from individual cells in individual microfluidic chambers following cell lysis by Triton X-100 and freeze-thaw.
mRNA from individual cells was reverse transcribed with a primer containing a cell-identifying barcode followed by oligo(dT). Following second strand synthesis using DNA Polymerase I and reagents from the MessageAmp II kit (Ambion), ds-cDNA from all barcoded individual cells was pooled and pre-amplified by in vitro transcription using T7 RNA polymerase. The pools of amplified RNA from each lane of the microfluidic device were individually reverse transcribed using barcoded random hexamers containing a unique molecular identifier (random 8-base barcode) followed by a lane-identifying barcode (6-base barcode). Illumina adapters were added to both ends of the library during the two reverse transcription steps and were then used to enrich the library by PCR. The pooled library was sequenced on an Illumina NextSeq 500.
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NextSeq 500
 
Description PS034_1
3'-end RNA-Seq
processed data file: PS034_R2_1.txt
Data processing We collected the reads that mapped uniquely to the transcriptome and assigned each read an address composed of its cell-identifying barcode, gene, UMI, and mapping position. We then filtered the reads to identify unique molecules. Reads with identical addresses were collapsed to a single molecule. In addition, reads with identical cell-identifying barcodes, genes, and mapping positions, and with UMIs having a Hamming distance less than or equal to two, were collapsed to a single molecule. Finally, because the mapping positions produced by STAR do not necessarily correspond to the beginning of a read, we further considered reads to originate from identical molecules if they had identical genes and cell-identifying barcodes, UMIs with a Hamming distance less than or equal to two, and mapping positions within five bases of each other.
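The collapsing rules above can be sketched in Python as follows. This is a minimal illustration, not the submitters' pipeline: the function names, the tuple layout of a read, and the greedy assignment of each address to the first matching representative are all assumptions, since the record does not say how chains of near-identical UMIs were resolved. The two thresholds are parameters because the record quotes slightly different position windows in different paragraphs.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_molecules(reads, max_umi_dist=2, max_pos_dist=5):
    """Count unique molecules per (cell barcode, gene).

    reads: iterable of (cell_barcode, gene, umi, position) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Reads with identical addresses collapse immediately via the set.
    groups = defaultdict(set)
    for cell, gene, umi, pos in reads:
        groups[(cell, gene)].add((umi, pos))

    counts = {}
    for key, addresses in groups.items():
        representatives = []  # one (umi, position) per molecule found so far
        for umi, pos in sorted(addresses):
            for rep_umi, rep_pos in representatives:
                close_umi = hamming(umi, rep_umi) <= max_umi_dist
                close_pos = abs(pos - rep_pos) <= max_pos_dist
                if close_umi and close_pos:
                    break  # same molecule as an existing representative
            else:
                representatives.append((umi, pos))  # a new molecule
        counts[key] = len(representatives)
    return counts


example_reads = [
    ("c1", "G", "AAAAAAAA", 100),
    ("c1", "G", "AAAAAAAA", 100),  # identical address
    ("c1", "G", "AAAAAATT", 103),  # UMI distance 2, position within 5
    ("c1", "G", "AAAATTTT", 100),  # UMI distance 4, kept separate
    ("c1", "G", "AAAAAAAA", 200),  # same UMI, distant position
    ("c2", "G", "AAAAAAAA", 100),  # different cell barcode
]
molecule_counts = count_molecules(example_reads)
```

Because the assignment is greedy, the result can depend on the order in which addresses are visited; sorting makes the sketch deterministic but is not claimed to match the original analysis.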
To identify barcodes that correspond to actual individual cells in our device, we filtered the observed cell-identifying barcodes by progressively downsampling the corresponding gene profiles to the same number of total reads and assessing the number of unique molecules detected from each cell-identifying barcode. After excluding cell-identifying barcodes having zero associated molecules, we found the distribution of associated unique molecules to be bimodal, with one small subpopulation having nearly as many unique molecules as reads at low read totals. We found the size of this subpopulation to be in excellent agreement with our device imaging data. We took these 598 profiles to represent the actual individual cells captured in our device with a barcoded bead.
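The downsampling step described above can be illustrated with a short Python sketch. It assumes each read has already been labelled with the molecule it collapsed into, and it samples reads without replacement to a common depth; the function names, the input layout, and the fixed random seed are hypothetical choices made for the example, not details taken from the record.

```python
import random


def molecules_at_depth(read_molecule_ids, depth, seed=0):
    """Sample `depth` reads without replacement; count distinct molecules.

    Returns None when the barcode has fewer than `depth` reads.
    """
    if len(read_molecule_ids) < depth:
        return None
    rng = random.Random(seed)
    sampled = rng.sample(read_molecule_ids, depth)
    return len(set(sampled))


def downsample_barcodes(reads_by_barcode, depth, seed=0):
    """Map each sufficiently covered barcode to its molecule count at `depth`."""
    result = {}
    for barcode, molecule_ids in reads_by_barcode.items():
        n = molecules_at_depth(list(molecule_ids), depth, seed)
        if n is not None:
            result[barcode] = n
    return result


toy_reads = {
    "complex": list(range(100)),  # every read is a distinct molecule
    "duplicated": [0] * 100,      # every read is the same molecule
    "shallow": [0, 1, 2],         # too few reads for the chosen depth
}
toy_counts = downsample_barcodes(toy_reads, depth=10)
```

Barcodes whose molecule count stays close to the sampling depth correspond to the high-complexity subpopulation described above; where to draw the cut between the two modes is left to the analyst.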
We conducted a more detailed analysis on the 370 single-cell profiles with the highest coverage in our data set across all five lanes of the microfluidic device. Raw FASTQ data from read 2 of those 370 cells are provided here. Note that the UMI for each read appears in the comment line of each FASTQ entry.
The processed data files contain the number of molecules counted for each gene based on counting reads with HTSeq and filtering the UMIs to identify unique molecules. If two UMIs had a Hamming distance less than three, they were considered to be the same UMI. If two reads with identical UMIs mapped to the transcriptome to within 6 bases of each other, they were considered identical molecules.
Genome_build: hg19
Supplementary_files_format_and_content: The processed data files contain the number of molecules counted for each gene based on counting reads with HTSeq and filtering the UMIs to identify unique molecules.
 
Submission date Feb 27, 2015
Last update date May 15, 2019
Contact name Peter A Sims
E-mail(s) pas2182@columbia.edu
Organization name Columbia University
Street address 3960 Broadway, Lasker 203AC
City New York
State/province NY
ZIP/Postal code 10032
Country USA
 
Platform ID GPL18573
Series (1)
GSE66357 Scalable Microfluidics for Single Cell RNA Printing and Sequencing
Relations
BioSample SAMN03379340
SRA SRX893045

Supplementary data files not provided
Raw data are available in SRA
Processed data are available on Series record
