Sample GSM1488129
Status Public on Jun 09, 2015
Title ESC_B11
Sample type SRA
 
Source name Embryonic stem cells
Organism Mus musculus
Characteristics cell type: Embryonic Stem Cells
growth medium: 10% FBS + 10ng/mL LIF
cell cycle stage: G1
batch barcode: TACAAG
cell barcode: GTATAG
Extracted molecule polyA RNA
Extraction protocol Single-cell RNA sequencing (no RNA extraction; cells were FACS-sorted into 0.6 µL lysis buffer containing 10% NP-40)
Polyadenylation-site-specific single-cell transcriptome sequencing using the BATSeq protocol (Velten et al., in preparation). The protocol is highly multiplexed and uses both batch and cell barcodes, which are listed in the 'Characteristics' section. Four independent sequencing runs were performed.
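Because each cell is identified by a batch barcode plus a cell barcode (here TACAAG and GTATAG), demultiplexing amounts to binning read pairs by that barcode pair. The Data processing section states that the batch barcode is in the forward read and the cell barcode in the reverse read. The sketch below is a minimal illustration under the assumption, not stated in the record, that both 6-nt barcodes sit at the very start of their reads. `demultiplex` and `BARCODE_LEN` are hypothetical names, not part of the authors' code.

```python
from collections import defaultdict

# Assumption (not stated in the record): both barcodes are 6 nt long and sit
# at the very start of their read.
BARCODE_LEN = 6


def demultiplex(read_pairs, known_cells):
    """Bin (forward, reverse) sequence pairs by (batch, cell) barcode.

    The batch barcode is taken from the forward read and the cell barcode
    from the reverse read; both are stripped from the stored sequences.
    Pairs with an unexpected barcode combination are binned under None.
    """
    bins = defaultdict(list)
    for fwd, rev in read_pairs:
        key = (fwd[:BARCODE_LEN], rev[:BARCODE_LEN])
        if key not in known_cells:
            key = None
        bins[key].append((fwd[BARCODE_LEN:], rev[BARCODE_LEN:]))
    return bins


# Barcodes of this sample (GSM1488129), taken from 'Characteristics'.
cells = {("TACAAG", "GTATAG")}
pairs = [
    ("TACAAGACGTTTTT", "GTATAGACGTACGTCCGA"),  # matches this cell
    ("GGGGGGACGTTTTT", "GTATAGACGTACGTCCGA"),  # unknown batch barcode
]
bins = demultiplex(pairs, cells)
```

Binning unmatched pairs under `None` rather than dropping them keeps the fraction of unassignable reads visible, which is a useful sanity check on a multiplexed run.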
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina MiSeq
 
Description Sample 56
Single Cell
Data processing Demultiplex using the batch and cell barcodes (in the forward and reverse read, respectively). Python2.7/HTSeq
Trim 8 bases (the molecular barcode) off the reverse read and store them with the forward read. Discard the reverse read. Perl5
Trim the polyA tail from the 3' end of the forward read. Discard reads that do not end in a polyA tail. Python2.7/HTSeq
Align reads to the Mus musculus genome GRCm38 (Ensembl version 38.73), with the sequences of the ERCC spike-in control RNAs appended. Gsnap
Filter out reads with non-unique alignments, low alignment quality or alignments to A-rich genomic regions, as well as A-rich reads. Python2.7/HTSeq
Count unique molecular barcodes aligning to different genomic features. Python2.7/HTSeq
Remove barcode counts stemming from similar molecular barcodes with a Hamming distance of less than 1. Perl5
Collapse putative polyadenylation sites at genomic positions within 12 bp of each other. Perl5
For the summary files, 27 cells with fewer than 1,000 observed molecular barcodes were removed. R-3.0.2
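The read-level and counting steps above can be sketched in Python. This is a minimal illustration, not the authors' Perl/Python code. Alignment (GSNAP) and alignment filtering are omitted, and the sketch makes several assumptions. The molecular barcode (UMI) is taken to be the first 8 bases of the reverse read. A polyA tail is taken to be a terminal run of at least 6 A's (the hypothetical `MIN_POLYA`). UMIs within a Hamming distance of `max_dist` are merged greedily into the more abundant UMI; because the record's "less than 1" wording is ambiguous, `max_dist` defaults to 1 but is left as a parameter. Sites within 12 bp are merged by single linkage. All function names are hypothetical.

```python
import re
from collections import defaultdict

UMI_LEN = 8        # from the record: 8 bases of the reverse read
MIN_POLYA = 6      # hypothetical: shortest terminal A-run accepted as a tail
SITE_WINDOW = 12   # from the record: merge sites within 12 bp
MIN_UMIS = 1000    # from the record: cells below this are dropped

POLYA_RE = re.compile("A{%d,}$" % MIN_POLYA)


def trim_read_pair(fwd, rev):
    """Return (umi, forward read without polyA) or None if there is no tail.

    Assumes the UMI is the first UMI_LEN bases of the reverse read;
    the reverse read itself is discarded afterwards.
    """
    match = POLYA_RE.search(fwd)
    if match is None:
        return None
    return rev[:UMI_LEN], fwd[:match.start()]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_reads, max_dist=1):
    """Greedily merge each UMI into a more abundant kept UMI within max_dist.

    umi_reads maps UMI -> read count; returns kept UMI -> merged read count.
    """
    kept = {}
    for umi, n in sorted(umi_reads.items(), key=lambda kv: (-kv[1], kv[0])):
        for k in kept:
            if len(k) == len(umi) and hamming(k, umi) <= max_dist:
                kept[k] += n
                break
        else:
            kept[umi] = n
    return kept


def count_umis(records, max_dist=1):
    """records: one (cell, gene, site, umi) tuple per aligned, filtered read.

    Returns {(cell, gene, site): (umi_count, read_count)}.
    """
    reads = defaultdict(lambda: defaultdict(int))
    for cell, gene, site, umi in records:
        reads[(cell, gene, site)][umi] += 1
    counts = {}
    for key, umi_reads in reads.items():
        kept = collapse_umis(umi_reads, max_dist)
        counts[key] = (len(kept), sum(kept.values()))
    return counts


def cluster_sites(positions, window=SITE_WINDOW):
    """Single-linkage merge: a site joins the previous cluster if it lies
    within `window` bp of that cluster's last position."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def passing_cells(umis_per_cell, min_umis=MIN_UMIS):
    """Keep cells with at least min_umis molecular barcodes."""
    return {cell for cell, n in umis_per_cell.items() if n >= min_umis}
```

Single linkage means a chain of sites each 12 bp apart can merge into one cluster spanning more than 12 bp; a fixed-width window around the most-used site would avoid that, and which rule the original Perl step uses is not stated in the record.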
Genome_build: GRCm38
Supplementary_files_format_and_content: *isoforms.counts.gz: comma-separated value files containing the Ensembl gene ID of the closest genomic feature, the 3' alignment position (polyadenylation site), the barcode count, the read count, a brief description of the alignment site and the distance to the closest annotated transcript termination site
Supplementary_files_format_and_content: TotalGeneCounts.csv contains total molecular barcode counts per gene (Ensembl gene ID in column 1) for the 107 cells that pass the quality control criteria (columns 2-108). It summarises the processed data files by removing the polyadenylation site information.
Supplementary_files_format_and_content: TotalIsoformCounts.csv contains, for the 107 cells that pass the quality control criteria, molecular barcode and read counts (columns 3 and 4) by gene (column 1), polyadenylation site in genomic coordinates (column 2) and cell (column 7). It also gives a brief description of the alignment position (column 5) and the distance to the closest annotated transcript termination site (column 6).
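The column layout described for TotalIsoformCounts.csv is enough to reproduce gene-level totals like those in TotalGeneCounts.csv: summing barcode counts over polyadenylation sites for each gene and cell. The sketch below is a minimal illustration assuming a plain comma-separated file with one header row (the header is an assumption; set `has_header=False` if there is none). `read_isoform_counts` and `gene_totals` are hypothetical helper names.

```python
import csv
from collections import defaultdict


def read_isoform_counts(handle, has_header=True):
    """Parse rows from a TotalIsoformCounts.csv-style file (or any iterable of lines)."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first line is a header
    return list(reader)


def gene_totals(rows):
    """Sum molecular barcode counts per (gene, cell) over polyadenylation sites.

    Uses the documented 1-based columns: 1 gene ID, 3 barcode count, 7 cell.
    """
    totals = defaultdict(int)
    for row in rows:
        gene, cell = row[0], row[6]
        totals[(gene, cell)] += int(row[2])
    return dict(totals)
```

To read a compressed file, pass `gzip.open(path, "rt")` as the handle. The per-sample `*isoforms.counts.gz` files are described without a cell column, so `row[6]` would have to be replaced by the sample name when aggregating them.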
 
Submission date Aug 26, 2014
Last update date May 15, 2019
Contact name Lars Velten
E-mail(s) lars.velten@crg.eu
Organization name CRG
Department Bioinformatics and Genomics
Lab Velten lab
Street address C. Dr. Aiguader 88, P05
City Barcelona
ZIP/Postal code 08003
Country Spain
 
Platform ID GPL16417
Series (1)
GSE60768 BATSeq of ESC-FCS/ESC-2i/NSC
Relations
BioSample SAMN03009037
SRA SRX687163

Supplementary file Size File type/resource
GSM1488129_testMPX_plate1_B_11_ESC-pA.isoforms.counts.txt.gz 31.5 Kb TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
