Sample GSM3402512
Status Public on Dec 19, 2018
Title E15_red
Sample type SRA
 
Source name Pancreatic progenitor cells
Organism Mus musculus
Characteristics age: E15.5
genotype: Neurog3-Cre; Rosa26mTmG
cell type: FACS sorted mTomato+ E15.5 pancreatic cells
Growth protocol Neurog3-Cre; Rosa26mT/mG embryos were dissected at E15.5 or E18.5, dissociated using trypsin, and sorted for mTomato+ and mGreen+ cells. NEUROG3-eGFP hESCs were differentiated using the protocol of Rezania et al. and sorted for GFP+ cells.
Extracted molecule total RNA
Extraction protocol The 10x Genomics Chromium™ Controller and Single Cell 3’ Reagent Kits v2 (Pleasanton, CA, USA) were used to generate single-cell libraries. Briefly, cells were counted after FACS, and cell suspensions were loaded for a targeted recovery of 3000 cells per channel. The microfluidics platform was used to barcode single cells in Gel Bead-In-Emulsions (GEMs). Reverse transcription (RT) was performed within the GEMs, producing barcoded cDNA from single cells.
The full-length, barcoded cDNA was PCR-amplified, followed by enzymatic fragmentation and double-sided SPRI size selection to obtain the optimal cDNA size. End repair, A-tailing, adaptor ligation, and PCR were performed to generate the final libraries, which carry P5 and P7 primers compatible with Illumina sequencing. The libraries were pooled and sequenced on an Illumina NextSeq 500 using a 150-cycle High Output v2 kit in paired-end format: 26 bp Read 1, 8 bp i5 index, and 85 bp Read 2.
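A 26 bp Read 1 is the standard length for Chromium Single Cell 3’ v2 libraries, where it carries a 16 bp cell barcode followed by a 10 bp UMI, while Read 2 carries the cDNA insert. The minimal Python sketch below, assuming that v2 layout, splits Read 1 and counts distinct UMIs per barcode. The function names are hypothetical, and unlike Cell Ranger it performs no barcode whitelisting, error correction, or gene assignment from Read 2.

```python
"""Split Chromium Single Cell 3' v2 Read 1 sequences (assumed layout:
16 bp cell barcode + 10 bp UMI) and count distinct UMIs per barcode."""

BARCODE_LEN = 16  # cell barcode length in v2 chemistry
UMI_LEN = 10      # UMI length in v2 chemistry


def split_read1(seq: str) -> tuple:
    """Return (cell_barcode, umi) from one Read 1 sequence."""
    expected = BARCODE_LEN + UMI_LEN
    if len(seq) < expected:
        raise ValueError(f"Read 1 shorter than {expected} bp: {len(seq)} bp")
    return seq[:BARCODE_LEN], seq[BARCODE_LEN:expected]


def count_umis(read1_seqs: list) -> dict:
    """Count distinct UMIs per barcode, ignoring which gene Read 2 maps to."""
    seen = {}
    for seq in read1_seqs:
        barcode, umi = split_read1(seq)
        seen.setdefault(barcode, set()).add(umi)
    return {bc: len(umis) for bc, umis in seen.items()}


if __name__ == "__main__":
    bc = "AAACCTGAGAAACCAT"  # made-up 16 bp barcode
    reads = [bc + "ACGTACGTAC", bc + "ACGTACGTAC", bc + "T" * 10]
    print(count_umis(reads))  # duplicate reads of one molecule collapse to one UMI
```

Because two reads sharing a barcode and UMI are treated as one molecule, PCR duplicates do not inflate the count; this is the reason UMI counts rather than read counts populate the matrix described under Data processing.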
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model Illumina NextSeq 500
 
Data processing Alignment to the reference genome and generation of UMI counts were performed with cellranger count, with the parameter "--cells=5000".
Samples can be merged using cellranger aggr, with the parameter "--normalize=mapped".
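Step 4 below relies on how cellranger aggr labels barcodes: each barcode receives a suffix “-N”, where N is the row of its library in the aggregation CSV (columns library_id and molecule_h5). With --normalize=mapped, reads from deeper libraries are subsampled until every library has, on average, the same number of confidently mapped reads per cell. The hedged Python sketch below models both pieces of bookkeeping; the function names, library names, file paths, and depths are hypothetical, and it does not reproduce Cell Ranger's implementation.

```python
"""Sketch of cellranger aggr bookkeeping (simplified, assumed behaviour):
1. barcode suffix "-N" identifies the N-th library in the aggregation CSV;
2. --normalize=mapped subsamples deeper libraries to the shallowest one."""

import csv
import io


def suffix_to_library(aggr_csv_text: str) -> dict:
    """Map barcode suffixes ("1", "2", ...) to library_id in CSV row order."""
    reader = csv.DictReader(io.StringIO(aggr_csv_text))
    return {str(i): row["library_id"] for i, row in enumerate(reader, start=1)}


def library_of(barcode: str, mapping: dict) -> str:
    """Return the library a barcode such as 'AAACCTGAGAAACCAT-2' came from."""
    _, _, suffix = barcode.rpartition("-")
    return mapping[suffix]


def subsampling_fractions(mapped_reads_per_cell: dict) -> dict:
    """Fraction of reads kept per library so all match the lowest depth."""
    target = min(mapped_reads_per_cell.values())
    return {lib: target / depth for lib, depth in mapped_reads_per_cell.items()}


if __name__ == "__main__":
    # Hypothetical aggregation CSV: red library first, green library second.
    aggr_csv = ("library_id,molecule_h5\n"
                "E15_red,/path/to/E15_red/molecule_info.h5\n"
                "E15_green,/path/to/E15_green/molecule_info.h5\n")
    mapping = suffix_to_library(aggr_csv)
    print(library_of("AAACCTGAGAAACCAT-2", mapping))
    print(subsampling_fractions({"E15_red": 40000.0, "E15_green": 20000.0}))
```

Swapping the two CSV rows would swap the meaning of “-1” and “-2”, which is why the library-of-origin annotation must be derived from the same CSV that was passed to aggr.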
10x output files are read into the scater R package for quality control.
1. Load the 10x raw matrix output folder into a SingleCellExperiment object using read10xResults.
2. Filter out cells with fewer than 1000 transcripts. The output matrix includes every barcode used, including barcodes associated with empty GEMs, which represent background noise. Here, we assume that a cell-containing GEM should contain more than 1000 RNA transcripts.
3. Gene symbols are used as row names from this point on. Duplicated gene names (symbols corresponding to multiple Ensembl IDs) receive an additional numeric suffix (e.g., gene_1).
4. (Optional, for aggregated datasets) Use the suffix attached to each barcode to obtain the library of origin, and add this information to the colData of the object. For example, the suffix “-1” indicates the red population and “-2” the green population; the order depends on the aggregation CSV file.
Scater R script
5. Filter out genes that are not expressed in any cell. The output matrix includes all genes in the reference transcriptome, even those that are not expressed at all, so this step removes the unexpressed genes.
6. Identify mitochondrial genes by searching for gene names starting with “mt-” (“MT-” for human) and flag them as spike-ins using the isSpike function.
7. Calculate QC metrics using scater's calculateQCMetrics with the use_spikes parameter set to TRUE, then filter out cells based on total counts, total features, and the percentage of counts from mitochondrial genes, using the isOutlier function. A cell is discarded if any of these metrics lies more than three median absolute deviations (MADs) from the median.
(total_counts: filtered at both ends, log10 scale; total_features: filtered at the lower end, not log-transformed; pct_counts_MT: filtered at the higher end, not log-transformed.)
8. As an additional automatic filter, runPCA is used to remove outlier genes and cells based on PCA.
9. After invalid genes and cells are removed, genes are filtered one last time to keep only those expressed in more than three cells.
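Steps 2 and 5 to 9 can be approximated without R. The sketch below, in standard-library Python, reads a Cell Ranger matrix.mtx (genes as rows, barcodes as columns, 1-based coordinates), drops barcodes with fewer than 1000 UMIs, flags cells whose total counts, detected genes, or mitochondrial percentage lie more than three scaled MADs from the median (the idea behind scater's isOutlier), and keeps genes detected in more than three retained cells. It is an illustration under those assumptions, not the authors' Scater script: the function names are hypothetical, mitochondrial genes are identified by a case-insensitive “mt-” prefix, and the PCA-based filter of step 8 is omitted.

```python
"""Simplified scater-style QC on a 10x matrix.mtx (illustrative sketch only;
the PCA-based outlier filter of step 8 is not implemented)."""

import math
import statistics


def read_mtx(text):
    """Parse a Matrix Market coordinate file into 0-based {(gene, cell): count}."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_genes, n_cells, _nnz = (int(x) for x in lines[0].split())
    counts = {}
    for ln in lines[1:]:
        g, c, v = ln.split()
        counts[(int(g) - 1, int(c) - 1)] = int(v)
    return n_genes, n_cells, counts


def mad_outliers(values, nmads=3.0, log=False, lower=True, higher=True):
    """Flag values more than `nmads` scaled MADs below and/or above the median."""
    x = [math.log10(v) for v in values] if log else list(values)
    med = statistics.median(x)
    # 1.4826 scales the MAD to match the SD for normal data (R's mad() default).
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    return [(lower and v < med - nmads * mad) or (higher and v > med + nmads * mad)
            for v in x]


def scater_like_qc(n_genes, n_cells, counts, gene_names, min_umis=1000, min_cells=3):
    """Return (kept cell indices, kept gene indices) after the simplified QC."""
    is_mt = [name.lower().startswith("mt-") for name in gene_names]
    totals, detected, mt = [0] * n_cells, [0] * n_cells, [0] * n_cells
    for (g, c), v in counts.items():
        totals[c] += v
        detected[c] += 1 if v > 0 else 0
        if is_mt[g]:
            mt[c] += v

    # Step 2: barcodes below min_umis are treated as empty GEMs.
    cells = [c for c in range(n_cells) if totals[c] >= min_umis]

    # Step 7: total counts (both ends, log10), detected genes (lower end),
    # mitochondrial percentage (higher end).
    pct_mt = [100.0 * mt[c] / totals[c] for c in cells]
    flags = zip(
        mad_outliers([totals[c] for c in cells], log=True),
        mad_outliers([detected[c] for c in cells], higher=False),
        mad_outliers(pct_mt, lower=False),
    )
    keep = [c for c, f in zip(cells, flags) if not any(f)]

    # Steps 5 and 9: keep genes detected in more than min_cells retained cells.
    kept = set(keep)
    n_detected = [0] * n_genes
    for (g, c), v in counts.items():
        if c in kept and v > 0:
            n_detected[g] += 1
    genes = [g for g in range(n_genes) if n_detected[g] > min_cells]
    return keep, genes
```

Using the median and MAD rather than the mean and standard deviation keeps the thresholds stable when a minority of damaged cells or doublets has extreme values.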
The post-filtering scater object is loaded into the Seurat R package for downstream biological analysis.
1. A Seurat object is created from the post-filtering raw counts, with additional minimum thresholds: genes must be expressed in at least three cells, and cells must express at least 500 genes.
2. The Seurat object is then log-normalized using Seurat's NormalizeData with the default scale factor of 10000.
3. The data are then scaled while regressing out the effect of cell-cycle phase.
4. Variable genes are identified using mean expression and the LogVMR dispersion function.
5. The variable genes are used as input for PCA.
6. Clustering is performed on the first 15 principal components using a “shared nearest neighbor (SNN) modularity optimization-based clustering algorithm” (not K-means clustering), with a resolution of 0.6.
7. t-SNE embedding is then performed to visualize the data in reduced dimensions.
8. Markers that are differentially expressed in each cluster compared with all other cells in the dataset are used for cell-type identification.
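Three of the Seurat steps reduce to short formulas. LogNormalize (step 2) computes ln(1 + count / total × 10000) per cell; LogVMR (step 4) is the log of the variance-to-mean ratio of the expm1-transformed values, reported alongside the log of the mean; and the SNN graph underlying step 6 weights each pair of cells by the Jaccard overlap of their k-nearest-neighbor sets. The standard-library Python sketch below implements these definitions as we understand them from Seurat v2. It assumes that each cell counts as its own neighbor and prunes weights below 1/15 (the default of Seurat v2's prune.SNN); the function names are hypothetical, and scaling, PCA, modularity optimization, t-SNE, and marker testing are not shown.

```python
"""Sketch of LogNormalize, LogVMR dispersion, and SNN edge weights
(simplified re-implementations, not Seurat's code)."""

import math
import statistics


def log_normalize(cell_counts, scale_factor=10000.0):
    """LogNormalize one cell: ln(1 + count / total_count * scale_factor)."""
    total = sum(cell_counts)
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


def exp_mean_and_log_vmr(gene_values):
    """Return (log of mean, LogVMR dispersion) for one gene's log-normalized values.

    Assumes the gene is detected and variable (non-zero mean and variance)."""
    linear = [math.expm1(v) for v in gene_values]  # undo log1p
    mean = statistics.mean(linear)
    # Sample variance (n - 1 denominator), as in R's var().
    dispersion = math.log(statistics.variance(linear) / mean)
    return math.log1p(mean), dispersion


def snn_weights(points, k, prune=1.0 / 15):
    """Jaccard overlap of k-nearest-neighbour sets for every pair of cells.

    Each cell counts as its own nearest neighbour; weights below `prune` are dropped."""
    n = len(points)
    neighbours = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        neighbours.append(set(order[:k]))
    weights = {}
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(neighbours[i] & neighbours[j])
            w = shared / len(neighbours[i] | neighbours[j])
            if w >= prune:
                weights[(i, j)] = w
    return weights
```

For example, a cell with counts [1, 1, 2] has a total of 4, so its values become ln(1 + 2500), ln(1 + 2500), and ln(1 + 5000). In the real pipeline, the points passed to the SNN step would be each cell's coordinates on the first 15 principal components.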
Genome_build: Customized reference transcriptomes were made by adding the tdTomato and eGFP gene sequences to the mm10 v1.3.1 mouse transcriptome and by adding eGFP to the GRCh38 v2.0.0 human transcriptome. These two customized transcriptomes are uploaded as supplementary files of this submission.
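A reporter such as eGFP or tdTomato is commonly added to a Cell Ranger reference as its own contig: one FASTA record plus one GTF exon line spanning the whole sequence, after which the edited files are passed to cellranger mkref. The Python sketch below generates both records; the function name, attribute values, and placeholder sequence are hypothetical, and the actual reporter sequences used here are in the supplementary reference files.

```python
"""Generate FASTA and GTF records that add a transgene as its own contig
(hypothetical helper for building a custom reference)."""


def transgene_records(name, sequence, line_width=60):
    """Return (FASTA text, GTF line) that add `sequence` as a contig called `name`."""
    seq = sequence.upper()
    wrapped = "\n".join(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    fasta = f">{name}\n{wrapped}\n"
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes.
    attributes = (f'gene_id "{name}"; transcript_id "{name}"; '
                  f'gene_name "{name}"; gene_biotype "protein_coding";')
    fields = [name, "transgene", "exon", "1", str(len(seq)), ".", "+", ".", attributes]
    return fasta, "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Placeholder only; substitute the real eGFP or tdTomato coding sequence.
    fasta, gtf = transgene_records("eGFP", "ACGT" * 20)
    print(fasta + gtf)
```

The two outputs would be appended to the genome FASTA and gene GTF, respectively, before running cellranger mkref with its --genome, --fasta, and --genes options. Reads from the reporter transcript are then counted like any other gene, which is how tdTomato and eGFP expression can appear in the UMI matrix.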
Supplementary_files_format_and_content: Three files output by the cellranger count pipeline (barcodes.tsv, genes.tsv, and matrix.mtx) form the raw UMI counts matrix. A Seurat-normalized counts matrix is also included. Cells and genes in this matrix have passed the full quality control and filtering process described above, and values are natural-log-transformed with a scale factor of 10000 using Seurat's NormalizeData function with default parameters. In each normalized counts matrix, each row represents a cell and each column a gene. Cell metadata is also stored in columns: column 1 is the cell barcode, column 2 the cell type assignment, and column 3 the cell-cycle phase assignment. For the aggregated libraries (E15_aggr, E18_aggr, and E18_endocrine_aggr), the raw UMI counts matrices are treated as raw data because they do not have corresponding BAM files. Therefore, only the normalized counts matrices are included here as processed data.
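The supplementary files can be inspected with standard-library Python. The sketch below assumes that genes.tsv has the Cell Ranger 2 layout (Ensembl ID, tab, gene symbol, no header) and that the normalized counts matrix is tab-delimited with a header row whose first three columns hold the metadata described above; both layouts are assumptions to check against the actual files. It also mirrors step 3 of the scater workflow by suffixing repeated symbols (Gene, Gene_1, Gene_2, ...) while keeping the first occurrence unsuffixed, as R's make.unique does; that convention is likewise an assumption. The function names are hypothetical.

```python
"""Read genes.tsv and the normalized counts matrix (assumed layouts)."""

import csv
import io


def unique_symbols(genes_tsv_text):
    """Read gene symbols from genes.tsv and suffix repeats: Gene, Gene_1, Gene_2, ..."""
    seen = {}
    names = []
    for line in genes_tsv_text.splitlines():
        if not line.strip():
            continue
        _ensembl_id, symbol = line.split("\t")[:2]
        n = seen.get(symbol, 0)
        names.append(symbol if n == 0 else f"{symbol}_{n}")
        seen[symbol] = n + 1
    return names


def read_normalized_matrix(text, n_meta=3):
    """Split the normalized table into gene names, per-cell metadata, and values.

    Assumes tab-delimited rows with a header; the first n_meta columns are
    metadata (barcode, cell type, phase) and the remaining columns are genes."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    meta_keys, genes = header[:n_meta], header[n_meta:]
    meta = [dict(zip(meta_keys, row[:n_meta])) for row in body]
    values = [[float(v) for v in row[n_meta:]] for row in body]
    return genes, meta, values
```

The .gz files must be decompressed first, or their text read with gzip.open(path, "rt"), before being passed to these functions.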
 
Submission date Sep 26, 2018
Last update date Dec 19, 2018
Contact name Francis C Lynn
E-mail(s) fclynn@interchange.ubc.ca
Phone 6048752000
Organization name University of British Columbia
Department Surgery
Lab Lynn
Street address 950 28th Ave W
City Vancouver
State/province BC
ZIP/Postal code V5Z 4H4
Country Canada
 
Platform ID GPL19057
Series (1)
GSE120522 Single cell transcriptome profiling of mouse and hESC-derived pancreatic progenitors
Relations
BioSample SAMN10133633
SRA SRX4741782

Supplementary file Size File type/resource
GSM3402512_E15_red_barcodes.tsv.gz 2.2 Mb TSV
GSM3402512_E15_red_genes.tsv.gz 212.8 Kb TSV
GSM3402512_E15_red_matrix.mtx.gz 55.7 Mb MTX
GSM3402512_E15_red_normalized_counts_matrix.txt.gz 20.8 Mb TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
