Requestor Affiliation Project Date of approval Request status Public Research Use Statement Technical Research Use Statement

Addepalli, Kanakadurga NIH Scientific Management of data for Cancer Data Service under the Cancer Research Data Commons Oct 06, 2022 approved I need access to these datasets to make the data submitted to CDS findable and accessible through NCI's Cloud Resources. The objective of this project is to manage the scientific data. The Cancer Data Service (CDS) is a scientific data repository under the Cancer Research Data Commons. CDS makes study data available and accessible in the cloud. To perform the scientific management of these data in the cloud, CDS needs access to the associated metadata. To design and develop technical, scientific, and administrative management capabilities for this data repository, we need access to these datasets and the SRA metadata. This list of datasets is expected to expand to include other data types in the future.

Addepalli, Kanakadurga NIH NCI Cancer Genomics Cloud Pilots/Resources Evaluation Feb 01, 2018 closed The NCI Cloud Resources cloud computing model allows scientists, clinicians, and researchers to access NCI-generated data and analyze these data with compute available on commercial cloud infrastructure. Co-locating data and compute eliminates the need to download and store petabyte-scale data and to maintain a local compute environment. Cloud computing has been used as a low-overhead, cost-effective alternative to high-performance computing, which might not be available at all research institutions. The objective of this project is to evaluate the NCI Cloud Resources, formerly known as the NCI Cancer Genomics Cloud Pilots. These resources were funded by NCI and developed by the Broad Institute, the Institute for Systems Biology, and Seven Bridges Genomics on commercial cloud platforms, Google Cloud Platform and Amazon Web Services (https://cbiit.cancer.gov/ncip/cloudresources). We assess and validate the technical and scientific capabilities, including analytical tools and workflows, that are continuously being implemented on these platforms, using NCI cancer datasets hosted on the platforms. Currently, these cloud platforms host datasets from the Genomic Data Commons, which include TCGA, TARGET, and CCLE, as well as CPTAC and TCIA. This list of datasets is expected to expand to include other data types in the future. As part of the evaluation, we collaborate with the cancer research community, both intramurally and extramurally, using the platforms to analyze data by constructing proof-of-concept workflows and pipelines with the above-mentioned NCI-generated data. Most of these collaborative projects work with controlled-access data in dbGaP. To date, we have developed containerized tools and workflows for whole exome sequencing (WES/WXS) variant and neoantigen prediction analysis, RNA-seq analysis, microbial sequencing (microbiome, metagenomics, pathogen) analysis, and data visualization. We collaborated with individual researchers to develop many of these tools and workflows. In addition, we offer education and training workshops for platform users periodically and on request.

AKERMAN, MARTIN ENVISAGENICS, INC. Identifying splicing changes in pediatric AML Jan 14, 2021 approved Immuno-oncology (IO) therapies offer great promise for effectively treating cancers such as AML; however, current tools for IO target identification focus on the consequences of genomic mutations and are limited to evaluating peptide presentation mechanisms such as antigen processing and major histocompatibility complex (MHC) binding. Envisagenics has developed SpliceIO, a software platform for neoantigen discovery that detects splicing errors using only RNA-seq data and can effectively identify both MHC-presented and MHC-independent tumor neoantigens.
Envisagenics will use SpliceIO to identify splicing-derived antigens from pediatric AML data. Acute leukemias are the most common malignancies in the US pediatric population and account for most cancer-related pediatric deaths. About 25% of children with leukemia have AML. However, progress in AML treatment has been limited by high relapse rates despite intensive therapy. Functional genomic studies have shown that the pathogenesis of AML is highly heterogeneous, with a low mutational burden. Also, 40% to 85% of patients with pre-AML dysplasia carry mutations in at least one of four splicing factors (U2AF35, ZRSR2, SRSF2, and SF3B1). The key role of splicing deregulation in AML transformation is currently being explored as a potential therapeutic avenue for the disease. The mechanisms by which splicing factor mutations lead to AML progression are still being investigated. However, evidence from other cancer types suggests that splicing deregulation results in the widespread expression of defective mRNAs. For example, in a previous publication, we showed that overexpression of the splicing factor SRSF1 leads to alternative splicing (AS) of the CASC4 gene, promoting breast cancer growth and inhibiting apoptosis. Whether there are AML-specific AS events that promote AML transformation and/or immune evasion remains unknown. We propose to analyze RNA-seq data from the TARGET AML dataset with SpliceIO™, Envisagenics’ proprietary software platform for the discovery of splicing-derived antigens from RNA-seq data. With SpliceIO, we expect to identify immunological biomarkers and potential targets for immunotherapy development in pediatric AML patients. Studies have shown differences in mutational and epigenetic patterns between adult and pediatric AML patients. To identify splicing changes specific to pediatric patients, we propose to compare the identified splicing changes with those identified in RNA-seq data from adult AML patients.
The deidentified adult AML RNA-seq dataset was generated by our collaborator Omar Abdel-Wahab at Memorial Sloan Kettering. The comparative analysis is not expected to create any additional risks to patients. Completion of these aims will not only improve our knowledge of splicing biology and the etiology of pediatric AML but could also lead to the development of more effective treatments and diagnostic or prognostic markers for childhood cancers. SpliceIO was developed using elements of the company’s flagship software for AS analysis, SpliceCore, in combination with newly developed machine learning algorithms and open-source components. SpliceCore was launched in 2015 with NIH funding (SBIR Phase I & II), and its efficacy has been validated in publications (Anczukow et al. Mol. Cell 2015; Arun et al. Genes & Dev 2016). SpliceIO uses machine learning to estimate, for every exon trio, the likelihood that it is translated into a protein, the likelihood that the resulting protein sequences are presented on the cell surface, and their potential antigenicity. It detects a unique repertoire of neoantigens by considering the assembly of tumor-specific splicing junctions that span exons comprising distinct AS sequences. The feasibility of SpliceIO was confirmed by preliminary analysis of breast cancer RNA-seq data from The Cancer Genome Atlas (TCGA) and protein-level assessment with tandem mass spectrometry (MS/MS) data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). We will use SpliceIO to identify AS-derived antigens from pediatric AML data. Specific tasks for this project are:
1. Assembly of a de novo reference transcriptome using TARGET AML RNA-seq data
2. Mapping and analysis of reads from the RNA-seq dataset
3. Identification of consistent and reproducible splicing changes in pediatric AML samples compared to normal samples
4. Use of SpliceIO to predict transcript viability and the resulting peptide ectodomain topology of AS isoforms
5. Comparison with splicing changes identified in adult AML samples to determine similarities and differences in splicing changes between pediatric and adult AML patients

Al-Lazikani, Bissan UNIVERSITY OF LONDON INST OF CANCER RES Novel Cancer Drug Discovery for paediatrics Apr 17, 2018 expired When a child relapses with cancer, there are often no treatment options beyond those that have already failed. We will compare the sequencing data generated by the TARGET consortium with our own data from UK children with relapsed cancer, in order to better decide which treatment or clinical trial will be most suitable for each patient. In addition, we will try to understand which features are particular to relapsed cancers. Finally, we will try to identify new potential drug targets and drugs, using computational approaches that draw on state-of-the-art knowledge in chemistry, pharmacology, and biology. Because we are an academic institution, all our findings will be made available for the greater benefit of the cancer research and drug discovery community, hopefully leading to more accurate diagnosis, early warning signatures for relapse, and identification of new potential drug targets and medicines that may be suitable for rare childhood cancers. SMPaed is a stratified medicine program offering multi-omic profiling to UK patients with relapsed childhood solid tumors. The objective is to return results to treating clinicians within 28 days, to allow referral of patients to relevant clinical trials or to suggest suitable targeted therapies. Our primary goal for the use of TARGET data is to put our observed patient data into context, to ascertain mutations and diagnoses, and to assess their rarity. In addition, we will compare patterns of mutations and expression profiles between primary and relapsed patients.
Finally, we intend to use novel computational methods to identify and prioritize novel cancer drug targets that expand the traditional druggable space for pediatric solid tumors, and to suggest candidates for drug repurposing for currently untreatable relapsed pediatric tumors. As an integrative biology and chemistry team, we mine multi-omics data to identify cancer-causing druggable targets. Because we are embedded within the Cancer Research UK Cancer Therapeutics Unit at the ICR, a world-renowned academic drug discovery organization with eight clinical candidates currently in clinical trials and 17 drugs in preclinical development, we would be able to drive candidates forward. We will analyze the TARGET data and our own datasets using in-house software pipelines to enable their integration with current ICR internal systems. Deregulated cancer genes that pass our quality assessments, together with their signaling partners, will be analyzed using our chemogenomics systems to a) rank them based on alternative evidence of druggability and chemical tractability, using chemical, 3D-structure, and network druggability criteria; and b) identify likely chemical tools for laboratory testing or potential drugs for repurposing. Data will be analysed securely using our in-house pipelines. Use of TARGET data in this way will not create any additional risks to study participants and will be consistent with the Use Restrictions for the requested datasets.

Al-Shahrour, Fátima SPANISH NATIONAL CANCER CENTER Data validation and drug priorization for improvement of pediatric T-ALL clinical management Jan 19, 2021 closed T-cell acute lymphoblastic leukemia/lymphoma (T-ALL/LBL) is an aggressive type of hematological cancer that mainly affects children and adolescent males.
Current cure rates are high, but two main problems can arise during the clinical management of patients: chemotherapy-induced toxicity, which often leads to treatment discontinuation or long-term side effects; and recurrence of the tumor after remission (relapse), a dire scenario because cure rates in such cases fall to ~7%. Thus, more effective ways to treat these patients are urgently needed. Our team's objectives are to suggest alternative tailored treatments and to discover novel alterations that help guide therapeutic decisions for pediatric T-ALL patients, in particular for patients suffering toxicity-related side effects or not achieving complete remission with current protocols. All results from the analyses undertaken in this project will be published in scientific journals for the benefit of the broad pediatric cancer research community. Despite detailed knowledge of the landscape of alterations accompanying pediatric T-ALL, which has even led to a more refined classification of the disease, these patients still receive a standard treatment protocol based on high-dose multi-agent chemotherapy, sometimes followed by hematopoietic stem cell (HSC) transplantation. Although very successful, this therapeutic approach is not infallible. Treatment toxicity leads to discontinuation or to adverse long-term side effects in the ever-growing population of pediatric survivors. Around 15% of pediatric T-ALL patients relapse, and current protocols are largely futile in such cases, as cure rates collapse to ~7%. Our aim is to reduce treatment-derived toxicity and to improve the clinical management of pediatric T-ALL patients, in particular those not achieving complete remission with current therapies. We propose to do so from two perspectives: 1) Personalized precision medicine: omics data will be integrated as input for the drug prioritization algorithm PanDrugs. This will be done in collaboration with Dr. Al-Shahrour (CNIO) and Dr.
Fernández-Navarro (ISCIII), following previously described approaches (1–3). The availability of matched diagnosis, remission, and relapse samples allows intra-tumor heterogeneity to be integrated as an additional layer of information for drug prescription. We expect to suggest alternative tailored treatments for pediatric T-ALL patients who suffer toxicity, do not respond, or relapse, as well as combinations potentially capable of targeting clonal diversity. 2) Improved therapeutic stratification: omics data will be searched for novel alterations affecting non-coding genes (miRNAs, lncRNAs, circRNAs, etc.) that may represent new biomarkers with prognostic or therapeutic value. Exploring these new biomarkers in future pediatric patients might also improve the success rate of their clinical management. All results from the analyses undertaken in this project will be published in scientific journals for the benefit of the broad pediatric cancer research community.
References:
1. doi: 10.1186/s12885-019-6209-9
2. doi: 10.3390/cancers11091361
3. doi: 10.1186/s13073-018-0546-1

Alexander, Thomas UNIV OF NORTH CAROLINA CHAPEL HILL Germline genetic deletion predisposing to pediatric acute lymphoblastic leukemia Jun 19, 2018 expired Cancer is the leading cause of mortality in children beyond infancy in the United States. Acute lymphoblastic leukemia is the most common malignancy in children. While most cases of leukemia are sporadic, researchers are identifying a growing number of inherited genetic mutations that increase an individual's chance of developing malignancy. We will evaluate DNA from normal cells of patients who developed ALL in order to continue identifying genetic changes that could increase a patient's risk of developing acute lymphoblastic leukemia.
New insights can better inform genetic counselors deciding which genetic tests to order for families with a predisposition to leukemia, and can subsequently influence counseling and screening decisions. Cancer is the leading cause of mortality in children beyond infancy in the United States. Acute lymphoblastic leukemia is the most common malignancy in children. While most cases of leukemia in children are sporadic, researchers are identifying a growing number of inherited genetic mutations that increase an individual's chance of developing malignancy. Eight percent of children with cancer have a known inherited genetic predisposition, and that percentage is growing with ongoing research. Our findings may have future implications for genetic testing, diagnosis, and clinical practice. For example, they could help genetic counselors decide which genetic tests to order for families with a predisposition to leukemia, which can subsequently influence counseling and screening decisions. Most evaluations of SNP array data have focused on somatic mutations, using germline DNA simply as a control, or on germline single nucleotide polymorphisms. We propose to analyze SNP array data specifically to assess germline deletions that could predispose to acute lymphoblastic leukemia in the pediatric population. We will start with a focus on the CDKN2A/B locus. CDKN2A/B is a gene locus that is commonly deleted in malignant cells, which contributes to uncontrolled growth. However, it is extremely uncommon for this region to be deleted in normal tissues. While germline deletions in this region have been described in families with melanoma or other solid tumors, they have not been implicated in predisposition to leukemia. Our proposed project seeks to demonstrate for the first time that germline CDKN2A/B deletion can lead to a leukemia predisposition syndrome.
We will then systematically evaluate germline deletions in other genomic regions that are recurrently deleted in pediatric acute lymphoblastic leukemia. We do not plan to combine the requested dataset with other datasets outside of dbGaP.

Alexandrov, Ludmil UNIVERSITY OF CALIFORNIA, SAN DIEGO Mapping the trajectories of childhood cancer evolution Dec 21, 2023 approved This project will compare the molecular events found in pediatric cancer samples with those found in normal tissues, precancer samples, and metastases. Our overall goal is to understand the molecular differences across the different stages of pediatric (pre-)cancer development and to use this knowledge to suggest potential cancer prevention and/or treatment strategies. The genomes of pediatric cancer cells are peppered with somatic mutations. These mutations start accumulating in the normal cell lineage from which the cancer arises and continue throughout all phases of clonal expansion: macroscopic somatic growths, precancers/benign expansions, invasive cancers, and metastasis. The malignant progression from a normal cell through macroscopic clone, precancer, invasive cancer, and metastasis is often marked by a series of somatic mutations that can be traced, characterized, and genomically studied. Elucidating the mutational events throughout these stages will help us understand the mutational processes driving clonal expansion and may provide avenues for effective and timely interventions that can prevent or treat cancer. In this project, we will re-analyze existing pediatric cancer genomics data and compare the molecular events in these cancers with those found in macroscopic somatic clones, precancerous lesions, and metastases. The overarching goal of this project is to understand the evolutionary events in pediatric cancer in order to enable more effective treatments, diagnostic tests, and prognostic markers for childhood cancers.
The aims of this research project are to:
1) Consistently call somatic mutations across multiple pediatric cancers using DNA and RNA data
2) Identify mutational signatures and their activities across the cancers
3) Identify the microbiome signatures found in different pediatric cancers
4) Compare molecular events between primary cancers and macroscopic somatic clones, precancerous lesions, or metastases from other pediatric datasets
5) Link mutational signatures with other genomic landscape features of pediatric cancers: genome topography, driver mutations, and MHC-I/MHC-II mutability
To achieve these goals, data downloaded from dbGaP will be combined with in-house generated cancer data as well as with data from other sources (e.g., EGA). All data will be analyzed together to increase our statistical power for detecting mutational events. There is no additional risk to any participant, as the analysis will focus on identifying molecular events involved in somatic mutagenesis. We believe that the proposed analyses are aligned with the intended use of the requested datasets. We are committed to sharing the results of these analyses with the scientific community through peer-reviewed publications in accordance with NIH data use policies.

Alizadeh, Ash STANFORD UNIVERSITY High resolution genetic analysis of human malignancies Apr 21, 2011 approved We have used the cancer genome data available in dbGaP to investigate the genetic causes of cancer and to identify new candidate genes involved in the genesis of lymphomas and other cancers. Our objective for research use of dbGaP data is to study alterations in cancer genomes sequenced as part of NIH-funded activities. Our study involves developing tools that link heritable and somatic genotypic variants to clinical features, and testing and validating findings through recurrence estimates and functional assays. We will publish and share our results.
We aim to define and characterize acquired genetic changes in cancer, with a focus on diverse tumors. To better understand these tumors, we will compare heritable and somatic variation across diverse tumors. Accordingly, dbGaP datasets assessing genetic and epigenetic features (expression, copy number, sequencing, methylation) will be analyzed, primarily serving as control datasets against which to compare results obtained for various tumors. Data integration from several studies is planned but is not expected to pose additional risks to participants. Genomic regions identified in tumors will be examined in dbGaP data to test for recurrent somatic or germline changes in other cancers.
-Use of "Therapeutically Applicable Research to Generate Effective Treatments" will be limited to a subproject on pediatric cases. Sites of recurrent DNA breakpoints identified by sequencing will be examined in both copy number datasets and NGS datasets for evidence of involvement in other cancers. Germline polymorphisms will be used to analyze sequencing results from tumors for which no germline DNA is available.
-Use of "NCI Cancer Genome Characterization Initiative (CGCI)" will focus on research on the biology, causes, treatment, and late therapy complications of lymphomas, and will include methodological development (software/algorithms).
-Use of "AML Sequencing Project" and "Acquired Copy Number Alterations in Adult Acute Myeloid Leukemia Genomes" will specifically focus on hematologic diseases and related conditions, with special attention to clonal hematopoiesis.
-Use of "Expressed Pseudogenes in the Translational Landscape of Human Cancers" will be for methodological development (software/algorithms) for cancer research, with a focus on the distinction between coding genes and pseudogenes.
-Use of "Follicular Lymphoma (phs000729.v1.p1)" will be related to lymphoma and will include methods development research (software/algorithms) for cancer research, with a focus on distinctions between lymphoma subtypes in their genomic profiles.
-Use of "Genentech whole-genome sequencing of a non-small cell lung carcinoma (phs000299.v2.p1)" will be limited to health/medical/biomedical purposes and will not include the study of population origins or ancestry. Use will include methods development research (software/algorithms).
-Use of "Genetic Heterogeneity of Diffuse Large B-Cell Lymphoma (phs000573.v1.p1)" will be limited to research purposes only and will include methods development research (software/algorithms).
-Use of "Genome-wide Analysis of Chronic Lymphocytic Leukemia (phs000364.v2.p1)" will be cancer-related, and we will make results of our studies using the data available to the larger scientific community. Use will include methods development research (software/algorithms).
-Use of "Genome-wide Analysis of Lymphoma (phs000328.v2.p1)" will be cancer-related, and we will make results of our studies using the data available to the larger scientific community. Use will include methods development research (software/algorithms).
-Use of "Genome-wide analysis of Splenic Marginal Zone Lymphoma (phs000502.v1.p1)" will be limited to scientific research relevant to the etiology, prevention, treatment, and late complications of cancer therapy, and to the development of analytical methods and software. We plan to publish and broadly share findings from our studies with the scientific community.
-Use of "Whole Exome Sequencing of Diffuse Large B-Cell Lymphoma (phs000450.v3.p1)" will be limited to Stanford, a not-for-profit organization, and will include methods development research (software/algorithms) related to blood and lymph disorders.
-Use of "The Genetic Landscape of Mutations in Burkitt Lymphoma (phs000562.v1.p1)" will be limited to research purposes only.
-Use of "Sequencing Follicular Lymphoma (phs001229/DS-LYM)" will be limited to research on lymphomas.
-Data from phs0001417 (breast and prostate) will only be used in research consistent with the DUL and will not be combined with other data for other phenotypes. Use of all other datasets will comply with the corresponding DULs.

Allan, James UNIVERSITY OF NEWCASTLE The mutational landscape of lineage switched MLL rearranged leukaemias Feb 26, 2018 closed Damaging changes (mutations) in the mixed lineage leukemia (MLL) gene are common in infant and childhood leukemia. In some cases, mutation of the MLL gene causes leukemia in one type of cell, B cells. In other cases, mutation of the MLL gene causes leukemia in another cell type, myeloid cells. However, in very rare cases, mutation of MLL can cause leukemia in one cell type that then relapses (comes back after therapy) in another cell type. These so-called “lineage switch leukemias” are very hard to treat, and there is an urgent need to understand what causes them and which mutations (in addition to MLL) might be involved. The aim of our study is to better understand lineage switch leukaemias so that we can develop novel therapies and improve patient outcomes. The t(4;11)(q21;q23) translocation, resulting in the MLL-AF4 fusion gene, is prototypical of high-risk infant acute lymphoblastic leukemia (iALL). Uniquely among more than 80 MLL fusions, it is specifically associated with the development of pro-B cell ALL (Meyer et al 2017). However, modelling the disease has proven challenging, and key questions remain about the cellular origins, predisposing genetic/epigenetic environment, and lineage determination of MLL-AF4 pro-B ALL.
Despite the strong association with lymphoid presentation, MLL-AF4 rearranged leukemias have an intriguing characteristic: lineage-switched relapses. We have investigated the mutational landscape of lineage-switched MLL-AF4 acute leukemias using exome, genome, and RNA sequencing, but now need to compare these data with mutation, transcription, and epigenetic data from a wider spectrum of MLL-rearranged leukemias. To do this, we request access to DNA, RNA, and methylation sequencing data available through dbGaP and derived from MLL-rearranged acute leukemias (ALL and AML), as well as from MLL-germline leukemias to act as comparator cases. Specifically, we have already characterised the mutational landscape of lineage-switched MLL-rearranged infant/childhood leukemia and now wish to determine whether the identified mutations are present in non-switched MLL-rearranged leukemias (including both ALL and AML). We do not propose to combine raw datasets. Rather, we will analyse lineage-switched MLL-rearranged leukemia, non-lineage-switched MLL-rearranged leukemia, and MLL-germline leukemia separately. Lineage switch leukemias are associated with a very poor outcome. The aim of our study is to identify mutations and dysregulated cell signalling pathways that could be exploited to develop novel therapies and improve patient outcomes.

Alter, Orly UNIVERSITY OF UTAH Multi-Tensor Decompositions for Personalized Medicine Sep 29, 2017 approved We request continued access to the TCGA, TARGET, AACR Project GENIE, and dbGaP Kids First Study data, and now also to the Childhood Cancer Data Initiative (CCDI) data and the Diffuse Intrinsic Pontine Gliomas (DIPGs) data, in order to continue to translate to the clinic our artificial intelligence and machine learning (AI/ML), which is uniquely suited to personalized medicine and overcomes the limitations of typical AI/ML in genetic data.
Our algorithms discover interpretable and actionable predictors, applicable to the general population, from as few as 50–100 patients, and our predictors outperform all others where others exist. We experimentally validated a genome-wide pattern in brain tumors as the most accurate and precise predictor of survival and response to treatment. We request continued access to the TCGA, TARGET, AACR Project GENIE, and dbGaP Kids First Study data, and now also to the Childhood Cancer Data Initiative (CCDI) data and the Diffuse Intrinsic Pontine Gliomas (DIPGs) data, in order to continue to translate to the clinic our artificial intelligence and machine learning (AI/ML) for personalized medicine. We will continue to publish or otherwise broadly share any findings with the scientific community. As the inventors of the "eigengene," we formulate comparative spectral decompositions, physics-inspired multi-tensor generalizations of the singular value decomposition, to (i) compare and integrate any data types, of any number and dimensions, and (ii) scale with data sizes. Our models (iii) are interpretable in terms of known biology and batch effects and (iv) correctly predict previously unknown mechanisms. Our prospective and retrospective validation of a genome-wide pattern of DNA copy-number alterations in brain tumors proved that the models discover predictors of survival and response to treatment that are (v) the most accurate and precise, (vi) clinically actionable in the general population based on as few as 50–100 patients, and (vii) consistent across studies and over time. We discovered this pattern, and patterns in lung, nerve, ovarian, and uterine tumors, in public data. Such alterations were recognized in cancer, yet all other attempts to associate them with outcome failed, establishing that our AI/ML is uniquely suited to personalized medicine.

Alves Castro, Mauro UNIVERSIDADE FEDERAL DO PARANA Gene regulatory networks associated with cancer risk genotypes Jul 11, 2024 approved Just as facial features can be typical of a family, so can the risk of cancer (or of other conditions such as diabetes or heart disease). This similarity in risk among members of a family is caused by the inheritance of small variations at hundreds of different loci. Traditionally, such variations have been studied one by one, which is slow and difficult. In a study published in Nature Genetics (Castro et al., Nature Genetics, 48(1):12-21, 2016), our research group demonstrated how the combined effect of multiple genetic variants contributes to breast cancer risk. This study catalysed the development of new approaches that improved our understanding of the regulation of gene expression networks. Our approach of identifying underlying regulatory networks is applicable to many other complex diseases. To this end, we request access to TCGA, GDC, CCG GDAN, ALCHEMIST, DLBCL, MILD, HCMI, TARGET, and MMRF-COMMPASS data to enable us to generate new insights into the genetics of cancer. Over the past eight years we have shown that gene regulatory networks (GRNs) can be used as a framework to understand the effects of variation at breast cancer GWAS loci. We have used networks based on transcription factors (TFs) and their target genes, or ‘regulons’ (Castro et al., Nature Genetics, 48(1):12-21, 2016). Many genes are regulated by more than one TF and so belong to more than one regulon; thus, the regulons overlap, forming a network. We showed that genes that were eQTLs at the then-confirmed breast cancer GWAS loci are enriched in regulons centred on TFs known to be involved in oestrogen signalling. This regulon-based analysis has provided insights into the effects of inherited genetic variation, showing how multiple variants can act in combination to influence high-level cellular regulatory processes.
We now wish to access cancer gene expression data coupled to genotyping data to be able to validate our findings in independent data sets. Our approach of identifying underlying regulatory networks is broadly applicable, and we have extended it to inform on differences between cancer subtypes using multiple TCGA analytical platforms (Robertson et al., Cell, 171(3):540-556, 2017; Corces et al., Science 362(6413):eaav1898). I request renewal of our access to CGEMS, TCGA, GDC, and CCG GDAN expression data and genotyping data to enable us to generate new insights into the genetics of cancer, comparing regulon patterns in individuals carrying risk versus non-risk alleles. My group is currently contributing to the following TCGA Analysis Working Groups (AWGs): "GDAN ATAC-Seq AWG", "TCGA TGCT AWG", "CCG TMP AWG" and "CCG TMP: LIHC/CHOL subgroup". We are also contributing to "ALCHEMIST AWG", "CTSP DLBCL AWG", "CCG MILD AWG", and "HCMI AWG", which are not under the same TCGA accession number. Therefore, I also request renewal of access to ALCHEMIST, CTSP DLBCL, CCG MILD, and HCMI controlled-access data. We are still in the process of integrating data and running analyses for these AWGs. For the current renewal application, we also request access to two additional dbGaP datasets: 1) We request access to TARGET-NBL (phs000218) to investigate N-MYC regulation in neuroblastomas. N-MYC amplification exhibits age-dependent expression profiles in sporadic neuroblastoma, influencing disease aggressiveness and patient outcomes. While it is well established that specific germline variations are more prevalent in different ethnic groups, the extent to which the combined effect of N-MYC amplification, age, and ethnicity influences susceptibility to neuroblastoma remains unclear. Specifically, we aim to use our GRN framework to investigate ethnicity- and age-dependent N-MYC regulation in neuroblastoma.
We plan to analyze this dataset independently, without combining it with any other datasets outside of dbGaP. 2) We request access to MMRF-COMMPASS (phs000748) to investigate B-cell receptor (BCR) diversity in multiple myeloma samples. In multiple myeloma, abnormal plasma cells proliferate in the bone marrow, leading to the production of large amounts of a single type of immunoglobulin, in the form of a monoclonal protein. Understanding the diversity and structure of immunoglobulins in multiple myeloma is crucial for developing targeted treatments and improving patient outcomes. We want to assess raw RNA-seq data from plasma cell-enriched samples in order to disentangle the BCR light and heavy immunoglobulin chains. Specifically, we aim to develop an algorithm to resolve immunoglobulin light-heavy chain pairing from bulk tissues, using multiple myeloma samples as “ground truth” training data.
Amankwah, Ernest JOHNS HOPKINS UNIVERSITY Identification of novel microRNA based predictors for early relapse of pediatric acute lymphoblastic leukemia Sep23, 2021 closed Early relapse in pediatric acute lymphoblastic leukemia (ALL) is a significant public health concern, especially in Hispanics (who have a disproportionately high incidence of relapse), due to the lack of optimal biomarkers for predicting risk at the time of diagnosis. The outcome after relapse is generally poor, with a dismal survival rate of approximately 20% compared to 85% for non-relapse patients. Previous studies, mostly of non-Hispanic White patients, have suggested genetic and epigenetic markers for pediatric ALL relapse. Knowledge of genetic and epigenetic markers of pediatric ALL relapse is limited among Hispanics, who have a higher incidence of relapse. Studying Hispanic patients would likely identify novel biomarkers that can be used at the time of diagnosis to predict which patients are at high risk of developing early relapse, to facilitate individualized clinical treatment decision-making.
Objective: Previous studies, mostly of non-Hispanic White patients, have suggested genetic and epigenetic markers for pediatric acute lymphoblastic leukemia (ALL) relapse. Studying Hispanic patients, who have a higher incidence of relapse, would likely identify novel genetic and epigenetic markers of pediatric ALL relapse. The aims of the proposed project are: Aim 1. To identify miRNAs and their potential target genes associated with pediatric ALL early relapse among Hispanics. Aim 2. To identify somatic mutations in genes associated with pediatric ALL early relapse among Hispanics. Aim 3. To identify methylation signatures associated with pediatric ALL early relapse among Hispanics. Study design and analysis plan: The study population will be Hispanic pediatric ALL patients. We will determine the association between genetic and epigenetic markers and early relapse by comparing miRNA and gene expression, as well as mutation and methylation signatures, of patients who developed early relapse, defined as relapse within three years of diagnosis, with those of patients in remission (>=4 years after diagnosis). Differential miRNA and gene expression, mutation signatures, and methylation signatures will be compared between early relapse and remission samples. Similar analyses will be performed in non-Hispanic Whites, and markers identified in Hispanics will be compared with those in non-Hispanics. The Principal Investigator is in good standing at Johns Hopkins and is permitted to conduct independent research. The dataset will be used appropriately, for research only, solely for the project described in the research use statement, and according to any limitations on the use of the data. The data will not be made available to any person not named on this application, and the data will not be combined with any other datasets. Collaboration: The analysis will be performed in collaboration with Michael Considine, MS and Leslie Cope, PhD. Mr. Considine is a senior biostatistician and Dr. Cope is an Associate Professor in the Division of Biostatistics and Bioinformatics at Johns Hopkins School of Medicine.
AMATRUDA, JAMES CHILDREN'S HOSPITAL OF LOS ANGELES Molecular Basis of Childhood Solid Tumors Nov18, 2021 approved Most cancer in children is treated by combinations of chemotherapy, radiation therapy and surgery. While these treatments can be successful, they can also cause severe and long-lasting side effects. Cancers in children occur when mutations (changes) in the cell's DNA lead to uncontrolled growth of cells. Understanding the exact nature of these mutations is the key to developing new treatments that are more effective and less toxic. By studying the DNA mutations that have been described in studies submitted to dbGaP, we can develop new potential targets for improved therapies. The objective of our study is to identify actionable driver genes in solid malignancies of children. We have developed a novel algorithm to integrate data from different platforms (e.g. next generation sequencing, SNP array and expression profiling data). We used this method with our own locally acquired Wilms tumor sequence data (see PMID 25190313) to identify potential drivers. During the course of the study we accessed TARGET sequencing data for childhood Wilms tumors and used the dataset to validate our findings. Specifically, we identified a set of genes (including DICER1, NKAIN, PLAG1, TMEM87B and OTUD4) that are commonly upregulated in tumors with DICER1 or DROSHA mutations. We have also carried out analysis of other solid pediatric tumors, including rhabdomyosarcoma. We developed a novel algorithm, called "iExCN", that integrates copy-number and gene expression data to identify drivers of rhabdomyosarcoma development (see PMID 29972784).
AMATRUDA, JAMES UT SOUTHWESTERN MEDICAL CENTER Molecular Basis of Childhood Solid Tumors Apr02, 2015 closed Current chemotherapy regimens fail to cure all children with cancer, and those who are cured face severe lifelong adverse health effects of chemotherapy. To develop new treatments that are more effective and less toxic, we are analyzing the genomes of childhood cancers to understand more precisely how mutations and other changes in the genome cause the tumors to grow, and to identify targets for more effective therapies. We have generated high-resolution data interrogating the genomes of childhood cancer types, including germ cell tumors and kidney cancers. We would like to compare our results with findings by other groups and in other tumor types, to identify the most important mutations that cause these cancers to develop. The objective of our study is to identify actionable driver genes in solid malignancies of children. We have developed a novel algorithm to integrate data from different platforms (e.g. next generation sequencing, SNP array and expression profiling data). We used this method with our own locally acquired Wilms tumor sequence data (see PMID 25190313) to identify potential drivers. During the course of the study we accessed TARGET sequencing data for childhood Wilms tumors and used the dataset to validate our findings. Specifically, we identified a set of genes (including DICER1, NKAIN, PLAG1, TMEM87B and OTUD4) that are commonly upregulated in tumors with DICER1 or DROSHA mutations. We have also carried out analysis of other solid pediatric tumors, including rhabdomyosarcoma. We developed a novel algorithm, called "iExCN", that integrates copy-number and gene expression data to identify drivers of rhabdomyosarcoma development (see PMID 29972784). We will study whether other pediatric tumors share similar driving mechanisms.
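The premise behind integrating copy number with expression, that a copy-number change matters when it moves the expression of the genes it spans, can be illustrated with a far simpler per-gene screen. The Python sketch below is not iExCN, which these requests describe as Bayesian; it substitutes a plain Pearson correlation between copy number and expression across tumors and keeps genes above a cutoff. The function names, the 0.5 cutoff, and the gene labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined for a constant vector; treat as no association
    return sxy / math.sqrt(sxx * syy)


def copy_number_driven_genes(copy_number, expression, min_r=0.5):
    """Genes whose copy number and expression are positively correlated across tumors.

    copy_number, expression: dict of gene -> list of per-tumor values, with tumors
    in the same order in both dicts. Returns {gene: r}, strongest first.
    """
    hits = {}
    for gene, cn in copy_number.items():
        if gene in expression:
            r = pearson(cn, expression[gene])
            if r >= min_r:
                hits[gene] = r
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical four-tumor example.
cn = {"GENE_A": [2, 3, 4, 5], "GENE_B": [2, 2, 2, 2], "GENE_C": [2, 3, 4, 5]}
ex = {"GENE_A": [10, 15, 20, 25], "GENE_B": [5, 9, 4, 7], "GENE_C": [25, 20, 15, 10]}
result = copy_number_driven_genes(cn, ex)
print(result)
```

On real data, a screen like this would also need a multiple-testing correction across thousands of genes; that step is omitted here for brevity.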
Amos, Christopher BAYLOR COLLEGE OF MEDICINE Genetic Analysis of Lung Cancer Susceptibility Nov13, 2023 rejected We are seeking to identify genetic factors that influence the risk of developing lung cancer. We anticipate that the genetic factors that contribute to risk may differ according to the subtype of lung cancer and may also vary by sex, age at onset, and smoking status. The purpose of our study is to perform a meta-analysis of genome-wide scans and sequencing studies of lung cancer overall and within specific subgroups. In particular, we are interested in identifying genetic factors according to histological subtype, age of onset, stage, family history, sex, and smoking status. The EAGLE, PLCO, and never-smoker studies, along with the large studies of African-Americans and Asians that have been collected, will be aggregated with ongoing studies from the Oncoarray, which we led, to identify novel susceptibility loci. This analysis will continue to identify genetic factors influencing lung cancer risk overall and will help generate additional hypotheses that we will follow up within the International Lung Cancer Consortium (ILCCO). We intend to publish or otherwise broadly share findings from our studies with the scientific community. We will also publish the findings from the most significant SNPs identified through aggregation of data sets and will annotate findings with functional information. Finally, we intend to perform new imputations of the downloaded data using the results made available on the Michigan server for the TopMed program. Squamous lung cancer contains a significant inflammatory component and is strongly associated with HLA subtypes. In particular, the 8.1 haplotype, which influences risk for several autoimmune diseases, also influences risk for squamous lung cancer. Therefore, we would also like to use data from the Type 1 Diabetes Genetics Consortium to impute HLA genotypes.
Andrade Ortiz, Jorge GILEAD SCIENCES, INC. Identification of CAR-T antigens for pediatric Acute Myeloid Leukemia Dec12, 2018 approved Recent successes in CAR-T therapy are extremely promising because of their sustained improvement in cancer prognosis and outright cures. However, the technology is limited by target antigens that may also be expressed on normal cell types. By comparing the AML dataset with normal cell type datasets from GEO and BLUEPRINT, we plan to identify novel antigens to use in immunotherapy. Research Use Statement: We have been involved in developing novel chimeric antigen receptor T-cell (CAR-T) therapies and TCR-T therapies for Acute Myeloid Leukemia in the pediatric population. Unlike current methods, our approach involves a novel CAR-T “switch” system, which functions only upon administration of a drug. We believe that this approach will enable T-cell therapy to be more effective with reduced side effects. Our system potentially allows us to extend the treatment to cancers, such as AML, that might not have unique antigens. To this end, we plan to explore the following RNA-seq datasets: phs000218 (subset phs000465) TARGET: Acute Myeloid Leukemia (AML) (AML pediatric patients); phs000549 Clonal Evolution of Pre-Leukemic Hematopoietic Stem Cells Precedes Human Acute Myeloid Leukemia (6). We plan to combine and process RNA expression from these datasets and other publicly available normal blood cell lineage datasets, such as BLUEPRINT (http://dcc.blueprint-epigenome.eu) and studies available on GEO, in a consistent fashion, by aligning the reads to the human genome and then counting their occurrence per genetic feature. We plan to use cloud computing for our processing pipelines. We plan to further verify potential antigens using independent proteomics datasets such as ProteomicsDB (processed separately), and to validate them using laboratory methods. We do not believe that this kind of analysis poses any risk to the participants.
We hope that these discoveries will lead to rapid application of our technology in the clinic. Gilead as an organization is committed to publishing results of analyses we undertake that reveal scientific insights.
Andrade Ortiz, Jorge GILEAD SCIENCES, INC. CAR-T target discovery for B-ALL Mar19, 2020 closed Recent successes in CAR-T therapy are extremely promising because of their sustained improvement in cancer prognosis and outright cures. However, the technology is limited by target antigens that may also be expressed on normal cell types. By comparing the ALL dataset with normal cell type datasets from GEO and BLUEPRINT, we plan to identify novel antigens to use in immunotherapy. Additionally, we plan to mine the data for genes and pathways that may help us identify drug or T-cell enhancement combinations to improve the efficacy of our therapy. Research Use Statement: We have been involved in developing novel chimeric antigen receptor T-cell (CAR-T) therapies for B-ALL in the pediatric population. To identify antigens for CAR-T and potential CAR-T combination therapies, we plan to explore the following RNA-seq datasets: phs000218 (subsets phs000464 and phs000463) TARGET: Acute Lymphoblastic Leukemia (ALL). We plan to combine and process RNA expression from this dataset and other publicly available normal blood cell lineage datasets, such as BLUEPRINT (http://dcc.blueprint-epigenome.eu) and studies available on GEO, in a consistent fashion, by aligning the reads to the human genome and then counting their occurrence per genetic feature. We plan to identify genes and pathways that are differentially expressed in B-ALL versus normal tissues to guide our CAR-T therapy design. We plan to use cloud computing for our processing pipelines. We plan to further verify potential antigens and combination therapy targets using independent proteomics datasets such as ProteomicsDB (processed separately), and to validate them using laboratory methods.
We do not believe that this kind of analysis poses any risk to the participants. Additionally, we plan to interrogate the dataset interactively, correlating the RNA expression data with survival, cell composition, and other clinical phenotypes in order to guide improvements in CAR-T therapy. We hope that these discoveries will lead to rapid application of our technology in the clinic. Gilead as an organization is committed to publishing results of analyses we undertake that reveal scientific insights.
Andrade Ortiz, Jorge UNIVERSITY OF CHICAGO Genetic risk of pediatric cancers Mar22, 2017 closed Using novel genetic and genomic analyses, we have previously identified variants that are likely causal for increased cancer risk in both inherited and sporadic cancers. Here, we will continue and expand upon this work. Additionally, we will identify sets of variants and genes that distinguish inherited risk between pediatric and adult cancers, and we will test their association with clinical factors. We are interested in exploring the genetic contribution to pediatric cancers. To further our investigations, we are requesting all TARGET datasets. We will use the data to perform meta-analyses to study the prevalence of germline mutations across all cancers. We will integrate genomic and clinical data in order to evaluate inherited susceptibility to cancer development and to discover candidate causal germline variants in pediatric cancers. We will explore the differences between the mutational spectrum of these pediatric cancers and that of adult cancers, as well as the 1000 Genomes Project. Lastly, we will follow up the top candidate variants and genes by verifying the presence of those mutations in corresponding samples using deep sequencing data from targeted capture panels when available (e.g. TARGET AML).
Our study will provide new insights, on a large -omics scale, into the potentially different genetic risk of cancer development in children, as well as guidance for treatment decision making.
ANNA, POETSCH DRESDEN UNIVERSITY OF TECHNOLOGY Genome specificity of Mutagenesis Jul29, 2022 rejected Changes in cancer genomes follow very distinct mechanisms that determine where the mutations happen. We plan to use cancer genomes from different tissues to understand which mechanisms lead to changes in the genome at the different locations. Different tumour types are characterized by very distinct mutagenic mechanisms. With preliminary data, we could show that even when mutagenic mechanisms are comparable between different tumour samples, the genomic distribution of mutations is highly heterogeneous. We would therefore like to increase the power of our investigation by using additional data from TCGA and TARGET to investigate the mechanisms underlying the genome specificity of mutagenesis. The aim is to understand the impact of different mutagenic mechanisms on cancer driving events. Paediatric cancer data are necessary in this context, as the mutagenic mechanisms are to some extent distinct. For this we require base calls for tumour-normal pairs, from which we extract mutagenic mechanisms using mutational signatures. The resulting mutagenic mechanisms are then associated with where in the genome mutations happen, dependent on epigenetic factors and genome function, tumour tissue of origin, and preexisting driver mutations. We aim to use the obtained results to estimate the probabilities that mutagenic mechanisms cause cancer driving mutations. This approach has a significant therapeutic purpose, for which paediatric patients in particular are of critical importance, with the potential for huge benefit in this patient group.
By understanding the localisation of mutagenesis in paediatric cancer tissue, and with the help of digital twin simulations of mutagenic processes, we will attempt to predict mutagenesis as part of continuing tumour evolution and in response to genotoxic treatment. This will allow us to i) assess probabilities of new mutations in the tumour, including in its subclones, which could be used for a targeted search for known biomarker mutations (e.g. in liquid biopsies), or we may even develop criteria for treatment decisions based on the probability of developing a certain biomarker mutation, even if the mutation is not (yet) visible in the bulk; ii) predict mutation locations in response to treatment. This can be used to develop risk assessments of the treatment with respect to the development of resistance-causing mutations and/or mutations that contribute to long-term side effects of treatment. This is an increasingly urgent problem given the growing numbers of paediatric cancer survivors. Therefore, this research can contribute to optimising treatment schemes with reduced long-term side effects while not compromising on efficacy. We may possibly even be able to suggest epigenetic co-treatments to “relocate” mutations. These are the long-term visions of this project and may seem far-fetched at this point, which is why I did not go into much detail in my original application. However, to make such therapeutic applications possible, we really need to include paediatric cancer from the beginning. I hope I have convinced you that there is indeed a therapeutic use of our research in paediatric cancer, and I would like to ask you to reconsider your disapproval.
Ansari, Marc UNIVERSITY OF GENEVA Prioritization of candidate somatic genetic markers of glucocorticoid response in children with acute lymphoblastic leukemia for a prospective genetic association study MPGx INDALL Jul25, 2024 approved Lymphoblastic leukemia is a cancer of the white blood cells, which are important components of the immune system, helping to protect the body. The disease develops in the bone marrow, where blood cells are produced. There are various types of leukemia, which are distinguished by how long it takes for the disease to progress (acute or chronic). Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. Glucocorticoids are powerful medicines that are commonly used to treat ALL in children, as they help to destroy leukemia cells. However, in some children they do not work as well as expected. This is because their leukemia cells carry genetic alterations that make them resistant to the treatment, i.e. the treatment does not kill them. Identifying these genetic alterations would help to recognize the children with resistant leukemia cells and to propose alternative care. This is the goal of the work that we plan to perform using the resources made available in the database of Genotypes and Phenotypes (dbGaP). Pediatric acute lymphoblastic leukemia (ALL) is the most common childhood cancer (PMID: 28910269). ALL is a malignant proliferation of lymphoid cells blocked at an early stage of differentiation that can invade the bone marrow, blood, and extramedullary sites (PMID: 32247396). ALL is characterized by chromosomal abnormalities and genetic alterations involved in the differentiation and proliferation of lymphoid precursor cells (PMID: 32247396).
Glucocorticoids (GCs) are an integral component of therapy for pediatric ALL (PMID: 20408842), and response to therapy, defined as efficient lymphoblast reduction after 7 days of treatment, is a strong predictor of relapse-free survival (RFS) (PMID: 12529655, 20947430). De novo or secondary resistance to GCs is an adverse prognostic factor for ALL (PMID: 20947430). GC resistance, whether de novo or acquired, affects up to 30% of children (PMID: 26465987, JIPMER center experience), and the underlying mechanisms are still not completely understood, especially in populations of specific ethnic ancestry. GCs exert their cytotoxic effect by binding to glucocorticoid receptors (GRs) (PMID: 3864532). GRs can then either translocate to the nucleus and transactivate gene expression or repress the activity of various transcription factors (PMID: 20947430). Both processes inhibit cytokine production (PMID: 8899106), alter the expression of various oncogenes (PMID: 8119231), and induce cell cycle arrest and apoptosis (PMID: 12529655, 20947430). The early identification of children who are resistant, or who will acquire resistance, to GCs would help to tailor ALL treatment in those children. Somatic DNA mutations could be among the early biomarkers of GC response, and previous efforts have uncovered some of the possible mechanisms and genetic markers associated with GC resistance (PMID: 25253770, 27997540, 29689546). For instance, several signaling mutations have been shown to increase or stabilize the expression of anti-apoptotic factors, thereby abrogating the pro-apoptotic effect of GCs (PMID: 25253770, 27997538, 27997540). Some of these findings need to be evaluated for their applicability in genetically distinct populations. We are currently recruiting a cohort of pediatric acute lymphoblastic leukemia (ALL) patients of Indian origin within the framework of a prospective observational pharmacogenetic association study (NCT05512169).
One of the aims of this study is to identify somatic genetic markers associated with the efficacy of GC treatment among patients undergoing the standardized IciCLe-ALL-14 treatment protocol (PMID: 35101099). The goal of the present work is to lay the foundation for this analysis. We propose here to prioritize candidate somatic biomarkers of GC resistance for association testing in Indian patients with ALL whose somatic/germline exomes will be sequenced. We plan to use the TARGET phase 2 data (PMID: 36050548) and the St. Jude database (PMID: 36604538) for the somatic mutation association analysis with blast count on days 8, 15, and 29 and with relapse. The TARGET database contains somatic exome sequencing data for 2,754 children, adolescents, and young adults with newly diagnosed B-ALL (n = 2,288) or T-ALL (n = 466). The St. Jude database contains the results of sensitivity experiments performed on primary leukemia cells from 805 pediatric ALL patients with 18 therapeutic agents, including 2 GCs of interest (dexamethasone and prednisolone). A total of 726 patients have been characterized in both databases. We plan to match the sample IDs from the 2 databases, i.e. to link the somatic mutations from the TARGET database to the results of the sensitivity experiments available in the St. Jude database, by identifying the correspondence between the patient identifiers in both databases. We will then test associations between somatic variants and GC resistance using the statistical approach implemented in SOMAT, which has been shown to be statistically powerful and computationally efficient in various situations (PMID: 28722765). Our final goal will be to test whether the candidate somatic biomarkers of GC resistance in ALL patients, identified previously, are associated with treatment resistance in our Indian cohort. No additional risks are foreseen for the patients. We plan to publish the results of our study in high-impact, peer-reviewed, open-access journals.
Our results will also be presented at national and international meetings, either as talks or posters.
Aplenc, Richard CHILDREN'S HOSP OF PHILADELPHIA Pediatric AML Sequencing Project Jun26, 2015 closed We propose to analyze TARGET data to find genetic changes associated with heart failure after chemotherapy for acute myeloid leukemia. We also propose to analyze TARGET data to find new targets for immune therapy of AML. We propose to use the TARGET germline DNA sequence data to understand the genetic drivers of cardiotoxicity in pediatric AML patients. Specifically, we will screen for recurrent germline missense and frameshift mutations in genes known to be associated with cardiomyopathy in the general population, with the hypothesis that patients carrying such a mutation will be at increased risk of anthracycline-associated cardiac toxicity. Data on cardiac toxicity events will be obtained from the Children’s Oncology Group. Dr. Aplenc, the PI for this data request, was the study chair for the AAML0531 trial and has actively curated the reported cardiac toxicities on the AAML0531 trial, including the assembly of a data set of approximately 1,700 shortening/ejection fraction data points from patients with reported cardiac toxicity on that trial. In addition to this work, we are requesting RNAseq data to evaluate transmembrane domains within tumor samples to facilitate novel target discovery for chimeric antigen receptor-based immunotherapy. Molecular interaction networks have been used to identify causal genetic mutations and deregulated gene pathways in cancers. In this project, we will develop a novel computational algorithm to identify therapeutic targets responsible for induction failure and relapse in AML patients. We will integrate the various omics datasets generated by the TARGET consortium.
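The germline screen in the Aplenc request, recurrent missense and frameshift variants in cardiomyopathy-associated genes, reduces to filtering annotated calls by gene and consequence and then counting distinct carriers. The Python sketch below assumes variants have already been annotated into simple records using Sequence Ontology consequence terms; the record layout, the `recurrent_germline_hits` name, the recurrence threshold of two patients, the variant IDs, and the three-gene set (MYH7, TTN, LMNA, chosen only as illustrative cardiomyopathy genes) are all hypothetical, not the request's actual gene list or pipeline.

```python
from collections import defaultdict

# Sequence Ontology consequence terms treated as qualifying (assumed annotation format).
QUALIFYING_CONSEQUENCES = {"missense_variant", "frameshift_variant"}


def recurrent_germline_hits(variants, genes, min_patients=2):
    """Recurrent qualifying germline variants in a gene set.

    variants: iterable of dicts with keys "patient", "gene", "variant", "consequence".
    genes: set of gene symbols to screen.
    Returns {(gene, variant): sorted list of carrier patients} for variants
    carried by at least `min_patients` distinct patients.
    """
    carriers = defaultdict(set)
    for v in variants:
        if v["gene"] in genes and v["consequence"] in QUALIFYING_CONSEQUENCES:
            carriers[(v["gene"], v["variant"])].add(v["patient"])
    return {key: sorted(pts) for key, pts in carriers.items() if len(pts) >= min_patients}


# Illustrative gene set and hypothetical variant records.
cardiomyopathy_genes = {"MYH7", "TTN", "LMNA"}
calls = [
    {"patient": "P1", "gene": "MYH7", "variant": "MYH7_v1", "consequence": "missense_variant"},
    {"patient": "P2", "gene": "MYH7", "variant": "MYH7_v1", "consequence": "missense_variant"},
    {"patient": "P3", "gene": "TTN", "variant": "TTN_v1", "consequence": "frameshift_variant"},
    {"patient": "P4", "gene": "TTN", "variant": "TTN_v1", "consequence": "frameshift_variant"},
    {"patient": "P2", "gene": "LMNA", "variant": "LMNA_v1", "consequence": "synonymous_variant"},
    {"patient": "P5", "gene": "TP53", "variant": "TP53_v1", "consequence": "missense_variant"},
]
hits = recurrent_germline_hits(calls, cardiomyopathy_genes)
print(hits)
```

The carrier lists could then be joined to curated cardiac toxicity data to compare event rates between carriers and non-carriers; that comparison is not shown.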
Applebaum, Mark UNIVERSITY OF CHICAGO Identifying neuroblastoma drivers and bringing them to the clinic Jul08, 2020 approved We are exploring the potential for new computational and functional genomics approaches to reveal drivers of neuroblastoma. Based on our previous rhabdomyosarcoma (RMS) studies, we expect this innovative strategy will provide an increased understanding of the biologic underpinnings of high-risk neuroblastoma and also lead to the identification of new therapeutic targets and biomarkers for high-risk neuroblastoma. Our team recently conceived of a plan to explore the hypotheses that a) relevant changes in chromosome numbers in neuroblastoma are those that correlate with gene expression, and b) identifying those genes will reveal new therapeutic targets and better biomarkers for risk stratification. In this project, we are employing a new computational algorithm in which probabilistic methodology provides an integrative analysis of gene Expression and Copy-Number (iExCN). Our three Specific Aims are: Specific Aim 1: To uncover drivers and tumor suppressors in neuroblastoma by using iExCN, a Bayesian-based, integrative analysis algorithm. Specific Aim 2: To explore how chromosome copy number and expression of iExCN genes correlate with clinical variables to improve risk stratification models and guide therapy for neuroblastoma. We are exploring the potential for new computational and functional genomics approaches to reveal oncogenic drivers of neuroblastoma. Based on our previous rhabdomyosarcoma (RMS) studies, we expect this innovative analytic strategy will provide an increased understanding of the biologic underpinnings of high-risk neuroblastoma and also lead to the identification of new therapeutic targets and biomarkers for high-risk neuroblastoma.
Our team recently conceived of a plan to explore the hypotheses that a) relevant CNVs in neuroblastoma are those in which gene copy-number alteration correlates with gene expression, and b) identifying those genes will reveal new therapeutic targets and better biomarkers for risk stratification. In this project, we are employing a new computational algorithm in which Bayesian methodology provides an integrative analysis of gene Expression and Copy-Number (iExCN), an approach that has already met with some success in rhabdomyosarcoma. Our three Specific Aims are: Specific Aim 1: To uncover oncogenic drivers and tumor suppressors in neuroblastoma by using iExCN, a Bayesian-based, integrative analysis algorithm. We will use data from neuroblastoma patients from the Gabriella Miller Kids First Project and use TARGET data for validation. Specific Aim 2: To explore how CNVs and expression of iExCN genes correlate with clinical variables to improve risk stratification models and guide therapy for children with neuroblastoma.
ARCECI, ROBERT PHOENIX CHILDREN'S HOSPITAL Target Identification for High Risk Childhood AML based on Genome-Wide Analysis Oct12, 2010 closed AML is a molecularly heterogeneous disease, with alterations occurring through mutation, genomic loss and gain, and epigenetic modification. The identification of distinct molecular changes in subtypes of AML should result in strategies to therapeutically target such changes. To this end, through the TARGET Initiative and with the Children’s Oncology Group, we propose to perform detailed copy number, LOH, transcriptome, and genome-wide epigenetic methylation profiling in a carefully defined cohort of patients with high-risk AML. This cohort has been selected to represent patients with no known cytogenetic or molecular prognostic markers who are at the highest risk of relapse.
To this end, we have identified diagnostic and remission specimens from a cohort of over 200 uniformly treated, standard-risk patients without known cytogenetic or molecular prognostic markers who relapsed after achieving an initial morphologic remission. We believe that this cohort offers the highest likelihood of identifying novel genomic and epigenetic alterations associated with relapse. This cohort will represent nearly half of the patients with no known prognostic markers who are at the highest risk of relapse. Thus, detailed genome-wide analysis of diagnostic and remission bone marrow samples from this group of patients should prove a highly effective approach to identifying high-risk molecular characteristics as well as novel targets that can be therapeutically exploited. In addition, relapse samples are available from individuals in this group, which will be invaluable for future analyses of clonal evolution and the emergence of resistance pathways. Identifying molecular mechanisms of malignant transformation that can be specifically targeted can lead to more effective and less toxic cancer treatments. Numerous lines of experimental evidence support genome-wide approaches to defining transcriptional RNA patterns, DNA structural changes, epigenetic methylation changes and functional pathway screening. Each of these critical areas is known to be altered in cancer, providing potentially selective pathways for therapeutic targeting.
AML represents approximately 15% of acute leukemias in children, ~1,000 cases per year in children under age 20 years in the US. In contrast, AML represents the majority of acute leukemias in adults. Approximately 50% to 60% of all children with AML can currently be cured with intensive chemotherapy with or without hematopoietic stem cell transplantation. Despite the clinical heterogeneity of AML, the identification of prognostic groups has led to risk-directed treatment approaches based on clinical, cytogenetic, and molecular markers as well as minimal residual disease quantitation. Such risk group stratification has identified cohorts of patients with survival rates above 75% and others with rates below 50%. Emerging data on AML in adults suggest that microRNA expression and epigenetic patterns may also define additional subtypes. A report of deep genomic sequencing of a single patient with AML identified approximately a dozen mutated protein-encoding genes, some of which were known to be associated with outcome in AML, while others were novel. In some instances, molecular characteristics and prognostic markers have been or are currently being clinically tested as therapeutic targets in pediatrics, such as the reversal of multidrug resistance mediated by drug exporters. ARMSTRONG, SCOTT CHILDREN'S HOSPITAL BOSTON Mutations in Pediatric Acute Lymphoblastic Leukemia Dec15, 2011 closed Acute lymphoblastic leukemia (ALL) is the most common cancer of childhood. Despite significant improvements in cure rates, a substantial number of children still relapse, and little is understood about the underlying biology and genetics of these patients.
Recent research has described a number of mutations found in this leukemia; however, these mutations have been studied in a relatively small number of patients, and it is important to validate them in a larger independent set of patients to truly understand their frequency and significance before they can be used to influence clinical decisions. We will use recently developed sequencing techniques to analyze a large number of samples from our cohort of uniformly treated pediatric ALL patients in order to accomplish this. Large sequencing efforts have identified a number of genes mutated in acute lymphoblastic leukemia (ALL), some of which carry prognostic value; however, only a few studies in separate cohorts have independently validated these results. We propose to use mass spectrometry, exon capture and next-generation sequencing to study the frequency and prognostic significance of previously described mutations in our large, uniformly treated cohort of pediatric ALL patients; we are therefore collecting data on which mutations have been previously described. The resequencing data within the ALL dataset of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative would provide an ideal source of additional mutations for identification within our cohort. Asai, Yoshiyuki YAMAGUCHI UNIVERSITY Analyses of transposable element variations associated with gene regulation in pediatric cancers Nov28, 2018 approved Transposable elements (TEs) are DNA sequences that can jump within genomes. In humans, TEs make up about half of the genome. One TE family, the retrotransposons, jumps through a "copy and paste" mechanism and has increased human genome diversity. Although almost all TEs have lost their jumping activity, some retrotransposons (L1, Alu, and SVA) are still active and jumping in the human genome.
Recent studies have suggested that these TE-derived genetic variations are related to human health and disease through the regulation of gene expression. In this study, we investigate the relationship between genetic variations of TEs and gene expression patterns in patients with pediatric cancers using genome sequencing data and gene expression data. Our study aims to find novel cancer-associated genetic variations related to gene regulation. Transposable elements (TEs) are major components of the human genome, comprising between 45% and 66% of the genome. Although most TEs have become fixed in the human genome, several TEs (L1, Alu, and SVA) still retain their transposable activity. TEs can cause inactivation of gene function through insertional mutagenesis. Recent studies have suggested that TE insertion (TEI) polymorphisms may have important implications for health and disease (Wang L. et al., Curr Opin Genet Dev 2018). These studies, which used whole genome sequencing (WGS) and RNA sequencing (RNA-seq) data from the 1000 Genomes Project (samples from healthy individuals), showed that TEI polymorphisms might lead to disease-related human regulatory variation. In this study, we investigate the relationship between TEI variations and transcriptional regulation in patients with pediatric cancers using genome sequencing and gene expression data to find novel genetic causes of pediatric cancers. Our approach will provide new insight for developing more effective treatments for pediatric cancers. We do not plan to combine requested datasets with other datasets outside of dbGaP. Asgharzadeh, Shahab CHILDREN'S HOSPITAL OF LOS ANGELES TARGET Neuroblastoma Project Jul07, 2011 approved This project is designed to provide a complete genetic characterization of the pediatric cancer neuroblastoma. I am submitting this application as a previous co-investigator of the Neuroblastoma-TARGET project.
We generated the expression and methylation data to understand molecular subgroups of neuroblastoma and identify components of the tumor microenvironment that may be subgroup-specific. I would like to continue to utilize this information to analyze and identify subgroups of neuroblastoma in my research. Asgharzadeh, Shahab CHILDREN'S HOSPITAL OF LOS ANGELES Uncovering the role of tumor microenvironment in neuroblastoma Nov04, 2020 closed The contribution of immune cells in neuroblastoma is not completely understood. Our group proposes to utilize genomic data in this cohort and others to identify levels of immune cell infiltration and assess their role in the recurrence of this cancer in children. Recent studies in NB by our group and others have demonstrated that tumor-infiltrating immune cells, particularly tumor-associated macrophages (TAMs), and related inflammation establish a pro-tumorigenic tumor microenvironment (TME) that promotes NB growth, drug resistance, and evasion from immune-mediated destruction, and that correlates with poor outcome. Our findings raise the prospect that pathways involved in the tumor-TAM interaction may be an important therapeutic vulnerability that could be exploited to develop combinatorial immunotherapeutic approaches for eradicating this cancer. The TME of NB has been the subject of extensive investigation by our group. We will assess signaling pathways associated with TME subgroups (the ‘Inflammation’ subgroup representing a ‘hot’ TME, and the ‘Metabolic’ subgroup representing a ‘cold’ TME). We plan to utilize these data along with other data generated in the lab and TARGET data to estimate the role of the TME in these tumors and correlate it with other clinical parameters and germline variants.
Backert, Linus IMMATICS BIOTECHNOLOGIES Discovery and validation of targets for cancer immunotherapy MOVED FROM: Jens Fritsche Jul15, 2020 approved Immunotherapies are among the most promising approaches in the fight against cancer, but their development requires the discovery of novel and safe targets. Targets for T-cell based therapies are tumor-associated peptides (TUMAPs) presented on the surface of cancer cells by specialized molecules called human leukocyte antigens (HLA). For the development and validation of new targets for cancer immunotherapy, both transcriptome and genome/exome sequencing data of normal, diseased, and cancerous tissues, including metastases, are relevant. The sequencing data will be used to identify patient-specific variations, the individual HLA genotype, and the expression of relevant targets. Finally, the integration and combination of the datasets with our in-house data will allow us to gain a better understanding of tumor biology and to develop computational methods, both necessary to ultimately help pediatric and adult cancer patients by providing better and safer immunotherapies. Immunotherapies are among the most promising approaches in the fight against cancer, but their development requires the discovery of novel and safe targets. Targets for T-cell based therapies are tumor-associated peptides (TUMAPs) presented by either HLA class I or HLA class II. While we use mass spectrometry (LC-MS) for the identification and quantitation of HLA-presented peptides, NGS enables corroboration of the observed tumor association at the mRNA level. Thus, both transcriptome and genome/exome sequencing data of normal and cancerous tissues, including metastases, are relevant for the development and validation of new targets for cancer immunotherapy. The analysis will cover adult as well as pediatric (children and young adolescents) cancer patients.
The genome and exome sequencing data will be used to identify patient-specific variations and the individual HLA genotype, both of which influence the presentation of HLA peptides. Gene expression is one of the most important factors for HLA peptide presentation. Thus, we will investigate gene expression and patient genotype in a large cohort of primary and metastatic cancers, as well as pediatric malignant tumors, to develop novel biomarkers necessary to stratify patients for immunotherapeutics and to increase safety and efficacy (malignancies considered in this application include, but are not limited to, Prostate Cancer, Breast Cancer, Lung Cancer, Thoracic Malignancies, Rectal Neoplasms, Multiple Myeloma, Uveal Melanoma, and Melanoma). In addition, the data from normal and diseased tissue (such as COPD, Cystic Fibrosis, and Gastrointestinal Diseases) will be inspected to prevent on-target toxicity and assure clinical safety for the patient. Another goal is the development and improvement of computational methods for immunotherapy as well as target discovery and validation (datasets whose use terms do not permit this, e.g., TARGET, will not be used for the development of computational methods). The methods may include integrative bioinformatics approaches, machine learning algorithms, and other statistical approaches. The integration and combination of the datasets with our in-house data will allow us to gain a better understanding of tumor biology necessary to provide improved immunotherapies. Bailey, Kelly UNIVERSITY OF PITTSBURGH AT PITTSBURGH Detection of Fusion Genes in Pediatric Sarcomas Dec05, 2016 closed We will examine how sarcoma-specific gene fusions can serve as diagnostic biomarkers, identifiers of specific sarcoma subtypes, and drivers of metastasis. This will help us uncover the important classes of gene fusions in pediatric sarcomas and potentially highlight new therapeutic targets for intervention.
We will investigate how fusion genes contribute to pediatric sarcoma progression and metastasis. In this project, we will combine this and other datasets to discover these critical fusion genes: fusion gene data will be integrated with somatic mutation, CNV, transcriptomic and epigenomic data to strengthen observations. In silico findings will be validated with wet lab research and will hopefully lead to novel targets and therapeutics. This research is performed in an academic setting, and results will be made publicly available and/or published in scientific journals. Ballinger, Dennis COMPLETE GENOMICS, INC. TARGET whole genome sequence (WGS) data integrity verification for pediatric cancers Jul19, 2012 closed To verify that the data submitted to dbGaP were not altered in the transmission and storage process. To periodically test the integrity of Complete Genomics Whole Genome Sequence (CG WGS) data submissions for the National Cancer Institute (NCI) TARGET project, we would like to download BAM files from the NCBI SRA to compare with the data before transmission and storage (dbGaP Study Accession: phs000218.v4.p1). Ballinger, Dennis COMPLETE GENOMICS, INC. Compare TARGET Genotypes with genomic sequence for pediatric cancers. Feb23, 2011 closed Genotype data will be used to quality-control genomic sequence data for pediatric cancers as part of the TARGET program. SNP genotypes of pediatric cancer cases that are part of the TARGET study will be compared to sequence calls for the same locations in complete genome sequence generated as part of a subcontract with SAIC (#10ST1018) from the NCI. The primary use of the data will be in quality control of the genomic sequence, for estimating error rates at the overlapping locations.
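The two Ballinger requests describe simple quality-control computations. As an illustrative sketch only (the function names, the use of MD5 checksums, and the genotype encoding are assumptions, not the requestor's stated method), an integrity check can compare file digests computed before submission and after re-download, and a genotype-versus-sequence error rate can be estimated as the discordance fraction over sites called by both platforms:

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large BAM files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_before, path_after):
    """True if two copies of a file (e.g. pre-submission and re-downloaded) have identical digests."""
    return md5sum(path_before) == md5sum(path_after)


def genotype_discordance(array_calls, sequence_calls):
    """Discordance rate between SNP-array genotypes and sequence calls.

    Both arguments map a site key such as (chrom, pos) to an unphased genotype
    string such as "AG" (hypothetical encoding). Only sites present in both
    inputs are compared. Returns (discordance_rate, number_of_shared_sites).
    """
    shared = array_calls.keys() & sequence_calls.keys()
    if not shared:
        raise ValueError("no overlapping sites to compare")
    # Sort alleles so that "AG" and "GA" are treated as the same unphased call.
    mismatches = sum(
        1 for site in shared
        if sorted(array_calls[site]) != sorted(sequence_calls[site])
    )
    return mismatches / len(shared), len(shared)
```

For example, with array calls `{("1", 100): "AG", ("1", 200): "CC"}` and sequence calls `{("1", 100): "GA", ("1", 200): "CT"}`, the sketch reports one discordant site out of two shared sites. Comparing genotypes as sorted allele lists makes `"AG"` and `"GA"` count as the same unphased call.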
Barnes, Betsy FEINSTEIN INSTITUTE FOR MEDICAL RESEARCH Identification of biomarkers predicting osteosarcoma metastasis Sep01, 2022 expired Osteosarcoma in children and adolescents has seen significant improvements in treatment regimens, but many children still die when the cancer spreads from the bones to other areas of the body, particularly the lungs. Understanding which genetic factors drive this spread is important to identify which children are at the highest risk for metastasis and to try new treatments that target these factors. The objective of this study is to identify gene associations with osteosarcoma disease severity and the presence of metastasis. Understanding drivers of metastasis is particularly important for osteosarcoma, as patients with localized disease have a relatively high survival rate, whereas those with metastatic spread have a survival rate of less than 30%. We aim to identify potential changes in immune factors with functional changes that drive metastasis. Specifically, some interferon regulatory factors have been noted to be tumor suppressors and associated with tumor severity in other cancers. Our previous work has demonstrated specific interferon factors that correlate with disease severity in human breast ductal carcinoma [PMID 22053985]. Our goal is to assess whether a similar correlation and driver of severe disease exists in osteosarcoma. We will assess the relative expression of interferons identified in other studies and compare it with the presence of metastases, pathological disease state and recurrence, as well as outcomes (overall survival and event-free survival). Further work includes the use of whole exosome sequencing to correlate our findings in tumor cells with their exosomes, to understand whether these interferons alter the gene cargo of exosomes in correlation with metastatic disease.
Barr, Frederic Glenn NIH Cross-cancer comparison of genomic amplification events Jun05, 2017 approved Amplification is a mechanism that can increase copy number and expression of specific genes, and thereby contribute to the development of various cancers. Studies in my lab showed that some regions are frequently amplified in multiple cancer categories. However, it is not clear whether the precise amplified segments are the same or if there are differences in these amplified segments between cancer categories. Similarly, it is not clear whether there are differences between cancer categories in the genes that are overexpressed as a result of these copy number changes. In this project, we propose to use data from genomic databases to compare and contrast the precise amplified segments and the overexpressed genes among cancer types with similar amplification events. After identifying similarities and differences between the amplification events among cancer types, we will then perform laboratory studies to assess the functional role of specific amplified/overexpressed genes and to explore drugs that may target the overexpressed protein products. Previous studies in my and other laboratories revealed the frequent occurrence of amplification of specific chromosomal regions in various types of cancer. In these amplification events, a chromosomal region is present in multiple copies instead of the usual two copies per cell, resulting in increased RNA expression from genes within the amplified region. Some of these abnormally expressed genes are postulated to contribute to the development of these cancers. An important finding in studies of genomic amplification is that the same chromosomal region appears to be amplified in more than one cancer category. For example, my laboratory identified amplification of the 12q13-q14 chromosomal region in the pediatric soft tissue cancer alveolar rhabdomyosarcoma.
A survey of the literature has also shown that this chromosomal region is amplified in numerous other cancer categories, including carcinomas (e.g., lung adenocarcinoma), brain tumors (e.g., glioblastoma multiforme), and other sarcomas (e.g., liposarcoma). As the well-known oncogene CDK4 is located in this chromosomal region, we and others have hypothesized that CDK4 is a key oncogene targeted by these amplification events, resulting in overexpression of the CDK4 mRNA and protein product. In this project, we will use genomic copy number data for several cancer categories to identify subsets within each category with amplification of several commonly involved chromosomal regions (such as 2p24, 12q13-q14 and 13q31). In particular, we will use a statistical approach developed in this laboratory to establish the genomic region that is usually amplified in each cancer category. We will then use RNA sequencing to identify genes within the amplified regions that are usually overexpressed at the RNA level in each cancer category. Using these data, we will then determine the degree to which the precise amplified regions and overexpressed genes are concordant between the different cancer categories, thus determining if there are amplified regions and/or overexpressed genes common to all the cancer categories with a specific amplified chromosomal region. Furthermore, we will also determine if there are tumor-type-specific differences in the DNA content and/or RNA targets of these amplification events. Finally, we will proceed to in vitro and in vivo studies to assess the oncogenic role of the amplified/overexpressed genes and whether these events can be targeted by specific pharmacologic approaches. These studies will thus use the commonalities and differences between these amplification events to better understand the molecular pathogenesis of these cancers and to determine the potential susceptibility of these cancers to targeted drugs.
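The Barr request relies on a statistical approach developed in that laboratory, which is not described here. As a much simpler, hypothetical stand-in (not the laboratory's method), the recurrently amplified region on one chromosome can be approximated as the stretch covered by amplified segments in at least a chosen fraction of samples; the function names, the half-open coordinate convention, and the frequency threshold are all assumptions:

```python
from itertools import groupby


def _merge(segments):
    """Merge overlapping or touching half-open (start, end) segments of one sample."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def recurrent_regions(segments_by_sample, min_fraction):
    """Return half-open intervals amplified in at least min_fraction of samples.

    segments_by_sample: one list of (start, end) amplified segments per sample,
    all on the same chromosome (hypothetical input format).
    """
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    n_samples = len(segments_by_sample)
    # Each merged segment opens (+1) at its start and closes (-1) at its end.
    events = []
    for segments in segments_by_sample:
        for start, end in _merge(segments):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()

    regions = []
    depth = 0  # number of samples amplified at the current position
    open_start = None
    # Apply all events at the same coordinate together before testing the depth.
    for pos, group in groupby(events, key=lambda event: event[0]):
        depth += sum(delta for _, delta in group)
        recurrent = depth / n_samples >= min_fraction
        if recurrent and open_start is None:
            open_start = pos
        elif not recurrent and open_start is not None:
            regions.append((open_start, pos))
            open_start = None
    return regions
```

With three samples whose amplified segments are `[(100, 300)]`, `[(150, 250), (240, 400)]` and `[(200, 260)]`, requiring all samples (`min_fraction=1.0`) yields `[(200, 260)]`, while `min_fraction=0.6` yields `[(150, 300)]`. Overlapping segments within a sample are merged first so that each sample contributes at most once to the coverage count.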
Barrett, Christian ISOMMUNE, INC. Identification of tumor-specific proteins for monoclonal antibody and peptide vaccine therapies against pediatric cancers Jan19, 2016 expired Identifying molecules that are only present in pediatric tumors is a key discovery challenge in the development of new therapies for childhood cancers. To discover molecules that are only present in tumors, we have developed a systematic process that combines custom software and standard molecular biology procedures. We intend to apply our process to all of the TARGET tumor types to discover tumor-specific protein variants that we can then use for new therapies against the TARGET childhood cancer types. Background: Identifying molecules that are specific to tumors for use in early detection, diagnosis, and therapy is both a primary goal and a key discovery challenge across diverse areas of oncology. To discover tumor-specific molecules, we have developed custom bioinformatics algorithms that analyze transcriptome sequence (RNA-seq) data to identify candidate tumor-specific mRNA isoforms. Additionally, we have developed a bioinformatics infrastructure for high-throughput RT–qPCR validation of candidate isoforms. In recently published work (Barrett et al., PNAS 2015), we applied our process to 296 ovarian cancer and 1,839 normal tissues. Our results revealed multiple mRNAs encoding protein isoform therapeutic targets that had unique amino acid sequences and that were expressed in most of the cancers examined but not in normal tissues. Use of TARGET Data: Our intention is to use TARGET transcriptome data to discover molecular targets that will form the basis of therapeutic programs that we aim to develop for the TARGET cancer types.
Specifically, we will use our recently demonstrated computational and wet-lab process to identify tumor-specific protein isoforms that we can use as monoclonal antibody targets or that contain tumor-specific peptides that we can use in anti-cancer peptide vaccines. Since we seek protein isoforms that are only expressed in TARGET pediatric tumors and not in normal tissues, our discovery process requires transcriptome data from as many normal tissues as possible. Ideally, these normal control data would also be from children. Such data largely do not exist, but the NIH GTEx project is producing large transcriptome data sets for a diverse range of normal adult tissues. We intend to use the GTEx data in conjunction with the TARGET data for our discovery process. Raw data from the two data sets will not be mixed. Rather, GTEx and TARGET raw data will be analyzed separately, and only at the last stage of our process will the analysis results be combined. Barrow, Alexander UNIVERSITY OF MELBOURNE Role of NKp44 and PDGF-D in AML Oct09, 2019 closed A paper published in Oncotarget (Shemesh et al. 2016) claimed that an mRNA variant of the transcript encoding the NKp44 receptor (NCR2-1) has a negative effect on AML patient survival. However, in the dataset Shemesh et al. used (TCGA-LAML), most of the samples do not significantly express NKp44 (read count = 0). We want to analyze the controlled data in the TARGET-AML dataset, which has better NKp44 expression in samples and a greater sample size, to determine the role of NKp44 and PDGF-D in pediatric AML. Natural killer (NK) cells are cytotoxic and cytokine-secreting immune cells with potent anti-tumour cell activity, and NK cell activity is positively associated with a good prognosis in cancer (Morvan MG & Lanier LL. Nat Rev Cancer. 2016). NKp44 is an activating receptor expressed by NK cells that promotes NK cell anti-tumor activity, and PDGF-D, which is often over-expressed by tumor cells, is the ligand for NKp44 (Barrow et al.
Cell, 2018). A paper published in Oncotarget (Shemesh et al. 2016) claimed that an mRNA variant of the transcript encoding the NK cell receptor NKp44 (NCR2-1) has a negative effect on adult AML patient survival. However, in the TCGA-LAML dataset that Shemesh et al. used, most of the samples do not significantly express NKp44 (read count = 0). Consequently, we want to analyze the controlled data in the TARGET-AML dataset, which has (1) a higher proportion of NKp44 expression in the pediatric samples and (2) a larger sample size than the TCGA dataset, with the goal of determining the role of NKp44, as well as the NKp44 ligand PDGF-D, in overall survival in pediatric AML and of providing an interesting comparison to the adult TCGA-LAML dataset. This is important because the conclusion of the Shemesh et al. 2016 paper is that the NKp44 receptor isoform encoded by the NCR2-1 mRNA species may have inhibitory signaling functions, and the authors propose that NKp44 may therefore represent a potential new target for antibody checkpoint blockade. However, our results suggest that NKp44 is an activating receptor that binds to PDGF-D to activate anti-tumor functions of NK cells (Barrow et al. Cell 2018), and we therefore hypothesize that blocking the function of NKp44 or its ligand PDGF-D will likely inhibit NK cell activity and may thus be the wrong approach for diseases such as pediatric AML. We propose to perform survival analysis in relation to the expression of the different NKp44 and PDGF-D mRNA isoforms in the TARGET-AML dataset. In order to perform our analysis, we respectfully request access to the RNA-seq data files for each mRNA isoform/species (e.g. NCR2-1, NCR2-2, NCR2-3 etc.), i.e. 'Isoform Expression Quantification' for each gene and not 'overall gene expression', in the TARGET-AML dataset, as well as vital status, total survival days, age, type of AML and tumor purity information for each sample.
Our proposed study therefore has clear relevance for the diagnosis and treatment of childhood cancer. We will not combine the requested dataset with other datasets outside of dbGaP. Only comparisons will be made between datasets. We do not intend to use cloud computing to carry out the research. Our collaborator is the student's research project supervisor only and will not handle any of the data themselves. Beltinger, Christian UNIVERSITY OF ULM Determining the minimal genomic data required for a diagnosis of neuroblastoma Dec28, 2015 approved Neuroblastoma is a pediatric cancer with a poor outcome in high-risk cases. Neuroblastoma is diagnosed by a large number of often complicated, time-consuming and expensive methods, many of which may be redundant and also subjective. We want to find the minimum of objective genomic data that is required to make a definitive diagnosis of neuroblastoma. Currently, in addition to traditional histological and immunohistochemical analyses that are often subjective, neuroblastomas (NBLs) are subjected to a multitude of different molecular analyses. While these analyses are objective and often complementary, they can be time-consuming, expensive and redundant. The minimal genomic data required for a diagnosis of NBL is unknown. We hypothesize that NBL can be diagnosed by next-generation sequencing of nucleic acids alone. To test this hypothesis, we will investigate a large cohort of patients whose NBL and germline have been assessed simultaneously by gene expression arrays, copy number analysis, whole genome sequencing and whole exome sequencing. The TARGET NBL data are ideal for this purpose, as they contain data on the same patients obtained with different genomic methods. Part of the TARGET NBL data will be employed as a training set to generate a classifier for NBL using appropriate bioinformatic tools. We will validate this classifier in an independent validation set of expression and sequence data from the TARGET NBL database.
By iteratively narrowing the sources of the sequencing data in the training set, we will determine whether and which single sequencing method(s) suffice(s) to make a definitive diagnosis of NBL without losing information on actionable targets. We expect that our results will streamline the diagnosis of NBL and that the approach employed will be useful for other cancers as well. The results of this project will be made freely available to the scientific community. We do not anticipate any increased risk to the participants. We do not plan to combine requested datasets with other datasets outside of dbGaP. Benvenisty, Nissim HEBREW UNIVERSITY OF JERUSALEM Genetic and Epigenetic analysis of human pluripotent stem cells and cancer cells Sep26, 2019 rejected Human pluripotent stem cells are a potentially important tool in disease modeling and regenerative medicine. The aim of our research is to characterize genetic and epigenetic changes in human pluripotent stem cells in order to better understand the mechanisms underlying resistance to chemotherapy in various types of cancers. Such a study will enable better use of the cells for personalized medicine in cancer and in stem cell research. The aim of the research is to uncover genetic and epigenetic mechanisms that are involved in conferring resistance to various anticancer treatments. We developed a human pluripotent stem cell platform enabling the identification of mechanisms of action and pathways that contribute to drug resistance. This system offers a genetically uniform background that avoids tumor heterogeneity, a known limitation in cancer research. With the data obtained through this application, we plan to perform a comprehensive analysis of patients’ genomic imprinting patterns, global DNA methylation levels, gene expression, chromosomal copy numbers and point mutations, together with the corresponding clinical data.
Comparing the information obtained from the human pluripotent stem cell platform with patients’ datasets will allow us to uncover the mechanisms underlying resistance to various treatments in various types of cancer and to show the model’s relevance in the clinical arena. Understanding the mechanisms underlying resistance to treatments will enable more personalized treatment and will benefit patients. It is essential to corroborate the results obtained from the pluripotent stem cell platform with the data obtained from different populations of patients with different types of cancers in order to show the clinical relevance of the model in each subgroup. This includes children and patients suffering from different types of cancers. In order to make our results relevant to patients suffering from specific diseases, we will make sure to segregate the data obtained for each of the phenotypes. This is the mainstay of our research, and it specifically holds for the data obtained from the following datasets: • phs000748.v7.p4 : Multiple Myeloma CoMMpass Study • phs001444.v1.p1 : Genomic Variation in Diffuse Large B Cell Lymphomas • phs001628.v1.p1 : Clinical Crenolanib Resistance in AML • phs001611.v1.p1 : Pancreas Cancer Organoid Profiling We would also provide disease- and age-group-specific analysis for any other diseases if meaningful results are obtained for these cases. The results obtained from this study will be published in the relevant journals and shared with the scientific community. BERNT, KATHRIN CHILDREN'S HOSP OF PHILADELPHIA Transcriptional and epigenetic mechanisms in AML Jun21, 2018 rejected Acute myeloid leukemia affects 20,000 patients in the US each year. We are studying a particularly aggressive subtype of AML. 80% of these patients die within 2 years of diagnosis. This subtype is characterized by very high levels of a gene called MN1. We are investigating why MN1 is high in these patients.
Specifically, we are asking whether the MN1 gene is broken, and fused to a different gene, in these patients. Furthermore, we are studying the consequences of high MN1. We speculate that MN1 dysregulates access to other genes in the genome, with consequences similar to another common, high-risk subtype of AML. Therapies being developed for the latter subtype may also work for patients with high MN1. We seek a better understanding of how MN1 induces leukemia with the goal of eventually developing therapies that can block MN1 specifically. Transcriptional dysregulation is a critical molecular feature of AML. The transcriptional co-activator MN1 is recurrently rearranged and frequently overexpressed in AML. High MN1 expression is associated with a poor prognosis. Forced expression of MN1 in mouse bone marrow induces leukemia as a single hit by inducing a gene expression program similar to that of KMT2A rearrangements, including high levels of HOXA cluster gene expression. Analysis of the publicly accessible data set revealed several TCGA samples with very high MN1 expression. Most of these samples also express high levels of HOXA cluster genes, which are an independent predictor of poor prognosis. We are requesting access to the BAM files of the RNA-Seq data set to ask: 1. whether high MN1 expression is driven by cryptic fusions of MN1 and 2. whether human MN1-high leukemia shares a transcriptional profile with KMT2A-rearranged AML (similar to what is observed in the mouse model). We also request access to the TARGET AML data to perform the same analysis. High MN1 expression is also observed in children. However, it typically does not correlate with high HOXA expression and is almost always found in AMLs with inv(16). Analysis of limited patient cohorts by multiple groups (published and unpublished) suggests a different role for MN1 in pediatric versus adult AML.
The pediatric cohort will be analyzed independently from the adult cohort and will serve to investigate the transcriptional profile associated with high MN1 expression in pediatric patients, particularly those with inv(16). Results will inform and complement results from an inv(16) murine model. Complementary biochemical studies in the murine model in our lab have identified interactions between MN1 and several potentially targetable proteins. Therefore, ensuring that the murine models recapitulate transcriptional and molecular features of the human disease is of critical relevance for the development of novel targeted agents that act on AML with high MN1 expression. Beule, Dieter MAX DELBRUECK CENTRUM FOR MOLECULAR MED Integrating somatic aberrations with the immune landscape in neuroblastoma Mar22, 2017 closed Neuroblastoma (NB) is a malignancy of the sympathetic nervous system. It accounts for 12% of all cancer-related deaths in childhood and is the most commonly diagnosed cancer in one-year-old infants. The disease outcome is highly variable; it ranges from spontaneous regression to the death of the patient. Most cancers can be described as a genetic disease in which the tumor cells have an altered genome compared to normal cells. Yet, like other pediatric cancers, neuroblastoma harbors few point mutations. We want to find other changes in the genome, such as structural variants and abnormal rearrangements. Further, the role of the immune system in particular will be analyzed from gene expression data. Neuroblastoma (NB) is a malignancy of the sympathetic nervous system. It accounts for 12% of all cancer-related deaths in childhood and is the most commonly diagnosed cancer in one-year-old infants. The disease outcome is highly variable and ranges from spontaneous regression to death. This variability is not well understood; it may have genetic, epigenetic and physiological foundations.
We strive to better understand the role of the immune system in the disease mechanisms by characterizing tumor-infiltrating immune cell populations using gene expression signatures and deconvolution methods with RNA-seq data. Furthermore, we aim to analyze somatic and germline genetic variants, particularly InDels, structural variants and CNVs, in conjunction with immune cell infiltration and expression profiles. We plan to use newly gained knowledge to stratify NB cases and establish links to phenotypic data and disease outcome. Pipelines and tools that have been used to perform similar analyses on GBM data will be adapted and re-used for NB data. Bhasin, Manoj EMORY UNIVERSITY Immune Landscape analysis of Pediatric Tumors Aug26, 2020 rejected The project will evaluate the immunogenic neoantigens in pediatric tumors. The neoantigens will be identified using an innovative analytical approach. The correlation of various kinds of neoantigens with overall survival and other clinical factors will be determined. The project is aimed at identifying the specific types of neoantigens and their association with outcomes and therapeutic responses. We would like to request access to FASTQ and BAM files from whole transcriptome (RNA-Seq), whole exome (WESeq) and whole genome (WGSeq) sequencing data across all pediatric cancer types from the TARGET program, including Acute Lymphoblastic Leukemia (ALL), Acute Myeloid Leukemia (AML), Neuroblastoma (NBL), Kidney Tumors (WT, RT, CCSK), and Osteosarcoma (OS). These datasets contain overlapping samples that we will define and use for the following aims: Aim 1: To detect expression and splice junctions in pediatric tumors using the RNA-Seq data. We will also identify genetic variants of transcriptomic and genomic origin to detect processes affecting splicing. Aim 2: The detected splicing events from Aim 1 will be used to study associations with clinical factors and outcomes.
BIGAS, ANNA FUNDACIO IMIM Discovery of new target genes for children T-ALL therapy by exploring alternative splicing dynamics Jul22, 2020 approved Cancer cells behave differently from normal cells. These differences can be identified by tracking the expression of all the genes that are transcribed (the transcriptome) in cancer-affected tissues. Comparing the transcriptomes of cancer samples and normal samples helps to discover the aberrant activity of genes that contribute to cancer origin and proliferation and can directly affect the outcome of the patients. Here we seek to find these transcriptome differences between T-Cell Acute Lymphoblastic Leukemia (T-ALL) cells from sick children and normal T-cells by combining Next Generation Sequencing (NGS) technologies and computational algorithms. To do this, we aim to collect multiple transcriptomes. The characterization of the splicing dynamics and gene expression program associated with childhood T-ALL at high resolution will help to improve the clinical diagnosis of pediatric patients and to discover new drugs and therapies. RNA processing and alternative splicing (AS) are among the multiple post-transcriptional mechanisms that contribute to functional diversification in eukaryotic cells. Cancer-associated alternative splicing events have been reported in different tumor types, establishing new putative targets for diagnostic tools and therapies. These events are orchestrated by RNA-binding proteins (RBPs) (Dassi et al. 2017). Additionally, there is evidence of a correlation between particular AS configurations and cancer specificity (Climente-González et al. 2017). Our main objective is to characterize the transcriptional program of T-Cell Acute Lymphoblastic Leukemia (T-ALL) at exon resolution by exploring gene expression data of primary samples from different cohorts, as well as of normal T-cell maturation stages and different cell lines.
For this purpose, we plan to use vastdb and vast-tools (Tapial et al. 2017) to characterize T-ALL-specific alternative splicing events and infer which molecular mechanisms are involved in splicing deregulation in leukemia. A first exploration has shown a global impairment in exon skipping and intron retention dynamics in T-ALL. A second objective derived from this result is to conduct a thorough characterization of the splicing dysregulation scenario and to identify its key drivers. We plan to construct a splicing regulatory map by exploring the RBP mutational burden and differential expression in T-ALL patients compared to normal T-cell maturation stages. Additionally, in our lab, we have identified, from an independent in vitro analysis of human T-ALL cell lines, a beta-catenin-dependent transcriptional program of genes involved in RNA processing. Based on this, our third objective is to elucidate the mechanistic role of beta-catenin in AS configuration. We plan to characterize AS events modulated by beta-catenin activity in T-ALL cell lines and correlate them with the splicing dysregulation previously observed in T-ALL patients. Lastly, the fourth objective is to explore the association, if any, between a particular AS profile and response to treatment. We plan to integrate the AS event quantification and treatment outcome of T-ALL patients. Overall, the computational assessment of T-ALL-specific isoforms will eventually help to improve diagnosis, as well as the discovery of putative neoantigens targeted by immunotherapy. Finally, we aim to interrogate T-ALL patient transcriptomes for the discovery of gene signatures associated with therapy resistance and relapse, with special attention to the beta-catenin-modulated RNA processing gene set identified in T-ALL cell lines. Specifically, we plan to identify a T-ALL expression pattern associated with treatment outcome by means of unsupervised clustering methods.
NOTE: We plan to combine the requested dataset with other public or controlled-access resources (e.g., from the GEO or European Genome-phenome Archive repositories) to increase statistical power in our analysis. Our initial plan is to analyze them together if the potential batch effects can be successfully corrected; otherwise, they will be analyzed independently. In any case, we do not foresee any risk to participants. binbing, zhou SHANGHAI JIAO TON UNIVERSITY SCH OF MED pediatric acute lymphoblastic leukemia basic research Oct23, 2017 expired Acute lymphoblastic leukemia (ALL) is one of the most common pediatric cancers. The risk of relapse remains relatively high despite improvements in clinical therapeutics. Patients who relapse after treatment have a very poor prognosis. Identification of the biological and mechanistic basis of treatment failure is crucial to predicting relapse risk and to aiding the development of new therapeutic strategies against relapse. This project aims to explore oncogenic stress and its regulation during relapse. We would like to identify genes, genetic elements and/or molecular signatures that control and regulate ALL relapse at the genomic as well as the transcriptomic level. Our long-term objective is to improve the cure rate and decrease the risk of relapse in ALL and other pediatric cancers through the integration of scientific research and patient treatment. Acute lymphoblastic leukemia (ALL) is one of the most common pediatric diseases, and relapse is the leading cause of mortality in childhood ALL. We have found that patients with PRPS1 mutations relapse early during treatment. However, oncogenic stress, a potential driver of tumor clonal evolution, and its regulation during ALL relapse remain unknown. We are using high-throughput genomic/transcriptomic sequencing and molecular characterization to understand the molecular mechanism.
We would like to expand our initial findings to large patient cohorts to obtain a more reliable statistical evaluation. The large number of patient samples collected in the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) database provides a very valuable resource for testing our hypothesis. We use the TARGET database only for the purpose of pediatric translational and basic research. Bingham, Jonathan GOOGLE, INC. ISB Cancer Genomics Cloud TARGET Dec18, 2017 closed The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic data and associated clinical annotations collected from various studies is critical to accelerating research and making new discoveries. This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in TARGET data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. The Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC) is an NCI Genomics Cloud Pilot, supporting the analysis of biological data where data and compute are co-located behind an interface that ensures data security. Data for this project will include open and controlled-access data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Cancer Genome Characterization Initiative (CGCI) datasets. Open-access data will include clinical, molecular, and somatic mutation data. Controlled-access data will include sequence, MAF, VCF, and SNP-chip data.
All data will be obtained from the Genomic Data Commons, and users will be properly authenticated and authorized to access it. The objective is to host cancer datasets like TARGET in a cloud environment so users can analyze the data quickly and efficiently. Other large genomics data sets are available at ISB-CGC, and researchers are able to include private data. Researchers are bound by the Data Use Access agreements (including the Limitations) governing their use of each dataset. ISB was granted NIH Trusted Partner status for TCGA data in July 2015 and an extension for TARGET and CGCI on September 27, 2016. ISB has implemented authentication and authorization protocols to ensure that only approved users will be able to access controlled-access data. ISB-CGC follows all NIH Trusted Partner requirements for storage and distribution of these data. The system security is governed by an Authority to Operate (ATO) granted under the FISMA-moderate level on February 18, 2016. ISB performs security impact assessments as required. The ISB-CGC has two subcontractors, Google and CSRA, and we are applying for dbGaP approval. The subcontractors will need access to the controlled data using appropriate security controls, and will not make the data available through other means. By including TARGET and CGCI, ISB-CGC will advance pediatric cancer research with an environment where researchers can analyze data under appropriate control. No TCGA, TARGET or CGCI data will be published as a result of this work. At contract completion, all downloaded data will be destroyed as required by the DUC. BLANCK, GEORGE UNIVERSITY OF SOUTH FLORIDA CNV and immune receptors in pediatric cancers Oct10, 2018 approved Our goal is to understand the lymphocytes in pediatric tumors and the genetic changes that lead to pediatric tumor cell replication. There will be no mixing of datasets. The RUS for this project has been rewritten for this 2022 renewal.
Since the last renewal, we have mined immune receptor recombinations from both the TARGET-NBL and KidsFirst-NBL (phs001436) datasets. TARGET-NBL immune receptor studies have been published (1, 2) and will be listed in the publications for this renewal. The KidsFirst NBL set is part of an ongoing project in which we are assessing immunologically cold tumors. Such tumors correlate with amplification of the MYCN gene, but the details of what “immunologically cold” means in that setting are lacking. Thus, we are assessing both MYCN-amplified and non-amplified cases for the recovery of immune receptor recombination reads in both the TARGET and the KidsFirst datasets. In addition, we have data indicating that the MAPT gene is amplified in a subset of NBL tumors, which may explain why a subset of NBL tumors express high levels of tau (3), encoded by the MAPT gene. Also, we are discovering and evaluating anti-viral CDR3s in both NBL datasets, using the protocol in ref. (4). Finally, the other datasets that are part of this project are on our list for the recovery of immune receptor recombination reads (5-7) and follow-up analyses, as has been done in the past for many published projects. 1. Kacsoh DB, Patel DN, Hsiang M, Gozlan EC, Chobrutskiy A, Chobrutskiy BI, Blanck G. Tumor Resident, B-Cell Receptor Chemical Characteristics Associated with Better Overall Survival for Neuroblastoma. J Mol Neurosci. 2022;72(9):2011-9. Epub 20220727. doi: 10.1007/s12031-022-02050-6. PubMed PMID: 35896862. 2. Gozlan EC, Chobrutskiy BI, Zaman S, Yeagley M, Blanck G. Systemic Adaptive Immune Parameters Associated with Neuroblastoma Outcomes: the Significance of Gamma-Delta T Cells. J Mol Neurosci. 2021. doi: 10.1007/s12031-021-01813-x. PubMed PMID: 33666857. 3. Zaman S, Chobrutskiy BI, Blanck G. MAPT (Tau) expression is a biomarker for an increased rate of survival in pediatric neuroblastoma. Cell Cycle. 2018;17(21-22):2474-83. doi: 10.1080/15384101.2018.1542898.
PubMed PMID: 30394813; PubMed Central PMCID: PMC6342068. 4. Zaman S, Chobrutskiy BI, Patel JS, Diviney A, Tu YN, Tong WL, Gill T, Blanck G. Antiviral T Cell Receptor Complementarity Determining Region-3 Sequences Are Associated with a Worse Cancer Outcome: A Pancancer Analysis. Viral Immunol. 2020. doi: 10.1089/vim.2019.0156. PubMed PMID: 32315578. 5. Gill TR, Samy MD, Butler SN, Mauro JA, Sexton WJ, Blanck G. Detection of Productively Rearranged TcR-alpha V-J Sequences in TCGA Exome Files: Implications for Tumor Immunoscoring and Recovery of Antitumor T-cells. Cancer Inform. 2016;15:23-8. doi: 10.4137/CIN.S35784. PubMed PMID: 26966347; PubMed Central PMCID: PMC4768948. 6. Tong WL, Tu YN, Samy MD, Sexton WJ, Blanck G. Identification of immunoglobulin V(D)J recombinations in solid tumor specimen exome files: Evidence for high level B-cell infiltrates in breast cancer. Hum Vaccin Immunother. 2017;13(3):501-6. doi: 10.1080/21645515.2016.1246095. PubMed PMID: 28085544; PubMed Central PMCID: PMC5360147. 7. Chobrutskiy BI, Zaman S, Tong WL, Diviney A, Blanck G. Recovery of T-cell receptor V(D)J recombination reads from lower grade glioma exome files correlates with reduced survival and advanced cancer grade. J Neurooncol. 2018;140(3):697-704. doi: 10.1007/s11060-018-03001-1. PubMed PMID: 30382482. Bodor, Csaba SEMMELWEIS UNIVERSITY Validation of the Novel Genetic Risk Scoring Method in Pediatric Acute Lymphoblastic Leukemia Jan26, 2023 approved Acute lymphoblastic leukemia is the most common malignant disease of childhood and has a heterogeneous genetic background. Specific genetic events called copy number aberrations are very frequent in this patient group, affecting approximately 95% of the patients. The treatment of this disease is guided by distinct risk factors, which depend on several biological parameters, most importantly the underlying genetics of the disease. Therefore, accurate risk assessment is crucial for selecting the correct treatment.
Studies in the past have developed genetic risk scoring methods that assess only a small number of genes important in this disease. Our research group developed a genetic risk scoring system that focuses on a significantly larger set of genes and assesses a combined genetic risk score. With the requested TARGET database, we would have the opportunity to validate our results on an independent dataset. If this validation is successful, the scoring system could further improve the individual risk assessment and personalized treatment of pediatric patients diagnosed with ALL. Copy number aberrations (CNAs) are frequent and significant biomarkers of pediatric acute lymphoblastic leukemia (ALL). Numerical and subchromosomal CNAs affect approximately 90% of the patients; moreover, some WHO subgroups of the disease are defined by numerical CNAs. Several studies have investigated the clinical significance of CNAs in pediatric ALL and identified a range of prognostic biomarkers. Integrative efforts have led to the establishment of complex classifiers enabling the assignment of patients to distinct prognostic subgroups based on cytogenetic and molecular genetic markers. Current shortcomings of these genetic classifiers are the relatively low number and limited combinations of aberrations used as criteria for decision making. Assignment of individual patients is typically restricted to a couple of specific genomic patterns, with all other uncategorizable patients being classified in the same non-specific, collective subgroup. In our study, we performed a comprehensive screening for disease-relevant CNAs in a cohort of 260 Hungarian patients diagnosed with pediatric B-ALL using digital Multiplex Ligation-dependent Probe Amplification.
The generated CNA profiles were combined with cytogenetic data for risk assessment, and we introduced a conceptually novel patient classification approach that dynamically considers all possible combinations of screened and potentially co-segregating genetic alterations. This newly developed prognostic classifier provides a flexible, more refined, and more personalized risk assessment for patients with pediatric B-ALL. Although our patient cohort represents the population newly diagnosed with pediatric ALL over a four-year period in our country, validation of our recently developed prognostic classifier requires further datasets. With the proposed project we aim to use whole exome sequencing data to obtain CNAs using the CNVkit software, then filter for the genes relevant to our prognostic classifier. The disease-relevant CNAs will be combined, and a risk score will be determined and assigned to each patient. Finally, the risk score will be linked to clinical datasets and event-free survival rates will be calculated. Access to your highly valuable datasets would not only enable us to validate the aforementioned prognostic classifier but, if the validation is successful, could further improve the individual risk assessment and personalized treatment of pediatric patients diagnosed with ALL. In this project we do not plan to combine the requested data from the TARGET database with datasets outside of dbGaP. Boegel, Sebastian TRON -TRANSLATIONAL ONCOLOGY GGMBH Individualized cancer combination therapy Oct28, 2015 closed Immunotherapies have the potential to improve the outcome of cancer patients. Our institute is running clinical trials testing therapeutic RNA vaccines, in which a patient’s immune system is triggered to attack cancer cells. To trigger an immune response against cancer cells, we seek to identify those genes, including mutated genes, which are found only in the cancer cells.
By profiling a collection of cancer samples, we can identify the frequency and co-occurrence of mutations and tumor-specific genes (antigens). We will select the most promising tumor-specific antigens and then proceed to make therapeutic RNA vaccines that encode these cancer antigens, followed by clinical trials that will hopefully demonstrate a benefit for the patient. Here, we specifically want to use the primary NGS data to a) identify genes / exons / junctions with tumor-restricted expression patterns and b) identify tumor somatic mutations. These findings will be evaluated based on expression, protein impact (for mutations), localization, occurrence and co-occurrence, and potential immunogenic epitopes, and then further validated in other tissue microarrays (IHC) and functional studies (siRNA). The prioritized antigens will be synthesized as RNA for use as potential therapeutic agents. In addition, we are developing methods for detecting various classes of somatic tumor mutations and for target discovery. We are currently evaluating the performance of our mutation detection pipeline designed for clinical individualized vaccines against cancer. We intend to publish all of our findings, including those we derive from your database, in peer-reviewed journals. As described above, we are also planning to apply our vaccine platform to pediatric cancers in the near future. However, compared to tumors in adults, the paucity of genetic alterations in pediatric cancers poses a great challenge for bioinformatic analysis and deconvolution of the cancer genomics of these patients. We therefore wish to test and optimize our cancer genomics platform on selected pediatric cancer cohorts provided by TARGET so that we can offer the best possible treatment for anticipated clinical trials in pediatric cancer indications. We are currently preparing a manuscript describing our cancer genomics platform, and our software will be made available free of charge for the benefit of the academic community.
Boeva, Valentina CURIE INSTITUTE Analysis of Genes Frequently Affected in Relapses in Neuroblastoma May13, 2015 expired Neuroblastoma is a pediatric cancer that often has a lethal outcome, especially for patients diagnosed with a high stage of the disease. Patients typically succumb to recurrence of the disease (relapse). Biopsies of relapsed tumors have been performed only on very rare occasions. Around the globe, independent research groups have tried to assess which tumor-specific changes in the genetic material occur between primary tumors and relapse. Because such samples are rare, drawing conclusions from the separate studies remains a challenge. Three different institutions, including ours, have teamed up to create the largest set of whole genome sequencing data of diagnosis/relapse neuroblastoma samples to date. With the increase in sample size, we are confident that conclusions will be substantiated. In this proposal, we will attempt, using these data, to identify subclonal populations that play a major role in the mechanisms of neuroblastoma relapse. Relapse material from tumors of patients suffering from neuroblastoma has been collected only in exceptional cases over the last decade. With the advent of whole genome sequencing, several groups have started to explore changes occurring in relapsed neuroblastoma samples. However, none of the groups has data sets of sufficient size because relapse biopsies are rare. In an ongoing collaboration, the groups of John Maris (TARGET representative, CHOP, USA), Olivier Delattre (Institut Curie, France) and Rogier Versteeg (AMC, Netherlands) have combined their precious samples to create the largest neuroblastoma relapse cohort to our knowledge. In the framework of this collaboration, the team of Valentina Boeva (Institut Curie, France) will study sub-clone evolution in neuroblastoma between diagnosis and relapse.
Our team has developed a computational method for sub-clone detection using multiple whole genome sequencing samples from the same patient, which we will apply to the TARGET neuroblastoma data. The use of specialized software to detect sub-clonal populations in whole genome sequencing data, combined with gene network analyses, should lead to a better resolution of cancer biology and clonal heterogeneity in neuroblastoma. This research will be enhanced by analyses of recurrently mutated pathways, in order to uncover systematic transformations leading to neuroblastoma relapse. Bofill De Ros, Francesc Xavier UNIVERSITY OF AARHUS MicroRNA Metabolism in Tumors: Biology and Therapeutic Opportunities Mar23, 2023 approved MicroRNAs regulate gene expression during differentiation and development. The production of microRNAs requires precise cleavage by DROSHA and DICER1 and loading onto AGO proteins. This process is frequently aided by an array of cofactors. Recent results indicate that impairments in microRNA biogenesis and decay can play a central role in childhood tumors. Examples of pediatric tumors with strong involvement of miRNA biogenesis enzymes include Wilms tumors, pleuropulmonary blastoma and Sertoli-Leydig cell tumors. This project aims to understand the role(s) of miRNA dysregulation more broadly in pediatric tumors and how it can be exploited as a therapeutic target. Here, we aim to reveal how dysfunctions in the microRNA pathway contribute to tumorigenesis. Using bioinformatic analysis combined with experimental approaches, we will dissect gene malfunctions and downstream effectors. We predict that many mutations in microRNA-related genes have an impact on microRNA metabolism, and that defects in microRNA metabolism in tumoral cells may arise from the dysregulation of RNA-binding proteins. The knowledge gained with this project may pave the way to novel therapeutic targets.
MicroRNAs regulate gene expression during differentiation and development. The production of microRNAs requires precise cleavage by DROSHA and DICER1 and loading onto AGO proteins. This process is frequently aided by an array of cofactors. Different studies have indicated that impairments in the miRNA biogenesis pathway can play a central role in childhood tumors. Missense mutations in the RNaseIII domains of DROSHA are known drivers of Wilms tumors. Similarly, missense mutations in the RNaseIII domains of DICER1 are characteristic of a pediatric disorder known as DICER1 syndrome, which predisposes individuals to the development of both benign and malignant neoplasms. The hallmark tumors of the DICER1 syndrome are pleuropulmonary blastoma and Sertoli-Leydig cell tumor. Some of the other DICER1-related tumors include cystic nephroma, anaplastic renal sarcoma, Wilms tumor, thyroid carcinoma, gynandroblastoma, ciliary body medulloepithelioma, embryonal rhabdomyosarcoma and primary brain tumors such as pineoblastoma and pituitary blastoma. Moreover, somatic hotspot mutations leading to miRNA biogenesis defects have been reported in adult tumors. This project aims to understand the role(s) of miRNA dysregulation in pediatric tumors and how it can be exploited as a therapeutic target. ### STUDY DESIGN Compare the effects on miRNA/RNA profiles of mutations (in domains of interest) found in patient samples to control samples (normal samples, and patient samples with synonymous mutations). Identify novel variants with clinical relevance involved in miRNA metabolism dysregulation (Hatton et al. 2023). ### ANALYSIS PLAN 1. Identify patient samples bearing missense mutations from non-controlled data (https://gdc.cancer.gov/). 2. Identify patient samples bearing missense mutations from controlled data using novel tools (e.g., DeepVariant: Poplin et al. Nat. Biotech. 2018). 3. Identify tumoral samples with signatures suggesting miRNA metabolism dysregulation. 4.
Analyze microRNA processing from controlled miRNA-seq files using miRNA tools (e.g., QuagmiR, Bofill-De Ros et al. Bioinformatics. 2018). 5. Downstream analysis of microRNA metabolism using RStudio. 6. Meta-analysis of microRNA dysregulation and its impact on the transcriptome using non-controlled RNA-seq datasets. ### REFERENCES Hatton JN, Frone MN, …, Bofill-De Ros X, et al. “Specifications of the ACMG/AMP variant curation guidelines for analysis of germline DICER1 variants”. Human Mutation (in press) Bolotin, Eugene CELL DESIGN LABS, INC. Identification of CAR-T antigens for pediatric Acute Myeloid Leukemia Mar26, 2018 closed Recent successes in CAR-T therapy are extremely promising because of their sustained improvement in cancer prognosis and outright cures. However, the technology is limited by target antigens that might also be expressed in normal cell types. By comparing the AML dataset with normal cell type datasets from GEO and BLUEPRINT, we plan to identify novel antigens to use in immunotherapy. We have been involved in developing novel chimeric antigen receptor T-cell (CAR-T) therapies for Acute Myeloid Leukemia in the pediatric population. Unlike most current methods, our approach involves a novel CAR-T “switch” system, which functions only upon administration of a drug. We believe that this approach will enable T-cell therapy to be more effective with reduced side effects. Our system potentially allows us to extend the treatment to cancers, such as AML, that might not have unique antigens.
To this end, we plan to explore the following RNA-seq datasets: phs000218 (subset phs000465) TARGET: Acute Myeloid Leukemia (AML) (AML pediatric patients) phs000413 Analysis of Somatic Mutations in Pediatric AML FAB-M7 Subtype by Whole Transcriptome Sequencing phs000549 Clonal Evolution of Pre-Leukemic Hematopoietic Stem Cells Precedes Human Acute Myeloid Leukemia (6) We plan to combine and process RNA expression data from these datasets and from other publicly available normal blood cell lineage datasets, such as BLUEPRINT (http://dcc.blueprint-epigenome.eu) and studies available on GEO, in a consistent fashion by aligning the reads to the human genome and then counting their occurrence per genetic feature. We plan to utilize cloud computing for our processing pipelines. We plan to further verify potential antigens using independent proteomics datasets such as ProteomicsDB (processed separately), and to validate them using laboratory methods. We do not believe that this kind of analysis poses any risk to the participants. We hope that these discoveries will lead to rapid application of our technology in the clinic. Bortoluzzi, Stefania UNIVERSITY OF PADOVA Study of circular RNA in pediatric T-ALL Apr26, 2024 approved Circular RNAs (circRNAs) are attracting great interest in cancer research due to emerging evidence of their direct involvement in oncogenic mechanisms, their potential as biomarkers, and their relevance to the development of RNA-based cancer therapies. In T-cell acute lymphoblastic leukemia (T-ALL), an aggressive type of hematological cancer, data about massive circRNA dysregulation emerged recently, but knowledge about circRNA roles is still scanty. Our study aims to advance the knowledge about the circRNA contribution to disease relapse and chemoresistance and seeks to identify circRNA biomarkers of clinical significance, contributing to improved patient stratification and opening the way to innovative therapies in the future.
CircRNAs are versatile regulators of cell biological activities, acting through different mechanisms, and they can contribute to leukemogenesis. In T-ALL, we recently uncovered marked circRNA dysregulation at diagnosis and started to define circRNA signatures of molecular genetic subgroups. Now, our objectives are to: deepen our understanding of the biological bases of circRNA dysregulation in T-ALL, and of the links between alterations in splicing factor expression, aberrant RNA splicing and circRNA biogenesis; clarify the impact of circRNA expression variation on leukemia cell phenotype, to discover new disease factors with prognostic value; and describe circRNA expression variation along T-ALL evolution, to examine whether changes in circRNA expression can be related to relapse and/or resistance to therapy. To these aims, we request access to the NIH datasets dbGaP ID phs000464 ‘TARGET: Acute Lymphoblastic Leukemia (ALL) Expansion Phase 2’ and dbGaP ID phs001513 ‘Mechanisms of Chemotherapy Resistance in T-ALL’. All data will be stored on an isolated secure server. The T-ALL datasets may be combined in an integrated study. RNA-seq data will be used to quantify circular and linear transcript expression, to correlate it with patients' sociodemographic and clinical information and with genetic aberrations detectable from transcriptomic data, and to compare it with developing thymocytes as the normal counterpart. Findings will be validated in an independent pediatric T-ALL cohort and could prompt experimental studies on circRNA function in T-ALL. The proposed research will allow us to identify new biomarkers that could be used to improve the stratification of pediatric T-ALL patients, to discover new mechanisms contributing to disease development, relapse and resistance, and to open the way to new therapies for this childhood cancer. No cloud computing is envisaged.
The IT Director will ensure that all the NIH security best practices for Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy and all the institution's IT security requirements and policies are followed precisely. Boutros, Paul UNIVERSITY OF CALIFORNIA LOS ANGELES The Influence of Hypoxia on Cancer Mar11, 2020 approved The levels of oxygen vary from cancer to cancer, and cancers with little oxygen tend to be much more aggressive. These studies will attempt to understand how and why this happens by comparing tumours with differing amounts of oxygen within and between cancer types. A subset of cancers develop and/or progress in the presence of low levels of molecular oxygen, a condition called hypoxia. These hypoxic tumours typically show more aggressive clinical behaviour. Increasingly, drugs exist to target hypoxic tumours. We will apply existing signatures to predict the hypoxic status of individual tumours, and evaluate how their clinical, genomic, epigenomic, transcriptomic and proteomic features correlate with the inferred level of molecular oxygen. These studies will help us understand how hypoxia varies across cancer types, whether it is present in pediatric cancers, and whether it differs between adult and pediatric cancers. This will allow us to determine why these tumours evolve to be more aggressive, what the consequences are for other molecular characteristics of the tumour and how hypoxic tumours might best be treated. We will apply standard statistical methodologies for these analyses to both individual features and engineered ones (e.g. the total mutation load). This work will inform our ability to diagnose, prognose and treat hypoxic tumours across the lifespan. Boutros, Paul UNIVERSITY OF CALIFORNIA LOS ANGELES Mitochondrial Variation in Pediatric Cancer Feb26, 2020 approved Human cells contain two genomes: the nuclear genome is the one we are most familiar with because it is much larger and more prominent.
But the second genome, called the mitochondrial genome, contains a few genes that are essential to cancer evolution. We will evaluate mitochondrial mutations in different cancer types, and whether these give insight into how to treat the cancer or predict its clinical and molecular features. The mitochondrial genome is the second genome of mammalian cells. It is small (~16 kbp), but it can still be altered in cancer, both through changes in the number of copies and through individual mutations. Specific mitochondrial mutations may be targetable, and in some cancer types it is known that specific mitochondrial mutations preferentially occur in the context of specific nuclear ones. We will perform mitochondrial mutation and copy number detection in pediatric cancers using established pipelines (PMIDs: 29378198 and 28939825), and then determine how these features correlate with clinical, epidemiologic, genomic, epigenomic and transcriptomic features to evaluate whether they can serve as novel targets or biomarkers, or give insight into the evolution and management of pediatric cancers. Boutros, Paul UNIVERSITY OF CALIFORNIA LOS ANGELES Evolutionary Features of Pediatric Cancers Feb26, 2020 approved It is not well understood why some tumours evolve more rapidly than others. We will use computational techniques to infer the type and rate of evolution for each cancer, and then look at correlations to make predictions about why different tumours evolve in different ways, leading to different responses to therapy. Tumours typically start from a single founder cell, which sequentially acquires mutations that provide it a competitive advantage, ultimately leading to the oncologic phenotypes at diagnosis.
By analyzing DNA sequencing data, we can infer some of the properties of this evolution, such as the relative ordering of individual mutations or mutational processes, how specific driver mutations influence evolutionary rate and complexity, and whether specific clinical or epidemiologic phenotypes influence evolutionary processes. We will perform tumour subclonal reconstruction and identification of local hypermutational events using established techniques (e.g. PMIDs: 29681457 and 31919445), and compare the results to clinical, epidemiologic, genomic, epigenomic and transcriptomic features of the tumour. Boutros, Paul UNIVERSITY OF CALIFORNIA LOS ANGELES Sex Differences in Pediatric Tumours Feb26, 2020 approved We want to understand whether pediatric tumours arising in male and female children differ. This might give us insight into ways to improve patient management by personalizing care on the basis of a patient's biological sex (i.e. their chromosomes at birth). The clinical and molecular features of cancer vary with the sex of the individual in which they arise. These effects have been extensively quantified in adult cancers, but it is not clear whether sex differences have similar influences on pediatric tumours, and if so, how they influence driver genes, potential biomarkers and drug targets. We will evaluate the differences in genomics, epigenomics and transcriptomics between pediatric cancers arising in male and female patients, controlling for clinico-epidemiologic factors. We will fit multivariable models to individual features as well as to engineered ones (e.g. the total number of mutations) in each sample, using methods previously developed and published for adult cancers (e.g. PMID: 30275052). Boyes, Joan UNIVERSITY OF LEEDS Investigating the Role of the Aberrant V(D)J Recombination Reaction, Cut-and-Run in Pediatric Cancer Jan27, 2021 approved Our immune system generates millions of antibodies every day to fight a vast range of infections.
To encode such huge numbers of antibodies, gene segments are mixed and matched by breaking and re-joining DNA. During this process, a piece of DNA is “kicked out” of the genome as a by-product. We showed that this by-product associates with enzymes that can cut DNA and generates breaks in white blood cell DNA. We named this reaction “cut-and-run” and showed that there is a very strong link between the breaks caused by cut-and-run and the mutations present in leukaemia patients. More recently, we found that the by-product is present in some patients many years after it was first formed. This is truly unexpected and suggests that cut-and-run is much more dangerous than initially thought. Since the presence of this by-product in ALL patients could allow cut-and-run to continually cause mutations, more-difficult-to-treat clones and/or relapse may result. We request access to the TARGET data to establish whether this by-product can be used as a prognostic marker to inform childhood ALL treatments. We recently described a new mechanism, named “Cut-and-Run”, by which V(D)J recombination errors cause genome instability. Specifically, we found that the V(D)J recombination by-product, the excised signal circle (ESC), in complex with RAG1/2 causes off-target breaks in the genome of developing lymphocytes, and that these breaks correlate strongly with those found in ETV6/RUNX1+ acute lymphoblastic leukaemia (ALL) (Mol Cell. 74:584-597). Importantly, we also found that the ESC persists in ALLs for much longer than expected. The objective of the proposed research is to determine whether the presence and/or level of ESCs can be used to predict disease outcome in childhood ALL. We request access to WGS, WXS and RNA-seq data from the ALL TARGET study to analyse (a) what proportion of ALLs contain ESCs, (b) whether particular genomic mutations correlate with the presence of the ESC and (c) whether the presence of the ESC can predict disease outcome and relapse.
We will analyse the sequencing data from all ALLs and then stratify it by specific ALL subtypes, as characterised in the TARGET study. Specifically, we will use the WGS data to determine whether the ESC is present, using established in-house Python scripts that we have previously applied to this type of data: ESCs result from the excision and joining of the DNA between the gene segments undergoing V(D)J recombination, so ESC-derived read pairs in paired-end NGS data can be readily distinguished based on the orientation of, and distance between, the two reads. Furthermore, we will use WXS and RNA-seq data to determine the structural variations present in each case and will compare these with the presence of ESCs from the WGS data. Overall, we will determine whether the presence of the ESC is associated with a worse disease outcome, using the patient outcomes provided by the TARGET clinical data. The proposed research is consistent with the data use limitations for the requested dataset, as cut-and-run has only been linked to pediatric cancers. Furthermore, it aims to determine whether the ESC can be used as a prognostic marker for childhood ALL disease severity and thus whether it could inform more targeted, improved patient treatments. Bradley, Robert FRED HUTCHINSON CANCER RESEARCH CENTER Dysregulated RNA processing in pediatric cancers Aug07, 2019 approved Most pediatric cancers are poorly studied compared with their adult counterparts. We plan to search for specific molecular alterations that characteristically occur in pediatric cancer, with a goal of identifying new means of informing patient prognosis and identifying new treatments. We seek to identify dysregulated RNA processing in pediatric cancers of potential therapeutic relevance.
Work by my lab and others has revealed that dysregulated RNA processing, originating from somatically acquired genetic alterations as well as unknown changes in the post-transcriptional environment, is a direct contributor to, and even a driver of, many adult cancers. It is therefore reasonable to hypothesize that dysregulated RNA processing similarly contributes to the pathogenesis of pediatric cancers, albeit likely in a distinct manner. This hypothesis is supported by observations including the presence of focal alterations affecting key RNA processing genes such as MBNL1 that specifically occur in pediatric, but not adult, AML (reported by Bolouri et al., 2017, in the landmark paper describing the TARGET pediatric AML cohort). We seek to identify specific pediatric cancer types that are characterized by dysregulated RNA processing, search for transcriptional and post-transcriptional signatures of additional alterations that might be common in pediatric cancers (e.g., in addition to MBNL1 loss), test whether dysregulated RNA processing is a consequence of pediatric cancer-specific genetic alterations (e.g., affecting MBNL1 in pediatric AML), and determine whether dysregulated RNA processing can be used for both prognostic stratification and therapeutic purposes. We will search for genetic alterations affecting genes encoding RNA splicing factors and components of the RNA degradation machinery, search for characteristic transcriptional and post-transcriptional changes indicative of dysfunctional RNA splicing and/or degradation, and identify mechanistic links between genetic alterations and post-transcriptional dysregulation. We do not plan to combine the requested datasets from dbGaP with other datasets outside of dbGaP. Brady, Samuel ST.
JUDE CHILDREN'S RESEARCH HOSPITAL Gene expression profiles in pediatric cancers Oct12, 2023 approved We have found that cell lines derived from malignant rhabdoid tumors (MRT) can be divided into those with and without a unique gene expression profile. MRT cell lines with this unique gene expression profile are dependent on certain genes for survival, and may be responsive to specific drugs as a result. However, cell lines and patient tumors may differ. We plan to use phs000218.v22.p8 to determine the frequency of the unique gene expression profile (which we identified in cell lines) in actual patient MRT samples. This may help identify patients who will be more responsive to specific drugs. Further, we would like to use data from phs000218.v22.p8 to study DNA mutations and gene expression patterns associated with important disease features in childhood cancer, in order to identify possible future treatment approaches for pediatric cancer. We have performed gene expression analysis of publicly available Cancer Cell Line Encyclopedia (CCLE) data for malignant rhabdoid tumor (MRT) cell lines, and identified a distinct gene expression profile in a subset of MRT cell lines which is associated with a specific dependency profile in Dependency Map (DepMap) data from these cell lines. However, it is unclear how frequently this gene expression profile occurs in patient MRT tumors. Therefore, we are requesting phs000218.v22.p8 (which includes phs000470.v19.p8), since this dataset contains RNA-Seq data from patient MRT samples. Using this dataset, we will determine how frequently the gene expression profile identified in cell lines occurs in patient MRT samples. This may help to identify a subset of MRT patients who are susceptible to drugs that can target the effects of this unique gene expression profile. We are also mechanistically studying MRT cell lines bearing this gene expression profile.
Additionally, we will use data from phs000218.v22.p8 to study pediatric cancer gene expression and DNA mutation profiles which are associated with various disease states (such as poor survival, disease stage, or cancer subtype) and relapse (using the subset of samples in this dataset which are from relapsed cancers). This may help to identify new treatment approaches for pediatric cancer. BRAUN, BENJAMIN UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Impact of TP53 mutations in pediatric leukemia Apr13, 2023 closed This project centers on the potential role of a gene called TP53 in pediatric leukemias. This gene is associated with resistance to therapy in many types of cancer, but not all. We will test the hypothesis that mutations that inactivate TP53 predict relapse in pediatric leukemia. We will also examine the patterns of gene expression in leukemia cells to determine whether this is a more effective way to identify cases with impairment of the TP53 gene. TP53 mutations indicate an adverse prognosis in many types of cancer, including hematopoietic malignancies. However, this phenomenon has primarily been investigated in the context of adult cancers. Furthermore, the impact of TP53 mutations can vary significantly from one type of cancer to the next. We will specifically interrogate the role of TP53 in pediatric leukemias in this project. We will examine genotype information to identify cases with mutations disrupting the TP53 gene, and we will interrogate RNA sequencing data to derive gene expression signatures that correlate with TP53 status. We will ask whether genotype or gene expression correlates with clinical outcome. We hypothesize that TP53 mutation, or TP53 dysfunction as indicated by gene expression signature, will correlate with treatment failure.
Brorsson, Caroline QLUCORE AB Testing solution for classifying childhood ALL and AML into a number of subgroups based on gene expression Mar24, 2022 approved RNA sequencing data can be used to successfully classify acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) into distinct subtypes based on gene expression, with implications for diagnosis, prognosis, and therapeutic strategies. However, no tool is currently available for clinical use. Qlucore has developed software solutions that classify ALL into a number of distinct subtypes based on gene expression signatures from RNA sequencing data. Work is now ongoing to develop a model for AML. Data from TARGET will be used to verify these subtypes and will hopefully lead to accelerated implementation of this classification in the clinical setting. Acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) are characterized by several genetic alterations of great clinical diagnostic importance, as recognized by the current WHO classification. The great majority of ALLs and AMLs are characterized by gene fusions or display a distinct gene expression signature that can be used for classification purposes. Several gene fusions can be targeted with specific drugs, improving outcomes for patients, or used for risk stratification, making it critical to detect such alterations during the diagnostic workup. In this context, RNA sequencing is becoming an increasingly important diagnostic tool, as it can detect gene fusions in an unbiased manner and aid subtype classification based on distinct gene expression signatures. Qlucore has developed software solutions, including machine learning models, for classifying childhood ALL into a number of subtypes based on gene expression. The current version of the models classifies RNA sequencing data into six subtypes: High hyperdiploidy, ETV6::RUNX1 or ETV6::RUNX1-like, KMT2A(MLL)-rearranged, TCF3::PBX1, BCR::ABL1 or BCR::ABL1-like and DUX4-rearranged.
Work is now ongoing to develop a model for classifying AML as well. The gene expression signature classification is paralleled by visualizations of gene fusions detected from the RNA sequencing data, enabling, for example, BCR::ABL1 cases to be distinguished from BCR::ABL1-like cases based on identified gene fusions. We propose to analyze the RNA sequencing data from the TARGET ALL and AML datasets to verify the classification in an independent cohort. Specific tasks will be to generate BAM files aligned to the reference genome using the STAR aligner and gene fusion files from gene fusion callers (STAR-Fusion, Arriba and FusionCatcher). The results will hopefully lead to accelerated implementation of Qlucore’s solution in the clinic, with improved diagnostic, prognostic and therapeutic stratification of childhood ALL and AML. This analysis will not pose any risks to participants and will be consistent with the Use Restrictions for the requested datasets. We have no plan to combine the requested datasets with any other dataset outside of dbGaP; however, classification results from these samples will be combined with results from other samples for classifier performance validation. Bruck, Jehoshua CALIFORNIA INSTITUTE OF TECHNOLOGY Estimating Risk of pediatric cancer using mutation profiles in repeated regions. Sep05, 2018 closed Repeat regions comprise more than 50% of the DNA. In this study, we will use these regions to infer evolutionary properties of the DNA that can be used for predicting cancer risk in children. A computational test to predict cancer risk is inexpensive and also spares a child from extensive screening and imaging-based tests, which are not only costly but can also be intimidating at a very young age. We want to analyse the evolutionary characteristics of the genomes of children using the TARGET database. For this purpose, we will extract the tandem repeat regions from the DNA of these children to infer evolutionary properties.
We will aggregate the mutation rate and duplication number of each tandem repeat region. In this process, we want to determine whether there is a signal in the normal (non-tumor) DNA of these children that distinguishes them from healthy children and, if there is such a signal, identify the regions that carry information about it. We will use the blood-derived normal and the solid tissue normal DNA for different pediatric cancers in our analyses. We will first estimate the mutation rate and duplication number in each tandem repeat region and use these values as features to build classifiers for different pediatric cancers using machine learning methods. We also want to analyze how these values differ from the same features in adult cancer patients to gain insight into the evolutionary trajectory that the genome could take. Buchser, William WASHINGTON UNIVERSITY Mapping the Immune Landscape of Pediatric Cancers Nov07, 2017 closed The diversity of, and interactions among, immune cells in a tumor are directly linked to disease progression, patient survival, and relapse probability. We plan to rigorously analyze available TARGET datasets to map the landscape of the heterogeneous immune microenvironment across eight pediatric cancer types and its association with mutation burden, cancer progression, and clinical outcome. We will use robust statistical methods in this analysis to detect molecular features of the immune microenvironment of pediatric tumors. Insights from our project will contribute to our understanding of the role of immune infiltration in the development of pediatric cancer, thereby also aiding future efforts to develop more effective diagnostic and therapeutic tools. Childhood cancer remains the leading cause of death for children in developed countries. With the advent of immunotherapy as a potential treatment for childhood cancers, it is critical to first understand immune cell interactions in the tumor microenvironment.
The TARGET database contains multi-dimensional datasets that will enable systematic quantitative genomic approaches to decipher the interactions in the tumor microenvironment, specifically the immune microenvironment, in a large number of patients, furthering our understanding of patient immune responses to their tumors. Because TARGET provides a large number of tumor samples from a great breadth of studies, our conclusions will be reliable and generalizable. No study has performed a pan-cancer analysis to profile immune infiltration of pediatric cancers, but doing so will highlight the diverse associations of various cell subsets with outcome. The main goals of our study are to use TARGET RNA sequencing and whole-genome sequencing datasets and existing statistical analysis methods to: 1. Develop a unique immune signature for each pediatric cancer; 2. Construct a co-infiltration network to reveal interactions between different types of cells in the tumor microenvironment; 3. Study how mutation burden may be associated with immunogenic differences between tumors; 4. Identify new molecular subtypes of cancer based on their immune-specific gene expression profiles; and 5. Investigate how inter-tumor heterogeneity of immune infiltration is linked to clinical outcomes, cancer genetic variations, and upregulated cancer pathways. Identifying these genomic features of tumors will shed light on promising predictive and prognostic biomarkers for pediatric disease. Interactions between cells of the tumor microenvironment are being recognized as an important new factor in cancer development and progression. This project aims to improve our understanding of pediatric tumor immunology and heterogeneity, and thereby provide a wealth of information that will guide the development of new therapeutics for pediatric cancers that prolong life and reduce long-term side effects.
Burdach, Stefan TECHNICAL UNIVERSITY OF MUNICH BRCAness can indicate tumors susceptible to effective treatment with PARP inhibitors in osteosarcoma patients Oct22, 2020 expired Osteosarcoma is a deadly tumor developing in the bones of children and adolescents. Certain tumor cells cannot maintain their genome integrity because the genes responsible for this process are damaged by epigenetic and genomic alterations. Nevertheless, such tumor cells can switch to an auxiliary mechanism to repair their genome. However, certain drugs can block this secondary mechanism as well. When all pathways of genome repair are blocked, tumor cells enter a path of programmed cell death, i.e. the sequence of events leading to benign elimination of tumor cells without damage to surrounding tissues and organs. We will integrate TARGET-OS genomic, epigenetic, and expression datasets and will analyze the differences between tumors with and without genome instability characteristics. If the genome instability is high enough, therapy with a drug that blocks repair of the tumor genome can be suggested. In the course of the study, we will advance our knowledge of the mechanisms maintaining tumor cell genome integrity and of how to disrupt them efficiently. Potentially, this study will substantially improve treatment efficacy in osteosarcoma, since drug therapy will be tailored to the specific state of the tumor in a patient. Osteosarcoma (OS) is the most common primary malignant bone tumour. BRCAness is a phenotypic trait of tumors with defects in homologous recombination repair (HRR), resembling tumors with inactivation of BRCA1/2, that renders these tumors sensitive to poly(ADP-ribose) polymerase inhibitors (PARPi). Recently, it was shown that OS exhibits molecular features of BRCAness. The ultimate goal of this study is to identify BRCAness biomarkers in OS to predict whether treatment with PARP inhibitors will be effective in an OS patient.
We also aim to advance our understanding of BRCAness mechanisms in OS. This will aid clinical decision-making on administering PARPi to OS patients. In the course of this study, dbGaP TARGET-OS datasets will be used to identify mutations, copy number alterations, and epigenetic and gene expression changes in the genes participating in the HRR pathway in order to characterize BRCAness in OS. Initially, BRCAness in OS samples will be identified with already established genomic approaches, e.g. based on genomic "scars" and mutational signatures. This will allow us to assign OS samples to BRCAness-positive and BRCAness-negative groups, and to rank samples along the whole spectrum of BRCAness. With machine learning algorithms, we will discriminate between BRCAness-positive and BRCAness-negative OS classes. Genes in the HRR pathway will be evaluated at the gene expression and (epi)genomic levels in order to find new unambiguously quantifiable markers of BRCAness in OS. We will evaluate the clinical relevance of the generated BRCAness markers in relation to PARP inhibition. Potentially, this study will substantially improve the efficacy of PARPi treatment in OS, since this therapy will be tailored to the specific characteristics of the OS in a patient. During the analysis, we will combine the dbGaP TARGET-OS dataset with the pediatric OS dataset obtained in the course of the INFORM "INdividualized Therapy FOr Relapsed Malignancies in Childhood" project (Heidelberg, Germany). Combining the two pediatric datasets will not bring any additional risk for participants, since both are controlled datasets with the same level of protection. We will fully comply with the data usage terms and limitations set by NIH for the dbGaP TARGET project.
Bylesjo, Max FIOS GENOMICS, LTD Identification of specific genetic alterations in Neuroblastoma patients for a novel treatment Aug12, 2021 closed There is currently a deficit of therapies to treat Neuroblastoma, for which the survival rate of high-risk patients is less than 50%. Impact Therapeutics Inc. have developed a compound that previous studies indicate would benefit individuals with a subtype of Neuroblastoma. Therefore, verifying the genetic profile of patients with this subtype would enable studies to validate this targeted treatment. Data from the TARGET study will be used by Fios Genomics to characterize the genetic profiles of individuals with this subtype of Neuroblastoma. The output of this study may then be used to identify patients for recruitment in future clinical studies. Despite improved therapeutic strategies, the survival rates of high-risk Neuroblastoma patients are still less than 50%. A more effective and specific therapy is urgently needed for the treatment of stage 4 Neuroblastoma patients. Impact Therapeutics Inc. have taken an integrated approach to identify signalling pathways that are correlated with the compound response of Neuroblastoma tumor cells. Their data indicate that a group of children with type 4 Neuroblastoma, identified via gene expression profiling, may respond well to the compound. The goal of this study is to use TARGET WES data from 110 Neuroblastoma tumor samples with matched blood samples to characterize significantly altered genomic profiles in those patients. The data from the 110 WES samples will be analysed securely by Fios Genomics. Fios Genomics will apply their combined expertise in statistics, computational biology, bioinformatics and genomics to identify the differences in genomic variants between two groups of Neuroblastoma patients. Data QC will first be performed to remove any poor-quality samples.
Then, identified copy number variants will be assessed for their association with patient groups. The results would then be used to identify children with type 4 Neuroblastoma who may benefit from this novel treatment. This analysis will not create any risks to patients and will be consistent with the Use Restrictions for the requested datasets. We have no plan to combine the requested dataset with any other dataset outside of dbGaP. Cai, Jun BEIJING INSTITUTE OF GENOMICS Discovery and validation of Novel Genetic Risk Scoring in Pediatric T cell Acute Lymphoblastic Leukemia Jun15, 2023 expired Acute lymphoblastic leukemia (ALL) is characterized by several genetic alterations of great clinical diagnostic importance, as recognized by the current WHO classification. T-ALLs are characterized by gene fusions or display a distinct gene expression signature that can be used for classification purposes. In this context, RNA sequencing is becoming an increasingly important diagnostic tool, as it can detect gene fusions in an unbiased manner and aid subtype classification based on distinct gene expression signatures. We propose to analyze the RNA sequencing data from the TARGET ALL datasets to establish the classification model. Specific tasks will be to generate BAM files aligned to the reference genome using the STAR aligner and gene fusion files from gene fusion callers (STAR-Fusion, Arriba and FusionCatcher). We plan to identify a T-ALL expression pattern associated with treatment outcome by means of unsupervised clustering methods and aim to interrogate T-ALL patient transcriptomes to discover gene signatures associated with therapy resistance and relapse. T-cell acute lymphoblastic leukemia is a common malignant disease of childhood with a heterogeneous genetic background. However, few genetic markers with prognostic implications have been found. Specific genetic events and aberrant gene expression are very frequent in this patient group, affecting most of the patients.
The treatment of T-ALL is guided by distinct risk factors, affected by several biological parameters, most importantly the underlying genetics of the disease. Therefore, risk assessment is crucial for treatment strategy. Previous studies have developed genetic risk scoring methods, which assess only a small number of genes important in this disease. Our research group generated RNA sequencing data for a large cohort of pediatric T-ALL patients, combined with clinical and prognostic data. With the requested TARGET database, we would have the opportunity to establish the prognostic model on the TARGET dataset and validate our results on an independent dataset. If testing is successful, the model could further improve the individual risk assessment and personalized treatment of pediatric patients diagnosed with T-ALL. CALIFANO, ANDREA COLUMBIA UNIVERSITY HEALTH SCIENCES TARGET Pediatric Neuroblastoma Analysis May28, 2010 approved Recent advances in the integration of computational and experimental methods for the understanding of cancer biology are creating unique opportunities to improve our ability to identify biomarkers for early diagnosis and prevention, therapeutic targets that are highly specific to a cancer type, and compounds that inhibit these targets. Investigators at the Herbert Irving Comprehensive Cancer Center have pioneered these types of approaches and implemented a completely integrated computational-experimental approach to the study of cancer called Cancer Systems Biology. In this project, we will assemble molecular interaction networks for pediatric neuroblastoma and use cancer systems biology tools to identify and prioritize genetic/epigenetic alterations, candidate therapeutic targets, and biomarkers for this disease.
Our project will apply the investigators’ expertise in reverse engineering of cell regulatory networks and in their interrogation to (a) construct a fully integrated interactome for pediatric neuroblastoma, using both the TARGET data and additional data available through Dr. John Maris' laboratory at the Children's Hospital of Philadelphia, (b) use the pediatric interactome to integrate genetic, epigenetic, and functional assays towards the elucidation of molecular mechanisms associated with both disease progression and poor prognosis, (c) interrogate the interactome to identify candidate therapeutic targets, and (d) disseminate the algorithms, models, software tools/workflows, and integrated datasets to the research community using our established, caBIG®-compliant geWorkbench integrative analysis platform. This work will be done in collaboration with Dr. John Maris at the Children's Hospital of Philadelphia, who is an expert in pediatric neuroblastoma. The substantial experience of Dr. Maris' laboratory in pediatric neuroblastoma genetics and GWAS studies and our computational and systems biology expertise are perfectly complementary. This work is funded under a Grand Opportunity grant (CTD2) and by one of the caBIG In Silico Research Centers of Excellence (ISRCE). CALIFANO, ANDREA COLUMBIA UNIVERSITY HEALTH SCIENCES Targeting Master Regulator Dependencies in Pediatric Osteosarcoma Mouse Models Mar10, 2021 approved Outcomes in pediatric osteosarcoma (OS) remain poor and new treatment approaches are necessary. A novel Cancer Systems Biology approach identifies Master Regulator (MR) proteins that are necessary for tumor growth and predicts and prioritizes treatments that reverse their effects. This study will apply such an approach to evaluate new treatment strategies in pediatric OS.
We will use patient and patient-derived mouse model gene expression data, including this dataset, to describe the landscape of MR proteins in pediatric OS, predict targeted therapies, and evaluate their efficacy and mechanisms of drug resistance and action in the appropriate disease models. These findings will support a longitudinal study using sequencing data from the iCat2 precision oncology study to evaluate the added benefit of an MR-based treatment prioritization approach in identifying new treatment opportunities for pediatric patients with OS. Additionally, the efficacy of treatments predicted by the MR approach will be evaluated in patient-derived cell lines and animal models. This research focuses on advancing drug development in pediatric OS, so access to this and other pediatric cancer genomic data is essential to meet our objectives. Outcomes in pediatric osteosarcoma (OS) remain poor. Conventional molecularly targeted therapy approaches are limited due to the lack of actionable somatic mutations and significant tumor heterogeneity. Our lab has developed an alternative precision oncology approach, using systems biology methodologies to identify and target critical tumor dependencies – master regulator (MR) proteins. MR proteins are rarely mutated but integrate the effects of many genetic alterations, making them ideal therapeutic targets in cancers like OS. This project will apply MR-based therapy prioritization to evaluate new therapeutic strategies in pediatric OS. Utilizing pediatric OS gene expression datasets along with computational algorithms pioneered by our lab – ARACNe and VIPER – we will build a genome-wide regulatory network map of OS from which MR protein activity can be inferred from tumor gene expression profiles. Using a CLIA-certified test, OncoTreat, we will prioritize drugs for further in vitro and in vivo testing by their ability to revert MR protein activity and shut down the tumor’s transcriptional program.
We will assess preclinical models for similarity in MR profiles and drug predictions to patient samples with the goal of selecting optimal models for evaluation of drug efficacy as well as mechanisms of drug resistance and action in conjunction with study collaborators. We will use this dataset along with pediatric OS patient and PDX gene expression data from TARGET (phs000218.v22.p8), the Osteosarcoma Genomics Study (phs000699.v1.p1), and collaborators (CUMC, UCSF, MSKCC, DFCI) to characterize the pediatric OS regulatory network, describe the landscape of MRs in pediatric OS, and identify novel targeted therapies. This work can support novel biologically-relevant patient classification and risk stratification based on unique MR protein signatures in OS, as well as rapid prioritization of drugs and drug combinations for in vivo testing that can benefit large groups of patients despite their genetic heterogeneity. While we will be collaborating with other researchers who will perform in vivo validation of our findings in PDX models as well as prospectively in patients with OS enrolled on the GAIN consortium iCat2 study, we will only be sharing the results of our analysis of the dbGaP data rather than sharing the data itself. Only our lab at Columbia, in collaboration with Drs. Darrell Yamashiro and Jovana Pavisic from the Division of Pediatric Hematology, Oncology and Stem Cell Transplantation at Columbia will have access to and will be analyzing the dataset requested. As our research focuses on advancing drug development in pediatric OS, access to this and other pediatric cancer genomic data is essential to meet these objectives. CALIFANO, ANDREA COLUMBIA UNIVERSITY HEALTH SCIENCES Targeting Master Regulator Dependencies in Pediatric Osteosarcoma Nov04, 2020 approved Outcomes in pediatric osteosarcoma (OS) remain poor and new treatment approaches are necessary. 
A novel Cancer Systems Biology approach identifies Master Regulator (MR) proteins that are necessary for tumor growth and predicts and prioritizes treatments that reverse their effects. This study will apply such an approach to evaluate new treatment strategies in pediatric OS. We will use the Osteosarcoma Genomics project sequencing data to begin to describe the landscape of MR proteins in pediatric OS. These findings will support a longitudinal study using sequencing data from the iCat2 precision oncology study to evaluate the added benefit of an MR-based treatment prioritization approach in identifying new treatment opportunities for pediatric patients with OS. Additionally, the efficacy of treatments predicted by the MR approach will be evaluated in patient-derived cell lines and animal models. This research focuses on advancing drug development in pediatric OS, so access to this and other pediatric cancer genomic data is essential to meet our objectives. Outcomes in pediatric osteosarcoma (OS) remain poor. Conventional molecularly targeted therapy approaches are limited due to the lack of actionable somatic mutations and significant tumor heterogeneity. Our lab has developed an alternative precision oncology approach, using systems biology methodologies to identify and target critical tumor dependencies – master regulator (MR) proteins. MR proteins are rarely mutated but integrate the effects of many genetic alterations, making them ideal therapeutic targets in cancers like OS. This project will apply MR-based therapy prioritization to evaluate new therapeutic strategies in pediatric OS. Utilizing pediatric OS gene expression datasets along with computational algorithms pioneered by our lab – ARACNe and VIPER – we will build a genome-wide regulatory network map of OS from which MR protein activity can be inferred from tumor gene expression profiles.
Using a CLIA-certified test, OncoTreat, we will prioritize drugs for further in vitro and in vivo testing by their ability to revert MR protein activity and shut down the tumor’s transcriptional program. We will use the Osteosarcoma Genomics RNAseq dataset along with OS gene expression data from collaborating institutions (CUMC, UCSF, MSKCC) as well as TARGET to characterize the pediatric OS regulatory network, describe the landscape of MRs in pediatric OS, and identify novel targeted therapies. This work can support novel biologically-relevant patient classification and risk stratification based on unique MR protein signatures in OS, as well as rapid prioritization of drugs and drug combinations for in vivo testing that can benefit large groups of patients despite their genetic heterogeneity. While we will be collaborating with other researchers who will perform in vivo validation of our findings using OS patient-derived xenograft models as well as prospectively in patients with OS enrolled on the GAIN consortium iCat2 study, we will only be sharing the results of our analysis of the dbGaP data rather than sharing the data itself. Only our lab at Columbia, in collaboration with Dr. Darrell Yamashiro at Columbia, a pediatric oncologist with expertise in OS who will provide clinical guidance to our analysis, will have access to and will be analyzing the Osteosarcoma Genomics dataset requested. As our research focuses on advancing drug development in pediatric OS, access to this and other pediatric cancer genomic data is essential to meet these objectives. Cam, Margaret Co NIH NCI Immuno-Oncology Data Commons Aug08, 2023 approved The immuno-oncology data commons (IODC) at intramural NCI provides harmonized multi-omics datasets to share, access, and perform computations on research data. 
The controlled-access data from TCGA or other datasets will be preprocessed using our standard pipeline and the final processed data will be shared across the data commons members for various collaborative projects studying factors that regulate the immune system in the context of the tumor microenvironment and in its relation to other diseases. The goal of the immuno-oncology data commons (IODC) at intramural NCI is to create an ecosystem that gives researchers the ability to share, access, and perform computations on research data, including publicly available large-scale biomedical data using readily available methods. The research interests of members of the IODC relate to the study of immune cell regulation in different disease settings and more specifically in the context of cancer. A broad range of questions to be tested include: the study of basic functions and phenotypes of immune cells, the search for tumor-intrinsic and extrinsic factors associated with or regulating immune cell function within the tumor microenvironment, and novel predictors for patient response or resistance to immunotherapy. Many patient datasets, including those from The Cancer Genome Atlas (TCGA), will allow us to perform integrative analysis of the data in combination with other publicly available cancer datasets. To be able to optimally perform comparative and meta-analysis across multiple datasets (including locally generated ones) in the data commons, raw (Level 1) data will need to be reprocessed using a standard set of methods and properly filtered according to specific quality control metrics. At the CCR (Center for Cancer Research) Collaborative Bioinformatics Resource at intramural NCI, we have developed data analysis pipelines for several data types: exome-seq, RNA-Seq, and ChIP-Seq (https://ccbr.github.io/Pipeliner/), which we intend to apply across all genomics data to be archived for sharing in the data commons.
Thus, we will have fully harmonized multi-omics datasets which are shared among our collaborators at the IODC. To ensure patient data confidentiality, our collaborators will only have access to processed (Level 3) data. Further provisions for access to Level 1 data for any of our IODC collaborators will require an additional request for approval from dbGaP. There will be proper attribution to the original authors on publications resulting from the use of this dataset. For this particular dataset (Count Me In (CMI): The Angiosarcoma (ASC) Project (CMI-ASCproject), phs001931), we will be looking at allelic expression of paternally and maternally imprinted genes to see if their expression correlates with disease progression. For this, we will need to have access to raw sequencing data. Campbell, Catarina BROAD INSTITUTE, INC. Cancer Dependency Map: comparing cell models and tumors Jun27, 2024 approved For cancer patients to receive the most promising therapies, physicians need a complete roadmap for which tumors are susceptible to which drugs. Some of this information comes from clinical trials. In our project, we are taking a more systematic approach. We are working with cell models (bits of patient tumors that grow in the laboratory). We are testing each cell line to create a map of all of the “dependencies” of human cancer and we are sharing our results broadly, prior to scientific publication. To interpret the molecular information of the cell models, we need to be able to determine which aspects of their genetic makeup come from tumor features (somatic changes) and which are present in all cells in the patient's body (germline). We also need to understand how cancer cells compare to tumors in patients, and how new types of tumor models, such as those generated by the HCMI project, are similar to or different from existing cell lines.
Our proposal will use these data to help improve our ability to discover cancer targets that have minimal toxicity. The ability to predict vulnerabilities given the molecular features of a patient’s tumor is central to operationalizing cancer precision medicine. While sequencing of patient tumors is increasingly common, researchers, clinicians, and drug developers currently lack the ability to identify which somatically altered genes and variants are required for tumor survival and/or confer a requirement for other genes (synthetic lethality). Our proposed “Cancer Dependency Map” research project directly addresses this challenge. This effort currently encompasses over 1,000 genomically annotated cancer cell lines and organoid models, over 800 genome-wide CRISPR/Cas9 viability screens, and large-scale drug repurposing screens totaling over 1,000,000 data points. In addition, we have created a wide range of computational algorithms to discover dependencies and to infer them from molecular features. For the aspect of this study relevant to this dbGaP application, we plan to use GTEx genomic data for several purposes relevant to our study of the genomics of cancer models. For instance, we plan to use whole genome data from GTEx to construct a panel of normals (PoN) to enhance our ability to filter out false positive somatic mutation calls from cancer cell lines that do not have matched normals. We do not plan on combining data with other datasets and will not release data, so we do not expect any additional risks to GTEx participants. We will use data from TCGA to benchmark algorithms for identifying mutations in cancer cell lines. We will use data from TCGA and TARGET to compare the genomic landscape between tumors and cell lines. In this analysis, we aim to compare molecular subtype distributions between the datasets to determine where DepMap is sufficiently powered and where additional cell line models are needed in both adult and pediatric cancers.
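A panel-of-normals (PoN) filter of the kind described above can be sketched as follows. This is a simplified, hypothetical illustration rather than the applicants' pipeline: variants are keyed by (chromosome, position, ref, alt), and the rule that a site must recur in at least `min_samples` normal samples is an assumed threshold. Production PoN workflows also weigh allele fractions, depth and artifact models.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def build_panel_of_normals(
    normal_samples: Iterable[Set[Variant]], min_samples: int = 2
) -> Set[Variant]:
    """Collect variants seen in at least `min_samples` normal samples.

    `min_samples` is an illustrative, assumed recurrence threshold.
    """
    counts: Counter = Counter()
    for sample in normal_samples:
        # Each sample is a set, so a variant is counted at most once per sample.
        counts.update(sample)
    return {v for v, n in counts.items() if n >= min_samples}


def filter_somatic_calls(calls: List[Variant], pon: Set[Variant]) -> List[Variant]:
    """Drop candidate somatic calls that also appear in the panel of normals."""
    return [v for v in calls if v not in pon]
```

For a cell line without a matched normal, any candidate call that recurs across the normal panel is treated as a likely germline polymorphism or recurrent artifact and removed, while calls absent from the panel are kept for downstream review.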
Our analysis of molecular subtypes will focus on those with relevance to cancer outcome, primarily somatic mutations, mutational signatures, and gene expression classifications. We will also use data from TCGA and TARGET to examine the frequency in tumors of biomarkers that our models have identified as predictive of dependency in cancer cell lines, thus allowing us to identify potential patient populations for identified targets. Analyses including data from TARGET will serve to inform pediatric cancer cell line dependency, where data from adult cancers are insufficient and do not represent the disease types observed in pediatric cancer, and to assess how well cell lines represent pediatric tumors. We will not release raw data from these datasets, and our analysis will focus on somatic mutations; thus, it should not pose any additional risks to participants. In terms of our proposed use of HCMI genomics data, we will analyze such data to compare how well emerging patient-derived organoids and other models retain (or do not retain) genomic features of primary tumors. We will compare such data to existing cell line data within the Cancer Dependency Map dataset and also to primary tumor data from the TCGA dataset. We do plan on combining these data with other datasets (such as data emerging from patient models developed at the Broad). However, we will not release raw data from combined datasets, and our Broad genomics data will also be deposited into dbGaP as part of a distinct project. So we do not expect any additional risks to HCMI participants. Campbell, Peter SANGER INSTITUTE Temporal sequences of mutations in childhood tumours Aug14, 2018 closed Cancers are often caused by multiple mutations. The ordering of these in childhood cancer is unknown. In this project we will use cutting-edge analysis techniques to define the genetic “time code” of childhood cancer.
In this project, we propose to use cancer sequencing data to identify driver mutations and determine the temporal order of their acquisition. Advances in analysis techniques for cancer genomes have enabled the ordering of mutations. These have recently been applied on a large scale to adult cancers, but the precise ordering of mutations in childhood cancer remains unknown. While part of this work could be carried out on publicly available mutation calls, evaluation of the original sequencing data is frequently needed to distinguish genuine recurrent mutations from residual artifacts or population polymorphisms. We would also like to evaluate the impact of running other somatic variant callers, in particular those developed by our group, in the study of selection in cancer. The intended use of the data is purely academic and all findings will be published in peer-reviewed journals. Capasso, Mario UNIVERSITY OF NAPLES FEDERICO II Identification of low frequency and rare coding and non-coding variants associated to Neuroblastoma development Jan03, 2018 approved NB is an aggressive pediatric tumor. Its genetics are very complex, and deep knowledge of the molecular mechanisms that drive the disease is essential to better elucidate pathogenesis and identify new therapies. We hypothesize that heritable DNA variations, not yet identified, can influence NB tumorigenesis. WGS, WES and RNA-seq data from TARGET-NB will allow us to better understand how coding and non-coding variants (CV and NCV) could contribute to NB cancer risk. We will test this hypothesis by identifying low-frequency, rare and very rare variants in cancer predisposition genes (CPGs). The outcomes of this project will provide useful knowledge of NB genetics and will allow us to define innovative preventive, diagnostic and therapeutic approaches for children with NB based on their personal genetic background. We will strictly use these datasets for childhood cancer study and will not analyze them with other datasets outside of dbGaP.
Germline ALK/PHOX2B mutations account for the majority of familial Neuroblastoma (NB). Additional NB predisposition loci have been suggested by GWAS in sporadic NB. However, these variants still account for only a small portion of NB susceptibility. The aim of this project is to identify undetected low-frequency (MAF 0.01-0.05), rare (MAF<0.01), and very rare (MAF=0.00001) germline DNA variations in cancer predisposition genes (CPGs) associated with NB risk. We request access to TARGET-NB data to analyze whole genome and exome sequencing (WGS, WES) of NB cases and 157 NB RNA-seq samples. We will identify the pathogenic risk of coding and non-coding variants (CV and NCV) in CPGs. CVs will be given a deleteriousness score calculated as the proportion of prediction tools calling a particular variant damaging or probably damaging. NCVs will be selected by their regulatory potential, inferred by DNase I hypersensitivity mapping and chromatin immunoprecipitation sequencing (ChIP-seq) of H3K4me3, H3K4me1 and H3K27ac in NB cells, to identify variants in active promoter or enhancer regions; the “top NCVs” located in regulatory elements will be analysed in NB RNA-seq and matched variation data obtained by WGS of germline DNA. These data will be used to perform allele-specific expression analysis of NCVs (in LD with coding variants). Given the promising response of tumors with mutated DNA repair genes to PARP inhibitors, we will explore whether CPG variants render cells susceptible to homologous recombination deficiency in NB, and whether carriers could be candidates for evaluating the efficacy of PARP inhibitors. The results of this project will allow us to identify potential genetic pathways with major implications for NB pathogenesis and open the way to developing attractive targets for drug design. We plan to use the TARGET NB data for neuroblastoma research only; these data will be used with all necessary precautions and we will strictly adhere to all instructions/guidelines issued by the TARGET database.
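The coding-variant deleteriousness score and the MAF bins described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the applicants' pipeline: the tool names and label strings are hypothetical, reading "very rare" as MAF ≤ 0.00001 is an assumption (the statement gives only MAF=0.00001), and the "common" bin above 0.05 is added only to make the function total.

```python
from typing import Dict

# Labels counted as "damaging" for the score (assumed label vocabulary).
DAMAGING_LABELS = {"damaging", "probably_damaging"}


def deleteriousness_score(predictions: Dict[str, str]) -> float:
    """Fraction of prediction tools calling a variant damaging or probably damaging.

    `predictions` maps a (hypothetical) tool name to that tool's label.
    """
    if not predictions:
        raise ValueError("at least one tool prediction is required")
    hits = sum(1 for label in predictions.values() if label in DAMAGING_LABELS)
    return hits / len(predictions)


def maf_class(maf: float) -> str:
    """Bin a germline variant by minor allele frequency (MAF)."""
    if maf <= 1e-5:  # assumption: "very rare" read as MAF <= 0.00001
        return "very rare"
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:  # "low" frequency: MAF 0.01-0.05
        return "low"
    return "common"  # assumption: outside the bins targeted by the project
```

For example, a variant called damaging by two of four hypothetical tools scores 0.5, and a variant with MAF 0.02 falls in the low-frequency bin.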
All results will be made freely available to the scientific community. Carter, Hannah UNIVERSITY OF CALIFORNIA SAN DIEGO Characterizing the role of genetic interactions for predisposition to childhood cancers Nov20, 2017 approved We are requesting access to childhood cancer datasets with exome sequencing data to investigate interactions between the inherited genome and childhood tumors. This study builds on our previous findings of such interactions in adult tumors; however, due to significant differences between childhood and adult cancers, we expect that pediatric tumor-specific interactions will exist. We plan to use our findings to inform risk stratification to identify children at high risk of developing tumors. This research has the potential to improve early diagnosis and effective treatment by identifying cancer predispositions in children. Cancer is associated with the accumulation of a series of oncogenic genomic variations over time. In childhood cancers, this process is expedited by the presence of inherited genetic risk variants. We have previously developed computational approaches to study interactions between the inherited genome and the tumor genome using DNA sequencing data. These approaches were used to identify novel risk loci in adult cancers. The objective of this proposal is to characterize interactions between the germline and tumor genomes that contribute to development and progression of pediatric tumors, which are expected to differ from those found in adult disease. These interactions will provide new insights into pediatric cancer risk and progression and may support development of child-specific biomarkers to identify children at risk of developing tumors.
The accession numbers of the requested cohorts are: phs000463.v15.p7, phs000464.v15.p7, phs000465.v15.p7, phs000466.v15.p7, phs000467.v15.p7, phs000468.v15.p7, phs000470.v15.p7, phs000471.v15.p7, phs001165.v1.p1, phs000352.v1.p1, phs000543.v1.p1, phs000504.v2.p2, phs000804.v1.p1, phs000768.v2.p1, phs000235.v13.p2, phs000868.v1.p1, phs000508.v2.p1, phs001054.v1.p1, phs001072.v1.p1, phs000720.v2.p1, phs001026.v1.p1, phs001327.v1.p1, phs001228.v1.p1, phs001437.v1.p1, phs001800.v1.p1, phs001683.v1.p1, phs000900.v1.p1, phs001052.v1.p1, phs001436.v1.p1, phs001738.v1.p1, phs001714.v1.p1, phs000218.v18.p7 and phs000159.v10.p5. Use of any samples designated for disease-specific research will be limited to analysis of the relevant disease type. All research findings from this study will be shared with the public and broader research community through peer reviewed publication. CASTELO, ROBERT POMPEU FABRA UNIVERSITY Prediction of therapeutic vulnerabilities and resistance from tumor transcriptomes Jan23, 2018 closed Tumors have been characterized through their mutational profiles, which are informative for prognosis and therapy. However, pediatric tumors do not present the same mutation signatures and actionable alterations as adult tumors; hence, new strategies for determining therapeutic vulnerability and resistance are necessary. We propose to decompose the molecular profiles of pediatric tumors in terms of similar patterns from multiple cell lines that have been pharmacologically characterized. This work will allow us to determine potential therapeutic vulnerabilities and drug resistances across the entire spectrum of childhood leukemias, and will contribute towards improving the prognosis and clinical management of patients using non-invasive RNA-based biomarkers in blood cancers. The reversion and targeting of cancer-specific splicing patterns provide effective therapeutic strategies.
Additionally, splicing-modulating drugs are effective in tumors with splicing factor mutations as well as in MYC-driven tumors in the absence of these mutations, and acquired drug resistance has been linked to splicing alterations. All this indicates that there may be general splicing signatures that are predictive of therapeutic response and drug resistance. These are specifically relevant for pediatric tumors, which do not present the same mutation signatures and actionable alterations as adult tumors. We plan to use multiple pharmacologically characterized cell lines to predict therapeutic vulnerabilities of pediatric tumor samples through sample decomposition using splicing isoforms. Sample decomposition methods deconvolve a mixed signal into its individual component signals. In this case, we will attempt to decompose the expression and splicing profiles from patient samples into the signals from multiple cell lines. As these cell lines are pharmacologically characterized (we will use data from the GDSC project http://www.cancerrxgene.org/), we will determine whether the tumor contains a sensitive or resistant molecular profile for each of the characterized compounds. This will allow us to propose therapeutic vulnerabilities and resistance markers for individual tumors. Most leukemia patients develop drug resistance during treatment. Immune therapy holds new promise for a cure, but it is only applicable to a subset of B-cell ALL patients and its effectiveness is limited. Splicing factor mutations in hematological diseases have spawned intense efforts to find drugs targeting these factors. Although these mutations do not appear in all blood cancers, similar splicing signatures may occur due to alterations in chromatin factors, which are frequent in leukemia, or due to MYC amplification. Our approach will allow us to determine potential therapeutic vulnerabilities and drug resistances across the entire spectrum of childhood leukemias.
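The sample decomposition idea above can be illustrated with a small non-negative least-squares (NNLS) fit, one common way to express a mixed profile as a non-negative combination of reference profiles. The statement does not specify the applicants' decomposition method, so this is only a sketch under that assumption: the cell-line profiles are hypothetical toy vectors, and the solver is plain projected gradient descent written with the standard library.

```python
from typing import List


def decompose(mixture: List[float], references: List[List[float]], iters: int = 5000) -> List[float]:
    """Estimate non-negative weights w with sum_k w[k] * references[k] ~= mixture.

    Minimizes 0.5 * ||sum_k w_k * ref_k - mixture||^2 by projected gradient
    descent, clipping each weight at zero after every step (a simple NNLS solver).
    """
    k = len(references)
    n = len(mixture)
    # Step size 1/L; the squared Frobenius norm of the reference matrix is an
    # upper bound on the Lipschitz constant of the gradient, so this step is safe.
    lipschitz = sum(x * x for ref in references for x in ref)
    step = 1.0 / lipschitz
    w = [1.0 / k] * k
    for _ in range(iters):
        # Residual: current reconstruction minus the observed mixture.
        resid = [sum(w[j] * references[j][i] for j in range(k)) - mixture[i] for i in range(n)]
        # Gradient with respect to each weight: <ref_j, residual>.
        grad = [sum(references[j][i] * resid[i] for i in range(n)) for j in range(k)]
        w = [max(0.0, w[j] - step * grad[j]) for j in range(k)]
    return w
```

In this framing, each recovered weight indicates how strongly a tumor profile resembles a given reference cell line, and the drug-response annotations of the heavily weighted lines (for example, from GDSC) would then suggest candidate sensitivities or resistances.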
This work could improve the prognosis and clinical management of patients using non-invasive RNA-based biomarkers in blood cancers. Cervera, Jose RESEARCH FOUNDATION OF/HOSPITAL LA FE SCREENING AND VALIDATION OF MUTATIONS IN NON-CODING REGIONS AND IN GENES OF THE SPLICEOSOME IN ACUTE MYELOBLASTIC LEUKEMIA (AML) May08, 2019 approved Myeloid Neoplasms (MNs), including Acute Myeloid Leukemia (AML), are the most frequent blood tumors in adults and among the most common in childhood. To date, the study of mutations has focused on exons, which are mainly the protein-coding regions of genes. We also aim to investigate mutations in introns, the “non-coding” regions. Although these have been associated with several diseases, including cancer, they are still unexplored in MNs. Mutations in introns mainly lead to errors in splicing, one of the cellular processes responsible for the correct translation of genetic information into proteins. Likewise, we will study mutations within spliceosome genes, which encode the machinery performing the splicing. The exons and introns of 57 genes will be sequenced (by NGS technology) in 200 patients. The identified mutations will be analyzed by bioinformatic predictions and experimentally by RNA sequencing and cell-line models. Data obtained will be correlated with clinical and biological phenotypes. The effect of each mutation will be studied regarding its prognostic, diagnostic, and therapeutic potential in both pediatric and adult patients with AML or Myelodysplastic Syndromes, because marked differences between pediatric and adult patients have been reported. In the study of the molecular basis of myeloid neoplasms (e.g. Acute Myeloid Leukemia, AML, and Myelodysplastic Syndromes, MDS), mutations within non-coding regions remain largely unexplored. Such variations could lead to transcriptional and splicing alterations.
Additionally, despite their potential as therapeutic targets, the role of mutations affecting components of the spliceosome has not been deeply explored in pediatric and adult cohorts. This methodological work will advance genetic knowledge of the role of germline and somatic coding and non-coding mutations, as well as spliceosome mutations, in pediatric and adult patients. We will study these two cohorts due to the marked differences reported. The study is designed to analyze the entire genomic sequences of 57 genes relevant for myeloid neoplasms, including those of the spliceosome, using a targeted NGS approach in 200 MDS and AML patients. The effect of the putative mutations will be evaluated in silico to prioritize variants, and validated by RNA sequencing and in cell-line models to evaluate their effect on leukemic progression. For the analysis, we request access to the TARGET-AML (Blood) and TCGA-LAML (Bone marrow) datasets, among others. Since the understanding of mutations in non-coding regions, as well as their frequency, is still limited, access to larger databases is needed to derive consistent and robust clinical associations for rare mutations and to confirm results obtained in the study. The genomic data obtained will be correlated with clinical and biological phenotypes. Each identified mutation will be evaluated regarding its diagnostic, prognostic, and therapeutic potential in both pediatric and adult AML and MDS cohorts, due to the expected differences. Our dataset will be analyzed separately. Requested datasets will only be used as controls during methodological development, for validation purposes and data comparison. Data use limitations will be strictly observed. Access to requested databases will be restricted to the soliciting group. Results obtained from the analysis of these data will be shared with the scientific community (e.g. publications, congresses).
The project may be expanded to 700 samples, obtained from the PETHEMA collaboration (Spanish AML protocols coordinated by our group) as part of a network of collaborations at both the national and international level (e.g., GIMEMA, AMLSG, HOVON, European LeukemiaNet, CETLAM, GETH, SEHOP, and GESMD). Chanock, Stephen J NIH Pursuit of Genetic susceptibility to cancer Mar06, 2014 approved This project is intended to examine genetic variation in the human genome to determine whether specific variants are associated with cancer risk. We will compare the distribution of variants between cases and controls to determine whether there are at-risk alleles. We continue to examine SNP arrays and sequence data to determine whether detectable clonal mosaicism involves smaller chromosomal fragments in normal tissue, as a marker of genomic instability and possible risk for complex diseases. Recent studies have shown genetic mosaicism in germline DNA in large GWAS. We are currently analyzing germline blood, normal adjacent tissue, and tumor DNA with both SNP arrays and next-generation sequencing to accomplish the following: 1. determine the frequency of large mosaic events (> 2 Mb), develop methods for detecting large events with NGS, and develop methods for detecting smaller events in blood and normal tissue. We have begun to develop methods for identifying mosaicism in exome data and need to continue this critical methodological work. We are examining germline susceptibility to cancer separately for adult and pediatric cancers (restricting TARGET to pediatric cancers only). As described above, we are examining the TARGET data in relation to published GWAS data sets. Chavez, Lukas UNIVERSITY OF CALIFORNIA, SAN DIEGO Molecular characterization of brain tumors. May03, 2022 closed We are analyzing the genomic mutations that lead to brain tumors in children.
We have previously performed a comprehensive analysis of genetic alterations in a pan-cancer cohort including 961 tumours from children, adolescents, and young adults, comprising 24 distinct molecular types of cancer (Groebner et al., Nature 2018 https://www.ncbi.nlm.nih.gov/pubmed/29489754). Our results showed that childhood cancers differ substantially from adult tumors in their molecular alterations and mutational landscapes. We have now developed new and improved tools for the analysis of whole genome sequencing data. We aim to perform a similar pediatric pan-cancer analysis focused on the discovery of novel structural variants and extrachromosomal circular DNA (ecDNA). We therefore need to analyze pediatric tumors, because adult tumors cannot inform us about the frequency and diversity of ecDNA in the different types of childhood cancer. Identifying a pediatric tumor as ecDNA-driven will improve diagnostics and has the potential to nominate these patients for treatment with drugs that interfere with pathways required for ecDNA formation and propagation. Chen, Junren INSTITUTE OF HEMATOLOGY AND BLOOD DISEASE HOSPITAL, CHINESE ACADEMY OF MEDICAL SCIENCES Validation of risk features predisposing to relapse in pediatric acute lymphoblastic leukemia by using TARGET ALL dataset Apr06, 2023 approved Relapse after remission can occur in pediatric acute lymphoblastic leukemia (ALL), even in patients classified as low-risk at diagnosis. Identifying genetic risk features that predispose to relapse is important for instituting a proactive treatment strategy for patients with high relapse risk. Our primary objective is to compare mutational landscapes across populations and validate candidate relapse-associated risk signatures for pediatric low-risk ALL in independent cohorts.
Our findings from the analyses undertaken in this project will be published in a scientific journal to benefit the broader ALL research community. Acute lymphoblastic leukemia (ALL) is the most common malignancy in children. Patients with clinically defined low-risk disease can also relapse after remission, often with poor outcomes. We have investigated the genomic landscape of low-risk pediatric ALL in more than one hundred Chinese patients through exome sequencing, transcriptome sequencing, and SNP arrays of their matched primary, remission, and relapse samples, and have identified candidate signatures composed of somatic genetic variants and phenotypic characteristics that predispose to relapse. We would like to compare the mutational landscapes of the U.S. and Chinese cohorts and to conduct external validation of the relapse-associated risk signatures and key genetic mutations originally discovered in the Chinese cohort; the data of pediatric ALL patients enrolled in the TARGET program would serve as an invaluable independent cohort. We kindly request access to the genetic mutation data (based on WGS/WXS), RNA expression data, and clinical characteristics, including relapse and survival outcomes, of the ALL cases of the TARGET-ALL-P2 project. We plan to i) use the obtained somatic mutation data to compare the mutational landscape of Chinese pediatric ALL patients classified in the low-risk group with that of the TARGET ALL cohort, ii) combine the somatic mutation data with clinical features such as treatment regimen, minimal residual disease, and patient outcomes in an integrated analysis to verify the relapse-associated risk signatures and investigate associations with chemotherapy response, and iii) verify the key somatic genetic variants predisposing to relapse and investigate the corresponding alterations in RNA expression levels.
We anticipate that our study will generate an algorithm for assessing the risk of relapse and treatment resistance in pediatric ALL patients. The Principal Investigator (PI) is permitted to conduct independent research, and the requested TARGET data will be used solely by the PI’s research group. We may combine the dbGaP dataset with the dataset generated at the PI’s research institute (Institute of Hematology and Blood Disease Hospital, Chinese Academy of Medical Sciences, IHCAMS) for integrated analyses, but not with third-party data. Our use of TARGET data will not create any additional risks to study participants. The obtained dataset will be stored securely and used appropriately and solely for the research project described in the Research Use Statement, subject to any use limitations required by the dbGaP administrative office. When we write up our research results and submit them for publication, we will appropriately acknowledge our use of the requested dbGaP data in our manuscript; in addition, we will deposit the multi-omics data generated at IHCAMS in a publicly accessible database at the National Genomics Data Center of China. Chen, Kenneth UT SOUTHWESTERN MEDICAL CENTER Genomic determinants of cancer May19, 2022 approved We are interested in a special class of cancers that are caused by changes in how genes are regulated. Specifically, the regulatory molecules we are interested in are known as microRNAs. These molecules normally "turn off" certain genes, but some cancers lose the ability to make them. While most cells would not be able to survive this, certain cancer cells somehow thrive under these conditions. In this project, we will investigate how the loss of microRNA production causes childhood cancer. We are interested in understanding how cancer-driving mutations in microRNA processing genes (such as DICER1, DROSHA, and DGCR8) cause cancer.
These mutations largely arise in pediatric developmental cancers, such as Wilms tumor, rhabdomyosarcoma, pleuropulmonary blastoma, and pineoblastoma. These tumors appear to arise from embryonic progenitors that have lost the ability to differentiate but continue to proliferate outside the embryonic environment. MicroRNAs are thought to be required for differentiation, and mutations in microRNA processing genes may impair differentiation. However, it is unknown how loss of these differentiation microRNAs affects proliferation or other hallmarks of cancer. We will identify tumors with mutations or copy number changes in these genes and examine how these changes affect gene expression. We expect that such mutations will cause de-repression of microRNA target genes, and we will examine which microRNA target genes are most affected. We will therefore primarily use the pediatric TARGET data to understand these gene expression differences, and we will use adult TCGA data as a control dataset. In addition, we will analyze tumor sequencing data collected by the CCDI MCI (Molecular Characterization Initiative) to identify the frequency of targetable mutations in tumors with DROSHA/DICER1 mutations. Specifically, we will examine tumors associated with DICER1 and DROSHA mutations that were collected by MCI as "rare tumors" (e.g., Sertoli-Leydig cell tumors, PPB, thyroid cancer, etc.) or CNS tumors (pineoblastoma). CHEN, LI FUDAN UNIVERSITY Cis-eQTL Target Genes Associated with P-AML Patient Survival Dec16, 2021 approved Analysis of expression quantitative trait loci (eQTL) provides a means of detecting transcriptional regulatory relationships at a genome-wide scale. The project will evaluate and compare the eQTL landscape in bone marrow of pediatric AML patients from China and from the TARGET program. The project aims to identify the mechanism (SNP-mediated modulation of RNA expression) underlying the pathology of P-AML.
Expression quantitative trait locus (eQTL) studies illuminate the genetics of gene expression, and researchers who mapped eQTLs with WGS and RNA-seq found that lead eQTL variants called with WGS were more likely to be causal (Nat Genet 49, 1747–1751 (2017)). We are using whole-genome sequencing (WGS) and transcriptomes from 45 pediatric individuals with Acute Myeloid Leukemia (P-AML) to describe the eQTL landscape in bone marrow of pediatric AML patients from China. Here, we request access to BAM files from whole transcriptome (RNA-Seq) and whole genome (WGS) sequencing data of Acute Myeloid Leukemia (AML) samples from the TARGET cancer program. These datasets contain overlapping samples that we will define and use for the following aims: Aim 1: To perform cis-eQTL analysis in pediatric AML samples from TARGET and compare the identified results with the findings in our lab to determine the population specificity of significant eQTLs for P-AML. Aim 2: The eQTLs detected and validated in Aim 1 will be used for colocalization analysis of P-AML GWAS and eQTL signals, in order to detect target genes for P-AML. These loci will be further explored in P-AML association analysis using blood samples collected from 200 P-AML individuals and matched controls. Chen, Xiang ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Analysis of Bilateral Wilms Tumor Aug18, 2017 approved We are performing a comprehensive genomic characterization of patients who develop Wilms tumor, the most common kidney cancer of childhood, in both the right and left kidneys at the same time (bilateral Wilms tumor). We are requesting access to an established set of data from patients with Wilms tumor to support or further explore our findings. Wilms tumor is the most common kidney cancer of childhood, and 5% of patients present with synchronous bilateral disease.
Bilateral disease is strongly suggestive of an underlying genetic or epigenetic predisposition to tumor formation. We are conducting a genomic analysis of a cohort of 24 patients with bilateral Wilms tumor, which will include somatic DNA and RNA from both kidney tumors and from all independent tumor loci. It will also include germline DNA from each patient. We plan to perform whole exome sequencing, SNP arrays, methylation studies, and RNA-seq on the somatic DNA and RNA. We plan to perform whole exome sequencing, SNP arrays, and methylation studies on the germline DNA. Our hypothesis is that genetic or epigenetic variants underlie bilateral WT predisposition or development and will also elucidate the association between structural birth defects and bilateral WT. We expect these variants to be related to early kidney development, common among distinct tumor loci, and potentially present in the germline. As a corollary to this hypothesis, we also expect bilateral WT to have unique secondary oncogenic driver mutations or epigenetic variants that differ between independent tumor loci. We are requesting access to all data contained in dbGaP from patients with a diagnosis of Wilms tumor who were part of the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) study. We are requesting access to both the discovery and validation cohorts in TARGET. We plan to use the TARGET data as a validation cohort to determine the frequency of any genetic or epigenetic variants discovered in our bilateral WT cohort and their possible associations with changes in gene expression. At this time, we do not plan to combine the TARGET data with additional established data, but we do plan to use it as a validation set.
Chen, Xin ZHEJIANG UNIVERSITY Integrating TARGET neuroblastoma data to compare genomic alterations of neuroblastoma across ethnicity population Jul28, 2016 closed Neuroblastoma (NBL) is the fourth most common cause of death from pediatric cancer and is a significant health burden. Improving treatment outcomes depends on a better understanding of the cancer-causing changes in genetic material. We hypothesize that population-specific genetic differences could be associated with cancer development, in addition to non-biological factors such as environment and socioeconomic status. This project aims to investigate whether background genetic differences exist across ethnic groups and what role these differences play in the development of NBL. The results of this project will help clarify the genetic basis of cancer development in specific populations, which may help us discover new biomarkers or therapeutic targets for those populations. Previous research has shown that racial disparities exist in breast and prostate cancer treatment and outcomes. We hypothesize that population-specific genetic differences may also be associated with neuroblastoma (NBL) disparities. This project will test this hypothesis. To perform this research, we request access to simple nucleotide variation data as well as whole genome and whole exome sequencing data (raw data and BAM files). As the majority of our requested datasets are from Caucasian patients, we will compare them with other NBL datasets of Chinese patients from our affiliated hospitals. We will identify genomic alterations in each dataset independently from the whole genome/exome sequencing data and cross-verify them with the simple nucleotide variation data. We expect to identify cancer-causing genes and also plan to apply our computational method, GSLA, to locate cancer driver pathways in different populations. Principal component analysis will be performed on the variants detected in different populations.
By doing so, we hope to provide the NBL research community with an in-depth understanding of the significance of genomic variation across human ethnic groups. In addition, we would like to evaluate whether transcript abundance is associated with stages of cancer development or with prognosis. A variety of computational tools will be used to quantify transcript abundance, and we will compare their efficiency in detecting functionally significant genes. Therefore, we also request access to the raw transcriptome sequencing data. In summary, by establishing a genetic panel of NBL across populations, we may improve the diagnosis and prognosis of NBL based on patients' genetic background, identify potential driver genes or pathways for drug discovery, and provide clues for the discovery of biomarkers and therapeutic targets. All analyses will strictly adhere to the guidelines and restrictions outlined for use of these datasets. NBL is one of the most common cancers in children. We will use these datasets strictly for the study of childhood cancer and will not analyze them together with adult data. Cheng, Lijun INDIANA UNIV-PURDUE UNIV AT INDIANAPOLIS Genomic Structure Variation Detection for Different Subtypes of Sarcomas Nov21, 2016 expired The research will compare genomic mutation variation among different sarcoma types in adults and children. The project will enhance our understanding of genomic mutation variation in patients with different sarcoma subtypes. It may also provide important potential drug recommendations for clinical sarcoma treatment based on genomic information. Research objectives: To compare the genomic variation among our pediatric sarcoma cancer cells, public genome sequences from childhood cancer studies and from adult sarcomas will be collected.
We will analyze whole-genome, whole-exome, and transcriptome sequencing data to characterize the landscape of somatic alterations in tumor/normal pairs across different pediatric sarcoma subtypes and adult sarcomas. In addition, we will compare these public sequence data with our clinical tumor samples and combine them with our clinical drug treatment data. Study design: we will focus on genomic variation in pediatric sarcomas, especially the soft tissue sarcoma (Rhabdomyosarcoma), the bone sarcoma (Osteosarcoma), Clear Cell Sarcoma of the Kidney (CCSK), and the malignant soft tissue and bone sarcoma (Ewing's Sarcoma) subtypes. Data for the four sarcoma subtypes will be collected from dbGaP under Study Accessions phs000720.v1.p1 (Rhabdomyosarcoma), phs000768.v1.p1 (Ewing's Sarcoma), phs000466.v13.p6 (Clear Cell Sarcoma of the Kidney (CCSK)), and phs000468.v13.p6 (Osteosarcoma). (phs000466.v13.p6 and phs000468.v13.p6 belong to the National Cancer Institute (NCI) TARGET project, for which authorized data access is provided through dbGaP Study Accession phs000218.v16.p6.) For adult sarcomas, The Cancer Genome Atlas (TCGA) provides adult soft tissue sarcoma sequences under dbGaP Study Accession phs000178.v9.p8. We will perform genomic variant calling systematically on these sequence datasets and detect genomic variation across children and adults and across sarcoma subtypes. Analysis plan: large-scale data storage and analysis will be performed on our local high-performance computing clusters. A general pipeline for base calling, sequence alignment/mapping, and genomic variant calling will be evaluated across the different datasets and test platforms. The sequence mutation variance and gene expression profile of each sarcoma subtype will be obtained. Systems biology methods will be used to select drug targets. Cherniack, Andrew BROAD INSTITUTE, INC.
Molecular analysis of CTSP Data in coordination with the Genome Data Analysis Network Jul11, 2023 approved As part of a collaborative effort of the NCI's Genome Data Analysis Network, we propose to use tumor genomic data from clinical trials to search for cancer-causing changes. We will combine these molecular data with other data sets to gain more in-depth insights into cancer causes and drug efficacy. This data set will also help us improve algorithms for cancer genome analysis. For our work with the HCMI AWG: The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, validate that the models preserve the biological characteristics of the parent tumor from which they were derived, and show the scientific community how these models could be used in functional genomics research. The HCMI AWG intends to use the TARGET dataset as a reference dataset to characterize the HCMI models and associated tumors from pediatric cases. The findings from the AWG would be valuable for the research community, as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. As part of NCI’s Genome Data Analysis Network (GDAN), we will analyze RNA-seq, whole genome, and whole exome sequencing data from The Cancer Genome Atlas (TCGA), the Clinical Trial Sequencing Project (CTSP), ALCHEMIST, and other NCI data sets in collaboration with all GDAN and CPTAC analysis working groups (AWGs). Our analyses will include, but are not limited to, algorithms developed by our group to identify somatic copy number alterations (SCNAs) and structural rearrangements that occur significantly more frequently than the background rate of alterations. We will also analyze chromosomal-level alterations, aneuploidy, and tumor purity and ploidy.
Molecular alterations that we identify will be correlated with those identified by other GDAN members and with clinical data. We will also use these data to develop and test new algorithms for the analysis of somatic alterations. Data from these projects will be compared with data from CPTAC. Our analyses will be included in manuscripts written jointly by the GDAN AWGs. We are also working as part of CCG. “The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, validate that the models retain the biological characteristics of the parent tumor from which they were derived, and show the scientific community how these models could be used in functional genomics research. To map the HCMI tumors and models onto a cancer genetic taxonomy, the group will use methods such as (1) the Celligner algorithm to map tumors and models against reference datasets, (2) the OncoMatch approach to analyze regulatory networks and model fidelity, and (3) placement of each tumor and model in the subtypes classified by the Tumor Molecular Pathology (TMP) work group. The group will also analyze copy numbers, mutations, mutation signatures, and structural variants from the Whole Genome Sequencing Data of HCMI tumors and models and compare them against those of the reference datasets: TARGET (for pediatric cases) and TCGA (for adult cases). We will also conduct analyses of the cancer microbiome.
The findings from the AWG would be valuable for the research community, as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology.” CHIANG, MARK UNIVERSITY OF MICHIGAN AT ANN ARBOR Genetic lesions in non-coding childhood ETP-ALL genome Sep21, 2023 approved Cancer cell DNA contains control centers called “enhancers” that turn on the expression of cancer genes. This project looks for genetic changes in the DNA of a high-risk childhood leukemia subtype called pediatric ETP-ALL that make these enhancers stronger, thus driving cancer cell proliferation. The objective of the project is to identify duplications and somatic mutations in enhancers in childhood ETP-ALL patient samples. The hypothesis is that the whole genome sequencing datasets in phs000340.v3.p1 will reveal duplications and mutations in oncogenic enhancers. BAM files will be processed through existing pipelines at the University of Michigan to identify amplifications and somatic mutations within ETP-ALL enhancers. The enhancer sequences of childhood ETP-ALL tumor cells and normal cells will be compared to look for acquired genetic changes. This project will not evaluate phenotypic characteristics that differ from the primary focus of the original study. We do not plan to combine the requested datasets with other datasets outside of dbGaP. CHINNAIYAN, ARUL UNIVERSITY OF MICHIGAN AT ANN ARBOR Comprehensive immunogenomic profiling of Hispanic pediatric cancer patients Jul03, 2019 approved Despite 80% overall cure rates for childhood cancer in the United States, minorities and patients with lower socioeconomic status are significantly more likely to die of cancer.
Although the immune environment of a cancer is known to have significant effects on cancer survival, the relationship between the genetic factors driving cancer, the immune characteristics of the cancer, and Hispanic ethnicity has been poorly studied. This study will investigate the differences in the genetic and immune profiles of Hispanic and non-Hispanic children with cancer and their contributions to cancer death. Despite 80% overall cure rates for pediatric cancer in the United States, minorities and patients with lower socioeconomic status have reduced survival. A 2018 SEER study demonstrated significantly decreased survival in Black and Hispanic children in multiple cancer types, including ALL and neuroblastoma, that was partially accounted for by low socioeconomic status. Despite the wealth of genomic data in pediatric cancers, very few datasets include both outcomes data and information on minority status. Importantly, earlier studies have not carefully investigated the association between genetic drivers and immunologic signatures, despite the known contribution of the immune system to treatment response and overall survival. The purpose of this research is to compare the immunogenomic profiles of Hispanic and non-Hispanic children in the TARGET cohort using the MiOncoseq and MI-ImmunoSeq pipelines developed at the Michigan Center for Translational Pathology. Our first aim will perform comprehensive genomic profiling (e.g., somatic mutations, insertions/deletions, fusions, etc.) through the MiOncoSeq clinical precision medicine pipeline. Our second aim will evaluate associations between novel genomic drivers/molecular signatures and survival in Hispanic and non-Hispanic populations. Our third aim will determine associations between transcriptomic immune cell signatures (e.g., activation, suppression, proportions, etc.) and survival in Hispanic and non-Hispanic populations.
Cho, Dong-Ho KOREA ADVANCED INST OF SCIENCE & TECH Genome-Scale Pediatric AML Diagnostics Marker Based on Both Inherent and Acquired SNPs, Insertion, deletion and repetitive elements Characteristics Dec18, 2017 closed The aim of this study is to develop liquid biopsy-based biomarkers for pediatric AML using deep learning. Diagnostic systems based on existing genetic information tend to focus on only a portion of the genetic information, such as genes, which occupy about 1.2% of the human genome, and also tend to require invasive techniques. In contrast, we use non-invasive liquid biopsy-based biomarkers to characterize disease-specific mutations. To handle large amounts of genetic information, we use variational dropout to extract the disease-related target regions, which significantly reduces the amount of data required. By analyzing the extracted mutations, we investigate the optimal combination of multiple mutations for determining the presence or absence of disease. Separate training and testing procedures are also used to ensure the stability of the biomarkers. We propose the extraction of genetic diagnostic markers based on combinations of SNPs, insertions, deletions, and repetitive elements (REs) in the human genome, together with the presence of certain proteins, cell-free DNAs (cfDNAs), and miRNAs, which can serve as useful biomarkers for pediatric cancer diagnosis. In the proposed project, blood-based liquid biopsy biomarkers are extracted from pediatric AML samples. Next, the set of disease-related mutations is extracted. Then, the feasibility of diagnosing pediatric AML with a combination of mutations is confirmed. First, we divide the acquired pediatric AML samples into a marker extraction set and a test set. Then, liquid biopsy biomarkers are extracted using deep neural networks and variational dropout.
In this study, we use variational dropout and a deep-learning algorithm to find combinations of mutations related to pediatric AML in the TARGET database, as well as to STAD, COAD, and BRCA in the TCGA database. We expect to identify significant diagnostic markers that determine each cancer’s development. We built on our existing research to propose a new algorithm. To our knowledge, TCGA and TARGET data have not previously been used together; they have been used independently in separate studies. In our study, however, TCGA and TARGET data are used together to improve and verify the reliability and universality of our proposed algorithm. Christoffels, Alan UNIVERSITY OF THE WESTERN CAPE Computational analysis of multi-omic data for the elucidation of molecular mechanisms of neuroblastoma Apr01, 2020 closed Survival rates of high-risk (INSS stage 4) neuroblastoma patients, as well as of patients with relapsed and/or recurrent tumors, can be improved by better understanding the pathways and genes involved in neuroblastoma pathogenesis. New treatment options are urgently needed, at least for high-risk cases. We are analysing multi-omic neuroblastoma datasets to identify novel therapeutic targets. We will identify novel variants and genes that are drivers of neuroblastoma. The objective of our study is to identify genomic instabilities and mutations that drive neuroblastoma. In previous research, we performed a differential gene expression analysis between two groups of TARGET neuroblastoma patients: those with short and those with long survival. We used RNA-Seq read count data downloaded from the Xena browser and found 51 differentially expressed genes. Applying a machine learning classifier to the top ten upregulated and top ten downregulated genes produced a model with 94% accuracy in classifying short- and long-survival patients.
The model was evaluated using 10-fold cross-validation. Therefore, the 51 identified genes can be considered a genetic signature for short- and long-survival neuroblastoma patients. We intend to check for genomic mutations in the coding and non-coding regions of these differentially expressed genes, and in nearby genes, that may have enhanced or reduced their expression. We are therefore requesting access to the whole genome sequencing data of the TARGET neuroblastoma dataset. If given access, our institution, the South African National Bioinformatics Institute at the University of the Western Cape, has the capacity to store and analyze the data, as we have a well-developed cluster with hundreds of terabytes of storage capacity. We will perform somatic and germline variant calling on the paired tumor/normal samples, considering overall survival time and MYCN amplification as phenotypes. We expect to identify indels and nonsense mutations that cause the gene regulation changes, and then assess their functional relevance. This research can only be conducted using pediatric data, and we will not combine these data with any other dataset. Our results will assist in developing prognostic markers that will help clinicians provide effective treatments to neuroblastoma patients. Chung, Yeun-Jun CATHOLIC UNIVERSITY OF KOREA Alternative Splicing Study in Wilms Tumor Aug25, 2023 expired Through this study, we anticipate reinterpreting the TARGET Wilms tumor data from the perspective of alternative splicing, enhancing our understanding of tumor development and treatment possibilities. The following outcomes are expected: - Visualization of alternative splicing events and identification of specific changes in Wilms tumors. - Interpretation of the relationship between alternative splicing and tumor biology. - Exploration of the potential for developing novel therapeutic strategies.
1) Study design This proposal aims to utilize the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) Wilms tumor data to investigate the role of alternative splicing (AS) in the development of Wilms tumors. Wilms tumor is one of the most common kidney tumors in children, and understanding the impact of alternative splicing changes on tumor biology could contribute to the development of novel therapeutic strategies. 2) Analysis plan - Data Collection and Preprocessing: We will gather RNA-Seq data from TARGET Wilms tumor samples and extract information related to alternative splicing events. Rigorous preprocessing and normalization will be performed to ensure data consistency. - Alternative Splicing Event Analysis: Using RNA-Seq data, we will identify alternative splicing events within genes and analyze differences between tumor and non-tumor samples. This analysis will help uncover tumor-specific splicing events and assess their biological significance. - Association Analysis of Alternative Splicing: We will investigate whether specific alternative splicing events are associated with Wilms tumor development and progression. Correlations between gene expression levels and splicing patterns will be explored, and findings will be interpreted in the context of existing knowledge in tumor biology. - Clinical Research Comparison: We will compare splicing data in adult kidney cancers from TCGA or CPTAC datasets with Wilms tumors to understand differences and similarities. - Evaluation of Therapeutic Strategy Development: We will analyze how alterations in alternative splicing affect intracellular signaling pathways and drug responsiveness within Wilms tumors. Based on these insights, the potential for developing new therapeutic strategies will be evaluated, guiding future research directions. 
3) Explanation of how the proposed research is consistent with the data use limitations for the requested dataset(s) Data from phs000218 will only be used in research consistent with the study's data use limitation. 4) Brief description of any planned collaboration with researchers at other institutions, including the name of the collaborator(s) and their institution(s). No external collaborations planned at this time. CLEARY, MICHAEL STANFORD UNIVERSITY Comparative genomic analysis of human and mouse ALL Mar06, 2014 closed We propose to analyze mutations in the specific genetic subtype of human ALL harboring t(1;19) chromosomal translocations as part of a comparative genomics study with mouse ALLs induced by the E2a-Pbx1 oncogene. Our group has developed genetically engineered mice that develop blood cancers very similar to a subtype of childhood blood cancer. These mice can be used to discover novel biological features of pediatric blood cancers and to test new compounds before they are used in patients. To determine the genetic similarities between blood cancers derived from our genetically manipulated mice and childhood blood cancer, we propose to analyze and describe their common mutations. This information will be useful for developing novel, more specific targeted therapies for childhood blood cancers. We propose to analyze mutations and gene expression in the specific genetic subtype of human ALL harboring t(1;19) chromosomal translocations as part of a comparative genomics study with mouse ALLs induced by the E2a-Pbx1 oncogene. Our goal is to determine whether acquired secondary mutations that arise in a conditional E2a-Pbx1 mouse model of B cell precursor ALL are also present in human E2a-Pbx1 patients. 
These studies will define the genetic similarity of the mouse model and human leukemias, and establish whether the model is useful for discovering novel therapeutic targets for pediatric ALL patients and how representative the model may be as a preclinical tool to study novel targeted small molecules in vivo. Chromosomal translocation t(1;19), which is present in 5-7% of pediatric ALL, fuses the transcription factor E2a (TCF3) with the homeobox gene PBX1. To investigate the cellular roles of E2a-Pbx1 in leukemogenesis, we have developed mouse strains that conditionally activate and express the E2a-Pbx1 fusion gene. Somatic activation of the oncogene is accomplished by Cre recombinase expressed under the control of either specific B-lineage promoters or hematopoietic stem cell promoters. Consistent development of ALL with defined latencies suggests the need for secondary mutations. Whole exome sequencing of the mouse ALLs indicates a mean of 24 secondary Tier 1 mutations per ALL. We will analyze whole genome and exome sequencing data and RNA-Seq data of patient E2a-Pbx1 ALLs for SNPs/Indels using Picard, SAMtools and GATK tools, for functional annotation using SnpEff, and for data visualization using IGV software. Tier 1 SNPs/Indels will be assessed for their presence in the mouse ALL whole exome sequencing data set, or for effects on shared pathways. A potential limitation of the requested data set is that SNPs/Indels found to affect key target genes in pediatric ALL cannot be validated using the gold standard of Sanger sequencing. However, the literature describes a validation rate of next generation sequencing data by Sanger sequencing of 80-90%, which is consistent with our experience in mouse E2a-Pbx1 ALLs. Cleton-Jansen, Anne-Marie LEIDEN UNIVERSITY MEDICAL CENTER Targeted therapies for osteosarcoma May13, 2020 expired Osteosarcoma is an aggressive bone tumor, mainly affecting young children and adolescents. 
Over the last decades, the survival rates of osteosarcoma patients have not improved. Therefore, new therapeutic options are urgently needed. We plan to use the available TARGET osteosarcoma data to investigate the presence of genetic alterations in known cancer-initiating pathways that can be targeted with drugs. This information could lead to the development of novel and more specific treatment options for osteosarcoma patients, possibly even with fewer side effects. Osteosarcoma is the most common high-grade malignant tumor of the bone, mainly affecting young children and adolescents. The introduction of intensive chemotherapy has increased overall survival tremendously; however, since then survival rates have not improved. Therefore, new therapeutic options are urgently needed, especially for chemotherapy-refractory metastatic disease. We and others have shown that the most commonly altered genes in osteosarcoma are part of either the P53 or the RB1 pathway. We plan to use the TARGET dataset on pediatric osteosarcomas to investigate the presence of alterations in genes of the P53 or RB1 pathway, and to correlate these with clinical outcome parameters. This could potentially lead to new, more specific therapeutic options for osteosarcoma patients. We will use the whole exome sequencing and whole genome sequencing data from the TARGET database and analyze copy number alterations, structural variants, and single nucleotide variants within genes of the aforementioned pathways. Our use of the TARGET data poses no additional risk to study participants and is in line with the Data Use Limitations. COARFA, CRISTIAN BAYLOR COLLEGE OF MEDICINE Clinical evaluation of multi-omics signatures in pediatric cancers Jul13, 2018 expired Integrating multiple omics data types is crucial to achieve a comprehensive and holistic evaluation of the clinical relevance of molecular signatures in pediatric malignancies. 
Our lab has experience in analyzing such multiple omics data, including gene expression, miRNAs, proteomics, ChIP-seq epigenomics, DNA methylation via arrays and sequencing, and the more recent developments in metabolomics and lipidomics. We have previously applied such techniques successfully in studies of neuroblastoma, acute myeloid leukemia, and osteosarcoma. We are currently generating complex signatures using several omics types in pediatric cancers. We will evaluate the clinical significance of the multiple omics signatures that we generate against the multiple omics profiles collected in the TARGET data set. Assessing the potential clinical relevance of molecular signatures in pediatric malignancies requires the integration of multiple omics data types. Our lab is interested in evaluating multi-omics signatures of drugs, epigenetic regulators, and complex animal models in the context of challenging pediatric cancers. We have previously applied such techniques successfully in studies of neuroblastoma, acute myeloid leukemia, and osteosarcoma. We are currently generating signatures using RNA-Seq, SmallRNA-Seq, ChIP-Seq and ATAC-Seq epigenomics, Illumina DNA methylation EPIC arrays, both Reduced Representation Bisulfite Sequencing (RRBS) and whole genome shotgun bisulfite sequencing (WGBS), unbiased lipidomics, and unbiased metabolomics in several pediatric cancers. We will evaluate the clinical relevance of the multi-omics signatures that we generate against the multi-omics profiles collected in the TARGET data set, including survival, clinical data association, and overlap with the cancer onset and cancer progression signatures. COHN, SUSAN UNIVERSITY OF CHICAGO The use of next generation sequencing to evaluate chemotherapy resistance in high-risk neuroblastoma and other pediatric tumors Mar06, 2014 closed Neuroblastoma is the fourth leading cause of cancer in pediatrics. 
Patients with widely disseminated disease (INSS stage 4) who are classified as high-risk have an overall survival rate of ~50% despite aggressive treatment regimens. Currently, it is not possible to distinguish high-risk patients who will be cured from those who will be resistant to therapy at the time of diagnosis. As such, it is important to delineate which patients are more or less likely to respond to conventional chemotherapy. Our hypothesis is that by combining genetic data in dbGaP with external databases we will be able to identify important genes or pathways in the response of neuroblastoma to conventional and novel therapies. We will also evaluate the hypothesis that these pathways will identify important new avenues to treat a broad array of pediatric cancers. We will use a secure database at the University of Chicago to store this large dataset and do not anticipate any increased risk to participants. Neuroblastoma is the fourth leading cause of cancer in pediatrics. This cancer is characterized by a wide range of clinical presentations and responses to treatment. Patients with widely disseminated disease (INSS stage 4) who are classified as high-risk have an overall survival rate of ~50% despite aggressive treatment regimens. Currently, it is not possible to distinguish high-risk patients who will be cured from those who will be resistant to therapy at the time of diagnosis. As such, it is important to delineate which patients are more or less likely to respond to conventional chemotherapy. This would allow physicians to determine which patients would benefit from traditional approaches to therapy and which should receive novel antineoplastic agents at an earlier phase in treatment. Furthermore, a relative paucity of somatic mutations has been identified in tumor samples, limiting the number of novel druggable targets. 
Our hypothesis is that by combining available exome, genome, methylation and RNA sequencing data with data from additional data sets we will be able to identify important genes or pathways in the response of neuroblastoma and other pediatric solid tumors to conventional and novel therapies. Merging the multiple datasets within dbGaP with external data will allow for an in-depth analysis of the genomics of thousands of patients with pediatric cancer in conjunction with factors that play a role in the drug susceptibility of cancer cell lines. We will be using a secure database at the University of Chicago to store this large dataset and do not anticipate any increased risk to participants. This database will allow us to rapidly analyze these data and focus on particular pathways or genes of interest in ways that would not be possible using each database individually. Specific aims: 1) Identify gene mutations and RNA expression patterns in primary neuroblastoma samples that correspond to genes important in cell line sensitivity to antineoplastic agents. 2) Determine the reproducibility of these patterns in a variety of pediatric tumors. COHN, SUSAN UNIVERSITY OF CHICAGO Using genomic data to better understand differences between high risk neuroblastoma patients. Sep25, 2015 closed Neuroblastoma is the fourth leading cause of cancer in pediatrics. This cancer is characterized by a wide range of clinical presentations and responses to treatment. Patients with high-risk disease have an overall survival of about 40-60%. They receive a variety of treatment modalities. We do not have a good way to predict who will respond well to therapy and who will not. We would like to look at genomic information for a variety of high-risk patients to better understand the variable clinical activity. 
Our hypothesis is that by combining available genetic data in the TARGET database with data from the International Neuroblastoma Risk Group (INRG) database we will be able to identify differences that lead to varying clinical outcomes. We will be using a secure database at the University of Chicago to store this large dataset and do not anticipate any increased risk to participants. Neuroblastoma is the fourth leading cause of cancer in pediatrics. This cancer is characterized by a wide range of clinical presentations and responses to treatment. Patients with high-risk disease have an overall survival of about 40-60%. They receive a variety of treatment modalities. We do not have a good way to predict who will respond well to therapy and who will not. We would like to look at genomic information for a variety of high-risk patients to better understand the variable clinical activity. Our hypothesis is that by combining available genome sequencing data with copy number data in the TARGET database, together with data from the International Neuroblastoma Risk Group (INRG) database, we will be able to identify differences that lead to varying clinical outcomes. We will identify a genomic biomarker that can be used to identify patients who do not respond well to therapy. Ultimately, the goal is to predict which patients are expected to respond to conventional therapies and which may require novel treatment courses. We will be using a secure database at the University of Chicago to store this large dataset and do not anticipate any increased risk to participants. This database will allow us to rapidly analyze these data and focus on particular pathways or genes of interest in ways that would not be possible using each database individually. 
Cortes Ciriano, Isidro EUROPEAN MOLECULAR BIOLOGY LABORATORY Somatic genomic landscape of paediatric tumours Apr08, 2020 approved Over the last decade our knowledge about the DNA mutations underlying cancer development has increased considerably due to the advancement of technologies that allow us to identify which alterations are present in the DNA of individual cancer patients. Despite these advances, there are still many mutations whose clinical significance remains unclear. In this project we will develop computational methods to investigate DNA mutations across hundreds of patients spanning diverse types of paediatric tumours. This research will serve to characterize which alterations are specific to one type of cancer and which are shared across multiple types of tumours. Research that integrates information from hundreds of patients makes it possible to find signals in the data that would go unnoticed if only a few patients were analysed together. It is our hope that this project will shed light on the mechanisms underlying paediatric tumour onset and development, which might ultimately translate into more accurate patient stratification and better cancer outcomes. Recent large-scale genome analysis efforts have characterized the genomic landscape of human cancers on a pan-cancer scale, revealing that complex genome alterations often underpin tumour evolution. In this project we aim to perform a pan-cancer analysis of paediatric tumours by integrating existing molecular profiling data sets from already published studies with new data we plan to generate for selected tumour types. We aim to identify common patterns of variation among diverse tumour types, as well as characteristics that distinguish one tumour type from another, with an emphasis on patterns of structural variation. We will apply a uniform set of alignment and variant calling algorithms following the best practices established by the Pan-Cancer Analysis of Whole Genomes Project (PCAWG). 
These data will be supplemented by exome sequencing and RNA sequencing that we will generate for a subset of cases. All the data sets will be stored and analysed using the computational infrastructure at the European Bioinformatics Institute (EMBL-EBI). The raw and derived data will then be interrogated by participants in this project in order to address one or more of the research themes listed below. The research activities will cover the following themes: 1. Novel somatic mutation calling methods; 2. Analysis of mutations in regulatory regions; 3. Integration of the transcriptome and genome; 4. Integration of the epigenome and genome; 5. Consequences of somatic mutations on pathway and network activity; 6. Patterns of structural variations, signatures, genomic correlations, retrotransposons and mobile elements; 7. Mutation signatures and processes; 8. Germline cancer genome; 9. Inferring driver mutations and identifying cancer genes and pathways; 10. Translating cancer genomes to the clinic; 11. Evolution and heterogeneity; 12. Portals, visualization and software infrastructure; 13. Molecular subtypes and classification. CROCE, CARLO OHIO STATE UNIVERSITY ncRNAome profiling investigation across pediatric cancers Jun05, 2019 approved High-throughput transcriptomic studies have revealed that up to 90% of eukaryotic genomic DNA is transcribed, the vast majority of it as non-coding RNAs (ncRNAs). This has led to the realization that ncRNAs regulate a remarkably broad spectrum of cellular processes. Recently, ncRNAs (e.g. miRNAs and lncRNAs) and their modification phenomena, such as RNA editing, have been proposed as diagnostic and prognostic cancer biomarkers. Here, we intend to characterize and compare profiles of dysregulated ncRNAs across the pediatric TARGET cohort, as well as analyze the occurrence of RNA editing in both coding RNAs and ncRNAs, aiming to extend our previous study on the TCGA cohort (Project ID: 11332 and PMID: 29976955). 
We believe that our study will further elucidate the role of ncRNAs, as well as their associated RNA editing, in cancer. This will reveal important details about cancer initiation and progression, ultimately providing potential therapeutic targets to aid in the treatment of this potentially fatal disease. Novel classes of non-coding RNAs (ncRNAs) involved in gene regulation, such as miRNAs, have recently emerged. Their involvement in many diseases, including cancer, has been the object of intense study from the beginning. In 2002 my group was indeed able to discover the very first direct association between miRNAs and cancer (PMID:12434020). Subsequently, we provided the first profiling of miRNAs across hundreds of human solid tumors (16461460) and associated long ncRNAs (lncRNAs) with cancer (24594601). Recently, RNA modification phenomena in ncRNAs, such as RNA editing, have been proposed as markers for the classification of cancer subtypes, as well as diagnostic and prognostic biomarkers. We intend to characterize and compare profiles of dysregulated ncRNAs across the pediatric TARGET cohort, as well as analyze the occurrence of RNA editing in both coding and non-coding RNAs, thus also expanding on our previous work on the TCGA cohort (Project ID: 11332 and PMID: 29976955). We hypothesize that ncRNA expression profiling paired with editing data can be used for sample stratification and prediction of outcome among cancer patients. To achieve this, we have developed in-house pipelines and expanded published ones (25577382,30153801,23742983) that can efficiently leverage large amounts of RNA-Seq and DNA-Seq data (by employing the resources provided by the Ohio Supercomputer Center) in order to study changes in transcript expression and extract editing levels. In addition, we will compare results with those from our human RNA- and DNA-Seq experiments to check transcript dysregulation as well as RNA editing consistency among patients and cancer types. 
All patient-identifiable sequence data will be kept under strict internal control to prevent any possibility of it being shared or transferred, consistent with the data use agreement. Crompton, Brian DANA-FARBER CANCER INST Analyzing copy-number profiles for the development of ctDNA assays for pediatric cancers Nov28, 2018 closed We will analyze the sequencing data to identify recurrent patterns of copy-number changes across pediatric cancers for the development of circulating tumor DNA assays that detect these alterations. Objectives: We have developed a copy-number based algorithm to detect circulating tumor DNA in patients with pediatric solid tumors. One possible use for this algorithm is to identify the presence of cancer in patients with cancer predisposition syndromes who are at high risk for developing cancer. The goal of this project is to determine whether the copy-number profiles identified in the plasma can be used to identify the type of cancer present by comparing those profiles to copy-number profiles obtained from tumor biopsy samples. Study design: WES, WGS, and DNA SNP-array data will be downloaded from published data sets via dbGaP, and segmental copy-number data will be generated for each tumor. Then, machine learning algorithms will be applied to these samples to identify copy-number features that are unique to each diagnosis. We will then test copy-number profiles from plasma samples obtained from patients with known pediatric solid tumors to determine whether the resulting disease-defining copy-number features are able to identify the correct diagnosis for each patient sample. Analysis plan: As outlined above, WES, WGS, and SNP-array data will be used to generate segmental copy-number calls for each tumor. Copy-number profiles from each tumor type will be analyzed by machine learning algorithms to identify copy-number hallmarks of each disease type. 
Associated phenotypic data will be used to assign a diagnosis and (when available) a diagnostic subtype for each sample. Planned collaborations: This analysis will be done entirely at my institution (Dana-Farber Cancer Institute) under my supervision. Curtis, Christina STANFORD UNIVERSITY Integrative analysis of oncogenomic datasets Aug04, 2022 rejected Our research is focused on understanding the molecular underpinnings of malignancy and on the development of analytical tools to integrate across and mine high-dimensional cancer genomic datasets from adult and pediatric tumors. In particular, lessons learned from the wealth of adult tumor data can provide new insights into pediatric tumors, which are often rarer and for which sample sizes are often limiting. We request access to individual-level data available through The Cancer Genome Atlas and other large-scale public datasets. In some cases, these data will be integrated with data generated from our own studies. These analyses will result in a more comprehensive molecular map of cancer that may ultimately improve patient outcomes. Recent large-scale sequencing efforts have begun to reveal the complex molecular basis of cancer and principles of tumor evolution. Our research is focused on the development of analytical approaches to both mine and integrate large-scale cancer genomics datasets with the goal of elucidating disease etiology, including driver genes, as well as mechanisms of tumor progression and resistance. We propose to incorporate the wealth of data available through The Cancer Genome Atlas for multiple cancer types with other large-scale public datasets in dbGaP, EGA, and Genomics England, as well as data from our own work, to both refine the molecular map of cancers and validate findings across datasets. 
Aspects of this work are specifically focused on pediatric cancers, for which we will leverage approaches to define driver alterations and subgroups of disease that may benefit differentially from specific therapeutic interventions. Comparisons with far more abundant data from adult solid tumors will aid the interpretation of findings in pediatric tumors, where sample sizes are often limiting due to the rarity of disease. Access to individual-level data is required, as novel algorithms will be applied and the data normalization and summarization procedures must be standardized across platforms and studies. The development of robust tools for the systems-level analysis of genotype-phenotype associations in cancer will enable the elucidation of candidate genetic and epigenetic driver lesions and novel susceptibility loci. By exploiting existing globally coherent oncogenomic datasets and state-of-the-art computational tools, this research will refine the molecular map of cancers. When possible, we will evaluate the molecular signatures identified in additional cohorts. We do not believe that this poses additional risks to the patients, particularly as these analyses rely on reduced feature sets. We believe that the proposed analysis is aligned with the intended use of the datasets being requested. We are committed to sharing the results of these analyses with the scientific community through peer-reviewed publications in accordance with NIH data use policies. Curtis, Christina STANFORD UNIVERSITY Genomic architecture of pediatric tumors Aug04, 2022 approved We propose to analyze genomic datasets derived from pediatric tumors to define molecular drivers of disease and mechanisms of progression that may be targeted therapeutically. We request access to individual-level data available through The Cancer Genome Atlas and other large-scale public datasets and propose to integrate these with data generated from our own studies. 
These analyses will result in a more comprehensive molecular map of pediatric tumors, with the goal of ultimately improving patient outcomes. Recent large-scale sequencing efforts have begun to reveal the complex genomic landscapes that underlie cancer progression, while also providing insights into tumor evolution. We aim to harness the growing number of pediatric cancer datasets to elucidate disease etiology, including driver alterations as well as mechanisms of progression and resistance. To this end, we propose to incorporate the wealth of data available through TARGET, as well as other large-scale public datasets in dbGaP, EGA or Genomics England, in order to define molecular subgroups of disease and candidate targets and to infer their evolutionary dynamics. In some cases, these data will be integrated with data generated in house to corroborate and validate findings. We may also compare with adult solid tumors to understand differences in the molecular drivers and etiology of disease. Access to individual-level data is required, as novel algorithms will be applied and the data must be standardized across platforms and studies. By exploiting state-of-the-art computational tools, this research will refine the molecular map of cancers. When possible, we will evaluate the molecular signatures identified in additional cohorts. We do not believe that this poses additional risks to the patients, particularly as these analyses rely on reduced feature sets. We believe that the proposed analysis is aligned with the intended use of the datasets being requested. We are committed to sharing the results of these analyses with the scientific community through peer-reviewed publications in accordance with NIH data use policies. 
D'ANDREA, RICHARD UNIVERSITY OF SOUTH AUSTRALIA Implication of rare mutations in DNA repair genes for childhood AML biology and treatment Dec12, 2018 approved Children with rare disorders that impair DNA damage repair mechanisms have a profoundly increased risk of AML. Together with the frequent observation of chromosomal abnormalities in childhood AML, this suggests that defective DNA repair may lie at the root of some childhood leukaemias. We have sequenced all genes in blood cells from a small Australian paediatric AML cohort and discovered that a large proportion of cases have damaging mutations in genes involved in DNA repair and maintenance of chromosome integrity. We suggest that children who have inherited one of these mutations may have reduced DNA repair capacity in blood cells, which could contribute to their risk of developing leukaemia through the accumulation of mutations and could also make them hypersensitive to chemotherapy. The leukaemia risk may be elevated if the children are exposed to chemicals that damage DNA, or to repeated infections. If our hypothesis is proven true, the results will impact on the selection of a donor for a bone marrow transplant, if required, and will provide important knowledge for the counselling of families found to carry mutations in the genes involved in maintaining DNA integrity, or in cancer predisposition genes. Paediatric AML has a high percentage of karyotypic abnormalities. Consistent with this, recessive DNA repair disorders in children are associated with an increased risk of AML. The early onset and the increased number of abnormal/complex karyotype cases in paediatric AML suggest that underlying genomic instability may lie at the root of the disease in this age category, more so than in adult AML patients. We aim to test the hypothesis that mutations in genes involved in DNA repair pathways predispose to childhood AML. 
This project continues an existing collaboration with Dr Andrew Moore from the Diamantina Institute and Translational Research Institute, University of Queensland, to undertake genome-wide characterisation of childhood AML samples. We have performed whole exome sequencing (WES) on a small Australian cohort of paediatric AML and discovered that a large proportion of patients within the cohort have rare mutations, both somatic and germline, in a network of DNA repair genes. We are now using the TARGET AML dataset to validate and extend these initial findings. The data from the childhood TARGET AML dataset will be analysed using the same in-house pipeline to identify rare variants in DNA repair genes, with results correlated to clinical and molecular characteristics. We will combine the Australian and TARGET data to investigate the prevalence of germline variants in the AML cohort compared to publicly available control cohorts. As all data are de-identified, this will not incur an increased risk to participants. We will also determine somatic variants in the TARGET AML samples and correlate them with our germline data. Given that a comprehensive analysis of rare germline variants across cancer predisposition genes has not been performed for childhood AML, we will extend our analysis to a panel of genes including those associated with predisposition to haematological malignancies and other cancers. We will determine variants predicted to affect splicing and use the TARGET RNA-Seq data to validate that they affect the derived transcripts. We will also use bioinformatic approaches to determine the mutation signatures in the TARGET childhood AML dataset. Finally, we will use the publicly available bioinformatic tool Peddy to determine the ethnicity of samples listed as "Ethnicity -unknown" in the TARGET cohort, to account for ancestry when analysing the prevalence of variants in the TARGET dataset compared to publicly available control cohorts. 
During publication, patient privacy and confidentiality will be ensured by listing ethnicity as a term only, e.g. European, American, etc., similar to the TARGET metadata available. The characterisation of a role for DNA repair and cancer predisposition germline mutations in childhood AML will be of significant interest to the haematology and cancer research community, and has implications for patient treatment and the counselling of families known to carry mutations in these genes. The data will only be used for research purposes. All analysis of the requested dataset will be performed in Adelaide at the Centre for Cancer Biology, University of South Australia. DAVIDSON, NADIA WALTER AND ELIZA HALL INST MEDICAL RES Identification of novel leukemia drivers from RNA sequencing Jan20, 2022 closed The DNA of cancer cells is mutated, and these mutations can alter the function of genes to drive the disease. Genetic sequencing has facilitated more personalized medicine by revealing the mutations that cause many cancer cases. However, for some patients no genetic mutation is identified. We recently developed a computational method that analyzes sequencing data from cancer to detect a broad range of changes. We have applied this to sequencing data from 91 childhood leukemia patients from the Royal Children’s Hospital, Melbourne, Australia, and identified several novel changes that were likely to have driven the cancer. We would now like to search TARGET, which is a much larger cohort, to profile the frequency and types of genetic changes that occur. This project aims to improve our understanding of the genetic basis of leukemia and improve personalized medicine through better diagnosis of genetic alterations in the disease. The mutations that drive many leukemia cases remain unknown, even after genetic profiling. The aim of this project is to identify novel drivers of leukemia (ALL and AML). 
We have recently developed a computational method to detect a broad range of novel drivers of genetic disease using RNA sequencing (Cmero et al., Genome Biology, 2021), including fusions, internal and partial tandem duplications, other insertions, deletions and altered splicing. Applied to our local cohort of 91 pediatric B-ALL patient samples from the Royal Children’s Hospital, Melbourne, Australia, we identified several novel variants in patients where no other driver event had previously been diagnosed. We will use TARGET’s RNA sequencing read data to profile novel leukemia-causing variants across a larger pediatric cohort. We will associate these with disease subtypes, gene expression data and disease outcomes. A focus will be given to variants and genes recurrently altered across the patient cohort. Genomic sequencing data will be used to confirm the presence or absence of templated changes in the genome. The outcome of this project aims to improve our understanding of the genetic basis of leukemia and improve personalized medicine through better diagnosis of genetic alterations that drive the disease. Davis, Kara STANFORD UNIVERSITY Predicting Relapse in Pediatric Acute Lymphoblastic Leukemia Jan27, 2022 approved B cell acute lymphoblastic leukemia (B-ALL) is the most frequent pediatric cancer. IKAROS mutations and/or deletions are an important factor in identifying high-risk patients. In our laboratory, we identified cell populations associated with treatment resistance and a new role for a known ALL transcription factor in regulating RNA splicing. In this research, we aim to elucidate the role of splicing events related to this transcription factor in a pediatric B-ALL cohort. The TARGET dataset (mRNA-seq, .fq format files) will be used as an independent validation cohort. Completion of this study could provide evidence about therapy resistance in ALL. The results from this study could serve as effective guidance for treatment decisions in the clinic. 
Our published and preliminary work is focused on using single-cell tools to identify cells associated with relapse in pediatric acute lymphoblastic leukemia. To that end, we have identified pro- and pre-B cell populations associated with treatment failure in the setting of both traditional chemotherapy and novel chimeric antigen receptor therapies. To understand how these cell populations and their transcriptional and signaling features compare with additional ALL datasets, we propose to interrogate these features in the TARGET cohort Phase 1 to 3 datasets. To that end, we request access to download raw fastq files from TARGET-ALL Phases 1 to 3 RNAseq datasets. These files will be processed through the same pipeline we used to generate our preliminary data. This will allow TARGET data to be compared with datasets generated in our laboratory. We will evaluate the association of prognostic cell populations, and of their transcription factor levels, isoforms and alternative splicing events, with treatment failure. We will also evaluate expression of metabolic pathway genes and genes associated with developmental and signaling pathways. Patient samples will be analyzed by leukemia subtype, based on the genetic alterations driving the disease (Ph+, Ph-like, CRLF2, hyperdiploid, hypodiploid, etc.). In parallel, we will also subcategorize samples based on the predominant isoform of the transcription factors of interest. We have previously defined a list of splicing events of interest, so for each leukemia subtype we will evaluate whether these events are associated with therapy failure. This work will be conducted exclusively in Kara Davis’s laboratory at Stanford University. No external collaboration will be required. We are also using this data to evaluate expression of metabolic pathway genes of interest related to relapse in ALL and to determine genes that may be differentially expressed in patients who will relapse. Davis-Dusenbery, Brandi SEVEN BRIDGES GENOMICS, INC. 
Seven Bridges Genomics Cancer Genomics Cloud - TARGET Jan17, 2017 closed The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic data and associated clinical annotations collected from various studies is critical to accelerating research and making new discoveries. This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in TARGET data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. The Seven Bridges Cancer Genomics Cloud (SB-CGC) is one of three NCI Cancer Genomics Cloud Pilots, a program to support a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. Primary data for this project will include open and controlled access data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) dataset. Open access will include clinical, Level-3 molecular, and somatic mutation data. Controlled access will include Level-1 sequence and SNP-chip data. All data will be obtained from the Genomic Data Commons. The research objective of this project is to pilot hosting large cancer genomics datasets such as TARGET in a cloud environment to allow users who do not otherwise have access to the necessary infrastructure to compute against this large dataset to perform disease-related studies. 
Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. Open- and controlled-access data from other large genomics data repositories are available in the SB-CGC, and researchers are able to upload and compute on private data. Researchers must abide by the individual data access agreements governing their use of each dataset. Our organization was granted NIH Trusted Partner status for TCGA data on May 13, 2015, during the period of the NCI Cancer Genomics Cloud Pilot contract. We additionally submitted an extension to this application on August 11, 2016 that would allow us to make TARGET data available following dbGaP approval. We have implemented the necessary authentication, authorization and access protocols to ensure that only authorized users will be able to gain access to controlled-access TARGET data. We follow all NIH Trusted Partner requirements for storage and distribution of these data. The system security is governed by an Authority to Operate (ATO) which was granted under the FISMA-moderate level on June 6, 2016. We will perform necessary security impact assessments and seek review and approval from the NCI. Dawson, Mark PETER MACCALLUM CANCER CENTRE Transcriptional Readthrough in Acute Leukaemia Jul11, 2024 approved Gene transcription is a fundamental step in the process by which proteins are made in cells. Genes that control transcription are often altered in people with blood cancers. We have previously found that some gene changes can cause transcription to continue beyond its normal stop site, a phenomenon known as transcriptional readthrough. Preliminary analyses suggest that transcriptional readthrough is more common in children with blood cancers, particularly those with an especially aggressive form of the disease. We want to see how often transcriptional readthrough happens in children with blood cancers and whether it is associated with changes in particular genes. 
Ultimately, we want to find what makes blood cancer cells different from healthy cells, so we can develop better treatments to specifically target cancer cells. Background: Transcriptional readthrough (RT) occurs when transcription termination is disrupted, leading to an extension of transcription beyond the normal termination sequence. Although this phenomenon occurs naturally in all healthy tissues, its increased presence in various cancers has been associated with heightened disease severity. Despite its prevalence, the regulatory mechanisms and consequences of transcriptional readthrough remain underexplored. Preliminary Findings: Our initial studies have focused on exploring whether patients with various forms of leukemia experience RT. Initial analyses aimed at determining whether specific mutations within spliceosome factors drive increased RT have shown negligible effects. However, analyses revealed that paediatric patients with B-ALL have significantly higher levels of RT compared to both AML patients aged 60+ and healthy controls. Notably, the Philadelphia-negative (Ph-) CRLF2+ subtype, a particularly severe ALL subtype with an early onset (< 6 years) and poor outcome, has demonstrated significant changes in transcriptional readthrough and transcript variability. Genes with increased RT are involved in inflammatory and cancer-specific signalling pathways, including cell proliferation pathways. These data indicate that paediatric B-ALL patients have a much higher susceptibility to RT, a previously underexplored transcriptional aspect of the pathology, which may play a role in sustaining and exacerbating the signalling pathways involved in various aspects of disease progression. Research Objectives: The datasets we currently have access to are primarily RUNX1-ETV6 positive, with few patients belonging to the Ph- subcategory and no patients belonging to other B-ALL subtypes, such as MLL-rearranged and Ph+. 
Given these promising initial results, we aim to conduct an in-depth analysis and accurate comparison of RT in paediatric B-ALL samples. We need to analyse the transcriptional readthrough of other B-ALL subtypes, particularly comparing Ph+ patient samples to our Ph- samples. This will enable us to more accurately determine the consequences of RT and understand its origin. This study aims to: 1. Determine the prevalence and patterns of transcriptional readthrough in patient samples across different ALL subtypes. 2. Compare these patterns against those observed in healthy tissue samples to identify unique transcriptional signatures. Data Requirements: To conduct this research, we require access to restricted patient datasets that include BAM files necessary for detailed transcriptional readthrough analysis. The specific cohort we are interested in is: • TARGET-ALL This cohort is essential as it provides a comprehensive overview of ALL, offering data for both tumour and matched-normal sequenced samples. This research has the potential to uncover novel regulatory mechanisms of transcriptional readthrough, which could then be targeted in future treatment strategies for B-ALL patients. Thank you for considering our application. Access to this dataset is crucial for advancing our understanding of transcriptional readthrough in paediatric B-ALL and could significantly impact future therapeutic approaches. This research will be conducted solely within our laboratory and no data will be shared with external collaborators. All of the requested data will be securely stored on password-protected local servers, accessible only to approved internal staff. We will observe all terms in the data usage agreement. Only data from dbGaP will be used in this analysis. De Keersmaecker, Kim KATHOLIEKE UNIVERSITEIT LEUVEN T-ALL mutation analysis Jul02, 2018 approved T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive malignancy. 
Children with T-ALL are treated with intensive chemotherapy regimens. These therapies can cure over 90% of the children, but they are associated with severe toxicity and side effects that seriously affect quality of life in childhood and also later in life. My lab has long-standing expertise in studying the genetics and molecular biology of pediatric T-ALL, with the aim of identifying suitable targets for novel, less toxic therapies. A dataset with sequencing information on the DNA and RNA from 264 children with T-ALL was recently generated and analyzed, and data on the clinical outcome of the patients are available. We will perform a series of analyses that are complementary to the analyses that have already been performed, and we will screen for acquired and inherited genetic defects that can be used as novel drug targets. T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive malignancy. Children with T-ALL are treated with intensive chemotherapy regimens. These therapies result in over 90% remission rates, but are associated with severe toxicity and side effects. My lab has long-standing expertise in studying the genetics and molecular biology of pediatric T-ALL, with the aim of identifying suitable targets for novel therapies. We previously performed exome and transcriptome sequencing on 30 and 11 pediatric T-ALL patients, respectively (De Keersmaecker et al., Nature Genetics, 2017; Atak et al., PLoS Genetics, 2013). This provided useful information, but the limited size of the dataset may cause false negatives. Liu et al. recently performed whole exome and matched transcriptome sequencing on 264 pediatric T-ALL patients (Liu et al., Nature Genetics, 2017). We would like to analyze this new dataset to validate previous observations made on our smaller dataset and to complement the analyses published by Liu et al. 
In particular, we will search for non-somatic mutations that predispose to T-ALL (which is possible because exome data from tumor and remission are available): we will screen for non-somatic variants that are overrepresented in T-ALL patients versus normal individuals (1000 Genomes Project) and will check whether these mutations are associated with worse outcome. Moreover, we will reanalyze the dataset for somatic mutations that may have been missed by the mutation-significance tools used by Liu et al. Transcriptome data will be used to analyze expression of mutant alleles (somatic and non-somatic) detected in the exomes and to analyze transcriptional variants. These analyses may provide actionable therapeutic targets in addition to the ones described by Liu et al. We previously showed that certain mutations are specific to pediatric cases and are not present in adults with T-ALL. Analysis of pediatric data is thus needed to find suitable drug targets for children with T-ALL. De Villiers, Ethel-Michele GERMAN CANCER RESEARCH CENTER Screening for sequences of yet unidentified viruses Mar20, 2014 closed The relation between some viral infections and cancer is well established. The objective of this project is to use a computational approach to discover new pediatric cancer-related viruses and characterize their features as oncogenic pathogens. Viruses are linked to approximately 15% of the global cancer incidence. In addition to the established relationship between specific viruses and human cancer, epidemiological data support the notion that several cancers not linked to infectious events may still result from interactions with infectious agents (reviewed in zur Hausen, H. Virology 2009; 392:1-10; zur Hausen, H. World Cancer Report 2014). We are presently analyzing childhood cancers including leukemia (ALL and AML), brain tumors and osteosarcoma for potential links. 
Therefore, we apply for access to the full TARGET data in order to screen for the presence of circular DNA viruses and, if possible, their correlation with changes in cellular gene expression, microRNA, DNA methylation status and protein expression. We plan to process raw data from whole genome sequencing (WGS) experiments and map those reads against the genomes of a subset of viruses. We also plan to use a "de novo" assembly approach in order to discover still unknown viruses. All the data acquired and the results of its processing will be stored under the protection of the DKFZ firewall and on password-protected computers. DEKOTER, RODNEY UNIVERSITY OF WESTERN ONTARIO Characterization of driver mutations in precursor B cell acute lymphoblastic leukemia Dec19, 2019 closed Studying mutations in leukemia and other cancer types is crucial for developing new oncologic approaches. Our goal is to determine mechanisms that cause leukemia that may become new targets for molecular targeted therapy. For that, we aim to analyse leukemia datasets, as well as datasets from other cancer types, to investigate the relation between mutational profiles found in many cohorts and cancer development, focusing on B-cell acute lymphoblastic leukemia (B-ALL). The goal of our research is to find and study mutations in specific genes within the B lymphocyte lineage that are responsible for driving disease in precursor B cell acute lymphoblastic leukemia (B-ALL). Such mutations often occur in genes that encode transcription factors controlling B-lineage development, which are implicated in differentiation and tumorigenesis when silenced or deleted. We are requesting access to human datasets to complement our research using mouse models. We understand that these datasets have use limitations and will respect them fully. 
In general, we plan to analyse the ALL cohorts and also to perform a pan-cancer analysis, to study variations among leukemic patients potentially related to B-ALL acquisition and development, and to find out whether our results correlate with other cancer types. Specifically, we plan to quantify and compare mutations between patients, aiming to identify the most frequent ones, and to perform further analysis to find mutational signatures related to our target genes. Findings will allow us to gain insight into transcriptional regulatory pathways related to pediatric leukemias and possibly other cancer types. Derenzini, Enrico ISTITUTO EUROPEO DI ONCOLOGIA Bioinformatics analysis for the characterization of the mitochondrial genome in difficult to treat hematologic malignancies Jun29, 2023 approved Mitochondria are maternally inherited organelles which play a central role in cellular energy provision. Variations in mtDNA have been shown to be involved in cancer pathogenesis as a result of disturbances in energy metabolism and apoptosis. Several studies have demonstrated the association between mtDNA variants and different types of cancer. Our aim is to analyze the mitochondrial genome in the context of hematologic malignancies, in order to study the basal state of mitochondria within cancer cells and to evaluate the role of mtDNA variants in the pathogenesis of blood cancers. Our purpose is to perform an in-silico analysis using NGS data of the available datasets deriving from patients affected by AML, ALL, CLL and DLBCL. The first months of the project will be dedicated to the development of the computational pipelines that will be used to analyze the mitochondrial genome of these samples. This study could provide an alternative path to explore new therapeutic strategies, including targeted mitochondrial genome editing and synthetic lethal pharmacologic approaches that disrupt mitochondrial metabolism. 
Mitochondria are maternally inherited organelles which play a central role in cellular energy provision. Variations in mtDNA have been shown to be involved in cancer pathogenesis as a result of disturbances in energy metabolism and apoptosis. Several studies have demonstrated the association between mtDNA variants and different types of cancer, but they mostly focused on a few solid cancers. Our aim is to analyze the mitochondrial genome in the context of hematologic malignancies that are still difficult to treat, in order to study the basal state of mitochondria within cancer cells and to evaluate the role of mtDNA variants in the pathogenesis of blood cancers. The molecular characterization of the mitochondrial genome can be performed using next-generation sequencing (NGS) technologies, and the small size of the genome allows a depth of coverage suitable for the characterization of even rare and transient events. Our purpose is to perform an in silico analysis using whole genome/exome sequencing raw data of the available datasets deriving from patients affected by Acute Myeloid Leukemia (AML), Acute Lymphoblastic Leukemia (ALL), Chronic Lymphocytic Leukemia (CLL) and Diffuse Large B-cell Lymphoma (DLBCL). It will also be useful to estimate the frequency of mtDNA variants in cohorts of healthy donors, in order to evaluate the possible roles of “germline” mtDNA variants found in both normal and tumor samples deriving from the same patients. Another interesting aspect will be the comparison between adult and pediatric patients. This comparison could provide new insights regarding the development of pediatric hematologic malignancies and the role of mtDNA variants. We will publish and otherwise share our findings from these studies with the broad scientific community. The first months of the project will be dedicated to the development of the computational pipeline that will be used to analyze the mitochondrial genome of the available samples. 
The pipeline will cover the following steps: alignment of the FASTQ reads to the mitochondrial reference genome, assignment of the correct haplogroup to each sample, estimation of heteroplasmy levels, and identification and analysis of homoplasmic and heteroplasmic variants. No other datasets are expected to be used in the first few months. However, it is likely that in-house datasets (such as WGS, WXS, RNAseq) will be produced in the following months, and they will be analyzed independently of the ones requested through dbGaP. The two different sources of data will be used together only in the last steps, in order to compare the results obtained from the analyses. For example, the mutational landscapes of the different datasets will be compared to find commonly mutated genes. This study could provide an alternative path to explore new therapeutic strategies, including targeted mitochondrial genome editing and synthetic lethal pharmacologic approaches that disrupt mitochondrial metabolism. Deshpande, Aniruddha SANFORD BURNHAM PREBYS MEDICAL DISCOVERY INSTITUTE Genomic Analysis of Pediatric AML Feb02, 2023 approved Our group focuses on pediatric cancers that fail to respond to therapy or have a much higher likelihood of relapse and death. We are currently focusing on subsets of childhood leukemia that are associated with high levels of therapy resistance. We will use these valuable datasets (genomic data and associated clinical data) to identify reasons why these patients fail to respond to therapy. Our studies aim to identify potential causes of relapse, as well as alternative means of targeting these difficult tumors. Our research is focused on highly refractory subsets of pediatric AML, especially AML with gene fusions that lead to chimeric oncoproteins. The focus of our study is to identify gene signatures associated with chemoresistant AML subsets, including AML with NUP98 fusions, AF10 fusions and other fusions of the KMT2A gene. 
In our studies, we will use the transcriptomic data and associated survival or treatment data for the following purposes: 1) We will identify gene fusions using fusion calling pipelines from the transcriptomic data. 2) We will identify gene signatures that define different subgroups associated with different prognostic groups, different mutational signatures or other clinical characteristics. 3) We will use these data to probe the expression of specific genes (such as the HOX cluster genes) or pathways (such as inflammatory signaling) to test whether these genes/pathways are associated with prognosis and/or specific mutations. 4) We will use these data to identify the genes most highly correlated with our pathways of interest. 5) Lastly, we will use the data to help understand the transcriptomic differences between normal and leukemic samples. All of these investigations will be followed up with “wet-lab” experimental studies in animal models in our lab. Deslattes Mays, Anne NIH Mechanisms, Genomic Risk Stratification and Precision Intervention for Acute Myeloid Leukemia in Children with Down syndrome (ML-DS) May03, 2022 closed We will create a new workflow to assess genetic variants and RNA-sequencing gene expression data of DS individuals, comparing the molecular features of DS individuals known to have had preleukemic transient events that progress to acute myeloid leukemia with those of individuals whose events do not progress, and including completely normal DS individuals as a control. Children with constitutional trisomy 21 Down syndrome (DS) have a unique predisposition to develop myeloid leukemia of Down syndrome (ML-DS). This disorder is preceded by a transient neonatal preleukemic syndrome, referred to as transient abnormal myelopoiesis (TAM), which has been thought to be unique among clonal neoplastic disorders in its universal linkage with trisomy 21. 
Recent work has shown that these transient events also appear in trisomy 21 mosaic individuals and has highlighted the role of GATA1 mutations as determinants of potential outcomes. There is an unmet need to define molecular characteristics based upon both the GATA1 variant information and the gene expression profiles of blasts of TAM, ML-DS and relapsed ML-DS, as well as normal T21 hematopoietic progenitor populations. We have assembled a multidisciplinary team to collaborate in a cloud-based environment, using and extending workflows and analyses in an open, transparent and collaborative manner using the stated dbGaP datasets available on the INCLUDE and Kids First portals. Each team will focus on different areas: the Lau and Hitzler labs will explore the role of GATA1 mutations in hematopoiesis, including understanding normal T21 hematopoiesis and the transition from TAM to ML-DS and relapse in a subset (phs001657, phs000159, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218). The Meshinchi lab will explore the gene expression profiles of the determined subpopulations based both upon the phenotype information of the groups (diagnosed ML-DS with and without evidence of TAM events), as well as relapsed ML-DS and normal T21 subpopulations (phs001657, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218). Both groups will compare the identified genomic/expression features with diploid 21 AML cases to identify alterations specific to T21 TAM and ML-DS. Data will be combined with normal T21 controls sourced from the Linda Crnic Institute for Down Syndrome’s Human Trisome Project. This data will be stored in a separate AWS bucket and merged for the purposes of comparison between disease-affected T21 patients and normal T21 subjects. 
The proposed project will study myeloid leukemia in pediatric T21 patients, with both computational methods and results shared with the broad scientific community, and therefore falls within the data use limitations of the requested datasets phs001657, phs000159, phs000413, phs001027, and phs000218. Deubzer, Hedwig CHARITE, UNIVERSITAETSMEDIZIN BERLIN Investigation of the differential expression of genes involved in resistance to chemotherapy in primary and recurrent solid tumors Oct31, 2018 expired In this project, we want to study specific genes in patient samples that play an important role in cell death. Comparison of primary tumors and recurrent tumors could reveal potential genes that are treatable with available therapeutic compounds. The identification of new drug targets could lead to better treatment of high-risk neuroblastoma patients. Chemotherapy resistance represents a bottleneck in high-risk neuroblastoma therapy. The induction of emergency response genes, impairing the apoptotic pathway, is a common and tumor-specific event in aggressive tumors. For this reason, the identification of new anti-apoptotic target genes is a promising strategy to overcome this issue. RNA sequencing data of a neuroblastoma cell line treated with the pan-histone deacetylase inhibitor Panobinostat disclosed differentially expressed apoptosis-related genes in vitro. Our objective is to explore these genes in neuroblastoma in relation to clinical characteristics and to exploit their control mechanisms as potential therapeutic targets. We propose that some of these genes are involved in chemotherapy resistance and are thus differentially expressed in primary and recurrent solid tumors. Our request for the dbGaP datasets would allow deeper analysis of potential new anti-apoptotic target genes using (i) mRNA expression levels, (ii) information on point mutations, and (iii) copy number alterations. 
Dietlein, Felix BOSTON CHILDREN'S HOSPITAL Clinical dissection of coding and noncoding somatic drivers of pediatric cancer Jun27, 2024 approved Cancer cells can have many genetic changes, but only a few drive the disease. While most efforts have focused on changes that directly affect proteins (coding DNA), many important changes occur in noncoding regions. This study aims to identify both coding and noncoding changes in pediatric cancers using large-scale sequencing data. Additionally, the findings will be integrated with clinical data, such as Minimal Residual Disease (MRD), to understand their impact on disease progression and treatment outcomes for children with cancer. Cancer cells can harbour thousands of mutations, but only a few of these are thought to be responsible for driving the disease. Previous efforts have focused on cataloguing coding mutations, mutations which directly alter the protein structure of the gene product, across and within different cancer types, to aid in identifying the drivers of each patient’s disease. Because most mutations are noncoding, many patients lack coding drivers, and growing evidence suggests that regulatory noncoding mutations can also act as cancer drivers, systematically exploring the noncoding genome for cancer drivers has become an appealing opportunity. Identification of cancer-driving alterations is often based on their recurrence beyond expectation under neutral selection. For coding drivers, synonymous mutations, mutations that alter the genome but not the protein, can be used as a natural background model. Meanwhile, for the noncoding genome, more intricate analysis is necessary, as DNA conformation, gene expression, and replication timing act as covariates of the mutation frequency and are cell type specific. This is a particular challenge in children with cancer, whose tumors often lack any genomic targets in the coding regions of the genome and harbor extremely low mutation counts. 
To address this open challenge in pediatric tumors, we will leverage the TARGET dataset (NCI, phs000218.v19.p7) to explore the role of noncoding mutations in pediatric tumor types, both within and across cancer types. In this specific study we aim to utilize all of the cancer types in the TARGET consortium: Acute lymphoblastic leukemia (phs000463.v16.p7, phs000464.v16.p7), Acute Myeloid Leukemia (phs000465.v16.p7), Clear Cell Sarcoma of the Kidney (phs000466.v16.p7), Neuroblastoma (phs000467.v16.p7), Osteosarcoma (phs000468.v16.p7), Rhabdoid Tumor (phs000470.v16.p7), Wilms Tumor (phs000471.v16.p7). Due to their lack of coding mutations, pediatric tumors highlight the effect of noncoding mutations, and many insights on noncoding drivers in fact stem from these data. Our previous work on both coding and noncoding mutations suggests that using local genomic features benefits driver discovery and exploration of mutational landscapes in various cancers. Here, we propose to extend this work by 1) applying these models to large pediatric whole-genome and whole-exome sequencing data across and within cancer types, and 2) integrating findings with paired clinical data, such as MRD, to evaluate the impact on clinically relevant phenotypes. Given the low number of mutations in pediatric tumor patients, we anticipate that algorithms will need to be specifically tailored to pediatric tumor patients. The overall goal of this cross-disciplinary project is to systematically investigate the role of both noncoding and coding mutations in children with cancer, as well as their clinical impact. DiGiovanna, Jack SEVEN BRIDGES GENOMICS, INC. Enabling scalable data analysis on the NCI Cloud Resource: SB-CGC Dec12, 2023 approved The growth of large-scale DNA and RNA sequence data, as well as other molecular, imaging and clinical data types, is rapidly out-pacing the required computational capacity for storage, processing, network transmission, and analysis. 
The ability to access and analyze genomic, transcriptomic and associated clinical data collected from various studies is critical to accelerating research and making new discoveries. This project aims to support accessibility to, and interoperability between, the multitude of NCI datasets present in the Cancer Research Data Commons. This will enable groups ranging in size from single laboratories to large research consortia to derive additional significant value from these diverse data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. The Seven Bridges Cancer Genomics Cloud (SB-CGC) is one of three NCI Cloud Resources, a program started by the NCI in 2014 to support a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring robust data security and compliance. As part of the larger Cancer Research Data Commons (CRDC), the SB-CGC interoperates with data from numerous data nodes, including the Genomic Data Node, the Proteomics Data Node, the Imaging Data Node, the Integrated Canine Data Commons and the Cancer Data Service. Additional data nodes are planned to be added over time. The SB-CGC leverages CRDC-developed infrastructure and standards such as Gen3 Fence, the Data Registry Service and the NIH Researcher Authentication Service. These standards enable users of the CGC to access and analyze data from data nodes in a compliant and secure way while promoting the democratization of data access and long-term sustainability.
The research objective of this project is to develop and advance approaches that support data democratization, interoperability and advanced analytics, in order to accelerate the understanding of cancer and ultimately help end researchers improve prevention and treatment paradigms. Examples of analyses performed by end researchers include gene expression analysis, mutation calling, integration of data types, and analysis of pathways and regulatory networks. In order to continue to develop a highly usable, scalable and secure system in support of the NCI CRDC vision, we request access to the most frequently used NCI datasets currently hosted by CRDC data nodes. This data access will allow us to perform quality control and assurance steps on data access, develop training and workshop materials, and support end users in their use of the system to achieve their research goals. An additional goal of this project has been to leverage our interactions with researchers throughout their use of both open- and controlled-access data on the SB-CGC to develop new functionality that promotes the accessibility and use of these public datasets. We note that we have previously applied for, and been granted, access to many of the requested datasets through separate data access requests (including 8098, 12308, 12364, 22386, 24326, 24687, and 25800). Here we propose consolidating these requests to reduce administrative burden and expand the breadth of supported data. In support of this effort, our institution applied for and was granted NIH Trusted Partner status in May 2015. The SB-CGC system security is governed by an Authority to Operate (ATO), which was originally granted at the FISMA-Moderate level on June 6, 2016 and has since been upgraded to FedRAMP Moderate. Our system provides extensive logging and audit capabilities and also allows end users to maintain fine-grained control over data access and analysis.
We will perform necessary security impact assessments and seek review and approval from the NCI as required by FedRAMP standards. All activities will be coordinated with the NCI and/or other stakeholders as appropriate. Dong, Rui FUDAN UNIVERSITY Comprehensive molecular characterization of pediatric cancer Mar28, 2024 approved In recent years, our understanding of the origins and growth of pediatric cancers has advanced notably. Yet much remains unknown about the precise mechanisms of these diseases and the variation among them. Our aim is to methodically decode the complexities of pediatric tumors. We will use an extensive repository of genetic and clinical data from the TARGET initiative, which comprises a wealth of information on various childhood cancers. This exploration is intended to categorize the cancers into distinct groups based on their genetic characteristics. Ultimately, our objective is to identify precise therapeutic targets. This could lead to the development of more tailored and efficacious treatments, potentially enhancing the quality of life and treatment outcomes for young patients. In recent decades, significant strides have been made in understanding the complex genetic alterations and transcriptomic changes that underpin the onset and progression of pediatric cancers. Despite these advancements, the pathogenesis of pediatric tumors and their molecular subtypes remain areas ripe for exploration and clarification. This project is dedicated to developing a detailed molecular classification of pediatric tumors through the systematic integration of genomic and transcriptomic data. Our approach involves leveraging whole exome/genome sequencing alongside mRNA datasets procured from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative to verify copy number variations and to trace clonal evolution from single-cell transcriptomic analyses.
Our investigative scope extends to patient-derived clinical information from TARGET, such as gender, age at initial diagnosis, cancer stage, overall survival, and histological classification. By correlating these clinical parameters with our molecular subtype data, we aim to shed light on the prognostic implications of each subtype. The culmination of this research will be the identification of therapeutic targets tailored to the treatment demands of distinct pediatric tumor subtypes. In doing so, we hope to contribute to precision medicine, enabling more individualized and effective treatment for children with cancer. Dou, Yali UNIVERSITY OF SOUTHERN CALIFORNIA Analyzing molecular features of AML with KMT2A aberration Oct05, 2021 closed Global epigenetic abnormality is a common feature of acute leukemia. Significant subsets of mutations are found in chromatin modulators. These have increasingly been identified as biomarkers and therapeutic targets, with great implications for diagnosis and prognosis. Understanding epigenetic deregulation in acute leukemia will have a profound impact on basic and translational medicine. KMT2A/MLL1 rearrangement is recurrent in pediatric AML and ALL. It accounts for 70% of infant ALL and 5-10% of pediatric AML and ALL cases. In addition to MLL1 rearrangement, MLL1 tandem duplication is also commonly found in pediatric AML and in 90% of secondary leukemias in children previously treated with topoisomerase II inhibitors. Pediatric AML is distinct from adult AML at the molecular level, with a distinct spectrum of mutations and response to treatment. Furthermore, MLL translocation occurs at a much lower frequency in adult AML. Our ongoing studies have demonstrated the therapeutic efficacy of targeting wild-type MLL (which remains intact in one of the two alleles in MLL leukemia).
We plan to identify potential events cooperating with MLL-dependent pathways in pediatric AML and ALL, with the aim of designing novel combination therapies. Furthermore, since the MLL target transcription factor HOXA9 acts as a pathological pioneer factor at distal regulatory enhancers, we will examine global epigenomic changes induced by HOXA9 overexpression in MLL-rearranged leukemia. We will identify recurrent HOXA9-dependent epigenetic alterations and evaluate their potential as therapeutic targets. Our study aims to identify new vulnerabilities in leukemia with either MLL rearrangement or high HOXA9 expression, which will help guide our development of combination therapies targeting both MLL1 and other cooperating pathways. In compliance with the publication requirement, we confirm that we intend to publish our findings in peer-reviewed journals and present the results at the AACR and ASH conferences. Droit, Arnaud LAVAL UNIVERSITY Development of novel immunotherapeutic approaches targeting gene fusion-derived neoantigens to cure pediatric acute leukemia May19, 2023 approved Acute leukemia is the most common blood cancer in children. Even with aggressive chemotherapy and radiation treatment, 20% of pediatric patients still die from the cancer. These treatments also cause long-term secondary effects. As a result, children who survive can develop other major health problems in their adult lives, including infertility, heart failure and secondary cancers. The challenge is to develop new drugs for the subgroup of patients who are currently incurable, drugs that target only the cancer cells and thereby reduce the side effects of standard therapy. Over the past 5 years, we have learned more about the peculiarities of the genome of childhood acute leukemia, including a subtype of mutations named gene fusions. These are responsible for initiating leukemia and are absent in healthy cells. Cancer cells can be detected by small molecular flags (i.e.
gene-fusion derived peptides) on the surface of the cells that indicate whether or not a cell contains the fusion genes. We propose to discover and develop therapeutic biologics that can recognize these molecular flags and bind only to cancer cells. The molecular flags (i.e. gene-fusion derived peptides) will be validated on an external population of pediatric cancers, the TARGET dataset. Since 2016, an “omics” strategy has been in place in Quebec, Canada, for pediatric cancers. Initially selecting relapsed and hard-to-treat tumors, the program now includes every pediatric tumor for tissue banking and whole genome and transcriptome analysis (WGTA). More than 300 patients have been included and 200 additional inclusions are expected every year, including solid tumors, brain tumors and leukemias. Using high-throughput RNA sequencing data, we are exploring the immunopeptidome landscape of pediatric leukemia. The primary research objectives are: 1) to determine the incidence and frequency of gene fusions associated with pediatric leukemia, 2) to determine whether different HLA subtypes are associated with different gene fusions and 3) to predict and validate gene-fusion derived peptides. Access to the TARGET pediatric dataset will provide an external independent dataset for validation. The data analysis will follow St. Jude’s CICERO pipeline, which is already in local use. The TARGET pediatric tumor population is a well-suited dataset for comparison, with a large panel of 500+ leukemias and 700+ solid tumors. Accessing WGTA data will allow for external validation of the gene fusion and HLA subtype frequencies described in our cohort. Droit, Arnaud LAVAL UNIVERSITY Characterization of tumor immune landscape in pediatric cancers for the elaboration of a tumor immune infiltration score Dec22, 2020 approved The treatment of pediatric tumors has improved over time but is now reaching a plateau, prompting the development of new therapeutic options.
Immune checkpoint blockade (ICB) therapies, i.e. PD1/PDL1 and CTLA4 inhibitors, are highly effective in many adult cancers and in Hodgkin lymphoma, but only a small percentage of pediatric tumors are sensitive to ICB. The presence of immune cells infiltrating the tumor, defining a “hot” tumor, is needed for ICB to be active. We would therefore anticipate a better overall response rate to ICB in pediatric cancers harboring a rich immune infiltrate. To test this hypothesis, we are exploring the immune landscape of a large population of 300+ pediatric leukemias and solid tumors, based on whole genome and transcriptome data analysis. We have identified distinct immune signatures and aim to build a pediatric immune infiltrate score (pIIS) to be used as an inclusion criterion in immunotherapy clinical trials. The immune signatures and the pIIS will be validated on an external population of pediatric cancers, the TARGET dataset. Since 2016, an “omics” strategy has been in place in Quebec, Canada, for pediatric cancers. Initially selecting relapsed and hard-to-treat tumors, the program now includes every pediatric tumor for tissue banking and whole genome and transcriptome analysis (WGTA). More than 300 patients have been included and 200 additional inclusions are expected every year, including solid tumors, brain tumors and leukemias. Using high-throughput RNA sequencing data, we are exploring the immune landscape of pediatric tumors, focusing on solid tumors and leukemias. As the immune infiltrate is highly influenced by tumor type, solid tumors and leukemias/lymphomas are studied separately.
The primary research objectives are:
- To determine immune infiltrate signatures by deconvolution
- To identify genes and pathways that drive immune recruitment or immune desertion
- To study the relationship of B- and T-cell clonality and tumor mutation burden (TMB) to tumor immune infiltrate density
- To build a pediatric immune infiltrate score (pII Score) to be used as an inclusion criterion in immunotherapy clinical trials
Access to the TARGET pediatric dataset will provide an external independent dataset for validation. The dataset will not be combined with other sources; instead, its samples will be grouped into two groups: leukemias and solid tumors. The data analysis will follow the pipeline already in local use:
- Pseudo-alignment using Kallisto and TPM normalization
- Relative quantification of the immune cell infiltrate with deconvolution algorithms
- B- and T-cell clonality analysis with the MiXCR algorithm
- TMB evaluation
- Immune signature determination by dendrogram analysis
- Differential expression and principal component analysis for gene set and pathway analysis
- Use of the pII Score to assess the proportion of patients meeting the eligibility criteria
The TARGET pediatric tumor population is a well-suited dataset for comparison, with a large panel of 500+ leukemias and 700+ solid tumors. Accessing WGTA data will allow for external validation of the immune landscape signatures described in our cohort and for testing the discriminative ability of our pII Score. Dros, Jarno PRINSES MAXIMA VOOR KINDERONCOLOGIE, BV Metabolic gene expression profiling of pediatric kidney tumors Jul01, 2020 expired Tumors have elevated nutritional needs (i.e., an altered metabolism) due to their fast growth. Blocking these “metabolic addictions” is the basis of several therapies in adult tumors, but it has been studied only to a limited extent in pediatric tumors.
This project will study the metabolism of pediatric kidney tumors using innovative analytical techniques and “mini-organ” (i.e., organoid) models. Metabolic rewiring is a hallmark of adult kidney cancer with established clinical relevance. Nonetheless, the metabolic alterations of pediatric kidney tumors (PKTs) remain mostly uncharacterized. Therefore, our overall aim is to characterize (targetable) metabolic dependencies of PKTs in patient-derived organoid cultures. Here, we request access to the TARGET rhabdoid tumor (phs000470.v19.p8) and Wilms tumor (phs000471.v19.p8) mRNA-seq FASTQ/BAM files. We would like to integrate the requested datasets with in-house generated RNA-seq data of PKT tissue and organoids. All datasets must be processed through the same analysis pipeline from the raw data onward to minimize batch effects and other artefacts; we therefore require the FASTQ/BAM files, which are not publicly available. We intend to perform differential gene expression analysis using the DESeq2 package. Differentially expressed (metabolic) genes in each PKT subtype will be analyzed using methods such as a GO term overrepresentation test and gene set enrichment analysis to determine the metabolic gene expression profile of each tumor subtype. This will allow us to pinpoint possible metabolic alterations that will be characterized in depth in organoid models. This project will provide further insight into the metabolic adaptations of PKTs. Ultimately, the knowledge obtained by this study will allow us to develop improved targeted treatments with the aim of improving patient survival. DROUIN, SIMON NRC-INSTITUTE FOR BIOLOGICAL SCIENCES Identification of childhood acute lymphoblastic leukemia genetic biomarker signatures predictive of patient outcome. Dec02, 2021 closed Despite significant progress in treatment, with long-term cure rates now exceeding 85%, childhood acute lymphoblastic leukemia (cALL) is still a leading cause of disease-related death in children.
In addition, little is known about the underlying causes of cALL, treatment effectiveness and long-term side effects. Moreover, 15% of patients are not appropriately classified at diagnosis, which often leads to inappropriate treatment protocols and, ultimately, tumor relapse. Up to 66% of cALL survivors will also experience long-term treatment-related side effects, including cardiac toxicity, neurocognitive effects, bone morbidities, metabolic syndrome, anxiety and depression. Here again, obtaining markers (i.e., genes) that could predict patients’ risk of developing these effects would have significant long-term benefits. In this project, we plan to use the DNA mutation profile of each patient and cluster patients together in order to identify common features associated with their prognosis. Identification of such markers would allow us to predict tumor relapse before it happens and adjust treatments accordingly. Despite significant progress in treatment, with long-term cure rates now exceeding 85%, childhood acute lymphoblastic leukemia (cALL) is still a leading cause of disease-related death in children. In addition, little is known about the underlying determinants of cALL susceptibility, therapeutic responses and long-term outcome. For example, new patients are currently stratified into standard and high relapse risk groups using clinical and cytogenetic criteria, and these groups determine their treatment protocols. Unfortunately, recurrence rates are almost identical between the categories, at 12.5% and 13.4% for standard and high risk, respectively, indicating a clear need for better stratification methods. Using the mutational profiles of patients, we propose to apply network biology combined with artificial intelligence concepts in order to identify new recurrence biomarkers that would improve current risk stratification methods and patient outcome.
We plan to identify functional mutations for every patient and apply network propagation on a manually curated protein-protein interaction (PPI) network. This process would allow us to quantify the collective impact of all functional mutations on a desired phenotype (in our case, relapse). We hypothesize that applying clustering methods (e.g., kNN, sNN, etc.) to mutational propagation profiles will allow us to identify new biomarkers that could be used to predict relapse. In other words, we should be able to cluster patients with high and low risk of relapse based on their mutational propagation profiles. However, clustering methods require a large number of samples to provide robustness. As such, we plan to combine 2 major cohorts to which we already have access (St. Jude PCGP and CHU Sainte-Justine QcALL) with the TARGET ALL data we are now requesting. We believe that with this many samples, the model should capture enough heterogeneity (i.e., subtype diversity, treatment variation, mutational events and frequencies, etc.) to greatly improve accuracy and power. A more complete dataset (from these 3 large cohorts) will improve not only the robustness of the model but also its clinical usability and usefulness. We therefore plan to combine all 3 cohorts into a single large dataset, which will be split randomly into a training and a test set (70% and 30%, respectively), following machine-learning standards. This use will not increase risk to patients, as no identifying data will be used. Druley, Todd WASHINGTON UNIVERSITY Characterizing rare germline variation in acute leukemia. Aug19, 2010 closed Recent studies have demonstrated that most adult cancers arise from the acquisition of one or more damaging genetic mutations. However, similar research in children’s cancers has not identified enough acquired mutations to account for the number of affected children.
This is most clearly demonstrated in acute infant leukemia, which rarely harbors damaging acquired mutations even though fewer than 50% of infants survive their diagnosis. We predict that acute infant leukemia requires a profile of inherited damaging genetic variation, which would allow leukemia to develop without the years of acquired DNA mutations often seen in adult leukemias. To investigate this question, we have completed high-throughput DNA sequencing in infant and pediatric leukemias and hope to compare our findings to similar available data on inherited damaging genetic variation in adult acute leukemia patients. Abundant genomic and epidemiologic research has demonstrated a quantitative difference in the incidence of somatic mutation between adult-onset cancer and pediatric cancer (Vogelstein, Science 2013). Given a relative lack of somatic mutation and convincing epidemiologic evidence of exposure, we hypothesize that the incidence of pediatric acute leukemia is heavily influenced by each child's unique profile of inherited or de novo genetic variation. Our work, done in collaboration with the Children's Oncology Group, involves next-generation sequencing of matched germline and leukemia DNA samples from children with high-risk ALL and of germline DNA from infants with leukemia and their mothers. From our exome sequencing results, we have identified a statistically significant excess of functional germline variation in acute infant leukemia patients in genes associated with acute leukemia (as defined by COSMIC). This enrichment was observed relative to unaffected mothers and unaffected, unrelated children, but has not yet been assessed relative to adult acute leukemia patients. Under our hypothesis, we would predict that adults who develop sporadic acute leukemia have fewer germline variants in leukemia-associated genes than infants or children.
We predict that adults must acquire more deleterious somatic mutations, which is why they do not present with leukemia until adulthood, whereas these infants are born with a synergistic profile of deleterious variation, resulting in leukemia at a younger age with shorter latency. We are asking for access to the TARGET and TCGA data in order to compare available germline sequencing data between pediatric and adult acute leukemia. We simply want to quantify the relative amount of functional germline variation in our leukemia-associated candidate genes in available adult data as a comparison dataset. We feel that these results would be informative for improving our understanding of the biology of acute pediatric leukemias. We are not combining datasets, but are performing statistical comparisons between groups regarding the prevalence of germline and somatic mutation at specific genes of interest. Díaz, Marina CLINIC FOUNDATION/BIOMEDICAL RESEARCH Clinico-biological characterization of acute myeloid leukemia not molecularly defined: gene mutation and gene expression profiles Apr06, 2023 approved Acute myeloid leukemia (AML) is clinically and biologically heterogeneous. About 30% of AML patients must be classified according to clinical or phenotypical data because the molecular mechanisms that underlie their disease are not well understood. Our project aims to discover new subgroups, based on their diverse underlying genomic profiles, that could help predict the outcome of these patients. This project aims to better characterize meaningful biological and clinical subgroups within the subset of acute myeloid leukemia (AML) patients lacking a class-defining genetic or molecular marker according to the WHO-2016 classification system, hereinafter referred to as non-molecularly defined AML (NMD-AML).
In our project we have performed RNA-seq on 117 bone marrow or peripheral blood specimens from patients with NMD-AML at diagnosis and on 5 bone marrow samples from healthy donors. We have also performed WES on 79 of these patients. We want to explore the frequency of recurrent mutations, identify novel somatic mutations and study the expression patterns related to them. A correlation with phenotypic and survival data will also be made. Requested datasets will only be used as controls during methodological development, for validation purposes and for data comparison. The project development is advised by Jordi Morata and Anna Esteve-Codina, researchers at the Centro Nacional de Análisis Genómico (CNAG). Edgren, Henrik MEDISAPIENS, LTD NTRK gene fusions in pediatric cancers Dec28, 2015 closed Gene fusions are a type of DNA change which, when occurring in the right type of cell, can cause cancer. For this reason, several projects are developing new cancer drugs that work by blocking the function of the protein produced by the gene fusion. In this project, we will use a data analysis method developed by MediSapiens to search for gene fusions in RNA sequencing data from childhood cancer samples analyzed by the TARGET project. In particular, we will look for gene fusions involving NTRK genes, in order to compare how frequently these occur in childhood cancers compared to adult cancers. The results are potentially important for determining which cancer patients should in the future be given drugs that block the function of NTRK gene fusions. Gene fusions are a well-known type of somatic driver mutation in many different cancer types and, as such, are targeted by both existing and developmental cancer drugs. For instance, NTRK1-3 inhibitors such as entrectinib are currently being tested in phase I-II clinical trials in patients with a tumor that carries an NTRK gene fusion.
These drugs are often tested in basket clinical trials, in which patients may have any of several different cancer types, as long as the tumor is positive for the target gene fusion. Basket trials are needed because a specific gene fusion may occur in several different cancer types, but mostly at a low frequency in any given type. For efficient future use of such drugs, it is therefore important to know in which cancer types the gene fusions occur, so that testing for the presence of a specific fusion gene can be done in the right patient populations. The aim of this project is to determine whether gene fusions involving the genes NTRK1, NTRK2 or NTRK3 occur in pediatric cancers and, if so, in which types and at what frequencies. As drugs targeting NTRK fusion-positive cancers are in active development, this knowledge would be of immediate value if one of the drugs is approved. As part of this project, we have already analyzed ~8000 adult cancer samples from the TCGA project, providing a comprehensive catalog of NTRK fusion occurrence in adults. With the use of TARGET data, we hope to extend our results to include several important types of pediatric cancers. Our findings so far include NTRK fusions in, e.g., adult sarcomas, and one of the questions in this study is whether pediatric sarcomas also have these fusions. In the project, we will use the FusionSCOUT fusion gene detection pipeline developed at MediSapiens to search for gene fusions in paired-end RNA-sequencing data from TARGET cancer samples. Restricted-access TARGET data will be analyzed independently of data from other sources, and only for the purpose of identifying gene fusions. Results will be submitted for publication in a peer-reviewed journal.
Egawa, Takeshi WASHINGTON UNIVERSITY Elucidation of genetic circuitry that suppresses MYC-driven leukemogenesis in lymphocytes Dec19, 2019 approved Using experimental animal models, we have found a connected set of genes that protect developing lymphocytes from becoming cancerous. Since our finding is based purely on animal models, we would like to use information collected from human patients with lymphoid cancers to determine how our findings in the model organism relate to abnormal gene expression in humans with similar disease, which may allow us to identify new targets for diagnosis or therapy. B-ALL is one of the most common pediatric malignancies, and among acute leukemias the frequency of B-ALL is substantially higher in pediatric patients than in adults. Genome-wide sequencing of a large number of patient samples has revealed major mutations or genomic alterations frequently found in subsets of patients and has improved diagnosis. However, we still need to understand the pathogenesis resulting from these genomic alterations, and such knowledge will be helpful for more accurate diagnosis and the discovery of innovative therapies. In this application, we propose to analyze the transformation resulting from aberrant expression of the proto-oncogene c-MYC, based on our recent studies using mouse models of B cell leukemia that resemble pediatric B-ALL. In our studies, we have identified a novel tumor suppressor pathway engaged by c-MYC in developing B cell precursors. In these models, while c-MYC is a powerful oncogene and induces many changes in gene expression that enhance leukemogenesis, we have found that one transcriptional regulator induced by c-MYC functions as a dominant tumor suppressor, protecting developing B cells from transformation. Indeed, haploinsufficiency or deficiency of this MYC-downstream transcription factor dramatically accelerates the development of B-ALL in our models.
We have characterized gene expression changes controlled by the c-MYC-downstream program with or without the tumor suppressor and identified a few gene pathways that can prevent the development of the disease and could potentially be targeted to restrict tumor development. To assess the human relevance of these findings from our mouse studies, we request continued access to the pediatric B-ALL patient data in the TARGET database (phs000218). Our preliminary literature search shows that the TARGET database has multiple patient samples that harbor mutations of the c-MYC-downstream tumor suppressor that we are studying, and that a few candidate genes that we identified as part of the tumor-suppressive network were also shown to be relevant to pediatric B-ALL in a publication based on this dataset (Ma et al., Nature, 2018). The analysis will include focused classification of samples with aberrant c-MYC expression according to co-existing mutations or downregulation of the c-MYC-induced tumor suppressor and of the relevant downstream genes that we have identified as potential targets, and will assess their associations. This will be supplemented by direct analysis of samples with altered copy numbers or expression of the targetable pathways to assess the potential for targeted intervention. Since we have not found the relevant mutations in adult B-ALL or non-Hodgkin lymphoma samples through searches of multiple databases, this study must use the pediatric sample data in TARGET. Access to the data will allow us to assess the relevance of our findings in the animal models to the human disease and to dissect important pathways in the pathogenesis of pediatric B-ALL, for the assessment of risk as well as of potential therapeutic pathways.
Einvik, Christer UNIVERSITY OF TROMSO Identification and understanding molecular mechanisms of malignant transformation, tumor progression and differentiation in neuroblastoma Sep08, 2022 expired Neuroblastoma is a malignant embryonic childhood tumor arising from primitive cells of the neural crest. It accounts for more than 7% of childhood malignancies and around 15% of cancer-related deaths in childhood. One hallmark of neuroblastoma is its extremely heterogeneous behavior ranging from spontaneous regression or differentiation into benign histological variants to aggressive metastatic behavior with poor prognosis. Despite continued improvements in cancer treatment, the overall survival of patients with high-risk neuroblastoma is still only 40–50%. Our research focuses on understanding molecular mechanisms that are important for neuroblastoma biology, with the goal of identifying novel molecular targets that can be exploited for more efficient and less toxic treatments of high-risk neuroblastoma patients. Neuroblastoma is a malignant embryonic childhood tumor arising from primitive cells of the neural crest. It accounts for more than 7% of childhood malignancies and around 15% of cancer-related deaths in childhood. One hallmark of neuroblastoma is its extremely heterogeneous behavior ranging from spontaneous regression or differentiation into benign histological variants to aggressive metastatic behavior with poor prognosis. Despite continued improvements in cancer treatment, the overall survival of patients with high-risk neuroblastoma is still only 40–50%. Our research focuses on understanding molecular mechanisms of non-coding RNAs in the complex processes of malignant transformation, progression and differentiation in neuroblastoma. Due to the lack of biological material from neuroblastoma patients, our neuroblastoma research has been limited to relevant cell line and animal model systems. 
Therefore, we now apply for access to the TARGET NBL genomic data to support and validate our understanding of neuroblastoma in model systems. The use of gene expression and other genomic data from neuroblastoma patients to understand relevant regulatory interactions between both coding and non-coding genes is of crucial importance to our research. Identifying clinically relevant molecular mechanisms of malignant transformation, tumor progression and differentiation that can be specifically targeted can lead to more effective and less toxic treatments of high-risk neuroblastoma patients. The datasets from TARGET NBL will be analyzed separately and according to current dbGaP security best practices. We plan to use the TARGET NBL data for neuroblastoma research only. We will not use the data for methods, software, or other tool development. Einvik, Christer UNIVERSITY HOSPITAL OF NORTH NORWAY Identification and understanding molecular mechanisms of malignant transformation, tumor progression and differentiation in neuroblastoma Apr07, 2016 expired Neuroblastoma is a malignant embryonic childhood tumor arising from primitive cells of the neural crest. It accounts for more than 7% of childhood malignancies and around 15% of cancer-related deaths in childhood. One hallmark of neuroblastoma is its extremely heterogeneous behavior ranging from spontaneous regression or differentiation into benign histological variants to aggressive metastatic behavior with poor prognosis. Despite continued improvements in cancer treatment, the overall survival of patients with high-risk neuroblastoma is still only 40–50%. Our research focuses on understanding molecular mechanisms that are important for neuroblastoma biology, with the goal of identifying novel molecular targets that can be exploited for more efficient and less toxic treatments of high-risk neuroblastoma patients. Neuroblastoma is a malignant embryonic childhood tumor arising from primitive cells of the neural crest. 
It accounts for more than 7% of childhood malignancies and around 15% of cancer-related deaths in childhood. One hallmark of neuroblastoma is its extremely heterogeneous behavior ranging from spontaneous regression or differentiation into benign histological variants to aggressive metastatic behavior with poor prognosis. Despite continued improvements in cancer treatment, the overall survival of patients with high-risk neuroblastoma is still only 40–50%. Our research focuses on understanding molecular mechanisms of non-coding RNAs in the complex processes of malignant transformation, progression and differentiation in neuroblastoma. Due to the lack of biological material from neuroblastoma patients, our neuroblastoma research has been limited to relevant cell line and animal model systems. Therefore, we now apply for access to the TARGET NBL genomic data to support and validate our understanding of neuroblastoma in model systems. The use of gene expression and other genomic data from neuroblastoma patients to understand relevant regulatory interactions between both coding and non-coding genes is of crucial importance to our research. Identifying clinically relevant molecular mechanisms of malignant transformation, tumor progression and differentiation that can be specifically targeted can lead to more effective and less toxic treatments of high-risk neuroblastoma patients. The datasets from TARGET NBL will be analyzed separately and according to current dbGaP security best practices. We plan to use the TARGET NBL data for neuroblastoma research only. We will not use the data for methods, software, or other tool development. Emerenciano, Mariana INSTITUTO NACIONAL DE CANCER The striking ability of somatic alterations to trigger high-risk acute leukemia Oct25, 2018 approved Acute leukemia (AL) is the most frequent cancer in children. 
It is characterized by a heterogeneous molecular profile, and despite outstanding advances in the genomic field, the ‘origins’ of some molecular abnormalities are unclear. To date, only institutions with state-of-the-art technology can incorporate molecular markers into risk-stratification algorithms to benefit the patient. A major challenge lies in identifying novel approaches to trace oncogenic alterations. Most studies that have attempted to understand the mechanisms behind oncogene overexpression have characterized coding abnormalities, whereas we now know that non-coding alterations play a critical role in oncogene regulation. Here, we will investigate two high-risk markers of AL: FLT3 and CRLF2 overexpression. In addition, we will identify and characterize enhancers associated with gene expression profiles observed in AL subtypes. Our data will outline the genomic causes of oncogene activation in AL and offer a rationale for targeting those abnormalities in high-risk cases. With the advent of omics approaches combined with advanced bioinformatics strategies, it has become possible to identify and characterize the role of regulatory elements (e.g. enhancers) in the dysregulated expression of several proto-oncogenes. A well-known function of these regions is to establish spatial and temporal gene expression patterns, which makes them an extremely important mechanism in biological processes such as tissue and cell differentiation. Therefore, this study aims to identify regulatory regions associated with acute leukemia profiles, searching for alterations in enhancer landscapes that can contribute to its pathogenesis. Moreover, given the lack of known molecular alterations to account for some of the high-risk profiles observed in acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) in children and adults, this project also seeks to understand the genetic basis of FLT3 and CRLF2 overexpression. 
Therefore, we aim to screen human pediatric and adult acute leukemias for the presence of somatic alterations that explain either FLT3 or CRLF2 overexpression. We cannot conduct our project using only data from adults because gene expression profiles and outcomes differ significantly between these age groups. This marked difference would prevent us from translating our findings into improvements in diagnostic tests, risk stratification, and prognosis in childhood acute leukemia. To reach our goals, we intend to use data from pediatric patients diagnosed with AML or ALL retrieved from Therapeutically Applicable Research to Generate Effective Treatments (TARGET) (dbGaP Study Accession: phs000218.v20.p7) to identify alterations affecting cancer genes or cis-regulatory elements. For this purpose, we will use RNA-seq, Whole Exome Sequencing (WXS), Whole Genome Sequencing (WGS), and Methylation Array data to analyze expression profiles, enhancer RNAs (eRNA), sequence mutations, copy number and chromosomal structural abnormalities, and DNA methylation in both coding and non-coding regions. In this way, we will be able to molecularly characterize patients with aberrant expression and elucidate the mechanisms involved. We also intend to confirm the poor outcome associated with each profile. Our final goal is to implement a feasible routine diagnostic tool to detect those alterations driving oncogene overexpression that have prognostic value. With the present proposal, we can offer a feasible routine diagnostic tool for two subgroups of childhood ALL with high prognostic relevance, and an understanding of the mechanisms underlying the cell type-specific selection and function of enhancers in the establishment of acute leukemia subtypes. Because gene expression approaches are unlikely to be incorporated into routine diagnostic platforms for adequate prospective risk stratification, a simple DNA-based screening could fill this gap. 
We propose to develop a PCR-based screening method to detect the mutations identified in our analyses. We believe this approach will likely be incorporated into the risk stratification of childhood acute leukemia in the next few years. Altogether, our research project will generate relevant data for a more precise diagnosis and adequate treatment of childhood leukemias. This investigation will be performed in collaboration with Dr. Mariana Boroni, a researcher from INCA. The Chief of ICT Governance responsible for data security at the Information Technology (IT) Division – INCA is Carlos Henrique Fernandes Martins. If our preliminary results support the hypothesis, we also plan to collect samples from independent cohorts for validation purposes. Erickson, Stephen RESEARCH TRIANGLE INSTITUTE Predictive Model of Chemotherapy Response in Pediatric Cancers Aug28, 2017 closed The Pediatric Preclinical Testing Program (PPTP), now the Pediatric Preclinical Testing Consortium (PPTC), tests new cancer drugs on mouse xenograft models of pediatric cancers; a patient-derived xenograft is made from human tumor tissue that is implanted into an immunodeficient mouse. This preclinical testing is then used to prioritize agents for human clinical trials. The DNA from roughly 100 xenografts has undergone genetic sequencing, which reveals specific mutations that have occurred within these tumors. We aim to identify specific mutations, or combinations of mutations, that can be used to predict how well a given tumor will respond to a specific cancer agent or combination of agents. Our project will use statistical learning and data mining techniques to discover genomic signatures within pediatric tumors that predict drug response in preclinical testing of chemotherapy agents. 
A predictive model of drug response could then be applied to cancers and tumors that have not undergone testing, to prioritize agents for preclinical testing and eventually lead to personalized treatment in the clinic. We will use genomic data from the PPTP to identify specific genomic aberrations that can predict response, based on already published in vivo testing results from mouse xenograft models. Esteller, Manel FUN/INST/REC/LEUCEMIA JOSEP CARRERAS RNA splicing and epigenetics in cancer Feb29, 2024 approved We have more than 37 trillion cells in our body, and some of them may escape our control, dividing uncontrollably and becoming what are known as cancer cells. An important mechanism that has been shown to contribute to cancer progression is “RNA splicing”, which modulates the gene expression patterns that allow a cancer cell to keep proliferating. Another important mechanism that can contribute to cancer progression is “DNA methylation”, an epigenetic mechanism that chemically modifies DNA depending on the cellular context, also changing gene expression patterns. A connection between DNA methylation and RNA splicing has been hypothesized, but it is not well understood in cancer. We therefore want to study this connection in order to discover potential new cancer survival mechanisms and possible new therapeutic opportunities. For this, we need access to the clinical, expression and methylation data from dbGaP, in order to perform computational analyses that pinpoint new patterns correlating RNA splicing and DNA methylation with cancer progression. 
We want to focus on neuroblastoma as a starting point, due to our expertise in this cancer, retrieving the data from the “TARGET: Neuroblastoma (NBL)” project (phs000467.v23.p8). RNA splicing is a crucial biological process in animal cells that allows different messenger RNAs to be generated from a single gene through differential exon/intron inclusion. This differential inclusion of genetic material depends on the type of cell and its (micro)environmental context, allowing cells to adapt rapidly to a myriad of inputs. Nevertheless, the cellular mechanisms that allow cells to change these splicing patterns, giving alternative products (i.e. alternative splicing), remain obscure. Epigenetic mechanisms, specifically DNA methylation, are thought to play an important role in controlling alternative splicing. It is well known that DNA methylation, the addition of a methyl group to the 5th carbon of cytosines in CpG dinucleotides, can alter expression levels when this modification is found in gene promoters. However, the role of DNA methylation at other genomic positions (e.g. exonic/intronic regions of the gene body) is not well understood, although some studies suggest an important role for gene body methylation in alternative splicing (e.g. DNA methylation in exonic/intronic regions stalls RNA polymerase II, increasing the chance of including an exon that would otherwise be excluded); these mechanisms, however, are not fully confirmed. RNA splicing also has a crucial role in cancer development. Almost all types of tumors display dysregulated RNA splicing patterns, which contribute to the progression of the disease. For example, many splicing factors that are upregulated in breast cancer have oncogenic functions, promoting tumor initiation and progression. 
Since DNA methylation regulates the performance of the splicing machinery, the connection between both mechanisms in cancer needs to be tackled, especially now with the advent of epigenetic therapies (e.g. hypomethylating agents). Since the connection between RNA splicing, DNA methylation and cancer progression is currently not understood, we need access to the clinical, expression (RNA-seq paired FASTQ files) and DNA methylation data from the dbGaP repository, in order to perform an exhaustive in silico analysis of this three-way correlation, allowing us to pinpoint splicing patterns that are correlated with DNA methylation and have a potential impact on cancer patients. Specifically, we want to focus on neuroblastoma as a starting point, due to our expertise in this disease, retrieving the raw data from the “TARGET: Neuroblastoma (NBL)” project (phs000467.v23.p8). Evrony, Gilad NEW YORK UNIVERSITY SCHOOL OF MEDICINE Analysis of somatic variant allele frequency distributions in cancer May13, 2020 closed Mutations occur in tumors at different timepoints: some occur early, just as the tumor is starting, while others occur later. Early mutations would be expected to be present in most or all tumor cells, while later-occurring mutations would be expected to be present in fewer tumor cells. In this project, we will analyze the fraction of tumor cells carrying each mutation to understand how a tumor develops. This may help us understand the initial processes that create tumors and the way they subsequently grow. The origins of cancer, i.e. the original normal cell type/lineage that gives rise to a tumor, remain enigmatic. The goal of this research project is to analyze the distributions of somatic variants across the genome in different tumor types in order to reveal the tumor lineage of origin and to estimate the timing of somatic tumor mutations relative to tumor development. 
For example, variants at high allele frequency would be expected to have occurred prior to tumor development or early in a tumor (i.e. clonal tumor variants), whereas variants at low allele frequency would be expected to have occurred later. Moreover, genes that are specific to normal cell lineages and expressed at high levels are mutated at higher rates during normal body development and aging. Therefore, the distributions and variant allele frequencies of tumor variants provide a critical view into oncogenesis and may be able to reveal the normal lineage origins of cancer. Pediatric cancer, in particular, has an intimate link to normal development. Many pediatric tumors likely arise from progenitor cells present during normal development rather than from fully differentiated cells. As a result, identifying these progenitor lineages would enable: a) the creation of improved animal models for pediatric cancer; and b) investigations of possible lineage-specific vulnerabilities for the development of novel therapeutics. Therefore, pediatric cancers in particular are amenable to our approach of identifying tumor origins using somatic variants. The study will obtain raw somatic variant calls from all key tumor types, including pediatric tumors, and will analyze the allele frequency distributions. In particular, we will prioritize tumors for which the tumor cellularity is known, since this will provide an upper bound for the expected variant allele frequency. Variant allele frequency distributions will also be correlated with other parameters, including tumor grade, tumor type, copy number variants, and RNA expression profiles. Pediatric tumors are of particular interest because of the more direct link between their origins and the development of the body and developmental progenitor cells. 
Eyal, Eran SHEBA RESEARCH FUND Genomic discoveries in ultrahypermutant mismatch repair deficient pediatric tumors using next generation sequencing analyses Mar28, 2016 expired Biallelic mismatch repair deficiency (bMMRD) is the most aggressive pediatric cancer predisposition syndrome and is caused by homozygous germline mutations in one of the 4 mismatch repair genes. We performed sequencing and transcriptome analysis on these tumors to find novel genes, mutations or genomic modifications that might be related to the disease and could help identify new and better therapeutic approaches. In order to perform proper comparative and statistical analysis, we require access to The Cancer Genome Atlas (TCGA) and TARGET datasets to compare their sequencing data with that of our patients. This will allow us to perform a more accurate and efficient analysis. We also perform transcriptome analysis on these tumors, for which we will need access to the GTEx data in order to compare expression patterns of normal tissues with those of the tumor tissues, and thus to decipher novel gene expression patterns in cancer. We study in depth the causes of mismatch repair deficient (MMRD) cancers in children, in order to prevent these cancers from developing and to treat them effectively. These tumors harbor the highest mutation load in human cancer and are termed ultrahypermutant cancers. For this purpose, we use next generation sequencing analyses, including whole-genome, whole-exome and transcriptome data, to determine the effect of the ultrahypermutational signature on the genome and transcriptome of such tumors and, based on that, aim to develop novel therapies. Our group has extensive experience in molecular cancer genomics research, genomic (CGH, RNA, miRNA) microarray profiling, brain cancer biomarker elucidation, and personalized cancer medicine. Thus, access to TARGET data, which holds pediatric tumor data, would be extremely valuable to our study. 
The availability of a large sequencing cohort in the analysis of variants enables much more efficient and accurate identification of novel pathogenic variants by allowing faster filtering based on reported allele frequencies. For transcriptome analysis it is necessary to compare the results to normal samples, so access to GTEx raw data is crucial in order to properly compare the transcriptome data from tumors with normal tissues. The mismatch repair deficient (MMRD) cancers on which we are focusing are glioblastoma (GBM) samples. We would like to download from The Cancer Genome Atlas (TCGA) datasets a comparable set of GBM transcriptome samples that do not exhibit the ultrahypermutant phenotype for comparison. This will allow us to elucidate the unique properties underlying the phenotype in search of its genetic causes. We also plan to analyze the TARGET data to elucidate concurrent gene expression changes and identify novel somatic mutations associated with large subgroups of different pediatric cancers. Farrar, Jason UNIV OF ARKANSAS FOR MED SCIS Genomic Abnormalities in Pediatric AML and High Risk Malignancies Jun12, 2013 approved The TARGET project uses innovative technologies to fully define the variety of genetic changes that occur within cells and lead to childhood cancer. This study seeks to identify the critical changes that occur in the development of acute myeloid leukemia (AML). We expect that these studies will improve the treatment outcomes of pediatric AML both by allowing identification of patients who do poorly with current treatments and by identifying new weaknesses in AML cells that can be targeted with treatment. Acute Myeloid Leukemia (AML) is a molecularly heterogeneous disease with alterations occurring by mutation, copy number alteration, epigenetic modification, as well as by formation of novel gene function through intra- and extra-chromosomal fusions. 
Through the TARGET Initiative and with the Children’s Oncology Group, we are evaluating detailed copy number alterations, genome-wide methylation changes, and whole-genome and transcriptome sequences in a carefully defined cohort of childhood AML patients through analysis of tumor, germline (remission bone marrow), and, in a subset, relapsed tumor specimens. The identification of distinct molecular changes in subtypes of AML should result in improved strategies for treatment stratification and the identification of novel targets for therapy. In addition to the definition of subgroups within pediatric AML, data will be compared across the TARGET Initiative sub-projects, representing 5 of the most common high-risk pediatric malignancies, to develop a comprehensive view of the biology of childhood cancer. Finally, pediatric AML data from TARGET will be compared to adult AML data from TCGA and to forthcoming additional genomic data of pediatric AML to define the unifying and distinctive features that characterize childhood and adult AML. Fauteux, Francois NRC-INSTITUTE FOR BIOLOGICAL SCIENCES The identification of therapeutic targets in pediatric ALL Jun29, 2017 closed ALL is the most frequent cancer diagnosed in children. Despite the great progress achieved over the last 40 years in treating ALL, most children who relapse will succumb to their disease. This poor outcome reflects the lack of personalized treatments that specifically target high-risk and relapsed ALL. Hence, this project aims to address this issue by identifying relevant therapeutic targets for high-risk and relapsed ALL using the sequencing data generated by TARGET. Therapeutic antibodies will then be generated against potentially relevant targets in the hope of improving the prognosis of children with high-risk and relapsed ALL. Acute Lymphoblastic Leukemia (ALL) is the most frequent cancer diagnosed in children. 
Despite the great progress achieved over the last 40 years in treating ALL, most children who relapse will succumb to their disease. This poor outcome reflects the lack of personalized treatments that specifically target high-risk and relapsed ALL. Hence, this project aims to address this issue by identifying relevant therapeutic targets for high-risk and relapsed ALL using the following two approaches: 1) We will use the TARGET acute lymphoblastic leukemia (ALL) RNA-seq data to compare gene expression in cancer versus normal tissues to identify differentially expressed genes and isoforms for each ALL subtype. We have also assembled large datasets for multiple tumor types. This will allow us to identify genes that are highly specific to pediatric ALL. 2) Using functional genomics approaches, we will screen the genes identified in 1) using siRNA/shRNA/CRISPR to determine which ones are required for the survival and growth of pediatric ALL samples. The goal is to identify genes that we can potentially target with therapeutic antibodies. Ferrando, Adolfo COLUMBIA UNIVERSITY HEALTH SCIENCES Identification of novel mechanisms of drug resistance in pediatric ALL May09, 2014 closed Patients who relapse after treatment have a very poor prognosis. We previously found that genetic alterations in a gene named NT5C2 are present in relapsed leukemias. Here we will further characterize the genetics and mechanisms of leukemia relapse and resistance to chemotherapy. Acute lymphoblastic leukemia (ALL) is an aggressive hematological tumor resulting from the malignant transformation of lymphoid progenitors. Despite intensive chemotherapy, 20% of pediatric patients and over 50% of adult patients with ALL fail to achieve a complete remission or relapse after intensified chemotherapy, making disease relapse and resistance to therapy the most substantial challenges in the treatment of this disease. 
Our group recently identified activating mutations in relapsed precursor T-ALLs in the cytosolic 5'-nucleotidase II gene (NT5C2), which encodes a 5'-nucleotidase enzyme responsible for the inactivation of nucleoside-analog chemotherapy drugs; these mutations result in chemotherapy resistance in pediatric ALL. We have tested whether NT5C2 mutations in relapsed leukemia also occur in the TARGET pediatric leukemia Whole Exome Sequencing (WES) samples, which are B-ALL. We have also extended our current cohort of WES triplets with data from diagnosis-remission-relapse TARGET samples to identify novel mutations, apart from NT5C2, contributing to drug resistance in pediatric ALL. We have identified somatic variants acquired by the recurrent tumors by comparing called variants from remission and relapse samples for each patient, and have measured the frequency of NT5C2 mutations as well as of novel recurrent mutations that are enriched in specific pathways and may be consistent with mechanisms of resistance. Our proposed research project specifically addresses the question of drug resistance in pediatric ALL samples for the purpose of developing more effective targeted treatments for the subset of childhood leukemias that do not achieve a complete remission after intensified chemotherapy. The success of the project depends strictly on the availability of matched diagnosis, remission and relapse samples from TARGET pediatric patients. Collaborators providing samples and contributing to the scientific analyses will include: Maddalena Paganin and Giuseppe Basso (University of Padua, Italy); Maria Luisa Sulis (Columbia University); Koh Katsuyoshi (Saitama Children’s Medical Center, Japan). Integration with single-cell RNA-seq analyses is now also being implemented. 
Fiers, Mark FLANDERS INTERUNIV INST BIOTECHNOLOGY Alternative splicing in acute lymphoblastic leukemia Oct02, 2014 closed Acute lymphoblastic leukemia (ALL) is an aggressive leukemia caused by malignant transformation of developing lymphocytes. Much research has been aimed at identifying mutated cancer genes; however, another possible source of variation is alternative splicing, a process by which the same gene can give rise to different messenger RNAs, which may differ between a healthy cell and a tumor cell. In this research we want to identify alternative splicing events in our own data (T-cell ALL) and compare these with other forms of ALL to assess whether these alternative splicing events can distinguish (pediatric) T-ALL. This research will focus on the detection of variant spliced alleles expressed in pediatric T-ALL. We hypothesize that some variant alleles are uniquely expressed only within some forms of ALL, and not in other hematological malignancies. Such variants would be valuable as diagnostic markers or even as potential therapeutic targets. Therefore, access to this large cohort of ALL, in combination with our own data on T-ALL, will help us determine the presence of such alleles, and whether they are unique to a specific form of ALL. Fisch, Kathleen UNIVERSITY OF CALIFORNIA, SAN DIEGO Investigating the Role of Epitranscriptomic RNA Mutation in T-cell Acute Lymphoblastic Leukemia Mar11, 2020 approved Acute lymphoblastic leukemia (ALL) is the most prevalent hematological cancer in children younger than 14 years of age, and despite progress in intensive chemotherapy, 20-25% of pediatric and over 50% of adult patients show resistance to therapy and relapse. We want to investigate the role of ADAR1 and APOBEC3 deaminases in T-cell acute lymphoblastic leukemia (T-ALL) cancer cell generation and maintenance using this whole transcriptome RNA sequencing dataset of T-ALL patients. 
Acute lymphoblastic leukemia (ALL) is the most prevalent hematological cancer in children younger than 14 years of age. Despite progress in intensive chemotherapy, 20-25% of pediatric and over 50% of adult patients show resistance to therapy and relapse. Widespread aberrant epitranscriptomic ADAR1-mediated adenosine-to-inosine (A-to-I) RNA editing and APOBEC3-mediated cytosine-to-uracil (C-to-U) RNA editing have been associated with clinical characteristics of several cancer types and with the generation of leukemia initiating cells (LICs) with enhanced pro-survival and self-renewal capacity. We would like to use this dataset to analyze the A-to-I and C-to-U RNA mutation signatures in pediatric T-ALL, which will inform downstream functional and mechanistic overexpression and knockdown studies. By providing a more mechanistic understanding of the role of ADAR1 and APOBEC3 in pediatric cancer, the proposed study will inform future RNA mutation detection and inhibition strategies that may help to obviate cancer resistance and relapse. There is no plan to combine this dataset with other datasets outside dbGaP. FLYNN, RACHEL BOSTON UNIVERSITY MEDICAL CAMPUS Defining the Genetic Basis of the Alternative Lengthening of Telomeres Pathway in Osteosarcoma Dec21, 2023 approved Telomere elongation is a requisite for cellular immortality and a hallmark of cancer cells. Most cancer cells rely on reactivation of the enzyme telomerase or activation of the alternative lengthening of telomeres pathway (ALT) to promote telomere elongation. The prevalence of ALT in pediatric osteosarcoma is estimated to be over 70%. These cancers have poor overall survival, and treatment options have remained static for decades, suggesting the need for additional research. To date, genetic mutations in several genes have been associated with ALT-positive cancers. However, whether these mutations are early drivers of ALT activity or are acquired later in the progression of ALT is unclear. 
The goal of our study is to further define the genetic evolution of ALT in pediatric osteosarcoma. Fully defining the genetics of ALT could help us identify genetic vulnerabilities that may serve as tractable therapeutic targets in the treatment of ALT positive cancers. Most human cancers rely on the enzyme telomerase or activation of the Alternative Lengthening of Telomeres (ALT) pathway to stimulate telomere elongation and, ultimately, evade cellular senescence. ALT activity has been detected in approximately 75% of osteosarcomas, yet exactly how the ALT pathway is activated is unclear. Genetically, mutations in the genes encoding the chromatin remodeling proteins ATRX and DAXX correlate with ALT status. Thus, it has been hypothesized that inactivation of ATRX and DAXX alters the chromatin landscape at telomeric DNA, contributing to the activation of the ALT pathway. However, loss of ATRX or DAXX alone does not fully induce ALT activity in vitro, suggesting that the gain, or loss, of additional factors likely contributes to the process. Using RNA sequencing (ribosomal depleted RNA preparations) on 20 osteosarcoma cell lines and tumors, we found that approximately 70% of osteosarcoma samples have lost expression of the telomerase RNA, hTR. Thus, functional inactivation of components of the telomerase holoenzyme could be an early event in the activation of ALT. hTR is an H/ACA box snoRNA gene and consists of a single exon spanning 451 nucleotides. hTR is transcribed by RNA Pol II; however, it is not polyadenylated and cannot be analyzed by RNA sequencing libraries prepared from poly(A)+ purified RNA. Given that most transcriptomic datasets in cancer are generated from poly(A)+ purified RNA, hTR analysis has been consistently overlooked. Likewise, genome-wide analyses are often conducted using whole-exome sequencing approaches. WES captures 93% of exons annotated as protein coding, but only 24% of all exons within the genome.
As a result, non-protein-coding genes, including hTR, are often excluded from analysis. Our initial analysis by RNA sequencing did not identify a clear genetic basis for loss of hTR expression. Therefore, we are interested in re-mining existing whole-genome sequencing data to determine whether deletion, mutation, or loss of heterozygosity at the hTR locus (including the promoter and the 3’ region) occurs broadly in osteosarcoma tumors and serves as a previously unrecognized contributor to ALT activation. Study Design: We will use a hypothesis-driven bioinformatics approach to look for genetic mutations, deletions, or chromosomal rearrangements of hTR, ATRX, and DAXX using the TARGET OS whole genome sequencing and RNA sequencing datasets that are currently available. In addition, we will determine whether any defects in hTR are early or late events in the evolution of osteosarcoma tumors harboring ATRX/DAXX inactivation using the PhylogicNDT analysis pipeline developed by our collaborator Dr. Ignaty Leshchiner. Analysis Plan: WGS and RNA sequencing data will be aligned to the hg38 human reference genome using STAR to identify reads. The individual loci indicated above will be examined for evidence of mutations. We will also validate any genetic events detected in the TARGET OS samples in a cohort of our own osteosarcoma cell lines and tumors (approximately 50 samples, including fresh frozen material). Data Use Limitations: This study meets all of the data use limitations as defined by the Disease-Specific content group that applies to this accession. Planned Collaborations: This project does not involve planned collaborations with individuals from other institutions. Foulkes, William SIR MORTIMER B. DAVIS JEWISH GEN HOSP Dissecting driver mechanisms in tumors carrying genomic alterations in microRNA processing genes.
Nov14, 2018 approved Over the past several years, germ-line and/or somatic mutations have been identified in different microRNA processing genes in a number of rare pediatric and adult tumors. These tumors include pleuropulmonary blastoma, cystic nephroma, Sertoli-Leydig cell tumors, Wilms tumor and rare brain tumors such as pineoblastoma and pituitary blastoma, among others. Our goal in this project is to dissect driver mechanisms that have been altered as a consequence of alterations in microRNA processing genes and compare them with TARGET samples carrying similar aberrations or having a similar phenotype. The outcome of our study will be very valuable in terms of developing a new therapeutic approach that can target these devastating cancers. The requested data will be compared with our tumors carrying mutations in microRNA processing genes to identify the pathways altered in these rare tumors. The data will be used only for scientific research purposes and will NOT be compared with any public database of adult tumors. In fact, the mutation(s) that we are working on have so far been reported mainly in pediatric cancers. We are planning to publish the results of our study in peer-reviewed journals. Foulkes, William SIR MORTIMER B. DAVIS JEWISH GEN HOSP Systematic approaches to dissect driver mechanisms in tumors carrying genomic alterations in microRNA processing genes. Nov16, 2021 approved Over the past several years, germ-line and/or somatic mutations have been identified in different microRNA processing genes in a number of rare pediatric and adult tumors. These tumors include pleuropulmonary blastoma, cystic nephroma, Sertoli-Leydig cell tumors, Wilms tumor and rare brain tumors such as pineoblastoma and pituitary blastoma, among others.
Our goal in this project is to dissect driver mechanisms that have been altered as a consequence of alterations in microRNA processing genes and compare them with TARGET samples carrying similar aberrations or having a similar phenotype. The outcome of our study will be very valuable in terms of developing a new therapeutic approach that can target these devastating cancers. The requested data will be compared with our tumors carrying mutations in microRNA processing genes to identify the pathways altered in these rare tumors. The data will be used only for scientific research purposes and will NOT be compared with any public database of adult tumors. In fact, the mutation(s) that we are working on have so far been reported mainly in pediatric cancers. We are planning to publish the results of our study in peer-reviewed journals. Foulkes, William SIR MORTIMER B. DAVIS JEWISH GEN HOSP Systematic approaches to dissect driver pathways in rhabdoid tumor of the ovary (MRTO). Dec05, 2016 closed The requested data will be compared with our MRTO data to identify the differential pathways altered in rhabdoid tumors. The data will be used only for scientific research purposes. We are planning to publish the results of our study in peer-reviewed journals. Rhabdoid tumors are pediatric soft tissue tumors that can manifest as either atypical teratoid/rhabdoid tumors (ATRTs) in the brain, or extra-cranial malignant rhabdoid tumors (MRTs) that most often develop in the kidney but can arise in other tissues. Inactivating mutations in two main components of the SWI/SNF complex, SMARCB1 and SMARCA4, have been reported in 98% and 2% of cases, respectively. Recently, we and others showed that deleterious mutations in SMARCA4 are the main cause of an early onset lethal ovarian cancer (SCCOHT) in females (median age 24 years, range 18 months to 56 years). Also, all cases under 15 years old at diagnosis appear to be caused by germline mutations.
Our further studies revealed that SCCOHTs are clinically, genetically and epigenetically more similar to rhabdoid tumors than to ovarian high grade serous carcinoma, and we suggested renaming SCCOHT as MRT of the ovary (MRTO). Our goal in this project is to dissect driver pathways that have been altered as a consequence of SMARCA4 loss and compare them with SMARCB1-mutated rhabdoid tumors. Our initial results suggest that the same pathways are most probably altered as a consequence of mutations in these genes. The outcome of our study will be very valuable in terms of developing new therapeutic approaches that can target both types of these devastating paediatric cancers. Francis, Stephen UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Investigating transposable and retro elements in the etiology of childhood cancers Aug19, 2020 rejected We are investigating an understudied area of the human genome in the cause and treatment of childhood cancer. These genetic elements were previously called “junk DNA”, yet a growing body of research has determined that they may be involved in some of the more complex functions of our genome. We plan to study these at the DNA and RNA level to identify novel associations that help us understand the cause of childhood cancer and target new therapies. Objective: Investigate transposable and retro elements (TE/RE) polymorphisms and expression across and between childhood cancers to identify features that may be etiologically relevant in childhood cancer. Nominated elements will then go into functional studies in murine and cell based systems. Background: Barbara McClintock’s discovery of RE/TEs in 1956 was a paradigm shift in genetics. Despite that discovery, large-scale investigations into retroelements in human diseases have been few and far between. TE/REs make up almost half of the genome; they are repressed by methylation that is often removed in cancer, and they actively contribute to immune response.
Broadly, many of these elements are known to be functional, though the extent and function of most of these elements remain unknown. Approach: We propose to use the TARGET genomic dataset for this analysis. This investigation will catalog all germline TE/RE polymorphisms, quantify TE/RE expression and analyze gene expression differences associated with polymorphic and differentially expressed RE/TEs. We will also use BLAST to classify infections present in RNA and DNA samples, as these are known to affect both gene expression and TE/RE expression. We have developed and adapted a number of bioinformatic pipelines optimized for identification of polymorphic retroelement insertions, expression quantification and gene expression that, used in tandem, will provide a map of variation in childhood cancer. We will use appropriate statistical tools for analyses of high-dimensional data with a strict multiple testing framework. We plan to investigate features discovered in this analysis in follow-up work in murine and cell line studies. We plan to analyze this data set in the context of concurrent mouse studies and previous analyses to understand the frequency of these polymorphisms and expression in large genomic datasets such as GTEx and 1000 Genomes. Data use limitations: The goal of this study is to investigate TE/RE features in the etiology of childhood cancer to further the understanding of the cause and identify novel treatment targets. This investigation is in line with the data limitations outlined for TARGET. Fridley, Brooke MAYO CLINIC Genetic markers associated with thiopurine treatment response in childhood acute lymphoblastic leukemia (ALL) Jun23, 2011 closed Acute lymphoblastic leukemia (ALL) is the most commonly diagnosed form of pediatric cancer. Although cure rates are high, current therapy carries a risk of severe adverse effects such as myelosuppression, and, conversely, relapse can occur.
Determination of genetic factors responsible for variation in treatment response of patients would help make it possible to optimize and individualize treatment of childhood ALL. We have performed a clinical association study using UK-ALL patients. We genotyped 631 ALL patients for our study and identified 85 candidate SNPs that were associated with outcome or adverse response phenotypes as well as 40 associated genes, 12 of which we functionally validated by thiopurine cytotoxicity assays. In order to replicate our findings, we would like to compare them with SNPs that were identified during the St Jude GWAS for childhood ALL treatment response. Therefore we are requesting access to genotype and phenotype data from the study “Genome-wide interrogation of germline genetic variation associated with treatment response in childhood acute lymphoblastic leukemia” by Yang et al., 2009, JAMA. 301(4):393-403. The goal of this research is to identify and functionally validate genetic variants associated with variation in thiopurine response during the therapy of childhood ALL. Friend, Stephen SAGE BIONETWORKS Network modeling of Pediatric Cancer Dec17, 2010 closed Cancer represents one of the most complex diseases because the groups of genetic mutations that can result in tumor progression can vary widely across individuals. The aim of this research is to use mathematical models based on genomic and clinical data donated from patients to provide a model of cancer pathology that can be used to understand the complex biology that underlies tumorigenesis. The goal of this effort is to predict clinical outcomes and, ultimately, to guide the production of more effective therapies for future generations of pediatric cancer patients. We kindly request an extension for access to the TARGET data set.
We hope to commence the investigation described in our previous proposal, summarized here: We propose to use integrated approaches to combine the pediatric cancer genomic and phenotypic datasets available through TARGET into network models that can be used as predictive tools to understand cancer biology, to identify novel targets for the development of pediatric cancer therapies and to maximize therapeutic benefit. Our group has successfully developed an integrated approach for modeling complex genomic datasets and has effectively applied these techniques in the fields of diabetes and obesity. Our current initiative is to apply these techniques towards the improvement of drug design and targeted patient selection in the field of cancer. The TARGET datasets are an invaluable resource that will enable us to apply these techniques towards the therapeutic needs of pediatric oncology. We intend to combine the TARGET datasets with data on adult AML and cell lines we have through a collaboration with the FHCRC and OHSU. Through this initiative we plan to: (1) Develop network models that are predictive of the relationships among genomic data (genetic, epigenomic, and transcriptomic) and clinical phenotypes/outcome. (2) Identify co-regulation patterns among genetic, epigenomic, transcriptomic data and clinical phenotypes/outcome and determine their relationships with regard to known biological processes. (3) Identify sub-networks that are causally associated with dissimilarities in models derived from multiple types or stages of cancers, or derived from patients stratified by clinical outcome. (4) Identify key regulators of tumor progression through the construction of predictive causal networks. Frietze, Seth UNIVERSITY OF VERMONT & ST AGRIC COLLEGE Mechanisms of IKZF1 tumor suppression in ALL Oct21, 2020 approved We are exploring the potential for genome-wide approaches to reveal drivers of adult and pediatric leukemias.
Based on our previous acute lymphoblastic leukemia (ALL) studies, we expect this innovative strategy will provide an enhanced understanding of the molecular underpinnings of high-risk ALL and will also lead to the identification of new therapeutic targets, as well as new biomarkers for high-risk ALL (both adult and pediatric ALL). Our team recently conceived of a plan to explore the hypothesis that select non-coding RNAs are deregulated according to IKZF1 mutational status. In this project, we are employing computational algorithms and integrative analysis of gene expression and variant analysis. We are exploring the potential for new computational and functional genomics approaches to uncover novel targets of acute lymphoblastic leukemia (ALL). The specific objective of our study is to investigate the molecular mechanisms of Ikaros tumor suppressor function in ALL to reveal new therapeutic targets and better biomarkers for risk stratification, for both childhood and adult ALL. We have developed a human cell model and IKZF1 mouse model to identify critical target genes for Ikaros (Schjerven JEM, 2017). We have specifically uncovered candidate non-coding RNAs that are aberrantly regulated in cells that harbor IKZF1 mutations. These factors have particular relevance for the diagnosis or treatment of childhood ALL, as IKZF1 mutations are frequent in pediatric ALL. Here our goal is to explore existing TARGET RNA-seq, small RNA-seq and whole exome sequencing data sets, and analyze each separately according to its research use specifications. We plan to explore the hypothesis that select non-coding RNAs are deregulated according to IKZF1 mutational status. Results outside each dataset's respective scope will not be sought or pursued.
We will also comply with all datasets' PUB requirements and disseminate our results to the research community through publications and presentations, or otherwise broadly share any findings from our studies with the scientific community, as each PUB requirement states. Gaheen, Sharon Marie NIH TARGET Data Migration, Update, and Submission QC/QA Nov12, 2014 approved As the Leidos GDC Technical Project Manager, my team and I will download TARGET data from the GDC to verify its availability and integrity. We may also sporadically download data to verify the integrity of the data after harmonization and in support of regression testing throughout the GDC release cycle. Our primary purpose for accessing TARGET data is continued Quality Control (QC)/Quality Assurance (QA) of the TARGET data sets (clinical, biospecimen, and molecular) available in the GDC. In support of TARGET, the GDC maintains harmonized data (sequence alignments, variant calls, etc.) for several TARGET projects, including AML, NBL, WT, OS, ALL (P1, P2, P3), RT, and CCSK. As the GDC continues to develop new pipelines (such as WGS variant calls) and update existing pipelines, access to TARGET data is needed to perform QC/QA on newly harmonized data. Additionally, the GDC is developing GDC 2.0, which supports the analysis of TARGET data using new analysis tools. Access to TARGET data is needed to perform QC/QA on the new GDC 2.0 analysis tools. QC/QA will involve verifying TARGET data analysis and visualization in GDC 2.0 throughout the release cycle. Garde, Christian EVAXION BIOTECH APS Analysis of haematological cancer datasets for alternative vaccine targets to enable personalised treatments in these previously untargetable childhood and adult cancers May04, 2023 approved Based on DNA and mRNA sequencing of tumour biopsies and matched healthy tissue, we wish to identify cancer specific targets in different haematological cancer types that have the potential to be recognized by the immune system.
Here we would specifically focus on mutations and changes in expression levels that arise from normally dormant viral elements found in the patient genome that are activated by the changes in cancerous cells. In addition, we will map further sources of mutations/genomic changes that may contribute to this pool of “targetable” differences between cancerous and healthy tissue, to allow for the design of personalized cancer vaccines in both childhood and adult haematological indications. Personalized immunotherapies for solid tumours relying on neoepitope concepts have been developed and evaluated in a clinical setting throughout the last decade (PMID: 32117272 and 31921218), mainly in adult cancers. Here, positive responses have been recorded when combined with checkpoint inhibitors targeting PD1/PD-L1, especially in indications with a high tumor mutational burden (TMB), such as NSCLC and malignant melanoma. However, these approaches have not been evaluated in detail for haematological cancers (except NCT03559413 for children and young adults with ALL) due to low TMBs for these indications (PMID: 23945592). Accordingly, this potentially safe and effective treatment is unavailable for a patient group comprising a significant number of children. In the proposed project, we aim to analyse NGS datasets (WGS, WXS, mRNAseq, genotyping, methylation) from solid cancer types and several haematological cancers, including Hodgkin lymphoma, non-Hodgkin lymphoma, acute leukemias, chronic leukemias, and myelodysplastic syndromes, for the presence of novel cancer antigen targets that would allow development of personalized cancer vaccines.
For such an approach we will be identifying SNVs, INDELs, gene fusions, endogenous retroviruses (ERVs), activated transposon elements, mRNA levels, mRNA splicing, and exon skipping/inclusion changes (PMID: 36604431) across all datasets, to reveal potential tumor type specific combinations of antigens that may differ between paediatric and adult settings. If novel discoveries are made during this work, we aim to publish the findings in a peer-reviewed international journal as a follow-up study to our recent ERV-focused article (bioRxiv: 2023.03.23.533908v1). In addition, we will also be investigating longitudinal data with various treatment options (chemotherapy, epigenetic altering compounds, BTK inhibitors etc.) to understand how these may affect the mutational landscape over time. In an initial proof-of-concept (POC) study, we have shown that acute myeloid leukemia (AML) has an unprecedentedly high expression of ERVs, enabling targeting of this aggressive cancer type, which has few treatment options, especially in a paediatric setting. For further evaluation of this concept, we request access to a large collection of realistic and reliable datasets across multiple cancer types to verify the observations from AML and investigate similar effects in other haematological indications, including acute lymphoid leukaemia, which is mostly found in children. We thus request access to the TCGA, TARGET, MMRF, BEATAML1.0, OHSU, CGCI databases, amongst others. Garde, Christian EVAXION BIOTECH APS Investigation of epigenetic control of ERVs in adult and pediatric hematological cancers Jan11, 2024 approved ERVs are ancient viruses located in the human genome, which are specifically expressed in cancers and may thus be a good therapeutic target. We wish to investigate the cellular mechanisms that regulate the expression of ERVs through the analysis of sequencing data of hematological tumor biopsies.
This investigation will also include elucidation of the differences in epigenetic control and ERV expression in childhood and adult hematological cancers. Endogenous retroviruses (ERVs) are ancient viruses that have invaded the human genome and been passed down through generations. In healthy tissue, ERVs are generally expressed at very low levels, yet it has been demonstrated that ERVs are overexpressed in cancers and may be a source of cancer antigens (Garde et al. 2023, https://doi.org/10.1101/2023.03.23.533908). It has been suggested that ERVs are subject to epigenetic control, potentially mediated through perturbing mutations of the major DNA methylation regulators (DNMTs and TET enzymes). Furthermore, we have observed higher overexpression of ERVs in adult AML patients compared to pediatric AML patients (Thygesen et al. 2023, https://doi.org/10.1182/blood-2023-178203). In the current project, we wish to further investigate the link between epigenetic control and ERVs in hematological cancers, and especially the differences between adult and pediatric patients. This will include an investigation of the correlation between somatic mutations of master regulators of DNA methylation (incl. DNMTs and TETs) and the ERV expression in AML biopsies. This investigation will be complemented by correlation analysis of DNA methylation (Whole genome BS-seq) and ERV expression (RNA-seq). GARNER, HAROLD VIRGINIA COLLEGE OF OSTEOPATHIC MEDICINE Tumorigenic mutations in pediatric cancers May17, 2017 closed Many childhood cancers are known to result from a combination of a small number of inherited and acquired mutations. However, despite much work, the specific mutations that cause most of these cancers are not known. There may be many different combinations of mutations that give rise to childhood cancers, which may explain the discrepancy and suggest new ways to view, diagnose and treat childhood cancers.
This project searches for those combinations and then explores their mechanistic role in cancer. Many childhood cancers are known to result from a combination of a small number of germline and somatic mutations. Inherited mutations predispose individuals to cancer, while somatic mutations result in tumorigenesis. Although significant progress has been made in understanding the genetic basis for some pediatric cancers, the cause of most such cancers remains unknown. One possible reason is that there are many different combinations of tumorigenic mutations (hits), and these hits could occur in a different order in different patients. The goal of the proposed work is to identify these combinations and determine if the order of mutations (inherited versus somatic) matters. We will first determine if pediatric tumorigenesis is likely to require more than one tumorigenic somatic mutation. The incidence of pediatric cancers in the TARGET database decreases monotonically with age, which suggests a single somatic mutation may be sufficient for tumorigenesis. We will identify the specific set of genes with somatic mutations that are most likely responsible for tumorigenesis. We will also examine germline mutations to determine if the order of mutation matters. Whole exome sequencing data from pediatric tumor samples will be required for this study. Data from adult tumor samples cannot be used for this purpose. The tumorigenic mutations identified in this study will help explain the diversity of pediatric cancer phenotypes and pathologies, provide insights into the tumorigenic mechanisms that may be unique to childhood cancers, and suggest new biomarkers and therapeutic targets. We will use dbGaP data exclusively for this project and will not combine it with any other data. All results of the study will be published and study details will be made available to the general scientific community.
GARNER, HAROLD VIRGINIA POLYTECHNIC INST AND ST UNIV Identifying Neuroblastoma specific biomarkers from Germline DNA Microsatellite Sequences Jun12, 2014 closed Using DNA sequences from blood and tumor samples from children who have neuroblastoma, we intend to identify unique markers specific to this disease. The regions within DNA that we will study are repeat sequences known as microsatellites; differences in DNA patterns within these regions have previously been linked to cancers. Our goal is to locate unique repeats that could (1) be specific to neuroblastoma and (2) help us understand the biology of neuroblastoma so that better targets for therapies can be discovered. As genomic technology continues to expand, biological insight from ‘junk DNA’ (non-coding DNA) is proving to be important towards understanding diseases, and especially cancers. Our laboratory has identified unique cancer-specific DNA microsatellite loci with variant repeat sequences from adult and pediatric germline (non-tumor) samples using a novel microsatellite identification software described by McIver LJ et al. (Breast Cancer Research and Treatment; May 2014). These variant repeat sequences are composed of alleles that make up a genotype; those genotypes distinct to cancer are identified as candidate markers for disease. Medulloblastoma and neuroblastoma, both pediatric cancers originating from neural tissues, have previously been shown to share gene-specific mutations. Notably, we have identified cancer-specific alleles in microsatellite repeat sequences near ARID1B in medulloblastoma. Previous studies associated with neuroblastoma implicate ARID1A and ARID1B in tumorigenesis. Therefore, we hypothesize that cancer-specific microsatellite loci with variable DNA sequences are important to understanding neuroblastoma disease etiology and are useful diagnostic markers for this pediatric cancer.
Accurately identifying these loci requires high quality genomic sequencing data from large sample cohorts, which are available through dbGaP and within TARGET. Access to these data would allow us to identify novel disease markers for neuroblastoma and potentially identify linkages important to pediatric neural cancers. Gartrell, Robyn COLUMBIA UNIVERSITY HEALTH SCIENCES Immunogenomic Analysis of Neuroblastoma Jun11, 2021 closed Neuroblastoma accounts for 12% of childhood cancer deaths. Previous studies have demonstrated an interplay between Neuroblastoma cells and immune cells, both immuno-stimulatory and immunosuppressive. However, Neuroblastoma is a highly heterogeneous disease and the tumor immune microenvironment (TIME) is likely to differ between subsets. Thus, understanding the TIME within distinct subsets of Neuroblastoma is critical for developing immunotherapies and determining which patients will benefit the most from these immunotherapies. To assess the differences in immune infiltration between different subsets of Neuroblastoma (characterized by factors like age, stage and mutation status), we will analyze expression data from TARGET and look at the differential expression of immune genes across subsets. Neuroblastoma is the most common extra-cranial solid tumor in infants and children and accounts for 12% of pediatric cancer deaths. Early studies described tumor-infiltrating lymphocytes (TILs) in patients with Neuroblastoma and demonstrated that these lymphocytes preferentially killed Neuroblastoma cells. Further studies have found that Neuroblastoma cells can induce immunosuppression by specifically inhibiting these T cells. Thus, there is both an urgent need and a strong rationale to develop immunotherapies for Neuroblastoma. However, Neuroblastoma is a highly heterogeneous disease and immunotherapeutic approaches are likely to work for some categories of patients but not for others.
Thus, understanding the tumor immune microenvironment (TIME) within distinct subsets of Neuroblastoma is critical to maximize results from clinical trials, allowing patients to benefit from immunotherapy. To do this, we will analyze Neuroblastoma expression data from TARGET, comparing age, stage and MYCN status to test the hypothesis that subsets of immune genes are differentially expressed in high and low risk neuroblastoma patients and that MYCN status impacts immune gene expression. Thus, we are requesting TARGET dataset access, and the use of the data will follow the terms of the model Data Use Certification. Public posting of genomic summary results will not occur. First, we will download the raw RNA-seq files, align them uniformly and generate feature counts. We will then perform both differential expression analysis of immune gene signatures and GSEA. The significant gene signatures identified in our analysis can then be used to infer the phenotypic population of immune cells in the tumor microenvironment. This methodological work will advance the understanding of the pediatric immune system and how it impacts clinical outcomes in subsets of Neuroblastoma, paving the way for potential therapeutic targets. Gerstung, Moritz EUROPEAN MOLECULAR BIOLOGY LABORATORY Temporal sequences of mutations in childhood tumours Aug13, 2018 closed Cancers are often caused by multiple mutations. The ordering of these in childhood cancer is unknown. In this project we will use cutting-edge analysis techniques to define the genetic “time code” of childhood cancer. In this project, we propose to use cancer sequencing data to identify driver mutations and determine the temporal order of their acquisition. Advances in analysis techniques for cancer genomes have enabled the ordering of mutations. These have recently been applied on a large scale to adult cancers, but the precise ordering of mutations in childhood cancer remains unknown.
While part of this work could be carried out on publicly available mutation calls, evaluation of the original sequencing data is frequently needed to distinguish genuine recurrent mutations from residual artifacts or population polymorphisms. We would also like to evaluate the impact of running other somatic variant callers, in particular those developed by our group, in the study of selection in cancer. The intended use of the data is purely academic and all findings will be published in peer-reviewed journals. GETZ, GAD BROAD INSTITUTE, INC. Analysis of Somatic and Germline Alterations from Human Cancer Model Initiative (HCMI) Jun22, 2023 approved We propose to use the HCMI data to search for cancer-causing changes. We will combine this dataset with other datasets to obtain more in-depth insights into cancer. This dataset will additionally help us to improve our algorithms for integrated genomic analysis. We propose to use the HCMI data to characterize germline and somatic copy number alterations (amplification, deletion, and LOH), somatic structural variation, and mutations. We will compare our data with those from other centers in the CCG network to assess concordance and to investigate discrepancies. The multi-platform dataset of the HCMI will also help us to optimize algorithms for integrated genomic analysis. Furthermore, the integration of these data will allow a deep characterization of cancer on a per-sample basis. HCMI data will enable us to extend our analyses to reveal genomic markers associated with treatment efficacy. TCGA and TARGET will be used as tumor references for DNA and RNA analyses, with TARGET data used only for studying pediatric cancer models. GETZ, GAD BROAD INSTITUTE, INC. Pan-cancer analysis of cancer will detect both common and rare alterations in tumors. Dec28, 2023 approved While enormous strides have been made in cancer research in the last decade, cancer is still a leading cause of death worldwide.
The advent of new sequencing technologies, along with the directed assembly of patient samples by groups worldwide, is allowing researchers to identify the common genomic alterations in numerous tumor types. To date, however, these datasets have been too small, and hence underpowered, to detect the many infrequent or rare events that drive cancer. Our goal in this project is to assemble and analyze a dataset large enough to be powered to discover the complete array of genomic alterations, both common and rare, found in cancer. As next-generation sequencing technologies have matured, numerous efforts have been initiated to catalog the full spectrum of alterations in cancer via analysis of whole exome and whole genome sequencing of tumor specimens. These studies have revealed a picture of tumor complexity, driven primarily by point mutations and copy number alterations, both germline and somatic, targeting particular genes, and including many events in known cancer genes but many more infrequent and rare events in a variety of other genes. Recent work by our group has determined that most studies have been underpowered to discriminate the significant rare cancer genes from passengers or from noise. Thus, the emerging cancer catalog is still incomplete. Furthermore, our power calculations have shown that the number of samples needed to detect the significant cancer genes in a tumor type can be calculated from that tumor type's background mutation rate, and that, to date, no analyses have been adequately powered to detect somatic mutations found at a 2% frequency. This underscores the need for the analysis of larger datasets.
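The power argument above can be illustrated with a simplified calculation. The sketch below is not the group's actual method: it treats each sample as mutated in a given gene with a fixed background probability and asks how many samples a one-sided binomial test needs to detect a driver present in 2% of tumors. The background probabilities, the genome-wide threshold (0.1 divided by roughly 20,000 genes), the 90% power target and the search grid are all illustrative assumptions.

```python
import math


def upper_tail(n, p):
    """Return tail[k] = P(X >= k) for X ~ Binomial(n, p), for k = 0..n+1."""
    log_pmf = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
        for k in range(n + 1)
    ]
    tail = [0.0] * (n + 2)
    # Accumulate from the top so that tiny tail probabilities stay accurate.
    for k in range(n, -1, -1):
        tail[k] = tail[k + 1] + math.exp(log_pmf[k])
    return tail


def power(n, background, driver_freq, alpha):
    """Power of a one-sided binomial test to flag a gene mutated in a
    driver_freq fraction of tumors on top of a per-sample background rate."""
    null_tail = upper_tail(n, background)
    # Smallest mutated-sample count that is significant at level alpha.
    k_crit = next(k for k in range(n + 2) if null_tail[k] <= alpha)
    # A sample counts as mutated if it carries the driver or a passenger hit.
    alt_p = 1.0 - (1.0 - background) * (1.0 - driver_freq)
    return upper_tail(n, alt_p)[k_crit]


def samples_needed(background, driver_freq, alpha, target=0.9, step=50, max_n=10000):
    """Smallest cohort size on a grid of `step` that reaches the target power."""
    for n in range(step, max_n + 1, step):
        if power(n, background, driver_freq, alpha) >= target:
            return n
    return None


if __name__ == "__main__":
    alpha = 0.1 / 20000  # hypothetical genome-wide threshold over ~20,000 genes
    for bg in (0.005, 0.01, 0.02):
        print(bg, samples_needed(bg, driver_freq=0.02, alpha=alpha))
```

Running the loop shows the qualitative point made in the text: the higher the background mutation probability, the larger the cohort needed to detect the same 2% driver.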
Therefore, we aim to perform a large-scale analysis, both tumor-type specific and pan-cancer, integrating all types of events (somatic and germline, clonal and subclonal) found in all publicly available whole exome, whole genome, RNA sequencing and methylation data in order to identify all significantly altered genes, both common and rare, known and unknown, across both specific tumor types and the set of all tumors. We plan to perform the same analysis done previously to determine variants in the cancer samples, and then aggregate the data and evaluate them in a statistical framework. We have reviewed all consent forms for all data requested. Where data use is restricted to tumor-type or pediatric cancer analysis only, we will use these data only for appropriate analyses and withhold them from the general pan-cancer analysis. We may combine these with other datasets (e.g. ICGC, collaborators, or independent research) in accordance with all associated Data Use Limitations (DULs). DULs requiring public dissemination of results will be complied with. In all cases, data use agreements will be reviewed and work will proceed in accordance with all Data Use Agreements. GETZ, GAD BROAD INSTITUTE, INC. Broad Institute Cancer Genomics Cloud Mar20, 2018 closed The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic data and associated clinical annotations collected from various studies is critical to accelerating research and making new discoveries.
This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in The Cancer Genome Atlas data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. This project, FireCloud, is one of three NCI Cancer Genomics Cloud Pilots, a program to support a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. Primary data for this project will include open and controlled access TCGA and TARGET data. Open access will include clinical, Level-3 molecular, and somatic mutation data. Controlled access will include Level-1 sequence and SNP-chip data. Users will also be able to upload and compute on private data. TARGET data analysis will be restricted to pediatric cancer projects and not included with other cancer projects. The research objective of this project is to pilot hosting TCGA and TARGET data in a cloud to allow users who do not otherwise have the necessary infrastructure to compute against this large dataset quickly and efficiently. Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. In support of this effort, our institution has applied for and been granted NIH Trusted Partner status. FireCloud was launched in January 2016; controlled-access data are made available to users of the FireCloud platform who have been authenticated through NIH and verified to have received dbGaP access for use of controlled-access data. We will have completed a Security Impact Assessment (SIA) and received NCI’s approval.
We will have implemented the necessary authentication and authorization protocols to ensure that only dbGaP-authorized users will be able to gain access to the controlled data. GETZ, GAD BROAD INSTITUTE, INC. Broad Institute's FireCloud: TARGET Jan17, 2017 closed The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic data and associated clinical annotations collected from various studies is critical to accelerating research and making new discoveries. This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in TARGET data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. The Broad Institute’s FireCloud is one of three NCI Cancer Genomics Cloud Pilots, a program to support a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. Primary data for this project will include open and controlled access data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) dataset. Open access will include clinical, Level-3 molecular, and somatic mutation data. Controlled access will include Level-1 sequence and SNP-chip data. All data will be obtained from the Genomic Data Commons. The research objective of this project is to carry out analysis of pediatric cancer genomics datasets.
Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. FireCloud incorporates data access restrictions via "authorization domains", which will ensure that researchers abide by TARGET data use policies. Our organization applied for and was granted Trusted Partner status for hosting TARGET data on September 28, 2016. We have implemented the necessary authentication, authorization and access protocols to ensure that only authorized users will be able to gain access to controlled TARGET data. We follow all NIH Trusted Partner requirements for storage and distribution of these data. The system security is governed by an Authority to Operate (ATO), which was granted at the FISMA-moderate level on May 19, 2016. We will perform the necessary security impact assessments and seek review and approval from the NCI. Gimba, Etel INSTITUTO NACIONAL DE CANCER Characterization of total osteopontin and its splice variants in T-cell acute lymphoblastic leukemia Oct27, 2022 expired Leukemia is a leading cause of cancer death in children worldwide. Many children with leukemia develop a more aggressive disease that can affect the central nervous system (CNS). This scenario prompts scientists to find alternatives to improve patients’ quality of life. One protein, osteopontin (OPN), has been reported as an important molecule involved in more aggressive forms of many cancer types. By studying this protein and the way it is expressed in leukemia, mainly childhood T-ALL, we may understand how this disease develops and open new avenues to find treatment options for children with leukemia. Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy, accounting for approximately 80% of childhood leukemia cases. When ALL patients are treated with the most advanced therapeutic protocols, they have overall survival rates of about 90%.
Despite these advances, T-ALL patients display high relapse rates in extramedullary sites, mainly in the central nervous system (CNS). Hence, ALL patients undergo intensive chemotherapy regimens that may cause serious sequelae or even death. In this context, many studies are trying to find gene products with altered expression in patients with CNS infiltration that could be potential biomarkers or therapeutic targets for this disease. Osteopontin (OPN) and its splice variants (OPN-SVs) have been proposed as such candidates. OPN is a glycophosphoprotein known to be involved in several steps of tumor progression, such as invasion, migration, adhesion, metastasis, angiogenesis and chemoresistance. Its primary transcript undergoes alternative splicing, generating at least five OPN splice variants (OPN-SVs), named OPNa, OPNb, OPNc, OPN4 and OPN5. OPNa is the full-length isoform, OPNb lacks exon 5, OPNc lacks exon 4, OPN4 lacks both exons 4 and 5, and OPN5 presents an extra exon. These OPN-SVs are known to be important biomarkers in solid tumors, but their roles in leukemia have not been explored. Our group recently demonstrated that, among the five previously described OPN-SVs, OPNc expression is associated with more aggressive B-cell ALL phenotypes, including CNS infiltration. The current study aims to extend these studies in ALL by further evaluating the expression levels of total OPN and OPN-SVs in T-ALL patient samples and correlating them with disease-associated prognostic features, mainly CNS infiltration status and overall survival. To achieve this, we will need access to the transcriptome data of T-ALL patients from the TARGET database and the corresponding follow-up data, as well as information on CNS involvement. These results will form the basis for further in vitro functional assays.
Giorgi, Federico UNIVERSITY OF BOLOGNA Transcript-level Analysis of Neuroblastoma Gene Networks Aug13, 2018 closed Roughly 250,000 children globally are diagnosed with neuroblastoma, a pediatric tumor with poorly characterized genetic and molecular causes. In my previous research, I have developed methods to analyze tumor samples from adult patients to identify the genetic causes of their onset, as well as gene expression markers to predict the likelihood of survival and the best pharmacological approach. Recently, we have identified in neuroblastoma cells (grown on petri dishes) a set of new messenger RNAs that were previously uncharacterized. We would therefore like to test, on the available neuroblastoma datasets, whether and to what extent these RNAs are also present in pediatric patient samples and whether they are associated with survival. We will need access to both gene expression and genomic data in order to understand whether these transcripts originate from genomic mutations or from alternative splicing mechanisms. Our research aims at understanding the behavior of transcriptional networks in neuroblastoma, specifically the expression patterns of genes targeted by E2F transcription factors. We have analyzed RNA-seq data from cell lines and identified novel isoforms associated with genetic features of these cells (most notably, MYCN amplification status and ALK mutations) and found previously uncharacterized expressed RNAs, both coding (novel genes, novel isoforms) and noncoding (lncRNAs), some of which are co-expressed with E2F gene expression profiles. We would therefore like to access high-quality RNA-seq datasets to assess whether these novel RNA species are also expressed in pediatric patient samples and can therefore have clinical repercussions.
Technically, we will download the neuroblastoma RNA-seq samples, align them to the human genome with standard software (TopHat, HISAT) and quantify transcript abundance with Salmon/Kallisto based on our novel list of putative cell line-derived transcripts. Finally, we will try to associate the expression levels of novel transcriptional species with genetic and clinical features (survival, gender) associated with the samples. To do so, we will also need access to the TARGET neuroblastoma genome-wide dataset, to determine whether the novel transcripts arise from post-transcriptional mechanisms (alternative splicing, RNA editing) or from genomic alterations. Gisselsson Nord, David LUND UNIVERSITY Immunogenomic Characterisation of Paediatric Solid Tumours Jun26, 2019 closed Cancer is a disease of uncontrolled cell division. When cancerous cells divide, they accumulate genetic changes. These changes may potentially be recognized by the patient’s immune system as foreign (much like how the immune system recognizes bacteria and viruses as foreign). It has been shown in multiple adult tumour types that the number of genetic changes that the immune system can recognize is important for prognosis. This project will study whether such a connection exists in two common, extra-cranial, solid paediatric tumours, Wilms Tumor and Neuroblastoma. Cancer cells that are recognized by the immune system will typically be killed by immune cells; this creates a pressure on the cancer cells to evolve so that they are not recognized by the patient’s immune cells. One such way is removing (or inactivating) the genes responsible for presenting proteins to the immune system. Within the scope of this project, we will also look at how often and through which mechanisms this happens in Wilms Tumor and Neuroblastoma.
Increased knowledge of the interplay between tumour cells and the immune system will hopefully lead to better therapies for these diseases. This project aims to provide a first estimate of the level of neoantigens as well as of tumour-infiltrating immune cells in Wilms Tumor (WT) and Neuroblastoma (NB) cases. In brief, this project will try to answer whether WTs and NBs can be considered immunologically “hot” tumours, i.e. tumours with an ongoing immune response at the time of sampling, or immunologically “cold” tumours, i.e. tumours that are devoid of infiltrating immune cells and have low levels of immunogenic neoantigens. More specifically, somatic point mutations and small insertions and deletions, called from whole exome sequencing data, will be analysed together with the patients’ constitutional HLA genotype (inferred from whole exome sequencing data from paired normal tissue) using state-of-the-art bioinformatics in order to determine whether specific mutations seen in tumour cases are predicted to elicit a response from the immune system. Estimates of tumour purity and allele-specific copy number changes will also be derived using modern bioinformatics techniques; this will enable us to map the interplay between predicted neoantigens and tumour heterogeneity. These data will be correlated with RNA-Seq-based estimates of tumour-infiltrating immune cells, obtained using standard bioinformatics methods. To further unravel the potential interplay between tumour cells and the host immune system, we will also analyse whether there are somatic mutations or copy number changes targeting the HLA locus, or other genes involved in antigen presentation, as has been described in various forms of adult cancer. If present, such genetic aberrations indicate a selective pressure from the immune system on the tumour cells.
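Purity and allele-specific copy number are commonly combined with a mutation's variant allele fraction (VAF) to estimate its cancer cell fraction (CCF), which is one way to place predicted neoantigens on the clonal or subclonal branches of a tumour. The sketch below uses the widely used relation CCF = VAF × (p·CN_t + 2(1 − p)) / (p·m), where p is purity, CN_t is the local tumour copy number and m is the number of mutated copies; it assumes a diploid normal genome and is an illustration, not the applicants' pipeline. The clonality cutoff and the example inputs are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a mutation.

    For a mutation present in a fraction `ccf` of cancer cells, the expected
    VAF is purity * multiplicity * ccf / (purity * tumour_cn + (1 - purity) * normal_cn);
    this function inverts that relation.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the sequenced sample.
    total_copies = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


def is_clonal(ccf, threshold=0.8):
    """Hypothetical cutoff: treat mutations with CCF >= threshold as clonal."""
    return ccf >= threshold


if __name__ == "__main__":
    # Hypothetical tumour: 60% purity, diploid locus, one mutated copy.
    for vaf in (0.30, 0.15):
        ccf = cancer_cell_fraction(vaf, purity=0.6, tumour_cn=2)
        print(vaf, round(ccf, 2), is_clonal(ccf))
```

In practice, CCF estimates are noisy at low read depth, so tools usually report a distribution rather than a point estimate; the point estimate here only shows how purity and copy number enter the calculation.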
These genomic data will then be combined with clinical data to understand whether the interplay between tumour cells and the host immune system informs patient outcome. The data requested through this application will be analysed within the same study as multiregional genomics data from a smaller cohort of WT and NB cases (Karlsson et al, Nat Genet, PMID 29867221). However, the requested datasets and the dataset above will be analysed separately and will only be combined in the last step (i.e. publication). We expect that this project will generate important knowledge regarding the interplay between tumour cells and the host immune system in two of the major types of solid paediatric cancers outside the central nervous system. Further understanding of this interplay is expected to be important when designing clinical trials of new immunotherapeutic approaches in both Wilms Tumor and Neuroblastoma. Godbout, Roseline UNIVERSITY OF ALBERTA Effect of Elevated Levels of DDX1 on the Transcriptome of Neuroblastomas Jul02, 2018 closed In our lab, we have found that the gene encoding a protein called DEAD box 1 (DDX1) is amplified, and the protein present at high levels, in a subset of childhood tumours, including retinoblastoma and neuroblastoma. Our results indicate that levels of DDX1 may determine how cancer cells respond to therapy. The role of DDX1 in the cell is to modify the products of genes, called RNAs. We are proposing to use the TARGET sequencing database to identify differences in the genes expressed in neuroblastomas with low levels of DDX1 compared to those with high levels of DDX1. Our goal is to gain insight into the role of DDX1 in altering the expression of genes associated with resistance to conventional cancer treatment. DEAD box 1 (DDX1) is an RNA helicase that unwinds both RNA/RNA and RNA/DNA duplexes. DDX1 is co-amplified with MYCN in a subset of pediatric cancers such as neuroblastoma and retinoblastoma.
While predominantly localized in the nucleus, DDX1 is also recruited to stress granules in the cytoplasm as well as to previously uncharacterized large cytoplasmic aggregates in early-stage mouse embryos. Based on data generated using RNA immunoprecipitation followed by sequencing, we found that DDX1 preferentially binds a subset of mRNAs. Interestingly, comparison of RNAs in neuroblastomas that express either high or normal (low) levels of DDX1 demonstrated increases in ribosomal protein mRNAs. As DDX1 depletion results in changes in alternative splicing and intron retention, we would like to gain further insight into potential stress-related transcript alterations in DDX1-high versus DDX1-low neuroblastoma. First, we will use preprocessed open-access data to separate patients with high levels of DDX1 from patients with normal levels of DDX1. With these data, we will identify .bam files for our preliminary study. We will then download these selected .bam files and use the MISO and Cuffdiff programs to examine individual transcript levels. We will validate putative targets using experimental approaches, including qPCR and western blot analysis, on neuroblastoma cell lines with high and normal levels of DDX1. Goodarzi, Hani UNIVERSITY OF CALIFORNIA, SAN FRANCISCO The role of RBM15-MKL1 fusion in controlling the alternative polyadenylation in pediatric acute megakaryoblastic leukemia Aug19, 2021 closed We anticipate that these data will advance our understanding of the molecular dysregulation in the context of the RBM15-MKL1 fusion. Interestingly, several studies have reported that small-molecule drugs could be used to modulate transcription termination and, by extension, to alter the equilibrium of alternative polyadenylation site usage.
It is thus plausible that clarifying the molecular role of the RBM15-MKL1 fusion in alternative polyadenylation could open therapeutic avenues for counteracting the t(1;22) translocation, as no specific treatment is currently available for pediatric AMKL. RNA-related mechanisms are at the core of the regulation of gene expression, and the diversity of molecular processes implicated in the RNA life cycle contributes to the complexity of these regulatory phenomena. For example, current data support a model in which mRNA transcription interacts with a variety of chromatin states, the splicing machinery, and other co-transcriptional RNA processing events, including 5’ mRNA capping and 3’ polyadenylation. Beyond this, an intricate network of post-transcriptional regulation controls the localization, stability and translation of the mRNA. Post-transcriptional RNA modifications have recently come into play and participate in every stage of the RNA life cycle. N6-methyladenosine (m6A) is the most abundant and by far the most studied RNA modification to date, implicated in mRNA splicing, translation and decay. RBM15 has recently been identified as a member of the m6A “writer” complex, aiding the METTL3 RNA methyltransferase in depositing the m6A modification on RNA. Several studies have revealed that RBM15, along with its paralog RBM15B, is necessary for X chromosome inactivation in human cells. Specifically, RBM15 is required for the m6A modification of XIST RNA, which plays an essential role in X inactivation. However, data on other molecular functions of RBM15, especially involving mRNAs, remain limited. RBM15 was first identified over twenty years ago as part of the chromosomal translocation t(1;22)(p13;q13), specifically associated with a subgroup of pediatric acute myeloid leukemia (AML): non-Down syndrome AML subgroup M7, or acute megakaryoblastic leukemia (AMKL).
In this translocation, RBM15 is fused with MKL1, a transcription factor, and most efforts to establish the functional implication of the RBM15-MKL1 (also known as OTT-MAL) fusion in the onset of AMKL have focused on MKL1. The role of RBM15 in the RBM15-MKL1 fusion context has not so far been addressed, mostly owing to the lack of data on the molecular function of RBM15. Our current data point to a role of RBM15 in controlling alternative polyadenylation. To address this molecular phenotype, we can use standard RNA-seq data to model the usage of annotated polyadenylation sites based on the read count density upstream and downstream of a given site. We have recently developed a statistical algorithm, termed APAlog, aimed at comparing the usage of alternative polyadenylation sites in different conditions (Navickas et al., in preparation). Importantly, alternative polyadenylation information is lost when performing general comparisons of gene expression. This is why we need to start our analyses with the raw RNA-seq data, to incorporate alternative polyadenylation modeling in the analytical process. We propose to reanalyze the data from the following clinical samples: TARGET-20-PAVWNG TARGET-20-PAXBLJ TARGET-20-PAVCJW TARGET-20-PAXHPH TARGET-20-PAXJPZ TARGET-20-PAUUTY TARGET-20-PAVHWZ TARGET-20-PAWDWA TARGET-20-PAWRUX TARGET-21-PATKBK TARGET-20-PARFIW TARGET-20-PASAUT TARGET-20-PATMDJ TARGET-20-PASZJC We will perform pairwise comparisons of alternative polyadenylation between the tumor cells and normal PBMCs for each sample using APAlog. We will use an equivalent number of non-RBM15-MKL1 fusion samples from the same pediatric AML cohort as controls, to determine whether the observed signal is specific to the RBM15-MKL1 fusion. Gopalakrishnapillai, Anilkumar ALFRED I.
DU PONT HOSP FOR CHILDREN NOVEL TARGETED THERAPIES FOR DOWN SYNDROME MYELOID LEUKEMIA May02, 2024 approved Down syndrome (DS) is recognized as one of the most important leukemia-predisposing syndromes. Specifically, 1-2% of children with DS develop myeloid leukemia (DS-ML) before age 5, preceded by a pre-leukemic phase termed transient abnormal myelopoiesis (TAM). Aside from low-dose chemotherapy, there are no treatment options for TAM and no preventive measures to stall DS-ML onset. Nearly 10-15% of children with DS-ML are either refractory to treatment or suffer early relapse. These children with refractory disease do not benefit from dose intensification or bone marrow transplant and therefore face a dismal outcome, with 3-year event-free survival of less than 21%. Moreover, treatment-related toxicity and morbidity are a major cause of death in DS-ML patients. Therefore, novel therapeutic options are needed for this rare disease. Using CRISPR/Cas9-mediated gene targeting for stepwise introduction of disease-specific mutations in induced pluripotent stem cells from individuals with Down syndrome, we modelled DS-ML. Using these models, we have identified novel therapeutic targets for DS-ML. We will use TARGET data to validate these protein targets to prioritize their preclinical evaluation. Down syndrome (DS), with triplication of chromosome 21, is recognized as one of the most important leukemia-predisposing syndromes. Specifically, 1-2% of children with DS develop myeloid leukemia (DS-ML) before age 5, preceded by a pre-leukemic phase termed transient abnormal myelopoiesis (TAM). TAM is more prevalent than DS-ML, with approximately 30% of TAM cases progressing to DS-ML. Aside from low-dose chemotherapy, there are no treatment options for TAM and no preventive measures to stall DS-ML onset. Nearly 10-15% of children with DS-ML are either refractory to treatment or suffer early relapse.
These children with refractory disease do not benefit from dose intensification or bone marrow transplant and therefore face a dismal outcome, with 3-year event-free survival of less than 21%. Moreover, treatment-related toxicity and morbidity are a major cause of death in DS-ML patients. Therefore, novel therapeutic options are needed for this rare disease. Both TAM and DS-ML are characterized by a pathognomonic mutation in the gene encoding the essential hematopoietic transcription factor GATA1. This mutation results in the production of an N-terminally truncated mutant GATA1 protein (GATA1s). Trisomy 21 and GATA1s are sufficient to induce TAM, while additional cooperating mutations in genes such as STAG2 are required for DS-ML leukemogenesis. Using clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated gene targeting for stepwise introduction of GATA1 mutations, or GATA1 and STAG2 mutations, in induced pluripotent stem cells (iPSCs) with trisomy 21, we modelled TAM and DS-ML, respectively. Using these models, we have identified novel therapeutic targets for DS-ML. We will use TARGET data to validate these protein targets to prioritize their preclinical evaluation. GRAEBER, THOMAS UNIVERSITY OF CALIFORNIA LOS ANGELES Molecular determinants of pediatric cancer aggressiveness Sep11, 2017 expired Cancers with small cell carcinoma or neuroendocrine features, such as neuroblastomas, are highly malignant and arise across multiple tissue types, in both pediatric and adult cancers. Although small cell cancers and neuroblastomas are initially sensitive to radiochemotherapy, the relapse rate is high and prognosis is poor. We aim to better understand the tumorigenesis underlying small cell cancers to inform treatment approaches. We will leverage both pediatric and adult cancer types, and in particular will make comparisons across different tumor types (including non-small cell cancers) to identify the core regulatory programs driving small cell cancers.
Working with pediatric oncologists, we will test candidate pediatric cancer therapeutic approaches in model systems. It is our intent to broadly share our results with the scientific community by publishing our findings. Objectives: The proposed study will classify pediatric neuroendocrine cancers using molecular features such as copy number variation (CNV), gene expression, and DNA methylation signatures, with the ultimate goal of guiding pediatric cancer treatment. Our main focus is on cancers that are closely related to small cell or neuroendocrine cancers (this includes the pediatric cancers neuroblastoma, osteosarcoma, and Wilms’ Tumor). A notable percentage of pediatric cancers are related to small cell or neuroendocrine cancers. Study design: Since neuroendocrine cancers occur in both pediatric and adult settings, we will leverage both types of cancer to learn more about the pediatric case. Pan-cancer datasets of small cell and non-small cell carcinomas are collected across tissue types and across various platforms. Datasets include measurements of gene expression, copy number alterations, and mutational signatures. To guide our pediatric cancer interpretation and translation, we collaborate with Brigitte Gomperts, a pediatric oncologist in the UCLA Department of Pediatrics (her lab will not directly analyze any of the data). Analysis Plan: Principal component analysis of copy number variation and differential gene expression, and analysis of mutational signatures of cancer types across tissues and across platforms, allow identification of gene networks that may be drivers of genomic instability, metastatic potential and other cancer phenotypes. Methods such as nonnegative matrix factorization and support vector machines are used to build phenotype predictors based on cancer molecular features.
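Nonnegative matrix factorization, one of the methods named above, approximates a nonnegative genes × samples matrix V as W·H, where W holds k "metagenes" and H holds each sample's weight on them; the columns of H can then serve as compact features for a classifier such as an SVM. The sketch below implements the standard Lee and Seung multiplicative updates for the Frobenius objective in pure Python. The rank, iteration count, seeded random initialization and toy matrix are illustrative assumptions, and this is not presented as the applicants' exact procedure; a real analysis would use an optimized library.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (n x m, nonnegative) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), using the freshly updated H.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    # Toy expression matrix: 4 genes x 4 samples with two obvious programs.
    V = [[5.0, 4.0, 0.5, 0.2], [4.5, 5.0, 0.3, 0.1],
         [0.2, 0.4, 6.0, 5.5], [0.1, 0.3, 5.0, 6.0]]
    W, H = nmf(V, rank=2)
    print([[round(x, 2) for x in row] for row in H])
```

Multiplicative updates keep every entry nonnegative as long as the initialization is positive, which is why they are a common default for this objective, though they can converge slowly and only to a local optimum.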
Phenotypic characteristics provided along with the tumor data will allow an evaluation of the accuracy of molecular features used for classification. Features predictive of cancer phenotypes are extracted for biological interpretation. Findings will be interpreted in a biological context, and candidate therapeutic approaches will be pre-clinically tested in neuroendocrine cancer model systems. We will adhere to all indicated data use limitations. No collaborators, internal or external, will work directly with the data. It is our intent to broadly share our results with the scientific community by publishing our findings. GRAEBER, THOMAS UNIVERSITY OF CALIFORNIA LOS ANGELES Molecular determinants of cancer aggressiveness Dec19, 2016 expired “Small round blue cell tumors of childhood” (SRBCTs) and sarcomas are subtypes of pediatric cancer that span multiple tissues of origin (neuroblastoma, Ewing’s sarcoma, retinoblastoma, hepatoblastoma, Wilms’ Tumors, lymphoma, osteosarcoma, Clear Cell Sarcoma of the Kidney (CCSK)) but share highly malignant cancer features. Small cell carcinomas (or neuroendocrine cancers) also occur in adults. Although SRBCTs and sarcomas are initially sensitive to radiochemotherapy, the relapse rate is high and prognosis is poor. We aim to better understand the tumorigenesis underlying small cell cancers to inform treatment approaches. We will leverage both pediatric and adult cancer types, and in particular will make comparisons across different tumor types (including non-small cell cancers) to identify the core regulatory programs driving small cell cancers. Working with pediatric oncologists, we will test candidate pediatric cancer therapeutic approaches in model systems.
Objectives: The proposed study will classify “small round blue cell tumors of childhood” (SRBCTs) and pediatric sarcomas using molecular features such as copy number variation (CNV), gene expression, and DNA methylation signatures, with the ultimate goal of guiding pediatric cancer treatment. SRBCTs and sarcomas are subtypes of undifferentiated pediatric cancer that span multiple tissues of origin (neuroblastoma, Ewing’s sarcoma, retinoblastoma, hepatoblastoma, Wilms tumor, lymphoma, osteosarcoma, Clear Cell Sarcoma of the Kidney (CCSK)) but share highly malignant features: they initially respond to treatment but have a high rate of relapse. Small cell carcinomas (or neuroendocrine cancers) and sarcomas also occur in adults. Study design: Since neuroendocrine cancers and sarcomas occur in both pediatric and adult settings, we will leverage both to learn more about the pediatric case. Pan-cancer datasets of small cell and non-small cell carcinomas are collected across tissue types and across various platforms. Datasets include measurements of gene expression, copy number alterations, and mutational signatures. To guide our pediatric cancer interpretation and translation, we collaborate with Brigitte Gomperts, a pediatric oncologist in the UCLA Department of Pediatrics (her lab would not directly analyze any of the data). We will incorporate data from published prostate and lung cancer studies (Beltran et al., 2011; Beltran et al., 2016; George et al., 2016; Takeuchi et al., 2006; CLCGP et al., 2013), but this will not create any additional risks to participants. Analysis Plan: Principal component analysis of copy number variation and differential gene expression, together with analysis of the mutational signatures of cancer types across tissues, allows identification of gene networks that may be drivers of genomic instability, metastatic potential, and other cancer phenotypes.
Phenotypic characteristics provided along with the tumor data will allow an evaluation of the accuracy of the molecular features used for classification. Findings will be interpreted in a biological context, and candidate therapeutic approaches will be pre-clinically tested in SRBCT and sarcoma model systems. We will adhere to all indicated data use limitations. Grau, Michael UNIVERSITY OF MUENSTER Lipid metabolism, transcriptomic subtype signatures, and clinical outcomes in AML Apr18, 2024 approved Acute myeloid leukemia (AML) remains a frequently fatal disease. The objective of our research is to investigate metabolism, with a focus on the role of lipid metabolism and potentially associated genomic or transcriptomic signatures, in the development, treatment, and outcome of AML. We aim to contribute to the discovery of novel treatment targets, improve the selection of leukemia therapies, and ultimately enhance treatment outcomes in AML. Our research investigates metabolism, focusing on lipid metabolism and its influence on treatment outcomes in acute myeloid leukemia (AML). Our aim is to reveal genomic and transcriptomic signatures, in particular genetic modulation and differential expression of the various transporters and receptors involved in lipid metabolism, and their association with treatment outcome and prognosis in AML. We aim to use DNA and RNA sequencing data from studies such as BEAT AML and TARGET to identify mutational signatures, transcriptomic signatures, and lipid protein expression profiles in different AML subtypes. Two independent large datasets will be processed for validation purposes, in particular to investigate applicability to pediatric patients (potential age differences in the metabolic or mutational signatures).
Besides known AML subtypes, unsupervised, data-driven analysis will be used to potentially discover links to other lymphoma/leukemia entities and/or novel subtypes that could benefit from adapted therapy protocols. Processing will occur on university-owned hardware, not in the cloud. Our results will potentially aid in treatment selection, prognostication, and enhancement of treatment outcomes in AML. Findings will be made public, while the raw data will be kept secure according to data access policies and will not leave the encryption chain (local storage with BitLocker). Only aggregated data will be published to guard against identification of individual subjects. GROSSMAN, ROBERT UNIVERSITY OF CHICAGO National Cancer Institute’s Cancer Research Data Commons Mar11, 2021 approved The NCI Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure that connects cancer data sets, including the Cancer Data Service, with analysis tools. It provides a foundation for the cancer research community to make new scientific discoveries and lower the burden of cancer. With Data Commons Framework services, the CRDC is designed to serve as a foundation for future expanded cancer data access, with related computational capabilities and bioinformatics cloud research. The National Cancer Institute’s (NCI) Cancer Research Data Commons (CRDC) is a virtual, expandable computing infrastructure that provides secure access to many different cancer-related data types across scientific domains. The goal of the CRDC is to co-locate data storage and computing infrastructure in the cloud with tools for analyzing and sharing data, creating an interoperable resource for the cancer research community to lower the burden of cancer. NCI’s CRDC leverages a central component, the NCI Data Commons Framework Services (DCF), to authorize access to data repositories.
The Cancer Data Service (CDS) within the NCI’s CRDC is a resource for sharing NCI-funded data that are not currently hosted by other data repositories. Together with the DCF, the CDS provides authorized researchers access to cancer datasets including The Cancer Genome Atlas (TCGA), Center for Cancer Genomics (CCG), Division of Cancer Treatment and Diagnosis (DCTD), Pediatric Preclinical Testing Consortium (PPTC), Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), and the Development of a Tumor Molecular Analysis Program (LCCC 1108). Note that all external users of the CRDC must independently apply for dbGaP access and receive approval before getting access to the data in the CRDC’s CDS, as governed by the DCF. To download or access by API any controlled-access data, a user of the CRDC must first use their eRA Commons credentials to log into the component of the CRDC system that contains controlled-access data. Once a day, the DCF retrieves a list of users who are authorized to access dbGaP data. The DCF uses this list to check that the user's Data Access Request (DAR) has been approved by dbGaP and that the user has agreed to the terms of the Data Use Certification (DUC) Agreement and the dbGaP Approved User Code of Conduct. Wherever appropriate, users of protected datasets have agreed to the Data Use Limitation. This may include, but is not limited to, restricting use of protected datasets to research projects that can only be conducted using the specified data and that have likely relevance to developing more effective treatments, diagnostic tests, or prognostic markers for cancers. Additionally, when appropriate, users have acknowledged that research projects proposing methods, software, or other tool development are not considered to be acceptable uses of the data.
GROSSMAN, ROBERT UNIVERSITY OF CHICAGO Genomic Data Commons Jul16, 2014 approved The Genomic Data Commons (GDC) is a data service providing authorized researchers access to cancer genomics data in a uniform way and from a single data repository. It is also designed to serve as a foundation for future expanded data access, computational capabilities, and bioinformatics cloud research. The Genomic Data Commons (GDC) is a data service providing authorized researchers access to The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET), Cancer Genome Characterization Initiative (CGCI), and other datasets authorized by the Center for Cancer Genomics at the National Cancer Institute (NCI). The goal of the GDC is to provide cancer genomics data in a uniform and co-localized database and to serve as a foundation for future expanded data access, computational capabilities, and bioinformatics cloud research. Note that all external users of the GDC must independently apply for dbGaP access and receive approval before getting access to the data through the GDC. To download or access by API any controlled-access data, a user of the GDC must first use their eRA Commons credentials to log into the component of the GDC system that contains controlled-access data. Once a day, the GDC retrieves a list of users who are authorized to access dbGaP data. The GDC uses this list to check that users have the appropriate authorizations to access these data. Specifically, it checks that the user has submitted a Data Access Request (DAR) to dbGaP or is an approved user under a DAR; that the user has reviewed the project-specific Data Use Limitations (DUL), if any; that the DAR has been approved by dbGaP; and that the user has agreed to the terms in the Data Use Certification (DUC) Agreement and in the dbGaP Approved User Code of Conduct.
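The daily authorization check described above can be illustrated with a minimal sketch. It assumes the daily dbGaP list has already been parsed into one record of yes/no flags per user; the `AccessRecord` type, its field names, and the `authorized_users` helper are hypothetical and do not represent the GDC's actual implementation or any dbGaP interface.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class AccessRecord:
    """Hypothetical per-user flags derived from the daily dbGaP authorization list."""
    on_dar: bool            # submitted a DAR or is an approved user under a DAR
    reviewed_dul: bool      # reviewed the project-specific Data Use Limitations
    dar_approved: bool      # the DAR has been approved by dbGaP
    accepted_duc_coc: bool  # agreed to the DUC Agreement and the Code of Conduct


def is_authorized(record: AccessRecord) -> bool:
    # Every condition must hold; a single failing condition denies access.
    return (
        record.on_dar
        and record.reviewed_dul
        and record.dar_approved
        and record.accepted_duc_coc
    )


def authorized_users(daily_list: Dict[str, AccessRecord]) -> Set[str]:
    """Rebuild, from scratch, the set of users allowed to reach controlled-access data."""
    return {user for user, record in daily_list.items() if is_authorized(record)}


if __name__ == "__main__":
    snapshot = {
        "user_a": AccessRecord(True, True, True, True),
        "user_b": AccessRecord(True, True, False, True),  # DAR not yet approved
    }
    print(sorted(authorized_users(snapshot)))  # ['user_a']
```

Rebuilding the authorized set from each day's list, rather than updating it incrementally, means that a user whose approval lapses simply drops out at the next refresh.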
In particular, users of protected TARGET datasets have agreed to the Data Use Limitation that TARGET datasets should be used for research projects that can only be conducted using pediatric data and that have likely relevance to developing more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Further, users have acknowledged that research projects proposing methods, software, or other tool development are not considered to be acceptable uses of the data. The GDC Project has been approved as an NIH Trusted Partner and follows core NIH standards for establishing data quality and data management service protocols as required by this program. Gruber, Tanja ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Comparison of Pediatric AML and Pediatric Mixed Phenotype Acute Leukemia Jan23, 2019 closed Acute leukemia is a cancer of the blood. There are multiple subtypes of leukemia in pediatrics, and we will compare two distinct subtypes that share a common feature: proteins on the surface of the cells that are referred to as myeloid markers. We want to study the similarities and differences between these subtypes to gain a greater understanding of the biology of these diseases. We have a cohort of pediatric AML that has undergone whole genome and RNA sequencing. We would like to compare the mutational spectrum and gene expression signatures of pediatric mixed phenotype acute leukemia with those of pediatric acute myeloid leukemia. We are requesting access to whole genome, whole exome, and RNA sequencing files from SRP011999 (publication 10.1038/s41586-018-0436-0), which is listed under dbGaP accession phs000464 and the parent dbGaP accession phs000218. Gruber, Tanja ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Analysis of Pediatric Acute Myeloid Leukemia Jul13, 2018 closed Cancer cells arise from normal cells that have acquired changes in DNA, also called "mutations," that cause the cells to become malignant.
We will determine the mutations that are present in children with a form of blood cancer called "AML" by analyzing their DNA. We have completed an integrated genomic analysis of ~200 pediatric AML patients using whole genome, whole exome, and RNA sequencing to identify recurrent driver and cooperating mutations in coding and non-coding regions of the genome. These data are as yet unpublished. We request all available next-generation sequencing data for pediatric AML cases sequenced as part of the TARGET project. These data will be used to supplement our cohort with additional cases and to validate our findings. Gu, Shuo NIH Analysis of miRNA dysregulations in Cancer Dec18, 2017 approved MicroRNA (miRNA), a small non-coding RNA, plays an essential role in gene regulatory networks and cancer. We are interested in understanding how miRNA dysregulation contributes to childhood cancer development by analyzing the TCGA and TARGET data. We recently found that changes in cleavage fidelity in the miRNA biogenesis pathway result in the production of various miRNA isoforms and changes in their biological functions. Reanalyzing miRNA-Seq data in TCGA and TARGET will allow us to detect the impact that mutations in different elements of miRNA biogenesis have on miRNA production and to establish their correlation with childhood cancer development. Our study will provide insights into the mechanism of miRNA dysregulation in cancer, identifying novel cancer biomarkers and potential targets for therapeutic treatments. The purpose of our research is to identify cancer-associated miRNA dysregulation, in particular within miRNA biogenesis and post-maturation modifications. A major part of the regulation of miRNA function happens at the first step of its biogenesis. A number of protein factors modulating the activity and specificity of Drosha-mediated pri-miRNA cleavage have already been described as tumor suppressors or oncogenes in tumors occurring in adults.
However, recent studies of the TARGET patient cohort (Gadd et al., Nat. Genetics 2017) confirmed a high prevalence of mutations directly in miRNA biogenesis genes such as DROSHA, DGCR8, XPO5, and DICER1. Mutations involving miRNA-processing genes in Wilms tumor have been shown to result in significantly decreased miRNA levels (Wegert et al., Cancer Cell 2015). We hypothesize that these mutations in critical elements of miRNA biogenesis and other complementary factors can contribute to miRNA malfunction in the development of childhood cancers. We propose to validate this hypothesis by analyzing the sequencing results from TCGA and TARGET. Specifically, we will measure the quantitative changes in miRNA biogenesis by reanalyzing the small RNA-Seq data. By comparing patients with or without mutations in specific genes, we will be able to establish their potential association with cancer progression. In particular, we will analyze changes in miRNA expression as well as the generation of 5’ and 3’ isomiRs. Recent evidence suggests that increased isomiR generation is one of the key features distinguishing cancer from primary cells (MacRai et al. Genome Res. 2017). To this end, we have recently developed QuagmiR, a new tool deployed on the Cancer Genomics Cloud (CGC) to efficiently analyze miRNA isoforms. If successful, this work will provide novel biomarkers for pediatric cancer diagnostics and potential targets for innovative treatments. Gu, Zhaohui BECKMAN RESEARCH INSTITUTE/CITY OF HOPE Genomic Features of Acute Leukemia Subtypes Oct15, 2021 approved Acute leukemia is the most common cancer in children and a leading cause of childhood cancer death. Over the past few years, multiple novel subtypes have been identified using high-throughput genomic sequencing, greatly advancing our understanding of this disease.
In this project, we seek to better understand the features of each acute leukemia subtype to facilitate future diagnosis, risk stratification, and targeted treatment, with the ultimate goal of improving outcomes. Acute leukemia is the most common type of cancer in children and is still a leading cause of childhood cancer death. Over the years, multiple novel acute leukemia subtypes have been identified, greatly advancing our understanding and treatment of this highly malignant disease. In this research, we focus on dissecting the characteristic genomic lesions and deregulated gene pathways in each subtype of acute leukemia. We are using a comprehensive multi-omic approach to define the spectrum of genomic lesions of each acute leukemia subtype from whole genome/exome sequencing (for SNVs, indels, SVs, and CNVs) and RNA-seq (for gene expression profiles and fusion genes) in both diagnosis and relapse samples, with matched germline samples to distinguish somatic from germline variants. This study will be carried out within our institution, and we are requesting all available next-generation sequencing data for pediatric acute leukemia (acute myeloid leukemia, acute lymphoblastic leukemia, and mixed phenotype acute leukemia) cases sequenced as part of the TARGET project. These data will be combined with our datasets to increase the power to define the genetic features of each leukemia subtype and to validate the findings from our own cohort. To comply with the policies of the Human Genomics Database, the downloaded controlled-access data will only be stored on the City of Hope HPC server under my research folder with strict access control, and only the personnel listed in the application as research collaborators will be granted password-protected access. Only the analyzed results and discoveries made through this dataset will be shared through publications and/or web portals.
The raw data will only be used for the proposed academic research, and no commercial application is involved. Response to the comments: Please make it clear whether you plan to combine requested datasets with other datasets outside of dbGaP, and, if so, whether you plan to analyze these datasets independently or together. If you do plan to combine datasets in any way, please describe your plan, and also please discuss whether it creates any additional risks to participants. If you are focusing on outcomes or hypotheses that were not the focus of the primary study (or studies), please describe the outcomes you propose to examine. [response]: The requested datasets will be combined with datasets obtained from other public sources or generated by our own group. The datasets will be analyzed independently, but the results will be integrated to study the genetic features of pediatric acute leukemias. The analyses do NOT include inferring genetic or personal identity from genotype information; therefore, no additional risks to the participants are expected. The outcomes of the analysis are the delineated genetic lesions and molecular features of each acute leukemia subtype. GUDA, CHITTIBABU UNIVERSITY OF NEBRASKA MEDICAL CENTER Molecular subtyping of osteosarcoma Oct31, 2019 closed Integrating multiple data types is crucial to identify biologically relevant molecular subtypes and to achieve a comprehensive evaluation of the clinical relevance of molecular signatures in osteosarcoma in order to identify new and effective therapies. A second goal is to discover a set of genetic markers that predict risk for osteosarcoma so that early screening becomes feasible. Further, we will use other publicly available data to validate the findings. Our lab has experience analyzing such complex multi-omics data. We have previously applied such techniques successfully in studies of other cancers, such as pancreatic ductal adenocarcinoma and cholangiocarcinoma.
We will evaluate the clinical significance of the multiple molecular signatures that we generate against the profiles collected in the dbGaP Pediatric Osteosarcoma and Ewing sarcoma data set. Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and treatment. Efforts to distinguish subtypes are complicated by the many kinds of genomic and epigenomic changes that contribute to cancer. Assessing the potential clinical relevance of molecular signatures in pediatric malignancies requires the integration of multiple omics data types. The goal of this study is to identify molecular subtypes, biomarkers, and genes associated with osteosarcoma using advanced bioinformatics and computational methods. This study aims to use integrative analysis of genetic mutations, copy number alterations (CNA), gene expression signatures, pathway signatures, and epigenetic changes to identify molecular subtypes of osteosarcoma, with the ultimate goal of guiding pediatric cancer treatment. Further, we will use machine-learning methods to predict risk scores for osteosarcoma. We will evaluate the clinical relevance of the multi-omics signatures that we generate against the multi-omics profiles collected in the TARGET data set, including associations with survival and clinical data. Some important research questions we plan to address: 1) molecular subtypes of osteosarcoma; 2) the landscape of genomic alterations in osteosarcoma, including mutation, amplification, and deletion of genes, and their associated pathways; 3) the co-occurrence of CNA with other genomic and epigenomic features, such as mutations, insertions, deletions, fusion genes, and promoter DNA methylation; 4) the associations between molecular alterations and patient overall/disease-free survival.
In addition, the research is also expected to discover driver genes, networks, and pathways related to the development and progression of osteosarcoma. As osteosarcoma is very similar to Ewing sarcoma, we will look at the molecular similarity between these two pediatric bone cancers. Guertin, Michael UNIVERSITY OF VIRGINIA Molecular signatures and transcription factors dictating childhood leukemia Jul13, 2015 closed Acute lymphoblastic leukemia (ALL) is the most common pediatric malignancy and comprises 25% of childhood cancers. Although approximately 98% of children enter complete remission within 6 weeks of standard chemotherapy regimens, one in four children will relapse. Despite a high survival rate for ALL patients, ALL remains the second-leading cause of pediatric cancer deaths in the United States. This proposal promises to identify genes and genetic elements that control regulatory networks by systematically analyzing molecular anomalies in ALL biospecimens. Data from the Therapeutically Applicable Research To Generate Effective Treatments childhood cancer consortium will permit the unbiased identification of genes, transcription factors, and DNA elements that influence drug sensitivity and relapse. This work will identify molecular biomarkers that predict treatment efficacy and will reveal candidate biological pathways that can be targeted with combinatorial drug therapies. Our long-term objective is to improve the success of initial therapy regimens for children with poor predicted ALL prognoses. We will use these RNA-seq data to examine the plausibility that the candidate transcription factors (TFs) dictate ALL cancer risk, progression, relapse, and treatment success. Using our methodology, we have found between 50 and 100 TFs that control the chromatin landscape of ALL cells, but these TFs do not necessarily confer ALL risk or influence patient survival. We will thus determine whether these genes are differentially expressed in ALL vs. normal vs.
relapsed cells using Therapeutically Applicable Research To Generate Effective Treatments (TARGET) childhood cancer consortium mRNA expression data. We will perform Kaplan-Meier survival analyses using TARGET data and previously established methods for Kaplan-Meier analysis of RNA-seq expression data. Throughout these analyses, we will be careful to perform comparisons within established ALL subtypes; we will also, when possible, compare ALL tissue to tissue from normal controls or successfully treated ALL patients. These more sophisticated classifications will allow us to discriminate between genes that may influence incidence and treatment success in molecular subtypes of ALL. A long-term goal of these analyses is to further classify ALL subtypes based on gene expression profiles. These data can then be integrated with epidemiological data to identify molecular signatures that associate with treatment success, with the intention of repurposing drugs that are more successful in treating specific subtypes of ALL. Guharaj, Tamilselvi DANA-FARBER CANCER INST Identification of key immune repertoires for Acute Myeloid Leukemia (AML) Oct31, 2019 closed In our proposed research, we aim to identify the key immune repertoires for acute myeloid leukemia (AML) using the TARGET database. Acute myeloid leukemia (AML) is a cancer that originates in the bone marrow from immature white blood cells known as myeloblasts. About 25% of all children with leukemia have AML. Despite the remarkable progress that has been made in some leukemias such as CML, cytotoxic treatment for AML has remained basically unchanged over the last 4 decades. Given the slow progress of traditional therapy development for this disease, many novel immunotherapies have been explored. A better understanding of the immune system of AML patients could lead to promising biomarkers and be extremely helpful for new therapy development.
In the past, our department developed a computational method to identify T/B cell receptor repertoires and estimate immune infiltration abundance from molecular data. Using the AML data from TARGET, we will apply our methods to study T/B cell receptor repertoires. We will compare repertoires between peripheral blood and bone marrow to study how immune repertoires are derived and to identify key biomarkers. We will also investigate the relationship between key immune repertoires and patients' clinical outcomes. It would be very interesting and helpful to study AML from the perspective of immune repertoires. We would like to request the raw and processed RNA-Seq, WGS, WXS, miRNA-Seq, Targeted Capture, Bisulfite-Seq, and ChIP-Seq data. Our team members will comply with all applicable data use rules and policies put forth by dbGaP. No external collaborators will take part in our proposed research. GUIDOS, CYNTHIA HOSPITAL FOR SICK CHLDRN (TORONTO) ALL specific gene fusion events Jan03, 2018 approved Survival rates for pediatric acute lymphoblastic leukemia (ALL) patients have greatly improved due to central nervous system (CNS) prophylaxis and multi-agent chemotherapy. However, 30-40% of ALL relapses still involve the CNS. Moreover, improved survival rates of pediatric patients have come at the expense of treatment-associated endocrine disorders, secondary brain tumors, and irreversible neurocognitive late effects. Our research will use the TARGET data in an exploratory analysis to identify gene fusions and gene sets with suggestive association with ALL CNS relapse. Our multi-pronged approach is exclusively focused on pediatric ALL data and will address key knowledge gaps that continue to impede improved outcomes in pediatric ALL: 1) the molecular mechanisms by which ALL cells invade the CNS; 2) identification of biomarkers predictive of CNS leukemia; 3) biological evidence of targeted therapies that prevent and treat CNS leukemia.
Overview: Survival rates for pediatric acute lymphoblastic leukemia (ALL) patients have greatly improved due to CNS prophylaxis and risk-adapted multi-agent chemotherapy. Prior to the introduction of CNS prophylaxis, ~80% of children relapsed in the CNS; however, even with contemporary ALL treatment protocols, 30-40% of relapses involve the CNS. Improved survival rates of pediatric patients have come at the expense of treatment-associated endocrine disorders, secondary brain tumors, and irreversible neurocognitive late effects. Our research addresses key knowledge gaps that continue to impede improved outcomes in pediatric ALL: 1) the molecular mechanisms by which ALL cells invade the CNS; 2) identification of biomarkers predictive of CNS leukemia; 3) biological evidence of targeted therapies that prevent and treat CNS leukemia. We request access to TARGET data for identification of genetic markers associated with poor-outcome childhood ALL. Our objective is to use transcriptome data to define biomarkers predictive of CNS involvement and relapse risk in pediatric ALL. Foundational data from the Applicants: We identified mechanisms for B-ALL cells in the bone and CNS that underpin the proposed TARGET data analysis: 1) Bone destruction: Binding of receptor activator of nuclear factor κB (RANK) to its ligand RANKL governs bone resorption by osteoclasts. We reported that the RANK-RANKL interaction critically regulates B-ALL-mediated bone destruction (Rajakumar et al, Science Trans Med 2020). Specifically, diagnostic B-ALL cells in patient-derived xenograft (PDX) models up-regulate RANKL expression, causing bone destruction. Treatment with a RANKL antagonist (OPG-Fc) robustly protected the bone despite heavy B-ALL burden. Thus, RANK-RANKL inhibitors disrupt B-ALL exploitation of the bone and may reduce fractures and bone growth impairment in pediatric patients. 2) CNS invasion: While RANK-RANKL regulates B-ALL-mediated bone destruction, its role in CNS invasion was unknown.
C-X-C chemokine receptor 4 (CXCR4) is involved in ALL migration to the CNS. Thus, we examined both mechanisms in PDX models of patient samples associated with a high risk of CNS disease. ALL cells migrated to skull and vertebral bone marrow, where they stimulated osteoclast-mediated excavation of bone passages into the CNS. OPG-Fc treatment blocked bone-mediated CNS entry. An infant ALL sample with MLL/KMT2A rearrangement breached the CNS via the blood-cerebrospinal fluid (CSF) barrier. Co-administration of CXCR4 and RANKL antagonists blocked both CNS entry routes, suggesting potential benefit to pediatric patients (Rajakumar et al, Cell Reports Med, 2021). Our request for access to TARGET data is intended to identify biomarkers that can improve prediction of CNS relapse through analyses of mRNA sequencing data. We are also accessing primary patient samples from local biobanks and samples requested from COG, which will be analyzed by high-parameter, single-cell profiling of cryopreserved diagnostic samples from well-characterized cohorts of ALL patients. Specific Approaches: We will evaluate TARGET Phase II ALL mRNA sequencing data from diagnostic samples with associated longitudinal clinical data from pediatric patients who relapsed within 4 years of diagnosis. We will examine raw (FASTQ) sequencing files and associated anonymized metadata for the proposed studies on a firewall-protected compute cluster at the Hospital for Sick Children Research Institute. mRNA-seq analysis of diagnostic samples will focus on gene fusions and gene expression patterns associated with relapse involving the CNS (CNS-r), compared with cases of bone marrow relapse without CNS relapse (BM-r). We will construct a 1:2 CNS-r:BM-r design by sub-sampling BM-r cases matched to CNS-r cases by sex and age at diagnosis. Gene fusion detection will be performed using the deFuse, FusionCatcher, and STAR-Fusion algorithms.
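The 1:2 CNS-r:BM-r sub-sampling described above could be carried out along the lines of the following sketch. It assumes each patient is a record with hypothetical `id`, `sex`, and `age` (age at diagnosis, in years) fields, and it uses greedy nearest-age matching within sex, without replacement. The application does not state which matching algorithm will be used, so this is one plausible choice rather than the applicants' method.

```python
from typing import Dict, List, Optional


def match_controls(
    cases: List[dict],
    controls: List[dict],
    ratio: int = 2,
    max_age_gap: Optional[float] = None,
) -> Dict[str, List[str]]:
    """Greedy 1:ratio matching of CNS-r cases to BM-r controls (illustrative only).

    A control must have the same sex as the case; among eligible controls,
    those with the closest age at diagnosis are chosen first. Each control
    is used at most once.
    """
    used = set()
    matches: Dict[str, List[str]] = {}
    # Process cases in a fixed order so the result is reproducible.
    for case in sorted(cases, key=lambda r: r["id"]):
        pool = [
            c for c in controls
            if c["sex"] == case["sex"] and c["id"] not in used
        ]
        if max_age_gap is not None:
            # Optionally drop controls whose age differs by more than the caliper.
            pool = [c for c in pool if abs(c["age"] - case["age"]) <= max_age_gap]
        # Closest age first; ties broken by control id for determinism.
        pool.sort(key=lambda c: (abs(c["age"] - case["age"]), c["id"]))
        chosen = pool[:ratio]
        used.update(c["id"] for c in chosen)
        matches[case["id"]] = [c["id"] for c in chosen]
    return matches


if __name__ == "__main__":
    cases = [{"id": "A", "sex": "F", "age": 5.0}, {"id": "B", "sex": "M", "age": 10.0}]
    controls = [
        {"id": "c1", "sex": "F", "age": 4.5},
        {"id": "c2", "sex": "F", "age": 6.0},
        {"id": "c3", "sex": "F", "age": 12.0},
        {"id": "c4", "sex": "M", "age": 9.0},
        {"id": "c5", "sex": "M", "age": 11.0},
        {"id": "c6", "sex": "M", "age": 3.0},
    ]
    print(match_controls(cases, controls))  # {'A': ['c1', 'c2'], 'B': ['c4', 'c5']}
```

Because the matching is greedy, a case processed early can take a control that would have been a closer match for a later case; if that matters, an optimal assignment method could be substituted without changing the inputs.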
The STAR aligner and HTSeq-count will be used to generate gene expression data, the edgeR R package for differential expression analysis, and the hclust function in R for hierarchical clustering by gene expression. Gene set enrichment analysis will be performed with GSEA software using pairwise comparisons between CNS-r cases and BM-r controls. Gene sets will be extracted from the Molecular Signatures Database (MSigDB), and Cytoscape will be used for gene network visualization. Guo, Yiran CHILDREN'S HOSP OF PHILADELPHIA Cross validation of reported pathogenic variants in the Kids First dataset Sep27, 2021 approved Gene panels and whole exome sequencing (WES) have been widely used in the genetic diagnosis of rare disorders and cancer. While most of the published changes are within exons, non-coding variants, especially intronic ones, are significantly less likely to be charted by clinical labs or reviewed by curators. Previous pilot efforts with small sample sizes showed that whole genome sequencing (WGS) can provide more insight into non-coding regions of the human genome, facilitate curation of reported variants, and discover novel genomic changes that could be missed by panel sequencing or WES. Implementing WGS, the NIH Common Fund Gabriella Miller Kids First Pediatric Research Program (Kids First) represents a national collaboration focused on large-scale genomic and clinical data sharing for childhood cancers and structural birth defects. With the current application, we will use Kids First data to perform large-scale validation of the phenotypes associated with reported pathogenic variants in the Kids First cohorts, evaluating phenotypic concordance in specific datasets. Data Use Limitations and Data Use Certification will be followed. Objectives: Gene panels and whole exome sequencing (WES) have been widely used in the genetic diagnosis of rare disorders and cancer.
While most published (likely) pathogenic genetic changes are exonic or affect splicing, non-coding variants published in the literature, especially intronic ones, are significantly less likely to be charted by clinical labs such as ClinVar submitters or reviewed by curators such as the ClinGen Expert Groups. Previous pilot efforts with small sample sizes showed that whole genome sequencing (WGS) can provide more insight into non-coding regions of the human genome, facilitate curation of reported variants, and discover novel genomic changes responsible for the clinical manifestations that could be missed by panel sequencing or WES. Implementing WGS, the NIH Common Fund Gabriella Miller Kids First Pediatric Research Program (KF) represents a national collaboration focused on large-scale genomic and clinical data sharing for childhood cancers and structural birth defects. With the current application, we propose to use KF datasets to assess published pathogenic variants, especially non-coding and splicing ones, and to validate their phenotypes in KF cohorts against the literature (as cataloged in the Human Gene Mutation Database [HGMD]). Study design: We have access to the latest (2022.3) version of HGMD, which contains 246,769 pathogenic (CLASS=DM) and 86,252 likely pathogenic (CLASS=DM?) variants. With access to both WGS and phenotypic information for the 27 KF and other cohorts that we are requesting (total n~33,300), we will check each cohort against the (likely) pathogenic variant list in HGMD. Analysis plan: First, we will make two lists from the downloaded HGMD VCF file: i) only pathogenic variants (DM) and ii) only likely pathogenic variants (DM?). We will then apply a filtering step to each of the KF WGS datasets (KF-vcfs) with the following criteria: retaining only bi-allelic sites, “PASS” in the FILTER field, sequencing depth (DP) greater than 30, and, if a variant is called heterozygous, balanced allele depth (AD) (allele fraction between 25% and 75%, inclusive).
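The list-splitting, filtering, and lookup steps of this analysis plan might look roughly like the following minimal sketch. It assumes VCF records have already been parsed into simple Python objects (VCF parsing is omitted), and that the HGMD and KF files share the same reference build and normalized variant representation, so exact (chrom, pos, ref, alt) matching is meaningful. The `Call` class and all helper names are hypothetical; this is not the Kids First, Cavatica, or Spark-Zeppelin implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One genotype call from a KF WGS VCF, already parsed (hypothetical structure)."""
    chrom: str
    pos: int
    ref: str
    alt: tuple      # ALT alleles; a bi-allelic site has exactly one
    filter: str     # VCF FILTER column
    dp: int         # total read depth (DP)
    genotype: str   # GT, e.g. "0/1" or "1|1"
    ad: tuple       # allele depths (AD), REF first


def passes_qc(call, min_dp=30, low=0.25, high=0.75):
    """Apply the filtering criteria from the analysis plan to one call."""
    if len(call.alt) != 1:                 # bi-allelic sites only
        return False
    if call.filter != "PASS":              # FILTER must be PASS
        return False
    if call.dp <= min_dp:                  # DP strictly greater than 30
        return False
    alleles = call.genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:  # keep diploid, non-missing calls
        return False
    if alleles[0] != alleles[1]:           # heterozygous: require balanced AD
        total = sum(call.ad)
        if total == 0:
            return False
        alt_fraction = call.ad[1] / total
        if not (low <= alt_fraction <= high):  # inclusive bounds
            return False
    return True


def variant_key(chrom, pos, ref, alt):
    """Normalise chromosome names so 'chr1' and '1' compare equal."""
    return (chrom.removeprefix("chr"), pos, ref, alt)


def split_hgmd(records):
    """Split (chrom, pos, ref, alt, CLASS) tuples into DM and DM? key sets."""
    dm, dm_likely = set(), set()
    for chrom, pos, ref, alt, hgmd_class in records:
        if hgmd_class == "DM":
            dm.add(variant_key(chrom, pos, ref, alt))
        elif hgmd_class == "DM?":
            dm_likely.add(variant_key(chrom, pos, ref, alt))
    return dm, dm_likely


def lookup(hgmd_keys, calls):
    """Return the HGMD variants that are observed among QC-passing calls."""
    observed = {
        variant_key(c.chrom, c.pos, c.ref, c.alt[0])
        for c in calls
        if passes_qc(c)
    }
    return hgmd_keys & observed
```

For a bi-allelic heterozygous call, requiring the alternate allele fraction to lie in [0.25, 0.75] is equivalent to requiring both allele fractions to be at least 25%, which is the balance condition described above.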
In the third step, we will use bioinformatics pipelines and self-developed scripts to check whether variants in the HGMD-vcf are also seen in the KF-vcfs; this step will utilize the cloud computing platform Cavatica (https://cavatica.sbgenomics.com/) as well as Apache Spark-Zeppelin Notebooks in the Variant Database/WorkBench feature of the Kids First Data Resource Portal (https://portal.kidsfirstdrc.org/variant). As part of KF, we at the Gabriella Miller Kids First Data Resource Center (https://kidsfirstdrc.org/) also use Cavatica to harmonize KF data and empower collaborative discovery across KF datasets. The last step will be comparing phenotypes associated with published variants that are also found in the KF cohorts. The above analyses are consistent with dataset-specific Data Use Limitations (DUL). Our use of the datasets will be limited to health/medical/biomedical purposes, will exclude population origins/ancestry studies, will include methods development (e.g., large-scale variant lookup algorithms), and will comply with the model Data Use Certification. Use of multiple datasets: We will analyze the requested datasets individually; data from each will be used only in research consistent with the respective study’s DUL and will not be combined with datasets of other phenotypes. Finally, we intend to publish our findings to broadly share the results with the scientific community. GUPTA, RAMNEEK TECHNICAL UNIVERSITY OF DENMARK Integrative systems biology analysis of contribution of inherited genomic variation to susceptibility to childhood acute lymphoblastic leukemia and to drug response.
Nov21, 2013 expired The integration of single genomic marker data into broader concepts such as biological pathways and protein-protein interactions, as well as the integration of biological evidence from multiple data types, has the potential to unveil subtle disease-related associations that would not be easily discovered with genome-wide association studies or other analyses based on a single data type. We plan to apply systems biology methods together with machine learning techniques to dbGaP data in conjunction with other publicly available data in the hope of identifying new candidate risk genes for childhood acute lymphoblastic leukemia, as well as investigating the inter-individual determinants of treatment response. Acute lymphoblastic leukemia (ALL) is the most common malignancy affecting children, representing 25% of all pediatric cancers. The causes of leukemia remain largely unknown; however, genetic lesions leading to lymphoid stem cell transformation are believed to be triggered by a combination of environmental exposures, infections, and inherited susceptibility. Several studies have attempted to identify the genetic factors underlying ALL susceptibility; however, they either relied on limited candidate gene screening or on large-scale genetic studies investigating associations only at single-marker resolution, with limited emphasis on the underlying biological mechanisms. We are interested in integrating genomic variation data from different types of studies to identify candidate risk genes in childhood ALL, using systems biology methods and machine learning techniques developed at the Center for Biological Sequence Analysis at the Technical University of Denmark. Integrating genomic variation data at a higher level of biological complexity, such as protein-protein complexes or biological pathways, might lead to findings that are more robust than single-SNP associations and more reproducible in patients from different cohorts and different ethnicities.
Aside from susceptibility to leukemia, we would also like to investigate treatment response, e.g., whether patients are at risk of relapse or how their inherited variation influences clearance of chemotherapeutic drugs. We plan to apply systems biology methods together with machine learning techniques to dbGaP data in conjunction with our own and other publicly available data in the hope of identifying new candidate risk genes for childhood ALL, as well as investigating the inter-individual determinants of treatment response. We would also like to use the datasets collecting genomic data from non-leukemia-related studies as controls (datasets: phs000145.v4.p2, phs000209.v11.p3, phs000424.v3.p1). We understand that the dbGaP datasets have certain use limitations, which we intend to respect fully. Haas, Brian BROAD INSTITUTE, INC. Oncogenic fusion and virus content, sequence, and expression feature characteristics for Pediatric Cancers Jul14, 2022 closed Pediatric tumor transcriptome samples will be explored for fusion transcripts and viruses to determine whether there are sequence or expression characteristics that are unique to pediatric tumor samples compared with those in adults. We aim to explore the sequence and expression characteristics of fusion transcripts, potential virus content, and viral genome integration sites within pediatric tumors to determine whether they have features that differ from those observed in adult tumor samples. Our cancer transcriptome analysis toolkit (CTAT), including STAR-Fusion, FusionInspector, and VirusIntegrationFinder, will be applied to RNA-seq data to identify evidence for aberrations involving chimeric read alignments. Expression and sequence properties of genes involved in identified fusions will be explored and compared with fusions we have already separately identified in adult tumor samples within TCGA and with fusion transcripts we identified in adult normal tissues provided through GTEx.
Knowledge gained regarding pediatric-specific features would both expand the knowledgebase of pediatric-specific oncogenic features and ideally contribute to future pediatric oncology research, diagnostics, and treatment efforts. Hahn, William DANA-FARBER CANCER INST Analysis of mRNA and DNA sequencing data from TARGET Project ID 5476 Approved user name WILLIAM HAHN Institute affiliation DANA-FARBER CANCER INST (Non-Profit) Request date: 2013-10-24 Renewal date: Dec28, 2015 closed Cancer is a disease that stems from alterations in our genes. By cataloging changes in the genes in each of 10 childhood cancer types, the TARGET project will enable the cancer research community to better understand drivers of cancer in children. While there may be distinct and unique changes to the genes that allow us to easily identify druggable targets, these are rare. Based on what has been seen in adult cancers, there are likely to be many mutations seen in these childhood tumors. As a result, systematic functional approaches are needed to understand the mechanisms that drive pediatric cancer development. Our research objectives are to use the DNA and RNA sequencing data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project to understand possible mechanisms of tumorigenesis in pediatric cancers. We have learned and continue to learn a great deal from The Cancer Genome Atlas (TCGA). Using cancer functional genomics, we plan to perform similar experiments by analyzing data generated by TARGET. Data analysis from DNA will include copy number alterations, point mutations, and translocations, while RNA analysis will focus on, but not be limited to, novel fusion transcripts. Following identification of potential driver events, we will aim to characterize the molecular pathways involved. -The proposed work is consistent with the Use Restrictions for the TARGET data sets.
-Our lab spans the Broad Institute and Dana-Farber Cancer Institute. No other inter-institutional collaborations are planned at this time. Hahn, William DANA-FARBER CANCER INST Analysis of mRNA and DNA sequencing data from TARGET Nov21, 2013 closed Cancer is a disease that stems from alterations in our genes. By cataloging changes in the genes in each of 10 childhood cancer types, the TARGET project will enable the cancer research community to better understand drivers of cancer in children. While there may be distinct and unique changes to the genes that allow us to easily identify druggable targets, these are rare. Based on what has been seen in adult cancers, there are likely to be many mutations seen in these childhood tumors. As a result, systematic functional approaches are needed to understand the mechanisms that drive pediatric cancer development. Our research objectives are to use the DNA and RNA sequencing data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project to understand possible mechanisms of tumorigenesis in pediatric cancers. We have learned and continue to learn a great deal from The Cancer Genome Atlas (TCGA). Using cancer functional genomics, we plan to perform similar experiments by analyzing data generated by TARGET. Data analysis from DNA will include copy number alterations, point mutations, and translocations, while RNA analysis will focus on, but not be limited to, novel fusion transcripts. Following identification of potential driver events, we will aim to characterize the molecular pathways involved. -The proposed work is consistent with the Use Restrictions for the TARGET data sets. -Our lab spans the Broad Institute and Dana-Farber Cancer Institute. No other inter-institutional collaborations are planned at this time.
Hakonarson, Hakon CHILDREN'S HOSP OF PHILADELPHIA Genetic Factors Associated with Neuroblastoma Jul01, 2021 approved The Center for Applied Genomics (CAG) at the Children’s Hospital of Philadelphia (CHOP) is a highly automated genotyping center founded to identify genetic variants that underlie susceptibility to complex medical disorders. Our center is currently investigating some of the most prevalent diseases of childhood, including neuroblastoma. Our previous studies have revealed genetic factors associated with neuroblastoma, including the MMP20 gene and mitochondrial haplogroup K. As such, we would like to employ SNP genotype and sequencing data from dbGaP to validate our recently detected significant signals and potentially discover new signals by integrating a larger internal control dataset. We have several unpublished hypotheses that we would like the dbGaP datasets to help inform. The CAG project is among the largest genotyping projects in the world; indeed, as we continue to collect genotype data, we are also keeping track of multiple clinical, environmental, & laboratory value parameters for each patient. Using the combination of genotype and these covariates, we hope to continue to identify robust models that characterize the genetic components of these common illnesses. The Center for Applied Genomics (CAG) at the Children’s Hospital of Philadelphia (CHOP) is a highly automated genotyping center founded to identify genetic variants that underlie susceptibility to complex medical disorders. Our center is currently investigating some of the most prevalent diseases of childhood, including neuroblastoma. Our previous studies have revealed genetic factors associated with neuroblastoma, including the MMP20 gene and mitochondrial haplogroup K. As such, we would like to employ SNP genotype and sequencing data from dbGaP to validate our recently detected significant signals and potentially discover new signals by integrating a larger internal control dataset.
We have several unpublished hypotheses that we would like the dbGaP datasets to help inform. The CAG project is among the largest genotyping projects in the world; indeed, as we continue to collect genotype data, we are also keeping track of multiple clinical, environmental, & laboratory value parameters for each patient. Using the combination of genotype and these covariates, we hope to continue to identify robust models that characterize the genetic components of these common illnesses. Our bioinformatics experts will conduct the analysis portion of the study, including applying single-marker chi-square tests via the software package PLINK as well as applying our own software implementations for detecting and validating epistatic gene effects. All analyses will be limited to conditions authorized in the data use limitations for each particular dataset. The use of these genetic data will be restricted to academic researchers (affiliated with a research institution, non-commercial entity). The study participants will not be used as general controls in other studies unless it is allowed for a given dataset. Halbritter, Florian ST. ANNA KINDERKREBSFORSCHUNG Roles and dynamics of β-catenin in Wilms tumor Sep19, 2024 approved Nephroblastoma, also called Wilms tumor (WT), is the most common childhood kidney cancer. Advances in the treatment of WT have led to >90% overall survival for localized disease. Still, 15% of patients will unexpectedly suffer a relapse. Therefore, risk stratification needs further improvement, using new biomarkers, to adjust treatments and clinical follow-up. The exact causes of WT aren't fully understood, and the genetic landscape of WT is very diverse. The multifunctional protein β-catenin is a driver of a subset of WT.
Intriguingly, we have previously observed various expression patterns of β-catenin in WT patient samples that correspond to specific cancer properties. Preliminary analysis suggests that one of the identified β-catenin phenotypes might be indicative of relapse in intermediate-risk WT. Here, we propose to use the dose-dependent β-catenin transcriptional programs as a novel approach to refine WT classification and identify relapse early. Advances in the treatment of Wilms tumor (WT) have led to >90% overall survival for localized disease. Still, 15% of patients will suffer a relapse, including those with intermediate-risk WT (iWT; SIOP classification; COG equivalent: “favorable”). Therefore, risk stratification needs further improvement, using new biomarkers, to adjust treatments and clinical follow-up. The exact causes of WT aren't fully understood, and the genetic landscape of WT is very diverse. The multifunctional protein β-catenin is a driver of a subset of WT. β-catenin plays a role in gene transcription regulation during normal kidney development but also in WT initiation. In other cancers, β-catenin is involved in cancer progression, with roles in cell proliferation, survival, differentiation, and stem cell maintenance. The diverse functions of β-catenin arise from its ability to modulate gene expression together with different co-factors and depend on its subcellular localization. We have observed various expression patterns of β-catenin in WT patient samples that correspond to specific cancer properties. Thus, we hypothesized that β-catenin has pleiotropic functions in WT, supporting the hallmarks of cancer in a context-, time-, space- and dose-dependent manner. We spatially resolved the pleiotropic functions of β-catenin in WT by combining immunofluorescence and spatial transcriptomics on archived SIOP WT patient materials. We identified a core program and multiple dose- and spatially dependent β-catenin transcriptional programs.
Preliminary analysis suggests that one of the identified β-catenin phenotypes might be indicative of relapse in SIOP intermediate-risk WT patients. We intend to validate this result using iWT biospecimens obtained with (SIOP; ongoing) or without (COG; this request) preoperative chemotherapy. Using the mRNA-Seq datasets of the TARGET-WT studies (discovery and validation, phs000471), we propose to quantify β-catenin-associated phenotypes (ssGSEA) and assess their clinical relevance by performing Kaplan-Meier survival analysis. HAMMARSKJOLD, MARIE-LOUISE UNIVERSITY OF VIRGINIA Transcriptional Profile of Human Endogenous Retrovirus-K in Wilms Tumor and Post Transcriptional Regulation in Wilms Tumor 1 gene Jan23, 2019 approved We investigate cancers that develop in small children. We research the changes within the cell that drive the formation of cancer. We also research elements in the tumor that may allow us to develop new therapies that are specific to the cancer and the child. This is broadly called immunologic therapy. We further concentrate on elements within the human genome called Human Endogenous Retroviruses (HERVs). HERVs resulted from viral infections in humans millions of years ago. They are now permanent in the human genome. Once the virus is integrated into the genome, the genomic element is called a “provirus.” These proviruses act similarly to a normal human gene. In some circumstances, these proviruses are still able to produce viral proteins. In adult cancers, these viral proteins have been associated with cancer. They have also been proposed as important immunologic targets. However, the role of HERVs remains poorly understood. We are exploring the importance of these HERVs in Wilms tumor. Wilms tumor is a kidney tumor that develops in very young children. The cancer often results from a failure of normal kidney cells to develop correctly.
In these scenarios, we believe HERVs may be a particularly important element to explore. Human Endogenous Retroviruses (HERVs) are a group of genomic elements that resulted from ancient retroviral infection of the human germ line. The most recently integrated provirus, HERV-K, remains capable of producing viral proteins. Though usually transcriptionally silent, HERV-Ks are upregulated in multiple cancers and during fetal development. Very few investigations have explored the molecular role of HERV-K in fetal tumors. We hypothesize that fetal solid organ tumors that result from a failure of differentiation may still express HERV-K. Furthermore, viral proteins produced under these conditions may be intriguing targets as neoantigens for immunotherapy. We are requesting RNA-seq data from Wilms tumor patients to determine the transcriptional profile and differential expression of HERV-K in these tumors. Our objectives are to explore the molecular role of HERV-K in Wilms tumor and to determine whether HERV-K elements represent potential immunologic targets in this disease. We developed a HERV-K genome of 92 previously described proviruses. Using the bioinformatics platform Geneious, we annotated these proviruses and created a Gene Transfer Format (GTF) file compatible with RNA-seq data analysis. With this method, we determined the HERV-K transcriptional profile of hepatoblastoma using publicly available RNA-seq data (unpublished results). We propose a differential HERV-K expression experiment comparing (1) Wilms tumor with normal kidney controls and, as data allow, (2) Wilms tumor with local recurrence and (3) Wilms tumor with metastatic disease. We will also determine the HERV-K transcriptional profile in patient serum samples as available. We will correlate HERV-K expression in Wilms tumor with age, race, histologic subtype, stage, and event-free survival.
We will also evaluate mutations in the Wilms Tumor 1 (WT1) gene and how these mutations affect the major WT1 isoforms, +KTS and -KTS. The +KTS isoform interacts with RNA and affects post-transcriptional regulation. We will thus perform differential gene expression analysis of tumors with mutations in the WT1 gene and explore effects on mRNA targets. We will perform differential gene expression analysis using the HISAT2, StringTie, and Ballgown pipeline. We will manage and analyze the dbGaP data on our lab's previously established, secure Amazon Web Services cloud computing platform, MolBioCloud. We do not plan to integrate the downloaded data from dbGaP with other datasets. Han, Buhm SEOUL NATIONAL UNIVERSITY Identifying disease-associated noncoding RNAs using both the genotype and expression data Apr13, 2023 closed Our project is to develop an analytical framework that exploits both genotype data and expression data to identify noncoding RNAs that act as mediators of increased disease risk. A sufficient amount of genotype and expression data is required to construct our predictive modeling scheme. We therefore request the use of the datasets from two different studies: phs000218.v24.p8 and phs001134.v2.p1. These studies, which collected data from case (affected) individuals, include both genotype data and expression data from the same participants. We will mainly use these data to analyze, test, and evaluate our methodological work. The datasets from the two studies can be combined into a large matrix solely to increase the sample size, not to produce any new variable that contains new information. This project will advance the understanding of the role of noncoding RNAs and propose robust guidelines for identifying the causal relationship between noncoding RNAs and diseases. It will also help discover noncoding RNA markers that can be used in clinical practice.
All requested datasets include only participants who have given consent to the use of their data for research purposes. Research Use Statement For: - Therapeutically Applicable Research to Generate Effective Treatments (phs000218.v24.p8) - Genomic Profiling of Papillary Thyroid Cancer after the Chernobyl Accident (phs001134.v2.p1) We plan to use the datasets listed above (from two different studies) to develop an analytical framework that exploits both genotype data and expression data to identify noncoding RNAs that act as mediators of increased disease risk. For example, a certain genetic variant may alter the expression pattern of a particular noncoding RNA, and the altered expression of this noncoding RNA can affect the expression of other genes, which can increase the risk of disease. We hope that our research can generate a list of disease-associated noncoding RNAs as biomarkers. 1. Research objectives - Enhance the understanding of the role of noncoding RNAs as regulators and mediators - Construct guidelines for identifying the causal relationship between noncoding RNAs and diseases - Propose a powerful statistical framework that can lead to more accurate and stable detection of disease-associated noncoding RNAs - Demonstrate the feasibility of applying our model in clinical practice 2. Study design - Several statistical and computational methods will be applied to the requested datasets to develop the entire pipeline of our modeling scheme, including data processing, predictive modeling, and evaluation. - We need genotype data and expression data from the requested datasets. Our main goal is to integrate both data types to detect disease-associated noncoding RNAs. - Because genotypes are fixed information, using genotype data along with expression data can help identify the causal effects of markers (noncoding RNAs in our study). 3.
Analysis plan - The requested data from the two studies (phs000218.v24.p8, phs001134.v2.p1) were derived from case (affected) individuals. Therefore, the datasets from both studies can be used together in one large matrix to increase the sample size in our modeling step. - No new variables will be produced from the requested datasets. They will only be used as either input variables or target variables in our project. 4. Explanation of how the proposed research is consistent with Use Restrictions for the requested datasets - According to the variable ‘CONSENT (phv00076498.v23.p8)’ in the phs000218.v24.p8 dataset, all 6319 participants consented to the use of their data for pediatric cancer research. One important goal of our study is to identify ‘disease-associated’ biomarkers. Using the requested data from phs000218.v24.p8 will allow us to contribute to pediatric cancer research by identifying pediatric cancer-associated noncoding RNAs. - According to the variable ‘CONSENT (phv00491031.v1.p1)’ in the phs001134.v2.p1 dataset, all 442 participants consented to general research use of their data. - All the requested datasets include only subjects who have provided informed consent regarding the use of their data for research purposes. Hannenhalli, Sridhar Subrahmanyam NIH Comparing epigenomic and transcriptomic changes between the drug-induced differentiation of neuroblastomas and the malignant transformation of neuroblasts Dec16, 2019 closed Neuroblastoma cell lines can be transformed into normal neuronal cells when treated with a small molecule called retinoic acid. We want to know whether, at the cellular level, the changes that happen upon treatment with retinoic acid are the same processes that are utilized when a normal neuron undergoes malignant transformation to become neuroblastoma.
A better understanding of the similarities between the processes in these two directions will help in understanding the biology of neuroblastoma and potentially in pursuing specific interventions to treat neuroblastoma. In this project, we wish to find out whether the biological pathways activated when a neuroblast becomes malignant are merely reversed when the malignant neuroblast is reprogrammed to differentiate into a neuroblast by the action of a drug called retinoic acid (RA). This is relevant to neuroblastoma, a pediatric tumor that results from malignant transformation of neural crest-derived cells. Our goal is to understand whether RA achieves its effect through a reversal of the pathways that cause the malignant transformation of neuroblasts, or whether it activates an independent set of pathways toward differentiation. This requires us to first infer the pathways involved in both RA-induced differentiation of neuroblastomas and the malignant transformation of neuroblasts. To understand the malignant transformation of neuroblasts, we require access to raw sequencing data of tumors from neuroblastoma patients. HARBI, SHAGHAYEGH VASCULOTOX, INC. Quality Control and Genomic Analysis Apr01, 2021 approved For high-risk pediatric neuroblastoma, this scientific knowledge will provide new perspectives on therapeutic strategies. Extensive quality control analysis will be performed prior to the genomic analysis. Preliminary Data: Our collaborators have data suggesting that pediatric neuroblastoma (type/Stage 4) subsets with gene expression profiles similar to those of neuroblastoma-derived cell lines may respond well to a compound. Proposed Research: Quality Control and Genomic Analysis. Research AIMS.
To provide additional evidence supporting the preliminary data on pediatric participants who may respond well to the specific compound, we propose to analyze the Target 30 cohort, using next generation sequencing (NGS) data to identify mutations strongly associated with disease phenotype. We have identified 120 FASTQ files (Target 30; Neuroblastoma) for genomic analysis to identify mutations and gene expression profiles. Extensive quality control analysis will be performed prior to the downstream genomic analysis. Comprehensive bioinformatics analysis of the sequencing data will be performed with the Qiagen (Advanced Genomics Ingenuity Variant Analysis) and Illumina BaseSpace platforms. For high-risk pediatric neuroblastoma, this scientific knowledge will provide new perspectives on therapeutic strategies and help identify potential mechanisms of action specific to the compound of interest. Hasle, Henrik UNIVERSITY OF AARHUS Multi-omics analysis of pediatric acute myeloid leukemia, with a focus on the subtype “not otherwise specified (NOS)” as well as relapse Jan29, 2020 closed Acute myeloid leukemia (AML) is a cancer of the bone marrow and blood that is associated with a rather dismal outcome. To improve the survival of pediatric AML patients, more knowledge is needed regarding the molecular basis of tumor onset, progression, and resistance to therapy. In this project, we will investigate various genomic, epigenomic, and transcriptomic alterations in pediatric AML cells at initial diagnosis compared with relapse, as well as in a specific subtype of pediatric AML referred to as “not otherwise specified”, or NOS. Our studies include machine learning-based analyses. This approach may help us find alterations that are connected to therapy resistance.
In addition, our studies may lead to the identification of novel biomarkers that can be used for new, less invasive diagnostics and improved risk stratification, all for enhanced survival and quality of life for pediatric patients diagnosed with AML. Acute myeloid leukemia (AML) is a cancer of the bone marrow, well known for its heterogeneity. Patients usually respond to initial chemotherapeutic treatment and reach complete remission. However, many of them relapse, and the relapse clones are very often resistant to therapy. Recent studies have shown that the patterns of genomic alterations found in adult AML differ from those seen in pediatric AML, and focused studies on pediatric AML are needed to further extend our knowledge of this disease. Studying the various types of genomic, epigenomic, and transcriptomic alterations that potentially differ between diagnostic and relapse AML cells could uncover putative markers that may eventually lead to novel treatment alternatives. Further, there is still a large, heterogeneous group of pediatric AML patients with none of the genetic aberrations currently included in the WHO classification system of AML. Additional detailed multi-omics studies are necessary to better understand the underlying causes of leukemia formation in this subtype, referred to as “not otherwise specified” (NOS). In this study, we are applying rule-based machine learning models to in-house-generated whole genome, whole exome, and RNA sequencing data, as well as microarray-based genome-wide DNA methylation data, from diagnostic and relapse material from 26 pediatric AML cases and from diagnostic material from 60-70 pediatric AML NOS cases. Our rule-based models are transparent, which allows us to visualize them in the form of rule networks.
These rule networks not only highlight the most interesting genes but can also show the dynamics that govern, for instance, the transformation from diagnosis to relapse (or from treatment-sensitive to treatment-resistant, etc.) and the interactions between genes that lead to a certain condition. Access to the TARGET: Acute Myeloid Leukemia (AML) dataset with accession number phs000465.v19.p8, which is a sub-study of the Pediatric Cancer Research study phs000218.v22.p8.c1, would allow us to extend our current rule-based machine learning models to a pediatric validation cohort, investigating specific subtypes such as AML NOS as well as additional patient-matched diagnosis and relapse specimens. In addition, we wish to combine appropriate subsets of the TARGET AML dataset with our (non-dbGaP-related) in-house pediatric relapse and NOS AML datasets, respectively, and, after proper batch correction, investigate whether a larger starting cohort could allow the detection of further, rarer interactions. Combining the datasets will not create any additional risk to the study participants. In addition, we aim to use the TARGET AML dataset for recurrence screening of genomic, transcriptomic and epigenomic variants we have recently identified in relapsing AML (Stratmann et al., unpublished data), as well as in pediatric AML NOS (Herlin et al., unpublished data). We believe that our studies will advance the understanding of the underlying causes of pediatric AML onset, progression and therapy resistance, and lead to the identification of new biomarkers for improved risk stratification as well as the development of novel therapeutic options, thus resulting in a higher survival rate for pediatric AML patients. We intend to publish or otherwise broadly share the findings from our studies with the scientific community. 
Hassan, Bass UNIVERSITY OF OXFORD Tp53 associated ploidy in sarcoma Aug08, 2024 approved Cancers are caused by changes in the DNA of cancer cells. Comparing the DNA sequence of normal cells with that of cancer cells reveals patterns of change. These patterns are the basis for determining the type of cancer, what might have caused it to form and which treatments might be best to use. Some of these patterns are less well studied, including the reorganisation of DNA in large regions or chunks. In normal cells, very long sections of DNA sequence form large packages we call chromosomes. These large packages of DNA sequence can break up to cause even more complex patterns (fragmentation of chromosomes). Here we will evaluate regional patterns of DNA sequence in rare cancers arising from bone and soft tissues, called sarcomas. In sarcomas, a mistake in the sequence of a gene called Tp53 appears to result in the large regional changes in DNA patterns described above. We are seeking to understand these changes in order to see whether this new information can ultimately help us improve treatments for sarcoma. Following analysis of the data, we will publish the findings in peer-reviewed publications and present them in oral scientific presentations, acknowledging the origin and funding of the original data. Sarcomas are cancers of mesenchymal origin characterized, in high-grade cases, by extensive structural rearrangements of the chromosomes (ploidy), including regional chromosomal gains and losses, loss of heterozygosity, chromoplexy and chromothripsis. It has also been observed that sarcomas frequently contain mutations in the TP53 gene, a key transcriptional regulator, referred to as the guardian of the genome, that co-ordinates the cell cycle, cell death and metabolism. 
Disruptions in TP53 can occur through single nucleotide variations, deletions, or even loss of an entire chromosome arm, which can subsequently lead to genetic instability and ploidy changes secondary to events during mitosis. Here we aim to focus on the impact of Tp53 on high-grade sarcoma ploidy, including specific loss of regions in chromosome arm 17p. Not only is this region particularly important and considered the crucial second hit required to completely disrupt wild-type TP53 function, but it and a number of other chromosomal regions may also alter the context of Tp53 loss of function, with implications for the downstream functional consequences for the sarcoma cell. We propose utilising the dbGaP data to explore the impact of different TP53 mutations in relation to the broader genomic landscape of instability. Through analysis of WGS data from tumour-normal pairs, we will map the Tp53 locus, chromosome 17p and the rest of the genome with a view to interrogating the functional effects of regional changes using CRISPR genome engineering in informative sarcoma cell lines. In the first instance, we will utilise conventional PCAWG-based structural variant, copy number and SNV callers. This knowledge could contribute to a more comprehensive understanding of TP53-related sarcoma genomics and ultimately inform personalized treatment approaches targeting ploidy-related contextual features for sarcoma patients. We aim to utilise and pool dbGaP sarcoma WGS datasets for the analysis, including TCGA, osteosarcoma, MPNST, rhabdomyosarcoma, myxofibrosarcoma and undifferentiated pleomorphic sarcoma data. We may require integration with other publicly available datasets that we can access and that report high-quality sequencing data with associated peer-reviewed publications. Renewal Comment August 2024 To date, we have identified downstream gene target dependencies of Tp53 in osteosarcoma using CRISPR, and we have a number of targets on 17p that have copy number effects. 
These new targets may be therapeutic in osteosarcoma and offer novel strategies for treatment. As osteosarcoma is a sarcoma with childhood and young adult incidence, we request renewal and the addition of paediatric sarcoma data under the Disease-Specific (Pediatric Cancer Research) access of the National Cancer Institute (NCI) TARGET: Therapeutically Applicable Research to Generate Effective Treatments. We are specifically focused on the validation of these new potential therapeutic targets and require continued access to the data as we investigate target pathways in other high-grade sarcomas that occur across childhood and adult age groups. Objective Summary I. Assess the pattern of chromosome instability in relation to common Tp53 mutants in sarcoma. II. Identify the boundaries of chromosome arm loss in tumours with common p53 mutations. III. Investigate the patterns of copy number and LOH across chromosome 17. ***Dissemination of research findings and acknowledgement of controlled-access datasets subject to NIH GDS policy: Following analysis of the data, we will publish the findings in peer-reviewed publications in the public domain in a timely manner, present the data in oral and written form at scientific conferences and make analysis processes and results available to the scientific community. We agree to acknowledge the submitting investigators, the funding body that supported the investigators and the NIH data repository in all oral and written presentations, disclosures and publications resulting from any analyses of controlled-access data obtained. We further agree that the acknowledgment shall include the dbGaP accession number of the specific version of the dataset(s) analyzed. HAUSSLER, DAVID UNIVERSITY OF CALIFORNIA SANTA CRUZ Data Analysis of the TARGET Project Aug18, 2011 closed Develop advanced analysis and visualization tools for integrative analysis of the TARGET datasets. 
Apply these tools to compare pediatric cancers (including TARGET) to other adult and pediatric cancers. The goal of the project is to develop advanced analysis and visualization tools for integrative analysis of multi-analyte cancer genomic data for pediatric cancer genomic research. The project is focused on TARGET datasets, supplemented with pediatric cancer datasets gathered as part of our California Kids Cancer Comparison project. Our specific aims are stated below. Over the next 12 months, we will focus on integrative, pan-cancer analysis of TARGET data against other pediatric cancers and against adult cancer data from TCGA. Aim 1. We will build a high-throughput analysis pipeline to process next-generation sequencing data to detect complex genomic changes, including mutations, fusion proteins, structural variants, and transcriptomic and epigenomic changes in tumor samples, and cancer-associated molecular alterations. Aim 2. We will build a multi-tiered pipeline to detect biological pathways perturbed in tumor samples and cancer-associated molecular alterations, to better understand the possible vulnerabilities of pediatric tumors. Aim 3. We will further develop the UCSC Cancer Genomics Browser to fulfill the need for advanced analysis and visualization tools for integrative analysis of large, complex genomic datasets that meet the requirements of the TARGET project. Aim 4. We will integrate TARGET data with datasets from external pediatric cancer genomic studies and clinical trials, using tools developed in Aims 1 and 2, to identify cancer-associated molecular alterations, dysregulated pathways and signatures for translational applications such as clinical diagnosis, prognosis, drug response prediction, and gene targets for the development of novel therapeutics. These results will provide the basis for a refined clinical understanding of patient stratification in therapy and will generate new hypotheses for translational research and clinical care. 
Hawkins, Cynthia HOSPITAL FOR SICK CHLDRN (TORONTO) Expanding the oncohistone landscape in pediatric brain cancer May03, 2022 closed Histones, proteins around which DNA is coiled in the cell, are essential in determining DNA accessibility, thereby regulating gene expression and, in turn, protein synthesis and other biological processes. At present, histone mutations in pediatric high-grade glioma (pHGG) other than lysine-to-methionine at amino acid position 27 (K27M) and glycine-to-arginine/valine at amino acid position 34 (G34R/V) have not been reported, owing to the limitations of earlier bioinformatics techniques in identifying mutations in these regions. Here, I will use a modernized approach to identify additional histone mutations present in pediatric brain tumors (PBT). In summary, we will provide a comprehensive landscape of histone mutations in PBT and valuable insights into their local and global oncogenic mechanisms for downstream therapeutic exploitation. The objective of this project is to identify the landscape of histone mutations in paediatric brain tumors and investigate their cellular phenotypes and potential impact on tumor formation. I have set up an automated screening pipeline to extract mutations from the publicly available whole-genome, whole-exome and RNA sequencing data in the datasets requested in this application, in order to identify the landscape of histone mutations in cancer. Variants will be retained if they are supported by more than 3 variant reads with a variant allele frequency above 5%. Upon completion of the screening of histone variants, I will clinically annotate these samples and perform statistical analysis as needed. Samples of other cancers are included in addition to pediatric brain tumours in order to compare their genomic differences in the context of histone mutations. 
Using this information, I will answer some important questions regarding the clinical features of the non-canonical histone mutants: Are these mutations more prevalent in PBT, or in a specific subtype, than in other cancers? How does the prognosis of patients harboring these alterations compare with that of patients whose tumors carry canonical histone mutations? HAWTHORN, LESLEYANN AUGUSTA UNIVERSITY Defining poor prognosis in Wilms tumor genes using Next-Gen Sequencing approach Nov13, 2014 closed Wilms tumor (WT) of the kidney is the third most common childhood tumor. Genetic studies have shown recurrent regions of chromosomal loss that are associated with poor prognosis. We will use genomic sequencing to identify mutations in genes within these regions that are responsible for the poor outcome. Loss of heterozygosity (LOH) in tumor cells represents a somatic mechanism of exposing recessive mutations in tumor suppressor genes involved in tumor development and progression, and has been used to identify regions of the genome that harbor other genes thought to be involved in Wilms tumorigenesis. LOH analysis has implicated regions on chromosome arms 1q, 4q, 7p, 9q, 11p, 11q, 16q and 22 as consistent and relatively frequent events in WT. Within the small number of loci implicated in WT development, losses of 16q, 1p, 4q, 11q and 22 have been associated with poor survival and more frequent 2-year relapse. No genes from these regions have yet been implicated in Wilms tumorigenesis, due in part to the fact that these regions of the genome are still relatively large and carry large numbers of genes. In addition, gain of chromosome 1q occurs in ~50% of tumors and is also associated with poor survival. We recently conducted a high-resolution, genome-wide analysis of copy number abnormalities (CNAs) and LOH in a large series of WT. This study defined subsets of tumors that carry all of the common genetic changes seen in WT, as well as those that are associated with poor survival. 
We have also conducted a preliminary analysis of 9 WT and control samples using exome sequencing. We now propose to use dbGaP WES and SNP-array data to identify genes that are frequently mutated in WT, with an emphasis on the regions of frequent LOH, to define the driver genes for these events. Our hypothesis, therefore, is that structural chromosomal events such as LOH and CNAs harbor genes that are directly involved in WT tumorigenesis. We plan to combine our current data on 9 tumor-normal WT pairs with the TARGET Wilms tumor data requested from dbGaP. He, Lin UNIVERSITY OF CALIFORNIA BERKELEY Functions of Retrotransposons in Pediatric Cancer Jan21, 2016 closed Large sequencing efforts have identified important genes mutated in pediatric cancer. However, the functional importance of the non-coding genome, particularly retrotransposons, is largely unknown. Using DNA/RNA next-generation sequencing data from various human pediatric cancer samples, we will develop new algorithms to profile the expression of retrotransposons in normal versus cancer tissues, to study the impact of retrotransposon reactivation on adjacent gene expression, and to identify new somatic retrotransposon insertions in tumor tissues. The sequencing data within the ALL dataset of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative would provide an ideal data source for our proposed studies. Retrotransposons and their remnants are a class of repetitive elements (REs) that constitute a significant fraction (40-50%) of the mammalian genome. While retrotransposons are remnants of invading foreign retroviral sequences, emerging evidence suggests that a small subset of them can be derepressed in specific biological contexts. These reactivated retrotransposons could lead to new integrations in the genome and acquire gene-regulatory roles. 
Several adult human cancer types are associated with derepression of specific retrotransposons, or with novel retrotransposition events that may contribute to the development of cancer. However, it is unclear if the same mechanisms operate in pediatric cancer types. Here, we propose to employ the TARGET genome sequencing data to investigate the activity of retrotransposons in pediatric cancers, and to investigate the functional consequence of retrotransposon derepression during oncogenesis. Heath, Allison CHILDREN'S HOSP OF PHILADELPHIA Engineering interoperable approaches for improving outcomes of pediatric diseases. May01, 2019 approved As part of the NIH's strategic data planning, the Kids First Data Resource Center (KFDRC) is developing new engineering approaches for interoperating clinical and genomic data generated under the Kids First program with other emerging initiatives. The expected solutions include new software, techniques and methodologies that will accelerate discovery for researchers working on pediatric diseases. As part of the NIH's strategic data planning, the Kids First Data Resource Center (KFDRC) is developing new engineering approaches for interoperating clinical and genomic data generated under the Kids First program with other emerging initiatives. Data interoperability and harmonization will be evaluated in a systematic way in partnership with expert advisory groups and by making the costs and technical challenges explicit and transparent. The expected solutions include new software, techniques and methodologies that will accelerate discovery for researchers working on pediatric diseases. These solutions, findings, and/or results will be published and/or disseminated to the scientific community. To model and understand these techniques, the KFDRC is requesting access to a representative collection of genome sequences and associated phenotypic data. 
KFDRC will process genomic data through standardized pipelines developed with other programs and communities, as well as develop methods and techniques to determine the feasibility of similar approaches for phenotypic and clinical data. As such, the KFDRC works closely with NIH to ensure that all work being done respects project-specific Data Use Limitations (DULs), the terms in the Data Use Certification (DUC) Agreement, and the dbGaP Approved User Code of Conduct. Datasets that have different disease-specific DULs will not be combined for analysis, but will be analyzed separately and/or used as use cases for building improved technical frameworks to limit and/or warn when projects are using datasets with potentially incompatible DULs. Heath, Allison CHILDREN'S HOSP OF PHILADELPHIA Genomic Data Commons May19, 2017 closed The NCI's Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). It is also designed to serve as a foundation for future expanded data access, computational capabilities, and genomic research. The National Cancer Institute’s (NCI’s) Genomic Data Commons (GDC) is a data sharing platform that promotes precision medicine in oncology (gdc.cancer.gov). The GDC contains NCI-generated data from some of the largest and most comprehensive cancer genomic datasets, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). 
The GDC is an NIH Trusted Partner, which allows it to ensure that anyone accessing controlled-access data from the GDC has submitted a Data Access Request (DAR) to dbGaP or is an approved user under a DAR; that the user has reviewed the project-specific Data Use Limitations (DUL), if any; that the DAR has been approved by dbGaP; and that the user has agreed to the terms in the Data Use Certification (DUC) Agreement and in the dbGaP Approved User Code of Conduct. In particular, users of protected TARGET datasets have agreed to the Data Use Limitation that TARGET datasets are to be used for research projects that can only be conducted using pediatric data and that have likely relevance to developing more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. As co-PI of the GDC, I advise and work on technical and scientific aspects of the project that require access to all GDC data. This includes the development and scaling of harmonization methodologies for both clinical and genomic data. Topics of particular interest that I will be involved in include germline variant calling, gene fusion detection, and structural variants. All method development is performed only using data specifically approved for this purpose, and the methods are then applied to other data stored at the GDC to provide high-quality datasets for approved research projects. I also help oversee the scientific quality of the resulting data, which can include performing small-scale research projects to inform the community about whether novel features in the harmonized data are real or are artifacts that should be culled, to better inform discovery of improved treatments/markers for cancer. Hellstrand, Kristoffer GOTEBORG UNIVERSITY Role of interferon lambda in leukemia Oct26, 2016 rejected Acute myeloid leukemia (AML) is characterized by rapid accumulation of white blood cells in bone marrow and other organs. 
The currently available therapy fails in a large proportion of patients, and only 30-50% of patients survive AML for >5 years. In this study, we wish to determine the impact of variation in genes encoding interferon-lambda. Such genetic variation has previously been shown to herald favorable outcome in chronic infections, but information about the relationship between interferon-lambda-related genes and cancer prognosis is largely lacking. This project may further define the molecular basis for the course of AML and point to new therapeutic opportunities. Background and objectives In humans, genome-wide association studies implicate interferon-lambdas (IFNLs) in the clearance of hepatitis C virus (HCV) infection. Accordingly, the allelic distribution at single nucleotide polymorphisms (SNPs) within the locus on chromosome 19 that encodes IFNLs predicts the clinical outcome of HCV-infected patients and the efficiency of therapy. While the mechanisms that link variation in IFNL SNPs to clearance of HCV remain to be defined, a prevailing hypothesis is that certain IFNL SNPs are associated with constitutive activation of IFN-stimulated genes (ISGs), which predicts unfavorable outcome of HCV infection. The role of the interplay between IFNL genotype and induction of ISGs in non-infectious diseases remains unknown. We were recently granted permission to mine the TCGA database to determine clinical outcome in relation to IFNL genotype and ISG induction in adult leukemia. We hereby apply for permission to also mine the TARGET database, comprising pediatric patients with acute myeloid leukemia (AML), to enable similar studies in patients with childhood AML. While the use of intensive chemotherapy in conjunction with improved supportive care has improved the prognosis of pediatric AML in recent decades, as many as 30-40% of children diagnosed with AML will succumb to the disease. 
Understanding the molecular mechanisms of childhood AML may identify novel risk factors relevant to the choice of therapy (including allogeneic bone marrow transplantation) and point towards hitherto unexplored therapeutic targets. Study and analysis plan We wish to analyze patients in the TARGET database with pediatric acute myeloid leukemia (AML) to determine clinical outcome in relation to germline IFN-L genotype and ISG induction. We wish to compare the clinical outcomes of pediatric AML patients with various IFN-L SNP genotypes. If available, we would also like to access blood and bone marrow samples for analysis of phenotypic markers relevant to IFN-L and ISG gene status. In our view, these analyses are consistent with the data use limitations. In this project, we will not collaborate outside of our institution. Heltemes Harris, Lynn UNIVERSITY OF MINNESOTA Novel genes involved in ALL induction Dec19, 2016 approved We will use TARGET ALL data to determine whether we can identify new genetic abnormalities in human leukemic samples and whether they correlate with poor outcomes. We have generated several mouse models of acute lymphoblastic leukemia (ALL) with genes known to be mutated in pediatric ALL, including Pax5, Ebf1 and Ikzf1. These models have allowed us to identify additional genes that appear to be critical for induction of progenitor ALL and may provide novel ways to classify and potentially treat pediatric ALL patients. One aim was to identify additional driver mutations in pediatric ALL to better classify and treat patients. To begin to address this question, we used forward genetic screens in a couple of mouse models, which allowed us to identify novel genes that may drive leukemia induction. RNA-Seq analysis of all of our leukemias identified several additional genes, including a truncated, over-expressed form of Sos1. 
Importantly, expression of truncated Sos1 correlated inversely with survival in our mouse models. Interestingly, pediatric patients with Noonan syndrome, a subset of whom have point mutations in Sos1, have an 8-fold increased risk of leukemia. The point mutations in Sos1 in Noonan syndrome and the N-terminal truncation of Sos1 in our mouse model both disrupt the N-terminal autoinhibitory domains of Sos1. Thus, truncated Sos1 could act as an oncogenic mutation, much as point-mutated Sos1 does in Noonan syndrome. To begin to assess a role for Sos1 in human pediatric ALL, we examined the freely available RNA-Seq data from the TARGET ALL datasets. Our preliminary analysis suggests that ~50% of pediatric patients in this dataset may have a truncated form of Sos1 similar to what we observed in our mouse model. However, the publicly available data only show relative exon expression, which severely limits what we can do with them. We request access to the TARGET ALL data to determine whether we can identify truncated Sos1 genetic abnormalities in human pediatric leukemic samples, whether they arise via mechanisms similar to those seen in our mouse model, and whether they correlate with disease outcome. Finally, we aim to follow up our results with clinical options, as Sos1 inhibitors are currently being developed. Hendrickx, Wouter SIDRA MEDICAL AND RESEARCH CENTER Pan-cancer immune signatures and oncogenic pathways in pediatric and adult solid tumors Mar03, 2022 approved Cancer, one of the deadliest diseases, is characterized by the uncontrolled growth of human cells. Understanding the differences between cancer patients requires a lot of effort because of the nature of the disease. Over the past few decades, researchers have characterized the disease by exploring the nature of biomacromolecules, i.e. DNA, RNA and proteins. 
Alterations from the normal state (mutations, variation in copy number, changes in gene expression, and other alterations) in these biomacromolecules are often associated with uncontrolled cell growth, leading to tumor progression. It is therefore essential to characterize and catalog all the alterations present in patients' cancer cells for better diagnosis and therapy. Understanding the effects these alterations have on other cells and how they interact with each other is a crucial element in our endeavor to find better treatments and eventually cure this disease. High-throughput genomic and proteomic data for different cancer types are emerging and can be used to identify patient subgroups for tailored precision therapy. In the past, independent studies have identified molecular markers, including DNA, RNA, protein, metabolite and other non-coding RNA markers, that have been used to predict the molecular subtypes and clinical outcomes of various cancers. However, integrative genomic analysis across multiple datasets/studies provides much higher information content than independent datasets do. In the past decade, various large-scale genomic studies have improved our understanding of the genetic underpinnings of pediatric and adult cancers. These large-scale sequencing studies unveiled the distinct nature of the tumor genetic alterations in different cancer types and helped identify clinically relevant subtypes. The genomic landscape studies also uncovered the immune subtypes of various tumors and their role in designing treatment strategies. Access to the large databases available publicly in dbGaP, GEO, etc., allows researchers to perform integrative analyses to identify and characterize the genetics of various cancer types. 
The primary goal of the Pediatric Cancer Omics Laboratory at Sidra Medicine is to establish Sidra Medicine as the international leader in developing immunotherapy and in discovering and characterizing therapeutic targets in pediatric and adult cancers using an integrated, multi-disciplinary systems biology approach. The Pediatric Cancer Omics Laboratory uses various high-throughput sequencing methodologies, including single-cell and spatial omics approaches, toward a new paradigm of bringing personalized medicine to routine clinical care and treatment. We generate high-throughput sequencing data from patient samples to predict the genetic alterations in different pediatric and adult cancers to achieve our goals. To extend our effort to identify the genetic landscape of pediatric and adult tumors, we plan to use publicly available high-throughput sequencing datasets in dbGaP, GEO, etc. We aim to perform integrative data analysis to identify driver mutations, copy number alterations and gene expression variation in pediatric and adult cancer types by accessing data from dbGaP. We are also interested in identifying and characterizing the immune landscape of pediatric and adult tumors using integrative omics methods. Overall, we plan to use the dbGaP datasets to validate our in-house findings and for further genomic exploration, using an in-house high-performance computing cluster. We intend to publish all findings in pre-publication repositories and aim to publish in high-impact peer-reviewed journals. We intend to share all our bioinformatic code on GitHub (https://github.com/Sidra-TBI-FCO) and supplementary data on FigShare. 
Herold, Nikolas KAROLINSKA INSTITUTE The Effects of Epitranscriptomic Modifications on pediatric Acute Lymphoblastic Leukemia Stem Cells Jul15, 2020 approved Acute lymphoblastic leukemia (ALL) is the most prevalent hematological cancer in children younger than 14 years of age, and despite progress in intensive chemotherapy, 20-25% of pediatric and over 50% of adult patients show resistance to therapy and relapse. We want to investigate the role of epitranscriptomic modifications in T-cell acute lymphoblastic leukemia (T-ALL) cancer cell generation and maintenance using the requested whole-transcriptome RNA sequencing dataset of T-ALL patients. While ADAR1-mediated RNA editing, RNA splicing and RNA methylation promote leukemia stem cell (LSC) generation in adult myeloid malignancies, they have not yet been extensively studied in pediatric or young adult ALL. Because of high relapse-related mortality rates, there is a pressing unmet medical need to predict and prevent acute leukemia relapse in children and to define relevant exposures that promote LSC generation. For this study, we will compare the 121 treatment-refractory and/or relapsed pediatric T-cell acute lymphoblastic leukemia samples from the requested dataset. This study includes three sub-aims, focusing on 1) RNA editing by the editase ADAR1, 2) genes involved in RNA methylation (such as METTL3 and METTL14) and 3) RNA splicing, with emphasis on adhesion molecules. Introduction Normal hematopoietic stem cells (HSCs) give rise to blood cells throughout life, and the ability to isolate HSCs and progenitors has facilitated a better understanding of the molecular regulation of their functional attributes, including cell survival and self-renewal. In children younger than 14 years of age, acute lymphoblastic leukemia (ALL) is the most common form of leukemia, accounting for about 90% of all leukemia cases. 
The age distribution of ALL is typically bimodal, with a first peak in childhood around 2-3 years of age and a second peak in adulthood around 50 years of age. As increasing numbers of patients with childhood leukemia are successfully cured, further efforts are needed to map the underlying causes in order to prevent disease, improve therapeutic strategies and thus improve survivors' quality of life. Normal hematopoietic stem cells (HSCs) accumulate genetic (DNA) and epitranscriptomic (RNA) changes that promote the emergence of pre-leukemic clones that have gained survival and proliferative advantages. The recently coined term “epitranscriptome” describes numerous post-transcriptional RNA modifications that introduce functionally relevant changes to the transcriptome. Epitranscriptomic modifications include several important RNA processing events, including methylation (N6-methyladenosine, m6A), RNA editing and alternative mRNA splicing. Widespread aberrant ADAR1-mediated adenosine-to-inosine (A-to-I) RNA editing and APOBEC3-mediated cytosine-to-uracil (C-to-U) RNA editing, splicing of adhesion molecules such as CD44, and unstable RNA methylation have all been associated with clinical characteristics of several cancer types and with the generation of leukemia-initiating cells with enhanced pro-survival and self-renewal capacity. We would like to use this dataset to analyze these epitranscriptomic modifications, which will aid downstream functional and mechanistic overexpression and knockdown studies. By providing a more mechanistic understanding of the role of the epitranscriptome in pediatric cancer, the proposed study will inform future RNA mutation detection and inhibition strategies that may help to obviate cancer resistance and relapse. 
Aims The general aim of the proposed study is to evaluate the role of epitranscriptomic modifications in pediatric acute lymphoblastic leukemia; specifically, we will examine: 1) adhesion molecules such as CD44 and its isoforms with regard to LSC homing, survival and self-renewal capacity; 2) genes involved in RNA methylation, specifically m6A, such as genes of its “writer” and “reader” complexes; and 3) A-to-I and C-to-U RNA mutation signatures in pediatric T-ALL, which will guide downstream functional and mechanistic overexpression and knockdown studies. Design ALL results in inhibition of differentiation and the subsequent accumulation of immature blood cells at various stages of incomplete maturation. In some patients, LSCs can evade traditional chemotherapy and therefore contribute to disease progression and, in rare cases, relapse. Although specific inhibitors (e.g., doxorubicin, dexamethasone and methotrexate) have dramatically improved therapy and significantly slowed disease progression by eradicating the bulk of ALL cells in the circulation, they sometimes fail to eliminate quiescent leukemic stem cells residing in the bone marrow niche. These LSCs are able to drive disease relapse and may eventually contribute to the emergence of treatment-resistant ALL. We will use the existing dataset published by Masafumi Seki et al. (2017) in Nature Genetics (DDBJ accession number JGAS00000000090) and TARGET data from dbGaP (accession number phs000464). Herold, Nikolas KAROLINSKA INSTITUTE SNP-based and gene-expression-based prediction of treatment responses Feb23, 2023 approved Acute myeloid leukaemia is an aggressive blood cancer with poor survival. While effective treatments exist, many patients cannot be cured due to therapy resistance. Identifying patients expected to be resistant to cytarabine, the main AML drug, is important in order to stratify those patients to alternative treatments and clinical trials. 
One major resistance factor is SAMHD1, whose expression correlates with poor treatment outcomes. As strategies to inhibit SAMHD1 exist, identifying the patients who would benefit most from SAMHD1 inhibitors is an unmet medical need. This project aims to explore whether a multiparametric model, including gene expression of SAMHD1 and of a novel family of endogenous SAMHD1 inhibitors together with two published predictive SNP scores, can more reliably distinguish patients who respond well from patients who respond poorly to cytarabine-based AML therapy. Acute myeloid leukaemia (AML) is the most common and deadliest acute blood cancer, with approximately 300 000 new cases every year worldwide. Five-year overall survival is still only about 30%, even though survival in paediatric patients is approaching 80%. The main reason for treatment failure is resistance to AML-directed drugs, in particular cytarabine (ara-C). Our laboratory has demonstrated that leukemic expression of the protein SAMHD1 leads to resistance against ara-C in adult and paediatric AML and that SAMHD1 expression at diagnosis is prognostic for outcomes in patients treated with ara-C (PMID: 28067901, 30341277). We have recently identified a novel family of endogenous SAMHD1 inhibitors, and when expression of both SAMHD1 and the endogenous SAMHD1 inhibitors is combined in a score, treatment outcomes can be predicted even more accurately, with strong discrimination in several datasets including TCGA and TARGET-AML (unpublished; ethical permit: Dnr 2018/464-31/2). Recently, a score of 10 SNPs in genes associated with ara-C metabolism, as well as a score of 3 additional SNPs involving SAMHD1, have been reported to correlate with responses to ara-C treatment in AML (PMID: 34990262, 36689724). 
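The score-then-survival analysis this request proposes (build SNP scores, group patients by score, compare groups with the Kaplan-Meier method) can be sketched roughly as follows. This is an illustrative sketch, not the applicants' pipeline: it assumes an additive score (weighted sum of risk-allele counts, unit weights by default), a hypothetical median split into high and low groups, and hypothetical SNP identifiers and patient records. The published scores may use their own weights and cut-offs, and the Cox models would in practice be fitted with an established package (for example the Python lifelines library or R's survival package).

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Patient:
    genotypes: dict[str, int]  # hypothetical SNP id -> risk-allele count (0, 1 or 2)
    time: float                # follow-up time, e.g. months
    event: bool                # True if the event (relapse/death) was observed, False if censored


def snp_score(genotypes, snps, weights=None):
    """Additive score: weighted sum of risk-allele counts (assumed unit weights by default)."""
    weights = weights or {s: 1.0 for s in snps}
    return sum(weights[s] * genotypes.get(s, 0) for s in snps)


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        n_at_t = 0
        # Group all subjects (events and censorings) that share time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t  # censored subjects leave the risk set after time t
    return curve


def split_by_score(patients, snps):
    """Hypothetical median split of patients into high- and low-score groups."""
    scores = [snp_score(p.genotypes, snps) for p in patients]
    cut = median(scores)
    high = [p for p, s in zip(patients, scores) if s > cut]
    low = [p for p, s in zip(patients, scores) if s <= cut]
    return high, low


if __name__ == "__main__":
    snps = ["snp_a", "snp_b", "snp_c"]  # placeholder identifiers, not the published SNPs
    cohort = [
        Patient({"snp_a": 2, "snp_b": 1}, 12.0, True),
        Patient({"snp_a": 0, "snp_c": 1}, 30.0, False),
        Patient({"snp_b": 2, "snp_c": 2}, 8.0, True),
        Patient({}, 40.0, False),
    ]
    high, low = split_by_score(cohort, snps)
    for label, group in (("high", high), ("low", low)):
        print(label, kaplan_meier([p.time for p in group], [p.event for p in group]))
```

Grouping all subjects that share a time point before updating the risk set keeps ties handled the usual way: events at time t are counted against everyone still at risk at t, and censored subjects drop out only afterwards.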
To allow even better prediction of ara-C responses, we now wish to compare the predictive potential of our gene expression score with that of the 10-SNP and 3-SAMHD1-SNP scores, for which we need to call the variants from whole-genome sequencing data. Aim and goals The aim is to compare the performance of two published SNP-based scores with our gene-expression-based score in predicting treatment responses and outcomes in paediatric and adult patients with AML. The goals are to: 1) correlate the 10-SNP and 3-SAMHD1-SNP scores with event-free and overall survival in AML patients from the TARGET-AML and TCGA-LAML cohorts; 2) compare the performance of the 10-SNP and 3-SAMHD1-SNP scores with our gene-expression-based score; and 3) combine the 10-SNP and 3-SAMHD1-SNP scores with our gene-expression-based score, evaluating whether the scores are independent and whether their combination leads to even better outcome predictions. Design We will call the variants for the 13 SNPs described above from whole-genome DNA data in the TARGET-AML and TCGA-LAML cohorts. We will generate SNP scores based on these 13 published SNPs, group patients according to these scores, and perform survival analyses based on the Kaplan-Meier method as well as Cox regression analyses. We will then compare the performance of the two SNP scores with our already analysed gene-expression score and generate Cox regression models that take all three scores into account. Hettig, Andrea QIAGEN REDWOOD CITY, INC. TARGET Data Use in Genetic and Transcriptional Landscapes of Childhood Cancers Mar26, 2018 approved QIAGEN has developed advanced software and novel algorithms for analyzing next-generation sequencing (NGS) data, including the Omicsoft aligner (OSA) for mapping RNA-Seq reads and the novel gene fusion detection method FusionMap. We have gained significant knowledge from analyzing multiplatform cancer genomics data and have identified recurrent gene fusion and alternative splicing events across multiple tumor types. 
The research objective of analyzing the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data sets is to identify biomarkers and potential novel targets for drug intervention in pediatric cancers. It is well known that the types of cancer found in children are very different from those in adults. We plan to investigate differences in gene mutation and transcription within and across tumors, and in comparison with other public datasets. Our goal is to identify potential tumor drivers and improve our understanding of the mechanisms of tumorigenesis in pediatric patients. Identifying the genetic alterations and biological pathways is essential for understanding the initiation of cancer, how a tumor progresses, and why a treatment is effective for some patients but not for others. In this project, we plan to perform integrated data analysis using the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) datasets, and to identify genetic mutations, gene fusions and other potential driving factors that are unique to childhood cancers. Our goal is to identify clinical biomarkers for improved diagnosis and potential therapeutic targets for drug intervention in pediatric cancers. Hickman, Emma IMMUNOCORE, LTD Target discovery for TCR therapy of pediatric cancers Jun20, 2016 closed Immunotherapy is a highly promising new treatment method for many types of cancer, including some that have previously proved resistant to more conventional therapies. By re-directing the cells of a patient’s immune system to recognize genes that are selectively expressed in cancer cells, we have recently demonstrated clinical efficacy in uveal melanoma, a disease which is otherwise very difficult to treat. We would like to extend our work to investigate similar therapeutic agents for pediatric cancers, including indications such as Neuroblastoma. 
Because some of these tumors are relatively rare, it can be difficult to obtain study material in which to search for target genes for these diseases. Access to the TARGET dataset would dramatically increase our capacity to search for and identify these genes and would enable the discovery of new immunotherapies for pediatric cancers. In addition, the data would provide valuable information that would allow us to determine whether any of our existing agents could have potential therapeutic efficacy in Neuroblastoma. The objective of this study is to utilize the large transcriptomic data files generated by the TARGET consortium to assist in the identification of novel therapeutic targets for monoclonal TCRs, with the ultimate aim of developing novel immune therapeutics for childhood malignancies. Immunocore has previously developed ImmTACs (Immune Mobilising Monoclonal TCRs Against Cancer), a novel class of soluble bispecific immunotherapeutic agents consisting of a monoclonal TCR linked to an anti-CD3 single-chain variable fragment (scFv) effector. Precision-engineered TCRs possess very high affinity for target tumour antigens and have the ability to re-direct circulating cytotoxic T cells to effect tumour killing in patients. The first ImmTAC (IMCgp100) has recently shown encouraging signs of long-term clinical efficacy in both uveal and cutaneous melanoma. In parallel, we have developed internal expertise in the discovery of novel targets for TCR therapy using a combination of transcriptomics data and mass spectrometry. In order to extend our research to the treatment of pediatric cancers, we would like to access the raw mRNA sequencing data generated by the TARGET consortium, in particular the dataset for Neuroblastoma. 
Primarily, we wish to compare candidate target genes at the RNA expression level, using the power of the large datasets available from the TARGET study to ensure that we are selecting targets that will ultimately deliver the maximum benefit to patients. We also believe that there may be significant value in looking for tumor-specific splicing events and that this may highlight new antigens that we have previously overlooked. Hill, Jennifer NRC-INSTITUTE FOR BIOLOGICAL SCIENCES The identification of therapeutic targets in pediatric ALL Apr28, 2021 approved Acute lymphoblastic leukemia (ALL) is the most frequent cancer diagnosed in children. Despite the great progress achieved over the last 40 years in treating ALL, most children who relapse will succumb to their disease. This poor outcome reflects the lack of personalised treatments that specifically target high-risk and relapsed ALL. Hence, this project aims to address this issue by identifying relevant therapeutic targets for high-risk and relapsed ALL using the sequencing data generated by TARGET. Therapeutic antibodies will then be generated against potentially relevant targets in the hope of improving the prognosis of children with high-risk and relapsed ALL. Acute Lymphoblastic Leukemia (ALL) is the most frequent cancer diagnosed in children. Despite the great progress achieved over the last 40 years in treating ALL, most children who relapse will succumb to their disease. This poor outcome reflects the lack of personalised treatments that specifically target high-risk and relapsed ALL. Hence, this project aims to address this issue by identifying relevant therapeutic targets for high-risk and relapsed ALL using the following two approaches: 1) We will use the TARGET acute lymphoblastic leukemia (ALL) RNA-seq data to compare gene expression in cancer versus normal tissues to identify differentially expressed genes and isoforms for each ALL subtype. We have also assembled large datasets for multiple tumor types. 
This will allow us to identify genes that are highly specific for pediatric ALL. 2) Using functional genomics approaches, we will screen the genes identified in 1) using siRNA/shRNA/CRISPR to determine which ones are required for the survival and growth of pediatric ALL samples. The goal is to identify genes that we can potentially target with therapeutic antibodies. Hirst, Martin PROVINCIAL HEALTH SERVICES AUTHORITY Synovial sarcoma and MRT epigenetic comparison study Jul12, 2019 approved The term "epigenome" refers to chemical modifications of DNA and proteins that control genome activity. The genome remains mostly the same throughout an individual's life, whereas the epigenome changes during normal development. Deviations from these normal changes in the epigenome can have profound effects on which genes are turned on, causing cells to behave abnormally, sometimes leading to cancer. Synovial sarcoma is a common soft-tissue tumor in pediatric and adolescent patients. Like malignant rhabdoid cancers, synovial sarcomas are aggressive tumours characterized by genetic alterations that disrupt the same epigenomic remodeling complex. We have some understanding of the normal role of this complex, but do not fully understand how its disruption impacts the epigenome and the pathology of these cancers. By comparing our existing pediatric tumour datasets with the epigenetic and transcriptional datasets of MRTs (a subset of the TARGET study), we hope to identify similarities and gain a better understanding of the epigenetic deregulation that drives these pediatric cancers. Malignant rhabdoid tumours (MRTs) are characterized by genetic inactivation of SMARCB1, a member of the SWI/SNF chromatin-remodeling complex. MRTs belong to a family of cancers characterized by genetic lesions involving members of the SWI/SNF complex, which in the pediatric context are often the sole recurrent genetic alteration. 
Synovial sarcoma is another member of this family and is one of the most common soft-tissue tumors in adolescents and pediatric patients, with approximately one third of cases occurring in the first two decades of life. It is a translocation-associated sarcoma in which the underlying chromosomal event generates the SS18-SSX fusion protein that disrupts the SWI/SNF complex. Our group has been profiling the epigenome and transcriptome of a cohort of primary synovial sarcoma samples, and subsequent comparisons to reference epigenetic datasets support a mesenchymal/neural progenitor origin, similar to our observations for MRT (PMID: 26977886). In order to gain a better understanding of the underlying epigenetic landscape of pediatric tumours harbouring mutations in SWI/SNF complex members, we would like to compare these data against epigenetic and transcriptional datasets from MRTs. We will not be combining or merging these data with other datasets, and there will be no additional risks to participants. This analysis will increase our understanding of the mechanisms driving paediatric cancers harbouring genetic lesions in SWI/SNF complex members and potentially lead to novel treatment strategies. Hirst, Martin BRITISH COLUMBIA CANCER AGENCY Synovial sarcoma and MRT epigenetic comparison study Jul31, 2017 closed The term "epigenome" refers to chemical modifications of DNA and proteins that control genome activity. The genome remains mostly the same throughout an individual's life, whereas the epigenome changes during normal development. Deviations from these normal changes in the epigenome can have profound effects on which genes are turned on, causing cells to behave abnormally, sometimes leading to cancer. Synovial sarcoma is a common soft-tissue tumor in pediatric and adolescent patients. Like malignant rhabdoid cancers, synovial sarcomas are aggressive tumours characterized by genetic alterations that disrupt the same epigenomic remodeling complex. 
We have some understanding of the normal role of this complex, but do not fully understand how its disruption impacts the epigenome and the pathology of these cancers. By comparing our existing pediatric tumour datasets with the epigenetic and transcriptional datasets of MRTs (a subset of the TARGET study), we hope to identify similarities and gain a better understanding of the epigenetic deregulation that drives these pediatric cancers. Malignant rhabdoid tumours (MRTs) are characterized by genetic inactivation of SMARCB1, a member of the SWI/SNF chromatin-remodeling complex. MRTs belong to a family of cancers characterized by genetic lesions involving members of the SWI/SNF complex, which in the pediatric context are often the sole recurrent genetic alteration. Synovial sarcoma is another member of this family and is one of the most common soft-tissue tumors in adolescents and pediatric patients, with approximately one third of cases occurring in the first two decades of life. It is a translocation-associated sarcoma in which the underlying chromosomal event generates the SS18-SSX fusion protein that disrupts the SWI/SNF complex. Our group has been profiling the epigenome and transcriptome of a cohort of primary synovial sarcoma samples, and subsequent comparisons to reference epigenetic datasets support a mesenchymal/neural progenitor origin, similar to our observations for MRT (PMID: 26977886). In order to gain a better understanding of the underlying epigenetic landscape of pediatric tumours harbouring mutations in SWI/SNF complex members, we would like to compare these data against epigenetic and transcriptional datasets from MRTs. We will not be combining or merging these data with other datasets, and there will be no additional risks to participants. This analysis will increase our understanding of the mechanisms driving paediatric cancers harbouring genetic lesions in SWI/SNF complex members and potentially lead to novel treatment strategies. 
HOADLEY, KATHERINE UNIV OF NORTH CAROLINA CHAPEL HILL Integrative genomics analysis of cancer Jul27, 2023 approved The Center for Cancer Genomics is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). HCMI’s repository includes patient-derived next-generation cancer models, case-associated tumors and matched normal samples annotated with genomic and molecular data, from rare adult and pediatric cancers. The purpose of the AWG is to classify patient-derived next-generation cancer models into cancer subtypes, validate that the models preserve the biological characteristics of the parent tumor from which they were derived, and show the scientific community how these models could be used in functional genomics research. The HCMI AWG intends to use the TARGET dataset as a reference dataset to characterize the HCMI models/tumors from pediatric cases. The findings would be valuable for the research community as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. Cancer is a complicated disease with numerous cancer types both within and across different organs or tissues in the body. Our goal is to find groups of tumors that have features associated with clinical data such as risk factors, survival, or response to therapy. The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, validate that the models retain the biological characteristics of the parent tumor from which they were derived, and show the scientific community how these models could be used in functional genomics research. 
In order to map the HCMI tumors and models onto a cancer genetic taxonomy, the group will use methods such as (1) the Celligner algorithm to map tumors and models against reference datasets, (2) the OncoMatch approach to analyze regulatory networks and model fidelity, and (3) placement of each tumor and model in the subtypes classified by the Tumor Molecular Pathology (TMP) work group. The group will also analyze copy numbers, mutations, mutation signatures and structural variants from the whole-genome sequencing data of HCMI tumors and models and compare them against those of the reference datasets: TARGET (for pediatric cases) and TCGA (for adult cases). The findings from the AWG would be valuable for the research community as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. I am the PI of the Genome Data Analysis Center Specialized for RNA sequencing analysis. We will be providing transcriptome analysis. Current projects are re-analyzing data from TCGA. Our interests include using the transcriptome plus other data types to further classify and better understand cancer. In addition to unsupervised classification methods and gene signature development and application, we are also integrating the transcriptome with the other data types of TCGA. This includes subtype comparisons, fusion analysis and comparison with the DNA sequencing data, integration of RNA and DNA sequencing for mutation calling, SNP comparisons across data types with sequence information, effects of miRNA and DNA methylation on gene expression, transcriptome and protein interactions, and relationships with clinical and treatment information. Gene expression signatures, subtypes, or other molecular features identified in this cohort will be applied to external cohorts for validation. 
Alternatively, previously derived signatures, subtypes, and features will be tested in TCGA samples. We will be performing analyses within cancer types as well as across cancer types for pan-cancer analyses. We request access to the phs001175 CTSP DLBCL project. As part of the AWG, we will analyze RNA sequencing data and integrate it with other data types. In particular for DLBCL, there will be a focus on fusions. In addition, we might compare the data back to the TCGA pan-cancer set or to the ~30 DLBCL cases that had been sequenced in TCGA. We also plan to include other samples from this cohort previously sequenced in phs001444 (Genomic Variation in DLBC). We request access to the phs001486 HCMI project. We will be working as part of the analysis working group, primarily focusing on RNA expression analyses that compare the cancer models with their primary tumors, and will collaborate with other members of the AWG on integrated analyses. We request access to the phs001140 ALCHEMIST project. We will be part of the AWG and will classify the lung adenocarcinomas into molecular subtypes, compare them to TCGA samples, find correlates of response to therapy/treatment arms, and integrate RNA data with data from other GDAN centers. We request access to the phs002253 MILD project. We are members of the AWG and will focus on RNA analysis and data integration. Our goal will be to determine whether early lung cancers represent a unique type of lung cancer compared with later-stage tumors such as those in TCGA. We will combine these data with data from TCGA, and possibly also ALCH samples, depending on the time frame for primary analyses. We will work with other GDACs on the integration of multiple omics data. 
Holmberg, Johan KAROLINSKA INSTITUTE Identification and analysis of novel gene fusions in neuroblastoma Jun22, 2017 approved The TARGET Neuroblastoma dataset (accession number phs000218.v16.p2) contains more than 200 fully characterized neuroblastoma patient cases (matched normal/tumor pairs); whole-exome sequencing was performed in 221 cases and whole-genome sequencing in 18 cases, and paired-end RNA-seq was carried out on 10 of the whole-genome-sequenced cases. These whole-exome sequencing data, together with the paired-end RNA-seq data, will be ideal for fusion protein detection. We would like to analyze this dataset to find potentially interesting fusion proteins and to study their functions both in vitro and in vivo. Fusion proteins play a critical role in cancer development. We are interested in discovering novel protein fusion events in neuroblastoma. We would like to examine the function of these potential cancer-driving fusion proteins both in vitro and in vivo. The TARGET Neuroblastoma dataset (accession number phs000218.v16.p2) contains more than 200 fully characterized neuroblastoma patient cases (matched normal/tumor pairs); whole-exome sequencing was performed in 221 cases and whole-genome sequencing in 18 cases, and paired-end RNA-seq was carried out on 10 of the whole-genome-sequenced cases. These whole-exome sequencing data, together with the paired-end RNA-seq data, will be ideal for fusion protein detection. Paired-end RNA-seq data will be analyzed using available bioinformatics tools such as TopHat-Fusion and the R package Chimera. Whole-exome sequencing data could be analyzed with the Subread algorithm. Holmfeldt, Linda UPPSALA UNIVERSITY Multi-omics analysis of pediatric acute myeloid leukemia, with a focus on the subtype “not otherwise specified (NOS)” as well as relapse Jan17, 2020 closed Acute myeloid leukemia (AML) is a cancer of the bone marrow and blood that is associated with a rather dismal outcome. 
To improve the survival of pediatric AML patients, more knowledge is needed regarding the molecular basis of tumor onset, progression and resistance to therapy. In this project, we will investigate various genomic, epigenomic and transcriptomic alterations in pediatric AML cells at the time of initial diagnosis compared with relapse, as well as in a specific subtype of pediatric AML referred to as “not otherwise specified”, or NOS. Our studies include machine-learning-based analyses. This approach may help us to find alterations that are connected to therapy resistance. In addition, our studies may lead to the identification of novel biomarkers that can be used for new and less invasive diagnostics and improved risk stratification, all for enhanced survival and quality of life for pediatric patients diagnosed with AML. Acute myeloid leukemia (AML) is a cancer of the bone marrow, well known for its heterogeneity. Patients usually respond to initial chemotherapeutic treatment and reach complete remission. However, many of them relapse, and the relapse clones are very often resistant to therapy. Recent studies have shown that the patterns of genomic alterations found in adult AML differ from those seen in pediatric AML, and focused studies on pediatric AML are needed to further extend our knowledge about this disease. Studying the various types of genomic, epigenomic and transcriptomic alterations that potentially differ between diagnostic and relapse AML cells could uncover putative markers that may eventually lead to novel treatment alternatives. Further, there is still a large heterogeneous group of pediatric AML patients with none of the genetic aberrations currently included in the WHO classification system of AML. Additional detailed multi-omics studies are necessary to gain a better understanding of the underlying causes of leukemia formation in this subtype, referred to as “not otherwise specified” (NOS). 
In this study, we are utilizing rule-based machine learning models, applying them to in-house-generated whole-genome, whole-exome and RNA sequencing data as well as microarray-based genome-wide DNA methylation data from diagnostic and relapse material from 26 pediatric AML cases, and to diagnostic material from 60-70 pediatric AML NOS cases. Our rule-based models are transparent, which allows us to visualize them in the form of rule networks. These rule networks are not limited to highlighting the most interesting genes, but can also show the dynamics that govern, for instance, the transformation from diagnosis to relapse (or from treatment-sensitive to treatment-resistant disease, etc.) and the interactions between genes that lead to a certain condition. Access to the TARGET: Acute Myeloid Leukemia (AML) dataset with accession number phs000465.v21.p8, which is a sub-study of the Pediatric Cancer Research study phs000218.v24.p8, would allow us to extend our current rule-based machine learning models to a pediatric validation cohort, investigating specific subtypes such as AML NOS, as well as further patient-matched diagnosis and relapse specimens. In addition, we wish to combine appropriate subsets of the TARGET AML dataset with our (non-dbGaP-related) in-house pediatric relapse and NOS AML datasets, respectively, and after proper batch correction, investigate whether a larger starting cohort could allow the detection of further, rarer interactions. Combining the datasets will not create any additional risk to the study participants. In addition, we aim to utilize the TARGET AML dataset for recurrence screening of genomic, transcriptomic and epigenomic variants we have recently identified in relapsed AML (Stratmann et al., Blood Adv. 2021 and 2022; and Shahidi-Dadras et al., Unpublished data), as well as in pediatric AML NOS (Herlin et al., Unpublished data). 
We believe that our studies will advance the understanding of the underlying causes of pediatric AML onset, progression and therapy resistance, and lead to the identification of new biomarkers for improved risk stratification, as well as the development of novel therapeutic options, and thus result in a higher survival rate of pediatric AML patients. We intend to publish or otherwise broadly share the findings from our studies with the scientific community. Hong, Andrew EMORY UNIVERSITY Identifying molecular targets in childhood cancers Sep30, 2020 approved Sequencing efforts from the 2000s and 2010s have provided significant advances in our knowledge of cancers in adults and children. As we identify new mechanisms of biology in the laboratory from childhood cancer models (e.g., cells grown on plastic, as clusters of cells in 3D, or in mice), it is important to be able to see whether our results can be identified in children affected by cancer who have participated in these sequencing efforts. Furthermore, by looking at gene expression levels, one can also ask whether some of our hypotheses in the lab translate to patients. Our goal is to use the de-identified patient data to help us better understand the relevance of our findings in the lab. We will be using genomic data arising from TCGA, TARGET or other cancer efforts (e.g., Moonshot) to compare and contrast with laboratory models (e.g., cell lines, organoids, patient-derived xenografts) of primarily childhood cancers, including rhabdoid tumors, Wilms tumor, epithelioid sarcomas, rhabdomyosarcomas and other rare tumors that affect children. We plan to analyze specific mutations together (e.g., SMARCB1 in MRTs, CTNNB1 in WTs and RASopathies in RMS, all childhood cancers, and then more broadly across adult and other childhood cancers) at the exome or transcriptome level. 
We are trying to determine whether our findings in our pediatric cancer models can then be applied to patients who were part of these sequencing efforts. We will either combine the transcriptome datasets or focus on specific mutations; neither approach will create additional risks to participants. Because TARGET aimed to be comprehensive but ultimately did not study some subpopulations, we are looking to be inclusive by adding several other datasets (rhabdomyosarcoma, renal medullary carcinoma). Again, although we will analyze these datasets together, the analysis will be at the transcriptome level, so there is no increased risk to individuals. The ultimate goal of my lab (thehonglab.org) is to identify new therapeutic targets for cancers that affect children. We intend to publish the results and broadly share with the scientific community our findings from the analysis of these data, along with our studies in the laboratory. We will abide by the respective publishing guidelines and include the required acknowledgement statements for the respective projects. Houlston, Richard THE INSTITUTE OF CANCER RESEARCH: ROYAL CANCER HOSPITAL Gene Expression Changes in Wilms Tumour Sep12, 2024 approved Wilms Tumour (WT) is the most common childhood kidney cancer. Such cancers can be caused by a mutation in a gene that changes the protein the gene encodes. However, most cases of WT are not caused by these obvious mutations, but instead by mutations that subtly change when and how strongly genes are expressed. We have evidence for two groups of mutations in two regions of the genome that are strongly associated with WT. We now hope to find the genes whose expression is affected by these mutations, which would allow us to identify which genes are important to WT development. This could allow a better understanding of the disease, and improved identification and screening of at-risk children. 
Wilms Tumour (WT) is the most common paediatric kidney cancer, and its genetic causes have yet to be fully understood. Coding mutations in genes such as WT1 have been discovered mainly because of their penetrant phenotypes, but it is more challenging to identify the commonly occurring non-coding variants that increase WT risk. We have performed a meta-analysis of three genome-wide association studies for WT and identified two loci that are strongly associated with WT incidence. One locus on chromosome 2 centres on the essential RNA helicase and DNA repair gene DDX1. Due to the paucity of childhood gene expression data in both tumour and normal conditions, we have used adult expression quantitative trait loci (eQTL) to identify how our variants of interest affect gene expression. We would now like to perform our own eQTL analysis on tumour and/or normal tissue from the TARGET-WT study, to corroborate our finding in adult kidney that DDX1 expression is lowered in the presence of the alternative allele. This would give us confidence that DDX1 is the gene affected by our identified variant and give us a basis to investigate why lowered DDX1 expression is associated with WT incidence. We have also received anonymised genotype and RNAseq data from a group based at the Sanger Institute, covering tumour and normal tissue from WT patients, which indicate that DDX1 expression is indeed lowered in the presence of our variant. We aim initially to analyse the TARGET-WT data separately and compare the variance and expression values between the two datasets to determine whether there are significant differences, especially since TARGET-WT is selected for high-risk subtypes while the Sanger dataset is not. If there are no significant differences between the two datasets, we would like to perform eQTL analysis on the combined datasets to increase the number of samples and thus the statistical power.
We do not foresee any risks for participants, as the data are anonymised and will be segregated by allele type at the single locus of interest. Houlston, Richard UNIVERSITY OF LONDON INST OF CANCER RES Post GWAS Characterisation of Acute Lymphoblastic Leukaemia Associated SNPs Mar26, 2018 closed We intend to use TARGET data to identify novel drug therapies in childhood acute lymphoblastic leukemia. We intend to achieve this by investigating mutations in tumours and the biological pathways in which they occur. The main purpose of these analyses, aside from understanding how ALL occurs, will be to highlight as many new potential drug targets as possible. For example, where a gene implicated by our analysis is not inherently druggable, it may be possible to identify additional downstream drug targets through pathway analysis. This strategy will increase the probability of identifying novel drug treatments. We intend to use TARGET data to identify novel targets for drug therapy in pediatric acute lymphoblastic leukemia. We intend to achieve this by investigating recurrent somatic non-coding mutations that affect gene expression. Once we have identified such mutations, we will perform pathway analyses to determine whether these mutations and their associated expression changes cluster in defined biological/signalling pathways. The main purpose of performing pathway analysis, aside from informing our fundamental understanding of dysregulated pathways in tumorigenesis, will be to highlight potential novel drug targets. Where genes directly implicated by non-coding somatic mutations are not inherently druggable, it may be possible to identify additional downstream targets for intervention through pathway/network analysis. We will identify potential novel therapeutic targets using in silico methods, for example canSAR (https://cansar.icr.ac.uk/) and other similar bioinformatic resources.
This strategy will increase the probability of identifying targetable nodes in a dysregulated tumourigenic pathway. If such nodes are identified, we may move to appropriate in vivo systems to evaluate the feasibility and efficacy of drug intervention. To achieve this, we will use whole-genome sequencing of matched tumour/normal samples together with the corresponding RNA-sequencing data. Hsieh, Tien-Chan UNIV OF MASSACHUSETTS MED SCH WORCESTER Computational discovery of long non-coding RNAs as novel relapse risk biomarkers for pediatric acute lymphoblastic leukemia May30, 2024 approved Acute lymphoblastic leukemia (ALL) is the most common cancer in children. Doctors currently use several biomarkers to determine the best treatment plans and predict the likelihood of relapse. We are investigating a specific type of biomarker called long non-coding RNAs (lncRNAs). These molecules do not produce proteins but can play crucial roles in cancer development. Because lncRNAs often exhibit very specific patterns related to particular diseases and tissues, we hypothesize that they can be used as prognostic biomarkers. New lncRNA biomarkers may enhance current methods of predicting B-ALL relapse, helping to tailor treatment plans more effectively and avoid unnecessary toxic treatments. We plan to use the MP2PRT-ALL, TARGET-ALL, Genomic Analysis of Relapsed Pediatric Acute Lymphoblastic Leukemia, and relevant datasets from NCBI GEO to conduct methodological genetics research on long non-coding RNAs (lncRNAs) in pediatric acute lymphoblastic leukemia (ALL). These datasets will be analyzed as independent cohorts. Acute lymphoblastic leukemia (ALL) is the most prevalent pediatric cancer. LncRNAs are transcripts longer than 200 nucleotides that do not encode proteins and have emerged as promising biomarkers due to their highly disease- and tissue-specific expression patterns.
This study aims to (1) identify lncRNAs expressed in pediatric ALL patients by leveraging the MP2PRT-ALL, TARGET-ALL, and NCBI GEO datasets; and (2) identify and characterize lncRNA biomarkers for B-ALL relapse risk using advanced machine learning algorithms and in silico biological function analysis. There is no additional risk to the individuals in these datasets, and the study will focus on the clinical outcomes reported in these cohorts. Hu, Xiaowen UNIVERSITY OF PENNSYLVANIA Functional characterization of long noncoding RNAs in human cancers Oct25, 2018 closed lncRNAs are long (>200 nt) noncoding RNAs that are widely expressed in human tissues. They are key regulators in human diseases such as cancer. Thousands of lncRNAs have been identified as abnormally altered in the cancer genome or differentially expressed in tumor tissues. These lncRNAs are associated with imbalanced gene regulation and aberrant biological processes that contribute to malignant transformation, and thus can be promising diagnostic biomarkers and therapeutic targets for cancer. This study can provide valuable candidate cancer-associated lncRNAs as potential targets for cancer diagnosis and therapy. Sequencing technology has facilitated a new era of cancer research. Next-generation sequencing revealed that the human genome, including the noncoding regions, is pervasively transcribed. Up to 75% of the human genome can be transcribed into RNAs, whereas less than 2% encodes proteins. Much of the human transcriptome is composed of noncoding RNAs (ncRNAs). Long noncoding RNAs (lncRNAs), one of the major subtypes of ncRNAs (including antisense RNAs), are defined as RNA transcripts longer than 200 nucleotides without apparent protein-coding potential. lncRNAs are known to be associated with human diseases, especially cancer, and play important roles in tumorigenesis.
The requested data will help us comprehensively investigate the involvement of lncRNAs in cancer development, progression and response to treatment in different cancer types, including pediatric cancer. The data are an important resource for identifying lncRNA candidates that act as “drivers” in human cancer at a pan-cancer level. The data will also be used to validate molecular functions of lncRNA candidates in each cancer type. The requested pediatric datasets will only be used for identifying lncRNAs that are specifically functional in pediatric cancers. Datasets that have disease-specific restrictions will be used only for analysis of lncRNAs in the indicated cancer type. All requested datasets will only be used for biomedical research consistent with this proposed study, will not be combined with datasets of diseases other than cancer, and will not be combined with datasets outside of dbGaP. Findings from our study will be published and shared with the scientific community. Huang, Annie HOSPITAL FOR SICK CHLDRN (TORONTO) Evaluating Clinic-pathological significance of ATRT subtypes respective to non-CNS rhabdoid tumour Apr17, 2018 closed Atypical Teratoid/Rhabdoid Tumor (ATRT) is a rare and clinically aggressive childhood brain tumour that tends to affect children aged 2 or younger and has a poor outcome. ATRT is part of a larger family of rhabdoid tumors that can be found in numerous parts of the body, such as the kidney. By combining these tumours of different origins, our goal is to discover markers that could help to predict treatment outcome and to find new druggable targets. Atypical Teratoid/Rhabdoid Tumor (ATRT) is a rare and clinically aggressive childhood brain tumour that tends to affect children aged 2 or younger and has a poor outcome. In recent years, a number of publications have identified molecular subgroups of ATRT with unique clinical and molecular features.
However, there is a lack of integrative studies that combine multi-omics data types and correlate these molecular subgroups and their histopathological/clinical features with those of non-CNS rhabdoid tumour (RT). The aim of this project is to use established statistical and analytical methods to evaluate the clinicopathological significance of non-CNS RT tumours with respect to ATRT subtypes, and to elucidate the underlying disease-causing mechanisms. Comparing RT from different origins, both CNS and non-CNS, will allow us to develop a more refined understanding of RT in general and to identify novel prognostic/predictive markers and subgroup-specific gene signatures. To accomplish this goal, we are requesting access to the NCI TARGET Kidney Tumors, Rhabdoid Tumor dataset (phs000470). Finally, the data would be combined with publicly available pediatric datasets to increase statistical power. Data will be stored securely on the institutional high-performance computing cluster behind a firewall, and we do not anticipate any additional risks to participants. Huang, Benjamin UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Targeted Gene Expression Classifier Identifies Pediatric T-ALL Patients at High Risk for MRD Positivity May27, 2021 closed T-cell acute lymphoblastic leukemia is an aggressive form of childhood leukemia. Understanding who will be cured with chemotherapy (and who will not) is important to improving cure rates. We have developed a novel biological and predictive readout. Our specific goal in using this dbGaP dataset is to determine whether this novel readout correlates with the underlying biology of the associated leukemias. The heterogeneity of T-ALL has hindered biomarker identification and limited biology-based risk stratification. Historically, minimal residual disease (MRD) has been the strongest predictor of poor outcomes. However, stratification by MRD does not allow for risk-adapted therapy early in treatment, which may induce deeper remissions and decrease risk of relapse.
We hypothesized that gene expression profiling at diagnosis may have prognostic value in identifying high-risk patients. We have developed a gene expression classifier that differentiates a subset of non-ETP T-ALLs with an ETP-like gene expression pattern and a high risk of MRD positivity, and have adapted the classifier to a clinically tractable targeted platform. Identification of this high-risk subset at diagnosis has the potential to facilitate risk-adapted trials to evaluate the utility of novel or more intensive therapies aimed at improving clinical outcomes. Our specific goal in using this dbGaP dataset is to perform transcriptome-based TCR analysis to determine whether TCR rearrangements and developmental stage correlate with this novel biomarker, MRD status, and outcome. Huang, Frank circularRNA–miRNA–mRNA competing endogenous RNA network identifies circular RNA signature as a prognostic biomarker for AML and ALL May14, 2021 closed Circular RNAs are a class of RNAs that play a role in the progression of AML and ALL. We would like to identify circular RNAs associated with AML and ALL, and to identify circular RNA signatures capable of predicting overall survival of AML and ALL patients. Increasing evidence has underscored the role of circular RNAs acting as competing endogenous RNAs (ceRNAs) in the development and progression of tumors. Nevertheless, circular RNA biomarkers in circular RNA-related ceRNA networks that can predict the prognosis of AML and ALL are still lacking. The aim of our study was to identify potential circular RNA signatures capable of predicting overall survival (OS) of AML and ALL patients. Huang, Jing NIH Identify targetable genetic, transcriptional, and immunological alterations in osteosarcoma May13, 2020 approved Osteosarcoma (OS) is the second leading cause of cancer-related death in pediatric patients, and it mainly affects children and young adults.
Therapy for OS has stalled over the last three decades, and new therapies (targeted therapy and immunotherapies) are desperately needed. The stagnancy of OS treatment is partially due to the lack of druggable targets. In this proposed study, we plan to use the TARGET data to search for differences between human OS and normal tissues. The results from this study will help us identify druggable targets for treating this devastating pediatric cancer. Osteosarcoma (OS) is the second leading cause of cancer-related death in pediatric patients, and it mainly affects children and young adults. Therapy for OS has stalled over the last three decades, and new therapies (targeted therapy and immunotherapies) are desperately needed. The stagnancy of OS treatment is partially due to the lack of druggable targets. Our hypothesis is that by comparing datasets from human OS to normal tissues, we will be able to identify druggable targets for treating human OS. We plan to use the TARGET data to identify druggable genetic, transcriptional, and immunological alterations in human osteosarcoma. We plan to use these datasets as follows. Exome sequencing and whole-genome sequencing data (cancer versus normal) will be used to identify genetic alterations. RNAseq data will be used to identify transcriptional alterations and interrogate immunological features, such as estimated immune cell infiltration. We will focus on transcriptional changes in immune cells. After we identify all the alterations in human OS, we will use our mouse OS data as filters to identify alterations conserved between human and mouse OS. The identification of conserved alterations between human and mouse OS is the primary endpoint for this study. Please note that although we will use mouse OS data as a filter, the human OS data from TARGET and our mouse OS data will be analyzed separately and will not be combined. Therefore, this will not create additional risks for participants.
In addition, this study would not be possible without the pediatric data from TARGET, as TARGET is the only comprehensive genomic resource for identifying genomic and transcriptional changes in human OS. Huang, Jing NIH Examine the human relevance of p53-regulated genes in mouse mesenchymal stem cells Apr18, 2014 closed Osteosarcoma is a deadly bone cancer that mainly affects adolescents and young people. The 5-year survival rate of osteosarcoma remains at around 65%. The tumor suppressor p53 plays a critical role in osteosarcoma suppression. This study will use human osteosarcoma datasets to interrogate the relevance of data obtained from mouse mesenchymal stem cells with or without p53. The outcome of this study may provide insights into osteosarcomagenesis and help design novel p53-based strategies to target osteosarcoma. Objective of the proposed research: Examine the human relevance of p53-regulated genes in mouse mesenchymal stem cells. Study Design: The tumor suppressor p53 plays a vital role in osteosarcoma suppression. Li-Fraumeni syndrome patients have a high osteosarcoma incidence. Loss of p53 is also associated with a high frequency of osteosarcoma in mice. To study the role of p53 in MSCs, we have profiled the transcriptomes of p53 wild-type and p53 knockout mouse mesenchymal stem cells (mMSCs) and derived gene sets upregulated in p53 wild-type mMSCs and in p53 knockout mMSCs. There is emerging evidence suggesting that osteosarcoma is linked to mesenchymal stem cells (MSCs). Thus, we are motivated to investigate whether the identified mouse genes are dysregulated in human osteosarcoma cells using the multi-dimensional genomic datasets of osteosarcoma in TARGET. This study will advance our knowledge about the roles of p53 in osteosarcoma and the cell of origin of osteosarcoma. Analysis Plan: We plan to analyze this osteosarcoma dataset independently.
We will first compare the RNA-Seq data in normal and osteosarcoma tissue samples to identify differentially expressed genes and test whether the identified gene set significantly overlaps with the gene sets from our mMSC analysis. A similar analysis will be performed on patients with TP53 mutations identified from the exome sequencing data. Huang, Jinyan THE FIRST AFFILIATED HOSPITAL, ZHEJIANG UNIVERSITY SCHOOL OF MEDICINE RNA alternative splicing in leukemia. Jul22, 2021 approved The analysis will be performed using standard bioinformatic tools such as samtools, STAR, GATK, DEXSeq and transcriptR. We hope to find specific alternative splicing events associated with different cancers, which in turn may inform the development of novel therapeutics. All acquired data and the results of their processing will be stored behind a firewall on password-protected computers and will not be released to other investigators. The results of this project will be made freely available to the scientific community. We seek to perform differential allelic expression and RNA alternative splicing analysis using TARGET datasets. We are interested in studying the relationship between RNA alternative splicing and the diagnosis of childhood cancers. A previous study of acute lymphoblastic leukemia (ALL) found that DUX4 fusions were strongly associated with ERG alternative splicing. In this project, we are interested in how many genes express one or more distinct transcripts, and how gene alternative splicing regulates key steps of cancer initiation and progression in TARGET datasets. We will use TARGET datasets only for research projects conducted using pediatric data. RNA alternative splicing analysis has the potential to identify prognostic markers for childhood cancers that could be used in diagnostic tests.
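The Huang, Jing (2014) analysis plan above tests whether genes differentially expressed between osteosarcoma and normal tissue significantly overlap with the p53-regulated gene sets from mouse MSCs, but it does not name a test. A common choice for this kind of question is a one-sided hypergeometric test. The Python sketch below assumes that test, a shared universe of genes measured in both analyses, and that mouse genes have already been mapped to human orthologs; the function names `overlap_pvalue` and `overlap_from_sets` and the gene identifiers are hypothetical and are not part of the requesters' pipeline.

```python
from math import comb


def overlap_pvalue(universe_size: int, set_a_size: int, set_b_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    Assumed model (not stated in the request): set B is a random draw of
    set_b_size genes from a universe of universe_size genes, of which
    set_a_size belong to set A. A small p-value means the observed overlap
    is larger than expected by chance.
    """
    if not (0 <= set_a_size <= universe_size and 0 <= set_b_size <= universe_size):
        raise ValueError("set sizes must lie between 0 and the universe size")
    if not 0 <= overlap <= min(set_a_size, set_b_size):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    # Sum P(X = i) from the observed overlap up to the largest possible overlap;
    # math.comb returns 0 for impossible terms, so no extra bounds are needed.
    tail = sum(
        comb(set_a_size, i) * comb(universe_size - set_a_size, set_b_size - i)
        for i in range(overlap, min(set_a_size, set_b_size) + 1)
    )
    return tail / comb(universe_size, set_b_size)


def overlap_from_sets(universe: set, set_a: set, set_b: set) -> float:
    """Restrict both gene sets to the shared universe, then test their overlap."""
    a = set_a & universe
    b = set_b & universe
    return overlap_pvalue(len(universe), len(a), len(b), len(a & b))


if __name__ == "__main__":
    # Hypothetical toy data: 10 measured genes and two identical 5-gene sets.
    universe = {f"GENE{i}" for i in range(10)}
    de_genes = {f"GENE{i}" for i in range(5)}       # e.g. DE in tumour vs normal
    p53_orthologs = {f"GENE{i}" for i in range(5)}  # e.g. mapped mouse p53 targets
    print(overlap_from_sets(universe, de_genes, p53_orthologs))  # 1/252, about 0.00397
```

With SciPy available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should return the same tail probability for an overlap of `k`. If several mouse gene sets are tested (for example, the wild-type and knockout sets separately), the resulting p-values would normally be corrected for multiple testing.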
HUANG, Jinyan SHANGH RUIJIN HOSPITAL RNA alternative splicing in pediatric cancer Dec19, 2016 expired We will use the cancer genome data available in dbGaP to investigate RNA alternative splicing in pediatric cancer. We seek to study RNA alternative splicing in pediatric cancer with the TARGET dataset. We are interested in how gene alternative splicing regulates key steps of cancer initiation and progression in children. By analyzing next-generation sequencing data, we hope to define specific patterns of alternative splicing, such as the previously reported ERG alternative splicing. Specifically, our results will be correlated with the gene expression status of the particular cancer. Analysis will be performed using software such as R and MatrixEQTL. We hope to find specific patterns of altered splicing associated with distinct subsets of cancers, which in turn may inform the development of novel therapeutics. All acquired data and the results of their processing will be stored behind a firewall on password-protected computers and will not be released to other investigators. The results of this project will be made freely available to the scientific community. We do not anticipate any increased risk to the participants. We do not plan to combine the requested datasets with other datasets outside of dbGaP. We will strictly use these datasets for childhood cancer studies and will not analyze them with adult data. Huff, Chad UNIVERSITY OF TX MD ANDERSON CAN CTR Exploring Shared Susceptibility between Birth Defects and Pediatric Cancer May11, 2023 approved Congenital anomalies are defined as disorders that affect a child during pregnancy and can be identified before or at birth, or sometimes later in life.
In fact, one of the strongest risk factors for cancer in children and adolescents is being born with a congenital anomaly—this is true both for chromosomal abnormalities (like Down syndrome) and non-chromosomal birth defects (like congenital heart disease), as recently validated in our large multi-state epidemiologic study of over 10 million children. Specifically, by obtaining and linking information from various registries and hospitals on children in four states, we started the Genetic Overlap Between Anomalies and Cancer in Kids (GOBACK) Study. In the GOBACK Study, we identified several new congenital anomaly-cancer patterns. After conducting genetic testing in a subset of these families, we identified genes that possibly underlie previously unreported genetic syndromes that increase cancer risk. The objective of this application is to extend the GOBACK Study and identify new genetic syndromes that increase cancer risk in children using both epidemiologic and genetic approaches. To accomplish this, we propose to discover and validate genetic variants underlying the overlap of congenital anomalies and pediatric cancers. Globally, more than 250,000 children are diagnosed with cancer every year. In the United States, cancer remains the leading cause of death by disease in those <20 years of age, and approximately 80% of survivors have at least one chronic health condition by 45 years of age. One of the strongest risk factors for cancer in children and adolescents is being born with a congenital anomaly—this is true both for chromosomal abnormalities (e.g., Down syndrome) and non-chromosomal birth defects (e.g., non-syndromic congenital heart defects), as recently validated in our registry linkage study of over 10 million live births. It has been estimated that 10% of childhood cancer cases could be attributable to the risk associated with having a congenital anomaly. 
As an estimated 8 million children worldwide are born with a congenital anomaly each year (6% of all births), the public health implications of identifying why these children develop cancer are substantial. In this application, we will address the gaps that limit translational impact by characterizing the molecular etiologies underlying these observed associations, which has the potential to identify new and important cancer predisposition syndromes and the relevant genes among children with congenital anomalies. Previous examples of this include: 1) WT1 in children with aniridia, genitourinary anomalies, and Wilms tumor (WAGR syndrome); and 2) RECQL4 in children with limb anomalies, poikiloderma, and osteosarcoma (Rothmund-Thomson syndrome, RTS). There is also emerging evidence that archaic DNA may play a role in the overlap between several conditions, including birth defects and childhood cancer. Our overall research objective is to identify novel cancer predisposition variants by leveraging observed associations with birth defects. The specific aims of this study are: 1) identify genetic pleiotropy between birth defects and childhood cancer; and 2) characterize the role of archaic DNA in the risk of birth defects and childhood cancer. Our study designs will include the assessment of parent-offspring trios available in dbGaP, as well as case-control analyses. The analytic plan will build on our established bioinformatic pipelines to evaluate the role of de novo variants and rare variants consistent with autosomal recessive or X-linked disorders. We will leverage previously generated sequencing data from the National Institutes of Health (NIH)-supported Gabriella Miller Kids First (GMKF) Pediatric Research Program on congenital anomaly and pediatric cancer cohorts. We may use controls from previously sequenced populations in the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine for case-control analyses.
However, the HGSC has many years of experience in joint analysis of next-generation sequencing (NGS) data from different sources. Dr. Sabo (collaborator) was the HGSC lead on one of the first large-scale sequencing studies to successfully combine NGS data from two different capture methods, as well as different sequencing platforms. More recently, the HGSC has led efforts to create NGS data processing standards that allow different groups to produce functionally equivalent results and alleviate technical bottlenecks for genome aggregation. Additionally, to address issues related to cross-platform sequencing studies, Dr. Huff (collaborator) has recently developed the Cross-Platform Association Toolkit (XPAT). XPAT includes tools to support cross-platform-aware variant calling, quality control (QC) filtering, gene-based association testing, and rare variant effect size estimation. Briefly, for sample-level QC, XPAT infers sex from the ratio of homozygote to heterozygote counts on chromosome X for each individual and reports possible sex misidentification. XPAT uses NC90 to describe each individual's missing-genotype call level. NC90 is defined as the number of sites that are not called in a given sample among all sites called in over 90% of all samples from the same platform. NC90 thus provides a platform-specific sample missing-genotype rate that is restricted to well-behaved markers. The results from this study will be shared with the research community through peer-reviewed publications and scientific presentations at local, national, and international scientific meetings. Huff, Vicki UNIVERSITY OF TEXAS MD ANDERSON CAN CTR Wilms Tumor Dec01, 2011 closed Approximately 25% of children with Wilms tumor do not respond satisfactorily to current treatments.
Using tumor samples previously collected as part of a Wilms Tumor clinical trial, the proposed study will utilize a comprehensive set of state-of-the-art analyses to define molecular differences that delineate Wilms tumors that do not respond to treatment and/or that relapse. In addition to generating data sets that can be used by many cancer researchers, this work may enable patients with high-risk Wilms tumor to be identified at diagnosis and treated more effectively. Favorable Histology Wilms tumor (FHWT) represents the most common pediatric renal tumor. Although the overall outcome for FHWT is excellent, two groups of patients have suboptimal outcomes. First, approximately 15% of FHWT relapse despite therapy (RFHWT), and only 50% of RFHWT survive. Second, 8-10% of Wilms tumors develop histologic features known as anaplasia (which confers unfavorable histology, UHWT), which is associated with only 60% survival. Little improvement in the outcome of patients with RFHWT and UHWT has been seen in recent years, and very little is known about them biologically. A comprehensive and comparative analysis of these two groups would allow for the discovery of pathways that may be specifically targeted for therapy, and for the discovery of markers that would predict poor outcome and allow for therapeutic stratification. Such studies are also an efficient mechanism for surveying changes within Wilms tumors as a group. We therefore propose to interrogate the genomic, transcriptomic, epigenetic, and mutational characteristics of high-risk Wilms tumors treated on NWTSG/COG protocols. Aim One: To identify genetic mutations involved in the pathogenesis of Wilms tumor, and in the development of relapse and anaplasia: Using next-generation sequencing tools, we will identify mutations involved both in the underlying pathogenesis of Wilms tumor and in the development of high-risk features resulting in relapse and anaplasia.
Aim Two: To assess genomic gains and losses in RFHWT and UHWT: A high-density genetic platform (Affymetrix 6.0 SNP) will be utilized to survey for recurrent copy number variations in 50 UHWT and 50 RFHWT. These data will be compared with the expression data to identify genes and regions involved in relapse or anaplasia, and will be used to evaluate mutations identified in Aim 1. Aim Three: To define transcription and methylation patterns within RFHWT and UHWT: We will perform global gene expression analysis using Affymetrix U133+2 arrays and global DNA methylation analysis using an Illumina platform. These will be compared with genomic changes identified in Aims 1 and 2. The requested datasets will not be combined with datasets outside of dbGaP. Huo, Bing-Xing BROAD INSTITUTE, INC. Cancer Research Data Commons - Firecloud Jan30, 2024 approved We will continue to operate the FireCloud platform, allowing users to access TARGET, TCGA & other NCI datasets, as well as analytical pipelines. FireCloud is a Broad Institute project funded by NCI to empower cancer researchers to access data, run analysis tools and collaborate securely in the cloud. It is powered by Terra, a secure, scalable cloud-native platform developed by the Broad Institute, Microsoft, and Verily, an Alphabet company. FireCloud provides access to many large NCI-funded datasets including TARGET and TCGA, complemented by a rich collection of other datasets and collaborative projects that are part of the Terra ecosystem, including the Human Cell Atlas, the All of Us Research Program, the AnVIL, and many others.
Through FireCloud, researchers can leverage the powerful data analysis functionalities offered by Terra, including its scalable workflow execution system that can handle vast amounts of data processing through automated pipelines, as well as its highly flexible system of customizable cloud environments for interactive analysis and data visualization through preinstalled applications such as Jupyter Notebooks, RStudio, Galaxy, and IGV, and analysis frameworks such as Bioconductor and Hail. The platform offers access to a wide array of publicly available data analysis workflows and notebooks, many of which are available in fully loaded workspaces that include appropriate example data to demonstrate usage and ensure computational reproducibility. Researchers can also import their own tools, as well as community-sourced tools, through connected repositories operated by other collaborating organizations, such as Dockstore. Hwang, Tae Hyun CLEVELAND CLINIC LERNER COM-CWRU Developing novel biomarkers of FLT mutant patients to determine the optimal treatment strategies Aug19, 2021 expired Pediatric Acute Myeloid Leukemia (AML) with the internal tandem duplication mutation of FLT3 (FLT3mut) is a specific type of cancer that has a relatively poor outcome. Therefore, studies to develop new therapeutic strategies and identify improved biomarkers to predict outcomes are important. Our preliminary studies suggest that studying changes in RNA in these patients may help to reveal both new therapeutic targets and biomarkers for this disease. Through detailed analysis of 37 pediatric AML patients, we have identified candidate RNA molecules. Through the proposed work, we are seeking access to additional RNA datasets that can be used to validate and extend our observations. It is hoped that this work will lead to new biomarkers and therapeutic targets that can improve the management of pediatric AML patients.
Pediatric Acute Myeloid Leukemia (AML) with the internal tandem duplication mutation of FLT3 (FLT3mut) has a four-year progression-free survival of 31%. This mutation is one of the most common and recurrent mutations in pediatric AML, and it is associated with poor outcomes. Despite the poor overall outcomes, there is still a wide range of outcomes in pediatric patients with FLT3 mutations, making it challenging to determine the optimal treatment strategies. Therefore, the development of improved biomarkers is important. In addition to improved biomarkers, the identification of new therapeutic targets is also significant to aid in the development of new therapeutics. Due to the limited number of DNA mutations observed in AML, we hypothesize that RNA-based changes during AML progression contribute to disease progression independently of DNA clonal evolution. To begin to address this hypothesis, we used single-cell RNA sequencing on paired diagnosis/relapse AML patient samples to demonstrate that RNA cluster groups can evolve during AML progression. In our pilot study using 5 genetically heterogeneous adult and pediatric AML patients, we found compelling evidence for the existence of RNA cluster groups that expanded or contracted during progression from diagnosis to relapse in leukemic stem cells (LSCs). While this pilot study strongly supported the investigation of RNA-based changes during disease progression, the small sample size and genetically diverse patient set were not ideal for identifying key driver pathways for therapeutic targeting. To further investigate RNA-based changes during disease progression and also to identify prognostic biomarkers, we obtained 37 pediatric FLT3-ITD patient samples comprising 37 diagnostic samples from favorable-outcome patients and 18 paired diagnostic/relapse samples from poor-outcome patients from the Children’s Oncology Group. We performed single-cell RNA sequencing on both bulk AML and LSCs from all of these samples.
Initial analysis of this dataset has identified potential biomarkers and novel therapeutic targets for FLT3-ITD AML. To validate these findings, we are requesting access to existing RNA expression datasets from TARGET AML for additional pediatric AML samples that we have not yet analyzed. We will use CIBERSORTx, informed by our single-cell sequencing data, to deconvolve the existing bulk RNA sequencing datasets so that this analysis can be applied to them.
Iavarone, Antonio UNIVERSITY OF MIAMI SCHOOL OF MEDICINE Integrative analysis of genetic alterations in pediatric tumors Jun02, 2023 approved A mesenchymal phenotype is the hallmark of tumor aggressiveness in adult high-grade glioma (HGG). Global gene expression analysis of a limited set of pediatric HGG samples has confirmed that the mesenchymal signature also exists in these genetically poorly characterized tumors. To accurately define the aberrations that drive tumor initiation and progression in a larger cohort of pediatric patients with HGG, we will perform an integrative analysis of DNA copy number, gene expression and somatic mutations. Using the public and controlled-access data in the TARGET portal, we will determine which genetic aberrations and transcriptomic profiles are specific to, or shared among, different pediatric tumors (e.g., neuroblastoma). Our recent work identified and experimentally validated the function of the master regulators of the most aggressive, invasive phenotype of adult high-grade glioma (HGG), the mesenchymal phenotype. The direct follow-up of this work has already allowed us to identify and test specific inhibitors of the two master regulators (the Stat3 and C/EBP transcription factors), which may prove highly useful for new, targeted therapeutic approaches to HGGs.
Our recent findings from a limited set of pediatric HGG samples, for which we obtained global gene expression profiles, have confirmed that the mesenchymal signature also exists in pediatric HGG. However, accurately reconstructing cellular networks with state-of-the-art systems biology approaches requires a representative number of samples (> 200) to guarantee a wide dynamic range of the networks. We are working with a Pediatric Tumor cooperative project to collect and analyze the number of pediatric HGG samples needed to construct the master regulatory networks, following the strategy that succeeded for adult HGGs. The final experimental follow-up of the discovery of new genetic alterations in pediatric brain tumors will focus on validating the functional significance of these alterations in appropriate in vitro and in vivo models. More importantly, following the scheme that we are currently implementing for the master regulators identified in adult HGG, we anticipate that the successful completion of these projects will identify novel, pediatric glioma-specific pathways from which we can extract the most crucial targets for effective therapeutic intervention. This final aim will be greatly strengthened by determining which genetic aberrations and transcriptomic profiles are specific to, or shared among, different childhood malignancies. For this purpose, we plan to compare the results obtained from pediatric HGG with other very frequent pediatric tumors (e.g., neuroblastoma), and the TARGET data portal is an invaluable tool for this aim. This project is a continuation of Project #2527.
Iavarone, Antonio COLUMBIA UNIVERSITY HEALTH SCIENCES Integrative analysis of genetic alterations in pediatric tumors Apr22, 2011 closed A mesenchymal phenotype is the hallmark of tumor aggressiveness in adult high-grade glioma (HGG).
Global gene expression analysis of a limited set of pediatric HGG samples has confirmed that the mesenchymal signature also exists in these genetically poorly characterized tumors. To accurately define the aberrations that drive tumor initiation and progression in a larger cohort of pediatric patients with HGG, we will perform an integrative analysis of DNA copy number, gene expression and somatic mutations. Using the public and controlled-access data in the TARGET portal, we will determine which genetic aberrations and transcriptomic profiles are specific to, or shared among, different pediatric tumors (e.g., neuroblastoma). Our recent work identified and experimentally validated the function of the master regulators of the most aggressive, invasive phenotype of adult high-grade glioma (HGG), the mesenchymal phenotype. The direct follow-up of this work has already allowed us to identify and test specific inhibitors of the two master regulators (the Stat3 and C/EBP transcription factors), which may prove highly useful for new, targeted therapeutic approaches to HGGs. Our recent findings from a limited set of pediatric HGG samples, for which we obtained global gene expression profiles, have confirmed that the mesenchymal signature also exists in pediatric HGG. However, accurately reconstructing cellular networks with state-of-the-art systems biology approaches requires a representative number of samples (> 200) to guarantee a wide dynamic range of the networks. We are working with a Pediatric Tumor cooperative project to collect and analyze the number of pediatric HGG samples needed to construct the master regulatory networks, following the strategy that succeeded for adult HGGs. The final experimental follow-up of the discovery of new genetic alterations in pediatric brain tumors will focus on validating the functional significance of these alterations in appropriate in vitro and in vivo models.
More importantly, following the scheme that we are currently implementing for the master regulators identified in adult HGG, we anticipate that the successful completion of these projects will identify novel, pediatric glioma-specific pathways from which we can extract the most crucial targets for effective therapeutic intervention. This final aim will be greatly strengthened by determining which genetic aberrations and transcriptomic profiles are specific to, or shared among, different childhood malignancies. For this purpose, we plan to compare the results obtained from pediatric HGG with other very frequent pediatric tumors (e.g., neuroblastoma), and the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) data portal is an invaluable tool for this aim.
Imamura, Yuka PENNSYLVANIA STATE UNIV HERSHEY MED CTR Multi-omic gene network analysis of osteosarcoma Oct31, 2019 closed Changes in gene expression can point to functional alterations of the encoded proteins in diseased tissues. We will survey not only the abundance of each expressed gene but also its qualitative changes, including alternative splicing and gene fusions, in pediatric cancer compared with its paired control (normal tissue or cells from the same individual); such changes can also significantly alter protein function. 1-Objectives: Pediatric osteosarcoma (OS) is a highly complex disease. Using the TARGET multi-omics (genomics, transcriptomics, and epigenomics) dataset, we aim to stratify OS subtypes by identifying co-expressed gene networks and integrating multi-omic data modalities, including gene mutations (amplifications, fusions, indels, and missense mutations) and epigenetic changes (DNA methylation and microRNA changes).
2-Study design: Weighted Gene Co-Expression Network Analysis (“WGCNA”) is a robust network approach based on the premise that gene expression is organized into distinct co-expression (co-regulated) networks. These co-expressed gene networks (modules) reflect shared function and/or regulation and can reveal ‘hub’ genes that act as master regulators of their modules and may play a key role in disease etiology or in the stratification of disease subtypes. A recent study demonstrated successful gene network reconstruction by integrating a multi-omic dataset; this approach will be applied to the TARGET OS data in Aim 1 of our study. Alternatively, we will attempt deep learning-based disease stratification by constructing new features (disease classifiers) with a classic artificial neural network (“Autoencoder”). Unlike conventional supervised machine learning classifiers such as support vector machines and logistic regression, our deep learning approach will construct and select the new features itself. Autoencoder-based classification has been shown to outperform other supervised learning approaches in the survival stratification of high-risk neuroblastoma. With these cutting-edge yet scientifically rigorous bioinformatics approaches in place, we propose the 2 Aims below. 3-Analysis plan: Aim1: We hypothesize that highly correlated, multi-omic-inferred gene modules reflect functionally important genes for OS. Briefly, the multi-omic data (DNA mutation, RNA-seq, DNA methylation, and microRNA data) will be retrieved from the TARGET cohort, together with the clinical manifest data, and preprocessed into normalized matrices. Signed co-expression networks will be built with the WGCNA package in R. Network nodes will be filtered to exclude genes with spurious correlations. The mutation and epigenetic data will then be integrated into the network to refine the modules and build networks for stratifying disease subtypes.
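The signed co-expression network and ‘hub’ gene idea in Aim 1 can be illustrated with a minimal Python sketch. It assumes a samples × genes matrix of already-normalized expression and uses the standard WGCNA signed adjacency, a_ij = ((1 + cor_ij) / 2)^β, with intramodular connectivity as the hub criterion. The power β = 12 is an illustrative assumption (in practice it would be chosen from a scale-free topology fit, e.g. with WGCNA's `pickSoftThreshold`), the function names are hypothetical, and this is not a substitute for the WGCNA R package, which also handles topological overlap, module detection, and filtering.

```python
import numpy as np


def signed_adjacency(expr: np.ndarray, beta: int = 12) -> np.ndarray:
    """Signed soft-thresholded adjacency, a_ij = ((1 + cor_ij) / 2) ** beta.

    expr: samples x genes matrix of normalized expression values.
    beta: soft-thresholding power (12 is an illustrative assumption).
    """
    cor = np.corrcoef(expr, rowvar=False)  # gene-by-gene Pearson correlation
    adj = ((1.0 + cor) / 2.0) ** beta      # anti-correlated genes map toward 0
    np.fill_diagonal(adj, 0.0)             # ignore self-connections
    return adj


def hub_gene(adj: np.ndarray, module: list[int]) -> int:
    """Return the gene index with the highest intramodular connectivity."""
    sub = adj[np.ix_(module, module)]      # adjacency restricted to the module
    k_within = sub.sum(axis=1)             # connectivity of each module gene
    return module[int(np.argmax(k_within))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    driver = rng.normal(size=50)
    # Toy data: genes 0-2 follow a shared signal, gene 3 is independent noise.
    expr = np.column_stack([
        driver + 0.1 * rng.normal(size=50),
        driver + 0.1 * rng.normal(size=50),
        driver,
        rng.normal(size=50),
    ])
    adj = signed_adjacency(expr)
    print("hub of module [0, 1, 2]:", hub_gene(adj, [0, 1, 2]))
```

Because negative correlations are mapped toward 0 instead of being treated as strong links (as they would be with |cor| in an unsigned network), a signed network keeps positively and negatively co-expressed genes in separate modules.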
Aim2: Deep learning-based multi-omic data integration. WGCNA may not be ideal for stratifying disease subtypes because it relies solely on linear (e.g., Pearson) correlation between pairs of genes. Using an Autoencoder for deep learning-based multi-omic data integration, we hypothesize that we can robustly predict not only survival, as shown in high-risk neuroblastoma, but also other clinical phenotypes of OS, such as resistance to chemotherapy and metastasis. An Autoencoder is a dimensionality reduction method based on an artificial neural network consisting of input, hidden, and output layers. The Autoencoder data integration analysis will be implemented with the R package ANN2. To better capture properties that reflect the variety of patient prognoses, a classic Autoencoder with 3 hidden layers (500, 100, and 500 nodes, respectively) will be applied, and the 100-node bottleneck layer will be used to represent new features for further analysis. From these new features, we will select those associated with survival, chemoresistance, and metastasis status. The Autoencoder will be trained with the gradient descent algorithm for 10 epochs, with a batch size of 32 and a learning rate of 1e-6, and its predictive value will then be evaluated.
Imielinski, Marcin WEILL MEDICAL COLL OF CORNELL UNIV Complex structural variant patterns in pediatric cancer whole genomes Dec19, 2019 closed The genomes of pediatric cancers can be very complex, which is astounding given the relatively short time these tumors have to evolve. It is unclear how much of this complexity can be attributed to inherited defects in DNA repair, exposure to mutagens, or other factors (e.g., viral infection, retrotransposition). Our group has developed novel genome analysis tools, which we have applied across thousands of adult cancer samples to discover novel classes of complex structural variants.
We will use these tools to characterize novel mutational processes that shape the genomes of pediatric cancers, which can yield novel biomarkers and provide insight into the fundamental biology of these tumors. What mutational processes are responsible for the complexity of the genomes of certain pediatric cancers? We will apply state-of-the-art analytic "genome graph" approaches, which we have already validated on large cohorts of adult cancer whole genomes (Hadi et al. 2019 bioRxiv). These approaches allow us to systematically characterize complex structural variants, particularly those that simultaneously introduce both rearrangements and copy number alterations into somatic genomes. We will combine these data with datasets from the International Cancer Genome Consortium, the Hartwig Medical Foundation, and our internal collection efforts at Weill Cornell Medicine to assess the differences between adult and pediatric cancers, and we will use these findings to improve the whole genome sequencing-based interpretation of tumor samples. These analyses do not pose any additional risks to participants; in fact, they substantially increase the probability that the samples provided to this study will lead to concrete improvements in the treatment and diagnosis of pediatric cancer. Specifically, we plan to analyze whole genome sequences (WGS) from pediatric cancers to identify patterns of complex rearrangements and to associate these patterns with tissue of origin, somatic variant patterns at known cancer genes, and inherited DNA repair defects. We are specifically interested in the role of somatic viral integration, retrotransposition, and recombinases in driving the landscape of structural variation in pediatric cancer. We will approach this through local assembly and realignment of sequences associated with "loose ends" around structural variants.
To characterize complex structural variants (SVs), we will apply both in-house tools (JaBbA, gGnome, GxG, SvABA) and externally developed tools (GRIDSS, Lumpy, DELLY). We have recently published results from applying these tools to large pan-cancer cohorts (Hadi et al. 2019 bioRxiv) to nominate novel classes of complex rearrangements (tyfonas, pyrgo, and rigma) and to systematically characterize previously identified complex SV classes (chromothripsis, chromoplexy, breakage-fusion-bridge cycles, double minutes, templated-insertion chains) as well as simple SVs (deletion, duplication, inversion, inverted duplication). We are interested in the distribution of these variants in pediatric tumors, particularly rhabdoid tumors. In addition to calling these structural variant patterns, we will analyze these WGS cases for additional variants, including inherited lesions in known DNA repair genes (BRCA1/2, WRN, TP53, RB1, etc.) and acquired "driver" alterations. We will also use a genome graph approach (gGnome) to nominate novel complex rearrangement patterns through the analysis of graph motifs. We will further analyze these genomes for passenger mutational signatures, including SNV trinucleotide patterns and the burden of specific indel classes. To gain additional granularity into the interplay of SV and SNV signatures, we will assess the local sequence context of variants. We will also correlate SV patterns with additional phenotypic characteristics, including histopathology, stage, and survival. When available, we will correlate SV patterns with RNA-seq gene expression and methylation patterns (e.g., bisulfite microarrays and whole genome sequencing). We will use these analyses to determine whether pediatric tumor types harbor distinct structural variant patterns relative to adult cancers, which we have analyzed in our lab using harmonized pipelines to facilitate cross-study comparison.
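The SNV trinucleotide patterns mentioned above are conventionally summarized in the 96-channel single-base-substitution format, in which each substitution is reported from the pyrimidine (C or T) reference base together with its 5' and 3' neighbors. The Python sketch below shows that mapping and a simple spectrum count. It assumes the reference trinucleotide and alternate base have already been extracted on the + strand; the function names are hypothetical, and this is an illustration of the convention, not the authors' pipeline.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_channel(context: str, alt: str) -> str:
    """Return the pyrimidine-referenced channel of an SNV, e.g. 'A[C>T]G'.

    context: + strand reference trinucleotide (5' base, mutated base, 3' base).
    alt: + strand alternate base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base ACGT context and a single ACGT base")
    if context[1] == alt:
        raise ValueError("reference and alternate bases are identical")
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand by
        # reverse-complementing the context and complementing the alt base.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_spectrum(snvs: Iterable[tuple[str, str]]) -> Counter:
    """Count SNVs per trinucleotide channel."""
    return Counter(sbs96_channel(ctx, alt) for ctx, alt in snvs)


if __name__ == "__main__":
    # Toy calls: a CpG C>T, the same event seen on the opposite strand, a T>G.
    calls = [("ACG", "T"), ("CGT", "A"), ("TTA", "G")]
    print(sbs96_spectrum(calls))  # Counter({'A[C>T]G': 2, 'T[T>G]A': 1})
```

Collapsing each substitution onto the pyrimidine strand is what reduces the 192 possible strand-specific contexts to 96 channels, so a G>A on one strand and the complementary C>T on the other are counted together.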
The findings from this study could improve the basic understanding of the etiology of certain pediatric cancers, including identifying novel factors that trigger the key somatic genomic changes that drive these diseases. They could also yield novel prognostic and diagnostic biomarkers, which we may be able to translate into practice through the active clinical genomics program at our cancer center (Weill Cornell Meyer Cancer Center, Englander Institute for Precision Medicine).
Imoto, Seiya UNIVERSITY OF TOKYO Pan-cancer annotation of driver alterations: understanding their distribution and their clinical significance in adults and pediatrics Jun17, 2019 approved The functional impact of genetic alterations in different cancers is not fully understood, because genetic alterations may function differently in different cancers. The distribution of genetic alterations across cancers, and the relationships among them, are also not well understood. We therefore intend to analyze the functional impact of each driver alteration in pediatric and adult cancers using several independent approaches. We will also analyze the distribution of, and relationships among, these genetic alterations in each sample. We believe that this study will lead to the identification of therapeutic targets in both adult and pediatric cancers. Recent advances in high-throughput sequencing technologies have enabled us to systematically identify somatic mutations, copy number alterations, and fusion genes in various malignancies. However, the functional impact of these genetic alterations is not fully understood, since most of them are expected to be “passenger” alterations. There is an urgent need to accurately detect “driver” genomic alterations, and there is wide interest in comprehensively understanding the clinical significance of each driver alteration. This is sometimes difficult, however, because driver alterations may function differently in different cancers.
For example, the same driver alteration may have different functional effects in pediatric and adult cancers. We therefore intend to analyze the functional impact of each driver alteration in pediatric and adult cancers using several independent approaches (including our newly created pipeline). We also intend to analyze the distribution of driver alterations in each sample: some of these genetic alterations may coexist in the same sample, and some may be mutually exclusive. We will therefore analyze the distribution of, and relationships among, driver abnormalities in different cancers (including leukemia and B cell lymphoma). This study will allow us to understand the distribution and clinical significance of each driver alteration in different cancers and will lead to the accurate identification of therapeutic targets in both adult and pediatric cancers. To perform this research, we would like to use the data deposited by TCGA, TARGET, Foundation One, GTEx and several other groups. To achieve sufficient statistical power, combined analyses with other datasets outside of dbGaP are planned in the current project. We plan to analyze these datasets both independently and together. Integrating these data with the other datasets does not create any additional risk to participants. No request will be made for the identification of participants. We will publish or otherwise broadly share any findings from our study with the scientific community.
Imoto, Seiya UNIVERSITY OF TOKYO Genetic analysis of pediatric acute lymphoblastic leukemia Jul17, 2017 closed Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood. The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemo-resistant, long-term survival of these cases is still poor, and it is difficult to predict the risk of relapse accurately. In this study, we will identify the molecular mechanisms of relapse or treatment failure in ALL.
Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood. The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemo-resistant, long-term survival of these cases is still poor, and it is difficult to predict the risk of relapse accurately. In this study, we will perform whole-exome sequencing (WES), targeted sequencing, and RNA sequencing to identify the molecular mechanisms of relapse or induction failure in ALL. We would also like to analyze the genetic differences between cured and relapsed/refractory cases by combining our data with other ALL cohort datasets deposited in dbGaP, including WES, RNA-seq, and methylation data. The combined analysis of these datasets does not create any additional risk to participants. No request will be made for the identification of participants. We intend to broadly share any findings from public datasets with the scientific community.
Inoue, Ituro NATIONAL INSTITUTE OF GENETICS The analyses of Transposable Elements in neuroblastoma. Sep26, 2016 approved Neuroblastoma (NB) is a malignant tumor that affects infants and toddlers. It commonly arises in or around the adrenal glands. Current treatment of NB is inadequate, and many children die of this cancer. The genetic causes of this cancer have been studied intensively, but no common or major cause of this neoplasm has been found. Transposable Elements (TEs), or transposons, are curious “parasites” of the human genome that occupy about half of it. TEs are thought to be remnants of ancient viruses and share many characteristics with retroviruses. TEs can copy themselves and insert, or transpose, into the human genome, so their number has increased over the course of human evolution. TEs sometimes cause genetic disease by inserting into genes. Detecting TEs in an individual human genome is very difficult because of their sequence similarity and their huge copy number (over 1 million copies in a human genome).
Many computational tools have been developed to detect TEs in next-generation sequencing data, but the importance of TEs in tumorigenesis is unknown. We want to reveal the influence of TEs in NB. The molecular causes of neuroblastoma (NB) are not well understood. Whole-exome studies of NB have been conducted but have not found a common or major cause of this neoplasm. Transposable Elements (TEs) are mobile DNA sequences that occupy about half of the human genome. TEs exist in almost all organisms, from unicellular life to primates. The function of these elements is mostly unknown. TEs can be harmful when they transpose into specific genes. TEs are also known to reshape the genomic landscape of some neoplasms, but the influence of these landscape changes on tumorigenesis is not understood. TE transposition is thought to result from genomic instability, a common feature of tumors. Solid tumors tend to have multiple new TE insertions, whereas myelogenous neoplasms have few, which suggests that some as yet unknown mechanism drives TE transposition. There are many reports of TE transposition causing diseases such as neurofibromatosis (NF1), breast cancer (BRCA2), and colon cancer (APC), but few reports of malignancies caused by TE insertions. This could be due to the limitations of technology for detecting TE insertions (TEIs). Almost all TEIs reported to date were detected with a candidate-gene approach, so a TEI can be detected only when the causal gene is examined. This approach is not suitable for diseases whose causal genes are unknown. Genome-wide TEI detection using next-generation sequencing (NGS) has been published, but this approach has not been applied to each type of malignancy. We published a paper on the systematic detection of TEIs in NGS data from epithelial ovarian cancers. These TEIs were not detected by conventional NGS analysis pipelines, so it is important to re-evaluate NGS data with algorithms specialized for detecting TEIs.
We intend to apply our pipeline and published method to detect TEIs in neuroblastoma and elucidate the molecular causes of the tumor. In addition, we found that the expressed TEs differ among NB subtypes, and we have begun to characterize these TEs. We also plan to compare the expression patterns of TEs in stage 4 and stage 4S tumors.
Itzykson, Raphaël NAT'L INST OF HLTH/MED RES INSERM-PARIS Clonal Architecture in AML Nov24, 2020 expired In one third of patients with core-binding factor (CBF) acute leukemias, several clones of leukemic cells with mutations in genes involved in signaling pathways coexist. The presence of multiple signaling clones within a patient is associated with higher rates of relapse after chemotherapy. In other AML subtypes, the massive expansion of a single clone at diagnosis is associated with a poorer prognosis. We want to understand why these specific clonal architectures in AML correlate with chemoresistance and to identify specific vulnerabilities to improve the treatment of those patients. To address these questions, we want to analyze the genomic, transcriptomic and functional data of the TARGET and BEAT AML cohorts and combine them with our own data to identify specific vulnerabilities in these severe acute leukemias. We recently reported in acute myeloid leukemia (AML) that specific clonal architectures, namely clonal interference (Itzykson et al. Blood 2018) and clonal dominance (Cerrano et al. Leukemia 2020), have a strong independent impact on the prognosis of patients treated with intensive chemotherapy. By combining genomic, transcriptomic and, when available, functional data, we aim to uncover the mechanisms of chemoresistance in these specific clonal architectures and to identify targetable vulnerabilities in those patients.
We request access to the pediatric TARGET AML cohort because clonal interference is particularly frequent in core-binding factor (CBF) AMLs, a subtype of AML that affects children. Indeed, the TARGET AML cohort includes 255 patients with CBF AMLs, with extensive genomic annotation. We want to identify the genomic characteristics of AML patients with clonal interference, to understand why these patients have higher rates of relapse after chemotherapy, and to identify biological vulnerabilities in order to design targeted treatments that cure these high-risk patients. We also request access to the BEAT AML cohort because it is one of the largest adult AML cohorts available with genomic, transcriptomic and functional annotations. Again, we want to characterize patients with clonal interference and clonal dominance in order to identify druggable vulnerabilities for treating these high-risk patients. To address this, we will 1) analyze the genomes and exomes of AML patients to identify those with clonal interference or clonal dominance; 2) analyze RNA-seq data to identify differentially expressed genes and enriched gene signatures in patients with these particular clonal architectures; and 3) correlate these results with outcome (overall survival and event-free survival) and functional data to identify specific targetable vulnerabilities that could improve the treatment of these patients. Data will be analyzed securely using our in-house pipelines, and the results will be compared with our own data. Adult cancer data will not be combined with the pediatric data.
Jamieson, Catriona UNIVERSITY OF CALIFORNIA SAN DIEGO Reversion to an Embryonic Alternative Splicing Program Enhances Leukemia Stem Cell Self-renewal Dec03, 2015 expired We will study acute lymphoblastic leukemia (ALL) and the self-renewal of stem cells in ALL by identifying, in RNA sequencing datasets, changes in the expression of genes involved in self-renewal between control samples and ALL samples.
Because 75% of ALL cases occur in children, we will use two pediatric ALL datasets, the TARGET ALL Study and the Hyperdiploid Acute Lymphoblastic Leukemia RNA-Seq datasets. We will then visualize the changes in gene expression to better understand ALL and to choose genes to target with therapies. We will proceed with the original project and will not combine datasets. We have been conducting research to understand stem cell self-renewal in adult and pediatric acute leukemias. This includes measuring gene expression in self-renewal pathways, including RNA and DNA editing rates driven by ADAR1 and APOBEC, in order to arrive at a mechanistic understanding of leukemia stem cell self-renewal and to identify potential therapeutic targets. There is an unmet need to study the molecular evolution of acute leukemia in pediatric samples in order to formulate targeted therapeutic strategies and pediatric stem cell diagnostics. To this end, we plan to study RNA sequencing (RNA-seq) data from pediatric samples in adult and pediatric leukemia datasets, including the TARGET ALL Study and the Hyperdiploid Acute Lymphoblastic Leukemia RNA-Seq datasets. These datasets will be examined for the expression of genes involved in stem cell self-renewal and related processes. Differences in RNA-seq library preparation between these datasets will have to be accounted for and normalized to reduce the risk of bias when combining them; no data will be exchanged between primary data source hosts. Once gene expression is quantified, differential expression analyses will be performed against publicly available and in-house control datasets, and the results will be visualized with GENE-E, Cytoscape and similar tools to facilitate mechanistic understanding and therapeutic target selection. We have identified global RNA splicing alterations as well as an isoform of CD44 that appears to confer LSC transplantation capacity and therapeutic resistance.
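The normalization and differential-expression step described in the Jamieson statement can be sketched in Python as library-size normalization (counts per million, CPM) followed by a per-gene log2 fold change between ALL and control samples. This is a simplified illustration under stated assumptions: a genes × samples matrix of raw counts, a pseudocount of 1, toy numbers, and hypothetical function names. It corrects only for sequencing depth, not for library-preparation differences, and it does not test significance; a real analysis would use a dedicated method (for example edgeR, DESeq2, or limma) with preparation batch modeled as a covariate.

```python
import numpy as np


def cpm(counts) -> np.ndarray:
    """Counts per million: divide each sample (column) by its library size."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0, keepdims=True)  # total reads per sample
    return counts / lib_sizes * 1e6


def log2_fold_change(counts, case_cols, control_cols, pseudocount=1.0):
    """Per-gene log2(mean CPM in cases / mean CPM in controls)."""
    norm = cpm(counts)
    case_mean = norm[:, case_cols].mean(axis=1)
    ctrl_mean = norm[:, control_cols].mean(axis=1)
    # The pseudocount avoids division by zero for genes absent in one group.
    return np.log2((case_mean + pseudocount) / (ctrl_mean + pseudocount))


if __name__ == "__main__":
    # Rows are genes, columns are samples (two ALL, then two control); toy data.
    counts = np.array([[10, 20, 5, 5],
                       [90, 180, 95, 95]])
    print(log2_fold_change(counts, case_cols=[0, 1], control_cols=[2, 3]))
```

Scaling by library size makes samples sequenced to different depths comparable (samples 0 and 1 above end up identical after CPM), but it does not remove the systematic biases that different library-preparation protocols introduce, which is why those need separate handling.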
Janga, Sarath Chandra INDIANA UNIV-PURDUE UNIV AT INDIANAPOLIS Role of lncRNAs in pediatric cancer genomes Mar05, 2017 closed Long noncoding RNAs (lncRNAs) play an important role in normal physiological and disease conditions. This developmentally important class of RNAs was once considered to consist of artifacts or redundant RNAs with minimal or no transcriptional or translational role. With the advent of RNA-sequencing techniques, a large amount of lncRNA sequence data is now available to researchers, and more and more lncRNAs are being found to play vital roles in gene regulation. However, the roles of a large pool of lncRNAs remain unexplored. In our study, we would like to profile lncRNAs to understand their role in tumorigenesis and progression in pediatric cancer genomes. We will profile the expression of lncRNAs to understand their behavior in different cancers and to help identify prognostic markers. This study should therefore help us uncover the diverse functions of lncRNAs in various cancers. The role of various non-coding RNAs involved in post-transcriptional modifications is becoming evident in various disease mechanisms, especially in the developmental context. Although dysregulated expression of lncRNAs has been observed in various pediatric cancers, how the changes in their expression relate to oncogenic function remains unclear. In our research, we want to investigate the role of lncRNAs in tumorigenesis and tumor progression in pediatric cancers and to identify potential biomarkers.
The TARGET/dbGaP datasets, which currently include sequencing datasets from the ALL, AML, NBL, OS and WT cancer types, could be used to understand a) the differential expression patterns of lncRNAs in the TARGET cohort of cancer samples sequenced so far, b) the effect of lncRNA dysregulation on their target genes and its implications for tumor development and progression, and c) interactions of lncRNAs with other ncRNAs, which can give us comprehensive knowledge of gene regulatory mechanisms. To facilitate the study, we would like to use raw RNA-Seq data from pediatric cancers. We will analyze the splicing patterns and copy number variations of lncRNAs across various cancers to uncover potential biomarkers. We will also profile lncRNA expression and correlate it with the expression of their targets to understand the function of lncRNAs in the context of developmental gene regulation. Further, the gene regulatory mechanisms mediated by lncRNAs could be compared between cancer and normal tissues to understand their oncogenic role. As mentioned, our study is not limited to cancer of a specific pediatric tissue or organ. We wish to study the role of lncRNAs globally across all these cancer types and to perform a comparative analysis. The RNA-Seq data of all cancer types in TARGET/dbGaP could therefore be an excellent resource for shedding light on unexplored areas of lncRNA-mediated gene regulation in pediatric developmental cancers.
Jensen, Mark LEIDOS BIOMEDICAL RESEARCH, INC. Deep learning to identify genomic determinants of pediatric cancer Apr23, 2019 closed One of the most important things we are learning about cancer is that it is not one disease, but many. Many of the differences between kinds of adult cancer are due to different genetic errors that appear in tumors from organ to organ and from person to person.
Cancer in children is itself very different from cancer in adults, but less is known about what those genetic differences are and how those errors cause cancer. In this work, we want to use data from several thousand children with cancer to locate those errors using new computer algorithms and powerful computers. We want to share what we learn with scientists who can then research how these errors cause cancer, and whether those errors can be repaired or avoided with drugs. Pediatric cancer genomes differ from adult cancer genomes in significant qualitative and quantitative ways. For example, while adult tumors of many cancer types are known to possess so-called "driver mutations" within specific genes, pediatric tumors tend to have very few somatic mutations of any kind. Other genomic features, such as gross structural changes or copy number alterations, are present in some pediatric tumors and have been shown or suggested to have driver-like properties. However, much less is known about the general genomic features that drive pediatric cancer. The TARGET genomic dataset is ideal in many ways for a broad search for general features of pediatric cancer genomes, as well as for as-yet undiscovered alterations that may be further explored in vitro for their potential as cancer drivers or therapeutic targets. Most studies of TARGET data to date have used traditional bioinformatic algorithms and heuristic approaches to identify candidate driver features. We propose to combine such approaches with machine learning and deep learning neural network analysis. We believe these tools, employed on high performance computing resources available to us (such as the NIH BioWulf cluster), can feasibly analyze a large proportion of the data across the 5906 TARGET individuals.
The large number of cases (>1000) greatly improves the ability of deep learning algorithms to discriminate between classifications (tumor vs normal genomes, for example) and, importantly, to identify the features that contribute significantly to correct classification. These features become the candidates that we hope to report and explore further in vitro. We believe the risk to patient privacy to be very small. While we intend to report features that we find to be associated with pediatric cancer of different types, those features themselves will be the result of mathematically processing the data of many individuals, such that any idiosyncratic genomic changes would be highly obscured. In addition, we are proposing to analyze only somatic differences, and we expect to filter any set of candidate somatic mutations to exclude germline mutations before any analysis. We do not intend to combine TARGET data with any other dataset for this study. Jeong, Jong Cheol UNIVERSITY OF KENTUCKY Somatic variations and Genetic Rearrangement features of high-risk Neuroblastoma: Unraveling DNA repair mechanisms association Apr18, 2024 approved Neuroblastoma (NB) is a cancer affecting young children, often under 5 years old. It spreads rapidly, making treatment challenging. High-risk NB is aggressive and harder to cure, with poor prognosis even with intensive treatment. Clinical trials are crucial for discovering new treatments, especially for severe cases. Recent research has identified specific genetic changes driving tumor progression, leading to targeted therapies that may be used alongside or instead of traditional chemotherapy. High-risk NB patients often exhibit genomic instability, contributing to cancer progression and drug resistance. Our goal is to study these genetic traits and develop personalized therapies, potentially improving outcomes and reducing side effects for children with NB. 
Neuroblastoma (NB) is one of the most lethal pediatric solid tumors, characterized by significant clinical and genetic diversity. Despite advances in intensive therapy, the survival rate for the high-risk (HR) subset remains below 50%. Clinical heterogeneity primarily impacts NB prognosis, with genomic alterations such as MYCN amplification, large deletions of chromosome 11q, and gain of chromosome 17q serving as markers for HR-NBs. Although the precise mechanisms underlying large-scale structural variations (SVs) in HR-NB remain unclear, identifying genetic features, including somatic SVs and copy number variations (CNVs), associated with disease progression and therapy resistance may reveal potential drug targets. Aberrant DNA repair pathways can drive tumor progression and therapy resistance by promoting ectopic DNA repair, escalating mutation burden, loss of heterozygosity, and chromosome rearrangements, all of which contribute to tumor development and progression. In HR-NBs, deficiencies in DNA repair pathways, particularly homologous recombination and nonhomologous DNA end joining (NHEJ), are implicated in segmental chromosomal instability, which is linked to poor prognosis. Therapeutic approaches in NB treatment, such as radiotherapy, induce DNA damage in tumor cells, yet resistant tumors often exploit the DNA damage response (DDR) pathway to repair this damage, reducing treatment efficacy. Combining DNA-damaging therapies with selective inhibitors of the DNA repair cascade is a promising strategy to sensitize tumors and overcome therapy resistance. Our study focuses on the role in NB of the PRKDC gene, a key component of the NHEJ DNA repair pathway. PRKDC encodes DNA-dependent protein kinase (DNA-PK), which regulates responses to DNA double-strand breaks induced by radiation and chemotherapy agents. Our data demonstrate that NB cell survival depends on PRKDC activity, and that its inhibition induces apoptosis and increases DNA damage.
In vitro studies using Peposertib (Pep), a selective DNA-PK inhibitor, show decreased IC50 values of Etoposide (ET) and increased DNA damage and apoptosis with the combination therapy. This highlights DNA-PK inhibition as a promising strategy for HR-NB treatment. Immunohistochemistry staining of NB tissue arrays reveals elevated PRKDC expression in a subset of NB patients, suggesting its potential as a therapeutic target. A comprehensive analysis of patient data aims to identify genomic alterations associated with increased PRKDC activity, potentially introducing Pep as a new therapeutic option for HR-NBs. Clinical trial studies have shown Pep's efficacy in inhibiting DNA-PK protein activity and its oral tolerability, further supporting its potential in HR-NB treatment. Through this research, we aim to identify NB-specific genomic features linked to DNA-PK activity, aiding in patient stratification and treatment selection. Jiang, Qingfei UNIVERSITY OF CALIFORNIA, SAN DIEGO Profiling Epitranscriptomic RNA editing in Pediatric Cancer Jan18, 2024 approved Relapsed pediatric T-cell acute lymphoblastic leukemia (T-ALL) is often refractory to conventional therapy and is associated with a dismal survival rate of less than 25%. Adenosine deaminase acting on RNA 1 (ADAR1) mediates the conversion of adenosine (A) to inosine (I) in the mammalian transcriptome. Malignant ADAR1 activation and over-editing have been extensively reported in adult cancer types. As a result, there is intense interest in understanding the mechanisms by which ADAR1-directed A-to-I RNA editing regulates gene expression, and how these editing events influence tumorigenesis. However, the global landscape of A-to-I RNA editing in pediatric cancer has not been systematically characterized.
In this proposal, we will 1) define the ADAR1-controlled A-to-I RNA editing landscape in 1,304 T-ALL patients by combining the Kids First and NCI TARGET datasets, 2) identify novel RNA editing events that predict disease outcome, and 3) compare the RNA editing landscapes in various molecular subtypes to reveal any critical link between RNA editing and genetic background. Our preliminary studies and the proposed work together will provide the first complete A-to-I RNA editing landscape in T-ALL, which will be shared within the pediatric research community. Relapsed pediatric T-cell acute lymphoblastic leukemia (T-ALL) is often refractory to conventional therapy and is associated with a dismal survival rate of less than 25% [1-3]. Thus, the development of novel therapies for relapsed T-ALL represents an urgent unmet medical need in children. Adenosine deaminase acting on RNA 1 (ADAR1) plays a key regulatory role in the innate immune response, hematopoietic stem cell maintenance, and cancer [4]. ADAR1 catalyzes the conversion of adenosine (A) to inosine (I) in precursor double-stranded RNAs (dsRNAs), which are extensively detected in the mammalian transcriptome [5-7]. Malignant ADAR1 activation and transcriptome-wide over-editing have been reported in all major adult cancer types [4,8-10]. As a result, there is intense interest in understanding the mechanisms by which ADAR1-directed A-to-I RNA editing regulates gene expression, and how these editing events influence tumorigenesis. However, the global landscape of A-to-I RNA editing in pediatric cancer has not been systematically characterized. Filling this knowledge gap will allow mechanistic and functional studies of these RNA editing modifications that can ultimately aid in formulating new therapeutic and preventive strategies. Our RNA editing analysis of over 260 T-ALL patients (TARGET) revealed widespread A-to-I RNA mutations in the relapsed T-ALL cohort.
These hyper-editing RNA “mutations” result in activation of self-renewal genes and fuel therapeutic resistance. These discoveries were confirmed by functional studies demonstrating that inhibition of ADAR1 impairs LIC self-renewal and prolongs life expectancy in mouse models. These discoveries need to be validated in a large cohort of T-ALL patients to further delineate the critical RNA editing “mutations” associated with relapse. In this proposal, our overall goal is to leverage the large sample size of the Kids First Program to fully understand the heterogeneous RNA editing landscape in T-ALL pathogenesis. Our central hypothesis is that ADAR1 promotes unique A-to-I RNA editing changes in T-ALL that drive disease relapse and therapeutic resistance. Aim 1: Define the ADAR1-controlled RNA editing landscape in T-ALL. In this aim, we will combine the Kids First and TARGET cohorts (1,304 patients) and perform transcriptome-wide profiling of A-to-I RNA modifications. We will compute the overall A-to-I RNA editing level, individual editing sites, editing intensity, and hyper-editing targets of ADAR1 to identify novel RNA editing alterations in pediatric T-ALL. Aim 2: Identify clinically relevant RNA editing events in T-ALL. An important question is which RNA editing events have critical functional and clinical implications. In this aim, we will perform correlation analyses to determine associations between RNA editing sites and patient survival/relapse to identify recurrent RNA editing events that predict poor outcome in T-ALL. Aim 3: Examine whether the RNA editing landscape varies with the genetic classification of T-ALL. We will examine differences in A-to-I editing events based on the heterogeneous genetic network, and hyper-editing of tumor suppressors or oncogenes, across T-ALL molecular subtypes.
Our preliminary studies and the proposed work together will provide the first complete A-to-I RNA editing landscape in T-ALL, which will be extensively shared within the pediatric research community. In addition, we will provide new insights into the mechanisms and functions of ADAR1 in T-ALL pathogenesis and will substantially advance our understanding of epitranscriptomic regulation in pediatric malignancies. If successful, this work will provide a comprehensive evaluation of the RNA editing network that confers advantages for leukemia expansion, and will identify RNA hyper-editing events that may serve as attractive therapeutic targets. Jones, Steven PROVINCIAL HEALTH SERVICES AUTHORITY Accessing TARGET data for personalized oncogenomics in pediatric patients Jul19, 2019 approved The Genome Sciences Centre sequences the tumours of cancer patients and delivers a personalized analysis of the tumour to help inform treatment. The ability to compare a tumour to other tumours of the same type helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug. As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations for the pediatric arm of our program in order to help with the analysis for our pediatric patients, many of whom have very rare, untreatable tumours. The Genome Sciences Centre performs analysis of whole genome sequencing of tumours and matched normals and whole transcriptome sequencing of tumours from pediatric patients to identify putative drug targets and inform treatment in a clinically relevant timeframe for projects such as the PRecision Oncology For Young peopLE (PROFYLE) program. The comparison of tumour expression and somatic mutations against a population of tumours provides a powerful tool for interpreting genomic data from a single patient.
Routine analyses that we perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden and prognosis or other clinical features by comparing mutation rate and profile against tumour subtypes. In adult cases, comparison against The Cancer Genome Atlas (TCGA) expression datasets has identified many potential drug targets, as well as helped to reveal drivers that characterize the individual patient tumour. Pediatric tumours are very different from adult tumours at a molecular level, and comparison of expression profiles to adult cohorts of the same cancer type is confounded by these molecular differences. In addition, the relatively quiet mutational landscape of many pediatric tumours makes the identification of drivers more challenging. We plan to use the TARGET pediatric tumour datasets as a comparator set for samples from pediatric patients. The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power to elucidate the molecular drivers and drug targets in these tumours. Jones, Steven PROVINCIAL HEALTH SERVICES AUTHORITY Pediatric personalized oncogenomics Oct08, 2019 closed The Pediatric Personalized Oncogenomics (ped-POG) program at the Genome Sciences Centre sequences the tumours of cancer patients and delivers a personalized analysis of the tumour to help inform treatment. The ability to compare a tumour to other tumours of the same type helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug.
As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations for the pediatric arm of our program in order to help with the analysis for our pediatric patients, many of whom have very rare, untreatable tumours. The Pediatric Personalized Oncogenomics (ped-POG) program at the Genome Sciences Centre performs analysis of whole genome sequencing of tumours and matched normals and whole transcriptome sequencing of tumours from pediatric patients to identify putative drug targets and inform treatment in a clinically relevant timeframe. The comparison of tumour expression and somatic mutations against a population of tumours provides a powerful tool for interpreting genomic data from a single patient. In the adult arm of the POG program, routine analyses that we perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden and prognosis or other clinical features by comparing mutation rate and profile against tumour subtypes. In these adult cases, comparison against The Cancer Genome Atlas (TCGA) expression datasets has identified many potential drug targets, as well as helped to reveal drivers that characterize the individual patient tumour. Pediatric tumours are very different from adult tumours at a molecular level, and comparison of expression profiles to adult cohorts of the same cancer type is confounded by these molecular differences. In addition, the relatively quiet mutational landscape of many pediatric tumours makes the identification of drivers more challenging. We plan to use the TARGET pediatric tumour datasets as a comparator set for patients enrolled in the pediatric arm of the POG program.
In addition to TARGET data, we will also be including RNA-seq data from our previous Ped-POG samples, the Medulloblastoma Advanced Genomics International Consortium (MAGIC) and the Treehouse Childhood Cancer Initiative (University of California Santa Cruz). The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power of our pipeline to elucidate the molecular drivers and drug targets in these tumours, many of which are rare and have very limited treatment options. Jones, Steven PROVINCIAL HEALTH SERVICES AUTHORITY Accessing TARGET data to analyze tumours and their derived samples from pediatric cancer patients Aug07, 2019 closed The ability to compare a tumour to other tumours helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug. As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations to help with the analysis of our pediatric samples, from projects such as the Stand Up To Cancer (SU2C) Canada Cancer Stem Cell program, which studies brain tumours and their matched cell-lines from pediatric patients to help develop new treatment. Pediatric tumours are unique at a molecular level, typically with few aberrations at the genome level, and several tumour types such as brain tumours have drivers at the epigenetic level, which makes RNA expression analysis critical for studying such tumours. An appropriate comparator is critical to identify aberrant expression. We plan to use the TARGET pediatric tumour datasets as a comparator set to analyze tumour, cell-line, and mouse model samples from pediatric patients of projects like the Stand Up To Cancer (SU2C) Canada Cancer Stem Cell program, which studies brain tumours and their matched cell-lines from pediatric patients to help develop new treatment.
The Genome Sciences Centre will perform analysis of whole genome sequencing of tumours, matched cell-lines, matched xenografts, and matched normals, as well as whole transcriptome sequencing of tumours, cell-lines, and xenografts to identify putative drug targets and develop new treatments. Routine analyses that we plan to perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden by comparing mutation rate and profile against tumour subtypes. The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power of our pipeline to elucidate the molecular drivers and find novel drug targets in these tumours. Jones, Steven BRITISH COLUMBIA CANCER AGENCY Accessing TARGET data to analyze tumours and their derived samples from pediatric cancer patients Aug30, 2018 closed The ability to compare a tumour to other tumours helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug. As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations to help with the analysis of our pediatric samples, from projects such as the Stand Up To Cancer (SU2C) Canada Cancer Stem Cell program, which studies brain tumours and their matched cell-lines from pediatric patients to help develop new treatment. Pediatric tumours are unique at a molecular level, typically with few aberrations at the genome level, and several tumour types such as brain tumours have drivers at the epigenetic level, which makes RNA expression analysis critical for studying such tumours.
An appropriate comparator is critical to identify aberrant expression. We plan to use the TARGET pediatric tumour datasets as a comparator set to analyze tumour, cell-line, and mouse model samples from pediatric patients of projects like the Stand Up To Cancer (SU2C) Canada Cancer Stem Cell program, which studies brain tumours and their matched cell-lines from pediatric patients to help develop new treatment. The Genome Sciences Centre will perform analysis of whole genome sequencing of tumours, matched cell-lines, matched xenografts, and matched normals, as well as whole transcriptome sequencing of tumours, cell-lines, and xenografts to identify putative drug targets and develop new treatments. Routine analyses that we plan to perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden by comparing mutation rate and profile against tumour subtypes. The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power of our pipeline to elucidate the molecular drivers and find novel drug targets in these tumours. Jones, Steven BRITISH COLUMBIA CANCER AGENCY Pediatric personalized oncogenomics Oct17, 2016 closed The Pediatric Personalized Oncogenomics (ped-POG) program at the Genome Sciences Centre sequences the tumours of cancer patients and delivers a personalized analysis of the tumour to help inform treatment. The ability to compare a tumour to other tumours of the same type helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug.
As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations for the pediatric arm of our program in order to help with the analysis for our pediatric patients, many of whom have very rare, untreatable tumours. The Pediatric Personalized Oncogenomics (ped-POG) program at the Genome Sciences Centre performs analysis of whole genome sequencing of tumours and matched normals and whole transcriptome sequencing of tumours from pediatric patients to identify putative drug targets and inform treatment in a clinically relevant timeframe. The comparison of tumour expression and somatic mutations against a population of tumours provides a powerful tool for interpreting genomic data from a single patient. In the adult arm of the POG program, routine analyses that we perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden and prognosis or other clinical features by comparing mutation rate and profile against tumour subtypes. In these adult cases, comparison against The Cancer Genome Atlas (TCGA) expression datasets has identified many potential drug targets, as well as helped to reveal drivers that characterize the individual patient tumour. Pediatric tumours are very different from adult tumours at a molecular level, and comparison of expression profiles to adult cohorts of the same cancer type is confounded by these molecular differences. In addition, the relatively quiet mutational landscape of many pediatric tumours makes the identification of drivers more challenging. We plan to use the TARGET pediatric tumour datasets as a comparator set for patients enrolled in the pediatric arm of the POG program.
In addition to TARGET data, we will also be including RNA-seq data from our previous Ped-POG samples, the Medulloblastoma Advanced Genomics International Consortium (MAGIC) and the Treehouse Childhood Cancer Initiative (University of California Santa Cruz). The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power of our pipeline to elucidate the molecular drivers and drug targets in these tumours, many of which are rare and have very limited treatment options. Jones, Steven BRITISH COLUMBIA CANCER AGENCY Accessing TARGET data for personalized oncogenomics in pediatric patients Aug30, 2018 closed The Genome Sciences Centre sequences the tumours of cancer patients and delivers a personalized analysis of the tumour to help inform treatment. The ability to compare a tumour to other tumours of the same type helps greatly with the interpretation of the data, in particular allowing us to highlight the important molecular events in the tumour that could be targeted with a drug. As pediatric tumours look very different from adult tumours, we need suitable pediatric tumour populations for the pediatric arm of our program in order to help with the analysis for our pediatric patients, many of whom have very rare, untreatable tumours. The Genome Sciences Centre performs analysis of whole genome sequencing of tumours and matched normals and whole transcriptome sequencing of tumours from pediatric patients to identify putative drug targets and inform treatment in a clinically relevant timeframe for projects such as the PRecision Oncology For Young peopLE (PROFYLE) program. The comparison of tumour expression and somatic mutations against a population of tumours provides a powerful tool for interpreting genomic data from a single patient.
Routine analyses that we perform in our pipeline include: i) identifying genes which are expression outliers in the patient based on expression rank in the tumour population, ii) identifying rare or novel mutations by looking at the frequency of the mutation in the tumour population, and iii) assessing mutational burden and prognosis or other clinical features by comparing mutation rate and profile against tumour subtypes. In adult cases, comparison against The Cancer Genome Atlas (TCGA) expression datasets has identified many potential drug targets, as well as helped to reveal drivers that characterize the individual patient tumour. Pediatric tumours are very different from adult tumours at a molecular level, and comparison of expression profiles to adult cohorts of the same cancer type is confounded by these molecular differences. In addition, the relatively quiet mutational landscape of many pediatric tumours makes the identification of drivers more challenging. We plan to use the TARGET pediatric tumour datasets as a comparator set for samples from pediatric patients. The ability to apply appropriate pediatric tumour and matched normal datasets will greatly increase the power to elucidate the molecular drivers and drug targets in these tumours. Jones, Steven BRITISH COLUMBIA CANCER AGENCY TARGET data analysis Nov19, 2010 closed Sequence data for the TARGET project is being generated by multiple technology platforms. This study will determine the relative merits of each in their ability to aid in pediatric cancer research. TARGET data for pediatric cancer research is being generated through the Illumina sequencing platform and, more recently, the Complete Genomics platform. I will be evaluating the performance of both platforms and comparing their results in order to determine their role in aiding pediatric cancer research. I will only be comparing data from the TARGET dataset. This is a renewal of a previously approved application.
Kalisky, Tomer BAR-ILAN UNIVERSITY Meta-analysis of RNA sequencing datasets of Wilms’ tumors Sep12, 2018 approved We want to understand how and why Wilms’ tumors, a type of pediatric tumor of the kidney, vary between patients. This will assist in designing new quantitative measures to classify Wilms’ tumors into subtypes, assess disease progression, and design treatments that are personally tailored for each patient according to their unique characteristics. In this project, we will perform a meta-analysis of the TARGET Wilms’ tumors dataset in order to better characterize the heterogeneity of Wilms’ tumors and find new quantitative measures to classify Wilms’ tumors into subtypes and assess disease progression. Background: Wilms’ tumors are pediatric tumors of the kidney that are thought to arise from faulty development of the kidney at the embryonic stage. Although Wilms’ tumors are generally responsive to treatment and have a relatively good prognosis, there remains a need for measures to better classify them into subtypes and assess the progression of the tumor in each patient in order to design a more personalized treatment. To date, this is done by experienced pathologists who manually inspect tissue sections taken from a biopsy of the tumor. In this project we wish to use gene expression and sequence information found in the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) dataset in order to find more quantitative and objective measures that will complement the manual pathological inspection that is done today. Objectives: - To better characterize the heterogeneity of Wilms’ tumors, that is, how they differ from patient to patient. - To find new quantitative genomic measures to classify Wilms’ tumors into subtypes and assess disease progression.
Analysis plan: We will first use gene expression levels to characterize the heterogeneity of Wilms’ tumors in the TARGET dataset by examining the shape that they form in gene expression space (e.g. do they form a continuum or discrete groups? What is the shape that they form?). We will use dimension reduction techniques such as PCA and t-SNE. Then, we will examine genomic sequence information to characterize the heterogeneity in splice isoform expression. We will use tools such as rMATS to find splice isoform switching events and rMAPS to identify global splicing factors using motif analysis of known RNA binding proteins. We will then correlate the gene expression and sequence information with clinical parameters in order to understand the relationship between the genomic characteristics of each specific Wilms’ tumor (e.g. gene expression patterns, splice isoforms, genetic or epigenetic traits) and its clinical characteristics (e.g. tumor histology report, anaplastic vs. non-anaplastic, tumor stage, patient age, relapse vs. non-relapse, primary tumor vs. metastasis, etc.). Statement of relevance to this dataset: We believe that our research can only be conducted using pediatric data (that is, the TARGET Wilms’ tumors dataset) and that it has relevance for developing more effective treatments, diagnostic tests, and prognostic markers for childhood cancers. Here are our reasons: 1. To the best of our knowledge, no RNA sequencing dataset of Wilms’ tumors of similar quality and magnitude currently exists elsewhere. Therefore, the proposed project can only be conducted using the pediatric data of Wilms’ tumors contained in the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) Wilms’ tumors dataset. Since adult tumors of the kidney (e.g. RCC) are quite different from pediatric (Wilms’) tumors, the research objectives cannot be accomplished using data from adults. 2.
The purpose of our research is to find new quantitative measures to classify Wilms’ tumors into subtypes and assess disease progression. These measures, being more objective and less prone to human error, will hopefully complement the manual pathological inspection that is done today for this purpose. This is especially important in cases when the tumor histology is unknown or not well defined, such as in tumors of “mixed” histology that contain varying proportions of different cell types. Therefore, we believe that our proposed research will pave the way to better diagnosis and more personalized treatment of Wilms’ tumors. Combination of datasets: At this time, we do not expect to combine the TARGET Wilms’ tumors dataset with other datasets. However, if other datasets become available in the future, we might want to use them to complement our research. Since all data points are anonymized, this will not create any additional risks to participants. If needed, we will ask for permission to combine with other datasets when this becomes relevant. Kanduri, Chandrasekhar GOTEBORG UNIVERSITY Identification of novel prognostic markers and therapeutic targets for the treatment of childhood cancer neuroblastoma. Jan18, 2017 rejected LncRNAs represent the largest subtype of noncoding RNAs and are a major constituent of the transcriptome. Studies over the last two decades strongly suggest that they play a critical role in a wide range of biological functions such as chromatin organization, transcriptional regulation, pluripotency maintenance, and mRNA and protein stability and transport. LncRNAs show aberrant expression during tumor development and progression, their expression levels have potential prognostic significance, and they are projected to serve as potential therapeutic molecules for cancer. 
Thus, identification of cancer-associated lncRNAs and their mechanistic roles in pediatric tumor development and progression could help us devise novel lncRNA-based treatment strategies. Neuroblastoma is a tumor of neural crest origin that occurs mostly in children less than 5 years of age. It is the most common extracranial solid tumor in children and typically develops from the sympathetic ganglia in the abdomen, with the majority developing from the adrenal medulla. Several known risk factors, such as age, stage, and chromosomal alterations like MYCN amplification, 11q deletion, and 17q gain, are routinely used in the clinic to stratify neuroblastoma patients into high-risk and low-risk groups. MYCN amplification is a well-known high-risk factor and is linked to an aggressive form of the disease. MYCN is implicated in neuroblastoma development and progression through its influence on diverse biological processes. Interestingly, the p53 gene is rarely mutated in neuroblastoma, and how its activity is compromised in high-risk neuroblastomas has not been investigated in great detail. Our previous work has implicated the lncRNAs NBAT1 and CASC15, derived from the neuroblastoma hotspot locus 6p22.3, in neuroblastoma development (Pandey GK et al., 2014; Mondal et al., 2018). In particular, our recently published work explored the functional roles of 6p22.3-derived lncRNAs in p53-dependent tumor suppressor and MYCN-dependent oncogenic pathways (Mitra S et al., 2020, Cancer Research; Juvvuna PK et al., 2020, Neuro-Oncology Advances). These studies have identified several novel tumor-associated pathways with a strong functional connection to neuroblastoma development and progression. Our access to TARGET datasets has greatly aided the progress of these projects. 
With the aid of TARGET datasets, we will investigate the clinical significance of these targets in the highly aggressive tumor group, and our future work will explore the functional significance of these pathways in neuroblastoma pathogenesis. In addition, we have recently identified two novel suppressor lncRNAs with functional roles in neuroblastoma development and progression (unpublished data). We have extensively used TARGET neuroblastoma sequencing data to identify these two lncRNAs. We expect that these two lncRNAs could serve as prognostic biomarkers in the clinical setting. Karakach, Tobias DALHOUSIE UNIVERSITY Multi-omics analysis of Wilms Tumor data Mar23, 2023 expired Living organisms interpret the instructions in their genomes to direct their daily functions. Errors in this information can sometimes lead to diseases such as cancer, and extensive research is ongoing to discover how the instructions stored in the genome can help us identify and differentiate cancers. For childhood cancers, this information can provide guidance regarding: (a) whether the cancer is starting to develop, (b) what stage the tumor is at, and (c) whether specific genomic changes are responsible for the growth of the riskiest cancers. My lab is interested in studying information in the genome to classify different stages of Wilms Tumor (WT), the most common childhood cancer of the kidney. With the extensive data available at dbGaP, it is possible to extract many levels of genomic information related to WT. Unfortunately, this requires complex computational tools that are not readily available. My lab specializes in developing sophisticated methods to analyze these data by connecting various levels of genomic information to get a comprehensive picture of the characteristics of the disease. These methods allow us to understand, for example, which WTs return years after they have been treated. 
In this proposal, we are requesting these data to continue developing these methods to gain deeper insight into possible changes in the genome, their signatures, and how they influence disease outcomes. There is a concerted effort to find comprehensive molecular classifiers for Wilms' tumors that can help: (a) determine the genetic changes that are responsible for oncogenesis, (b) categorize Wilms' tumor histopathologies, and (c) determine drivers of progression of the most high-risk ones. The search for these molecular alterations extends beyond the known loss of function of the WT1 gene on chromosome 11p13 and the reported loss of heterozygosity (LOH) at chromosomes 1p and 16q for a subset of these tumors. Past studies have analyzed -omics data individually, but it has become clear that such approaches lack the power to unambiguously establish the genotype-phenotype etiology. The objectives of this data request are the following. First, we will re-analyze Binary Alignment Map (BAM) files associated with Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES) and RNA sequencing data. WGS data provide invaluable information regarding somatic copy number alterations (SCNA) in the genome and, at a sequence coverage of 50x (for the data available in dbGaP), it is possible to infer mutations that range from single nucleotide variations (SNVs) to whole chromosome translocations. These data will be re-analyzed to confirm the LOH at chromosomes 1p and 16q that has been associated with Wilms' tumors and to calculate copy number and structural variations in the genome at different levels of resolution. In addition, the WES data will be used to infer which of the mutations are directly related to coding sequences and hence to gene expression. Second, we will perform an integrated analysis of WGS, WES, epigenomics and RNA-seq data as they relate to histopathological staging and overall relapse-free survival (RFS) of Wilms' tumor patients. 
We will subset the data to query different functional roles of the genes sequenced (e.g., transcription factors (TFs) or metabolic genes) and relate all observable alterations across different levels of molecular complexity. Third, we will use the data to infer cis- and trans-regulatory mutations using machine learning methods to develop classifiers that can relate the expression of TFs to SCNA and Wilms' tumor histopathological stages. Whereas single-omics data can contribute toward the identification of Wilms' tumor-specific mutations, epigenetic alterations, and gene expression, they lack the resolving power to unambiguously classify these tumors into respective histopathological stages or even relapse-free survival groups. We will identify molecular targets that can be uniquely associated with RFS using Kaplan-Meier survival analysis. We will account for covariates via Cox proportional hazards models. Karakach, Tobias UNIVERSITY OF MANITOBA Multi-omics analysis of Wilms Tumor data Jan10, 2020 expired In living things, DNA is the chemical that contains the instructions needed to develop and direct their activities and functions. Specific sequences of DNA are organized into units called genes. Different sequences of DNA make different genes, and the complete set of DNA in an organism, including all of its genes, is called its genome. Because all instructions in a living organism are stored in the genome, there is a lot of research in cancer to discover: (a) whether a cancer is starting to develop, (b) the stage of a tumor, or (c) genes that are responsible for the growth of aggressive tumors. In our lab, we study different levels of information in the genome in order to classify different stages of Wilms’ tumor (WT). WT is a type of kidney cancer that affects children. The aim of this proposal is to request genome data from dbGaP. We want to analyze the data to obtain all available information about possible changes in DNA sequences (mutations) related to WT. 
Also, we want to analyze the data together to see how all levels of information are related, and how they can determine WT stages and which tumors recur. Finally, we will use computers to learn patterns in the data that can tell us if there are specific mutations that lead directly to the kind of WT that recurs, regardless of the child's gender, race, or where they live. There is a concerted effort to find comprehensive molecular classifiers for Wilms' tumors that can help: (a) determine the genetic changes that are responsible for oncogenesis, (b) categorize Wilms' tumor histopathologies, and (c) determine drivers of progression of the most high-risk ones. The search for these molecular alterations extends beyond the known loss of function of the WT1 gene on chromosome 11p13 and the reported loss of heterozygosity (LOH) at chromosomes 1p and 16q for a subset of these tumors. Past studies have analyzed -omics data individually, but it has become clear that such approaches lack the power to unambiguously establish the genotype-phenotype etiology. The objectives of this data request are the following. First, we will re-analyze Binary Alignment Map (BAM) files associated with Whole Genome Sequencing (WGS), Whole Exome Sequencing (WES) and RNA sequencing data. WGS data provide invaluable information regarding somatic copy number alterations (SCNA) in the genome and, at a sequence coverage of 50x (for the data available in dbGaP), it is possible to infer mutations that range from single nucleotide variations (SNVs) to whole chromosome translocations. These data will be re-analyzed to confirm the LOH at chromosomes 1p and 16q that has been associated with Wilms' tumors and to calculate copy number and structural variations in the genome at different levels of resolution. In addition, the WES data will be used to infer which of the mutations are directly related to coding sequences and hence to gene expression. 
Second, we will perform an integrated analysis of WGS, WES, epigenomics and RNA-seq data as they relate to histopathological staging and overall relapse-free survival (RFS) of Wilms' tumor patients. We will subset the data to query different functional roles of the genes sequenced (e.g., transcription factors (TFs) or metabolic genes) and relate all observable alterations across different levels of molecular complexity. Third, we will use the data to infer cis- and trans-regulatory mutations using machine learning methods to develop classifiers that can relate the expression of TFs to SCNA and Wilms' tumor histopathological stages. Whereas single-omics data can contribute toward the identification of Wilms' tumor-specific mutations, epigenetic alterations and gene expression, they lack the resolving power to unambiguously classify these tumors into respective histopathological stages or even relapse-free survival groups. We will identify molecular targets that can be uniquely associated with RFS using Kaplan-Meier survival analysis. We will account for covariates via Cox proportional hazards models. Karsan, Aly PROVINCIAL HEALTH SERVICES AUTHORITY Improving stratification of pediatric AML Sep15, 2019 approved Acute myeloid leukemia is a cancer of the blood. Blood is made from special cells in bone marrow. In adults, the disease can arise because bone marrow cells acquire changes to their DNA as people get older. However, this does not explain why children can also get the disease. We have developed new genetic tests that may explain why some adult patients with the disease survive longer than others. We would like to use our new tests on data from patients with the childhood form of the disease. We expect that our new tests will work the same way, but that they will give different answers for the two groups of patients. 
By comparing the answers, we will learn more about the biology of the bone marrow cells in both adults and children, and why some people get the cancer and others do not. Acute myeloid leukemia (AML) is a clonal malignancy originating in hematopoietic stem cells. We have developed a novel analytic pipeline using RNA-seq and miRNA-seq data from adult AML samples. This pipeline is meant to serve as the basis for future diagnostic and prognostic tests for AML. The pipeline can be run on any RNA-seq data, taking FASTQ files as input. First, we determine the presence of sequence variants and structural variants, and measure gene expression. Then we use machine-learning methods to classify the samples and determine the patient's likely outcome and treatment options. We now seek to apply our analytic pipeline to RNA-seq and miRNA-seq data from TARGET, to confirm that our approach works on juvenile AMLs. This will not require any direct comparison of TARGET data with any other data, or any further development work on the analysis pipeline. The TARGET data will be analyzed separately from the adult AML data we have. This work will lead to improved diagnostic/prognostic assays for juvenile AML. It will also potentially provide new mechanistic insights into the etiology of the disease, as our classifier may be able to identify novel combinations of mutations that may be associated with outcome. Karsan, Aly BRITISH COLUMBIA CANCER AGENCY Improving stratification of pediatric AML Sep12, 2016 closed Acute myeloid leukemia is a cancer of the blood. Blood is made from special cells in bone marrow. In adults, the disease can arise because bone marrow cells acquire changes to their DNA as people get older. However, this does not explain why children can also get the disease. We have developed new genetic tests that may explain why some adult patients with the disease survive longer than others. We would like to use our new tests on data from patients with the childhood form of the disease. 
We expect that our new tests will work the same way, but that they will give different answers for the two groups of patients. By comparing the answers, we will learn more about the biology of the bone marrow cells in both adults and children, and why some people get the cancer and others do not. Acute myeloid leukemia (AML) is a clonal malignancy originating in hematopoietic stem cells. We have developed a novel analytic pipeline using RNA-seq and miRNA-seq data from adult AML samples. This pipeline is meant to serve as the basis for future diagnostic and prognostic tests for AML. The pipeline can be run on any RNA-seq data, taking FASTQ files as input. First, we determine the presence of sequence variants and structural variants, and measure gene expression. Then we use machine-learning methods to classify the samples and determine the patient's likely outcome and treatment options. We now seek to apply our analytic pipeline to RNA-seq and miRNA-seq data from TARGET, to confirm that our approach works on juvenile AMLs. This will not require any direct comparison of TARGET data with any other data, or any further development work on the analysis pipeline. The TARGET data will be analyzed separately from the adult AML data we have. This work will lead to improved diagnostic/prognostic assays for juvenile AML. It will also potentially provide new mechanistic insights into the etiology of the disease, as our classifier may be able to identify novel combinations of mutations that may be associated with outcome. Kataoka, Keisuke NATIONAL CANCER CENTER Pan-cancer annotation of driver alterations: understanding their distribution and their clinical significance in adults and pediatrics Sep05, 2018 approved The functional impacts of genetic alterations in different cancers are not fully understood, because genetic alterations may function differently in different cancers. The distribution and relationship of each genetic alteration in different cancers are also not well understood. 
Therefore, we intend to analyze the functional impact of each driver alteration in pediatric and adult cancers using several independent approaches. We will also analyze the distribution and relationship of these genetic alterations in each sample. We believe that this study will lead to the detection of therapeutic targets in both adult and pediatric cancers. Recent advances in high-throughput sequencing technologies have enabled us to systematically identify somatic mutations, copy number alterations, and fusion genes in various malignancies. However, the functional impact of these genetic alterations is not fully understood, since most of these alterations are expected to be “passenger” alterations. There is an urgent need to accurately detect “driver” genomic alterations. There is also wide interest in comprehensively understanding the clinical significance of each driver alteration. However, this is sometimes difficult, because driver alterations may function differently in different cancers. For example, the same driver alteration may have different functional effects in pediatric and adult cancers. Therefore, we intend to analyze the functional impact of each driver alteration in pediatric and adult cancers using several independent approaches (including our newly created pipeline). We also intend to analyze the distribution of driver alterations in each sample. Some of these genetic alterations may coexist in the same sample, and some may be mutually exclusive. We will therefore analyze the distribution and relationship of driver abnormalities in different cancers (including leukemia and B cell lymphoma). By performing this study, we will be able to understand the distribution and clinical significance of each driver alteration in different cancers. This study will lead to the accurate detection of therapeutic targets in both adult and pediatric cancers. 
To perform this research, we would like to use the data deposited by TCGA, TARGET, Foundation One, GTEx and several other groups. In order to achieve sufficient statistical power, combined analyses with other datasets outside of dbGaP are planned in the current project. We plan to analyze these datasets both independently and together. The data integration with the other datasets does not create any additional risk to participants. No request will be made for the identification of participants. We will publish or otherwise broadly share any findings from our study with the scientific community. Kato, Motohiro UNIVERSITY OF TOKYO Genome-wide analysis of pediatric hematologic malignancy Jun23, 2022 approved Acute leukemia is the most common pediatric malignancy. To date, the prognosis of acute leukemia has improved with treatment based on the risk factors of each patient, but the prognosis for some patients remains poor. Therefore, a more detailed study of the relationship between the characteristics of a patient's cancer cells and the pathogenesis of the disease may lead to better criteria for treatment decisions and the development of new drugs. We plan to use TARGET datasets to study the characteristics of childhood leukemia cells in detail, including their genomes, in order to characterize them and discover new therapeutic strategies. Acute leukemia is the most common type of pediatric malignancy. The prognosis of this disease has been improved by risk-stratified therapies based on prognostic factors such as specific molecular features and response to chemotherapy. However, despite improvements in risk-stratified therapies, the prognosis for some subgroups remains poor. Our research aims to elucidate the molecular genetic basis of childhood lymphoblastic leukemia and to identify therapeutic target pathways and molecules. 
Based on recent insights into the molecular genetics of leukemia, we are analyzing the effect of acquired cancer cell mutations and the underlying germline variants on the pathogenesis of pediatric ALL. The TARGET dataset and the acute leukemia cohort under analysis will be analyzed separately using our in-house analysis pipeline. The downloaded datasets will be stored and analyzed independently and will not create any new risks for TARGET participants. KELLER, CHARLES CHILDREN'S CANCER THERAPY DEVELOP/INST Cancer Registry for Familial and Sporadic Tumors (CuRe-FAST) Nov03, 2017 closed RHABDOMYOSARCOMA: We will use these whole genome and whole exome DNA tumor & normal sequencing datasets to mine for potential therapeutic targets in the childhood muscle cancer rhabdomyosarcoma. We have an IRB-approved protocol that allows us to collect DNA sequence data and correlate it back to the biology of the tumor. These data-mining activities will lead to experiments on cell lines that will further validate potential therapeutic targets. All of our data are stored on secure servers at our performance site (physically secure and HIPAA-compliant). WILMS TUMOR: We request controlled sequencing data from the TARGET initiative to serve as a critical dataset to merge genomic and transcriptomic data obtained from TARGET, dbGaP, EGA, and internally generated sequencing projects with functional data generated from high-throughput drug screening experiments performed on cell lines. RHABDOMYOSARCOMA: We will use these whole genome and whole exome DNA tumor & normal sequencing datasets to mine for potential therapeutic targets in the childhood muscle cancer rhabdomyosarcoma. We have an IRB-approved protocol that allows us to collect DNA sequence data and correlate it back to the biology of the tumor. These data-mining activities will lead to experiments on cell lines that will further validate potential therapeutic targets. 
All of our data are stored on secure servers at our performance site (physically secure and HIPAA-compliant). WILMS TUMOR: We request controlled sequencing data from the TARGET initiative to serve as a critical dataset to perform integrative computational analysis of high-throughput genomic and functional datasets in several high-risk pediatric cancers. The goal of this integrative computational analysis is to merge genomic and transcriptomic data obtained from TARGET, dbGaP, EGA, and internally generated sequencing projects with functional data generated from high-throughput drug screening experiments performed on cell lines and patient-derived primary tumor cultures from several pediatric cancers. Analysis will utilize FASTQ and BAM datasets, and will be performed using the Probabilistic Target Inhibitor Map (PTIM) modeling methodology (BMC Bioinformatics. 2017;18(Suppl 4):116; Nat Med. 2015;21(6):555-559; IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;11(6):995-1008). Our current screening and sequencing efforts are focused on the following pediatric cancers: Wilms’ tumor, osteosarcoma, rhabdomyosarcoma, epithelioid sarcoma, hepatoblastoma, DIPG, and medulloblastoma. Wilms’ tumor sequencing data will be of great value to our current Functional Genomics of Wilms’ tumor study (http://www.cc-tdi.org/research), which is fully funded with grants from The Rally Foundation and Cure Childhood Cancer. Integrative computational analysis will be used to identify promising disease-focused monotherapies or combination therapies for each pediatric disease under study, with the end deliverable being sufficient pre-clinical data to accelerate new therapies into clinical trials. KELLER, CHARLES CHILDREN'S CANCER THERAPY DEVELOP/INST Expanding Massive drug screen for sarcomas to other pediatric cancers Jul13, 2018 closed Soft tissue sarcomas are among the top five causes of death from childhood cancer. 
Despite 40 years of cooperative group trials of intensified chemotherapy, the dismal survival for metastatic disease remains unchanged for rhabdomyosarcoma (RMS) and non-rhabdomyosarcoma soft tissue sarcoma (NRSTS, including undifferentiated pleomorphic sarcoma (UPS), which is the additional dataset requested). The focus of this academic-pharma partnership project is to reverse the trend whereby only 5 drugs intentionally developed for childhood cancer have earned FDA approval since 1978. We seek not only to identify new compounds that will eventually address the unmet clinical need for RMS and NRSTS, but also to identify future drugs that treat other kinds of childhood cancers. Only 5 drugs intentionally developed for childhood cancer have earned FDA approval since 1978. We hope to change that pattern now. Soft tissue sarcomas are among the top 5 causes of death from childhood cancer. In an academic-pharma partnership to address this unmet clinical need, we have already completed a 640,000-compound screen for ARMS, ERMS and UPS, and have in hand results for 446 confirmed, high-priority hits. To extend the impact of these findings to a broader range of high-mortality pediatric cancers, we propose: (Aim 1) to validate the 446 compounds in human-derived rhabdomyosarcoma cell lines and cell cultures in vitro, and (Aim 2) to test the 25 best RMS compounds against other high-mortality pediatric cancers in human-derived cell lines and cell cultures of Ewing’s sarcoma, DIPG, neuroblastoma and osteosarcoma. Machine learning as well as in vitro and in vivo testing will be employed. From these studies, we expect 10 compounds to emerge that we will take to lead optimization, thus setting a new precedent for “childhood cancer first” pharma-nonprofit collaborations. 
Kemmeren, Patrick PRINSES MAXIMA VOOR KINDERONCOLOGIE, BV Detection of cooperative and mutually exclusive genetic alterations in pediatric cancer Feb05, 2018 approved Childhood cancer is the number one cause of disease-related deaths in children in industrialized countries. Cancers arise and progress by acquiring combinations of mutations in the genome. Genetic interactions are specific combinations of mutations in gene pairs that have an unexpected effect, which is usually a sign of a functional link between these genes. In this project, we will study genetic interactions in childhood cancer and their relationship with different cancer types. Our goal is to gain more insight into the underlying mechanisms of childhood cancer development and progression, as well as to propose more effective drug treatments. We aim to investigate the role of genetic interactions and their contribution to different pediatric cancers. In particular, we focus on cooperative and mutually exclusive interactions between genes, pathways and processes altered in pediatric cancers as exhibited in cancer genomes of patients. To this end, we would like to use the TARGET data set to perform a genetic interaction test. We will first create a binary sample-gene matrix in which we can record for each gene and each sample whether the sample has one or more somatic alterations in that gene. Then, for each pair of genes, we count the number of samples that have alterations in both genes (co-occurrence) and the number of samples that have alterations in only one of these genes (mutual exclusivity). By performing permutations, we assess whether the observed counts are significant. We will perform the test per cancer type and over all cancer types. We will further extend this gene-level test to the pathway level. In a follow-up study, frequently occurring or absent combinations will be selected and studied in cellular models for validation and for unravelling the underlying mechanisms. 
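The co-occurrence and mutual exclusivity test described in this request can be illustrated with a short Python sketch. It assumes a binary sample-by-gene matrix (1 if a sample carries at least one somatic alteration in the gene) and a simple permutation scheme that shuffles one gene's alteration column across samples; this preserves each gene's alteration frequency but not per-sample mutation burden, which more careful methods also control for. The function names `pair_counts` and `interaction_test` are hypothetical, and this is not the requestors' implementation. Running it separately on each cancer type's samples, or on pathway-level columns (for example, marking a sample as altered if any gene in the pathway is altered), would mirror the per-type and pathway-level variants mentioned above.

```python
import random
from itertools import combinations


def pair_counts(col_a, col_b):
    """Count samples altered in both genes and samples altered in exactly one."""
    both = sum(1 for a, b in zip(col_a, col_b) if a and b)
    exactly_one = sum(1 for a, b in zip(col_a, col_b) if a != b)
    return both, exactly_one


def interaction_test(matrix, genes, n_perm=1000, seed=0):
    """Permutation test for co-occurrence and mutual exclusivity of gene pairs.

    matrix: list of samples, each a list of 0/1 values (one per gene).
    Returns {(gene_a, gene_b): (both, exactly_one, p_cooccur, p_exclusive)}.
    """
    rng = random.Random(seed)
    # Transpose the sample-gene matrix into one alteration column per gene.
    cols = [[row[j] for row in matrix] for j in range(len(genes))]
    results = {}
    for i, j in combinations(range(len(genes)), 2):
        obs_both, obs_one = pair_counts(cols[i], cols[j])
        hits_both = hits_one = 0
        shuffled = list(cols[j])
        for _ in range(n_perm):
            # Shuffling gene j's column keeps its alteration frequency fixed.
            rng.shuffle(shuffled)
            both, one = pair_counts(cols[i], shuffled)
            hits_both += int(both >= obs_both)  # at least as much co-occurrence
            hits_one += int(one >= obs_one)     # at least as much exclusivity
        # Add-one correction keeps empirical p-values strictly above zero.
        p_cooccur = (hits_both + 1) / (n_perm + 1)
        p_exclusive = (hits_one + 1) / (n_perm + 1)
        results[(genes[i], genes[j])] = (obs_both, obs_one, p_cooccur, p_exclusive)
    return results


if __name__ == "__main__":
    # Toy data: genes A and C are always altered together; A and B never are.
    genes = ["A", "B", "C"]
    matrix = [[1, 0, 1] for _ in range(5)] + [[0, 1, 0] for _ in range(5)]
    for pair, stats in interaction_test(matrix, genes).items():
        print(pair, stats)
```

Because shuffling keeps both genes' alteration counts fixed, the "exactly one" count always equals the two genes' alteration counts summed minus twice the "both" count, so in this sketch the exclusivity test is the mirror image of the co-occurrence test; it is kept separate only to match the two counts described in the request.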
We are currently performing a similar analysis on another pediatric cancer data set in collaboration with Stephan Pfister from the Deutsches Krebsforschungszentrum (DKFZ, German Cancer Research Center) in Heidelberg, Germany. The samples in this DKFZ data set cover cancer types that are mostly complementary to the TARGET data set. The combined data set will thus provide us with a great opportunity to study genetic interactions in childhood cancer across a broad set of cancer types. We are considering performing a test for genetic interactions on a combination of both data sets, to increase the power of the test and to find pairs of genes that show interactions at a pan-cancer level. No additional risk is anticipated for the participants. Kenichi, Yoshida NATIONAL CANCER CENTER Whole genome analysis of childhood cancer in Japan Jul03, 2024 approved Recent advances in DNA sequencing have deepened our understanding of human cancer genetics, unveiling a range of genetic and epigenetic changes (epigenetic changes alter gene regulation without changing the DNA sequence) that impact tumor development. However, comprehensive genetic data on childhood cancer in Asian populations, including Japanese, are still lacking. To bridge this gap, we studied over 1,000 cases of pediatric cancer in Japan using technologies like whole genome sequencing, RNA sequencing, and DNA methylation analysis. By integrating our results with public data, we aim to offer a comprehensive view of molecular genetics in pediatric cancer. Our study explores genetic changes such as single nucleotide variants, small insertions/deletions, and variations in gene structure and copy number. We are also investigating gene activation, epigenetic modifications of DNA, telomere length, and extrachromosomal DNA, all of which influence cancer cell growth. This research aims to uncover new insights into the causes of cancer, how these elements interact, and potential cancer risks. 
The larger sample size from data integration is expected to enhance detection of rare genetic aberrations. Additionally, the study aims to highlight disparities in cancer genetics across different racial groups. Recent advances in high-throughput sequencing technologies have progressively characterized molecular genetic abnormalities in human cancers and revealed that various genomic and epigenomic abnormalities drive tumor evolution. However, genomic profiles of childhood cancer in Asians, including Japanese, are still lacking. Therefore, we studied more than 1,000 Japanese pediatric cancers using whole-genome sequencing (WGS), RNA sequencing (RNA-seq) and DNA methylation analysis. By integrating our data with existing public data, we aim to provide a comprehensive understanding of the genomic, transcriptomic and epigenetic characteristics of pediatric cancer. The analysis uses datasets in dbGaP, including WES, WGS, RNA-seq, SNP array and DNA methylation data, to study SNVs, indels, structural variants, copy number alterations, gene expression and DNA methylation patterns. Sequencing data and primary analysis results will be used for downstream analyses, including the search for novel driver aberrations, mutation signatures, cancer predispositions, mutually exclusive relationships of genomic aberrations, telomere length, and ecDNA. These results will also be used to address differences in cancer genomics/epigenomics among racial groups. However, this comparison will not be based on genomic summary results, nor will we publish genomic summary results. No request will be made for the identification of participants. We fully intend to share our results widely with the scientific community by publishing our findings. Kentsis, Alex SLOAN-KETTERING INST CAN RESEARCH Oncogenic alterations in non-coding regulatory elements in pediatric leukemia Nov04, 2021 closed Acute leukemia is the most common form of childhood cancer. 
Despite substantial effort, the prognosis of patients with pediatric acute leukemia remains unsatisfactory. Mutations in protein-coding genomic regions have been well studied as drivers of leukemogenesis, and new therapies targeting mutated oncogenic proteins have been developed. However, it remains elusive how genomic alterations in non-coding regions contribute to leukemia pathogenesis. The overall goal of this project is to identify oncogenic alterations in non-coding regulatory elements in pediatric leukemia and to elucidate their underlying mechanisms of leukemogenesis in childhood. We propose to use whole genome and whole exome sequencing data from TARGET-AML WGS and WXS for further investigation. Our study will provide new insights into leukemogenesis in childhood and contribute to developing novel therapeutic strategies for pediatric leukemia. Mutations in protein-coding genomic regions have been well studied as drivers of leukemia, and new therapies targeting mutated oncogenic proteins have been developed. However, it remains elusive how genomic alterations in non-coding regions contribute to leukemia pathogenesis. The overall goal of this project is to identify oncogenic alterations in non-coding regulatory elements in pediatric leukemia, especially focusing on transcription factor binding sites, and to elucidate their underlying mechanisms of leukemogenesis. Acute leukemia is the most common form of childhood cancer, and comprehensive studies characterizing pediatric leukemias have revealed that the gene mutation signatures in pediatric leukemias are distinct from those in adult leukemias. Therefore, in this project, we will focus on pediatric leukemia and investigate the significance of genomic alterations in non-coding regions in childhood. Our study will provide new insights into leukemogenesis in childhood and contribute to developing novel therapeutic strategies for pediatric leukemia.
In this study, we will first nominate candidate genomic variants in non-coding regulatory elements identified in our pediatric leukemia patient samples. In preliminary studies, we have found several somatic regulatory mutations dysregulating oncogenic transcription factor expression in a subset of childhood AML. As the next step, we plan to validate these candidates in larger cohorts. Because our analysis requires whole genome or whole exome sequencing data, and such data resources are very limited, validation and detailed investigation of this mechanism require access to TARGET-AML WGS and WXS data; this is the basis for the request. We will only use pediatric leukemia data for analysis in this study. As a contributor to the TARGET-AML project (McNeer et al., Kentsis. Genetic mechanisms of primary chemotherapy resistance in pediatric acute myeloid leukemia. https://doi.org/10.1038/s41375-019-0402-3), I am intimately familiar with these data and their relevance to childhood leukemias. Kentsis, Alex SLOAN-KETTERING INST CAN RESEARCH Genomic plasticity in pediatric cancers Dec19, 2013 closed Extensive sequencing of human genes has revealed a dearth of mutations in many childhood cancers. Most known human genetic variants reside in non-coding portions of the human genome, but the structure and function of this prevalent genetic domain remain largely unknown. This proposal will identify, map, and determine the functions of endogenous DNA transposons in childhood cancers, leading in turn to improved treatments. The discovery of DNA transposition 60 years ago in the form of “jumping genes” established its importance for phenotypic plasticity in plants, as well as for stress responses and evolution of numerous organisms. Nearly half of the human genome originates from evolutionarily conserved mobile DNA elements, but their contribution to human biology and disease remains almost completely unexplored.
Indeed, for many human diseases with a genetic basis, and for childhood cancers in particular, extensive sequencing of coding DNA has revealed a surprising dearth of gene mutations. This suggests the presence of alternative genetic mechanisms whose investigation could reveal fundamental aspects of human biology and novel targets for the development of improved therapies. The premise of this study, based on the recent discovery of active endogenous human DNA transposition in childhood embryonal tumors, is that endogenous DNA transposition contributes to human biology and disease. Based on these preliminary studies, we have developed a novel hybrid read mapping (HRM) algorithm for mapping DNA transposition in human genomes. Here, we will map DNA transposition in pediatric cancer genomes and identify genomic targets of aberrant DNA transposition. These insights are expected to catalyze the discovery of molecular drivers of childhood tumors, leading in turn to improved targeted therapies. Kerl, Kornelius UNIVERSITY OF MUENSTER Epigenetic alterations in rhabdoid tumors Feb12, 2020 closed Rhabdoid tumors are malignancies that often show resistance to therapy. These tumors are located in or outside the brain. Rhabdoid tumors outside the brain have hypomethylated DNA, meaning that methyl groups are bound to the DNA at only a few sites. These reversible modifications of the DNA are called epigenetic changes. These epigenetic changes might be one reason for rhabdoid tumor initiation and therapy resistance. In this project we will elucidate the changes in DNA methylation of rhabdoid tumors in comparison to their cells of origin. With these analyses we will evaluate whether these DNA methylation changes influence the expression of genes known to drive tumor progression. Epigenetic changes are in principle reversible. In preclinical experiments we have shown that epigenetic drugs are effective for rhabdoid tumor treatment.
Therefore, we will evaluate whether these epigenetic drugs reverse the DNA methylation changes that rhabdoid tumors harbour. In summary, this project aims to implement new targeted therapies for rhabdoid tumor patients using epigenetic drugs. Extracranial rhabdoid tumors (eRT) have hypomethylated DNA. We used scRNA-seq to define the tumor cell heterogeneity of eRT and to identify the cell of origin of these tumors. In this project we would like to compare the DNA methylation pattern of eRT with the global DNA methylation of the potential cell of origin. With these analyses we would like to identify CpG islands that are methylated or demethylated in both eRT and the cell of origin, and to identify CpG islands that are methylated in eRT and demethylated in the potential cell of origin, and vice versa. By combining (1) DNA methylation data of eRT and the cell of origin and (2) gene expression data of both tissue types, we will analyze what impact changes in DNA methylation have on gene expression. We will analyze whether genes associated with tumor initiation and tumor progression are expressed in rhabdoid tumors (in comparison to the cell of origin). As rhabdoid tumors are often resistant to chemotherapy, alternative treatment approaches are needed. Epigenetic drugs are currently being tested preclinically as well as in clinical studies (e.g., HDAC inhibitors, EZH2 inhibitors, DNA methyltransferase inhibitors). My own group has generated preclinical data suggesting that epigenetic drugs (DNA methyltransferase inhibitors and HDAC inhibitors) might be promising targeted therapies for children with rhabdoid tumors. Therefore, in the second part of this project we will analyze whether changes in the DNA methylome of RT (in comparison to the cell of origin) might be reversed by HDAC inhibitors and DNA methyltransferase inhibitors.
Kerlavage, Anthony Robert NIH Cancer Genomic Cloud Resources Evaluation Feb16, 2018 closed The project aims to develop and provide new models for data analysis by co-locating data and compute resources, enabling everyone to derive value from the investment made by NCI genomics research programs, including those who do not have the required infrastructure for large-scale data storage and compute. I serve as the Director of the Center for Biomedical Informatics and Information Technology. The Cancer Cloud Resources are critical infrastructure for the cancer research community, and I am requesting access to the relevant dataset(s) because I have administrative oversight for the project. There will not be any scientific use of the data, and any data access will be solely for the purpose of evaluating the fitness of the Cloud Resource systems. Kesserwan, Chimene ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Investigations of Childhood Cancer Predisposition Syndromes Jun19, 2018 closed The increased use of genetic testing both in the research setting and in the clinic has improved our understanding of the genetic causes of hereditary cancer and has allowed the discovery of new cancer predisposition genes. Yet there are many hereditary cancers whose underlying genetic causes we do not know or understand. In addition, we are learning that patients with known cancer syndromes may develop cancer types that were not originally thought to be part of the syndrome. We are interested in developing a better understanding of the genetic causes of childhood cancer. To accomplish this goal, we are using a type of comprehensive genetic testing known as whole genome sequencing to look at the DNA from children with cancer. We would like to validate our findings by examining additional samples from the TARGET database.
The use of high-throughput next generation sequencing (NGS) approaches is enhancing our understanding of hereditary predisposition to cancer and has led to the identification of several novel cancer predisposition genes. The primary objective of our research project is to identify new cancer predisposition genes and/or genetic variants, and to characterize the phenotypic spectrum of children with cancer who are carriers of germline mutations in known cancer predisposition genes. We have a cohort of pediatric oncology patients for whom we have collected clinical and germline genomic information (whole genome sequencing; WGS). The genomic data for this cohort have been analyzed using established and validated pipelines developed by the Department of Computational Biology at St. Jude. Our primary goal for the use of TARGET data is to validate our findings using data from this cohort. Having access to the TARGET data will increase the statistical power of our research findings. Data will be maintained on a secure server, and there will be no attempt to identify or combine data from different sources at an individual level; therefore, there is no increased risk to study participants. The proposed research will use TARGET data only for pediatric cancer research. We currently have no plans to collaborate with researchers outside St. Jude. Khan, Javed NIH Identification of Immune Targets in Pediatric Cancers Feb05, 2018 rejected We will use exome and RNAseq data to identify somatically mutated and overexpressed genes that may be targeted for immune therapy. Objectives: 1. Identify expressed somatic variants that may be immune targets in pediatric cancers. 2. Identify differentially expressed cell surface proteins that may be targeted by antibody or CAR-T cell therapy. Methods: 1. Call variants from exome tumor-normal and RNAseq data using GATK (Broad Institute). 2. Apply a rare minor allele frequency (MAF) filter using ExAC non-TCGA data (MAF < 0.0001). 3.
Type HLA alleles with seq2HLA and HLAMiner, and predict neoantigen binding affinity with NetMHC. 4. Calculate TPM expression values. 5. Determine differentially expressed cell surface proteins from a predicted pool of 6414 genes compared with normal organ expression (in-house data). Khan, Javed NIH Integration for TARGET Genomics Data Mar15, 2012 approved We will identify genes that are amplified or lost (by SNP array and exome sequencing) or overexpressed (by microarray and transcript sequencing). We will identify somatic changes by subtracting the germline from the tumor mutations. We will determine which of these are also expressed (by RNA-seq). We will identify novel fusion genes (by whole genome and transcriptome analysis). We will determine whether these aberrations are associated with a genotype (e.g., MYCN amplification) or phenotype (e.g., stage or survival length). We will mine TCGA data to determine whether the pediatric cancer mutations are present in the TCGA data. We will compare expression profiles of adult with pediatric cancers. We will develop analysis pipelines based on the data we download. The goal is to find genes that may cause or drive the cancers. Integration for TARGET Genomics Data. Research Objective: Our lab is one of the collaborators on the TARGET project for neuroblastoma. We have performed RNA sequencing of samples that have had whole genome and transcriptome sequencing. Objectives: 1. Identify somatic and germline mutations occurring in the exome and whole genome. 2. Compare mutations identified in our RNAseq data with those found in whole exome data to determine whether the variant is expressed. 3. Identify chromosomal fusions and rearrangements in the whole genome data. Study design and analysis plan: 1. For whole exome data, take BAM files, realign germline and tumor pairs with GATK, and call somatic indels and mutations. Compare mutations found in the tumor exome with the RNA-seq data in our lab for the same samples to determine whether the mutation is expressed. 2.
Perform bioinformatic functional analysis of the expressed somatic mutations using PolyPhen-2, SIFT and ANNOVAR. 3. For whole genome data, use Pindel and BreakDancer to detect large indels and fusions. 4. Correlate genomic findings with genotype (MYCN amplification), stage and survival length. 5. Compare mutations identified in pediatric cancers with those in adult cancers. 6. Develop analysis pipelines. The data will only be accessed by me and the individuals named in the application. The data will be kept on a password-protected server within the NCI/NIH firewall. Khan, Javed NIH Pediatric Genomic Analysis_TARGET Mar15, 2012 closed The primary purpose of the research is to learn more about the germline and somatic changes in the genomes of pediatric cancers. We want to find genes that may predispose to cancer and identify mutations in the cancer that may represent oncogenes or tumor suppressor genes. Ultimately, we hope to identify new biomarkers and targets for therapy. Objectives: 1. Identify germline and somatic mutations occurring in pediatric cancers. 2. Identify copy number alterations in pediatric cancers. 3. Identify chromosomal fusions and rearrangements in pediatric cancers. 4. Identify SNP alterations in DNA. 5. Investigate allele-specific expression in RNAseq data. We will determine changes in tumor compared with normal that may provide new therapeutic targets or act as biomarkers. Khew-Goodall, Yeesim UNIVERSITY OF SOUTH AUSTRALIA microRNAs and their regulons for improved stratification and treatment of neuroblastomas Aug15, 2024 approved Neuroblastoma is a cancer of early childhood, responsible for the most cancer deaths in children under the age of 5 years. Neuroblastomas are classified as low-, intermediate- or high-risk, depending on several clinical features. This assignment of risk is imperfect but is important because it determines the severity of treatment the patient receives.
High-risk patients are treated with surgery followed by chemotherapy and radiation therapy. Due to the young age of these patients, even those who recover can suffer subsequent lifelong disabilities and a predisposition to other cancers caused by the treatment. Improved risk stratification could spare some children treatments that they do not need, while identifying those with the most to gain from intensified treatment. Additionally, it is hoped that better molecular understanding will lead to new and less toxic treatment options for children in the future. We are using a novel approach to discover potential biomarkers for improved risk stratification and potential molecular targets for new therapies. The approach combines laboratory studies on the type of cell that gives rise to neuroblastoma with examination of patient tumours, using high-throughput RNA sequencing to profile the tumours. Neuroblastoma is a heterogeneous disease, meaning there are diverse forms of the disease that can have different combinations of molecular drivers, many of which remain to be discovered. In paediatric cancers, incomplete differentiation can underlie cancer development. We are focussing on miRNAs and their downstream targets because these are intrinsically regulators of networks capable of re-programming cellular phenotypes and thereby provide an opportunity to identify biomarkers and therapeutic targets that are part of a network. Our approach is to combine in vitro analysis of drivers of neuroblast differentiation, based on an iPSC differentiation model we have established (using normal iPSCs, not tumour-derived), with patient tumour data to pinpoint the highest-likelihood tumour drivers. We can then verify the candidate differentiation drivers by manipulating their levels in the differentiation model.
To set up this strategy we have implemented an in vitro model of neuroblast differentiation that proceeds in four steps, from stem cell, to neural crest cell, to neuroblast, to mature sympathetic neuron. We have performed single cell sequencing and bulk mRNA and miRNA sequencing on replicate time courses, showing the procedure is robust and highly reproducible. This has provided a wealth of expression data for regulatory network analysis that we are using in combination with clinical data to predict drivers of neuroblast survival and differentiation that may be relevant to some neuroblastomas. Incorporating the in vitro data with matched miRNA and mRNA sequence data from 96 tumours from collaborator Stefan Hüttelmaier (Halle, Germany) has produced a number of candidate miRNAs and their regulons as likely differentiation drivers that are dysregulated in a proportion of neuroblastomas. We wish to use the TARGET matched miRNA and mRNA data sets as an independent cohort to reinforce the evidence that they can contribute to driving tumour progression in a proportion of neuroblastoma cases. As all data are de-identified, this will not incur an increased risk to participants. The data will only be used for research purposes. All analysis of the requested dataset will be performed in Adelaide at the Centre for Cancer Biology, University of South Australia. KHURANA, EKTA WEILL MEDICAL COLL OF CORNELL UNIV Analysis of non-coding rearrangements and mutations that drive the development of childhood cancers Mar26, 2018 rejected Cancers arise as a consequence of genomic changes in important genes. However, sometimes we do not see changes in any of the currently known genes associated with cancer. In fact, most genomic changes are located outside of gene regions. These genomic regions that are not part of a gene have regulatory functions, and changes that impact these locations could also be related to the cancer.
This study aims to detect the regulatory regions that are important in different childhood cancers. The approach will take into account different types of changes: small changes (mutations) and large rearrangements that change the structure of the genome. The goal is to find whether different patients have the same regulatory region impacted by any changes and thus determine the importance of these genomic regions in childhood cancers. Genomic driver alterations aid the process of oncogenesis by transforming normal cells to a malignant state. Among the thousands of sequence variants present in a cancer genome, presumably only a few act as drivers. Identification of specific drivers acting in each cancer patient is the basis of targeted therapy in precision medicine. Given the complexity of this disease, some patients do not have any of the currently known cancer drivers in their genome. In addition, unfortunately not all the known cancer drivers currently have an actionable therapy. Therefore, there is a critical need to complement our understanding and characterization of oncogenesis by identifying cancer drivers in the non-coding regulatory genome. This is especially important for currently untreatable childhood cancers. The TARGET project allows us to apply a comprehensive whole-genome sequencing (WGS) approach to discover new molecular targets that drive childhood cancers and further translate those findings into the clinic. The objective of this proposal is to identify cancer drivers using WGS by analyzing the combined impact of single nucleotide variants (SNVs), structural variants (SVs) and copy-number variants (CNVs) in the non-coding genome. The hypothesis is that mutations/rearrangements in regulatory regions could act as non-coding drivers, providing a complementary/alternative path that can explain oncogenesis.
The currently known drivers of childhood cancers include genomic alterations such as gene fusions, extreme copy number changes or highly deleterious mutations in the coding sequence of driver genes. Non-coding mutations have the potential to deregulate gene expression (for example, the mutations in the promoter of the TERT gene). Thus, our proposal will integrate the functional impact scores of SNVs and the impact of SVs/CNVs with the recurrence of these alterations to identify regulatory drivers. This will lead to the identification of new drivers, which can serve as potential therapeutic targets. Kim, Chang QUEEN'S UNIVERSITY BELFAST The landscape of immunogenic feature of pediatric tumors Feb12, 2020 closed We are working to better understand the immunogenic characteristics of pediatric tumors. This project aims to construct the landscape of neoantigens, HLA somatic variations (mutations and copy number variations), and CD8+ TILs for pediatric tumors. By studying the immunogenetic features of pediatric tumors, we hope to harness these genetic features to improve the stratification of pediatric cancer patients for immunotherapy. Tumors with low mutation burden, which include many pediatric tumors, have been considered poorly immunogenic. However, a recent study reported that pediatric patients with acute lymphoblastic leukemia (ALL) have tumor-associated neoepitope-specific CD8+ T cells, responding to a significant number of tested neoantigens (Zamora et al., Sci Transl Med, 26 Jun 2019). This study suggests that pediatric cancers may be amenable to immunotherapies aimed at enhancing immune recognition of tumour-specific neoantigens. To fully validate these findings and address additional questions, datasets with larger patient cohorts and across multiple institutions would be needed.
The objective of our study is to construct the landscape of neoantigens, HLA somatic variations (mutations and copy number variations), and CD8+ tumor-infiltrating lymphocytes (TILs) for pediatric tumors, making full use of state-of-the-art bioinformatics tools. The results of our study will be compared with these solid tumors to explore the immunogenic characteristics of pediatric tumors. For our study, we are requesting the following datasets (phs000463.v19.p8, phs000464.v19.p8, phs000465.v19.p8). We intend to add other relevant studies (including many other types of tumors) available via dbGaP. All research findings generated from the requested dbGaP datasets will be broadly disseminated to the scientific community through publication and presentation at relevant conferences. Kim, Hoon SUNGKYUNKWAN UNIVERSITY RES/BUS FDN Exploring the role of focal gene amplifications and their clinical implication in pediatric cancer Feb29, 2024 approved Cells may grow more quickly when a specific subset of genes, called oncogenes, is activated. Normally, each gene is present in two DNA copies. Oncogenes may become activated when their DNA copies are duplicated. In some instances, hundreds of oncogene DNA copies have been detected in cancer cells. This process is called amplification. In our project, we want to explore which genes are commonly amplified. We also want to evaluate the different mechanisms through which amplification can be achieved, which may be linear amplification, complex amplification, breakage-fusion-bridge cycles, or extrachromosomal amplification. Finally, we would like to study how oncogene amplification relates to patient response to therapy. The investigation of molecular features, encompassing RNA expression, mutations, and structural variations, plays a pivotal role in elucidating the prognosis and treatment response of cancer patients.
Among these features, the amplification of oncogenes emerges as a critical hallmark, exhibiting variability across distinct cancer types. Noteworthy instances include ERBB2 amplification in breast and gastric cancers and EGFR amplification in glioblastoma and lung cancers. Gene amplification in tumors involves diverse mechanisms, including extrachromosomal DNA (ecDNA), breakage-fusion-bridge cycles, and complex/linear amplification. Notably, ecDNA often harbors cancer-promoting oncogenes derived from chromosomal DNA sequences, leading to heightened oncogene expression (Turner et al., 2017; Wu et al., 2019, Nature). Unlike canonical chromosomes, ecDNAs segregate unevenly during cell division, accumulating at high copy numbers within individual tumor cells. This unequal distribution contributes to intratumoral heterogeneity, providing selective growth advantages and fostering resistance to cancer treatments (deCarvalho, Kim et al., 2018; Kim et al., 2020, Nat Genet). In pediatric cancers, brain tumors diverge clinically from their adult counterparts: pediatric tumor types, mainly glial and neuronal, display heightened sensitivity to adjuvant irradiation and chemotherapy compared to adult tumors (Merchant TE et al., 2010, Semin Radiat Oncol). Moreover, molecular studies of pediatric brain tumors reveal a substantial deviation from adult tumors, constituting a distinct category of neoplasms characterized by unique genomic and molecular features (Comitani, F et al., 2023, Nat Med). Despite efforts to describe the distinctive features of pediatric cancer, the occurrence of focal amplifications, their underlying mechanisms, and their clinical impact have not been clearly described.
Leveraging the whole genome sequencing (WGS), whole exome sequencing (WXS), and RNA sequencing (RNA-seq) data sets from TARGET, our research initiative aims to assess and compare diverse mechanisms of focal amplification in pediatric cancer. Our focus encompasses the detailed exploration of various amplification mechanisms to ascertain their specific impacts on patient survival and treatment response. By enhancing our understanding of these intricate processes, we seek to contribute valuable insights that can potentially inform targeted therapeutic approaches for pediatric cancers. Kim, Jong-Won SAMSUNG MEDICAL CENTER Comparative Study of Signaling Pathways in Pediatric and Adult Cancers Aug01, 2024 approved Pediatric cancers often differ from adult cancers, being mostly non-carcinomas such as sarcomas. This study aims to compare the signaling pathways in pediatric non-carcinomas with those in adult cancers to understand their unique biological features. Pediatric cancers are predominantly non-carcinomas (such as sarcomas), which exhibit different pathological characteristics compared to carcinomas typically found in adults. Understanding these differences is crucial for the development of effective treatment strategies. Therefore, investigating the differences in signaling pathways between pediatric non-carcinomas and adult cancers is essential to uncover the unique biological features of pediatric cancers. We will utilize the TARGET pediatric cancer dataset to gather genomic and transcriptomic data of pediatric cancer patients. The collected data will be analyzed to compare signaling pathways between pediatric non-carcinomas and adult carcinomas using bioinformatics techniques such as gene expression profiling, gene network analysis, and pathway analysis. This study will provide new insights into the signaling pathways of pediatric cancers, highlighting the distinct biological characteristics of these cancers.
All TARGET data will be used only for academic research without any profit or commercial purposes, and we will not share these data with any other parties. We also agree to the terms of the Data Use Certification (DUC). Kim, Pora UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON Gene fusion study in pediatric osteosarcoma Nov14, 2018 closed Chromosomal instability is a key feature of cancer. Chromosomal rearrangements initiated by DNA double-strand breaks often create driver fusion genes, which initiate and promote cancer cells. Ewing sarcoma is a non-soft-tissue (bone) sarcoma. This tumor is defined by well-known fusion genes involving EWSR1. In this study, we will analyze the RNA-seq data of osteosarcoma patients and predict fusion genes. For fusion genes confirmed by Sanger sequencing and RT-PCR, we will perform functional annotation using in-house tools and select candidate driver fusion genes. For these fusion genes, we will investigate the molecular mechanisms by which they are involved in tumorigenic pathways. Sarcomas are cancers of connective tissue. About 15% of cancers in children (< 20 yrs) are sarcomas. Among these, bone sarcomas, including Ewing sarcoma and osteosarcoma, are the second major type. So far, no representative fusion gene has been reported in osteosarcoma patients. In our study, we will investigate oncogenic fusion genes and the molecular mechanisms of their effects on tumorigenesis, including initiation, progression, and metastasis. To better understand and define pediatric sarcoma, we will first predict fusion genes from osteosarcoma samples and validate them, and then study in depth their tumorigenic mechanisms through downstream pathways. To do this, we will use an in-house fusion gene prediction tool and tools for the functional annotation of fusion genes to find driver fusion genes.
Furthermore, we will investigate the epigenetic effects of fusion genes. Since pediatric bone tumors are often described as fusion-gene-driven cancers, we expect our hypothesis to shed light on critical aspects of the relationship between fusion genes and tumorigenesis. Kirsch, David DUKE UNIVERSITY Evaluating the effect of repetitive elements enhancer function on oncogene expression in pediatric and adult cancers. Feb20, 2019 closed The purpose of this project is to identify causes of cancers, especially in children. We are specifically studying whether the "junk" DNA located in the human genome can contribute to the expression of cancer-causing genes. To do this, we are using multiple programming techniques to re-use sequencing data to more closely examine junk DNA and whether it may act as an "on" switch for cancer-causing genes in certain circumstances. In this project we are dissecting the role of repetitive element enhancer function in oncogene expression in pediatric cancers. We have selected retinoblastoma, Ewing sarcoma (2 datasets), and neuroblastoma studies for this analysis. Kiss, Andor MIAMI UNIVERSITY OXFORD Re-analysis of TARGET High Throughput Data for the Identification of Biomarker(s) Indicators of Osteosarcoma Relapse Nov28, 2018 closed We are interested in re-analyzing the TARGET "high-throughput" datasets (whole genome, whole exome, RNA-Seq, microarray, methylation) to look for potential biomarkers of relapse in osteosarcoma. We will use newly developed methodologies and computer programmes developed subsequent to the initial data analysis to look for biomarkers. We will also use the most recently updated annotation of the human genome to examine any genomic aberrations that can be used to predict the severity of osteosarcoma and potential relapse of the disease.
We wish to re-analyze the unprocessed NGS data from the TARGET pediatric osteosarcoma study to look for possible biomarkers indicative of potential disease relapse. Although the TARGET OS study data have been previously analysed and are publicly available, to our knowledge they have not been re-analyzed using newer software methods that account for the influence of repeat sequences, with reads mapped to the latest and most complete human genome annotations. Additionally, recent improvements in the mapping of RNA-Seq data have permitted much better correlation between RNA-Seq-based differential gene expression (DGE) and other measures of gene expression (e.g., qPCR), as well as whole-pathway analysis. These improvements come partly from better mapping algorithms than the original software used in the TARGET RNA-Seq analysis, partly from a much improved annotation of the human genome, and partly from improved statistical tests for low-replicate, high-count datasets. We will use WGS, WES, RNA-Seq and microarray data to attempt to identify key genes, and expression changes in those genes, that may become targets for pediatric therapeutics, either as early-warning biomarkers for particular osteosarcoma variants or as targets for specific therapeutics. We are particularly interested in expansion of the c-myc gene in tumours of patients who potentially experience relapse. We will consider several criteria in the investigation of possible c-myc expansion: (1) age of onset, (2) metastatic vs. non-metastatic disease, (3) relapse, and (4) sex. We will use in-house software written specifically to identify chromosomal abnormalities, as well as re-analysis using adaptive statistical methods more suitable for RNA-Seq DGE analyses (e.g. bROC).
Thus, this work will represent a major advance in our understanding of this disease’s genomic pathology, the potential for an efficient biomarker of its progression, and its potential treatment. Klco, Jeffery ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Analysis of Pediatric Acute Leukemias Sep09, 2020 approved Acute myeloid leukemia is a malignancy of the blood that carries a poor prognosis in children. We seek to better understand the changes in DNA and/or RNA that occur during the development of AML, with the goal of better identifying high-risk subgroups and potentially identifying new targets for therapy. We are interested in the genomic alterations that lead to pediatric acute myeloid leukemia. Our ongoing studies use a comprehensive multi-omic approach to define the spectrum of alterations from whole genome or whole exome sequencing (including SNVs, indels, and SVs) and RNA-seq (for expression profiles and chimeric fusion detection) in both diagnostic and relapse samples. These will be correlated with methylation and chromatin studies. This study is largely focused on a single institution, and we are requesting all available next-generation sequencing data for pediatric AML cases sequenced as part of the TARGET project. These data will be used to supplement our internal St. Jude cohort with additional cases and to validate our findings. Considering the occasional lineage ambiguity in acute leukemias, we are also requesting access to similar data from ALL. These data will likely be combined with our AML data to supplement our findings and to establish a robust collection of pediatric AML genomic data. This combination will not represent any additional risk to the participants. While our work is focused on pediatric acute leukemias, we are also requesting access to the TCGA (phs000178) and BeatAML (phs001657) datasets to better understand the differences in AML between children and adults.
However, we will not combine these pediatric and adult datasets. Klee, Eric MAYO CLINIC ROCHESTER Wilms Tumor Profiling Aug21, 2014 closed Wilms tumors can have characteristics associated with either good or bad outcomes. In this instance, we have a Wilms tumor with all the characteristics of a good outcome that nevertheless resulted in a poor outcome. We have profiled the molecular state of this tumor and would like to compare it with existing data on a tumor with good-outcome characteristics that did have a good outcome, to determine the differences. We have sequenced a Wilms tumor sample of similar histology but adverse outcome. We would like to perform a comparative analysis between the existing Wilms tumor sample and the one we have generated. Both samples are RNAseq, which will allow us to look at expression-level and pathway-level differences. The results will be used to formulate potential hypotheses for why these two tumors of similar type had such different outcomes. KNIGHT, ROB UNIVERSITY OF CALIFORNIA, SAN DIEGO Testing software methods to identify microbial reads in tumor sequence datasets Apr23, 2021 approved Several reports have suggested that tumor sequencing data can be mined for microbial reads, yet the consequences of using various software methods to do this have not been benchmarked, and the algorithms applied to date tend to be generic rather than tailored to this specific purpose. Our goal is to test existing and newly developed algorithms and databases for their ability to identify microbial reads and generate microbial assemblies. We will perform no human genetic work in this project, although if the preliminary results are promising, we would apply for a subsequent project to examine links between host and/or tumor genotypes and the associated microbiome.
However, it is premature to assume that this will be possible, given that it is not clear whether a microbiome does, in general, exist or how it should best be extracted and characterized. Evidence linking microbiology and oncology dates back four millennia (Sepich-Poore et al., 2021 Science). We recently showed that intratumoral microbial communities exist and are distinct between cancer types across more than 30 treatment-naive adult cancers, and that this microbial information creates a new class of cancer diagnostics (Poore et al., 2020 Nature; Narunsky-Haziza et al., 2022 Cell). These conclusions have been independently confirmed and also found to have therapeutic implications (Nejman et al., 2020 Science). We are thus requesting access to three cancer datasets: TARGET, CPTAC, and TCGA. (1) Justification for TARGET: To date, little has been done to characterize pediatric cancer microbiomes. The most relevant findings have come from studying leukemia in genetically predisposed hosts, where researchers showed that bacterial translocation from the gut into systemic circulation is necessary to drive leukemic progression (Meisel et al., 2018 Nature). Importantly, leukemia failed to develop in genetically predisposed, germ-free hosts, suggesting that opportunities for cancer prevention and treatment in pediatric cancers may emerge from a better understanding of the pediatric cancer microbiome. We thus seek to re-examine all pediatric whole genome and transcriptome sequencing samples in the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database for microbial DNA, using computational methods similar to those we applied to TCGA (Poore et al., 2020 Nature; Narunsky-Haziza et al., 2022 Cell). We plan to compare pediatric and adult cancer microbiomes in related cancer types, and we hope to provide insight into novel, microbial-based cancer diagnostics and therapeutics while also providing a useful resource to the research community.
(2) Justification for CPTAC: Recent evidence reveals how intracellular bacteria can stimulate the host immune system against tumor cells through microbial peptide presentation (Kalaora et al., 2021 Nature). These bacterial peptides were presented on MHC-I and MHC-II molecules on the surface of melanoma cells and intratumoral immune cells, and they constitute a novel class of non-human tumor antigens. Additionally, these bacterial antigens (i) stimulated tumor-derived T cells to release inflammatory cytokines and (ii) were shared among multiple patients, making them ideal immunotherapy targets. Nonetheless, the bacterial peptide data published by Kalaora et al. were derived from only 17 metastatic melanomas, and it remains unknown how prevalent bacterial peptides are in other cancer types. Since the median bacterial biomass of melanomas is lower than that of most other cancer types (Nejman et al., 2020 Science), these bacterial peptide-mediated interactions may be even more frequent and/or potent in other cancer types. Thus, it is of key interest to characterize the cancer microbiome proteome across multiple cancers, and these results may guide new immunotherapy targets. The CPTAC data, consisting of paired whole genome and mass spectrometry proteomic data for multiple cancers, are an ideal resource for analyses that link cancer microbiomes with microbial proteomes. Using methods similar to those described in our own papers (Poore et al., 2020 Nature; Narunsky-Haziza et al., 2022 Cell), we intend to explore the whole genome data for tumor-specific microbial communities, followed by a targeted search for microbial peptides in the proteomic data. This will be repeated for all cancer types in CPTAC, with subsequent experimental validation if the results are positive. We hope that our analyses can identify new immunotherapy targets from these data while also providing a resource to the research community for future multi-omic analyses.
(3) Justification for TCGA: We have continued to re-analyze TCGA data for microbial information in pursuit of four research goals: (1) extending all TCGA microbiome analyses to functional profiles of these microbial communities; (2) exploring the role of varying tumor hypoxia in the composition and function of microbial communities using TCGA human data alongside our microbial atlas; (3) comparing primary tumor microbiomes and metastatic cancer microbiomes using TCGA data in combination with other publicly available studies; and (4) performing de novo metagenome assembly using TCGA samples, among others. Access to the raw, controlled-access data is necessary for each of these projects. Knoechel, Birgit DANA-FARBER CANCER INST Characterization of neoantigen expression in pediatric T-ALL using RNA-seq May06, 2020 approved T-cell acute lymphoblastic leukemia (T-ALL) is an aggressive malignancy of lymphoblasts committed to the T-cell lineage that predominantly occurs in children and young adults. Although intensified multi-agent chemotherapy has improved cure rates in children to 80%, relapsed or treatment-refractory disease remains very difficult to treat. Therefore, there is a great medical need to identify novel vulnerabilities and therapeutic approaches. Targeting the immunosuppressive tumor microenvironment has led to promising results in many solid tumors, particularly those with high mutation rates that lead to neoantigen expression causing T-cell exhaustion. Pediatric tumors tend to have quiet genomes with few mutations and structural genomic aberrations, but recent evidence suggests that neoantigens can also arise through nongenetic mechanisms, such as splicing defects. The RNA-seq data generated from the TARGET cohort will be used to investigate non-canonical transcripts in pediatric T-ALL that result in neomorphic peptides computationally predicted to be high-affinity neoantigens.
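The Knoechel request describes predicting neomorphic peptides from non-canonical transcripts: retained introns are detected with KMA, the reading frame is extended into the intron, peptides are predicted, and candidates are then matched to patient HLA alleles (seq2HLA) and scored for binding (NetMHCpan). The Python sketch below illustrates only the reading-frame extension and peptide enumeration, under stated assumptions: the exon coding sequence is supplied on the coding strand and in frame from its first codon, and the default peptide length of 9 reflects the fact that MHC class I ligands are typically 8 to 11 residues long. The function names are hypothetical and are not part of any tool named in the request.

```python
from typing import List

# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_until_stop(cds: str) -> str:
    """Translate from the first base, in frame, until a stop codon or the sequence ends."""
    residues = []
    for pos in range(0, len(cds) - 2, 3):
        # Codons containing non-ACGT characters (e.g. N) are translated as 'X'.
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def intron_extended_peptides(exon_cds: str, intron: str, k: int = 9) -> List[str]:
    """Return k-mers containing at least one residue encoded (partly) by the retained intron.

    exon_cds: hypothetical input, coding sequence up to the exon/intron junction.
    intron:   retained intronic sequence on the same (coding) strand.
    """
    protein = translate_until_stop(exon_cds + intron)
    # Index of the first codon that includes at least one intronic base.
    novel_start = len(exon_cds) // 3
    if len(protein) <= novel_start:
        return []  # a stop codon was hit before any intron-encoded residue
    first = max(0, novel_start - k + 1)
    return [protein[s:s + k] for s in range(first, len(protein) - k + 1)]
```

For example, `intron_extended_peptides("ATGGCTGCT", "AAACCCTAGGG", k=3)` translates to `MAAKP` (stopping at the intronic TAG) and returns `["AAK", "AKP"]`, the only 3-mers that include an intron-derived residue. Scoring these peptides against the patient's HLA alleles would be left to NetMHCpan and is not reproduced here.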
The controlled-access data generated by the TARGET study will be used in our research to identify neomorphic proteins generated by non-canonical transcripts in pediatric leukemia. The raw FASTQ files from the TARGET mRNA-seq data will be processed and aligned using bowtie2, followed by calculation of the number of retained introns using the KMA algorithm. We will then extend the reading frame into the detected intronic regions and predict peptide sequences. To determine which of these peptides might be presented by the MHC class I molecules of each individual, we will identify patient-specific HLA alleles using the seq2HLA algorithm. We will predict peptide-MHC binding affinity using the NetMHCpan algorithm. Koeffler, Harold CEDARS-SINAI MEDICAL CENTER Epigenetic dysregulation: a novel pathway of oncogenesis in human cancers Jul27, 2018 closed The genome consists of genes that code for proteins as well as DNA elements that control those genes. Apart from well-documented alterations in DNA, an increasing body of evidence suggests that the non-DNA epigenome is inappropriately altered in most cancers. By comparing levels of DNA and histone modifications between tumoral and non-tumoral cells/tissues, we will begin to identify non-DNA aberrations that are associated with cancer development. We will associate these non-DNA aberrations with abnormal gene expression in cancer in order to identify abnormal genes that play important roles in cancer development and therapeutic responses. "Epigenetic regulators" are frequently altered in human cancers. A remarkably large number of studies suggest "epigenetic addiction" as a new mechanism of oncogenesis. Characterizing the aberrant epigenetic state of cancer will provide insights into how abnormal activity/function of chromatin regulators could modify downstream epigenetic signatures to drive transformation, and how to develop effective therapeutic strategies to target epigenetic addiction.
We are planning to compare levels of histone modifications, chromatin accessibility, chromatin conformation, DNA methylation, and gene expression among a variety of human tissues, cell lines, and cancers. We aim to identify 1) novel cancer-associated genes and 2) new epigenetic features/mechanisms that could distinguish disease states and disease subtypes. To perform these analyses, we need access, through NIH dbGaP, to genomic and epigenomic sequencing data and expression data from both normal and cancer tissues/cells, as well as the clinical data associated with these samples. We will also verify these observations/hypotheses using our in-house generated epigenetic data. The information obtained from these data will be highly valuable for identifying new mechanisms of epigenetic dysregulation and novel oncogenes/tumor suppressor genes associated with the development or progression of cancer. In addition, the use of datasets restricted to pediatric disease (e.g. data from the TARGET cohort) and age-related conditions (phs000610.v1.p1) will be limited to the study of pediatric cancers. Kohanbash, Gary UNIVERSITY OF PITTSBURGH AT PITTSBURGH LANDSCAPE OF TUMOR INFILTRATING T CELL REPERTOIRE OF PEDIATRIC BRAIN TUMORS Feb12, 2020 approved This proposal will compare the body's mechanisms for fighting cancer in adults and children with brain tumors and other cancers. We aim to use this information to identify new targets on tumor cells, called antigens. If we identify new antigens, we will then try to develop therapies that stimulate the body to kill tumor cells carrying the antigen. Antigen identification for developing immunotherapies for adults and children MUST be performed separately, as these are distinct molecular entities and many antigens will not be shared. The overall goal of this project is to identify immunotherapy antigens to facilitate the development of immunotherapies separately for adults and children with cancer.
The primary use of the TARGET database is to extract T-cell receptor complementarity-determining region 3 (CDR3) sequences and integrate them with ~2,600 samples from the Children's Brain Tumor Network (CBTN) dataset in the Kids First Data portal, and with our in-house generated data. It has previously been shown that CDR3 homology is linked with shared antigen recognition, so we have been developing a pipeline that uses transcriptomic data and CDR3 sequence homology to identify antigens and then computationally predict epitopes for further testing, with the goal of developing vaccine and adoptive T-cell therapies for pediatric brain tumors. Over the past 2 years, with about 1,000 samples in the CBTN, we could not find overlap of CDR3 sequences (CDR3 homology) between the datasets; now that more than twice as many CBTN samples are available, we have a better chance of identifying overlapping CDR3 homology between childhood cancers, which can be a rare event. Part of this study will also examine HLA haplotypes and expression profiles to perform bioinformatic predictions. If successful, this will strengthen a pipeline we have in development to identify antigens and their associated epitopes for immunotherapy. The primary use of the TCGA data will be to identify antigens found in both adult and pediatric brain tumors, to better understand the immunologic landscape of adult brain tumors, and to evaluate antigen processing. In this study, the CDR3 workflow can broadly be divided into two sections: (1) low-level processing, including sequence assembly, extraction of CDR3 regions, and error correction; and (2) secondary analysis, including TCR diversity measurements, determining clonal abundance and distribution, analysis of gene usage, and establishing the relationship between sequence and antigen specificity. Multiple pipelines will be employed to extract as many true CDR3 sequences as possible.
The distribution of TCR gene usage and T-cell type abundance among tumor-infiltrating T cells will be inferred to understand immune regulation in brain tumors. CDR3 amino acid properties will be calculated using VDJtools, and gene usage analysis will be performed using the "tcR" package in R (Nazarov et al., 2015). Differences between two or more sets of sequences from patients, in terms of differential TCR abundance and differential V(D)J usage, will be quantified using the Shannon divergence index. Because uniformity between samples is important, total diversity will be estimated using the "unseen species" model from the vegan package in R (Dixon, 2003). Clonotypes per kilo-reads (CPK) will be used as a measure of clonotype diversity, defined as the number of unique CDR3 calls in each sample normalized by the total read count in the TCR region. Although it is well known that the CDR3β sequence of a TCR is important in determining the antigen specificity of T cells, it is still unclear which structural features of the CDR3 or other parts of the TCR are responsible for antigen recognition specificity. From the identified T-cell repertoire, public TCRs will be excluded to focus only on non-public, cancer-specific sequences. We will employ the immune-Similarity Measurement by Aligning Receptors of T cells (iSMART) algorithm, which has increased clustering specificity, to identify antigen-specific CDR3 groups from non-public CDR3 sequences (Li et al., 2018). Features of CDR3 clusters will then be associated with tumor gene expression profiles. We plan to combine pediatric and adult datasets from dbGaP, our institution, the Chinese Glioma Genome Atlas, and the CBTN. All data will be analyzed independently and then compared. This will create no additional risk to patients, as all samples are deidentified.
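Two of the repertoire metrics named in the Kohanbash request can be written down directly. The Python sketch below computes clonotypes per kilo-reads (CPK) as the request defines it (unique CDR3 calls normalized by TCR-region reads, scaled to 1,000 reads) and compares gene-usage profiles between samples. The request does not define its "Shannon divergence index", so reading it as the Jensen-Shannon divergence between V-gene usage distributions is an assumption; the vegan unseen-species estimate and iSMART clustering are not reproduced. All function names and example counts are hypothetical.

```python
import math
from typing import Dict, Iterable


def clonotypes_per_kilo_reads(cdr3_calls: Iterable[str], tcr_read_count: int) -> float:
    """CPK: number of unique CDR3 sequences per 1,000 reads in the TCR region."""
    if tcr_read_count <= 0:
        raise ValueError("tcr_read_count must be positive")
    return 1000 * len(set(cdr3_calls)) / tcr_read_count


def frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts (e.g. reads per V gene) into proportions summing to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def shannon_entropy(p: Dict[str, float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """JSD in bits: 0 for identical usage profiles, 1 for non-overlapping ones."""
    keys = set(p) | set(q)
    # Mixture distribution; genes missing from one sample count as zero usage there.
    m = {key: 0.5 * (p.get(key, 0.0) + q.get(key, 0.0)) for key in keys}
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


if __name__ == "__main__":
    # Two unique CDR3 calls among 500 TCR-region reads.
    print(clonotypes_per_kilo_reads(["CASSLGQGYEQYF", "CASSLGQGYEQYF", "CASRGDTQYF"], 500))  # 4.0
    usage_a = frequencies({"TRBV20-1": 30, "TRBV5-1": 10})
    usage_b = frequencies({"TRBV20-1": 10, "TRBV7-9": 30})
    print(jensen_shannon_divergence(usage_a, usage_b))
```

Jensen-Shannon divergence is used here because, unlike the Kullback-Leibler divergence, it stays finite when one sample uses a V gene the other lacks, which is common in sparse repertoires.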
Koks, Sulev UNIVERSITY OF WESTERN AUSTRALIA Transposable elements in the pathophysiology of osteosarcoma Jun13, 2024 approved This project aims to identify new diagnostic and prognostic markers for sarcoma and osteosarcoma samples. These new markers are based on genomic variants called transposable elements (TEs), which our previous studies have recently described as being involved in pediatric osteosarcoma. The described markers have a significant effect on gene expression profiles and can predict the prognosis of the disease. The predictive value of TEs stems from their ability to induce carcinogenic pathways and activate genetic networks leading to malignancy. Identifying TEs together with their respective transcriptional changes helps to improve our understanding of these malignancies and provides new opportunities for drug development. The goal of our study is to develop diagnostic and prognostic genomic markers for pediatric osteosarcoma. We have previously conducted similar projects, and the present study will apply recently developed bioinformatic tools to focus on specific transcriptomic markers. The project is based on our previous studies of pediatric osteosarcomas, in which we described significant stratification of osteosarcoma patients by their transcriptomic profiles. To achieve our goal, we plan to perform complex comparative genomic analyses of different sarcoma samples to identify prognostic and therapeutically important profiles for these malignancies. Our analysis is based on combining the detection of transposable elements (TEs) and alternative splicing events in various sarcoma samples with comparison to clinical outcomes. This analysis requires raw transcriptome (RNAseq) and raw whole-genome sequencing (WGS) data ("fastq" and "bam" files). WGS data will be used to call TEs in the samples, and this information will be used to stratify the RNAseq results for further analysis.
Our preliminary data indicate that TEs can induce alternative splicing events (intron retention, activation of UTR regions) that could cause transcriptome changes leading to carcinogenesis. Our previous studies have also revealed that several TEs are upregulated in pediatric osteosarcoma, but we were not able to analyse changes in genomic DNA to identify potential alterations explaining these findings. The present project studies transcriptomic changes in the context of the presence or absence of TEs in genomic DNA, analyses their impact on clinical outcomes, and develops new biomarkers for osteosarcoma and sarcomas. This study extends our published work on different cohorts, and during the analysis we plan to combine the dbGaP dataset with our datasets of pediatric osteosarcomas. Combining the datasets will not cause any additional risk for participants, as raw data will not be exposed in any way. Koks, Sulev UNIVERSITY OF WESTERN AUSTRALIA Comparative transcriptomic profiling of osteosarcoma and sarcoma subtypes May27, 2020 approved This project aims to identify new diagnostic and prognostic markers for sarcoma and osteosarcoma samples. These new markers are based on genomic variants called transposable elements (TEs), which our previous studies have recently described as being involved in pediatric osteosarcoma. The described markers have a significant effect on gene expression profiles and can predict the prognosis of the disease. The predictive value of TEs stems from their ability to induce carcinogenic pathways and activate genetic networks leading to malignancy. Identifying TEs together with their respective transcriptional changes helps to improve our understanding of these malignancies and provides new opportunities for drug development.
The goal of our study is to develop diagnostic and prognostic genomic markers for pediatric sarcomas, including soft-tissue sarcomas and osteosarcomas. The project is based on our previous studies of pediatric osteosarcomas, in which we described significant stratification of osteosarcoma patients by their transcriptomic profiles. To achieve our goal, we plan to perform complex comparative genomic analysis of different sarcoma samples (soft-tissue sarcomas and osteosarcoma) to identify prognostic and therapeutically important profiles for these malignancies. Our analysis is based on combining the detection of transposable elements (TEs) and alternative splicing events in a variety of sarcoma samples with comparison to clinical outcomes. This analysis requires raw transcriptome (RNAseq) and raw whole-genome sequencing (WGS) data ("fastq" and "bam" files). WGS data will be used to call TEs in the samples, and this information will be used to stratify the RNAseq results for further analysis. Our preliminary data indicate that TEs can induce alternative splicing events (intron retention, activation of UTR regions) that could cause transcriptome changes leading to carcinogenesis. Our previous studies have also revealed that several TEs are upregulated in pediatric osteosarcoma, but we were not able to analyse changes in genomic DNA to identify potential alterations explaining these findings. The present project studies transcriptomic changes in the context of the presence or absence of TEs in genomic DNA, analyses their impact on clinical outcomes, and develops new biomarkers for osteosarcoma and sarcomas. This study extends our published work on different cohorts, and during the analysis we plan to combine the dbGaP dataset with our datasets of pediatric osteosarcomas. Combining the datasets will not cause any additional risk for participants, as raw data will not be exposed in any way.
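The Koks requests describe using WGS-derived TE calls to stratify RNAseq results. The Python sketch below shows the simplest form of that stratification, assuming TE insertions have already been called per sample and one gene's expression has already been quantified: samples are split by presence of a single TE locus, and the groups are compared by mean expression and a log2 fold change with a pseudocount. The function name, locus label, and pseudocount of 1 are hypothetical choices; a real analysis would more likely fit a count-based differential expression model such as DESeq2 or edgeR.

```python
import math
from statistics import mean
from typing import Dict, Set, Tuple


def stratify_by_te(
    te_calls: Dict[str, Set[str]],
    expression: Dict[str, float],
    te_locus: str,
    pseudocount: float = 1.0,
) -> Tuple[float, float, float]:
    """Split samples by presence of one TE insertion and compare expression.

    Returns (mean with TE, mean without TE, log2 fold change with vs. without).
    Only samples that have both a TE call set and an expression value are used.
    """
    shared = [sample for sample in expression if sample in te_calls]
    with_te = [expression[s] for s in shared if te_locus in te_calls[s]]
    without_te = [expression[s] for s in shared if te_locus not in te_calls[s]]
    if not with_te or not without_te:
        raise ValueError("both groups need at least one sample")
    mean_with, mean_without = mean(with_te), mean(without_te)
    # Pseudocount keeps the ratio defined when a group has zero mean expression.
    log2_fc = math.log2((mean_with + pseudocount) / (mean_without + pseudocount))
    return mean_with, mean_without, log2_fc
```

For instance, with a hypothetical locus `"L1_chr5"` present in samples with expression 7.0 and 7.0 and absent in a sample with expression 1.0, the function returns `(7.0, 1.0, 2.0)`.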
Komurov, Kakajan CINCINNATI CHILDRENS HOSP MED CTR Comprehensive characterization of pediatric AML transcriptome Jun26, 2015 closed Pediatric cancers differ from adult cancers in that they are mainly driven by alterations in their epigenetic program. We have identified a major global phenotype in pediatric brain tumors that reprograms the tumor cell's epigenetic and gene expression program. In this proposal, we would like to undertake a comprehensive analysis of pediatric cancer transcriptomes to identify patterns of deregulated alternative, cryptic transcription events. The success of this project will have important implications for our understanding of pediatric cancer drivers and may suggest new therapeutic avenues. Our lab has been characterizing alternative transcription events in human cancers. We recently found a major epigenetic phenotype in about 10% of adult cancers that results in a global alteration of alternative transcription events and a highly specific signaling phenotype. This phenotype, which we called Global Cryptic Transcription (GCT), affects patients' response to certain types of therapy. Given the major role of epigenetic reprogramming in pediatric cancers, we hypothesized that this phenotype may play a greater role in pediatric cancers. Indeed, our preliminary analyses of pediatric brain tumor RNAseq datasets revealed that >90% of pediatric brainstem gliomas displayed GCT, although it was somewhat different from the adult phenotype. Now, with the availability of TARGET data, it will be possible to test whether GCT plays a role in pediatric AML and other pediatric cancers. This analysis can only be performed with the TARGET data, as this is the only comprehensive genomic resource for pediatric cancers. Moreover, if our hypothesis is true, this will be a major breakthrough in pediatric cancer research, with important implications in the clinic.
Later, we will extend this analysis to the pediatric cancer cell lines and xenograft models from TARGET to identify a possible experimental model of the pediatric GCT phenotype. For this project, we need access to the primary sequence data (BAM/FASTQ files for RNAseq) from TARGET, as we will be analyzing non-canonical, alternative, mostly cryptic expression patterns in pediatric AML and other cancers. Koster, Jan ACADEMICAL MEDICAL CENTER Analysis of Genes Frequently Affected in Relapses in Neuroblastoma Aug07, 2014 closed Neuroblastoma is a pediatric cancer that often has a lethal outcome, especially for patients diagnosed with high-stage disease. Patients typically succumb to recurrence of the disease (relapse). Biopsies of relapsed tumors have been performed only on very rare occasions. Around the globe, independent research groups have tried to assess which tumor-specific changes in the genetic material occur between the primary tumor and relapse. Because such samples are rare, drawing conclusions from the separate studies remains a challenge. In this proposal, 3 different research groups are teaming up to create the largest set of relapsed neuroblastoma samples to date. With the increased sample size, we are confident that conclusions will be substantiated. Relapse material from neuroblastoma patients has been collected only in exceptional cases over the last decade. With the advent of whole genome sequencing, several groups have started to explore the changes occurring in relapsed neuroblastoma samples. However, none of the groups has a data set of sufficient size, owing to the rarity of relapse biopsies. In an ongoing collaboration between the groups of John Maris (TARGET representative, CHOP, USA), Olivier Delattre (Curie, France) and Rogier Versteeg (AMC, Netherlands), we intend to combine our precious samples to create what is, to our knowledge, the largest neuroblastoma relapse cohort.
We request access to the TARGET Complete Genomics ASM data for the 9 neuroblastoma trios in which normal, primary tumor, and relapsed material were all whole genome sequenced (WGS). The data will be used to harmonize analyses of recurrently affected genes, as assessed by somatic mutations as well as somatic structural variations. The findings from the 3 parties involved in the collaboration will then be combined for publication. Kozlowski, Piotr INSTYTUT CHEMII BIOORGANICZNEJ Analysis of somatic mutations in miRNA and miRNA-biogenesis genes in cancer Dec20, 2022 approved A growing body of evidence indicates that miRNAs may be a class of genetic elements that can either drive or suppress oncogenesis. It has been well documented that miRNAs downregulate numerous genes and either stimulate or inhibit many important biological processes and diseases, including cancer. Therefore, the primary aim of our project is the whole genome analysis and identification of important oncogenic mutations in miRNA and miRNA-biogenesis genes (miRNome) in a large panel of cancer samples, with special attention to lung cancer. We will use next-generation sequencing together with the miRNome enrichment platform developed in our laboratory to identify mutations present in the analyzed cancer samples. We believe that the identified mutations will include important oncogenic driver mutations, biomarkers, or targets for cancer therapies that are invisible to currently used approaches, and that the development and introduction of new tailored cancer therapies may be a far-reaching consequence of our project. Short (~21 nt) single-stranded noncoding RNAs known as microRNAs (miRNAs) are now considered a class of genetic elements that can either drive or suppress oncogenesis. They act predominantly by binding to the 3’ UTR sequence of target mRNAs.
It can be assumed that mutations in a miRNA or its precursor sequence may alter the folding of the miRNA precursor structure, the efficiency and specificity of miRNA biogenesis, and the sequence of the miRNA itself, thus altering the spectrum of target mRNAs. Somatic mutations may both disrupt naturally occurring miRNAs and create new miRNAs that recognize targets completely different from the original ones, which may result in cancer development associated with gain or loss of function of a specific MIRNA gene. The most common form of polymorphism affecting miRNA function is the single nucleotide polymorphism (SNP). Although germline polymorphisms in miRNAs continue to be identified, the extent and occurrence of somatic mutations affecting miRNAs in cancer are practically unknown, mostly because of a lack of appropriate experimental approaches. To date, miRNAs have been studied mostly in terms of changes in expression; however, analysis focused on somatic mutations in miRNA sequences is of great interest. Therefore, in our project, we would like to perform whole genome analysis leading to the identification of important oncogenic mutations in miRNA and miRNA-biogenesis genes. We would like to analyze datasets gathered in TCGA in parallel with our ongoing experimental project, in which we use NGS technology to sequence approximately 400 samples of primary cancers and cancer cell lines enriched for miRNA and miRNA-biogenesis gene sequences. We plan to use whole-genome sequencing TCGA data, originating mostly but not exclusively from lung cancer, to investigate the occurrence of somatic mutations within the miRNome [annotated genomic sequences of pre-miRNAs (~2000) and exons of miRNA-biogenesis genes (~20)]. Later, we will compare the occurrence of mutations with the expression of the affected miRNAs.
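The Kozlowski request plans to count somatic mutations falling within the miRNome (pre-miRNA sequences and exons of miRNA-biogenesis genes). The Python sketch below shows that overlap step under stated assumptions: element coordinates are 1-based and inclusive (the GFF3 convention), variants are given as (chromosome, position) pairs on the same genome build as the annotation, and the element names used in the example are hypothetical placeholders rather than real miRBase identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Element(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def mutations_per_element(
    elements: Iterable[Element], variants: Iterable[Tuple[str, int]]
) -> Dict[str, int]:
    """Count somatic variants (chrom, 1-based position) that fall inside each element."""
    by_chrom: Dict[str, List[Element]] = defaultdict(list)
    for element in elements:
        by_chrom[element.chrom].append(element)
    counts: Dict[str, int] = defaultdict(int)
    for chrom, pos in variants:
        # Linear scan per chromosome; adequate for ~2,000 pre-miRNAs and ~20 genes' exons.
        for element in by_chrom.get(chrom, []):
            if element.start <= pos <= element.end:
                counts[element.name] += 1
    return dict(counts)
```

Elements with no overlapping variants are simply absent from the result. For much larger annotation sets, an interval tree or a sorted sweep would replace the linear scan.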
KULESA, PAUL STOWERS INSTITUTE FOR MEDICAL RESEARCH Logic-based modeling to predict Neuroblastoma disease outcome Aug05, 2020 closed Neuroblastoma is a cancer of developing nerve cells (neurogenesis) that occurs most often in infants and young children. Current methods to predict disease outcome and guide treatment are outdated and inaccurate, often leading to excessive radiation and chemotherapy that affect normal child development. In this study, we propose to explore a novel approach to increase the accuracy of neuroblastoma prognosis and treatment based on our recent identification of a specific signaling pathway in the brain during neurogenesis, from which neuroblastoma derives. We hypothesize that the dynamics of developmental genes implicated in neuroblastoma can be predictive of the disease. We will develop and analyze a computational logic model based on 7 developmental genes. We will then test our model’s capability to predict the disease outcome of 26 human neuroblastoma cell lines and of patient-derived data. At the conclusion of our study, we will better understand how developmental signaling dynamics contribute to neuroblastoma disease progression, and we will provide a more refined predictive tool to guide a personalized and targeted treatment strategy. Failure of the proper migration and differentiation of normal embryonic neural crest cells into sympathetic neurons is a major driver of childhood neuroblastoma. Accurate prediction of disease progression and outcome could lead to life-saving treatment strategies and reduce the excessive radiation and chemotherapy treatments that often have long-lasting effects on child development. However, current prognostic tools, which rely on high-risk gene lists derived from large population-level rather than individual patient data (Hallett et al., 2016) and on image-defined data that guide surgical interventions, have been inaccurate.
This is primarily due to a lack of understanding of the cellular and molecular dynamics of neuroblastoma and an inability to refine the subclasses of high-risk individuals. To overcome these challenges, we developed a more clinically relevant approach to predict neuroblastoma disease outcome through the construction and simulation of a logic-based computational model with a 6-gene input signature of developmental genes (Figure 1A). We hypothesized that receptor tyrosine kinase signaling in sympathoadrenal development is a critical predictor of neuroblastoma disease progression, based on our experimental results and deep knowledge of sympathetic ganglia formation (Kasemeier-Kulesa et al., 2005, 2006, 2010, 2015). We narrowed our focus to the signaling networks of receptor tyrosine kinase A (trkA) and its ligand nerve growth factor (NGF), trkB and its ligand brain-derived neurotrophic factor (BDNF), and anaplastic lymphoma kinase (Alk) and its ligand midkine (MDK) as the inputs for our logic-based model. Simulations of these signaling pathways yield model outputs of tumorigenic cell behaviors (proliferation, apoptosis, angiogenesis, and differentiation) that predict a favorable or unfavorable disease outcome. Thus, this innovative approach uses individual patient data to provide mechanistic insights into disease progression and outcome, and it addresses a major unmet need in neuroblastoma research. Our computational neuroblastoma model is the first mechanism-based predictive tool and is a more accurate predictor of early age/stage neuroblastoma disease outcome than any current method (Kasemeier-Kulesa et al., 2018). Using the 6-gene input signature curated from 77 human patient samples (Wang et al., 2006), our model achieved 91% accuracy in predicting disease outcome for early age/stage patients (Kasemeier-Kulesa et al., 2018).
To explain this, we suggest that neuroblastoma in younger patients has a stronger link to developmental signaling pathways than in older patients (55% prediction accuracy at >4 years of age). We now propose to refine the model to include a 7-gene input and to determine what allows the model to be so accurate in predicting disease outcome. Specifically, we plan to: (1) revise the model to remove redundant connections and include MYCN as a gene input; (2) increase the specificity of the model to predict disease outcome for a broader class of patients based on age, stage, and chromosome abnormalities. Access to dbGaP datasets will allow us to extract individual patient 7-gene input signatures and test our computational neuroblastoma model’s ability to predict a more accurate and personalized disease outcome. Model predictions will yield a stratified classification of human neuroblastoma patient disease risk and identify critical network interactions that are targetable for a more informed therapeutic strategy. We plan to use the dbGaP datasets separately and analyze each independently. Kulozik, Andreas GERMAN CANCER RESEARCH CENTER Project-I-OS May19, 2022 approved Osteosarcoma (OS) is a very aggressive bone cancer that occurs mostly in children and young adults. Unfortunately, treatment options for OS patients have barely improved over the past decades. Proteins are the active components of human cells. In our study, we want to investigate altered proteins in OS patients compared to healthy bone tissue. Therefore, we will identify and quantify proteins in OS-derived samples with a highly sensitive technique called mass spectrometry. Proteins also interact with other proteins depending on their shape and surface structure. Hence, another goal of our study is to investigate altered interactions between different proteins in OS compared to healthy bone tissue.
As the shape and activity of proteins are determined to a high degree by the genes that contain the “construction plan” for proteins, we want to analyse the genomics data from TARGET/dbGaP. Thereby, we hope to find common alterations in the genes of OS patients that contain the construction plan for proteins crucial to the development and progression of OS. We will then look into the interaction partners (other proteins) of these proteins, which might help us to improve the treatment of OS patients. Osteosarcoma (OS) is the most frequently observed primary malignant bone tumor; it originates from osteoblast cells or their precursors and is characterized by a high degree of disability and a high fatality rate. Importantly, treatment options for OS patients have not improved over the last decades. Therefore, our goal is to unravel the proteome and the protein interactions of the driver genes, and thereby the targetable landscape, using our high-throughput and highly sensitive MS-based proteomics and systems biology approaches. Further, we are planning to perform validation experiments using our established patient-derived xenograft osteosarcoma cells to discover dependencies in OS both in vitro and in vivo. Using the dataset, we would like to analyze common mutations in OS patients; for this, we are collecting all publicly available datasets, including TARGET (NIH), the St. Jude study, and INFORM (from our institute). The gathered insights will be used to identify frequently mutated driver and non-driver genes. To achieve this, we are planning to combine the genomics (and transcriptomics) data from the St. Jude Cloud and the TARGET (dbGaP) database. Depending on the data structure, the datasets will either be combined before analysis and processed together, or analyzed separately with the results summarized afterwards.
In particular, we aim to investigate the somatic mutations of osteosarcoma patients to identify the most frequently mutated genes, which will not create any additional risks to participants. Further, we are planning to correlate our proteomics data with available genomics and transcriptomics data to perform a multiomics analysis. Overall, these data will greatly help us understand the pathogenesis of osteosarcoma and promote further studies to unveil the proteogenomic landscape of osteosarcoma. KUNG, ANDREW SLOAN-KETTERING INST CAN RESEARCH Identification of non-genetically encoded cancer vulnerabilities Nov03, 2017 expired Current studies suggest that 25-40% of cancers have some DNA mutation that may be amenable to drug treatment. This means that a majority of patients do not have treatment options based on DNA sequencing. We are trying to develop methods that will allow us to identify drug treatment options based on sequencing the RNA in tumor cells. The goal is to broaden the number of patients to whom precision medicine approaches can be applied. We are interested in finding new therapeutic approaches for both adult and childhood cancers. We are developing integrative analysis methods to identify non-genetically encoded vulnerabilities in cancer cells. We will use the TCGA dataset to create a classifier and to build disease-specific signaling networks using systems biology approaches (e.g., as pioneered by Andrea Califano, Allen Tannenbaum, etc.). We will then project individual gene expression profiles into the model, both for classification purposes and to predict which signaling pathways are active in the individual sample. Having now established a framework for these studies using TCGA data (for adult cancers), we are expanding our research to pediatric cancers, which are our primary research focus (Dr. Kung is Chair of Pediatrics at MSKCC). We will use the TARGET dataset to develop models of pediatric sarcomas and pediatric leukemias.
The goal of these studies is to identify non-genetically encoded vulnerabilities that may represent new therapeutic opportunities for childhood cancers. Kuure, Satu UNIVERSITY OF HELSINKI Role of MAPK/ERK pathway in Wilms tumor Aug01, 2024 approved Wilms tumor is one of the most common kidney cancers in children. Its causes are largely unknown but are linked to disturbed kidney development. Wilms tumors resemble embryonic kidneys in structure and gene expression. Additionally, Wilms tumors are characterized by nephrogenic rests that contain remnants of embryonic tissue. Wilms tumor is linked to many genetic changes, including increased expression of IGF2, which leads to increased MAPK/ERK activity. MAPK/ERK activity plays an important role during kidney development, for example by affecting the differentiation of mesenchymal tissue, which is also found abnormally in Wilms tumor. To study the role of the MAPK/ERK pathway in Wilms tumors, we aim to compare the gene expression profiles of Wilms tumors (and their subtypes) to the gene expression profiles of MAPK/ERK-deficient mouse embryonic kidneys. Shared pathways could be those that play a role in Wilms tumorigenesis and involve MAPK/ERK. These pathways can be used to elucidate the biological and molecular mechanisms behind tumorigenesis. In addition, we aim to identify biomarkers that are specific to Wilms tumor subtypes and could contribute to diagnosis or prognosis. Defective cellular and molecular regulation of renal differentiation causes a variety of congenital kidney anomalies, including Wilms tumor. Kidney development ends in utero in humans and a couple of days postnatally in mice.
The embryonic kidney contains progenitor populations, such as nephron progenitors, that are permanently lost when kidney development ceases (Cullen-McEwen, Sutherland, and Black 2016; Li, Hohenstein, and Kuure 2021). The origin of Wilms tumor is poorly understood, but the tumors are characterized by precursor lesions called nephrogenic rests, which are abnormal remnants of mesenchymal tissue (Beckwith 1998). In addition, the histology and transcriptome of the tumors resemble those of embryonic kidneys, connecting Wilms tumor to disturbed kidney development (Coorens et al. 2019; Treger et al. 2019; Young et al. 2018). The genetic background of Wilms tumor is diverse, and genetic studies have identified biallelic expression of imprinted insulin-like growth factor 2 (IGF2) and the resulting increase in signaling as one of the most common alterations in Wilms tumor (Maschietto et al. 2014; Scott et al. 2012). This and the increased MAPK/ERK activity in Wilms tumor (Maschietto et al. 2014) suggest a role for the MAPK/ERK pathway in Wilms tumorigenesis. We have shown that the MAPK/ERK pathway plays an important role in renal progenitor stemness, self-renewal, and differentiation (Ihermann-Hella et al. 2018; Kurtzeborn et al. 2022). To study the role of the MAPK/ERK pathway in Wilms tumorigenesis, we aim to use controlled-access BAM files to obtain raw gene-expression counts for downstream differential gene expression analysis and subsequent gene set enrichment analysis (GSEA) and pathway analysis, using both the whole dataset and subtype-specific subsets. A similar analysis will be performed on our MAPK/ERK-deficient mouse embryonic kidney datasets (Kurtzeborn et al. 2022). The datasets will be analyzed separately, and no risk to patients is posed. Comparison of the transcriptional profiles could allow identification of molecular and biological pathways related to Wilms tumor and the MAPK/ERK pathway, helping to dissect the factors driving tumorigenesis.
In addition, based on the bioinformatics analysis results, we aim to find subtype-specific biomarkers. These biomarkers could have potential in diagnostics and in predicting prognosis. The results of the GSEA, pathway analysis, and differential gene expression analysis will be published, but no individual patient will be identifiable. Kyi, Cindy Win NIH Administrative Support for Human Cancer Models Initiative's Analysis Working Group Jul10, 2023 approved The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). HCMI’s repository includes patient-derived next-generation cancer models, case-associated tumors, and matched normal samples annotated with genomic and molecular data from rare adult and pediatric cancers. The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, to validate that the models preserve the biological characteristics of the parent tumors from which they were derived, and to show the scientific community how these models could be used in functional genomics research. The HCMI AWG intends to use the TCGA and TARGET datasets as reference datasets to characterize the HCMI models and associated tumors from adult and pediatric cases, respectively. The findings from the AWG would be valuable for the research community, as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. I will also be managing the analysis working groups for the CTSP KIRC and MILD projects, as well as serving as a backup manager for the ALCHEMIST and CTSP DLBCL projects. The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI).
The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, to validate that the models retain the biological characteristics of the parent tumors from which they were derived, and to show the scientific community how these models could be used in functional genomics research. To map the HCMI tumors and models onto a cancer genetic taxonomy, the group will use methods such as (1) the Celligner algorithm, to map tumors and models against reference datasets; (2) the OncoMatch approach, to analyze regulatory networks and model fidelity; and (3) placement of each tumor and model into the subtypes classified by the Tumor Molecular Pathology (TMP) working group. The group will also analyze copy numbers, mutations, mutation signatures, and structural variants from the whole-genome sequencing data of HCMI tumors and models and compare them against those of the reference datasets: TARGET (for pediatric cases) and TCGA (for adult cases). The findings from the AWG would be valuable for the research community, as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. This request is for access to the clinical and genomic data generated by the National Cancer Institute’s Human Cancer Models Initiative (HCMI), dbGaP Study Accession phs001486; The Cancer Genome Atlas (TCGA), dbGaP Study Accession phs000178; and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), dbGaP Study Accession phs000218.v25.p8. In addition, I will be managing the analysis working groups for the CTSP KIRC (phs001175.v2.p2) and MILD (phs002253) projects, as well as serving as a backup manager for the ALCHEMIST and CTSP DLBCL projects, and am therefore requesting access to these projects as well.
LaFramboise, Thomas CASE WESTERN RESERVE UNIVERSITY Role of mitochondrial DNA mutations in pediatric tumors Dec19, 2016 approved Mutations in DNA drive cancer in humans. In our project, we propose to examine mutations in a region of human DNA that has not been studied at great length. Our hypothesis is that this region may contain mutations that are specific to tumors in children. The project is designed to test the hypothesis that pediatric tumors have a high rate of mitochondrial DNA (mtDNA) mutations. This hypothesis will be tested by examining whole-genome and whole-exome sequencing data from the TARGET study and running the files through our own pipeline, which is specifically designed to extract mtDNA mutations. The mutational burden in tumors will be examined for correlations with tumor type and other clinical characteristics. Whole-genome and whole-exome data from adult cases will also be run through the same pipeline, and the results for pediatric and adult tumors will be compared. Furthermore, the patterns of mutations will be queried to determine whether there are candidate "driver" mutations. If the hypothesis is validated, this will show that mtDNA mutations are characteristic of some pediatric tumors, suggesting novel alternative treatment strategies. We intend to publish our results and also disseminate our findings via presentations at academic conferences. LaFramboise, Thomas CASE WESTERN RESERVE UNIVERSITY A genomic survey of allele-specific selection in tumor amplicons Oct14, 2020 approved The goal of this study is to determine whether certain inherited gene variants are advantageous to the cancer cell when they are further promoted via sporadic mutation in the tumor. We will accomplish this goal by analyzing data across hundreds of tumors.
We hypothesize that particular germline alleles within a somatic amplicon are positively selected for during tumor evolution and therefore achieve a higher allele frequency among amplified (versus non-amplified) chromosomes. We propose to query genomic data for somatic amplification across a large number of tumors. In regions that recurrently undergo allelic imbalance, we will adapt principles from the fields of population and statistical genetics to test the hypothesis. Specifically, we will examine the SNP allelic content of the promoted chromosomes and compare it to that of the non-promoted homolog. We hope that this will facilitate the discovery of specific regions under positive selection. Pinpointing candidate regions will suggest downstream functional assays, which can be pursued immediately. In pediatric tumors, we propose that the positive selection of germline alleles within somatic amplicons is particularly strong, since the contribution of inherited variants to childhood cancers is generally larger. As such, we also propose to analyze pediatric tumor data in the manner described above, which may lead to the identification of SNPs that predispose children to these early-onset cancers. Identified SNP alleles may serve as diagnostic or prognostic markers in pediatric cancer. We intend to broadly share any findings from our studies with the scientific community. Data from this dataset will only be used in research consistent with this study’s data use limitation and will not be combined with datasets of other phenotypes. Landsman, David NIH Elucidating the mechanisms of histone mutations in human cancers Dec23, 2020 closed Somatic histone mutations have been observed in various tumor types, and it is important to understand the mechanisms of histone mutations in cancer development.
In this proposed research project, we will collect data on histone genetic variants from various tumor types and develop new computational approaches to analyze them. We aim to identify the “key” histone mutations that drive cancer development and elucidate their effects on protein functions and relevant biological processes. Our study will also potentially identify novel prognostic markers for early cancer diagnostics and drug targets for developing more effective treatments. Somatic histone mutations have been observed in various tumor types, and their effects on epigenetic processes may drive oncogenesis. Here, we plan to investigate the mechanisms of histone cancer mutations with three primary objectives: 1) Identify histone cancer driver mutations. 2) Analyze the effects of histone cancer mutations on different epigenetic processes. 3) Identify potential prognostic markers and drug targets. The design of our study includes: 1) Data collection. We will collect and combine all the data on histone genetic variants and phenotypic characteristics from four studies (phs000178, phs000218, phs000748, phs001486). We will further integrate these data with other data from ICGC (https://icgc.org) to build a comprehensive dataset of histone genetic variants. All of these data will be analyzed together in our project. 2) Prediction and ranking of histone mutations with respect to their driver status. We will develop a pipeline that utilizes current state-of-the-art bioinformatics approaches to identify histone driver mutations from the collected data. 3) Analysis of the effects of histone driver mutations on epigenetic processes. The analysis plan includes: 1) Analyzing the effects of histone driver mutations on protein functions. We plan to use our recently constructed human histone interactome (Peng et al., Journal of Molecular Biology, 2020) to elucidate the effects of driver mutations on protein-protein interactions, protein-DNA interactions, and post-translational modifications.
This will help us understand how driver mutations disrupt protein functions and relevant biological processes. 2) Analyzing the association of histone driver mutations with patients’ clinical features, including survival rate and treatment. We will collect all the data only for General Research Use (phs000178 and phs001486 required) and will develop a new computational method to predict and understand histone driver mutations in various tumor types (phs000748 required). Since many histone mutations have also been observed in pediatric cancer, our study can potentially identify prognostic markers and discover novel drug targets for developing more effective treatments for pediatric cancer (phs000218 required). Thus, our proposed research is consistent with the data use limitations for all the requested datasets. Our collaborators include Dr. Anna Panchenko (Queen’s University) and Daniel Espiritu (Queen’s University). We will use the TARGET dataset to refine the pipeline for the benefit of pediatric cancer patients; the work will develop pediatric-specific biomarkers and compare them with adult biomarkers. We believe it will benefit pediatric cancer patients. LARGAESPADA, DAVID UNIVERSITY OF MINNESOTA Understanding the molecular basis of Osteosarcoma tumorigenesis Jan17, 2017 approved Osteosarcoma, a rare but deadly children’s cancer, is generally associated with massive genomic instability, which makes it difficult to identify driver genes. We aim to utilize the vast information stored in TARGET to confirm predicted oncogenes, tumor suppressors, and molecular pathways identified in our model-organism tumor models as molecular targets for therapy.
We will test this hypothesis and simultaneously conduct student-training exercises by re-examining raw TARGET RNA sequencing, methylation, mutation, and microRNA data and performing a broader analysis of perturbed genes. We will also attempt to correlate gene expression levels with other quantifiable genetic markers. The intended use is purely academic, and key findings will be submitted for peer review and ultimately publication. 1. Objective: Our project involves understanding the underlying molecular mechanisms of osteosarcoma to develop better therapeutics. To accomplish this, we are searching for molecular targets by identifying oncogenes and tumor suppressors and by evaluating the presence of novel transcripts in osteosarcoma. This involves searches in RNA-seq, mutation, copy number, and methylation data. We would like to search TARGET data to confirm predicted genes and pathways involved in osteosarcoma development for follow-up experiments and to identify molecular targets. We will test this hypothesis and simultaneously conduct student-training exercises by re-examining raw TARGET RNA sequencing data, re-determining the mutation calls, and performing a broader analysis of perturbed genes. We will also attempt to correlate expression levels with other quantifiable genetic markers. 2. Study Design: Upon approval, we will download the raw RNA sequencing data and analyze the osteosarcoma data. If available, RNA sequencing data from matched normal tissues will be analyzed in parallel. In addition to using preexisting software tools such as TopHat2, deFuse, and Trinity, we will use methods already developed in-house to identify gene sets involved in osteosarcoma development. 3. Analysis Plan: We are interested in gene sets and their correlating features. We would like to test our data analysis pipeline to determine the best candidate genes and gene sets (pathways) for follow-up laboratory experiments.
Appropriate statistical tests will be applied, such as two-tailed Student’s t-tests and Fisher’s exact tests for enrichment analyses, with corrections for multiple hypothesis testing. 4. Explanation of how the proposed research is consistent with the Use Restrictions for the requested dataset(s): We will not distribute the information to any party beyond our academic group. Our studies will not require subject identification (i.e., all original donors will remain anonymous). We will focus only on pediatric osteosarcoma, as stated in the Use Restrictions. The intended use is purely academic and predominantly for training purposes. Key findings will be submitted for peer-reviewed publication. Lau, Ching JACKSON LABORATORY Identification of clinically-associated genomic alterations in osteosarcoma Aug26, 2020 approved Despite significant prior research into the genomic landscape of osteosarcoma, a clinically actionable target for the development of therapeutics remains elusive. We would like to compare our analysis results from the TARGET osteosarcoma expression, copy number, and methylation array data with the DNA and RNA sequencing data, and add to our knowledge of the genomic context of these results. Additionally, we want to determine the immune cell populations present in the bulk data and examine their association with clinical outcomes in order to inform efforts to implement immunotherapy in osteosarcoma. Although our understanding of the genomic alterations in osteosarcoma continues to expand, we have not yet found clinically actionable targets that can be used to guide the development of potential therapeutics. Our goal is to utilize the TARGET DNA and RNA sequencing data (substudy phs000468.v19.p8) to better understand the complex genomic landscape of osteosarcoma in children.
We will compare the expression and copy number of candidate biomarkers found in the overlapping TARGET array data with the sequencing data, to verify those findings and to integrate DNA mutation and structural information with our internal pediatric osteosarcoma analysis results. To evaluate the susceptibility of osteosarcoma to immunotherapy, we will use modern methods to deconvolve the bulk expression data and quantitatively detect immune cell subtypes. We will combine the data with internally generated long-read sequencing data in order to validate discovered structural rearrangements between datasets. The datasets will be analyzed separately and the results compared, and there will be no additional risk to participants. Lau, Ching JACKSON LABORATORY Mechanisms, Genomic Risk Stratification and Precision Intervention for Acute Myeloid Leukemia in Children with Down syndrome (ML-DS) Jun30, 2022 approved We will create a new workflow to assess genetic variants and RNA-sequencing gene expression data of DS individuals, identifying molecular differences between DS individuals known to have had preleukemic transient events that progress to acute myeloid leukemia and those who do not, including completely normal DS individuals as a control. Children with constitutional trisomy 21 Down syndrome (DS) have a unique predisposition to develop myeloid leukemia of Down syndrome (ML-DS). This disorder is preceded by a transient neonatal preleukemic syndrome, referred to as transient abnormal myelopoiesis (TAM), which has been thought to be unique among clonal neoplastic disorders in its universal linkage with trisomy 21. Recent work has shown that these transient events also appear in trisomy 21 mosaic individuals and has highlighted the role of GATA1 mutations as a determinant of potential outcomes.
There is an unmet need to define molecular characteristics based upon both the GATA1 variant information and the gene expression profiles of blasts of TAM, ML-DS, and relapsed ML-DS, as well as normal T21 hematopoietic progenitor populations. We have assembled a multidisciplinary team to collaborate in a cloud-based environment, using and extending workflows and analyses in an open, transparent, and collaborative manner using the stated dbGaP datasets available on the INCLUDE and Kids First portals. We will explore the role of GATA1 mutations in hematopoiesis, including understanding normal T21 hematopoiesis and the transition from TAM to ML-DS and relapse in a subset (phs001657, phs000159, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218). We will also explore the gene expression profiles of the determined subpopulations based both upon the phenotype information of the groups (diagnosed ML-DS with and without evidence of TAM events) and upon relapsed ML-DS and normal T21 subpopulations (phs001657, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218). We will compare the identified genomic/expression features with diploid 21 AML cases to identify alterations specific to T21 TAM and ML-DS. In addition to contributing to primary data analysis, Dr. Deslattes Mays will provide oversight of reproducible pipeline development to ensure adherence to the F.A.I.R. Data Principles. Data will be combined with normal T21 controls sourced from the Linda Crnic Institute for Down Syndrome’s Human Trisome Project, facilitated by the Joaquin Espinosa lab. These data will be stored in a separate AWS bucket and merged for the purposes of comparison between disease-affected T21 patients and normal T21 subjects. The proposed project will study myeloid leukemia in pediatric T21 patients, with both computational methods and results shared with the broad scientific community, and we intend to publish any findings.
Therefore, the study falls within the data use limitations of the requested datasets phs001657, phs000159, phs000413, phs001027, and phs000218. Lee, Dung-Fang UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON Dissection of RNA splicing in RB1-mutant osteosarcomas Mar16, 2023 approved Osteosarcoma (OS) is a primary non-hematologic malignancy that frequently affects children and adolescents. Despite advancements in surgery and multi-agent chemotherapy, early detection of OS remains a challenge due to the lack of proper biomarkers and approaches. Consequently, the survival rates of affected patients have not significantly improved in the past four decades. Therefore, understanding the genetic alterations (such as RB1) associated with osteosarcoma and investigating the molecular mechanisms involved in osteosarcomagenesis are critical for effective disease management. We have previously shown that the abnormal upregulation of spliceosomal genes resulting from RB1 loss contributes to the initiation and progression of osteosarcoma [PMID: 35412907]. In this study, our aim is to identify splicing events associated with osteosarcoma in the absence of the RB1 tumor suppressor. Ultimately, we hope that this study will lead to the development of potential therapeutic strategies to target RB1-mutant osteosarcoma. Alternative RNA splicing is a biological process that allows for the expression of multiple RNA and protein isoforms from a single gene, contributing to transcript variation and proteome diversity in eukaryotes. Cancer cells exploit this process to express unique cancer-specific splicing isoforms, which can drive cancer progression or contribute to specific oncogenic features. The objective of this study is to identify novel splicing variants in the presence of RB1 mutation in osteosarcomas.
Our recent work demonstrated abnormal upregulation of spliceosomal genes resulting from RB1 loss and showed that RB1-mutant osteosarcoma is selectively sensitive to spliceosome inhibitors [PMID: 35412907]. Using RB1-mutant osteoblasts and RB1-deficient osteosarcoma lines, we have identified various splicing variants that are altered depending on RB1 status. In this study, we aim to identify splicing events associated with osteosarcoma in the absence of the RB1 tumor suppressor using the TARGET OS datasets. We plan to analyze transcriptome and whole exome sequencing data from the TARGET OS dataset to correlate RB1 mutation status with distinct splicing events. We will focus on the functions of splicing variants related to immune response and cancer signaling pathways. Ultimately, we hope that this study will lead to the development of potential therapeutic strategies targeting RB1-mutant osteosarcoma. We will also fully comply with the data usage terms and limitations set by NIH for the dbGaP TARGET project. Lee, Joo Sang SUNGKYUNKWAN UNIVERSITY RES/BUS FDN Targeting gene fusions in pediatric cancer via synthetic lethality Nov24, 2023 approved Gene fusion events have been a topic of significant interest in cancer research, particularly since the discovery of the BCR-ABL fusion gene in chronic myeloid leukemia patients. A recent report from St. Jude on the systematic repertoire of oncogenic fusions in pediatric cancer, covering more than 5000 children, suggests potential opportunities to target gene fusions for pediatric cancer treatment. However, some gene fusions are not druggable, so our research aims to target them indirectly by decoding the underlying cellular wiring and clinical phenotypes associated with fusion genes via multi-omics profiles of pediatric cancer patients. To perform this analysis, we require access to level 1 (raw) multi-omics data of the participants in TARGET and other studies. 
This work will address potential therapeutic targets for pediatric cancers driven by gene fusions. Since the discovery of BCR-ABL in chronic myeloid leukemia patients, gene fusions have gained significance as potential diagnostic markers and therapeutic targets. A recent report from St. Jude on the systematic repertoire of oncogenic fusions in pediatric cancer, covering more than 5000 children, suggests potential opportunities to target gene fusions for pediatric cancer treatment (Liu et al. Nat Commun 2023). However, many gene fusions are not directly druggable. Our research aims to decode the underlying cellular wiring from the perspective of synthetic lethality and to identify potential therapeutic targets for pediatric cancers driven by fusion genes. To identify potential targets in pediatric cancers driven by fusion events, we require access to level 1 (raw) multi-omics data from the TARGET participants (phs000218.v22.p8). For comparative analysis with adult tumors and healthy normal cells, we request access to TCGA (phs000178.v11.p8), CPTAC (phs001287.v17.p6), the CGCI project (phs000235.v20.p6) and GTEx (phs000424.v9.p2). The multi-omics data include whole genome genotyping, whole genome sequencing, whole exome sequencing, and RNA sequencing. Using these data files, (1) we will derive robust gene fusion calls by integrating multiple distinct fusion detection tools on the raw RNA sequencing files, with additional fusion detection tools applied to the WGS files, and we will prioritize fusions recurrently reported in (Liu et al. Nat Commun 2023). (2) We will then analyze genetic, proteomic, transcriptomic and epigenetic patterns, as well as phenotypic characteristics, that differ depending on the presence of selected gene fusions. (3) We will also test such interactions in cell lines annotated with these fusion genes. 
We have a track record of inferring synthetic lethality from multi-omics data for precision oncology (Lee et al. Cell 2021, Feng et al. Cancer Cell 2019, Lee et al. Nat Commun 2018) and aim to apply this established approach to infer synthetic lethal partners of gene fusion events in pediatric cancer. Lee, Semin UNIST Identification of novel DNA damage-related key gene in acute lymphoblastic leukemia using multifaceted analysis methods. Apr27, 2023 approved We propose to identify DNA damage-related genes associated with acute lymphoblastic leukemia (ALL) through bioinformatic analysis and experimental validation. We select candidate genes that differ significantly between the ALL group and the healthy group using two open datasets generated with different sequencing methods. We then plan to confirm the correlation between genomic instability and candidate gene expression levels using genomic data available in dbGaP. The results of our study will provide a blueprint for follow-up research toward future ALL treatment and drug development. The objective of this project is to identify DNA damage-related genes that affect the onset and progression of ALL and to establish their role as new therapeutic markers. We select and verify key gene candidates by combining bioinformatic analysis and experiments. First, single-cell RNA-seq and bulk RNA-seq data are acquired from the GEO database, and genes whose expression differs significantly between the ALL group and the healthy group are identified bioinformatically. Candidate key genes common to both datasets are then validated experimentally. 
To investigate whether deficiency of the candidate genes affects genomic instability, a hallmark and driver of tumorigenesis, we plan to calculate known genomic scar signatures (telomeric allelic imbalance, loss of heterozygosity (LOH), large-scale state transitions, and the weighted genome instability index) using the dbGaP dataset and to identify correlations between chromosomal instability levels and candidate gene expression. TARGET ALL data will be used for the following purposes: 1. comparison of candidate key gene expression levels between the ALL group and the normal group using RNA-seq data; 2. calculation of genomic scar signature scores using SNP6 genotyping data; 3. confirmation of the correlation between candidate gene expression levels and genomic scar signature scores. Each dataset will be analyzed independently, and the analysis is not likely to pose additional risk to participants. LEFEBVRE, Celine INSTITUT DE RECHERCHES SERVIER Target identification and characterization for the development of novel therapeutic molecules in pediatric cancers Mar25, 2021 rejected This project fits within the context of developing therapeutic molecules for the treatment of pediatric cancers. To support target identification and characterization for pediatric cancer research programs at Institut de Recherches Servier, we analyze genomics data to characterize genomic alterations such as gene fusions and combine them with gene expression levels and molecular signatures to better understand the disease and identify new treatment strategies. To support target and biomarker validation for research and translational research programs at Institut de Recherches Servier, we interrogate genomics datasets from large pediatric cancer patient cohorts. Analysis of these datasets will allow the identification and validation of new targets, as well as of the cancer subtypes most likely to respond to a therapeutic modality, and therefore of the patients who may benefit from a given therapy. 
In this project, we aim to 1) identify and better characterize fusion transcripts from raw sequencing (RNA-seq) data and 2) combine gene alterations, gene expression levels and molecular signatures to better understand pediatric cancers and predict which disease subtypes would benefit most from a single or combination therapy. Lennartsson, Andreas KAROLINSKA INSTITUTE Transcriptional regulation of acute pediatric leukemia Aug27, 2018 closed The most common mutations in pediatric acute leukemia perturb the output of our genome (the transcriptome), either directly or indirectly, via a specific type of regulatory RNA molecule called long non-coding RNA (lncRNA) or via epigenetic mechanisms. We will use the unique TARGET patient cohort to dissect the molecular mechanisms that contribute to a leukemic transcriptome. In addition to analyzing different pediatric acute leukemias, we will compare adult and pediatric leukemia to understand the molecular differences between them, so that we can suggest better and safer cross-usage of drugs. Research progress: We have analyzed the expression of lncRNAs in the pediatric cohort and identified several lncRNAs with prognostic value. We are further analyzing their function and interaction with the mRNA transcriptome. Each year approximately 300 children are diagnosed with cancer in Sweden. Acute myeloid leukemia (AML) and acute lymphoid leukemia (ALL) together account for 25% of the diagnosed cases. The most common diagnosis is ALL, and major progress has been made in its treatment. However, AML still has a strikingly poor outcome: only half of patients are cured. A better understanding of the molecular mechanisms is therefore needed to develop new, more efficient and targeted drugs. Research objectives: We aim to use the TARGET dataset for pediatric ALL and AML to increase our understanding of how epigenomic and transcriptomic networks interact to create a leukemic transcriptome. 
The main focus is on A) leukemia-specific expression of long non-coding RNA (lncRNA); B) how aberrant enhancer activation gives rise to a leukemic transcriptome; and C) how transcriptomic regulation differs between pediatric and adult acute leukemia. Study design and analysis plan: A) lncRNA expression will be characterized in both the AML and the ALL cohorts. lncRNA expression will be correlated with the karyotype and mutation status of the patients to identify novel pathways that contribute to leukemogenesis. B) Aberrant DNA methylation at enhancers will be identified in the TARGET cohort, as well as in our own and other publicly available datasets (PMID: 24063430). Enhancer activity will be correlated with promoter activity in the TARGET dataset using the enhancer-promoter pairings we previously made within the FANTOM5 consortium (Andersson R et al. Nature 2014). C) The results from parts A and B will be compared with adult AML and ALL cohorts. Research progress: We have analyzed the expression of lncRNAs in the pediatric cohort and identified several lncRNAs with prognostic value. We are further analyzing their function and interaction with the mRNA transcriptome. Leslie, Stephen UNIVERSITY OF MELBOURNE Loss-of-function genetic variants in pediatric neuroblastoma Mar14, 2017 expired The aim of our study is to identify rare loss-of-function (LoF) genetic variants associated with childhood neuroblastoma. Our particular focus is LoF variants in redundant genes, which may confer a protective effect against neuroblastoma; their discovery could point the way to improved neuroblastoma therapies. Genome sequencing studies have shown that all human genomes contain genetic variants that cause loss of function (LoF) of protein-coding genes (so-called human knockouts). A typical human genome carries about 100 LoF variants, with about 20 genes completely inactivated. 
LoF variants in healthy individuals may be severe recessive disease alleles in the heterozygous state, less deleterious disease alleles that still affect disease risk, benign LoF variants in redundant genes, or variants that do not seriously disrupt gene function. In a small number of cases, LoF of a redundant gene has been found to confer a health benefit. For example, knockout of PCSK9 lowers LDL cholesterol and reduces heart disease risk. The objective of our study is to determine whether LoF variants are associated with pediatric neuroblastoma and, in particular, whether there is evidence of LoF variants that protect against pediatric neuroblastoma. We wish to use the publicly available GWAS data (phs000124.v2.p1) and sequencing data (phs000868.v1.p1, and phs000467.v13.p6, a substudy of the TARGET study phs000218.v16.p6). We request MYCN amplification status phenotype data in order to conduct an association analysis restricted to pediatric patients with MYCN amplification. We will perform genotype imputation of LoF variants in the neuroblastoma samples and conduct rare variant association tests with appropriately selected control samples. These data will supplement our other samples of childhood neuroblastoma cases. Li, Dawei UNIVERSITY OF VERMONT & ST AGRIC COLLEGE TARGET sequencing analysis Aug18, 2017 closed We aim to learn more about the genetic mechanisms underlying childhood cancers by analyzing the existing sequencing data from the TARGET project. Our research aim is to identify the causes of childhood cancers, which can only be achieved by analyzing pediatric cancer genome data. In this proposed project, we will use a method developed in our laboratory to analyze the TARGET sequence data for viral sequences. Our goal is to discover subsets of childhood cancer patients whose cancers have viral causes. We therefore plan to analyze all the samples included in the TARGET project. 
Completion of this project will give us a better understanding of the causes of childhood cancers. Proving cases of viral causation will lead to the development of cancer prevention and individualized treatment. We aim to learn more about the genetic mechanisms underlying childhood cancers by analyzing the existing sequencing data from the TARGET project. Our research aim is to identify the causes of childhood cancers, particularly viral etiology in children, which can only be achieved by analyzing pediatric cancer genome data. We have developed a new method to identify viral sequences in cancer genome sequencing data (paper recently published). In this proposed project, we will use our method to analyze the TARGET sequence data for viral sequences. Our goal is to discover subsets of childhood cancer patients with potential viral causes. We therefore plan to analyze the samples included in the TARGET project. Completion of this project will give us a better understanding of the causes of childhood cancers. Proving viral etiology will lead to the development of cancer prevention and individualized treatment. Li, Lang OHIO STATE UNIVERSITY systems biology of osteosarcoma Jul17, 2019 expired We want to use the osteosarcoma cancer genome and transcriptome data available in dbGaP to study the molecular mechanisms of copy number alterations in osteosarcoma genomes, to investigate the genetic causes of osteosarcoma, and to identify new druggable genes involved in associated pathways. Osteosarcoma (OS) is an aggressive bone tumor that preferentially develops in adolescents, and it is the second leading cause of cancer-related deaths in adolescents. The tumor is characterized by an abundance of chromosomal aberrations, including amplifications, deletions, translocations and overall aneuploidy. Our research aims to elucidate the molecular mechanisms of copy number alterations (CNAs) in osteosarcoma (OS). 
Research questions we plan to address include: 1) the landscape of CNAs in OS tumors, including amplified and deleted genes, their associated pathways, and druggable targets in these pathways; 2) the co-occurrence of CNAs with other genomic features, such as insertions, deletions, and fusion genes; 3) the associations between CNAs and gene expression and DNA methylation; and 4) the associations between CNAs and patient survival. Sequence information for OS patients is limited due to the rarity of OS. We therefore request access to two OS datasets: phs000468 and phs000699. With access to OS patients' whole genome data, we will perform genome-wide CNA analysis of OS tumors and identify associated pathways and potential drug targets. We will then integrate information from Affy Exon ST, Affy SNP 6.0, whole genome, exome, and RNA-seq data to study CNA-associated events. DNA methylation and fusion genes will also be investigated. To enlarge the sample size, this study will include data from the Gene Expression Omnibus (GEO) database; specifically, we will obtain CNA information (Affy 6.0) for 32 additional patients from GSE33153. Additionally, OS cell line data from CCLE will be analyzed separately. Comparing CNA analyses between TARGET patients and CCLE cell lines will help us assess the consistency between clinical and experimental data. Our findings will expand our understanding of OS and provide valuable insight for new therapy development. We will follow the terms of the model Data Use Certification and the TARGET dataset use limitations. Li, Shuaicheng CITY UNIVERSITY OF HONG KONG Complex structure variation analysis in childhood cancer Apr10, 2019 closed Complex structural variation is prevalent in cancer patients and has not yet been well studied. 
To investigate the details of complex structural variation patterns in childhood cancers, it is important to recover what the local genome looks like and to understand the mechanism behind the integration process. We have developed an algorithm to automatically analyze the genomic map around structural variation breakpoints with the help of adjacent copy number variants. With the aid of the TARGET dataset, we are able to construct the full landscape of complex structural variation, detect complex SV patterns, and decipher prognostic SV signatures. This work can potentially be applied to the diagnosis of childhood cancers and the study of related tumorigenesis. Genome instability in childhood cancers has been discussed for decades. However, the full landscape of complex structural variation still needs to be investigated. It remains unclear whether complex structural variations form at a single time or accumulate during cancer development. If they form at a single time, similar to classical 'chromothripsis', how does the process operate? If instead they accumulate over a long time, they may be relevant to cancer evolution; can we model this process together with mutations along the cancer genome? Are there common SV patterns between different cancer types? Is there an SV signature associated with patients' overall survival? We have developed an algorithm to analyze the full landscape of complex structural variation. It provides clear insights into cis-regulation and the mechanisms of complex structural variation and how genome instability forms. We would like to request the dataset “National Cancer Institute (NCI) TARGET: Therapeutically Applicable Research to Generate Effective Treatments” (phs000218.v21.p7). TARGET contains 5 childhood cancer types. We would like to identify structural variation breakpoints and the adjacent CNVs to construct the full landscape of complex structural variation and the related regulatory mechanisms. 
We would like to further investigate prognosis-associated SV signatures among the 5 cancer types. We do not have any plans to develop a commercial product or service or file Intellectual Property (IP) based on the findings from this proposed research. Our proposed findings are solely for the mechanistic study of complex structural variation and are not expected to result in any commercialized product or service. Our plan will not change regarding our intention not to seek IP or commercialization. We agree to inform the NIH if our plans for IP or commercialization change. Findings from the proposed research will be published in academic journals. Li, Wang ESSENTIAL SOFTWARE, INC. TARGET-DGC Data Migration May24, 2016 approved The main purpose of accessing TARGET data is QC/QA of the data upload and download process to and from the Genomic Data Commons (GDC). ESI is under contract from Leidos to assist with independent QC/QA. ESI's QA team verifies the availability and integrity of TARGET data in the GDC. ESI engineers will download a selected set of clinical, sample, and molecular data from the TARGET data matrix (https://target.nci.nih.gov/dataMatrix/) and compare the original TARGET data with the data the GDC acquires to make sure the GDC processes TARGET data properly. Comparison will include verifying that MD5 checksums and file sizes are consistent after data import into the GDC and that the number of files imported matches the number of TARGET data files. Once the QC/QA tasks are complete, all downloaded data will be deleted immediately in a manner that prevents recovery from the computer. The main purpose of accessing TARGET data is QC/QA of the data upload and download process to and from the Genomic Data Commons (GDC). ESI is under contract from Leidos to assist with independent QC/QA. ESI's QA team verifies the availability and integrity of TARGET data in the GDC. 
ESI engineers will download a selected set of clinical, sample, and molecular data from the TARGET data matrix (https://target.nci.nih.gov/dataMatrix/) and compare the original TARGET data with the data the GDC acquires to make sure the GDC processes TARGET data properly. Comparison will include verifying that MD5 checksums and file sizes are consistent after data import into the GDC and that the number of files imported matches the number of TARGET data files. Once the QC/QA tasks are complete, all downloaded data will be deleted immediately in a manner that prevents recovery from the computer. Over the past year, the ESI team has used its access to TARGET controlled-access data to provide thorough quality assurance for the import of TARGET data into the Genomic Data Commons (GDC) and for download access via the GDC portal and download tools such as the Data Transfer Tool (DTT). When new TARGET projects are imported into the GDC portal, ESI performs data validation testing that requires downloading and opening both controlled and open files to fully verify the TARGET data imports. DTT testing requires confirming that varying data from each project can be downloaded without corruption by authorized users. This includes positive and negative testing of user tokens. In addition, data downloading is performed for each project during regression testing with each new release. Whenever testing requires downloading controlled data, the downloaded files are deleted shortly afterward to preserve data security. Li, Wenbo UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON Examine the landscapes of noncoding RNAs in childhood tumors Apr13, 2023 approved A large part of the human genome is transcribed into RNA molecules, and most of these do not code for proteins. They are called non-protein-coding RNAs, or ncRNAs for short. ncRNAs have been extensively characterized in various adult tumors but less so in pediatric tumors. 
We have recently examined noncoding RNAs deregulated in human adult tumors (Zhang et al., Nature Communications, 2019). The current research project intends to use published large datasets of pediatric cancer patients to study how ncRNAs may be deregulated in pediatric tumors. We aim to further understand whether deregulated ncRNAs play key roles in gene deregulation and aberrant tumor growth in pediatric tumors. Our work is expected to contribute novel diagnostic or therapeutic targets for human pediatric tumors. Alterations in protein-coding genes have been extensively characterized in various pediatric tumors. However, dysregulation of the noncoding part of the human genome associated with pediatric tumors remains poorly understood, particularly in terms of its underlying mechanisms. We have recently examined noncoding RNAs deregulated in human adult tumors (Zhang et al., Nature Communications, 2019), but the landscapes and roles of noncoding RNAs in pediatric tumors have not been explored. The current research project intends to use published large sequencing datasets of pediatric cancer patients or healthy individuals to decipher how ncRNAs from noncoding regions, particularly enhancer and promoter regions, may be deregulated in tumors and may contribute to altered gene transcription, and therefore to disease etiology or genotype variation. The datasets we intend to analyze include tumor and non-tumor samples from pediatric tumors, such as neuroblastoma (NB) and kidney rhabdoid tumor (RT), particularly from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Our analyses will use RNA-seq, whole genome sequencing (WGS) and epigenomic datasets (such as ChIP-seq of histone modifications or ATAC-seq from other sources). 
Our results will not only provide a complete landscape of ncRNAs in pediatric tumors but also offer mechanistic insights into the functions of noncoding genome elements. We expect that additional experimental work will potentially offer clinically relevant means to detect and therapeutically target noncoding elements, enabling precision therapy for human pediatric tumors. Liang, Chengyu WISTAR INSTITUTE Exploring oncogenic mechanism and therapeutic target of pediatric acute lymphoblastic leukemia Jun03, 2021 approved T-cell acute lymphoblastic leukemia (T-ALL), a devastating form of childhood malignancy, represents nearly one third of all pediatric cancers. It is characterized by infiltration of the bone marrow by immature lymphoblasts expressing T-cell immunophenotypic markers. Although the specific biological and molecular mechanisms underlying the aggressiveness and poor therapy response of T-ALL remain to be elucidated, accumulating evidence strongly indicates that T-ALL is primarily caused by aberrant activation of Notch1 signaling. It is thus very important to understand the molecular pathways that control Notch1 activity and to identify the key players involved, ultimately leading to better treatment strategies. To meet this challenge, the goal of this study is to identify a new regulator that calibrates activation of this crucial pathway and to open the way for targeting the unconventional endo-lysosomal pathway in pediatric T-ALL. Acute lymphoblastic leukemia (ALL), the most common malignancy in children, arises from the transformation of blood (hematopoietic) stem cells and progenitors. The damaged cells produced in the bone marrow crowd out normal cells and also spread to other organs. Although outcomes for ALL patients have improved in recent years, one in five children with T-ALL do not survive due to either unresponsive or relapsed disease. 
A better understanding of the molecular basis of this disease, fresh ideas, and new targeted therapies are therefore urgently needed. Recent studies have demonstrated that approximately 60% of pediatric T-cell ALL (T-ALL) cases show aberrant activation of the Notch1 receptor. However, clinical application of Notch1-targeted therapy, which inhibits its proteolytic cleavage by gamma-secretase, has been unsuccessful because of dose-limiting toxicity. Overcoming these difficulties will require an improved understanding of the oncogenic mechanisms controlled by Notch1 and a better appreciation of the genes and pathways that regulate Notch-driven pediatric leukemogenesis as potential targets of T-ALL therapy. To meet this challenge, the goal of this study is to identify new molecular pathway(s) that control Notch1 actions under leukemic conditions. This goal provides a critical trajectory for the development of novel prognostic approaches and new therapeutic strategies for this aggressive pediatric lymphoid malignancy. To achieve this goal, we are applying for access to NCI TARGET. The datasets will be analyzed independently, will not be combined with other datasets outside of dbGaP, and will be explored within the primary study scope. We have discovered that pediatric ALL-associated Notch1 is degraded through a previously unknown endo-lysosomal pathway. This valuable resource will help us determine whether the identified mechanism is indeed an oncogenic driver in pediatric ALL patients and provide proof of concept for new targeted therapy in this childhood cancer. We posit that disruption of this regulatory module impacts T-cell homeostasis and contributes to ALL; it not only serves as a novel biomarker to evaluate Notch1-associated pediatric ALL progression but also facilitates personalized treatment of pediatric T-ALL. 
Liang, Chun MIAMI UNIVERSITY OXFORD Detection of inverted repeats and exploration of their potential roles in thrombogenesis among different cancer genomes. Nov16, 2023 approved It is well known that repetitive sequences are linked to many human diseases. In cancer genomes, inverted repeats are abundant at common fragile sites that often exhibit gaps and breaks. As the first genome-wide comparative study to explore the interaction among inverted repeats, DNA methylation and gene expression regulation in human osteosarcoma genomes, this project will yield valuable insights into the connection between human diseases and repetitive sequences, including transposons. The ultimate goal of this project is to further our understanding of the interconnection among the repeatome, epigenome and transcriptome in eukaryotes. A repeatome is defined as all repetitive sequences (tandem, inverted and interspersed repeats) within a genome, including transposons (retrotransposons and DNA transposons). Repeat sequences could be biologically important because they often serve as target sites of DNA- or RNA-binding proteins. Inverted repeats can form hairpins in single-stranded DNA and cruciforms in double-stranded DNA, be transcribed into single- or double-stranded RNAs, and participate in RNA editing, miRNA/siRNA biogenesis, alternative splicing/polyadenylation and other biological processes. Inverted repeats are known to trigger DNA methylation through their hairpin secondary structures and by generating small non-coding RNAs. Meanwhile, transposons are often characterized by terminal inverted repeats (TIR) or terminal direct repeats (TDR), and some transposons are represented as interspersed repeats. Transposons not only modify gene structures and genome architectures to induce genetic changes but also interact dynamically with epigenomic elements to determine heritable phenotypic responses. 
Unfortunately, due to the limitations of conventional string search-and-comparison algorithms, current annotation of repeats and transposons in genomes is far from complete and accurate, presenting major obstacles to fully understanding the origins, compositions, structures, molecular regulation and biological relevance of repeats and transposons in genomes. Because of the associations and differences between repeats and transposons, it is worthwhile to explore these two groups both jointly and independently. So far, no one-stop bioinformatics tool provides whole-repeatome annotation for a given genome. Recently, we developed novel algorithms that replace traditional string search-and-comparison with number calculation-and-comparison to detect inverted repeats and miniature inverted-repeat transposable elements (MITEs). Our new tools have identified many novel inverted repeats and MITEs missed by other popular tools. First, this project aims to develop a one-stop bioinformatics program that provides more comprehensive and accurate repeatome annotation. We will improve and expand our core algorithms to detect interspersed repeats, terminal inverted/direct repeats, and transposons characterized by these repeats, and we will integrate other open-source tools into our one-stop program. In eukaryotes, it is increasingly accepted that transposable elements and epigenetic components are major contributors to both genetic variation and phenotypic responses to environmental stressors. Second, this project will focus on a comparative study using Arabidopsis and human to explore the interconnection among inverted repeats, DNA methylation and gene expression regulation. 
We will investigate the relationship between DNA methylation levels of inverted repeats and associated gene expression. We will also study how different features of inverted repeats affect DNA methylation, including size, composition and architecture, genic configuration, associated gene structures, gene categories and gene ontologies, secondary structures and minimum free energies. Finally, we will predict novel siRNAs and miRNAs transcribed from inverted repeats that play a pivotal role in DNA methylation. Moreover, inverted repeats have been shown to cause human diseases and developmental defects through gene amplification, gene deletion, and chromosomal translocation. In cancer genomes, they are abundant at common fragile sites that often exhibit gaps and breaks. We will compare human cancers with the control (healthy) group to detect inverted repeat-associated genes that display differential mRNA/smRNA expression and distinctive methylation profiles. This project will help improve our understanding of repeatome-epigenome-transcriptome interactions and unravel pathogenic mechanisms of repeat-related human diseases. We obtained the osteosarcoma data through dbGaP in 2021. Using these data, we have already published two papers (https://www.nature.com/articles/s41598-021-04208-5 and https://www.nature.com/articles/s41598-022-22082-7). We now need access to data from additional cancer types to explore inverted repeats and their connection with tumorigenesis. Licht, Jonathan UNIVERSITY OF FLORIDA Consequences of the WHSC1 E1099K mutation in relapsed pediatric acute lymphoblastic leukemia Aug12, 2016 expired Our research examines the genetic causes of cancer, specifically one mutation that frequently occurs in pediatric leukemia.
This mutation is designated WHSC1 E1099K. The name identifies the affected protein, Wolf-Hirschhorn Syndrome Candidate 1 (WHSC1), and the amino acid changed by the mutation: a glutamic acid (E) at position 1099 changed to a lysine (K). Our lab’s previous research showed that WHSC1 is one of several proteins that control interpretation of genetic information when DNA is transcribed into RNA. WHSC1 affects how much RNA is produced from genes by modifying the histone proteins that package the DNA. We also showed that the E1099K mutation causes WHSC1 to malfunction, and consequently certain genes produce more RNA than intended (overexpression). The NCI TARGET data would allow us to examine patient data and identify specific genes overexpressed in patients after their cells gained WHSC1 E1099K. We will study these genes and their encoded proteins further to find specific vulnerabilities that can be exploited to improve therapy for pediatric leukemia patients. Our research goal is to identify genes that are misregulated due to the E1099K mutation of the histone methyltransferase WHSC1. This mutation is detected in pediatric acute lymphoblastic leukemia (ALL), most frequently in cases of relapse. We developed a model using CRISPR gene editing to disrupt the endogenous E1099K allele in cell lines established from pediatric ALL patients. Loss of E1099K decreased cell proliferation and survival, in correlation with reduced expression of a large subset of genes. The NCI TARGET dataset would answer critical questions regarding this research: Are the genes whose expression changes with loss of E1099K the same genes that are aberrantly activated upon gain of the mutation? Although WHSC1 is known to correlate with active gene expression, it is not clear what protein factors or DNA sequence features regulate its recruitment and activity. Are the genes identified through our cell line model relevant to clinical progression of ALL?
Finding the same genes in an analysis of the TARGET data would indicate that these targets are also expressed and relevant in patient cancers. The TARGET dataset includes three DIAGNOSIS/RELAPSE sample pairs from pediatric ALL patients in which the E1099K mutation is detected only in the RELAPSE sample. RNA-Seq reads from these sample pairs will be aligned and quantified to identify genes differentially expressed in the presence of the mutation. Based on our results, we expect most changes to be gene upregulation. This could differ in the clinical setting, however, where other de novo mutations may have arisen simultaneously during relapse. Genes identified from each DIAGNOSIS/RELAPSE pair will be compared to reveal common targets, which could represent direct targets of the WHSC1 mutant. Gene set enrichment analysis (GSEA) will also be performed on differentially expressed genes to identify specific pathways that are consistently altered by WHSC1 mutant activity. These results will be compared to results from our CRISPR model to establish a high-confidence gene set targeted by WHSC1 E1099K. This gene set will guide further experiments seeking new approaches to reduce disease relapse in pediatric ALL. Liu, Jack OMICSOFT CORPORATION Genetic and transcriptional landscapes in childhood cancers Feb06, 2014 closed Identifying genetic alterations and biological pathways is essential for understanding the initiation of cancer, how a tumor progresses, and why a treatment is effective in some patients but not in others. In this project, we plan to perform integrated data analysis using the TARGET (Therapeutically Applicable Research to Generate Effective Treatments) datasets, and to identify genetic mutations, gene fusions and other potential driving factors that are unique to childhood cancers. Our goal is to identify clinical biomarkers for improved diagnosis and potential therapeutic targets for drug intervention in pediatric cancers.
The research objective of analyzing the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data sets is to identify biomarkers and potential novel targets for drug intervention in pediatric cancers. Omicsoft Corporation has developed advanced software and novel algorithms for analyzing next-generation sequencing (NGS) data. These include the Omicsoft aligner (OSA) for mapping RNA-Seq reads and a novel gene fusion detection method, FusionMap. We have gained significant knowledge from analyzing The Cancer Genome Atlas (TCGA) data and have identified a number of recurrent gene fusion and alternative splicing events across multiple tumor types. It is well known that the types of cancer found in children are very different from those in adults. We plan to investigate whether the candidate gene fusions and splice forms identified in adult cancers also occur in pediatric tumors, which could shed light on the mechanism of tumorigenesis. With access to TCGA and TARGET data, we are particularly interested in performing systematic comparisons of pediatric and adult AML data, searching for possible oncogenic drivers of AML. The results from analyzing TARGET data are expected to further our knowledge of cancer biology and provide a basis for improved diagnosis and effective treatment of pediatric cancers. Liu, Tao UNIVERSITY OF NEW SOUTH WALES The role of noncoding RNAs in neuroblastoma Aug16, 2012 closed Neuroblastoma is the most common solid tumor in early childhood. Recent research has identified several novel genetic mutations in human neuroblastoma tissues, and novel therapies targeting these mutations could lead to better patient survival. In this project, we propose to identify the noncoding RNAs that may play critical roles in the initiation and/or progression of neuroblastoma. Noncoding RNAs are emerging as key players in tumor initiation, progression and metastasis.
While whole-genome sequencing has revealed several novel genetic mutations in human neuroblastoma tissues in the last several years, there have been few systematic studies of noncoding RNAs in human neuroblastoma tissues. In this project, we propose to analyze the neuroblastoma RNA-Seq (polyA+) data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative and identify noncoding RNAs that are either up- or down-regulated. We will then examine the role of these noncoding RNAs in the initiation and/or progression of neuroblastoma in vitro and in vivo. Liu, Tsunglin NATIONAL CHENG KUNG UNIVERSITY molecular mechanisms of neuroblastoma metastasis Mar11, 2016 closed Neuroblastoma (NB) is one of the most common cancers in infancy. To understand whether mutations in certain genes are involved in NB, a group of researchers sequenced the genomes of many NB patients and identified only a few recurrently mutated genes. Through other means, we identified some genes that might play a role in NB. Using the same data from that research group, we would like to know whether these genes are mutated in NB patients. The purpose of our research project is to understand the molecular mechanisms of neuroblastoma (NB) metastasis. One component of this project is to examine potential mutations in genes of interest in neuroblastoma patients using the raw data from the NCI neuroblastoma initiative. Mutation analyses will be compared between patients with limited-stage and advanced-stage NB, as well as between the genes and their corresponding RNAs. We will not combine the requested datasets with datasets outside dbGaP. The primary study focused on mutations with a frequency higher than that of background mutations. However, we wish to include low-frequency mutations and will use GATK to call SNPs for each NB patient.
Liu, Xiaole DANA-FARBER CANCER INST Identification of key immune repertoires for acute myeloid leukemia (AML) Jun05, 2017 closed Cancer immunotherapy has achieved great success in treating late-stage melanoma. However, progress in traditional therapy development for AML has been relatively slow. We want to investigate AML from the perspective of immune repertoires to find potential predictive biomarkers for patient outcome under immunotherapy. Acute myeloid leukemia (AML) is a cancer that originates in the bone marrow from immature white blood cells known as myeloblasts. About 25% of all children with leukemia have AML. Despite the remarkable progress that has been made in some leukemias such as CML, cytotoxic treatment for AML has remained basically unchanged over the last four decades. Given the slow progress of traditional therapy development for this disease, many novel immunotherapies have been explored. A better understanding of the immune system of AML patients could lead to promising biomarkers and be extremely helpful for new therapy development. Our group previously developed a computational method to identify T/B cell receptor repertoires and estimate immune infiltration abundance from RNA-seq data. Using the AML data from TARGET, we will apply our methods to study the T/B cell receptor repertoires. We will compare repertoire differences between peripheral blood and bone marrow to study how immune repertoires are derived and to identify key biomarkers. We will also investigate the relationship between key immune repertoires and patients' clinical outcomes. Studying AML from the perspective of immune repertoires could be very informative. We will use the phs000816.v2.p1 RNA-seq data as a normal blood reference to control for the impact of age. The study will investigate primary AML samples from both TARGET and TCGA. RNA-seq data from non-tumor samples in PRJNA263846 were used as a control.
Liu, Yang CHILDREN'S RESEARCH INSTITUTE miRNA and Wilms tumor pathogenesis Dec28, 2015 closed We will use the Wilms tumor miRNA sequencing data to investigate the role of miRNAs in Wilms tumor development and to identify novel miRNAs associated with this disease. The objective of this study is to understand the role of miRNAs in Wilms tumor (WT) development. I would like to compare the miRNA expression profiles of Wilms tumor samples in which the miRNA processing genes (miRNAPG) DROSHA and DICER1 are mutated with those of wild-type Wilms tumors. Statistical analysis will be performed on miRNA-seq data to identify significant miRNAs that are differentially expressed in these conditions compared to normal samples. Only about 12% of Wilms tumor cases carry mutations in miRNA processing genes, which indicates that other drivers and miRNAs cause WT in most cases (~88%). We would therefore like to compare the miRNAPG-mutated and wild-type Wilms tumor data to investigate which miRNAs are differentially expressed in samples without miRNAPG mutations. Further, significant miRNAs will be tested in WT biological samples in our lab for their expression profiles and gene targets. Therefore, I would like to download BAM or FASTQ files for phs000471 from NCBI SRA. Lock, Richard UNIVERSITY OF NEW SOUTH WALES Molecular determinants of drug responses in ALL Oct20, 2023 approved This project will investigate the potential for developing predictive biomarkers using modern methods in systems biology, incorporating molecular data with in vivo preclinical drug efficacy data in acute lymphoblastic leukemia (ALL). We propose to use the molecular data from the TARGET initiative to reconstruct key gene-gene interactions and measure gene pathway activity.
This will serve as the basis for capturing biological signatures that are associated with drug response. These signatures will be used to generate predictive biomarkers for use in a clinical setting through Australia’s ZERO2 personalized medicine program. Objectives of the proposed research: Our research project aims to identify molecular determinants of drug response and facilitate the translation of predictive biomarkers for clinical use in acute lymphoblastic leukemia (ALL). Study design: Our lab is part of the NCI Pediatric Preclinical in Vivo Testing (PIVOT) program and has generated a large body of preclinical efficacy data for many different drugs in pediatric ALL using patient-derived xenograft (PDX) models. We propose to use molecular data (including RNA-seq and miRNA-seq) to build disease-specific gene interaction networks using bioinformatics tools (e.g., NetBID2). The gene interaction networks will enable us to quantify gene and pathway activity across our own pediatric ALL PDXs and identify signatures which, in combination with preclinical drug testing, may be used to predict drug sensitivity. Promising biomarker signatures will be identified, validated, and provided to clinicians within personalized medicine programs such as our own, ZERO2. As an additional aim, we intend to use this analysis to identify novel drug targets and drive new hypotheses for future research in our lab. Analysis plan: Data will be stored locally behind an institutional firewall in a restricted-access folder for our internal research team. For read processing, data will be copied temporarily onto a secure, private-access high-performance computing cluster (NCI Gadi) and removed from the cluster upon completion. Read processing includes alignment and gene quantification using STAR, RSEM, Arriba and the Picard suite. RNA expression derived from the dbGaP dataset will be used as the input for NetBID2/SJARACNe to generate a gene interaction network and calculate gene activity scores for our own PDXs.
Differential activity between sensitive and resistant PDXs will be determined with NetBID2. Consistency with data use limitations: The dbGaP data will be accessed strictly by members of the PI’s research team or the personnel listed in this application, and only for the project listed above. Raw FASTQ files are necessary to ensure the bioinformatics methodology is consistent between the dbGaP data and our own, and will be destroyed upon project completion. The gene interaction networks constructed in this project will contain summarized information about gene associations (for example, correlation or mutual information) across the cohort and will not contain any information regarding individual patients. Planned collaboration with researchers at other institutions: We do not have any planned collaborations with researchers outside our institute. Lock, Richard UNIVERSITY OF NEW SOUTH WALES Molecular analysis of Ph-like pediatric acute lymphoblastic leukemia Aug12, 2016 closed Cancer is a disease driven by genetic mutations. Analysis of the requested dataset in the applicant's laboratory will result in the design and testing of new targeted drug treatment strategies to improve the treatment of high-risk childhood leukemia. The applicant has established patient-derived xenografts (PDXs) from TARGET acute lymphoblastic leukemia samples provided by the Children's Oncology Group. In 2013, the applicant's laboratory shipped approximately 175 PDX samples back to the TARGET consortium for molecular analysis, the results of which are now included in the requested dataset. The objective of the proposed research using the requested dbGaP dataset is to mine the PDX data and compare it with the matched primary patient samples originally used to establish the PDXs in the applicant’s laboratory.
The study design will involve comparing PDXs and their matching primary patient samples for single nucleotide variants, insertions and/or deletions, and gene expression in order to (1) assess how closely each PDX reflects the primary patient sample and (2) identify actionable mutations for future preclinical drug testing. The analysis plan will involve relating genetic variants in the PDXs to possible susceptibility to targeted anti-leukemic drug therapy. The data will be maintained in secured network databases and not released to other investigators. The applicant does not intend to combine the requested dataset with any other dataset(s). Loewer, Martin TRON -TRANSLATIONAL ONCOLOGY GGMBH Molecular Profiling of Pediatric Cancers for Individualized Cancer Therapies Sep12, 2018 closed We are developing cancer therapies that utilize a patient’s immune system to target cancer cells. To teach the immune system which cells are cancer cells and need to be targeted, we are developing so-called biomarkers. These biomarkers can be structures on cell surfaces that can be detected by other cells. We aim to analyze the provided data to evaluate the presence of known biomarkers or determine the presence of as-yet-unknown markers. Additionally, we aim to improve our understanding of changes that occur in the DNA of cancerous cells. The results will be evaluated to help us understand tumor and immune system interactions. We are developing cancer biomarkers and immunotherapies. We have demonstrated that our novel therapeutic platform, RNA-encoded tumor antigens, can induce T cell immune responses to genes with tumor-specific expression patterns. It can even induce T cell immune responses and tumor control against neo-epitopes, i.e., tumor-specific somatic mutations. We have started clinical trials in melanoma using both approaches. With our established GMP production of RNA, we are developing a warehouse of therapeutic RNA vaccines targeting multiple tumor antigens.
We start the process by examining a tumor class for antigens, including both genes with tumor-restricted expression and tumor-specific non-synonymous somatic mutations. We do this using sample cohorts sequenced on our Illumina HiSeq 2500 and NovaSeq 6000 instruments, as well as externally generated NGS reads. After identifying somatic mutations and expressed antigens, we evaluate, experimentally and in silico, whether each potential antigen is likely to be immunogenic. From the set of candidates, we select a collection of antigens with broad coverage across patients and proceed to develop the RNA vaccines. Here, we specifically want to use primary pediatric genome, exome and/or transcriptome sequencing data to evaluate the presence and expression of known biomarkers and to investigate pediatric cancer-restricted gene/transcript/exon expression patterns. Our research will include the search for pathogenic signatures. These datasets will help us further improve our software and our understanding of genomic structural variations, gene fusions and allelic biases. They will also help us improve our mutation calling methods and investigate profiles of somatic mutations in tumors. The results will be functionally evaluated to help us understand tumor and immune system interactions. We intend to publish all of our findings, including those derived from your database, in peer-reviewed journals. Loewer, Martin TRON -TRANSLATIONAL ONCOLOGY GGMBH Characterization of tumor antigen expression in childhood sarcoma Aug15, 2024 approved We are developing cancer therapies that utilize a patient’s immune system to target cancer cells. To teach the immune system which cells are cancer cells and need to be targeted, we are developing so-called biomarkers. These biomarkers can be structures on cell surfaces that can be detected by other cells. We aim to analyze the provided data to evaluate the presence of known biomarkers or determine the presence of as-yet-unknown markers.
The results will be evaluated to help us understand tumor and immune system interactions. Our research in this project will focus on previously neglected classes of biomarkers in childhood cancers, as these cancers differ from adult cancers in their genetics and in the structures presented on the cell surface. Characterizing the tumor antigen landscape holds the promise of identifying immunotherapy targets in childhood cancers. In contrast to adult cancers, childhood cancers show a much lower mutational burden. They consequently have fewer options for targeted therapies against actionable targets and for neoepitope-based immunotherapy approaches. Therefore, our project focuses on identifying alternative tumor antigens, such as overexpression targets, splice variants and gene fusions, within and across childhood sarcoma subtypes. We focus our target discovery approach on childhood sarcomas, which have a high medical need. We want to understand the composition of expressed and potentially presented tumor antigens in sarcoma subtypes such as osteosarcoma, Ewing sarcoma and rhabdomyosarcoma. We also want to know whether antigen expression correlates with clinical or molecular features of the young patients (e.g., treatment, survival, risk factors, biomarkers, or genomic alterations). We will process multiple childhood sarcoma RNAseq datasets from dbGaP, as well as other repositories and in-house data, for detection and independent validation of tumor antigens. Our target discovery pipeline will be followed by further characterization of antigens to determine cancer specificity, immunogenicity, and ultimately suitability for clinical translation into novel immunotherapy approaches. This research project is strongly supported by clinical and pathology experts for these sarcoma entities.
RNAseq data analyses include standard expression analysis of annotated genes, prediction of novel fusion genes and splice variants, and subsequent prediction of patient-specific epitopes that may be recognized by the immune system. All datasets will be processed separately from raw FASTQ/BAM files using the same bioinformatic pipeline. We will then compare results across different cohorts of childhood cancer, including the occurrence, frequency, and expression levels of expressed antigens, predicted fusion genes, and splice variants. We will investigate whether these correlate with clinical features within and across multiple datasets, and whether they have a high probability of being presented by the cancer cells. Our analysis will focus on providing a comprehensive description of the tumor antigen landscape within childhood sarcomas as a starting point for the development of novel immunotherapies in young patients. We see no additional risk to participants. We plan to publish our findings in scientific journals and via conference contributions. Loh, Mignon UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Genomic Analysis of Monosomy 7 B-Acute Lymphoblastic Leukemia Oct23, 2019 closed Monosomy 7, an unusual cytogenetic abnormality in patients with pediatric B-cell acute lymphoblastic leukemia (B-ALL), portends a poor prognosis. In recent data from patients with acute myeloid leukemia (AML), SAMD9 was identified as a germline alteration present on chromosome 7 that is then lost in leukemic cells; SAMD9 is a gene that may play a role in regulating cell proliferation and apoptosis. It is now known that SAMD9 and SAMD9L mutations cause a spectrum of multisystem disorders that carry an increased risk of developing myeloid malignancies with somatic monosomy 7, but this has not been systematically explored in lymphoid malignancies.
Rare germline mutations have been linked to familial predisposition to other subtypes of childhood ALL; therefore, we will determine whether germline mutations in SAMD9 or SAMD9L are present in patients with monosomy 7 ALL. Monosomy 7 is associated with many myeloid disorders; however, the significance and pathogenesis of this abnormality in B-cell acute lymphoblastic leukemia (B-ALL) are unknown. Event-free survival and overall survival have been shown to be worse for patients with a loss involving chromosome 7 than for patients without this aberration. In acute myeloid leukemia (AML), SAMD9, a gene that may play a role in regulating cell proliferation and apoptosis, was identified as a germline alteration present on chromosome 7. Germline missense SAMD9 mutations have been identified as the cause of MIRAGE syndrome, a disorder characterized by an elevated risk of early-onset myelodysplastic syndrome (MDS) with somatic monosomy 7. In the monosomy 7 leukemic bone marrow cells of these patients, germline SAMD9 mutations were absent, indicating selective pressure in the leukemia to eliminate particular genes. Given the role of these genes in monosomy 7 AML, they may also contribute to leukemogenesis in cases of monosomy 7 B-ALL. To determine the presence of SAMD9/SAMD9L germline mutations in patients with non-hypodiploid, non-Ph+ monosomy 7 B-ALL, we plan to obtain germline DNA samples from the Children’s Oncology Group biospecimen bank. We have identified 120 such patients with germline DNA available. These samples will undergo germline sequencing of SAMD9/SAMD9L using a targeted sequencing assay to identify mutations and determine the percentage of patients with germline lesions in these two genes. The next-generation sequencing (NGS) panel we will use is a CleanPlex® Custom NGS Panel by Paragon Genomics. We successfully performed control testing of our custom panel using monosomy 7 AML/MDS samples with known SAMD9/SAMD9L mutations.
After determining the presence of germline lesions, we will request companion leukemic diagnostic and/or relapse samples from patients whose germline samples show SAMD9/SAMD9L mutations. These samples will allow us to assess whether the mutations are lost during leukemic growth. Before obtaining these samples, we hope to interrogate TARGET for preliminary data from monosomy 7 B-ALL germline samples sequenced by St. Jude. These data will help inform our project, which we hope will ultimately provide previously undescribed insight into the genomic landscape of this high-risk B-ALL subset. This could subsequently lead to more effective treatments, and identifying germline lesions may also have important implications for counseling patients and families affected by monosomy 7 B-ALL. We will perform the analysis of germline events with the guidance of bioinformaticians at UCSF. Longabaugh, William INSTITUTE FOR SYSTEMS BIOLOGY ISB Cancer Genomics Cloud TARGET Jan17, 2017 approved The growth of large-scale DNA sequence data for cancer research and its routine use in translational science are rapidly outstripping the available computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic, proteomic, and imaging data, combined with associated clinical annotations collected from various studies, is critical to accelerating research and making new discoveries. This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in the TARGET project without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data.
The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three NCI Cancer Research Data Commons (CRDC) Cloud Resources. It supports a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. The research objective of this project is to provide access to the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data (hosted in Google Cloud Storage by the NCI Genomic Data Commons) to users of the ISB-CGC, via the Data Commons Framework (DCF), within the NCI Cancer Research Data Commons (CRDC). Doing so will allow users who do not otherwise have access to the necessary infrastructure to compute against this large dataset quickly and efficiently using Google Cloud resources. Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. In support of this effort, our institution applied for and was granted NIH Trusted Partner status in July 2015. We have implemented the necessary authentication, authorization, and logging protocols to ensure that only dbGaP-authorized users are able to access TARGET controlled-access data. Our FISMA moderate Authority to Operate is maintained through a continuous process of testing, monitoring and auditing, with issues tracked in our Plan of Action and Milestones (POA&M). The ISB-CGC has hosted the TP53 database for the NIH DCEG since late 2021. It is also working to provide linkages between somatic and germline TP53 mutations identified in that database and TARGET case IDs. Lopez, Elixabet UNIVERSITY OF THE BASQUE COUNTRY Role of non-coding RNAs in pediatric malignancies Jul22, 2022 approved Pediatric cancer is the most common cause of death from disease in children.
To improve survival, it would be of great interest to establish markers that could help identify patients who will respond poorly to treatment, so that they can be given more appropriate treatment. In this context, recent investigations point to non-coding RNAs, molecules with a role in gene regulation, as useful biomarkers in cancer as well as important players in the development of the disease. Therefore, the main goals of this work are to determine a set of non-coding RNAs with utility as markers for pediatric cancer and to decipher the mechanism of action of deregulated non-coding RNAs in the disease. Personalized precision medicine is currently emerging as a tool to make the treatment of pediatric cancer more effective. However, there remains a shortage of biomarkers to guide treatment decisions and assess response to treatment. We believe that the non-coding part of the genome can be very useful in this regard. Therefore, the aim of this work is to define a panel of non-coding RNAs useful as biomarkers in pediatric cancers. To do so, we will perform differential expression analyses of non-coding RNAs and protein-coding genes between patients and controls, between patients with different subtypes, and among groups of patients with different clinical phenotypes. We will also perform survival analyses (Kaplan-Meier) using the expression of the most significant genes. In addition, we will look for associations between the expression of the most significant non-coding RNAs and the expression of protein-coding genes in order to establish the mechanism of action of the most interesting non-coding RNAs. LOPEZ-BIGAS, NURIA FUNDACIO INSTITUT DE RECERCA BIOMEDICA Expanding the analysis of driver mutations Apr17, 2017 approved The project will focus on discovering new driver mutations and driver genes and on expanding currently available driver event databases.
The aim of the project is to go further in the discovery of driver mutations and cancer driver genes in pediatric malignancies, which are currently underrepresented in analyses of driver genes. We plan to use computational tools such as Intogen (https://www.intogen.) and OncodriveFML (http://bg.upf.) to identify mutations involved in tumorigenesis or resistance to cancer therapies in pediatric malignancies. Both tools look for signals of positive selection and compare the findings with known cancer genes stored in a large dataset. The analysis will also enrich the tools' driver event databases with additional cohorts and cancer types, especially hematopoietic malignancies. All results from the analysis undertaken in this project will be published in scientific journals for the benefit of the broader pediatric cancer research community. Loving, Kathryn STEM CENTRX, LLC Compare NCI TARGET RNASeq and sequencing data with internal tumor sample data Jan29, 2016 closed Stemcentrx combines a stem-cell-centric philosophy with cutting-edge technologies to develop novel cancer therapies and diagnostics. Founded in 2008 with the intent of developing life-changing therapies for cancer patients, the company has developed proprietary genetic and proteomic discovery platforms that yield unprecedented insight into the biology of solid tumors. The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data will be used to extend our drug discovery work to childhood cancers. Specifically, Stemcentrx has built a custom next-generation sequencing analysis pipeline that maps and analyzes data from patient-derived xenograft (PDX) models. We plan to compare the gene expression and mutation data found in our samples to those in TARGET to validate promising drug targets in childhood cancers.
We plan to use the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) mRNA-seq data to conduct drug discovery research on childhood cancers. 1. Research objectives - Compare gene expression and sequencing data from TARGET with our internal data and other publicly available data sources. This comparison may be used to confirm associations between genetic components and to validate promising drug targets in childhood cancers. - Complement our knowledge of the genetic components involved in tumor formation across all cancer types, in particular by looking for cancer drivers shared between adult and pediatric cancers. Such similarities, if found, would suggest potential cross-indication drug targets. 2. Study design - Illumina RNA-seq data from TARGET for both tumor and normal samples will be compared directly with our internal next-generation sequencing data. - Whole-genome, whole-exome, and targeted resequencing data from TARGET for both tumor and normal samples will be compared directly with our internal gDNA sequencing data. 3. Analysis plan - A variety of industry-standard gene expression analyses will be conducted to compare pediatric with adult cancers. 4. Explanation of how the proposed research is consistent with the Use Restrictions for the requested dataset - We will retain control over the data, following dbGaP security best practices, and will not distribute the data to any entity not covered in this Data Access Request. We will not attempt to identify or contact individuals whose data are included in TARGET. We acknowledge the Intellectual Property Policies as specified in the Data Use Certification. LUO, XINGGUANG YALE UNIVERSITY Genome-wide and candidate gene association study of acute myeloid leukemia May16, 2024 approved Acute myeloid leukemia is a genetically complex disorder. GWAS are very helpful for identifying risk regions for this disorder. However, some risk regions may have been missed in previous studies.
In this project, we aim to identify novel risk loci. Acute myeloid leukemia clearly has genetic components. Previous studies using traditional analytic methods, such as single-marker-based, gene-based, pathway-based, haplotypic, population-based and/or population-structured association analyses, identified some risk genes. However, other risk regions may still be missed. In this project, in addition to the traditional methods, we aim to develop new analytic methods to detect novel risk loci. Significance: This project will enable researchers to better understand the etiology of acute myeloid leukemia. Analysis plan: Traditional analytic methods will be used, and novel analytic methods and designs will be developed to identify novel risk loci. Expected results: We expect to find some chromosome regions of interest for acute myeloid leukemia. Data security and Acknowledgement: We will strictly comply with the data use restrictions of each study specified in the dbGaP DUC Agreements (including embargo release dates). All data will be kept confidential and will not be shared with any unauthorized persons. We will publish any scientifically significant findings from this analysis. We will acknowledge the NIH GWAS Data Repository, the Contributing Investigator(s) who contributed the phenotype data and DNA samples from his/her original study, and the primary funding organization that supported the contributing study in all oral and written presentations, disclosures, and publications resulting from any analyses of the data. We will include the dbGaP accession number of the specific version of the dataset(s) analyzed. Ma, Xiaotu ST. JUDE CHILDREN'S RESEARCH HOSPITAL Analysis of etiology of structural rearrangements in pediatric cancers Mar25, 2021 rejected We will determine the molecular patterns of cancer-type-defining rearrangements in pediatric cancers.
We expect our study to reveal novel insights into the biology and clinical management of pediatric cancer, and potentially therapeutic insights. The GTEx dataset will serve as a useful control dataset for our study. Oncogenic fusions are known to drive many types of pediatric cancers. However, the etiologic mechanism of these fusions is not well understood. With extensive experience in analyzing somatic mutation landscapes (https://www.ncbi.nlm.nih.gov/myncbi/xiaotu.ma.1/bibliography/public/), the Ma lab is focused on understanding the molecular patterns of somatic structural variants (SVs). We are currently analyzing multiple genome and transcriptome datasets, including those generated by 1) the St. Jude Children’s Hospital-Wash U Pediatric Cancer Project of >1000 pediatric cancer cases; 2) Dr. Jeff Klco (St. Jude Pathology, on pediatric AML) and COG (Children’s Oncology Group) investigators; 3) Drs. Patrick Brown/Mignon Loh (on pediatric ALL); and 4) Dr. Soheil Meshinchi (on pediatric AML). Here we request access to the genome and transcriptome data generated by TARGET to supplement our datasets and validate our findings, which in turn will ensure the translational potential and reproducibility of our discoveries. We will analyze somatic SVs, SNVs/indels, and CNVs in pediatric AML, B-ALL, T-ALL, WT, OS, and NBL tumors in the TARGET cohort. Because we focus only on analyzing somatic variants, this project will not create any additional risks to participants. We anticipate revealing novel insights into the molecular mechanisms of cancer-type-defining rearrangements, their association with clinical management, and potential therapeutic insights. (below updated April 30, 2021) All significant findings from this project will be shared with the scientific community through journal publications and/or conference presentations. (Oct 1, 2024) We published the following paper in 2023 (PMID:37019972). This current request is to add cohorts phs002276 and phs002005.
Madhavan, Subha GEORGETOWN UNIVERSITY Develop methods for predictive analysis of late effects of treatment in pediatric cancers Dec19, 2013 expired Cancer is increasingly a big-data problem. We need new methods and ways of looking at cancer research data to extract meaningful information about therapies and clinical outcomes. Our group at Georgetown develops informatics methods for secondary use of clinical data from electronic health records and research. We will apply similar methods to the TARGET data to see if we can infer new connections between patient characteristics, therapies and outcomes. We are planning to conduct additional analyses using RNA-seq data from osteosarcoma patients to assess gene expression. Our expertise in this field includes analysis of RNA-seq data to understand gene expression and mutations in the RNA, looking at the data from an immuno-oncology perspective, predicting HLA class I and II status, and identifying viral sequences. We also have expertise in analysis of microRNA and microRNA-seq data, and in integration of multi-omics data. We are also planning to extend our analysis pipeline to other pediatric cancer data in the TARGET collection. The TARGET data collection is a complete set of multi-omics data across pediatric cancer patients, making it very valuable for research and for teaching graduate students in our genomic data science class. We plan to continue to use TARGET datasets to develop informatics methods to analyze pediatric cancer datasets. We will develop novel methods to mine the data through natural language processing and apply machine learning approaches such as SVM-RFE and random forests to extract features that might correlate strongly with outcomes. We plan to develop these methods using our internal dataset and will use the TARGET data to validate them. We will then publish our results as well as our own datasets for consumption by the biomedical community for further research.
We may do some meta-analysis with other studies not in dbGaP. The analysis will be at the germline variant level only and does not pose any additional risks to the patients involved. The other datasets are not from the same patients. We will combine these data with germline data that we have generated from patients seen at the Inova hospital system. Magee, Jeffrey WASHINGTON UNIVERSITY Cooperating mutations and developmental regulators in pediatric acute myeloid leukemia Sep30, 2020 expired We are interested in understanding why childhood leukemias are different from adult leukemias. Children with leukemia have particular mutations, and we have studied how these mutations behave in young blood cells in mice. We plan to use the public leukemia data to extend our mouse studies to benefit human patients. We are interested in mechanisms that drive pediatric acute myeloid leukemia (AML) initiation. We have discovered that cells of origin change through the course of development. Thus, normal developmental switches can determine whether a cell is competent to transform into AML when it acquires a given pediatric driver mutation. We have also found that mutations synergize with one another in different ways at different ages. This means that the consequences of targeted therapies may change depending on the age at which the AML arose and the cooperating mutations that are present. We have developed mouse models that recapitulate these complex interactions. To translate our findings to humans, we need to identify which developmental switches and which target gene profiles are conserved and contribute to the pathogenesis of human AML. We therefore request access to the gene expression profiling data (mRNA-seq and Affymetrix), the miRNA profiling data, and relevant metadata (e.g. cytogenetic abnormalities and mutation profiles). We will examine whether specific gene expression signatures, identified in mouse AML progenitors, are expressed in human samples so that we can focus ongoing mechanistic studies.
We would like access to the FASTQ/BAM and CEL files so that we can normalize the data for human-mouse comparisons. Mager, Dixie PROVINCIAL HEALTH SERVICES AUTHORITY Spatial clustering of rRNA and ribosomal protein mutations in cancer Jul17, 2019 closed Genetic information in DNA is first transcribed into messenger RNA, which is translated into protein by the ribosome. The ribosome itself is made up of ~80 proteins and 4 RNA molecules called ribosomal RNAs (rRNAs). Previous research has shown that some protein components of the ribosome are often mutated in blood and solid cancers and give rise to a specialized cancer ribosome, which could favor tumour growth. This project will analyze cancer mutations in the ribosome and group different mutations based on their spatial proximity. To accomplish this, we will analyze The Cancer Genome Atlas sequencing data to identify mutations in cancers (comparing cancer DNA to normal DNA) within the genes encoding the ribosome components. These mutations will then be projected onto a 3-D model of the ribosome, and we will statistically test which physical areas within the ribosome are heavily mutated. We can then classify cancers by these mutational hotspots to see if they have prognostic relevance to patients and to better understand how cancers develop. Spatial clustering of rRNA and ribosomal protein mutations in cancer Objective: To comprehensively determine the variation and mutational space of the human ribosome complex in cancer, and in particular in paediatric cancers. Study Design: The human ribosome is a collection of 4 ribosomal RNAs (rRNAs) and ~80 proteins. A reference human ribosomal DNA (rDNA) sequence is absent from the standard human reference genome, and variant analysis of this high-copy-number gene (80-800 copies/cell) requires specialized alignment and variant calling.
Ribosomal proteins are oncogenic drivers in a diverse set of cancers (RPL5, RPL10 or RPL11 in T-ALL and RPS15 mutations in CLL), indicating that perturbation of the ribosome plays a broad role in tumourigenesis. Because the central structural and catalytic core of the ribosome is its rRNA, it follows that mutations in the rDNA could be oncogenic in the same way as ribosomal protein mutations. Our hypothesis is that rDNA is a proto-oncogene. We believe this is particularly true for paediatric cancers such as T-ALL, where ribosomal protein mutations are known cancer drivers. Analysis Plan: This analysis pipeline has been deployed successfully on smaller public datasets. Briefly, we align raw RNA- or DNA-sequencing reads from the cancers and their matched normal tissues to a reference rDNA sequence and perform rDNA mutation and variant calling. We then project the base variation data onto the rRNA structure in the human cryo-EM structure of the ribosome complex. Within this structure, we spatially cluster the 3-D distribution of variants and compare it against a set of random mutational simulations (which account for atomic heterogeneity in the structure). This allows us to empirically determine physical areas of the ribosome that are mutational hotspots. From the complete repertoire of ribosomal protein mutations in COSMIC, we discovered 21 distinct mutational clusters (p < 0.001), of which only 4 are characterized in the literature. In addition, a dataset of normal-tissue rRNA sequence variants is needed to distinguish polymorphic variable regions of the ribosome from cancer-driver hotspots. Once rRNA mutational clusters are identified within a particular cancer, clinical parameters such as tumour grade, progression-free survival and overall survival will be compared between cancers containing the mutations and those that do not.
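The core statistical step in the Mager analysis plan, testing whether mutated positions on the 3-D ribosome structure sit closer together than expected by chance, can be illustrated with a simple permutation test. This is a minimal sketch under stated assumptions, not the requestors' pipeline: positions are hypothetical (x, y, z) coordinates keyed by a hypothetical id, the clustering statistic is the mean pairwise Euclidean distance among mutated positions, and the null model draws the same number of positions uniformly from the whole structure, so it does not reproduce the atomic-heterogeneity correction the request describes.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords):
    """Mean Euclidean distance over all unordered pairs of 3-D points."""
    pairs = list(itertools.combinations(coords, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_p_value(positions, mutated_ids, n_perm=10_000, seed=0):
    """One-sided permutation test for spatial clustering of mutations.

    positions   -- dict: hypothetical position id -> (x, y, z) coordinate
    mutated_ids -- ids of mutated positions (at least two)
    Returns (observed mean pairwise distance, empirical p-value).
    """
    rng = random.Random(seed)
    all_ids = list(positions)
    k = len(mutated_ids)
    observed = mean_pairwise_distance([positions[i] for i in mutated_ids])
    as_compact = 0
    for _ in range(n_perm):
        # Null model (assumption): k positions drawn uniformly without
        # replacement from the whole structure, ignoring atomic heterogeneity.
        null_ids = rng.sample(all_ids, k)
        if mean_pairwise_distance([positions[i] for i in null_ids]) <= observed:
            as_compact += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (as_compact + 1) / (n_perm + 1)
```

A smaller mean distance means a tighter cluster, so the p-value counts null draws that are at least as compact as the observed set. A more faithful version would weight the null draws by local atom density, as the request's "random mutational simulations" suggest.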
Data Use Limitations: Our study is in agreement with the TCGA, TARGET, CPTAC3 and GTEx Data Terms of Access and Data Access Policies as a primary research project. Our hypothesis and the literature support that these events are common in paediatric cancers in particular, and the study therefore meets the TARGET requirement for a focus on paediatric cancers. Collaborators: None Man, Tsz-Kwong BAYLOR COLLEGE OF MEDICINE Analysis of cytokine/chemokine axes in pediatric osteosarcoma Dec09, 2020 closed The communication between blood proteins and tumor proteins can influence the development of cancer and the spread of the disease. The communication is regulated by messenger proteins in blood and receiver proteins in tumor cells. Osteosarcoma is the most common form of bone cancer in children. The survival of patients with this bone cancer has not improved in the past three decades. The goal of this study is to investigate the communication between a group of blood messenger proteins called cytokines and chemokines and their corresponding receiver proteins in osteosarcoma cells, which are known to regulate cancer growth and spread. We will study whether the expression levels of the messenger and receiver proteins in osteosarcoma cells are changed and correlate the observed changes with the clinical risks of the patients. We expect that the results of this study will lead to the development of new and better cancer markers and treatments for osteosarcoma. The main objective of the proposed research is to analyze the RNA-seq data to dissect the RNA isoforms of different ligand and receptor pairs of cytokines/chemokines that correlate with the clinical risks of osteosarcoma patients. We have previously reported that circulating cytokines and chemokines play an important role in the prognosis of osteosarcoma patients. Our recent research showed that the expression of a specific cytokine/chemokine axis increased the development of metastasis in a mouse model of osteosarcoma.
Furthermore, initial evidence from our lab suggests that the coordinated expression of ligand-receptor pairs of cytokines and chemokines may lead to the recruitment of specific immune cell subpopulations infiltrating the tumor site. Since isoform expression can influence the cellular behavior of the cytokine/chemokine axes, the proposed study will conduct an in-depth analysis of the RNA-seq data from osteosarcoma patients to specifically map and quantify the expression of RNA isoforms previously reported in the literature, such as CXCR3A, CXCR3B, and CXCR3alt. The coordinated expression of the corresponding ligands, such as CXCL9, 10, and 11, will be studied. Novel isoforms will be identified and analyzed similarly. To understand the clinical significance of ligand-receptor expression, expression levels will be correlated with metastatic status, response to chemotherapy, overall survival, etc. We will also correlate the expression of the ligands and receptors with the proportions of various tumor-infiltrating immune subpopulations, such as CD8+ T cells and different types of macrophages, estimated by bioinformatic deconvolution of the RNA-seq data. To understand the role of paracrine signaling by these molecules, the tumor expression and the circulating levels of cytokines, chemokines and their receptors in osteosarcoma patients will also be correlated using the multiplex Luminex data that our lab has previously generated. Understanding the role of cytokine/chemokine axes may lead to the identification of new targets for therapeutic interventions. The proposed research is consistent with the data use limitations of the TARGET data, because our research focus is to understand the roles of cytokines and chemokines in pediatric osteosarcoma, which will lead to more effective treatments and better prognostic markers. The proposed research will be conducted in Dr.
Man’s lab at Baylor College of Medicine, and no external collaborations or collaborators will be involved. Mano, Hiroyuki UNIVERSITY OF TOKYO Search for novel chromosome translocations in childhood and AYA (adolescents and young adults) with ALL (Acute Lymphoblastic Leukemia) Jun26, 2015 closed Chromosome translocations in cancer are useful for predicting prognosis and can be potential therapeutic targets, which could lead to therapeutic success in the clinic. We are now searching for novel chromosome translocations in Japanese AYA with ALL and have discovered some novel chromosome translocations. We wish to examine the frequency of the novel chromosome translocations thus identified in the large TARGET datasets of childhood ALL. Further, we hope to identify novel chromosome translocations specific to childhood. The primary purpose of the research is the identification of novel chromosome translocations in children, adolescents and young adults with Acute Lymphoblastic Leukemia (ALL). We are now searching for novel chromosome translocations in a cohort of Japanese adolescents and young adults (AYA; ages 15-24) by RNA-seq using next-generation sequencers, and have discovered some novel chromosome translocations. Here we wish to examine the frequency of the novel chromosome translocations thus identified in the large TARGET datasets of childhood ALL. Further, we hope to identify novel chromosome translocations specific to childhood. Combined with our Japanese datasets, these data will clearly illustrate biological differences between childhood and AYA ALL, which will enable us to gain deeper insights into the biology of ALL. When combining datasets for analyses, we focus on cancer-related, somatic chromosome translocations. Therefore, our study will not create any additional risks to the patients.
Marcotte, Erin UNIVERSITY OF MINNESOTA Genetic predisposition, socioeconomic status, and neuroblastoma survival Aug03, 2023 approved Childhood cancer survival rates in the United States have improved dramatically over the last four decades. Despite these improvements, marked racial, ethnic, and socioeconomic disparities in outcomes persist. The goal of this project is to examine the associations between socioeconomic status, genetic predisposition, and tumor prognostic factors with neuroblastoma survival. We will additionally characterize the association between socioeconomic status and neuroblastoma tumor characteristics. Survival rates for neuroblastoma are known to be lower among non-Hispanic black patients compared to non-Hispanic white patients. Socioeconomic status and maternal education level have also been positively associated with the incidence of neuroblastoma (OR = 1.15, 95% CI: 1.02, 1.30), even after adjustment for established risk factors. The goal of this project is to examine the associations between socioeconomic status, genetic predisposition, and tumor prognostic factors with neuroblastoma survival. We will additionally characterize the association between socioeconomic status and neuroblastoma tumor characteristics. Neuroblastoma is the most common solid tumor in children under the age of one. It displays remarkable phenotypic heterogeneity, resulting in differences in outcomes that correlate with clinical and biologic features at diagnosis. While neuroblastoma accounts for approximately 5% of all cancer diagnoses in pediatrics, it disproportionately results in about 9% of all childhood cancer deaths. Only recently have advances in genetics and genomics allowed researchers to unravel the predisposing factors enabling the development of neuroblastoma and to fully appreciate the importance of germline predisposition.
We will use neuroblastoma patient and sequencing data that are publicly available through the Gabriella Miller Kids First (GMKF) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) programs, as well as socioeconomic data generated through NIH funding, to accomplish the aims of this project. Our aim is to further define the role of hereditary genetic predisposition in neuroblastoma survival disparities by including socioeconomic status as a predictor of survival, and to characterize the pathogenic germline genomic landscape of short variants (SNVs or indels >250bps). Specifically, we will examine the impact of SES, germline cancer predisposition, and tumor characteristics on patient survival. We will also determine whether tumor characteristics vary among patients of different SES strata. Early trials have already identified potential targeted therapies for ALK-positive neuroblastoma, in addition to better survival with increased screening. Marcotte, Erin UNIVERSITY OF MINNESOTA Racial and ethnic disparities in childhood cancer outcomes and survival Jan27, 2021 approved Childhood cancer survival rates in the United States have improved dramatically over the last four decades. Despite these improvements, marked racial, ethnic, and socioeconomic disparities in outcomes persist. Compared with non-Hispanic white children, non-Hispanic black and Hispanic children experience lower survival from many cancers, including leukemias, lymphomas, central nervous system (CNS) tumors, and extracranial solid tumors. The underlying causes of these survival differences are poorly understood and may vary by cancer type, and both biological and socioeconomic pathways have been proposed. Our goal is to identify molecular features of childhood cancers that are distinct among racial and ethnic minorities and may be associated with outcomes and survival. Childhood cancer survival rates in the United States have improved dramatically over the last four decades.
Despite these improvements, marked racial, ethnic, and socioeconomic disparities in outcomes persist. Compared with non-Hispanic white children, non-Hispanic black and Hispanic children experience lower survival from many cancers, including leukemias, lymphomas, central nervous system (CNS) tumors, and extracranial solid tumors. The underlying causes of these survival differences are poorly understood and may vary by cancer type, and both biological and socioeconomic pathways have been proposed. Our goal is to identify molecular features of childhood cancers that are distinct among racial and ethnic minorities and may be associated with outcomes and survival. By uncovering these features, a better understanding of cancer biology among children and adolescents may lead researchers to identify potential gene targets for more specific treatments. Childhood cancer remains a leading cause of death among children and adolescents. By conducting integrated analyses of multi-layer molecular data, we may begin to identify new therapeutic targets among high-risk subgroups. To identify potentially actionable genes in childhood cancer across multiple tumor types, including acute lymphoblastic leukemia, acute myeloid leukemia, Wilms tumor, neuroblastoma, and osteosarcoma, we will use multi-omic data to identify genes that may play important roles in cancer pathogenesis and progression among racial and ethnic minorities. Our findings will be disseminated through presentations at national research meetings and through publication in peer-reviewed scientific journals. MARIS, JOHN CHILDREN'S HOSP OF PHILADELPHIA Therapeutically Applicable Research to Generate Effective Treatments (TARGET) for Neuroblastoma May27, 2010 closed This project is designed to provide a complete genetic characterization of the pediatric cancer neuroblastoma.
In addition to our past published work, we are finalizing the largest neuroblastoma GWAS manuscript, as well as the final germline and somatic tumor sequence landscape manuscripts, all with much finer resolution. We have used these datasets to identify tumor-specific antigens for the development of cellular immunotherapy. I am submitting this application as the PI of the Neuroblastoma-TARGET and Neuroblastoma-GWAS projects. We will utilize the SNP, expression and sequencing data, as well as clinical covariates, to discover disease-causing variations and mutations. Additionally, we would like to compare the RNA reads in GTEx with those of the TARGET data, in combination with ligandomics data we have generated, to identify novel tumor targets. Marra, Marco PROVINCIAL HEALTH SERVICES AUTHORITY Comprehensive characterization of pediatric malignancies (TARGET) May16, 2019 approved The TARGET initiative uses a variety of technologies to characterize pediatric cancers. Combining the results from these different technologies will help verify the individual experiments and together provide a robust view of cancer in children. As part of the TARGET initiative, we are conducting next-generation sequencing of tumour, relapsed (where applicable) and matched normal samples for several pediatric cancers including, but not restricted to, rhabdoid, AML and neuroblastoma. The sequencing data will be used in conjunction with the data generated during other phases of the TARGET project to achieve a comprehensive view of high-risk pediatric cancer genomes and transcriptomes. We will also compare the variation and molecular signatures from the TARGET data that we are generating to Canadian pediatric cases that are being characterized in a locally based project. We are not combining the TARGET data set with the local dataset.
Marra, Marco BRITISH COLUMBIA CANCER AGENCY Comprehensive characterization of pediatric malignancies (TARGET) May14, 2010 closed The TARGET initiative uses a variety of technologies to characterize pediatric cancers. Combining the results from these different technologies will help verify the individual experiments and together provide a robust view of cancer in children. As part of the TARGET initiative, we are conducting next-generation sequencing of tumour, relapsed (where applicable) and matched normal samples for several pediatric cancers including, but not restricted to, rhabdoid, AML and neuroblastoma. The sequencing data will be used in conjunction with the data generated during other phases of the TARGET project to achieve a comprehensive view of high-risk pediatric cancer genomes and transcriptomes. We will also compare the variation and molecular signatures from the TARGET data that we are generating to Canadian pediatric cases that are being characterized in a locally based project. We are not combining the TARGET data set with the local dataset. Martin, Renan JOHNS HOPKINS UNIVERSITY Osteosarcomas Sep05, 2024 approved We will pursue the following Aim: Aim 1. Analyze existing WGS data from samples from patients with sarcomas to identify tumor-specific candidate causal variants. Approach. WGS VCF files from patients in the 3 projects that sequenced samples from patients with sarcomas through the Kids First DRC/CAVATICA collaboration can be downloaded from or analyzed in CAVATICA. VCF files will be annotated and will undergo prioritization steps with a focus on our curated list of ~4,000 HIF-1-related genes. PhenoDB analysis pipeline. VCF files created as part of the WGS Kids First analysis will be uploaded to PhenoDB, where variants will be annotated using Annovar (version 2013_09_11) against a variety of data sources. Activity 3a) Rare coding candidate variants.
We will identify rare (MAF<1%) heterozygous and homozygous coding and splicing SNVs and indels. Activity 3b) Prioritization of candidate genes (coding variants). Genes identified in Activity 3a will be classified as follows: level 1 (best), mutated in ≥3 probands with at least 2 variants that are NDL (novel, de novo, or LoF); level 2, mutated in ≥3 probands with at least one NDL; level 3, mutated in 2 probands with at least one NDL; level 4, mutated in 2 probands with at least one variant with a CADD score >20. We will pursue the following Aim: Aim 1. Analyze existing WGS data from samples from patients with sarcomas to identify tumor-specific candidate causal variants. Approach. WGS VCF files from patients in the 3 projects that sequenced samples from patients with sarcomas through the Kids First DRC/CAVATICA collaboration can be downloaded from or analyzed in CAVATICA. VCF files will be annotated and will undergo prioritization steps with a focus on our curated list of ~4,000 HIF-1-related genes. PhenoDB analysis pipeline. WGS VCF files created as part of the Kids First analysis will be uploaded to PhenoDB, where variants will be annotated using Annovar (version 2013_09_11) against a variety of data sources. Activity 3a) Rare coding candidate variants. We will identify rare (MAF<1%) heterozygous and homozygous coding and splicing SNVs and indels. Activity 3b) Prioritization of candidate genes (coding variants). Genes identified in Activity 3a will be classified as follows: level 1 (best), mutated in ≥3 probands with at least 2 variants that are NDL (novel, de novo, or LoF); level 2, mutated in ≥3 probands with at least one NDL; level 3, mutated in 2 probands with at least one NDL; level 4, mutated in 2 probands with at least one variant with a CADD score >20. Other genes can be prioritized based on phenotypes, animal models, function, expression and interaction, missense Z-score, pLI and LOEUF score, etc.
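The tiered gene prioritization of Activity 3b in the Martin request, together with the Activity 3c burden test (a per-gene Fisher's exact test with Benjamini-Hochberg correction at an FDR of 5%), can be sketched in dependency-free Python. This is a minimal illustration under assumptions, not the PhenoDB pipeline: the `Variant` record and its fields are hypothetical, the proband thresholds for levels 1 and 2 are read as "three or more", the level rules are applied literally in order, and the Fisher test is the standard two-sided version computed from the hypergeometric distribution on a 2x2 table of probands versus controls by carrier status.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Variant:
    proband: str   # hypothetical proband identifier
    is_ndl: bool   # novel, de novo, or loss-of-function (LoF)
    cadd: float    # CADD score of the variant


def classify_gene(variants):
    """Assign priority level 1-4 to one gene, or None if unclassified.

    Proband thresholds for levels 1-2 are read as 'three or more' (assumption).
    """
    n_probands = len({v.proband for v in variants})
    n_ndl = sum(v.is_ndl for v in variants)  # counts NDL variants, not probands
    if n_probands >= 3 and n_ndl >= 2:
        return 1
    if n_probands >= 3 and n_ndl >= 1:
        return 2
    if n_probands == 2 and n_ndl >= 1:
        return 3
    if n_probands == 2 and any(v.cadd > 20 for v in variants):
        return 4
    return None


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: probands, controls. Columns: with / without a qualifying variant.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carriers among the row1 probands.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one; the relative
    # tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this sketch, a gene would be called significant when its adjusted p-value is at most 0.05. Computing the Fisher probabilities with `math.comb` keeps the example self-contained; in practice, `scipy.stats.fisher_exact` and the `fdr_bh` method of statsmodels' `multipletests` implement the same tests.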
For example, variants in genes responsible for phenotypes overlapping OD or MS, such as PTPN11, COL2A1, ACP5, EXT1 and EXT2, will be prioritized. HIF-1-related genes will also be prioritized. All genes classified as Level 1–4 will be submitted to pathway analysis using g:Profiler to identify new pathways that may be affected in these patients. Activity 3c) Burden analysis. All genes classified as Level 1 or 2 will also be submitted to a burden analysis. Burden analysis will be performed on probands and a control set of samples sequenced in our center. A contingency table containing the number of probands presenting qualifying variants will be built for each candidate gene. P-values (Fisher’s exact test) will be corrected using the Benjamini–Hochberg method [false discovery rate (FDR) = 5% considered significant]. Maruvka, Yosef TECHNION-ISRAEL INSTITUTE OF TECHNOLOGY Molecular sub-types of pediatric tumors Oct28, 2021 approved Pediatric cancers differ from adult cancers in that they have fewer mutations. The small number of mutations makes it harder to find molecular subtypes. One way to address this problem is to increase the number of patients analyzed. To that end, we will add to the TARGET data other datasets that have been published since the TARGET data were released. This larger dataset is likely to give us more statistical power to detect molecular subtypes. The revolution in DNA sequencing technology over the last decade has enabled the sequencing of many tumor samples. Using this technology, large collaborative efforts such as The Cancer Genome Atlas (TCGA) systematically analyzed many tumors across different tumor types. One of the main discoveries of these sequencing efforts was that tumors have distinct molecular subtypes.
Molecular subtypes can be defined either by distinct mutational processes, such as an increase in copy number changes or in single-nucleotide variants, or by specific driver mutations. The search for molecular subtypes has been very successful in adult tumors but more limited in pediatric cases. This is likely because adult tumors have many more mutations than pediatric tumors: on average, adult tumors have around 1 mutation per megabase, while pediatric tumors have <0.1 mutations per megabase. The problem of finding molecular subtypes in tumors with a low mutation rate can be addressed by adding more samples to the analysis. We would like to reanalyze the TARGET data to search for molecular subtypes and to compare it with other datasets generated since the TARGET database was published, such as the GENIE project and others. Molecular subtypes will be sought using various data types, such as single-nucleotide variants (SNVs), copy number variations (CNVs), structural variations (SVs) and germline variants. We will use non-negative matrix factorization (NMF) to develop a signature-based analysis that searches for subtypes by combining all available levels of data, whereas previous attempts looked at each level separately. Masahito, Kawazu NATIONAL CANCER CENTER Discovery and validation of MEF2D-fusion regulated transcriptional network in childhoods with B-cell Acute Lymphoblastic Leukemia Aug28, 2019 closed Chromosome translocations in cancer are useful for predicting prognosis and can serve as potential therapeutic targets, which could lead to therapeutic success in the clinic. We are now searching for the transcriptional network of MEF2D fusions, because B-ALL with chromosome translocations involving MEF2D is associated with an inferior prognosis. We wish to examine gene expression profiles of childhood ALL with MEF2D fusions and compare them with those of other subtypes in the large TARGET datasets of childhood ALL.
These results may help validate the MEF2D-fusion transcriptional network we identified. The primary purpose of the research is identification of the transcriptional network in B-cell Acute Lymphoblastic Leukemia (B-ALL) harboring MEF2D fusions. B-ALL with chromosome translocations involving MEF2D is reported to be associated with a dismal prognosis. To discover therapeutically actionable molecular targets under the control of MEF2D fusions, we are searching for the transcriptional network using integrated biological assays, including ChIP-seq, RNA-seq and knockdown assays, in B-ALL cell lines with MEF2D rearrangements. Here, we wish to examine gene expression profiles of childhood B-ALL with MEF2D fusions and compare them with those of other subtypes in the large TARGET datasets of childhood ALL. These results may help validate the MEF2D-fusion transcriptional network we identified in the cell line system. Starting with published FASTQ or BAM files, we use deFuse to detect chromosome translocations, and TopHat2, HTSeq and DESeq2 to quantify gene expression. Data from phs000463 (Phase 1) and phs000464 (Phase 2) will be used in accordance with data-use restrictions. We focus only on cancer-related somatic chromosome translocations and gene expression profiling. Therefore, our study will not create any additional risks to the patients. McLaughlin, Richard PACIFIC NORTHWEST RESEARCH INSTITUTE An Analysis of Retrotransposon Activity in Neuroblastoma without Autoimmunity Apr12, 2021 approved We are studying a group of neuroblastoma (NB) patients who develop an unusual cancer complication: their NB causes an autoimmune disease that attacks the brain. It is not known why these patients (only 2% of NB patients) develop this complication while most do not. It is also not known which specific molecules the immune system “sees” and attacks in the brain or tumor.
We are specifically interested in the activity of a set of viruses and virus-like sequences that reside within the genome, called retrotransposons. These mobile DNA sequences make up a large fraction of the genome but, once embryonic development is complete, are active only in a small set of tissues, including cancer and the brain. We propose to measure whether retrotransposon activity differs between children with NB with and without autoimmune disease. We will use TARGET data, including gene expression and DNA methylation, to estimate retrotransposon activity in NB tumors from children without autoimmunity. These data will allow us to begin to understand how the activity of these elements may be controlled differently in each patient group. Objectives of the proposed research: Our proposed work aims to investigate the expression of retrotransposons (RTs), especially LINE1s and HERVs, in neuroblastoma (NB) as part of a broader investigation of possible transcription or protein production from these elements in the context of paraneoplastic disease associated with NB. The longer-term goal is to test the hypothesis that proteins from these elements contribute to autoimmunity, using patient sera and CSF. Study design: We propose to use TARGET NB as comparators for our study of 39 NB tumors from patients with Opsoclonus Myoclonus Ataxia Syndrome (OMAS), for which we already have RNA-seq data available (Rosenberg et al., forthcoming). Reads will be mapped to reference L1 and HERV elements. We hypothesize that transcription of the youngest, most active of these elements may lead to immunogenic ORF expression, relevant for tumor restriction or regression. Follow-up studies would employ long-read RNA-seq or RT-PCR to confirm expression localized to specific transcripts. Analysis plan: We plan to assess the correlation of RT expression with features of both the neoplasm (e.g.
ploidy, histopathological characteristics and prognostic measures such as EVS) and the paraneoplastic disease (severity of neurological symptoms, response to treatment) for OMAS and, where possible, for NB controls. Extant data in TARGET, such as the DNA methylation landscape, may also inform our model. Data use limitations: TARGET data, and extant data from OMAS patients, will be used only by the applicants, in a fully de-identified manner, according to the data use limitations provided by the study directors. McReynolds, Lisa James NIH Frequency of Myeloid Malignancy Predisposition Syndromes in an AML Cohort Jul22, 2022 closed Acute myeloid leukemia (AML) is the most common form of acute leukemia in adults (4 per 100,000) and the second most common in children (~700 cases per year in the US). AML is a clonal expansion of immature cells in the bone marrow. These proliferating immature cells interfere with normal blood cell production, leading to low blood cell counts and causing symptoms such as fatigue, pallor, bruising/bleeding, or fever/infection. Most patients have sporadic disease, but a small subset of patients have an increased risk of developing AML due to inherited genetic variants, called myeloid malignancy predispositions (MMPs). In the past, these genetic diseases were thought to be easily identified in patients due to recognizable physical features. More recently, it has become clear that these genetic diseases can go unrecognized. We would like to use the TARGET AML sequencing data to determine the frequency of MMPs. Identification of an MMP is critical for patient screening, management, transplant donor selection, and genetic counseling. It has long been understood that certain inherited disorders, such as Fanconi anemia and Li-Fraumeni, Noonan and Down syndromes, predispose to myeloid malignancies such as acute myeloid leukemia (AML).
However, in the past two decades, due in part to the rapid growth of genomics, newer predispositions to AML have been identified. This led the most recent revision of the World Health Organization classification of myeloid neoplasms and acute leukemias to carve out a section specifically for myeloid neoplasms with germline predisposition. Patients with germline myeloid malignancy predispositions (MMPs) may present with a variety of phenotypes. These MMPs include syndromes with other congenital abnormalities (e.g., Fanconi anemia), those with related cytopenias (e.g., familial platelet disorder, RUNX1), those with other organ system involvement (e.g., GATA2 deficiency) and those with AML as the presenting symptom (e.g., CEBPA). Our group is actively investigating the prevalence of germline MMPs in two large population-based exome sequencing cohorts (UK Biobank and DiscovEHR). We request use of the whole exome and whole genome sequencing data from the TARGET AML cases. We plan to re-call variants from the sequencing data using a pipeline developed internally for accurate germline calling. We will then use the known inheritance pattern of each disease gene and the zygosity of the identified pathogenic variant(s) to determine the presence of MMP cases within the TARGET AML cohort. We will compare the rate of MMPs identified in the general population cohorts with the rate in the TARGET AML cohort and other published AML cohorts. These comparisons will allow estimation of the penetrance of MMPs as well as a better understanding of the burden of germline disease within the pediatric AML population. Increased awareness of MMPs will lead to earlier disease identification, a critical step for proper screening and management of patients with these disorders, as well as for hematopoietic cell transplant donor selection and family planning.
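The inheritance-and-zygosity logic in the McReynolds statement could look roughly like the Python sketch below. The gene table is an illustrative subset (RUNX1, GATA2 and CEBPA are autosomal dominant, FANCA autosomal recessive), not the group's curated list; `call_mmp` and `mmp_rate` are hypothetical helpers that assume pathogenicity has already been assigned upstream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Illustrative subset only: autosomal dominant (AD) vs autosomal recessive (AR).
INHERITANCE = {
    "RUNX1": "AD",  # familial platelet disorder
    "GATA2": "AD",  # GATA2 deficiency
    "CEBPA": "AD",  # germline CEBPA-associated AML
    "FANCA": "AR",  # Fanconi anemia, complementation group A
}


def call_mmp(variants: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """variants: (sample, gene, zygosity) for pathogenic germline variants,
    with zygosity "het" or "hom". Genes not in INHERITANCE are ignored."""
    zygosities = defaultdict(lambda: defaultdict(list))
    for sample, gene, zygosity in variants:
        zygosities[sample][gene].append(zygosity)

    calls: Dict[str, Dict[str, str]] = {}
    for sample, genes in zygosities.items():
        sample_calls = {}
        for gene, zygs in genes.items():
            mode = INHERITANCE.get(gene)
            if mode == "AD":
                sample_calls[gene] = "MMP"  # one pathogenic allele suffices
            elif mode == "AR":
                if "hom" in zygs:
                    sample_calls[gene] = "MMP"
                elif len(zygs) >= 2:
                    # Two heterozygous hits may be compound heterozygous,
                    # but short reads usually cannot phase them.
                    sample_calls[gene] = "possible MMP (phase unknown)"
                else:
                    sample_calls[gene] = "carrier"
        if sample_calls:
            calls[sample] = sample_calls
    return calls


def mmp_rate(calls: Dict[str, Dict[str, str]], cohort_size: int) -> float:
    """Fraction of the cohort with at least one confirmed MMP call."""
    n = sum(1 for genes in calls.values() if "MMP" in genes.values())
    return n / cohort_size
```

Keeping "possible" calls separate from confirmed ones makes it easy to report the cohort rate under both a strict and a permissive definition when comparing against population cohorts.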
MCVICKER, GRAHAM SALK INSTITUTE FOR BIOLOGICAL STUDIES Identification of cis-acting regulatory mutations in pediatric neuroblastoma through analysis of allele specific expression Apr30, 2018 approved Human cells are diploid and therefore have 2 copies (alleles) of each gene. When a DNA mutation causes one copy of a gene to be misregulated, the expression of the gene shows ‘allelic imbalance’. The presence of allelic imbalance for a gene can therefore indicate the presence of genomic mutations that affect the regulation of that gene, and this signal can potentially be used to detect driver mutations in cancer. In this study we will use allelic imbalance to identify candidate genes and their nearby DNA regulatory regions that may be important for the development and progression of pediatric neuroblastoma. Mutations that affect gene regulation in cis are difficult to identify and characterize because gene expression is also influenced by differences in the environment and by trans-acting factors. One way to identify cis-regulatory changes is through the analysis of allele-specific expression (ASE). ASE can be a signature of cis-acting mutations because mutations that affect promoter, enhancer and 3’UTR sequences are likely to affect the expression of only a single allele. In this project, we propose to identify genes that show ASE across multiple pediatric neuroblastoma (NBL) patients using whole exome sequencing and RNA sequencing from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project. We will then use whole genome sequencing (WGS) data from TARGET to identify point mutations, insertions and deletions within DNA regulatory regions located within 1 Mb upstream and downstream of genes showing ASE. The main objective of our proposed research is to identify functional DNA regulatory mutations that are involved in the oncogenesis and progression of neuroblastoma.
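One common way to detect the allelic imbalance described in the McVicker statement is an exact binomial test of reference versus alternate RNA-seq read counts at SNPs that the exome calls heterozygous. The statement does not name a test, so this choice, the depth cutoff, the `alpha` and `min_patients` defaults and all function names are assumptions; a real analysis would also correct for multiple testing and for reference-mapping bias.

```python
import math
from collections import defaultdict
from typing import Iterable, Set, Tuple


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k reference reads out of n, with p = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail, capped at 1.
    """
    if n == 0:
        return 1.0
    lo = min(k, n - k)
    tail = sum(math.comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def recurrent_ase_genes(
    sites: Iterable[Tuple[str, str, int, int]],
    alpha: float = 0.05,
    min_depth: int = 10,
    min_patients: int = 2,
) -> Set[str]:
    """sites: (patient, gene, ref_reads, alt_reads) at SNPs called heterozygous
    in the exome. Returns genes imbalanced in at least min_patients patients."""
    patients_by_gene = defaultdict(set)
    for patient, gene, ref, alt in sites:
        depth = ref + alt
        if depth >= min_depth and binom_two_sided_p(ref, depth) < alpha:
            patients_by_gene[gene].add(patient)
    return {g for g, ps in patients_by_gene.items() if len(ps) >= min_patients}


def in_cis_window(pos: int, gene_start: int, gene_end: int, window: int = 1_000_000) -> bool:
    """True if a WGS variant lies within `window` bp upstream or downstream of a gene."""
    return gene_start - window <= pos <= gene_end + window
```

`in_cis_window` corresponds to the 1 Mb search space around ASE genes; WGS variants passing it would then be intersected with annotated regulatory regions.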
The proposed work is consistent with the Use Restrictions for the TARGET data sets, and the data will be analyzed completely independently of any other datasets. The data analysis will be conducted in our lab at the Salk Institute for Biological Studies, and no inter-institutional collaborations are planned at this time. MEDINA, PEDRO UNIVERSITY OF GRANADA Mutational status of SWI/SNF chromatin remodeling complex subunits in childhood leukemia. Feb16, 2021 approved Leukemia is the most common form of cancer in childhood. Although the majority of pediatric patients with this hematologic malignancy are cured, leukemia remains the most frequent cause of death from cancer in children. Genetic alterations are responsible for the uncontrolled proliferation of the undifferentiated hematopoietic cells involved in blood cell production. The regulation of chromatin architecture is essential for proper hematopoietic development. In this context, the SWI/SNF complex is one of the main chromatin remodeling complexes, and it is mutated in more than 20% of human cancers. Recent molecular studies have discovered new biomarkers for the classification, diagnosis and prognosis of leukemia patients, but alterations in the SWI/SNF complex have not been described. We are currently investigating the mutational state of several SWI/SNF genes to study their functional impact in leukemia. To meet this objective, we will analyse the requested mutation annotations of ALL and AML to assess whether these alterations may influence patient prognosis. Leukemia is the most common childhood cancer. It can be classified as myelogenous or lymphocytic, according to the predominant type of cell involved (myeloid or lymphoid). Most pediatric leukemias are acute lymphocytic leukemia (ALL), while most of the remaining cases are acute myeloid leukemia (AML). Although these diseases have been studied, the underlying molecular basis remains unknown.
Currently, most of the patients who are not cured have acquired resistance to treatment, which often leads to relapse. The regulation of chromatin architecture is essential for physiological processes, including hematopoietic development. The SWI/SNF remodeling complex, generally composed of 9–15 subunits, is one of the main chromatin remodeling complexes present in human cells. Depending on subunit composition, several classes of SWI/SNF may exist in the cell simultaneously, and their subunit composition and activity are tissue-specific. The SWI/SNF complex is mutated in more than 20% of human cancers. Research interest is increasingly focused on understanding the prognostic and, in particular, the potential therapeutic implications of mutations in genes encoding SWI/SNF subunits. In the context of our research, we are interested in the mutational status of the SWI/SNF complex subunits in pediatric haematological neoplasias. We would like to access the mutation data (VCF/MAF) included in phs000463.v19.p8, phs000464.v19.p8 and phs000465.v19.p8, which are all part of phs000218.v22.p8, to evaluate alterations in SWI/SNF subunits and their clinical relevance in childhood leukemia. Our objectives are: • Detecting mutations that could affect the functionality of the SWI/SNF complex in pediatric ALL and AML. • Correlating mutation data with expression and clinical data from the same patients. We aim to perform correlation and survival analyses to identify the consequences of this mutational context. • If the mutational frequency of SWI/SNF subunits is too low to be biologically significant, other candidate driver genes may be searched for using the same methodology as outlined above. To put our findings into context, we may compare the SWI/SNF mutational patterns with those from other datasets analyzed independently. We will not combine the raw data with any other datasets, and our analysis will not create any additional risks to participants.
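The Medina statement mentions survival analysis without naming a method. A Kaplan–Meier estimator, sketched below in Python under that assumption, is a standard starting point for comparing patients with and without SWI/SNF mutations; the record layout and the function names are hypothetical, and a formal comparison would add a log-rank test or a Cox model.

```python
from typing import Dict, Iterable, List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 if an event (e.g. relapse or death) was observed at times[i],
    0 if the patient was censored at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censorings both leave the risk set
    return curve


def km_by_status(
    records: Iterable[Tuple[bool, float, int]]
) -> Dict[str, List[Tuple[float, float]]]:
    """records: (mutated, time, event). Returns one curve per mutation status."""
    groups = {True: ([], []), False: ([], [])}
    for mutated, t, e in records:
        groups[mutated][0].append(t)
        groups[mutated][1].append(e)
    return {
        ("mutated" if status else "wild-type"): kaplan_meier(ts, es)
        for status, (ts, es) in groups.items()
        if ts
    }
```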
Our main aim is to increase knowledge of the role of chromatin remodeling proteins and their mutational patterns in leukemia in order to identify vulnerabilities generated by these alterations. Our research could help discover novel therapeutic targets and develop a more accurate stratification of leukemic patients. Meltzer, Paul Stuart NIH TARGET (Therapeutically Applicable Research to Generate Effective Treatments) Jan24, 2013 approved Childhood cancers, like those of adults, are associated with acquired alterations in the genetic material of tumor cells. By decoding these changes and their effects on the growth of tumors, it is hoped that rational precision therapies and diagnostics can be developed that may improve outcomes. Our particular disease of interest, childhood osteosarcoma, presents a great challenge, as progress in improving outcomes has been slow. We hope that by learning the details of the changes that occur in tumors, we will gain knowledge that will open new opportunities to develop better treatments and improve the clinical care of patients with this disease. I am a pediatric oncologist and intramural investigator in CCR/NCI collaborating with the TARGET program (http://target.cancer.gov/). In our laboratory, we are conducting genome analyses of osteosarcoma as an integral part of the TARGET osteosarcoma project, for which I act as co-principal investigator. This research project aims to develop a better understanding of this pediatric disease in order to identify features of the genome that may be important for translational research, such as diagnostic and prognostic markers, and candidate drug targets that could be investigated in future studies to benefit patients afflicted with this disease.
The TARGET osteosarcoma project has been configured with these goals in mind and includes carefully curated samples with associated clinical information that enables the identification of clinical correlates of genome alterations using standard statistical methods. This research directed at pediatric osteosarcoma can only be conducted using pediatric data. We plan to use the TARGET data for pediatric research only and will not use it for method, software, or tool development. We are focused on analyses of the TARGET osteosarcoma samples and have no plans to combine that data set with data outside of dbGaP. Outcomes analyses are restricted to those originally planned for this study. Merkenschlager, Matthias UK RESEARCH & INNOVATION Functional impact of mutations of Cohesin and other epigenetic and transcriptional regulators in AML Oct23, 2019 approved Cohesin is a multiprotein complex that cooperates with the sequence-specific DNA-binding protein CTCF in forming key features of 3D genome organization such as topologically associated domains (TADs), contact domains and chromatin loops. In addition to its role in genome compartmentalization, cohesin is essential for genome integrity. We recently showed that cohesin was critically required for inflammatory gene expression in primary human AML cells. Now, we want to further investigate the functional impact of mutations in cohesin in AML cells. My lab at the MRC London Institute of Medical Sciences is interested in understanding mechanisms of gene regulation by transcription factors, chromatin states, and 3-D genome conformation. We have recently extended this work from primary mouse tissues to human patient cohorts (Cuartero et al.
Nat Immunol 2018) and want to investigate the functional impact of mutations in the 3-D genome organiser cohesin and other epigenetic and transcriptional regulators in paediatric AML, at the level of genome sequence, chromatin state, transposon activity, transcription factor binding, mRNA expression, clinical features, and patient outcomes. For this purpose, we request permission to access RNA-seq BAM files and protected CNV MAF and VCF files, which are restricted/controlled TARGET/BEAT/TCGA data. In addition to gene expression data, which are already available from TARGET/BEAT/TCGA, we need RNA-seq BAM files for transposable element expression analysis and for the study of splicing and transposable-gene fusions. Data from paediatric AML will allow us to separate age-related changes from mutation-related changes. To this end, we will ask whether the relationship between mutations and inflammatory gene expression is the same or different for paediatric and adult AML. Clinical data will allow us to link patterns of inflammatory gene expression to patient outcome. Our translational goal is to improve treatment outcomes by stratifying AML based on patterns of inflammatory gene expression. The TARGET collection of biospecimens may become important for experimental validation as the study progresses. The Principal Investigator confirms his compliance with the specific data use limitations for the requested dataset, specifically, the model Data Use Certification. Our intention is to share the results broadly with the scientific community through publication and dissemination at appropriate scientific meetings.
MESHINCHI, SOHEIL FRED HUTCHINSON CANCER RESEARCH CENTER Mechanisms, Genomic Risk Stratification and Precision Intervention for Acute Myeloid Leukemia in Children with Down syndrome (ML-DS) May03, 2022 rejected We will create a new workflow to assess genetic variants and RNA-sequencing gene expression data of DS individuals, identifying molecular differences between DS individuals known to have had preleukemic transient events that progress to acute myeloid leukemia and those who do not, and including completely normal DS individuals as controls. Children with constitutional trisomy 21 Down syndrome (DS) have a unique predisposition to develop myeloid leukemia of Down syndrome (ML-DS). This disorder is preceded by a transient neonatal preleukemic syndrome, referred to as transient abnormal myelopoiesis (TAM), which has been thought to be unique among clonal neoplastic disorders in its universal linkage with trisomy 21. Recent work has shown that these transient events also appear in trisomy 21 mosaic individuals and has highlighted the determining role of GATA1 mutations in potential outcomes. There is an unmet need to define molecular characteristics based on both the GATA1 variant information and the gene expression profiles of TAM, ML-DS and relapsed ML-DS blasts, as well as normal T21 hematopoietic progenitor populations. We have assembled a multidisciplinary team to collaborate in a cloud-based environment, using and extending workflows and analyses in an open, transparent and collaborative manner with the stated dbGaP datasets available on the INCLUDE and Kids First portals. Each team will focus on different areas: the Lau lab will explore the role of GATA1 mutations in hematopoiesis, including normal T21 hematopoiesis and the transition from TAM to ML-DS and relapse in a subset (phs001657, phs000159, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218).
The Meshinchi lab will explore the gene expression profiles of the determined subpopulations based both on the phenotype information of the groups (diagnosed ML-DS with and without evidence of TAM events) and on relapsed ML-DS and normal T21 subpopulations (phs001657, phs000413, phs001027, phs000178, phs001287, phs000424, phs001746, phs000218). Both groups will compare the identified genomic/expression features with diploid 21 AML cases to identify alterations specific to T21 TAM and ML-DS. In addition to contributing to primary data analysis, Dr. Deslattes Mays will provide oversight of reproducible pipeline development to ensure adherence to the F.A.I.R. Data Principles. Data will be combined with normal T21 controls sourced from the Linda Crnic Institute for Down Syndrome’s Human Trisome Project, facilitated by the Espinosa lab. These data will be stored in a separate AWS bucket and merged for comparison between disease-affected T21 patients and normal T21 subjects. The proposed project will study myeloid leukemia in pediatric T21 patients, with both computational methods and results shared with the broad scientific community, and therefore falls within the data use limitations of the requested datasets phs001657, phs000159, phs000413, phs001027, and phs000218. MESHINCHI, SOHEIL FRED HUTCHINSON CANCER RESEARCH CENTER Therapeutically Applicable Research to Generate Effective Treatments (TARGET) for pediatric Acute Myeloid Leukemia Nov05, 2012 expired We intend to use the data we've generated, TARGET AML, to continue our research in pediatric AML. We are integrating all genomic alterations observed in pediatric AML (whole genome sequencing, targeted sequencing, transcriptome sequencing, miRNA sequencing and methylation status by array) to define altered pathways or genes that are drug targets. Additionally, we hope to identify biomarkers that can be used to monitor a patient's disease, i.e.
markers of minimal residual disease (MRD), or to stratify patients at diagnosis for better up-front therapies. We will compare our pediatric results to the published results of adult AML datasets to determine whether there are differences between the two populations, especially in terms of the frequency of mutations observed in specific genes and whether different genes are altered. We intend to use the data we've generated, TARGET AML, to continue our research in pediatric AML. We continue to define the full complement of genomic alterations observed in pediatric AML as we integrate the data we've generated: whole genome sequencing, targeted sequencing, transcriptome sequencing, miRNA sequencing and methylation status by array. Our goal is to identify altered pathways or specific genes that are drug targets. Additionally, we hope to identify biomarkers that can be used to monitor a patient's disease (i.e., markers of minimal residual disease (MRD)) or to stratify patients at diagnosis for better up-front therapies. We will compare our pediatric results to the published results of adult AML datasets (such as TCGA, BEAT AML, and SWOG) to determine whether there are differences between the two populations, especially in terms of the frequency of mutations observed in specific genes and whether different genes are altered. In addition, we will compare the pediatric observations in the TARGET cohort to AML and ALL data from St. Jude and look for shared alterations among acute leukemias, especially in the younger AML age group. We may use data from dbGaP, combine them with data from the St. Jude portal and process them in the cloud. There will be no additional risk to participants in TARGET, TCGA, or St. Jude, as all data are de-identified and somatic mutations and expression targets will be the primary focus.
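The Meshinchi statement compares per-gene mutation frequencies between pediatric and adult cohorts without naming a test. A per-gene Fisher's exact test on a 2×2 table (cohort × mutated/not mutated), followed by Benjamini–Hochberg correction as in the Martin burden analysis, is one reasonable choice; the Python sketch below assumes it, and the function names are hypothetical. In practice `scipy.stats.fisher_exact` and statsmodels `multipletests(..., method="fdr_bh")` would replace the hand-rolled code.

```python
import math
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are cohorts (e.g. pediatric, adult); columns are mutated / not mutated.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed one.
    """
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def p_table(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = p_table(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(p for p in (p_table(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini–Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Genes whose adjusted value falls below 0.05 would then be flagged as differing in frequency between the two age groups.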
Mestdagh, Pieter GHENT UNIVERSITY Identification of aberrant splicing and circular RNA expression in T-ALL Feb10, 2023 approved T-lineage acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic malignancy that is mainly diagnosed in children and requires treatment with intensified chemotherapy. This therapeutic regimen can result in life-threatening toxicities. Thus, further advances in the treatment of paediatric T-ALL require the development of effective and highly specific targeted anti-leukemic drugs, which in turn requires a better understanding of paediatric T-ALL disease biology. In the past, childhood T-ALL research has largely focused on genetic and transcriptomic analyses. However, aberrant RNA splicing and circRNA expression, an additional level of complexity in the biology of paediatric T-ALL, remain largely unexplored. Given this, we will use a protected 'Therapeutically Applicable Research to Generate Effective Treatments' (TARGET) dataset to confirm aberrant splicing and circRNA expression in pediatric T-ALL. This work will help identify an aberrant splicing signature and specific circRNAs that can serve as biomarkers for the use of specific inhibitors in the treatment of childhood leukemia. The first objective of the project is to reconfirm that paediatric T cell acute lymphoblastic leukaemia (T-ALL) consists of different subsets characterised by differential RNA splicing and differential circRNAs. These differences were initially identified based on polyA RNA sequencing analysis of a cohort of 64 pediatric T-ALL patients (Peirs et al, Blood 2014) that we obtained from Saint-Louis Hospital (Paris, France) in collaboration with Prof Jean Soulier. Differential splicing analysis was performed using previously published bioinformatic pipelines (Anande G et al. Clinical Cancer Research 2020).
A second objective of the project is to show that the RNA-binding protein QKI regulates circular RNA expression in paediatric T-ALL. For this, we initially used total RNA sequencing data from a cohort of 25 paediatric T-ALL patients (Verboom et al, Haematologica 2018), which we also obtained from Saint-Louis Hospital (Paris, France) in collaboration with Prof Jean Soulier. CircRNA analysis was performed using a previously published bioinformatic pipeline, CirCompara2 (Gaffo E et al. Brief Bioinform 2021). To extend this work, we would like to request access to the FASTQ files from a previously published study in which RNA sequencing was performed on a large cohort of pediatric T-ALL samples (Liu et al., Nature Genetics 2017). This manuscript states that FASTQ files from the RNA-seq data of this study are accessible through the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under accession number phs000218 (TARGET) and the substudy-specific accession phs000464 (TARGET ALL Expansion Phase 2). We will use these data to reconfirm the aberrant splicing signatures in paediatric T-ALL and to compare circRNA expression between paediatric T-ALL samples with high versus low QKI expression, as described above. Given that the initial discovery cohort consisted only of paediatric T-ALL patient samples, we can only use paediatric T-ALL samples to validate our findings in an independent cohort. Therefore, and given potential differences between the biology of T-ALL in the paediatric and adult settings, we hereby confirm that our research objective cannot be accomplished using data from adults. In addition, given that we are using computational biology pipelines that have already been developed and are publicly available, we hereby confirm that we will not use these data for methods, software, or other tool development purposes.
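The Mestdagh statement compares circRNA expression between QKI-high and QKI-low samples. The Python sketch below assumes a median split on QKI expression and a Mann–Whitney U test per circRNA; neither choice is named in the statement (CirCompara2 has its own statistics), and all function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence, Set, Tuple


def split_by_median(qki: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Split samples into QKI-high (> median) and QKI-low (<= median) groups."""
    m = median(qki.values())
    high = {s for s, v in qki.items() if v > m}
    return high, set(qki) - high


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x, with a normal-approximation two-sided p-value
    (no tie or continuity correction; only adequate for moderate sample sizes)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def compare_circ(circ: Dict[str, float], qki: Dict[str, float]) -> Tuple[float, float]:
    """circ: {sample: expression of one circRNA}; tests QKI-high vs QKI-low."""
    high, low = split_by_median(qki)
    return mann_whitney_u([circ[s] for s in sorted(high)], [circ[s] for s in sorted(low)])
```

Running `compare_circ` once per circRNA and adjusting the resulting p-values for multiple testing would give a ranked list of QKI-associated circRNAs.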
Finally, confirmation of aberrant splicing and circRNA expression in an independent pediatric T-ALL cohort will allow us to identify putative (splicing-based) biomarkers that could eventually be used to select paediatric T-ALL patients who might benefit from specific novel treatment modalities, such as the H3B-8800 spliceosome inhibitor (Seiler et al., Nature Medicine 2018). Research use of these data will therefore likely be relevant for developing more effective treatments, diagnostic tests, and/or prognostic markers for childhood cancer. MEYERSON, MATTHEW BROAD INSTITUTE, INC. Neuroblastoma exome sequencing Dec15, 2011 closed We aim to discover changes in the DNA of neuroblastoma cancer cells that may suggest the use of existing drugs or the development of new ones to treat this disease. To discover these changes, we are examining the DNA sequence of every gene in over 200 neuroblastoma samples. These samples have been analyzed in the past using lower-resolution microarray technologies that provided data vital for steering our cutting-edge DNA sequencing experiments. Combining our DNA sequencing results with clinical data from these samples may yield insights into which genes are important in the causation and treatment of neuroblastoma. Our primary goal is to uncover novel somatic alterations in neuroblastoma that may lead to actionable targets for therapeutic application or development. To this end, we are generating whole exome sequences from over 200 richly annotated neuroblastoma samples and corresponding normal samples collected by TARGET consortium members. Several of these samples have been previously analyzed using DNA microarrays, the data from which are the subject of this request. We anticipate using these data, first, as a quality control metric for our sequencing experiments and, second, as a component of an integrated analysis of somatic alterations in these tumours.
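The first quality-control use in the Meyerson statement, checking sequenced samples against their microarray data, is typically a genotype concordance check. The Python sketch below assumes genotypes coded as alternate-allele dosages (0/1/2) keyed by SNP ID; the `min_sites` cutoff and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Genotypes = Dict[str, Optional[int]]  # SNP id -> alt-allele dosage 0/1/2, None if no call


def concordance(a: Genotypes, b: Genotypes, min_sites: int = 20) -> Optional[float]:
    """Fraction of SNPs called in both samples whose genotypes agree."""
    shared = [s for s in a if a[s] is not None and b.get(s) is not None]
    if len(shared) < min_sites:
        return None  # too few overlapping calls to judge identity
    return sum(a[s] == b[s] for s in shared) / len(shared)


def best_array_match(
    array_calls: Dict[str, Genotypes],
    seq_calls: Dict[str, Genotypes],
    min_sites: int = 20,
) -> Dict[str, Optional[Tuple[float, str]]]:
    """For each sequenced sample, the (concordance, array sample) pair with the
    highest concordance; a mismatch with the expected ID suggests a sample swap."""
    result = {}
    for seq_id, seq_gt in seq_calls.items():
        scores = []
        for array_id, array_gt in array_calls.items():
            c = concordance(array_gt, seq_gt, min_sites)
            if c is not None:
                scores.append((c, array_id))
        result[seq_id] = max(scores) if scores else None
    return result
```

Matching every sequenced sample against all array samples, rather than only its labelled partner, is what allows swapped or mislabelled samples to be detected.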
Genotypes derived from dbGaP data will be compared with our sequencing data to confirm sample identity. Combining dbGaP microarray and clinical data with our sequencing data will form a rich data set for in-depth analysis of this cancer. In addition, clinical (ploidy) and microarray copy number data will be used to guide the development of copy number detection and quality control metrics using exome sequencing data. Miller, Christopher WASHINGTON UNIVERSITY Studies of AML predisposition in children Jun11, 2021 approved Acute Myeloid Leukemia (AML) is a devastating form of blood cancer that is rapidly fatal unless patients can achieve a remission with chemotherapy and/or a stem cell transplant from a matched donor. Pediatric and adult AML have unique genomic signatures and clinical courses. By using genomics technology and the TARGET database, we hope to better understand why pediatric AML patients progress and relapse, and to define novel approaches for therapy that may improve patient outcomes. This includes developing a better understanding of mutations that may cause predisposition to AML and related syndromes, such as those seen in children who develop pediatric tumors. As part of this work, we will be searching for therapeutic agents that can prevent the development of AML or can selectively kill tumor cells. The goal of our project is to better define the genetic events that contribute to the pathogenesis of AML, and to use this information to improve the risk assessment and treatment of patients with this disease. We have previously focused on adult AML samples; however, we are also interested in developing an understanding of AML predisposition syndromes affecting pediatric patients, such as Tatton-Brown-Rahman syndrome. We are putting together a manuscript reporting an enrichment of hematologic malignancies within this newly described patient population. 
The "Therapeutically Applicable Research to Generate Effective Treatments" database is a unique and powerful resource to identify both inherited and acquired mutations in pediatric AML patients. We hope to identify specific mutational and/or epigenetic patterns that are causative, and others that may suggest specific therapeutic options. Some of these therapies might involve use or repurposing of existing pharmaceutical agents, and others the development of new drugs, all with the aim of reversing specific expression or methylation patterns that cause tumorigenesis. These genomic data will be integrated with those produced in our Genomics of AML project to extend our findings and help identify causative variants, transcriptomic signatures, and other markers in pediatric patients. We will analyze some of these data with our own genomic workflows to enable integration with our own datasets, including studies of clonal evolution and of the unique aspects of pediatric AML: epigenetics, the causes of relapse, resistance to allogeneic transplant or other therapies, germline predisposition to AML, and AML immunogenomics. Access restrictions described in the consents will be respected, and all data will be analyzed at Washington University in St. Louis. Mirabello, Lisa J NIH Understanding the genomics of osteosarcoma Feb20, 2014 approved We propose to use the TARGET dbGaP data to investigate the inherited (germline) and acquired (somatic) genomic changes in osteosarcoma. Studies of inherited genetic variation have identified several genetic changes associated with risk of osteosarcoma. We plan to use the TARGET data to examine and characterize the genetic variants previously identified in our germline genomics studies of osteosarcoma. We will investigate the interactions and/or changes associated with these variants using the gene expression, copy number, miRNA, sequencing, and methylation data. 
We feel that these results will improve our understanding of the etiology of pediatric osteosarcoma. Osteosarcoma is the most common bone tumor in children and adolescents, and it accounts for approximately 3% of pediatric cancers. Recent studies have identified several novel heritable (germline) genetic variants associated with osteosarcoma. Our research objective with the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative data is to examine and characterize the genetic variants previously identified in our germline genomics studies of osteosarcoma. We plan to integrate the comprehensive TARGET genomic and epigenomic datasets (gene expression, copy number, miRNA, sequencing, and methylation) to investigate previously identified candidate genetic variants among TARGET osteosarcoma cases and clinical outcomes, and determine their potential influence on methylation status, gene expression, and miRNA. The TARGET data will be examined independently. Significant genomic regions in osteosarcoma will potentially be examined in the other TARGET datasets to determine if they represent sites of recurrent changes in other pediatric cancers. Characterizing these important genetic variants in somatic tissue, and effects on potential regulatory elements, may eventually lead to the development of novel therapeutic targets and improved diagnosis, and will improve our understanding of the etiology of pediatric osteosarcoma. MITROFANOVA, ANTONINA RBHS-SCHOOL/ HEALTH RELATED PROFESSIONS Computational approaches to uncover prognostic markers in ALL Aug27, 2018 approved In pediatric acute lymphoblastic leukemia, African American children often have different clinical features than white children at the time of diagnosis, and there have been reports of uneven survival rates. The goal of this study is to use clinical and genomic data to identify potential causes of these disparities. 
When completed, this study will yield greater insight into the biological features that may underlie these differences in diagnosis and outcomes. This project seeks to investigate the genetic influences of biological differences between African American and White pediatric acute lymphoblastic leukemia (ALL) patient populations. The study will investigate clinical and genetic differences between African American and White ALL patients using data generated by the TARGET-ALL project. Clinical phenotype at the time of diagnosis as well as response and outcome data will be correlated with genomic features extracted from whole exome sequencing data in an effort to identify novel genomic features that may explain clinical and outcome disparities between African American and White pediatric ALL patients. When completed successfully, this project will result in new data that may inform new therapeutic practices or targets to address the disparities in pediatric ALL outcomes. MITROFANOVA, ANTONINA RBHS-SCHOOL/ HEALTH RELATED PROFESSIONS TARGET: Molecular mechanisms of pediatric leukemia Nov07, 2016 approved The objective of this project is to identify genes that differ between less aggressive and more aggressive pediatric leukemia. These genes might potentially be used as markers to identify patients with the most aggressive form of the disease and to guide therapies administered to such patients. The objective of this project is to elucidate molecular mechanisms that govern childhood leukemia progression. We aim to analyze DNA and RNAseq data from TARGET for AML and ALL independently. We will compare RNAseq profiles and mutation patterns of patients with fast-progressing disease to those of patients with slow-progressing disease and will identify molecular determinants that differentiate these two distinct phenotypes in pediatric leukemia. 
Miyano, Satoru UNIVERSITY OF TOKYO Genetic analysis of relapsed pediatric acute myeloid leukemia May04, 2015 closed Pediatric acute myeloid leukemia (AML) comprises ~20% of pediatric leukemia, representing one of the major therapeutic challenges in pediatric oncology. Nearly 40% of patients still relapse after current first-line therapies, and once relapse occurs, long-term survival rates fall to between 21% and 34%. As for the pathogenesis of AML relapse, the recent development of massively parallel sequencing technologies has provided a new opportunity to investigate the comprehensive genetic alterations involved in tumor recurrence in adult AML. However, little is known about the molecular details of relapsed pediatric AML. In this study, we will analyze the clonal origin and the major mutational events in relapsed pediatric AML. Pediatric acute myeloid leukemia (AML) comprises ~20% of pediatric leukemia, representing one of the major therapeutic challenges in pediatric oncology. Nearly 40% of patients still relapse after current first-line therapies, and once relapse occurs, long-term survival rates fall to between 21% and 34%. However, little is known about the molecular details of relapsed pediatric AML. To reveal the clonal origin and the major mutational events in relapsed pediatric AML, we performed whole-exome sequencing of 4 sample trios collected at diagnosis, relapse, and complete remission. We would also like to validate our findings using the AML cohort in the TARGET project, and to analyze the differences in clonal evolution patterns at relapse between acute lymphoblastic leukemia (ALL) and AML using TARGET's data sets, including WES and whole-genome sequencing analyses. The combined analysis of these data sets does not create any additional risk to participants. No request will be made for the identification of participants. This is a collaborative project with Dr. 
Seishi Ogawa at Kyoto University (#8404: Genetic analysis of relapsed pediatric acute myeloid leukemia). Mo, Huan NIH Association between germline variants and subtypes and prognosis of pediatric B-lymphoblastic leukemia Feb08, 2024 approved Children with lymphoblastic leukemia carrying TP53 mutations appear to have poor outcomes. We know that a TP53 variant commonly carried in the population increases the risk of lymphoblastic leukemia. We want to investigate whether this common variant affects cancer outcome similarly to the mutations. In our All of Us and UK Biobank study, we found that the TP53 rs78378222 variant was associated with B-acute lymphoblastic leukemia (B-ALL), which is an important pediatric cancer. Germline TP53 loss (Li-Fraumeni syndrome) is known to be associated with hypodiploid B-ALL, which is among the B-ALL subtypes with the worst outcome. We would like to examine whether the rs78378222 germline variant, which is very common in the population, is also associated with poor prognosis in B-ALL. The findings may be reported in the same manuscript as Application #36579, but for different cancer entities. However, the different datasets will be analyzed separately to answer different questions. Analysis plan: We will extract the rs78378222 variant from RNA-sequencing data in the B-ALL cohort and perform association studies with B-ALL subtypes and outcome data. Modzelewski, Andrew UNIVERSITY OF PENNSYLVANIA TARGET pan cancer analysis project Mar07, 2024 approved Please refer to the application from collaborator Kai Tan for a summary (tank1@chop.edu). NCI TARGET has generated large-scale NGS data for five pediatric cancers, and a pan-cancer study is expected to provide new insight into the similarities and differences among these pediatric cancers. We plan to combine these large datasets to investigate and characterize the pediatric cancer genome landscape and cancer-type-specific genetic lesions in a pan-cancer study. 
Specifically, we plan to use the TARGET datasets to develop novel molecular network based methods to better understand deregulated pathways, to identify novel drug targets and to identify causal noncoding mutations. We will apply our methods to single cancer types and groups of cancer types to understand cancer-type-specific and shared genetic features. The proposed research will use TARGET data only for pediatric cancer research. Moffitt, Andrea EMORY UNIVERSITY Whole Genome Analysis of AML for Prediction of Patient Outcome Sep28, 2023 expired This study aims to predict patient outcome in acute myeloid leukemia (AML) by analyzing genomic data collected from patients with AML at various times during their treatment and disease progression. We will ask if the patterns observed in this data help predict how a patient will respond to treatment and if they are likely to relapse. The research objectives of this study are to predict patient outcome in adult and pediatric AML based on the analysis of serial whole genome sequencing samples. The study design involves first identifying leukemia-associated and remission-associated somatic genetic variants from paired whole genome sequencing data generated from samples from AML patients. We will do de novo variant calling from BAM files in order to capture variants at all frequency levels that may have been filtered out from publicly available lists of variants. We will then assess the frequency distributions of these variants in all available samples from the patients in these studies to determine how the variant counts and distributions are correlated with patient outcome, including treatment response, relapse, and death. Pediatric samples from the TARGET-AML study will be analyzed with a similar approach. Here the objective is specifically to ask whether the approach to predicting patient outcome initially demonstrated in adult AML is also applicable in pediatric AML. 
We intend to publish the results of this project in a reputable scientific journal to broadly share any findings with the scientific community. MONTI, STEFANO BOSTON UNIVERSITY MEDICAL CAMPUS Somatic DNA alterations in advanced stage Neuroblastoma Mar21, 2013 closed Mutations and gains or losses of specific chromosomal segments are types of genetic alterations that occur in Neuroblastoma and can be detected by DNA sequencing. We plan to integrate the requested dataset with new sequencing data we will generate, to carefully characterize the differential patterns of gene mutations and gains and losses in different subtypes of advanced stage Neuroblastomas. The goal of the project is to identify patterns of genetic alterations that might help inform treatment decisions. An additional goal is to use the dataset for the analysis of RNA-editing patterns, i.e., nucleotide substitutions that occur at the mRNA level (while the corresponding DNA sequence remains unaltered). The data have been downloaded and used to integrate with and validate results of the analysis we performed on our own data. The results have not been published. We plan to continue the analysis of the data for the purposes listed above, namely, gene mutation and translocation detection, validation of analysis results from our own data, and RNA-editing detection. Additionally, we will use the availability of paired DNA-seq and RNA-seq profiles to study patterns of RNA editing and their relevance to tumorigenesis. We will also compare the patterns of RNA editing identified in neuroblastoma with those detected in other publicly available datasets (in particular, a lymphoma dataset). The datasets will be analyzed separately, and only their results will be compared. 
MOORMANN, ANN UNIV OF MASSACHUSETTS MED SCH WORCESTER Biomarkers and Mechanisms of Viral Associated Cancers Sep12, 2018 closed The goal of this study is to better understand the steps that lead to the development of cancers and the role of associated viruses, in order to develop better markers that can predict disease severity and likely outcomes. We also aim to identify potential pathways within cancer cells that may give rise to treatments specific to particular associated viruses or cancer types. We are using the publicly available data from dbGaP combined with our own sequencing of Burkitt lymphoma and other associated cancers. We plan to use these data to inform our studies of malignancies with viral etiology or association compared to virus-negative subtypes and related cancers. We will compare and contrast the effects of given viruses across cancer types, examining transcriptional, mutational and epigenetic changes to better understand the viral role in oncogenesis. We will also examine cancers for previously unrecognized viral associations. Comparisons between different viruses, different tissues associated with the same virus, and related tumors without the virus will allow us to discern key biologic commonalities and potentially to discover virally influenced pathways that may influence tumorigenesis and tumor maintenance. We will also examine differences in host anti-viral and anti-tumor surveillance. One of our initial goals will be a better understanding of the role of herpesviruses, most specifically Epstein-Barr virus (EBV), which is involved in over 1% of all malignancies worldwide, both pediatric and adult cancers. The data will be analysed using a combination of standard software and novel bioinformatics pipelines under development. 
Combined with our own datasets for endemic Burkitt lymphoma and other viral-associated cancers, this project should provide biologic insight into their etiology and pathogenesis and provide refined diagnosis, novel prognostic and predictive biomarkers, and target pathways for new therapies. Data will be maintained on a secure server, and there will be no attempt to identify or combine data from different sources at an individual level; therefore, there is no increased risk to study participants. Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets will be used solely to study aspects of this project that are relevant to the biology of pediatric cancers for more effective diagnosis, prognosis and treatment. Other specific data sets limited to particular diseases or patient populations (e.g. pediatric or adult disease) will only be used for their intended purpose. MOORMANN, ANN UNIV OF MASSACHUSETTS MED SCH WORCESTER Molecular and genetic pathogenesis of hematologic malignancies Jul17, 2017 expired Our research group is investigating the molecular and genetic underpinnings of hematologic malignancies. Our primary focus is B cell tumors, including understanding the molecular causes of Burkitt lymphoma, particularly the endemic form, which is the most prevalent pediatric malignancy in sub-Saharan Africa. We will compare and contrast our data with additional tumor and cellular data from dbGaP to improve diagnosis and treatment. RUS: We plan to use these data to inform our studies of specific pediatric and adult hematologic malignancies of interest, with a focus on understanding endemic pediatric Burkitt lymphoma, the most prevalent malignancy in sub-Saharan Africa, as well as for general comparisons between hematological malignancies, to identify common and specific alterations. 
Our data from normal cells and tumor samples will be analyzed in parallel with available genomic RNA, DNA and epigenetic experimental data from patients with the same and related tumors. Comparing different malignant types, both closely and distantly related, will allow us to develop a more refined understanding of the common as well as disease-specific alterations in genes and pathways. Likewise, comparisons to data from normal tissue and cellular subsets can help us better define the exact functional alterations and epigenetic changes that occur and drive malignancy relative to the normal state. The data will be analyzed using a combination of standard programs and pipelines and will also allow us to develop and improve our bioinformatic algorithms. Combined, we will gain additional power, and our work should provide novel information on molecular mechanisms and on potential diagnostic, prognostic, and predictive biomarkers as well as targets for new therapies. Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets will only be used for research that can only be conducted using pediatric data. Data will be maintained on a secure server, and there will be no attempt to identify or combine data from different sources at an individual level; therefore, there is no increased risk to study participants. Sample sets that are limited to only a disease or patient class will only be used to answer specific questions related to the approved use. Moriarity, Branden UNIVERSITY OF MINNESOTA Using Intronic Alternative Polyadenylation to Explore Novel Tumor Antigens Pediatric Cancer Dec09, 2020 expired Some cancer cells make mistakes when reading the instructions in their DNA that allow them to make components of the cell. Some of these mistakes change the way a cell looks to the immune system, such that it is recognized as a cancer cell. Unfortunately, the cancer cells can also inhibit the immune system's ability to kill the cancer cells. 
In order to leverage these mistakes made by cancer cells, we will take a systematic look at all the mistakes made by pediatric cancer cells to find common mistakes made when reading their DNA instructions. If we find a common mistake, we can make a new therapy that specifically looks for cells making this mistake and selectively kills them. Moreover, we can train the immune system both to identify the mistakes that mark the cancer cells and to overcome the suppression by the cancer cells that normally prevents their killing. Thus, our end goal is to develop effective therapies for pediatric cancer based on targeting the mistakes cancer cells make when reading the instructions in their DNA that allow them to make components of the cell. The eukaryotic genome is proficient in generating multiple isoforms from a gene through the use of different polyadenylation signals during pre-mRNA processing. At present, many studies have focused on characterizing Alternative Polyadenylation (APA) in gene expression and its impact on a variety of pathological conditions, including cancer. Yet how APA controls eukaryotic gene expression during tumorigenesis is still poorly characterized. Thus, I propose to investigate APA in pediatric cancer using advanced bioinformatics analysis and to deploy the CRISPR/Cas9 system for probing functional proteome diversity in cancer development and treatment. Moreover, we hypothesize that we will identify novel therapeutic targets (i.e. neoantigens) generated by APA, which can be targeted using T cell receptor (TCR) T cell-based cancer immunotherapy. APA can change the length of processed mRNAs, but whether the resulting protein length is affected depends on the polyadenylation site used in a transcript. 
While APA within the 3’-untranslated region (UTR) of the last exon does not affect protein length, intronic APA in an upstream exon/intron of a gene can produce truncated mRNA through early termination of transcription, subsequently producing a protein with a different length (typically a smaller protein, missing the functional C-terminus, with novel amino acid sequences). Although truncated mRNAs could increase the functional complexity of cellular proteomes and molecular interactomes, their physiological relevance and regulation are poorly understood. In the current proposal, we will identify APA events in pediatric cancer using a custom-developed integrative bioinformatic pipeline, Intronic Polyadenylation Scan (IPScan), built by our collaborator (Wei Zhang, at the University of Central Florida, who is not involved with the project and will not be accessing any of the requested data)(PMID:?), which can quantitatively analyze the alteration of polyadenylation sites and predict the existence of potentially truncated mRNAs in the transcriptome. One result of our IPScan analysis will be the identification of novel peptide epitopes that can be presented by tumor cells, generating new therapeutic targets for TCR-based T cell cancer immunotherapy. This work will open an entirely new area of research in the Moriarity lab, but one that will complement ongoing work. I am confident that the proposed APA-based approach in pediatric cancer will be a valuable tool for targeting and augmenting gene- and cell-based immunotherapies in the pediatric space. Interrogation of The Cancer Genome Atlas (TCGA) datasets revealed a cluster of truncated mRNAs that are distinctively enriched in adult tumor or normal tissues. This suggests that truncated mRNAs might open uncharted dimensions in functional proteomics. 
With funding from the Children’s Cancer Research Fund, we will extend this work to identify cancer-relevant truncated mRNAs using existing RNA-seq data from multiple pediatric cancers, including rhabdomyosarcoma, osteosarcoma, leukemia, and others. These RNA-seq data are available in the Sequence Read Archive (SRA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database. We will also further examine the mechanistic role of APA by introducing or mutating APA sites in target genes using the well-established CRISPR/Cas9 system, which will allow us to test experimentally, in cellular and animal model systems, whether the tumor-related phenotype can be shifted toward normal. Taken together, the finding of truncated mRNAs in numerous pediatric cancers implies that the intronic APA-coordinated truncated mRNAs can be translated into proteins that exert different functions than their full-length counterparts. Moreover, this process produces new amino acid sequences that can be leveraged as novel, cancer-specific neoantigens. It also suggests that genome-wide intronic APA could provide critical functional characteristics for discovering specific cellular states or pathogenic conditions relevant to the initiation and treatment of many pediatric cancers. In summary, the proposed system opens up a new dimension in understanding gene functions in pediatric tumor biology and highlights the importance of multifaceted gene features shaped by intronic APA for clinically relevant therapeutic interventions beyond genome mutations and epigenetic deregulation. Morin, Ryan BRITISH COLUMBIA CANCER AGENCY Meta-analysis of pediatric osteosarcoma tumours Jun26, 2014 closed The pediatric bone cancer osteosarcoma is poorly understood. We are looking at the DNA from tumour cells from patients with osteosarcoma to determine how the DNA changes during the transition of healthy cells into this type of cancer. 
We will compare the genetic changes ("mutations") present in these samples to those in tumours from other patients that have spread to new sites in the body. We have generated a separate NGS data set from pediatric osteosarcoma samples including primary tumours and matched metastases. We have sequenced the genomes and exomes from these and are analyzing them to identify genetic factors and features of clonal evolution associated with metastasis. We will analyze the requested samples in concert with our own genome and exome sequence data to identify any mutations that are enriched in metastatic tumours relative to primary tumours. Our analysis will focus on somatic variants including SNVs, structural variations and CNVs. We will attempt to infer the clonal representation of individual mutations to differentiate those arising early during oncogenesis from those arising later during tumour progression. Given the observation that primary and metastatic tumours have largely divergent sets of somatic SNVs, we will also identify the set of genetic variants shared among primary and metastatic tumours from the same patient to infer early (possibly initiating) genetic lesions. Overall, we aim to identify key genes and pathways involved in pediatric osteosarcoma initiation and metastasis. Mullighan, Charles ST. JUDE CHILDREN'S RESEARCH HOSPITAL Mutation analysis of TARGET ALL data Nov05, 2012 approved ALL is the commonest childhood cancer and the commonest cause of non-traumatic death in young people. ALL is a genetic disease, but the full range of genetic changes contributing to the development of ALL is unknown. The ALL TARGET project has performed exome sequencing to identify all mutations in coding genes in ALL, and here we will analyze these data to identify all genetic alterations, and compare these results to those generated by other childhood cancer sequencing initiatives. 
The data will be used to analyze sequence and structural variants in the genomes of acute lymphoblastic leukemia samples sequenced as part of the NCI Therapeutically Applicable Research to Generate Effective Treatments initiative. The applicant is a coinvestigator of the ALL TARGET Project. The applicant's team includes members of computational biology at St Jude: Xin Zhou and the Center for Applied Bioinformatics: Gang Wu and Ti-Cheng Chang. Each of these individuals will be involved in analyzing the data, which will be mined to assess coverage and detect variants to identify new diagnostic, prognostic and therapeutic markers and targets in pediatric cancers. The data will be analyzed using established tools and pipelines at St Jude including SNPdetector, Indeldetector, CREST, GATK, FREEC, DILLY, and CONSERTING. The data will be maintained in secured network databases and not released to other investigators. The data will be combined with additional ALL cases sequenced by the applicant at St Jude Children's Research Hospital, including from children treated on Total Therapy ALL protocols. These cases are being combined into a single marker manuscript describing the genomics of ALL. Since the last renewal, considerable progress has been made on this manuscript, which is being finalized for submission in Q1 2021. Mullighan, Charles ST. JUDE CHILDREN'S RESEARCH HOSPITAL Transcriptomic analysis and classification of acute leukemia Jun03, 2020 expired Acute leukemia is the commonest childhood cancer and remains an important cause of death in children and adults. Alterations in DNA (the genetic code) drive the formation of leukemia, and different DNA alterations define different subtypes of leukemia. It is important to define these subtypes, as they provide understanding of the biology of disease and enable the development of better diagnosis and treatment approaches. 
This study will examine RNA sequencing data from TARGET leukemia cases across multiple subtypes of leukemia in a unified analysis approach. Goal: to use transcriptomic data from acute leukemia to identify subgroups of acute lymphoblastic and myeloid leukemia, with an emphasis on resolving the nature of lineage-ambiguous leukemias. Background and approach: Prior work from our group and others has shown that RNA sequencing is a powerful approach to classify acute leukemia, and the underlying genomic alterations, by analyzing gene expression, chromosomal rearrangements, aneuploidy and sequence mutations (e.g. Gu et al., Nat Genet 2019). These studies have also shown that prior conventional classification schemas based on leukemia lineage may fail to classify leukemias accurately, or impose criteria that are not founded in genomics or biology, and that genomic lesions can transcend normal classification (e.g. early T-cell precursor leukemia; Zhang et al., Nature 2012; Alexander et al., Nature 2018). However, many leukemias remain unclassified, or are difficult to classify in studies restricted to cases nominally identified as one lineage (B-ALL, T-ALL or AML). Thus, our goal is to assemble a large cohort of acute leukemia samples including T-ALL, B-ALL, AML, ETP ALL and MPAL, and to apply uniform analysis to more thoroughly describe leukemia subtypes and underlying genomic alterations. We will use unified mapping, fusion calling, and mutation detection as previously described (papers cited above). Our interest is in childhood leukemia, and the TARGET dataset is essential for this analysis. MULLOY, JAMES CINCINNATI CHILDRENS HOSP MED CTR Analysis of pediatric relapse AML gene expression via PDX RNAseq Jun19, 2019 expired Our team will use TARGET gene expression data to confirm findings we have made in specialized mice that are able to grow human pediatric leukemia (called xenografts). 
We will also compare the gene expression between our xenografts and TARGET patient samples to see how similar or different they are. Many labs, including ours, routinely make xenografts in order to test new treatments, so it is important to do this comparison in order to know how useful the system is. We have established a series of PDX models from ~40 unique pediatric AML patients and have performed RNAseq analysis on these models. Our cohort consists almost entirely of relapsed and refractory disease with intermediate- to poor-prognosis genotypes. Because we are uniquely focused on the worst cases of pediatric AML, we expect that analysis of our PDX RNAseq data may uncover new insights into potential vulnerabilities amenable to targeted therapies. After discovery and validation, these findings can then be translated back to our characterized PDX models for preclinical testing. We are requesting access to annotated TARGET RNAseq data in order to 1) confirm specific gene expression alterations observed in our dataset (separate analysis) and 2) compare our PDX-derived genotype-specific RNA signatures to selected relapsed/refractory TARGET cases (combined analysis). Confirmation of findings in an independent dataset is an important validation step. The comparison of our PDX data to TARGET patient data is also an important step to assess the fidelity of xenografts to the original leukemia. Unfortunately, matched patient material is not available for RNAseq analysis for many of our samples. In addition to better understanding relapsed AML gene expression, this work will increase understanding of the value of preclinical modeling with PDXs. Treatment protocols are significantly different for pediatric vs adult AML, and therefore relapses have occurred under different pressures and the mechanisms for resistance and/or relapse are likely different. Additionally, the mutation spectrum in pediatric and adult AML is known to be quite different.
This means that adult data will not be adequate for our studies and the desired analyses can only be done with pediatric TARGET data. We will not collaborate with, or share TARGET data with, anyone outside of our immediate project team at CCHMC. TARGET data and our PDX data will be mixed only for the purpose of this project and will not be available to anyone else. Our use of TARGET data will be limited to that outlined above, and the data will be used only for this specific project. Mundi, Prabhjot COLUMBIA UNIVERSITY HEALTH SCIENCES Gene Regulatory Network Based Analysis of pediatric malignancies to identify targetable vulnerabilities and to study context specific drug mechanism of action Aug14, 2019 expired Cancer is inherently a disease caused by the aberrant activity of a subset of proteins that dynamically interact to disrupt the normal homeostasis of a tissue (which involves a balance of cell proliferation and cell death and respect for anatomic and histologic boundaries) and leads to a new dysregulated state, typically marked by a variety of abnormal behaviors such as uncontrolled proliferation, invasion across anatomic boundaries, rapid formation of new blood vessels, and evasion of a number of gatekeeping processes that prevent tumor formation. While directly measuring protein activity on a proteome-wide scale is not currently possible, we employ a number of systems biology algorithms to infer protein activity in order to better understand cancer at the individual tumor level and to identify drugs that may interfere with the dynamic interplay of aberrantly activated proteins driving individual tumors. We will work towards this for the pediatric cancer cohorts in the TARGET data set. RNA-Seq data from all available samples in this cohort will be used to construct context-specific gene regulatory networks for pediatric malignancies using a well-established algorithm called ARACNe.
This network will then be used to interrogate individual patient gene expression profiles from this cohort, as well as other publicly available cohorts of pediatric malignancies, using the VIPER algorithm to identify potentially druggable, aberrantly activated proteins on an individual patient basis. DNA-Seq data will also be downloaded, and inferred protein activity profiles will be matched to recurrent genomic alterations. Further, this network will be used to interrogate post-drug-perturbation gene expression profiles from experimental model systems (in particular cell lines) to infer drug-specific effects on regulatory protein activity. This will be used for global mechanism-of-action inference of drugs to better understand their potential to disrupt the cancer state. The outlined method is being explored in various cancer types at our institution through the laboratory of Andrea Califano and has led to the genesis of a few ongoing clinical trials. Any results or methods developed through the use of this dataset will be published and made publicly available, including the gene regulatory network models. Murakawa, Yasuhiro KYOTO UNIVERSITY Unveiling full-length transcripts in pediatric leukemia using long-read RNA-seq Jun21, 2024 approved Pediatric leukemia is a very heterogeneous disease with many genomic scars. In this study, we aim to identify full-length and highly specific transcripts as potential therapeutic targets and novel biomarkers in pediatric leukemia. Our study can provide valuable candidate leukemia-specific novel transcripts as potential therapeutic targets and biomarkers. This research aims to elucidate novel RNA molecules and their full-length sequences from pediatric leukemia patients, which will help provide new strategies for pediatric leukemia therapy. We developed a novel long-read RNA-seq method for sequencing entire full-length RNA molecules by directly capturing the 5'-end cap structures and the 3'-end poly(A) tails.
We applied this method to pediatric leukemia samples and constructed our own gene model including novel transcripts. However, one of the challenges in our study is to quantify the novel transcripts and explore their prognostic impact. Here we are requesting the RNA-seq (tumor and normal) and targeted-sequencing data from this study to validate our novel transcripts and analyze their clinical impact. This study includes stranded RNA-seq, which is important to accurately quantify our novel transcripts. The requested data will help us to explore biomarkers and prognostic factors, and understand the biology of pediatric leukemia. Our results will be published in peer-reviewed journals and shared with the scientific community. NAGARAJAN, RAKESH PIERIANDX, LLC Development of a clinical genomic assay to assess and report out on pediatric cancers Mar09, 2015 closed Genetic mutations play an important role in cancer. The purpose of this project is to develop a laboratory test that can accurately find mutations in children’s cancers to allow physicians to treat such patients more effectively. TARGET data will be used along with other data to make sure that the test can find important mutations correctly and reliably. Once we have determined that the test is accurate and can detect important mutations in children’s cancers, we will start using the test to help physicians treat their patients using each patient's own DNA results. Multiple genetic aberrations, including point mutations, copy number variations (CNV) and gene fusions, are involved in the development and progression of pediatric cancers. The goal of this project is to develop an advanced clinical cancer diagnostic assay for detecting and interpreting such genetic aberrations for more precise patient management. Therefore, we are applying for access to the full TARGET datasets in order to support clinical diagnostic assay validation.
In addition, we are applying for other human cancer-derived data sets in which variants have already been identified (e.g. gene fusions using RNASeq), which will be used to support analytical pipeline setup and initial validation. Our specific aims are: Aim 1. To build a high-throughput analysis pipeline to analyze next-generation sequencing data (DNA and RNA) to detect genomic aberrations from tumor and paired non-malignant samples. Data analysis from DNA will include point mutations, structural variations (i.e., insertions, deletions), and CNVs, while RNA analysis will focus on gene fusions. Non-TARGET data sets that we are applying for will be used in this Aim to support pipeline configuration and validate variant identification. Aim 2. To generate data from well-characterized cell lines (e.g. HapMap) and spiked-in samples with known cancer-associated mutations (e.g., from Acrometrix) that are mixed at different ratios to simulate minor allele frequencies (MAF) ranging from 3% to 50%. These data sets will be used to determine the analytical sensitivity, specificity and limit of detection of the assay. Aim 3. To apply the pipeline to the next-generation sequencing data obtained from TARGET and other clinical samples (approximately 40 cases) to assess the diagnostic specificity and sensitivity. Sanger confirmation of both positive and negative variants will be performed in a clinical laboratory. The final, clinically validated assay will be used prospectively to more precisely manage pediatric patients with cancer. Nam, Jin-Wu HANYANG UNIVERSITY Development of long non-coding RNA biomarkers for detecting prognosis, and responsiveness to chemotherapeutic agents in pediatric cancers Jul06, 2016 closed Our experience with tumors has led us to believe that lncRNAs play key roles in defining many childhood tumors and their subtypes.
We have performed several exploratory analyses in various tumors that provide evidence of unique non-coding intergenic regions that are characteristic of tumor types. Our efforts in identifying non-coding transcripts related to tumors have focused on pediatric tumors. It is increasingly evident that many genomic mutations in tumors reside in regions that do not encode amino acids. However, these regions are often transcribed into long non-coding RNAs (lncRNAs). Long non-coding RNAs (lncRNAs) have been shown to regulate important biological processes that support normal cellular functions. Aberrant regulation of these essential functions can promote tumor development. Applying next-generation sequencing (NGS) to a growing number of cancer transcriptomes has indeed revealed thousands of lncRNAs whose aberrant expression is associated with different cancer types. Our experience with tumors has led us to believe that lncRNAs play key roles in defining many childhood tumors and their subtypes. We have performed several exploratory analyses in various tumors that provide evidence of unique non-coding intergenic regions that are characteristic of tumor types. Our efforts in identifying non-coding transcripts related to tumors have focused on pediatric cancers. We aim to identify lncRNAs related to pediatric cancers using sequencing data in the TARGET DB. For this research, we will analyze the TARGET data to look for lncRNA candidates and validate lncRNA candidates involved in metastasis and in responsiveness to chemotherapeutic drugs in cancers. We will evaluate the potential of these lncRNAs as biomarkers in a cohort of actual pediatric patients.
NAOUAR, Naira SORBONNE UNIVERSITE Pediatric Acute Myeloid Leukemia project - CONECT-AML Aug22, 2019 approved The CONECT-AML (COllaborative NEtwork on research for Children and Teenagers with Acute Myeloid Leukemia) project is focused on the genomic profiling of pediatric AML by integrating WGS data from the TARGET Acute Myeloid Leukemia project (dbGaP: phs000465.v18.p7) with our own WGS data, in order to establish the genomic alterations responsible for the disease. Specifically, we aim to: (i) characterize de novo AML and (ii) detect cumulative pre-leukemic events in predisposition syndromes (like Fanconi anemia, GATA2 germline mutation or congenital neutropenia), both attributes that were not deeply analysed by the TARGET approach. Objectives of the proposed research: The CONECT-AML (COllaborative NEtwork on research for Children and Teenagers with Acute Myeloid Leukemia) project is focused on the genomic profiling of pediatric AML by integrating WGS data from the TARGET Acute Myeloid Leukemia project (dbGaP: phs000465.v18.p7) initiative with our own WGS data, in order to establish the genomic alterations responsible for the disease. Specifically, we aim to: (i) characterize de novo AML and (ii) detect cumulative pre-leukemic events in predisposition syndromes (like Fanconi anemia, GATA2 germline mutation or congenital neutropenia), both attributes that have not been deeply analysed by the TARGET-AML approach so far. Study design: 50 samples, belonging to paired tumor and complete-remission cases from CONECT-AML, will be profiled by WGS and RNA-Seq and added to the TARGET AML children’s dataset for detecting all somatic variants. Afterwards, these variants will be validated using deep targeted sequencing at 400x coverage. Analysis plan: It is known that pediatric AML cases exhibit particular clinical phenotypes that are associated with specific genomic features.
In this context, our methodological work seeks to elucidate the leukemogenesis process by running novel analytical pipelines on the entire datasets (on local servers, without cloud computing instances) to obtain distinct genomic profiles. Ultimately, these attributes will facilitate the proposal of novel therapeutic schemes. How the proposed research is consistent with the data use limitations: the inclusion of the TARGET dataset will enlarge our analysis and, consequently, increase its accuracy in discovering genetic variants. Altogether, the genomic context that promotes pediatric AML in patients with de novo AML or predisposition syndromes will be disentangled. Nestor, Colm LINKOPING UNIVERSITY Silencing of driver gene splice-variants is a frequent event in T-cell acute lymphoblastic leukemia. Feb13, 2018 closed Many genes are silenced ('turned off') in childhood T-cell acute lymphoblastic leukemia (T-ALL). Using small public datasets of T-ALL and computer programs that we have developed, we have found many more genes and variants of genes (called ‘splice-variants’) that are silenced in T-ALL. In order to confirm that this is a general feature of T-ALL, we need to analyze the raw data from the TARGET T-ALL studies. These genes and their splice-variants cannot be identified from the public data that has already been pre-processed. Our research is important as it will improve how we classify different subtypes of T-ALL, which may be critical for diagnosis and treatment. Also, by identifying new molecular pathways that are important in T-ALL, we may be able to find new ways of treating cases in which these pathways are disrupted. We are a research group based in the Department of Paediatrics, Linköping University, Sweden. My research has focused on how altered epigenetic processes and patterns may contribute to the pathogenesis of cancer (e.g. PNAS 2012, Genome Research 2012, Science Trans Med 2015, Cell Reports 2016).
Our ultimate goal is to identify novel therapies targeting epigenetic processes in pediatric T-cell acute lymphocytic leukemia (T-ALL). Consequently, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets requested will be used in research that can only be performed using pediatric data, with the ultimate goal of improving diagnosis and treatment of pediatric T-ALL. All data will be stored on an isolated, secure server. Our work is supported by grants from the Swedish Research Council and the Swedish Cancer Society. Our reanalysis of smaller published T-ALL RNA-Seq datasets has revealed that specific splice-variants of several known drivers of T-ALL are frequently silenced in T-ALL. Silencing of these genes goes undetected when transcription levels are determined at the level of whole genes, as opposed to individual transcripts. We will use our novel gene model prediction algorithms to identify transcript-level silencing of these known and potentially novel driver genes in the TARGET T-ALL datasets. Potential outcomes of this simple experiment are (i) an improved molecular classification of T-ALL subtypes, (ii) improved understanding of the molecular mechanisms that establish and maintain T-ALL cells and (iii) because several of the candidate genes we have identified are druggable, identification of novel therapeutic avenues for treatment of T-ALL. Thus, our proposed use of TARGET data is simple, clinically relevant and can only be performed using data from pediatric T-ALL. Newman, Scott EMORY UNIVERSITY Expression of CD36 and the Presence of Cytoplasmic Granules in Blasts Predicts Poor Prognosis in Children with B-Lymphoblastic Leukemia Mar19, 2015 closed B-lymphoblastic leukemia (B-LL) is the most common childhood cancer. While over 80% of patients are cured, there remains a subgroup who die from the disease. Our research efforts target this subgroup.
CD36 is a protein found on the cell surface of monocytes and erythroid precursors, which are cells normally found in the bone marrow. Rarely, CD36 is detected on B-LL cells, and we have found that these patients have a worse overall outcome. We have also noticed that patients whose cancer cells contain granules fare especially poorly (3-year event-free survival of 24% ± 18.85%). We are now trying to find out why. Of the 20 identified patients with CD36-expressing B-LL, only one had RNA of adequate quality for RNA sequencing. We found a rare gene fusion in this sample, and we would like to see if there is a true association between CD36 expression and this gene fusion by reviewing the published database of whole genome sequencing of pediatric B-LLs. We will then publish these findings to reach a wide scientific audience. CD36, a marker expressed on monocytes and erythroid cells, is rarely expressed on blasts in childhood B-lymphoblastic leukemia (B-LL). We have observed that these CD36+ blasts often have cytoplasmic granules (CG), also a rare finding, and that patients with CD36+/CG+ B-LL seem to have a worse outcome. We performed an IRB-approved retrospective review of B-LL cases diagnosed between September 2008 and April 2013 at our institution. To be eligible for this analysis, patients had to be less than 21 years old and have had a marrow aspirate at initial diagnosis demonstrating moderately bright CD36 expression on at least 5% of blasts. A pediatric hematopathologist reviewed smears for the presence of cytoplasmic granules. We abstracted data on patients, disease, and treatment from patients’ charts. We identified 20 cases of CD36+ B-LL. Eleven patients met NCI high-risk criteria. Eleven cases had cytogenetic abnormalities, including 3 with Ph+ ALL. Ten patients had blasts with cytoplasmic granules. Treatment varied according to risk classification.
Induction therapy failed in 5 cases; in another 10, there was minimal residual disease (≥0.01%) at day 29 of induction. Four patients underwent hematopoietic stem cell transplantation in first complete remission. With a median follow-up of 22 months, 3-year EFS was 49.45% ± 14.01% for the entire cohort and 24% ± 18.85% for those whose blasts were CD36+/CG+ (p=0.033). This is much worse than for typical pediatric B-LL, with EFS >85%. We have performed RNA sequencing in one of our 20 patients and have identified a rare gene fusion, and would like to confirm our findings by comparing them to the whole genome sequencing database of pediatric B-LLs. We will be looking specifically for abnormalities of the CD36 gene, association with the gene fusion, or other abnormalities that have been described in Ph-like ALL. We plan to publish these findings to reach a wide audience in the scientific community, including pediatric hematologists/oncologists, pathologists, and researchers. Ng, Yao GENETIC INTELLIGENCE Elucidating the genetic bases of pediatric cancers Aug08, 2017 closed The goal of this project is to identify genomic mutations that lead to the development of pediatric cancers. This can lead to a better understanding of the molecular mechanisms behind childhood cancers, as well as future developments of more accurate diagnostics, effective drugs and preventive treatments. We are interested in identifying the genetic causes of pediatric cancers, such as acute lymphoblastic leukemia, neuroblastoma and Wilms tumor. These have stronger genetic components than adult cancers, since they arise early in life. We plan to study the whole genome with the help of machine learning algorithms, in order to discover mutations in genes and in non-coding regions that are responsible for the development of childhood cancers.
This will go beyond looking at just the exome/SNPs and has the potential to lead to the discovery of new targets for the development of diagnostics and therapeutics. This analysis will not result in any additional risks to participants. NICHOLS, KIM ST. JUDE CHILDREN'S RESEARCH HOSPITAL Familial investigations of childhood cancer predisposition Nov17, 2019 approved In this study, we want to expand the number of genes that, when mutated in the germline, can cause children to have a higher risk of developing cancer. Approximately 10% of pediatric cancers are attributed to mutations at the germline level. However, we have observed “clustering” of cancers in families in some of the other 90% of cases. Therefore, we know that there are additional genes to be discovered as causing an increased risk of cancer when mutated in the germline. To answer this question, we are studying the DNA from patients who have cancer but have had uninformative (or negative) clinical genetic testing for known cancer-causing genes, and their cancer-affected and unaffected relatives. By looking at additional sequences of people with specific cancer types, like the ones we are requesting from dbGaP, we can find out if the mutated genes we uncover are present at a higher frequency in individuals with the same cancer, or across the cancer spectrum. Finding out this information will help us to hone in on genes that are cancer-causing when mutated in the germline DNA. It is known that approximately 10% of all cancers are attributed to mutations at the germline level. Nonetheless, a large proportion of familial cancers remains without a known cause. The purpose of this study is to identify novel genetic causes of familial cancer by using genomic sequencing. The phenomenon of familial clustering of cancer without a known genetic driver suggests that there are still novel cancer-predisposing and/or disease-modifying variants to be identified.
In the current study, we are collecting germline samples for genomic sequencing to identify novel predisposing mutations in individuals and relatives with familial cancer of undetermined origin. The “familial cancers” being investigated in this study are focused on pediatric-onset tumors (a proband diagnosed before the age of 26 years) and cancers occurring at earlier than expected ages (a first-, second-, or third-degree relative with a history of cancer diagnosed under 51 years of age). By looking for variants segregating with cancer in these families, we hope to hone in on novel genes and variants that may be predisposing these “mystery” families to cancer. In the case that a novel gene is found to be segregating with disease in the family and is relatively rare in the population, we hope to utilize the requested dbGaP datasets to query for additional individuals who have variants in the same gene (in both related and non-related cancers) and to potentially conduct burden testing to determine if the genes and/or variants that we have identified as candidates are also enriched in presumably sporadic cancers. For example, if we identify a loss-of-function variant in IKZF1 in a child, mother, and grandfather all with a history of acute lymphoblastic leukemia, we will query all germline ALL samples in the requested datasets for loss-of-function variants in the IKZF1 gene. Population enrichment will be assessed with a 2x2 Fisher’s exact test, with significance defined by a two-sided P-value threshold of 0.05. The odds ratio (OR) will also be estimated. The control population will comprise an in-house cohort and the non-cancer subset of the Genome Aggregation Database (gnomAD). We will further perform rare-variant burden testing using total-frequency testing to analyze enrichment of pathogenic and likely pathogenic germline variation. P-values will be corrected for multiple testing, with significance defined as FDR < 0.05.
If germline variation and enrichment compared to the control population is found in a gene of interest, clinical and family history information will be evaluated to better understand if inherited changes in the gene may be associated with specific cancer phenotypes or characteristics. Although many biological processes are dysregulated in both pediatric and adult cancers, the affected genes may be either pediatric-specific (e.g., transcription factors and JAK-STAT pathway genes) or common to both (e.g., cell cycle genes and epigenetic modifiers). This is well established in the comparison of pediatric and adult somatic genomic landscapes. Accordingly, Ma et al (2018) reported 142 driver genes in pediatric cancer, of which only 45% (64 out of 142) match those found in adult pan-cancer studies. Similarly, Gröbner et al reported in 2018 that only around 30% of pediatric significantly mutated genes overlapped with adult significantly mutated genes. We believe that germline cancer predisposition is also likely to differ between pediatric onset tumors and adult-onset tumors. Therefore, use of the requested datasets is necessary to broaden our knowledge of germline predisposition to pediatric cancer. Knowledge of germline predisposition can lead to more effective and targeted treatments (e.g. the use of immune-checkpoint inhibitors in patients with CMMRD), and possibly earlier interventions and implementation of surveillance protocols. In addition to pediatric datasets, we are also requesting the use of TCGA data as persons with adult-onset tumors will also be members of some of these “mystery” families with suspected underlying predisposition to cancer. The investigators believe that collaboration and sharing of discoveries will lead to greater advancement in the field of pediatric cancer predisposition, and thus, relevant findings from these investigations will be broadly shared with the scientific community via timely publication in peer-reviewed journals. 
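The familial-predisposition statement above specifies a two-sided 2x2 Fisher's exact test with an odds-ratio estimate, followed by multiple-testing correction at FDR < 0.05. A minimal sketch of those steps in standard-library Python follows. Assumptions: the FDR procedure is not named in the statement, so Benjamini-Hochberg is used here; the table layout (cases/controls by carriers/non-carriers), the function names and the per-gene counts are all hypothetical; the odds ratio is the sample cross-product ratio, not a conditional maximum-likelihood estimate; and the variant-collapsing step of total-frequency testing is assumed to have already produced carrier counts per gene.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = cases / controls, columns = carriers / non-carriers.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def odds_ratio(a, b, c, d):
    """Sample (cross-product) odds ratio ad / bc."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene counts, not real data:
    # (case carriers, case non-carriers, control carriers, control non-carriers)
    genes = {
        "GENE_A": (8, 92, 10, 990),
        "GENE_B": (2, 98, 15, 985),
        "GENE_C": (1, 99, 9, 991),
    }
    pvals = {g: fisher_exact_two_sided(*t) for g, t in genes.items()}
    qvals = dict(zip(genes, benjamini_hochberg(list(pvals.values()))))
    for g, t in genes.items():
        flag = "significant" if qvals[g] < 0.05 else "not significant"
        print(f"{g}: OR={odds_ratio(*t):.2f} p={pvals[g]:.3g} q={qvals[g]:.3g} ({flag})")
```

Enumerating the hypergeometric support keeps the sketch dependency-free; in practice a vetted implementation such as SciPy's `fisher_exact` would normally be preferred.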
NICHOLS, KIM ST. JUDE CHILDREN'S RESEARCH HOSPITAL Interrogation of pediatric tumors for hotspot mutations and using this information to elucidate cancer risk (HOTSPOT) Dec16, 2019 approved A mutational hotspot is a single amino acid position in a protein-coding gene that is mutated more frequently than would be expected in the absence of selection. The landscape of mutational hotspots across 24,592 adult tumors has previously been defined, identifying hotspots in 247 genes, and remarkably, germline mutations affecting 54 of these 247 genes (22%) are associated with well-established cancer predisposition syndromes. However, less is known about pediatric cancer hotspot genes and the role of these genes in childhood cancer predisposition. We propose that pediatric tumors will have a unique set of hotspot genes and mutations based on the differences observed in the genomic landscapes of pediatric and adult cancers, where only 30-45% of driver genes in pediatric cancers match those found in adult cancers. Information about mutational hotspots in pediatric cancers will expand our understanding of tumorigenesis and facilitate identification of new predisposition genes and syndromes. Recently, Chang et al. identified recurrent alterations, designated “mutational hotspots”, in 247 genes across 24,592 adult tumors. Remarkably, germline mutations affecting 54 of these 247 genes (22%) are associated with well-established cancer predisposition syndromes [1, 2]. Chang defined a somatic mutational hotspot as a single amino acid position in a protein-coding gene that is mutated more frequently than would be expected in the absence of selection [2]. While the exact definition of a hotspot varies depending on the approaches used in calculations [1-3], these methodologies assign statistical significance to the recurrence of mutation at a given amino acid, corrected for the background mutational rate of the same position, gene, and sample, both within and across tumor types in a cohort of cancer-affected individuals.
Despite the identification of mutational hotspots in adult tumors, less is known about pediatric cancer hotspot genes and the role of these genes in childhood cancer predisposition. We propose that pediatric tumors will have a unique set of hotspot genes and mutations based on the differences observed in the genomic landscapes of pediatric and adult cancers, where between 30-45% of driver genes in pediatric cancers match those found in adult cancers [4,5]. Specifically, we hypothesize that pediatric tumors will have a unique landscape of mutational hotspots and that at least some of these pediatric cancer hotspot genes will be associated with novel germline cancer predispositions. Defining the spectrum of mutational hotspot genes and the variants within them will: 1) Expand our understanding of pediatric tumorigenesis; 2) Lead to the discovery of novel pediatric cancer predisposition genes. The requested pediatric datasets will be combined with other datasets outside of dbGaP, including data from the Pediatric Cancer Genome Project (PCGP) and other datasets internal to St. Jude Children’s Research Hospital. All datasets will be analyzed together to define the landscape of mutational hotspots in pediatric tumors and potential germline variation in these genes. Specifically, the somatic mutation calls from each dataset will be processed through a quality control and filtering pipeline developed at St. Jude. Variant call files will then be harmonized with data from other pediatric cancer cohorts at peer institutions. An algorithm described by Chang et al [1,2] and publicly available on GitHub (https://github.com/taylor-lab/hotspots) will be applied to the dataset. Resulting hotspots will undergo thorough review to ensure that only high-confidence hotspot mutations are retained. RNAseq data will be utilized where available to evaluate the effect of hotspot mutations on gene expression. These analyses will be performed by a bioinformatics analyst. 
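The hotspot definition in the HOTSPOT statement (a residue mutated more often than expected in the absence of selection) can be illustrated with a binomial recurrence test. The sketch below is deliberately simplified and is not the Chang et al. algorithm cited above (https://github.com/taylor-lab/hotspots), which corrects for position-, gene- and sample-level background mutation rates. Here, as an assumption, every residue of the protein is equally likely to be mutated; the function names and the example positions are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def naive_hotspot_pvalues(positions, protein_length):
    """Per-residue recurrence p-values for one gene under a uniform null.

    positions: mutated amino acid positions (one entry per mutated sample)
    protein_length: number of residues in the protein
    Assumes every residue is equally likely to be mutated, ignoring the
    position-, gene- and sample-level mutability that real methods model.
    """
    n = len(positions)
    p = 1.0 / protein_length
    counts = Counter(positions)
    return {pos: binomial_upper_tail(k, n, p) for pos, k in counts.items()}


if __name__ == "__main__":
    # Hypothetical cohort: 8 mutations in a 200-residue protein, 5 at residue 12.
    observed = [12, 12, 12, 12, 12, 30, 45, 100]
    for pos, pval in sorted(naive_hotspot_pvalues(observed, 200).items()):
        print(pos, f"{pval:.3g}")
```

Under this null a single mutation at a residue is unremarkable, while five of eight mutations at one residue is extremely unlikely; the per-residue p-values would still need multiple-testing correction across residues and genes.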
We will subsequently determine whether constitutional DNA from pediatric cancer patients harbors germline variants in genes with mutational hotspots in pediatric tumors and/or other genes that are frequently mutated in pediatric tumors, and whether variation in these genes is enriched as compared to a non-cancer control population. Information about mutational hotspots in pediatric cancers will expand our understanding of tumorigenesis and facilitate identification of new predisposition genes and syndromes. The investigators believe that collaboration and sharing of discoveries will lead to greater advancement in the field of pediatric cancer tumorigenesis and germline predisposition. Therefore, relevant findings from this study will be broadly shared with the scientific community via timely publication in peer-reviewed journals. [1] Chang, M.T., et al., Accelerating discovery of functional mutant alleles in cancer. Cancer Discovery, 2018. 8(2): p. 174-183. [2] Chang, M.T., et al., Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nature Biotechnology, 2016. 34(2): p. 155. [3] Huang, K.-L., et al., Pathogenic germline variants in 10,389 adult cancers. Cell, 2018. 173(2): p. 355-370.e14. [4] Gröbner, S.N., et al., The landscape of genomic alterations across childhood cancers. Nature, 2018. 555(7696): p. 321. [5] Ma, X., et al., Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature, 2018. 555(7696): p. 371. Nikolaev, Sergey INSTITUT GUSTAVE ROUSSY Genomic profiling of tumors in children with Xeroderma Pigmentosum Nov21, 2019 closed Somatic cells of the human body accumulate DNA damage and mutations due to endogenous and exogenous genotoxic substances. DNA repair pathways are the gatekeepers of genome stability and repair the majority of DNA damage. Deficiencies of DNA repair pathways result in increased mutation rates and a subsequent increase in cancer risk.
Xeroderma pigmentosum (XP) is a genetic disorder caused by impaired Nucleotide Excision Repair (NER) and is associated with an increased cancer risk in children. We recently sequenced 8 cancer genomes from children with XP in whom NER is not active. We would like to study the impact of the NER pathway in sporadic childhood cancers. To do so, we would like to compare the mutational burdens and profiles of NER-deficient childhood cancers with those of age-matched and tissue-matched sporadic childhood cancers. Understanding the role of NER in mutagenesis in childhood cancers may guide treatment strategies. Xeroderma pigmentosum (XP) is a genetic disorder caused by impaired Nucleotide Excision Repair (NER). The most aggressive type of XP is the XP-C subtype, which is associated with a 10,000-fold increased risk of skin cancer and a 20-fold increased risk of internal cancers. These patients start to develop cancer at an early age (around 4 years old). It has been suggested that the increased cancer risk is due to an inability to repair some types of DNA lesions in XP-C and the subsequent increase in mutation rates. However, genome sequencing of such childhood cancers had not been performed. We recently performed WGS of 6 leukemias, breast cancers and rhabdomyosarcomas from children with XP-C and revealed very high mutation rates in those cancers. Moreover, we observed a very specific mutational signature, COSMIC Signature 8, which, based on transcriptional bias analysis, is associated with mutations originating from purines. To better understand how mutational burden may affect the risk of childhood cancers, we would like to further analyze XP-C mutational profiles in the context of age-matched and tissue-matched sporadic childhood cancers. Genomic comparisons of cancers from children with XP-C may also reveal the contribution of NER deficiency in sporadic childhood cancers.
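The comparison of mutational burdens and profiles described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the investigators' pipeline. Each SNV is assumed to arrive as a (5' base, reference, alternate, 3' base) tuple. The callable genome size used for the per-megabase burden is a hypothetical input, and all function names are hypothetical. Substitutions with a purine reference are reverse-complemented onto the pyrimidine strand. This produces the standard 96-channel trinucleotide representation in which COSMIC single-base-substitution signatures, such as Signature 8, are defined. Signature fitting and the transcriptional strand-bias analysis mentioned in the request are not shown.

```python
from collections import Counter

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_channel(five_prime: str, ref: str, alt: str, three_prime: str) -> str:
    """Return a 96-channel label such as 'A[C>T]G' (hypothetical helper).

    Substitutions with a purine (A/G) reference are reverse-complemented so
    that every label is written on the pyrimidine (C/T) reference strand.
    """
    context = five_prime + ref + three_prime
    if ref in ("A", "G"):
        context = context.translate(COMPLEMENT)[::-1]  # reverse complement
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


def mutational_profile(snvs):
    """Count SNVs per 96-channel label; snvs are (5', ref, alt, 3') tuples."""
    return Counter(sbs96_channel(*snv) for snv in snvs)


def burden_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Mutations per megabase of (assumed) callable sequence."""
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical SNVs from one tumor genome.
profile = mutational_profile([("A", "G", "A", "T"), ("A", "C", "T", "T"), ("T", "G", "A", "A")])
print(profile.most_common())
print(burden_per_mb(3, 3_000_000))
```

For example, a G>A change in the context AGT falls into the same channel as a C>T change in ACT, written A[C>T]T. Folding onto one strand this way is what lets profiles from XP-C and sporadic tumors be compared channel by channel.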
Norman, Paul UNIVERSITY OF COLORADO DENVER Genetic Variants of Natural Killer Cell Receptors can Protect from Developing Pediatric Leukemia Apr17, 2019 expired The overarching purpose of this project is to understand why most individuals don’t get leukemia. This information will allow immunotherapy protocols that manipulate NK cells to be improved by using the NK cells with the best chances of success. This will ultimately allow NK cell manipulation to be exquisitely targeted for optimal use on an individual patient basis. Natural killer (NK) cells are lymphocytes that can detect and kill leukemia cells. Accordingly, NK cells are becoming recognized as major players in cell-based immunotherapy for leukemia. NK cells carry receptors that recognize proteins expressed by tissue cells, through which those cells signify their health status to the immune system. We have shown that the genes encoding these receptors and ligands are highly variable between individuals and populations, and that this genetic variation directly affects NK cell function. Thus, we will determine the genetic variants of NK cell receptor and ligand pairs that protect from or predispose to the development of pediatric leukemia. We will extract the HLA class I and KIR allele sequences from BAM files of the TARGET ALL patients and determine the alleles present at each of these loci. We will compare these allele combinations with those found in healthy individuals. We have devised an interaction score to measure NK cell strength for each specific combination of receptor and ligand alleles. We will extract a limited set of ancestry-informative markers to enable proper matching of the disease and healthy cohorts. We plan to combine these data with data from the ALL patients of the Pediatric Cancer Genome Project, and we perceive no extra risk in doing so. Ogawa, Seishi KYOTO UNIVERSITY Genetic analysis of pediatric acute lymphoblastic leukemia Apr17, 2017 closed Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood.
The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemo-resistant, long-term survival of these cases is still poor, and it is difficult to predict the risk of relapse accurately. In this study, we will identify the molecular mechanisms of relapse or treatment failure in ALL. Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood. The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemo-resistant, long-term survival of these cases is still poor, and it is difficult to predict the risk of relapse accurately. In this study, we will perform whole-exome sequencing (WES), targeted sequencing, and RNA sequencing to identify the molecular mechanisms of relapse or induction failure in ALL. We would also like to analyze the genetic differences between cured and relapsed/refractory cases by combining other ALL cohort data sets deposited in dbGaP, including WES, RNA-seq, and methylation data. The combined analysis of these data sets does not create any additional risk to participants. No request will be made for the identification of participants. We intend to broadly share any findings from public data sets with the scientific community. Ogawa, Seishi KYOTO UNIVERSITY Genetic analysis of relapsed pediatric acute myeloid leukemia Mar19, 2015 closed Pediatric acute myeloid leukemia (AML) comprises ~20% of pediatric leukemia, representing one of the major therapeutic challenges in pediatric oncology. Nearly 40% of patients still relapse after current first-line therapies, and once relapse occurs, long-term survival rates decrease to between 21% and 34%. Regarding the pathogenesis of AML relapse, the recent development of massively parallel sequencing technologies has provided a new opportunity to investigate the comprehensive genetic alterations involved in the recurrence of adult AML.
However, little is known about the molecular details of relapsed pediatric AML. In this study, we will analyze the clonal origin and the major mutational events in relapsed pediatric AML. Pediatric acute myeloid leukemia (AML) comprises ~20% of pediatric leukemia, representing one of the major therapeutic challenges in pediatric oncology. Nearly 40% of patients still relapse after current first-line therapies, and once relapse occurs, long-term survival rates decrease to between 21% and 34%. However, little is known about the molecular details of relapsed pediatric AML. To reveal the clonal origin and the major mutational events in relapsed pediatric AML, we performed whole-exome sequencing of 4 sample trios (diagnosis, relapse, and complete remission). We would also like to validate our findings using the AML cohort in the TARGET project, and to analyze the differences in clonal evolution patterns at relapse between acute lymphoblastic leukemia (ALL) and AML using TARGET’s data sets, including WES and whole-genome sequencing data. The combined analysis of these data sets does not create any additional risk to participants. No request will be made for the identification of participants. Onel, Kenan FEINSTEIN INSTITUTE FOR MEDICAL RESEARCH The GWAS of Pediatric Cancers Jan27, 2017 closed The Onel lab focuses on the study of pediatric cancers. Specifically, we look for genetic mutations linked with the likelihood of developing pediatric cancers, particularly cancers of the blood, bone marrow, and brain. We propose to use the TARGET datasets for genome-wide association studies (GWAS), which look for disease-related variants across the genome.
Our objectives are two-fold: 1) use the data to replicate findings from a previous study for increased confidence in their validity and 2) combine these data with our original dataset to look for variants with small effects on disease predisposition that we could not detect with our smaller discovery dataset alone. Since our goal is to study cancers in a pediatric population, we would be unable to do this research with adult cases. We plan to look for potential therapeutic targets in our results, with the goal of developing new cancer treatments for children. We will also determine whether these genetic variants can be used to predict the type and severity of the cancer, and possibly the patient response to certain treatments. The Onel lab focuses on the study of pediatric cancers. Specifically, we look for genetic variants associated with predisposition to pediatric cancers, particularly hematologic cancers (ALL and AML) and neuroblastoma. We propose to use the TARGET genotype array datasets for genome-wide association studies (GWAS). Our objectives are two-fold: 1) use the data to replicate findings from a previous discovery GWAS and 2) meta-analyze these data with our discovery dataset to look for novel low-penetrance variants that we did not have the power to detect with our discovery dataset alone. Since our goal is to study cancers in a pediatric population, we would be unable to do this research with adult cases. We plan to pursue functional studies of any disease-associated variants that we find in our GWAS using the TARGET datasets, with the goal of identifying potential therapeutic targets. We will also evaluate any novel predisposition variants for their use as diagnostic or prognostic markers. UPDATE: This project was previously approved at the University of Chicago and is now being reapplied for at Northwell Health (Feinstein Institute). Dr. Onel remains the P.I.
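The second objective above, meta-analyzing the TARGET genotype data with the discovery GWAS, is commonly carried out with an inverse-variance-weighted fixed-effect model. The request does not name a method, so the following Python sketch rests on assumptions. It pools per-study effect estimates for a single variant, and the betas are taken to be log odds ratios already aligned to the same effect allele. The function name and the example numbers are hypothetical. Heterogeneity testing (for example, Cochran's Q) is omitted for brevity.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant.

    betas: per-study effect sizes (assumed log odds ratios, same effect allele).
    ses:   matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return pooled_beta, pooled_se, z, p


# Hypothetical estimates for one SNP: discovery GWAS, then TARGET replication.
beta, se, z, p = fixed_effect_meta([0.2, 0.2], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.3f} p={p:.2e}")
```

Suppose both studies report an estimate of 0.2 with a standard error of 0.1. The pooled standard error then shrinks to about 0.071. This illustrates why combining datasets adds power to detect the low-penetrance variants that the discovery set alone could not.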
Onel, Kenan UNIVERSITY OF CHICAGO GWAS of Pediatric Cancers Nov24, 2015 closed The Onel lab focuses on the study of pediatric cancers. Specifically, we look for genetic mutations linked with the likelihood of developing pediatric cancers, particularly cancers of the blood, bone marrow, and brain. We propose to use the TARGET datasets for genome-wide association studies (GWAS), which look for disease-related variants across the genome. Our objectives are two-fold: 1) use the data to replicate findings from a previous study for increased confidence in their validity and 2) combine these data with our original dataset to look for variants with small effects on disease predisposition that we could not detect with our smaller discovery dataset alone. Since our goal is to study cancers in a pediatric population, we would be unable to do this research with adult cases. We plan to look for potential therapeutic targets in our results, with the goal of developing new cancer treatments for children. We will also determine whether these genetic variants can be used to predict the type and severity of the cancer, and possibly the patient response to certain treatments. The Onel lab focuses on the study of pediatric cancers. Specifically, we look for genetic variants associated with predisposition to pediatric cancers, particularly hematologic cancers (ALL and AML) and neuroblastoma. We propose to use the TARGET genotype array datasets for genome-wide association studies (GWAS). Our objectives are two-fold: 1) use the data to replicate findings from a previous discovery GWAS and 2) meta-analyze these data with our discovery dataset to look for novel low-penetrance variants that we did not have the power to detect with our discovery dataset alone. Since our goal is to study cancers in a pediatric population, we would be unable to do this research with adult cases.
We plan to pursue functional studies of any disease-associated variants that we find in our GWAS using the TARGET datasets, with the goal of identifying potential therapeutic targets. We will also evaluate any novel predisposition variants for their use as diagnostic or prognostic markers. ORSULIC, SANDRA CEDARS-SINAI MEDICAL CENTER Characterizing the overexpression of genes in osteosarcoma and rhabdoid tumor Jul28, 2016 closed We have previously identified a set of genes that normally play a role in the development of various tissues in the body but are expressed at high levels in many pediatric cancers. Using cell lines, we have shown that these genes are important for tumor biology. We will use the TARGET dataset to confirm whether our cell line data are correct and therefore clinically relevant. Through this analysis, we can potentially identify new drug targets to help treat these poorly understood and understudied cancers. Using public datasets of pediatric cancers and in-house pediatric tumor samples, we have identified a subset of developmental transcription factors (DTFs) as overexpressed in many poorly differentiated pediatric tumors, including osteosarcomas and rhabdoid tumors (RT). We plan to analyze the osteosarcoma and RT TARGET RNA-seq pediatric datasets to identify pediatric patient samples with aberrant overexpression of the identified DTFs, determine whether any clinical variables correlate with it, and find co-regulated genes. We will integrate this dataset with our RNA-seq data from pediatric osteosarcoma and RT cell lines that have been genetically engineered to overexpress or knock down the DTFs. This will provide clinical validation and confirmation of our in vitro studies. We have also previously performed ChIP-seq analysis for the DTFs, H3K27ac, and H3K27me3 in pediatric RT cell lines. Therefore, we plan to integrate our cell line ChIP-seq data with the TARGET RT ChIP-seq dataset.
Analysis of the ChIP-seq data will help identify putative DTF target genes that may serve as therapeutic targets, since transcription factors themselves are currently not amenable to small-molecule inhibition. These integrations of the TARGET ChIP-seq and RNA-seq data with our cell line datasets will not create any additional risk to participants. We expect to identify a subset of pediatric patient samples that aberrantly express the DTFs and to identify co-regulated genes. We also expect to identify target genes that overlap with the H3K27ac/H3K27me3 patient data to potentially identify clinically relevant therapeutic targets that we will validate in functional studies using pediatric cancer cell line models. We acknowledge the data use limitations for the TARGET dataset and will limit our use of TARGET to pediatric cancer research only, specifically rhabdoid tumor and osteosarcoma. Use of the TARGET datasets will only be conducted using pediatric data. Otridge, John Brian NIH Testing TCGA, TARGET and CGCI data in NCI's CGC Cloud Pilots Mar22, 2017 closed I will test the TCGA, TARGET and CGCI data in 3 different cloud environments to confirm that the data were successfully migrated to each environment, that the data are appropriately available in each environment, and that data integrity is maintained for data downloads from each environment. My purpose for accessing TCGA, TARGET and CGCI data is to evaluate the quality and accessibility of the data in NCI's Cancer Genomics Cloud (CGC) pilots. The presence of the TCGA, TARGET and CGCI data is a key requirement for each of the 3 pilots. By including TARGET and CGCI data, the cloud pilots will advance pediatric cancer research, as each pilot will create an environment where researchers not only have access to the data, under appropriate access control, but also have the necessary tools and informatics compute infrastructure to analyze the data.
This project involves confirming the availability of TCGA, TARGET and CGCI data in the 3 CGC pilots, verifying the integrity of the data at each pilot, and testing whether the data can be used with the tools that each pilot's environment provides or enables. In addition, data will be downloaded to verify data integrity and access control of the download process. Each of the pilots will source the TARGET and CGCI data from NCI’s Genomic Data Commons (GDC). By performing this type of testing, our project helps ensure that when pediatric cancer researchers use the pilots, 1) they will have access to the pediatric cancer data provided by TARGET and CGCI, 2) the data are the same as those found in the GDC, 3) they will have access to the relevant tools for analyzing the pediatric cancer genomic data, 4) they will be able to run analyses on the data, and 5) TARGET and CGCI data access is appropriately controlled within each pilot’s environment. No research papers will be written from this testing, so no TCGA, TARGET or CGCI data will be published as a result of this work. Furthermore, no TCGA, TARGET or CGCI data will be disseminated from this project. When we complete testing, all downloaded data will be destroyed as required by the DUC. Palmqvist, Lars GOTEBORG UNIVERSITY Molecular Mechanisms in Development of Acute Myeloid Leukemia with t(7;12) or 11q23 in Children Oct15, 2021 approved Acute myeloid leukemia (AML) is the most common form of acute leukemia in adults, but it also affects children. AML has an inferior outcome in children compared with acute lymphoblastic leukemia (ALL), the more common form of acute leukemia in children. Our project aims to reveal the mechanisms behind AML development and to find new treatments. For this purpose, we need and use data derived from bone marrow or blood samples from children with AML.
The overall aim of our projects is to identify molecular mechanisms involved in the development of acute myeloid leukemia (AML) in children and to find novel ways of targeting leukemic cells. Pathogenic mechanisms underlying the development of this disease and the response to different treatments are studied using data derived from samples from children with AML. These include, in particular, t(7;12) AML, a form of AML found only in children, and also AML with 11q23 rearrangements, which is mostly found in children. Comparing different data sets from patients with AML can help us identify important pathways and mechanisms that can be targeted as a means to treat patients. We believe that our investigations will lead to a better understanding of the development of AML and ultimately to better treatment of patients with AML. All results will be published or otherwise made broadly available to the scientific community. PAN, ZHIJUN ZHEJIANG UNIVERSITY prognostic markers of childhood Osteosarcoma by Genomic Instability model Mar17, 2021 expired Our research analyzes the correlation between the number of mutated genes and the prognosis of childhood osteosarcoma. Some mutated genes in osteosarcoma have already been identified. We will use the number of these mutated genes to construct a new prognostic marker and model. Methods: Download the transcriptome data, clinical data, and number of mutated genes for childhood patients with osteosarcoma from dbGaP. Divide the data into two groups based on the number of mutations (a genetically unstable group and a genetically stable group) and perform differential expression analysis. Then combine the results with clinical survival time in a survival analysis to construct a prognostic risk model. This model is used to calculate patient risk scores and perform clinical correlation analysis. In this way, we aim to create a new prognostic model and to discover key genes as prognostic markers that contribute to further clinical diagnosis and treatment.
A new prognostic marker model of childhood osteosarcoma based on genomic instability. Osteosarcoma is a common type of malignant bone tumor that mainly occurs in childhood. It is the most common pediatric bone malignancy, accounting for about 5% of pediatric tumors. We want to establish a new prognostic marker and model for childhood osteosarcoma using gene mutation data for childhood osteosarcoma in the TARGET database. At present, the TCGA open data contain only two samples, and it is impossible to establish a prognostic marker and model from two samples. Only the childhood osteosarcoma data in the TARGET database make it possible to establish a new prognostic marker and model. Methods: The transcriptome data and clinical data of patients with osteosarcoma, as well as the number of mutated genes per patient, are extracted from dbGaP. According to the number of mutated genes in childhood patients, the data are divided into two groups, a genetically unstable group and a genetically stable group, and differential expression analysis is then performed. Survival analysis is performed by combining the results of the differential expression analysis with the clinical survival time. The risk score is calculated from the number of mutated genes, the expression levels, and their corresponding coefficients related to the prognosis of osteosarcoma, in order to construct a prognostic model. This model is used to calculate patient risk scores and perform clinical correlation analysis. In this way, we aim to create a new prognostic model and to discover key genes as prognostic markers that contribute to further clinical diagnosis and treatment. Panchenko, Anna QUEEN'S UNIVERSITY AT KINGSTON Elucidating the mechanisms of histone mutations in human cancers Mar31, 2021 closed Somatic histone mutations have been observed in various tumor types, and it is important to understand the mechanisms of histone mutations in cancer development.
In this proposed research project, we will collect data on histone genetic variants from various tumor types and develop new computational approaches to analyze them. We aim to identify the “key” histone mutations that drive cancer development and to elucidate their effects on protein functions and relevant biological processes. Our study may also identify novel prognostic markers for early cancer diagnostics and drug targets for developing more effective treatments. Somatic histone mutations have been observed in various tumor types, and their effects on epigenetic processes may drive oncogenic development. Here, we plan to investigate the mechanisms of histone cancer mutations with three primary objectives: 1) identify histone cancer driver mutations; 2) analyze the effects of histone cancer mutations on different epigenetic processes; and 3) identify potential prognostic markers and drug targets. The design of our study includes: 1) Data collection. We will collect and combine all the data on histone genetic variants and phenotypic characteristics from four studies (phs000178, phs000218, phs000748, phs001486). We will further integrate these data with other data from ICGC (https://icgc.org) to build a comprehensive dataset of histone genetic variants. All of these data will be analyzed together in our project. 2) Prediction and ranking of histone mutations with respect to their driver status. We will develop a pipeline that uses current state-of-the-art bioinformatics approaches to identify histone driver mutations from the collected data. 3) Analysis of the effects of histone driver mutations on epigenetic processes. The analysis plan includes: 1) Analyze the effects of histone driver mutations on protein functions. We plan to use our recently constructed human histone interactome (Peng, et al.
Journal of Molecular Biology, 2020) to elucidate the effects of driver mutations on protein-protein interactions, protein-DNA interactions, and post-translational modifications. This will help us understand how driver mutations disrupt protein functions and relevant biological processes. 2) Analyze the association of histone driver mutations with patients’ clinical features, including survival rate and treatment. We will collect all the data only for General Research Use (phs000178 and phs001486 required) and will develop a new computational method to predict and understand histone driver mutations in various tumor types (phs000748 required). Since many histone mutations have also been observed in pediatric cancer, our study can potentially identify prognostic markers and discover novel drug targets for developing more effective treatments for pediatric cancer (phs000218 required). Thus, our proposed research is consistent with the data use limitations for all the requested datasets. Our collaborators include Dr. David Landsman and Yunhui Peng from NCBI, NIH. We will use the TARGET dataset to refine the pipeline to benefit pediatric cancer patients, and this work will develop pediatric-specific biomarkers and compare them with adult biomarkers. We believe it will benefit pediatric cancer patients. Papaemmanuil, Elli SLOAN-KETTERING INST CAN RESEARCH mutational signature analysis of neuroblastoma Oct04, 2016 expired Internationally accepted treatment protocols for neuroblastoma include chemotherapy as well as additional radiation therapy for high-risk cases. Adoption of such intensified protocols has led to significant improvements in survival rates; however, the effect of these therapeutic insults on the genome has not been studied. We would like to perform a genome-wide analysis of the patterns of genetic changes induced by therapy, with a special focus on radiotherapy.
Our aim is to determine to what extent such changes are observed in neuroblastoma patients who have undergone therapeutic protocols of increased treatment intensity, and to look for correlations between these patterns and clinical variables. This study will potentially inform the design of future treatment protocols. Ionizing radiation is a potent carcinogen, inducing cancer through DNA damage and mutation. We recently studied 12 whole cancer genomes from four different radiation-associated second malignancies and identified specific signatures of somatic mutation that characterize ionizing radiation exposure irrespective of tumour type. Internationally accepted treatment protocols for neuroblastoma include chemotherapy as well as additional radiation therapy for high-risk cases. To assess the relevance of radiation-associated mutation signatures in neuroblastoma, we would like to request access to the raw data generated by the TARGET initiative. Our aim is to delineate the mutational signatures in the genomic data and correlate the identified patterns with available clinical information. We will first perform a detailed analysis of the available diagnostic/relapse samples. This dataset will include the 23 tumour pairs from the TARGET initiative as well as our own patient cohort from the MSKCC neuroblastoma clinic. We will focus on mutational signature analysis comparing the patients who underwent additional radiotherapy with those who received chemotherapy alone. Our preliminary analysis of the published mutation calls for the TARGET samples already identifies an enrichment in deletion burden relative to insertions (1.5-3 fold) among patients who underwent radiotherapy. However, we believe a thorough investigation of mutational signature patterns in the genomic data, as well as extension of this observation to a larger dataset, is warranted.
In the second stage, we will look for correlations between these patterns and clinical variables in the whole dataset. We would also like to include ~140 patients who have diagnostic samples with whole-genome sequencing data available. We hope that this analysis will give us insights into the genomic changes induced by therapeutic agents and will potentially inform the design of future treatment protocols. Papanicolau-Sengos, Antonios NIH Optimal validation of DNA-methylation based tumor purity estimation Nov18, 2022 closed The standard way of classifying cancer is to look at thin sections of tissue under the microscope. Although generally reliable, this way of classifying cancer cannot capture its molecular complexity. This “hidden” molecular complexity has multiple levels, including the DNA sequence, the number of copies of each gene, and molecules added to the DNA itself (i.e., a form of epigenetics). Some cancers can be subclassified into molecular subsets that can, for example, be particularly aggressive or treatable by a specific medication. A problem with these molecular methods of cancer classification is that their results can be affected by non-cancerous cells in the tested specimen. Pathologists can examine the specimen under the microscope and estimate the percentage of tumor cells. However, this is known to be a poor measure of tumor purity because of its subjectivity. Our goal is to obtain multiple levels of molecular data from the same specimens, obtain tumor purity estimates using various computational methods, exclude cases whose tumor purity varies too much between methods, and validate all methods, including DNA methylation, in order to create an optimal DNA methylation-based tumor purity estimation method. Update 2: Add CPTAC-3 request (inadvertently not included initially). Update 1: Clarify purpose of including NCI TARGET in this request. Objective: Our laboratory specializes in DNA methylation-based classification of cancer.
We have repeatedly observed that low tumor purity is a major obstacle in DNA methylation-based cancer classification and wish to develop a rigorously validated model of DNA methylation-based tumor purity prediction. Study design and analysis: In silico estimates of tumor purity can be obtained using multiple models, such as the DNA methylation-based RFpurify, the DNA sequencing (mutation)-based PurBayes, and the DNA sequencing (copy number)-based Sequenza models. From experience, we know that there is substantial variation between the outputs of these and other models even when the same specimen is used. We intend to use multiple in silico methods of estimating neoplasm content to curate a large number of specimens analyzed by DNA sequencing and DNA (array-based) methylation. We intend to apply multiple models to data originating from the same specimens (including germline data, which some of these models require). Specimens with discordant outputs will be excluded in order to produce a highly curated subset of tumors whose tumor purity is reproducible across multiple purity estimation methods. These curated cases with highly harmonized outputs will be used to train and validate a DNA methylation-based model of tumor purity. We are particularly interested in data that will allow us to use the PurBayes R package, which requires raw (somatic + germline) MAF files from WXS data (aggregated somatic mutations), and the model contained in the Sequenza R package, which requires BAM files from WXS data. Except for TCGA, the CPTAC, CGCI, TARGET, and HCMI projects contain appropriate file sets for this study. The TARGET dataset in particular will be used only to evaluate the developed tool. Analysis plan: Using the purity outputs for the same material tested by multiple methods (DNA sequencing, DNA copy number, and DNA methylation), we will identify specimens that are reproducibly pure by multiple methods.
These specimens will be used to retrain the models and validate a DNA methylation tumor purity model. Park, Sunita CHILDREN'S HEALTHCARE OF ATLANTA, INC. Expression of CD36 and the Presence of Cytoplasmic Granules in Blasts Predicts Poor Prognosis in Children with B-Lymphoblastic Leukemia Jul13, 2015 closed B-lymphoblastic leukemia (B-LL) is the most common childhood cancer. While over 80% of patients are cured, there remains a subgroup who die from the disease. Our research efforts target this subgroup. CD36 is a protein found on the cell surface of monocytes and erythroid precursors, which are cells normally found in the bone marrow. Rarely, CD36 is detected on B-LL cells, and we have found that these patients have a worse overall outcome. We have also noticed that patients whose cancer cells contain granules do especially poorly (3-year event-free survival of 24% ± 18.85%). We are now trying to find out why. Of the 20 identified patients with CD36-expressing B-LL, only one had RNA of adequate quality for RNA sequencing. We found a rare gene fusion in this sample, and we would like to see whether there is a true association between CD36 expression and this gene fusion by reviewing the published database of whole-genome sequencing of pediatric B-LLs. We will then publish these findings to reach a wide scientific audience. CD36, a marker expressed on monocytes and erythroid cells, is rarely expressed on blasts in childhood B-lymphoblastic leukemia (B-LL). We have observed that these CD36+ blasts often have cytoplasmic granules (CG), also a rare finding, and that patients with CD36+/CG+ B-LL seem to have a worse outcome. We performed an IRB-approved retrospective review of B-LL cases diagnosed between September 2008 and April 2013 at our institution. To be eligible for this analysis, patients had to be less than 21 years old and have had a marrow aspirate at initial diagnosis demonstrating moderately bright CD36 expression on at least 5% of blasts.
A pediatric hematopathologist reviewed smears for the presence of cytoplasmic granules. We abstracted data on patients, disease, and treatment from patients’ charts. We identified 20 cases of CD36+ B-LL. Eleven patients met NCI high-risk criteria. Eleven cases had cytogenetic abnormalities, including 3 with Ph+ ALL. Ten patients had blasts with cytoplasmic granules. Treatment varied according to risk classification. Induction therapy failed in 5 cases; in another 10, there was minimal residual disease (≥0.01%) at day 29 of induction. Four patients underwent hematopoietic stem cell transplantation in first complete remission. With a median follow-up of 22 months, 3-year EFS was 49.45% ± 14.01% for the entire cohort and 24% ± 18.85% for those whose blasts were CD36+/CG+ (p=0.033). This is much worse than the typical pediatric B-LL EFS of >85%. We have performed RNA sequencing in one of our 20 patients, identified a rare gene fusion, and would like to confirm our findings by comparing them to the whole genome sequencing database of pediatric B-LLs. We will look specifically for abnormalities of the CD36 gene, association with the gene fusion, or other abnormalities that have been described in Ph-like ALL. We plan to publish these findings to reach a wide audience in the scientific community, including pediatric hematologists/oncologists, pathologists, and researchers. Park, Woong-Yang SAMSUNG MEDICAL CENTER Effect of high-functional germline variant of cancer-biology related genes in neuroblastoma Oct06, 2022 closed Neuroblastoma is among the most common solid tumors in children. Although few genetic mutations directly increase the risk of developing neuroblastoma, other inherited individual genetic variations might affect features of the tumor. We will investigate DNA from normal cells of neuroblastoma patients and identify genetic changes associated with cancer biology.
We hope that our research might help tailor neuroblastoma treatments and contribute to precision medicine in the future. Most neuroblastoma develops in the absence of pathogenic variants in known cancer predisposing genes. However, germline variants that are not associated with cancer risk could still affect somatic mutagenesis processes and function through interactions with somatic mutations. Therefore, we hypothesize that high-functional-impact germline variants in cancer-related genes have a critical role in tumor biology. In preliminary analyses of our own data, we identified the spectrum of germline variants and their significant somatic interactions in neuroblastoma. We would therefore like to validate these results by comparing them with neuroblastoma TARGET whole exome sequencing data (raw data and BAM files) across different ethnic groups. We will analyze TARGET data independently and identify putative high-functional-impact germline variants in cancer-related genes in the TARGET dataset. In addition, we would like to determine whether germline variants in cancer biology-related genes are associated with transcriptomic differences in neuroblastoma. Therefore, we also request access to the raw transcriptome sequencing data. Finally, we plan to compare profiles of germline variants in neuroblastoma patients with those in adult-onset cancer patients from TCGA data. Restricted-access TARGET data will be analyzed independently and only for the purpose of our project. All results from the analyses undertaken in this project will be published in scientific journals for the benefit of the broad pediatric cancer research community. Paull, Michael GV20 THERAPEUTICS LLC Tumor Microenvironment and Immunogenomics Analyses Based on Pediatric-Cancer Transcriptomic Data Sep26, 2019 closed The interactions between cancer and the host immune system are critical for pediatric cancer treatments.
We study the immune cells within the tumor by applying the latest computational tools to existing cancer data. We hope to find new biomarkers and drug targets for more effective treatments of childhood cancers. We will analyze the TARGET data to study the roles of immune cells (e.g., T and B cells) in childhood cancers such as acute lymphoblastic leukemia (ALL). Unlike many solid tumors, ALL cancer cells interact closely with the host immune system, so it is important to monitor immune responses during cancer treatment. Compared with adults, children's immune systems contain more naive T/B cells and fewer memory T/B cells. Because of this difference, it is questionable whether the same biomarkers can be used to monitor disease progression and make treatment decisions. In addition, childhood cancers have very different mutation profiles from adult cancers, which usually derive from random mutations accumulated during aging. This means that some targets widely used for targeted therapy in adults may not be the best therapeutic targets for children. Therefore, to define good biomarkers and novel therapeutic targets for childhood cancers, we need to understand the tumor microenvironment and the role of tumor-infiltrating B cells in targeting tumor-associated antigens. From the raw RNA-seq data, we will 1) estimate the immune cell composition, 2) measure immune cell regulation signatures, and 3) extract T cell and B cell receptor sequences from fragmented sequence reads. To achieve these goals, we will run several open-source computational tools, including TIMER (PMID: 29092952), TIDE (PMID: 30127393), and TRUST (PMID: 30742113). We will correlate the output of these tools with the following phenotypic characteristics: disease type, stage, age, gender, treatment response, and overall survival. In the end, we hope to detect new associations between the host immune system and treatment responses, as well as new antibody drug targets for specific childhood cancers.
Paulsson, Kajsa LUND UNIVERSITY Increased biological insight and identification of targetable weaknesses in pediatric acute lymphoblastic leukemia Dec12, 2018 approved Leukemia is one of the most common malignancies in early childhood and is caused by genetic mutations in cells in the blood-forming bone marrow. In our research, we investigate these genetic mutations in order to understand how and when they arose as well as how they affect the cell. This improved understanding of how leukemia develops will then be used to identify specific weaknesses in the leukemic cells that may be targeted in novel therapies. The ultimate goal of our research is to improve survival and decrease therapeutic side effects in children with leukemia. Hypodiploid (<40 chromosomes) and high hyperdiploid (>50 chromosomes) acute lymphoblastic leukemia are two subtypes of pediatric malignancies that are genetically characterized by specific aneuploidy. In the first part of this project, we aim to elucidate the underlying biology of leukemia development in these subtypes and to use this understanding to develop new therapies. We plan to screen TARGET SNP array, whole genome sequencing, whole exome sequencing and RNA sequencing data to investigate the temporal order of genetic aberrations, the underlying mechanism of aneuploidy, and the pathogenetic effects of the chromosomal gains in hypodiploid and hyperdiploid leukemia. This will include analyses of driver and passenger mutations to determine at which time point the aneuploidy arose. Furthermore, we will investigate copy number data and mutational data to understand how selective forces shape these leukemia genomes, by looking at frequencies of different aneusomies and clonal heterogeneity. We will also analyze RNA sequencing data to see how copy number aberrations and mutations affect expression. 
By comparing the genomic/RNA sequencing data between the hypodiploid/high hyperdiploid subtypes and other subtypes of B-cell precursor acute lymphoblastic leukemia (BCP-ALL), we plan to identify genetic aberrations and gene expression patterns specific to the hypodiploid/high hyperdiploid subtypes. Publicly accessible clinical data will be used to investigate the association between specific genetic aberrations and clinical outcome. The insight generated by these studies will subsequently be used to identify weaknesses that can be targeted using novel treatment strategies. The data will be combined with data from cases from our local biobank at Skåne University Hospital, Lund, Sweden, as well as two SNP array datasets from our collaborators, Dr Jan Zuna and Marketa Zaliova, Childhood Leukaemia Investigation Prague (CLIP), Prague, Czech Republic, and Dr Nicolas Duployes, Laboratory of Hematology, Centre Hospitalier Universitaire (CHU) Lille, Lille, France, and with data from St. Jude Cloud. No TARGET raw data have been shared with our collaborators. In the second part of this project, we will focus on non-coding mutations in pediatric ALL. We will analyze SNP array, whole genome sequencing, whole exome sequencing and RNA sequencing data to identify single nucleotide variants (SNVs), deletions and structural rearrangements that affect the expression of nearby genes. Data will be compared with publicly available and in-house datasets on regulatory elements. The aim is to identify novel driver mutations in pediatric leukemia. Publicly accessible clinical data will be used to investigate the association between specific genetic aberrations and clinical outcome. As above, the insight generated by these studies will subsequently be used to identify weaknesses that can be targeted using novel treatment strategies.
The data will be combined with data from cases from our local biobank at Skåne University Hospital, Lund, Sweden, and with data from St. Jude Cloud. Childhood leukemias are genetically distinct from adult leukemias and display a different etiology; it is therefore necessary to do these studies on pediatric samples, as they cannot be done on samples from adult patients. Furthermore, given that leukemia is genetically heterogeneous, large cohorts are needed to draw solid conclusions from the data obtained. The data from the requested datasets will be analyzed separately on a case-by-case basis but may be combined with in-house genetic data on similar leukemias for final publication. Pellegrini, Matteo UNIVERSITY OF CALIFORNIA LOS ANGELES Pan-Cancer analysis of Alu retrotransposon expression Jan11, 2017 closed The human genome is punctuated by mobile DNA elements, called retrotransposons, among which Alus are the most numerous. Retrotransposons can generate new copies of themselves by a copy-and-paste mechanism, involving transcription of an element into RNA, followed by reverse transcription to produce a new DNA copy of the element at a new genomic location. This process, called retrotransposition, challenges genome stability. Alu-derived RNAs, however, may also directly modulate genome expression in unexplored ways. Our hypothesis is that Alu expression profiles might represent a novel type of molecular signature in cancer. We propose to investigate Alu expression profiles in thousands of patient samples from several cancer types, starting from RNA deep sequencing data deposited in The Cancer Genome Atlas (TCGA) and in TARGET, which is devoted to childhood cancer. The expected discovery of cancer type-specific and subtype-specific Alu RNA profiles will provide a novel type of biomarker and open the way to further studies on the involvement of Alu expression in both adult and pediatric cancer.
LINEs and SINEs are non-LTR retrotransposons potentially contributing to the genome instability associated with cancer causation and progression. With more than 1 million copies, the Alu SINEs are the most numerous mobile elements in the human genome. Despite their role in genome evolution and sporadic examples of their involvement in cancer, the transcriptional activity of Alu loci in normal and cancer cells is still largely unexplored. We recently set up a bioinformatic pipeline that identifies and quantifies individual Alu transcripts in RNA-seq datasets. By applying it to ENCODE data, we generated for the first time Alu expression profiles in different cell lines, and we identified a few hundred Alu loci producing transcripts with potential roles as retrotransposition intermediates or as regulatory ncRNAs (Conti et al. 2015, Nucleic Acids Res. 43:817-835). The key hypothesis of this proposal is that Alu expression profiles might represent a novel type of molecular signature in cancer cells, reflecting the epigenome alterations that accompany malignancies. Alu expression profiles in approximately 3000 patient samples of several cancer types will be generated by applying our pipeline to RNA-seq data within TCGA (no combined analysis with other datasets is planned). Moreover, based on our previous work, one tumour type for which Alu deregulation has been found to be relevant from both pathogenetic and therapeutic perspectives is pediatric neuroblastoma [Castelnuovo et al. 2010, FASEB J 24:4033; Castelnuovo et al. 2013, Neuroblastoma: inhibition by Alu-like RNA. In M.A. Hayat (ed.), Pediatric Cancer, Volume 4: Diagnosis, Therapy, and Prognosis (Springer)]. Therefore, we plan to apply the same investigation strategy to the TARGET dataset. Access to TARGET data would represent a unique opportunity to address an unexplored issue of high relevance for the diagnosis, prognosis, and treatment of childhood cancers.
The expected discovery of cancer type-specific and subtype-specific Alu RNA profiles will provide a novel type of biomarker and open the way to further studies on the involvement of Alu expression in both adult and pediatric cancer. Penzo, Marianna UNIVERSITY OF BOLOGNA Defining the role of ribosomal alterations in T-cell acute lymphoblastic leukemia Mar11, 2020 approved Childhood acute lymphoblastic leukemia (ALL) is a type of cancer in which the bone marrow makes too many immature lymphocytes (a type of white blood cell). ALL is the most frequent cancer among children and young adults. About 90% of children diagnosed with ALL can be cured. Although this percentage is high, some patients still do not respond to available therapies. Therefore, new therapeutic approaches are needed to raise the cure rate even further, and reaching this goal requires a better understanding of the illness. Recent findings showed that about 10% of cases of childhood T-ALL, a subtype of ALL, are characterized by mutations in certain genes (namely RPL5 and RPL10) that are thought to promote leukemia development by altering fundamental cellular processes such as protein synthesis. However, how these specific mutations contribute to leukemia is currently poorly understood. We propose to study these datasets to identify mutations in genes linked to protein synthesis (including RPL5 and RPL10) and to correlate these data with prognosis and relapse occurrence (where available).
The ultimate goal of our research is to identify specific mutations in genes related to protein synthesis that could be used as prognostic markers or, in the future, as new therapeutic targets. Background and rationale: T-cell acute lymphoblastic leukemia (T-ALL) development can be driven by different genetic lesions; among these, somatic mutations in genes encoding ribosomal proteins (RPs), mainly RPL5 and RPL10, are found with a frequency close to 10% in pediatric (but not adult) cases of T-ALL (De Keersmaecker K, et al. (2013) Nat. Genet. 45:186-191). Different studies have demonstrated that in T-ALL the process of ribosome biogenesis is altered at many different levels, including the transcription of ribosomal DNA and the deregulation of several small nucleolar RNAs (snoRNAs), which are normally required to drive ribosomal RNA (rRNA) post-transcriptional modifications (Todd MAM, et al. (2015) Genes. 6:325-352; Teittinen KJ (2013) Cell Oncol. 36:55-63). Besides this, there are additional indications of an essential role for translation deregulation in T-ALL. T-ALL cells are addicted to cap-dependent translation; previous studies have demonstrated that overexpression of eIF4A and eIF4E (two of the factors required for cap-dependent translation initiation) accelerates T-ALL development in mice, and that inhibiting eIF4A and eIF4E reduces the proliferation and viability of T-ALL cells (Wolfe AL, et al. (2014) Nature. 513:65-70; Schwarzer A, et al. (2015) Oncogene. 34:3593-3604). Altogether, these observations suggest the hypothesis that alterations of the translational apparatus and, in particular, of the ribosome itself may play an important functional role in supporting the neoplastic phenotype of T-ALL.
Objectives: Analysis of the requested datasets will contribute to the general objective of clarifying whether RP mutations found in T-ALL can affect ribosome biogenesis (in terms of production of mature rRNAs or RPs), thereby affecting ribosome structure and function, cellular protein synthesis, and ultimately gene expression, or whether they instead promote leukemia through translation-independent or extraribosomal functions. In this context, the specific objective of the analysis will be the characterization of the genetic alterations affecting ribosome biogenesis and translation in pediatric T-ALL patients. Study design: The requested datasets are: • NCI TARGET: Therapeutically Applicable Research to Generate Effective Treatments phs000218 (pediatric cancer research) and substudy phs000464; • Genomic Analysis of Relapsed Pediatric Acute Lymphoblastic Leukemia (phs001072). Whole genome sequencing data will be analyzed with a particular interest in somatic mutations in genes related to ribosome biogenesis (e.g., RPs, snoRNAs, rRNA modifying enzymes, ribosome assembly and transport factors) and translational control (e.g., translation initiation, elongation and termination factors, mTOR signaling pathway). Mutational data will be analyzed to explore co-occurrence of mutations in the above-mentioned genes and will also be correlated with clinical evolution and prognosis. In cases of relapse, the occurrence of mutations in the same gene set will be explored. Analysis plan: Data from the selected datasets will NOT be analyzed in combination with others. Initially, phenotypic characteristics will not be considered in the analysis. Only if mutations in the target genes of interest are recurrent will they be correlated with the immunophenotype (i.e., expression of surface markers such as CD4, CD8, CD25, CD44, CD3, B220, Gr1, Mac1 and Ter119) and/or with relapse risk.
Explanation of how the proposed research is consistent with the data use limitations for the requested dataset(s): The requested datasets are available for: • phs000218 (and substudy): research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that have likely relevance to developing more effective treatments. This research project can only be conducted in childhood leukemia, since mutations in the above-mentioned RP genes are not detectable in adult ALL. In addition, the final goal of the project is to define new therapeutic approaches, targeting the translational apparatus, tailored to ALL cases with RP mutations. • phs001072: Use of the data must be related to cancer. This project is indeed related to cancer. Perlman, Elizabeth LURIE CHILDREN'S HOSPITAL OF CHICAGO TARGET: High Risk Wilms Tumor Oct12, 2010 closed Approximately 25% of children with Wilms tumor do not respond satisfactorily to current treatments. Using tumor samples previously collected as part of a Wilms tumor clinical trial, the proposed study will use a comprehensive set of state-of-the-art analyses to define molecular differences that delineate Wilms tumors that do not respond to treatment and/or that relapse. In addition to generating datasets that can be used by many cancer researchers, this work may enable patients with high-risk Wilms tumor to be identified at diagnosis and treated more effectively. Favorable histology Wilms tumor (FHWT) is the most common pediatric renal tumor. Although the overall outcome for FHWT is excellent, two groups of patients have suboptimal outcomes. First, approximately 15% of FHWT relapse despite therapy (RFHWT), and only 50% of patients with RFHWT survive. Second, 8-10% of Wilms tumors develop histologic features known as anaplasia (which confers unfavorable histology, UHWT), which is associated with only 60% survival.
Little improvement in the outcome of patients with RFHWT and UHWT has been seen in recent years, and very little is known about these tumors biologically. A comprehensive and comparative analysis of these two groups would allow the discovery of pathways that may be specifically targeted for therapy and of markers that would predict poor outcome and allow therapeutic stratification. Such studies are also an efficient mechanism for surveying changes within Wilms tumors as a group. We therefore propose to interrogate the genomic, transcriptomic, epigenetic, and mutational characteristics of high-risk Wilms tumors treated on NWTSG/COG protocols. We will compare these data with similar TARGET data available for neuroblastoma, osteosarcoma, and leukemia. Perreault, Claude UNIVERSITY OF MONTREAL Transcriptomic analysis of MHC Class I-associated Tumor-Specific Antigens in acute lymphoblastic leukemia (ALL) Sep16, 2020 approved Leukemias are the most common cancers in childhood worldwide, accounting for 35% of cancers diagnosed before the age of 14. Therapeutic tools remain limited for some patients. One of the most promising treatments in the fight against cancer is immunotherapy. The objective of this treatment is to help the immune system, particularly cytotoxic T lymphocytes, recognize and destroy tumor cells. Lymphocytes scan cells for tumor-specific proteins called tumor-specific antigens (TSAs). By identifying new TSAs, we will be able to help lymphocytes target them. To discover TSAs for leukemia, we will study pediatric primary leukemic samples and then use available genomic data to select the best candidates according to their level of expression in patient tumor cells. This will allow us to develop therapeutic vaccines against cancer.
Major histocompatibility complex class I (MHC I) molecules form a group of cell surface proteins that present short peptides, collectively referred to as the immunopeptidome, which are shaped by an interplay of protein synthesis and degradation. These MHC I-peptide complexes play a crucial role in the adaptive immune system because they reflect the inner workings of the cell. CD8 T cells, which recognize these complexes, are the main mediators of naturally occurring and therapeutically induced immune responses to cancer. Accordingly, in solid tumors, the abundance of CD8 tumor-infiltrating lymphocytes (TILs) positively correlates with prognosis, and these TILs have been shown to recognize tumor-specific antigens (TSAs). These novel sequences can arise from i) modifications of the genome sequence (mutation, translocation), ii) modifications of the transcript sequence (non-canonical reading frames, intron retention), or iii) gain-of-function re-expression of genes. We developed a proteogenomic approach allowing the discovery of TSAs derived from any region of the genome. So far, we have identified TSAs in human and murine solid tumors (ovarian, breast, lung, colorectal) and haematological tumors (murine EL4 lymphoma and human AML). Leukemias are the most common cancers in childhood worldwide, accounting for 35% of cancers diagnosed before the age of 14. Therapeutic tools remain limited for some patients, mostly because of the lack of identified actionable targets. We want to use our proteogenomic approach to extend the discovery of TSAs to primary pediatric ALL samples that we have acquired. In order to select the best TSAs for immunotherapy, we need to validate that TSA source RNA sequences are shared among a high proportion of ALL patients. To do so, we require access to raw unannotated RNA-Seq transcripts from a large cohort of ALL patients such as TARGET-ALL.
PERTSEMLIDIS, ALEXANDER UNIVERSITY OF TEXAS HLTH SCIENCE CENTER Deriving ncRNA expression from TARGET sequencing data Jun18, 2015 closed Aberrant expression of both coding and non-coding RNAs has been implicated in both developmental processes and disease, particularly cancer. While protein-coding genes often display differential expression that correlates with cancer patient survival, such evidence is largely lacking for non-coding RNAs (ncRNAs). We will use deep-coverage RNA-seq data to derive expression data for coding and non-coding RNAs across primary tumors and investigate the global relationship between expression of RNA species and patient survival. The primary focus of this project is the identification of non-coding RNAs that regulate cell viability and drug response in pediatric cancers. While the genetic underpinnings of cancer have been studied intensively, it is becoming increasingly clear that cancer susceptibility in general, and pediatric cancers specifically, cannot be attributed solely to variation in the portion of the genome that codes for proteins. While less than 2% of the genome is transcribed into RNA that serves as a template for proteins, nearly 70% is transcribed into RNA that does not. Recent genome-wide association studies (GWAS) have shown that more than 80% of single nucleotide polymorphisms associated with cancer occur in non-coding regions of the genome. We believe that a significant fraction of cancer etiology depends on non-coding RNAs. Non-coding RNAs have been shown to play significant roles in almost every aspect of cell biology and are therefore likely to be associated with most, if not all, of the hallmarks of cancer. Therapeutically Applicable Research To Generate Effective Treatments (TARGET) is a comprehensive genomic approach to determining the molecular changes that drive childhood cancers. In most cases, tumor RNA was sequenced by next-generation sequencing.
We will use curated sets of coding and non-coding RNA transcripts and their genomic locations to derive RNA expression in TARGET patient specimens. Reads from the TARGET RNA-seq data that map to these genomic locations will be used to derive expression profiles for all coding and non-coding RNAs. The results will be used to identify RNA species that are differentially expressed and associated with patient survival, and will be combined with results from high-throughput viability and drug-response screens of pediatric cancer cell lines. RNAs that meet all three criteria will be characterized further through in vitro and in vivo studies: differential expression between patient tumor and normal adjacent tissue (NAT); association with relapse-free survival (RFS) or overall survival (OS); and selective cytotoxicity, either alone or in combination with a chemotherapeutic agent. Petritsch, Claudia UNIVERSITY OF CALIFORNIA-SAN FRANCISCO Deletion of Histone Cluster Genes in Pediatric Brain Tumors Sep05, 2013 closed Mutations in the DNA accumulate during the lifetime of an individual, changing the genetic recipe of their cells. A mutation that provides a cell with an advantage can fuel malignant growth. However, since most mutations do not provide any benefit, it often takes several decades until mutations have altered enough relevant locations in the DNA for cancer to emerge. Pediatric tumors arise as a consequence of relatively few mutations compared with adult malignancies, and recent studies indicate that these few mutations converge on the epigenome to deregulate developmental processes. The epigenome adds a whole new layer of regulation beyond the DNA sequence. It provides a control system of ‘switches’ and ‘tuners’ that adjust the intensity with which genes are expressed. We propose that a critical mutation in the DNA of a subgroup of pediatric tumors (a deletion within chromosome 6) causes a massive change in the epigenome of the affected cells.
Here we will analyze the genome of pediatric brain tumors for the presence of this mutation and its effects on the transcriptional activity of affected tumors. Brain tumors are the most common solid tumors in children, comprising 22% of all malignancies occurring up to 14 years of age. Pediatric brain tumors arise as a consequence of relatively few somatic alterations compared with adult malignancies and remain genetically stable despite progression to invasive or drug-resistant states. Whole genome sequencing of pediatric diffuse intrinsic pontine gliomas (DIPGs) revealed mutations in histone H3.3 and H3.1 in 78% of analyzed cases. In addition, distinct H3F3A mutations define epigenetic subgroups of pediatric gliomas with distinct global methylation patterns. These studies indicate that the few genetic alterations that mark childhood brain tumors converge on the epigenome to deregulate developmental processes. We analyzed the gene expression profiles of 95 pediatric brain tumors and found a subgroup marked by particularly low expression of histone cluster genes (HCGs). This expression signature was found in 16 medulloblastomas, 5 gliomas and 2 PNETs, indicating that loss of HCG expression is a common feature of a variety of brain tumors. This subgroup was further characterized by globally increased transcriptional activity. As HCGs lie close together within a narrow region of chr6p22, we hypothesize that a deletion causes downregulation of HCGs in a variety of pediatric brain tumors. We will analyze the genetic landscape of pediatric brain tumors for deletions of chr6p22. We hypothesize that the deficiency in HCG expression impedes the repressive effect of nucleosome structures on transcription, causing a hyperactive transcriptome, with the identity of upregulated genes varying with the cell type in which the deletion occurred.
We will investigate whether deletion of HCGs affects global transcription levels by comparing the fraction of up- and downregulated genes between HCG-deleted and wild-type tumors. A negative correlation between HCG copy number and global transcription would support a causal relationship between HCG deletion and transcriptional hyperactivity. Pfister, Stefan GERMAN CANCER RESEARCH CENTER Germline predisposition study May08, 2019 closed In this pediatric pan-cancer study, we aim to study the germline of different cancers and predict common and cancer-specific predisposition genes that could help us establish a germline mutational landscape across different pediatric cancer entities. In this pediatric pan-cancer germline study, we aim to analyze pediatric patients with different cancers from different cohorts, covering a wide spectrum of pediatric cancers such as medulloblastoma, Ewing sarcoma, neuroblastoma, ETMR, AML, ALL, osteosarcoma, and Wilms tumor, among others. The workflow would include analysis of whole-genome and exome sequences from blood and tumour samples for rare damaging germline mutations in cancer predisposition genes. Where applicable, DNA methylation profiling would be done to determine the molecular subgroups. Other data types, such as RNA-seq and GWAS data, would also be included as available. We aim to predict known and novel cancer-specific predisposition genes on the basis of rare variant burden tests as previously described in Waszak et al., 2018. Previously defined somatic mutational signatures would be used to further classify cancer genomes into different groups using previously established criteria. Progression-free survival and overall survival would be modelled for patients with a genetic predisposition to the respective cancer.
Through this, we further aim to present an extensive pediatric pan-cancer analysis and explore the prevalence of damaging germline mutations in known cancer predisposition genes across different pediatric cancers in our cohort. This would enable us to propose criteria for routine genetic screening of patients with a specific cancer based on the corresponding clinical and molecular tumour characteristics. In addition, we have assembled the largest medulloblastoma (MB) germline cohort, comprising 1070 MBs, and will conduct a follow-up study to explore novel genome-wide cancer predisposition genes by expanding the current cohort. The addition of new MBs (phs000218) would help us assess our extended aims and increase the sample size of this large cohort. Pfister, Stefan GERMAN CANCER RESEARCH CENTER Molecular analysis of pediatric extracranial rhabdoid tumors May23, 2016 closed Rhabdoid tumors represent a pediatric tumor entity associated with high mortality: treatment options are limited and, despite aggressive chemotherapy, many children succumb to their disease. We aim to analyze the methylation level of pediatric rhabdoid tumors to investigate whether demethylating therapy (with substances that are under clinical investigation in other diseases) may represent a promising strategy in these tumors. We aim to analyze pediatric renal rhabdoid tumors. This research aim can only be accomplished using pediatric data, as this is a pediatric entity. The analysis will help us understand the underpinnings of the disease and improve treatment. More specifically, we will investigate the global methylation level of pediatric rhabdoid tumors to see whether demethylating therapy might be promising for further in vivo and preclinical studies. In the last year, we have made extensive progress in the characterization of MRT and ATRT-MYC and found that both MRT and ATRT-MYC display a high degree of similarity at the level of immune cell infiltration.
This has required the analysis of RNA-seq and ChIP-seq datasets provided within the scope of this project. Philippakis, Anthony BROAD INSTITUTE, INC. Broad Institute FireCloud Data Sciences Platform Aug30, 2018 expired The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic data and associated clinical annotations collected from various studies is critical to accelerating research and making new discoveries. This project supports the model for data analysis that allows groups ranging in size from single laboratories to large research consortia to derive value from the investments made in The Cancer Genome Atlas data without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. FireCloud began as one of three NCI Cancer Genomics Cloud Pilots, a program to support a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. Now, as an NCI Cancer Genomics Cloud Resource, FireCloud continues its operations to allow users to access data sets and analytical pipelines. As this initiative has grown, PI leadership has expanded to support the creation of new features as well as ongoing operations. Primary data for this project will include open- and controlled-access TCGA and TARGET data. Open-access data will include clinical, Level-3 molecular, and somatic mutation data. Controlled-access data will include Level-1 sequence and SNP-chip data. Users will also be able to upload and compute on private data.
The research objective of this project is to pilot hosting TCGA and TARGET data in a cloud to allow users who do not otherwise have access to the necessary infrastructure to compute against this large dataset quickly and efficiently. Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. In support of this effort, our institution has applied for and been granted NIH Trusted Partner status. FireCloud was launched in January 2016 and makes TCGA and TARGET open-access data available to the public. Controlled-access data are made available to those users of the FireCloud platform who have been authenticated through NIH and verified to have received dbGaP access for use of controlled-access data. We will have completed a Security Impact Assessment (SIA) and received NCI’s approval. We will have implemented the necessary authentication and authorization protocols to ensure that only dbGaP-authorized users will be able to gain access to the controlled data. Philpott, Anna UNIVERSITY OF CAMBRIDGE Investigating the role of ASCL1 in spontaneous regression of neuroblastoma Sep26, 2019 closed Neuroblastoma (NB) is thought to be a cancer that is locked in a state where cells continually divide and cannot turn into neurons, a process termed differentiation. NB patients with stage 4S disease have a greatly improved prognosis, and their tumours spontaneously regress; however, neuroblastoma in most patients, especially older patients, does not regress. The reasons behind this are still largely unknown. We have gathered evidence to suggest that the activity of a protein, ASCL1, determines whether neuroblastoma cells divide or differentiate. In neuroblastoma cells grown in the lab, overexpression of a modified version of ASCL1 results in the altered expression of a group of genes, and this ultimately stops cells dividing and causes differentiation.
We aim to discover whether ASCL1 activity is causing tumours in stage 4S patients to differentiate and subsequently regress. We will do this by testing whether the genes that are highly expressed in 4S neuroblastoma patients are the same as our recently discovered ASCL1-regulated genes. If so, this will suggest that ASCL1 may cause regression of neuroblastoma by making tumour cells differentiate. This will justify trying to force ASCL1 to make tumours differentiate in other incurable stages of neuroblastoma and bring about regression for patient benefit. Neuroblastoma (NB) is thought to arise from noradrenergic neuroblast precursors, which might be sensitive to differentiation therapy. Interestingly, NB that resembles neuroblastic tissue, which can occur as metastatic (stage 4S) NB, can spontaneously differentiate, resulting in disease regression. However, most patients, especially older ones, do not undergo regression, pointing to a developmentally regulated loss of NB’s intrinsic differentiation capacity. ASCL1 is required for both proliferation and differentiation of NB cells. Phosphorylated ASCL1 directs a pro-proliferative transcriptional programme, while a less phosphorylated form directs a differentiation programme. Previous work has shown that cyclin-dependent kinases (CDKs) phosphorylate ASCL1 and that CDK inhibitors prevent ASCL1 phosphorylation and drive differentiation in a range of NB cell lines. Therefore, dephosphorylation of ASCL1 to drive differentiation represents a new therapeutic approach for NB. We aim to investigate the potential role of ASCL1 in regression of neuroblastoma. Using our phosphomutant ASCL1 (S-A ASCL1) gene signature (genes specifically upregulated by phosphomutant ASCL1 when overexpressed in the SH-SY5Y NB cell line), we will investigate the enrichment of the most strongly upregulated S-A ASCL1 genes in NB patient samples.
Strong upregulation of S-A ASCL1 genes in 4S NB patient samples, but not in other stages of neuroblastoma, would indicate that dephosphorylated ASCL1 could be driving differentiation and playing a role in spontaneous regression of NB. Simultaneously, we will investigate the transcriptomic changes induced by CDK inhibition in neuroblastoma cells in vitro and investigate whether these changes recapitulate the favourable transcriptomic landscape of 4S NB. If so, this would support our theory that CDK inhibitors (CDKi) can be used to dephosphorylate ASCL1 to drive differentiation and regression of NB. This research can only be performed using paediatric data and aims to develop more effective treatments for NB. There are no planned collaborations with any other institutions. Piccolo, Stephen BRIGHAM YOUNG UNIVERSITY Evaluating the landscape of compound heterozygous and de novo variants in pediatric diseases Jul02, 2018 closed The majority of pediatric cancers have no strong external risk factors, emphasizing the need to understand potential genetic associations that could be involved. In other types of childhood disease, such as birth defects, the underlying cause is usually unknown (this is true for the majority of birth defects), but genetic variants and environmental factors are likely contributors. It is estimated that only about 8.5% of pediatric cancer patients have identifiable germline mutations in cancer-predisposing genes. With such a low detection rate of variants in pediatric cancers, and with few cancers linked to identifiable germline mutations, there is an urgent need to further elucidate the genetic factors associated with these and other rare diseases. We aim to use trio data (data from a patient and both parents) to identify certain types of variants that may be frequently missed when population-based studies are employed.
Once we identify rare variants using trio data, we will determine whether the same genes are affected in larger populations of patients with similar cancer/disease types. Compound heterozygous and de novo variants are potential causes of pediatric cancer and other childhood genetic diseases such as birth defects. Compound heterozygous variants occur when each parent contributes one recessive disease allele and these alleles are at different loci within the same gene. De novo variants arise in germ cells and may be passed to offspring, though usually not to more than one child. We hypothesize that many pediatric diseases arise from these types of variants and that a trio-based approach will enable us to identify variants that would not be observed using population-based methods. Filtering genetic variants based on population frequencies may cause the investigator to overlook compound heterozygous variants, because either allele may be present at moderate frequency in the population even though the allelic combination may occur much less frequently. Thus, access to trio data helps ensure that such variants are not filtered out, as we will be able to compare the patient's DNA profile to those of the patient's parents. These variants will also be compared to the latest genome annotations to determine their relationship to known transcripts and genes. Using dbGaP datasets, de novo and compound heterozygous variants will be identified from trio data and then compared among patients with similar cancer/disease types to determine whether variants in the same genes are found. Doing this will help us determine whether similar de novo and compound heterozygous variants commonly affect the same genes across pediatric cancers and other childhood genetic diseases. While the exact same variants are unlikely to recur from patient to patient, there may be genes or pathways that are commonly found to have these types of variants.
If this is seen, it will suggest that low-frequency variants are being overlooked in analyses and, thus, that particular genes linked to pediatric cancers and other childhood diseases are being overlooked. We will also look for patterns that span multiple types of pediatric diseases to evaluate whether diseases that seem to have different etiologies may have similar genetic underpinnings. Our analyses do not include evaluations of population genetics of ancestry groups. Pillai, Manoj YALE UNIVERSITY Aberrant RNA Splicing in Pediatric Cancers Jun06, 2016 closed Our laboratory is interested in how RNA splicing (a molecular process by which RNA molecules are cut and rejoined) changes in cancer in pediatric patients. By using RNA-Seq datasets from TARGET, we hope to understand this process better. Understanding splicing better may help us devise better therapies for these cancers. We request access to the TARGET dataset to study RNA splicing in pediatric cancer. Our group is interested in how alternative splicing regulates key steps of cancer initiation and progression in children. By analyzing next-generation sequencing data, we hope to define specific patterns of alternative splicing. Specifically, our results will be correlated with the mutational status of the particular cancer. Analysis will be performed using software such as rMATS, MISO and SplAdder. We hope to find specific patterns of altered splicing associated with distinct subsets of cancers, which in turn may inform the development of novel therapeutics. TARGET datasets are requested since pediatric cancers cannot be studied adequately with the current TCGA datasets, which contain only adult samples. Pirooznia, Mehdi NIH Genotyping analysis of Acute Myeloid Leukemia for tryptase expression Jul31, 2019 closed Our goal is to determine tryptase genotype and gene expression in individuals with myeloproliferative diseases, such as certain blood cancers like AML, and, where possible, how this may affect the patients.
Our plan is to re-analyze sequences mapping to the tryptase locus, using unique isoform-specific mutations we have identified, to determine the specific tryptase gene expression profile in these cells. Little is known about the composition, regulation, or effect on clinical phenotype that tryptase proteins may have in AML. Recently, we have generated evidence that increased germline copy number of one of the tryptase-encoding genes affects myeloid cell homeostasis. Given that this change has never been observed to be somatic (and cannot be confirmed to be germline in any of the datasets we have examined), it is critical to evaluate this finding in a robust cohort/dataset with linked whole-genome sequence (WGS, which is far superior to WES for bioinformatic determination of tryptase genotype). Such a dataset does not currently exist outside of TARGET-AML. We also hypothesize that, because of the germline nature of this genetic variant (as opposed to other somatic variants identified in AML), any effects on clinical outcomes or phenotype will be most profound within the pediatric population, who are more likely to have a heritable genetic trait associated with their transformation, given that AML is a predominantly adult malignancy. Finally, the tryptase isoform encoded by extra germline copies (which leads to approximately 5-fold over-expression) is entirely absent in any form in ~30% of the U.S. population without observable pathology. Thus, targeting tryptase, as well as the transcriptional programs associated with its over-expression, represents a novel therapeutic strategy in both adult and pediatric AML populations. We plan to re-align sequences mapping to the tryptase locus using unique isoform-specific variants we have identified in order to identify the specific tryptase gene expression profile in these cells.
Using linked WGS reads, we will then use a script we have developed to perform tryptase genotyping in these individuals and evaluate linked gene expression and clinical metadata. PLASS, CHRISTOPH GERMAN CANCER RESEARCH CENTER MNX1 enhancer hijacking in t(7;12) pediatric AML Jan06, 2023 closed Acute myeloid leukemia is a blood cancer that can affect both very young children and older adults. A translocation between chromosomes 7 and 12 is relatively common in pediatric leukemias. We suspect that this translocation leads to the aberrant expression of a gene, and we would like to collect more data in order to better understand this cancer type. Approximately 4% of acute myeloid leukemia (AML) cases in children under the age of 2 years harbour a translocation t(7;12). Cytogenetically, t(7;12) AML is associated with the occurrence of trisomy 19, but no other recurrent mutations have been described. The breakpoints of the translocation are located close to the MNX1 gene on chr7 and in the ETV6 gene on chr12. All t(7;12) cases have high MNX1 expression, and we hypothesize that this is due to hijacking of an ETV6 enhancer. We already have WGS for two AML t(7;12) cases, and we would like to enrich these data with other cases. The TARGET AML cohort contains more than 200 pediatric AML cases, so it should include several t(7;12) cases. We plan to use the WGS data to precisely characterize the breakpoints of the t(7;12) translocations and compare the breakpoint coordinates between all cases. In addition, we plan to look for mutations in known AML driver genes as well as copy number alterations in order to see whether recurrent concurrent alterations are present. Finally, we plan to use the RNA-seq data to check that the t(7;12) cases have high MNX1 expression and to identify differentially expressed genes between t(7;12) cases and others.
This research will enable a better understanding of this pediatric AML subtype, which is a first step towards therapies targeting this subtype. Pot, David SRA INTERNATIONAL, INC. ISB Cancer Genomics Cloud - TARGET Apr17, 2017 approved The growth of large-scale DNA sequence data for cancer research and its routine use in translational science is rapidly outstripping the required computational capacity for storage, processing, network transmission, and analysis. The ability to access and analyze genomic, proteomic, and imaging data, combined with associated clinical annotations collected from various studies, is critical to accelerating research and making new discoveries. This project aims to support the development of a new model for data analysis that will allow groups ranging in size from single laboratories to large research consortia to derive value from the investments made in the TARGET project without the need to 1) transfer these data to their local site; 2) maintain local copies of these data; and 3) support the massive compute capacity necessary to perform analyses over these data. The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three NCI Cancer Research Data Commons (CRDC) Cloud Resources, supporting a new model for the computational analysis of biological data in which a data repository is co-located with computational capacity, with an interface that provides data access while ensuring data security. The research objective of this project is to provide access to the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) data (hosted in Google Cloud Storage by the NCI Genomic Data Commons) to users of the ISB-CGC, via the Data Commons Framework (DCF), within the NCI Cancer Research Data Commons (CRDC). Doing so will allow users who do not otherwise have access to the necessary infrastructure to compute against this large dataset quickly and efficiently using Google Cloud resources.
Example analyses include mutation calling, integration of data types, and analysis of pathways and regulatory networks. In support of this effort, our institution applied for and was granted NIH Trusted Partner status in July 2015. We have implemented the necessary authentication, authorization, and logging protocols to ensure that only dbGaP-authorized users are able to gain access to TARGET controlled-access data. Our FISMA moderate Authority to Operate is maintained through a continuous process of testing, monitoring and auditing, with issues tracked in our Plan of Action and Milestones (POA&M). POULOS, REBECCA CHILDREN'S MEDICAL RESEARCH INSTITUTE Pan-cancer multi-omic analyses focusing on proteogenomics Feb26, 2020 closed TCGA, TARGET and CPTAC data will be used as validation cohorts. They will allow us to confirm any scientific discoveries that we make using smaller datasets that we generate ourselves. Our research aim is to understand how changes to our DNA affect the expression of proteins in cancer cells of both adults and children. Under no circumstances will we seek to identify individuals using these data. TCGA, TARGET and CPTAC data will be analysed as validation cohorts to confirm findings arising from multi-omic (specifically, proteogenomic) analyses performed using in-house data generated by the Children's Medical Research Institute. In this project, we are seeking to develop a greater understanding of how genomic changes manifest themselves in the transcriptome and proteome in both adult and paediatric cancer. Under no circumstances will we seek to identify individuals using these data.
Pugh, Trevor UNIVERSITY HEALTH NETWORK Beyond somatic coding mutations: subclonal, non-coding and germline contributors to TARGET pediatric cancers Nov13, 2014 approved Recent advances in comprehensive genomic approaches have given rise to a number of tumour sequencing projects, such as Therapeutically Applicable Research To Generate Effective Treatments (TARGET). While these data are valuable on their own, there is great power to be gained through integration and comparison across different models and data types to study these tumours. Specifically, we will use TARGET data to compare the types of cancer cells that we see in patients’ tumours with those that grow in the lab as cell lines or when implanted into mice. We will also look beyond genes that have already been well investigated to search for novel inherited or cancer-specific variants that cause childhood cancer. Finally, we are investigating possible immunogenomic features of pediatric cancers that may inform the outcome of treatment with immunotherapies. Findings from our proposed project may lead to more effective research, treatments, and diagnostic tests for pediatric cancers. We seek to uncover new genomic contributors to the development and progression of pediatric cancer using TARGET sequencing data, beyond those reported by marker papers. Our analysis will focus on six areas: 1) fidelity of subclonal representation of primary cancer cell populations in models of diagnosis and progression (xenografts and cell lines from neuroblastoma, to start); 2) significance analysis of mutational burden within active regions of pediatric cancer genomes defined by ENCODE and internal data sets; 3) statistical analysis of expression germline variants as well as those outside of annotated genes using whole-genome sequencing data; 4) studying copy number variations in targeted panel sequencing data from neuroblastoma patients to find focal and subgenic copy number variations that distinguish high-risk vs. low-risk patients; 5) clonality analysis of T- and B-cell infiltrates to understand immunogenomic correlates of survival; and 6) combining TARGET and CBTTC datasets to study the overall landscape of the immune repertoire in pediatric cancers. The initial study will be performed on these two datasets independently to address technical dissimilarities. These two datasets will also be analyzed together as one pan-pediatric dataset and will be compared to TCGA datasets. Together, these activities will support a program integrating genomic data across different representations of pediatric cancer (primary vs model organisms) as well as different types of genome variation (somatic vs germline, coding vs non-coding) together with the study of the immune cells and patient response. This approach will not create potential risks to participants and will fall under the primary scope of our project. Pugh, Trevor UNIVERSITY HEALTH NETWORK Understanding tumour microenvironment in pediatric neuroblastomas May06, 2021 closed Neuroblastoma is a form of cancer that arises in nerve tissue throughout the body and is a common cancer in children. We will use Kids First Neuroblastoma data to understand how the immune system reacts to this type of cancer and whether specific immune cell types recognize the cancer cells. Findings from this study may lead to effective strategies for using immune cells in the fight against cancer. Objective and aims: To discover mechanisms of immune response, we seek to investigate the tumour microenvironment and its features in pediatric neuroblastoma using the Kids First Neuroblastoma RNA-seq dataset. Our analysis will focus on A) total and cell-type-specific immune infiltration, B) characteristics of the T- and B-cell repertoire in tumour tissue, C) gene expression profiling relevant to immune pathways, and D) correlation of immune attributes with tumour-autonomous genomic aberrations and clinical parameters.
Analysis plan and data usage: We will determine the extent and features of immune infiltration using available immune deconvolution tools such as ESTIMATE, CIBERSORT and quanTIseq. We will align RNA-seq reads to VDJ regions using an immune repertoire tool (MiXCR) to explore the clonality and diversity of the infiltrating immune repertoire. We will study expression of actionable inhibitory and stimulatory genes, such as immune checkpoint genes, to understand mechanisms of immune evasion. Using genomic data, we will be able to study associations between immune infiltration and genomic aberrations and identify the number of patients with mutations in immune-related pathways such as antigen presentation. We will explore the prognostic value of immune microenvironment attributes using clinical parameters. In parallel, we will use the TARGET neuroblastoma dataset as a validation dataset and will address possible technical discordance and batch effects. This analysis will be part of an effort to address immunogenomic features of pediatric nervous system tumours that includes data from CBTTC and ICGC, with the TCGA dataset serving as an adult comparator. This approach will not create potential risks to participants and will fall under the primary scope of our project. Qi, Lei STANFORD UNIVERSITY Assessment of variants interaction on CRISPRi responsive Non-coding elements in gene regulation for pediatric AML patients Oct16, 2019 closed Many genomic variants occur in non-coding regions of the genome, which account for 79% of the variance in GWAS analyses. Based on our experimental results and prioritization algorithm, we have identified several combinatorial effects of causal non-coding regions in oncogene regulation in pediatric acute myeloid leukemia (AML). We would like to integrate our regions with the dbGaP phs000218 genotype datasets to explore SNP-SNP interactions in gene regulation in pediatric AML.
In our project, we aim to apply large CRISPRi-derived datasets to infer causal relationships between genetic mutations occurring in both coding and non-coding regions of the genome and pediatric acute myeloid leukemia (AML). Traditional methods based on GWAS can infer correlational but not causal relationships. Variants associated with disease susceptibility by GWAS are heavily enriched in putative non-coding regulatory elements, which makes these elements potentially powerful for disease diagnosis and prediction. While a growing number of studies suggest that combinations of sequence variants within multiple enhancers of a gene act together, the minor risk effects of individual non-coding regulatory elements make using them for prediction very challenging, often leading to failed predictions. Compared to traditional research, our project adopts a new approach by combining genome engineering datasets with genomics datasets. Ideally, we will combine our pooled CRISPR perturbation screening data with the Therapeutically Applicable Research to Generate Effective Treatments (TARGET; phs000218) dataset to conduct the desired work. We will integrate our results with the various omics datasets generated by the TARGET consortium to identify therapeutic targets in AML patients. If successful, the work will generate significant knowledge and new datasets related to the diagnosis and/or treatment of childhood AML. We will only work with de-identified data and will not attempt to identify any of the participants. Our proposed research does not include the study of population origins or ancestry. The data will only be used for research purposes. Any findings from the study will be published or shared with the scientific community. qin, chen SUN YAT-SEN UNIVERSITY The genetic mutation study in childhood leukemia Jan22, 2015 closed Childhood leukemia, one of the most frequent pediatric cancers, is receiving increasing attention.
Current treatments have resulted in 5-year event-free survival rates of approximately 80% for pediatric acute lymphoblastic leukemia and almost 60% for pediatric acute myeloid leukemia. However, appropriate treatments for several types of childhood leukemia still need to be developed, such as infant leukemia with MLL fusion proteins, which has only 10% overall survival after treatment. Thus, we endeavor to use the TARGET data to improve the understanding of the causes of childhood leukemia, which may help us develop more efficient and specific treatments. Childhood leukemia, one of the most frequent pediatric cancers, is receiving increasing attention. Contemporary treatments have resulted in 5-year event-free survival rates of approximately 80% for pediatric acute lymphoblastic leukemia and almost 60% for pediatric acute myeloid leukemia. However, appropriate treatments for several types of childhood leukemia still need to be developed, such as childhood leukemia with MLL fusion proteins, which has only 10% overall survival after treatment. This poor outcome may reflect a limited understanding of genetic mutations in childhood leukemia. Our objective is to use the TARGET leukemia RNA-seq data to find new mutations in childhood leukemia, including identifying mutations occurring in transcripts and identifying chromosomal fusions and rearrangements. The detailed steps are: (1) use TopHat to analyze the RNA-seq data and Cufflinks for further annotation, using hg19 and RNA-seq data for gene annotation and the NONCODE 4.0 database for non-coding annotation; (2) use TopHat-Fusion, STAR and deFuse to search for gene fusions; (3) classify the samples by the mutations we obtain and try to find features shared across leukemia samples.
These analyses are expected to reveal novel common mutations in pediatric leukemia, helping to improve treatment. Quesnel-Vallières, Mathieu UNIVERSITY OF SHERBROOKE Transcriptomic targeting of cancer Sep12, 2024 approved New cancer treatments use the patient’s own immune system to kill cancer cells without affecting normal cells. These treatments can be more effective and cause fewer side effects than chemotherapy. In order to expand the application of these new treatments, we need to identify the differences between normal and cancer cells. We use DNA and RNA sequencing data from normal and cancer cells to identify molecular patterns that exist in cancer but not normal cells. These molecular patterns are then used to teach the patient’s immune system how to recognize and kill cancer cells. Emerging immunotherapies enable the specific targeting of cancer cells, but this targeting requires the identification of cancer markers. Cancer antigens have traditionally been identified from genomic alterations that result in protein isoforms that are not present in normal cells. Such “neoantigens” can be used to engineer chimeric antigen receptor T cells, therapeutic antibodies or vaccines. Unfortunately, the low number of actionable neoantigens is often a limiting factor in the development of new immunotherapies. The objective of this project is to leverage transcriptomic variations, in particular alternative RNA splicing events, as an additional source of cancer antigens. WGS and WES data will be used to generate genomic profiles and identify neoantigens. RNA-Seq data will be used to generate gene expression and RNA splicing profiles and identify cancer splicing antigens.
Cancer samples from TCGA, TARGET and CPTAC datasets will be compared to GTEx samples in differential gene expression and splicing analyses using stringent criteria to define truly cancer-specific genomic and transcriptomic variations that could be targeted in immunotherapy-based precision medicine. Correlation analyses and unsupervised hierarchical clustering will also be applied to define cancer subgroups that share molecular signatures and to link those signatures to mutations. TARGET datasets: A sub-aim of this project is the development of personalized immunotherapies for high-risk pediatric cancers. Research with the TARGET datasets was initiated by the PI (Mathieu Quesnel-Vallières) while working as a postdoctoral researcher under the guidance of Dr. Andrei Thomas-Tikhonenko (Children's Hospital of Philadelphia). We have been able to use the TARGET AML, B-ALL and NB datasets to identify potential targets for CAR-T therapy, which we are currently validating in the lab. Access to TARGET datasets is requested to maintain the data that were used for this research as well as to pursue studies in other pediatric datasets with the objective of designing therapeutic vaccines. Rabadan, Raul COLUMBIA UNIVERSITY HEALTH SCIENCES Algorithms for the identification of germline variants in cancer using large scale genomic data Jul24, 2014 approved We hypothesize that pediatric tumors have a different burden of germ-line variants compared to adult tumors. To study the role of somatic alterations and germ-line variants and to compare pediatric and adult tumors, we will develop advanced statistical and computational methods to identify candidate cancer genes by combining information from diverse datasets, and compare publicly available pediatric and adult samples. The methods and algorithms developed in this project will be made available to the scientific community through standard scientific platforms.
We are interested in examining the contribution of germ-line variants to pediatric neoplasms. We hypothesize that pediatric tumors have a higher burden of germ-line variants compared to adult tumors. The planned strategy aims to compare (1) the landscape of somatic alterations and germ-line variants in pediatric tumors and (2) the distribution of such mutations with that in adult tumors. In particular, we plan to analyze the spectrum of mutations in pediatric samples of Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and compare it to the adult samples collected by The Cancer Genome Atlas (TCGA) project. For this purpose we will rely on our already developed algorithms for identifying somatic variants, discovering novel gene fusions, detecting subclonal variants with high sensitivity, and integrating copy number and point mutation information. The main purpose is to develop statistical and computational techniques to identify candidate cancer genes by combining the information from diverse datasets collected from a large number of cancer patients across different studies. Radhakrishnan, Ravi UNIVERSITY OF PENNSYLVANIA Multiscale Models for Signaling and Cell Fate in Wilms Tumor Jun03, 2020 closed We develop a mathematical model based on data science and quantitative systems pharmacology to predict the progression of kidney cancers in pediatric patients. Introduction: Wilms’ Tumor (WT) is a common pediatric renal cancer for which there has been great success in developing treatment strategies in a patient-specific manner based on histological findings from excised tumors. However, the pathogenesis of the disease remains poorly understood, as WT stands out as highly heterogeneous amongst childhood cancers, complicating clinical decision making (Ooms 2020).
Better understanding of the molecular mechanisms at play could help clarify what biomarkers indicate early stage disease and higher risk disease, and so allow appropriate and timely clinical actions to be taken. Importantly, no mechanistic model for WT currently exists that accounts for advances in modeling of the p53 DNA damage response pathway and ErbB pathways thought to be canonically involved in WT as well as recent discoveries in WT genetics (Ghosh & Radhakrishnan 2019; Treger 2019). Similarly, recent sequencing studies indicate that WT involves differential behavior of short non-coding RNAs, microRNAs (miRNAs), currently under investigation in many cancers including those of the lung, breast, and gastrointestinal system, but the potential contribution of miRNA profiles to clinical decision making in treating WT remains undetermined (Treger 2019; Peng & Croce 2016). Materials and Methods: To address the need for new clinically actionable models of WT, we construct a mechanistic model that adapts the hybrid multiscale modeling framework of Ghosh & Radhakrishnan 2019; we leveraged their advances in representation of the p53 DNA damage response pathway and ErbB pathways while introducing modifications to account for more recently characterized pathways important to nephrogenesis and newly identified genetic contributors to WT. To investigate the contribution of microRNAs to WT in particular, we additionally designed our model to predict patient-specific treatment responses given patient profiles of microRNA, mRNA, and protein expression, and prescribed treatment regimens. Results and Discussion: We plan to use the TARGET Data to run our model over a group of control patients and WT patients. For each patient, we obtained a cell kill rate, as the net difference of estimated cell growth rate and cell death rate, under patient-specific miRNA and mRNA expression levels and prescribed treatments. 
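To make the link between a per-patient cell kill rate and a phenomenological tumor growth model concrete, here is a minimal sketch assuming simple exponential kinetics. The statement names neither its growth model nor its sign convention, so both are assumptions: the kill rate is taken as death rate minus growth rate (positive means net shrinkage), rates are per unit time, and the function names are hypothetical. Under these assumptions the predicted relative volume change rises monotonically with the growth rate, which is the kind of relationship the results describe.

```python
import math


def cell_kill_rate(growth_rate: float, death_rate: float) -> float:
    """Net kill rate; the sign convention (death minus growth) is an assumption."""
    return death_rate - growth_rate


def tumor_volume(v0: float, kill_rate: float, t: float) -> float:
    """Exponential (phenomenological) model: V(t) = V0 * exp(-k * t)."""
    return v0 * math.exp(-kill_rate * t)


def relative_volume_change(growth_rate: float, death_rate: float, t: float) -> float:
    """(V(t) - V0) / V0 for a tumor of unit initial volume."""
    k = cell_kill_rate(growth_rate, death_rate)
    return tumor_volume(1.0, k, t) - 1.0


# Hypothetical per-patient growth rates under a fixed death rate of 0.1 per unit time.
for g in (0.05, 0.10, 0.20):
    print(g, round(relative_volume_change(g, 0.1, t=10.0), 3))
```

Gompertz or logistic models would cap growth at a carrying capacity; the exponential form is used here only because it is the simplest model that accepts a single net rate.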
The obtained cell kill rate was directly used as an input to phenomenological tumor growth models. Ultimately, we found considerable variability in cell fates both across patients and across treatment options for individual patients. We also found a monotonically increasing relationship between patient tumor volume change and cell growth rates predicted with our model, demonstrating our model’s ability to correctly predict patient treatment responses based on their specific expression profiles. Conclusions: With this model, we aim to provide a mechanistic foundation for empirical models used to study WT and similar cancers. We hope that this study also yields new insights into the role of miRNAs and other WT-associated genetic changes in disease pathogenesis, in a manner that can help inform future study of these cancers. Our future work will focus on performing a detailed sensitivity analysis to characterize the contribution of inherent tumor heterogeneity and specific WT mutations to model outcomes, and on subjecting the framework to clinical validation. Radvanyi, Francois CURIE INSTITUTE Exploration of pediatric cancer aneuploidy landscape Aug31, 2023 expired Accumulation of genomic abnormalities can lead to the development of cancer. Aneuploidy, defined here as the loss or gain of chromosome arms, is one of these mechanisms. This process can be particularly important in pediatric cancers, which harbor fewer mutations and structural variants than adult cancers. Aneuploidy has been extensively studied in adult malignancies but not in pediatric cancers. The objectives are to characterize recurrent arm alterations and to find candidate genes, called “drivers”, that have a role in the development of the cancer among all the genes present in the altered region. These drivers may be potential drug candidates or biomarkers.
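Characterizing recurrent arm alterations reduces to two steps: call each chromosome arm as gained, lost or neutral in each sample, then count how often each event recurs across samples. The sketch below assumes copy-number segments have already been assigned a gain/loss/neutral state (for example from ABSOLUTE output) and uses a hypothetical 80% arm-coverage threshold. It does not reproduce the exact criteria of Taylor et al., 2018, which the project states it will follow, and the function names and arm labels are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

Segment = Tuple[int, str]  # (length in bp, state: "gain", "loss" or "neutral")


def call_arm(segments: List[Segment], arm_length: int, min_fraction: float = 0.8) -> str:
    """Call an arm gained/lost if at least `min_fraction` of its length carries that state."""
    covered = {"gain": 0, "loss": 0}
    for length, state in segments:
        if state in covered:
            covered[state] += length
    for state in ("gain", "loss"):
        if covered[state] / arm_length >= min_fraction:
            return state
    return "neutral"


def arm_event_frequencies(
    samples: List[Dict[str, List[Segment]]],
    arm_lengths: Dict[str, int],
    min_fraction: float = 0.8,  # hypothetical threshold, not Taylor et al.'s exact rule
) -> Dict[Tuple[str, str], float]:
    """Fraction of samples carrying each (arm, gain/loss) event."""
    counts: Counter = Counter()
    for sample in samples:
        for arm, segments in sample.items():
            state = call_arm(segments, arm_lengths[arm], min_fraction)
            if state != "neutral":
                counts[(arm, state)] += 1
    return {event: n / len(samples) for event, n in counts.items()}
```

Arms whose event frequency stands out across a cohort would then be the starting point for searching the altered region for candidate drivers.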
Recent works have shown the importance of aneuploidy in the development of cancer and have proposed tools to identify associated drivers (Taylor et al., 2018, PMID:29622463). These studies have focused on adult cancers, whereas pediatric cancers are known to be less altered and to harbor specific alterations. Therefore, we want to take advantage of the SNP array data from the TARGET project to characterize aneuploidy in the available pediatric cancers: ALL (phs000464.v22.p8), AML (phs000465.v22.p8), neuroblastoma (phs000467.v22.p8), osteosarcoma (phs000468.v22.p8) and Wilms tumors (phs000471.v22.p8). Other pediatric cancers will also be integrated, including medulloblastoma (Northcott et al., 2012, PMID:22832581) and retinoblastoma (Liu et al., 2021, PMID: 34552068; Kooi et al. 2016, PMID:27126562). It is important for us to carry out the analyses starting from the raw data to make the results comparable between different data sources. Briefly, the calling of copy-number alterations (CNA) will be performed following the ABSOLUTE pipeline for each sample (Carter et al., 2012, PMID:22544022). Then, aneuploidy events will be identified using the criteria defined in Taylor et al., 2018 (PMID:29622463). Results will be merged in order to identify recurrently altered arms and to propose associated candidate driver genes based on literature and public databases (COSMIC, PeCan). The identified alterations may also be candidate biomarkers. Ramos Mejía, Veronica ANDALUSIAN PUBLIC FND PROGESS AND HEALTH Integrated analysis of omic data, signaling networks and drug repurposing in pediatric Acute Myeloid Leukemia Apr01, 2020 closed Pediatric acute myeloid leukemia (AML) is a heterogeneous disease; therefore, the appropriate stratification and characterization of the cytogenetic subtypes is crucial.
In our research, we aim to characterize potential biomarkers of AML subtypes through meta-analysis techniques as well as to study the expression patterns and phenotypic characteristics of three specific AML mutations. Our goal is to improve the stratification of pediatric AML patients based on molecular profiles in order to help improve their diagnosis. In addition, we aim to characterize the signaling networks of AML subtypes and carry out in silico studies of drug repurposing. Acute leukemias are the most common type of cancer in children. Although the majority of them are lymphoblastic, pediatric acute myeloid leukemia (AML) has a higher mortality rate. Compared with other cancers, pediatric AML shows a high number of karyotypic abnormalities, making it a heterogeneous disease; therefore, the appropriate stratification and characterization of the cytogenetic subtypes is crucial. In our lab, we focus on the study of pediatric AML, and we have recently acquired the sequencing data of a cohort of children and adolescents diagnosed with several AML subtypes from the Andalusian Health System. In our research, we aim to characterize potential biomarkers through meta-analysis techniques and integrated analysis of heterogeneous omic data from public cohorts and from our own pediatric AML patients. We also plan to study the expression patterns and phenotypic characteristics of three specific mutations. Our goal is to improve the stratification of pediatric AML patients based on molecular profiles in order to help improve their diagnosis. In addition, we will try to identify the signaling and regulatory pathways associated with the tumor phenotype in the different subtypes of AML. We will ultimately use these data to characterize the signaling networks of AML subtypes and carry out in silico studies of drug repurposing.
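The statement does not specify which meta-analysis technique will be used. One common choice for combining per-study effect sizes (for example, log fold changes of a candidate biomarker computed with the same pipeline in each cohort) is fixed-effect inverse-variance weighting, sketched below as an assumption rather than the authors' method. A random-effects model would be more appropriate if the studies turn out to be heterogeneous; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_meta(
    effects: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis.

    Returns (pooled effect, pooled standard error, z statistic, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|)) for a normal z
    return pooled, pooled_se, z, p


# Hypothetical log fold changes and standard errors from three independently processed cohorts.
print(fixed_effect_meta([0.8, 1.1, 0.6], [0.30, 0.25, 0.40]))
```

Because each cohort contributes only a summary statistic, this fits the plan of processing raw data with the same methods while never pooling the raw data itself.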
We will evaluate available clinical data such as overall survival, event-free survival, remission and relapse variables in all our data sets, in association with our results from sequencing analysis and the cytogenetic variant information. Here we request access to raw data for the purpose of studying the subtype-related mutations and also to homogenize the data processing of several AML studies in order to perform meta-analysis. We will combine the processed data of the TARGET-AML and TCGA-LAML projects with other AML data sets as well as our own samples in order to perform meta-analysis. The raw data will not be combined but will be analyzed using the same methods. Ramsingh, Giridharan UNIVERSITY OF SOUTHERN CALIFORNIA RNA expression of TP53 mutated and TP53 unmutated AML Oct16, 2017 expired As individuals age, normal blood cells undergo a decline in function through the process of senescence. Occasionally, however, acquired genetic changes allow these stem cells to escape senescence and undergo leukemic transformation. We would like to determine which changes lead to senescence and which lead to cancer formation by analyzing the gene expression in these cells. Healthy individuals contain circulating healthy and aged (senescent) blood stem cells. We have completed RNA sequencing in these cells to look at their gene expression. We request gene expression data from leukemia patients to compare our results with those of leukemia. We comprehensively characterized the expression of transposable elements (TEs) in 178 TCGA AML patients and identified a prognostic signature based on the expression of 5 transposable elements. The expression of these 5 transposable elements can classify adult AML into good-risk and poor-risk AML. We would like to validate the predictive power of these TEs in pediatric cancers. We will use the transcriptome data from the TARGET database for this discovery.
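Validating a fixed expression signature in a new cohort usually means applying the score and cutoff derived in the discovery cohort rather than re-deriving them in the validation data. The sketch below assumes the 5-TE signature takes the form of a weighted sum of expression values with a single cutoff; the TE names, coefficients and cutoff are hypothetical, since the statement does not give them.

```python
import statistics
from typing import Dict, List, Optional


def risk_scores(samples: List[Dict[str, float]], coefficients: Dict[str, float]) -> List[float]:
    """Weighted sum of TE expression per sample (weights are hypothetical)."""
    return [sum(w * sample[te] for te, w in coefficients.items()) for sample in samples]


def classify_risk(scores: List[float], cutoff: Optional[float] = None) -> List[str]:
    """Label samples 'poor' above the cutoff, else 'good'.

    For validation, pass the cutoff fixed in the discovery cohort; the median
    fallback is only meant for exploratory use.
    """
    if cutoff is None:
        cutoff = statistics.median(scores)
    return ["poor" if s > cutoff else "good" for s in scores]


# Hypothetical two-TE example (the actual signature uses 5 TEs).
coefs = {"TE_A": 1.0, "TE_B": -0.5}
cohort = [{"TE_A": 2.0, "TE_B": 0.0}, {"TE_A": 0.0, "TE_B": 2.0}, {"TE_A": 1.0, "TE_B": 1.0}]
print(classify_risk(risk_scores(cohort, coefs), cutoff=0.0))
```

The resulting good/poor labels would then be compared against the survival outcomes available for the pediatric samples.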
Rand, Vikki UNIVERSITY OF NEWCASTLE Unravelling the genomic complexity of aggressive paediatric B-cell non-Hodgkin lymphoma to improve patient outcome May10, 2018 closed Understanding the biology of paediatric B-NHL will enable us to improve the lives of children diagnosed with the disease. Current treatment is associated with serious side effects, and for children with relapsed or primary refractory disease the outcome is dire. More effective treatments are urgently required. Using a combination of cutting-edge techniques, the importance of genetic mutations and their role in the B-NHL cancer cell will be investigated. In this application we request access to genomic and transcriptomic data from BL and DLBCL cases managed by dbGaP. Access to these data will enable us to determine the incidence of abnormalities and assess their relevance as therapeutic targets in paediatric disease. Treatment for childhood Burkitt lymphoma (BL) and diffuse large B-cell lymphoma (DLBCL) has a high chance of success but at the cost of aggressive multi-agent therapy, resulting in distressing and dangerous side effects and prolonged periods of hospitalisation. Furthermore, for children with refractory/relapsed disease, treatment options are limited and frequently unsuccessful, with <30% of children salvaged by second-line treatments. Recent advances in understanding the biology of BL and DLBCL have identified recurrent abnormalities in genes which may offer alternative treatment options, both to reduce treatment-related toxicity and to improve outcome for patients who do not respond to current treatment protocols. To understand their relevance in paediatric disease, we have established a cohort of paediatric B-NHL samples which we are characterising using next-generation sequencing and arrays. Paediatric BL and DLBCL are rare cancers with <6 cases/100,000 per year, and we are requesting access to published data to increase the number of cases in the study.
Primarily we propose to use the datasets to validate our findings and compare them across other cancers and haematological malignancies. In addition, we aim to use the requested data to understand the differences between paediatric and adult disease with a particular focus on relapsed/refractory B-NHL. As such, we also request access to age, outcome, relapse/refractory status and survival data if available. Data requested from dbGaP will be integrated with our data and data from GEO or ArrayExpress. Data integration, which aims to identify recurrent abnormalities, will be undertaken in Nexus Discovery Software. Our analysis will not create additional risks to participants. No identifiable patient information is required or requested. Newcastle University’s IT will provide a data management and storage solution which meets the standards outlined. Data will be stored on an offline workstation, encrypted using BitLocker, with access via a controlled USB flash drive or a very strong 12-character password. dbGaP data will not be copied from the workstation. It is our intention to publish or otherwise broadly share any findings from our studies with the scientific community. Rechavi, Gidi SHEBA RESEARCH FUND Whole genome analysis of pediatric tumors for identification of somatic transpositions resulting in dsRNA, aiming to augment immunotherapy Oct28, 2021 approved A major advance in oncology in recent years is the development of checkpoint inhibitors, antibodies that augment the immune response towards tumors (the discovery of this therapy was awarded the Nobel Prize in Medicine in 2018). Unfortunately, and in contrast to adult tumors, the success of this approach in childhood cancer is very limited. About half of the human genome is composed of transposable genetic elements, gene segments that can move from one place to another in the human genome. Our previous work demonstrated the role of such elements in cancer.
Recently, it was shown that the enhanced movement of such elements in the genome of the cancer cell may be an "Achilles heel" that could make it possible to augment therapy with checkpoint inhibitors in a way that will allow us to treat, and hopefully cure, childhood cancer types for which the success rate was until recently low. As a pediatric Hematologist-Oncologist (former head of the Ped. Hem-Onc Department at Sheba Medical Center), my two main research avenues are RNA modifications (including Adenosine-to-Inosine editing) and the study of transposable genetic elements. Our group was the first to demonstrate the role of transposable genetic elements in mammals and provided the first examples of involvement of such transposition events in cancer (Rechavi et al, Nature, 1982; Rechavi et al, Nature 1983; Rechavi et al, Nature 1988; Amariglio et al PNAS 1991). Large-scale analysis of thousands of adult tumors revealed that a subset of them has a very high frequency of transposition events (Rodriguez-Martin et al, Nat. Genetics, 2020). We were the first to develop the methodology for whole-transcriptome identification of Adenosine-to-Inosine editing events (Levanon et al Nature Biotechnology 2004) and among the first to decipher the role of editing, mediated by the ADAR1 enzyme, in the stabilization of retrotransposon-derived dsRNA structures, thus dampening the response of the innate immune system (Solomon et al Nature Communications 2017). Checkpoint inhibitors, which play a major role in adult oncology, have so far had very limited success in pediatric oncology, probably due to the low mutation load in childhood cancer (Morad G et al Cell 2021). Recently, several groups, including ours, have been working to augment the effect of checkpoint inhibitors in tumors with primary or secondary resistance by combining checkpoint inhibitors with augmentation of the immune response, activation of the interferon pathway, and turning a "cold" tumor microenvironment "hot".
There are indications that tumors which include a high number of retrotransposition events can be manipulated (e.g., by silencing ADAR1 expression) (Ishizuka et al Nature 2019) in a way that will activate the immune system and improve the response to checkpoint inhibitors. Currently, there is very limited information regarding somatic retrotransposition in pediatric tumors (in contrast to the information available for many adult tumors). In the suggested project we aim to identify those pediatric tumors that have a high somatic retrotransposition rate using an advanced bioinformatics approach we developed and published (Jacob-Hirsch et al, Cell Research, 2018). The suggested research may identify specific pediatric tumors, or subsets of specific tumors, that can benefit from immunotherapy combined with dsRNA-related innate immunity manipulation. Reina, Oscar KAROLINSKA INSTITUTE Elucidating the cellular composition of high-risk neuroblastoma Feb17, 2022 approved Neuroblastoma (NB) is a cancer that frequently originates in the adrenal glands. This malignancy comprises 8-10% of all childhood cancer cases worldwide and is responsible for 15% of childhood cancer deaths. In my research I have found that tumor-cell populations differ between low- and high-risk NBs. In particular, a population of progenitor cells with migratory and mesenchymal signatures that exists in the postnatal adrenal gland is common to high-risk tumors but absent in favorable malignancies. How are different cell types associated with metastasis in neuroblastoma? What is the origin of this malignancy? In this project I aim to determine the cell-type composition of favorable and high-risk neuroblastomas, pheochromocytoma and paraganglioma. The abundance of each cell type will be estimated. Discovering the characteristic cell composition of high-risk neuroblastomas can unlock new medical treatments and contribute to the early diagnosis of tumors with metastatic potential.
The clinical hallmark of neuroblastoma (NB) is heterogeneity. This heterogeneity features a broad range of outcomes classified into various risk and prognosis groups. In particular, patients diagnosed with high-risk NBs have unfavorable outcomes characterized by tumor relapse, metastasis and ineffective chemotherapy. In my research I have found that tumor-cell populations differ between high- and low-risk NBs. Notably, high-risk NB resembles a previously unknown subtype of TRKB+ cholinergic progenitor population identified in the human postnatal adrenal gland [1]. In connection with this finding, the role of the identified cell populations in NB and the adrenal gland needs to be clarified, for instance in more differentiated forms of the malignancy (i.e. ganglioneuroma and ganglioneuroblastoma), and in rare sympathoadrenal tumors with no established cellular composition (i.e. pheochromocytoma and paraganglioma, PCC/PGL). The present project will help clarify the role of the newly identified cell populations in more benign forms of NB and in other sympathoadrenal malignancies (i.e. PCC/PGL). Understanding the role of these cell populations in metastasis and in other forms of sympathoadrenal cancers will provide critical insight into their developmental origin and contribute to developing targeted therapeutic strategies that are currently missing for high-risk neuroblastoma. The overall aim of the project is to determine the cell composition of neuroblastoma, pheochromocytoma and paraganglioma. The specific goals of the project are: 1) Determining the cell-type composition of bulk-sequenced NB samples, using as reference the cell populations recently reported for the post-natal adrenal gland and NB [1]. 2) Determining the cell-type composition of bulk-sequenced PCC/PGL samples, using as reference the cell populations recently reported for the post-natal adrenal gland and NB [1]. 3) Determining the transcriptional changes between homologous cell types in the different malignancies.
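Goals 1 and 2 amount to estimating, for each bulk sample, the proportions of reference cell populations whose expression signatures are known. As a minimal sketch of reference-based deconvolution, the code below solves a non-negative least-squares (NNLS) problem by projected gradient descent and rescales the result to proportions. This is a swapped-in, simpler technique: CIBERSORT itself uses support vector regression. The signature matrix and bulk profile are synthetic, hypothetical values, and the function names are illustrative.

```python
from typing import List


def nnls_projected_gradient(S: List[List[float]], b: List[float], n_iter: int = 5000) -> List[float]:
    """Minimize 0.5 * ||S x - b||^2 subject to x >= 0 (S: genes x cell types)."""
    n_genes, n_types = len(S), len(S[0])
    # 1 / ||S||_F^2 is a safe step: it never exceeds 1 / (largest eigenvalue of S^T S).
    step = 1.0 / sum(v * v for row in S for v in row)
    x = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(S[g][k] * x[k] for k in range(n_types)) - b[g] for g in range(n_genes)]
        grad = [sum(S[g][k] * residual[g] for g in range(n_genes)) for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[k] - step * grad[k]) for k in range(n_types)]
    return x


def cell_type_proportions(S: List[List[float]], b: List[float], n_iter: int = 5000) -> List[float]:
    """Non-negative weights rescaled to sum to one."""
    x = nnls_projected_gradient(S, b, n_iter)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Synthetic signature (4 marker genes x 3 cell populations) and a bulk profile mixed 50/30/20.
signature = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [2, 2, 2]]
bulk = [5.3, 2.6, 2.3, 2.0]
print([round(p, 3) for p in cell_type_proportions(signature, bulk)])
```

In practice the signature matrix would come from the single-nuclei reference data, restricted to informative marker genes, and the bulk vector from the quantified expression of each sample.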
Study design and analysis plan: Bulk-sequenced RNA samples obtained from the dbGaP database will be aligned to the human genome (version GRCh38) with STAR, and gene expression will be quantified using HTSeq. The computed gene expression will be used to determine the proportions of the cell populations previously identified for developing and post-natal adrenal glands and neuroblastomas [1]. To do so, two different approaches will be used: deconvolution (using CIBERSORT or a similar tool) and projection (using Canonical Correlation Analysis, CCA, or Non-negative Matrix Factorization, NMF). Consistency with the data use limitations: For phs000467: to obtain cell composition, previously sequenced NB and post-natal adrenal glands will be used. The results of this study will provide critical insight into the developmental origin of high-risk neuroblastoma and contribute to developing targeted therapeutic strategies. For phs000178.v11.p8: genomic summary results will not be computed. NOTE: The analyses of the phs000467 and phs000178.v11.p8 datasets will be conducted independently. To compute the proportions of cell populations, for each study case the single-nuclei/cells data recently reported by us will be used as reference (i.e. data will not be combined). Collaboration with researchers at other institutions: None planned. References: [1] Bedoya-Reina O.C. et al. Single-nuclei transcriptomes from human adrenal gland reveals distinct cellular identities of favorable and unfavorable neuroblastoma tumors. Nat. Communications. 12.1 (2021): 1-15 RELLING, MARY ST. JUDE CHILDREN'S RESEARCH HOSPITAL whole exome analysis of severe osteonecrosis in young children with acute lymphoblastic leukemia (ALL) Feb01, 2011 closed Our goal is to identify genetic risk factors for one of the main side effects of chemotherapy, osteonecrosis or avascular necrosis, in children with acute lymphoblastic leukemia.
We will compare the genetic features that we measure in children who developed osteonecrosis (measured in a project conducted at St. Jude) with those of children with leukemia who did not develop osteonecrosis (deposited in dbGaP). Glucocorticoids are an invariant component of regimens to treat acute lymphoblastic leukemia (ALL) and are among the most widely used drugs world-wide. Glucocorticoid-induced osteonecrosis (ON) is the primary dose-limiting toxicity in children with ALL and occurs in 10-30% of older children (> 10 yrs of age) with ALL. Patients < 10 years of age are at a ~4- to 5-fold lower risk of ON than older patients, and grade 3-4 ON occurs in only 2.5% (5 of 196 whites) of children < 10 years with ALL. However, this adverse drug effect can occur in any patient taking glucocorticoids. Our laboratory focuses on this adverse effect: we created the first and only murine model of ON (Yang L et al), and we have conducted several candidate gene (Relling et al; French et al) and genome-wide (GW) studies (Kawedia et al, submitted) of this complication in unselected children with ALL. In order to find the most penetrant polymorphisms associated with risk of ON, we are in the process of performing whole exome sequencing (as part of a project funded by NIH’s Pharmacogenomics Research Network) on ~25 patients from the “extremes” of the distribution of patients at risk of ON: these are young children (< 10 yrs of age) with ALL, only ~0.5-2% of whom develop ON. These children have been enrolled on front-line studies for ALL conducted by the Children’s Oncology Group (COG) or St. Jude Children’s Research Hospital. All patients received their therapy as part of controlled prospective clinical trials within the COG or St. Jude, and adverse events (i.e. ON) were prospectively ascertained and graded according to a standardized phenotyping scale (i.e. the NCI CTEP grading scale).
For this analysis, all cases will have severe (grade 3 or 4) ON, and all controls will have no AVN (grade 0). We will focus on children < 10 years of age at diagnosis of ALL who are genomically-defined whites (to minimize problems comparing cases to controls), and we have determined that we have adequate high-quality germline DNA on 20 children from the COG’s AALL0232 and AALL0331 studies, and 5 children from St. Jude’s Total XV study. Our goal is to use whole exome sequencing to determine whether there is an excess of deleterious variants in specific genes or pathways relative to the baseline frequency of inactivating variants across the exome in a group of young children who develop severe ON, compared to a control group of unaffected individuals. Although our sample size is small, we will perform our statistical testing in a staged manner to reflect our prior information. The purpose of this dbGaP request is to gain access to protected whole genome or whole exome sequence data from children with ALL to serve as “no-ON” controls. Because there are some germline polymorphisms that differ in frequency between children with ALL and non-ALL controls (Trevino et al), the best control group for this project comprises children with ALL. Therefore, this proposal complies with the restriction that “requests for access to protected data will only be considered for those research projects using the data for research relevant to the biology, causes, treatment and late complications of treatment of pediatric cancers. Access to protected pediatric data will be granted solely for those research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers.” Furthermore, we have several methods for future validation of possible hits from this project. 
There are over 2500 children enrolled on the COG’s AALL0232 study (most > 10 years of age), and we have germline DNA for all these children. In some subgroups of this cohort, the frequency of ON is 10-20%, and genes and pathways that predispose to severe ON in younger children may harbor variants in older children as well. In addition, we have the only mouse model for glucocorticoid-induced ON (Yang et al), and we can test whether mice that are homozygous or heterozygous deficient for the most affected genes are more susceptible to the phenotype than their wild-type counterparts. We will use controls (genomically-defined whites) from at least two sources: (1) at least 20 ALL controls: patients with ALL whose DNA has been sequenced via whole exome or whole genome sequencing (average of 30X coverage) in COG’s TARGET project; and (2) the general population (available via public databases and via the PGRN; we presume that whole exome data from at least 100 normal control whites will likely be available from NHLBI and other funded projects). (1000 Genomes data will be less valuable to us because they are low-coverage only: several of our planned statistical evaluations rely on the presence of singletons, which this database is not powered to detect due to its low-coverage design.) Assumptions:
• Among 18,000 known genes, 15,000 genes will have 90% of their coding region investigated at high coverage.
• A total of 5500 genes are annotated to 1030 canonical pathways (Reactome).
• The baseline level (in controls) of protein-coding-changing variants (NS, stop, nonsense, and gene-inactivating indels) is 10,000/person.
• Moreover, we assume that rare variants are more likely to be penetrant for this extreme phenotype, so we assume that variants with an MAF < 5% are most likely to be important for this phenotype, which occurs in ~2% of patients. We estimate that 25% of NS variants have MAF < 5%, or ~2500 variants/person.
• Using SIFT (http://sift.jcvi.org) and ALL controls, we estimated that 20% of all coding NS variants were predicted to be damaging to function.
• Thus, we estimate that only ~500 low-frequency variants/person are substantially deleterious (NS with damaging effects, splice site, stop, or nonsense codons, and with MAF < 5%).
Nonetheless, it will be a challenge to discern an increase in deleterious variants that accumulate in (a) one gene or a gene family or (b) a biological pathway to a substantially higher level in the cases compared to the controls. Based on whole exome sequencing of 20 ON cases and assuming the same number of controls, we will assess the number of genomic variants [coding synonymous, coding nonsynonymous (benign, deleterious by SIFT, stop and nonsense variants), and insertions/deletions disrupting the gene] and annotate these variants to genes. We will assess whether any one gene (of the 18,000 interrogated) has an excess of deleterious variants in the group of 20 cases vs 20 controls. We will also assess whether any of several candidate pathways have an excess of deleterious variants in the group of 20 cases vs 20 controls. Variant calling / genotyping: We will use whatever genotype calls are available from the data deposited at dbGaP; however, we also request access to the BAM files from all germline and diagnostic ALL blast DNA from children with ALL who are part of this data set. Although we will only compare germline variants between ON cases and controls, calls in the ALL blast (tumor) DNA can inform the accuracy of the calls made from the germline DNA (i.e. a variant present in both germline and tumor is more likely to be a true variant relative to the reference genome than a variant seen only in the germline DNA). We are aware that, due to variable coverage, the variant status at some sites will be uncertain.
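The assumptions above chain into the estimate of roughly 500 damaging low-frequency variants per person, and from there into the per-gene carrier frequency used for controls in the power calculation (which uses 2.7%). The arithmetic, written with integer percentages to avoid floating-point noise; variable names are illustrative:

```python
TOTAL_GENES = 18_000
protein_changing_per_person = 10_000  # baseline NS/stop/nonsense/indel variants in controls
pct_rare = 25                         # % of NS variants with MAF < 5%
pct_damaging = 20                     # % of coding NS variants predicted damaging by SIFT

rare_per_person = protein_changing_per_person * pct_rare // 100   # 2,500
damaging_rare_per_person = rare_per_person * pct_damaging // 100  # 500
per_gene_frequency = damaging_rare_per_person / TOTAL_GENES       # about 0.028

print(rare_per_person, damaging_rare_per_person, round(per_gene_frequency, 4))
```

Spreading the 500 variants evenly over 18,000 genes treats every gene as equally likely to carry one, which is the simplification behind the control frequency in the power table.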
For our analyses that rely on aggregating variants at either nearby (genomic) loci or among genes in pathways of a priori interest, we propose to use the expected value for any uncertain variant call, based on the number and quality of reads containing particular alleles. Similar procedures for variant calling will be applied to the cases and the controls. Statistical analysis: To conserve power by reducing the burden of multiple testing, we propose a staged testing procedure that adheres to an overall (family-wise) type-I error rate of .05, consisting of the following general procedure: 1) Test the primary subset of genes of a priori marginal interest. Test genes at the alpha level of .05 / Ncg, where Ncg is the number of candidate genes. Based on our preliminary GWAS for ON in a cohort of children with ALL (not limited to those < 10 years of age), as well as on data from the literature, we have a candidate list of 30 genes. Thus, our marginal p-value threshold for these candidate genes is p < 0.0017 (.05 / 30). We will compare the proportion of cases vs. controls that have any of these candidate genes altered by deleterious variants. 2) Test pathways of primary a priori interest at the level of .05 / Np, where Np is the number of pathways. Based on our preliminary GWAS for ON in a cohort of children with ALL (not limited to those < 10 years of age), as well as on data from the literature, we have a candidate list of 4 pathways. Thus, our (marginal) p-value threshold for these candidate pathways is p < 0.0125 (.05 / 4). We will compare the proportion of cases vs. controls with candidate pathways whose genes are affected by deleterious variants. We will also aggregate variants across genes that are members of these pathways, creating a super gene (or gene group), and apply statistical tests. 3) Test all remaining genes at the appropriate “genome-wide” level. This may be close to .05 / 15,000 = 3.3 × 10^-6, depending on the extent of the exome capture.
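Replacing an uncertain call with its expected value, as proposed for the aggregation analyses, can be sketched with a simple genotype-likelihood model. The assumptions here are ours, not the proposal's: reads are independent, each read shows the wrong allele with a single error probability (per-read base qualities would replace this in practice), and the genotype prior follows Hardy-Weinberg proportions for an assumed alternate-allele frequency. The expected alternate-allele count (between 0 and 2) can then be summed across variants in a gene or pathway in place of a hard call. Function names are illustrative.

```python
import math
from typing import List


def genotype_posteriors(ref_reads: int, alt_reads: int,
                        alt_freq: float = 0.01, error: float = 0.01) -> List[float]:
    """Posterior P(genotype = 0, 1, 2 alt alleles | reads) under the assumptions above."""
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    log_post = []
    for g, prior in enumerate(priors):
        # Chance that a single read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        log_post.append(math.log(prior) + alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt))
    top = max(log_post)  # log-sum-exp shift avoids underflow at high depth
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def expected_dosage(ref_reads: int, alt_reads: int,
                    alt_freq: float = 0.01, error: float = 0.01) -> float:
    """Expected number of alternate alleles (0..2) at the site."""
    post = genotype_posteriors(ref_reads, alt_reads, alt_freq, error)
    return sum(g * p for g, p in enumerate(post))


# A well-covered heterozygote versus a site with only three reads.
print(round(expected_dosage(15, 15), 3), round(expected_dosage(2, 1), 3))
```

With no reads at all the expectation falls back to the prior mean (twice the allele frequency), so poorly covered sites contribute little to a gene-level burden instead of being forced to 0 or 1.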
In its simplest form, we consider, for a power analysis, a test based on the following procedure: 1) classify each individual, for a given gene, by whether they carry a gene-damaging variant (determined by prediction algorithms and NS mutation status); 2) tabulate the number of individuals with such variants by case/control status; 3) perform a Fisher's exact test. This procedure may be applied to each gene separately or aggregated over all Ncg genes. We consider several significance levels (i.e., type I error rates), reflecting the different procedures (a single test of multiple genes, 30 gene tests, all captured genes). In practice, we will consider state-of-the-art tests for exome analysis, including modeling the accumulation of variants via a collapsing method (e.g., Li & Leal, 2008; Madsen & Browning, 2009) or through a hierarchical Poisson model for variant counts. However, the table below shows the power of the simple test above for various frequencies of damaging variants (per gene) in cases vs. 2.7% (500 such variants per genome / 18,000 genes) in controls, assuming either a relatively small number (e.g., n = 20) of ALL controls or a larger number (e.g., n = 100) of normal population controls:

Freq.    20 controls                    100 controls
cases    α=.05   α=.0017   α=3.3e-6     α=.05   α=.0017   α=3.3e-6
0.05     0       0         0            0.05    0         0
0.10     0.02    0         0            0.23    0.03      0
0.20     0.27    0.01      0            0.69    0.27      0.02
0.30     0.65    0.14      0            0.92    0.65      0.14
0.40     0.88    0.38      0.01         0.98    0.89      0.40
0.50     0.97    0.65      0.03         1.00    0.98      0.73
0.60     1.00    0.86      0.16         1.00    1.00      0.92
0.70     1.00    0.97      0.43         1.00    1.00      0.99

We note approximately 80% power at a 35% frequency in cases, using 100 controls, at an alpha of .0017. With only 20 controls, achieving 80% power at the same alpha level requires a gene-damaging variant frequency of 0.57 in cases (exact values not shown).
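Power figures of this kind can in principle be approximated by simulation. The sketch below draws carrier counts for cases and controls as independent Bernoulli trials, applies a one-sided Fisher's exact test (computed directly from the hypergeometric distribution with Python's `math.comb`), and reports the fraction of simulated studies that reject at a given alpha. It assumes 20 cases and a one-sided test for an excess of carriers in cases; the original calculation's exact settings are not stated, so its output should not be expected to match the table digit for digit.

```python
import math
import random


def fisher_one_sided_p(a: int, n_case: int, c: int, n_ctrl: int) -> float:
    """One-sided Fisher's exact p-value for an excess of carriers among cases.

    a = carriers among n_case cases; c = carriers among n_ctrl controls.
    With all table margins fixed, the number of case carriers follows a
    hypergeometric distribution; the p-value is P(case carriers >= a).
    """
    n = n_case + n_ctrl
    k = a + c  # total carriers (fixed margin)
    denom = math.comb(n, n_case)
    upper = min(k, n_case)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(math.comb(k, x) * math.comb(n - k, n_case - x)
               for x in range(a, upper + 1))
    return tail / denom


def simulated_power(freq_case: float, freq_ctrl: float, n_case: int,
                    n_ctrl: int, alpha: float, reps: int = 1000,
                    seed: int = 1) -> float:
    """Fraction of simulated studies in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Each individual independently carries a gene-damaging variant.
        a = sum(rng.random() < freq_case for _ in range(n_case))
        c = sum(rng.random() < freq_ctrl for _ in range(n_ctrl))
        if fisher_one_sided_p(a, n_case, c, n_ctrl) <= alpha:
            rejections += 1
    return rejections / reps


# 35% case frequency, 2.7% control frequency, 20 cases, 100 controls, alpha = .05/30
print(simulated_power(0.35, 0.027, 20, 100, 0.05 / 30))
```

Exact enumeration of the hypergeometric tail is used instead of a library call so the sketch has no dependencies; for larger samples a library implementation of Fisher's test would be the more practical choice.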
Although the effect sizes underlying some of the higher-power results are large, these power estimates are conservative, in that a Bonferroni correction was used and the test itself may be suboptimal. We are aware that ON may be sufficiently complex that risk results from a lack of protective variants, which we can assess with a C-alpha test, based on recent methods for analyzing rare variants by M. Daly and colleagues (K. Roeder, personal communication). Implicit in all of our methods is that we do not necessarily expect our 20 cases to share specific variants. Rather, we expect an excess of damaging variants within genes and pathways of interest. PGRN investigators involved in this project include senior statisticians, bioinformaticians, and scientists at St. Jude (such as Drs. J. Yang, W. Yang, Fan, Cheng, Evans, Pui, and Relling), as well as collaborators at MD Anderson (Dr. Scheet), Johns Hopkins (Dr. Rosner), and the COG (Drs. Devidas and Hunger).
Citations
1: Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J; Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME. Origins and functional impact of copy number variation in the human genome. Nature. 2010 Apr 1;464(7289):704-12. Epub 2009 Oct 7. PubMed PMID: 19812545.
2: Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008 Sep;83(3):311-21. Epub 2008 Aug 7. PubMed PMID: 18691683; PubMed Central PMCID: PMC2842185.
3: Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, Stankiewicz P, Halperin JJ, Yang C, Gehman C, Guo D, Irikat RK, Tom W, Fantin NJ, Muzny DM, Gibbs RA. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010 Apr 1;362(13):1181-91. Epub 2010 Mar 10. PubMed PMID: 20220177.
4: Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009 Feb;5(2):e1000384. Epub 2009 Feb 13. PubMed PMID: 19214210; PubMed Central PMCID: PMC2633048.
5: Ng SB, Nickerson DA, Bamshad MJ, Shendure J. Massively parallel sequencing and rare disease. Hum Mol Genet. 2010 Sep 21. [Epub ahead of print] PubMed PMID: 20846941.
6: Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009 Sep 10;461(7261):272-6. Epub 2009 Aug 16. PubMed PMID: 19684571; PubMed Central PMCID: PMC2844771.
7: Relling MV, Yang W, Das S, Cook EH, Rosner GL, Neel M, Howard S, Ribeiro R, Sandlund JT, Pui CH, Kaste SC. Pharmacogenetic risk factors for osteonecrosis of the hip among children with leukemia. J Clin Oncol. 2004 Oct 1;22(19):3930-6. PubMed PMID: 15459215.
8: Treviño LR, Yang W, French D, Hunger SP, Carroll WL, Devidas M, Willman C, Neale G, Downing J, Raimondi SC, Pui CH, Evans WE, Relling MV. Germline genomic variations associated with childhood acute lymphoblastic leukemia. Nat Genet. 2009;41:1001-5. PubMed Central PMCID: PMC2762391.
9: Yang L, Boyd K, Kaste SC, Kamdem Kamdem L, Rahija RJ, Relling MV. A mouse model for glucocorticoid-induced osteonecrosis: effect of a steroid holiday. J Orthop Res. 2009 Feb;27(2):169-75. PubMed PMID: 18683891; PubMed Central PMCID: PMC2718787.
Ren, Mingqiang GEORGIA HEALTH SCIENCES UNIVERSITY Targeting FGFR1 signaling in AML Nov07, 2013 closed The abnormal activation of fibroblast growth factor receptor (FGFR) signaling has recently been found in many types of cancer, including acute myeloid leukemia (AML).
Moreover, FGFR genes can also form different types of fusion genes through chromosome translocations, driving cancer development in different cancer types. In human blood cancers, FGFR1 fusion kinases are all associated with stem cell leukemia/lymphoma syndrome (SCLL), and most SCLL patients inevitably progress to AML. In this study, we propose to use genomic datasets derived from AML patients in the database of Genotypes and Phenotypes (dbGaP) to study the correlation between alterations in FGFR1 and MYC (another cancer gene) and the clinical characteristics of AML patients. This will help us identify the molecular basis of the disease, refine clinical diagnosis, and potentially suggest useful therapeutic options. This is a renewal study. The goal of this project is to identify driver mutations in acute myeloid leukemia (AML) through genomic analyses of AML patients. I will focus specifically on FGFR1 and MYC amplification and mutations in trisomy 8 AML patients in order to functionally study their role in the pathogenesis of AML. I will combine data from SNP arrays and Affymetrix gene expression arrays, as well as exon sequencing or RNA sequencing of AML patients in dbGaP, to seek correlations between genetic mutations of FGFR1 and MYC and the clinical phenotype of AML patients. These analyses will help us set up further functional studies in cellular and mouse models. In this renewal proposal, I will extend our study to pediatric AML. Based on our studies (BLOOD, 2012 and 2013), pediatric AML cases also involve aberrant FGFR1 rearrangements as well as MYC overexpression. Therefore, I am applying to add the TARGET dataset to this application.
Rheinbay, Esther BROAD INSTITUTE, INC. Analysis of regulatory drivers and sex differences of cancer Oct16, 2020 approved Little is known about cancer drivers outside of the genes that encode proteins in the human genome.
Our research is focused on identifying and characterizing drivers in the elements that regulate these genes through computational methods, and on understanding their role in tumorigenesis, treatment response, and metastasis. In addition, we are interested in studying the genomic determinants underlying the bias in male vs. female incidence in several cancer types, particularly the role of the Y chromosome, which is frequently lost in cancer. Much has been learned about the protein-coding genes that drive cancer from comprehensive tumor exome sequencing of large patient cohorts. However, the role of regulatory, non-coding cancer driver alterations remains less clear, with only a few examples known to date (TERT and FOXA1 promoters; “enhancer hijacking”; enhancer tandem duplications). The multitude of cell types derived from the same genome is achieved through a complex interplay of epigenetic regulators and chromatin marks. Prior studies have shown that much of the mutational footprint in non-coding regions is likely not a driving force for the tumor, and have highlighted the challenge of finding bona fide regulatory drivers. The existence of a considerable fraction of adult and pediatric tumors that cannot be explained by somatic copy number alterations, simple mutations, or genomic rearrangements suggests that undiscovered regulatory drivers exist. The recently released whole genomes for >9,000 TCGA samples will further enable us to identify somatic mutations that alter transcriptional regulation in tumors. We propose to use this data set to integrate genomic (whole-genome sequencing, whole exome sequencing, targeted gene sequencing), epigenomic (chromatin profiling, DNA methylation), and transcriptomic information to identify and characterize regulatory, non-coding driver events in cancer using computational approaches. In addition, we are interested in studying differences in cancer prevalence between males and females that are attributable to genomic determinants.
The comprehensive datasets assembled by TCGA, which include some of the largest whole-genome sequenced tumor cohorts available to date, provide a unique opportunity to study regulatory cancer drivers. Especially in tumors not explained by oncogenic protein-coding alterations, these drivers will inform on the specific regulatory programs driving the tumor and yield new targets for cancer treatment. We further plan to combine these data with ICGC whole-genome information, as well as with whole cancer genomes from metastatic solid tumors, which have recently become available (Priestley, Baber et al.). We are also interested in the genomic basis of the sex bias observed in many cancer types (e.g., kidney, brain). In particular, we are using TCGA data to quantify loss of the Y chromosome across cancer types. Although little is yet known about the role of this chromosome in cancer, aging-related loss of Y in peripheral blood cells of healthy men is associated with increased morbidity and mortality, including from cancer. Despite this interesting observation, the Y chromosome has not been studied widely in cancer, possibly because of technical challenges (homology with X, gene expansions on Y) and the fact that few genes are located on this chromosome. In a separate effort, we are studying somatic alterations in the sex chromosomes of childhood tumors, many of which show a sex bias (i.e., are more common in either boys or girls). Because children have not had the same environmental and hormonal exposures as adults, it is more likely that sex differences in childhood cancers could be explained by genetic changes in the sex chromosomes themselves. Yet very little is known about sex chromosome alterations in childhood cancers. Loss of the second sex chromosome introduces specific vulnerabilities in cells, which we plan to investigate as biomarkers and therapeutic targets. We plan to combine TARGET data with additional data from the St.
Jude PeCan and potentially other sources to increase statistical power. Summarized results of our analyses will be made available to the scientific community as peer-reviewed publications.
Robine, Nicolas NEW YORK GENOME CENTER Genomic analyses of Neuroblastoma Feb29, 2016 closed We will process the TARGET Neuroblastoma dataset, in combination with the samples we are collecting, to uncover molecular and transcriptional differences between neuroblastoma patients with widely disseminated disease (INSS stage 4) and patients whose tumors undergo spontaneous regression (stage 4S). In addition, we will use the TARGET cohort as a background population to identify genes deregulated in our samples compared to the cohort. The New York Genome Center is sequencing tumors from many cancer types and trying to identify potentially actionable mutations. In addition to SNVs, indels, and structural variants identified by whole-genome sequencing and whole-exome sequencing, we are using RNA-Seq to validate coding mutations, detect fusion genes, and measure gene expression. In this context, we wish to use the Neuroblastoma samples from the TARGET cohort as a background population to calculate z-scores and identify genes and pathways deregulated in our samples compared to the cohort. We will reprocess the TARGET samples in our pipeline to harmonize gene expression estimation, but will not perform a full analysis of these samples. In addition, in collaboration with Memorial Sloan-Kettering and Columbia University, the New York Genome Center is interested in the differences between neuroblastoma patients with poor prognosis (stage 4) and patients whose tumors undergo spontaneous regression (stage 4S). We have already collected a few neuroblastoma samples in both categories and performed whole-genome sequencing and RNA-Seq on them. We will characterize the molecular profiles and the differential pathway expression in both groups.
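Using a reference cohort as a background population usually means expressing each gene in a new sample as a z-score against the cohort's distribution for that gene. A minimal sketch, assuming both inputs are already on the same harmonized (e.g., log-transformed) scale; the function name and dictionary layout are hypothetical, and genes missing from the cohort or with no spread in it are skipped.

```python
import statistics
from typing import Dict, List


def expression_z_scores(sample: Dict[str, float],
                        cohort: Dict[str, List[float]]) -> Dict[str, float]:
    """z-score of each gene in one sample relative to a background cohort."""
    z = {}
    for gene, value in sample.items():
        values = cohort.get(gene)
        if not values or len(values) < 2:
            continue  # need at least two cohort values for a standard deviation
        mu = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation
        if sd > 0:
            z[gene] = (value - mu) / sd
    return z


# Hypothetical values on a log-expression scale.
cohort = {"MYCN": [2.0, 4.0, 6.0], "ALK": [5.0, 5.0, 5.0]}
sample = {"MYCN": 8.0, "ALK": 5.0}
print(expression_z_scores(sample, cohort))
```

Genes with large absolute z-scores (a cutoff such as |z| > 2 would be an assumption, not a value from the text) could then be passed to a pathway-level analysis.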
To achieve statistical significance, we will take advantage of the TARGET datasets, which include a number of stage 4 and stage 4S patients. Our pipelines identify somatic variants (SNVs, indels, and structural variants), estimate gene and transcript expression, and detect gene fusions. However, the TARGET data will not be used for the development of methods, software, or other tools. We expect to focus on potential differences in immune response pathways, which may be involved in the difference between stage 4 and stage 4S tumors.
Rodriguez, Georgialina UNIVERSITY OF TEXAS EL PASO Targeted Therapeutic Strategies Against Pediatric Hematological Cancers Aug18, 2017 expired Our team is interested in developing new methods of treating childhood cancers. To do this, we have partnered with local hospitals, which provide us with de-identified tumor samples. De-identification is the process of removing all personal patient information so that a sample can no longer be traced back to the donor. We have sequenced the protein-coding regions of DNA from 20 cancer patient samples and have developed a pipeline, Oncominer, for categorizing and sorting through the resulting large dataset of mutations. Through this process we have identified many new and previously unreported mutations that our team is interested in studying further; however, our sample size is too small for meaningful statistical analysis. For this reason, we would like to access NCI’s TARGET data. These data sets will be analyzed independently and compared for similarity. The objective of the proposed research activity is to explore novel molecular rationales for innovative treatment strategies targeting pediatric cancers. One critical component for addressing these questions is the analysis of primary tissue samples. Therefore, we have established a flourishing collaboration with local hospitals, which provide us with de-identified samples.
Using these specimens, we have employed whole exome sequencing in combination with ONCOMINER pipeline sorting to identify novel molecular targets for early detection and drug development. This strategy has identified multiple exciting and previously unreported SNPs within our patient group. However, statistical analysis is limited by our sample size (20), and thus we are seeking to expand the current ONCOMINER data to generate a stronger leukemia/lymphoma patient profile, with the goal of bringing better medicine to our youngest cancer patients. Data sets will be analyzed independently and compared for similarities.
Rokita, Jo Lynne CHILDREN'S HOSP OF PHILADELPHIA RNA splicing in Pediatric Cancer Mar19, 2020 closed The goal of this project is to identify pediatric brain tumor-specific changes in RNA transcriptional events that drive tumorigenesis and could be harnessed to identify novel therapies for children with brain tumors. The goal of this project is to find targetable immunotherapy neoepitopes resulting from aberrant splicing events in brain cancers. For this, we plan to leverage the GTEx RNA-seq datasets (fastq) (dbGaP Study Accession: phs000424.v8.p2) to identify alternative splicing changes that are distinct in our brain cancer cohorts. Integrating GTEx normal samples, which will serve as controls in our study, will help us determine brain tumor-specific exon-exon junctions. Such an investigation is needed to expand the repertoire of brain-specific immunotherapy targets. Our study will only make use of de-identified sequencing datasets.
Rong, Wang IMPACT THERAPEUTICS,INC Identification of specific genetic alterations in Neuroblastoma patients for a novel treatment Aug19, 2021 expired There is currently a shortage of therapies for Neuroblastoma, for which the survival rate of high-risk patients is less than 50%. Impact Therapeutics Inc.
have developed a compound treatment that previous studies indicate would benefit individuals with a subtype of Neuroblastoma. Therefore, verifying the genetic profile of patients with this subtype would enable studies to validate this targeted treatment. Data from the TARGET study will be used by our collaborator, Fios Genomics, to characterize the genetic profiles of individuals with this subtype of Neuroblastoma. The output of this study may then be used to identify patients for recruitment in future clinical studies. Despite improved therapeutic strategies, the survival rate of high-risk Neuroblastoma patients is still less than 50%. There is an urgent need to develop a more effective and specific therapy for stage 4 Neuroblastoma patients. At Impact Therapeutics Inc., we have taken an integrated approach to identify signalling pathways that are correlated with the response of Neuroblastoma tumor cells to the compound. Our data from PDX models also support the findings from cell line-based assays. We found that a group of children with type 4 Neuroblastoma, identified via gene expression profiling, may respond well to the compound. We believe that our intended use of the phs000218 dataset meets the Data Use Limitations. The goal of this study is to develop a more effective treatment by using TARGET WES data from 110 Neuroblastoma tumor samples with matched blood samples to characterize significantly altered genomic profiles in those patients. The 110 WES datasets will be analysed securely by our collaborator, Fios Genomics, which will apply its combined expertise in statistics, computational biology, bioinformatics, and genomics to identify differences in genomic variants between two groups of Neuroblastoma patients. Data QC will first be performed to remove any poor-quality samples. The identified copy number variants will then be assessed for their association with patient groups.
The results would then be used to identify children with type 4 Neuroblastoma who may benefit from this novel treatment. This analysis will not create any risks to patients and will be consistent with the Use Restrictions for the requested datasets. We have no plans to combine the requested dataset with any other dataset outside of dbGaP.
ROONEY, MICHAEL NEON THERAPEUTICS, INC. Genomics-based discovery of immunotherapy targets Sep01, 2021 approved To design effective cancer immunotherapies, we must better understand how cancer cells "look" different from normal cells and how they manipulate the immune system. For the first question, we plan to review genetic data for many different tumor and normal tissue samples and systematically scan for differences. If the results indicate the presence of a protein that is made only by tumor cells, additional analyses will be performed to ensure that the result is robust. This will involve analyzing different sample cohorts and data types to make sure that the pattern is consistent. If the pattern is confirmed, new therapies (such as vaccines) can be designed to target the protein. For the second question (how cancer cells manipulate immune cells), we aim to analyze the cancer mutations that occur most frequently in patients whose tumors are being attacked by immune cells. These mutations might be an escape mechanism that the tumor uses to preserve itself. Discovering these mechanisms can give us clues on how to design new cancer therapies. To the extent that important new mechanisms are discovered, they will be followed up with animal models and drug design. The goal of this project is to discover novel therapeutic cancer targets. The study includes two parts. The first part focuses on discovering and characterizing tumor-specific antigens, which can potentially be targeted by vaccines or cellular therapy.
The second part focuses on discovering cancer pathways that alter immune function in the tumor microenvironment, which can potentially be drugged using monoclonal antibodies, small molecules, or RNA-based therapies that target these pathways. For the work focused on tumor antigens, our goal is to discover proteins that are expressed on tumor cells but are absent (or very weakly expressed) on normal cells. We will comprehensively search for suitable targets by assessing gene-level and transcript-level expression in tumor cells and normal cells and by identifying recurrent somatic DNA changes. For the analysis of gene expression, tumor tissue RNA levels (per TCGA, TARGET, and MET500) and protein levels (per CPTAC) will be compared against normal tissue RNA levels (per TCGA and GTEx) and protein levels (per CPTAC). Furthermore, the analysis will investigate non-canonical transcripts and non-canonical protein translations. Since these novel isoforms will not be present in pre-computed FPKM/TPM matrices (for RNA-Seq) or in post-search peptide sequence match files (for proteomics), we will need access to .bam and .fastq files (from TCGA, TARGET, MET500, and GTEx) to reconstruct transcripts, and to raw spectra files (from CPTAC) to search for novel peptide sequences. For the analysis of somatic DNA variants, variant calls (from TCGA, TARGET, HNPCC-Sys, and MET500) will be tabulated, validated by inspecting local alignments and read counts (requiring access to WES/WGS .bam files from TCGA, TARGET, HNPCC-Sys, and MET500), and quantified in terms of variant-specific RNA expression (requiring access to RNA-Seq .bam files from TCGA, TARGET, and MET500). The ultimate goal is to create a short list of somatic variants and/or tumor-specific protein isoforms that are suitable targets for cancer vaccines or cellular therapy.
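The tumor-versus-normal comparison above implies a first-pass filter: keep features (genes or transcript isoforms) whose tumor expression exceeds a floor while expression in every normal tissue stays below a ceiling. A minimal sketch; the thresholds (10 and 1 TPM), the function name, and the dictionary layout are illustrative assumptions, not values from this request, and a feature missing from a normal-tissue table is treated as unexpressed there.

```python
from typing import Dict, List


def tumor_specific_candidates(tumor_tpm: Dict[str, float],
                              normal_tpm: Dict[str, Dict[str, float]],
                              min_tumor: float = 10.0,
                              max_normal: float = 1.0) -> List[str]:
    """Features expressed in tumor but at most weakly in every normal tissue.

    tumor_tpm:  feature -> expression in the tumor cohort (e.g. median TPM)
    normal_tpm: tissue -> {feature -> expression}
    """
    hits = []
    for feature, level in tumor_tpm.items():
        if level < min_tumor:
            continue
        # Highest expression across normal tissues; absent features count as 0.
        worst_normal = max((tissue.get(feature, 0.0)
                            for tissue in normal_tpm.values()), default=0.0)
        if worst_normal <= max_normal:
            hits.append(feature)
    return sorted(hits)


# Hypothetical values.
tumor = {"X": 50.0, "Y": 50.0, "Z": 2.0}
normal = {"lung": {"X": 0.2, "Y": 30.0}, "liver": {"X": 0.5}}
print(tumor_specific_candidates(tumor, normal))
```

Taking the maximum over normal tissues is deliberately strict: a single normal tissue with appreciable expression is enough to drop a candidate.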
For the work focused on discovering cancer pathways that alter immune function in the tumor microenvironment (TME), our objective is to discover genetic variants in tumors (SNPs, indels, fusions, chromosomal alterations, etc.) that are correlated with the immune phenotype of tumors (as estimated by gene expression signatures). Our hypothesis is that these associations will often have a causal interpretation (e.g., the alteration promotes a certain TME phenotype, or the TME phenotype imposes selection pressure that promotes fixation of the variant). Immune phenotype will be estimated using RNA-Seq expression signatures related to the presence of different immune cell types or effector functions (e.g., "cytolytic activity"), which will require analyzing RNA-Seq data from TCGA, TARGET, and MET500. Somatic variant calls will be determined from pre-computed MAFs (or VCFs) from TCGA, TARGET, and MET500 (for SNPs and indels) or from pre-computed focal copy number amplification/deletion calls derived from copy number arrays from TCGA and TARGET. Focal copy number events may also be inferred from WES, WGS, or RNA-Seq (using pre-computed estimates or computing in-house estimates, as necessary) for datasets/samples for which copy number arrays were not employed (e.g., MET500). We plan to assess "TME phenotype"-"cancer genotype" associations across multiple tumor types while controlling for various possible confounders (histological subtype, tumor mutation burden, etc.). To the extent that the associations reveal druggable pathways, additional validation experiments and drug discovery work will be done to explore their therapeutic potential. Genomic summary results will not be published or otherwise publicly posted. There are no planned collaborations with investigators at other institutions. All research efforts are focused on the discovery of actionable therapeutic targets; methods development is not a major or minor focus of the research project.
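The "cytolytic activity" signature mentioned above is commonly computed as the geometric mean of GZMA and PRF1 expression in TPM. The sketch below computes that score and then estimates a genotype-phenotype difference stratified by a confounder such as tumor type. The 0.01 offset (to avoid zeros) and the size-weighted stratified mean difference are assumptions chosen for illustration; a regression model with covariates such as tumor mutation burden would be a natural alternative, and this is not presented as the project's actual method.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cytolytic_activity(gzma_tpm: float, prf1_tpm: float,
                       offset: float = 0.01) -> float:
    """Geometric mean of GZMA and PRF1 expression (offset is an assumption)."""
    return math.sqrt((gzma_tpm + offset) * (prf1_tpm + offset))


def stratified_difference(samples: List[Tuple[str, bool, float]]) -> float:
    """Mean phenotype difference (mutant minus wild type), averaged over strata.

    samples: (stratum, has_variant, phenotype_score) per tumor. Strata lacking
    either group are skipped; each stratum is weighted by its number of tumors.
    """
    groups: Dict[str, Dict[bool, List[float]]] = defaultdict(
        lambda: {True: [], False: []})
    for stratum, mutant, score in samples:
        groups[stratum][mutant].append(score)
    weighted, total = 0.0, 0
    for g in groups.values():
        if g[True] and g[False]:
            diff = sum(g[True]) / len(g[True]) - sum(g[False]) / len(g[False])
            n = len(g[True]) + len(g[False])
            weighted += n * diff
            total += n
    if total == 0:
        raise ValueError("no stratum contains both mutant and wild-type tumors")
    return weighted / total


# Hypothetical tumors: (tumor type, carries variant, CYT score).
data = [("A", True, 10.0), ("A", False, 4.0),
        ("B", True, 3.0), ("B", False, 1.0),
        ("C", True, 5.0)]
print(stratified_difference(data))
```

Comparing within strata keeps a tumor type with high baseline immune infiltration from masquerading as a genotype effect.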
The aim of the study is to be comprehensive and to consider targets in both adult and pediatric cancers (e.g., neuroblastoma), as well as in both primary and metastatic cancers. For these reasons, datasets like TARGET and MET500 are required (and not just TCGA and CPTAC).
Roychowdhury, Sameek OHIO STATE UNIVERSITY Bioinformatics Analysis of Pediatric Cancer Jan27, 2017 approved Cancer is a complex disease with various genetic aberrations. We aim to detect novel markers and targets in pediatric cancers. These findings would be applicable to improved diagnosis and treatment of pediatric cancers, as well as to the repurposing of treatments for adult cancers in the pediatric population. We intend to publish or otherwise broadly share findings (but never raw data) from this study with the scientific community. We are engaged in the analysis of whole-exome, RNA-seq, and whole-genome sequencing data for patients with cancer across a variety of tumor types. Among our aims, we wish to identify genetic markers and features shared among both pediatric cancers, especially features that have not been well investigated in pediatric cancers and that could provide new opportunities for the treatment of pediatric patients. Furthermore, since these are well-characterized data sets, we feel that these data will provide an important means to expand our analyses of cancer across multiple organs and age groups, and to contribute to the public domain. We plan to publish and share results obtained using these data with the scientific community. We will develop methods, but we will not share raw data.
Ruiz, Christian UNIVERSITY OF BASEL Germline Driver Gene Analysis in Osteosarcoma Sep12, 2016 closed Osteosarcoma (OS) is a bone tumor that primarily affects children and young adults. Even though past research has led to better treatment of OS, no significant improvements have been achieved in recent decades. Metastatic OS in particular remains a disease with a low survival rate.
While the majority of OS patients acquire the disease sporadically, a few cases of inherited OS have been described. Our study aims to identify and further investigate cases of inherited OS by looking at mutations in genes that have not previously been associated with osteosarcoma and may therefore have been overlooked. In the future, this knowledge may help identify individuals at risk of developing OS so that they can be treated in a timely and efficient manner. Osteosarcoma (OS) is the most common malignant bone tumor. It predominantly affects children and young adults. Even though modern treatment protocols, which include chemotherapy, surgery, and sometimes radiotherapy, have significantly improved the survival rate of localized OS (60-70% 5-year survival), metastatic OS remains a disease with a low survival rate (30% 5-year survival), and no significant improvements have been achieved in this regard in recent decades. Recent sequencing efforts have painted a picture of OS as a genetically highly heterogeneous disease, which complicates driver gene discovery. While the majority of OS cases are sporadic, OS has been associated with rare inherited syndromes, such as bilateral retinoblastoma, Li–Fraumeni syndrome, Bloom syndrome, and Rothmund-Thomson syndrome. Our study aims to further investigate inherited OS and the underlying germline mutations. We analyzed an initial OS cohort using next-generation exome sequencing. Data from the TARGET:Osteosarcoma cohort will be used to verify the mutation frequency of the most frequently mutated genes found in the germline of our initial OS cohort. To this end, next-generation sequencing data from the TARGET:Osteosarcoma cohort will be filtered for the regions of interest and subsequently processed using in-house quality control, variant calling, and annotation pipelines, supplemented by specialized analyses of expression and whole-genome sequencing data.
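Filtering sequencing data "for the regions of interest" amounts to interval lookups. A minimal sketch, assuming the regions are supplied as BED-style tuples (chromosome, 0-based start, exclusive end); overlapping regions are merged so each query needs a single binary search with Python's `bisect`. The function names are hypothetical, and 1-based positions (as in VCF) would need 1 subtracted before the lookup.

```python
import bisect
from typing import Dict, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(regions: List[Tuple[str, int, int]]) -> Index:
    """Index BED-style regions (chrom, start, end; 0-based, half-open)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[List[int]] = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:  # overlapping or adjacent
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_regions(index: Index, chrom: str, pos0: int) -> bool:
    """True if the 0-based position lies inside any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos0) - 1  # last region starting <= pos0
    return i >= 0 and pos0 < ends[i]


# Hypothetical regions of interest.
idx = build_index([("chr17", 100, 200), ("chr17", 150, 300), ("chr13", 0, 50)])
print(in_regions(idx, "chr17", 250), in_regions(idx, "chr17", 300))
```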
Data from the TARGET project will be used to complement our own datasets, and no additional risk to participants will be created. Raw sequencing data will be analyzed only at the institution requesting permission and will not be processed in the cloud. To comply with the data usage terms defined by the NCI for TARGET projects for which a global analysis has not been published, publication of results derived directly from the TARGET:Osteosarcoma dataset will be limited to fewer than 5 genes. These genes of interest are derived from an initial cohort (a dataset outside of dbGaP). Data will only be used to study pediatric cancer (osteosarcoma) and will not be used for method development or population studies.
Ruppin, Eytan NIH Identifying clinically relevant genetic interactions in pediatric cancer Oct25, 2018 expired Genetic interactions (GIs) are interactions between pairs of genes, and they have the potential to improve cancer treatment. Synthetic lethality (SL) is a representative example of a GI, in which inactivation of either gene alone is not lethal to a cancer cell, while their concomitant inactivation kills the cell. SL is highly relevant to cancer treatment because inhibiting the SL partner of an inactivated tumor suppressor gene will selectively kill tumor cells while sparing their normal counterparts (Hartwell et al. Nature 1997). We have built a computational algorithm to identify such SL interactions using cell line data (Jerby et al. Cell 2014; Lee JS et al. Nat Commun 2018). Building on this work, we aim to build a more advanced computational algorithm to identify SL interactions that are clinically relevant, i.e., directly applicable to treating pediatric cancer patients. For this analysis, we will incorporate all available data on pediatric cancers from the NIH’s Genomic Data Commons. Our analysis will reveal patient-specific vulnerabilities that can advance precision oncology for pediatric cancer.
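In its simplest form, checking whether a candidate SL pair is clinically relevant means asking whether patients whose tumors inactivate both genes have different survival from other patients. The sketch below implements a two-group log-rank test for that comparison. It is a deliberate simplification: it does not adjust for the confounders this request lists (age, gender, race, genomic instability, purity, single-gene effects), for which a covariate-adjusted model such as Cox regression would be the usual choice, and how patients are assigned to the co-inactivated group is left as an assumption.

```python
import math
from typing import List, Tuple


def logrank_test(group1: List[Tuple[float, bool]],
                 group2: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Two-group log-rank test. Each item is (follow-up time, event observed).

    Returns (chi-square statistic with 1 df, p-value).
    """
    data = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n1 = at_risk.count(1)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up data: co-inactivated tumors vs. all others.
co_inactivated = [(t, True) for t in range(1, 11)]
others = [(t, True) for t in range(20, 30)]
print(logrank_test(co_inactivated, others))
```

Requiring significance in several independent datasets, as the request proposes, would then mean repeating such a test per dataset and keeping only consistently significant pairs.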
It has been noted that genetic interactions can dramatically improve cancer treatment (Hartwell et al. Nature 1997), but their identification was not possible until recent years due to technical limitations. The recent accumulation of large-scale molecular and clinical data from patients now makes it possible to infer such interactions directly from patient data. Building on our previous publications (Jerby et al., Cell 2014; Lee JS et al., Nat Commun 2018), we aim to identify genetic interactions that are clinically relevant in treating pediatric cancer patients by selecting pairs that are associated with patient survival while controlling for all molecular and clinical confounders, including patient age, gender, race, tumor genomic instability and purity, and the effects of individual genes. For this purpose, we will mine Therapeutically Applicable Research to Generate Effective Treatments (TARGET), where the first three datasets are available through the Genomic Data Commons (https://portal.gdc.cancer.gov/). We will perform our statistical tests in each molecular profile and select only those genetic interactions that show significance in multiple available datasets. Our analysis will help identify patient-specific vulnerabilities through a genetic interaction network, which may overcome the current limitations of mutation-based precision oncology approaches for children's tumors. We will make sure to broadly share any findings from these studies with the scientific community.
RYAN, RUSSELL UNIVERSITY OF MICHIGAN AT ANN ARBOR Mechanisms of gene dysregulation in pediatric leukemia May21, 2021 approved We are requesting access to “genomic sequences” (DNA) from blood cancer cells from several hundred patients.
“Genomic sequences” refer to the unique DNA code that determines many qualities of both normal cells and cancer cells, including “genes” that code for the protein building blocks of the cell, and “regulatory elements” that act like light switches to turn genes on and off. We are interested in a particular set of “regulatory elements” that turn on genes that are important for blood cancer cells. We want to determine whether there are differences in the DNA sequence of these switches in blood cancer cells, and how those differences relate to whether the genes are on or off. By better understanding this subject, we hope to ultimately develop more effective treatments for blood cancers, or ways to prevent blood cancers from occurring. We have used a panel of B-ALL cell lines to functionally identify novel mechanisms of gene dysregulation in B-ALL, with a particular focus on distal regulatory elements. Specifically, we have identified a class of non-coding elements that function as enhancers in cases of B-ALL with somatic mutations affecting one particular transcriptional regulator. Activation of these enhancers results in characteristic patterns of gene dysregulation, including increased expression of candidate oncogenes. These findings raise the question of whether leukemias with these specific transcription factor alterations and patterns of dysregulated gene expression also bear genomic variants (somatic or germline in origin) in these key distal regulatory elements. We are requesting access to the TARGET datasets in order to validate and extend our findings in primary leukemias as follows: 1. Identify correlations between specific patterns of gene expression and specific genetic variants, including large-scale chromosomal rearrangements, copy number abnormalities, and somatic driver mutations. 2.
Identify mutations affecting distal regulatory elements and their association with specific B-ALL subtypes, defined by the presence of characteristic gene expression signatures and/or driver oncogene aberrations. 3. Identify differences in the frequency of common germline variants affecting distal regulatory elements between leukemia patients and the general population (compared with whole genome datasets summarized in the gnomAD database). While most of our analysis is focused on B-ALL, we are also requesting access to pediatric AML datasets, in part as a comparator, and because the transcription factor gene aberrations of interest may occur in a small subgroup of AML. We will not perform any analyses that might compromise the identity of participants, or investigate risks or genetic associations with diseases other than cancer. Ryan, Sarra UNIVERSITY OF NEWCASTLE Comparative Leukaemia Genomics Mar09, 2023 approved We research childhood leukaemia, specifically Acute Lymphoblastic Leukaemia (ALL). We have sequenced the whole genome and the coding part of the genome (the exome) for many of our samples. We are particularly interested in changes in patients' genomes or exomes in which specific genes known to cause cancer are mutated, or in which the genetic information for some genes is lost (a so-called gene deletion). We want to make use of data generated by The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research To Generate Effective Treatments (TARGET) to see whether any of the changes and mutations in the genomes of our group of childhood Acute Lymphoblastic Leukaemia patients are also present in the genomes of patients with Acute Myeloid Leukaemia and other cancers in the TCGA dataset. Finding commonalities between these may identify common mechanisms that can lead to blood or other cancers, as well as common ways of treating them.
Our research concerns the investigation and characterisation of the landscape of genomic aberrations in various subtypes of paediatric Acute Lymphoblastic Leukaemia (ALL). We have both exomes and genomes for samples in our cohort of mostly childhood ALL, and we are currently studying both the Single Nucleotide Variants (SNVs) and indels present, as well as primary and focal abnormalities such as structural variation and Copy Number Variation (CNV). We wish to access the data in TCGA and TARGET to see whether genomic aberrations present in our cohort are also present in B-lineage acute lymphoblastic leukaemias (B-ALL), T-lineage ALLs (T-ALL), acute myeloid leukaemias (AML) and other malignancies, and whether there are any common genomic aberrations or critical changes in common pathways. For example, mutations in the tyrosine-protein kinase FLT3 are important in both ALL and AML; FLT3 is one example of a commonly mutated gene that affects both haematological malignancies, along with mutations in signalling pathways such as JAK-STAT and the RAS pathway. We aim to see whether there are further commonalities between our ALL cohort and TCGA/TARGET data at the mutation, copy number and structural variation levels. Our study specifically targets paediatric ALL; advances in identifying new primary subgroups, as well as secondary events that affect risk, in our data will enable better and more accurate patient risk stratification and more appropriate treatment with reduced cytotoxicity and side effects. Identifying common genomic aberrations or critical changes in common pathways shared between our data and other leukaemias present in TCGA/TARGET will further benefit paediatric patients, as findings from functional studies and drug targets relating to these common changes can be investigated for use in paediatric Acute Lymphoblastic Leukaemia (ALL).
The various omic datasets will initially be analysed independently for event calling; aberrations or critical changes in common pathways identified during analysis will then be aggregated and reported at publication level, which poses no additional risk to participants. Sabatini, David WHITEHEAD INSTITUTE FOR BIOMEDICAL RES Assessment of the role of HAL, the rate-limiting enzyme of the histidine degradation pathway in the response of osteosarcoma tumors to methotrexate Jun19, 2018 closed We've previously shown an association between the expression levels of HAL, the rate-limiting enzyme in histidine catabolism, and sensitivity to the anti-folate drug methotrexate in cancer cell lines and in pediatric ALL patients. We will further study the functional contribution of HAL to the sensitivity of osteosarcoma tumors to methotrexate treatment by analyzing HAL expression levels in these tumors as deposited in dbGaP. If the pre-treatment expression levels of HAL, and possibly other relevant enzymes, are found to correlate with the response of tumors to methotrexate, we will study the potential of these enzymes as clinical prediction factors for the success of methotrexate treatment in collaboration with Dr. Katherine A. Janeway (Dana-Farber/Boston Children's Cancer and Blood Disorders Center). A good correlation has the potential to considerably improve treatment efficacy, since treatment-outcome prediction markers are currently lacking, leading physicians to prescribe methotrexate regimens based on clinical parameters instead of personalized pharmacogenetic factors. We propose to study a potential correlation between the expression levels of HAL, the rate-limiting enzyme of the histidine degradation pathway, and the response of osteosarcoma tumors to methotrexate. We found such a correlation in pediatric ALL patients treated with methotrexate.
The first step will be to analyze the expression levels of the enzyme in osteosarcoma tumors from patients treated with methotrexate, as published in the phs000468 dataset. If we find a correlation between high expression levels of the enzyme and improved patient survival, we will apply for funding for further assessment of the potential of HAL as a prediction factor for the response of osteosarcoma patients to the commonly used drug methotrexate. No combination with other datasets is planned. Saisanit, Sittichoke ROCHE HOLDING, INC. Analysis of genomic aberrations in pediatric cancers Dec03, 2015 closed We intend to study several pediatric cancers to assess frequencies of genomic variations (SNPs, indels, expression, amplifications, and structural variations), and we plan to use this information to study and identify potential new targets and biomarkers. We will use our internal pipeline to analyze raw sequencing data; we plan to compare the results with corresponding data from our internal panel of cancer cell lines, from the Cancer Genome Atlas, and from the Pediatric Cancer Genome Project. We intend to study several pediatric cancers to assess frequencies of genomic variations (SNPs, indels, expression, amplifications, and structural variations), and we plan to use this information to study and identify potential new targets and biomarkers. We will use our internal pipeline to analyze raw sequencing data (e.g., RNA-seq); we plan to compare the results with corresponding data from our internal panel of cancer cell lines, from the Cancer Genome Atlas, and from the Pediatric Cancer Genome Project. We believe that this combination will not create additional risks to participants because each dataset will be analyzed independently of the others.
Salomonis, Nathan CINCINNATI CHILDRENS HOSP MED CTR Characterization of a novel poor prognostic AML splicing signature in pediatric patients Nov24, 2015 approved AML is an aggressive cancer affecting both children and adults that is characterized by the uncontrolled growth of initiating cells in the blood and bone marrow. While initial treatments may be successful, a high percentage of pediatric patients (30-40%) relapse and die within five years of diagnosis. These cases are believed to occur as a result of a small population of pre-therapy cancer cells that are resistant to chemotherapy. Using new computational approaches, we have uncovered a way to find changes in the molecular profiles of these blood cells that predict which patients will likely die and which will survive following therapy. These molecular profiles result from a biological mechanism known as alternative splicing, which involves the rearrangement of gene sequences to produce proteins. To determine how splicing affects pediatric AML patient survival and to identify novel therapeutic targets, we aim to analyze the vast amount of data produced through the TARGET project. Future studies will aim to validate our findings in recruited pediatric patients and propose novel therapies. Although alternative splicing has been implicated in both primary and relapsed adult MDS and AML, the involvement of disrupted splicing pathways in pediatric AML has not been established. At Cincinnati Children's Hospital, we are committed to the discovery of drivers of pediatric leukemia and factors that impact survival. We have discovered that alternative splicing is a prominent feature of adult AML and hypothesize that novel or common splicing signatures exist that drive tumorigenesis and affect overall patient survival.
To test this hypothesis, we will: 1) apply our sensitive and accurate alternative splicing workflow to identify novel splicing signatures that arise in pediatric AML using a series of novel unsupervised clustering methods, 2) evaluate the association of these splicing signatures with patient survival, 3) associate specific mutations, including non-splicing-factor mutations, with the splicing signatures, and 4) extend these analyses to RNA-Seq datasets from AML patients with induction failure to evaluate the therapeutic effects of different chemotherapies on the gain or loss of these signatures as they relate to overall survival. Our current hypothesis is that mutations acting upstream of splicing control affect downstream pathways of alternative splicing that confer selective advantages on early AML blasts. To test these hypotheses, we will first align the RNA-Seq data using TopHat2 to discover known and novel splicing variants, obtain high-confidence splicing estimates using the software AltAnalyze, identify de novo splicing signatures using the unsupervised pattern discovery method ICGS in AltAnalyze, determine cancer somatic mutations from the RNA-Seq data for known AML-associated genes (TCGA) using the km software, and use survival analysis packages in R to determine the overall survival and treatment regimen associated with the distinct splicing factor signatures. Future focused studies will apply RNA-binding motif analysis to predict and eventually perturb specific splicing factors implicated in each observed signature. Any adult AML data will be evaluated independently until completion of these initial studies, at which time common splicing signatures will also be evaluated. Sanada, Masashi NAGOYA MEDICAL CENTER Genetic analysis of acute lymphoblastic leukemia Jan29, 2016 closed Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood.
The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemoresistant and their long-term survival remains poor. Moreover, it is difficult to predict the risk of relapse accurately. In this study, we will identify predictive biomarkers for relapse and their molecular mechanisms. Acute lymphoblastic leukemia (ALL) is the most common cancer in childhood. The survival rate of pediatric ALL has greatly increased over time, but relapsed cases are chemoresistant and their long-term outcome remains poor. Moreover, it is difficult to predict the risk of relapse accurately. In this study, we will perform whole-exome sequencing (WES), targeted sequencing, and RNA sequencing to identify the molecular mechanisms of relapse and predictive biomarkers for relapse. We would also like to analyze the genetic differences between cured and relapsed cases using ALL cohorts deposited in dbGaP, including WES, RNA-seq, and methylation data. The combined analysis of these data sets does not create any additional risk to participants. No request will be made for the identification of participants. We will publish and broadly share any findings from this project with the scientific community. Sanda, Takaomi NATIONAL UNIVERSITY OF SINGAPORE Analysis of oncogenic transcription factors in pediatric cancers Nov07, 2018 closed Cancer is a genetic disease caused by multiple abnormalities that affect the expression and/or function of genes that contribute to cancer formation. In pediatric cancers, many genetic abnormalities affect a specific type of protein called transcription factors, which regulate gene expression. We hypothesize that abnormal regulation of such factors may disrupt a program that is essential for normal cell function and instead induce a new mechanism that promotes cancer formation.
For this purpose, we have been studying several transcription factors that are overexpressed in acute lymphoblastic leukemia and neuroblastoma, two major types of pediatric cancer. In this project, we will analyze the expression of these factors as well as the genes regulated by them (downstream factors) in three cohorts of samples deposited in the dbGaP database. Transcription factor abnormalities are frequently observed in pediatric cancers including acute lymphoblastic leukemia and neuroblastoma. Many oncogenic transcription factors are abnormally expressed in these cancers. We hypothesize that deregulation of these genes alters transcriptional regulatory programs that are required for normal development and instead induces oncogenic machinery. For this purpose, we have been studying the roles of bHLH-type transcription factors and LIM-domain only proteins, which include TAL1, LMO1 and LMO2. In the proposed project, we aim to analyze the expression of these genes and their downstream targets in two independent cohorts of primary cancer samples (dbGaP IDs: phs001513 and phs000218). We will analyze each of these datasets independently and will not combine them. Also, we will not identify or access original patient information. Sandberg, Rickard LUDWIG INSTITUTE FOR CANCER RESEARCH/S B Identification and analysis of novel gene fusions in neuroblastoma Mar09, 2015 closed We intend to use the information in this data set to understand how fusions between different genes can affect the progress of neuroblastoma. This extensive data set is unique both in the number of patient samples and in the depth of the analysis performed. Thus, it would serve as a good starting point to find which gene fusions occur and how they affect the aggressiveness of neuroblastoma tumors. We hope that knowledge gained with the help of this data set will eventually further our understanding of the disease and our ability to fight it.
Fusion proteins play a critical role in cancer development. We are interested in discovering novel protein fusion events in neuroblastoma. This would serve as a starting point to examine the function of these potential cancer-driving fusion proteins both in vitro and in vivo. The TARGET Neuroblastoma dataset (accession number phs000218.v12.p2) contains more than 200 fully characterized neuroblastoma patient cases (matched normal/tumor pairs); whole-exome sequencing was performed in 221 cases and whole-genome sequencing in 18 cases; moreover, paired-end RNA-seq was carried out on 10 of the whole-genome sequenced cases. These whole-exome sequencing data, together with the paired-end RNA-seq data, will be ideal for fusion protein detection. Paired-end RNA-seq data will be analyzed using available bioinformatics tools such as TopHat-fusion and the R package Chimera. The whole-exome sequencing data will be analyzed with the subread algorithm. We will compare identified gene fusions with clinical/scientific data, including overall survival, event-free survival, MYCN amplification, other chromosomal rearrangements (e.g., 1p36 deletion), genome-wide patterns of expression, and epigenetic status. Our analysis will not create any additional risk for the participants of the study. Currently, we only plan to analyze this dataset, as it contains an extensive amount of sequencing data, sufficient for fusion protein detection in our projects. Savic, Daniel ST. JUDE CHILDREN'S RESEARCH HOSPITAL Gene regulatory effects of noncoding somatic variation Nov16, 2023 approved Although the spatial and temporal regulation of genes is critical for genome function and underlies diverse physiological and developmental processes, our overall understanding of how this activity is encoded in noncoding regulatory DNA sequences, and of how genetic alterations to these sequences impact cellular function, complex traits or disease, is still rudimentary.
We plan to use WGS data from TARGET ALL biospecimens to identify and functionally investigate the impact of DNA sequence variation at non-coding regulatory DNA sequences on gene regulation. My primary research focus involves studying gene regulation in the context of childhood acute lymphoblastic leukemia (ALL) to define the impact of cis-regulatory disruptions on chemotherapeutic drug resistance and treatment outcome in patients with ALL. My laboratory applies functional genomic and high-throughput approaches to identify and functionally characterize cis-regulatory elements (e.g., promoters and enhancers) and their associated noncoding DNA sequence variants. The long-term goal of my research effort is to gain a better understanding of the underlying causes of chemotherapy failure in ALL patients by identifying novel gene regulatory mechanisms affecting chemotherapeutic drug resistance and treatment outcome. We plan to use WGS data from TARGET ALL biospecimens to functionally evaluate the effects of identified noncoding genetic variants located at cis-regulatory elements on gene regulation, and on gene regulatory networks activated by diverse chemotherapeutic agents, using massively parallel reporter assays and CRISPR genome editing in ALL cell line and/or PDX models. We will map somatic variants from WGS to cis-regulatory elements using chromatin annotations we have generated in primary ALL cells from patients, PDXs and ALL cell lines. We will further determine potential effects on transcription factor (TF) binding events as well as effects on chemotherapeutic drug resistance. Overall, this analysis will provide a better understanding of the functional effects of noncoding somatic variation on ALL genome function and biology. sayoldin, Bahar LEIDOS BIOMEDICAL RESEARCH, INC. CCDI- DCC Jun29, 2023 approved I need access to these data sets to make the data submitted to CCDI findable and accessible through the NCI's Cloud Resources.
The objective of this project is to manage the scientific data. The Childhood Cancer Data Initiative is a scientific data processing resource and repository under the Cancer Research Data Commons. CCDI makes the study data available and accessible on the cloud. To perform the scientific management of these data on the cloud, CCDI needs to access the associated metadata. To design and develop technical and scientific administrative and managing capabilities for this data repository, we need access to these datasets and the SRA metadata. This list of datasets is expected to expand to include other data types in the future. Schaefer, Martin ISTITUTO EUROPEO DI ONCOLOGIA Mapping transcriptionally relevant promoter DNA methylation change in cancer Nov29, 2022 approved DNA methylation is a chemical modification to DNA that plays an important role in controlling gene activity. In cancer, DNA methylation is drastically altered and contributes to changes in gene activity in cancer cells. In fact, changes to DNA methylation may be one of the earliest events in the development of cancer. However, we still have a very limited understanding of exactly how DNA methylation controls gene expression, and consequently the effects of the very large number of DNA methylation changes typically present in cancer are unclear. We will combine gene expression data and DNA methylation data from normal and cancer cells to perform a comprehensive computational analysis to advance our understanding of how DNA methylation controls gene activity and how altered DNA methylation in cancer results in aberrant gene activity. Promoter DNA methylation has been recognized for decades as one of the major mechanisms of epigenetic regulation of gene expression and has consistently been found to be perturbed early during tumorigenesis.
However, exactly which alterations in promoter methylation play a causative role in cancer development and how these changes impact the cancer transcriptome remain poorly understood. We have noted that the size of designated promoter regions varies widely between different studies, ranging from several hundred to several thousand base pairs. In addition, we observed that correlations between promoter methylation and expression of the associated transcript are generally fairly low, regardless of the promoter definition used, with only a tiny minority being statistically significant. We aim to use raw RNA-seq data from healthy and tumour samples to quantify expression at the transcript level and combine it with WGBS data from the studies phs001648.v2.p1 and phs000218.v24.p8 to perform a comprehensive study of the relationship between methylation of individual CpG sites close to transcriptional start sites and the expression of the associated transcript. We will create, for the first time, a detailed map of the CpG sites most important to transcriptional activity, providing a vast improvement on the arbitrary promoter definitions that currently pervade DNA methylation research. We will subsequently examine how methylation at these sites is altered in cancer cells and how this in turn affects the cancer transcriptome, providing novel insights both into the basic function of DNA methylation and into the consequences of its perturbation in cancer. We believe that this could be particularly useful for childhood cancers, which often have few mutations and in which epigenetic changes are thus believed to be of increased importance in tumour development. Schaffer, Michael JOHNSON/JOHNSON/PHARM/RES/ DEVELOPMENT Genomic analysis of pediatric cancers Mar26, 2018 closed We propose to analyze the genomic and transcriptomic landscapes of several pediatric cancers and utilize this information to identify potential biomarkers and new therapeutic targets.
We will detect SNPs, indels, copy number alterations, and structural variations using whole genome or exome data. Expression of genes will be quantified using RNA-seq, and novel gene fusions will be detected by whole genome and transcriptome analysis. We will compare the genomic landscapes and expression profiles of adult vs pediatric cancers using data from TCGA as well as our internal cancer cell lines and studies. We propose to analyze the genomic and transcriptomic landscapes of several pediatric cancers. Raw sequencing data will be analyzed using internal pipelines in a secure computing environment. No attempts will be made to identify individual patients, and we anticipate no increased risk to participants. We will use bioinformatics software such as BWA, GATK, GEMINI, STAR, and Salmon to perform genomic analysis: we will detect SNVs, indels, copy number alterations, and structural variations using whole genome or exome data; expression of genes and isoforms will be quantified using RNA-seq; and novel gene fusions will be detected by whole genome and transcriptome analysis. We will compare the genomic landscapes and expression profiles of adult vs pediatric cancers using data from TCGA as well as our internal cancer cell lines and studies. These findings would be utilized to identify potential biomarkers and new therapeutic targets for pediatric cancers. Scheet, Paul UNIVERSITY OF TEXAS MD ANDERSON CAN CTR Whole exome analysis of severe osteonecrosis in young children with acute lymphoblastic leukemia (ALL) May04, 2011 closed Our goal is to use whole exome sequencing to determine whether there is an excess of genetic variants that increase risk for osteonecrosis (ON), a condition that limits dosing for patients undergoing treatment for childhood acute lymphoblastic leukemia (ALL).
Our approach is statistical; we will use prior knowledge of the biology of ALL to examine specific genes or biological pathways and compare the relative frequency of genetic variants in childhood ALL patients with severe ON to that in patients who did not develop ON. Although our sample size is small, we will perform our statistical testing in a staged manner to reflect our prior information. Our goal is to use whole exome sequencing to determine whether there is an excess of deleterious variants in specific genes or pathways, relative to the baseline frequency of inactivating variants across the exome, in a group of young children who develop severe ON, compared to a control group of unaffected individuals. Although our sample size is small, we will perform our statistical testing in a staged manner to reflect our prior information. The purpose of this dbGaP request is to gain access to protected whole genome or whole exome sequence data from children with ALL to serve as “no-ON” controls, providing analytical and statistical support for the NIH- and site-sponsored (St. Jude) research related to the dbGaP request entitled “Whole exome analysis of severe osteonecrosis in young children with acute lymphoblastic leukemia (ALL)” (PI: M. Relling, St. Jude Children’s Research Hospital). Because some germline polymorphisms differ in frequency between children with ALL and non-ALL controls (Trevino et al), the best control group for this project comprises children with ALL. Therefore, this proposal complies with the restriction that “requests for access to protected data will only be considered for those research projects using the data for research relevant to the biology, causes, treatment and late complications of treatment of pediatric cancers.
Access to protected pediatric data will be granted solely for those research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers.” Furthermore, we have several methods for future validation of possible hits from this project. There are over 2500 children enrolled on the COG’s AALL0232 study (most > 10 years of age), and we have germline DNA for all these children. In some subgroups of this cohort, the frequency of ON is 10-20%, and genes and pathways that predispose younger children to severe ON may harbor variants in older children as well. In addition, our colleagues at St. Jude have the only mouse model for glucocorticoid-induced ON (Yang et al), and we can test whether mice that are homozygous- or heterozygous-deficient for the most affected genes are more susceptible to the phenotype than their wild-type counterparts. We will use controls (genomically-defined whites) from at least two sources: (1) at least 20 ALL controls: patients with ALL whose DNA has been sequenced via whole exome or whole genome sequencing (average of 30X coverage) via COG’s TARGET project; and (2) the general population (available via public databases and via the PGRN; we presume that whole exome data from at least 100 normal control whites will likely be available from NHLBI and other funded projects). (1000 Genomes data will be less valuable to us because they are low-coverage only; several of our planned statistical evaluations rely on the presence of singletons, which this database is not powered to detect due to its low-coverage design.)
Assumptions:
+ Among 18,000 known genes, 15,000 genes will have 90% of their coding region investigated at high coverage.
+ A total of 5,500 genes are annotated to 1,030 canonical pathways (Reactome).
+ The baseline (in controls) is 10,000 protein-coding-changing variants per person (NS, stop, nonsense, and gene-inactivating indels).
+ Moreover, we assume that rare variants are more likely to be penetrant for this extreme phenotype; thus we assume that variants with MAF < 5% are most likely to be important for this phenotype, which occurs in ~2% of patients. We estimate that 25% of NS variants have MAF < 5%, or ~2,500 variants/person.
+ Using SIFT (http://sift.jcvi.org) and ALL controls, we estimated that 20% of all coding NS variants were predicted to be damaging to function.
+ Thus, we estimate that only ~500 low-frequency variants/person are substantially deleterious (NS with damaging effects, splice site, stop, or nonsense codons, and with MAF < 5%).
Nonetheless, it will be a challenge to discern an increase in deleterious variants that accumulate in (a) one gene or a gene family or (b) a biological pathway to a substantially higher level in the cases compared to the controls. Based on whole exome sequencing of 20 ON cases and assuming the same number of controls, we will assess the number of genomic variants [coding synonymous, coding nonsynonymous (benign, deleterious by SIFT, stop and nonsense variants), and insertions/deletions disrupting the gene] and annotate these variants to genes. We will assess whether any one gene (of the 18,000 interrogated) has an excess of deleterious variants in the group of 20 cases vs 20 controls. We will also assess whether any of several candidate pathways have an excess of deleterious variants in the group of 20 cases vs 20 controls.
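The per-person variant funnel in the assumptions above is simple arithmetic (10,000 × 25% ≈ 2,500 rare variants; 2,500 × 20% ≈ 500 damaging ones), and the gene-level comparison of 20 cases vs 20 controls needs some test statistic. The statement does not name the test, so the sketch below assumes a one-sided Fisher's exact test on carrier counts (individuals with at least one qualifying deleterious variant in a gene), which is one common choice for small case-control burden comparisons. The function names are hypothetical and the example counts are illustrative, not study data.

```python
from math import comb


def expected_rare_deleterious(baseline_variants=10_000, pct_rare=25, pct_damaging=20):
    """Per-person variant funnel from the stated assumptions (integer arithmetic)."""
    rare = baseline_variants * pct_rare // 100   # variants with MAF < 5%
    damaging = rare * pct_damaging // 100        # share predicted damaging by SIFT
    return rare, damaging


def fisher_one_sided(case_carriers, n_cases, ctrl_carriers, n_ctrls):
    """Hypothetical helper: upper-tail P(X >= case_carriers) for a 2x2 carrier table.

    X is hypergeometric with all margins fixed: the number of carriers that would
    fall among the cases if carrier status were independent of case/control status.
    """
    carriers = case_carriers + ctrl_carriers
    total = n_cases + n_ctrls
    denom = comb(total, carriers)
    p = 0.0
    for k in range(case_carriers, min(n_cases, carriers) + 1):
        # comb() returns 0 when carriers - k exceeds n_ctrls, so impossible tables add nothing
        p += comb(n_cases, k) * comb(n_ctrls, carriers - k) / denom
    return p


if __name__ == "__main__":
    print(expected_rare_deleterious())  # (2500, 500)
    # Illustrative only: 5 of 20 cases vs 0 of 20 controls carry a damaging variant.
    print(fisher_one_sided(5, 20, 0, 20))
```

Even an extreme 5-vs-0 split gives p ≈ 0.024 before any correction for the ~18,000 genes tested, which illustrates why the statement stresses staged testing guided by prior biological information.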
Scheurer, Michael BAYLOR COLLEGE OF MEDICINE The Role of Mutational Signatures in the Development of Childhood Cancer Apr29, 2020 closed Genetic mutations that an individual acquires over a lifetime are termed somatic mutations, as opposed to mutations that an individual is born with (i.e., germline mutations). Moreover, different mutational processes generate different combinations of somatic mutation types (e.g., point mutations and large-scale mutations), termed ‘signatures’. To date, twenty-one validated mutational signatures have been identified in 30 different types of cancer, predominantly in adults. In this proposed study, we aim to investigate the role of mutational signatures in the development of different types of childhood cancers in comparison with adults. In addition, we seek to determine whether certain inherited genetic variation may predispose an individual to acquire these mutational signatures. To this end, we propose to utilize data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET), Kids First, and The Cancer Genome Atlas (TCGA) projects to investigate several types of cancers such as sarcomas, leukemias, brain tumors, Wilms tumors, and neuroblastoma. Background: Cancer is a major cause of childhood mortality worldwide. The cancer incidence rate is approximately 150 cases per million children in the U.S.; leukemia is the most common malignancy, followed by central nervous system tumors (1). Despite improvements in the treatment of childhood cancer, survivors have an increased risk of developing secondary cancers, cardiovascular disease, and other chronic illnesses later in life, and thus poorer overall health status and quality of life compared with individuals without a cancer history (2).
Despite their prevalence and clinical importance, knowledge of the molecular characteristics of childhood cancers is limited (3), and few data are currently available on the integrative role of somatic and germline genetic variation, in concert with exposomic factors, in the development of childhood cancer (4, 5). Recently, using sequencing technology and algorithms that identify mutational signatures from catalogues of somatic mutations (6, 7), researchers identified twenty-one distinct validated mutational signatures, including substitution mutations and indels, in 30 different classes of cancer (8). However, the role of these validated signatures in the development of childhood cancer is largely unknown (9). Aims: The purpose of the proposed project is to investigate the importance of mutational signatures in the development of childhood malignancies compared with adult malignancies. By evaluating validated and new mutational signatures in a variety of childhood malignancies, we aim to compare somatic aberrations across different histological types of childhood cancer, as has been done for adult cancers. Additionally, we aim to examine the association between these validated somatic profiles and any significant germline variants identified in our separate genome-wide association studies of the same pediatric cancers. We also seek to investigate the prognostic characteristics of childhood malignancies in relation to mutational signatures. Methods: The prevalence of the validated mutational signatures will be examined across different histological types of childhood cancer. Moreover, the correlation between age at diagnosis and the number of mutations attributable to each signature in each sample will be investigated to identify signatures that correlate positively with age at diagnosis. P-values corrected for the false discovery rate (FDR) will be reported; details of the statistical analyses are available in Alexandrov LB et al.
(8). The association between validated mutational signatures and validated germline variants for each histological type will be examined, and odds ratios (ORs), 95% confidence intervals (CIs), and corresponding p-values will be reported. Overall survival rates will be compared between patients with lower vs. higher mutation rates. To conduct the proposed study, we seek to use raw genetic data generated from all currently sequenced paired tumor-normal cases available from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) (acute lymphoblastic leukemia, acute myeloid leukemia, kidney tumors, neuroblastoma, osteosarcoma), Kids First (neuroblastoma, sarcomas), and The Cancer Genome Atlas (TCGA) (BRCA, LUAD, LUSC, UCEC, GBM, HNSC, COAD, READ, BLCA, KIRC, OV, LAML, LGG, KIRP, SARC, KICH, DLBC). In addition, epidemiological data including age at diagnosis, age at death, sex, race/ethnicity, relapse, and malignancy subtype will be requested. Significance: Given the limited knowledge available on the molecular characteristics of childhood cancer, the proposed histology-specific studies of somatic mutational signatures will provide essential new knowledge of the somatic aberrations contributing to the development of childhood cancer. This will be a large collaborative study including data or samples from a number of large institutions and/or consortia. References 1. Steliarova-Foucher E, Lancet Oncol. 2017;18(6):719-31. 2. Phillips SM, Cancer Epidemiol Biomarkers Prev. 2015;24(4):653-63. 3. Guerreiro Stucklin AS, Curr Opin Pediatr. 2018;30(1):3-9. 4. Grobner SN, Nature. 2018;555(7696):321-7. 5. Pui CH, Nat Rev Clin Oncol. 2019;16(4):227-40. 6. Alexandrov LB, Cell Rep. 2013;3(1):246-59. 7. Nik-Zainal S, Cell. 2012;149(5):979-93. 8. Alexandrov LB, Nature. 2013;500(7463):415-21. 9. Ma X, Nature. 2018;555(7696):371-6.
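The Methods above call for FDR-corrected p-values, and for ORs with 95% CIs and p-values. The proposal defers statistical details to Alexandrov et al. (8), so the following Python sketch is only an illustration under assumed choices. It uses the Benjamini–Hochberg step-up procedure for the FDR correction. For a 2×2 table of signature status by germline carrier status, it uses a Woolf (log-odds) confidence interval with a Wald p-value. The function names, table layout, and example numbers are hypothetical.

```python
import math


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def odds_ratio_ci(a: float, b: float, c: float, d: float,
                  z: float = 1.959963984540054) -> tuple[float, float, float, float]:
    """OR, 95% CI (Woolf/logit method), and two-sided Wald p-value.

    Hypothetical 2x2 layout:
        a = signature-positive carriers,  b = signature-positive non-carriers
        c = signature-negative carriers,  d = signature-negative non-carriers
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid log(0).
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return math.exp(log_or), lower, upper, p_value


# Hypothetical p-values, e.g. from per-signature age-at-diagnosis correlation tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))

or_, lo, hi, p = odds_ratio_ci(10, 20, 5, 40)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), p = {p:.3f}")
```

Because the Woolf interval is symmetric on the log scale, the product of its bounds equals the squared OR. This gives a quick sanity check on any reported interval.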
Scheurer, Michael BAYLOR COLLEGE OF MEDICINE The role of genetic ancestry on treatment outcomes and survival in children with childhood cancers Jan06, 2021 closed Despite great advances in the treatment of childhood cancers, poorer survival rates have been observed among Hispanic and Black children with these diseases. The causes of these survival disparities are likely multifactorial. Genetic ancestry has been implicated as a possible cause of these survival disparities in childhood acute lymphoblastic leukemia, but its role has not been explored in other childhood cancers. Using genetic data in the TARGET dataset, we propose in this application to determine the genetic ancestry of patients in several TARGET datasets and explore the role of genetic ancestry in the risk of relapse, treatment outcomes, and survival for several childhood cancers. Over the past several decades, there have been great advances in the treatment of several childhood cancers. Despite this accomplishment, disparities in survival exist. For example, while overall survival rates for childhood acute lymphoblastic leukemia (ALL) are approximately 90%, children of Hispanic ethnicity carry the greatest burden with regard to both incidence of ALL and poorer survival when compared with non-Hispanic white (NHW) children. Even with intensified chemotherapy regimens, improved risk stratification, and enhanced supportive care, current overall survival for pediatric acute myeloid leukemia (AML) lags far behind that of ALL, approaching only 70%. Ethnic and racial disparities in survival outcomes have been less studied in childhood AML and in other tumors such as bone sarcomas and high-risk neuroblastoma. While these disparities have been well documented, the underlying reasons are unclear. Genetic ancestry has been implicated as a possible cause of the disparities in survival outcomes seen in childhood ALL.
For instance, genetic variation that co-segregates with Native American ancestry, which is associated with Hispanic ethnicity, has been implicated in ALL relapse. However, the relationship between genetic ancestry and risk of relapse has not been explored in other cancers. The aim of our study is to explore the role of genetic ancestry in risk of relapse, treatment outcomes, and survival among children with several childhood cancers, including AML, neuroblastoma, and osteosarcoma. The TARGET (Therapeutically Applicable Research to Generate Effective Treatments) projects contain genomic profiles of samples from several childhood cancers. We propose to leverage the genotype data of normal tissue available from the copy number array dataset within TARGET to determine genetic ancestry and explore its role in survival in the following tumors: ALL, AML, neuroblastoma, and osteosarcoma. This line of research will begin to unravel the biological factors that contribute to the poorer outcomes persistently observed among Hispanic and Black children diagnosed with childhood cancer. Schiffman, Joshua UNIVERSITY OF UTAH Focal 22q11.22 deletions combined with IKZF1 alterations are associated with worse clinical outcome in acute lymphoblastic leukemia Nov03, 2017 closed People with cancer often have a missing or extra copy of certain parts of their genome, as well as other deviations. Knowing whether abnormalities in some genes are more prevalent in some patients than in others could help guide treatment. Finding any such deviation would allow further research into the genes of interest as potential key players in the development of cancer. Prognostic biomarkers in childhood acute lymphoblastic leukemia (ALL) are vital for risk stratification and for intensifying therapy in children at high risk of remission induction failure or relapse.
Copy number alterations in genes such as IKZF1 and VPREB1 have been shown to correlate with poor outcome in ALL, highlighting genetic alterations as prognostic markers (NEJM 360:470, 2009; Leukemia 28(1):216-20, 2014). A second focal deletion in chromosome 22q11.22, 200 kilobases (kb) in length, occurs more frequently, lies in the same IGLL region as VPREB1, and is distinct from deletions associated with physiologic IGLL rearrangement. Our aims are to further investigate this novel genomic lesion (the 22q11.22 deletion), how often it co-occurs with IKZF1 alterations, and its prognostic impact. We hypothesize that double deletion of IKZF1 and the 22q11.22 region is associated with a statistically significant increased risk of relapse that is independent of other known risk factors in high-risk (HR) ALL. If confirmed, this finding would support the inclusion of IKZF1 and 22q11.22 deletion testing in upcoming COG ALL treatment protocols. Hence, we are requesting access to the copy number data available for the TARGET ALL Phase 1 and 2 datasets in dbGaP to explore our question. We intend to publish the prevalence of copy number alterations in a specific set of genes thought to identify a group of patients with a very poor outcome. We will use the data within dbGaP as a validation cohort for our current findings from five different cohorts of independent patients. Specifically, our analyses will assess the genome-wide prevalence of: i) copy number loss and ii) copy number gain. Schultz, Nikolaus SLOAN-KETTERING INST CAN RESEARCH Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Dec06, 2012 closed We are interested in understanding the genomic causes of pediatric cancers at the gene and pathway level. We have previously developed multiple novel tools that identify recurrently altered pathways. These methods have been successfully used in multiple projects of The Cancer Genome Atlas (TCGA), which focuses on adult cancers.
We now plan to apply these methods to the childhood cancers analyzed by the TARGET Project. Furthermore, we plan to identify commonalities and differences among the different pediatric cancers, with the ultimate goal of identifying targeted treatment options for patients with specific genomic alterations, independent of cancer type. We are interested in understanding the genomic causes of pediatric cancers at the gene and pathway level. We have previously developed multiple novel tools that identify recurrently altered pathways in adult cancers (NetBox and MEMo). These methods use somatic mutation data, somatic copy-number changes, mRNA expression levels, and DNA methylation measurements. They have been successfully applied to all projects of The Cancer Genome Atlas (TCGA). We now plan to apply these methods to the childhood cancers analyzed by the TARGET Project. Furthermore, we plan to identify commonalities and differences among the different pediatric cancers, with the ultimate goal of identifying targeted treatment options for patients with specific genomic alterations, independent of cancer type. This work will be shared in public scientific forums or published, and all rules regarding germline data will be followed. SEIDMAN, JONATHAN HARVARD MEDICAL SCHOOL Germline mutations that exacerbate chemotherapeutic induction of heart disease May31, 2017 closed Many cancer patients, both pediatric and adult, receive anthracycline treatment (e.g., doxorubicin, also known as Adriamycin) to reduce their cancer burden. A small fraction of these patients develop doxorubicin-induced cardiomyopathy, a serious heart disease that can lead to death. We have found that adult cancer patients who develop doxorubicin-induced cardiomyopathy were born with mutations in genes (unrelated to their cancer) that predispose them to heart disease.
By studying the DNA sequences of pediatric cancer patients treated with doxorubicin who did and did not develop heart disease, we will learn whether pediatric cancer patients, like adult cancer patients, are predisposed to develop heart disease, that is, whether pediatric patients with DCM (dilated cardiomyopathy) mutations are more likely to develop cardiomyopathy than patients who do not have DCM mutations. If germline mutations predict the susceptibility of pediatric cancer patients to heart disease, this discovery will lead to important changes in the chemotherapy appropriate for these children. We propose to use the TARGET germline DNA sequence data to understand the genetic drivers of cardiotoxicity in pediatric AML patients. We have shown that adults with breast cancer treated with anthracyclines develop cardiomyopathy (a life-threatening heart disease) if they have mutations in specific genes that predispose to heart disease. We will test the hypothesis that mutations in the same genes predispose children to heart disease. Specifically, we will screen for recurrent germline missense and frameshift mutations in genes known to be associated with cardiomyopathy in the general population, with the hypothesis that patients carrying such a mutation will be at increased risk of anthracycline-associated cardiac toxicity. Data on cardiac toxicity events will be obtained from the Children’s Oncology Group. Dr. Aplenc, the PI for this data request, was the study chair for the AAML0531 trial and has actively curated the reported cardiac toxicities on that trial, including assembling a data set of approximately 1,700 shortening fraction/ejection fraction data points from patients with reported cardiac toxicity on AAML0531. Recognizing that children with specific germline mutations require a different chemotherapeutic regimen than other children will change cancer treatment in these children. The data requested here will not be combined with other datasets.
The proposed studies will not produce any increased risk for participants in this study. Sekiguchi, Masahiro UNIVERSITY OF TOKYO The role of ETV6 alteration as a leukemic driver in acute myeloid leukemia Mar04, 2021 expired ETV6 is a protein that plays a key role in the formation of blood cells. Alterations of the ETV6 gene are rare but recurrent in acute myeloid leukemia (AML) in children. However, how ETV6 alterations cause leukemia remains unclear. Thus, we plan to use the TARGET AML data, which include RNA sequencing and whole-genome sequencing data from a large cohort of AML. This will enable us to select and analyze AML cases with ETV6 alterations and to address how the alterations affect the expression profile of the tumor. ETV6 is one of the most important transcription factors involved in hematopoiesis. ETV6 gene alterations are rarely, but recurrently, observed in pediatric cases of acute myeloid leukemia (AML). However, how ETV6 alterations act as a leukemic driver in AML remains unclear. Thus, we plan to use the TARGET AML data, which include RNA sequencing and whole-genome sequencing data from a large cohort of pediatric AML. This will enable us to select and analyze AML cases with ETV6 alterations and to address how the alterations affect the expression profile of the tumor. Sese, Jun NAT'L INST ADVANCED INDUSTRIAL/SCI/TECH Identifying subtypes of pediatric diseases associated with biomarkers from pediatric clinical NGS Dec28, 2015 closed We propose to use the TARGET data to find new therapeutic targets by learning more about germline and somatic genetic changes in the genomes of pediatric cancers. Our primary target is neuroblastoma, which has a wide range of clinical presentations and responses to treatment. Currently, it is difficult to distinguish, at the time of diagnosis, patients who will be cured by treatment from those who will not.
Therefore, it is important to find subtypes of the cancer and to associate each subtype with effective therapies. For this purpose, we will detect disease subtypes from genomic sequences and gene expression, and predict molecular markers of the subtypes. This study will help to identify candidates for surveillance to enable early diagnosis, as well as more informed therapeutics. Because genomic variations have a large effect on disease development and progression in pediatric diseases, genome-wide association studies (GWAS) have been applied to characterise the causal mutations and molecular mechanisms of these diseases. However, in neuroblastoma, for example, it is still difficult to predict the response to a selected chemotherapy. In this research, we use synthetic lethal (SL) experiments, a promising strategy for cancer therapy, to find novel therapeutic target genes, because the SL approach identified the poly ADP-ribose polymerase (PARP) inhibitor olaparib for treating BRCA-deficient ovarian cancer. To select target genes for neuroblastoma, we will integrate SNPs, copy number variations, gene expression profiles, and clinical outcomes in the TARGET data and detect subgroups of patients. We will then find synthetic lethal gene combinations for each subtype. These genes could lead to the development of novel therapies and improve our understanding of neuroblastoma. Sexl, Veronika MEDICAL UNIVERSITY OF VIENNA Novel role of CDK6 as transcriptional regulator in ALL Apr25, 2016 rejected We intend to understand the processes that drive leukemia formation in children. We have found that one molecule that drives cell growth has unexpected functions that may be highly relevant to leukemia in children. CDK6 is particularly highly expressed in one form of leukemia that currently carries a dismal prognosis.
We now want to better understand exactly what CDK6 does in childhood leukemia in order to develop novel therapeutic concepts. We are convinced that we can exploit this novel role of CDK6 to improve therapeutic interventions. The G1 cell cycle kinase CDK6 has long been regarded as a redundant homologue of CDK4. Although the two kinases fulfill very similar roles in cell cycle progression, they differ in their tissue-specific functions and contributions to tumor development (Tigan et al., 2015). Recently, CDK6 has been shown to be directly involved in transcription in leukemic cells and in hematopoietic stem cells, where it associates with transcription factors at multiple locations in the genome (Scheicher et al., 2015; Kollmann et al., 2012; Handschick et al., 2014). These novel transcriptional functions of CDK6 are partially kinase-independent and are not shared with CDK4. The vast majority of our studies have been performed using Bcr/Abl as the driving oncogene. The Bcr/Abl p185 oncogene transforms lymphoid cells and induces a disease that resembles B-ALL. It arises from a chromosomal translocation found in pediatric leukemia (ALL), where it is associated with a poor prognosis. Interestingly, CDK6-/- mice display significantly prolonged disease latency in a Bcr/Abl-induced ALL model. The delayed disease onset and mitigated progression in CDK6-/- mice were accompanied by changes in a CDK6-dependent transcriptional program within the leukemic cells, and we confirmed the presence of CDK6 at promoter sites of genes essential for leukemogenesis by ChIP and ChIP-seq experiments. We now need to show the relevance of our findings in the human system. Therefore, we intend to study RNA-sequencing data from ALL patients to stratify cohorts according to CDK6 expression and Bcr/Abl status and to validate target genes involved in leukemogenesis.
This procedure will allow us to (i) correlate CDK6 expression with clinical outcome, (ii) define a set of CDK6 co-regulated genes, and (iii) screen for mutations in CDK6 and its regulated genes. We fully respect the limitations associated with the use of TARGET datasets. Genes identified in TARGET will guide further experimental strategies and will be validated in in vivo mouse leukemia models. In this way, our studies will be restricted to genes relevant to human patients. SHAHANI, SHILPA BECKMAN RESEARCH INSTITUTE/CITY OF HOPE Incidence of germline cancer mutations in pediatric acute myeloid leukemia (AML) Nov12, 2021 expired Technical advances have improved our ability to detect and recognize pathogenic germline mutations, leading to an increase in the reported incidence of germline cancer mutations among pediatric patients with cancer. This inquiry aims to re-assess the incidence of germline cancer mutations specifically in pediatric AML, identify clinical characteristics associated with germline pathogenic mutations, and assess the impact of germline mutations on survival. We propose to use the TARGET dbGaP data to investigate: a) the incidence and types of inherited (germline) cancer mutations in pediatric AML; b) the frequency with which pathogenic mutations identified in somatic tissue are also present in germline tissue, stratified by mutation; c) whether clinical characteristics are enriched among patients with germline cancer mutations (e.g., higher incidence of chloromas or CNS involvement) for the most commonly identified mutations; and d) survival outcomes (event-free survival (EFS) and overall survival (OS)) in those with germline cancer mutations compared with those without. We will interrogate DNA sequencing of normal tissue samples from pediatric patients diagnosed with AML to determine the incidence of germline cancer mutations.
Concurrently, we will assess somatic gene expression, copy number, miRNA, sequencing, and methylation data to characterize tumor tissue; we will match somatic and germline results to determine whether a somatic signature can suggest the presence of an underlying germline mutation in AML. Additionally, we will assess the somatic-germline concordance of pathogenic mutations, stratified by mutation, which can help inform the decision to pursue germline testing based on a somatic mutation profile. The requested data set contains clinical data, including EFS and OS; normal DNA samples; and tumor samples with somatic gene expression, copy number, miRNA, sequencing, and methylation data. Review of the open dbGaP database reveals 218 of 476 records with a diagnostic tumor sample, matched normal DNA, and full clinical data. With a conservative estimate of 10% incidence of germline cancer mutations, analysis of this dataset is expected to identify about 22 subjects (10% of 218). This will be insufficient for statistical conclusions, but these pilot data will support a larger-scale inquiry of an expanded Children’s Oncology Group dataset. Additional metrics we plan to explore in a larger cohort include clinical patterns that may indicate a germline mutation, the sensitivity of pedigree analysis for germline genetic testing, and clinical outcomes such as toxicity, EFS, and OS in those with germline cancer mutations compared with those without recognized mutations. Sharp, Phillip MASSACHUSETTS INSTITUTE OF TECHNOLOGY Investigation of splicing change in neuroblastoma Jul11, 2013 closed Most human genes are spliced, and mutations that disrupt splicing are a major cause of human disease, including cancer. Neuroblastoma is one of the most common pediatric cancers, yet the role of splicing regulation in its initiation and progression is not well studied. In this project, we propose to study the role of splicing in neuroblastoma.
The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative has generated matched genome and transcriptome sequences for multiple neuroblastoma patients. In this project, we propose to analyze the genome data to identify genetic changes in the splicing machinery and the transcriptome data to measure global changes in splicing patterns, and to connect these changes to the neuroblastoma phenotype. Sheffield, Nathan UNIVERSITY OF VIRGINIA Identifying variants of regulatory significance in pediatric cancer May25, 2018 expired Tumor DNA from pediatric cancers has been sequenced to identify genes that are mutated in cancer. These studies have already identified several genes that are recurrently mutated in these cancers. However, most of these studies have focused on protein-coding DNA, whereas most of the genome is not protein-coding, leaving much of the genome under-analyzed. In this project, we will focus specifically on non-protein-coding mutations. By combining data from multiple sequencing studies with new analytical methods, we will boost statistical power to identify rare but significant associations outside of protein-coding DNA. Our goal is to identify regulatory variants and gene pathways that contribute to either susceptibility or progression in rare pediatric cancers. We are especially interested in analyzing variants found in regulatory regions of particular relevance to a given biological system, but we will also explore protein-coding variants and their relationships among gene pathways. We are primarily interested in Ewing sarcoma, but will also seek to understand variants in other pediatric cancers, including rhabdomyosarcoma and osteosarcoma. This pan-cancer approach will allow us to compare and contrast the genetic signatures of these related cancers.
We will use these data to study regulatory variants that contribute to Ewing sarcoma risk, as well as regulatory variants that accompany Ewing sarcoma carcinogenesis. We plan to collect all available Ewing sarcoma genomic data for a combined meta-analysis. By combining datasets from different sequencing studies, we will increase statistical power to identify rare genomic variants. We also intend to integrate these data with other epigenomic data from both cancer and normal tissues as a way of annotating genetic variants. We will use these data to assess genetic variation, particularly with regulatory or pathway-based approaches. Our approach will require the use of pediatric data from various cancers. We do not anticipate that our data integration will increase risk to any participants. Shekar, Mamatha ILLUMINA, INC. Integration of genomic study results of childhood cancers with orthogonal data types Mar29, 2011 closed Our goal is to identify genomic factors associated with childhood cancers through meta-analysis of all publicly available data from genome-wide scans. The collective power of meta-analysis will generate a comprehensive list of genomic factors associated with childhood cancers, which will enhance researchers’ understanding of these forms of cancer. We request access to the dbGaP datasets studying childhood cancers. These datasets will be integrated into the Illumina/NextBio platforms, which provide analytic tools for correlation, association, and meta-analysis of molecular and clinical data. Comprehensive integration of molecular and clinical data from dbGaP with other public/controlled-access resources such as GEO, ArrayExpress, caBIG, TCGA, ICGC, EGA, etc. will provide large cohorts for analysis. Advanced analysis will enable discovery of new relationships between genomic elements and phenotypes in real time. Illumina/NextBio will abide by all use restrictions for the datasets.
The raw data will be used for internal analysis and will be accessible only to the PI. We will provide limited processed molecular information according to the standards used in the dbGaP Genome Browser, such as chromosomal position, genes in nearby regions, and the p-value associated with a genomic element. Our platform usage terms and conditions state that this information may not be used to determine the study participants' identities. All data will be kept private and protected by intrusion detection systems, SSL encryption, IP-level restrictions, and proprietary security products. The raw data, including backups, will be destroyed after the access period is over. We will acknowledge the contributing investigators on our website and in oral and written presentations, disclosures, and publications resulting from any analyses of the data. Per our standard policy, we will include the dbGaP accession information with the relevant version number of the analyzed dataset(s). Our goal is to enable identification of genomic factors associated with characteristics or attributes of childhood cancers through meta-analysis of all publicly available genome-wide scans of individual patient samples. Shi, Kevin STRAND THERAPEUTICS INC Cell Type Sensing via miRNAs for Targeted mRNA Therapeutic Payload Expression Jan19, 2023 approved Strand Therapeutics is developing next-generation mRNA therapies for treating cancer. When these therapies are given to a patient, a key concern is ensuring that the dose is high enough to kill the cancer cells but low enough to avoid toxic, chemotherapy-like side effects. Strand’s mRNA therapeutics aim to achieve this by incorporating “circuit” elements that detect markers inside the cells they enter, producing the cancer-killing payload only in cancer cells and safely ignoring off-target cells. The proposed research seeks to build these cancer and off-target cell type marker profiles.
The TCGA and TARGET datasets contain a wealth of RNA sequencing information from patients with the cancer types that Strand is interested in treating. Some of this information is available in unrestricted databases, but the more sensitive raw data, which require careful handling to protect the patients they came from, are essential for updated, comprehensive, and consistent determination of cancer and off-target markers. This is crucial for developing robust circuits and drugs: those that work broadly in different patients and tumors, and at different stages of disease and treatment. Strand Therapeutics is developing programmable mRNA-based therapeutics that precisely express payloads in targeted cell types to improve efficacy and reduce toxicity. One proposed mechanism for detecting cell types is miRNA-mediated mRNA degradation/translational inhibition. Different cell types, including cancer cells, express different miRNA profiles, which can be differentially detected by synthetic biology circuits on delivered mRNA strands to ultimately effect different responses. The objective of the proposed research is to build in-depth miRNA profiles across the breadth of patients and cell types in the cancer indications in the TCGA and TARGET studies. mRNA profiles provide further cellular context for examining interactions with delivered mRNA therapeutics. Access to the raw underlying reads is crucial for miRNA and mRNA profiling using the most up-to-date data processing pipelines, more comprehensive miRNA databases and reference transcriptomes, and consistent comparisons with other data sources (internal and public) for which raw reads are available. This work does propose to combine the data in dbGaP with other datasets for analysis, but primarily to examine population and sample-source variability. This is crucial for developing robust circuits and drugs: those that work broadly in different patients and tumors, and at different stages of disease and treatment.
For the TARGET dataset specifically, we plan to use miRNA and mRNA profile data to design and validate circuits that target and detarget relevant cell types in childhood leukemias, with the goal of making mRNA therapeutics that transfect T cells to express chimeric antigen receptors (CARs) in vivo, without immunodepletion or an ex vivo production process. We do not anticipate additional risks to any participant because no further information is sought or collected for any individual (beyond what is available in the TCGA and TARGET datasets). This overall study proposes to examine RNA-seq data associated with cancer indications of interest in tumor and healthy tissue, process raw reads into transcript counts, and perform statistical analyses and modeling to examine behavior and inform mRNA therapeutic designs. In the analysis, differential expression of transcripts among cell types of interest will inform detector circuit designs, which will then be integrated into broader mRNA strand designs. Computational pipeline development and validation are anticipated but are not the primary purpose of the study. miRNA and mRNA profiles will be used to predict and validate desired on-target and off-target circuit behaviors in the cell types covered by the cancer types in the TCGA and TARGET studies, as these cover indications Strand is pursuing. Shlien, Adam HOSPITAL FOR SICK CHLDRN (TORONTO) Kids Cancer Sequencing Program (KiCS) Feb12, 2020 approved Childhood cancer is the leading cause of disease-related death among children in North America. With the majority of cancer research conducted in adults, the mechanisms and timing of genomic progression in childhood cancers are poorly understood, which negatively impacts patient care and hampers prediction of how tumors evolve, respond to therapy, and recur. To increase our understanding of childhood cancers, our research focuses on mutations occurring in pediatric cancers, which we have shown are distinct from those in adult cancers.
By analyzing these pediatric cancer-specific datasets using our clinical and translational laboratory’s sophisticated algorithms, we aim to uncover novel patterns, structural variations, mutational signatures, and drivers of childhood cancer to help improve diagnoses, prognoses, and therapeutic approaches. Our research focuses on mutagenic processes in pediatric cancers, the leading cause of disease-related death among children past infancy in North America. With the majority of research conducted in adults, the mechanisms and timing of mutagenesis in childhood cancers are poorly understood, which negatively impacts patient care by making it difficult to predict how a tumor evolves, how it responds to therapy, and whether it will recur. While cure rates continue to improve for some pediatric cancers, outcomes for young patients with metastatic, refractory, or relapsed disease are poor. Long-term survival is <10% beyond two years, and many childhood cancer survivors suffer long-term consequences of therapy, e.g., relapsed or secondary cancers. We have shown that childhood cancers, such as Ewing sarcoma and osteosarcoma, are distinct from common adult cancers, with different drivers and signatures. Yet the majority of newly discovered pediatric cancer drivers and subtypes come from studies of samples taken at initial diagnosis, so knowledge of intrinsic or therapy-influenced pediatric malignant evolution remains incomplete. The processes underlying mutational signatures also remain largely unknown (>50% of unknown etiology). To resolve these gaps, we have analyzed the TARGET dataset on our secure, protected internal high-performance computing cluster, independently of other datasets (e.g., TCGA, which is adult-based), consistent with TARGET data use limitations. Following initial analyses, the project has been expanded and other datasets analyzed, for example, to examine the differences in RNA-seq results from paraffin-embedded compared with fresh-frozen samples.
A new aim within the scope of the project takes a deeper look at sarcoma and Ewing sarcoma. Overall, our purpose is to examine pediatric cancers at diagnosis, progression, and recurrence using our established pipelines and algorithms, through analyses of mutational signatures, events (from single substitutions to large rearrangements) and event rates, investigations of chromatin instability and open chromatin ranges, and the transcriptional consequences of somatic mutations. The data will not be used for method development; however, they may be analyzed using our ever-learning neural network algorithms, which improve in clustering accuracy with additional data. In addition, we aim to translate cancer genomics outcomes to pediatric oncology clinics, emphasizing integrative analyses, by providing insights that could aid in diagnosis, prognosis, or therapeutic approaches. Findings will be shared through scientific conference presentations and publications in peer-reviewed journals. Shmygelska, Alena HUMAN LONGEVITY, INC. Characterization of cancer drivers in adult and pediatric tumors. Jan26, 2018 closed Pediatric and adult cancer progression is characterized by the accumulation of driver and passenger mutations. Driver mutations are those that provide a selective advantage to cancer cells and are of primary importance for understanding cancer prognosis and targeted therapies. Currently, we need better methods both for personalized driver detection in cancer and for building better classifiers that elucidate cancer biology. We propose to use dbGaP data to address this need: to develop and test new algorithms and publish our findings, advancing our understanding of cancer progression and leading to better targeted therapies for childhood and adult cancers. We request access to three independent datasets to develop and test methods for discovering cancer driver mechanisms independently in each setting.
Introduction: The oncology program at Human Longevity Inc (HLI) is dedicated to comprehensive genomic analysis of cancer cases. Our evaluation includes: full genome sequencing of the germline and tumor genomes to identify inherited cancer susceptibility alleles and somatic variation in the tumor; RNA sequencing for validation of genomic results and tumor characterization; and an integrated analysis of the genomic information in the context of the patient’s cancer type and treatment history. Objectives: We strive to advance precision oncology through multiple lines of inquiry into the biology of adult and pediatric tumors. In this context, we are working on machine learning methods to refine models for cancer driver prediction in both adult and pediatric tumors, identification of the tissue of origin for metastatic tumors, and microsatellite instability subclass prediction for tumors. Study designs: TCGA dataset: We propose to (1) characterize cancer drivers in each TCGA cohort using novel models of background mutation that take into account sequence and structure conservation, (2) develop accurate machine learning methods for tumor subtype classification, focusing on the analysis of features defining tissue of origin, and (3) increase the accuracy of and validate methods for microsatellite instability classification (this is especially relevant for uterine, colorectal, endometrial and stomach cancers). Foundation One dataset: We propose to (1) characterize cancer drivers in Foundation One cancer cohorts using novel models for pathogenicity detection in the gene panel setting, and (2) assess microsatellite status prediction algorithms in the panel and exome settings. TARGET dataset: We propose to (1) investigate and characterize structural and copy number cancer drivers in the TARGET cohorts using orthogonal approaches, and (2) benchmark structural variant callers with the goal of developing better methods.
All of the dbGaP datasets will be used independently; data will not be combined. Algorithms and findings will be made publicly available and published in scientific journals. Shoemaker, Robert IGNYTA, INC. Using TARGET data to define prevalence of oncogenic drivers in pediatric cancers Aug28, 2017 closed Ignyta develops targeted therapeutics to treat cancer in patients whose tumors harbor specific genetic alterations, namely gene fusions. These genetic alterations are relatively rare events; thus, identifying the subset of patients most likely to benefit from targeted treatments is challenging. A more complete understanding of the frequency of these alterations across different cancer types will enable us to narrow the search and position our assets for maximum impact in the cancer community. We propose that access to a large, well-defined molecular dataset, such as the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative, would enable us to target patients more precisely and provide additional treatment options for pediatric cancer patients. Ignyta, a precision medicine oncology company, has an asset, entrectinib, which has shown promising activity in Phase I (79% ORR, n=24) and continued promise in ongoing Phase II clinical trials in adult patients with extracranial tumors harboring target gene fusions involving the ROS1, NTRK1/2/3, or ALK genes. A separate clinical trial is underway to evaluate entrectinib in children and adolescents with recurrent or refractory solid tumors that either harbor target gene fusions or may be driven by NTRK expression (e.g., NTRK2 expression in neuroblastoma). Under compassionate use, an 18-month-old infant with ETV6-NTRK3 fusion-positive congenital fibrosarcoma was treated with entrectinib, resulting in a confirmed RECIST response accompanied by improvement in disease sequelae.
The Ignyta research team has also published preclinical data supporting potent anti-tumor activity of entrectinib both in vitro and in vivo in NTRK fusion-driven AML models. Because entrectinib’s targets are rare, it is critical to define the frequency of these fusions across a spectrum of cancer types and to identify the populations most likely to benefit from treatment. The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative is one of the richest sources of molecular data available to the pediatric oncology research community. Unfortunately, the publicly available processed TARGET data provide limited ability to detect gene fusion events. Our computational biology team has developed a robust and sensitive method that identifies gene fusions relevant to our target molecules using raw RNA sequence files as input. We are requesting access to (i) TARGET’s raw RNA sequencing data and (ii) TARGET’s raw exome sequencing data to screen more than 4,000 pediatric cancer specimens. These data will enable us to identify gene fusions and gene expression targeted by Ignyta’s compounds and to determine whether other known oncogenic alterations co-occur in these samples. The results of this study will be used to drive current and future clinical programs to better identify and serve pediatric cancer patients who could potentially benefit from our precision therapeutics. SIDOW, AREND STANFORD UNIVERSITY Comprehensive subtyping of pediatric tumors Jul25, 2018 closed Pediatric cancers can lead to widely different outcomes in patients because they differ greatly from each other at the level of DNA. Therefore, we need to separate pediatric cancers into subtypes based on the features of their DNA. Here, we aim to use novel computational methods on many different types of genetic data collected by the TARGET project in order to identify subtypes of pediatric cancers that differ from each other in origin and behavior.
This will help us understand the unique evolution of these cancers and may also help us better predict the course of pediatric cancer in individual patients, hopefully leading to better treatment. Survival outcomes for pediatric cancer patients can vary greatly even among tumors arising in the same tissue. To achieve effective personalized treatment for these cancers, we need to separate them into subtypes based on their molecular features. In this project, we will exploit the large amounts of multidimensional genomic ('multi-omic') data on pediatric cancers recently produced by the TARGET project. By applying recently developed integrative clustering approaches to this dataset, we will identify distinct subtypes of pediatric tumors based on integrated patterns of gene expression, point mutations, and copy number changes. Using clinical data, we will test whether these subtypes differ in patient survival. Our aim is to identify subtypes of pediatric cancers, leading to an improved understanding of the unique evolution of these cancers and informing the classification and prognosis of pediatric cancer patients. This work is therefore unique to pediatric cancers and cannot be carried out using datasets from adult patients. We do not plan to combine these data with other datasets. Sinha, Amit BASEPAIR, LLC Understanding the molecular basis of Osteosarcoma tumorigenesis Oct10, 2018 closed Osteosarcoma, a rare but deadly childhood cancer, is generally associated with massive genomic instability, which makes it difficult to identify driver genes. We aim to use the vast information stored in TARGET to confirm the oncogenes, tumor suppressors, and molecular pathways identified in our model-organism tumor models as molecular targets for therapy.
We will test this hypothesis and simultaneously conduct student-training exercises by re-examining raw TARGET RNA sequencing, methylation, mutation, and microRNA data and performing a broader analysis of perturbed genes. We will also attempt to correlate gene expression levels with other quantifiable genetic markers. The intended use is purely academic, and key findings will be submitted for peer review and ultimately publication. 1. Objective: Our project aims to understand the underlying molecular mechanisms of osteosarcoma in order to develop better therapeutics. To accomplish this, we are searching for molecular targets by identifying oncogenes and tumor suppressors and by evaluating the presence of novel transcripts in osteosarcoma. This involves searches of RNA-seq, mutation, copy number, and methylation data. We would like to search TARGET data to confirm predicted genes and pathways involved in osteosarcoma development, for follow-up experiments and the identification of molecular targets. We will test this hypothesis and simultaneously conduct student-training exercises by re-examining raw TARGET RNA sequencing data, re-determining the mutation calls, and performing a broader analysis of perturbed genes. We will also attempt to correlate expression levels with other quantifiable genetic markers. 2. Study Design: Upon approval, we will download raw RNA sequencing data and analyze the osteosarcoma data. If available, RNA sequencing data from matched normal tissues will be analyzed in parallel. In addition to using preexisting software tools such as TopHat2, deFuse, and Trinity, we will use methods already developed in-house to identify gene sets involved in osteosarcoma development. 3. Analysis Plan: We are interested in gene sets and their correlating features. We would like to test our data analysis pipeline to determine the best candidate genes and gene sets (pathways) for follow-up laboratory experiments.
Appropriate statistical tests will be applied, such as two-tailed Student's t-tests and Fisher's exact tests for enrichment analyses, with corrections for multiple hypothesis testing. 4. Explanation for how the proposed research is consistent with Use Restrictions for the requested dataset(s): We will not distribute the information to any party beyond our academic group. Our studies will not require subject identification (i.e., all original donors will remain anonymous). We will focus only on pediatric osteosarcoma, as stated in the Use Restrictions. The intended use is purely academic and predominantly for training purposes. Key findings will be submitted for peer-reviewed publication. Sinnett, Daniel SAINTE-JUSTINE UNIVERSITY HOSPITAL CTR Genetic and genomic determinants of pediatric cancers May31, 2017 approved Understanding the molecular signature that drives the development of pediatric cancers and their response to treatment is key to developing a personalized approach to patient management and care. Over the past 20 years, we have identified multiple biomarkers that help explain the aetiology of childhood ALL. Our goal is to integrate that knowledge into a classification model to better discriminate between the different subtypes of ALL. We have also begun a personalized targeted therapy study, for which we collected samples from different types of solid tumours. The datasets from TARGET and CGCI will provide us with greater statistical power to better understand pediatric cancers. Childhood cancers constitute a heterogeneous group of rare diseases. In Canada, about 10,000 children are living with cancer today, and about 1,500 new cases are diagnosed each year. The overall survival rate in Canada approaches 82% for children under 18 years of age, but the prognosis for those with refractory, relapsed, or metastatic (‘hard-to-treat’) disease is grim, and progress has stagnated over the last three decades.
New tools that enable better individual tumor characterization and classification are required to improve patient care and outcomes and to reduce the adverse effects of cancer and its treatments on children. Key to implementing personalized approaches to patient management and care are the identification of genes and pathways that drive oncogenesis and modulate drug response, and the development of reliable biomarkers for disease prognosis and treatment. To achieve this, we need to include all available samples in order to delineate the underlying causes of cancer, increase the reliability of tumour classification tools, and enable targeted therapy specific to the genomic profile of each patient’s tumour. Toward this goal, we have a cohort of more than 250 matched tumour-normal whole exome sequencing (WES) and 250 whole transcriptome sequencing (WTS) tumour samples from Acute Lymphoblastic Leukemia (ALL) patients, and 250 matched tumour-normal WES and WTS tumour samples from patients with different types of solid tumours. Our dataset is stored on CHU Sainte-Justine’s internal clusters. Data from TARGET and CGCI would be a major addition to our local dataset. Main objectives: 1) to further analyse ALL samples to better classify the different subtypes and identify potential expression signatures; 2) to broaden this approach to all types of pediatric cancer in our cohort to determine whether there are similarities between the different types of tumours; and 3) to build a knowledge base to accelerate discovery and its translation into precision medicine. The use of the data will be limited to health/medical/biomedical purposes and will not include the study of population origins or ancestry. The findings will be disseminated via publications and conferences.
Sirota, Marina UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Profiling the Immune Repertoire in Pediatric Cancers Feb04, 2021 closed While childhood cancers are relatively treatable, harsh therapies can have long-term detrimental effects, and some patients do not respond to treatment or relapse. Immunotherapy is state-of-the-art in oncology and is a possible means of improving survival for childhood cancers while reducing side effects. The use of immunotherapy can be improved with an understanding of how the immune system naturally responds to childhood cancers. Information that we hope to define includes the composition of immune cells at tumor sites across different tumor types, as well as the diversity and expansion of B/T cell clones. Connecting these characteristics to tumor type and clinical outcomes may reveal which diseases are particularly amenable to immunotherapy and whether specific or diverse adaptive immunity is preferable for improving patient responses. Similar work using TARGET has been performed for neuroblastoma, rhabdoid tumor, and acute myeloid leukemia, but those studies focused on each disease in isolation. We hope to add to this knowledge base by evaluating TARGET diseases together and comparing trends between cancer types. Immunotherapy represents a promising approach to treating childhood cancers while sparing patients the long-term effects of existing therapy regimens. Efforts to understand immunology in the tumor microenvironment have been undertaken for massive datasets from TCGA (PMID: 29628290) to guide our understanding of how best to apply immunotherapy to cancers. Previous work using TARGET data has explored the immune repertoire in neuroblastoma (PMID: 29784674), rhabdoid tumor (PMID: 31708418), and acute myeloid leukemia (PMID: 31771646).
We aim to expand upon these disease-specific studies by comparing characteristics of the immune repertoire among the five non-hematologic cancers with RNA-seq data available in the TARGET database (WT, CCSK, RT, NBL, and OS). Using computational approaches, we will quantify immune cell populations within each of these disease types and investigate whether certain childhood cancers appear to elicit more potent immune responses than others. We will correlate such differences with measures such as event-free survival to see whether differences in immune repertoire manifest in the clinic. Such an analysis may reveal which disease types are likely to be more responsive to immune-based therapy. Furthermore, we are interested in understanding whether childhood tumors drive specific adaptive immune responses (indicated by selective expansion of a few clonotypes) or diverse responses (indicated by higher TCR/BCR diversity), and how these responses map to clinical outcomes. TCGA data have indicated that the former may be more important in many tumor types (PMID: 29628290). Finally, we are interested in understanding the immune repertoire in hematologic childhood cancers, including ALL, as such diseases are intrinsically immune-dysregulated. Since a similar analysis has recently been performed for AML, including samples from TARGET (PMID: 31771646), we are interested in comparing changes in the ALL immune repertoire with those in AML. Skok, Jane NEW YORK UNIVERSITY SCHOOL OF MEDICINE Transcriptional regulatory interactions of rearranged IGH-CRLF2 B-cell acute lymphoblastic leukemia pediatric patients Oct25, 2018 closed Acute lymphoblastic leukemia is a type of blood cancer and the most common cancer found in children. In general, leukemia, like any other type of cancer, arises as a result of changes that occur in a patient's DNA, referred to as mutations.
We are interested in a specific mutation found in 5-15% of leukemia patients and associated with treatment failure and low overall survival rates; there is therefore a need to develop new drug therapies that will improve these patients’ outcomes. We aim to better understand the genetic impact of this specific mutation and gain insight into how to refine therapies for this distinct group of patients. Acute lymphoblastic leukemia (ALL) is one of the most common childhood cancers. Although significant progress has been made in treating children with ALL, relapsed patients continue to have a poor prognosis, with an overall survival rate of 30%. Several challenges remain in improving ALL therapy: some of the drugs used to treat patients are extremely toxic, and patients within particular subtypes are less responsive to treatment. Therefore, our current aim is to better understand the genetic alterations driving leukemogenesis and to identify new potential targets associated with these alterations. Alterations causing overexpression of the cytokine receptor-like factor 2 gene (CRLF2) are found in 7% of patients with B-cell precursor ALL (B-ALL) and in 60% of pediatric B-ALL patients with Down syndrome. CRLF2 overexpression is associated with poor prognosis. Overexpression of CRLF2 in B-ALL can result from a reciprocal translocation that joins CRLF2 to the IGH locus on chromosome 14 (IGH-CRLF2), which is commonly found in adolescents. Understanding the effects of CRLF2 overexpression in young patients is essential to designing more targeted therapies and improving patient outcomes. Here, we aim to identify therapeutic targets by investigating potential master regulators of the CRLF2-associated gene signature. To do so, we aim to construct a B-ALL-specific transcriptional regulatory network. Transcriptional regulatory networks describe the regulatory relationships between transcription factors and their target genes.
Defining this network will enable us to better understand the regulatory hierarchies involved in B-ALL and illuminate possible targets in IGH-CRLF2-translocated pediatric leukemia. The patient B-ALL RNA-sequencing data we are requesting are necessary for assembling the regulatory network, as we require hundreds of gene expression samples to construct it. Use of the requested dataset will enable the identification of targets for the treatment of IGH-CRLF2-associated leukemias and provide a global B-ALL-specific regulatory network that could serve as a resource for future studies. Skok, Jane NEW YORK UNIVERSITY SCHOOL OF MEDICINE Determining the contribution of non-coding elements to transcriptional regulation and chemoresistance in ALL Oct12, 2018 closed Patients get cancer due to changes to the DNA (genome) in their cells. These changes, also known as mutations, lead to overgrowth of a particular cell, but the cells in a cancer often carry hundreds or thousands of other random mutations due to their rapid growth. Moreover, each person with cancer can have a unique set of mutations, different from those of most other patients. Even within a single patient’s cancer, different cells will likely carry many different mutations. If we can understand the unique set of mutations in an individual’s cancer, we can sometimes predict whether a set of chemotherapy drugs will work or whether a different set would be more effective. Our project will benefit from knowing the whole range of mutations within a patient’s cancer, to see whether previously unknown parts of the genome can predict whether a common drug used to treat leukemia in children should be used or whether an alternative should be prescribed to that patient. The fifth most common malignancy in children is ALL recurrence after treatment. While children with primary ALL have a 90% chance of reaching 5 years of event-free survival, patients who relapse have a far worse prognosis.
We aim to minimize the chances that pediatric patients with ALL experience relapse after treatment. ALL maintenance treatment usually includes thiopurines such as 6-mercaptopurine (6-MP). Loss of genes in the mismatch repair (MMR) pathway can cause resistance to thiopurines in ALL, as well as in other cell types and cancers. In addition to being important for ALL patients, germline deletions or somatic silencing of MMR genes cause Lynch syndrome. To better understand how non-coding regulatory elements could contribute to chemoresistance in ALL, we have performed a CRISPR saturation mutagenesis screen around two mismatch repair genes: MSH2 and MSH6. In the screen, we use sgRNAs targeted across a 2 Mb region containing MSH2 and MSH6. Cells are treated with thiopurines to select for sgRNAs that have mutated regulatory elements of MSH2 or MSH6, causing downregulation of these genes. Mutations in regulatory elements for MSH2 and MSH6 could predict relapse in patients treated with thiopurines; patients with these mutations should therefore be prescribed an alternative maintenance treatment. We aim to determine whether the regulatory elements for MSH2 and MSH6 found in this screen are relevant to pediatric ALL patients, and the data we are requesting are essential for determining whether these regulatory elements are functional in patients. If whole-genome sequencing reveals more mutations in these regions after relapse than at diagnosis, that would indicate that these regions are important for thiopurine sensitivity in patients. We will also test whether mutations in regulatory elements could cause Lynch syndrome in patients with intact MMR proteins. Together, these analyses could provide guidance for treating patients with ALL, explain the cause of Lynch syndrome in patients without MMR protein defects, and add to the general understanding of MMR regulation and its significance in several diseases.
Skok, Jane NEW YORK UNIVERSITY SCHOOL OF MEDICINE Translocations and Their Widespread Impact on Gene Regulation Jun04, 2014 closed Childhood leukemia is caused by genetic changes that occur within a given cell. The medicines that we currently use to kill leukemia cells are not specific to these genetic changes and kill rapidly dividing cells in a non-specific way. Many scientists have dedicated their time to finding medicines that target the genetic abnormalities that occur in leukemia and other cancers, with the goal of improving the therapies we use to treat these diseases. Our project involves looking at the organization of genes within leukemia cells that have a known genetic abnormality called the IGH-CRLF2 rearrangement. It is known that genes are located in specific places within a cell, and we hypothesize that the nuclear position of these genes will affect the way the cell manages these genes as well as other genes located near IGH and CRLF2. We suspect that this rearrangement will place genes under the control of different regulatory elements that the cell generally uses to produce different proteins. We plan to use data from the NCI/TARGET initiative to compare our findings with published data. Recent studies have identified alterations in the gene encoding cytokine receptor-like factor 2 (CRLF2) in 15% of high-risk pediatric pre-B ALL patients. CRLF2 forms a heterodimeric complex with IL7Ra to generate a receptor for the cytokine TSLP. TSLP-mediated signaling activates downstream effector molecules that play important roles in B cell precursor proliferation/survival. One of the chromosomal rearrangements implicated in leukemia joins CRLF2 (pseudoautosomal region 1 of the sex chromosomes) with the antigen receptor locus IGH (chromosome 14). This translocation places CRLF2 expression under the control of the powerful Eµ enhancer, increasing its transcriptional output.
However, this change alone is insufficient for transformation, and additional cooperating mutations/deletions are thought to contribute. Because it is now well established that co-regulated genes come together in the nucleus in common transcription factories, we hypothesize that the translocated CRLF2 will be brought into contact with a different set of genes that are controlled by one or more of these factors. Thus, CRLF2 will have a different set of neighbors as a result of the chromosomal exchange and altered regulation under the control of the Eµ enhancer. To identify these loci, we will perform genome-wide circularized chromosome conformation capture (4C-seq) using bait sequences located on either side of each partner gene, comparing two leukemic cell lines harboring IGH-CRLF2 translocations with a pre-B leukemic cell line and healthy human pre-B cells. We will also test our hypothesis that reorganization of the genome resulting from the IGH-CRLF2 translocation will impact the regulation of surrounding loci in a manner that could contribute to transformation. To assess this, we will integrate data from 4C-seq with genome-wide ATAC-seq, RNA-seq, exome sequencing, and ChIP-seq analyses to determine how alterations in gene interactions impact the activity of regulatory elements, transcriptional output, genome stability, and transcription factor binding. We request access to the TARGET data to compare our results with the published findings. Song, Fan ILLUMINA, INC. Therapy response prediction based on combined WES, WGS, and RNA-seq data Feb22, 2022 expired Next-generation sequencing (NGS) is a key technology being applied to improve cancer subtyping and treatment. We are developing a bioinformatics tool that integrates multiple NGS platforms to classify cancer subtypes and identify biomarkers of therapy response, particularly in acute myeloid leukemia (AML), which is one of the most common pediatric cancers.
We intend to develop and evaluate the performance of a bioinformatics tool that facilitates the discovery of potential biomarkers of therapy response and mechanisms of resistance to treatment from combined whole-exome and whole-genome sequencing (WES/WGS) and RNA-seq of tumor samples. The TCGA controlled-access data will provide us with raw sequencing data to reveal key large-scale variants, including structural variation as well as genome-wide transposition, which can improve our understanding of non-coding alterations that can impact therapy efficacy. The data will be used only for optimization and validation of the bioinformatics tool. We plan to analyze the requested dataset independently. Addendum for the TARGET-AML dataset: Immune checkpoint inhibitors (ICIs) have proven effective across solid tumors, including melanoma, renal cell carcinoma, and MSI-H colorectal cancer. However, only a subset of patients demonstrates a durable response to these agents. For this reason, the Food and Drug Administration (FDA) has approved tumor mutational burden (TMB) as a pan-cancer biomarker of response to ICIs in solid tumors. Beyond solid tumors, recent trials have also shown the efficacy of ICIs in hematological malignancies. Notably, in 14 acute myeloid leukemia (AML) patients with post-transplant relapsed disease, treatment with ipilimumab induced complete responses in 5 patients (N Engl J Med, 375 (2) (2016), pp. 143-153). However, because of the low SNV burden in AML, TMB is not an accurate predictor of response. We therefore seek to identify biomarkers of response to ICIs in AML by evaluating public and private large structural variants and copy number alterations in these patients. The TARGET-AML study, which includes WGS of more than 200 patients with clinically validated SVs and CNVs, provides a unique opportunity to validate our findings with clinical-grade accuracy.
Since leukemia is the most common type of pediatric cancer, accounting for about 30% of diagnoses, our study has great potential to contribute to more effective treatments for pediatric cancer. SOULIER, jean UNIVERSITE DE PARIS Somatic landscape of myeloid leukemia occurring in Fanconi anemia (FA) patients using TARGET and BEAT data as controls Dec30, 2020 approved Fanconi anemia (FA) is a rare inherited disorder with a considerable risk of developing acute myeloid leukemia during life (30 to 40% by the age of 40 years). These patients have a poor prognosis, and their oncogenesis is poorly understood. In these patients, cells are hypersensitive to DNA interstrand crosslinks, leading to high levels of spontaneous and induced chromosome breaks. We sequenced whole genomes (WGS, somatic and germline) in 18 FA patients with AML, and whole exomes (WXS, somatic and germline) in 15 FA patients. We aim to compare those data to non-FA AML from patients of the same age in order to define a specific FA-related pattern of somatic lesions and clonal evolution. We plan to analyze the genomic landscape of leukemia (AML) occurring in Fanconi anemia (FA) patients, using young BEAT-AML patients and TARGET-AML patients as non-FA AML controls. 1. Research objectives o Compare WXS data results between young BEAT-AML and FA-AML patients, and WGS data between TARGET-AML and FA-AML patients, using a common analysis pipeline. Since FA patients are children and young adults, the control cohorts have to include children (TARGET), and children and young adults (BEAT). Somatic driver mutations will be studied. o Apply analytical strategies to compare genomic mutational signatures from the FA-AML and the control cohorts (BEAT-AML and TARGET-AML). 2. Study design o Comparative study of somatic mutations, comparing somatic to germline (non-hematopoietic) WES/WGS for each patient. 
o Genomic data: 15 paired WXS in our Fanconi anemia cohort to be compared to 27 paired WXS in the BEAT-AML (age between 0 and 35 years) cohort, and 16 paired WGS in the FA-AML cohort to be compared to TARGET-AML pediatric data. 3. Analysis plan o A genomic analysis will be conducted to compare the average number of somatic mutations, CN abnormalities, and mutational signatures using paired WXS/WGS (germline and somatic for each patient). 4. Explanation of how the proposed research is consistent with Use Restrictions for the requested dataset(s): o Only subjects who have approved general use of their data are included in the dataset. All subjects have provided informed consent in accordance with the Declaration of Helsinki and French law. Our project has been approved by the INSERM IRB (No. 12-078). Spector, Logan UNIVERSITY OF MINNESOTA Network analysis of ALL genes Dec22, 2014 closed Genome-wide association studies (GWAS) have identified several common variants as strongly associated with ALL [Enciso-Mora et al., Leukemia 26(10): 2212-5, 2012], [Papaemmanuil et al., Nat Genet. 41(9):1006-10, 2009], [Perez-Andreu et al., Nat Genet. 45(12): 1494–1498, 2013]. Currently no mechanism has been identified to show how these variants relate to the development of ALL; they may be responsible for a change in the downstream pathway that drives the development of disease. Using a published algorithm [Deshpande et al., PLoS Comput Biol 6(12): 2010], we can integrate functional linkage networks with methylation and expression data to identify active subnetworks centered on ARID5B. We request to use the TARGET ALL expression and methylation data to determine whether somatic alterations in leukemia occur within the functional sub-networks of these risk variants. Genome-wide association studies (GWAS) have identified several common variants as strongly associated with risk of pediatric ALL [Enciso-Mora et al., Leukemia 26(10): 2212-5, 2012], [Papaemmanuil et al., Nat Genet. 
41(9):1006-10, 2009], [Perez-Andreu et al., Nat Genet. 45(12): 1494–1498, 2013]. In addition, ARID5B variants are associated with treatment outcome [Xu et al., Journal of Clinical Oncology 30(7): 751-7, 2012]. Currently no mechanism has been identified to show how these variants relate to the development of ALL; they may be responsible for a change in the downstream pathway that drives the development of disease. Using a published algorithm [Deshpande et al., PLoS Comput Biol 6(12): 2010], we will integrate functional linkage networks with methylation and expression data to identify active subnetworks centered on ARID5B. We request to use the TARGET ALL expression and methylation data to determine whether somatic alterations in leukemia occur within the functional sub-networks of these risk variants. This research has the potential to enhance the treatment of ALL by identifying downstream genes regulated by a protein critical for leukemogenesis and survival. SPELLMAN, PAUL UNIVERSITY OF CALIFORNIA LOS ANGELES HCMI- Integrated TCGA and TARGET Aug29, 2023 approved The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). HCMI’s repository includes patient-derived next-generation cancer models, case-associated tumors and matched normal samples annotated with genomic and molecular data, from rare adult and pediatric cancers. The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, validate that the models preserve the biological characteristics of the parent tumor from which they were derived, and to show the scientific community how these models could be used in functional genomics research. HCMI AWG intends to use the TARGET dataset as a reference dataset to characterize the HCMI models and associated tumors from pediatric cases. 
The findings from the AWG would be valuable for the research community as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. The Center for Cancer Genomics (CCG) is leading an analysis working group (AWG) for the Human Cancer Models Initiative (HCMI). The purpose of the HCMI AWG is to classify HCMI’s patient-derived next-generation cancer models into cancer subtypes, validate that the models retain the biological characteristics of the parent tumor from which they were derived, and to show the scientific community how these models could be used in functional genomics research. In order to map the HCMI tumors and models in cancer genetic taxonomy, the group will use methods such as (1) the Celligner algorithm to map tumors and models against reference datasets, (2) the OncoMatch approach to analyze regulatory networks and model fidelity, and (3) placement of each tumor and model in the subtypes classified by the Tumor Molecular Pathology (TMP) work group. The group will also analyze copy numbers, mutations, mutation signatures and structural variants from the whole-genome sequencing data of HCMI tumors and models and compare them against those of the reference datasets: TARGET (for pediatric cases) and TCGA (for adult cases). The findings from the AWG would be valuable for the research community as these models could be widely used in identifying mechanisms of resistance and/or novel therapeutic targets, developing diagnostic biomarkers, and other aspects relevant to precision oncology. Polygenic risk scores (PRS) can estimate an individual’s genetic influence on their phenotype and potentially predict the lifetime risk of various diseases, especially when clinically actionable genes are taken into account. We are planning to calculate PRS from tissue associated with breast and colorectal cancer in the HCMI project. 
The WGS coverage for some of these samples is 10-25x; although this coverage is adequate, imputation can help improve the accuracy and reliability of the PRS and increase our statistical power. SPELLMAN, PAUL UNIVERSITY OF CALIFORNIA LOS ANGELES GDAN at UCLA Aug29, 2023 approved Our group uses TCGA data to understand the genetic underpinnings of cancer. These analyses identify the genetic risks that contribute to cancer as well as characterize the development of cancers. The specific areas we are interested in addressing include (1) the development of methods to order the molecular events that cause cancer so that, for example, early events can be used as diagnostic markers; (2) understanding how inherited and environmental risks influence tumor development by adding TCGA data to our own data sets now in development; (3) identifying inherited risk factors that influence not the development of cancer but the way the cancer genome changes; (4) systematic analyses of global RNA-seq data sets to identify aberrant mRNAs in cancer. Additionally, we are constantly developing new methods for interpreting cancer genome data. As part of the Cancer Genome Atlas and other collaborative projects (e.g., ICGC), we engage in systematic coordinated analysis of TCGA and GDAN data. The objectives are to study individual diseases (e.g., KIRP) or general oncogenic processes. The objectives are diverse and adapt to new scientific directions identified by our own members and other members of groups like TCGA, GDAN and ICGC. Additionally, we are considering merging data with external data generated from new modalities (like PacBio long-read sequencing) and from the Hartwig Foundation. Sperber, Steven UPSTATE MEDICAL UNIVERSITY Osteosarcoma with rhabdoid features Jun22, 2023 expired Osteosarcoma is a cancer of the bone that can be fatal even when treated. 
It is usually very aggressive and difficult to treat, requiring surgery followed by radiation and chemotherapy. Diagnosing osteosarcoma can be challenging because of the many different forms it can take. A new form of therapy has recently been introduced that has successfully treated other cancers. This treatment targets the genetic mutations of tumors that our normal cells don’t have. This form of therapy requires us to identify which genetic mutations each tumor carries. Most cancers have genetic mutations that drive their growth. Using new techniques, we can map the entire genetic makeup of a cell, cancerous or not. Our goal is to study the genetic mutations of rare osteosarcomas and see whether any unique mutations could be potential targets for treatment today or in the future. Research statement: Osteosarcoma is the most common primary malignant bone tumor in children and adolescents. Due to its invasive nature, radical surgery alone rarely results in a cure. The introduction of combination chemotherapy in the 1970s drastically increased survival rates in these patients. However, overall long-term survival rates have remained largely unchanged since then. Next-generation sequencing (NGS) technology has enabled joint cancer genome-sequencing projects, including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), to sequence and analyze cancer genomes. Similarly, the Catalogue of Somatic Mutations in Cancer (COSMIC) has summarized coding mutations in over a million cancer samples. With the availability of next-generation sequencing, new studies have demonstrated that osteosarcomas display high rates of genetic variation in targetable therapeutic locations. Studies comparing osteosarcomas using NGS have revealed several inherited tumor suppressor genes predisposing to osteosarcoma, including TP53, RB1, RECQL4, BLM, and WRN. 
With next-generation sequencing, more genetic landscapes are being explored in search of pathogenic variants amenable to targeted therapy. The World Health Organization has divided osteosarcoma into six classifications based on anatomic location and histology: conventional, secondary, low-grade, parosteal, periosteal, and high-grade surface types. Conventional osteosarcoma has multiple histologic subtypes, including chondroblastic, small cell, and rhabdoid. One investigation found no prognostic difference among most subtypes, but some were too rare for assessment. Osteosarcoma with rhabdoid features is a rare subtype not well represented in the medical literature. We aim to compare the exome sequencing results of our rare osteosarcoma sample with the current literature and available datasets to identify novel mutations. We will use the datasets to compare our genomic analysis with the histologic variants found in osteosarcomas and to identify any novel mutation profiles. Study design: Tissues used for our project were obtained from paraffin blocks. The tissue chosen for analysis met the following criteria: greater than 500 tumor cells with less than 50% necrosis. All clinical history was accessed from within our institution (SUNY Upstate Medical Center) with IRB exemption. Using exome hybrid capture (Agilent, V8), next-generation sequencing was performed by sequencing-by-synthesis (NextSeq 550, Illumina). The data were organized and filtered through a bioinformatic pipeline. Summary statement: We aim to determine whether osteosarcoma with rhabdoid differentiation and other osteosarcoma variants have novel gene mutations that might be targetable for therapy. We are comparing mutation profiles commonly observed among conventional osteosarcoma variants to determine what makes osteosarcoma with rhabdoid differentiation unique. Unique differences between the subtypes may inform diagnosis and prognosis and reveal targets for therapeutic intervention. 
Stark, Mitchell UNIVERSITY OF QUEENSLAND Functional effects of DNA sequence variants and their role in cancer Oct04, 2016 closed We will investigate functional aspects of inherited mutations and tumour mutations across different tumour types within the TCGA and TARGET WGS/exome datasets. Functional consequences of variants will be inferred from effects on matched RNA-Seq data and by experimental validations. Analyses will be carried out both within and between different tumour types to identify shared and distinguishing features. Using a pan-cancer approach, we will identify somatic and germline variants genome-wide (including somatic copy number variations) across the cancer types deposited by TCGA and TARGET to identify candidate risk and causal genetic variants. Our analyses will also integrate genomic and transcriptomic (mRNA/miRNA) data using bioinformatics pipelines established in our group. Our team has a number of specific research projects, which include paediatric cancers, AML, oesophageal cancer, and melanoma (cutaneous, acral, mucosal and uveal). Access to primary data is essential to analyse exonic and non-exonic germline/somatic variants, and transcriptomic data, using our own analysis pipelines with updated and customized annotations. As our team has a particular interest in paediatric cancers, it is necessary to access both adult and paediatric cancer datasets, given the differences in the molecular pathways affected in each. We will also use TCGA primary data as an independent dataset to replicate the findings from our in-house data from the project types described. Steidl, Ulrich ALBERT EINSTEIN COLLEGE OF MEDICINE Neoantigenicity of mutations in AML Apr25, 2024 approved Acute myeloid leukemia (AML) is one of the most common acute leukemias and is characterized primarily by uncontrolled proliferation and accumulation of immature myeloid progenitors (blasts) in the bone marrow, peripheral blood and spleen. 
Leukemogenesis, similar to several carcinogenic processes, is a multistep process involving the combination of structural and functional mutations in tumor suppressors or oncogenes. Together, these mutations support the clonal expansion of pre-leukemic (pre-LSCs) and leukemic stem cells (LSCs) and disrupt hematopoiesis at early stages. Access to this dataset will enable us to study the interaction of the immune system with various mutated protein fragments (antigenic peptides). Understanding these mechanisms could advance drug development and the management of relapsed cases in the future. Myelodysplastic syndromes (MDS) are a heterogeneous group of malignant stem cell (SC)-derived diseases defined by inadequate hematopoiesis and consequent peripheral cytopenias. Delaying progression of MDS to a higher-risk stage and ultimately to AML is one of the key challenges in the clinical management of patients with MDS. Therefore, cellular and molecular insights into the progression of MDS are needed. Recent technological developments in flow cytometry (FC) have made it possible to detect novel rare stem cell populations that play critical roles in disease initiation, progression, and resistance to treatment. These subclonal populations have historically been neglected by bulk sequencing studies due to the scarcity of (pre-)MDS/AML stem cells in the bone marrow. Recently published findings from our lab demonstrated that targeted deep sequencing and single-cell analysis of sorted stem cell populations can overcome these limitations. Interestingly, patients with MDS and resultant AML exhibit higher mutation burden and subclonal diversity at the stem cell level relative to MDS or AML blast cells. In addition, it has become evident that different MDS and AML subclones evolve at various stages in a parallel fashion, challenging the previous dogma suggested by bulk sequencing studies that MDS progresses linearly to AML. 
Given this underlying clonal diversity, as well as seminal findings in the recently discovered phenomenon of clonal hematopoiesis (CH), in which healthy individuals harbor mutations in their hematopoietic cells, we are interested in the mechanisms that prevent pathogenic clones from expanding and causing disease. Our study aims to evaluate the neoantigenicity of specific mutations, particularly in TP53, and their interaction with MHC-I in various populations, and ultimately to determine rates of clonal expansion. Therefore, we are requesting access to all of the genomic, transcriptomic, and HLA haplotype datasets in the Cancer Genome Atlas (phs001657). Access to this data will enable us to model mutated peptide binding to MHC-I (HLA alleles) and, in turn, predict the likelihood of T-cell killing of cells harboring specific mutations. This analysis has the potential to aid in the development of effective immunotherapies in MDS and AML, thereby filling an unmet need for patients who do not have sustained responses to current first-line therapies. We will ensure that all data provided are stored, analyzed, and accessible within our research group only, and that all terms and conditions are followed for further data management. Moreover, the data will be duly acknowledged in our future publications, and the data produced by our analyses will be freely available to the research community. Stewart, Douglas Russell NIH Germline variation in DICER1 and other miRNA processing genes Dec13, 2017 closed MiRNA processing is crucial in development and control of cell growth. Inherited (germline) mutations in some miRNA processing genes (DICER1, DROSHA) are associated with risk of some kinds of cancer. Previous studies have shown that in cancer, tumor mutation in miRNA processing genes affects expression and disrupts function. However, the consequences of germline mutations in these genes for cancer, which are the focus of our study, are not well understood. 
With TARGET data, we plan to determine the frequency of germline mutations in DICER1 and other miRNA processing genes. From these data, we will be able to examine a variety of cancers and estimate how often DICER1 and other miRNA genes play a role in the development of those cancers. Ultimately, we seek to identify novel genetic syndromes associated with these genes. The diagnosis and treatment of additional pediatric cancers may be facilitated by the identification of additional miRNA genes, as proposed by our strategy above. Access to the TARGET data is essential to conducting this analysis. MiRNA processing is essential to embryogenesis. Variants in some miRNA-processing genes (DICER1, DROSHA) are associated with risk of pediatric and adult cancers. Studies have shown that somatic mutation in miRNA processing genes affects expression and disrupts function. However, the prevalence and penetrance of germline variation in miRNA processing genes (including DICER1) are not well understood. To explore this, we developed a scheme to classify DICER1 germline variation. We then used population-based databases (ExAC, 1000 Genomes, and ESP), data from a DICER1 sequencing lab (collaborator Ashley Hill, CNMC) and our variant classification scheme to categorize germline variation in DICER1 (Kim et al IJC, 2017). We found that the prevalence of germline loss-of-function (LOF) DICER1 variation is much higher than expected (1:10,600). There are two logical next steps. First, although DICER1 is known to be associated with an increased risk of a variety of pediatric cancers, the exact prevalence and penetrance of germline variation in DICER1 in pediatric cancer are unknown. We hypothesize that the unexpectedly high prevalence of DICER1 LOF variation in the general population (that we have established, above) suggests that the prevalence of such variation may also be higher than expected in pediatric cancer. 
To determine this, we seek access to the TARGET genomic data so that we can better understand the link between rare germline variants in DICER1 and pediatric cancer. The diagnosis, treatment and prognostication of pediatric cancers may be facilitated by the recognition of the involvement of pathogenic DICER1 variation. Access to the TARGET data (germline and somatic) is essential to conducting this analysis. Second, the prevalence, penetrance and phenotype of germline variation in other (non-DICER1) miRNA processing genes in pediatric cancer are (with the exception of DROSHA) unknown. Determination of these features may aid in the identification of novel pediatric cancer-predisposition syndromes and improve diagnosis and prognostication. Access to the TARGET data (germline and somatic) is essential to conducting this analysis. Stine, Megan INFORMATION MANAGEMENT SERVICES, INC. TARGET Demographic and Clinical data to GDC and AWG access Sep27, 2022 closed The primary objective of this work is to transform and submit to the GDC all TARGET demographic and clinical data, as part of a Statement of Work issued by the NCI. We also need access to MILD data to participate in the AWG. The primary objective of this work is to transform and submit to the GDC all TARGET demographic and clinical data, as part of a Statement of Work issued by the NCI. We also need access to MILD data to participate in the AWG. Stojdl, David UNIVERSITY OF OTTAWA Predicting the outcome of cancer-related mutations in the MHC-Binding process Dec19, 2016 rejected Work done by our group involves computational programs that would predict the outcome of cancer-related mutations once they arise in our body. Our workflow would create a list of mutations that are known to be associated with a specific cancer. Multiple datasets and prediction algorithms will allow us to create a list of mutated sequences that could potentially be displayed by our immune system’s molecules. 
Once our workflow is established, we will focus on cancer-related mutations in children, since our group focuses on pediatric cancers. Determining whether these sequences would bind to these molecules will help create vaccination therapies that would target these mutated sequences, thus helping our bodies react to these mutations. Our work represents a key stepping stone towards improving cancer therapies, specifically in children. Preliminary work involves the brain cancer known as glioblastoma multiforme, but further analyses will be done on different types of cancers. Our group is developing a computational assay that would potentially identify immunological neo-epitopes derived from cancer-related mutations that would lead to T-cell vaccine strategies. Preliminary data will be extracted from COSMIC (Catalogue of Somatic Mutations in Cancer) and filtered to create lists of cancer-specific mutations. However, limited information about patients and the expression of their mutations is publicly accessible through COSMIC. Gaining access to expression data found on the National Cancer Institute’s GDC Data Portal would allow us to improve the feasibility of our assay. We are based at the Children's Hospital of Eastern Ontario. In order to establish an optimal workflow, we will test our data on patients of all ages, but our end goal is to create a list of neo-epitopes that are specific to pediatric cases. Using data from adults only would not allow us to assess the outcome of mutations in children, since these might be different. Therefore, having access to pediatric data is critical to the advancement of our research. We will respect the limitations linked to the requested databases. Our goal is to explore the possibility of defining a unifying set of neo-epitopes that represent the totality of sequences found in the population and using this list to program the assay. 
In the future, a patient would be screened using the list of predicted neo-epitopes to identify the personalized set of epitopes for vaccination therapy. Our first trial will be focusing on patients affected by glioblastoma multiforme. If successful, we would like to broaden to other types of cancers. Stransky, Nicolas BLUEPRINT MEDICINES, INC. Molecular determination of pediatric tumor sub-populations to guide the development of next generation targeted therapies Sep25, 2015 closed Blueprint Medicines is committed to developing the most effective, targeted therapies for cancer patients. To that end, we are exploring the molecular underpinnings of individual pediatric tumors. Analysis and annotation of the data generated by the TARGET project is critical so that we can tailor therapies to the molecular profile that underlies the disease of individual young cancer patients. Blueprint Medicines is committed to developing the most effective, targeted therapies for cancer patients. A prerequisite to this objective is to define the molecular alterations driving the growth of individual tumors. This focus will provide the insight we need to determine the appropriate molecular targets, and the subsequent combinations of those targets, for the next generation of drugs that we are developing. Analysis and annotation of the data generated by the TARGET project is critical so that we can tailor therapies to the molecular blueprint that underlies the growth of cancers in individual pediatric cancer patients. Our objective is to analyze the molecular and genetic diversity of pediatric cancers to better define tumor subtypes as well as the pathways on which they rely. We will use newly developed analysis algorithms to better predict the functional consequences of mutations and structural aberrations that we discover in the data. We will look for molecular dependencies that can be exploited with new drugs and drug combinations. 
This knowledge will be used to inform the development of a new generation of targeted therapies tailored to genotypes that define these subpopulations. We do acknowledge the data use limitations for the TARGET dataset: since the goal is to discover molecular drivers of pediatric cancers, it is necessary to conduct the project on pediatric data. The data will not be used for the development of methods, software, or other tools. Sturgill, David Matthew NIH Center for Cancer Genomics Projects May28, 2024 approved As a Project Officer for the Genomic Data Commons (GDC), I am requesting access to the datasets produced and utilized by the Center for Cancer Genomics (CCG) programs (such as TCGA and others) in order to manage and provide quality control of data within the GDC. The Center for Cancer Genomics (CCG) conducts cancer genomics research programs to improve cancer diagnosis, treatment, and outcomes. The CCG provides a repository and analysis platform for these data called the Genomic Data Commons (GDC). As a GDC Project Officer within CCG, I am requesting access to datasets produced and utilized by CCG’s programs, including TCGA, ALCHEMIST, CTSP, CGCI, HCM, and others. I am requesting access for administrative oversight and project management purposes, including data quality control and evaluation. SU, XIAOPING UNIVERSITY OF TX MD ANDERSON CAN CTR Integrative Analysis of Pretreated Anaplastic Wilms Tumors Reveals a TP53 Immune subgroup Aug28, 2018 approved Wilms tumors (WT) are highly curable in up to 90% of cases with a combination of surgery and radio-chemotherapy, but treatment-resistant types such as diffuse anaplastic Wilms tumors (DAWT) pose significant therapeutic challenges. Diffuse anaplasia is a potent marker of adverse outcome in patients with Wilms tumors (DAWT). Patients with WT with focal anaplasia (FAWT) are considered intermediate risk. 
Beyond TP53 mutations, which are thought to arise in anaplastic components, the crosstalk between genomic tumor features and the immune landscape remains unknown. We will perform whole-exome and/or RNA next-generation sequencing on 12 matched tumor-normal WT pairs, with extended analysis in an independent dataset of 9 cases. TP53 mutations and tumor-infiltrating lymphocytes (TILs: CD3, CD4 and CD8) were also assessed. Validation will be performed using the TARGET-WT cohort (n=96). The prognostic role of TILs will be investigated in another 55 pre-treated WTs. Sullivan, Christopher UNIVERSITY OF TEXAS, AUSTIN Analysis of Viral microRNAs in Tumor Samples Jul14, 2014 closed Viruses have been associated with various forms of cancer. We are using the Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and Cancer Genome Characterization Initiative (CGCI) data sets to investigate the presence of viruses in tumors of diverse origin, and to understand basic in vivo aspects of human virus infection through the rich data contained in the TCGA. This project will provide a better understanding of which types of viruses are associated with different types of cancer and how they contribute to them. UPDATE/SUMMARY of the last four years' progress: In the past four years we have downloaded datasets such as "LUAD, STAD, BCL, LAML, LUSC, DLBC". For the most part, no samples expressed abundant viral miRNAs. Nonetheless, we found the limited viral miRNA reads that were recovered useful, as they helped validate previously predicted miRNAs that we had not otherwise confirmed in vivo. A newer area that we are undertaking with these datasets is exploring the role of miRNA and other pathways in the hepatitis C virus life cycle by specifically analyzing hepatocellular carcinomas (HCCs) that are HCV+ versus HCV-. These results show correlation of host factors specifically enriched in HCV+ tumors, and we have linked this to noncoding RNA biology. 
Interestingly, some of these host factors restrict HCV in in vitro infection models. This implies a possible mechanistic explanation for some HCCs, and we expect to submit a manuscript this year. PROJECT: The primary research objectives are to use the Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and Cancer Genome Characterization Initiative (CGCI) datasets to investigate the prevalence of viral miRNAs and study ncRNA biology as it relates to viruses and cancer from these in vivo datasets. This study will quantify viral miRNAs in large datasets of both adult and pediatric origin, so as to identify miRNA, ncRNA and mRNA genes that correlate with virus infection such as HCV. The results of this research will provide new insight into the role of viruses and ncRNA in cancers and may identify new biomarkers or therapeutic targets. All analysis will be conducted in accordance with human subjects protection and data access policies, and the data sets will be analyzed separately from other data sources. Sumazin, Pavel BAYLOR COLLEGE OF MEDICINE Systems-biology approaches to identifying prognostic biomarkers in pediatric disorders and cancer Apr02, 2015 approved Genomic profiling is becoming commonplace in the pathology lab; however, for the majority of pediatric patients, these profiles provide little information about therapeutic options or risk of recurrence. To improve their benefit, we must improve the interpretation of genomic profiles, including improving the identification of relevant mutations that affect tumors individually or in concert with other mutations. We propose to improve the interpretation of genomic profiles using computational methods that account for interactions between genes and between DNA regions. 
The added information will point to mutations that are interchangeable, which may be genomically distant but have similar effects on cancer, and to mutations that are synergistic and only have an effect in aggregate. By considering the cellular environment and profiling data, we seek to improve the interpretation of pediatric genomic profiles. Success will help predict susceptibility, leading to preventive steps and the identification of candidates for surveillance to enable early diagnosis, as well as to more informed therapeutics. We propose to develop a methodology to interpret genomic variants that work in concert to synergistically affect the pathogenesis of pediatric cancers and other disorders by combining analyses of regulatory networks with analyses of DNA profiles. We propose to study pediatric brain disorders and CNS, blood, liver, and kidney cancers. Context-specific regulatory networks will be used to group variants based on functional inference. We will group genomic regions by inferred common function, where one region may be associated with multiple functions, and investigate potential effects of grouped alterations on gene expression programs. Finally, focusing on grouped alterations that were found to be overrepresented in each patient population, we will identify causal variants and infer function for specific alterations. We propose to leverage genome-scale regulatory network analyses to study synergistic effects of variants without having to exhaustively enumerate prohibitively large combinatorial spaces. Follow-up experimental validation will include low-throughput and medium-throughput assays, testing our ability to identify functional alterations and the regulators that target them. Low-throughput assays include assessing the effects of non-coding variants on gene expression and activity, and on cell proliferation and migration, through genome modifications in patient-derived cells.
Medium-throughput experiments will test the effects of variants in proximal regulatory regions on the expression and activity of associated targets, measuring the sensitivity and specificity of predictions. We do not plan to combine dbGaP datasets with datasets outside of dbGaP. Any findings from this study will be published and broadly shared with the scientific community. Sundaresh, Suman NEXTBIO Integration of genomic study results of childhood cancers with orthogonal data types Sep10, 2010 closed Our goal is to identify genetic factors and mechanisms associated with different childhood cancers through meta-analysis of all the publicly available data from studies of gene expression, somatic mutations and genome-wide scans. The collective power of meta-analysis will generate a comprehensive list of genetic factors that are associated with childhood cancer, which will enhance researchers’ understanding of the etiology of these diseases. Our goal is to identify common genetic factors associated with various childhood cancers through meta-analysis of all publicly available genomic data reflecting childhood cancer patients and associated controls. We request access to the raw gene expression and genomic structure data from patients with different childhood cancers and normal controls contained within the TARGET data set. The data obtained will be analyzed to identify signatures of gene expression changes, gene amplifications and deletions in childhood cancer. The signatures will be correlated across many similar study results in the NextBio database through meta-analysis. The collective power of meta-analysis will build a comprehensive list of genetic factors that are associated with childhood cancers, which will enhance researchers’ understanding of the genetic basis of the disease. Relevant queries in NextBio will bring up subsets of results obtained from mining of individual studies as well as correlation results from meta-analysis.
We will provide limited information in summary results of each study according to the standards used in the dbGaP Genome Browser: chromosomal position, genes in nearby regions and the p-value associated with a genetic element. We will make it a part of our platform usage terms and conditions that this information may not be used to determine the identity of individuals involved in the study. The use of data will be consistent with the Use Restrictions for the dataset and we intend to abide by them. The raw data will be used for our internal analysis by the PI only and not made available to anyone within or outside of NextBio. All genome-wide association data will be kept private, protected by the latest firewalls, and accessible only to the PI. Additionally, data will be protected by intrusion detection systems, SSL encryption, IP-level restrictions and proprietary security products. NextBio’s production servers and SOA architecture are hosted by Savvis, one of the most advanced data center facilities in the world. The raw data, including the back-ups, will be destroyed after the access period is over. We will acknowledge the contributing Investigator(s) on our website, oral and written presentations, disclosures, and publications resulting from any analyses of the data. We will include the dbGaP accession number with the relevant version of the analyzed dataset(s), a practice which we have implemented for datasets already present in NextBio’s database from public repositories such as GEO, AExpress, caBIG, dbGaP, NIBIO, etc. Suzuki, Hiroshi NAGOYA UNIVERSITY Roles of transcription and non-coding RNAs in pediatric cancer Jan19, 2021 approved Recent advances in large scale cancer genomics revealed frequent alterations of regulators of gene and genome regulation, such as RNA splicing factors, cohesin, and epigenetic regulators.
By developing novel computational algorithms, we will investigate the relationship between aberrant gene regulation, transcription, RNA processing, and RNA regulation, and cancer phenotypes of pediatric cancers. We aim to identify new molecular targets in cancer therapies through integrative omics analysis. The overall objective of this project is to investigate the roles of transcription and non-coding RNAs in regulation and their alterations in pediatric cancer toward the development and treatment of cancer. We will focus on the classes of microRNAs (miRs) and long noncoding RNAs (lncRNAs) as well as transcription and transcription-associated RNA processing, including RNA polyadenylation and RNA splicing. The roles of protein coding genes have been extensively studied in cancer; however, the molecular mechanisms underlying gene regulation in tumorigenesis remain elusive. Recent advances in large scale cancer genomics revealed frequent alterations of regulators of gene and genome regulation, such as RNA splicing factors, cohesin, and epigenetic regulators, especially in hematological malignancies (MDS/AML). We recently reported the functional significance of cohesin mutation in MDS/AML using adult datasets (Ochi et al. Combined Cohesin-RUNX1 Deficiency Synergistically Perturbs Chromatin Looping and Causes Myelodysplastic Syndromes. Cancer Discov. 2020 Jun;10(6):836-853. doi: 10.1158/2159-8290.CD-19-0982.); however, the significance remains unclear in pediatric cancer. By focusing on pediatric cancer datasets and comparing them with adult datasets, we aim to investigate the relationship between aberrant gene regulation, transcription, RNA processing, and RNA regulation, and cancer phenotypes of pediatric cancers. We aim to identify new molecular targets in cancer therapies through integrative omics analysis. The process of identifying patterns will not require or involve the identification of individual subjects.
This is consistent with the intent of the Use Restrictions. Findings will be made public but, as agreed, the data will be secured and not transferred. Sweet-Cordero, Eric UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Fusion landscape in cancer Dec18, 2017 expired Fusions are a common mechanism that drives cancer. Fusions occur when chromosomes break and genes come together abnormally. We will use computational approaches to find these fusions and experimental approaches to validate what they do. Our goal is to identify novel structural rearrangements that lead to gene fusions relevant to the pathogenesis of cancer in pediatric patients and young adults. We are specifically interested in pediatric solid tumors, but we plan to analyze pan-cancer whole-genome, exome and RNAseq data as a comparison. Using our own datasets, we have identified several novel fusions that are enriched in pediatric tumors. Our goals are as follows: 1) To determine whether a set of novel fusions we have identified can also be found in other pediatric cancers. This search for novel fusions will be performed using novel fusion detection algorithms for WGS and RNAseq data developed in our lab. We will compare our analysis of the pan-cancer public data obtained through dbGaP with our own unpublished datasets, for which we also have patient-derived xenografts with WGS and RNA. This will allow us to validate potential novel alterations by PCR and to analyze their functional role in cancer using PDX models. We do not feel that merging public dbGaP data with our own data represents any increased risk to participants. 2) A second goal is to correlate the presence of novel fusions with gene expression to identify expression outliers that are associated with specific fusions (i.e., downstream events). In summary, our research is specifically intended to shed new light on the pathogenesis of pediatric cancer. Thus, our research objectives can only be met through access to TARGET and other pediatric cancer datasets.
Access to some adult datasets is also needed for comparison and contrast. Takita, Junko KYOTO UNIVERSITY Genetic analysis of pediatric leukemia and solid tumors Jun05, 2019 approved The survival rate of pediatric cancers has greatly increased over time, but relapsed cases are chemo-resistant and long-term survival of these cases is still poor; moreover, it is difficult to predict the risk of relapse accurately. In this study, we will identify the molecular mechanisms of relapse or treatment failure in pediatric cancers. The survival rate of pediatric cancers has greatly increased over time, but relapsed cases are chemo-resistant and long-term survival of these cases is still poor, and it is difficult to predict the risk of relapse accurately. In this study, we will perform whole-exome sequencing (WES), targeted sequencing, and RNA sequencing to identify the molecular mechanisms of relapse or induction failure in pediatric cancers. We would also like to analyze the genetic differences between cured and relapsed/refractory cases by combining other data sets of pediatric cancer cohorts deposited in dbGaP, including WES, RNA-seq, and methylation analysis. The combined analysis of these data sets does not create any additional risk to participants. No request will be made for the identification of participants. Takita, Junko UNIVERSITY OF TOKYO Genome-wide multi-omic analysis of neuroblastoma Jul28, 2016 closed Neuroblastoma is the most common extracranial solid tumor in childhood. Despite the improvement of risk-stratified therapies, the overall survival rate of the high-risk group is still as low as 40-50%. We plan to use TARGET datasets to discover therapeutically targetable pathways and molecules of neuroblastoma. Neuroblastoma is the most common extracranial solid tumor in childhood. Despite the improvement of risk-stratified therapies, the overall survival rate of the high-risk group is still as low as 40-50%.
Based on recent findings on recurrent features of high-risk neuroblastoma, including aberrant telomere maintenance, our research interest is to describe the downstream transcriptomic profile combined with genetic and epigenetic landscapes and to identify therapeutically targetable pathways and molecules of neuroblastoma. We plan to use our in-house next generation sequencing pipeline on TARGET datasets and our neuroblastoma cohort separately as a discovery cohort. Downloaded datasets will be stored and analyzed independently and will not create additional risks to TARGET participants. Takita, Junko UNIVERSITY OF TOKYO Genome-wide analysis of pediatric hematologic malignancy Oct12, 2016 closed Acute leukemia is the most common malignancy in childhood. Despite the improvement of risk-stratified therapies, the prognosis of the high-risk group is still poor, especially in relapsed cases. We plan to use TARGET datasets to discover therapeutically targetable pathways and molecules of pediatric leukemia. Acute leukemia is the most common malignancy in childhood. Despite the improvement of risk-stratified therapies, the prognosis of several subgroups characterized by molecular features, such as BCR-ABL1, remains poor. Based on recent findings on molecular subgroups and heterogeneity of leukemia, our research interest is to describe the downstream transcriptomic profile combined with genetic and epigenetic landscapes and to identify therapeutically targetable pathways and molecules of pediatric lymphoblastic/myeloblastic leukemia. We plan to use our in-house next generation sequencing pipeline on TARGET datasets and our acute leukemia cohort independently, and the results will be combined only at the final steps of the analysis. Raw data from the two cohorts will be stored separately and securely, and will not create additional risks to TARGET participants.
Tan, Kai CHILDREN'S HOSP OF PHILADELPHIA TARGET pan cancer analysis project Mar11, 2016 rejected The TARGET project has generated a large amount of genomic data for five pediatric cancers. In order to fully utilize this invaluable data set, advanced computational methods that can integrate heterogeneous molecular data types need to be developed. Applying such methods could better guide efforts to identify driver genetic mutations and drug targets. NCI TARGET has generated large-scale NGS data for five pediatric cancers, and a pan-cancer study is expected to provide new insight into the similarities and differences among these pediatric cancers. We plan to combine these large datasets to investigate and characterize the pediatric cancer genome landscape and cancer-type-specific genetic lesions in a pan-cancer study. Specifically, we plan to use the TARGET datasets to develop novel molecular network based methods to better understand deregulated pathways, to identify novel drug targets and to identify causal noncoding mutations. We will apply our methods to single cancer types and groups of cancer types to understand cancer-type-specific and shared genetic features. The proposed research will use TARGET data only for pediatric cancer research. Taylor, Deanne CHILDREN'S HOSP OF PHILADELPHIA The NCI Open Targets Pediatric Resource Jun03, 2021 closed Under the RACE for Children Act, new molecularly targeted compounds for adult cancers will need to be evaluated for children's cancers if the molecular target or mechanism of action of the drug is relevant to the growth or progression of pediatric cancer. This guidance is provided through the FDA's Relevant Molecular Target List (RMTL). In partnership with NCI's Frederick National Laboratory for Cancer Research (FNL), we are extending the Open Targets portal as a site to display evidence of pediatric gene involvement in certain cancers and to display information on genes currently in the FDA RMTL.
This will require summarization and analysis of molecular datasets from dbGaP, such as Kids First, TARGET, TCGA, GTEx and the Pediatric Preclinical Testing Consortium (PPTC). These analyses will be displayed on the new portal to help generate evidence supporting the FDA RMTL and to empower cancer target discovery in children. We are creating a portal resource for NCI called Open Targets Pediatric, which will use summarized data and dataset comparisons to inform pediatric drug target development in response to the RACE for Children Act. In order to accomplish this, we will be summarizing and displaying data (non-identifiable) in the forms of transcriptional levels, somatic mutations, and associated open clinical data to help inform drug target discovery in pediatric cancers. All requested data will be used as part of the NCI contract we were awarded to instantiate a pediatric instance of the Open Targets platform. This platform will be part of the Childhood Cancer Data Initiative (CCDI) data ecosystem. Data will be summarized by cohort to yield average expression per gene. Gene expression in the pediatric datasets will be compared against GTEx and TCGA (in case-control gene expression analyses). GTEx and TCGA will be used as controls for the pediatric data. TCGA data will not be displayed as a summary but rather used as a control for differential gene expression analysis. Somatic mutation analyses will generate a summary of the number of somatic mutations at each position per histology (cancer type). We are collaborating with FNL, and they will submit their own Data Access request for the data. dbGaP data may be combined with other pediatric cancer data from the Pediatric Brain Tumor Atlas consortium.
Taylor, Martin UNIVERSITY OF EDINBURGH Quantifying oncogenic selection of germline variation in pediatric cancers Aug28, 2017 closed Differences in the DNA sequence between people can affect how likely one person is to develop cancer compared to another: the genetic risk. With their exceptionally early onset, childhood cancers are likely to be even more influenced by their genetic risk than adulthood cancers. During their development, cancers often lose sections of DNA or make extra copies of others. From the cancer's perspective, a successful cancer is one that loses DNA that would restrict its growth and survival, and gains extra copies of DNA sections that allow it to continue growing and spreading. For most of our DNA we start with two copies, which might have small differences between them. In this study we will look at how "successful" childhood cancers choose between these two starting copies to find sections containing changes that cancers prefer to keep. Such preferred changes could be incorporated into genetic tests to help identify those at risk of developing childhood cancers, and could also help us understand how these cancers develop and how best to treat them. Inherited variation in DNA sequences can predispose to the development of cancer and as such is likely to be an important underlying cause of pediatric cancers, given their exceptionally early onset. In this project we are systematically looking for biases in the loss or retention of germline variants during the development of childhood cancers. These will be identified as allelic imbalances in read counts in the tumor versus the normal sample. Although the TARGET data samples have been extensively studied previously, our novel approach, which is akin to a Genome Wide Association Study but with perfectly matched cases and controls (the two inherited haploid genomes), has not previously been applied to this data.
The analysis of TARGET data will be performed in isolation from other individual level genetic datasets, but parallel analyses will be performed on adult tumors from other sources (TCGA, ICGC) for which restricted data access permissions are already in place. These adult-sample based analyses will provide a counterpoint for the primary analysis of paediatric cancer samples. As we are working on this project to understand the genetic predisposition to early onset (pediatric) cancers, this work can only be carried out using datasets from childhood cancers. The expected outcome of this study is the identification of germline genetic variation that increases the risk of childhood cancer development, using a novel and potentially powerful approach (systematic evaluation of biased allele retention). These results are likely to have utility in genetic testing for childhood cancer susceptibility and may also provide mechanistic insight into the development of childhood cancers, laying the groundwork for new or re-purposed therapeutic interventions.
Teber, Erdahl CHILDREN'S MEDICAL RESEARCH INSTITUTE Therapeutically amenable targets using stratified pediatric cancer genomic and gene expression profiles Dec19, 2013 closed The goals of this research are: 1) to use the TARGET data sets to develop and streamline methodologies and bioinformatic pipelines to stratify pediatric cancer patients based on plausible drug-alteration specific relationships (initially using currently available pharmaceuticals); 2) to implement computational methods to identify putative driver mutations and driver genes for individual cases using probabilistic models, and to prioritise driver mutations or altered molecular functions that are amenable to effective targeted therapies; and 3) to seek grant opportunities to implement matched-normal experiments (70-80% tumour nuclei with no more than 20-30% necrotic tissue) similar to those of TARGET (mRNA-Seq, whole genome and epigenetic tests) using local pediatric cancer patients and facilitate comparisons across studies. The research will be carried out in accordance with the Data Use Certificate Agreement, Therapeutically Applicable Research to Generate Effective Treatments (TARGET), dated April 21, 2010. Childhood cancers are relatively uncommon, with an average of 620 new cases per year in Australia, and are the second commonest cause of death among Australian children aged 1-14 years. Acute lymphoblastic leukemia accounted for 26% of the total incidence of Australian childhood cancers, followed by acute myeloid leukemia (6%), neuroblastoma (6%), renal tumours (5.3%) and osteosarcoma (2%). The adverse effects of traditional treatments include immune suppression, hair loss, inflammation of the lining of the digestive tract, and in the worst cases, treatment-related secondary malignancies.
Although the survival rates are about 75% across all types of childhood cancers, many childhood cancer survivors experience long-term effects, including infertility, growth and hormonal deficiencies, increased risk of obesity and lower life expectancy compared to the general population. Unlike chemotherapy, more-specific and targeted cancer therapies are expected to be less harmful to normal cells and have fewer side effects. Insights from large scale next generation sequencing cancer studies have highlighted the extensive genetic variability of cancer genomes between individuals (inter-tumour heterogeneity) and within the same individual (spatial and temporal heterogeneity). This complexity in heterogeneous mixtures of cancer genomes poses major challenges for the effectiveness of targeted cancer drug therapies and for the understanding of drug resistance. Given the relatively small numbers of childhood cancer cases available for early phase clinical trials, the use of the additional pediatric cancer data sets from TARGET would improve the statistical power and confidence in stratifying pediatric cancer patients based on their altered genomic structure and gene expression profiles (histology-independent and alteration-specific trials). Temiz, Nuri UNIVERSITY OF MINNESOTA APOBEC mutagenesis in pediatric tumors Jan27, 2022 approved The DNA cytosine deaminase APOBEC3B is upregulated and its preferred target sequence is mutated in several adult tumor types. We hypothesize that an APOBEC mutational signature will impact all human cancers including pediatric tumors. We will test this hypothesis and simultaneously conduct student training exercises by re-examining TARGET RNA sequencing data, re-determining the mutation calls, and performing mutation signature analysis. We will also attempt to correlate mutational signatures, gene expression levels, and other quantifiable genetic markers.
The intended use is purely academic, and key findings will be submitted for peer review and ultimately publication. 1. Objective: The DNA cytosine deaminase APOBEC3B is upregulated and its preferred target sequence is mutated in several adult solid tumors. We hypothesize that an APOBEC mutational signature will impact all human cancers and may play a role in pediatric tumors as well. We will test this hypothesis and simultaneously conduct student training exercises by re-examining raw TARGET RNA sequencing data, re-analyzing the mutation calls, and performing mutation signature analysis. We will also attempt to correlate mutational signatures, gene expression levels, and other quantifiable genetic markers. 2. Study Design: Upon approval, we will download raw sequencing data for one cancer at a time to facilitate data management. Once an analysis of one cancer is complete, we will proceed to analyze the next cancer until all TARGET cancers have been analyzed. If available, sequencing data from matched normal tissues will be analyzed in parallel to enable somatic mutation identification. All RNA sequences will be aligned to the current human genome build using the HISAT2 or STAR aligners. Gene expression analyses will be performed using HISAT/Subread/HTSeq. 3. Analysis Plan: We are interested in mutation patterns and correlating features in pediatric tumors. For instance, once mutations have been identified and subjected to signature analyses, we will ask whether a particular signature correlates with gene expression and/or a specific gene alteration. Appropriate statistical tests will be applied, such as the two-tailed Student’s t-test with corrections for multiple hypothesis testing. 4. Explanation of how the proposed research is consistent with Use Restrictions for the requested dataset(s): We will not distribute the information to any other party beyond our academic group.
Our studies will not require subject identification (i.e., all original donors will remain anonymous). The intended use is purely academic and predominantly for training purposes. Key findings will be submitted for peer-reviewed publication. TENEN, DANIEL NATIONAL UNIVERSITY OF SINGAPORE Integrative Analysis of RNA editing and Alternative Splicing Crosstalk Across Different Tissues and Disease States Apr26, 2020 closed RNA editing and alternative splicing are important RNA regulatory mechanisms for generating diversity in the human transcriptome. Although these processes are often studied independently of each other, they are likely to co-occur and affect each other. Here, we have developed computational algorithms to identify RNA editing and alternative splicing events. Using the computational methods we developed, we plan to perform an integrative global analysis of RNA editing, mRNA modification, alternative splicing and alternative polyadenylation, as well as these co-occurring events, across a large panel of RNA-Seq datasets in order to decipher tissue- and disease-specific signatures. Dysregulation of alternative splicing has been shown to be a common feature of adult AML. We would like to utilize TARGET paediatric AML samples to study the frequencies of intron retention and explore the possibility of using retained introns to develop neoantigen immunotherapy for paediatric AML. We will further use various normal tissues as controls for selecting disease-specific transcripts, including spliced and intron-retained transcripts. Those transcripts could serve as neoantigens. Thomas, David GARVAN INSTITUTE OF MEDICAL RESEARCH Targeting inflammation in tumours Aug18, 2017 rejected Our goal is to identify new treatment options that can increase the survival rate and quality of life for children diagnosed with osteosarcoma, a cancer of bone.
Using genetic mouse models, we have identified candidate genomic variations that may be pivotal in pediatric osteosarcoma development and progression. This study will use the human pediatric osteosarcoma datasets to investigate the relevance of data obtained from mouse osteosarcomas. We are particularly interested in inflammation and immune regulation of tumour growth and how these relate to genomic complexity. In addition, we will use the RNA sequencing and clinical outcome data to identify novel biomarkers for treatment response. Objective: This project aims to use the data in dbGaP to investigate how genomic variations identified to predispose to osteosarcoma in genetic mouse models may translate to human pediatric osteosarcoma development and progression, as well as to identify biomarkers for selecting treatment options. Study design: We will use RNA seq data to define gene expression signatures identifying immune cell infiltration and activation. We are particularly interested in the expression of inflammatory mediators, how they sculpt tumour development and how they correlate with genomic complexity. Analysis plan: We will use gene expression data and sequencing data to interrogate somatic genetic variants, CNVs and INDELs and analyse their effect on gene expression and treatment outcome. We plan to analyse pediatric patients in the TARGET osteosarcoma database to determine clinical outcome in relation to expression of inflammatory molecules and genomic instability. We will also determine whether molecular biomarkers for treatment response can be identified in human pediatric osteosarcomas compared to control samples.
Thomas-Tikhonenko, Andrei CHILDREN'S HOSP OF PHILADELPHIA Alternative splicing in pediatric cancers Dec28, 2015 approved The proposed work aligns closely with the NCI-issued Pediatric Provocative Question 5: “What molecular and cellular mechanisms allow reactivation or bypassing of … tumor suppressor genes in pediatric cancers?” Our work will likely establish that one such mechanism is improper assembly of transcripts from building blocks called exons. 1. The objective of this project is to further study post-transcriptional gene regulation in pediatric solid and liquid tumors. Specifically, we aim to identify splicing alterations that are common across multiple histotypes (ranging from neuroblastoma to acute leukemia) and thus are likely to be key to malignant growth. Likely examples include truncated isoforms of the TP53 tumor suppressor gene, which contribute to both tumor progression and therapeutic resistance. 2. Our overall hypothesis is that deregulation of splicing is a hallmark of pediatric tumors. This is because in childhood cancers tumor suppressor genes (TSG) are seldom affected by genetic mutations, and post-transcriptional inactivation of TSG (e.g., via alternative splicing) is the likely underlying mechanism. At the same time, aberrant splicing has the potential to create neo-epitopes, which can be targets for novel immunotherapies. 3. We will test this hypothesis by systematically profiling exon usage using multiple RNA-Seq datasets representing various histotypes and both diagnostic and relapsed samples. Whenever matching whole exome sequencing (WES) datasets are available, we will determine whether genetic and post-transcriptional events in a given gene are mutually exclusive. 4. We propose to further develop a computationally robust pipeline allowing us to call RNA splicing events. This will be achieved by first comparing the output of splicing algorithms such as MAJIQ and rMATS and focusing on events predicted by both algorithms.
We will then identify both aberrantly spliced oncogenes/tumor suppressors and genes encoding new surface proteoforms targetable with immunotherapy. 5. Although the Controlled Access data tier contains data that may be unique to an individual, the proposed research will be conducted using coded and/or de-identified samples. We do not seek to obtain survival or other clinically relevant data. 6. Individual datasets in TARGET/dbGaP will not be shared with outside collaborators. Thompson, Reid OREGON HEALTH & SCIENCE UNIVERSITY Global transcriptional profiling in development and disease Jan24, 2020 closed We plan to combine multiple datasets and mine them for insights about: (1) the different ways genes can be expressed; (2) whether all copies of a gene are created equal; (3) how genes may differ between individuals; (4) what happens after a gene is expressed; and (5) how different gene expression patterns relate to disease, especially among pediatric and adult tumors. We will share our work broadly with the public and the scientific community to the maximal extent allowable. This project represents an independent research effort at Oregon Health & Science University (OHSU) to reanalyze human RNA sequencing (RNA-seq) data across large collections of cancer and non-cancer tissues from both controlled-access and publicly available datasets, taking into account underlying genetic variability and other orthogonal data where available and relevant (including but not limited to DNA-seq, miRNA-seq, ChIP-seq, bisulfite sequencing, and proteomics data).
Our objectives are: (1) to explore the diversity and specificity of RNA processing (including alternative RNA splicing, intron retention, RNA editing, polyadenylation and other modifications); (2) to explore the diversity and specificity of isoform- and/or allele-specific transcript expression patterns (covering both human and parasitic element sequences, including but not limited to active and inactive viruses); (3) to explore HLA, cytochrome P450, and other sequence diversity in the population; (4) to explore translational and post-translational processes (including but not limited to protein production levels, alternative start codon utilization, nonsense-mediated decay, proteasomal digestion, and antigen processing and presentation); (5) to release resources that allow other investigators to easily pursue additional analyses of harmonized RNA-seq data leveraging the work described above; and (6) to explore the developmental patterns of splice isoform utilization among adult and pediatric cancers. Our research outputs will include one or more (likely several) publications, including fully analyzed/harmonized data as relevant and permissible to release, as well as detailed methodologies and software tools to recapitulate or extend our analyses. We specifically intend to confine released outputs for each RNA-seq sample to splice junctions, retained introns with respect to gene annotations of human reference genomes (e.g., GENCODE), RNA edits, HLA type, and viral read counts. We do not seek to release individual SNVs, CNVs, indels, or structural variation. TING, TAO ZHEJIANG UNIVERSITY The genomic differences in neuroblastoma across ethnic groups Sep16, 2022 expired The goal of this study is to compare the genetic differences between East and West pediatric neuroblastoma cohorts, and evaluate the impact of these genetic alterations on clinical outcomes.
To reach this goal, we will analyze the gene mutations and fusions of the West neuroblastoma cohorts from the requested data and compare them with the East neuroblastoma cohorts from our own data. We will also map the landscapes of neoantigens and other gene signatures for all these pediatric tumors in the dataset and compare them with the East cohort from our own data. Together with the clinical characteristics of the patients, we will analyze how these genetic variants, neoantigens and gene signatures affect patient survival. The results will provide new diagnostic and prognostic markers for neuroblastoma across human ethnic groups. Neuroblastoma is the most common extracranial solid tumor in children and accounts for 15% of all childhood cancer deaths. It is unknown whether gene mutations and fusions in neuroblastoma differ across ethnic backgrounds. Better understanding of population-specific genetic differences in neuroblastoma will help to improve the treatment of this devastating disease. We have performed whole-exome sequencing and mRNA sequencing of Chinese neuroblastoma samples. This project aims to compare the genetic differences between East (from our data) and West (from the requested data) pediatric neuroblastoma cohorts, and evaluate the impact of these genetic alterations on clinical outcomes. The data will not be compared with adult genomic data. The analysis will be performed with standard bioinformatic pipelines including STAR, bowtie2, GATK, samtools, etc. We will also map the landscapes of neoantigens and other gene signatures for all these pediatric tumors in the dataset and compare them with the East cohort from our own data. The impact of these signatures on clinical outcomes will be evaluated. This analysis will be performed with standard bioinformatic pipelines including pVACtools, GSVA, etc. All the data will be stored on an encrypted computer and analyzed according to the Data Use Certification (DUC) Agreement.
The results will be freely shared with the scientific community. TORETSKY, JEFFREY GEORGETOWN UNIVERSITY Splicing difference comparison between cancer samples and tissue normal samples at select sites - TARGET comparison set Mar20, 2019 closed Cancer cells have developed unique ways to change the proteins they make to enable growth as well as immune system evasion. In this project we would like to evaluate one aspect of these changes that cancer cells make in pediatric cancer: the way that the genomic message is modified prior to translation into protein. We believe that this may present a uniquely targetable opportunity, and wish to determine the transcriptional changes that are unique to a specific pediatric malignancy, Ewing sarcoma. Objectives of proposed research: This project is designed to determine mRNA splicing variants that are unique to a pediatric malignancy, Ewing sarcoma. In order to determine target specificity, we will need to compare mRNA splicing patterns to related normal tissue as well as to other pediatric malignancies (neuroblastoma). Study design: We will analyze exon inclusion rates at specific sites in neuroblastoma using existing, published tools (Salmon for transcript quantification, SUPPA2 for splicing quantification). We have already created annotation for the specific splicing events we wish to examine. Analysis plan: Analysis will be conducted in a streaming fashion, with raw sequence data stored only for the duration of an individual sample analysis on a full-disk encrypted RAID-10 storage solution. FASTQ files will not be retained once processing for an individual sample is complete.
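Exon inclusion rates of this kind are usually reported as percent spliced-in (PSI) values. As a minimal sketch, assuming transcript-level TPMs have already been obtained (for example from Salmon), the transcript-based PSI that SUPPA2 reports for an event is the summed TPM of the isoforms carrying the included exon divided by the summed TPM of all isoforms that define the event. The transcript IDs, TPM values, and the `delta_psi` helper below are hypothetical illustrations, not SUPPA2's own code.

```python
from statistics import mean
from typing import Dict, Iterable, List


def psi(tpm: Dict[str, float], inclusion: Iterable[str], event: Iterable[str]) -> float:
    """Transcript-based PSI for one splicing event in one sample.

    tpm: transcript ID -> TPM (e.g. parsed from a Salmon quant file).
    inclusion: transcripts that contain the included exon.
    event: all transcripts that define the event (inclusion + skipping forms).
    Returns NaN when none of the event's transcripts is expressed.
    """
    total = sum(tpm.get(t, 0.0) for t in event)
    if total == 0.0:
        return float("nan")
    return sum(tpm.get(t, 0.0) for t in inclusion) / total


def delta_psi(group_a: List[Dict[str, float]], group_b: List[Dict[str, float]],
              inclusion: List[str], event: List[str]) -> float:
    """Hypothetical helper: difference in mean PSI between two sample groups.

    Samples in which the event is not expressed (NaN PSI) are skipped.
    """
    def group_mean(samples: List[Dict[str, float]]) -> float:
        values = [psi(s, inclusion, event) for s in samples]
        values = [v for v in values if v == v]  # drop NaN (NaN != NaN)
        return mean(values) if values else float("nan")

    return group_mean(group_a) - group_mean(group_b)


# Hypothetical event with one inclusion isoform and one skipping isoform.
incl_ids = ["tx_incl"]
event_ids = ["tx_incl", "tx_skip"]
ewing = [{"tx_incl": 9.0, "tx_skip": 1.0}, {"tx_incl": 7.0, "tx_skip": 3.0}]
nbl = [{"tx_incl": 2.0, "tx_skip": 8.0}, {"tx_incl": 0.0, "tx_skip": 0.0}]
print(delta_psi(ewing, nbl, incl_ids, event_ids))
```

Averaging PSI within each group is only one simple choice; SUPPA2 ships its own differential splicing analysis, which this sketch does not reproduce.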
How research is consistent with data use limitations: The DUL for TARGET datasets indicates: "Use of protected TARGET datasets should be for research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that have likely relevance to developing more effective treatments, diagnostic tests, or diagnostic markers for childhood cancers. Applications proposing methods, software, or other tool development would not be considered acceptable uses of the data." The intended comparison would not be possible with an adult cancer dataset, as we wish to use a cancer that approximates our disease of interest as closely as possible: a related pediatric malignancy. These data will be analyzed using previously published tools and will not be used to develop methods or analytical strategies. All publications or posters based on this research will reference dbGaP, this phs accession, and the TARGET consortium. Torrents, David BARCELONA SUPERCOMPUTING CENTER Role of transposase-derived genes in pediatric tumor formation. Jul06, 2016 closed Transposase-derived genes promote the movement of elements in the genome. These genes are particularly active in pediatric tumors. We will compare the DNA of normal and tumor cells from different childhood cancer patients in order to identify the modifications in the genome driven by the activity of these genes. By comparing these genomic modifications between patients with the same cancer type and across cancer types, we will be able to evaluate the recurrence of these changes in the DNA and see whether they affect more than one patient and more than one cancer type. Using information on the activity levels of transposase-derived genes, we will search for correlations between this activity and the presence of specific modifications in the genome of the patients.
The project associated with this application aims to provide better insight into how the expression of transposase-derived genes drives recurrent structural modifications in the genome of pediatric cancers. Our interest is focused on the link between the expression of these genes and tumor formation. Specific transposase-derived genes are highly expressed in some types of pediatric solid tumors such as Ewing sarcoma, primary rhabdoid tumors, osteosarcoma, medulloblastoma, neuroblastoma and rhabdomyosarcoma. We will apply a uniform set of alignment and variant calling algorithms to whole-genome sequencing tumor/normal pairs from different datasets of pediatric tumors. These data will be supplemented by RNA-seq expression analysis. Our final goal is to characterize and classify large recurrent genomic rearrangements derived from transposition events and evaluate whether these events are frequent across patients with the same cancer type and whether they are common between different types of cancer. Access to TARGET data is crucial to advancing our analysis and contributing to the understanding of tumor formation and progression in childhood cancers. Triche, Timothy VAN ANDEL RESEARCH INSTITUTE Epigenetic and genetic abnormalities in relapsing and refractory myeloid leukemia Jan23, 2018 approved The majority of children and adults with acute myeloid leukemia (AML) respond to induction chemotherapy, which aims to "de-bulk" the disease by killing fast-dividing blast cells. Patients with high-risk features and a suitable donor then typically undergo a stem cell transplant (SCT); lower-risk patients receive consolidation chemotherapy. However, some patients do not respond to induction, and many more relapse after consolidation chemotherapy and/or SCT. It is these patients who most often die from their disease. In the TARGET AML project, we identified recurrent structural and genetic abnormalities predicting relapse and treatment failure.
We seek to determine how pervasive these drivers are across age groups, which processes are most often affected, and what vulnerabilities carried by surviving leukemia-initiating cells can be targeted to improve outcomes. The goal of this project is to establish which epigenetic changes are shared and which are distinct between pediatric and adult relapsing and refractory acute myeloid leukemia (AML). eRRBS data from the Epigenetics of Acute Myeloid Leukemia study are requested for analysis alongside DNA methylation microarray data from TARGET AML diagnostic and relapse samples, primarily from COG AAML 0531, in order to test for association of specific structural and genetic variants with altered regional DNA methylation in clones and subclones, and to investigate both common and distinct features of pediatric and adult relapsing AML. To the best of our knowledge, no comparable dataset exists. RNAseq data from the Epigenetics of Acute Myeloid Leukemia study are requested for analysis alongside RNAseq data from the TARGET AML study, particularly as they relate to interchangeable epigenetic or mutational suppression of specific gene functions (particularly EZH2, DNMT3A, TET2, SETD2, and TP53) and the role of their suppression in promoting treatment-resistant relapsing disease within or across age groups. Intergenic and repetitive transcripts are of particular interest; hence raw data are required for this comparison. Results from these comparisons will be made public, and shared with the research community at large, in the most expeditious manner practicable. Triche, Timothy VAN ANDEL RESEARCH INSTITUTE Pan-TARGET analysis of aberrant epigenetic regulation in high-risk pediatric malignancies Mar12, 2018 rejected In both children and adults, high-risk cancers are distinguished by their ability to survive and adapt to both chemotherapy and targeted therapies.
The TARGET project (Therapeutically Applicable Research to Generate Effective Treatments) explicitly aims to find new regimens in pediatric tumors where existing treatments do not cure all patients. However, when individual diseases are analyzed in isolation, or without comparing their features to adult tumors in the same tissues, groups of patients with similar features (whether genetic or otherwise) across diseases may be missed. More importantly, some of these groups may be treatable with existing drugs. This project aims to jointly analyze both adult and pediatric tumors, concentrating on functional ("epigenomic") features rather than solely on genetic mutations, and to identify uncommon but recurrent groups of children whose features more closely resemble adult disease, in order to improve their treatment and outcomes. In some disease types, we also need population genetic frequencies to make sense of existing "epigenetic" data. A useful pooled analysis of targetable epi/genetic alterations in pediatric malignancy satisfies three conditions: 1) it reflects the differences between pediatric (or, more generally, "young people") versus adult tumors and tumor types, which can only be observed by comparing data across age groups; 2) it considers inherited genetic variants (cancer predisposition; other predisposition; unknown significance) as possible influences on age of onset and co-mutational profiles by age; 3) it seeks to publish results as soon as practical and with minimal paywalling (historically, in TARGET AML, we have simply posted everything on bioRxiv at the time of submission). To accomplish this, we rely upon three sources of raw data: 1) Germline and somatic genetic variant calls in both kids and adults with cancer. These are available at GDC, and BAMs may be recalled if necessary. We seek GTEx data to additionally map the landscape of expressed variants in normal tissues.
We have begun using GMKF projects (Ewing trios, TARGET pAML X01) to inform potential predisposition and modifier variants, and seek to use AACR GENIE to expand the pool of age groups and tumors represented for co-mutational profiles. Unexpected results will be posted and published as fast as is practicable, whether via bioRxiv, medRxiv, ASH/AACR abstracts, or published papers (usually a combination of all three; a Google Scholar search for any TARGET author will confirm our publication habits). 2) mRNA/miRNAseq data. To the best of our knowledge, neither an "omnibus" transcriptome nor transcript-level quantifications against such a transcriptome, including transcribed repetitive elements, are available. We have shown (https://www.biorxiv.org/content/early/2017/11/09/217299) that the latter are prognostic in myeloid leukemia, independent of existing markers. We suspect that specific mutations or regulatory aberrations lie at the root of this prognostic relevance. Until we have the raw data to quantify known and/or novel transcripts in specific tissues and diseases, we do not believe their full potential can be realized, least of all in understudied pediatric malignancies. Therefore we seek to index and reprocess TARGET RNAseq data to facilitate this aim. As above, when we find something new, we submit and publish it ASAP. 3) DNA methylation data. The TCGA project conducted whole-genome bisulfite sequencing (WGBS) on a limited number of primary tumors and adjacent normal tissues. The TARGET project employed DNA methylation microarrays for similar purposes, although a single TARGET project (rhabdoid tumors) extensively assayed a number of primary and recurrent rhabdoid tumors by whole-genome bisulfite sequencing. Read-based quantification of subclonal structure is far more informative than array-based quantification, but the latter may yet be usable for inference about subclones, if and only if the inferences are validated.
The paired TCGA microarray and WGBS data are ideal for this exercise. In adult diseases (e.g. chronic lymphocytic leukemia), epigenetic heterogeneity is a documented prognostic marker. We do not know whether this is the case in some or all pediatric cancers. We are currently preparing a pan-pediatric comparison of primary tumor and cell-free DNA (methylation-aware CNA and mutation calling) to determine the diagnostic and predictive significance of epigenetic heterogeneity for treatment response. The goal is to determine whether pediatric cancer risk stratification can be advanced by pooling specimens across pediatric and younger adult tumor projects, and by focusing on functional (rather than genetic) markers to stably identify strata. In some cases, recalling or reassembly of existing data from related tumors will be required. Raw pediatric data are therefore indispensable. Existing successes from the EISAI and TACL clinical trials, as well as preliminary results from the TARGET induction failure and TpAML trio studies, suggest that within-group heterogeneity (especially cell-of-origin signatures left in DNA methylation) is a significant and currently underused predictor of treatment response in (at least) leukemia and sarcoma. The goal of this project is to determine the extent of this phenomenon and to identify which patients will benefit sufficiently to justify widespread non-genetic testing in clinical trials. Tsang, Hsinyi NIH NCI Cancer Genomics Cloud Pilots/Resources Evaluation Feb01, 2018 closed The NCI Cloud Resources cloud computing model allows scientists, clinicians, and researchers to access NCI-generated data and analyze these data with compute available on commercial cloud infrastructures. The co-localization of data and compute eliminates the need to download and store petabyte-scale data and maintain a local compute environment.
Cloud computing has been used as a low-overhead, cost-effective alternative to high-performance computing, which might not be available at all research institutions. The objective of this project is to evaluate the NCI Cloud Resources, formerly known as NCI Cancer Genomics Cloud Pilots. These resources were funded by NCI and developed by the Broad Institute, the Institute for Systems Biology and Seven Bridges Genomics on the commercial cloud platforms Google Cloud Platform and Amazon Web Services. (https://cbiit.cancer.gov/ncip/cloudresources) We assess and validate technical and scientific capabilities being continuously implemented on these platforms, which include analytical tools and workflows, using NCI Cancer datasets hosted on the platform. Currently, these cloud platforms host datasets from the Genomic Data Commons, which includes TCGA, TARGET, CCLE, as well as CPTAC and TCIA. This list of datasets is expected to expand to include other data types in the future. As part of the evaluation, we collaborate with the cancer research community, both intramurally and extramurally, using the platforms to analyze data by constructing proof-of-concept workflows and pipelines using the above-mentioned NCI-generated data. Most of these collaborative projects work with controlled-access data in dbGaP. To date, we have developed containerized tools and workflows for whole exome sequencing (WES/WXS) variant/neoantigen prediction analysis, RNA-seq analysis, microbial sequencing (microbiome, metagenomics, pathogen) analysis, and data visualization. We collaborated with individual researchers to develop many of these tools and workflows. In addition, we offer education and training workshops for users of the platforms periodically and when requested.
Tsirigos, Aristotelis NEW YORK UNIVERSITY SCHOOL OF MEDICINE Analysis of mutations at putative enhancer elements Feb23, 2023 approved We analyzed the chromatin conformation of a small cohort of pediatric patients affected by T-cell acute lymphoblastic leukemia (T-ALL) and healthy controls and found a high number of interactions encompassing super-enhancer (SE) regions. We would like to investigate the non-coding genome of a larger cohort of pediatric patients in order to verify whether active super-enhancers are associated with recurrently mutated regions, with the final goal of better stratifying disease types and revealing new therapeutic targets for children affected by T-ALL. Introduction: We are going to analyze the raw files (.fastq) of whole genome sequencing from the TARGET ALL study (phase 2 and phase 3, focusing only on T-ALL cases). We will check whether regions of active super-enhancers (detected with the ROSE algorithm on our H3K27ac ChIP-seq from pediatric T-ALL patients) are enriched in somatic mutations compared to the rest of the genome and to control regions of similar size. Analysis plan: In agreement with the Data Use Limitations (DUL), TARGET ALL WGS files will only be handled and analyzed by the PIs indicated on the application and will be used exclusively to detect somatic mutations for research purposes, with the aim of revealing recurrent genomic alterations encompassing SE regions in pediatric patients affected by T-ALL. This investigation is needed solely to identify recurrent genomic alterations in coding and non-coding regions in a sizable number of pediatric patients; the data will be treated as protected health information (stored only on restricted-access portions of our computing nodes) and will not be used for commercial purposes. The TARGET program and the Children's Oncology Group (COG) will be acknowledged in every scientific output of this research. External collaborations with institutions outside of NYU Grossman School of Medicine are not planned for this study.
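The enrichment test described in this request can be made concrete with a simple model. The sketch below assumes a uniform background mutation rate, so that, among mutations falling in either the SE regions or the size-matched control regions, each lands in an SE region with probability proportional to total SE length; a one-sided binomial test then asks whether SEs hold more mutations than expected. The region and mutation tuples, the helper names, and the uniform-rate assumption are all illustrative: real somatic mutation rates vary along the genome (for example with replication timing), so this is not the applicants' stated method.

```python
import math
from typing import List, Tuple

Region = Tuple[str, int, int]   # (chrom, start, end), 0-based half-open as in BED
Mutation = Tuple[str, int]      # (chrom, 0-based position)


def total_length(regions: List[Region]) -> int:
    return sum(end - start for _, start, end in regions)


def count_hits(mutations: List[Mutation], regions: List[Region]) -> int:
    """Count mutations falling inside any region (naive scan; fine for a sketch)."""
    hits = 0
    for chrom, pos in mutations:
        if any(c == chrom and s <= pos < e for c, s, e in regions):
            hits += 1
    return hits


def binom_sf(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def se_enrichment(mutations: List[Mutation], se_regions: List[Region],
                  control_regions: List[Region]) -> Tuple[float, float, float]:
    """Return (SE mutations/Mb, control mutations/Mb, one-sided p-value).

    Hypothetical helper; assumes SE and control regions do not overlap.
    """
    k_se = count_hits(mutations, se_regions)
    k_ctrl = count_hits(mutations, control_regions)
    len_se = total_length(se_regions)
    len_ctrl = total_length(control_regions)
    p_se = len_se / (len_se + len_ctrl)  # expected SE share under a uniform rate
    p_value = binom_sf(k_se, k_se + k_ctrl, p_se)
    return k_se / len_se * 1e6, k_ctrl / len_ctrl * 1e6, p_value


# Hypothetical toy input: one 1-kb SE and one 1-kb control region.
se = [("chr1", 0, 1000)]
ctrl = [("chr1", 10_000, 11_000)]
muts = [("chr1", p) for p in (10, 200, 450, 700, 990)] + [("chr1", 10_500)]
print(se_enrichment(muts, se, ctrl))
```

With this toy input, five of the six mutations fall in the SE, giving a one-sided p-value of 7/64; in practice, mutations would be pooled across patients and a local background model would replace the uniform rate.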
Link to TARGET: We request access to the NCI TARGET ALL dataset to investigate the frequency of single nucleotide and structural variations corresponding to SE regions across pediatric patients affected by T-ALL. The research objectives cannot be accomplished using adult patient data for a number of reasons, namely the unique molecular features of the disease at pediatric age and the different therapeutic regimens used in adults. Moreover, our preliminary SE analyses were generated from pediatric patient samples, and a valid comparison can only be obtained from age-matched patient samples. Tuller, Tamir TEL AVIV UNIVERSITY Modeling and understanding gene expression in cancer Mar19, 2020 approved Personalized medicine based on extensive genomic information will become the gold standard in cancer research and therapeutics. Hence, a key challenge in cancer research is distinguishing consequential driver mutations from inconsequential passenger ones. We believe that there is a wealth of undiscovered and/or poorly understood information in the genome, the ‘dark matter’ of the DNA, which drives cancer evolution. However, current models consider only mutations that affect the amino-acid composition of proteins, while crucial genomic information is ignored. To this end, we propose to develop an integrative, comprehensive predictive model that represents the linkage between silent mutations and the fitness of specific cancer types (also in children). We believe that the novel information uncovered using our approach will benefit the treatment of rare and pediatric cancer types. In the future, such models may be used for various objectives such as improved diagnosis, rational drug design, and “tailor-made” individual therapeutic protocols.
The purpose of this project is to perform comprehensive analysis of various cancerous human genomes (both pediatric and adult), while focusing on silent cancerous mutations (along with the 'trivial' non-silent ones). We aim to detect novel, non-trivial driver mutations, specifically focusing on rare and/or hard-to-treat cancerous conditions, such as childhood cancers. We intend to broadly share our findings with the scientific community by publishing in the relevant journals. To this end, we develop novel computational models, tools, and pipelines for the analysis of silent mutations for various practical health/medical/biomedical objectives. Specifically, our models are designed to improve the understanding and diagnosis of driver cancerous mutations. These models can automatically detect and interpret novel cancerous mutations; thus, we think they can be very useful for understanding less-studied and rare cancer types (such as can be found in the Foundation One dataset), via 'projecting' them onto gene expression models. Moreover, the information found in the TARGET dataset can also be very useful in this respect, since it gives our models a unique opportunity to broaden knowledge of intracellular mechanisms affecting the efficiency of current patient care, especially in pediatric patients. We believe our approach has an advantage over other methods currently used, and will allow for better understanding of yet unknown driver mutations in rare and pediatric cancer types; thus, based on our analyses, we are confident that improving our understanding of cancer genomes in this dataset through the ability to connect mutations to gene expression is crucial for this mission. The research plan includes the following steps: the collection, normalization, and standardization of thousands of public cancerous genomes and related phenotypic / clinical / demographic information that are available today.
To this end, we develop models that enable the detection of positive silent evolution in different parts of the cancer genome, and tailor our models to connect various gene expression aspects to mutations, such that they enable efficient prediction of the effect of cancerous mutations on cancer fitness. Based on these models and data, we plan to show that various types of silent mutations are under selection in cancer, and develop a predictive model that enables cancer genome diagnosis and classification based on silent (and non-silent) information. The controlled data that will be used in the study include the available genomic data and all available clinical data (e.g., survival, stage, treatments, etc.). Update: Based on small-scale analyses of AML by novel computational approaches, we found evidence of various “hidden” mutations that affect AML tumorigenesis and the fitness of AML cancer cells. Thus, we would like to improve our models via the analysis of a larger database. The resulting models are expected to provide better understanding and diagnosis of AML driver mutations; they may also suggest novel therapies for AML. The data from the Leukemia study will only be used in research consistent with the data use limitation and will not be combined with other datasets / phenotypes. Tuong, Zewen Kelvin UNIVERSITY OF QUEENSLAND Harnessing Adaptive Immune Repertoires for Detecting and Monitoring Paediatric Cancer Jun13, 2024 approved This project aims to help children with cancer by understanding how their immune cells work when facing cancer cells. Using sophisticated computer models, we will analyse the immune cells and the special receptors that these cells use to recognise and fight cancer. The results from this work are expected to lay the groundwork for next-level childhood cancer detection and forecasting, potentially leading to a simple blood test for diagnosing cancer in children.
A cancer diagnosis at any age is upsetting, but it is felt more harshly when the patient is a young child who has only just started out in life. Compared to adult cancer patients, the window of opportunity to help child cancer patients is especially short. This project's main hypothesis is that adaptive immune receptors can be harnessed to track and understand the precise state of the immune response against cancer. This is because all aspects of T and B cell development and function depend on the T and B cell receptors (TCR/BCR); these receptors specifically recognise tumour antigens, enabling the cells to fight cancer cells. The research aims to predict adaptive immune repertoires in childhood cancer using gene expression information contained in the RNA-sequencing data. This can be achieved using software such as TRUST4, which generates a predicted immune receptor rearrangement based on sequence alignment and highest expressing genes. The predicted immune receptors will be used to train machine learning models that can classify whether a new repertoire from a new sample/patient would be similar or dissimilar to the repertoires contained in the childhood cancer data. Other information contained in the associated dbGaP dataset(s) that may be used (if available) will not be specific to individuals (e.g. broad cancer type/site/severity) and thus should not pose any additional risks to participants. The expected outcome of this project is an increase in the number of paediatric cancer-associated immune repertoires that we can use to train the machine learning models for downstream evaluations. Ulitsky, Igor WEIZMANN INSTITUTE OF SCIENCE Comparison of the OMS neuroblastoma transcriptome with transcriptomes of low-risk and high-risk non-OMS neuroblastoma tumors Jul28, 2016 closed We are studying a rare autoimmune disease, called Opsoclonus-Myoclonus Syndrome (OMS), that affects about 2-3% of children with neuroblastoma (NB), a common solid tumor of childhood.
OMS causes primarily neurological symptoms because the brains of these children are attacked by their own immune systems, in a way that is thought to be triggered by the tumor. While OMS causes significant problems for these children, like difficulty walking and talking, difficulty controlling their hands, and cognitive problems, it also coincides with improved survival of the cancer: kids without OMS survive about 30-40% of the time, whereas kids with OMS survive >90% of the time. We wonder whether there is a difference in the NB tumors of kids with OMS that helps them survive the cancer, even as it causes other problems. We will look at which genes are expressed in the OMS tumors and compare this with similar information from non-OMS tumors, to try to figure out from the differences what might make the OMS tumors turn out better, and what might be the gene product that triggers the autoimmune disease. We are studying opsoclonus-myoclonus syndrome (OMS), which affects 2-3% of children with neuroblastoma (NB). Since tumor-related outcomes for children with OMS are significantly better than for children without, we wish to compare gene expression differences in the NBs from these two patient populations. Our hypothesis is that differences in gene expression, whether in transcript levels or novel splice variants, in OMS-associated NBs underlie differences in immunogenicity, which lead both to autoimmune outcomes and to improved immune surveillance and tumor immunity. We will also use this comparison to pan for the elusive antigen that is the target of autoimmunity leading to neurological symptoms in OMS. We plan to combine available RNA-seq data from TARGET and compare them to a new RNA-seq dataset that we are generating ourselves from NB tumors of patients with OMS.
We will quantify gene expression levels and compare them to levels in NBs that are not associated with OMS (and compare NBs with similar risk stratifications within those two subsets). For our experiment, we will be doing de novo sequencing of only a few non-OMS-associated NBs, which will not provide sufficient statistical signal to serve as a good control for our OMS dataset. No additional patient risk will be introduced, since we plan to use only de-identified patient data: specifically, RNA-seq data from tumors and some limited information about patient outcomes, tumor pathology, and tumor genetics, to allow us to bin the tumor data according to tumor risk category. Unnikrishnan, Ashwin UNIVERSITY OF NEW SOUTH WALES Identification of aberrant splicing in T-ALL Dec22, 2020 approved T-lineage acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic malignancy that is mainly diagnosed in children and requires treatment with intensified chemotherapy. This therapeutic regimen can result in life-threatening toxicities. Thus, further advances in the treatment of pediatric T-ALL require the development of effective and highly specific targeted anti-leukemic drugs, which in turn requires a better understanding of paediatric T-ALL disease biology. In the past, childhood T-ALL biology research has largely focused on genetic and transcriptomic analyses. However, aberrant RNA splicing, an additional level of complexity implicated in the biology of paediatric T-ALL, remains largely unexplored. Given this, we will use a protected Therapeutically Applicable Research to Generate Effective Treatments (TARGET) dataset to confirm aberrant splicing in pediatric T-ALL. This work will help identify an aberrant splicing signature that can serve as a biomarker for the use of spliceosome inhibitors in the treatment of childhood leukemia.
The objective of this study is to reconfirm that pediatric T cell acute lymphoblastic leukemia (T-ALL) consists of different subtypes that are characterised by differential RNA splicing. These differences were initially identified based on polyA RNA sequencing analysis of a cohort of 64 pediatric T-ALL patients (Peirs et al., Blood 2014) that we obtained from Saint-Louis Hospital (Paris, France) in collaboration with Prof Jean Soulier. Differential splicing analysis was performed using previously published bioinformatic pipelines (Anande G et al., Clinical Cancer Research 2020). For this reconfirmation, we would like to request access to the FASTQ files from a previously published study in which RNA sequencing was performed on a large cohort of pediatric T-ALL samples (Liu et al., Nature Genetics 2017). That manuscript states that the RNA-seq FASTQ files from the study are accessible through the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under the accession numbers phs000218 (TARGET) and substudy-specific accession phs000464 (TARGET ALL Expansion Phase 2). We will use these data to reconfirm the aberrant splicing signatures in pediatric T-ALL described above. The initial discovery cohort consisted only of paediatric T-ALL patient samples, so we can only use paediatric T-ALL samples to validate our findings in an independent cohort. Therefore, and given potential differences between the biology of T-ALL in the paediatric and adult settings, we hereby confirm that our research objective cannot be accomplished using data from adults. In addition, because we are using computational biology pipelines that have already been developed and are publicly available, we hereby confirm that we will not use these data for methods, software, or other tool development purposes.
Finally, confirmation of aberrant splicing in an independent pediatric T-ALL cohort will eventually allow us to identify putative splicing-based biomarkers. These could be used to select pediatric T-ALL patients who might benefit from spliceosome inhibitors, such as H3B-8800 (Seiler et al., Nature Medicine 2018). Given this, research use of these data will likely be relevant for developing more effective treatments, diagnostic tests, and/or prognostic markers for childhood cancer. Van Allen, Eliezer DANA-FARBER CANCER INST Characterization of the relationship between structural variants, mutations, and transcriptomes across pediatric cancers Oct04, 2019 approved Pediatric cancer genomes are relatively silent when compared to adult cancer genomes. Several pediatric cancers are characterized by recurring mutations, some of which affect treatment and outcomes. Previously, insights informing therapy in pediatric cancers have come from looking at the intersection between changes in cancer DNA and RNA. This has been done in acute lymphoblastic leukemia and has led to the broader use of tyrosine kinase inhibitors in some cases of very high-risk disease. We plan to extend this analysis to pediatric cancers where it has not been done as systematically, namely the extra-cranial solid tumors osteosarcoma, Ewing sarcoma, neuroblastoma, and Wilms tumor. We will also compare them to adult tumors (e.g., prostate, breast, and hematologic cancers) to evaluate similarities and differences. Pediatric cancer genomes are relatively silent when compared to adult cancer genomes, with a coding mutation rate of less than 1 somatic mutation per megabase in many cases. Several pediatric cancers are characterized by recurrent coding mutations, some of which have prognostic significance. In some cases, exploration of transcriptional patterns associated with these coding mutations has yielded clinical insights that have been used to guide therapy.
For example, differentially expressed genes in B-cell acute lymphoblastic leukemia cases characterized by an IKZF1 deletion were found to be similar to those in cases characterized by a BCR-ABL translocation. This led to the recognition of a “Ph-like” phenotype, which has broadened the role of tyrosine kinase inhibitors in the treatment of acute lymphoblastic leukemia. However, a fuller characterization is still needed of the relationship between recurrent coding mutations, non-coding mutations, structural variants, and transcriptomes in pediatric cancers. This includes how these features relate to adult tumors with such variants (especially prostate/breast cancer and hematologic malignancies, among others) and, more broadly, to non-oncology pediatric genetic syndromes (e.g., congenital diaphragmatic hernia). The greatest opportunity for new insights to guide therapy lies in the solid tumor space, among some of the more common extra-cranial solid tumors such as osteosarcoma, Ewing sarcoma, neuroblastoma, and Wilms tumor. We will continue to analyze the whole-exome, whole-genome, and RNA sequencing data (where available) from approximately 3,000 extra-cranial pediatric malignancies. Our goal is to better characterize patterns at the intersection of these datasets and identify new biological insights to inform therapy. To have appropriate power to validate these insights, we will combine datasets from the TARGET studies, previously published studies on Ewing sarcoma (not currently explored in TARGET), and the St. Jude Cloud database. This combination of datasets, as well as comparisons to unaffected control cohorts (including parental trios), will not create any additional risks to participants. We intend to publish our findings from this study and also to share them broadly with the scientific community through conference presentations.
van Boxtel, Ruben PRINSES MAXIMA VOOR KINDERONCOLOGIE, BV Comparing the genetic landscapes of pediatric AML with normal human HSCs Aug13, 2018 closed Cancers are believed to result from sequential mutation accumulation, which may explain why old age is the most important risk factor for this disease. However, this idea does not explain why certain cancers, such as leukemia, show higher incidence in young children than in young adolescents. Indeed, the cells of young children contain fewer (cancer) mutations than those of older people. By studying these DNA mutations in childhood leukemia, we aim to trace back the life history of the disease. This will allow us to determine the processes that cause transformation of a healthy cell into a leukemic cell. An environmental factor, such as an infection, may also be important for the onset of childhood leukemia. From the available genetic data, we will fish out non-human pieces of DNA and test whether DNA from certain viruses or microorganisms is present. This project attempts to shed light on why children get cancer, which may ultimately provide opportunities to develop strategies for childhood cancer prevention. Cancer is associated with accumulation of mutations in the genome, which is correlated with aging. This association, however, raises the question of why children get cancer when their cells contain a low number of somatic mutations. To tackle this challenging question, we have generated whole-genome sequencing data of physiologically normal hematopoietic stem cells (HSCs) and multipotent progenitors (MPPs) derived from human donors of varying age (0 – 63 years old). Here, we request access to the pediatric acute myeloid leukemia (pAML) WGS and RNA-seq data for two sub-projects: (1) to compare mutation loads, and (2) to search for evidence of viral and/or microbial infections. We focus our research on pAML because it has the poorest outcome among all pediatric leukemias.
In the first project, we will compare the number of mutations of all types (e.g., base substitutions, small insertions and deletions, copy number alterations, structural variation and chromosomal alterations). We will determine the clonality of the mutations by correcting the variant allele frequencies for allelic copy number state. For this, the raw sequencing data are needed. This analysis will allow us to determine the order of mutation occurrence and study each subclone separately. The RNA-seq data will allow us to test the effects of these mutations on gene expression. In addition, we will perform mutational signature analysis and determine the genomic distribution of the mutations. In the second project, we will focus on the sequencing reads that did not align to the human reference genome and assess which species these reads belong to. We will explore common factors among the AML samples as well as differences between AML and in-house normal samples. We will perform this analysis on both the whole-genome sequencing and the RNA-seq data. This analysis will allow us to test the hypothesis that infection may be an important event in driving pAML. These projects will provide further insight into the etiology of pAML. Ultimately, the knowledge obtained from these studies will allow us to develop improved targeted treatments with the aim of improving survival of patients suffering from pAML. van der Reijden, Bert STICHTING RADBOUD UNIVERSITAIR MEDISCH CENTRUM I.O. Expression signatures, prognostic impact and oncogenic properties of novel RNA molecules in adult and pediatric AML Jan27, 2022 approved Leukemia is a very aggressive type of blood cancer that affects many people worldwide, including adults and children. In general, leukemia patients have a poor prognosis, which is why there is increasing concern about the need for more efficient treatments. Currently, most patients are treated with chemotherapy.
However, this treatment has very limited efficacy, as leukemia is known to be a very heterogeneous disease. There is therefore a need for more specific treatments besides chemotherapy. Our goal is to find vulnerabilities of specific types of leukemia that can be used for therapeutic intervention. We will do this by identifying signatures of RNA molecules that do not encode proteins. We believe that these types of RNA molecules play a major role in the regulation of different leukemic processes and could therefore be suitable targets for future therapies. As childhood and adult leukemias might have different vulnerabilities, in this project we aim to study both types of leukemia independently, as well as to identify age-independent vulnerabilities. For decades, RNA molecules with poor coding potential (ncRNAs) have been considered by-products generated during the transcription of protein-coding genes. A publication from 2018 revealed that there are more than 250,000 ncRNAs in humans, 10 times more than protein-coding transcripts. Several studies have shown that deregulation of these ncRNAs (lncRNA, piRNA, snoRNA, siRNA, miRNA, tRNA, etc.) can lead to cancer development and maintenance. In addition, several ncRNAs have been described as good biomarkers for cancer detection. In acute myeloid leukemia (AML), the deregulation of ncRNAs and their pathobiological effects are still poorly understood. Our goal is to identify specific ncRNA signatures for different subtypes of AML as well as to identify novel non-annotated ncRNAs. In addition, the RNA-seq data will be used to assess the oncogenic properties of the ncRNAs and their prognostic impact in AML. Candidate ncRNAs will be further studied in in vitro assays to define the molecular mechanisms behind their involvement in malignant transformation and leukemia maintenance.
We believe that this study will help improve the risk classification of AML patients and identify new biomarkers and targets for therapeutic intervention. It is well known that adult and childhood leukemias differ greatly in terms of transcriptomics, epigenomics and pathobiology. For this reason, in this project, funded by KiKa (Kinderen Kankervrij), we want to study them independently. We aim to identify childhood-specific leukemia vulnerabilities as well as age-independent vulnerabilities that would facilitate the development of new rational therapies. The requested data will be combined with the AML-05 childhood leukemia cohort. In addition, data from the BLUEPRINT project may be used, as well as some internal RNA-seq data. Van Vlierberghe, Pieter GHENT UNIVERSITY Identification of aberrant circular RNA expression in T-ALL Nov12, 2021 expired T-lineage acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic malignancy that is mainly diagnosed in children and requires treatment with intensified chemotherapy. This therapeutic regimen can result in life-threatening toxicities. Further advances in the treatment of paediatric T-ALL therefore require the development of effective and highly specific targeted anti-leukemic drugs. This, in turn, requires a better understanding of paediatric T-ALL disease biology. In the past, childhood T-ALL biology research has largely focused on genetic and transcriptomic analyses. However, aberrant circRNA expression, an additional level of complexity implicated in the biology of paediatric T-ALL, remains largely unexplored. Given this, we will use a protected 'Therapeutically Applicable Research to Generate Effective Treatments' (TARGET) dataset to confirm aberrant circRNA expression in pediatric T-ALL. This work will help identify specific circRNAs that can serve as biomarkers for the use of specific inhibitors in the treatment of childhood leukemia.
The objective of this study is to show that the RNA-binding protein QKI regulates circular RNA expression in paediatric T cell acute lymphoblastic leukaemia (T-ALL). For this, we initially used total RNA sequencing data from a cohort of 25 paediatric T-ALL patients (Verboom et al., Haematologica 2018), which we obtained from Saint-Louis Hospital (Paris, France) in collaboration with Prof Jean Soulier. CircRNA analysis was performed using a previously published bioinformatic pipeline, CirComPara2 (Gaffo E et al., Brief Bioinform 2021). To confirm these initial findings, we would like to request access to the FASTQ files from a previously published study in which total RNA sequencing was performed on a large cohort of pediatric T-ALL samples (Liu et al., Nature Genetics 2017). That manuscript states that the RNA-seq FASTQ files from the study are accessible through the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under the accession numbers phs000218 (TARGET) and substudy-specific accession phs000464 (TARGET ALL Expansion Phase 2). We will use these data to compare circRNA expression between paediatric T-ALL samples with high versus low QKI expression. The initial discovery cohort consisted only of paediatric T-ALL patient samples, so we can only use paediatric T-ALL samples to validate our findings in an independent cohort. Therefore, and given potential differences between the biology of T-ALL in the paediatric and adult settings, we hereby confirm that our research objective cannot be accomplished using data from adults. In addition, because we are using computational biology pipelines that have already been developed and are publicly available, we hereby confirm that we will not use these data for methods, software, or other tool development purposes.
Finally, confirmation of aberrant circRNA expression in an independent pediatric T-ALL cohort will eventually allow us to identify putative biomarkers. These could be used to select paediatric T-ALL patients who might benefit from specific novel treatment modalities. Given this, research use of these data will likely be relevant for developing more effective treatments, diagnostic tests, and/or prognostic markers for childhood cancer. Van Vlierberghe, Pieter GHENT UNIVERSITY Identification of aberrant splicing in T-ALL Nov12, 2020 expired T-lineage acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic malignancy that is mainly diagnosed in children and requires treatment with intensified chemotherapy. This therapeutic regimen can result in life-threatening toxicities. Further advances in the treatment of pediatric T-ALL therefore require the development of effective and highly specific targeted anti-leukemic drugs. This, in turn, requires a better understanding of paediatric T-ALL disease biology. In the past, childhood T-ALL biology research has largely focused on genetic and transcriptomic analyses. However, aberrant RNA splicing, an additional level of complexity implicated in the biology of paediatric T-ALL, remains largely unexplored. Given this, we will use a protected Therapeutically Applicable Research to Generate Effective Treatments (TARGET) dataset to confirm aberrant splicing in pediatric T-ALL. This work will help identify an aberrant splicing signature that can serve as a biomarker for the use of spliceosome inhibitors in the treatment of childhood leukemia. The objective of this study is to reconfirm that pediatric T cell acute lymphoblastic leukemia (T-ALL) consists of different subtypes that are characterised by differential RNA splicing and differential circRNA expression.
These differences were initially identified based on polyA RNA sequencing analysis of a cohort of 64 pediatric T-ALL patients (Peirs et al., Blood 2014) that we obtained from Saint-Louis Hospital (Paris, France) in collaboration with Prof Jean Soulier. Differential splicing analysis was performed using previously published bioinformatic pipelines (Anande G et al., Clinical Cancer Research 2020). For this reconfirmation, we would like to request access to the FASTQ files from a previously published study in which RNA sequencing was performed on a large cohort of pediatric T-ALL samples (Liu et al., Nature Genetics 2017). That manuscript states that the RNA-seq FASTQ files from the study are accessible through the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/gap) under the accession numbers phs000218 (TARGET) and substudy-specific accession phs000464 (TARGET ALL Expansion Phase 2). We will use these data to reconfirm the aberrant splicing signatures in pediatric T-ALL described above. The initial discovery cohort consisted only of paediatric T-ALL patient samples, so we can only use paediatric T-ALL samples to validate our findings in an independent cohort. Therefore, and given potential differences between the biology of T-ALL in the paediatric and adult settings, we hereby confirm that our research objective cannot be accomplished using data from adults. In addition, because we are using computational biology pipelines that have already been developed and are publicly available, we hereby confirm that we will not use these data for methods, software, or other tool development purposes.
Finally, confirmation of aberrant splicing in an independent pediatric T-ALL cohort will eventually allow us to identify putative splicing-based biomarkers. These could be used to select pediatric T-ALL patients who might benefit from spliceosome inhibitors, such as H3B-8800 (Seiler et al., Nature Medicine 2018). Given this, research use of these data will likely be relevant for developing more effective treatments, diagnostic tests, and/or prognostic markers for childhood cancer. Vandesompele, Jo GHENT UNIVERSITY Elucidation and functional analysis of lncRNA variation in neuroblastoma Aug07, 2014 closed Using the requested datasets, we aim to chart the differences in non-coding regions of the genome between normal, healthy samples and neuroblastoma samples. The resulting differences/variants will be prioritized based on their frequency, conservation, predicted function and impact on folding of the surrounding RNA molecule. The variation status of a chosen set of the most promising RNA molecules, called long non-coding RNAs, will be confirmed and assessed in other cancer types to determine their relevance. After the corresponding variants are inserted into healthy cells, these mutant cells will be compared phenotypically with healthy cells to help unravel the function of the affected RNA molecule. Finally, interaction partners of these RNA molecules will be determined. Genomic variation is known to affect disease development and progression. Genome-wide association studies (GWAS) have revealed that at least one-third of identified disease-associated variants and more than 80% of cancer-associated single nucleotide polymorphisms (SNPs) map to non-coding regions of the genome. Variation in lncRNAs is less well characterized than variation in protein-coding genes or microRNAs. Nevertheless, several lncRNA variants linked to diseases such as diabetes, obesity and cancer have been identified.
However, the mechanisms behind the lncRNA/disease association are not always clear. We aim to expand knowledge of lncRNA variation in cancer, more specifically in the childhood cancer neuroblastoma, and to elucidate the underlying disease-causing mechanisms. Characterization of lncRNA variation in neuroblastoma will be based on whole-genome sequencing data combined with an integrative, multivariate prioritization scheme. This scheme takes into account sequence conservation, frequency, impact on lncRNA folding, gene neighborhood analysis (e.g., lncRNA in cis with a known cancer gene), predicted function and lncRNA expression levels. It will yield a set of the most therapeutically and functionally promising lncRNAs. The variation status of the prioritized lncRNAs will be confirmed by allele-specific qPCR and re-sequencing. Selected lncRNAs will be re-sequenced in other cancer types to assess their relevance in cancer before they enter the validation phase. Functional validation will occur at different levels. CRISPR will be used to create mutant cell lines harboring the specific variants, followed by molecular (gene expression) and phenotypic comparison between wild-type and mutant cells. Finally, interaction partners of the lncRNAs of interest will be identified using immunoprecipitation followed by mass spectrometry and sequencing. Vandesompele, Jo GHENT UNIVERSITY Genetic susceptibility in childhood cancer Aug14, 2015 closed In our research study, we want to explore whether childhood cancer patients and their families have an increased likelihood of developing cancer because of genetic variations in cancer predisposition genes. These specific genetic variations are often inherited and can contribute to cancer, but they do not directly cause cancer. Well-known examples are mutations in the BRCA1 or BRCA2 genes, which increase the likelihood of developing breast cancer and ovarian cancer.
In our project, we will explore whether childhood cancer patients carry germline variations in genes linked to an increased likelihood of developing cancer. Finally, we want to develop guidelines in our clinic for possible further genetic testing of these childhood cancer patients and their family members. The TARGET datasets would be ideal for exploring whether genetic susceptibility is present in pediatric malignancies. Childhood cancer patients might be susceptible to cancer by carrying germline mutations in cancer-predisposing genes. A first exploratory study of the occurrence of cancer in families of childhood cancer patients showed that a fifth of the childhood cancer patients had several first-, second- and third-degree relatives with cancer. For the next step of this project, we specifically need the whole-exome sequencing data of remission samples from childhood cancer patients in the TARGET project. The exome studies of the different tumor types will be used to obtain a general view of childhood cancer and genetic susceptibility by exploring germline mutations in the remission samples of childhood cancer patients. If germline variations are frequently detected in the TARGET datasets, we will explore the presence of genetic variations in cancer-predisposing genes in childhood cancer remission samples from our own hospital. The information retrieved from these datasets would help us assess whether childhood cancer patients treated in our hospital and their families would benefit from genetic testing for germline mutations in cancer-predisposing genes. Such testing would lead to further genetic counseling and intensive follow-up. We will not combine the requested TARGET datasets with other datasets.
Vaske, Olena UNIVERSITY OF CALIFORNIA SANTA CRUZ Treehouse Childhood Cancer Initiative: Identification of therapeutic leads for individual pediatric cancer patients via pan-cancer analysis MOVED FROM: DAVID HAUSSLER Oct24, 2016 approved The focal point of this project is comparing genomic data of an individual pediatric patient's tumor to the genomic data of thousands of pediatric and adult tumors. This comparison is called a pan-cancer analysis, because the individual's data is being compared to data from many different cancer types. The power of this pan-cancer analysis is in the numbers. Although pediatric cancer is rare, by comparing to all the pediatric cancer and adult data possible, similarities can be spotted. If an individual child's tumor shares similarities with another tumor-type that happens to have a successful treatment option, that treatment option might also be successful for the child's tumor. This is especially powerful when no other treatment options are available for the child. This approach could be an efficient method of determining which cancer fighting treatments developed for adult tumors might be good candidates in specific pediatric cancers. We are also interested in identifying and understanding RNA alterations that drive human cancers. These data will allow us to generate pipelines for detection and downstream characterization of aberrant transcripts. These data will also allow us to examine gene expression changes associated with these and other cancer-driver mutations. Treehouse Childhood Cancer Initiative uses tumor gene expression analysis based on RNA-seq for detecting aberrations in the genome & epigenome. The detection of differentially expressed genes requires context in which to examine the expression of individual genes in a patient’s tumor sample, cell line, or xenograft requiring similarly processed data for comparative research. Our N of 1 analysis is a pan-cancer & pan-disease comparative approach. 
It uses a growing compendium of gene-level expression estimates derived from >12K adult & pediatric cancer samples. The compendium draws on consortia datasets and data from individual labs, including but not limited to TCGA (phs000178), TARGET (phs000218), CGCI (phs000235), GMKF (phs001820), Cancer Moonshot (phs002192), Nationwide IGM (phs001820), ICGC, PCGP, CBTTC, DIPG (phs000900, phs001526 & phs001526), neuroblastoma (phs000868 & phs001436), osteosarcoma (phs000699), ALL (phs000522 & phs001738), liver carcinoma (phs000709), FGFR gene fusion (phs000602), University of Michigan CSER (phs000673), rhabdomyosarcoma (phs000720), Ewing’s sarcoma (phs000768 & phs000804), Burkitt lymphoma (phs001282), BASIC3 (phs001026), AML (phs001027 & phs001657), antisense gene expression (phs000937), nerve sheath tumors (phs00079), neurofibromatosis 1 (phs001993), rare peds sarcoma (phs001121), pediatric solid tumors (phs001052), T-ALL (phs001513, phs002276), and low grade gliomas (phs001054). LGG data will strengthen our research characterizing the role of stalled cellular differentiation in low grade pediatric nervous tissue tumors. We need to add data for cancers with few samples in our compendium, such as Burkitt lymphoma (BL) & pediatric nervous tissue tumors. When we receive data from a patient with a specific cancer, such as BL, a brain tumor, chemo-resistant T-ALL, or leukemia, our analysis is stronger when we have more data from that tumor type. To strengthen the interpretation of these data, we will include somatic variant and somatic fusion calls in the compendium. Unlike germline variant calls, somatic variants are not identifiable. The compendium doesn’t include raw RNA-seq data or identifying information, only gene-level expression estimates. These estimates are derivative works, computed from tumor RNA-seq data by our RNA-seq processing pipelines.
We are also developing compendia composed of normal tissues (phs000424 & phs000819), cancer cell lines (phs001800) and xenograft data (phs001437). These show us which expression features are shared between patient tumors and models (xenografts and cell lines), and which features are specific to tumor cells and not found in normal tissues. Information about the features shared across patient samples and cancer models is useful for guiding laboratory research. For example, we and other investigators would like to study genes in cell lines whose overexpression is conserved in primary tumors. The addition of the normal tissue compendium allows us to see which of these genes are cancer-specific and not present in normal cells. We would also like to broadly understand RNA alterations in these datasets. These data are critical for addressing several important questions. These include gene expression variations in oncogenes and tumor suppressor genes, cancer-specific RNA splicing changes caused by mutations in spliceosome components, and the identification of aberrant splicing events and novel fusion genes that drive tumorigenesis. Several aspects of this project will examine RNA-specific changes, whether or not they are associated with a genomic mutation. Examples are detrimental nonsynonymous mutations at the protein level and harmful synonymous mutations at the RNA level. In the process of answering these questions, our researchers will also develop tools for better detection of fusion genes, somatic variants, abnormal splicing events, or other aberrant changes in RNA or gene regulation in cancer. However, the data will not be used primarily for tool development. Although these investigations will not initially be limited to one disease, we have found that multi-disease analysis often leads to insights about specific diseases. We’re committed to sharing information.
Our website links to the UCSC Xena Browser, where the public can download our gene expression compendia, explore them via several visualization tools, and access our processing pipelines. We will continue to publish our findings in scientific journals. Venteicher, Andrew UNIVERSITY OF MINNESOTA Epigenomic regulation in pediatric and adult sarcoma Jun06, 2024 approved This project will identify new epigenetic changes that can reveal mechanisms of cancer and pediatric sarcoma initiation and of treatment resistance. This project aims to create new prognostic and therapeutic strategies in a personalized medicine approach. Our laboratory is interested in studying noncanonical mechanisms of tumorigenesis, focusing on the role of DNA mutations and structural variations and their influence on transcription networks. We request these sequencing datasets to test computational pipelines that we have created to annotate mutations seen across different pediatric cancer/sarcoma types. The research objective of this study is to identify epigenetic alterations that correlate with patient outcomes and transcriptional network activation, with an emphasis on children. The study design will combine these publicly available datasets with institutional datasets to link molecular signatures with patient outcomes. Specifically, genetic alterations that correlate with transcriptional profiles will be correlated with patient progression-free and overall survival. The analysis plan will use computational approaches in R to identify molecular signatures that can be validated by combining these independent datasets. The goal of the study is to identify new prognostic and therapeutic vulnerabilities that can improve the care of cancer patients. No risks to the included patients will result from this study. The required terms provided in the Data Use Agreement will be followed. The results will be shared with the general scientific community.
Villa-Morales, Maria UNIVERSIDAD AUTONOMA DE MADRID Drug prioritization and discovery of novel biomarkers for the benefit of pediatric T-ALL patients. Nov27, 2020 approved T-cell lymphoblastic leukemia/lymphoma (T-ALL/LBL) is an aggressive type of hematological cancer mainly affecting children and adolescent males. Current cure rates are high, but two main problems can arise during the clinical management of patients: toxicity induced by chemotherapy, which often leads to treatment discontinuation or long-term side effects; and recurrence of the tumor after remission (relapse), a dire scenario since cure rates in such cases fall to ~7%. Thus, more effective ways to treat these patients are urgently required. The objectives of our team are to suggest alternative tailored treatments and to discover novel alterations that help decide therapeutic options for pediatric T-ALL patients, in particular for patients suffering toxicity-derived side effects or not achieving complete remission with current protocols. All the results from the analysis undertaken in this project will be published in scientific journals for the benefit of the broad pediatric cancer research community. Despite detailed knowledge of the global landscape of alterations accompanying pediatric T-ALL tumors, which has even led to a more refined classification of this disease, these patients continue to receive a standard treatment protocol based on high-dose multi-agent chemotherapy, sometimes followed by HSC transplant. Although very successful, this therapeutic approach is not infallible. Toxicity caused by treatment leads to discontinuation or to adverse long-term side effects in the ever-increasing population of pediatric survivors. Around 15% of pediatric T-ALL patients relapse, and current protocols are largely futile in such cases, as cure rates collapse to ~7%. 
Our aim is to reduce treatment-derived toxicity and to improve the clinical management of pediatric T-ALL patients, in particular of those not achieving complete remission with current therapies. We propose to do so from two perspectives: 1) Personalized precision medicine: omics data will be integrated as input for the drug prioritization algorithm PanDrugs. This will be done in collaboration with Dr. Al-Shahrour (CNIO) and Dr. Fernández-Navarro (ISCIII), following previously described approaches (1–3). The availability of matched diagnosis, remission and relapse samples is especially valuable, as it allows the integration of intra-tumor heterogeneity as an additional layer of information for drug prescription. We expect to suggest alternative tailored treatments for pediatric T-ALL patients who suffer toxicity, do not respond or relapse, as well as combinations potentially capable of targeting clonal diversity. 2) Improved therapeutic stratification: omics data will be searched for novel alterations affecting non-coding genes (miRNAs, lncRNAs, circRNAs…) that may represent new biomarkers with prognostic or therapeutic value. The exploration of these new biomarkers in future pediatric patients might also improve the success rate of their clinical management. All the results from the analyses undertaken in this project will be published in scientific journals for the benefit of the broad pediatric cancer research community. References: 1. doi: 10.1186/s12885-019-6209-9 2. doi: 10.3390/cancers11091361 3. doi: 10.1186/s13073-018-0546-1 Volchenboum, Samuel UNIVERSITY OF CHICAGO Cancer Clinical Data Commons (C3DC) Development (not research) Aug01, 2022 approved The University of Chicago's Pediatric Cancer Data Commons (PCDC) group has developed the world's largest collection of pediatric cancer data. The PCDC is expert in data dictionary development and data commons deployment. 
The PCDC is performing data harmonization work as part of a contract with NCI/Leidos (task order 17X147F18). As part of this work, UChicago requires access to controlled-access data to download and harmonize. PCDC will be developing repeatable pipelines for harmonizing the data and facilitating ingestion of data into the emerging NCI C3DC (part of the Cancer Research Data Commons ecosystem). The Childhood Cancer Clinical Data Commons (C3DC) is part of NCI’s Childhood Cancer Data Initiative (CCDI). Data will be used only for development, not research. The development of the C3DC will be crucial for ongoing pediatric cancer research, and researchers will access those data through independent mechanisms. The University of Chicago's Pediatric Cancer Data Commons (PCDC) group has developed the world's largest collection of pediatric cancer data. The PCDC is expert in data dictionary development and data commons deployment. The data are being requested as part of a contract (task order 17X147F18) between the University of Chicago, Leidos, and NCI. As part of this work, UChicago will be developing harmonization pipelines for pediatric cancer data (clinical and genomic) for deposition into the NCI C3DC. The Childhood Cancer Clinical Data Commons (C3DC) is part of NCI’s Childhood Cancer Data Initiative (CCDI). Note that these data are for development only and not for research. No one will be accessing these data for research. Ultimately, the C3DC, part of the NCI's Cancer Research Data Commons ecosystem, will be a valuable resource for researchers wanting to study pediatric cancer and link data to other data commons. Volchenboum, Samuel UNIVERSITY OF CHICAGO Neuroblastoma Classifier Development Jan10, 2013 closed Most genomic tests developed thus far have had limited practical clinical utility due to the time and expense needed to use them. A simplified method of distinguishing among neuroblastoma risk groups would enable better stratification prior to treatment. 
We have developed a computer algorithm to rapidly test all possible two-gene combinations from gene expression data, and by leveraging the power of our high-performance computing grid, we can perform an analysis in a few hours that would otherwise take many months. Our goal is to develop simplified disease classifiers that can be used in the clinic for rapid decision-making at the time of diagnosis. High-resolution molecular genetic tools and transcriptome-wide gene expression profiling have revealed molecular subtypes within seemingly uniform cancers. Nevertheless, few molecular classifiers have been implemented in the clinic. Supervised analytical methods such as decision trees, support vector machines, and Naive Bayes classification all facilitate the generation of multi-gene classifiers able to separate two or more clinically significant classes. Until now, such algorithms were needed to compensate for the enormous number of feature combinations and the time needed to test all of them. As a result, the literature is rife with complex, multi-gene signatures of uncertain clinical value. Faster computers and parallelization now make it possible to rapidly test an enormous number of feature combinations. We hypothesize that a brute-force exhaustive search algorithm could replace complex gene classifiers derived from traditional supervised methods with a much simpler set of features. We recently tested this concept on rhabdomyosarcoma, the most common soft tissue sarcoma in children. Using gene expression data from 100 rhabdomyosarcoma samples, we applied an exhaustive search algorithm and identified several two-gene classifiers able to distinguish fusion-positive from fusion-negative disease with >98% efficiency. We confirmed several of these two-gene classifiers in three other unrelated rhabdomyosarcoma datasets. We now aim to apply this strategy to distinguish neuroblastoma subtypes. 
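The exhaustive two-gene search described above can be sketched in a few lines. This is a minimal, single-machine illustration under stated assumptions, not the investigators' software: the decision rule (call a sample class 1 when gene i is expressed above gene j, or the reverse) is a hypothetical top-scoring-pair-style choice, because the statement does not say how each pair is scored, and `pair_accuracy`, `exhaustive_pair_search` and `n_pairs` are illustrative names. The number of unordered pairs is n(n-1)/2, so roughly 55,000 features already give about 1.5 billion dyads; a pure-Python loop like this one only suits toy data, and a real run would split the pair range across cluster nodes.

```python
from itertools import combinations

import numpy as np


def pair_accuracy(expr, labels, i, j):
    """Training accuracy of the hypothetical rule 'class 1 if gene i > gene j'.

    The reversed rule is also allowed, so the score is max(acc, 1 - acc).
    """
    pred = expr[:, i] > expr[:, j]
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)


def exhaustive_pair_search(expr, labels, top_k=5):
    """Score every unordered gene pair; return the top_k (accuracy, i, j) tuples.

    expr: samples x genes array; labels: boolean array, one entry per sample.
    """
    scores = [
        (pair_accuracy(expr, labels, i, j), i, j)
        for i, j in combinations(range(expr.shape[1]), 2)
    ]
    # Highest accuracy first; ties broken by gene indices for reproducibility.
    scores.sort(key=lambda s: (-s[0], s[1], s[2]))
    return scores[:top_k]


def n_pairs(n_genes):
    """Number of unordered two-gene combinations, n(n-1)/2."""
    return n_genes * (n_genes - 1) // 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(40, 30))   # 40 synthetic samples, 30 genes
    labels = np.arange(40) < 20        # first 20 samples are "class 1"
    expr[labels, 3] += 5.0             # gene 3 is high in class 1 ...
    expr[~labels, 3] -= 5.0            # ... and low in class 0
    for acc, i, j in exhaustive_pair_search(expr, labels, top_k=3):
        print(f"genes ({i}, {j}): accuracy {acc:.3f}")
    print(n_pairs(54_773))             # about 1.5 billion dyads
```

Scoring on training accuracy alone would overfit at this scale; any real use would need held-out or external validation, which the statement addresses by confirming classifiers in independent datasets.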
Using our high-performance computing cluster, we have developed a version of our search software that runs on a distributed architecture and is able to test all 1.5 billion gene dyads in under six hours, a task that would take several months on standard desktop hardware. We will also use heuristics to prune the search space in order to test a proportionally large number of three-gene combinations. We hope to develop simplified classifiers for neuroblastoma and ultimately to confirm their use prospectively. Such a simplified classifier could readily be used in the clinical setting for stratification at the time of diagnosis. Volinia, Stefano UNIVERSITY OF FERRARA DETECTION OF NOVEL FUSION RNA SPECIES IN PEDIATRIC AML Mar19, 2020 closed Pediatric acute myeloid leukemia (AML) encompasses a heterogeneous group of malignancies, with great variability in terms of response to therapy. Approximately 20% of pediatric AML patients do not show any known chromosomal aberrations. The vast majority of sequencing studies have focused on coding and regulatory regions. We aim to investigate the unmapped sequences in pediatric AML. We have developed a novel bioinformatic method that could help in identifying such sequences in pediatric AML. We will also analyze adult AML for comparison and validation. Pediatric acute myeloid leukemia (AML) encompasses a heterogeneous group of malignancies, with great variability in terms of response to therapy. Even though the majority of patients harbor recurrent chromosomal translocations that are known to confer a proliferative and survival advantage, impair hematopoietic differentiation and promote self-renewal of hematopoietic progenitors, approximately 20% of pediatric AML patients do not show any known chromosomal aberrations. 
Until now, the vast majority of sequencing studies have focused on coding and regulatory regions, while only marginal attention has been paid to unmapped reads, i.e., reads that deviate significantly from the human reference genome, often signaling the presence of a rearranged gene or transcript. We aim to investigate normally unmapped sequences in pediatric AML, since this disease is known to frequently harbor chromosomal rearrangements that give rise to chimeric fusion proteins with oncogenic potential. We will also look for circular RNAs, which likewise appear as unmapped sequences but do not arise from DNA rearrangements. We have developed a novel bioinformatic tool to look for novel fusion RNAs in unmapped reads, whose power, sensitivity and specificity in identifying hints of chimeric transcripts need iterative testing on a large dataset of pediatric and possibly adult AML for comparison. To this end, the objective of the proposed research is to explore the performance of this method using AML RNA-sequencing data from the dbGaP database; we are requesting access to both the TARGET-AML and the adult TCGA-LAML datasets, due to the marked differences that characterize myeloid neoplasms in adults and children. The identified chimeric events, if any, will be analyzed by bioinformatic prediction and experimentally in cell-line models. Data obtained will be correlated with clinical and biological phenotypes, if available. This methodological work will advance the genetic knowledge of the oncogenic events occurring in pediatric and adult AML. This novel approach, if confirmed on a large dataset, could improve the molecular classification of acute myeloid leukemia in children and adults. Requested datasets will only be used during methodological development, for validation purposes and data comparison. Data use limitations will be strictly observed. Access to requested databases will be restricted to the PI. 
Results obtained from the analysis of these data will be shared with the scientific community. Walsh, Kyle DUKE UNIVERSITY Post-GWAS discovery of cancer predisposition genes in pan-cancer datasets Mar13, 2018 approved We seek to determine whether the inherited genetic causes of a particular type of cancer (e.g. lung cancer) are also causes of other types of cancer (e.g. bladder cancer) and whether this contributes to the observation that cancer sometimes appears to "run in families". We will also investigate whether the genetic causes of other traits, like oral health, are associated with cancer risk. We will investigate genetic susceptibility to cancer in children and adults, comparing risk across malignancies to determine whether specific cancers have shared causes. Specifically, we will investigate the shared heritability and shared risk alleles across adult-onset cancers (e.g. lung, pancreatic, breast, CLL, prostate, bladder, renal, melanoma, glioma, NHL, others), childhood cancers (e.g. acute leukemia, osteosarcoma, neuroblastoma, medulloblastoma, rhabdoid tumors, Ewing sarcoma, others), and non-cancer phenotypes (e.g. dental caries, height attainment). We will also apply a phenome-wide association study (PheWAS) paradigm to determine whether known risk loci for a particular phenotype (e.g. dental caries) increase the risk of different cancer phenotypes, answering questions like: "do genetic determinants of oral health predict cancer risk?". Case-case and case-control comparisons will be made using GWAS methods, SNP imputation, fine-mapping, Mendelian randomization, and heritability estimation. To increase statistical power, we will incorporate additional controls from dbGaP (Geisinger, POPRES) and external datasets (Wellcome Trust Consortium controls, Illumina iControl subjects). Combining genotype data will not create any additional risks because, like dbGaP data, Wellcome Trust and iControl subjects are anonymized and de-identified. 
The proposed research does not include the study of population origins or ancestry. We agree to make study results available to the larger scientific community through presentations and publications. Where disease-specific ‘Data Use Limitations’ apply, we will abide by the DUL. Walsh, Kyle UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Genetic drivers of pediatric malignancies May01, 2014 closed We are performing genetic research into the underlying causes of pediatric cancers. To do this, we compare the genetic code of hundreds of children with cancer to the genetic code of thousands of people without cancer in order to identify genetic markers that predispose to malignancy. We additionally compare acquired mutations across cancer sites to identify genetic factors underlying differences in cancer presentation. We follow up on our findings by correlating these markers with gene expression and regulation in hopes of identifying biomarkers of disease and druggable targets. We will not use ALL relapse data (PHS000638.V1.P1) to study population origins or ancestry. We agree to make results of studies using these data available to the larger scientific community. The specific aim of our study is to identify and confirm constitutive and acquired genetic drivers of pediatric malignancies, with a specific focus on understanding the genetic heterogeneity underlying various pediatric malignancies. We will explore constitutive polymorphisms (e.g. SNPs), acquired mutations, and telomere parameters. These analyses will be integrated with results from our ongoing childhood leukemia genome-wide association studies (5R01CA155461) and an active UCSF Brain Tumor SPORE. We plan to use these requested data in ongoing and future studies of the associations between subject genotype and case-control status by integrating existing control genotype data into analyses (e.g. Illumina iControls or Wellcome Trust Controls). 
Datasets from our current and ongoing cancer studies will be combined with the requested genetic data, both through meta-analysis and as a replication/validation cohort. Somatic mutations and telomere parameters will be compared across cancer sites. Because there is genetic overlap in predisposition to pediatric and adult cancers (e.g. CDKN2A/CDKN2B variants predispose to ALL, melanoma, glioma and other cancers), we will also seek to determine whether risk factors for childhood cancers also increase the risk of specific adult cancers. Wan, Liling UNIVERSITY OF PENNSYLVANIA The underlying mechanism of how ENL mutation induces Wilms tumor May26, 2023 closed Wilms tumor is a significant public health concern for children in the U.S., but the underlying mechanisms of its initiation and development remain incompletely understood. Our lab focuses on a series of somatic mutations in a specific protein, ENL, which are found in ~9% of Wilms tumor patients. We are trying to understand how these mutations lead to tumorigenesis. We will use gene expression profile data from Wilms tumor patients to derive signatures for the different subtypes, and then determine which subtype shares features with the signature we obtain from the ENL mutant mouse model. This study can provide novel insights into the potential of ENL as a Wilms tumor biomarker and facilitate clinical treatment decision-making. Wilms tumor is the most common kidney cancer in children, and it accounts for about 5% of all pediatric cancers every year. While the detailed mechanism of Wilms tumor development is not fully understood, growing evidence links it to abnormal epigenetic regulation. Our lab discovered a series of oncogenic mutations in the chromatin regulator Eleven-Nineteen-Leukemia (ENL), which can trigger aberrant condensates on chromatin and result in hyper-activation of target genes. 
In order to understand how these ENL mutations influence embryonic kidney development and Wilms tumor, we generated a mouse model that introduces the specific ENL mutation in the kidney. Utilizing this mouse model and transcriptomic analysis, we discovered that ENL mutations induce transcriptional programs that result in abnormal expansion of mouse nephron progenitor cells in the embryonic kidney, which resembles key features of Wilms tumor. To further understand how ENL mutation-induced transcriptional changes resemble those in Wilms tumor, we want to compare the gene expression profiles obtained from the mouse models to those obtained from different subtypes of Wilms tumors. Our proposed analysis will help inform the cellular and molecular features of ENL mutant Wilms tumor and its link to other subtypes of Wilms tumor. Wan, Shibiao UNIVERSITY OF NEBRASKA MEDICAL CENTER Investigating health disparities in Wilms Tumor from multi-omics data Jul20, 2023 approved Health disparities relate to differences in health outcomes and access to healthcare that are potentially influenced by social, economic, and environmental factors. The TARGET project has multiple types of molecular data that can be used to unravel the molecular differences among individuals or populations. In this project, we are specifically focusing on Wilms tumor, the most common pediatric kidney malignancy, which affects about 1 in 10,000 children. Our goal is to investigate biological factors contributing to the cancer health disparities in Wilms tumor. Health disparities relate to differences in health outcomes and access to healthcare that are potentially influenced by social, economic, and environmental factors. Multi-omics data including genomics, transcriptomics, and epigenetics data can provide comprehensive molecular profiles that can be used to unravel the molecular differences among individuals or populations. 
In this project, we are specifically focusing on Wilms tumor, the most common pediatric kidney malignancy, which affects about 1 in 10,000 children. Our goal is to investigate biological factors contributing to the cancer health disparities in Wilms tumor. By analyzing multi-omics data in the context of racial or ethnic differences, we can gain a deeper understanding of the underlying biological, genetic, and molecular factors that contribute to disparities in cancer incidence, prognosis, and treatment outcomes. We will analyze the TARGET data independently. We expect that we may find some race-related biological factors (in different omics data) that are highly related to cancer disparities in Wilms tumor. Wang, Hsei-Wei NATIONAL YANG-MING UNIVERSITY Characterization of disease-causing sequence variants in pediatric neuroblastoma Apr04, 2013 closed Next-generation sequencing (NGS) is enabling genome-wide measurements of somatic mutations in large numbers of cancer patients. DNA sequencing studies have successfully identified disease-causing sequence variants by examining variants common to a group of patients. In this project, we aim to discover neuroblastoma variants that may suggest targets for therapy. We have sequenced a dozen pediatric neuroblastomas in Taiwan and would like to investigate the differences between Western (TARGET) and Eastern cases. We aim to identify genetic factors and mechanisms associated with different pediatric cancers through meta-analysis of TARGET and our private sequencing data. By comparison with the TARGET sequencing data, we would like to find out the genetic differences between Western and Eastern individuals. WANG, KANKAN SHANGH RUIJIN HOSPITAL Molecular genetics of childhood acute myeloid leukemia Jun26, 2015 approved AML is a very heterogeneous disease with numerous structural gene variations and aberrantly expressed genes. 
Although many efforts have been made to uncover the mechanisms of AML development and relapse, some cases of induction failure still occur during treatment. Moreover, many AML patients suffer disease relapse. Fortunately, genome-wide approaches using whole-genome sequencing and RNA-seq have provided plenty of useful information to help us better understand the genesis of AML. Thus, there is an urgent need to mine the information hidden in these high-throughput data. Using the datasets in the TARGET database, we have identified some recurrent mutations and translocations in pediatric leukemia, including in RUNX1. We are planning to validate these findings in our clinical cohort. We also compared the genetic mutations and expression profiles of patients with pediatric leukemia and adult leukemia, and found that the mutation and expression patterns differed between pediatric and adult cases. We are planning to analyze the DNA or RNA-seq data using an improved method, and we are trying to establish a genetic panel for pediatric AML diagnosis and monitoring. Leukemia is a complex disease with various genetic aberrations and aberrantly expressed genes. Although many efforts have been made to conquer the disease, especially in genetic sequencing, the clinical outcome of treatment remains unsatisfactory, especially in pediatric AML. Thus, the aim of this study is to deeply understand and discover genes associated with the development and relapse of pediatric acute myeloid leukemia (AML), and to identify useful targets or biomarkers for the diagnosis and prognosis of pediatric AML. We plan to re-analyze the whole-genome sequencing and RNA-seq data of pediatric AML in TARGET to reveal driver mutation events in childhood AML. 
We will pay particular attention to gene variations and differentially expressed genes associated with clinical outcome, and then we will try to develop a diagnostic panel for pediatric AML based on the targets identified using the TARGET AML datasets. Moreover, the AML Induction Failure (AML-IF) data will be used to discover the underlying mechanisms of relapse and induction failure in AML. A potential set of gene mutations and aberrantly expressed genes will also be used to monitor relapse in pediatric AML patients. WANG, Qianfei BEIJING INSTITUTE OF GENOMICS Validation of alternative splicing events in pediatric Acute Myeloid Leukemia by using Target AML dataset Mar10, 2021 approved Identification of cancer alternative splicing events is important for further exploring the mechanism of drug resistance in pediatric AML. We aim to verify the candidate alternative splicing events in a pediatric AML cohort. As this is an independent validation cohort, access to the requested dataset (mRNA-seq, .fq or .bam format files) is essential. Completion of this study could provide further evidence to guide treatment decisions in the clinic. Alternative splicing (AS) shapes the diverse delivery of genetic information. AS plays important roles in numerous critical biological processes, and disruption of AS has been shown to affect AML. It has become clear that these aberrations contribute to many facets of AML, including cancer progression, response to anticancer drug treatment, and resistance to therapy. By complementing existing schema with splicing information, we were able to further analyze the resistance mechanism. Increasing evidence indicates that deleterious mutations residing within splicing regulatory regions exert effects on pre-mRNA splicing in human diseases. Nevertheless, the dysregulation of splicing machinery components in pediatric AML remains poorly understood. 
The TARGET AML dataset is an important independent pediatric cohort for validating aberrant alternative splicing events. An ongoing project in our lab aims to identify aberrant alternative splicing events in a cohort of drug-resistant pediatric AML. Through transcriptome sequencing and splicing analysis, we have identified candidate aberrant alternative splicing events that are associated with the disease and have an impact on treatment response. We want to validate these candidate events in a completely independent cohort. TARGET pediatric AML is a large transcriptome sequencing dataset, with more than 200 patients. It will be an important independent pediatric cohort for validating aberrant alternative splicing events. In order to verify our candidate alternative splicing events, transcriptome data (mRNA-seq, .fq or .bam format files) will be needed. The requested dataset and our cohort will be analyzed independently, which will not create any additional risks to participants. Data storage and analysis will be done in the Beijing Institute of Genomics Computing Center, and data will not be distributed. Clinical application of this study: we believe that if our candidate alternative splicing events recur in drug-resistant pediatric AML, this would provide stronger evidence for using them to guide treatment decisions in the clinic. WANG, Qianfei BEIJING INSTITUTE OF GENOMICS Validation of candidate predisposition genes in sporadic Acute Myeloid Leukemia by using Target AML dataset Jan29, 2020 closed Identification of cancer predisposition genes is important for early warning in high-risk populations (including family members and sporadic cases) and for guiding bone marrow transplantation in AML. We aim to verify the candidate predisposing mutated genes in a sporadic AML cohort. As this is an independent validation cohort, access to the requested dataset is essential. 
Completion of this study could provide a genetic basis for using these genes as effective markers in clinical detection. Both patients considering HSCT and asymptomatic carriers who are under disease surveillance or choose early treatment will benefit. A brief introduction to our lab: we primarily focus on (1) the genomic characteristics and clinical outcomes of pediatric leukemia patients treated with a new regimen (National Science Review, 2019) and (2) integrative genomic, clinical and functional analysis of human leukemia (Nature Genetics, 2014; Leukemia, 2017; Cell Research, 2017; Blood, 2018; Leukemia, 2018; J Clin Invest, 2018). Predisposition genes act as warning factors in both pediatric and adult AML. Revealing functional predisposition genes and applying them in clinical detection can not only help provide early warning for, or treatment of, high-risk patients, but also reduce the failure rate of donor-derived hematopoietic stem cell transplantation (HSCT). All pediatric, adolescent, and adult patients diagnosed with MDS/AML are recommended to be screened for germline mutations in genes predisposing to MDS/AML, regardless of family history or phenotypic manifestations (DOI: 10.1182/blood-2018-07-861070, 10.1182/blood2015-09-669937, 10.1111/petr.12667, 10.1097/MPH.0000000000000737). However, predisposition genes in MDS/AML have not been fully revealed. The TARGET AML dataset is an important independent pediatric cohort for validating predisposition genes. Pedigrees are difficult to obtain but serve as a powerful model for identifying predisposition factors. An ongoing project in our lab aims to identify candidate predisposition genes from a multiple-cancer pedigree (affected family members include AML patients). Through whole-genome sequencing and analysis of the pedigree, we have identified candidate mutations within coding and noncoding regions that segregate with the disease. These candidate mutations might contribute to AML development. 
We want to validate the candidate mutations not only in adult sporadic AML, but also in pediatric AML. TARGET pediatric AML is a large whole-genome sequencing dataset, with 197 patients. It will be an important independent pediatric cohort for validating predisposition genes, especially alterations in non-coding regions. In order to verify our candidate variants, mutation data (whole-genome sequencing, .fq or .bam format files) will be needed. The requested dataset and our cohort will be analyzed independently, which will not create any additional risks to participants. Data storage and analysis will be done in the Beijing Institute of Genomics Computing Center, and data will not be distributed. Clinical application of this study: we believe that if our candidate predisposition genes are recurrently mutated in both adult and pediatric groups, this would provide stronger evidence for using them as effective genetic markers in clinical detection. Both patients considering HSCT and asymptomatic carriers who are under disease surveillance or choose early treatment will benefit. Wang, Tao UT SOUTHWESTERN MEDICAL CENTER Probing gdTCR in childhood cancers Dec07, 2023 approved Gamma-delta (gd) T cells, a crucial subset of T cells, play a pivotal biological role, particularly in the context of pediatric health. Our primary objective is to investigate the functions of gd T cells in both the progression and prognosis of childhood cancers. To achieve this, we plan to conduct a comprehensive examination utilizing the wealth of information stored in the TARGET database. We will extract the abundance and diversity of T cell receptors of gd T cells from the TARGET data. Subsequently, we intend to establish correlations between these gdTCR metrics and various patient phenotypes, contributing to a more nuanced understanding of the interplay between gd T cells and childhood cancer development. 
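The per-sample gdTCR abundance and diversity metrics mentioned above could be computed along the following lines. This is a hedged sketch, not the investigators' pipeline: it assumes MiXCR clonotype output has already been reduced to (chain, clonotype, read count) tuples, uses the IMGT locus names `TRG` and `TRD` for the gamma and delta chains, and picks Shannon entropy and Pielou evenness as diversity measures, which is an assumption since the statement does not name specific metrics. `gd_tcr_metrics` and the clonotype identifiers in the example are hypothetical.

```python
import math
from collections import Counter

# IMGT locus names for the gamma and delta TCR chains.
GD_CHAINS = {"TRG", "TRD"}


def gd_tcr_metrics(clones):
    """Summarise gamma-delta TCR clonotypes for one sample.

    clones: iterable of (chain, clonotype_id, read_count) tuples, a
    hypothetical simplified form of a clonotype table (for example one
    exported by MiXCR and reduced to these three fields beforehand).
    """
    counts = Counter()
    for chain, clonotype, read_count in clones:
        if chain in GD_CHAINS:          # ignore alpha-beta (TRA/TRB) clonotypes
            counts[(chain, clonotype)] += read_count

    total = sum(counts.values())        # abundance: gdTCR reads in the sample
    richness = len(counts)              # number of distinct gd clonotypes
    if total == 0:
        return {"abundance": 0, "richness": 0, "shannon": 0.0, "evenness": 0.0}

    # Shannon entropy (natural log) of the clonotype frequency distribution.
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Pielou evenness: entropy divided by its maximum possible value, log(richness).
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {"abundance": total, "richness": richness,
            "shannon": shannon, "evenness": evenness}


if __name__ == "__main__":
    sample = [("TRG", "g1", 10), ("TRG", "g2", 10),
              ("TRD", "d1", 20), ("TRB", "b1", 100)]
    print(gd_tcr_metrics(sample))
```

Raw read abundance depends on sequencing depth, so comparisons across samples would normally normalise it (for example by total reads in the library); that step is omitted here.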
Gamma-delta (gd) T cells, a special subset of T cells, are pivotal components with significant biological implications, particularly in pediatric cancers. Despite their acknowledged importance, the precise functions of gd T cells in childhood tumors remain incompletely elucidated. To answer these questions, we request access to the RNA-sequencing and whole-exome sequencing data across all cancer types cataloged in the TARGET database. Our analytical approach uses the MiXCR software to process the acquired datasets and extract the T cell receptors (TCRs) specific to gamma-delta T cells (gdTCRs). We aim to delineate the correlation between the abundance and diversity of gdTCRs and the cancer type, stage, grade, and prognosis of the TARGET patients. This project will provide a more granular understanding of the roles played by gd T cells in childhood cancers. Wang, Xi WESTLAKE UNIVERSITY Comparative genomics analysis of neuroblastoma from different countries Nov18, 2021 closed Neuroblastoma is the most frequent extracranial solid childhood cancer. Treatment options for high-risk neuroblastoma patients are very limited. We will analyze genomic features in neuroblastoma patient tumors and find potential predictive biomarkers for risk stratification. Neuroblastoma is a pediatric cancer of the sympathetic nervous system that occurs early in childhood. Previous genomic analyses show that neuroblastoma harbors very few mutations. Further analysis of the neuroblastoma transcriptome suggests that neuroblastoma has aberrant transcriptional and splicing programs. The goals of this study are (i) to identify genomic biomarkers that are associated with risk stratification in neuroblastoma patients and (ii) to see if these biomarkers are consistent between Chinese neuroblastoma patients and the TARGET neuroblastoma dataset. 
We have collected RNA-seq data, WES data, and clinical information from Chinese neuroblastoma patients and have built internal computational pipelines for WES, WGS, and RNA-seq data analysis. We will analyze the mutational profile, gene expression, and alternative splicing of each tumor in the Chinese and TARGET neuroblastoma cohorts. We will test whether any of these genomic features are associated with risk stratification of neuroblastoma patients in each cohort separately, and then compare the results from the two cohorts. This study will be done entirely at my institution (Westlake University) under my supervision. Wang, Xuexia UNIVERSITY OF NORTH TEXAS Identifying genetic contribution to pediatric cancers Apr30, 2018 closed We are interested in exploring the genetic contribution to pediatric cancers using all TARGET datasets. Using a case-control study design, we will perform genetic association studies at the single nucleotide polymorphism (SNP) level and at the gene level. Furthermore, we will investigate whether pediatric cancer patients exhibit a gene expression and methylation (addition of methyl groups to DNA) pattern that is distinct from that of controls without pediatric cancers. Our study will provide new insights into molecular mechanisms and into potential diagnostic, prognostic, and predictive biomarkers, as well as targets for new therapies. TARGET datasets will only be used for research. Datasets will be maintained on a secure server. There will be no attempt to identify individuals or to combine data from different sources at the individual level. Therefore, there is no increased risk to study participants. The data will be accessed only by me and the individuals named in the application and will be kept on a password-protected server at our university.
We are interested in exploring the genetic contribution to pediatric cancers using all Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets. Using a case-control study design, we will perform genetic association studies at the single nucleotide polymorphism (SNP) level and at the gene level. We will also examine the methylome and transcriptome of the pediatric cancer patients. DNA methylation is the best-studied and most easily measured epigenetic modification; it is influenced by genetic variation and in turn influences gene expression. We want to understand whether pediatric cancer patients exhibit a gene expression and methylation pattern that is distinct from that of controls without pediatric cancers. We will integrate epigenetic and expression data to explore the pathogenesis of pediatric cancers. Our study will provide new insights into molecular mechanisms and into potential diagnostic, prognostic, and predictive biomarkers, as well as targets for new therapies. TARGET datasets will only be used for research. Data will be maintained on a secure server. There will be no attempt to identify individuals or to combine data from different sources at the individual level. Therefore, there is no increased risk to study participants. The data will be accessed only by me and the individuals named in the application and will be kept on a password-protected server at our university. Wang, Zhining NATIONAL CANCER INSTITUTE TARGET data migration Oct15, 2014 closed As the GDC project officer, I will download TARGET data from GDC to verify its availability. I may also occasionally download a few datasets to verify their integrity after data migration. My purpose in accessing TARGET data is mainly to perform QC/QA of the data migration process to the Genomic Data Commons (GDC). I will verify the availability of TARGET data in GDC once data migration is done, and verify the integrity of the data after migration.
I also need access to TARGET data at its current location (before migration) in order to understand the data. Wei, Jun INSTITUTE OF HEMATOLOGY AND BLOOD DISEASE HOSPITAL, CHINESE ACADEMY OF MEDICAL SCIENCES TARGET patient data regarding to ALL and AML Oct13, 2022 closed In this study, we will first integrate our in-house method with multiple omics data to predict the targets of risk SNPs and structural variations in pediatric ALL and AML patients. Second, we plan to validate the genotypes and RNA expression levels of these targets in patient cohort datasets. RNA-seq and WGS/WXS sequencing data have been generated for hundreds of ALL and AML patient samples in the TARGET cohort; these data are valuable for eQTL analysis to validate the impact of SNPs and SNVs on the expression levels of potential targets. We are therefore applying to download these patient data. Study Plan: This study focuses mainly on identifying the potential targets of risk SNPs and structural variations for hematologic diseases. The research plan includes prediction and transcriptional validation of potential targets. Genetic variants influence the risk of many diseases, including pediatric hematologic diseases such as acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). With the development of high-throughput sequencing technology, many risk SNPs and structural variations for hematologic diseases have been identified. However, the potential targets of these risk SNPs and structural variations are rarely defined. We are conducting a study to predict and validate the potential targets of risk SNPs and structural variations related to hematologic diseases, including pediatric ALL and AML. For the validation part, we would like to integrate data from the TARGET cohort to study the association between genetic variants and phenotypic characteristics.
Using patient genome and exome sequencing data (WGS and WXS) and transcriptome data (RNA-seq), we can compare the expression levels of targets among patients with different genotypes, an approach known as eQTL analysis. Consistency of the proposed research with the data use limitations for the requested datasets: The requested data are consistent with our proposed research plan for the following reasons: 1) Genome and exome sequencing data (WGS and WXS) and transcriptome (RNA-seq) data from pediatric ALL and AML patient samples are needed to validate the potential targets of SNPs and structural variations. We plan to perform eQTL analysis to study the impact of structural variations on the expression levels of potential targets, so genome and exome sequencing (WGS and WXS) and transcriptome (RNA-seq) data from patients with ALL, AML, and other hematologic diseases are needed. 2) For our research purposes, the resequencing data from pediatric ALL and AML patients in the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Initiative would provide an ideal source for identifying additional mutations and validating targets. We have not planned any collaborations for this research. We intend to publish our findings based on the requested data in a high-impact journal to share them with the scientific community. For data security, we have prepared a storage machine that will be kept under the control of the Information Technology director at all times. We guarantee that all data downloaded from dbGaP will be used in accordance with the NIH genomic data security policy. Access control and accountability will always be required. No TARGET disease-specific dataset will be used outside its data use limitations (DULs). Weigman, Vic EXPRESSION ANALYSIS, INC.
Generation of new associated biomarkers in NBL / OMS through expression profiling Nov07, 2017 closed I am requesting access to this dataset to support ongoing research with Miriam Rosenberg and Igor Ulitsky (Weizmann Institute of Science) on Opsoclonus-Myoclonus Syndrome (OMS). Given the roots of OMS in neuroblastoma, I wish to understand how the known mechanisms of its adult features differ from those of the rare pediatric form of OMS, for which no established diagnostic test exists. This research will take biomarkers identified in our OMS cohort, specifically those in immune-related genes, not only to identify potential diagnostic markers but also to look for known neuroblastoma processes in OMS. I plan to use primarily the expression data in TARGET Neuroblastoma to compare and contrast findings being made in a new research cohort that I am studying in collaboration with Miriam Rosenberg and Igor Ulitsky (Weizmann Institute of Science) on Opsoclonus-Myoclonus Syndrome (OMS). This includes access to the ANBL00P3 TARGET Neuroblastoma dataset as well. We are processing expression data from 50-70 pediatric OMS patients from the Children’s Oncology Group (COG) to examine the tumors for expression differences, immune response signatures, fusion events, adaptive immune repertoire, HLA types, virus presentation (such as Epstein-Barr) or vaccination, and mutational burden. We want to associate the biomarkers we develop with events such as relapse, MYCN amplification, and comorbidities (immune-related, such as lupus). Comparing these OMS findings with neuroblastoma could elucidate common mechanisms and help explain why 2% of pediatric neuroblastoma patients present with OMS, which is currently diagnosed by exclusion. Although RNA-Seq data were requested, we will likely also review exome data for amplification signatures, confirmation of structural variants, and presentation of neoantigens to support the expression findings.
WEISS, WILLIAM UNIVERSITY OF CALIFORNIA, SAN FRANCISCO Splicing genetics in neuroblastoma May01, 2014 closed Functional mutations in cancer are difficult to identify and understand, in part because of the overwhelming focus on mutations that lead to known, obvious changes in the resulting protein. A process called alternative splicing naturally alters protein structures in normal tissue, and mutations that affect this process are poorly understood. We aim to analyze the genomes of patients with neuroblastoma for mutations that may affect alternative splicing, thereby identifying new genes that are important for cancer growth and may represent targets for therapy. Splicing of mRNA is a critical and tightly regulated cellular function. Disruption of this process is common in cancer and leads to new alternative splicing events that correlate with cellular phenotypes. Genome-wide studies have long revealed the existence of cancer-specific splicing patterns. The ability to commandeer alternative splicing could benefit cancer cells if early developmental stage isoforms critical for proliferation are ectopically expressed to drive uncontrolled growth, or if novel splice isoforms have gain-of-function or dominant-negative activity. Using a transgenic model of neuroblastoma, we have identified splicing quantitative trait loci (sQTL) that modulate differential splicing across the genome. From these, we have derived de novo intronic splicing motifs, which we aim to validate further as sites of recurrent somatic mutations in human neuroblastoma tumors. We intend to query whole-genome sequencing data for recurrent somatic mutations within these splicing motifs and examine RNA-Seq data for evidence of related changes in alternative splicing of these genes. This information will be useful for identifying novel candidate genes that drive tumorigenesis or possible targets for therapy.
We acknowledge the data use limitations and agree to abide by them. We plan to combine the TARGET data with WGS data sourced from the PCGP to gain a more comprehensive view of splicing motif mutations in cancer, but we will analyze these datasets independently, so this will not create any additional risks to participants. Weksberg, Rosanna HOSPITAL FOR SICK CHLDRN (TORONTO) DNA methylation and RNA expression based subgroups of Wilms tumors Aug19, 2020 approved Wilms tumor is the most common kidney cancer in children. Although many children can be cured, this often requires a combination of radical surgery, chemotherapy, and radiation. These treatments have many side effects, some of which stay with the child for the rest of their life. We are interested in developing new tests that can predict how likely a child’s tumor is to come back so that therapy can be more precisely tailored to that child’s tumor. We will look at a particular chemical change to DNA in Wilms tumors called “DNA methylation” and see how it affects which genes are turned on or off by comparing it with the products of the genes ("RNA expression"). We will then look to see whether any of these patterns correspond to important clinical features, such as how likely the tumor is to come back or to occur in the other kidney. We have used tumors at our hospital to start this work and can see that one pattern of these changes is associated with a lower likelihood of the tumor coming back. We will now use the tumors from TARGET to confirm these findings. Our group is interested in developing novel classifications of Wilms tumors from unsupervised analyses of molecular data. Using combined DNA methylation and RNA expression profiles, we were able to identify three subgroups of Wilms tumors with distinct clinical and molecular features. Each modality alone has only enough resolution to identify two subgroups.
We used samples available in our centre as a discovery cohort and the publicly available DNA methylation data from the TARGET project to replicate the two DNA methylation-based subgroups. We are requesting raw RNA sequencing data for all Wilms tumors in the TARGET project for the purposes of completing two aims as described below: Aim 1) Replicate the three subgroups we have identified using combined DNA methylation and RNA sequencing data. We have identified three molecular subgroups of Wilms tumors in our discovery cohort collected at our centre. These subgroups can be identified by measuring DNA methylation at a defined set of sites (“signature sites”) and RNA expression of all annotated genes. We then use hierarchical clustering and dimension-reduction techniques to identify the subgroup for each tumor. There are two overarching DNA methylation patterns, A and B. All tumors with DNA methylation pattern A have the same RNA expression profile; however, tumors with DNA methylation pattern B are split into two further groups by RNA expression. We have already shown that the data from TARGET fit into DNA methylation subgroups in the manner we would expect based on the clinical features of the cases. However, we have not been able to adequately assess RNA expression groupings with the publicly available data, because preprocessing steps such as alignment and trimming must be done with the same techniques we used for our discovery set. We are requesting access to the raw FASTQ files generated by mRNA sequencing for Wilms tumors in the TARGET dataset. Reads will be aligned by the STAR 2-pass technique. Duplicates will be identified and tagged, and adapter trimming will be performed with the TrimGalore program. Raw read counts for each gene will be obtained with the htseq-count software and will then be used to assign RNA expression-based subgroups. We will combine RNA expression and DNA methylation data as described above.
We will then assess whether these methods define subgroups in the TARGET data the way they have in our discovery cohort. We will review publicly available clinical and molecular (mutations, copy number variants) data for each case to ensure that the characteristics of the subgroups in this cohort replicate those found in our discovery cohort. Aim 2) Identify relationships between DNA methylation alterations and RNA expression at imprinted genes and at the HOXA and HOXB gene clusters. Data from our discovery cohort suggest that the most consistent effect of DNA methylation levels on RNA expression in Wilms tumors is at several imprinted regions and at the HOXA and HOXB gene clusters, all of which are important in kidney development. We will use the combination of DNA methylation and RNA sequencing data in the TARGET dataset to determine whether these findings hold in an independent cohort. Data preprocessing will be done using the steps described above. For genes in the HOXA and HOXB clusters, expression of each gene will be normalized to an RPKM value, and differential expression between tumors with high and low methylation at these loci will be determined using the edgeR package. For genes in imprinted regions, we will measure allele-specific expression, as this is the appropriate measure for determining loss of imprinting. For this analysis we will use the ASE-TIGER package. Risk to Patients and Assurances: Although we propose to use raw sequencing data, we will report only on expression at the transcript level. Other data will come from those that TARGET has made publicly available. Allele-specific expression analyses must take SNPs into account; however, these are used only to identify alleles and are not reported. We are using multiple datasets in this project, but each is independent and we do not plan to combine these data. Data will be stored and analyzed on secure servers at our research institution, the Hospital for Sick Children.
Weng, Andrew PROVINCIAL HEALTH SERVICES AUTHORITY Deciphering molecular mechanisms in pediatric T-ALL using a synthetic human model Oct09, 2019 approved We propose to explore precisely how certain genetic factors transform normal cells into leukemia cells. We have generated a synthetic model of pediatric blood cancer by adding a cocktail of cancer genes to normal human blood cells, which allows us to specify precisely the genetic backdrop upon which the leukemias develop. We will use data from actual patient leukemias to show how closely our synthetic leukemias resemble real-life disease, and then confirm that our discoveries in the laboratory occur in real patients. The ultimate goal of our research is to find targets for new drugs that are specifically tailored to the unique genetic signature of each patient’s disease. Our proposed research seeks to define pathogenetic mechanisms required for initiation, establishment, and propagation of human pediatric T-ALL. To transcend the limitations of mouse models and cell lines, we sought to create defined genetic models by a synthetic approach involving gene delivery/editing in normal human cord blood-derived hematopoietic progenitors. We have shown that, once transduced/modified, these cells produce aggressive T-cell leukemias in immunodeficient mice that are highly similar to human pediatric T-ALL. We have also performed RNA-seq on these synthetic leukemias to compare their expression signatures against publicly available datasets of pediatric T-ALL samples to ascertain 1) how comparable synthetic T-ALLs are to primary pediatric T-ALLs, and 2) whether introduction of specific combinations of oncogenes in synthetic T-ALL recapitulates gene signatures in specific genetic subtypes of primary pediatric T-ALL. Once the relevance and limitations of the synthetic model have been assessed, we will then use the synthetic system to create isogenic tumors in order to isolate the impact of individual genes.
We will initially characterize isogenic cohorts by RNA-seq. We will search for genes whose expression levels are correlated in isogenic synthetic tumor pairs, and then corroborate these with gene expression associations in primary pediatric T-ALL datasets. Validated associations will then be prioritized for functional testing with relevant in vitro and in vivo assays. To date we have attempted to fulfill goals #1 and #2 above using microarray datasets; however, bioinformatically merging RNA-seq and microarray datasets has proven challenging and is ultimately subject to undesirable caveats. Accordingly, we propose here to use RNA-seq data from primary pediatric leukemia samples contained within the phs000218.v17.p6 dataset (specifically substudies phs000463.v14.p6, phs000464.v14.p6, and phs000469.v14.p6) to compare and test candidate gene associations emanating from the synthetic models. The proposed research addresses biological mechanisms operative in pediatric cancer. No collaborations with investigators at other institutions are planned at this time. Weng, Andrew BRITISH COLUMBIA CANCER AGENCY Deciphering molecular mechanisms in pediatric T-ALL using a synthetic human model Sep11, 2017 closed We propose to explore precisely how certain genetic factors transform normal cells into leukemia cells. We have generated a synthetic model of pediatric blood cancer by adding a cocktail of cancer genes to normal human blood cells, which allows us to specify precisely the genetic backdrop upon which the leukemias develop. We will use data from actual patient leukemias to show how closely our synthetic leukemias resemble real-life disease, and then confirm that our discoveries in the laboratory occur in real patients. The ultimate goal of our research is to find targets for new drugs that are specifically tailored to the unique genetic signature of each patient’s disease.
Our proposed research seeks to define pathogenetic mechanisms required for initiation, establishment, and propagation of human pediatric T-ALL. To transcend the limitations of mouse models and cell lines, we sought to create defined genetic models by a synthetic approach involving gene delivery/editing in normal human cord blood-derived hematopoietic progenitors. We have shown that, once transduced/modified, these cells produce aggressive T-cell leukemias in immunodeficient mice that are highly similar to human pediatric T-ALL. We have also performed RNA-seq on these synthetic leukemias to compare their expression signatures against publicly available datasets of pediatric T-ALL samples to ascertain 1) how comparable synthetic T-ALLs are to primary pediatric T-ALLs, and 2) whether introduction of specific combinations of oncogenes in synthetic T-ALL recapitulates gene signatures in specific genetic subtypes of primary pediatric T-ALL. Once the relevance and limitations of the synthetic model have been assessed, we will then use the synthetic system to create isogenic tumors in order to isolate the impact of individual genes. We will initially characterize isogenic cohorts by RNA-seq. We will search for genes whose expression levels are correlated in isogenic synthetic tumor pairs, and then corroborate these with gene expression associations in primary pediatric T-ALL datasets. Validated associations will then be prioritized for functional testing with relevant in vitro and in vivo assays. To date we have attempted to fulfill goals #1 and #2 above using microarray datasets; however, bioinformatically merging RNA-seq and microarray datasets has proven challenging and is ultimately subject to undesirable caveats.
Accordingly, we propose here to use RNA-seq data from primary pediatric leukemia samples contained within the phs000218.v17.p6 dataset (specifically substudies phs000463.v14.p6, phs000464.v14.p6, and phs000469.v14.p6) to compare and test candidate gene associations emanating from the synthetic models. The proposed research addresses biological mechanisms operative in pediatric cancer. No collaborations with investigators at other institutions are planned at this time. Wheeler, David BAYLOR COLLEGE OF MEDICINE Analysis of DNA sequencing data from TARGET Apr01, 2011 expired Cancer is a disease of the chromosomes. The hallmark of cancer is mutation and structural rearrangement of the DNA and associated proteins that constitute the chromosomes. Mutations accumulate in each tumor, causing the cell to lose control over the normal processes of growth and replication. The TARGET project aims to catalog all mutations in each of 10 childhood cancer types and to enable the biomedical research community to better understand the fundamental cellular mechanisms that underlie these diseases, and eventually to develop better treatments. The Human Genome Sequencing Center at Baylor College of Medicine contributes DNA sequencing data and analysis to TARGET. We deposit data from HGSC to dbGaP and retrieve sequencing data from other TARGET participating centers for the purpose of analyzing and validating structural variation in tumor genomes. Our research objectives are to reanalyze whole exome and transcriptome data from TARGET subjects with two goals in mind. First, samples from many TARGET patients have subsequently been used to develop mouse xenograft models. We are engaged with collaborators from Children’s Hospital of Philadelphia, funded by Alex’s Lemonade Stand Foundation, to sequence the patient-derived xenograft (PDX) tissues. To gain further insights into the quality of these models, we need to compare them with sequencing data from the original tumors.
Second, methods for detecting mutations in genomic DNA sequence have improved, and analyses of transcriptome data are far more sophisticated today. Reanalysis of the available data (including BAM files for DNA sequencing, RNA sequencing, and SNP arrays) might enable further discoveries. Our center, the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine, generated the initial whole exome DNA sequencing data and analysis for the following three tumor projects: ALL, AML, and Wilms’ tumor. Data analysis may include, but is not limited to, quality control exercises, such as evaluation of base calls, read mapping, and read coverage; or mutation calling, which may include, but is not limited to, point mutations, copy number alterations, deletions, loss of heterozygosity, translocations, and inversions. Mutation data will be subjected to analysis of significantly mutated genes, significantly altered copy number regions, or statistical analysis of any other recurrent molecular alteration in the genome. Mutations may be organized by pathway to reveal higher-level disruption of signaling pathways in cancer cells. For cancers that have associated RNA expression data, we may compare the expression levels of mutated genes. The work we are doing is consistent with the Use Restrictions for the TARGET data sets. We will not distribute any dbGaP data to our collaborators; they may seek their own independent access to the data for their own analyses. White, Peter RESEARCH INST NATIONWIDE CHILDREN'S HOSP The Institute for Genomic Medicine Comprehensive Molecular Characterization of Pediatric Cancers Project May27, 2021 closed This project aims to improve upon current molecular characterization techniques and classification methods in pediatric cancers. The results of genomic testing are critical for optimal treatment and prognosis of cancer patients.
Through access to the dbGaP genomic data, we will be able to better understand how molecular changes in our study participants correlate with those of study participants from other institutions. We will also classify cancer subtypes and interrogate transcriptome characteristics within these subtypes. If subgroupings within cancer types can be accurately predicted and characterized using RNA-Seq, molecular classification can be added to the large number of analyses performed with transcriptomics data, thereby providing new opportunities to study cancer biology. We also aim to understand how germline genetic changes in an individual’s genome could lead to susceptibility to the disease. Our goal is to improve upon current molecular characterization techniques in pediatric cancer and hematological disease. To achieve this goal, our group has three main areas of research that would be supported by access to the selected dbGaP projects; they focus on comparisons of molecular findings in our patient cohort with studies available in dbGaP, identification of tumor subtypes and prediction of outcomes, and discovery of novel genetic etiologies in cancer predisposition and molecular pathogenesis. It is our intention to publish or otherwise broadly share any findings with the scientific community: 1. The Institute for Genomic Medicine Comprehensive Profiling for Cancer and Blood Disorders study. This IRB-approved study (IRB17-00206) aims to unify the clinical and research arms of comprehensive genomic profiling in the setting of cancer and hematologic disease. Increasingly, studies of the genomic etiology of cancer and hematologic diseases are being used for patient management, including prognostication, informing diagnosis, and evaluating eligibility for targeted therapeutics and clinical trials. Clinically available testing can be limited in scope and follow-up, and thus its impact on patient outcomes is difficult to ascertain.
Additionally, even N-of-1 studies can broadly impact our understanding of the mechanisms of cancer and hematologic disease, and can benefit the individual as well as the larger population of patients with these diseases. Thus, we seek to study and characterize the genomic underpinnings of cancer and hematologic disease through genomic profiling studies bridging both clinical and research components. Access to the dbGaP data is requested to enable us to better characterize the genomic data we are generating for study participants. This will be achieved by comparing somatic variant signatures with those of subjects in the dbGaP dataset. Additionally, we hope to use RNA expression profiling to cluster our subjects with those in the dbGaP dataset. 2. Developing new approaches to identify tumor subtypes and predict outcomes. To achieve this goal we will compare information gathered from multiple genomic assays (including DNA-Seq, RNA-Seq, and DNA methylation assays) and determine relationships with the available clinical data. For example, DNA methylation-based CNS tumor classification has proven to be a valuable asset in clinical decision making. We wish to use the dbGaP data to determine whether this classification can be reproduced and improved upon when also taking into consideration the patients’ germline variants, tumor-specific somatic variants, and the tumor’s transcriptional profile. 3. Identifying novel genetic etiologies in cancer predisposition and molecular pathogenesis. Our central hypothesis is that currently overlooked forms of genetic variation, such as synonymous SNVs (sSNVs), play a critical role in cancer and that big-data technologies will enable the development of novel algorithms to assess and annotate multiple forms of genetic variation. This hypothesis is based on our own preliminary results, which demonstrate that sSNVs that impact RNA stability and codon utilization show significant constraint in the human population.
We have developed a novel methodology to rank somatic and germline variants that impact RNA stability (Gaither et al., 2021 GigaScience). Using this methodology, we wish to study the landscape of variants that impact RNA stability within the dbGaP dataset, in both coding and non-coding genes, across both pediatric and adult cancers. We will also test the hypothesis that variants that impact RNA stability in known cancer predisposition genes may also impact the likelihood of developing cancer. WHITESELL, LUKE WHITEHEAD INSTITUTE FOR BIOMEDICAL RES Investigation of the malignancy-enabling HSF1 transcriptional network in neuroblastoma Sep30, 2011 closed HSF1 is widely known for its role as the master regulator of the Heat Shock Response, which maintains protein homeostasis during times of cellular stress. As a transcription factor, HSF1 works by binding to DNA and coordinating the activity of numerous genes, which can collectively be thought of as its "transcriptional network". In breast cancer, we found that HSF1 regulates a network of genes involved in many cellular processes that are vital for maintaining the malignant state. This malignancy-associated HSF1 transcriptional network was strikingly correlated with increased metastasis and death in human breast cancer. We have subsequently found evidence for HSF1 activation in a broad range of human tumors, with tissue origins as diverse as lung, colon, nerve sheath, pancreas, testis, and ovaries. Here we propose to investigate the HSF1 transcriptional network in neuroblastoma with the goal of better understanding what distinguishes the most aggressive, most malignant cases of this heterogeneous disease. Such insights could be very useful in predicting which patients are most likely to do poorly and adjusting their treatment accordingly. Over half of patients with advanced neuroblastoma currently die of their disease.
By providing a better ability to define prognosis and tailor therapy, we hope to improve the outcome for these unfortunate children. Heat Shock Factor 1 (HSF1), the master regulator of the heat-shock response, facilitates malignant transformation and promotes cancer cell survival and proliferation in model systems. We recently found that HSF1 engages a transcriptional program distinct from heat shock to support highly malignant cancers. High expression of this HSF1 cancer signature had a remarkable correlation with poor prognosis, with significant associations seen in 9 of 10 breast cancer datasets examined. Similarly, over-expression of HSF1 protein in 1,841 invasive ductal carcinoma breast cancer samples from the Nurses’ Health Study was associated with increased mortality. Far more broadly, numerous human tumors, with tissue origins as diverse as lung, colon, nerve sheath, pancreas, testis and ovaries, scored positive for increased expression and nuclear translocation of HSF1. This suggests HSF1 supports essential aspects of tumor biology, with profound prognostic and therapeutic implications. Here we propose to investigate the HSF1-regulated transcriptional network in neuroblastoma. Our goal is to determine whether evidence for network activation mined from the transcriptional profiling of tumor samples is associated with poor outcome in this clinically heterogeneous disease. Despite significant advances over the past decade, over half of children with advanced neuroblastoma still die of their disease. By providing a better ability to define prognosis and tailor therapy, we hope to improve the bleak outlook for these children. Wiemels, Joseph UNIVERSITY OF SOUTHERN CALIFORNIA Cytomegalovirus in childhood leukemia Oct23, 2019 closed Acute lymphoblastic leukemia (ALL) is one of the most common cancers in childhood; however, the cause of ALL is unclear.
There is evidence to suggest that cytomegalovirus (CMV), a very common virus in the population, is associated with the development of childhood leukemia. This research seeks to further elucidate this relationship. We will compare the gene expression of leukemia cells that are found to have CMV to those that do not have CMV. This will allow us to determine the effect of CMV on leukemia gene expression and ultimately understand how CMV may contribute to the development of childhood ALL. We aim to determine the role of cytomegalovirus (CMV) in the development of childhood Acute Lymphoblastic Leukemia (ALL). ALL is one of the most common malignancies of childhood and the etiology remains unclear. There is evidence to suggest CMV may be implicated in the development of ALL, and we seek to further elucidate this relationship. We hypothesize that the presence of CMV alters leukemia cell gene expression. We will use mRNA sequence data from this data set to identify CMV in leukemia samples. We will nominate CMV-positive samples by searching for specific CMV genes whose transcripts are polyadenylated. We will then compare the gene expression of leukemia cells that are CMV-positive to those that are CMV-negative using the same mRNA sequence data. As leukemia gene expression will likely vary by genetic subtype, we will also stratify samples by genetic subtype to elucidate the effect of CMV on leukemia cell gene expression while controlling for subtype. This research aims to gain a better understanding of leukemia etiology, and if the results confirm that CMV is a potential driver of leukemia, we will have identified a potentially modifiable risk factor for the most common childhood malignancy. We will also investigate additional pediatric cancer cases in TARGET as controls (non-CMV related cancers).
WIEST, DAVID RESEARCH INST OF FOX CHASE CAN CTR RPL22 loss in pediatric pre-B acute lymphoblastic leukemia Nov13, 2014 closed RPL22 plays a critical role in regulating normal T-cell and B-cell development. RPL22 is lost in a significant subset of pediatric ALL patients, but more importantly, it appears to have prognostic value, as many of the patients harboring this deletion exhibit a more aggressive disease course. Our data show that RPL22 loss accelerates leukemic transformation in a variety of model systems. Moreover, we have established a link between RPL22 inactivation and known pathways associated with malignant transformation. Because RPL22 inactivation is linked to poor survival in pediatric ALL patients in the relatively small patient cohorts we have examined to date, we wish to expand this analysis to a much larger and more diverse patient cohort. We want to explore this dbGaP dataset and others in order to improve our understanding of this high-risk patient population as well as to solidify the link between RPL22 inactivation and adverse treatment outcomes and survivorship. We hope to find attractive molecular targets for therapeutic intervention in high-risk pediatric ALL patients with RPL22 loss. Ribosomal protein L22 (RPL22) is a ubiquitously expressed RNA binding protein with many extra-ribosomal functions. Recent data from our lab show that RPL22 is a crucial mediator of both T- and B-cell development. Survival rates of pediatric acute lymphoblastic leukemia (ALL) have improved enormously over the past decades. However, a few subtypes are still considered high-risk and are associated with poor prognosis. Hypodiploid and early T-cell precursor (ETP) ALL are high-risk pediatric ALL subtypes, with an extremely poor five-year survival of ~30%. RPL22 loss appears to be a negative prognostic factor in ALL and occurs in ~39% and ~13% of hypodiploid and ETP ALL patients, respectively.
Moreover, RPL22 loss accelerates leukemic transformation in animal models by activation of the NF-κB/Lin28B axis, and likely other pathways. Because RPL22 inactivation is linked to poor survival in pediatric ALL patients in the relatively small patient cohorts we have examined to date, we wish to expand this analysis to a much larger and more diverse patient cohort. We wish to assess the linkage between RPL22 loss and: 1) clinical outcome in a large cohort of pediatric ALL patients; 2) ALL molecular subtypes; and 3) oncogenic alterations/dysregulation of cellular pathways in ALL, such as the NF-κB/Lin28B axis. This will include changes in DNA copy number and RNA expression. One of the datasets we wish to analyze is the pediatric precursor-B-ALL cohort (TARGET dataset). We intend to study RPL22’s role in pediatric ALL, both in B-cell as well as T-cell ALL, while focusing on the most aggressive subtypes: Hypodiploid and ETP-ALL. In order to do so we plan on examining the following datasets as well: phs000340.v3.p1, and phs000341.v1.p1. We hypothesize that RPL22 loss is a major contributor to high-risk pediatric ALL development. Our overall goal is to refine the genetic profile of these patients. We hope to find novel and attractive targets for therapeutic intervention in pediatric ALL patients with RPL22 loss. Williams, Lindsay UNIVERSITY OF MINNESOTA Identification of genomic markers for pediatric cancers Jun19, 2019 expired Our goal is to identify molecular features that may place boys and girls at an increased risk of childhood cancer over a number of tumor types. By uncovering these features, a better understanding of cancer biology among children and adolescents may lead researchers to identify potential gene targets for more specific treatments. Childhood cancer remains a leading cause of death among children and adolescents. By conducting integrated analyses of multi-layer molecular data, we may begin to identify new therapeutic targets in boys and girls.
To identify potential, actionable genes in childhood cancer across multiple tumor types, including acute lymphoblastic leukemia, acute myeloid leukemia, Wilms tumor, neuroblastoma, and osteosarcoma, we will use multi-omic data to identify genes in boys and girls that may play important roles in cancer pathogenesis and progression. Our findings will be disseminated through presentations at national research meetings and through publication in peer-reviewed scientific journals. We will compare findings for some of our analyses with data from the same cancers that are found in the St. Jude PeCan datasets. There is no additional risk to participants in either dataset, as all data are anonymous. Williams, Richard UNIVERSITY COLLEGE LONDON Genomic analysis of paediatric renal tumours Jun21, 2017 closed Scientists in different countries have found that in many childhood kidney tumour cases, the cancer cells contain specific 'mutated genes' (damaged DNA) believed to be responsible for making them cancerous, and may also differ from normal cells in how particular genes are ‘switched’ on or off. We hope that combining data from different labs will help tell us if there are any other mutated genes in these tumours that haven't yet been found (e.g., rarely damaged genes might only be identified if we look at data from many tumours at once), or if there are genes consistently switched on or off in the tumour cells that might be used in clinical testing or be targeted by new drugs. The purpose of our application is to identify recurrent somatic mutations, genomic abnormalities, or aberrantly expressed genes in paediatric solid tumours, focusing on renal tumours, and somewhat expanding the scope of our earlier project request (#10474). Recent genomic studies of Wilms tumour, malignant rhabdoid tumour, and clear cell sarcoma of the kidney have identified novel recurrent mutations specific to each tumour type.
Expression analyses have defined gene signatures specific to these tumours, their subtypes and clinical outcome groups. Such analyses typically benefit from larger sample series, especially to identify genes that are recurrently but less commonly mutated, or are subject to consistently aberrant expression that may pinpoint novel biomarkers or therapeutic targets. We will combine TARGET data with comparable series from other studies, initially including our own unpublished SIOP Wilms tumour exome data, as well as malignant rhabdoid tumour expression data from the UK INSTINCT network. While this application focuses on renal tumour data, we may also, where functionally relevant, analyse and compare data from other TARGET samples. For example, renal and extrarenal malignant rhabdoid tumours have similar genetic aberrations, while the MYCN pathway is commonly disrupted by comparable mechanisms in both a subset of Wilms tumours and in neuroblastoma. All combined data sets will be analysed using a consistent methodology, such as our current BWA/GATK/Mutect 2 pipeline for somatic mutation detection in exome data, or our DESeq2-based pipeline for detection of differential expression in RNA-Seq data. Genes of interest may be investigated further by single gene or targeted panel assays, using independent, appropriately consented samples drawn from the SIOP WT 2001, UK IMPORT, SIOP UMBRELLA and INSTINCT sample series. We do not envisage that these analyses will create any additional risks to participants in either the TARGET or UK studies, since our current aims are entirely consistent with the original aims of the published studies - i.e., the identification of recurrently mutated or aberrantly expressed genes of potential clinical interest. Williams, Richard UNIVERSITY COLLEGE LONDON Mutational Analysis of Wilms Tumour Aug16, 2016 closed Wilms tumour is the most common kidney cancer of childhood.
Scientists working in several countries have recently found that in many cases of this tumour, the cancer cells contain specific 'mutated genes' (damaged DNA) believed to be responsible for making them cancerous. We hope that combining data from different laboratories and applying new methods of analysis will help tell us if there are any other mutated genes in these tumours that haven't yet been found (e.g., genes that are more rarely damaged might only be identified if we look at data from many tumours at the same time). The purpose of our application is to identify recurrent somatic mutations in Wilms tumour, the commonest paediatric renal malignancy. Recent studies from the Children’s Oncology Group (the data we are currently requesting), the International Society of Paediatric Oncology (on which we are co-authors) and others have identified novel recurrent somatic mutations in several genes, including DROSHA, DGCR8, SIX1, SIX2 and MYCN. We hypothesize that additional recurrently mutated genes (especially those in which aberrations are less common) may be detected if data sets are combined, including our own unpublished exome data and the COG data. It is also possible that recent developments in mutational analysis of sequencing data, e.g. improved methods of somatic indel detection, will reveal novel aberrations in existing data sets. We therefore intend to combine and analyse all available sequencing data using a consistent somatic variant detection methodology, such as our current BWA/GATK 3.6/Mutect 2 pipeline. All novel recurrently mutated genes will be investigated further by single gene or targeted panel assays, using independent, appropriately consented samples drawn from the SIOP WT 2001, UK IMPORT and SIOP UMBRELLA sample series.
We do not envisage that these analyses will create any additional risks to participants in either the COG or SIOP studies, since our current aims are entirely consistent with the original aims of the published studies - i.e., the identification of novel, recurrently mutated Wilms tumour genes. WU, CATHY UNIVERSITY OF DELAWARE Delivering Diagnostic and Therapeutic Targets from Pediatric Clinical NGS Data Jan22, 2015 closed Next generation sequencing allows for large-scale analysis of the human genome, which has the potential to greatly enhance our understanding of human diseases, such as pediatric Acute Myeloid Leukemia (AML). The survival rate of pediatric AML has improved over the last decades, but relapse is still a major concern. We propose to analyze the requested dataset with well-established bioinformatics methodologies with the goal of delivering a pediatric AML specific gene list and a focused list of genomic alterations and their therapeutic consequences. Our analysis will be made publicly available and will enhance the scientific understanding of genomic alterations associated with pediatric AML. The project will involve a collaboration with clinicians and has support from the Leukemia Research Foundation of Delaware and the Children’s Oncology Group (COG). The COG has approved our data access in a Data Use Agreement (Maria Hendricks, COG Chief Administrative Officer), and recommended (Rhonda Ries, FHCRC) that we request access through dbGaP for transferring these extremely large files. It has been established that genomic alterations found in adult acute myeloid leukemia do not translate into effective biomarkers for pediatric AML, highlighting the necessity to develop pediatric specific diagnostic and relapse markers. Analyzing large datasets produced by the TARGET project will enable two key deliverables: a pediatric AML specific gene target list, and a list of relevant therapeutic genomic alterations.
We have been analyzing the TARGET AML next generation sequencing (genomic) datasets via our established bioinformatics workflows, and the processed results are prioritized based on clinical relevance. We have successfully detected and annotated structural variants located in relevant oncogenes, such as the internal tandem duplication (ITD) located in FLT3 (Crowgey et al., 2015, PMC4525226). This work was presented at the Children’s Oncology Group meeting (Dallas, Texas, October 2015) to lead investigators for the TARGET AML project. We are in the process of analyzing the whole genome sequencing data generated from the AML TARGET project for clinically relevant cryptic translocations. Cryptic translocations are typically too small to be seen with conventional cytogenetic techniques, but are noted to have a significant impact in pediatric AML. To further enhance the analysis with regard to predicting the effect of structural variants, we would like to integrate the results with the TARGET AML RNA-seq transcriptomic data. The integration with gene expression will allow a better assessment of the clinical relevance of structural variants located in non-coding regions, such as 3’ / 5’ UTRs and introns. The original data provided in the download did include the RNA-seq data. Here we are amending our Research Use Statement to include the integrative analysis of this dataset. By integrating the genomic analysis results generated thus far with the transcriptomic data, we will be able to enhance our prioritization of structural variants by linking these alterations with functional consequences. Wu, Gang ST. JUDE CHILDREN'S RESEARCH HOSPITAL Understanding the role of inherited and acquired genetic variations in tumorigenesis in adult and pediatric cancers May23, 2024 approved Technology and computational innovations are the driving forces of new scientific discoveries.
TCGA and TARGET projects generated a multi-dimensional dataset focusing on nearly 30 major adult and pediatric cancers, providing an excellent resource for integrative analysis to better understand a biological system and how it is rewired to benefit selfish cancer cells only. We aim to combine this dataset with childhood cancer datasets to learn the role of human development and other factors affecting children but not adults. We focus not only on identifying best practices for analyzing multi-omics data, but also on novel discoveries that could help us develop new treatments for both pediatric and adult cancers. TCGA and TARGET generated a wide variety of omics, clinical and image data types enabling multi-omics and multi-modal analyses to dissect the roles of germline and somatic mutations in the tumorigenesis processes. It not only serves as an excellent resource for algorithm and method development, but also provides a great dataset to contrast adult cancers against childhood cancers to understand the differences and commonalities in the underlying mechanisms of tumorigenesis. The discoveries of novel genetic modifiers in adult and pediatric cancers through the integrative analyses will provide targets to develop early diagnosis assays, predict patient prognosis and identify new treatments. As the institutional bioinformatics shared resource at St. Jude Children’s Research Hospital, my group will harmonize and aggregate the omics data with the same pipelines used for omics data generated in house. This centralized copy of raw genomics data provides the consensus calls of germline and somatic mutations for approved collaborators and research investigators at St. Jude to minimize the batch effects due to data processing. Our ongoing efforts and future plans include: 1) continuous evaluation and benchmarking of various bioinformatics methods to measure molecular traits and association analyses.
2) development of novel methods to detect novel, abnormal or cryptic genomic events including RNA editing, mini-exon inclusion, circular DNA/RNA, enhancer hijacking, telomere lengthening, mitochondrial DNA transfer, retro-transposition, genomic rearrangements, imprinting, rare haplotypes; 3) discovery of cancer associated molecular phenotypes and their underlying genetic modifiers, either germline or somatic second hits; and 4) characterization of the genetic variation and gene expression of specific pathways or biological components in different cancer types. WU, Hong PEKING UNIVERSITY Prognostic prediction model for pediatric T-cell acute lymphoblastic leukemia Nov20, 2018 closed T-cell acute lymphoblastic leukemia (T-ALL) is a type of hematological malignancy that affects the development of T cells. Most T-ALL patients can achieve complete remission under current treatment guidelines. However, the relapse rate is still high. The aim of this study is to develop a model to predict the prognosis of patients with T-ALL. The study was designed as a prospective cohort. Demographic, clinical and genetic information will be used to predict the survival of T-ALL patients. The model will then be applied to one or more external datasets: to evaluate the model performance and the generalizability of the model, it should be tested in at least one external T-ALL cohort. The requested data of T-ALL patients from the TARGET dataset will be utilized as an external cohort to validate the model performance. T-cell acute lymphoblastic leukemia (T-ALL) is a type of hematological malignancy that affects the development of T cells. Most T-ALL patients can achieve complete remission under current treatment guidelines. However, the relapse rate is still high. The aim of this study is to develop a statistical model based on clinical parameters and genetic alterations at diagnosis to pinpoint the risk factors of disease relapse and predict the prognosis of patients with T-ALL.
The study was designed as a prospective cohort. We recruited pediatric patients who were diagnosed with T-ALL at Peking University People’s Hospital from 2013 to 2018. Bone marrow samples were collected from patients at diagnosis and at complete remission. Whole exomes and transcriptomes were sequenced. Genetic alterations including somatic mutations, insertions and deletions, copy number alterations, fusions, and gene expression levels were detected/measured. Clinical parameters including age, gender, treatment protocol, white blood cell count, status of immune surface markers, cytogenetics, date of diagnosis, date of achieving complete remission, date of death, date of relapse, and date of stem cell transplantation were collected. The model will be developed in the in-house T-ALL cohort. After the optimization of the model, it should be tested in at least one external T-ALL cohort to evaluate the generalizability of the model. The requested data of T-ALL patients from the TARGET dataset will be utilized as an external cohort to validate the model performance. The current subjects were recruited from an East Asian population. Applying the model to different ethnic groups will help evaluate the generalizability of the model. The study has been approved by the IRB at Peking University People’s Hospital. The proposed study will follow the data use limitations for the requested dataset. Wunder, Jay SINAI HEALTH SYSTEM Comprehensive Molecular Analysis of Osteosarcoma Mar19, 2015 approved The primary purpose of the research is to learn more about the germ line and somatic changes in the genome of pediatric cancers. Ultimately we hope to identify new biomarkers and targets for therapy. We are collaborators with the TARGET program (http://target.cancer.gov/). Whole genome, exome and transcriptome sequencing data, copy number and gene expression analysis as well as microRNA and methylation profiling data from the TARGET study will be used to perform integrative analysis of paediatric osteosarcoma (OS).
Identification of paediatric molecular profiles may reveal potential biologically relevant targets to be studied. This research project, aimed at a better understanding of this pediatric disease, can only be conducted using pediatric data. We plan to use the TARGET data solely in connection with the research project described. Xiao, Xinshu UNIVERSITY OF CALIFORNIA LOS ANGELES Analysis of RNA variants in pediatric cancer Nov29, 2019 approved In the human transcriptome, a large number of RNA variants exist due to a myriad of mechanisms such as alternative splicing, RNA editing and allele-specific expression. We are interested in analyzing different types of RNA variants in pediatric cancer via TARGET RNA sequencing data, such as those resulting from alternative processing of the RNA or due to RNA modifications (RNA editing). Our work will lead to a better understanding of the regulatory mechanisms that generate RNA variants, and could reveal novel biomarkers and therapeutic targets for pediatric cancer. In the human transcriptome, a large number of RNA variants exist due to a myriad of mechanisms such as alternative splicing, RNA editing and allele-specific expression. My group has developed efficient computational tools for analysis of RNA editing, allele-specific expression and alternative splicing. In this project, we will use these tools to analyze the TARGET RNA sequencing data set to characterize RNA variants and associated post-transcriptional regulatory mechanisms in pediatric cancer. Our study aims to reveal novel biomarkers and molecular targets for therapeutic development. It should be noted that this project is not a tool development project. Our goal is to apply well-established and state-of-the-art computational tools to the TARGET dataset to identify allele-specific expression, RNA editing, and alternative RNA processing events that can be used as biomarkers for patient outcomes, or as therapeutic targets of pediatric cancers.
Given these objectives, the proposed research can only be achieved using pediatric data and meets the Data Use Limitations for the TARGET dataset. Xing, Yi CHILDREN'S HOSP OF PHILADELPHIA Alternative isoform variation in pediatric cancer transcriptomes Jan16, 2019 approved Aberrant pre-mRNA alternative splicing frequently occurs in cancer cells and is a major contributor to tumorigenesis and metastasis. In this project we will analyze the TARGET RNA sequencing data to identify cancer-associated changes in pre-mRNA alternative splicing and mRNA isoform variation patterns. This will lead to a better understanding of the pediatric cancer transcriptomes, and could reveal novel biomarkers and molecular targets for therapeutic development. Alternative splicing of pre-mRNA is a major contributor to gene regulation and disease in humans. Aberrant pre-mRNA alternative splicing frequently occurs in cancer cells and plays an important role in tumorigenesis and metastasis. High-throughput RNA sequencing provides a powerful genomic tool to globally elucidate the complexity of alternative splicing and mRNA isoform variation in cancer transcriptomes. My laboratory is experienced with studies of alternative splicing and has developed efficient computational tools for exon-level and isoform-level analysis of RNA-seq data. In this project, we will apply state-of-the-art computational tools to the TARGET RNA sequencing data set to characterize alternative isoform variation and associated post-transcriptional regulatory networks in pediatric cancer. Our study will lead to a better understanding of pediatric cancer transcriptomes, and could reveal novel biomarkers and molecular targets for therapeutic development. It should be noted that this project is not a tool development project. 
Our objective is to apply well-established and state-of-the-art computational tools to the TARGET dataset to identify cancer-specific transcripts and proteins that can be used as biomarkers for patient outcomes, or as targets for cancer therapy, in particular immunotherapy, of pediatric cancers. Given that our goal is to develop new prognostic biomarkers and therapeutic targets of pediatric cancers, the proposed research can only be achieved using pediatric data and meets the Data Use Limitations for the TARGET dataset. Xu, Jian ST. JUDE CHILDREN'S RESEARCH HOSPITAL Expression and Regulation of Transposable Elements in Cancer Genomes Apr11, 2024 approved Repetitive DNA sequences such as transposable elements (TEs) play important roles in genome regulation and cellular function; however, the processes that regulate their expression and activity in cancer development remain incompletely understood. In this study, we will develop and optimize bioinformatic methods to analyze TE expression and regulation in different pediatric cancer types including acute leukemias. The successful completion of the proposed studies will provide new knowledge and computational methods for understanding the pathogenic roles of transposable elements in pediatric cancers. Altered expression of genomic transposable elements (TEs) is frequently found in human cancers, although the underlying mechanisms for their dysregulation are largely unknown. Recent pan-cancer analysis of whole genomes (PCAWG) reveals a strong association between TEs and genomic structural variants (SVs) in cancer genomes. In this study, we aim to identify and characterize transposable elements associated with dysregulated gene expression and genome regulation in pediatric cancer genomes. We will also develop and optimize bioinformatic methods for analysis of TE expression and regulation.
We will focus on pediatric cancers, including acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), neuroblastoma (NBL), osteosarcoma (OS), rhabdoid tumor (RT), and Wilms tumor (WT), as the models to study how TE expression and regulation contribute to aberrant gene transcription and genome structure. To achieve this goal, we will first analyze TE expression profiles in RNA-seq datasets using improved bioinformatic algorithms for alignment to the human T2T genome assembly, which provides increased accuracy for mapping of repetitive DNA sequences. We will then integrate whole genome sequencing (WGS), WXS, targeted-Capture, and RNA-seq datasets to determine the extent to which aberrant TE expression may be associated with the presence of genomic SVs across pediatric cancer samples. We will further analyze the chromatin regulatory landscape using bisulfite-seq datasets to understand whether and how altered chromatin regulation contributes to aberrant TE expression in cancer genomes. Lastly, we will identify TE subfamilies associated with recurrent SVs for detailed functional and mechanistic studies in cell models. The successful completion of the proposed studies will provide new knowledge and informatic methods for understanding the pathogenic roles of transposable elements in pediatric cancers. Xu, Lin UT SOUTHWESTERN MEDICAL CENTER Leveraging new deep learning algorithms to study health disparity in the patient cohorts Jun04, 2024 approved Artificial intelligence (AI) is revolutionizing and transforming healthcare systems by facilitating innovative algorithms and computer-based decision-support systems to enhance diagnostic precision, design more effective therapeutic strategies, streamline clinical workflows, reduce human resource costs, and optimize treatment choices. My goal is to integrate state-of-the-art AI technology to create a novel framework for identifying previously unrecognized effects of health disparities in various human diseases.
This approach aims to discover new biomarkers for disease diagnosis and uncover new mechanisms of disease development across various race, gender, and age groups. Our laboratory has developed advanced deep learning algorithms designed to assess health disparities and mitigate biases in complex clinical datasets. This proposed study aims to harness the capabilities of our algorithms to uncover previously unidentified effects of health disparities in biomarker discovery and therapeutic target identification. We seek access to extensive DNA, RNA, and epigenomics sequencing datasets encompassing various patient cohorts. We are committed to disseminating the results of our analysis to the broader scientific community through publications. Additionally, the programming scripts used in our analyses will be made publicly available on GitHub upon acceptance of the publication. We pledge not to merge the requested datasets with any other datasets outside of dbGaP. All data from this study will be securely stored on a local server with restricted access, limited to relevant researchers within the PI’s lab. The use of the data is confined to not-for-profit organizations. Furthermore, all data referenced in this dbGaP proposal will be utilized strictly in accordance with the study's data use limitations (DUL). The usage is constrained by the terms of the model Data Use Certification, limited to health/medical/biomedical purposes, and excludes the study of population origins or ancestry. phs001039 Use of the data must be related to Age-Related Macular Degeneration. phs000486, phs001413, phs002528 Use of the data must be related to Mental Health Psychiatric Disorders and Related Somatic Conditions. phs000209, phs001416 Use of this data is limited to health/medical/biomedical purposes, does not include the study of population origins or ancestry. Use of the data is limited to not-for-profit organizations.
Data may not be used to investigate individual pedigree structures or individual participant genotypes for the purpose of identifying individuals or families; assess variables or proxies that could be considered as stigmatizing an individual or a group; perform phenotype-only analyses (note: investigators may request data for such phenotype-only analyses through the NHLBI’s BioLINCC); or explore issues such as non-maternity and non-paternity and perceptions of racial/ethnic identity. All research must be related to the etiology and prevention of morbidity and mortality of the U.S. population consistent with the demographic distribution in MESA. phs001023 Use of the data must be related to Reproductive Disorders. phs000168, phs000483 Can be used for Alzheimer's research and related disorders with phenotypic or cognitive similarities. phs000951, phs001726 Use of the data must be related to Asthma and Chronic Obstructive Pulmonary Disease. phs000714 Use of the data must be related to Pregnancy Complications and related disorders. phs000672 Use of the data must be related to Autoimmune Diseases. Limited to analyses related to Sjögren's syndrome or other autoimmune diseases. phs000017, phs000899 Use of the data must be related to Bipolar Disorder. phs001654 Use of the data must be related to Barrett's Esophagus. phs000218 Use of the data must be related to Pediatric Cancer Research. phs000235, phs000220 Use of the data must be related to Cancer. phs000482, phs001874 Use of the data must be related to Autism. phs001664 Use of the data must be related to Parkinson's disease or other neurodegenerative disorders. phs001194 Use of the data must be related to Congenital Heart Disease. phs002641 Use of the data must be related to Dilated Cardiomyopathy. phs001754 Use of the data must be related to Addictive Disorders. phs000742 Use of the data must be related to Epilepsy. phs001524 Use of the data must be related to Prostate Cancer.
phs000308 Data use is limited to research on the etiology of glaucoma and related areas. phs002183 Use of the data must be related to Cancer. Requestor agrees to make results of studies using the data available to the larger scientific community. General methods development research is NOT permitted. phs001286 The informed consent document signed by the PLCO study participants allows use of these data by investigators for discovery and hypothesis generation in the investigation of the genetic contributions to cancer and other adult diseases as well as development of novel analytical approaches for GWAS. phs001518 Use of the data must be related to Myocardial Infarction. phs002447 Use of the data must be related to Diabetes. Xu, Mingjiang UNIVERSITY OF TEXAS HLTH SCIENCE CENTER The Role of HOXBLINC LncRNA in B Cell Leukemia and B Cell Development May21, 2021 closed Acute lymphocytic leukemia (ALL) is a cancer of the blood and bone marrow, the spongy tissue inside bones where blood cells are made. ALL is the most common type of cancer in children, and treatment offers a good chance of cure. ALL can also occur in adults, who account for about 4 of every 10 cases, but the chance of a cure is greatly reduced in adults. Based on the cell of origin, ALL can be divided into B-ALL and T-ALL. In our ongoing study, we found that HOXBLINC, a non-protein-coding RNA, might be a critical oncogene in B-ALL and that its deregulation could block B cell development in the bone marrow. These data all come from mouse models; to confirm this conclusion in human patient samples, we plan to analyze the existing TARGET datasets. By analyzing data such as the expression level of HOXBLINC in patient samples, we can learn more about the role of this gene in this cancer, which will help us fight it.
This research focuses on the role of HOXBLINC in hematopoietic stem cell (HSC) function and leukemia development and is supported by NIH R01 grant R01-HL141950. We are now applying for dbGaP access to the raw data of three projects of the TARGET program: TARGET-ALL-P1, TARGET-ALL-P2, and TARGET-ALL-P3. The dbGaP study accession for these three projects is phs000218. We need the raw data of this dataset for the following reasons. HOXBLINC is a newly identified long non-coding RNA located between HOXB4 and HOXB5 (DOI: 10.1016/j.celrep.2015.12.007) that is important for hematopoietic stem cell regulation and leukemia development. In our recently published study (https://doi.org/10.1038/s41467-021-22095-2), we generated a mouse model (Vav1-HoxBlincTg) that overexpresses HoxBlinc specifically in the hematopoietic system, and found that 64% of the mice developed AML. Interestingly, in our recently completed 1.5-year observation of this mouse model, we found that 10% of the mice eventually developed precursor B-cell lymphoblastic leukemia rather than AML, suggesting that HOXBLINC might be an important oncogene not only in AML but also in B-ALL. Because most genes mutated in B-ALL are also critical for B cell development, we also performed experiments to determine whether HOXBLINC regulates B cell differentiation. We analyzed the Vav1-HoxBlincTg mice and found that high HOXBLINC expression significantly impaired B cell differentiation and blocked B cell development at the pre-B cell stage. These results prompted us to further study the clinical relevance of HOXBLINC in B cell lymphoblastic leukemia. We have already performed qPCR on B-ALL patient samples, but because of the limited availability of clinical samples, we are not yet able to draw a conclusion. In addition, HOXBLINC is a newly identified gene that is not yet annotated, so we need to download the controlled raw data and perform a de novo analysis using the HOXBLINC sequence directly.
The phs000218 datasets contain RNA-seq data from >1000 acute lymphoblastic leukemia patient samples, which makes them an ideal source for the goal of our study. We plan to download the raw data of lymphoblastic leukemia patients and divide them into T cell leukemia and B cell leukemia. The B cell leukemia data will then be analyzed for the expression level of HOXBLINC, and a correlation study will be performed to determine which gene mutations are significantly associated with high HOXBLINC expression. Because HOXBLINC is a regulator of HOX genes, correlation analysis of HOXBLINC with HOX genes and with genes for B cell differentiation such as PAX5, IKAROS, E2A, and STAT5 will also be performed. Finally, Kaplan-Meier analysis of patients with high versus low HOXBLINC expression will be performed to determine whether high HOXBLINC expression is a poor prognostic marker in B cell ALL. The results of this analysis will be disseminated only as papers that include an acknowledgment of the source of the data. We do not anticipate intellectual property (IP) or the development of commercial products or services. Xuan, Zhenyu UNIVERSITY OF TEXAS DALLAS Identify non-coding transcripts as biomarkers for pediatric cancer Apr07, 2016 approved While proteins have been widely considered the major workforce in regulating cell function, recent research has discovered more and more functional non-coding transcripts, RNA molecules that are not translated into proteins, that serve as effective biomarkers for diagnosis and prognosis in diseases. We plan to use our newly developed computational approaches to analyze childhood cancer data and identify potential biomarkers of pediatric cancers. With the development of sequencing technology, more and more non-coding RNA (ncRNA) transcripts have been discovered. While some ncRNAs have been reported as biomarkers in the diagnosis and prognosis of diseases, most of them have no clear functions.
We have recently developed a new computational method to detect novel transcripts from sequencing data. In our preliminary study of cell lines and a few cancer samples, we found our method to be very sensitive in detecting non-coding transcripts that are differentially expressed in tumors compared with normal samples. We would like to apply our method to childhood cancer samples in dbGaP, particularly from the TARGET and PCGP projects, to identify more non-coding transcripts and study their potential functional relationships with pediatric cancers. We expect to identify specific non-coding transcripts as potential diagnostic and prognostic biomarkers for pediatric cancer. We will use these data only for childhood cancer studies and will not combine them with adult data. YAMASHIRO, DARRELL COLUMBIA UNIVERSITY HEALTH SCIENCES Mechanisms of Metastasis in Neuroblastoma Jul06, 2012 closed Using xenograft models of neuroblastoma, we have identified genes involved in promoting liver metastasis. To identify the genes critical for liver metastasis, we will compare our gene set with the neuroblastoma data set. Inhibitors of VEGF can suppress tumor growth in numerous preclinical models but are insufficient to prevent tumor progression. In an attempt to increase efficacy, we used combined blockade of two angiogenic pathways, VEGF and Notch. Surprisingly, dual inhibition resulted in markedly increased liver metastases in neuroblastoma xenograft models. Using ligand-specific Notch decoys, we found that blockade of both DLL4 and Jagged1 signaling is required for increased hepatic metastasis. Gene expression profile analysis of cell lines isolated from hepatic metastases revealed upregulation (fold change > 2) of 38 genes. We propose to analyze our liver metastatic gene set against the neuroblastoma data set and determine its correlation with factors such as prognosis and stage.
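The fold-change criterion in the Yamashiro statement (fold change > 2 in cell lines isolated from hepatic metastases) can be illustrated with a minimal sketch. It assumes normalized expression values per gene for a baseline profile and a metastasis-derived profile; the statement does not name the comparator, so the baseline used here, the function name `upregulated_genes`, and all gene names and values are hypothetical.

```python
def upregulated_genes(baseline, metastatic, min_fold=2.0):
    """Return {gene: fold_change} for genes whose metastatic/baseline ratio
    exceeds min_fold, ordered from largest to smallest fold change.

    Genes missing from either profile, or with non-positive baseline
    expression, are skipped because the ratio is undefined for them.
    """
    hits = {}
    for gene, base in baseline.items():
        met = metastatic.get(gene)
        if met is None or base <= 0:
            continue
        fold = met / base
        if fold > min_fold:  # strict threshold: exactly 2-fold is not counted
            hits[gene] = fold
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical normalized expression values, for illustration only.
    parental = {"GENE_A": 10.0, "GENE_B": 5.0, "GENE_C": 8.0, "GENE_D": 4.0}
    liver_met = {"GENE_A": 25.0, "GENE_B": 9.0, "GENE_C": 40.0, "GENE_D": 8.0}
    print(upregulated_genes(parental, liver_met))
    # {'GENE_C': 5.0, 'GENE_A': 2.5}
```

Because the threshold is strict, a gene at exactly 2-fold (GENE_D above) is excluded; genes with zero baseline expression are skipped rather than assigned an infinite ratio.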
This analysis will help identify critical genes involved in metastasis in neuroblastoma. This work is supported by NIH 5R01CA124644-04, Yamashiro PI, “VEGF blockade and alternative angiogenic pathways in neuroblastoma.” We will use the requested data set only for pediatric cancer research. Yan, Chunhua NIH Visualizing and analyzing TCGA and TARGET data Sep30, 2011 approved This user-friendly bioinformatics analysis software is being developed for research scientists to compare genomic alterations in various cancers such as breast, colon, lung, and ovarian cancers. The somatic mutation, copy number alteration, gene expression, and methylation data are generated by a number of projects, including The Cancer Genome Atlas (TCGA), the Sanger Center's COSMIC initiative, NHGRI's Tumor Sequencing Project (TSP), and NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Scientists can explore alteration patterns at the genome and sample levels along with clinical features. The genomic alterations, including mutation, copy number, expression, and methylation, are discovered by analyzing controlled-access next-generation sequencing and array data from the NCI TCGA (The Cancer Genome Atlas) and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) projects. We participate in the analysis of TCGA breast cancer and TARGET Wilms tumor genomic data. To support TCGA and TARGET project research, we require access to the controlled-access TCGA and TARGET data in dbGaP. The TARGET data are for pediatric research only. TCGA data are now available through the NCI Cancer Genomics Cloud Pilots, which allow us to upload our own tools and evaluate Cloud Pilot performance. The NCI Cloud Pilots use the same dbGaP credentials to access TCGA controlled-access data. TCGA data will be used to develop network analysis software and to perform an integrated analysis of genomics data with machine learning approaches.
Additional genomics data from the Functional Genomic Landscape of Acute Myeloid Leukemia (phs001657.v1.p1 and phs000159.v10.p5) will be used as validation datasets, which will be analyzed in the local computing environment only and will not be uploaded to any cloud storage. Both AML datasets will be stored and used on a local machine within the institution. The Proteogenomic Studies of Ovarian Tumor Responses to Agents Targeting the DNA Damage Response dataset (phs003152.v1.p1) will be used to predict tumor response to treatment using multimodal machine learning. We will use the genomic data from dbGaP along with the whole-slide imaging and clinical information from the IDC in a PyTorch-based multimodal model. We plan to reanalyze the lung adenocarcinoma RNA-seq, WGS, and clinical data from the Apollo1 study in dbGaP (phs003011.v1.p1) and extract features for a multimodal machine learning study. We will use the Apollo1 dataset as an independent testing dataset for a prognostic model that we are building from other lung cancer datasets. Yang, Lixing UNIVERSITY OF CHICAGO Identify new drug targets in pediatric cancer Feb06, 2019 approved Many different types of genetic variants, including single nucleotide variants (SNVs), copy number variants (CNVs), structural variations (SVs), microsatellite expansions and contractions, and transposable element insertions, contribute to cancer predisposition, disease progression, metastasis, and treatment response. In this project, we will focus on SVs in pediatric cancer genomes. Pediatric cancers generally have very few somatic SNVs, but some, such as adrenocortical carcinoma, pediatric high-grade glioma, and pediatric osteosarcoma, carry many somatic SVs. We aim to take advantage of large-scale sequencing data to identify pediatric cancer-specific disease-causing SVs and potential new drug targets.
Structural variations (SVs) are large-scale changes of DNA, which include deletions, duplications, inversions, translocations, and more complex forms. They typically affect more nucleotides than other forms of genetic variants. We aim to study SVs, a major source of genome instability, in pediatric cancer genomes. We will screen for genes deleted or disrupted by SVs, genes amplified or activated by SVs, and genes acquiring new functions through SVs in pediatric cancers. SVs form due to DNA replication errors or erroneous repair of DNA damage. We will infer how and when the SVs in pediatric cancers formed based on the nature of the SVs and their breakpoint properties. New cellular vulnerabilities may be revealed through the study of mechanisms of genome instability in pediatric cancers, and new therapeutic opportunities may be uncovered. Yang, Xinan UNIVERSITY OF CHICAGO Computational explorations of cis-acting mechanisms in high-risk neuroblastoma Jul25, 2013 closed Our goal is to characterize cis-regulatory elements dependent on the transcription factor n-myc (or c-myc) in high-risk neuroblastoma (HR-NB). We hypothesize that MYCN (or MYC)-amplification-associated genes and ncRNAs are co-regulated during oncogenesis and progression of aggressive forms of neuroblastoma, regardless of the structure or expression of the MYCN locus. Our preliminary data, based on novel computational analysis, suggest that HR-NB has previously uncharacterized expression patterns that underpin neuroblastoma oncogenesis. In addition, we expect that these signatures may represent novel and clinically relevant biomarkers and potential therapeutic targets. We aim to develop and test computational models that characterize myc-driven high-risk pediatric neuroblastoma (HR-NB). The aberrant transcription factors n-myc and c-myc serve as oncogenic drivers in approximately 20% and 10% of HR-NB patients, respectively.
However, the oncogenic role of myc proteins in HR-NB remains debated, specifically for high-risk patients with normal MYCN or MYC copy numbers. Through computational exploration of myc-dependent cis-acting mechanisms in HR-NB, we expect to discover new prognostic markers and therapeutic agents. With the advent of next-generation sequencing, we can now extend computational strategies developed for myc target genes to myc-dependent cis-acting noncoding regions. Given that most de novo enhancers are formed in progenitor cells during neural development (1), expanding the study of cis-acting mechanisms is essential to decipher the MYCN-driven tumor stem cell signature in HR-NB. Our general hypothesis is that TF-dependent enhancer activity coordinates enhancer transcription and thus can be learned by an appropriate machine-learning model of the kind that has successfully detected gene markers. On the gene side, we have identified a MYC-dependent gene regulatory network in HR-NB that impacts a tumor stem cell signature (2). Using a traversal algorithm, we have successfully bridged cancer biology with clinical applications and interpreted genomic data functionally (3). Building on both, we identified a personalized prognostic model from the somatic mutations and transcriptomic profiles of over 300 TARGET samples (4). On the noncoding genome, we have shown that altered enhancer transcription directly tracks enhancer activity (5). We also showed that polyadenylated transcription is capable of capturing, at least partly, HR-NB-associated enhancers (6). To enhance and validate our model, we are renewing access to the neuroblastoma transcriptomic and GWAS data. We will incorporate data from the TARGET repository with (epi)genetic information in GENCODE. We will submit all results from our analysis for publication without compromising dbGaP’s endeavors to secure and manage data access.
Yarmarkovich, Mark NEW YORK UNIVERSITY SCHOOL OF MEDICINE Characterizing the tumor antigen landscape across pediatric tumors Jul11, 2024 approved In recent years, immunotherapies have demonstrated the potential to cure various types of cancer. These therapies rely on knowledge of tumor-specific targets to which the immune response can be directed, and identifying such targets has been a major bottleneck in expanding these curative therapies. Our research is focused on uncovering these tumor-specific targets and developing novel therapies against them. In identifying tumor-specific targets, it is important to comprehensively characterize these targets in both tumor tissues and healthy tissues. There is an emerging class of “hidden antigens,” which are a very promising class of new targets but require new tools to identify. Identifying these targets involves analyzing the genetic makeup of samples taken from patients and cross-referencing the genetic variations we find with a database of genetic information from healthy tissues. By utilizing the valuable data resources of dbGaP, we plan to build a comprehensive database of genetic information from healthy tissues. We expect that the tools developed using these datasets will allow the identification of important new targets in childhood cancer and enable the development of new immunotherapies by our lab and the wider scientific community. Our lab aims to enable the development of personalized immunotherapy to benefit all pediatric cancer patients. We previously applied data from TCGA, GTEx, and TARGET to drive the development of peptide-centric CAR T cells (Yarmarkovich et al., Nature 2023), which are now poised to enter a clinical trial for neuroblastoma scheduled for 2024. Additionally, our use of these datasets has shed light on the cancer immunoediting process (Yarmarkovich et al., Frontiers in Immunology 2021). These developments were made possible through the invaluable data resources provided by dbGaP.
Building upon our previous work and innovations, we aim to comprehensively characterize the tumor antigen landscape across various pediatric tumor datasets, with the ultimate goal of discovering new actionable immunotherapy targets. We have established a computational pipeline to efficiently detect various neoantigens from RNA-seq data and plan to test its applicability to real pediatric tumor samples. To achieve this, we seek access to the TARGET dataset. We plan to reanalyze the raw FASTQ data using the same methodology to ensure methodological consistency. Because our pipeline uses the latest human Telomere-to-Telomere (T2T) reference for improved alignment accuracy, reanalysis of the raw data is indispensable for recalculating the coordinates of normal variants. The dbGaP TARGET dataset is intended for studies related to pediatric and childhood cancers. It is for non-commercial use, and we are committed to adhering to all data privacy and security measures, along with the publication policy stipulated by dbGaP. Ye, Chaoyang BLUEPRINT MEDICINES, INC. Molecular determination of pediatric tumor sub-populations to guide the development of next generation targeted therapies Jun04, 2018 closed Blueprint Medicines is committed to developing the most effective, targeted therapies for cancer patients. To that end, we are exploring the molecular underpinnings of individual pediatric tumors. Analysis and annotation of the data comprised by the TARGET project is critical so that we can tailor therapies to the molecular profile that underlies the disease of individual young cancer patients. Blueprint Medicines is committed to developing the most effective, targeted therapies for cancer patients. A prerequisite to this objective is to define the molecular alterations driving the growth of individual tumors.
This focus will provide the insight we need to determine the appropriate molecular targets, and the subsequent combinations of those targets, for the next generation of drugs that we are developing. Analysis and annotation of the data comprised by the TARGET project is critical so that we can tailor therapies to the molecular blueprint that underlies the growth of cancers in individual pediatric cancer patients. Our objective is to analyze the molecular and genetic diversity of pediatric cancers to better define tumor subtypes as well as the pathways on which they rely. We will use newly developed analysis algorithms to better predict the functional consequences of mutations and structural aberrations that we discover in the data. We will look for molecular dependencies that can be exploited with new drugs and drug combinations. This knowledge will be used to inform the development of a new generation of targeted therapies tailored to genotypes that define these subpopulations. We acknowledge the data use limitations for the TARGET dataset: since the goal is to discover molecular drivers of pediatric cancers, it is necessary to conduct the project on pediatric data. The data will not be used for the development of methods, software, or other tools. Yi, Song UNIVERSITY OF TEXAS AT AUSTIN Multi-omics based deep learning model for therapeutic target discovery in pediatric cancer Feb10, 2022 expired Precise classification of pediatric cancer patients will benefit pediatric cancer diagnosis and effectively improve patient survival. In addition, effective therapeutic targets for treating relapsed pediatric cancer patients with poor prognoses are still needed. Toward this end, integrating multi-omics data holds great potential, despite technical challenges and limitations.
To address these challenges, we plan to develop and optimize a deep learning-based framework for pediatric cancer patients that can robustly distinguish a poor-prognosis subgroup from a good-prognosis subgroup. We expect the framework to achieve promising predictive performance in distinguishing high-risk from low-risk pediatric cancer patients. After patient subgrouping, we will filter potential multi-omics signatures and use them to understand relapsed pediatric cancer development and progression from a multi-omics perspective. Finally, we will evaluate these ‘landmark’ multi-omics signatures and rank them by importance score. Some of the top landmark signatures we identify may serve as potential therapeutic targets for treating pediatric cancer. Pediatric cancer incidence is increasing worldwide. Heterogeneity and complex biological factors in pediatric cancer make prognosis prediction challenging. In addition, treatment strategies for relapsed pediatric cancer are still limited. Therefore, effective computational methods to distinguish relapsed pediatric cancer from low-risk cancer would be valuable for diagnosis and prognosis. Based on these multi-omics data, we may gain new insight into how pediatric cancer develops and progresses systemically. Also, by taking advantage of the computational model, we may identify novel therapeutic targets for relapsed pediatric cancer patients. Using our computational framework, we will identify the potential multi-omics features most correlated with relapsed pediatric cancer. Next, we will build and optimize our subgrouping models using a combination of potential multi-omics signatures or single-omics data. Furthermore, we will apply feature extraction algorithms to the prediction models and rank these multi-omics signatures according to their relative importance when subgrouping the samples.
Finally, we will designate the top-ranked signatures as landmark multi-omics signatures contributing to effective pediatric cancer patient classification and identify potential therapeutic signatures associated with relapsed pediatric cancer. Yu, Jiyang ST. JUDE CHILDREN'S RESEARCH HOSPITAL Systems Biology analysis of pediatric pan-cancer May25, 2018 approved Over the past few years, several pediatric pan-cancer studies generating large numbers of cancer genomic profiles, including the TARGET project, have dramatically advanced our understanding of genetic lesions and the genomic landscape of pediatric cancer. However, a significant number of patients lack known genetic alterations or show no changes at the mRNA or protein levels of known oncogenic drivers. Researchers at St. Jude have demonstrated that network-based systems biology approaches can capture “hidden” drivers that lack obvious genetic or other molecular changes. In this study, investigators at St. Jude Children’s Research Hospital will integrate data-driven network-based computational approaches with experimental methods to perform pediatric pan-cancer studies towards identification of “hidden” drivers, therapeutic targets, biomarkers, and combination therapies for progression, resistance, metastasis, and relapse of pediatric cancer. The NCI TARGET project has generated invaluable molecular profiles and genomic insights across five pediatric cancer types (ALL, AML, NBL, OS, kidney cancers). A deep analysis using data-driven systems biology approaches is needed to study pediatric cancers at the network level, which can guide identification of cancer-specific therapeutic targets, combinations, and biomarkers.
With our investigators’ expertise in systems biology, we plan to perform the following analyses: a) reverse engineer cancer type- or subtype-specific regulatory/signaling/immunological networks for pediatric cancers (leukemia, solid tumors, and brain tumors) from transcriptomic data in TARGET and additional datasets obtained via St. Jude collaborators or publicly available on GEO; b) use data-driven and context-specific pediatric cancer interactomes to infer and reconstruct activity maps of candidate regulators across various pediatric cancer types; c) integrate the computationally inferred activity profiles with genetic, epigenetic, proteomic, and functional genomic (RNAi, CRISPR) screens from TARGET and collaborators at St. Jude to identify drivers and therapeutic targets for tumorigenesis, progression, poor survival, metastasis, relapse, and drug resistance; d) identify biomarkers associated with targets identified in c; and e) identify network modules specific to cancer subtypes that may predict potential combination therapies. The proposed research will use TARGET data only for pediatric cancer research. This work will be in collaboration with various experts in leukemia (Drs. William Evans, Jun J. Yang, Charles Mullighan), solid tumors (Drs. Anand Patel, Michael Dyer), genomics (Dr. Jinghui Zhang), immunology (Dr. Hongbo Chi), and proteomics (Dr. Junmin Peng) at St. Jude Children’s Research Hospital. YUAN, JIAPEI INSTITUTE OF HEMATOLOGY AND BLOOD DISEASE HOSPITAL, CHINESE ACADEMY OF MEDICAL SCIENCES TARGET: Hematological disease clinical cohort data in pediatric patient Jun15, 2023 approved Our study aims to predict genetic mutation sites in children with hematological cancers. However, the computational prediction method we are using may produce numerous false-positive targets. To enhance the reliability of our predicted dataset, we need to perform cross-validation across various patient groups.
Fortunately, TARGET is an exceptional database that holds a vast number of pediatric patient samples related to our research interests. Consequently, we are keen to obtain access to these valuable datasets. Objectives of the proposed research: The focus of our study is the identification of potential targets of disease-risk SNPs and structural variations in children with hematological cancers. Our research plan involves predicting and validating these targets through transcriptional analysis. Study Plan: Hematologic diseases such as acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) in pediatric patients are influenced by genetic variants. With the advancement of high-throughput sequencing technology, many risk SNPs and structural variations have been identified for such diseases. However, potential targets of these variants in pediatric cancer patients are still undefined. Our study aims to predict and validate potential targets of risk SNPs and structural variations in pediatric hematologic diseases, including ALL and AML. To validate our predictions, we will integrate data from the TARGET cohort to study the association between genetic variants and phenotypic characteristics in pediatric patients. This will help us determine the value of these potential targets for developing more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. By analyzing patient whole genome and whole exome sequencing (WGS and WXS) and transcriptome (RNA-seq) data, we can compare the expression levels of potential targets in patients with different genotypes. This will aid in the identification of potential biomarkers for the diagnosis and treatment of childhood cancer. Analysis plan: In this study, firstly, we will use multiple omics data to predict targets of risk SNPs and structural variations for pediatric ALL and AML patients.
Secondly, we plan to validate the genotype and RNA expression level of these targets in pediatric patient cohort datasets. Fortunately, RNA-seq and WGS/WXS data have been generated for hundreds of pediatric ALL and AML patient samples in the TARGET cohort, which are valuable for eQTL analysis to validate the impact of SNPs and SNVs on the expression levels of potential targets. We are therefore applying for access to download these patient data. YUAN, YAN UNIVERSITY OF PENNSYLVANIA Human Viruses in osteosarcoma Apr01, 2020 expired We will use the TARGET OS dataset, phs000218.v22.p8, to analyze the association between osteosarcoma and KSHV, an oncogenic virus that usually causes Kaposi’s sarcoma in humans. Early childhood infection by KSHV has been reported. We have found a link between osteosarcoma and KSHV infection in pediatric clinical samples from a group of minority patients in China. In this study, we want to further investigate the association between childhood osteosarcoma and KSHV infection in a large and diverse population to further support our scientific findings. Additionally, a clinical diagnostic methodology based on KSHV infection profiles could be developed to help healthcare professionals identify the stages of osteosarcoma development in children. The purpose of accessing the TARGET OS dataset is to identify potential infectious etiologies associated with childhood osteosarcoma and to develop methodologies to diagnose osteosarcoma development in children. According to our recent clinical studies, we have found that Kaposi’s sarcoma-associated herpesvirus (KSHV) is linked to pediatric osteosarcoma development in the Uyghur population in Xinjiang, China. There is a high prevalence of early childhood infection by KSHV in that area [1].
Most surprisingly, using RNA-seq we found different patterns of KSHV viral gene expression between early and late osteosarcoma development in children's clinical samples (age 5 to 20 years). For example, in early clinical-stage osteosarcoma samples, KSHV PAN RNAs are highly expressed. However, K2 (vIL-6) expression is significantly reduced in late clinical-stage osteosarcoma samples. Instead, KSHV shows significant expression of PAN RNAs. Our manuscript on this finding is under review in the New England Journal of Medicine. Therefore, clinical methodologies based on KSHV gene expression profiling could help healthcare professionals diagnose the phases of osteosarcoma development, thus providing optimized plans to treat osteosarcoma at different stages in children. Our finding is based on patients from a specific ethnic (Uyghur) population in a specific geographic area of China. We would like to extend our research to patients from other populations in the world (including the USA) by exploring more genomic samples to support our scientific evidence and to develop clinical diagnostic methodologies. In this study, we will analyze genomic data from the Database of Genotypes and Phenotypes (dbGaP), particularly RNA-seq data in the TARGET OS dataset. In addition, we will explore other infectious etiologies that could be linked to pediatric osteosarcoma development. [1] Cao Y, Minhas V, Tan X, Huang J, Wang B, Zhu M, Gao Y, Zhao T, Yang L, Wood C. High prevalence of early childhood infection by Kaposi's sarcoma-associated herpesvirus in a minority population in China. Clin Microbiol Infect. 2014 May;20(5):475-81. doi: 10.1111/1469-0691.12340. Epub 2013 Aug 28. PMID: 23992104; PMCID: PMC3868646.
Yue, Tao UT SOUTHWESTERN MEDICAL CENTER Identification of genes implicated in the initiation and progression of osteosarcoma Nov15, 2022 closed This study will use multivariate Cox regression analysis to select genes in patients with osteosarcoma and will use the expression of these genes to divide the patients into high-risk and low-risk groups. First, I will construct a prognostic model based on each patient’s risk value and compare survival between the high-expression and low-expression groups. Second, I will compare tumor-infiltrating immune cells and inflammatory gene expression between the two groups. Third, the correlations among these genes will be analyzed. In the high-risk group, immune cells with higher tumor infiltration included M0 macrophages, whereas immune cells with lower infiltration included resting mast cells, regulatory T cells (Tregs), and monocytes. Ultimately, the osteosarcoma-related genes identified in this way may become promising therapeutic targets in osteosarcoma. Close exploration of the relationship between various genetic and epigenetic factors of osteosarcoma is very important for screening promising therapeutic targets. In this study, I will use multivariate Cox regression analysis to select genes in patients with osteosarcoma and use the expression of these genes to divide the patients into high-risk and low-risk groups. I will construct a prognostic model and explore its correlation with immune cells and inflammatory markers. I will not combine the requested datasets with other datasets outside of dbGaP, and I am focusing on outcomes or hypotheses that were the focus of the primary study (or studies).
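The risk-stratification step described above can be sketched roughly as follows. This is a minimal illustration, not the requestor's pipeline: it assumes the gene coefficients have already been estimated by a fitted multivariate Cox model, uses hypothetical gene names and follow-up data, assumes a median cutoff on the risk score (a common but not stated choice), and compares the two groups with a plain Kaplan-Meier estimator rather than any specific survival package.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Cox linear predictor per patient: sum over genes of beta_g * expression_g."""
    return [
        sum(beta * patient[gene] for gene, beta in coefficients.items())
        for patient in expression
    ]


def split_by_median(scores):
    """Label a patient 'high' risk if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as a list of (event_time, survival_probability).

    events[i] is 1 for an observed death and 0 for a censored follow-up time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical coefficients, expression values and follow-up data, for illustration only.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5}
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 0.5, "GENE_B": 2.0},
    {"GENE_A": 3.0, "GENE_B": 0.5},
    {"GENE_A": 1.0, "GENE_B": 1.5},
]
times = [14, 60, 9, 72]  # follow-up in months
events = [1, 0, 1, 1]    # 1 = death observed, 0 = censored

labels = split_by_median(risk_scores(patients, coefficients))
for group in ("high", "low"):
    idx = [i for i, label in enumerate(labels) if label == group]
    print(group, kaplan_meier([times[i] for i in idx], [events[i] for i in idx]))
```

In practice the curves would be compared formally (for example with a log-rank test) and the cutoff chosen or validated on independent data; both steps are outside this sketch.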
Yunes, José Andrés HOSPITAL INFANTIL BOLDRINI Characterizing SVs and TADs in pediatric ALL Dec02, 2020 closed Despite improvements in the success of acute lymphoblastic leukemia (ALL) therapy, this disease is still the most common cause of cancer-related death in children worldwide. Outcomes of ALL treatment have improved substantially, and cure rates now reach about 80%. However, even with intensified treatment for high-risk cases, about 20% of patients are not cured. In this study, we will use ALL patient sequencing data provided by the TARGET consortium to analyze structural variations (SVs) and gene expression within topologically associating domains (TADs) to explore how SVs may disrupt the regulatory roles of long non-coding RNAs (lncRNAs) in gene expression in normal and disease cases. Acute lymphoblastic leukemia (ALL) affects both adults and children but has its highest incidence in children up to four years of age. It is the most common type of cancer in childhood, accounting for about 25% of all cancer cases and 80% of all leukemias that occur up to the age of 15 years. Despite improvements in the success of ALL therapy, this disease is still the most common cause of cancer-related death in children worldwide. It is a heterogeneous disease from both genetic and clinical points of view. Adult and pediatric ALL can have distinct genomic signatures, such as different gene fusion patterns and specific gene mutations involving epigenetic regulation and B-cell development. Some molecular subtypes of ALL have different prognoses and can be identified by the presence of certain genetic alterations. Some pediatric ALL subtypes have been characterized with respect to their mutation profiles; however, their "normal" genomic structural variation (SV) profiles have yet to be analyzed. At the population level, structural genomic variation seems to account for more genetic differences than any other class of mutation.
Another issue that has only recently begun to be studied is the effect of chromosomal rearrangements on TAD (topologically associating domain) regions. TADs are chromosomal modules that regulate gene expression by encapsulating genes and cis-acting regulatory elements. Mutations that affect TAD regions and their intrinsic elements, such as protein-coding genes, long non-coding RNAs (lncRNAs), and epigenetic regulators, can lead to the development of some types of pediatric cancer. In this project, we will analyze public RNA-seq, exome, and genomic structural alteration data on pediatric ALL available in NCBI dbGaP. Since we are interested in unveiling genetic alterations in pediatric ALL and their outcomes in children, we are requesting access to the TARGET data. We have a twofold objective: (1) to characterize SVs in pediatric ALL subtypes and (2) to evaluate the contribution of those SVs, and of any resulting alterations in TADs, to the modulation of lncRNA gene expression. All data will be analyzed securely using an in-house pipeline on our private Linux server, so there are no additional risks to the TARGET data and its participants. Considering the importance of chromosomal translocations and insertions/deletions in ALL genesis, we believe that unraveling their effects on chromosomal topology may bring us closer to understanding the first stages of the subversion of cellular homeostasis with respect to gene expression regulation. YUNG, CHRISTINA ONTARIO INSTITUTE FOR CANCER RESEARCH GDC QC Analysis of TARGET Pediatric Cancer Clinical Dataset Feb27, 2019 closed The NCI Genomic Data Commons (GDC) is a data service providing authorized researchers access to cancer genomics data in a uniform way and from a single data repository. It is also designed to serve as a foundation for future expanded data access, computational capabilities, and bioinformatics cloud research.
The Therapeutically Applicable Research To Generate Effective Treatments Clinical Dataset (phs000218) is a project whose data are being distributed via the GDC. As part of the release process for phs000218, the OICR team would like to QC the results. Because the GDC uses dbGaP authorization to download controlled-access data, this access is required for the GDC team to verify that the portal is working correctly and that the data are in the correct format. The access will also be used to verify the integrity of the TARGET data in future GDC data and software releases. The National Cancer Institute (NCI) Genomic Data Commons (GDC) is a data service providing authorized researchers access to cancer genomics data in a uniform way and from a single data repository. It is also designed to serve as a foundation for future expanded data access, computational capabilities, and bioinformatics cloud research. The GDC provides authorized researchers access to The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research To Generate Effective Treatments (TARGET, phs000218), and other datasets authorized by the Center for Cancer Genomics at the NCI, including the Foundation Medicine Adult Cancer Clinical Dataset. Note that all external users of the GDC must independently apply for dbGaP access and receive approval from the Data Access Committee before getting access to the data through the GDC. The Ontario Institute for Cancer Research (OICR) is developing the Data Portal and the Data Analysis, Visualization and Exploration (DAVE) tool for the GDC. As part of the development and release process, the OICR team needs to access phs000218 to verify that the GDC Data Portal is functioning correctly, that the data are visualized correctly, and that the downloaded data are in the correct format. The access will be used to verify the visualization and analysis of the TARGET clinical data in future GDC data and software releases.
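One routine part of verifying that downloaded files are intact is checksum comparison, which can be sketched as follows. This is a hedged illustration, not the GDC or OICR QC procedure: it assumes a tab-separated manifest with columns named `filename`, `md5` and `size`, and assumes each file sits directly in the download directory; both the column names and the layout should be checked against the actual manifest and download tool.

```python
import csv
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks so large files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5, expected_size=None):
    """True if the file matches the expected MD5 (and byte size, when one is given)."""
    path = Path(path)
    if expected_size is not None and path.stat().st_size != expected_size:
        return False
    return md5sum(path) == expected_md5.lower()


def verify_manifest(manifest_path, download_dir):
    """Return the filenames from a tab-separated manifest that are missing or corrupted.

    Assumes columns named 'filename', 'md5' and 'size', and that each file sits directly
    in download_dir; adjust both assumptions to the real manifest and download layout.
    """
    failures = []
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            target = Path(download_dir) / row["filename"]
            if not target.is_file() or not verify_download(target, row["md5"], int(row["size"])):
                failures.append(row["filename"])
    return failures
```

Calling `verify_manifest("manifest.txt", "downloads")` (hypothetical paths) returns an empty list when every listed file is present and matches; checking the size first is a cheap way to reject truncated downloads before hashing.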
Zage, Peter BAYLOR COLLEGE OF MEDICINE The Role of FGFR4 Gene Polymorphisms in the Incidence and Pathogenesis of Pediatric Cancer Aug21, 2014 closed The fibroblast growth factor receptor (FGFR) family plays important roles in the growth and spread of many cancers, and a polymorphism (a change in the DNA sequence) in the gene for the fourth FGFR (FGFR4) is associated with the incidence and aggressiveness of many cancers. We have identified a likely association between this FGFR4 polymorphism and the incidence of neuroblastoma in children. Some tumor cells control their levels of growth factor receptors (GFRs) by directing used receptors for degradation. We have identified a connection between GFR degradation and tumor growth. The FGFR4 gene polymorphism results in reduced degradation of the receptor, leading to increased FGFR4 levels and activity, which suggests a link between the FGFR4 polymorphism and tumor growth via reduced degradation. We propose to determine the association of the FGFR4 gene polymorphism with survival rates for children with neuroblastoma. These studies will provide a better understanding of the ways pediatric solid tumors develop and respond to therapy, leading to new treatments for children with cancer. Fibroblast growth factors (FGFs) and their receptors (FGFR1-4) play important roles in tumor growth and progression, and FGFR4 expression and activity have been linked to the pathogenesis of a variety of cancers. A germline polymorphism in the FGFR4 gene is associated with increased incidence and poor outcomes for many malignancies. The FGFR4 variant also demonstrates reduced degradation and sustained activation and signaling. The association of the FGFR4 genotype with neuroblastoma incidence, clinical features, and patient outcomes, however, has not been established. We have shown that ligand-induced FGFR4 degradation is reduced in neuroblastoma cells.
We have also shown in an institutional cohort that the FGFR4 Arg388 allele is associated with increased neuroblastoma incidence. These results suggest a role for FGFR4 genotype and trafficking in neuroblastoma pathogenesis, and they suggest that functional mutations and polymorphisms in genes involved in growth factor receptor (GFR) trafficking are associated with neuroblastoma incidence and patient outcomes. We propose to use genome-wide single-nucleotide polymorphism data and whole-exome DNA sequence data to determine the associations of mutations and polymorphisms in genes relevant to GFR trafficking with neuroblastoma incidence and patient outcomes. These data will allow us to determine associations that would not be possible to detect using other databases. We will use a secure database at Baylor College of Medicine to store these large datasets and do not anticipate any increased risk to participants. Our specific aims are 1) to determine the correlation of FGFR4 genotype with neuroblastoma incidence by comparing the FGFR4 genotype in neuroblastoma patients with established population-based norms and with an institutional race- and sex-matched control population; 2) to determine the correlation of FGFR4 genotype with neuroblastoma patient outcomes and clinical features using patient-related data from COG; and 3) to identify mutations in genes relevant to growth factor receptor trafficking, including UBE4B, UBE4A, Mig6, Cbl, USP8, and Hrs. ZANOTTO, PAOLO UNIVERSITY OF SAO PAULO HERV expression in tumor samples of neuroblastoma from pediatric patients Nov24, 2015 expired The Laboratory of Molecular Evolution and Bioinformatics of the University of São Paulo is developing a research project on human endogenous retroviruses (HERVs) and their possible roles in pediatric cancer. HERVs have been incorporated into our genome and are generally not active in normal cells.
In tumor cells, however, cell cycle regulation is compromised, and it has been postulated that this context allows some HERVs to become active again. The purpose of this research is to evaluate the activity of these specific integrated viruses by quantifying their specific genetic material and relating this information to the human genome. These first results will be compared with ongoing analyses of Brazilian pediatric patients from our cohort who were diagnosed with neuroblastoma. Although this study has no immediate therapeutic or diagnostic purpose, the knowledge generated from this effort will elucidate important aspects of the role of HERVs in normal and pathological conditions, which will be of great value specifically for patients with pediatric cancer. Cancer accounts for over 12% of all deaths worldwide, affecting over 7 million people. Neuroblastoma is both the most frequent and the most lethal solid tumor detected during childhood. Moreover, this tumor shows atypical behavior regarding progression: it may spontaneously go into remission or evolve aggressively over a short period of time. Viruses are known etiological agents of human neoplasms, acting mainly through immunosuppression and host genome alterations driven by the expression of viral oncoproteins. Furthermore, viruses may downregulate tumor suppressor genes and alter protein expression through viral DNA integration into the host genome. Several human endogenous retrovirus (HERV) families are present in the human genome at different levels of integrity and are expressed differently under different conditions. HERV activity is commonly tissue-specific and has been correlated with pathologies such as cancer, autoimmune diseases, and even the presence of other viruses. Although there are extensive studies of HERV structure, function, and impact on the host genome, the factors leading to differences in HERV expression patterns are not fully understood.
The main goal of this project is to detect HERV expression in neuroblastoma tumor samples from pediatric patients aged 0 to 18 years. For this, we intend to use the assembled whole human genome and the raw transcriptome data available in dbGaP. Based on those datasets, we intend to evaluate the extent to which HERV transcripts are expressed in comparison with our own data. Our methodology includes the EGene pipeline generator (http://www.coccidia.icb.usp.br/egene/) developed by Durham et al. We intend to localize the origin of our transcripts using individual complete assembled genomes as reference. Finally, sequence similarity analyses will be performed to look for any correlation between our samples and those in dbGaP. Zhan, Zhou ZHEJIANG UNIVERSITY Analysis of clone evolution and molecular characteristics of pediatric hematologic malignancies. Jun15, 2023 approved Most pediatric hematologic malignancies (HMs) originate from aberrant genomic alterations of hematopoietic progenitor cells. Additionally, the prognosis and clinical response of patients with HMs are closely associated with molecular genetic characteristics, such as chromosomal translocations and recurrent mutations. We plan to conduct a deep analysis of the molecular characteristics of pediatric HM samples, such as gene fusions, gene mutations, and abnormal gene expression, as well as leukemic clone structure and evolution, in comparison with sequencing data from other datasets, which would benefit research on pediatric HM pathogenesis and precision treatment. Studies on the mechanisms of pediatric tumor onset and malignant progression, and research on targeted therapy, currently do not meet clinical needs.
Thus, we are deeply interested in revealing the correlation between genetic abnormalities and clinical manifestations in pediatric hematologic malignancies (HMs), especially the diverse subtypes and potential oncogenic mechanisms caused by RNA splicing or DNA fragmentation leading to protein diversification. Interestingly, numerous fusion genes serve as classification markers for pediatric HMs, but many fusion genes remain uncharacterized. We aim to conduct in-depth studies of these uncharacterized fusion genes to explore their oncogenic functions and mechanisms, and to deepen our understanding of classic fusion genes. Additionally, numerous studies have shown significant differences between pediatric and adult tumors in their omics profiles, particularly in transcriptomic research. Hence, investigating the fusion and transcriptomic differences between adult and childhood cancers can give us insight into the process of tumorigenesis and malignant progression. Zhang, Jie INDIANA UNIV-PURDUE UNIV AT INDIANAPOLIS Integrative Genomic Analyses for Pediatric Cancer Precision Medicine May11, 2018 closed The goal of this study is to identify biomarkers, genes, and pathways associated with pediatric cancers, especially osteosarcoma, rhabdomyosarcoma, and leukemia, using advanced bioinformatics and computational methods. This study aims to use the data to identify genetic mutations, gene expression signatures, and integrative markers that predict the outcome of pediatric cancer patients, especially for osteosarcoma and blood cancers, by means of integrative genomic analyses. In addition, the research is expected to discover driver genes, networks, and pathways related to the development and progression of these cancers. We will process all the genomic and gene expression data using advanced bioinformatics and gene network mining methods and machine learning algorithms to achieve the above aims.
Specifically, for the phenotype, we will focus our research on patient survival status and time for prognosis prediction, and on response to treatment for predictive markers. Our current goal is to determine cancer phenotypes based on the germline predispositions available in the dbGaP datasets. To do this, a combination of datasets as well as separate analyses of single datasets will help us reach our goal. Zhang, Jinghui ST. JUDE CHILDREN'S RESEARCH HOSPITAL Genomic Determinants of Pediatric Cancers Oct22, 2020 approved Recent technological advances in DNA sequencing and associated methods have allowed cancer researchers to generate unprecedented amounts of genetic data. The value of these data sets only increases when they are combined with each other to identify similarities and differences in cancer biology across many patients. The aim of our research is to identify the genetic and other molecular mechanisms that cause pediatric cancer. Our ultimate goal is to translate novel findings from these data into methods for preventing, treating, and curing pediatric cancer. Due to the similarities that exist between pediatric and adult cancers, we would like to use the data from TCGA to increase the power of a wide range of genomic analyses that we are employing within our research. We plan to use the data available through TCGA to complement additional genetic data sets for the purpose of furthering our understanding of the genetic risk factors and molecular mechanisms underlying pediatric cancer. The ultimate goal of this research is to translate these findings into preventative measures, treatments, and cures for pediatric cancers. While there are both similarities and differences between pediatric and adult cancers, data pertaining to each have the potential to provide important insights into the other. Combining the TCGA data with other data sets in our pediatric cancer studies increases the power of a wide range of analytic methods.
In particular, we aim to use the TCGA data in the following ways: 1) to look for relationships between somatic variants found within specific cancer subtypes in the TCGA data set and germline variants discovered in our pediatric cancer patients; 2) to examine gene expression data to determine whether specific genetic variants affect mRNA levels or cause aberrant splicing in known or putative oncogenes and tumor suppressors. The variants in these cases could be single nucleotide variants, small indels, or larger structural variants. We will also characterize the allelic expression status of genes and visualize this information along with genomic variants on PeCan through the ProteinPaint genome browser; 3) to increase the power of pathway-based genetic analyses in order to identify novel actionable genes; 4) to evaluate the impact of variants in noncoding regulatory regions, including regions both proximal to and distant from canonical genes; 5) to analyze mutational signatures and compare them between pediatric and adult cancers; 6) to identify targets for immunotherapy using TCGA gene expression data; 7) to better understand technical artifacts in next-generation sequencing data in order to separate real variants from artifacts. TCGA data will be combined with PCAWG, PCGP, TARGET (under accession phs000218, requested via project 8112), and sequencing data from Shanghai Children’s Medical Center to observe how genomic features are similar and different in adult vs. pediatric cancers. Zhang, Jinghui ST. JUDE CHILDREN'S RESEARCH HOSPITAL Pediatric pan cancer research project Feb05, 2015 approved In the past few years, several pediatric pan-cancer studies have generated large amounts of NGS data, including the TARGET pediatric cancer genomic data. We plan to combine these large datasets to investigate and characterize the pediatric cancer genome landscape and cancer type-specific genetic lesions in a pan-cancer study.
The combined data will significantly increase statistical power to identify the most important driver mutations, actionable genes, and disrupted genetic networks. The information gained could therefore better guide treatment for individual patients. NCI TARGET has generated large-scale NGS data for five pediatric cancers, and a pan-cancer study is expected to provide new insight into the similarities and differences among these pediatric cancers. We plan to combine these large datasets to investigate and characterize the pediatric cancer genome landscape and cancer type-specific genetic lesions in a pan-cancer study. The combined data will significantly increase statistical power for the following analyses: 1) characterization of the frequency, spectrum, and context of point mutations, which aims to reveal the mutational landscape across the studied pediatric cancers; 2) analyses of large-scale genomic disruption (copy number variation and structural variation) to better understand how these variations promote tumorigenesis; 3) pathway-based mutation profiling. The complexity and heterogeneity of tumors require systematic profiling at multiple layers of genome-scale information, which could help to classify cancer subtypes and find driver mutations and actionable genes; 4) investigation of mutations in non-coding regions. For each aim, our study will be performed on single cancer types, subsets of cancer types, and all cancers as a single group to maximize our opportunity to make novel discoveries. 5) Combining the TARGET data set with other pediatric cancer data sets generated from the same subsets of tumors will increase the power for identifying driver mutations. 6) We will determine whether gene expression (RNA-seq) profiles correlate with specific mutations or mutational signatures. The proposed research will use TARGET data only for pediatric cancer research.
Beyond the research goals above, we will continue to focus on the etiology of structural rearrangements that are frequently the main drivers of pediatric cancers, such as TCF3-PBX1 rearrangements. We will study these patterns using the large volume of TARGET datasets and cross-validate our findings using our published Pediatric Cancer Genome Project dataset and samples sequenced at Shanghai Children’s Medical Center. We will also use the adult TCGA data set for comparison in these studies. Zhang, Jinghui ST. JUDE CHILDREN'S RESEARCH HOSPITAL Enhance pediatric cancer tumor-subtype classification by integrating NCI TARGET data with St Jude PCGP data Dec22, 2020 approved Pediatric cancer is a rare disease with around 15,000 new cases per year in the United States. Given the rarity of this disease, it is vital to aggregate datasets in order to achieve scientific discoveries that will aid the treatment of these patients. The ability to classify these cancers using genomic and transcriptomic molecular data is very important for diagnosis and for determining treatment strategies for a given patient. Here we propose to extend our pediatric cancer transcriptome map using an existing pipeline to include data generated by NCI TARGET in addition to data generated at St. Jude. Further, we propose to assess the predictive ability of our pediatric pan-cancer classifier using data generated by NCI TARGET. We have recently developed an analytical pipeline, “RNA-Seq Expression Classification,” for classifying pediatric cancers by gene expression, using 1,500 RNA-seq datasets hosted on St. Jude Cloud and generated by the St. Jude/Washington University Pediatric Cancer Genome Project, the St. Jude Genomes4Kids Clinical Protocol, and the St. Jude Real Time Clinical Protocol.
Feature counts from RNA-seq were normalized using DESeq2’s variance stabilizing transformation, and batch effects (read length (bp), library strandedness (stranded forward, stranded reverse, or unstranded), RNA selection method (PolyA versus total RNA), and read pairing (single- versus paired-end)) were removed using ComBat (sva package). The top 1,000 most variably expressed genes, ranked by median absolute deviation, were then selected from each of the three major cancer types, after which two-dimensional t-distributed stochastic neighbor embedding (t-SNE) was performed. The pipeline has recently been published on St. Jude Cloud (https://platform.stjude.cloud/workflows/rnaseq-expression-classification, McLeod, Gout et al, Cancer Discovery, 2020). The proposed study will enable integration of NCI TARGET RNA-seq data with St. Jude samples, which will increase the diversity of cancer subtypes in this pediatric cancer map. We will also incorporate the genomic variants we classified in our published pan-TARGET study (Ma et al, Nature 2018) using a recently developed visualization feature that enables query and visualization of genomic variants considered to be subtype classifiers. In addition, we are developing a pediatric pan-cancer classifier using the abovementioned RNA-seq data hosted on St. Jude Cloud. Here, the NCI TARGET RNA-seq data will serve as a test dataset, along with two other non-dbGaP data sets (ICGC and CCI ZERO data), to assess the prediction accuracy of the developed pan-cancer classifier.
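The gene-selection step of the pipeline described above (keeping the most variable genes by median absolute deviation) can be sketched in plain Python. This is a minimal illustration, not the published St. Jude implementation: it assumes the values are already normalized and batch-corrected (the DESeq2 variance stabilizing transformation and ComBat are R/Bioconductor steps not reproduced here), omits the t-SNE embedding, and uses hypothetical gene names. In the described workflow it would be applied separately to each cancer type with `n_top=1000`.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


def top_variable_genes(expression, n_top):
    """Names of the n_top genes with the largest MAD across samples.

    expression maps gene name -> normalized values (one per sample), e.g. values
    already on a variance-stabilized scale. Ties are broken alphabetically so the
    selection is deterministic.
    """
    ranked = sorted(expression, key=lambda gene: (-mad(expression[gene]), gene))
    return ranked[:n_top]


# Hypothetical genes and values for one cancer type, for illustration only.
expr = {
    "GENE_A": [5.0, 5.1, 4.9, 5.0],
    "GENE_B": [2.0, 8.0, 3.0, 9.0],
    "GENE_C": [1.0, 1.0, 6.0, 1.0],  # one outlier sample; its MAD is 0, so it ranks last
}
print(top_variable_genes(expr, 2))  # ['GENE_B', 'GENE_A']
```

The example shows why MAD is a robust choice: GENE_C varies only because of a single outlier sample, so its MAD is zero, whereas a plain variance ranking would have picked GENE_C over GENE_A.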
Zhang, Jinsong SAINT LOUIS UNIVERSITY Exploring novel therapeutic targets in pediatric leukemias Nov06, 2015 closed Chromosomal translocations are frequently involved in pediatric leukemias. Many of these translocations affect genes that function in transcription regulatory pathways. One such example is CBFA2T3 (also called myeloid translocation gene 16, MTG16), which forms aberrant leukemia fusion proteins such as CBFA2T3-GLIS2 that are frequently observed in pediatric leukemias. Recently, we reported that CBFA2T3 is an important player in leukemia relapse, which is the main cause of treatment failure in leukemias. Based on these studies, the goal of this project is to further understand the mechanisms underlying the relapse-promoting function of CBFA2T3. Given that relapse is also associated with the accumulation of gene mutations in leukemia stem cells and pre-leukemia stem cells, the current study will explore the relationship between CBFA2T3 expression and the accumulation of these mutations. This study will benefit from the use of the TARGET leukemia datasets, which contain a large number of patient-matched normal, primary, and relapsed samples. The results will help us better understand the role of CBFA2T3 in relapse and will establish CBFA2T3 and possibly other genes as new therapeutic targets for the treatment of many pediatric leukemias. Raw data from the patient samples will be downloaded from the TARGET acute leukemia datasets, including both AML and ALL. They will be analyzed using established bioinformatics pipelines, including the GATK best practices workflows.
First, somatic and germline variants will be called using these pipelines to compare patient-matched normal, primary, and relapsed leukemia samples. These variants will then be filtered to focus on mutations associated with pre-leukemia stem cells (pre-LSCs) and leukemia stem cells (LSCs). Second, parallel studies will determine the expression levels of CBFA2T3 and correlate them with the frequency of pre-LSC and LSC mutations. This will be done by comparing the changes in these levels during relapse for each patient. Third, the results will be correlated with the clinical profiles of the patients, including survival, cytogenetics, and mutation information. These studies are expected to elucidate whether CBFA2T3 expression is specifically associated with LSC mutations, pre-LSC mutations, or both. Finally, given that many mutated genes are transcriptional and epigenetic regulators, we will also attempt to determine the impact of these mutations on gene expression and the associated biological pathways by differential gene expression (DGE) analysis, gene set enrichment analysis (GSEA), and weighted gene co-expression network analysis (WGCNA). The results will provide important new insight into the role of CBFA2T3 in leukemia relapse and establish CBFA2T3 and possibly other genes as new therapeutic targets for pediatric leukemias. ZHAO, Lue Ping FRED HUTCHINSON CAN RES CTR Validating Association for Pediatric AML using NCI TARGET data Apr18, 2013 closed Our research group, Quantitative Genetic Epidemiology (QGE) of Fred Hutchinson Cancer Research Center, focuses on the development of statistical/bioinformatics methodologies for analyzing genetic/genomic, clinical, and environmental data. We have identified DNA variants (SNPs) associated with pediatric AML and would like to validate them using an independent dataset.
If we can validate these findings in an independent dataset, these discovered genes could be very valuable in pediatric AML or other childhood cancers. Instead of searching for needles in a haystack, future diagnostic and treatment research on pediatric AML or other childhood cancers could focus on our discovered and validated genes. Our research group, Quantitative Genetic Epidemiology (QGE) of Fred Hutchinson Cancer Research Center, focuses on the development of statistical/bioinformatics methodologies for analyzing genetic/genomic, clinical and environmental data. In one of our GWAS studies, we identified SNPs associated with pediatric AML using the historical FHCRC transplant cohort. We would like to validate these hits in an independent pediatric AML dataset. If they are validated, we would like to further study expression, CNV, and other data types for the discovered genes. These genes, or the DNA variants in these genes, could play key roles in pediatric AML or other childhood cancers. A certain genetic makeup may make some children more likely to develop AML; this information would be very valuable for the early diagnosis of childhood AML. If some of the genes we discovered play roles in the pathology of AML, our discovery would enable researchers to develop treatments for these cancers based on these genes. Before these findings can benefit the diagnosis and treatment of pediatric cancer, we have to validate them using the public data that we are now requesting. Zheng, Siyuan UNIVERSITY OF TEXAS HLTH SCIENCE CENTER Comprehensive characterization of telomere maintenance mechanisms in childhood cancer Jan23, 2018 closed A human chromosome can be simplistically viewed as a linear strand of genetic code. Occasionally, chromosomes acquire breaks, generating small nicks, due to environmental factors or simply at random. DNA damage response (DDR) pathways sense and repair these nicks, ensuring that genetic information is passed along generations with high fidelity.
Although they are structurally similar to such breaks, chromosome ends are not recognized by DDR mechanisms and thus avoid being ligated together. The reason lies in the telomere, a structure that caps and thus protects chromosome ends by forming a loop that is physically inaccessible to DDR. The telomere is dynamic; it shortens with each cell division. When shortened to a certain length, it loses the ability to protect chromosomes, resulting in increased chromosomal aberrations and ultimately cell death. Importantly, cells that survive this fate can become cancerous. The proposed study leverages recent advances in computational methods and the availability of large volumes of cancer genomic data to disentangle the contribution of telomeres to the observed cancer genome. We propose to perform a systematic survey of telomere maintenance programs in pediatric cancer. The goal is to characterize genetic and epigenetic alterations that underlie the reactivation of telomerase or the alternative lengthening of telomeres (ALT) pathway. Recently, TERT promoter rearrangements have been reported in 20% of high-risk neuroblastomas, and these rearrangements define a group with particularly poor outcome (Peifer et al. Nature, 2015). Genetic and epigenetic alterations of TERT were reported to predict worse outcome in adolescent and young adult melanoma patients (Seynnaeve et al. Sci Rep, 2017). A systematic analysis of many pediatric cancers has the potential to unveil TERT aberrations that may present new prognostic and therapeutic opportunities. Our pan-cancer analysis of adult cancers (Barthel et al. Nat Genet, 2017) attests to our experience with the proposed type of study. In the coming year, we will extend our algorithms to single cells and aim to answer important questions, including the cell of origin. The project is designed to utilize the comprehensive genomic dataset provided by TARGET. We will estimate telomere length using whole-genome and whole-exome data. We will catalogue mutations affecting TERT, ATRX and DAXX.
We will call DNA rearrangements and gene fusions using whole-genome and RNA sequencing data. We will catalogue TERT expression in all disease groups and use a previously defined gene signature to predict telomerase activity. The obtained information will be analyzed and delivered to advance our understanding of telomere maintenance mechanisms in childhood cancer. We will compare insights learned from TARGET with those from TCGA (adult cancers). Both datasets will be analyzed independently. The proposed research is a computational study and thus poses no risk to participants. Zheng, Siyuan UNIVERSITY OF TEXAS HLTH SCIENCE CENTER A uniform PDX resource for childhood cancer Mar03, 2021 closed Patient-derived xenograft (PDX) models are valuable tools for childhood cancer research. They recapitulate parental tumors in histology, genetics and pharmacokinetics. In the current plan, we aim to build a resource consisting of hundreds of childhood cancer PDX models. This resource will facilitate a range of studies, including finding new driver genes, biomarkers of drug response, and new etiological factors. We aim to build a uniform PDX resource for pediatric cancer. We plan to collect and uniformly process these PDX genomic datasets, and to use the resulting data to find molecular correlates of drug response. We are part of the PPTC project. As a source site, the Houghton laboratory at GCCRI, UT Health San Antonio contributed solid tumor samples to the consortium. The requestor (Dr. Siyuan Zheng) and Dr. Houghton were both authors on the PPTC publication and were involved in data and manuscript preparation. Zhong, John UNIVERSITY OF SOUTHERN CALIFORNIA Molecular profiling of cancer May29, 2018 approved We are doing molecular profiling of cancer and request access to the raw data (BAM or FASTQ files) in NIH datasets.
Our hypothesis is that single-cell molecular analysis will provide higher resolution of the molecular characteristics of specific cancers than analyses of population cell lysates. We developed and applied a microfluidic single-cell molecular analysis platform to obtain single-cell transcriptome and DNA sequencing data from tumor cells, including circulating tumor cells. Our single-cell molecular analysis reveals multiple mutations in multiple cancer types, including pediatric cancers. Therefore, we would like to compare our data with the NIH datasets from cell lysates to confirm whether the novel mutations are also present in data from cell lysates. We will test the hypothesis that single-cell molecular profiling has higher resolution in identifying cancer-related molecular characteristics, including mutations and gene fusions. We have compared our own single-cell data with our lysate data, and preliminary results indicate that single-cell molecular profiling has much higher resolution in mutation calling. Now, we want to include a larger dataset from NIH to further improve our mutation calling approach. In our study, we also need access to pediatric cancer databases such as TARGET. Our research on neuroblastoma and leukemia affects many children. In order to determine the molecular differences between adult and childhood cancers, we need to compare our data to pediatric cases, which can only be done using NIH pediatric datasets. By comparing our single-cell data to these TARGET pediatric cancers, our findings can be used to develop new targeted therapies and to improve the diagnosis and treatment of childhood cancers. We are doing molecular profiling of cancer and request access to the raw data (BAM or FASTQ files) in NIH datasets. We will not combine requested datasets with other datasets outside of dbGaP.
Our hypothesis is that single-cell molecular analysis will provide higher resolution of the molecular characteristics of specific cancers than analyses of population cell lysates. We developed and applied a microfluidic single-cell molecular analysis platform to obtain single-cell transcriptome and DNA sequencing data from tumor cells, including circulating tumor cells. We would like to compare our single-cell data with the NIH data to investigate whether our single-cell data reveal mutation profiles similar to those in NIH data from cell lysates. Our single-cell molecular analysis reveals multiple mutations in multiple cancer types, including pediatric cancers. Therefore, we would like to compare our data with the NIH datasets from cell lysates to confirm whether the novel mutations are also present in data from cell lysates. We will test the hypothesis that single-cell molecular profiling has higher resolution in identifying cancer-related molecular characteristics, including mutations and gene fusions. We have compared our own single-cell data with our lysate data, and preliminary results indicate that single-cell molecular profiling has much higher resolution in mutation calling. Now, we want to include a larger dataset from NIH to further improve our mutation calling approach. We would also like to know whether our population study with lysates yields results similar to those of other investigators. By using these data, we can increase the number of cases in our study. We will analyze the data independently but compare the results. In our study, we also need access to pediatric cancer databases such as TARGET. Our research on neuroblastoma and leukemia affects many children. In order to determine the molecular differences between adult and childhood cancers, we need to compare our data to pediatric cases, which can only be done using NIH pediatric datasets.
By comparing our single-cell data to these TARGET pediatric cancers, our findings can be used to develop new targeted therapies and to improve the diagnosis and treatment of childhood cancers. We will publish any findings from our studies in scientific journals, present them at conferences and share the findings with the scientific community. Zhou, Chan UNIV OF MASSACHUSETTS MED SCH WORCESTER Identify and characterize noncoding RNA-derived fusions in pediatric cancer Aug11, 2023 approved Fusion RNAs and their encoded proteins are widely found in various cancers and have been used as biomarkers and therapeutic targets for multiple cancers. However, fusion RNAs derived from long noncoding RNAs (lncRNAs), termed lnc-fusions, have been largely ignored, especially in pediatric cancer. This study will integrate machine learning and computational approaches to identify and characterize lnc-fusions in pediatric cancer using multi-omics data from the Common Fund Datasets and external databases. Research Objective: The objective of the proposed research is to integrate multiple omics data and develop computational methods to identify long noncoding RNA (lncRNA)-derived fusion transcripts (lnc-fusions) as a putative novel molecular basis for pediatric cancers and to identify lnc-fusions as translational biomarkers. Rationale: 1) Fusion RNAs and their encoded proteins have been found in many pediatric tumors, such as Ewing sarcoma, neuroblastoma, and pediatric acute myeloid leukemia (AML). Fusion molecules are common early molecular alterations in pediatric cancers. Many fusion molecules can be used as therapeutic targets or biomarkers in various cancers, such as EWSR1-FLI1 in Ewing sarcoma and BCR-ABL1 in leukemia. Owing to the availability of RNA-seq data from many tumor and normal tissue samples in public datasets, fusion events have been widely studied in both adult cancers and normal tissues. 2) The majority of the human genome encodes lncRNAs, which play important roles in various cancers.
In addition to mRNAs, lncRNAs also participate in fusion events and thus form lnc-fusions in adult tumor cells. For example, the mRNA-lncRNA fusion EPS15L1-lncOR7C2-1 plays critical roles in pyroptosis and anti-tumor immunity. Therefore, we hypothesize that lncRNAs may also form lnc-fusions in pediatric cancers. To test this hypothesis, we will develop a computational method and pipeline to identify lnc-fusions from RNA-seq data. Study design and analysis plan (updated on March 9, 2023): (a) First, we will apply our recently developed and published Flnc tool (https://github.com/CZhouLab/Flnc) to identify novel lncRNAs from the RNA-seq data of normal tissues in the GTEx database. To identify novel lncRNAs that have not yet been annotated but are expressed in normal tissues, we request access to the RNA-seq data of normal tissues in the GTEx dataset in dbGaP. (b) Second, we will build a reference index for both the annotated lncRNAs and the novel lncRNAs discovered in the first step. Next, we will develop a computational pipeline that can identify lnc-fusions originating from both annotated and unannotated lncRNAs. (c) Third, to identify lnc-fusions that are specifically expressed in pediatric cancers, we will collect RNA-seq data generated from various pediatric cell lines and patient samples from three sources (Kids First dataset (dbGaP portal), GEO database and St. Jude Cloud). (d) Fourth, we will apply the automated pipeline developed in (b) to the qualified RNA-seq data to identify lnc-fusions. Then, we will perform differential expression analysis to detect lnc-fusions that are specifically expressed in pediatric cancers. We will also compare these lnc-fusions across cancer types to identify lnc-fusions that are specific to different types of pediatric cancers, such as neuroblastoma-specific and pediatric AML-specific lnc-fusions. Expected outcomes: We will develop a pipeline to predict lnc-fusions directly from RNA-seq data.
By applying this method to the collected data, we will identify lnc-fusions in tumors and non-tumors. By comparing the lnc-fusions expressed in pediatric cancers with those in controls, we will identify pediatric cancer-specific lnc-fusions. We will also experimentally verify and interpret the predicted results. Zhou, Jian-Guo THE SECOND AFFILIATED HOSPITAL OF ZUNYI MEDICAL UNIVERSITY Identify biomarkers for predict the efficacy of pediatric cancer patients with different treatments Aug05, 2020 approved Overall, childhood cancer is relatively rare, with fewer than 13,500 cases and about 1,500 deaths annually among children aged 0 to 14 years. In comparison, there are 1.4 million cases and 575,000 deaths annually among adults. However, cancer is the second leading cause of death among children, following only injuries. Childhood cancers include many that also occur in adults. Leukemia is by far the most common, representing about 33% of childhood cancers; brain tumors represent about 25%, lymphomas about 8%, and certain bone cancers about 4%. Considerable effort has been made to identify better predictive and prognostic markers. These efforts included studying various genetic mutations. Genomic instability was assessed via tumor mutational burden or mismatch repair deficiency. All of these methods predicted treatment response to a certain extent, but were not precise enough to be used in clinical routine. A further challenge is imaging-based assessment of treatment response. This research aims to identify ncRNAs or immune subsets that predict the survival benefits of treatments. Overall, childhood cancer is relatively rare, with fewer than 13,500 cases and about 1,500 deaths annually among children aged 0 to 14 years. In comparison, there are 1.4 million cases and 575,000 deaths annually among adults. However, cancer is the second leading cause of death among children, following only injuries. Childhood cancers include many that also occur in adults.
Leukemia is by far the most common, representing about 33% of childhood cancers; brain tumors represent about 25%, lymphomas about 8%, and certain bone cancers about 4%. Despite increases in overall survival rates, about 20% of pediatric cancer patients do not respond well to therapy and ultimately die from their diseases. The number of children and adolescents diagnosed with cancer is trending slightly upward. Current treatments are particularly harsh on growing children. They often cause severe short- and long-term side effects, such as secondary cancers, physical and emotional health issues, developmental delays, and infertility. Current treatment protocols are mostly derived from therapeutic regimens that were formulated for adult cancers. Previous genomics studies revealed that childhood cancers can be genetically distinct from their adult counterparts, suggesting the need for alternative treatment approaches. This research aims to identify biomarkers, including lncRNAs, circRNAs, RNA editing, fusions, and alternative splicing, that predict the survival benefits of different treatments in pediatric cancers. The datasets were aligned to the reference genome (UCSC hg38) using STAR [24] (v.2.5.2a). Gene expression was subsequently quantified using RSEM [25] (v.1.3.0). Expression counts estimated by RSEM were then normalized using DESeq2 (v.1.16.1) and log-transformed (log2 after adding a pseudo-count of one). We will then use machine-learning methods such as LASSO regression, neural networks, and random forests to generate the signature, and validate it in the individual datasets. Zhou, Jian-Guo THE SECOND AFFILIATED HOSPITAL OF ZUNYI MEDICAL UNIVERSITY identify immune biomarkers for predict the efficacy and safety of cancer patients treated with immune checkpoint therapy Feb17, 2021 approved Immune checkpoints are involved in the initiation, boosting and dampening of immune responses in a well-regulated spatio-temporal manner.
To evade immune attack, cancer cells evolve to express immune checkpoint molecules, e.g. PD-L1, that dampen immune responses. Hence, immune checkpoint inhibitors (ICIs) have been developed in recent years and brought into clinical use. ICI therapy is a kind of immunotherapy that modulates the immune checkpoint response to fight cancer. ICI therapy has become popular in recent years due to its high specificity and low toxicity compared with chemotherapy. A cross-sectional study reported that as of 2018, 46.3% of US cancer patients were considered eligible for ICI treatment. However, the response rate of cancer patients to ICI therapy remained below 20%. This suggests that additional factors prevent the immune response against cancer. One hypothesis involves genetic mutations that accumulate rapidly in cancer cells due to genomic instability and mismatch repair deficiency. Hence, this study aims to analyze the genetic factors of tumor tissue to discover novel cancer biomarkers that predict the treatment efficacy and adverse effects of ICI therapy, in order to improve treatment outcome. Immune checkpoint inhibitors (ICIs) can effectively restore the activity of exhausted CD8-positive cytotoxic T cells. In recent years, several ICIs have been approved for more than ten different tumor entities. However, only some patients show durable responses, whereas the majority do not benefit from immune checkpoint inhibitors. In clinical trials, the expression of PD-L1 on tumor and/or immune cells was frequently used to select patient subgroups with higher chances of treatment response. However, in several trials, PD-L1 expression only slightly increased the rate of treatment responders. Consequently, considerable effort has been made to identify better predictive and prognostic markers. These efforts included studying various genetic mutations. Genomic instability was assessed via tumor mutational burden or mismatch repair deficiency.
All of these methods predicted treatment response to a certain extent, but were not precise enough to be used in clinical routine. A further challenge is imaging-based assessment of treatment response. This study aims to uncover genetic biomarkers that are relevant to assessing the efficacy and safety of ICI therapy in order to improve treatment outcome. Specifically, we are looking for genetic biomarkers that could provide insight into the following: (1) Identification of genetic biomarkers, and matrices of such biomarkers, that are involved in immune evasion by cancers. Cancers express immunosuppressive molecules such as PD-L1 and undergo rapid evolution to alter their surface markers to evade the immune response. Hence, studying the biological mechanisms of immune evasion by cancers is essential to develop novel therapeutic means and manage cancer relapse. (2) Identification of biomarkers, and matrices of such markers, in addition to direct ICI targets, that can predict therapy success, including the prediction of progression-free survival (PFS) and overall survival (OS) of cancer patients undergoing ICI therapy. A cross-sectional study reported that as of 2018, 46.3% of US cancer patients were considered eligible for ICI treatment. However, the response rate of cancer patients to ICI therapy remained below 20%. This suggests that additional factors prevent immune attack on cancer. One hypothesis involves genetic mutations that accumulate rapidly in cancer cells due to genomic instability and mismatch repair deficiency. Therefore, we aim to analyze the genetic signatures of cancers to identify biomarkers that are correlated with favorable therapeutic outcome during ICI therapy. This study aims to improve the treatment outcome of ICI therapy by: 1. Development and validation of genetic signatures to predict the survival benefits of ICI therapy in cancer patients; 2.
Development and validation of genetic signatures to predict the immune-related adverse events (irAEs) of ICI therapy in cancer patients. This dataset will only be used in research consistent with the data use limitation for this study and will not be combined with datasets of other phenotypes. After completing our work, we will share our results with the scientific community by publishing our findings. Zhou, Wanding CHILDREN'S HOSP OF PHILADELPHIA DNA methylation-based pediatric cancer classification Aug01, 2024 approved Our research focuses on understanding how changes in DNA, specifically DNA methylation, can help diagnose and treat childhood cancers like leukemia, osteosarcoma, and Wilms' tumor. By studying these changes, we hope to find unique patterns that can be used to detect these cancers early and tailor treatments to each patient. We use advanced tools and data from the TARGET project to find these patterns and improve cancer care for children. DNA methylation is a crucial epigenetic modification that plays a significant role in regulating gene expression. In pediatric cancers such as acute myeloid leukemia (AML), osteosarcoma, and Wilms' tumor, aberrant DNA methylation patterns can serve as biomarkers for diagnosis and prognosis. The study of DNA methylation dynamics in these cancers is particularly promising because these patterns can be detected with high sensitivity and specificity, providing a non-invasive method for early detection and monitoring of disease progression. The Infinium HumanMethylation450 BeadChip (450K array) and the Infinium MethylationEPIC BeadChip (EPIC array), widely used in research, offer comprehensive coverage of the methylome, making them ideal for identifying methylation changes associated with pediatric cancers. In our research, we will utilize Infinium methylation data generated by the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project.
TARGET has produced a wealth of high-quality methylation data from various pediatric cancer samples, providing an invaluable resource for our studies. By analyzing these data, we aim to identify specific methylation signatures that can differentiate between cancer types and subtypes, understand the underlying epigenetic mechanisms driving tumorigenesis, and potentially uncover novel therapeutic targets. Our research objectives are to: (1) identify DNA methylation signatures associated with pediatric cancers; (2) investigate the epigenetic mechanisms underlying these cancers; and (3) develop diagnostic and prognostic biomarkers. To achieve these objectives, we will follow a comprehensive study design that includes the collection of methylation data from pediatric cancer samples, integration with clinical and phenotypic data, and rigorous statistical analysis. We will test for associations between methylation patterns and phenotypic characteristics such as age at diagnosis, tumor stage, and response to treatment. Our analysis plan includes the use of bioinformatics tools to preprocess and normalize the methylation data, differential methylation analysis to identify significant loci, and integrative genomic approaches to correlate methylation changes with genetic variants and clinical outcomes. This research will not only advance our understanding of pediatric cancers but also pave the way for developing methylation-based diagnostic tools, improving early detection, and personalizing treatment strategies for young patients. Zhu, Hongtu UNIV OF NORTH CAROLINA CHAPEL HILL Pan-cancer analysis for childhood cancer Sep24, 2018 rejected Recently, there has been more and more work on pan-cancer analysis of adult cancers, but only a few studies have focused on childhood pan-cancer analysis.
Pan-cancer analyses of adult cancers have proven helpful and have generated many meaningful biomarkers for diagnosis and prognosis, as well as therapeutic targets. It is almost certain that such analyses can be carried over to childhood cancers, and we believe the TARGET project is a good starting point. Our research will address possible molecular subtypes, childhood tumor evolution, driver mutations/genes, and clonality in single nucleotide variants and copy number alterations. Our group specializes in developing powerful statistical methods to analyze such genetic data, and we have published various articles on such studies. Our work will provide the community with new findings and novel methods that will help us better understand pediatric cancer. We plan to: 1. Conduct cluster analysis using genomic data to discover possible molecular subtypes for childhood pan-cancer analysis. 2. Conduct statistical analysis of childhood tumor evolution, driver mutations/genes, and clonality in single nucleotide variants and copy number alterations, which requires whole-genome sequencing data. 3. Conduct eQTL association analysis and use these associations, along with demographic information, to determine whether certain associations affect tumor behaviors, e.g. evolution. Zhu, Hongtu UNIVERSITY OF TX MD ANDERSON CAN CTR Pancan analysis for childhood cancer Jul31, 2017 closed We will use genetic data to better understand pediatric cancer. Rather than looking at one cancer type at a time, we plan to look at several cancer types at the same time to discover their similarities and differences. Understanding the similarities is important so that cancer treatment can be performed more systematically, while understanding the differences is also critical because it helps the design of precision medicine. Many studies have focused on adults, but we believe such analysis is critical for childhood cancer as well.
Therefore, we propose this project to better understand various aspects of childhood cancers through a pan-cancer analysis. Recently, there has been more and more work on pan-cancer analysis of adult cancers, but only a few studies have focused on childhood pan-cancer analysis. Pan-cancer analyses of adult cancers have proven helpful and have generated many meaningful biomarkers for diagnosis and prognosis, as well as therapeutic targets. It is almost certain that such analyses can be carried over to childhood cancers, and we believe the TARGET project is a good starting point. We plan to conduct cluster analysis using genomic data to discover possible molecular subtypes. We are also interested in childhood tumor evolution, driver mutations/genes, and clonality in single nucleotide variants and copy number alterations, which requires whole-genome sequencing data. Besides these, we plan to conduct eQTL association analysis and use these associations, along with demographic information, to determine whether certain associations affect tumor behaviors, e.g. evolution. Our group specializes in developing powerful statistical methods to analyze such genetic data, and we have published various articles on such studies. Our goal is to provide the community with new findings and novel methods that will help us better understand pediatric cancer. ZHU, JINGCHUN UNIVERSITY OF CALIFORNIA SANTA CRUZ Data visualization of cancer genomics data Apr22, 2020 closed We build web-based interactive data visualizations to enable the research community to interact with and explore the large amount of cancer genomics data. Our team is the CCG’s Genomics Data Analysis Network’s data visualization center. Our research goal is to enable the research community to interact with and explore cancer genomics data, in particular those available through the GDC.
We developed UCSC Xena, an interactive interface for visualizing genomics data, and will continue to refine our platform as the network’s visualization center. We intend to publish or otherwise broadly share any findings from our work on this study with the scientific community. Zhu, Qianqian ROSWELL PARK CANCER INSTITUTE CORP Susceptibility of ALL, AML and MDS Mar17, 2022 approved The purpose of this study is to identify new genes and pathways contributing to the development of blood cancers by integrating comprehensive genetic data from the DISCOVeRY-BMT cohorts with four additional cohorts with existing genetic and genomic data from dbGaP. Our study will help improve risk prediction and prevention of blood cancers. The aim of the study is to identify new susceptibility genes for Acute Lymphoblastic Leukemia (ALL), Acute Myeloid Leukemia (AML) and Myelodysplastic Syndrome (MDS). We performed genome-wide association studies of ALL, as well as AML and MDS, based on the DISCOVeRY-BMT cohorts and identified novel susceptibility loci (Clay-Gilmour et al., Blood Adv. 2017; Wang et al., Front Genet. 2021). The DISCOVeRY-BMT cohorts included ~5,600 cases and ~5,600 controls. All cases were diagnosed with ALL, AML, or MDS and received allogeneic blood or marrow transplant. The controls were healthy unrelated donors. We are performing whole-exome sequencing on the DISCOVeRY-BMT cohorts to systematically investigate the contribution of coding variants to these three hematologic malignancies. To increase our power to detect new susceptibility genes for pediatric ALL, we will combine the pediatric cases from the DISCOVeRY-BMT cohorts with pediatric cases from the TARGET project in our case-control association analysis.
To increase our power to detect new AML and MDS susceptibility genes, we will combine the DISCOVeRY-BMT cohorts with AML and MDS patients from the Functional Genomic Landscape of Acute Myeloid Leukemia (BEATAML1.0-COHORT) project, the Clinical Resistance to Crenolanib in Acute Myeloid Leukemia Due to Diverse Molecular Mechanisms (BEATAML1.0-CRENOLANIB) project, and the TCGA project. To identify susceptibility genes specific to pediatric AML and MDS, we will combine the DISCOVeRY-BMT cohorts with pediatric cases from the TARGET and BEATAML1.0-COHORT projects. Both single-variant and gene-level case-control association analyses will be carried out to identify variants and genes significantly associated with disease risk. We will further evaluate the significance of these newly identified variants/genes and elicit the underlying biological pathways by incorporating the rich genomic data, including RNA-seq and/or miRNA-seq, from the TARGET, BEATAML1.0-COHORT, and TCGA projects. We will associate these variants/genes with age at onset, sex, disease subtypes, and additional clinical features. We will also study the interaction of these new variants/genes with known cancer genes. Zou, Lihua NORTHWESTERN UNIVERSITY AT CHICAGO Multi-scale modeling of oncogenic pathways in pediatric cancer Nov18, 2020 expired This is a research project that focuses on identifying an aberrant pathway that fuels the growth of cancer cells in multiple human cancers. Our initial analysis indicates that there are additional genes and molecules in this pathway that can be identified and targeted. We want to use the TARGET and KidsFirst datasets to develop systematic ways to identify the additional targets involved. The findings could contribute to novel biomarkers and mechanistic insights in pediatric cancers.
To identify novel therapeutic targets in childhood cancer, we plan to conduct an unbiased meta-analysis of gene signatures from multiple public datasets, including TCGA, and pediatric sequencing data, including TARGET and KidsFirst. Our preliminary work identified a gene network centered on the FOXM1 transcription factor. FOXM1 is a master regulator of the cell cycle and plays important roles in maintaining neural, progenitor, and GBM stem cells. FOXM1 has been shown to play roles in radio-resistance, EMT, invasion and metastasis in glioblastoma cells. It is emerging as a biomarker of poor prognosis and a potential therapeutic target in multiple brain tumor types. It is involved in DNA repair and in centromere and kinetochore assembly, and it is downstream of multiple signaling pathways, including PI3K/Akt/Foxo, NF-κB/Stat3, β-catenin/Wnt, and HIF1a, as well as the metabolic/oxidative stress response. Given the central role of FOXM1 in proliferation, differentiation and cell death, its transcription must be tightly regulated throughout the development of a glioma cell. Here are our specific aims: 1) we plan to integrate multi-omic datasets with pediatric sequencing data, including TARGET and the Kids First project, to identify additional molecular players involved in FOXM1-driven oncogenesis; 2) we plan to project gene signatures across cancer types to systematically delineate oncogenic pathways involved in human cancers.