An ultra-fast and scalable quantification pipeline for transposable elements from next generation sequencing data

Pac Symp Biocomput. 2018:23:168-179.

Abstract

Transposable elements (TEs) are DNA sequences that can move from one genomic location to another, and they make up a large proportion (45%) of the human genome. TEs have functional roles in a variety of biological phenomena such as cancer, neurodegenerative disease, and aging. Rapid developments in RNA-sequencing technology have enabled us, for the first time, to study the activity of TEs at the systems level. However, efficient tools for TE analysis have not yet been developed. In this work, we developed SalmonTE, a fast and reliable pipeline for the quantification of TEs from RNA-seq data. We benchmarked our tool against TEtranscripts, a widely used TE quantification method, and three other quantification methods using several RNA-seq datasets from Drosophila melanogaster and a human cell line. SalmonTE achieved 20 times faster execution without compromising accuracy. This pipeline will enable the biomedical research community to quantify and analyze TEs from large amounts of data and lead to novel TE-centric discoveries.
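The abstract does not say how reads are turned into TE abundances. A central difficulty is that many reads map equally well to several related TE sequences. The sketch below assumes a generic expectation-maximization (EM) estimator, a standard likelihood-based way to share such multi-mapping reads in RNA-seq quantification. It is an illustration only, not SalmonTE's implementation. The function `em_quantify` and its input format (one set of compatible TE names per read) are hypothetical. The sketch also ignores sequence length, fragment-length and bias corrections.

```python
from typing import Dict, List, Set


def em_quantify(reads: List[Set[str]], n_iter: int = 100, tol: float = 1e-8) -> Dict[str, float]:
    """Hypothetical sketch: expected read counts per TE via EM.

    Each read is given as the set of TE names it is compatible with.
    Assumes equal effective lengths for all TEs (no length correction).
    """
    if not reads:
        return {}
    tes = sorted(set().union(*reads))
    n_reads = len(reads)
    # Start from uniform relative abundances.
    theta = {te: 1.0 / len(tes) for te in tes}
    counts = {te: 0.0 for te in tes}
    for _ in range(n_iter):
        counts = {te: 0.0 for te in tes}
        # E-step: split each read among its compatible TEs
        # in proportion to their current relative abundances.
        for compat in reads:
            total = sum(theta[te] for te in compat)
            for te in compat:
                counts[te] += theta[te] / total
        # M-step: relative abundance = expected reads / total reads.
        new_theta = {te: counts[te] / n_reads for te in tes}
        delta = max(abs(new_theta[te] - theta[te]) for te in tes)
        theta = new_theta
        if delta < tol:
            break
    return counts
```

For example, with three reads unique to `L1`, one read unique to `Alu` and two reads compatible with both, the EM converges to about 4.5 reads for `L1` and 1.5 for `Alu`. The shared reads are split 3:1, the same ratio as the unique evidence, instead of being discarded or counted twice.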

MeSH terms

  • Algorithms
  • Amyotrophic Lateral Sclerosis / genetics
  • Animals
  • Argonaute Proteins / genetics
  • Computational Biology / methods
  • DNA Transposable Elements / genetics*
  • DNA-Binding Proteins / genetics
  • Databases, Nucleic Acid / statistics & numerical data
  • Drosophila Proteins / genetics
  • Drosophila melanogaster / genetics
  • Gene Knockdown Techniques
  • Gene Library
  • High-Throughput Nucleotide Sequencing / statistics & numerical data*
  • Humans
  • K562 Cells
  • Likelihood Functions
  • Models, Statistical
  • Sequence Analysis, RNA / statistics & numerical data*

Substances

  • Argonaute Proteins
  • DNA Transposable Elements
  • DNA-Binding Proteins
  • Drosophila Proteins
  • TARDBP protein, human
  • piwi protein, Drosophila