IDBA-MT: de novo assembler for metatranscriptomic data generated from next-generation sequencing technology

J Comput Biol. 2013 Jul;20(7):540-50. doi: 10.1089/cmb.2013.0042.

Abstract

High-throughput next-generation sequencing technology provides a great opportunity for analyzing metatranscriptomic data. However, the reads produced by these technologies are short, and an assembly step is required to combine the short reads into longer contigs. Because mRNAs from different genomes share many repeat patterns and the abundance ratio of mRNAs in a sample varies widely, existing assemblers for genomic, transcriptomic, and metagenomic data do not work on metatranscriptomic data: they produce chimeric contigs, that is, incorrect contigs formed by merging multiple mRNA sequences. To the best of our knowledge, there is no assembler designed for metatranscriptomic data. In this article, we introduce an assembler called IDBA-MT, which is designed for assembling reads from metatranscriptomic data. IDBA-MT produces far fewer chimeric contigs (a reduction of 50% or more) than existing assemblers such as Oases, IDBA-UD, and Trinity.
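
To make the notion of a chimeric contig concrete, the following is a minimal sketch, not IDBA-MT's algorithm (which the abstract does not describe). It assumes a toy de Bruijn graph, a structure commonly used by short-read assemblers, built directly from two made-up mRNA sequences that share a repeat. The sequences, the value of k, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_de_bruijn(seqs, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any sequence."""
    graph = defaultdict(set)
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def branching_nodes(graph):
    """Nodes with more than one successor, where a path walk must choose."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

def spelled_by_graph(graph, seq, k):
    """True if every k-mer of seq corresponds to an edge of the graph."""
    return all(
        seq[i + 1:i + k] in graph.get(seq[i:i + k - 1], ())
        for i in range(len(seq) - k + 1)
    )

# Two hypothetical mRNAs from different genomes sharing the repeat GGATCC.
k = 4
repeat = "GGATCC"
mrna_a = "ACGTT" + repeat + "TTAGC"
mrna_b = "CCATA" + repeat + "AGTCA"

graph = build_de_bruijn([mrna_a, mrna_b], k)

# A chimera: the start of mRNA A joined to the end of mRNA B through the repeat.
chimera = "ACGTT" + repeat + "AGTCA"

print("branching nodes:", branching_nodes(graph))
print("chimera is a valid path:", spelled_by_graph(graph, chimera, k))
print("chimera occurs in an input mRNA:", chimera in mrna_a or chimera in mrna_b)
```

Because the repeat is at least k-1 bases long, the two mRNAs collapse onto the same nodes inside it, and the graph alone cannot tell which exit belongs to which entry. An assembler that walks through the repeat without further evidence can therefore emit the chimera, a sequence that matches neither input mRNA.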

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Bacteria / genetics
  • Computer Simulation
  • Gastrointestinal Tract
  • Gene Expression Profiling*
  • High-Throughput Nucleotide Sequencing*
  • Metagenomics*
  • Mice
  • RNA, Messenger / genetics
  • Real-Time Polymerase Chain Reaction
  • Repetitive Sequences, Nucleic Acid
  • Reverse Transcriptase Polymerase Chain Reaction
  • Sequence Analysis, DNA*
  • Software*

Substances

  • RNA, Messenger