Functional classification of long non-coding RNAs by k-mer content

Nat Genet. 2018 Oct;50(10):1474-1482. doi: 10.1038/s41588-018-0207-8. Epub 2018 Sep 17.

Abstract

The functions of most long non-coding RNAs (lncRNAs) are unknown. In contrast to proteins, lncRNAs with similar functions often lack linear sequence homology; thus, the identification of function in one lncRNA rarely informs the identification of function in others. We developed a sequence comparison method to deconstruct linear sequence relationships in lncRNAs and evaluate similarity based on the abundance of short motifs called k-mers. We found that lncRNAs of related function often had similar k-mer profiles despite lacking linear homology, and that k-mer profiles correlated with protein binding to lncRNAs and with their subcellular localization. Using a novel assay to quantify Xist-like regulatory potential, we directly demonstrated that evolutionarily unrelated lncRNAs can encode similar function through different spatial arrangements of related sequence motifs. K-mer-based classification is a powerful approach to detect recurrent relationships between sequence and function in lncRNAs.
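
To make the idea of comparing sequences by k-mer abundance rather than by alignment concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function names (`kmer_profile`, `pearson`), the choice of k = 3, the normalization by window count, the use of Pearson correlation as the similarity measure, and the toy sequences are all assumptions made for illustration. The abstract does not specify any of these details.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Return k-mer frequencies for seq in a fixed (lexicographic) k-mer order.

    Hypothetical helper: counts every overlapping window of length k and
    divides by the number of counted windows, so that sequences of
    different lengths are comparable.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy sequences (made up): a and b contain the same motifs in a different
# linear arrangement; c is built from unrelated motifs.
a = "ACGACGACG" + "TTTTTTTT"
b = "TTTTTTTT" + "ACGACGACG"
c = "GGCCGGCCGGCCGGCCG"

sim_ab = pearson(kmer_profile(a), kmer_profile(b))
sim_ac = pearson(kmer_profile(a), kmer_profile(c))
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In this toy case the rearranged pair (a, b) scores far higher than the unrelated pair (a, c), even though a and b would not align end to end. That is the kind of relationship the abstract describes as a similar k-mer profile without linear homology.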

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Base Sequence
  • Cluster Analysis
  • Conserved Sequence
  • Databases, Genetic
  • Hep G2 Cells
  • Humans
  • K562 Cells
  • Mice
  • Molecular Sequence Annotation
  • Nucleic Acid Conformation
  • Nucleotide Motifs* / genetics
  • Potassium Channels, Voltage-Gated / genetics
  • RNA, Long Noncoding / chemistry
  • RNA, Long Noncoding / classification*
  • RNA, Long Noncoding / genetics*
  • Sequence Alignment
  • Sequence Analysis, RNA / methods*

Substances

  • KCNQ1OT1 long non-coding RNA, human
  • Potassium Channels, Voltage-Gated
  • RNA, Long Noncoding
  • XIST non-coding RNA