Diversity of translation start sites may define increased complexity of the human short ORFeome

Mol Cell Proteomics. 2007 Jun;6(6):1000-6. doi: 10.1074/mcp.M600297-MCP200. Epub 2007 Feb 21.

Abstract

Our previous proteomics analysis of small proteins expressed in human K562 cells provided the first direct evidence of translation of upstream ORFs in human full-length cDNAs (Oyama, M., Itagaki, C., Hata, H., Suzuki, Y., Izumi, T., Natsume, T., Isobe, T., and Sugano, S. (2004) Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res. 14, 2048-2052). In the present study, we performed an in-depth proteomics analysis of human K562 and HEK293 cells using a two-dimensional nano-liquid chromatography-tandem mass spectrometry system. The results led to the identification of eight protein-coding regions in addition to 197 small proteins with a theoretical mass of less than 20 kDa that were already annotated as coding sequences in the curated mRNA database. In addition to the upstream ORFs in the presumed 5'-untranslated regions of mRNAs, bioinformatics analysis based on accumulated 5'-end cDNA sequence data provided evidence of novel short coding regions that were likely to be translated from upstream non-AUG start sites or from new short transcript variants generated by the use of downstream alternative promoters. Protein expression analysis of the GRINL1A gene revealed that translation from the most upstream start site occurred on the minor alternatively spliced transcript, whereas this initiation site was not used on the major mRNA, resulting in translation of the downstream ORF from the second initiation codon. These findings reveal a novel post-transcriptional system that can augment the human proteome via the alternative use of diverse translation start sites coupled with transcriptional regulation through alternative promoters or splicing, leading to increased complexity of short protein-coding regions defined by the human transcriptome.
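
The notion of an upstream ORF, optionally starting at a non-AUG codon, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_upstream_orfs`, its parameters, and the toy sequences are hypothetical and chosen only for illustration. The sketch assumes the standard genetic code and treats any in-frame run from a chosen start codon to the first stop codon as a candidate ORF, with CTG (CUG in RNA) used as an example of a near-cognate start.

```python
# Illustrative sketch only; names, parameters, and sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_upstream_orfs(mrna, cds_start, start_codons=("ATG",), min_codons=2):
    """Return candidate ORFs that begin upstream of the annotated CDS start.

    mrna       -- transcript sequence, 5'->3' (RNA or DNA letters)
    cds_start  -- 0-based index of the annotated start codon
    Each hit is (start, end, n_codons): end is exclusive and includes the
    stop codon; n_codons counts coding codons, stop excluded.
    """
    seq = mrna.upper().replace("U", "T")
    hits = []
    # Only consider start positions that lie before the annotated start.
    for start in range(0, min(cds_start, len(seq) - 2)):
        if seq[start:start + 3] not in start_codons:
            continue
        # Walk codon by codon in the same frame until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    hits.append((start, pos + 3, n_codons))
                break
    return hits


# Toy transcript: a short AUG-initiated uORF ahead of the annotated start.
toy_aug = "CCATGGCTTAAGGATGAAACCCTGA"
print(find_upstream_orfs(toy_aug, cds_start=13))  # [(2, 11, 2)]

# Toy transcript: the only upstream start is a near-cognate CUG.
toy_cug = "CUGGCCAAAUAGAUGCCCUAA"
print(find_upstream_orfs(toy_cug, cds_start=12))  # []
print(find_upstream_orfs(toy_cug, 12, start_codons=("ATG", "CTG")))  # [(0, 12, 3)]
```

A scan of this kind only proposes candidates. As in the study, evidence of translation would still have to come from experimental data such as identified peptides.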

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 5' Untranslated Regions / chemistry
  • 5' Untranslated Regions / genetics
  • Alternative Splicing / genetics
  • Amino Acid Sequence
  • Base Sequence
  • Chromatography, Liquid
  • Codon, Initiator / genetics*
  • Genome, Human / genetics
  • Humans
  • K562 Cells
  • Mass Spectrometry
  • Molecular Sequence Data
  • Neoplasm Proteins / chemistry
  • Neoplasm Proteins / genetics
  • Open Reading Frames / genetics*
  • Protein Biosynthesis / genetics*
  • Proteome / genetics*
  • Proteomics
  • RNA Polymerase II
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Receptors, Glutamate / genetics

Substances

  • 5' Untranslated Regions
  • Codon, Initiator
  • Neoplasm Proteins
  • POLR2M protein, human
  • Proteome
  • RNA, Messenger
  • Receptors, Glutamate
  • RNA Polymerase II