KNIME for reproducible cross-domain analysis of life science data

J Biotechnol. 2017 Nov 10:261:149-156. doi: 10.1016/j.jbiotec.2017.07.028. Epub 2017 Jul 27.

Abstract

Experiments in the life sciences often involve tools from a variety of domains such as mass spectrometry, next-generation sequencing, or image processing. Passing data between those tools often requires complex scripts for controlling data flow, data transformation, and statistical analysis. Such scripts are not only prone to being platform dependent; they also tend to grow as the experiment progresses and are seldom well documented, which hinders the reproducibility of the experiment. Workflow systems such as KNIME Analytics Platform aim to solve these problems by providing a platform for connecting tools graphically and guaranteeing the same results on different operating systems. As open source software, KNIME allows scientists and programmers to provide their own extensions to the scientific community. In this review paper we present selected extensions from the life sciences that simplify data exploration, analysis, and visualization and are interoperable due to KNIME's unified data model. Additionally, we name other workflow systems that are commonly used in the life sciences and highlight their similarities to and differences from KNIME.

Keywords: KNIME; Life science; Workflow systems.

Publication types

  • Review

MeSH terms

  • Biological Science Disciplines
  • Computational Biology*
  • High-Throughput Nucleotide Sequencing
  • Image Processing, Computer-Assisted
  • Mass Spectrometry
  • Software*