Substitution matrix based color schemes for sequence alignment visualization

BMC Bioinformatics. 2020 May 24;21(1):209. doi: 10.1186/s12859-020-3526-6.

Abstract

Background: Visualizations of multiple sequence alignments often color the symbols, usually characters encoding amino acids, according to some (physical) property, such as hydrophobicity or charge. Typically, color schemes are created manually, so that equal or similar colors are assigned to amino acids that share similar properties. However, this assessment is subjective and may not represent the similarity of symbols very well.

Results: In this article we propose a different approach to color scheme creation: we leverage the similarity information of a substitution matrix to derive an appropriate color scheme. Similar colors are assigned to high-scoring pairs of symbols, and distant colors to low-scoring pairs. To find the optimal positions of these points in color space, a simulated annealing algorithm is employed.
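The idea of turning substitution scores into target color distances and then searching for matching points by simulated annealing can be illustrated with a small, self-contained sketch. It is not the authors' implementation; several details are assumptions made only for illustration. The 4-symbol matrix `SCORES` is a hypothetical toy (not BLOSUM or any published matrix) and is assumed symmetric. Scores are converted to distances with the assumed formula d_ij = (s_ii + s_jj)/2 - s_ij. Colors live in a plain RGB unit cube, although a perceptually uniform space such as CIELAB would be a more natural choice. The cost is a simple squared error between color distances and normalized target distances. All function names (`score_distances`, `normalized_targets`, `cost`, `anneal`, `to_hex`) are hypothetical.

```python
import math
import random

# Hypothetical toy substitution matrix (assumed symmetric; NOT a published matrix).
SYMBOLS = ["A", "B", "C", "D"]
SCORES = [
    [4, 2, -1, -2],
    [2, 5, -1, -1],
    [-1, -1, 6, 3],
    [-2, -1, 3, 5],
]


def score_distances(scores):
    """Assumed conversion: d_ij = (s_ii + s_jj) / 2 - s_ij.

    High-scoring (similar) pairs get small distances; the diagonal becomes 0.
    """
    n = len(scores)
    return [[(scores[i][i] + scores[j][j]) / 2 - scores[i][j] for j in range(n)]
            for i in range(n)]


def normalized_targets(scores):
    """Scale target distances so that the largest one equals 1."""
    d = score_distances(scores)
    largest = max(max(row) for row in d)
    return [[x / largest for x in row] for row in d]


def cost(coords, targets):
    """Sum of squared differences between color distances and target distances."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - targets[i][j]) ** 2
    return total


def anneal(targets, n_steps=20000, t_start=1.0, t_end=1e-4, step=0.1, seed=0):
    """Simulated annealing over points in the RGB unit cube [0, 1]^3."""
    rng = random.Random(seed)
    n = len(targets)
    coords = [[rng.random() for _ in range(3)] for _ in range(n)]
    current = cost(coords, targets)
    best, best_cost = [c[:] for c in coords], current
    for k in range(n_steps):
        # Exponential cooling schedule from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        # Perturb one randomly chosen point, clamped to the cube.
        i = rng.randrange(n)
        old = coords[i]
        coords[i] = [min(1.0, max(0.0, x + rng.gauss(0.0, step))) for x in old]
        # Full cost recomputation is fine for a handful of symbols.
        new_cost = cost(coords, targets)
        delta = new_cost - current
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = new_cost
            if current < best_cost:
                best, best_cost = [c[:] for c in coords], current
        else:
            coords[i] = old  # reject the move
    return best, best_cost


def to_hex(rgb):
    """Convert an RGB triple in [0, 1] to a hex color string."""
    return "#" + "".join(f"{round(255 * x):02x}" for x in rgb)


if __name__ == "__main__":
    targets = normalized_targets(SCORES)
    colors, final_cost = anneal(targets)
    for sym, rgb in zip(SYMBOLS, colors):
        print(sym, to_hex(rgb))
    print("residual cost:", round(final_cost, 4))
```

In this toy matrix, A/B and C/D are the high-scoring pairs, so after annealing each of those pairs should receive similar colors while colors across the two groups stay far apart. A real tool would likely also add constraints (for example on lightness or contrast), which this sketch deliberately omits.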

Conclusions: Using the substitution matrix as the basis for a color scheme is consistent with the alignment, which is itself based on that very substitution matrix. This approach allows fully automatic generation of new color schemes, even for special purposes that have not been covered yet, including schemes for structural alphabets or schemes adapted for people with color vision deficiency.

Keywords: Color space; Open source; Optimization; Python; Sequence alignment.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Amino Acid Substitution*
  • Color
  • Computer Simulation
  • Humans
  • Sequence Alignment / methods*