Optimal selection of gene and ingroup taxon sampling for resolving phylogenetic relationships

Syst Biol. 2010 Jul;59(4):446-57. doi: 10.1093/sysbio/syq025. Epub 2010 May 19.

Abstract

A controversial topic that underlies much of phylogenetic experimental design is the relative utility of increased taxonomic versus character sampling. Conclusions about the relative utility of adding characters or taxa to a current phylogenetic study have subtly hinged upon the appropriateness of the rate of evolution of the characters added for resolution of the phylogeny in question. Clearly, the addition of characters evolving at optimal rates will have much greater impact upon accurate phylogenetic analysis than will the addition of characters with an inappropriate rate of evolution. Development of practical analytical predictions of the asymptotic impact of adding taxa would complement computational investigations of the relative utility of these two methods of expanding acquired data. Accordingly, we here formulate a measure of the phylogenetic informativeness of the additional sampling of character states from a new taxon added to the canonical phylogenetic quartet. We derive the optimal rate of evolution for characters assessed in taxa to be sampled and a metric of informativeness based on the rate of evolution of the characters assessed in the new taxon and the distance of the new taxon from the internode of interest. Calculation of the informativeness per base pair of additional character sampling for included taxa versus additional character sampling for novel taxa can be used to estimate cost-effectiveness and optimal efficiency of phylogenetic experimental design. The approach requires estimation of rates of evolution of individual sites based on an alignment of genes orthologous to those to be sequenced, which may be identified in a well-established clade of sister taxa or of related taxa diverging at a deeper phylogenetic scale. Some approximate idea of the potential phylogenetic relationships of taxa to be sequenced is also desirable, such as may be obtained from ribosomal RNA sequence alone. The most obvious application is the resolution of recalcitrant unresolved nodes in an otherwise well-known phylogeny. We validate the theory by analysis of its predictions regarding the phylogenetic informativeness for taxon addition of 46 amino acid alignments of 21 fungal taxa. Gene and taxon sampling according to the theory herein and following a "deepest ingroup" heuristic are shown to provide significantly improved resolution of specified deep internodes.
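
The idea of comparing informativeness per site across candidate loci can be illustrated with a minimal sketch. The abstract does not state the taxon-addition metric, so the sketch assumes the quartet-based form rho(T; lambda) = 16 lambda^2 T exp(-4 lambda T) from earlier informativeness profiling, which may differ from the metric derived in this paper. The function names, per-site rates, and internode depth below are hypothetical values chosen for illustration only.

```python
import math


def site_informativeness(rate, t):
    """Assumed quartet-based form (not this paper's taxon-addition metric):
    16 * rate^2 * t * exp(-4 * rate * t), for a site evolving at `rate`
    and an internode at depth `t` (same time units as 1/rate)."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def informativeness_per_site(rates, t):
    """Mean informativeness per site of a locus at internode depth t."""
    if not rates:
        raise ValueError("rates must be non-empty")
    return sum(site_informativeness(r, t) for r in rates) / len(rates)


# Hypothetical per-site rate estimates for two candidate loci.
candidate_loci = {
    "slow_locus": [0.05, 0.10, 0.20],
    "fast_locus": [1.0, 2.0, 4.0],
}
depth = 1.0  # hypothetical depth of the internode of interest

# Rank loci by mean informativeness per site at the chosen depth.
ranking = sorted(
    candidate_loci,
    key=lambda name: informativeness_per_site(candidate_loci[name], depth),
    reverse=True,
)
print(ranking)
```

Under this assumed form, a site's contribution as a function of depth peaks at T = 1/(4 lambda), so sites evolving much faster or slower than that contribute little at the depth of interest. Dividing a locus's summed informativeness by its sequencing cost instead of its length would give a rough cost-effectiveness comparison along the lines the abstract describes.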

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Ascomycota / genetics
  • Genetic Speciation
  • Models, Genetic*
  • Phylogeny*
  • Time Factors