Title: Experimental design in phylogenetics: testing predictions from expected information
Publication Type: Journal Article
Year of Publication: 2012
Authors: San Mauro D, Gower DJ, Cotton JA, Zardoya R, Wilkinson M, Massingham T
Journal: Systematic Biology
Volume: 61
Pagination: 661-674
Keywords: INFORMATION EXPERIMENTAL DESIGN ebigroup
Abstract: Taxon and character sampling is central to phylogenetic experimental design yet we lack general rules. Goldman introduced a method to construct efficient sampling designs in phylogenetics, based on the calculation of expected Fisher information given a probabilistic model of sequence evolution. The considerable potential of this approach remains largely unexplored. In an earlier study, we applied Goldman’s method to a problem in the phylogenetics of caecilian amphibians and made an a priori evaluation and testable predictions of which taxon additions would increase information about a particular weakly supported branch of the caecilian phylogeny by the greatest amount. We have now gathered mitogenomic and rag1 sequences (some newly determined for this study) from additional caecilian species, and studied how information (both expected and observed) and bootstrap support vary as each new taxon is individually added to our previous dataset. This provides the first empirical test of specific predictions made using Goldman’s method for phylogenetic experimental design. Our results empirically validate the top three (more intuitive) taxon addition predictions made in our previous study, but only information results validate unambiguously the fourth (less intuitive) prediction. This highlights a complex relationship between information and support, reflecting that each measures different things: information is related to the ability to estimate branch length accurately, and support to the ability to estimate the tree topology accurately. Thus, an increase in information may be correlated with but does not necessitate an increase in support. Our results also provide the first empirical validation of the widely held intuition that additional taxa that join the tree proximal to poorly supported internal branches are more informative and enhance support more than additional taxa that join the tree more distally.
Our work supports the view that adding more data for a single (well chosen) taxon may increase phylogenetic resolution and support in weakly supported parts of the tree without adding more characters/genes. Altogether, our results corroborate that, although still underexplored, Goldman’s method offers a powerful tool for experimental design in molecular phylogenetic studies. However, there are still several drawbacks to overcome, and further assessment of the method is needed in order to make it better understood, more accessible, and able to assess the addition of multiple taxa.