Uncertainty in phylogenetics

Inferring phylogenies is an inherently uncertain process because we usually have no more information to guide us than the sequences from our present-day taxa. Some sequences are more informative than others, and so provide us with better estimates of genealogical relationships.

Estimating our confidence in trees is itself a difficult problem. In other areas of bioinformatics we can examine metrics such as sensitivity and specificity, which assess the estimated or inferred results of a method against empirically known true positive and negative examples. However, this is not possible in the case of phylogenetics because we do not have examples of ‘known’ ancestral sequences and phylogenies.

Approaches for estimating confidence

There are several approaches that are commonly used to estimate our confidence in the inferred tree topology, including bootstrap, likelihood and Bayesian approaches. Whilst it is beyond the scope of this course to explain how these are estimated, we will explore basic aspects of their interpretation using the most widely used example, the (non-parametric) bootstrap.

Learn more about how bootstraps are computed in this tutorial by Sandra Baldauf (4).

How to read confidence estimates on a tree

Confidence estimates on a tree refer to the internal branches that they are shown on. An example is provided in Figure 11.

Confidence can be represented in multiple ways including assigning numbers out of 100 or a star-based system.
Figure 11 Two typical representations of confidence on part of a phylogeny.

Typically you will see bootstrap (or other confidence estimate) values shown on a phylogeny. For example, in the left-hand tree in Figure 11, the bootstrap values shown are from 100 replicates. Sometimes asterisks (*) are used instead to indicate branches where there is greater than 80% (or 90%) bootstrap support, as shown in the right-hand tree in Figure 11. In this example, the bootstrap value of 63 out of 100 (63%) is not represented by an asterisk because it is less than 80% support.
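The relationship between the two representations can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name support_label and its parameters are hypothetical and do not come from any phylogenetics package, and the default cut-off of 80% is just one of the two conventions mentioned above.

```python
def support_label(bootstrap_count, replicates=100, threshold=80.0):
    """Return '*' if a branch has more than `threshold` percent support.

    bootstrap_count: number of bootstrap replicates that recovered the branch
    replicates:      total number of bootstrap replicates (100 in Figure 11)
    threshold:       percentage cut-off (assumed 80 here; some authors use 90)
    """
    percent = 100.0 * bootstrap_count / replicates
    # "Greater than" the threshold, so a value equal to it gets no asterisk.
    return "*" if percent > threshold else ""


# The two bootstrap values discussed for Figure 11
for value in (100, 63):
    print(value, repr(support_label(value)))
```

With the 80% cut-off, the branch with a value of 100 receives an asterisk and the branch with a value of 63 does not. Passing threshold=90.0 applies the stricter convention instead.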

Interpreting confidence estimates on a tree

Interpreting the exact meaning of confidence values on a phylogeny is still an area of debate, but experts generally agree that we can accept branches with >80% bootstrap support (or 90% depending on who you ask!), provided that an appropriate evolutionary model was used to estimate the phylogeny.  

Taking the above approach enables us to make the following statements about Figure 11:

  • “there is consistent (100% bootstrap) support that taxa C and D are more closely related to each other than they are to B”
  • “from these data it is unclear whether B, C and D are each other’s closest relatives”