Goldman Group Softwares
RATS: Resource Aware Taxon SelectionThe phylogenetic diversity (PD) of a set of taxonomical units (e.g. genes, individuals, populations, species) is the total length of the evolutionary tree connecting them. This measure is relevant for choosing taxa in a variety of applications. In comparative genomics, the statistical power in testing various evolutionary hypotheses (e.g. low substitution rates) and in finding interesting genomic features (e.g. protein-coding genes, noncoding conserved elements) is strongly correlated with the total PD of the sequences being compared. Therefore, sequencing projects (both at the genome and gene level) should target taxa with a high total PD (see  and references therein). In biodiversity conservation, when not all species or populations in a geographical area can be protected, it is reasonable to concentrate conservation efforts on a subset with maximum PD .
Given the growing interest in PD, Fabio Pardi has worked on a hierarchy of optimisation problems where the aim is to select, from a collection of candidate taxa, a subset with maximum total PD. Depending on the nature of the constraints on this subset, the problems have varying computational complexity and different algorithmic solutions are devisable. When the aim is to select a fixed number of taxa, a simple greedy algorithm can be shown to produce optimal solutions . When different taxa require different amounts of resources (money, time, etc.) for their selection (sequencing or conservation) and we have a limit on the total amount of resources available, however, it transpires that the problem is more difficult. We have devised a novel dynamic programming algorithm that can compute the optimal solution efficiently .
This algorithm is implemented by the program rats. C++ sources and documentation are available here .
Our ideas can also be applied to a framework for bioconservation proposed by economists, called the Noah's Ark Problem or 'NAP' . The NAP explicitly models species-specific extinction risks and their relationships with the amount of resources used for conservation (see figure 1 below). The objective now is to find the optimal way to distribute the available resources among the species under consideration, so as to maximise the expected PD of the species that will survive until some specified future time. The algorithms mentioned above can be adapted to efficiently solve some special cases of the NAP . Solving more general formulations of the NAP is also possible, albeit at the expense of some efficiency .
Figure 1. A special case of the Noah's Ark Problem (NAP). Ten species of butterflies (related through the evolutionary tree shown) are threatened with extinction (survival probabilities in blue). Each of them can be saved from extinction by investing the amount of resources (e.g. millions of euros) shown in red. The solution of the NAP when the limit on the total investment is EUR220 million consists of conserving the three species in the red circles. The resulting expected phylogenetic diversity of the species that will survive (i.e. the expected length of the tree that will remain) is represented by the portion of the tree highlighted in blue. Note that this scenario is a fictional example, for illustrative purposes only.
 F. Pardi and N. Goldman. Species choice for comparative genomics: no need for cooperation. PLoS Genetics, 1:e71, 2005.
 G.M. Mace, J.L. Gittleman, and A. Purvis. Preserving the Tree of Life. Science, 300:1707, 2003.
 F. Pardi and N. Goldman. Resource aware taxon selection for maximizing phylogenetic diversity. Systematic Biology, submitted.
 M.L. Weitzman. The Noah's Ark Problem. Econometrica 66:1279-1298, 1998.
 F. Pardi and N. Goldman. An algorithmic solution to the Noah's Ark Problem. Manuscript in preparation.