Computing accurate phylogenies from gene-order data
Jijun Tang (jtang@cs.unm.edu) and Bernard M.E. Moret (moret@cs.unm.edu)
University of New Mexico
DCM-GRAPPA is a highly accurate method for phylogeny reconstruction from
gene-order data that scales gracefully to one thousand genomes, greatly
extending the range and accuracy of existing methods.
DCM-GRAPPA combines the disk-covering method (DCM) of Warnow et al. with
the GRAPPA suite of software. GRAPPA, based on an approach pioneered by
Sankoff, is the most accurate method to date for phylogenetic
reconstruction from gene-order data, but is limited computationally
to 16 genomes. DCM-GRAPPA removes that limit, without losing accuracy,
through a two-step approach: it first decomposes the dataset into smaller
overlapping pieces and runs GRAPPA on the pieces; it then uses the strict
consensus method of DCM to produce a single tree from the overlapping
trees produced by GRAPPA. (Details of our work with DCM-GRAPPA appear
in this conference.)
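The merging step can be illustrated with a minimal sketch. It is not taken
from the GRAPPA or DCM sources: the nested-tuple tree representation and the
function names (leaves, bipartitions, strict_consensus_backbone) are
assumptions made for illustration. The sketch computes only the strict
consensus of the overlapping trees restricted to the genomes they share; a
full merger would also have to reattach the genomes that appear in only one
tree, which is omitted here.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (hypothetical format)."""
    if not isinstance(tree, tuple):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out


def bipartitions(tree, keep):
    """Non-trivial bipartitions induced by `tree` on the leaf set `keep`."""
    keep = frozenset(keep)
    anchor = min(keep)  # fixed reference leaf used to pick a canonical side
    splits = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node]) & keep
        below = frozenset()
        for child in node:
            below |= walk(child)
        other = keep - below
        if len(below) >= 2 and len(other) >= 2:
            # Store the side that does not contain the anchor leaf.
            splits.add(other if anchor in below else below)
        return below

    walk(tree)
    return splits


def strict_consensus_backbone(trees):
    """Shared leaves, and the bipartitions on them that every tree supports."""
    shared = set.intersection(*(leaves(t) for t in trees))
    common = set.intersection(*(bipartitions(t, shared) for t in trees))
    return shared, common


# Two overlapping subproblem trees; X and Y each occur in only one of them.
t1 = ((("A", "B"), "C"), ("D", ("E", "X")))
t2 = (("A", "B"), ("D", ("C", ("E", "Y"))))
shared, common = strict_consensus_backbone([t1, t2])
```

In this example the two trees agree on separating A and B from C, D and E
but disagree on how C, D and E are grouped, so only the split they agree on
is kept in the consensus.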
We also extended GRAPPA itself to handle limited amounts of duplication
and deletion among the genomes -- a necessary feature to work with real
datasets. Again, the resulting software produces reconstructions
accurate to within a few percent. (Details of our work with unequal gene
content will appear in the proceedings of the 8th Workshop on Algorithms
and Data Structures, WADS'03.)
GRAPPA and DCM-GRAPPA are available in source form at
http://www.cs.unm.edu/~moret/GRAPPA/