The Evaluation of Different Approaches to Infer Positive Selection Sites

Li Jia1, Tao Jiang2, Michael Clegg
1lijia@cs.ucr.edu, University of California; 2jiang@cs.ucr.edu, University of California

In the study of protein function, it is essential to identify the single amino acid sites responsible for improving the relative fitness of their carriers' genotypes in the population, that is, the sites subject to positive selection. However, it is not trivial to infer the positive selection sites associated with the evolution of a gene family when negative (purifying) selection has played the major role in the phylogeny of that gene family. Several approaches have been proposed to detect positive selection at single amino acid sites (Fitch et al. 1997; Nielsen and Yang 1998; Suzuki and Gojobori 1999; Gu 2001; Jia, Clegg and Jiang 2003). We evaluated the performance of these approaches, in terms of their sensitivity and specificity in predicting positive selection sites, by conducting computer simulations and by analyzing the human leukocyte antigen (HLA) gene and the HIV-1 env gene. Several crucial parameters were varied in our tests, including the degree of positive selection, the number of taxa involved, the divergence of the gene sequences, and the type of homologs (orthologs, paralogs, or both). Benchmarks were assigned to these approaches so that researchers can choose an appropriate method for their positive selection analysis under given circumstances. Our initial results suggest that both the method of Jia, Clegg and Jiang and that of Suzuki and Gojobori identify positive selection sites accurately as long as the number of substitutions on each branch is relatively small. The false-positive rate for detecting the selective force was generally low. The true-positive rate, on the other hand, depended on the parameter settings. The supplementary materials will be made available at http://www.cs.ucr.edu/~lijia.
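As an illustration of the evaluation criteria named above, the following minimal Python sketch shows how sensitivity (true-positive rate), specificity, and false-positive rate could be computed for one simulated alignment, given the sites that were simulated under positive selection and the sites a method predicts. The function name `evaluate_site_predictions`, the 1-based site indexing, and the example numbers are assumptions made for illustration only; they are not taken from the study or its supplementary materials.

```python
def evaluate_site_predictions(true_sites, predicted_sites, n_sites):
    """Score predicted positive selection sites against the simulated truth.

    true_sites, predicted_sites: iterables of 1-based site indices (assumed
    convention). n_sites: total number of amino acid sites in the alignment.
    """
    true_sites = set(true_sites)
    predicted_sites = set(predicted_sites)
    valid = set(range(1, n_sites + 1))
    if not (true_sites <= valid and predicted_sites <= valid):
        raise ValueError("site indices must lie in 1..n_sites")

    tp = len(true_sites & predicted_sites)           # selected, and predicted
    fp = len(predicted_sites - true_sites)           # predicted, not selected
    fn = len(true_sites - predicted_sites)           # selected, but missed
    tn = n_sites - len(true_sites | predicted_sites) # neither

    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else nan,
    }


# Hypothetical example: 100 sites, 4 simulated under positive selection,
# 3 sites reported by a method (2 of them correct).
result = evaluate_site_predictions({5, 12, 40, 77}, {5, 12, 63}, n_sites=100)
print(result["sensitivity"])  # 0.5
```

In a simulation study of this kind, such per-alignment scores would typically be averaged over many replicates for each parameter setting (degree of selection, number of taxa, sequence divergence).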