PyPop: A framework for large-scale population genomics analysis
Alex Lancaster (1), Mark P. Nelson (2), Richard M. Single, Diogo Meyer, Glenys Thomson
(1) alexl@socrates.berkeley.edu, UC Berkeley; (2) UC Berkeley
PyPop (Python for Population Genetics) is a suite of programs for the
large-scale analysis of multi-locus population genetic data. It
includes tests for conformity to Hardy-Weinberg proportions (HWP),
tests for balancing or directional selection, estimates of haplotype
frequencies, and measures of linkage disequilibrium (LD) with tests of
their significance. It can also interoperate with other population
genetic packages such as Arlequin. PyPop is an object-oriented
framework implemented in Python. It was originally developed to
analyze the highly polymorphic HLA region of the human genome, but can
be used for any multi-locus data. Analysis output is stored in XML,
which can then be transformed into other formats: input for other
programs (such as PHYLIP), for spreadsheets, or for statistical
packages (such as R), as well as plain text or HTML for viewing.
Storing the output in XML allows the final viewable output format to
be redesigned at will without re-running the time-consuming
statistical tests. The XML output also facilitates the processing
of results from analyses on large numbers of populations. PyPop will
be made freely available under the GNU GPL at:
http://allele5.biol.berkeley.edu/pypop/
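
The idea of re-rendering stored XML results without repeating the
analysis can be illustrated with a minimal sketch. The element and
attribute names below (dataanalysis, population, locus, hwp_pvalue),
the function name xml_to_tsv, and the sample values are all
hypothetical and chosen only for illustration; they are not PyPop's
actual output schema or results. The sketch uses only the Python
standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up XML for illustration only; PyPop's real output schema may differ.
SAMPLE_XML = """\
<dataanalysis>
  <population name="PopA">
    <locus name="A"><hwp_pvalue>0.412</hwp_pvalue></locus>
    <locus name="B"><hwp_pvalue>0.031</hwp_pvalue></locus>
  </population>
  <population name="PopB">
    <locus name="A"><hwp_pvalue>0.780</hwp_pvalue></locus>
  </population>
</dataanalysis>
"""


def xml_to_tsv(xml_text):
    """Flatten per-population, per-locus results into tab-separated text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["population", "locus", "hwp_pvalue"])
    # One output row per (population, locus) pair found in the XML.
    for pop in root.iter("population"):
        for locus in pop.iter("locus"):
            writer.writerow(
                [pop.get("name"), locus.get("name"), locus.findtext("hwp_pvalue")]
            )
    return out.getvalue()


if __name__ == "__main__":
    # The stored XML is only re-read here; no statistics are recomputed.
    print(xml_to_tsv(SAMPLE_XML), end="")
```

The resulting tab-separated text could be opened in a spreadsheet or
read into a statistical package. A different view (for example HTML)
would need only a different rendering function over the same stored
XML.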