The Encyclopedia of Life (EOL) Project
Phil Bourne [1], Wilfred Li [2], Baldridge, K.; Baru, C.; Byrnes, R.; Clingman, E.; Cotofana, C.; Ferguson, C.; Fountain, A.; Greenberg, J.; Jermanis, D.; Matthews, J.; Miller, M.; Mitchell, J.; Mosley, M.; Pekurovsky, D.; Quinn, G.B.; Reyes, V.; Rowley, J.; Shindyalov, I.; Smith, C.; Stoner, D.; Veretnik, S.
[1] bourne@sdsc.edu, San Diego Supercomputer Center; [2] wilfred@sdsc.edu, San Diego Supercomputer Center
There are currently more than 800 genomes for which sequence data is
publicly available. Accompanying this massive supply of genomic data is a
need to annotate putative protein sequences from structural and functional
points of view. The Encyclopedia of Life (EOL) is an ambitious project to
extensively catalog the complete proteome of every living species in a
flexible, powerful reference system. An open collaboration led by the San Diego
Supercomputer Center, EOL will generate biological insight using the world's
foremost academic computational resources. This work includes calculating
three-dimensional models and assigning biological function for all
recognizable proteins in all currently known genomes.
Central to EOL genomic data processing is the use of an annotation pipeline,
a computationally intensive process utilizing Grid, supercomputer and cluster
computing resources. Two important issues in the pipeline process are
automation and the associated automated quality assessment. In the pipeline
model, these were addressed by introducing six reliability categories,
building a benchmark based on 1000 non-redundant SCOP folds, and testing a
variety of search conditions and methods within the benchmark.
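As a rough illustration of how reliability categories and a fold benchmark
can work together, the sketch below bins each search hit into one of six
categories and reports, per category, the fraction of hits whose predicted
fold matches the benchmark's known fold. This is not the EOL pipeline's
actual scheme. The E-value cutoffs, the category labels A to F, and the
function names are hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical E-value upper bounds for categories A..E; anything weaker is F.
# The abstract does not state the real category definitions.
CUTOFFS = [1e-20, 1e-10, 1e-5, 1e-2, 1.0]
LABELS = "ABCDEF"  # six reliability categories, most to least reliable


def reliability_category(evalue):
    """Return the first category whose cutoff the E-value satisfies."""
    for label, cutoff in zip(LABELS, CUTOFFS):
        if evalue <= cutoff:
            return label
    return LABELS[-1]


def accuracy_by_category(hits, true_fold):
    """Fraction of correct fold assignments within each category.

    hits: iterable of (query_id, predicted_fold, evalue) tuples
    true_fold: dict mapping query_id to the known benchmark fold
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for query_id, predicted, evalue in hits:
        category = reliability_category(evalue)
        total[category] += 1
        if true_fold[query_id] == predicted:
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}
```

Running each candidate search method and parameter setting through such a
function on the same benchmark would show how well each category's label
predicts a correct assignment, which is the kind of comparison an automated
quality assessment needs.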
Scientists will be able to uncover the prevalence of a given protein across
all kingdoms of life, molecular interactions with that protein, and whether
the function of the protein varies across species. EOL caters to a diversity
of users, from researchers interested in proteomic associations, to
undergraduates wishing to know the name and function of proteins associated
with a particular organism, and even to elementary school students learning
about proteins for the first time. For further information about the EOL
project and to access the beta development version, point your web browser
to: http://www.eolproject.info