Structural Classification in the Gene Ontology

Cliff Joslyn (1), Susan Mniszewski (2), Andy Fulmer, Gary Heaton
(1) joslyn@lanl.gov, Los Alamos National Laboratory; (2) smm@lanl.gov, Los Alamos National Laboratory

Use of ontological structures such as the Gene Ontology (GO) [1] is increasingly a standard part of a typical biologist's work day. We have been pursuing work in structural classification of the GO: given a list of genes of interest, how are they organized with respect to the GO? Are they centralized, dispersed, or grouped in one or more clusters? With respect to the biological functions which make up the GO, do the genes represent a collection of more general or more specific functions, a coherent collection of functions or a set of distinct functions? Existing approaches to these questions [2,3] have relied on the statistics of how ontology nodes are generally populated, and/or have used a distance based on the minimal path length between two nodes [4].

Our approach [5] is based on the following principles:

We present our approach to structural classification in the GO based on pseudo-distances in posets. Our system, the Gene Ontology Clusterer (GOC), uses pseudo-distances between comparable nodes only, in conjunction with scoring algorithms, to rank-order the GO nodes with respect to the requested genes. We will also present the lessons we have learned about working with the GO, in particular the following kinds of issues:

[1] Ashburner, M; Ball, CA; and Blake, JA et al.: (2000) ``Gene Ontology: Tool For the Unification of Biology'', Nature Genetics, 25:1, pp. 25-29
[2] Lord, Phillip; Stevens, Robert; and Brass, A et al.: (2002) ``Semantic Similarity Measures Across the Gene Ontology: Relating Sequence to Annotation'', in: Proc. Intelligent Systems for MicroBiology (ISMB 02)
[3] Resnik, Philip: (1999) ``Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems in Ambiguity in Natural Language'', J. Artificial Intelligence Research, v. 11, pp. 95-130
[4] Rada, Roy; Mili, Hafedh; and Bicknell, E et al.: (1989) ``Development and Application of a Metric on Semantic Nets'', IEEE Trans. on Systems, Man and Cybernetics, 19:1, pp. 17-30
[5] Joslyn, Cliff; Mniszewski, Susan; and Fulmer, A, et al.: (2003) ``Measures on Ontological Spaces of Biological Function'', Pacific Symposium on Biocomputing (PSB 03), ftp://ftp.c3.lanl.gov/pub/users/joslyn/psb03f.pdf
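
The idea in the abstract of measuring pseudo-distances between comparable nodes only, and then scoring nodes against a gene list, can be illustrated with a minimal sketch. This is not the GOC implementation. It assumes a toy poset with hypothetical node names (not real GO identifiers), takes the minimum and maximum upward path lengths as illustrative pseudo-distances, and uses an invented scoring rule (the sum of 1/(1 + minimum path length) over annotated nodes lying at or below a candidate) purely for demonstration.

```python
from functools import lru_cache

# Toy poset as child -> list of parents. Node names are hypothetical,
# not real GO identifiers.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["A"],
    "E": ["C", "A"],
}


@lru_cache(maxsize=None)
def chain_lengths(node, ancestor):
    """Return (min, max) edge counts over all upward paths from node to
    ancestor, or None when ancestor is not at or above node."""
    if node == ancestor:
        return (0, 0)
    spans = [chain_lengths(parent, ancestor) for parent in PARENTS[node]]
    spans = [s for s in spans if s is not None]
    if not spans:
        return None  # no upward path: treated as not comparable here
    return (1 + min(s[0] for s in spans), 1 + max(s[1] for s in spans))


def rank_nodes(annotations):
    """Rank every node against a gene -> node annotation map.

    The score is an assumed, illustrative rule: each annotated node at or
    below the candidate contributes 1 / (1 + minimum path length).
    Nodes that are not below the candidate contribute nothing.
    """
    scores = {}
    for candidate in PARENTS:
        total = 0.0
        for node in annotations.values():
            span = chain_lengths(node, candidate)
            if span is not None:
                total += 1.0 / (1 + span[0])
        scores[candidate] = total
    # Highest score first; ties broken alphabetically by node name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    genes = {"g1": "E", "g2": "D", "g3": "C"}  # hypothetical annotations
    print(chain_lengths("E", "root"))
    print(chain_lengths("D", "B"))
    for name, score in rank_nodes(genes):
        print(name, round(score, 3))
```

In this sketch, nodes with no upward path between them yield None rather than a number, which mirrors the restriction to comparable nodes. A plain shortest-path distance, by contrast, would assign a value to any pair of nodes that share a common ancestor.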