ReBIL: Relating Biological Information through Literature

Francisco M. Couto1, Pedro Coutinho2, Mário J. Silva
1fjmc@di.fc.ul.pt, Faculdade de Ciencias, Universidade de Lisboa; 2pedro@afmb.cnrs-mrs.fr, UMR 6098, Architecture et Fonction des Macromolécules Biologiques, CNRS

ReBIL aims to improve the efficiency of information extraction systems applied to biological literature by exploiting the correlation between structural and functional classifications of gene products. Our approach rests on the molecular biology dogma that sequences should be correlated with their biological activity. We developed a new method that evaluates extracted information by checking whether gene products from a common family share a common set of biological properties. To evaluate the method, we developed an information extraction system that automatically annotates carbohydrate-active enzymes from CAZy (public interface available at http://afmb.cnrs-mrs.fr/CAZY) with functional properties extracted from the literature. CAZy assigns each carbohydrate-active enzyme to one or more families of catalytic and carbohydrate-binding modules according to its modular structure. The Gene Ontology (GO) structures the functional properties as a graph. To compute the relatedness between functional properties, we implemented a semantic similarity measure over GO. So far, we have measured a correlation between the modular structures and the functional properties of annotations automatically extracted from the literature. This result shows that our method is a viable approach for automatic validation of extracted biological information. More information about the ReBIL project is available at http://xldb.fc.ul.pt/rebil/.
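The validation idea can be illustrated with a minimal sketch. The abstract does not specify which semantic similarity measure ReBIL implements, so the example below substitutes a simple Jaccard overlap of ancestor sets as a stand-in. The toy ontology, the term names, and the functions `ancestors`, `similarity`, and `family_coherence` are all hypothetical and are not taken from GO, CAZy, or the ReBIL system.

```python
from itertools import combinations

# Hypothetical toy ontology: term -> set of parent terms (a graph, as in GO).
# These term names are illustrative only; they are not real GO identifiers.
PARENTS = {
    "root": set(),
    "catalytic": {"root"},
    "binding": {"root"},
    "hydrolase": {"catalytic"},
    "glycosidase": {"hydrolase"},
    "cellulase": {"glycosidase"},
    "xylanase": {"glycosidase"},
    "carb_binding": {"binding"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a, b):
    """Assumed stand-in measure: Jaccard overlap of the two ancestor sets."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def family_coherence(terms):
    """Mean pairwise similarity of the terms annotated to one family."""
    pairs = list(combinations(terms, 2))
    if not pairs:
        return 1.0  # a single annotation is trivially coherent
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


# Two closely related annotations score higher than two unrelated ones.
coherent = family_coherence(["cellulase", "xylanase"])
incoherent = family_coherence(["cellulase", "carb_binding"])
print(coherent > incoherent)
```

Under these assumptions, a family whose extracted annotations have low coherence would be a candidate for containing wrongly extracted properties, while high coherence would support the extracted annotations.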