ExMI: Extracting Molecular Interaction from Large Biomedical Literature

Yoshihiro Ohta1, Tohru Natume2, Tetsuo Nishikawa3, Hiroko Ohi4, Tohru Hisamitsu5
1 yoh@crl.hitachi.co.jp, Hitachi, Ltd., Central Research Laboratory; 2 natsume@jbirc.aist.go.jp, National Institute of Advanced Industrial Science and Technology; 3 nisikawa@crl.hitachi.co.jp, Hitachi, Ltd., Central Research Laboratory; 4 h-ohi@crl.hitachi.co.jp, Hitachi, Ltd., Central Research Laboratory; 5 hisamitu@harl.hitachi.co.jp, Hitachi, Ltd., Central Research Laboratory

One Page Abstract:

Information on the expression of molecules such as genes, proteins, and low-molecular-weight compounds, and on the interactions among them, is a key issue in several related fields. Because this information helps researchers uncover the mechanisms of diseases and develop novel treatments or new drugs, those involved in mass analysis also regard it as significant. Such studies depend on previously acquired knowledge, which is usually recorded in the literature. Consequently, there have been attempts to extract interaction information automatically from biomedical literature stored in databases. These extraction systems require several functions, including recognizing molecule names, extracting interaction events, and visualizing the results so that users can understand them as a whole. We examined each of these functions in detail so that interaction information could be extracted from large-scale literature databases, and developed a system called ExMI.

The first requirement is to recognize molecule names and the words that describe interaction events. We prepared a list of keywords, each indicating either a molecule (Name Key) or an interaction (Event Key); we call this list the Keyword List for Interaction Extraction (KLIE). Name Keys and Event Keys were collected from public databases under the guidance of experts. This enables molecule names to be extracted from the literature automatically. After extracting molecule names with KLIE, we built templates containing the sentence patterns that describe interactions. The extraction results are displayed through a graphical interface called IntView, which shows the relationships among the extracted molecules as a network. IntView displays related information dynamically at the user's request, so users can easily follow even complicated relationships.
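The keyword lookup and template step can be illustrated with a minimal sketch. This is not ExMI's implementation: the abstract does not describe KLIE's format or the template syntax, so the dictionary entries, canonical names, longest-match lookup, and the single NAME EVENT NAME template below are all assumptions chosen for illustration. The hypothetical template matches the sequence of recognized keys and skips unrecognized words in between, which is much looser than real sentence patterns would be.

```python
import re

# Hypothetical miniature KLIE: lower-cased surface form -> (key type, canonical name).
# "NAME" stands for a Name Key and "EVENT" for an Event Key. Entries are illustrative only.
KLIE = {
    "p53": ("NAME", "TP53"),
    "tp53": ("NAME", "TP53"),
    "mdm2": ("NAME", "MDM2"),
    "cyclin d1": ("NAME", "CCND1"),
    "cdk4": ("NAME", "CDK4"),
    "binds": ("EVENT", "bind"),
    "binds to": ("EVENT", "bind"),
    "activates": ("EVENT", "activate"),
    "inhibits": ("EVENT", "inhibit"),
}
# Length (in words) of the longest multi-word entry.
MAX_TERM_WORDS = max(len(term.split()) for term in KLIE)


def tag(sentence):
    """Longest-match dictionary lookup; returns (key type, canonical) tags in order."""
    words = re.findall(r"[a-z0-9-]+", sentence.lower())
    tags, i = [], 0
    while i < len(words):
        # Try the longest candidate term first, then progressively shorter ones.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            term = " ".join(words[i:i + n])
            if term in KLIE:
                tags.append(KLIE[term])
                i += n
                break
        else:
            i += 1  # word is not in the keyword list; skip it
    return tags


def extract(sentence):
    """Apply one hypothetical template, NAME EVENT NAME, to consecutive tags."""
    tags = tag(sentence)
    triples = []
    for (t1, a), (t2, event), (t3, b) in zip(tags, tags[1:], tags[2:]):
        if (t1, t2, t3) == ("NAME", "EVENT", "NAME"):
            triples.append((a, event, b))
    return triples


print(extract("Cyclin D1 activates CDK4."))
# [('CCND1', 'activate', 'CDK4')]
print(extract("MDM2 binds to p53 and inhibits its activity."))
# [('MDM2', 'bind', 'TP53')]
```

Trying the longest entry first lets a multi-word Name Key such as "cyclin d1" win over any shorter entry that starts at the same word.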

ExMI consists of three parts: extraction of molecule names, extraction of interactions, and display of the results as a network. For molecule-name extraction, we built a KLIE containing about 400 thousand entries; for interaction extraction, we used the Event Keys. As a result, about 300 thousand interactions were extracted from the target papers (those published from 1975 to 2002). Existing public databases contain only about 10,000 interactions, so our result, roughly 30 times as many, indicates a significant improvement over these resources.
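The third part, displaying the results as a network, can be sketched as aggregating extracted interactions into an adjacency structure and expanding outward from a molecule the user selects. This is only an assumption about how such a view might be organized; the abstract does not describe IntView's internals. The sketch assumes interactions arrive as (molecule, event, molecule) triples, treats the network as undirected, and uses a breadth-first expansion as a stand-in for displaying related information on demand. The function names and example molecules are hypothetical.

```python
from collections import defaultdict


def build_network(triples):
    """Aggregate (molecule, event, molecule) triples into an undirected network.

    Returns a mapping from each molecule to its neighbours, and a mapping from
    each unordered molecule pair to the set of event types linking them.
    """
    neighbours = defaultdict(set)
    edge_events = defaultdict(set)
    for a, event, b in triples:
        neighbours[a].add(b)
        neighbours[b].add(a)
        edge_events[frozenset((a, b))].add(event)
    return neighbours, edge_events


def neighbourhood(neighbours, start, depth=1):
    """Molecules reachable from `start` within `depth` hops (breadth-first)."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for nb in neighbours.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return seen


# Hypothetical extraction output, for illustration only.
EXAMPLE_TRIPLES = [
    ("MDM2", "bind", "TP53"),
    ("TP53", "activate", "CDKN1A"),
    ("MDM2", "inhibit", "TP53"),
    ("CCND1", "activate", "CDK4"),
]

net, events = build_network(EXAMPLE_TRIPLES)
print(sorted(net["TP53"]))                           # ['CDKN1A', 'MDM2']
print(sorted(events[frozenset(("MDM2", "TP53"))]))   # ['bind', 'inhibit']
print(sorted(neighbourhood(net, "CDKN1A", depth=2))) # ['CDKN1A', 'MDM2', 'TP53']
```

Expanding one hop at a time keeps the displayed subnetwork small, which matters when the full network holds hundreds of thousands of interactions.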