Extracting Transcription Factor Interactions from Medline Abstracts
Marc Light¹, Robert Arens, Vladimir Leontiev, Meredith Patterson, Xinying Qiu, Hudong Wang
¹ marc-light@uiowa.edu, University of Iowa
Staying abreast of research on transcription factors (TFs) is currently a difficult task for biologists: the body of research is already too large and is still growing. We are building a system that will automatically extract TF interactions from Medline abstracts and populate a database table with them, using a number of computational linguistics modules. The project is in its early stages, but we have manually annotated a corpus of Medline abstracts that note TF interactions. We have also formally evaluated, on Medline abstracts, a number of component technologies that are likely to be useful for the task, e.g., a tokenizer, part-of-speech tagger, word sense disambiguator, and syntactic parser. The corpus currently consists of 97 positive examples (abstracts noting TF interactions) and 784 negative examples. In addition, for each positive abstract, the sentences that imply an interaction have been marked, along with the interacting TFs. In this poster we present the corpus, the curation process, the evaluations of the component technologies, and our proposed system design.
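To make the corpus description concrete, the following is a minimal Python sketch of how one annotated abstract and the resulting interaction table rows might be represented. It is an illustration only, not the project's actual schema: the class names, field names, the pairwise expansion of marked TFs, and the placeholder identifiers (`PMID-0001`, `TF_A`, `TF_B`) are all assumptions. Only the counts 97 and 784 come from the abstract.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class MarkedSentence:
    """A sentence marked as implying an interaction (hypothetical layout)."""
    text: str
    interacting_tfs: Tuple[str, ...]  # TF names marked in this sentence


@dataclass
class AnnotatedAbstract:
    """One corpus entry; field names are assumptions for illustration."""
    pmid: str
    is_positive: bool  # True if the abstract notes a TF interaction
    marked_sentences: List[MarkedSentence] = field(default_factory=list)


def interaction_rows(abstracts):
    """Flatten positive abstracts into (pmid, tf1, tf2, sentence) rows.

    Assumes every pair of TFs marked in the same sentence forms one row.
    """
    rows = []
    for abstract in abstracts:
        if not abstract.is_positive:
            continue  # negative examples contribute no interactions
        for sentence in abstract.marked_sentences:
            for tf1, tf2 in combinations(sentence.interacting_tfs, 2):
                rows.append((abstract.pmid, tf1, tf2, sentence.text))
    return rows


def positive_fraction(n_positive, n_negative):
    """Share of positive examples in the corpus."""
    return n_positive / (n_positive + n_negative)


# Placeholder data, not taken from the real corpus.
example = [
    AnnotatedAbstract(
        pmid="PMID-0001",
        is_positive=True,
        marked_sentences=[MarkedSentence("TF_A binds TF_B.", ("TF_A", "TF_B"))],
    ),
    AnnotatedAbstract(pmid="PMID-0002", is_positive=False),
]

rows = interaction_rows(example)
corpus_size = 97 + 784                 # counts reported in the abstract
share = positive_fraction(97, 784)     # roughly 11% of abstracts are positive
```

With the reported counts, positive examples make up about 11% of the 881 abstracts, so any classifier trained on this corpus would need to account for the class imbalance.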