SPACE-BLAST: Linux Cluster based Biological Sequence Parallel Processing system Summarized by Gene Ontology

Mihwa Park (1), Jaewoo Kim (2), Hyungsuk Won, Seungsik Yoo
(1) bfpark@posdata.co.kr, POSDATA; (2) jaewoo@posdata.co.kr, POSDATA

The growth of genome projects such as the Human Genome Project and the development of high-throughput sequencing technology have produced massive volumes of DNA sequence data, such as Expressed Sequence Tags (ESTs). Efficient and effective ways to analyze these sequences are therefore in high demand. Most researchers use BLAST (Basic Local Alignment Search Tool), the sequence similarity search system developed by NCBI (National Center for Biotechnology Information). However, the accumulation of massive numbers of sequences, the size of the databases to be searched, and the complexity of BLAST search results make BLAST difficult to use. BLAST search systems therefore need to deliver results quickly and in a systematized form. In this paper, we present a new model that searches DNA sequences quickly and summarizes massive amounts of search results. SPACE-BLAST (Super PArallel Computer Engine for BLAST) is a high-performance bioinformatics system that runs NCBI's BLAST with parallel processing on a low-cost Linux cluster to search DNA sequences at high speed. In addition, Gene Ontology is applied to summarize the large volume of BLAST search results. By shortening the time needed to develop new drugs and find new genes, this model can play a key role in biotechnology areas such as agriculture, chemistry, medicine, and the environment. Performance tests demonstrate the efficiency of the parallel processing model, showing that it minimizes the waiting time before results can be analyzed. SPACE-BLAST is available at http://space-blast.posdata.co.kr.
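The two ideas named in the abstract, distributing a BLAST workload across cluster nodes and summarizing hits by Gene Ontology term, can be illustrated with a minimal Python sketch. The abstract does not state how SPACE-BLAST partitions its work, so the round-robin split of query sequences used here is only an assumption, and the function names (parse_fasta, partition, summarize_by_go) and the gene-to-GO mapping are hypothetical. The sketch does not invoke BLAST itself; it shows only the bookkeeping around a search.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Tuple[str, str]  # (query identifier, nucleotide sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (identifier, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition(records: Sequence[Record], n_workers: int) -> List[List[Record]]:
    """Split queries round-robin into one batch per worker node.

    Assumption: query-level partitioning; the original system may
    instead split the database or use another scheme.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [list(records[i::n_workers]) for i in range(n_workers)]


def summarize_by_go(
    hits: Iterable[Tuple[str, str]],
    gene_to_go: Dict[str, Sequence[str]],
) -> Counter:
    """Count hits per GO term.

    hits: (query identifier, matched gene) pairs, e.g. the best hit
    per query collected from every worker's search output.
    gene_to_go: hypothetical mapping from gene name to GO term IDs.
    """
    counts: Counter = Counter()
    for _query, gene in hits:
        for term in gene_to_go.get(gene, ()):
            counts[term] += 1
    return counts
```

In such a setup, each batch returned by partition would be written to its own FASTA file and searched on a separate node, and the merged hits would then be passed to summarize_by_go so that a researcher reads a short table of GO term counts rather than every individual alignment.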