Integrated Storage For Microarray Experimental Data

Supawan Prompramote1, Yi-Ping Phoebe Chen2, Frederic Maire
1s.prompramote@student.qut.edu.au, Centre for Information Technology Innovation, Faculty of Information Technology, Queensland University of Technology; 2p.chen@qut.edu.au, Centre for Information Technology Innovation, Faculty of Information Technology, Queensland University of Technology

Microarray technology is one of the most recent and important experimental breakthroughs in molecular biology. It allows researchers to study the interactions among thousands of genes simultaneously. This technology will have a significant impact on areas of genomic study such as drug discovery, toxicological research, disease diagnosis, and gene discovery.

An experiment typically requires tens or hundreds of microarrays, and a single microarray generates between one hundred thousand and a million pieces of data. Organizing the huge volume of data produced by microarray techniques is therefore one of the biggest challenges facing scientists in bioinformatics. Today there is a limited number of efficient, publicly available tools for storing microarray data. However, each has its own storage structure and implementation, with differences in hardware platforms, DBMS, data models and data languages. In addition, these databases were created by different developers, who often use different terminologies to describe the same domain or concepts because there is no common shared microarray ontology. Developers may also use the same term with different meanings. Together, these inconsistencies limit the sharing of data with other laboratories and the combination of results from other experiments.

To address these problems, we have investigated suitable methodologies and tools in the area of data management and analysis to assist in developing a unified storage solution for DNA microarray data that helps biologists browse a database and perform complex queries. By linking and integrating related data from different public microarray sources, the database system will provide users with a consistent view of the data. This will allow researchers to pose a single query and receive a single unified answer. Unlike past work on database interoperation in the bioinformatics community, this database design takes into account the important issues of microarray data integration, including semantic conflicts and contradictions. These problems stem from the lack of commonly shared ontologies in the microarray community and from the changing data representations of microarray data sources, both of which reflect the infancy of microarray technology. Moreover, microarray ontologies are still evolving.
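
The idea of posing a single query over sources that use different terminologies can be illustrated with a minimal sketch. It is not the system described in this paper. The two sources, their field names, the shared vocabulary (FIELD_MAP, VALUE_MAP) and the function names are all hypothetical, chosen only to show how a mapping layer could resolve naming conflicts before a query is answered.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical records from two sources that name the same concepts differently.
SOURCE_A: List[Record] = [
    {"gene_id": "G1", "signal": 2.5, "organism": "human"},
    {"gene_id": "G2", "signal": 0.4, "organism": "human"},
]
SOURCE_B: List[Record] = [
    {"probe_gene": "G1", "intensity": 2.1, "species": "Homo sapiens"},
    {"probe_gene": "G3", "intensity": 1.7, "species": "Mus musculus"},
]

# Assumed shared vocabulary: each source field maps to one common term.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "source_a": {"gene_id": "gene", "signal": "expression", "organism": "species"},
    "source_b": {"probe_gene": "gene", "intensity": "expression", "species": "species"},
}

# Assumed value-level synonyms, keyed by the shared term they belong to.
VALUE_MAP: Dict[str, Dict[Any, Any]] = {"species": {"human": "Homo sapiens"}}


def normalize(record: Record, source: str) -> Record:
    """Rewrite one source record into the shared vocabulary."""
    out: Record = {"source": source}
    for field, value in record.items():
        shared = FIELD_MAP[source][field]
        # Replace a known synonym with its shared value; otherwise keep the value.
        out[shared] = VALUE_MAP.get(shared, {}).get(value, value)
    return out


def unified_query(predicate: Callable[[Record], bool]) -> List[Record]:
    """Answer one query over both sources through the shared vocabulary."""
    rows = [normalize(r, "source_a") for r in SOURCE_A]
    rows += [normalize(r, "source_b") for r in SOURCE_B]
    return [r for r in rows if predicate(r)]


if __name__ == "__main__":
    for row in unified_query(lambda r: r["species"] == "Homo sapiens"):
        print(row)
```

In this sketch the query is written once against the shared terms ("gene", "expression", "species"), and each result keeps a "source" field so that its origin remains traceable. A real integration would also have to handle conflicting measurements and evolving ontologies, which this example does not attempt.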