QUINTET: An R-based unified cDNA microarray data analysis system with a graphical user interface
Tae-Hoon Chung¹, Cheol-Goo Hur², Sun Yong Park, Hyo Soo Lee
¹thcng@kribb.re.kr, KRIBB; ²hurlee@kribb.re.kr, KRIBB
DNA microarrays are widely regarded as the de facto standard technology for high-throughput functional genomics in the post-genome era. For this technology to be fruitful, analyzing the data coherently and reliably is as important as producing high-quality, trustworthy data in the first place. Because of its high-throughput nature, reliable analysis of microarray data requires proficiency in numerous statistical techniques together with broad biological knowledge. Consequently, productive and effective analysis of microarray data calls for a system that offers numerical capabilities on par with commercial statistical packages as well as access to a wide range of pertinent biological information through general-purpose databases. Furthermore, because new algorithms and approaches are still evolving and appear almost daily, the system should be flexible enough that these new techniques can be tried out readily with minimal modification. In these respects, many commercial DNA microarray analysis packages fall short, so we set up a small-scale project to build our own microarray data analysis software suite. As a first step, we present QUINTET, an R-based unified cDNA microarray data analysis system. QUINTET seamlessly integrates five indispensable categories of data analysis: (1) a standard set of data quality assessments; (2) data preprocessing, including filtering of faulty spots and normalization of the data distribution; (3) identification of differentially expressed genes using various algorithms; (4) clustering of gene expression profiles; and (5) classification of samples using a small set of gene expression patterns. Because it reads slide data from plain-text files, which virtually all scanning software can export, the system can analyze essentially any cDNA microarray data set.
Because the system is built on R, which has become an integral tool for the statistical analysis of microarray data, it relies on well-established statistical routines, and it is highly flexible: users can freely modify it for their own needs and purposes. In addition to conventional algorithms, the package implements a number of improved and new algorithms. The system also produces numerous plots and text summaries that can be incorporated directly into research papers. Finally, unlike typical open-source programs, QUINTET provides a graphical user interface that makes microarray data analysis easier to learn and to carry out.
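To make the preprocessing category more concrete, here is a minimal sketch of filtering faulty spots and then normalizing the log-ratio distribution of a single two-channel slide. It is written in Python purely for illustration; QUINTET itself is implemented in R and this is not its code. The spot record fields (`red`, `green`, `flag`), the `min_intensity` threshold, the convention that a nonzero flag marks a faulty spot, and the choice of global median normalization (rather than, for example, intensity-dependent lowess normalization) are all hypothetical assumptions.

```python
import math
from statistics import median


def preprocess(spots, min_intensity=100.0):
    """Filter faulty spots, then globally median-normalize log2 ratios.

    Assumed (hypothetical) input: a list of dicts with background-subtracted
    channel intensities 'red' and 'green' and a scanner quality 'flag',
    where any nonzero flag marks a faulty spot.
    Returns a list of (spot_index, normalized_M) for the retained spots.
    """
    kept = []
    for i, spot in enumerate(spots):
        # Drop spots the scanner flagged as faulty.
        if spot["flag"] != 0:
            continue
        # Drop spots too dim in either channel to give a stable ratio.
        if spot["red"] < min_intensity or spot["green"] < min_intensity:
            continue
        # M = log2(red / green), the usual two-channel log ratio.
        kept.append((i, math.log2(spot["red"] / spot["green"])))
    if not kept:
        return []
    # Global median normalization: shift all M values so their median is 0.
    center = median([m for _, m in kept])
    return [(i, m - center) for i, m in kept]


# Hypothetical example slide with five spots.
spots = [
    {"red": 400.0, "green": 200.0, "flag": 0},    # M = 1
    {"red": 800.0, "green": 200.0, "flag": 0},    # M = 2
    {"red": 300.0, "green": 300.0, "flag": 0},    # M = 0
    {"red": 50.0, "green": 400.0, "flag": 0},     # too dim, filtered out
    {"red": 900.0, "green": 100.0, "flag": -50},  # flagged, filtered out
]

print(preprocess(spots))  # [(0, 0.0), (1, 1.0), (2, -1.0)]
```

Centering on the median sets the typical spot's log ratio to zero, which corrects a uniform dye or scanner bias between the two channels; biases that depend on spot intensity or print-tip location would need a more elaborate method.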