Correlation analysis as a preprocessing tool in clustering of time-course gene expression data
Christopher Bowman1, Richard Baumgartner2, Stephanie Booth
1Christopher.Bowman@nrc-cnrc.gc.ca, Institute for Biodiagnostics; 2Richard.Baumgartner@nrc-cnrc.gc.ca, Institute for Biodiagnostics
Correlation analysis is frequently used to explore time-courses (TCs) in functional magnetic resonance imaging (fMRI) (Bandettini et al., 1993), a technique that measures blood oxygenation levels at many (thousands of) spatial locations in the brain simultaneously over time. Although gene expression and fMRI have little in common physically, as data they behave similarly: both consist of a large number of often highly correlated TCs that must be grouped according to their temporal behavior.
In correlation analysis, the TCs are ranked according to their correlation coefficient (usually Pearson's) with each of several pre-specified template TCs. Template TCs enable us to incorporate prior knowledge (when available) about the expected temporal behavior of a gene of interest. Using rank-ordered data (i.e. using Spearman's correlation coefficient instead of Pearson's) is potentially useful in revealing trends in the TCs.
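The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pearson`, `ranks`, `spearman`, `rank_by_template`), the five-point template, and the three toy TCs are all hypothetical and chosen only for illustration. Ties in the rank-ordered data are assumed to receive average ranks, and a constant TC is assumed to have correlation 0.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # Assumption: a constant TC (zero variance) is given correlation 0.
    return num / den if den > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-ordered data."""
    return pearson(ranks(x), ranks(y))


def rank_by_template(tcs, template, corr=pearson):
    """Return (index, coefficient) pairs, most correlated with template first."""
    scores = [(i, corr(tc, template)) for i, tc in enumerate(tcs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical template: expression that rises steadily over five time points.
template = [0.0, 1.0, 2.0, 3.0, 4.0]
tcs = [
    [4.0, 3.0, 2.0, 1.0, 0.0],    # steadily falling
    [1.0, 2.0, 4.0, 8.0, 16.0],   # rising, but not linearly
    [2.0, 2.1, 1.9, 2.0, 2.05],   # roughly flat
]
ranking = rank_by_template(tcs, template)
ranking_spearman = rank_by_template(tcs, template, corr=spearman)
```

In this toy example the nonlinearly rising TC scores below 1 under Pearson's coefficient, whereas Spearman's coefficient depends only on the ordering of the values and so treats any strictly increasing TC as a perfect match to the rising template.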
To validate the utility of correlation analysis, we used it as a preprocessing tool prior to clustering analysis of publicly available time-course gene expression data (Iyer et al., 1999), available at http://genome-www.stanford.edu/serum/.
We gratefully acknowledge the support of the Manitoba Medical Service Foundation for partially funding this research.
References:
Bandettini PA et al: Processing strategies for time-course data sets in functional MRI of the human brain. Magn Reson Med. 1993 Aug;30(2):161-73.
Iyer VR, Eisen MB, Ross DT, Schuler G, Moore T, Lee JCF, Trent JM, Staudt LM, Hudson J Jr, Boguski MS, Lashkari D, Shalon D, Botstein D, Brown PO: The transcriptional program in the response of human fibroblasts to serum. Science. 1999 Jan 1;283(5398):83-7.