Data to Diamonds: Multivariate Datamining Leads to Concise Gene

Rob Dunne1, Glenn Stone2
1Rob.Dunne@csiro.au, CSIRO; 2Glenn.Stone@csiro.au, CSIRO

Many microarray experiments are designed to find genes that relate to a particular target. For example, which genes distinguish diseased from non-diseased samples, or which genes relate to a survival or prognostic outcome?

Most commonly used data analysis methods study the association between the target and each gene one at a time, either as the primary analysis or as a preselection step for a more sophisticated method such as Support Vector Machines. Unfortunately, because preselection considers each gene in isolation, it can miss important interactions between genes and introduce bias into the results.
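
The point about missed interactions can be illustrated with a deliberately artificial example. The sketch below uses hypothetical toy data (two "genes" coded as -1 or +1 and a target that is 1 only when exactly one of them is high); none of the names or values come from the authors' work. Each gene on its own shows no correlation with the target, so a gene-by-gene filter would discard both, even though together they determine the target completely.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: two "genes" coded as low (-1) or high (+1),
# with every combination of levels observed five times.
gene_a = [-1, -1, +1, +1] * 5
gene_b = [-1, +1, -1, +1] * 5
# The target is 1 when exactly one of the two genes is high (an XOR pattern).
target = [1 if a != b else 0 for a, b in zip(gene_a, gene_b)]

# Gene-by-gene screening: correlate each gene with the target separately.
r_a = pearson(gene_a, target)
r_b = pearson(gene_b, target)
# Joint view: the product of the two genes carries the whole signal.
r_joint = pearson([a * b for a, b in zip(gene_a, gene_b)], target)

print(f"gene A alone:  r = {r_a:+.2f}")
print(f"gene B alone:  r = {r_b:+.2f}")
print(f"A x B jointly: r = {r_joint:+.2f}")
```

Real expression data are rarely this clean, but the same mechanism applies whenever a gene is informative only in combination with others.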

CSIRO Bioinformatics have developed an advanced analysis methodology based on generalized linear models, coupled with a specialized Bayesian variable selection technique that is fully integrated into the modeling process. The models therefore require no preselection and avoid the associated bias. The methodology can produce parsimonious predictors of: classification targets (e.g. disease/not disease, disease subtype, patient outcome) using logistic or multinomial regression; numeric targets (e.g. minimal residual disease, LC50) using Gaussian, Poisson or Gamma regression; and survival targets (e.g. months of survival from diagnosis) using Cox's proportional hazards regression.
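
The abstract does not specify the Bayesian variable selection technique, so the following is only a minimal sketch of the general idea of selection integrated into model fitting, not the authors' method. It swaps in a plain L1 (lasso) penalty on a logistic regression, which corresponds to maximum a posteriori estimation under a Laplace prior on the coefficients, fitted by proximal gradient descent. The simulated data, the function names and the values of `penalty`, `step` and `n_iter` are all assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Clamp the input to avoid overflow in exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_sparse_logistic(X, y, penalty=0.12, step=0.5, n_iter=200):
    """L1-penalised logistic regression (no intercept) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean negative log-likelihood.
        residual = [sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residual, X)) / n
                for j in range(p)]
        # Gradient step followed by soft-thresholding (the L1 proximal map),
        # which sets uninformative coefficients to exactly zero.
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad)]
    return w


def simulate(n=200, p=20, seed=0):
    """Hypothetical data: only 'genes' 0 and 3 influence the class label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] - row[3] + 0.5 * rng.gauss(0, 1) > 0 else 0 for row in X]
    return X, y


X, y = simulate()
weights = fit_sparse_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("genes with non-zero coefficients:", selected)
```

Because every gene enters the fit at once and the penalty decides which coefficients survive, no separate filtering pass is needed. A multinomial, Poisson, Gamma or Cox model would replace the logistic likelihood but keep the same structure.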

The gene selection process routinely finds very small sets of genes with predictive accuracy equal to or better than that of much larger sets found by existing techniques. For example, a predictor using hundreds of genes can be replaced with one using fewer than ten.

In addition, the CSIRO Bioinformatics technology is extremely fast, requiring only a few minutes to analyse a few hundred arrays with more than 12,000 gene expression measurements each. This speed makes it practical to use computationally intensive statistical validation techniques such as cross-validation and permutation testing.
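
To show why speed matters for validation, here is a minimal sketch of a permutation test wrapped around leave-one-out cross-validation. A simple nearest-centroid classifier stands in for the actual predictor, and the simulated data, function names and parameter values are assumptions for illustration only. Note that the whole cross-validation is repeated once per permutation, so the cost of a single fit is multiplied by the number of folds and again by the number of permutations.

```python
import random


def nearest_centroid_loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of a nearest-centroid classifier."""
    n, p = len(X), len(X[0])
    correct = 0
    for held_out in range(n):
        # Class centroids are computed without the held-out sample.
        sums = {0: [0.0] * p, 1: [0.0] * p}
        counts = {0: 0, 1: 0}
        for i in range(n):
            if i == held_out:
                continue
            counts[y[i]] += 1
            for j in range(p):
                sums[y[i]][j] += X[i][j]
        # Squared Euclidean distance from the held-out sample to each centroid.
        dist = {}
        for label in (0, 1):
            centroid = [s / counts[label] for s in sums[label]]
            dist[label] = sum((a - b) ** 2 for a, b in zip(X[held_out], centroid))
        predicted = 0 if dist[0] <= dist[1] else 1
        correct += predicted == y[held_out]
    return correct / n


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Compare the observed CV accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_loo_accuracy(X, y)
    shuffled = list(y)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if nearest_centroid_loo_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (1 + at_least_as_good) / (1 + n_permutations)


def simulate(n_per_class=15, p=5, shift=3.0, seed=1):
    """Hypothetical data: class 1 is shifted upward in 'gene' 0 only."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(p)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


X, y = simulate()
accuracy, p_value = permutation_p_value(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}, permutation p-value: {p_value:.3f}")
```

With 30 samples and 200 permutations this sketch already refits the classifier about 6,000 times, which is why a fast fitting procedure is a precondition for this kind of validation on real array data.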

Contact:
Dr Rob Dunne
Dr Glenn Stone
CSIRO Bioinformatics