High-throughput gene expression analysis with GATO

David Vilanova¹, James Holzwarth², Marie Camille Zwahlen, Frank Desiere, Matthew Alan Roberts
¹david.vilanova@rdls.nestle.com, Nestle Research Center; ²james.holzwarth@rdls.nestle.com, Nestle Research Center

Microarray technology has arguably caught the attention of the worldwide life science community and now systematically supports major discoveries in many fields of study. Most of the initial technical challenges of conducting experiments are being resolved, only to be replaced by new informatics hurdles, including statistical analysis, data visualization, interpretation and storage. This paper presents a standard workflow for the annotation of gene expression data, together with a Gene Annotation TOol (GATO). Each field in the workflow is considered a "pillar" of important information that can be used for the biological interpretation of an experiment. GATO integrates several modules: assignment of Gene Ontology terms (molecular function, biological process, cellular component), biochemical function (EC numbers), cytogenetic location (LocusLink), associated diseases (Genes and Diseases, OMIM), and the metabolic pathway for each gene (GO data mining). All of these data fields can be obtained from either ENSEMBL or GO. The choice of these particular fields is based on the experience of the authors' core facility across a broad portfolio of projects, and covers information that is considered critical in most efforts to interpret microarray data biologically, independent of the experimental objective. GATO is made freely available as open source code, so it can be developed further by the community of microarray researchers. The most obvious application is to microarray data, but because molecular information with common identifiers also comes from other techniques, such as genomics, proteomics and metabolomics, the tool could additionally be applied in these areas. This paper further demonstrates the use of GATO to analyze the full complement of consensus sequences represented by the Affymetrix Human Genome U133 (HG-U133 A & B) microarray set. The Human U133 set is then compared and contrasted with the full ENSEMBL resource, using criteria such as the distribution of chromosome, GO and pathway annotations.
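
As an illustration of the kind of comparison described above, the following minimal sketch contrasts the chromosome distribution of an array gene set with that of a reference resource. It is not GATO code: the function names (chromosome_fractions, compare_distributions), the dictionary layout (gene identifier mapped to chromosome name) and the toy data are all assumptions made for this example.

```python
from collections import Counter


def chromosome_fractions(annotations):
    """Return the fraction of genes located on each chromosome.

    `annotations` is a hypothetical mapping of gene identifier -> chromosome name.
    """
    counts = Counter(annotations.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {chrom: n / total for chrom, n in counts.items()}


def compare_distributions(array_set, reference):
    """Per-chromosome difference: array fraction minus reference fraction."""
    array_frac = chromosome_fractions(array_set)
    ref_frac = chromosome_fractions(reference)
    chromosomes = sorted(set(array_frac) | set(ref_frac))
    return {c: array_frac.get(c, 0.0) - ref_frac.get(c, 0.0) for c in chromosomes}


# Made-up identifiers and locations, for illustration only.
reference = {"g1": "1", "g2": "1", "g3": "2", "g4": "X"}
array_set = {"g1": "1", "g3": "2"}

diff = compare_distributions(array_set, reference)
for chrom, delta in diff.items():
    print(f"chromosome {chrom}: {delta:+.2f}")
```

A positive value indicates a chromosome that is over-represented on the array relative to the reference, a negative value one that is under-represented. The same pattern could in principle be applied to GO or pathway annotations by swapping the chromosome name for the relevant term.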