Human and Mouse expression maps from in silico expression profiles

Alia BenKahla1, Ralf Herwig2, Hans Lehrach, Marie-Laure Yaspo
1 kahla@molgen.mpg.de, Max Planck Institute for Molecular Genetics; 2 herwig@molgen.mpg.de, Max Planck Institute for Molecular Genetics

One step towards understanding the molecular and cellular functions of the estimated 30,000 human and mouse genes is to build gene expression atlases covering a large number of tissues and developmental stages. Exploiting information from complete genome sequences and analyzing the expression profiles of the corresponding human and mouse transcripts in the collection of expressed sequence tags (5,013,723 human ESTs and 3,668,532 mouse ESTs currently in dbEST) is an attractive approach that complements ongoing "wet lab" experiments. The wide variety of tissues represented in mouse dbEST, and the difficulty of obtaining some of these samples for microarray experiments, make this EST collection particularly valuable for in silico analyses. Moreover, RNA in situ hybridization experiments cannot cover all developmental stages. Using human chromosome 21 as a pilot study, we generated an expression map of the orthologous genes in mouse by combining in situ hybridization and EST mining. After pooling libraries annotated with identical biological terms and removing statistically non-significant libraries (too small or normalized), we analyzed EST counts with a previously validated method based on Pearson's correlation coefficient. On chr.21, we identified 8 clusters of co-expressed genes as well as 19 tissue-restricted genes (The HSA21 expression map initiative, Nature 2002). We are now extending EST mining to the whole human and mouse genomes, using the ENSEMBL genes as a reference set. The annotation of the human and mouse protein-coding genes (description, related disease, chromosomal location, etc.), comprising 22,980 and 22,444 gene entries respectively (v. 10), is taken from the ENSEMBL database (http://www.ensembl.org/). The expression of the ENSEMBL genes across the libraries available in dbEST is summarized in a single "expression matrix", from which expression maps of the genome can be deduced.
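The steps above (pool libraries by tissue term, drop small or normalized libraries, build a gene-by-tissue matrix, correlate expression profiles) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' validated method: the input format (one `(library_id, gene_id)` pair per mapped EST), all function names, the `min_pool_size` cutoff, scaling counts by pool size, the 0.9 correlation threshold and the "detected in exactly one pool" definition of tissue-restricted are hypothetical choices. Pairwise thresholding also stands in for whatever clustering procedure the original study used.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_expression_matrix(est_hits, libraries, min_pool_size=1000):
    """Build a gene x tissue matrix of EST frequencies (hypothetical helper).

    est_hits: list of (library_id, gene_id) pairs, one per EST mapped to a gene.
    libraries: dict library_id -> {"tissue": str, "normalized": bool}.
    min_pool_size: assumed cutoff; pooled tissues with fewer ESTs are dropped.
    Returns (genes, tissues, matrix) where matrix[i][j] is the fraction of
    ESTs in pooled tissue j that map to gene i.
    """
    # Normalized libraries are excluded: normalization flattens transcript
    # abundance, so their EST counts no longer reflect expression levels.
    usable = {lib for lib, meta in libraries.items() if not meta["normalized"]}

    # Pool usable libraries that share the same tissue term.
    counts = defaultdict(lambda: defaultdict(int))  # gene -> tissue -> ESTs
    pool_size = defaultdict(int)                    # tissue -> total ESTs
    for lib, gene in est_hits:
        if lib in usable:
            tissue = libraries[lib]["tissue"]
            counts[gene][tissue] += 1
            pool_size[tissue] += 1

    # Drop pooled tissues too small to give meaningful counts.
    tissues = sorted(t for t, n in pool_size.items() if n >= min_pool_size)

    # Keep genes seen in at least one retained pool and scale counts by pool
    # size so tissues sampled at different depths are comparable.
    genes = sorted(g for g in counts if any(counts[g][t] for t in tissues))
    matrix = [[counts[g][t] / pool_size[t] for t in tissues] for g in genes]
    return genes, tissues, matrix


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a flat profile; treated as uncorrelated here
    return cov / (sx * sy)


def coexpressed_pairs(genes, matrix, threshold=0.9):
    """Gene pairs whose tissue profiles correlate at or above `threshold`."""
    pairs = []
    for i, j in combinations(range(len(genes)), 2):
        r = pearson(matrix[i], matrix[j])
        if r >= threshold:
            pairs.append((genes[i], genes[j], r))
    return pairs


def tissue_restricted(genes, tissues, matrix):
    """Map each gene detected in exactly one retained tissue pool to that tissue."""
    restricted = {}
    for gene, row in zip(genes, matrix):
        hits = [t for t, v in zip(tissues, row) if v > 0]
        if len(hits) == 1:
            restricted[gene] = hits[0]
    return restricted
```

For example, `genes, tissues, m = build_expression_matrix(hits, libs)` followed by `coexpressed_pairs(genes, m)` lists candidate co-expressed gene pairs. In this sketch, normalized libraries are removed before pooling, while the size cutoff is applied to each pooled tissue; that ordering is one reading of the text, not a confirmed detail of the method.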
We will present the strategy used to construct the "expression matrix" and the data describing differentially expressed genes, disease-related genes, and clusters of genes potentially involved in common cellular functions. This approach has been applied separately to the human and mouse gene catalogs. A comparison of the expression of orthologous genes will also be presented.
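One way to compare the expression of orthologous genes is to correlate each human gene's tissue profile with that of its mouse ortholog over the tissue terms sampled in both species. The Python sketch below illustrates this under hypothetical assumptions: profiles are dictionaries of tissue term to expression value (for instance the pooled EST frequencies), tissue vocabularies have already been harmonized between human and mouse, orthologs are given as one-to-one pairs, and a tissue missing from a profile counts as zero expression. The function names are illustrative, not the authors' procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0


def ortholog_correlations(human, mouse, orthologs, shared_tissues):
    """Correlate human genes with their mouse orthologs (hypothetical helper).

    human, mouse: dict gene_id -> dict tissue_term -> expression value.
    orthologs: dict human_gene_id -> mouse_gene_id (one-to-one pairs).
    shared_tissues: tissue terms sampled in both species (assumed harmonized).
    Returns dict human_gene_id -> Pearson r across shared_tissues.
    """
    results = {}
    for h_gene, m_gene in orthologs.items():
        # Skip pairs where either gene has no expression profile.
        if h_gene not in human or m_gene not in mouse:
            continue
        # Tissues absent from a profile are read as zero expression.
        x = [human[h_gene].get(t, 0.0) for t in shared_tissues]
        y = [mouse[m_gene].get(t, 0.0) for t in shared_tissues]
        results[h_gene] = pearson(x, y)
    return results
```

A high coefficient suggests a conserved tissue distribution between the two orthologs; restricting the comparison to shared tissue terms avoids penalizing genes for tissues that were sampled in only one species.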