EST based method to identify differentially expressed gene clusters along chromosomes

Karine Megy1, Stephane Audic2, Francios Enault, Jean-Michel Claverie
1km369@cam.ac.uk, University of Cambridge; 2audic@igs.cnrs-mrs.fr, CNRS

ISMB Poster Karine Mégy

Keywords: statistical analysis, Expressed Sequence Tag, differential expression, gene clusters

Title:

EST based method to identify differentially expressed gene clusters along chromosomes.


Author information:
Karine Mégy1, Stéphane Audic2 and Jean-Michel Claverie2
1km369@cam.ac.uk
University of Cambridge - Department of Pathology - Tennis Court Road - Cambridge CB2 1QP - UK
2Stephane.Audic@igs.cnrs-mrs.fr, Jean-Michel.Claveri@igs.cnrs-mrs.fr
Genomic and Structural Information - CNRS - 31, chemin J. Aiguier - 13402 Marseille Cedex 20 - France

One Page Abstract:

Introduction. The positional clustering of co-expressed genes is common in prokaryotes (operons) and was recently described in several eukaryotic organisms [1-4]. Clusters of highly expressed genes were recently revealed in the human genome [5], and a complementary study [6] suggested that such clusters might mostly consist of housekeeping genes rather than genes with similar tissue expression profiles. To analyze tissue-specific expression specifically, other studies [7-9] were based on sets of genes expressed in a given tissue. They suggest that clusters of tissue-specific genes do exist, and might be more frequent than initially thought.

Method. To evaluate the clustering along the chromosomes of genes specifically expressed in any tissue, we developed a method based on a statistical analysis of ESTs.
Gene expression profiles are first generated. Every gene is compared to the total EST set at high stringency with BLAST. The expression profile is derived from the number of cognate ESTs in each tissue category relative to the total number of ESTs in that tissue category. All expression profiles are stored in a matrix with rows corresponding to genes and columns corresponding to tissue categories.
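The profile matrix described above can be sketched as follows. This is a minimal illustration, assuming the BLAST step has already been run and summarized as counts of cognate ESTs per gene and tissue category; the function name `expression_profiles`, the dictionary layout and the example numbers are hypothetical and not taken from the poster.

```python
def expression_profiles(hit_counts, library_sizes):
    """Build a gene x tissue matrix of relative EST frequencies.

    hit_counts:    {gene: {tissue: number of cognate ESTs}} (assumed input format)
    library_sizes: {tissue: total number of ESTs in that tissue category}
    """
    tissues = sorted(library_sizes)
    genes = sorted(hit_counts)
    matrix = [
        # cognate ESTs in the tissue, relative to all ESTs of that tissue
        [hit_counts[gene].get(tissue, 0) / library_sizes[tissue] for tissue in tissues]
        for gene in genes
    ]
    return genes, tissues, matrix


# Made-up counts for illustration only
hit_counts = {"geneA": {"brain": 5, "liver": 1}, "geneB": {"liver": 4}}
library_sizes = {"brain": 1000, "liver": 2000}
genes, tissues, matrix = expression_profiles(hit_counts, library_sizes)
```

Rows follow `genes` and columns follow `tissues`, so `matrix[i][j]` is the relative frequency of gene `genes[i]` in tissue category `tissues[j]`.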
The probability of differential expression in each condition is then computed. To assess its differential expression in a tissue category and for a given group, every gene is compared to the total EST set of this group at high stringency (previously described matrix). The hit list of cognate matches is then separated into two groups: ESTs from the corresponding tissue category vs. ESTs from any other tissue category. The statistical significance of the difference in frequencies between these two groups is then computed using a previously published statistical formula [13].
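As an illustration of this step, the sketch below assumes that the formula of [13] is the conditional probability p(y|x) = (N2/N1)^y (x+y)! / (x! y! (1+N2/N1)^(x+y+1)), where x and y are the cognate EST counts in the tissue category of interest and in all other categories, and N1 and N2 are the corresponding total EST counts. How this probability is turned into the score used for thresholding is not stated here, so the one-sided cumulative score in `overexpression_confidence` is an assumption, and both function names are hypothetical.

```python
from math import exp, lgamma, log


def ac_probability(y, x, n1, n2):
    """p(y | x) for library sizes n1 (tissue of interest) and n2 (all others).

    Computed in log space so that large counts do not overflow.
    """
    r = n2 / n1
    log_p = (y * log(r)
             + lgamma(x + y + 1) - lgamma(x + 1) - lgamma(y + 1)
             - (x + y + 1) * log(1 + r))
    return exp(log_p)


def overexpression_confidence(x, y, n1, n2):
    """Assumed score: 1 - P(Y <= y | x).

    Close to 1 when y is unexpectedly low given x, i.e. when the gene
    appears over-represented in the tissue category of interest.
    """
    lower_tail = sum(ac_probability(k, x, n1, n2) for k in range(y + 1))
    return 1.0 - lower_tail


# Made-up counts: 20 cognate ESTs in the tissue, 2 elsewhere, equal library sizes
score = overexpression_confidence(x=20, y=2, n1=10000, n2=10000)
```

Under this assumption, a gene would be flagged in a tissue category when its score exceeds the chosen threshold.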
Correlation islands are finally identified. They are defined as clusters of at least three successive genes differentially expressed (probability of differential expression > 0.90) in the same tissue category. To assess the biological meaning of these clusters, we estimated the probability of finding such a number of clusters under a randomization of the gene positions along the chromosomes (5,000 randomizations).
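The island detection and the randomization test can be sketched as below. This is a simplified illustration, not the original implementation: it assumes each gene carries at most one tissue label (or `None` when it is not differentially expressed), whereas a gene could in principle be flagged in several tissue categories. The names `count_islands` and `permutation_p_value` are hypothetical.

```python
import random


def count_islands(labels, min_len=3):
    """Count maximal runs of >= min_len successive genes sharing a tissue label.

    labels: tissue category per gene in chromosomal order, None if the gene
            is not differentially expressed (assumed encoding).
    """
    count = 0
    run_label, run_len = None, 0
    for label in labels:
        if label is not None and label == run_label:
            run_len += 1
        else:
            # the previous run ends here; keep it if it is long enough
            if run_label is not None and run_len >= min_len:
                count += 1
            run_label, run_len = label, (1 if label is not None else 0)
    if run_label is not None and run_len >= min_len:
        count += 1
    return count


def permutation_p_value(labels, n_perm=5000, min_len=3, seed=0):
    """Fraction of random gene orders with at least as many islands as observed."""
    rng = random.Random(seed)
    observed = count_islands(labels, min_len)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_islands(shuffled, min_len) >= observed:
            hits += 1
    return observed, hits / n_perm


# Made-up gene order for illustration only
labels = ["brain"] * 3 + [None] + ["liver"] * 2 + [None] + ["liver"] * 4
observed, p_value = permutation_p_value(labels, n_perm=200)
```

Shuffling the labels keeps the number of differentially expressed genes per tissue category fixed and only randomizes their positions, which matches the idea of permuting the gene order along a chromosome.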

Results. This method was applied to human chromosomes 20, 21 and 22. We searched for correlation islands of differentially expressed genes along these chromosomes and obtained 9, 5 and 17 clusters respectively. To assess the statistical significance of these results, we computed the probability of finding such a number of clusters under a random permutation of the gene order along the chromosomes. This probability was found to be 3.8 x 10^-2, 2.8 x 10^-2 and 8 x 10^-4 for chromosomes 20, 21 and 22 respectively. We can thus conclude that there are more clusters than expected by chance. The existence of co-expressed/co-localized gene clusters is consistent with a model where large chromatin regions would change their activity (openness) status in a tissue-specific manner, allowing neighboring genes to be transcribed or shut down in a coordinated way. Such a model, supported by our study, has been around for quite some time, although experimental evidence has been obtained for only a few tissues and cell types [11,12].

[1] Cohen B.A., et al. 2000. Nature Genetics 26:183-186.
[2] Blumenthal T. 1998. Bioessays 20:480-487.
[3] Roy P.J., et al. 2002. Nature 418:975-979.
[4] Spellman P.T. and Rubin G.M. 2002. Journal of Biology 1:5.
[5] Caron H., et al. 2001. Science 291:1289-1292.
[6] Lercher M.J., et al. 2002. Nature Genetics 31:180-183.
[7] Gabrielsson B.L., Carlsson B. and Carlsson L.M. 2000. Obesity Research 8:374-384.
[8] Dempsey A.A., et al. 2001. Journal of Molecular and Cellular Cardiology 33:587-591.
[9] Bortoluzzi S., et al. 1998. Genome Research 8:817-825.
[11] Armstrong J.A. and Emerson B.M. 1998. Current Opinion in Genetics and Development 8:165-172.
[12] Akashi K., et al. 2003. Blood 101:383-389.
[13] Audic S. and Claverie J.M. 1997. Genome Research 7:986-995.
