Keywords: statistical analysis, Expressed Sequence Tag,
differential expression, gene clusters
Introduction.
The positional clustering of co-expressed genes is common in
prokaryotes (operons) and was recently described in several eukaryotic
organisms [1-4]. Clusters of highly expressed genes were recently
revealed in the human genome [5], and a complementary study [6] suggested
that such clusters might mostly consist of housekeeping genes and not
of genes with similar tissue expression profiles. In order to
specifically analyze tissue-specific expression, other studies [7-9]
were based on sets of genes expressed in a given tissue. They suggest
that clusters of tissue-specific genes do exist, and might be more
frequent than initially thought.
Method.
To evaluate the clustering along the chromosomes of genes specifically
expressed in any tissue, we developed a method based on a
statistical analysis of ESTs. Gene expression profiles are first
generated. Every gene is compared to the total EST set at high
stringency with BLAST. The expression profile is derived from the
cognate ESTs in each tissue category relative to the total number of
ESTs in the tissue category. All expression profiles are stored in a
matrix with rows corresponding to genes and columns corresponding to
tissue categories. The probability of differential expression in
each condition is then computed. To assess its differential expression
in a tissue category and for a given group, every gene is compared to
the total EST set of this group at high stringency (previously
described matrix). The hit list of cognate matches is then separated
into two groups: ESTs from the corresponding tissue category versus
ESTs from all other tissue categories. The statistical significance of the
difference in frequencies between these two groups is then computed by
using a previously published statistical formula
[13]. Correlation islands are finally identified. They are
defined as clusters of at least three successive genes
differentially expressed (probability of differential expression
> 0.90) in the same tissue category. To assess the biological meaning
of these clusters, we
estimated the probability of finding such a number of clusters under a
randomization of the gene position along the chromosomes (5,000
randomizations).
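The statistical test, the island detection and the randomization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study. The function names (ac_probability, prob_overexpressed, count_islands, permutation_p_value) and the input layout (one set of tissue names per gene, in chromosomal order) are hypothetical. The expression for p(y | x) is the one published in [13]; summing it over smaller counts to obtain a probability of differential expression is our assumption about how the 0.90 threshold is applied.

```python
import math
import random


def ac_probability(x, y, n1, n2):
    """p(y | x) from [13]: chance of seeing y cognate ESTs among n2 ESTs,
    given x cognate ESTs among n1 ESTs, if expression is unchanged."""
    ratio = n2 / n1
    log_p = (y * math.log(ratio)
             + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1 + ratio))
    return math.exp(log_p)


def prob_overexpressed(x, y, n1, n2):
    """Assumed confidence measure: probability that fewer than y ESTs
    would be observed in the tissue of interest under equal expression.
    x, n1: counts in all other tissues; y, n2: counts in the tissue."""
    return sum(ac_probability(x, k, n1, n2) for k in range(y))


def count_islands(flags, min_len=3):
    """Count runs of at least min_len successive genes flagged as
    differentially expressed in the same tissue category.
    flags: list of sets of tissue names, one set per gene, in gene order."""
    total = 0
    for tissue in set().union(*flags):
        run = 0
        for gene_tissues in flags:
            if tissue in gene_tissues:
                run += 1
            else:
                if run >= min_len:
                    total += 1
                run = 0
        if run >= min_len:  # a run may end at the last gene
            total += 1
    return total


def permutation_p_value(flags, n_perm=5000, min_len=3, seed=0):
    """Fraction of random gene orders that yield at least as many
    islands as the observed order (seed is fixed for reproducibility)."""
    rng = random.Random(seed)
    observed = count_islands(flags, min_len)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_islands(shuffled, min_len) >= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    # Toy chromosome of seven genes (invented data, for illustration only).
    toy = [{"liver"}, {"liver"}, {"liver", "brain"}, set(),
           {"brain"}, {"brain"}, {"brain"}]
    print(count_islands(toy))  # 2 islands: one liver run, one brain run
    print(permutation_p_value(toy, n_perm=1000))
```

In this sketch a gene would be flagged for a tissue when prob_overexpressed exceeds 0.90, and the flags of one chromosome would then be passed to permutation_p_value with n_perm=5000.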
Results.
This method was applied to human chromosomes 20, 21 and 22. We searched
for correlation islands of differentially expressed genes along these
chromosomes and obtained 9, 5 and 17 clusters, respectively. To assess
the statistical
significance of these results, we computed the probability of finding
such a number of clusters under a random permutation of the gene order
along the chromosomes. This probability was found to be
3.8 x 10^-2, 2.8 x 10^-2 and 8 x 10^-4 for
chromosomes 20, 21 and 22, respectively. We can thus confidently conclude
that there are more clusters than expected by chance. The existence of
co-expressed/co-localized gene clusters is consistent with a model
where large chromatin regions would change their activity (openness)
status in a tissue-specific manner, allowing neighboring genes to be
transcribed or shut down in a coordinated way. Such a model, confirmed
by our study, has been around for quite some time, although
experimental evidence has been obtained for only a few tissues and
cell types [11,12].
[1] Cohen B.A., et al. 2000. Nature Genetics 26:183-186.
[2] Blumenthal T. 1998. Bioessays 20:480-487.
[3] Roy P.J., et al. 2002. Nature 418:975-979.
[4] Spellman P.T. and Rubin G.M. 2002. Journal of Biology 1:5.
[5] Caron H., et al. 2001. Science 291:1289-1292.
[6] Lercher M.J., et al. 2002. Nature Genetics 31:180-183.
[7] Gabrielsson B.L., Carlsson B. and Carlsson L.M. 2000. Obesity Research 8:374-384.
[8] Dempsey A.A., et al. 2001. Journal of Molecular and Cellular Cardiology 33:587-591.
[9] Bortoluzzi S., et al. 1998. Genome Research 8:817-825.
[11] Armstrong J.A. and Emerson B.M. 1998. Current Opinion in Genetics and Development 8:165-172.
[12] Akashi K., et al. 2003. Blood 101:383-389.
[13] Audic S. and Claverie J.M. 1997. Genome Research 7:986-995.