P-quasi complete linkage clustering method for gene-expression profiles based on distribution analysis
Shigeto Seno1, Reiji Teramoto2, Yoichi Takenaka, Hideo Matsuda
1s-senoo@ist.osaka-u.ac.jp, Department of Bioinformatic Engineering, Graduate School of Information Science and Technology, Osaka University; 2teramoto@sumitomopharm.co.jp, Genomic Science Laboratories, Research Division, Sumitomo Pharmaceuticals
Hierarchical clustering with the correlation coefficient is commonly used to infer the function of genes from gene-expression profiles. This method, however, has a serious limitation in its capability to represent relationships: the resulting dendrogram can represent only simple similarity relationships between genes.
In other words, it loses much useful information beyond the largest
correlation-coefficient score.
To cope with this problem, we propose a new clustering method with the following
two features. First, the proposed method exploits a new similarity measure based
on the distribution of gene expressions. This measure allows us to find weak relationships
between a pair of genes that cannot be revealed by the correlation coefficient.
Second, the proposed clustering method uses the P-quasi complete linkage algorithm
to form clusters. A P-quasi complete linkage graph satisfies the condition
that every member of a group has linkages to at least P% of all the members within
the group. With this algorithm, members that are not all sufficiently similar
to one another can still be clustered together if each has linkages to at least P% of the members.
The algorithm therefore helps us find relationships among
multiple genes. Together, the two features provide more informative clustering
than hierarchical clustering with the correlation coefficient.
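The P-quasi complete linkage condition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `is_p_quasi_complete`, the dictionary-of-sets graph representation, and the gene labels are all hypothetical. It also assumes that the percentage is taken over the other members of the group (a member is not counted as linked to itself) and that linkages are undirected. How linkages are derived from the distribution-based similarity measure is not specified here, so the graph is simply given as input.

```python
from typing import Dict, Hashable, Iterable, Set


def is_p_quasi_complete(
    group: Iterable[Hashable],
    links: Dict[Hashable, Set[Hashable]],
    p: float,
) -> bool:
    """Return True if every member links to at least p% of the other members.

    Assumption: the percentage is computed over the other members of the
    group, so a member's link to itself (if present) is ignored.
    """
    members = set(group)
    others = len(members) - 1
    if others < 1:
        # A group with zero or one member satisfies the condition trivially.
        return True
    for gene in members:
        # Count links that stay inside the group, ignoring any self-link.
        linked = len((links.get(gene, set()) & members) - {gene})
        # Integer-friendly form of: linked / others < p / 100
        if linked * 100 < p * others:
            return False
    return True


# Hypothetical undirected linkage graph over four made-up gene labels.
example_links = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3", "g4"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g2", "g3"},
}
example_group = ["g1", "g2", "g3", "g4"]
```

In this made-up graph, g1 and g4 are not linked to each other, so the group fails the check at P = 100 (complete linkage). Each member is still linked to at least two of the three other members, so the group passes at P = 60.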
In the poster, we will show the effectiveness and usefulness of the proposed
clustering method through an analysis of gene-expression profiles of cancer patients.