Motivation: We developed a statistical method that estimates gene networks and detects promoter elements simultaneously. A common problem in estimating gene networks from microarray expression data alone is that the number of microarrays is small compared with the number of variables in the network model, which makes accurate estimation difficult. Our method overcomes this problem by integrating microarray data and DNA sequence information into a Bayesian network model. The basic idea is that if a set of genes is directly regulated by a transcription factor, the genes may share a consensus motif, called a promoter element, in the upstream regions of their DNA sequences. Our method detects consensus motifs based on the structure of the estimated network, then re-estimates the network using the result of the motif detection. We repeat this iteration until the network becomes stable.
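The alternation between network estimation and motif detection can be illustrated with a minimal sketch. It is not the Bayesian network scoring or the motif model used in this work: it assumes a toy pairwise edge score, an exact shared k-mer as the "motif", and a fixed score bonus for edges supported by a motif. All function names, parameters, and data below are hypothetical and serve only to show the iterate-until-stable structure.

```python
def shared_kmer(seqs, k):
    """Toy motif finder: return a k-mer present in every sequence, or None."""
    if len(seqs) < 2:
        return None
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    common = set.intersection(*kmer_sets)
    return min(common) if common else None


def detect_motifs(edges, upstream, k=4):
    """For each parent, look for a k-mer shared by all of its children."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(upstream[child])
    motifs = {}
    for parent, seqs in children.items():
        motif = shared_kmer(seqs, k)
        if motif is not None:
            motifs[parent] = motif
    return motifs


def estimate_network(pair_scores, upstream, motifs, bonus=1.0):
    """Pick a direction for each gene pair (assumed scoring, not the paper's).

    An edge parent -> child gets a bonus when the child's upstream
    sequence contains the motif currently assigned to the parent.
    """
    def total(parent, child):
        score = pair_scores.get((parent, child), float("-inf"))
        if motifs.get(parent) and motifs[parent] in upstream[child]:
            score += bonus
        return score

    edges = set()
    for a, b in {tuple(sorted(pair)) for pair in pair_scores}:
        edges.add((a, b) if total(a, b) >= total(b, a) else (b, a))
    return edges


def iterate_until_stable(pair_scores, upstream, max_iter=20):
    """Alternate motif detection and network re-estimation until no edge changes."""
    edges = estimate_network(pair_scores, upstream, motifs={})
    motifs = {}
    for _ in range(max_iter):
        motifs = detect_motifs(edges, upstream)
        new_edges = estimate_network(pair_scores, upstream, motifs)
        if new_edges == edges:
            break
        edges = new_edges
    return edges, motifs


# Hypothetical toy data: expression scores alone misdirect the TF/g3 edge.
pair_scores = {
    ("TF", "g1"): 2.0, ("g1", "TF"): 1.0,
    ("TF", "g2"): 2.0, ("g2", "TF"): 1.0,
    ("TF", "g3"): 1.0, ("g3", "TF"): 1.5,
}
upstream = {
    "TF": "AAAAAAAA",
    "g1": "CCTGACGG",
    "g2": "TTTGACTT",
    "g3": "GGTGACAA",
}
```

In this toy setting, the first estimate directs the edge from g3 to TF. Once a motif shared by g1 and g2 is detected and also found upstream of g3, the re-estimated network reverses that edge, and the next pass leaves the network unchanged.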
Result: We first conducted Monte Carlo simulations to evaluate the performance of our method on an artificial network and pseudo DNA sequences. Edges that were wrongly estimated from microarray data alone were corrected when the DNA sequence information was used. We also applied our method to Saccharomyces cerevisiae microarray gene expression data obtained by disrupting 100 genes. Our method detected a known promoter element and corrected the direction of misdirected edges. Furthermore, we found two genes that contain this promoter motif. In the estimated network, these genes lie close to genes known to be regulated by the transcription factor that binds to this motif, which suggests that these two genes may also be regulated by the same transcription factor.