Estimation of oncogenes by Bayesian inverse modeling of
gene-expression patterns
Mathaeus Dejori1, Martin Stetter2
1mathaeus.dejori.external@mchp.siemens.de, Technical University of Munich; 2stetter@siemens.com, Siemens AG
Understanding regulatory genetic networks represents an important step
towards the characterization of genetic mechanisms underlying complex
diseases. In cancer research, for example, where the identification
of oncogenes and tumor suppressor genes plays a key role, knowledge of new
potential oncogenes and their interactions with other molecules can
help reveal the basic principles that govern the
transformation of normal cells into malignant cancer cells.
We show that our approach of Bayesian inverse modeling can detect genes
with such oncogenic characteristics solely by statistically analyzing
gene-expression patterns measured with DNA microarrays.
The underlying probabilistic model that we use is a Bayesian network which
encodes the multivariate probability distribution of a set of variables by
a set of conditional probability distributions. Statistical dependencies
are encoded in a graph structure. The learning procedure uses Bayesian statistics
to find the network structure and the corresponding model parameters that
best describe the probability distribution from which the dataset was drawn.
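For reference, the factorization described above can be written as follows.
The notation is ours and does not appear in the original text: X_i is the
expression state of gene i, Pa_G(X_i) its parents in graph G, theta the
parameters of the conditional distributions, and D the dataset. The second
expression is the generic Bayesian structure posterior; the exact scoring
function used for learning is not stated here, so it should be read as an
assumption.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}_G(X_i)\bigr),
\qquad
P(G \mid D) \propto P(G) \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta
```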
For the case of gene expression analysis, nodes of the Bayes net represent
genes and edges represent causal relationships among them. We trained a
Bayes net on a microarray dataset of different pediatric acute lymphoblastic
leukemia (ALL) subtypes.
The trained model can be used to generate new artificial microarray datasets.
Moreover, we can intervene in the model, for example by clamping one gene to
a certain expression state, and then sample data from it. This simulates the
effect of the intervention on the expression of all other genes. In other
words, we can predict the effect of the expression of a few genes on the
global gene-expression pattern, which is related to cellular behavior.
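The clamping-and-sampling step can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions, not the implementation
used in the study: the three-gene network, the binary expression states, and
all probabilities are hypothetical, as are the names NETWORK, sample_profile,
sample_dataset and fraction_high.

```python
import random

# Hypothetical three-gene network: A -> B and A -> C.
# States: 0 = low expression, 1 = high expression.
# Each gene maps to (parents, CPT); the CPT gives P(gene = 1 | parent states).
# All numbers are illustrative and not taken from the study.
NETWORK = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(0,): 0.1, (1,): 0.9}),
    "C": (("A",), {(0,): 0.2, (1,): 0.7}),
}
ORDER = ["A", "B", "C"]  # topological order: parents before children


def sample_profile(rng, clamp=None):
    """Draw one artificial expression profile by ancestral sampling.

    Genes listed in `clamp` are fixed to the given state instead of being
    sampled, which disconnects them from their parents (an intervention).
    """
    clamp = clamp or {}
    profile = {}
    for gene in ORDER:
        if gene in clamp:
            profile[gene] = clamp[gene]
            continue
        parents, cpt = NETWORK[gene]
        p_high = cpt[tuple(profile[p] for p in parents)]
        profile[gene] = 1 if rng.random() < p_high else 0
    return profile


def sample_dataset(n, clamp=None, seed=0):
    """Generate n artificial profiles, optionally under an intervention."""
    rng = random.Random(seed)
    return [sample_profile(rng, clamp) for _ in range(n)]


def fraction_high(dataset, gene):
    """Fraction of profiles in which `gene` is in the high state."""
    return sum(profile[gene] for profile in dataset) / len(dataset)


if __name__ == "__main__":
    free = sample_dataset(2000)
    clamped = sample_dataset(2000, clamp={"A": 1})
    print("B high, no intervention:", fraction_high(free, "B"))
    print("B high, A clamped high: ", fraction_high(clamped, "B"))
```

Comparing the two datasets shows how fixing one gene shifts the expression of
its downstream genes in the simulated data.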
Bayesian inverse modeling can be defined as finding those genes that, when
fixed at a certain expression level, affect the model such that the
generated artificial microarray dataset shows the same properties as a
measured cancer-specific dataset. In statistical terms, we estimate the
probability that the model generates cancer-characteristic data given the
fixed expression state of one or more genes; a high probability predicts the
fixed genes to be oncogenic. For example, clamping the gene PBX1 to the
overexpressed state leads the model to generate, with a probability of 0.96,
a dataset that is characteristic of the ALL B-lineage subtype E2A/PBX1. This
could indicate that PBX1 has an oncogenic character and causes this leukemia
subtype. Indeed, PBX1 is known to be converted into a potent oncogene by a
chromosomal translocation, causing leukemia subtype E2A/PBX1. Besides PBX1,
we found other genes that are either known oncogenes or involved in critical
biological processes, such as ADPRT and PSMD10, which are both involved in
DNA repair. Thus, our generative model allows us to predict genes with
potentially oncogenic characteristics.
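One way to picture the probability estimate is a Monte Carlo count over
samples drawn from the clamped model. The sketch below rests on several
assumptions that are not part of the original text: a toy model with one
regulator R and two marker genes M1 and M2, invented probabilities, a
per-profile rather than per-dataset score, and a hypothetical rule
(looks_like_subtype) for deciding whether a profile is subtype-characteristic.
How that decision was actually made in the study is not specified here.

```python
import random

# Hypothetical toy model: regulator R drives marker genes M1 and M2.
# 1 = overexpressed, 0 = not overexpressed. Illustrative numbers only.
P_R_HIGH = 0.2
P_MARKER_HIGH = {"M1": {0: 0.1, 1: 0.9}, "M2": {0: 0.2, 1: 0.8}}


def sample_profile(rng, clamp):
    """Sample one profile; genes present in `clamp` are fixed, not sampled."""
    profile = {}
    if "R" in clamp:
        profile["R"] = clamp["R"]
    else:
        profile["R"] = int(rng.random() < P_R_HIGH)
    for gene, table in P_MARKER_HIGH.items():
        if gene in clamp:
            profile[gene] = clamp[gene]
        else:
            profile[gene] = int(rng.random() < table[profile["R"]])
    return profile


def looks_like_subtype(profile):
    """Assumed signature: both marker genes are overexpressed."""
    return profile["M1"] == 1 and profile["M2"] == 1


def score_candidate(clamp, n=5000, seed=0):
    """Monte Carlo estimate of P(subtype-like profile | clamped genes)."""
    rng = random.Random(seed)
    hits = sum(looks_like_subtype(sample_profile(rng, clamp)) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for state in (0, 1):
        score = score_candidate({"R": state})
        print(f"R clamped to {state}: estimated probability {score:.2f}")
```

Ranking candidate genes by such a score, one clamp at a time, mirrors the
search for genes whose fixed state makes the model produce
cancer-characteristic data.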
Furthermore, since the graph structure of the model can be interpreted
causally, it provides information about the interactions between potential
oncogenes and other genes, which in turn can be interpreted as oncogenic
regulation. The structure around PBX1 shows that it is a dominant gene: it
influences many others but is itself regulated by only one or a few other
genes. This is consistent with known biology, since PBX1 acts as a potent
transcriptional activator, activating genes that are normally either not
expressed or expressed at low levels.
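The notion of a dominant gene (many outgoing edges, few incoming ones) can be
checked directly on a learned edge list. The following is a minimal sketch
with a hypothetical edge list and hypothetical thresholds; the gene labels G1
to G5 are placeholders and do not come from the study.

```python
from collections import Counter

# Hypothetical learned edges as (parent, child) pairs; placeholder labels.
EDGES = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G5", "G1"), ("G3", "G4")]


def degree_table(edges):
    """Return {gene: (in_degree, out_degree)} for every gene in the graph."""
    out_deg = Counter(parent for parent, _ in edges)
    in_deg = Counter(child for _, child in edges)
    genes = sorted(set(out_deg) | set(in_deg))
    return {gene: (in_deg[gene], out_deg[gene]) for gene in genes}


def dominant_genes(edges, min_out=3, max_in=1):
    """Genes that regulate many others but have few regulators themselves.

    The thresholds are assumptions chosen for this toy example.
    """
    return [
        gene
        for gene, (n_in, n_out) in degree_table(edges).items()
        if n_out >= min_out and n_in <= max_in
    ]


if __name__ == "__main__":
    print(degree_table(EDGES))
    print(dominant_genes(EDGES))
```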
Consequently, we show that our statistical, data-driven approach of
Bayesian inverse modeling can be an efficient way to infer the pathogenic
impact of individual genes and to reveal their interactions with
other genes.