SPONSORS:

Silver

Silver Sponsor: Sanofi



General

General Sponsor - IBM Research

General Sponsor - MAGNet

General Sponsor - National Cancer Institute

RECOMB/ISCB RegSysGen 2014 Sponsor - NRNB

Cytoscape Sponsors

RECOMB/ISCB RegSysGen 2014 Sponsor - Agilent Technologies

RECOMB/ISCB RegSysGen 2014 Sponsor - Cytoscape

ACCEPTED PAPERS

Updated Oct 28, 2014


The following papers will be presented as talks during the conference.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

A validated gene regulatory network and GWAS to identify early transcription factors in T-cell associated diseases

Mika Gustafsson1, Danuta Gawel1, Sandra Hellberg1, Aelita Konstantinell1, Daniel Eklund1, Jan Ernerudh1, Antonio Lentini1, Robert Liljenström1, Johan Mellergård1, Hui Wang2, Colm E. Nestor1, Huan Zhang1 and Mikael Benson1

1Linköpings Universitet, 2MD Anderson Cancer Center

The identification of early regulators of disease is important for understanding disease mechanisms, as well as finding candidates for early diagnosis and treatment. Such regulators are difficult to identify because patients generally present when they are symptomatic, after early disease processes. Here, we present an analytical strategy to systematically identify early regulators by combining gene regulatory networks (GRNs) with GWAS. We hypothesized that early regulators of T-cell associated diseases could be found by defining upstream transcription factors (TFs) in T-cell differentiation. Time-series expression profiling identified upstream TFs of T-cell differentiation into Th1/Th2 subsets enriched for disease-associated SNPs identified by GWAS. We constructed a Th1/Th2 GRN based on integration of expression, DNA methylation profiling and sequence-based prediction data using the LASSO algorithm. The GRN was validated by ChIP-seq and siRNA knockdowns. GATA3, MAF and MYB were prioritized based on GWAS and the number of GRN-predicted targets. The disease relevance was supported by differential expression of the TFs and their targets in profiling data from six T-cell associated diseases. We tested if the three TFs or their splice variants changed early in disease by exon profiling of two relapsing diseases, namely multiple sclerosis and seasonal allergic rhinitis. This showed differential expression of splice variants of the TFs during relapse-free asymptomatic stages. Potential targets of the splice variants were validated based on expression profiling and siRNA knockdowns. Those targets changed during symptomatic stages. Our results show that combining construction of GRNs with GWAS can be used to infer early regulators of disease.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Are all genetic variants in DNase I sensitivity regions functional?

Gregory A. Moyerbrailean1, Chris T. Harvey1, Cynthia A. Kalita1, Xiaoquan Wen2, Francesca Luca1, Roger Pique-Regi1

1Wayne State University, 2University of Michigan

A detailed mechanistic understanding of the direct functional consequences of DNA variation on gene regulatory mechanisms is critical for a complete understanding of complex trait genetics and evolution. Here, we present a novel approach that integrates sequence information and DNase I footprinting data to predict the impact of a sequence change on transcription factor binding. Applying this approach to 653 DNase-seq samples, we identified 3,831,862 regulatory variants predicted to affect active regulatory elements for a panel of 1,372 transcription factor motifs. Using QuASAR, we validated the non-coding variants predicted to be functional by examining allele-specific binding (ASB). Combining the predictive model and the ASB signal, we identified 3,217 binding variants within footprints that are significantly imbalanced (20% FDR). Even though most variants in DNase I hypersensitive regions may not be functional, we estimate that 56% of our annotated functional variants show actual evidence of ASB. To assess the effect these variants may have on complex phenotypes, we examined their association with complex traits using GWAS and observed that ASB-SNPs are enriched 1.22-fold for complex trait variants. Furthermore, we show that integrating footprint annotations into GWAS meta-study results improves identification of likely causal SNPs and provides a putative mechanism by which the phenotype is affected.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

A scalable method for molecular network reconstruction identifies properties of targets and mutations in acute myeloid leukemia

Edison Ong1, Anthony Szedlak2, Yunyi Kang, Peyton Smith1, Nicholas Smith1, Madison McBride3, Darren Finlay3, Kristiina Vuori3, James Mason4, Edward D. Ball5, Carlo Piermarocchi2, Giovanni Paternostro3

1Salgomed, 2Michigan State University, 3Sanford-Burnham Medical Research Institute, 4Scripps Health, San Diego, 5University of California, San Diego

A key aim of systems biology is the reconstruction of molecular networks. However, we do not yet have networks that integrate information from all datasets available for a particular clinical condition. This is in part due to the limited scalability, in terms of required computational time and power, of existing algorithms. Network reconstruction methods should also be scalable in the sense of allowing scientists from different backgrounds to efficiently integrate additional data.

We present a network model of acute myeloid leukemia (AML). In the current version (AML 2.1) we have used gene expression data (both microarray and RNA-seq) from five different studies comprising a total of 771 AML samples and a protein-protein interactions dataset. Our scalable network reconstruction method is in part based on the well-known property of gene expression correlation among interacting molecules. The difficulty of distinguishing between direct and indirect interactions is addressed by optimizing the coefficient of variation of gene expression, using a validated gold standard dataset of direct interactions. Computational time is much reduced compared to other network reconstruction methods. A key feature is the study of the reproducibility of interactions found in independent clinical datasets.

An analysis of the most significant clusters, and of the network properties (intraset efficiency, degree, betweenness centrality, and PageRank) of common AML mutations demonstrated the biological significance of the network. A statistical analysis of the response of blast cells from eleven AML patients to a library of kinase inhibitors provided an experimental validation of the network. A combination of network and experimental data identified CDK1, CDK2, CDK4, CDK6, and other kinases as potential therapeutic targets in AML.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

A cell lineage-specific regulatory network inferred using limited expression data of erythropoiesis

Fan Zhu1, Lihong Shi1, James Engel1, Yuanfang Guan1

1University of Michigan

Modeling regulatory networks using expression data observed in a differentiation process may help identify context-specific interactions. Despite intensive research efforts on this topic, the outcome of the current algorithms highly depends on the quality and quantity of a single time-course dataset, and the performance may be compromised for data with a limited number of samples. In this work, we report a novel multi-layer graphical model that is capable of leveraging heterogeneous, generic, publicly available time-course datasets, as well as limited cell lineage-specific data to model regulatory networks specific to a differentiation process. First, a collection of network inference methods is used to predict the regulatory relationships in individual datasets. Then, the inferred relationships are weighted and integrated together by evaluating against the cell lineage-specific data. To test the accuracy of this algorithm, we collected a time-course RNA-Seq dataset during human erythropoiesis to infer regulatory relationships specific to this differentiation process. The resulting erythroid-specific regulatory network reveals novel regulatory relationships activated in erythropoiesis, which were further validated by genome-wide TR4 binding studies using ChIP-seq. These erythropoiesis-specific regulatory relationships were not identifiable by single dataset-based methods or context-independent integrations. Analysis of the predicted targets reveals that they are all closely associated with hematopoietic lineage differentiation. In summary, this paper develops an integrative strategy that is capable of leveraging a limited, cell type-specific expression dataset and large-scale, generic time-course datasets to infer regulatory networks specific to a differentiation process, which is applicable to other cell lineages.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .


cDREM: inferring dynamic combinatorial gene regulation

Aaron Wise1, Ziv Bar-Joseph1

1Carnegie Mellon University

Motivation: Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinatorial regulation, these have often ignored the combinational logic and the dynamics associated with such regulation.

Results: Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast, we show that the predicted combinatorial sets agree with other high-throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study the human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Multi-species network inference improves gene regulatory network reconstruction for early embryonic development in Drosophila

Anagha Joshi1, Yvonne Beck1, Tom Michoel1

1The Roslin Institute, University of Edinburgh

Gene regulatory network inference uses genome-wide transcriptome measurements in response to genetic, environmental or dynamic perturbations to predict causal regulatory influences between genes. We hypothesized that evolution also acts as a suitable network perturbation and that integration of data from multiple closely related species can lead to improved reconstruction of gene regulatory networks. To test this hypothesis, we predicted networks from temporal gene expression data for 3,610 genes measured during early embryonic development in six Drosophila species, and compared predicted networks to gold standard networks of ChIP-chip and ChIP-seq interactions for developmental transcription factors in five species. We found that (i) the performance of single-species networks was independent of the species where the gold standard was measured; (ii) differences between predicted networks reflected the known phylogeny and differences in biology between the species; (iii) an integrative consensus network which minimized the total number of edge gains and losses with respect to all single-species networks performed better than any individual network. Our results show that in an evolutionarily conserved system, integration of data from comparable experiments in multiple species improves the inference of gene regulatory networks. They provide a basis for future studies on the numerous multi-species gene expression datasets for other biological processes available in the literature.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Reconstruction of gene regulatory networks based on repairing sparse low-rank matrices

Young Hwan Chang1, Roel Dobbe1, Palak Bhushan1, Joe W. Gray2, Claire J. Tomlin1

1University of California, Berkeley, 2Oregon Health and Science University

With the growth of high-throughput proteomic data, in particular time series gene expression data from various perturbations, a general question that has arisen is how to organize inherently heterogeneous data into meaningful structures. Since biological systems such as breast cancer tumors respond differently to various treatments, little is known about exactly how these gene regulatory networks (GRNs) operate under different stimuli. For example, when we apply a drug-induced perturbation to a target protein, we often only know that the dynamic response of the specific protein may be affected. We do not know by how much, for how long, or even whether this perturbation affects other proteins. Challenges due to the lack of such knowledge not only occur in modeling the dynamics of a GRN but also cause bias or uncertainties in identifying parameters or inferring the GRN structure. This paper describes a new algorithm which enables us to estimate bias error due to the effect of perturbations and correctly identify the common graph structure among biased inferred graph structures. To do this, we retrieve common dynamics of the GRN subject to various perturbations. We refer to the task as “repairing”, inspired by “image repairing” in computer vision. The method can automatically and correctly repair the common graph structure across perturbed GRNs, even without precise information about the effect of the perturbations. We evaluate the method on synthetic data sets and demonstrate advantages over l1-regularized graph inference by advancing our understanding of how these networks respond across different targeted therapies.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Pathways on demand: automated reconstruction of human signaling networks

Anna Ritz1, Christopher Poirel1, Allison Tegge1, Nicholas Sharp1, Allison Powell1, Kelsey Simmons1, Shiv Kale1, T.M. Murali1

1Virginia Polytechnic Institute and State University

Signaling pathways are a cornerstone of systems biology. Several databases store representations of these pathways that are amenable to automated analyses. Despite painstaking manual curation, significant variations exist between databases. To overcome these limitations, we present PathLinker, a new computational method that can reconstruct a signaling pathway from a background protein interaction network given only the identities of the receptors and transcription factors and regulators in that pathway. We demonstrate that PathLinker can reconstruct the Wnt pathway in the NetPath database with much higher precision and recall than several state-of-the-art algorithms, recovering non-canonical branches that appear only in this pathway's representation in other databases. PathLinker suggests a surprising role for CFTR, a chloride ion channel transporter of the ABC class, in Wnt/beta-catenin signaling, which we validate using siRNA experiments. We extend our computational results to accurately reconstruct a comprehensive set of signaling pathways in the NetPath database. We demonstrate that PathLinker can bridge differing representations of the same pathway between databases.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Inferring the genome-wide functional modulatory network: a case study on the NF-κB/RelA transcription factor

Xueling Li1, Min Zhu2, Allan Brasier1, Andrzej Kudlicki1

1University of Texas Medical Branch at Galveston, 2Hefei Institutes of Physical Science, Chinese Academy of Sciences

How different pathways lead to the activation of a specific transcription factor with specific effects is not fully understood. A modulatory network is composed of triplets of a specific transcription factor, target genes and modulators. Modulators usually affect the activity of the specific transcription factor at the post-transcriptional level in a target gene-specific manner (action mode), which may be classified as enhancement, attenuation and inversion of the activation or inhibition. Reconstructing such a modulatory network will help to interpret how transcription factors produce distinct gene responses to different stimuli. As a case study, here we inferred, from a large collection of expression profiles, all potential modulations of NF-κB/RelA. The predicted modulators include many proteins previously not reported as physically binding to RelA. The functions of the predicted modulators are consistent with biological activities of NF-κB/RelA, including RNA processing, alternative splicing, cell cycle, mitochondrion, ubiquitin-dependent proteolysis and ribosome biogenesis, and are consistent with binding modulators in our previous study. The predicted genome-wide RelA modulators from different enriched pathways or processes exert specific prevalent action modes on distinct pathways through RelA. Also, the modulators from non-coding RNA (ncRNA), RNA binding proteins, transcription factors, cytoskeleton, and kinases modulate the NF-κB/RelA activity with specific action modes consistent with their molecular functions and modulation level. Finally, we analyzed the modulatory network of NF-κB/RelA in the context of TGFB1-induced epithelial-mesenchymal transition (EMT). Here modulators of NF-κB/RelA included those involved in extracellular matrix (FBN1), cytoskeletal regulation (ACTN1) and tumor suppression (FOXP1).

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Systematic study of synthetic transcript features in S. cerevisiae exposes gene-expression determinants

Tuval Ben-Yehezkel1, Shimshi Atar2, Tzipy Marx1, Rafael Cohen1, Alon Diament2, Alexandra Dana2, Anna Feldman2, Ehud Shapiro1, Tamir Tuller2

1Weizmann Institute of Science, 2Tel Aviv University

A major challenge in functional genomics is understanding how different parts of the transcript affect aspects of its expression. Heterologous gene expression can potentially contribute to this research topic, but has rarely been studied systematically, specifically in eukaryotes. Here, we use a synthetic biology approach to study the distinct and causal effect of different parts of the transcript in the eukaryote S. cerevisiae. We generated three distinct reporter libraries of the viral HRSVgp04 gene for studying the effect of three distinct regions in the transcript: (1) the 5' UTR, (2) the first 40 codons, and (3) codons 42-81 of the ORF. Each of the three libraries contained variants with multiple, rationally designed synonymous mutations, totaling 383 distinct variants tested individually for gene expression. Our results show that while synonymous mutations in each of the three regions can have a dramatic effect on protein abundance, those closer to the 5' end of the ORF are the most effective modulators of protein abundance. Additionally, while weaker local mRNA folding at the beginning of the ORF (codons 1-8) increases protein abundance, it decreases protein abundance when present in downstream codons, reinforcing previous evolutionary studies demonstrating the selection of folding strength in different parts of the ORF. Finally, we show that the mean relative codon decoding time, based on ribosomal densities in endogenous genes, significantly correlates with our measured protein abundance (correlation up to r = 0.6175; p = 0.0013). While this report provides an improved understanding of transcript evolution and gene expression regulation, it also suggests relatively simple rules for engineering synthetic gene expression in a eukaryote.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

A canonical correlation analysis based dynamic Bayesian network prior to infer gene regulatory networks from multiple types of biological data

Brittany Baur1, Serdar Bozdag1

1Marquette University

One of the challenging and important computational problems in systems biology is to infer gene regulatory networks of biological systems. Several methods that exploit gene expression data have been developed to tackle this problem. In this study, we propose the use of copy number and DNA methylation data to infer gene regulatory networks. We developed an algorithm that scores regulatory interactions between genes based on canonical correlation analysis. In this algorithm, copy number or DNA methylation variables are treated as potential regulator variables and expression variables are treated as potential target variables. We first validated that the canonical correlation analysis method is able to infer true interactions with high accuracy. We showed that the use of DNA methylation or copy number datasets leads to improved inference over steady-state expression. Our results also showed that epigenetic and structural information could be used to infer directionality of regulatory interactions. Additional improvements in gene regulatory network inference can be gleaned from incorporating the result in an informative prior in a dynamic Bayesian algorithm. This is the first study that incorporates copy number and DNA methylation into an informative prior in a dynamic Bayesian framework. By closely examining top-scoring interactions with different sources of epigenetic or structural information, we also identified potential novel regulatory interactions.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Disease gene prioritization using network and feature

Bingqing Xie1, Gady Agam1, Sandhya Balasubramanian2, Jinbo Xu3, Natalia Maltsev2, Conrad Gilliam2, Daniela Boernigen2

1Illinois Institute of Technology, 2University of Chicago, 3Toyota Technological Institute at Chicago

Identification of the most promising candidate genes contributing to the disease phenotypes among large lists of variations produced by high-throughput genomics using traditional experimental methods is time-consuming and costly. Therefore, computational approaches that utilize existing biological knowledge to prioritize such candidate genes can enhance the efficiency and accuracy of the analysis of biomedical data. They can also reduce the cost of studies by avoiding experimental validation of irrelevant candidates. To prioritize candidate genes contributing to a disease or phenotype of the user's interest for further testing, in this study we present a novel algorithm that utilizes both types of information sources, gene annotations and gene interactions, simultaneously, while preserving their original representation using a Conditional Random Field (CRF) model. We further improve the accuracy and efficiency of our proposed approach by assigning enrichment scores to the annotation feature factors within the model. To estimate the performance of our approach, we evaluated it on two independent benchmark studies, ranking the candidate genes by both network and feature knowledge. Our results overall had high Area Under Curve (AUC) values and high partial AUC (pAUC) values on various disease benchmarks and revealed a higher accuracy and precision at the top predictions (10%) as compared with other prioritization tools. Additionally, we applied our method to a case study for the prediction of molecular mechanisms contributing to intellectual disability and autism. Our method was able to recover additional genes related to both disorders and provide suggestions for possible candidates based on their rankings and functional categories.

