- Judith Zaugg, EMBL Heidelberg, Germany
- Anthony Mathelier, NCMM, University of Oslo, Norway
Overview and introduction
- Eileen Furlong, European Molecular Biology Laboratory, Germany
Complex patterns of temporal and spatial gene expression are regulated by enhancers, cis-regulatory elements that recruit multiple transcription factors, leading to a very defined output of expression. Enhancers can be located in close proximity to, or at great distances from, their target gene. To better understand the relationship between enhancer usage and transcriptional regulation, we are integrating single cell genomic approaches with single cell imaging and genetic deletions to determine inherent properties of enhancers and regulatory networks during embryogenesis. Using single cell ATAC-seq we recently showed that this information can predict tissue-specific enhancers, identify cell types and follow their trajectories during embryogenesis. We are now extending this to study the specification of the mesoderm (one of the three germ layers) into different tissue primordia. This is being combined with natural sequence variation and transcription factor mutants to dissect the functional impact of perturbing the system in both cis and trans, as well as with single cell imaging of nuclear topology and nascent transcription. This information is being used to link enhancers to their target genes and build a regulatory network of mesoderm development.
- Amin Allahyar, Hubrecht Institute, Netherlands
- Carlo Vermeulen, Hubrecht Institute, Netherlands
- Wouter de Laat, Hubrecht Institute, Netherlands
- Jeroen de Ridder, UMC Utrecht, Netherlands
Chromatin folding contributes to the regulation of genomic processes such as gene activity. Existing conformation capture methods characterize genome topology through analysis of pairwise chromatin contacts in populations of cells but cannot discern whether individual interactions occur simultaneously or competitively. Here we present multi-contact 4C (MC-4C) and multi-contact HiC (MC-HiC), which apply MinION and PromethION nanopore sequencing to study multi-way DNA conformations of individual alleles. MC-4C/HiC distinguishes cooperative from random and competing interactions and identifies previously missed structures in subpopulations of cells.
In this talk we will address the computational challenges associated with the analysis of nanopore sequencing data in general, and multi-way chromatin interaction data in particular. Based on our analyses we are able to demonstrate that individual elements of the β-globin super enhancer can aggregate into an enhancer hub that can simultaneously accommodate two genes. Neighboring chromatin domain loops can form rosette-like structures through collision of their CTCF-bound anchors, as seen most prominently in cells lacking the cohesin-unloading factor WAPL. Here, massive collision of CTCF-anchored chromatin loops is believed to reflect ‘cohesin traffic jams’. Single-allele topology studies thus help us understand the mechanisms underlying genome folding and functioning.
Reference: Allahyar and Vermeulen et al. Nature Genetics volume 50 (2018)
- Da-Inn Lee, University of Wisconsin-Madison, United States
- Sushmita Roy, University of Wisconsin-Madison, United States
The three-dimensional (3D) organization of the genome is an important layer of regulation in developmental, disease, and evolutionary processes. Hi-C is a high-throughput chromosome conformation capture (3C) assay used to study the 3D genome by measuring pairwise interactions of genomic loci. Analysis of Hi-C data has shown that the genome is organized into higher-order organizational units such as compartments and topologically associating domains (TADs). Recent comparisons of TAD-finding methods found them to be unstable to different resolutions and sparsity levels of Hi-C data, suggesting the need for more robust methods. We present GRiNCH, a graph-regularized Non-negative Matrix Factorization (NMF) approach to identifying organizational units of chromosomes from Hi-C data. GRiNCH uses graph regularization to encourage neighboring genomic regions to belong to the same low-dimensional space. GRiNCH can recover TAD-like clusters which are significantly enriched in architectural protein binding in the boundaries and are more stable to sparse and low-depth Hi-C datasets than existing methods. Finally, GRiNCH can use the low-dimensional NMF factors to impute missing interaction counts and offer a smoothed Hi-C matrix. Taken together, GRiNCH offers a promising approach to identifying biologically meaningful structural domains of the genome.
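The idea of graph-regularized NMF on a contact matrix can be illustrated with a minimal sketch. It is not the GRiNCH implementation: it assumes the standard multiplicative updates for graph-regularized NMF, a hypothetical neighbor graph that links bins within a fixed radius along the chromosome, and a synthetic two-domain contact matrix. All function names and parameters (`neighbor_graph`, `graph_regularized_nmf`, `radius`, `lam`) are illustrative.

```python
import numpy as np

def neighbor_graph(n_bins, radius=2):
    """Adjacency matrix linking each genomic bin to bins within `radius` (assumed graph)."""
    idx = np.arange(n_bins)
    dist = np.abs(idx[:, None] - idx[None, :])
    return ((dist > 0) & (dist <= radius)).astype(float)

def graph_regularized_nmf(X, k, W, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Minimize ||X - U V^T||_F^2 + lam * Tr(V^T L V), with L = D - W, U >= 0, V >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k))
    V = rng.random((m, k))
    D = np.diag(W.sum(axis=1))   # degree matrix of the neighbor graph
    L = D - W                    # graph Laplacian
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        obj = np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
        history.append(float(obj))
    return U, V, history

# Synthetic symmetric contact matrix with two dense blocks (toy "domains").
rng = np.random.default_rng(1)
n_bins = 20
X = rng.random((n_bins, n_bins))
X = (X + X.T) / 2
X[:10, :10] += 5.0
X[10:, 10:] += 5.0

W = neighbor_graph(n_bins, radius=2)
U, V, history = graph_regularized_nmf(X, k=2, W=W, lam=1.0)

clusters = V.argmax(axis=1)   # assign each bin to its dominant factor
X_smooth = U @ V.T            # low-rank reconstruction, usable as a smoothed matrix
```

In this sketch the regularization term penalizes neighboring bins whose factor loadings differ, which is one way to encourage contiguous clusters; the product `U @ V.T` plays the role of the smoothed matrix mentioned in the abstract.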
- Jaime A Castro-Mondragon, Centre for Molecular Medicine Norway, Norway
- Miriam Ragle Aure, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Norway
- Vessela N Kristensen, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Norway
- Anthony Mathelier, NCMM, University of Oslo, Norway
MiRNAs are involved in gene regulation by inhibiting mRNA translation and a single miRNA sequence may regulate hundreds of mRNAs. With miRNAs known to be involved in cancer initiation and progression, a better understanding of miRNA transcriptional regulation and its disruption in cancer is clearly required.
By combining TFBSs and miRNA TSSs information with cancer patient data, we evaluated the combined effects of transcriptional and post-transcriptional dysregulation of gene expression with the alteration of miRNA regulation in cancer through cis-regulatory alterations. The analyses culminated with the identification of mutations at TFBSs affecting the expression of key protein-coding and miRNA genes with a cascading dysregulating effect of the cells’ regulatory program. Our predictions were enriched for protein-coding and miRNA genes previously annotated as potential cancer drivers. Functional enrichment analyses highlighted the dysregulation of key pathways associated with carcinogenesis. These results confirm that our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients. This new strategy represents an original methodology to decipher how the gene regulatory program is disrupted in cancer cells by combining transcriptional and post-transcriptional regulation of gene expression.
- Abbas Roayaei Ardakany, University of California Riverside, United States
- Ferhat Ay, La Jolla Institute for Allergy and Immunology, United States
- Stefano Lonardi, University of California Riverside, United States
Motivation: High-throughput conformation capture experiments such as Hi-C provide genome-wide maps of chromatin interactions, enabling life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. A fundamental problem in the analysis of Hi-C data is how to compare two contact maps derived from Hi-C experiments. Detecting similarities and differences between contact maps is critical in evaluating the reproducibility of replicate experiments and for identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging.
Results: We present a novel method called Selfish for the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a novel self-similarity measure to design algorithms for (i) measuring reproducibility for Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods.
- Ignacio Ibarra Del Río, EMBL Heidelberg, Germany
- Nele Merrett Hollmann, EMBL Heidelberg, Germany
- Bernd Klaus, EMBL Heidelberg, Germany
- Sandra Augsten, EMBL Heidelberg, Germany
- Britta Velten, EMBL Heidelberg, Germany
- Janosch Hennig, EMBL Heidelberg, Germany
- Judith Zaugg, EMBL Heidelberg, Germany
Recent high-throughput in vitro transcription factor (TF) binding assays revealed that TF cooperativity is a widespread phenomenon. However, we are still missing global mechanistic insights into TF cooperativity and its biological implications. Here we present a framework using statistical learning and next generation sequencing data that provides high-throughput structural insights into TF cooperativity and predicts its in vivo consequences. Applying it to human TFs we identified DNA shape as a driver of cooperativity, particularly for Forkhead-Ets pairs. We identified a mechanism where an Ets residue interacts with the minor groove opposite the Forkhead-DNA interface and thereby increases Forkhead binding affinity, as validated by NMR and site-directed mutagenesis. Further, we found many functional associations for cooperatively bound TFs and followed up a novel link between FOXO1:ETV6 and lymphomas, which revealed an association between their joint expression levels and overall patient survival. Altogether, our results demonstrate that cooperative TF binding is driven by position-specific DNA shape features and that it can add a layer of regulatory complexity for certain TF families that allows more specific control over biological functions.
- Bart Deplancke, EPFL, École Polytechnique Fédérale de Lausanne, Switzerland
Understanding the relationship between genetic and phenotypic variation and the impact of the environment on this association is clearly a major challenge in current biology. To address this challenge, large human GWAS studies investigating the genetic basis of complex phenotypes have been conducted. However, their outcome has so far been ambiguous, reflecting in part the difficulty of discriminating causal variants from tagging ones and the fact that the majority of associated variants are located in non-coding regions of the genome. Consequently, human studies alone will be unable to determine all the causative factors underlying phenotypic variability, rationalizing the highly complementary role of model organism population analyses in providing hypotheses and mechanisms for human population studies. Indeed, several factors that confound human studies can be better dealt with using model organisms, including more standardized environmental control, the ability to map all types of variants at very high resolution, to use a very large supply of individuals with the same genotype, and to perform downstream validation experiments on putative causal variants. Here, I will present my lab’s efforts to exploit the Drosophila Genetic Reference Panel (DGRP), consisting of over 200 inbred fly lines, to better understand the genetic and molecular basis of two complex phenotypes. First, I will summarize our findings on elucidating the genetic and molecular basis underlying gut immunocompetence. In a second part, I will present a recent systems genomics study aimed at resolving the impact of genetic variation on the molecular bearings of the circadian clock, revealing remarkable variation in tissue-specific circadian expression.
- Roza Berhanu Lemma, Center for Molecular Medicine Norway (NCMM), University of Oslo, Norway
- Anthony Mathelier, NCMM, University of Oslo, Norway
Methylation of CpGs at promoters and enhancers represents a major epigenetic DNA modification involved in transcriptional regulation. Aberrant DNA methylation patterns have recurrently been associated with dysregulation of the regulatory program in cancer cells. By combining DNA methylation arrays and gene expression data from TCGA with transcription factor (TF) binding sites, we explored the interplay between TF binding and DNA methylation in cancers. We hypothesized that aberrant methylation patterns could be triggered by binding of specific TFs. This was assessed by studying the correlation between the level of expression of TFs and the level of methylation at their binding regions. Specifically, for each TF, we performed expression-methylation quantitative trait loci computations and estimated the proportion of CpGs in the TF binding regions with methylation level correlated with the TF’s expression. The TFs with the highest proportion of correlated CpG methylation are most likely to be associated with aberrant DNA methylation patterns. We identified 18 TFs as outliers, with high correlation between expression and demethylation at CpGs close to their binding sites. These TFs were significantly enriched for pioneering function, suggesting a special role for these pioneer TFs in modulating the chromatin structure and thereby the transcriptional profile in cancer patients.
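The per-TF statistic described above (the proportion of CpGs whose methylation tracks the TF's expression) can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: it replaces the expression-methylation QTL computation with a plain Pearson correlation and a hypothetical cutoff `r_threshold`, and it runs on synthetic data in which 10 of 50 CpGs are made to lose methylation as TF expression rises.

```python
import numpy as np

def correlated_cpg_fraction(tf_expr, meth, r_threshold=0.5):
    """Fraction of CpGs whose methylation correlates with TF expression.

    tf_expr: (n_samples,) expression of one TF across patients.
    meth:    (n_samples, n_cpgs) methylation at CpGs in the TF's binding regions.
    r_threshold is an illustrative cutoff on |Pearson r|, not a value from the study.
    """
    e = tf_expr - tf_expr.mean()
    m = meth - meth.mean(axis=0)
    r = (e @ m) / (np.linalg.norm(e) * np.linalg.norm(m, axis=0))
    return float(np.mean(np.abs(r) > r_threshold)), r

# Synthetic cohort: 200 samples, 50 CpGs, the first 10 anti-correlated with expression.
rng = np.random.default_rng(0)
n_samples, n_cpgs = 200, 50
tf_expr = rng.normal(size=n_samples)
meth = rng.normal(loc=0.5, scale=0.05, size=(n_samples, n_cpgs))
meth[:, :10] -= 0.1 * tf_expr[:, None]   # higher expression, lower methylation

fraction, r = correlated_cpg_fraction(tf_expr, meth)
print(f"fraction of correlated CpGs: {fraction:.2f}")
```

Repeating this for every TF and ranking by `fraction` would give the kind of outlier list the abstract refers to; a real analysis would use a proper significance test and multiple-testing correction rather than a fixed cutoff.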
- Wouter Meuleman, Altius Institute for Biomedical Sciences, United States
- Alexander Muratov, Altius Institute for Biomedical Sciences, United States
- Eric Rynes, Altius Institute for Biomedical Sciences, United States
- John Stamatoyannopoulos, Altius Institute for Biomedical Sciences, United States
Regulatory information encoded in the human genome is activated by sequence-specific DNA binding factors, creating focal alterations in chromatin structure that are hypersensitive to DNase I. We created deep reference maps of DNase I hypersensitive sites (DHSs) from 733 human biosamples encompassing 439 cellular conditions, and integrated these to precisely delineate and numerically index ~3.6 million DHSs, providing a common coordinate system for regulatory DNA.
Here we show that the expansive scale of cell and tissue states sampled exposes an unprecedented degree of stereotyped actuation of large sets of elements. We show further that the complex actuation patterns of individual elements can be captured comprehensively by a simple regulatory vocabulary reflecting their dominant program. This vocabulary, in turn, enables comprehensive and quantitative regulatory annotation of both protein-coding genes and the vast array of well-defined but poorly-characterized non-coding RNA genes.
Finally, we show that regulatory vocabularies open new avenues for systematically interpreting non-coding genetic variation, and substantially empower the connection of disease-associated variation with specific cell and tissue states. Taken together, our results provide a common and extensible coordinate system and vocabulary for human regulatory DNA, and open a new global perspective on the architecture of human gene regulation.
- Swann Floc'Hlay, Institut de Biologie de l'Ecole normale supérieure, France
- Morgane Thomas-Chollier, Institut de Biologie de l'Ecole normale supérieure, France
- Denis Thieffry, Institut de Biologie de l'Ecole normale supérieure, France
- Eileen Furlong, European Molecular Biology Laboratory, Germany
Recent high-throughput sequencing studies between individuals of a given species have revealed extensive variation in gene expression, as a consequence of segregating genetic variation within the population. Most of this regulatory genetic variation is in non-coding DNA, presumably disrupting enhancer function. However, how genetic variants disrupt transcriptional regulation remains very poorly understood and hard to predict.
We aim at getting a mechanistic understanding of how natural genetic variation affects multiple layers of transcriptional regulation. We use hybrid embryos of genetically distinct Drosophila lines, isolated from a wild population, at three crucial time windows of embryonic development. The use of hybrid individuals offers a powerful approach to dissect cis versus trans-regulatory mutations by obtaining allele-specific information (e.g. allele-specific ATAC-seq, ChIP-seq, RNA-seq data).
We used the parental genome mapping strategy and the partial correlation method to extract direct regulatory relationships. Surprisingly, the regulatory architecture obtained by the allelic ratio correlation analysis differs noticeably from results obtained solely from coverage. This result suggests a contrast in the pathways impacting imbalance and gene expression levels.
The integration of gene expression, enhancer/promoter activity and chromatin states data should lead to a more extensive view of the genetic bases influencing transcriptional regulation.
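Partial correlation, as mentioned in the abstract above, separates direct from indirect associations by conditioning each pair of variables on all the others. The sketch below is a generic illustration on synthetic data, not the authors' analysis: it assumes a hypothetical chain of three signals (`a` drives `b`, `b` drives `c`) and computes partial correlations from the inverse covariance matrix.

```python
import numpy as np

def partial_correlation(data):
    """Partial correlation of each variable pair given all remaining variables.

    data: (n_samples, n_vars). Uses pcor_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the sample covariance matrix.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Synthetic chain a -> b -> c (for example accessibility -> binding -> expression;
# the labels are hypothetical and only serve the illustration).
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
data = np.column_stack([a, b, c])

corr = np.corrcoef(data, rowvar=False)
pcor = partial_correlation(data)

print("marginal corr(a, c):", round(corr[0, 2], 2))
print("partial  corr(a, c | b):", round(pcor[0, 2], 2))
```

Here `a` and `c` are clearly correlated marginally, but their partial correlation is close to zero once `b` is accounted for, which is why the method is useful for keeping only direct relationships.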
- Surag Nair, Stanford University, United States
- Daniel Kim, Stanford University, United States
- Jacob Perricone, Stanford University, United States
- Anshul Kundaje, Stanford University, United States
Motivation: Genome-wide profiles of chromatin accessibility and gene expression in diverse cellular contexts are critical to decipher the dynamics of transcriptional regulation. Recently, convolutional neural networks (CNNs) have been used to learn predictive cis-regulatory DNA sequence models of context-specific chromatin accessibility landscapes. However, these context-specific regulatory sequence models cannot generalize predictions across cell types.
Results: We introduce multi-modal, residual neural network architectures that integrate cis-regulatory sequence and context-specific expression of trans-regulators to predict genome-wide chromatin accessibility profiles across cellular contexts. We show that the average accessibility of a genomic region across training contexts can be a surprisingly powerful predictor. We leverage this feature and employ novel strategies for training models to enhance genome-wide prediction of shared and context-specific chromatin accessible sites across cell types. We interpret the models to reveal insights into cis and trans regulation of chromatin dynamics across 123 diverse cellular contexts.
Availability: The code is available at https://github.com/kundajelab/ChromDragoNN
Contact: akundaje@stanford.edu
- Robin Andersson, The Bioinformatics Centre, University of Copenhagen, Denmark
The correct activities of enhancers and promoters are essential for the coordinated transcriptional activities within a cell. Their critical roles have motivated diverse methodology to measure the activities of each type, implicitly assuming that they are distinct. I will describe our efforts to characterize the activities and architectures of enhancers and promoters using Cap Analysis of Gene Expression (CAGE) and how such data reveal broad similarities between enhancers and promoters, suggesting a unifying architecture of transcriptional regulatory elements with varying levels of enhancer and promoter potential. I will further present our work on assessing the importance of individual regulatory elements, through modelling of the impact of regulatory genetic variants and architectural redundancies.
- Susanne Bornelöv, University of Cambridge, United Kingdom
- Tommaso Selmi, University of Cambridge, United Kingdom
- Sophia Flad, University of Cambridge, United Kingdom
- Sabine Dietmann, University of Cambridge, United Kingdom
- Michaela Frye, German Cancer Research Center (DKFZ), Germany
Transfer RNAs (tRNA) transfer the genetic information from the three-nucleotide codons to protein level. The tRNAs are heavily modified in vivo and we hypothesized that fine-tuning the tRNA usage via these modifications could be a cellular mechanism towards mRNA translational control and global gene regulation. Ribo-seq is an emerging technique that captures native mRNAs bound by a ribosome and can be used to construct a snapshot of translation at single codon resolution.
In this study, we used Ribo-seq and RNA-seq to study codon composition and translation in self-renewing and differentiating cells from both human and mouse. We identified differences in GC content connected to global changes in gene expression during differentiation as well as specific codons that exhibit translational differences between the self-renewing and differentiating cellular states.
These differences were partially conserved between human and mouse stem cells and may reflect evolutionary optimization of global nucleotide usage or differences in tRNA abundance and modifications. Notably, we identified a group of affected codons that are all dependent on adenine-to-inosine modified tRNAs for their translation, unveiling the possibility of a direct link between RNA editing and translational regulation during fate specification of embryonic stem cells.
- Max Schubach, Berlin Institute of Health (BIH), Berlin, Germany
- Chenling Xiong, Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Beth Martin, Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Fumitaka Inoue, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States
- Robert Ja Bell, Department of Neurosurgery, University of California San Francisco, San Francisco, CA, United States
- Joseph F Costello, Department of Neurosurgery, University of California San Francisco, San Francisco, CA, United States
- Nadav Ahituv, Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, United States
- Jay Shendure, University of Washington, United States
- Martin Kircher, Berlin Institute of Health, Germany
The majority of variants associated with common diseases, as well as an unknown proportion of causal mutations of rare diseases, fall in noncoding regions of the genome. Although catalogs of regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we performed saturation mutagenesis in conjunction with massively parallel reporter assays (MPRAs) on 20 disease-associated gene promoters (e.g. HBG1, LDLR, TERT) and enhancers (e.g. IRF6, SORT1, TCF7L2). We generated functional measurements of over 30,000 single nucleotide substitutions (SNVs) and deletions (https://mpra.gs.washington.edu), characterizing effect sizes of regulatory mutations at an unprecedented scale.
Across elements, we find that a majority of mutations lead to a reduction in activity (transversions more so than transitions, deletions more so than SNVs), suggesting that transcription factor binding is more easily lost than gained. Further, the density of putative binding sites varies widely between elements, as does the extent to which annotations, evolutionary conservation, and integrative scores predict our measured effects. In fact, no score or annotation consistently predicts experimental measures. Therefore, in addition to being a rich compendium for studying these 20 disease-associated elements, our data serves as a gold standard for further score developments.
- Judith Zaugg, EMBL Heidelberg, Germany
- Anthony Mathelier, NCMM, University of Oslo, Norway
Overview and introduction
- Barbara Treutlein, ETH Zurich, Switzerland
- Joshua Welch, University of Michigan, United States
- Evan Macosko, Broad Institute of Harvard and MIT, United States
Defining cell types requires integrating diverse measurements from multiple experiments and biological contexts. Technological developments have enabled high-throughput profiling of single-cell gene expression, epigenetic regulation, and spatial position within complex tissues, but computational approaches for integrating these datasets are lacking. We developed LIGER, an algorithm that delineates shared and dataset-specific features of cell identity, allowing flexible modeling of highly heterogeneous single-cell datasets. We demonstrated its broad utility by applying it to four diverse and challenging datasets from human and mouse brain cells. First, we defined both cell-type-specific and sexually dimorphic gene expression in the mouse bed nucleus of the stria terminalis, an anatomically complex brain region that plays important roles in sex-specific behaviors. Second, we analyzed gene expression in the substantia nigra of seven postmortem human subjects, comparing cell states in specific donors, and relating cell types to those in the mouse. Third, we jointly leveraged in situ gene expression and scRNA-seq data to spatially locate fine subtypes of cells present in the mouse frontal cortex. Finally, we integrated mouse cortical scRNA-seq profiles with single-cell DNA methylation signatures, revealing mechanisms of cell-type-specific gene regulation. Integrative analyses using LIGER promise to accelerate investigations of cellular identity, gene regulation, and disease states.
- Sushmita Roy, University of Wisconsin-Madison, United States
- Matthew Stone, University of Wisconsin-Madison, United States
- Viswesh Periyasamy, University of Wisconsin-Madison, United States
- Sunnie Grace McCalla, University of Wisconsin-Madison, United States
- Alireza Fotuhi Siahpirani, University of Wisconsin-Madison, United States
Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into transcriptional programs governing different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks. Recent methods for the network inference problem from scRNA-seq data vary in the statistical model used to represent regulatory relationships, in whether they incorporate pseudotime, and in whether they employ smoothing and imputation strategies. However, the accuracy of the proposed methods and the impact of incorporating pseudotime or imputation into network inference have yet to be comprehensively evaluated.
We compared eleven network inference methods on six published single-cell RNA-sequencing datasets from human, mouse, and yeast, inferring networks before and after smoothing the datasets. Methods varied in performance depending upon the gold standard and whether imputation was beneficial for network inference. Overall, there was no method or pre-processing that consistently outperformed other strategies. Surprisingly, a network with edges weighted by the absolute value of Pearson’s correlation coefficient yielded accuracy on par with or better than the evaluated methods. This is likely because existing gold standards are obtained from bulk experiments, suggesting a need for improved gold standards to robustly evaluate network inference methods in scRNA-seq.
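The correlation baseline mentioned above is simple enough to sketch in a few lines. This is an illustrative version under stated assumptions, not the benchmark code: the function name `pearson_network`, the gene names, and the synthetic expression matrix (five genes, 300 cells, with `g1` made to follow `g0`) are all hypothetical.

```python
import numpy as np

def pearson_network(expr, gene_names):
    """Rank all gene pairs by |Pearson r| across cells.

    expr: (n_genes, n_cells) expression matrix. Returns a list of
    (gene_a, gene_b, weight) tuples sorted by decreasing weight.
    Genes with zero variance would yield NaN weights and should be filtered first.
    """
    w = np.abs(np.corrcoef(expr))
    i, j = np.triu_indices(len(gene_names), k=1)   # each unordered pair once
    order = np.argsort(-w[i, j])
    return [(gene_names[i[t]], gene_names[j[t]], float(w[i[t], j[t]])) for t in order]

# Synthetic data: g1 is driven by g0, the remaining genes are independent noise.
rng = np.random.default_rng(0)
gene_names = ["g0", "g1", "g2", "g3", "g4"]
expr = rng.normal(size=(5, 300))
expr[1] = expr[0] + 0.3 * rng.normal(size=300)

edges = pearson_network(expr, gene_names)
for gene_a, gene_b, weight in edges[:3]:
    print(gene_a, gene_b, round(weight, 3))
```

The resulting edges are undirected, so comparing them against a directed gold standard requires a choice such as scoring both orientations or restricting one endpoint to known regulators.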
- Wei Vivian Li, University of California, Los Angeles, United States
- Jingyi Jessica Li, University of California, Los Angeles, United States
Motivation: Single-cell RNA-sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths, and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information.
Results: Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and six different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experiment design based on specific research goals and compares various scRNA-seq computational methods.
Availability: We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign.
Contact: jli@stat.ucla.edu
- Guray Kuzu, The Pennsylvania State University, United States
- Naomi Yamada, The Pennsylvania State University, United States
- Matthew Rossi, The Pennsylvania State University, United States
- Prashant Kuntala, The Pennsylvania State University, United States
- Chitvan Mittal, The Pennsylvania State University, United States
- Nitika Badjatia, The Pennsylvania State University, United States
- William Lai, The Pennsylvania State University, United States
- Gretta Kellogg, The Pennsylvania State University, United States
- B. Franklin Pugh, The Pennsylvania State University, United States
- Shaun Mahony, The Pennsylvania State University, United States
Characterizing the composition and organization of protein-DNA complexes is key to understanding gene regulation. Under our ongoing Yeast Epigenome Project, we applied the high-resolution ChIP-exo assay to characterize genomic occupancy patterns of ~800 nuclear-localized proteins in S. cerevisiae. The resulting dataset represents the first comprehensive characterization of any cell type’s genome-wide protein-DNA interaction landscape at a resolution sufficient to define the positional organization of factors. Here, we demonstrate that topic modeling approaches can effectively identify sets of interacting proteins within this regulatory landscape. Our Hierarchical Dirichlet Process approach forms probabilistic topics from co-occurring ChIP-exo signals. We show that topic models outperform state-based models in characterizing the fine-grained organization of overlapping protein-DNA complexes. Furthermore, after estimating topics, we take advantage of ChIP-exo crosslinking signatures to model the spatial organization of proteins within each candidate regulatory complex. Our approach aligns multi-protein ChIP-exo profiles across multiple genomic loci, and uses a probabilistic mixture model to quantify relative protein-DNA crosslinking strengths for each protein at each estimated crosslinking position. The resulting crosslinking matrix enables inference of the positional organization of proteins within a given regulatory complex. We will demonstrate our approaches to characterize regulatory complexes in the context of the Yeast Epigenome Project.
- Harshit Sahay, Duke University, United States
- Ariel Afek, Duke University, United States
- Honglue Shi, Duke University, United States
- Atul Rangadurai, Duke University, United States
- Hashim Al-Hashimi, Duke University, United States
- Raluca Gordan, Duke University, United States
DNA sequence and shape are known to be important for transcription factor (TF)-DNA recognition. Still, some fundamental aspects of this recognition are poorly understood. Structures of TF-bound DNA show significant distortions from B-form DNA, and the implications of these distortions on specificity have not been characterized. Additionally, while TFs bound to lesioned DNA are believed to act as roadblocks for repair, the effects of these lesions on TF binding are unknown. Here, we focus on DNA mismatches, a type of lesion frequently generated in the cell. We present the first high-throughput assay to measure the effects of mismatches on TF binding. Mismatches can cause significant distortions in B-DNA, and thus present a way to characterize the effects of conformational penalties on TF binding. Our results show that mismatches have a widespread impact on binding across many TF families, which is not explained by sequence alone. We find that mismatches that increase TF binding generally exhibit geometries similar to distorted base-pairs in TF-bound structures. TFs can compete with repair enzymes for these mismatched sequences, eventually causing mutations. Focusing on c-Myc and a T-G mismatch that increases binding by >30-fold, we show that the resulting mutation is highly enriched in cancer genomes.
- Stein Aerts, KU Leuven, Belgium
Single-cell transcriptomics and single-cell epigenomics allow building cell atlases of any tissue and species, which can provide unprecedented insight into the dynamics of cellular state transitions during developmental or disease trajectories. In the first part of my talk I will describe recent single-cell technologies as well as computational methods to identify transcription factors, gene networks, and cell states from single-cell data. These methods include SCENIC for the inference of gene networks from scRNA-seq data; cisTopic for the prediction of co-regulatory enhancers from scATAC-seq data; and SCope for the visualisation of single-cell atlases. To integrate single-cell epigenome and transcriptome data we exploit cis-regulatory sequence analysis, using deep learning and large databases of transcription factor recognition motifs. In the second part of my talk I will present several data sets from the Fly Cell Atlas, with a focus on the central and peripheral nervous system, where we aim to trace genomic regulatory programs of neuronal identity at single-cell resolution.
- Jonas Paulsen, Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Norway
- Tharvesh M. Liyakat Ali, Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Norway
- Maxim Nekrasov, Biomolecular Research Facility, The John Curtin School of Medical Research, The Australian National University, Australia
- Erwan Delbarre, Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Norway
- Marie-Odile Baudement, Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Norway
- Sebastian Kurscheid, Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Australia
- David Tremethick, Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Australia
- Philippe Collas, Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Norway
Genomic information is selectively used to direct spatial and temporal gene expression during differentiation. Interactions between topologically associated domains (TADs) and between chromatin and the nuclear lamina organize and position chromosomes in the nucleus. However, how these genomic organizers together shape genome architecture is unclear. Using a dual-lineage differentiation system, we report here long-range TAD-TAD interactions forming dynamic constitutive and variable TAD cliques. A differentiation-coupled relationship between TAD cliques and lamina-associated domains suggests that TAD cliques stabilize heterochromatin at the nuclear periphery. We also provide evidence of dynamic TAD cliques during mouse embryonic stem cell differentiation and somatic cell reprogramming, and of inter-TAD associations in single-cell Hi-C data. TAD cliques represent a new level of 4-dimensional genome conformation reinforcing the silencing of repressed developmental genes.
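As a rough illustration of the clique idea, TADs can be treated as nodes in a graph with an edge for each significant long-range TAD-TAD interaction; a TAD clique is then a set of TADs that all interact pairwise. The sketch below is not the authors' pipeline: the function names, the toy interaction list, and the minimum clique size of three are assumptions made for illustration, and maximal cliques are found with the textbook Bron-Kerbosch recursion.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Turn a list of (tad_a, tad_b) pairs into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            neighbours = adjacency[node]
            expand(current | {node}, candidates & neighbours, excluded & neighbours)
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def tad_cliques(interactions, min_size=3):
    """Return maximal cliques with at least min_size TADs (assumed threshold)."""
    adjacency = build_adjacency(interactions)
    found = [tuple(sorted(c)) for c in maximal_cliques(adjacency) if len(c) >= min_size]
    return sorted(found)


# Hypothetical significant TAD-TAD interactions (labels are placeholders).
toy_interactions = [
    ("T1", "T2"), ("T1", "T3"), ("T2", "T3"),
    ("T3", "T4"),
    ("T4", "T5"), ("T4", "T6"), ("T4", "T7"),
    ("T5", "T6"), ("T5", "T7"), ("T6", "T7"),
]
cliques = tad_cliques(toy_interactions)
print(cliques)
```

Here the lone T3-T4 contact is dropped because it does not reach the assumed minimum size, while the two fully connected groups are reported. Comparing such clique sets between time points is one simple way to separate constitutive cliques from variable ones.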
- Azim Dehghani Amirabad, International Max Planck Research School for Computer Science, Germany
- Dennis Kostka, University of Pittsburgh, United States
- Marcel Schulz, Goethe University, Germany
- Markus List, Technical University of Munich, Germany
Motivation: MicroRNAs (miRNAs) are important non-coding post-transcriptional regulators that are involved in many biological processes and human diseases. Individual miRNAs may regulate hundreds of genes, giving rise to a complex gene regulatory network in which transcripts carrying miRNA binding sites act as competing endogenous RNAs (ceRNAs). Several methods for the analysis of ceRNA interactions exist, but these often do not adjust for statistical confounders or address the problem that more than one miRNA interacts with a target transcript.
Results: We present SPONGE, a method for the fast construction of ceRNA networks. SPONGE uses 'multiple sensitivity correlation', a newly defined measure for which we can estimate a distribution under a null hypothesis. SPONGE can accurately quantify the contribution of multiple miRNAs to a ceRNA interaction with a probabilistic model that addresses previously neglected confounding factors and allows fast p-value calculation, thus outperforming existing approaches. We applied SPONGE to paired miRNA and gene expression data from The Cancer Genome Atlas to study global effects of miRNA-mediated cross-talk. Our results highlight established and novel protein-coding and non-coding ceRNAs that could serve as biomarkers in cancer.
Availability: SPONGE is available as an R/Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.SPONGE)
Contact: markus.list@wzw.tum.de and marcel.schulz@em.uni-frankfurt.de
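To make the underlying quantity concrete, the sketch below computes the single-miRNA sensitivity correlation, that is, the correlation between two transcripts minus their partial correlation given one miRNA. It is a minimal illustration and not the SPONGE package API: the function names and toy expression values are assumptions, and the multiple-miRNA generalisation and null model described in the abstract are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def sensitivity_corr(gene_a, gene_b, mirna):
    """Share of the gene-gene correlation that disappears once the miRNA is controlled for."""
    return pearson(gene_a, gene_b) - partial_corr(gene_a, gene_b, mirna)


# Toy data (assumed): both transcripts are repressed by the same miRNA,
# each with its own small, unrelated fluctuation.
mirna = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [0.3, -0.2, 0.1, -0.4, 0.4, -0.1, 0.2, -0.3]
noise_b = [-0.1, 0.3, -0.3, 0.2, 0.1, -0.4, 0.4, -0.2]
gene_a = [10 - m + e for m, e in zip(mirna, noise_a)]
gene_b = [12 - m + e for m, e in zip(mirna, noise_b)]

scor = sensitivity_corr(gene_a, gene_b, mirna)
print(round(scor, 3))
```

In this toy case almost all of the gene-gene correlation is attributable to the shared miRNA, so the sensitivity correlation is close to the raw correlation. A value near zero would instead suggest that the miRNA contributes little to the co-expression.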
- Joseph A. Wayman, Cincinnati Children's Hospital, United States
- Diep Nguyen, Oberlin College, United States
- Peter DeWeirdt, Cincinnati Children's Hospital, United States
- Bryan D. Bryson, Massachusetts Institute of Technology, United States
- Emily R. Miraldi, Cincinnati Children's Hospital Medical Center, United States
Transcriptional regulatory networks (TRNs) promote cellular behavior through coordination of gene expression by transcription factors (TFs). Single-cell RNA-seq (scRNA-seq) enables characterization of rare and heterogeneous cell populations, providing new opportunities for genome-scale TRN inference. However, single-cell resolution comes at the cost of technical noise, creating a need for TRN inference methods designed for scRNA-seq (“scTRN methods”).
We propose an scTRN method that (1) imputes technical zeros in scRNA-seq and (2) amplifies biological signal through incorporation of prior information (e.g., TF-gene interactions from a database, ATAC-seq). Borrowing information from similar cells and genes, we impute drop-out genes using a conservative weighted averaging approach. Our framework models gene expression as a sparse function of TF activities. Prior information enters the formulation twice: (1) to estimate per-cell TF activities and (2) to reinforce prior-supported interactions.
For benchmarking, we generated scRNA-seq data in Th17 cells, for which a large set of “gold-standard” interactions exists (from TF ChIP-seq and knockout RNA-seq). Using precision-recall analysis, we evaluated each step in our pipeline and compared it to existing prior-based scTRN methods, for which evaluation with an extensive gold standard was not previously possible. Our scTRN method outperforms the state of the art, and our benchmark dataset will engender further improvements in scTRN inference.
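The precision-recall evaluation mentioned above can be sketched as follows: predicted TF-gene edges are ranked by confidence, and precision and recall against the gold standard are computed at every cutoff. This is a generic illustration rather than the authors' benchmarking code; the function name and the placeholder edge labels are assumptions.

```python
def precision_recall_points(ranked_edges, gold_standard):
    """Return (precision, recall) after each prediction in a ranked edge list."""
    gold = set(gold_standard)
    hits = 0
    points = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
        points.append((hits / k, hits / len(gold)))
    return points


# Placeholder edges (TF, target), ordered from most to least confident.
ranked_edges = [("TF1", "geneA"), ("TF2", "geneB"), ("TF3", "geneC"), ("TF1", "geneD")]
# Placeholder gold standard, e.g. edges supported by ChIP-seq and knockout data.
gold_standard = [("TF1", "geneA"), ("TF2", "geneB"), ("TF1", "geneD"), ("TF4", "geneE")]

points = precision_recall_points(ranked_edges, gold_standard)
for precision, recall in points:
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Recall is capped below 1.0 here because one gold-standard edge is never predicted, which is the typical situation when the gold standard covers interactions that the inference method misses.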
- Jerzy Tiuryn, Warsaw University, Poland
- Ewa Szczurek, University of Warsaw, Poland
Motivation: Perturbation experiments constitute the central means to study cellular networks. Several confounding factors complicate computational modeling of signaling networks from these data. First, the technique of RNA interference (RNAi), designed and commonly used to knock down specific genes, suffers from off-target effects. As a result, each experiment is a combinatorial perturbation of multiple genes. Second, the perturbations propagate along unknown connections in the signaling network. Once the signal is blocked by perturbation, proteins downstream of the targeted proteins also become inactivated. Finally, all perturbed network members, whether directly targeted by the experiment or affected by propagation in the network, contribute to the observed effect, either positively or negatively. One of the key questions in computational inference of signaling networks from such data is how many and which combinations of perturbations are required to uniquely and accurately infer the model.
Results: Here, we introduce an enhanced version of linear effects models (LEMs), which extends the original by accounting for both negative and positive contributions of the perturbed network proteins to the observed phenotype. We prove that the enhanced LEMs are identified from data measured under perturbations of all single, pairs and triplets of network proteins. For small networks of up to five nodes, only perturbations of single and pairs of proteins are required for identifiability. Extensive simulations demonstrate that enhanced LEMs achieve excellent accuracy of parameter estimation and network structure learning, outperforming the previous version on realistic data. LEMs applied to Bartonella henselae infection RNAi screening data identified known interactions between eight nodes of the infection network, confirming high specificity of our model, and suggested one new interaction.
Availability: https://github.com/EwaSzczurek/LEM
Contact: szczurek@mimuw.edu.pl
- Julia Zeitlinger, Stowers Institute for Medical Research, United States
Genes are regulated through cis-regulatory enhancer sequences, which contain combinations of transcription factor binding motifs. Although the spacing and orientation of these motifs are thought to be important for the cis-regulatory code, genome-wide evidence for such organizational features has been lacking. Single-nucleotide resolution transcription factor binding data such as those produced by our recently developed ChIP-nexus technology contain information on transcription factor binding motifs and their combinatorial effect on transcription factor binding, but extracting this information with computational methods has been challenging. To learn the cis-regulatory code contained in these data, we have developed a sequence-to-profile convolutional neural network, BPNet, in collaboration with Anshul Kundaje’s lab and Žiga Avsec from the Gagneur lab. BPNet is first trained to predict ChIP-nexus transcription factor binding profiles at base resolution. We then use a suite of model interpretation tools we developed to extract predictive cis-regulatory sequence patterns learned by the model. By applying this method to the four mouse pluripotency transcription factors Oct4, Sox2, Nanog and Klf4, we accurately map hundreds of thousands of motifs in the genome and identify rules by which motifs influence the cooperative binding of transcription factors. We reveal strict instances of spacing, soft preferences for short-range and long-range cooperative interactions, as well as a strong ~10bp helical periodicity binding pattern for Nanog. Our results suggest that the combination of base-resolution deep learning and interpretation tools is a powerful computational paradigm for the systematic discovery of cis-regulatory code in experimentally accessible cell types.
- Avanti Shrikumar, Stanford University, United States
- Eva Prakash, BASIS Independent Silicon Valley, United States
- Anshul Kundaje, Stanford University, United States
Support Vector Machines with gapped k-mer kernels (gkm-SVMs) have been used to learn predictive models of regulatory DNA sequence. However, interpreting predictive sequence patterns learned by gkm-SVMs can be challenging. Existing interpretation methods such as deltaSVM, in-silico mutagenesis (ISM), or SHAP either do not scale well or make limiting assumptions about the model that can produce misleading results when the gkm kernel is combined with nonlinear kernels. Here, we propose GkmExplain: a computationally efficient feature attribution method for interpreting predictive sequence patterns from gkm-SVM models that has theoretical connections to the method of Integrated Gradients. Using simulated regulatory DNA sequences, we show that GkmExplain identifies predictive patterns with high accuracy while avoiding pitfalls of deltaSVM and ISM and being orders of magnitude more computationally efficient than SHAP. By applying GkmExplain and a recently developed motif discovery method called TF-MoDISco to gkm-SVM models trained on in vivo TF binding data, we obtain superior recovery of consolidated, non-redundant transcription factor (TF) motifs compared to other motif discovery methods. Mutation impact scores derived using GkmExplain consistently outperform deltaSVM and ISM at identifying regulatory genetic variants from gkm-SVM models of chromatin accessibility in lymphoblastoid cell-lines.
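For readers unfamiliar with in-silico mutagenesis (ISM), one of the baseline methods named above, the sketch below shows the basic procedure: every position is substituted with each alternative base and the change in model score is recorded. It is not GkmExplain and does not use any gkm-SVM software; `toy_score` is a hypothetical stand-in for a trained model's decision function, and all names are assumptions for illustration.

```python
BASES = "ACGT"


def in_silico_mutagenesis(sequence, score_fn):
    """For each position, map each alternative base to (mutant score - reference score)."""
    reference = score_fn(sequence)
    effects = []
    for pos, ref_base in enumerate(sequence):
        deltas = {}
        for alt in BASES:
            if alt == ref_base:
                continue
            mutated = sequence[:pos] + alt + sequence[pos + 1:]
            deltas[alt] = score_fn(mutated) - reference
        effects.append(deltas)
    return effects


def position_importance(effects):
    """Largest score drop caused by any single substitution at each position."""
    return [max(0, -min(deltas.values())) for deltas in effects]


def toy_score(sequence):
    """Hypothetical model: counts occurrences of the 4-mer 'GATA'."""
    return sum(sequence[i:i + 4] == "GATA" for i in range(len(sequence) - 3))


effects = in_silico_mutagenesis("CCGATACC", toy_score)
importance = position_importance(effects)
print(importance)
```

Each sequence of length L requires 3L additional model evaluations, which is one reason ISM scales poorly when applied to many long sequences with an expensive model.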
- Ameni Trabelsi, Colorado State University, United States
- Mohamed Chaabane, Colorado State University, United States
- Asa Ben-Hur, Colorado State University, United States
Motivation: Deep learning architectures have recently demonstrated their power in predicting DNA- and RNA-binding specificity. Existing methods fall into three classes: some are based on convolutional neural networks (CNNs), others use recurrent neural networks (RNNs), and others rely on hybrid architectures combining CNNs and RNNs. However, based on existing studies, the relative merit of the various architectures is still unclear.
Results: In this study, we present a systematic exploration of deep learning architectures for predicting DNA- and RNA-binding specificity. For this purpose, we present deepRAM, an end-to-end deep learning tool that provides an implementation of a wide selection of architectures; its fully automatic model selection procedure allows us to perform a fair and unbiased comparison of deep learning architectures. We find that deeper, more complex architectures provide a clear advantage with sufficient training data, and that hybrid CNN/RNN architectures outperform other methods in terms of accuracy. Our work provides guidelines that can assist the practitioner in choosing an appropriate network architecture, and provides insight into the difference between the models learned by convolutional and recurrent networks. In particular, we find that although recurrent networks improve model accuracy, this comes at the expense of a loss in the interpretability of the features learned by the model.