All Highlights and Proceedings Track presentations are presented by scientific area as part of the combined Paper Presentation schedule.
KN1 - How Chromatin organization and epigenetics talk with alternative splicing
Date: Sunday, July 21, 9:00 AM - 10:00 AM
Room: Hall 1
Presenting author: Gil Ast, Tel Aviv University, Israel
Session Chair:
Keyword:
KN3 - Sequencing based functional genomics (analysis)
Date: Monday, July 22, 9:00 AM - 10:00 AM
Room: TBA
Presenting author: Lior Pachter, University of California, Berkeley, United States
Session Chair:
Keyword:
KN4 - Searching for Signals in Sequences
Date: Monday, July 22, 4:35 PM - 5:35 PM
Room: Hall 1
Presenting author: Gary Stormo, Washington University in St. Louis, United States
Session Chair:
Keyword:
KN5 - Results may vary: what is reproducible? why do open science and who gets the credit?
Date: Tuesday, July 23, 9:00 AM - 10:00 AM
Room: Hall 1
Presenting author: Carole Goble, University of Manchester, United Kingdom
Session Chair:
Keyword:
KN6 - Protein Interactions in Health and Disease
Date: Tuesday, July 23, 4:35 PM - 5:35 PM
Room: Hall 1
Presenting author: David Eisenberg, UCLA, United States
Session Chair:
Keyword:
LBR01 - Ontology-aware classification of tissue and cell-type signals in gene expression profiles across platforms and technologies
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.
Room: Hall 15.2
Presenting author: Young-suk Lee, Princeton University, United States
Additional authors: Qian Zhu, Princeton University, United States
Arjun Krishnan, Princeton University, United States
Olga Troyanskaya, Princeton University, United States
Session Chair:
Directly dealing with the multicellularity and heterogeneity of human gene expression samples is paramount for understanding human homeostasis, disease manifestation and pharmacokinetics/pharmacodynamics. However, leveraging gene expression data through large-scale integrative analyses is challenging because most samples are not fully annotated with their tissue or cell type of origin. A computational method that classifies samples using their entire gene expression profiles is needed. Such a method must be applicable across thousands of independent studies, hundreds of gene expression technologies, and hundreds of diverse human tissues and cell types. We present URSA (Unveiling RNA Sample Annotation), which leverages the complex relationships among tissues and cell types and simultaneously estimates the probabilities associated with hundreds of tissues/cell types for any given gene expression profile. URSA provides accurate and intuitive probability values for expression profiles across independent studies and outperforms other methods irrespective of data preprocessing techniques. Moreover, without retraining, URSA can be used to classify samples from diverse microarray platforms and even from next-generation sequencing technology. Finally, we provide a molecular interpretation of the tissue and cell-type models as the biological basis for URSA’s classifications.
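Ontology-aware classification of this kind commonly starts from the "true path rule": a sample annotated to a specific tissue or cell type is implicitly annotated to every ancestor term in the ontology. The sketch below illustrates only that label-propagation step on a small, hypothetical ontology (PARENTS and all term names are invented for illustration); it is not URSA's model, whose internals the abstract does not describe.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS: Dict[str, List[str]] = {
    "hepatocyte": ["liver"],
    "liver": ["digestive system"],
    "digestive system": ["anatomical entity"],
    "T cell": ["leukocyte"],
    "leukocyte": ["blood"],
    "blood": ["anatomical entity"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every ancestor of `term` by walking parent links (term excluded)."""
    found: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, []))
    return found


def propagate_labels(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """True path rule: a sample labeled with a term also carries all its ancestors."""
    expanded = set(labels)
    for term in labels:
        expanded |= ancestors(term, parents)
    return expanded


if __name__ == "__main__":
    print(sorted(propagate_labels({"hepatocyte"}, PARENTS)))
```

Expanding {"hepatocyte"} this way yields the term plus liver, digestive system and anatomical entity, so a classifier trained on expanded labels sees consistent positives at every level of the hierarchy.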
Keyword: Functional Genomics, Systems Biology and Networks
LBR02 - A Model-Based Analysis of GC-Biased Gene Conversion
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.
Room: Hall 15.2
Presenting author: John Capra, Vanderbilt University, United States
Additional authors: John Capra, Vanderbilt University, United States
Melissa Hubisz, Cornell University, United States
Dennis Kostka, University of Pittsburgh, United States
Katherine Pollard, University of California, San Francisco, United States
Adam Siepel, Cornell University, United States
Session Chair:
Interpreting patterns of DNA sequence variation between the genomes of closely related species is critically important to understanding the causes and functional effects of nucleotide substitutions. In addition to well-studied adaptive processes such as natural selection, other forces influence substitution patterns. GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that favors the fixation of strong (G/C) over weak (A/T) alleles. In mammals, gBGC is thought to promote variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations. It also has the potential to produce false positives in common tests for positive selection. However, because it is difficult to incorporate gBGC into existing statistical models of evolution, its genome-wide influence is poorly understood. In this work, we describe a new phylogenetic hidden Markov model that jointly models the effects of selection and gBGC, and apply it to the human and chimpanzee genomes. We find that gBGC has influenced a small but important fraction of these genomes. Fast-evolving regions and disease-associated polymorphisms show significant enrichment for gBGC. Overall, our analyses indicate that gBGC has been an important force in recent human evolution, and our publicly available algorithms and predictions will enable other researchers to consider gBGC in their analyses.
Keyword: Comparative Genomics, Population Genetics Variation and Evolution
LBR03 - Determination of hormone induced structural changes in genomic topological domains
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.
Room: Hall 15.2
Presenting author: Davide Bau, Centro Nacional de Analisis Genomica, Spain
Additional authors: Davide Bau, Centro Nacional de Analisis Genomica, Spain
Marc Marti-Renom, Centro Nacional de Analisis Genomica, Spain
Session Chair:
Advances in genomic technologies have provided better insight into how the genome is organized inside the cell nucleus. Recently, it has been shown that chromatin is organized in Topologically Associating Domains (TADs), large interaction domains that appear to be conserved among different cell types. To determine whether these TADs have a functional role during the dynamic changes of gene expression in terminally differentiated cells, we studied the relationship between the spatial position of progesterone (Pg)-responsive genes and the TAD structure in breast cancer cells. Using Hi-C data, we found that the genome is organized into about 2,000 TADs. TADs were similarly positioned before and after hormone treatment; nonetheless, Pg induced some changes in intra-TAD chromatin interactions. Unexpectedly, a large proportion of genes that responded similarly to Pg treatment were clustered within individual TADs, indicating a topological segregation of Pg up- and down-regulation sites. Remarkably, the hormone induced correlated epigenetic changes that spread over several hundred kilobases, revealing regional remodeling of chromatin. Although consecutive TADs can be covered by one or more similar epigenetic changes, the combination of changes differs among individual consecutive TADs, reflecting topologically restrained combinatorial chromatin signatures. Integrative 3D modeling of the intra-TAD contacts before and after Pg stimulation further supports this hypothesis, showing dynamic structural changes correlated with the transcriptional response. Given the segregation of target genes in TADs and the fine-tuning of Pg-induced chromatin changes, we propose that TADs behave as regulons, enabling spatially proximal genes to be coordinately transcribed in response to hormone.
Keyword: Genome Organization and Annotation, Functional Genomics
LBR04 - Maximum Parsimony Interpretation of Chromatin Capture Experiments
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.
Room: Hall 15.2
Presenting author: Andrzej Kudlicki, University of Texas Medical Branch, United States
Additional authors: Andrzej Kudlicki, UT Medical Branch, United States
Session Chair:
Genome-wide chromatin conformation capture experiments allow the spatial structure of the genome to be characterized; however, existing data-processing methods provide no means of assessing the variability between the cells in the sample. We present a novel algorithmic framework that addresses this problem by analyzing the geometric and topological characteristics of an experimental DNA contact network. Applied to the yeast genome interaction measurements of Duan et al. (2010), our method proves that no homogeneous conformation can agree with the observed 3C contacts, and that attempting to construct a homogeneous 3D model leads to thousands of geometrically impossible structural motifs. The topological properties of the DNA contact network, together with Occam’s razor, are used to reconstruct the chromatin conformations characteristic of the uniform subpopulations of cells that are mixed together in the experimental sample. Specifically, the individual chromatin states are inferred by analyzing and coloring a line graph representing geometrical conflicts within the DNA contact network, i.e., loci whose direct interpretation would lead to a violation of the triangle inequality. We show that hundreds of thousands of conflicting interactions can be resolved by just a handful of chromatin states, and that the properties of these states point to different transcriptional programs being executed.
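A minimal sketch of the conflict idea, under simplifying assumptions of our own: every observed contact is taken to mean the two loci lie within a fixed contact_radius, and a hypothetical table min_separation supplies known lower bounds on some inter-locus distances. Two contacts that share a locus conflict when keeping both would violate the triangle inequality. The conflict graph built here has contacts as its nodes (a subgraph of the line graph of the contact network), and a greedy coloring then gives an upper bound on the number of distinct states needed. The parameter names and the greedy heuristic are illustrative, not the authors' algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Contact = Tuple[str, str]  # a pair of loci observed in contact


def find_conflicts(
    contacts: List[Contact],
    contact_radius: float,
    min_separation: Dict[FrozenSet[str], float],
) -> List[Tuple[Contact, Contact]]:
    """Return pairs of contacts that cannot hold in the same cell.

    Two contacts sharing one locus place their outer loci within
    2 * contact_radius of each other (triangle inequality); if those loci are
    known to be farther apart than that, the two contacts conflict.
    """
    conflicts = []
    for c1, c2 in combinations(contacts, 2):
        shared = set(c1) & set(c2)
        if len(shared) != 1:
            continue  # no common locus (or duplicate contact): no triangle
        (a,) = set(c1) - shared
        (b,) = set(c2) - shared
        if min_separation.get(frozenset((a, b)), 0.0) > 2 * contact_radius:
            conflicts.append((c1, c2))
    return conflicts


def color_states(
    contacts: List[Contact], conflicts: List[Tuple[Contact, Contact]]
) -> Dict[Contact, int]:
    """Greedily color the conflict graph; each color is a candidate chromatin
    state in which no two conflicting contacts coexist."""
    neighbors: Dict[Contact, Set[Contact]] = {c: set() for c in contacts}
    for c1, c2 in conflicts:
        neighbors[c1].add(c2)
        neighbors[c2].add(c1)
    color: Dict[Contact, int] = {}
    # Color the most-conflicted contacts first (a common greedy ordering).
    for c in sorted(contacts, key=lambda x: -len(neighbors[x])):
        used = {color[n] for n in neighbors[c] if n in color}
        color[c] = next(k for k in range(len(contacts)) if k not in used)
    return color


if __name__ == "__main__":
    contacts = [("A", "B"), ("B", "C"), ("C", "D")]
    known = {frozenset(("A", "C")): 5.0}  # hypothetical lower bound on d(A, C)
    conflicts = find_conflicts(contacts, contact_radius=1.0, min_separation=known)
    print(conflicts, color_states(contacts, conflicts))
```

In the toy example, the contacts A-B and B-C would force A and C within 2 units of each other, contradicting the assumed 5-unit lower bound, so they land in different states, and two states suffice for all three contacts.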
Keyword: Genome Organization and Annotation, Computational aspects
LBR05 - A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.
Room: Hall 15.2
Presenting author: Maricel Kann, UMBC, United States
Session Chair:
The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and by the population size needed to achieve adequate statistical power. One way around this problem, which is critical for the study of rare diseases, is to study the functional patterns of known disease mutations. We have previously shown that known human disease mutations have a significant tendency to cluster at specific protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to disrupt phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. Our results show that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously shown to be displayed by human disease mutations. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.
Keyword: Bioinformatics of Disease and Treatment, Genetic Variation Analysis
LBR06 - Predicting the biochemical consequences of missense mutations using genome-wide homology modeling
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.
Room: Hall 15.2
Presenting author: Andrew Bordner, Mayo Clinic, United States
Additional authors: Andrew Bordner, Mayo Clinic, United States
Barry Zorman, Mayo Clinic, United States
Session Chair:
The discovery of which mutations contribute to a particular disease is an important biomedical problem with potential applications in drug discovery, disease diagnosis and prognosis, and the development of improved personalized therapies. To this end, we have developed a computational method that integrates complementary approaches for predicting the biochemical effects of missense mutations using genome-wide generation of homology models for human protein complexes. Mutations affecting diverse types of binding sites are identified by homology to available X-ray structures of complexes and by machine learning classifiers, while spatial clustering of mutations is used to detect other compact regions of the protein structure that are important for its function. A Random Forest classifier trained on the results of these structure-based methods, as well as on annotations from online databases, evolutionary conservation, and predicted stability changes, was found to outperform current popular prediction methods. Finally, the predicted biochemical effects of mutations showed good agreement with experimental assays.
Keyword: Protein Structure and Function Prediction and Anal, Genetic Variation Analysis
LBR07 - Integrative modelling coupled with mass spectrometry (MS)-based approaches reveals the structure and dynamics of protein assemblies
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.
Room: Hall 15.2
Presenting author: Argyris Politis, University of Oxford, United Kingdom
Additional authors: Argyris Politis, University of Oxford, United Kingdom
Session Chair:
In recent years, integrative structure determination of protein complexes has garnered great interest as a result of the vast amount of data obtained from different experiments. In particular, integrative approaches have gained attention for studying highly heterogeneous and dynamic systems that remain refractory to structure determination by conventional methods. Key developments in emerging mass spectrometry (MS)-based techniques, such as native MS and ion mobility (IM)-MS, have led to their integration into the structural biologist’s pipeline. Here we present an integrative approach for structure determination of protein assemblies that combines native MS, IM-MS and chemical cross-linking MS. The accuracy and confidence levels of this approach are demonstrated by encoding data from MS techniques into restraints for assembling a set of known hetero-complexes from their building blocks. This method enabled us to characterize the structures of two unknown precursors acting en route to the assembly of the AAA-ATPase base subcomplex of the proteasome, a macromolecule responsible for the controlled degradation of intracellular proteins.
Keyword: Protein Structure and Function Prediction and Anal, Computational aspects
LBR08 - The next generation of SCOP and ASTRAL
Cancelled
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.
Room: Hall 15.2
Presenting author: John-Marc Chandonia, Berkeley National Lab, United States
Session Chair:
The Structural Classification of Proteins (SCOP) database is a manually curated, near-comprehensive ordering of domains from proteins of known structure in a hierarchy according to their structural and evolutionary relationships. The ASTRAL compendium is a collection of software and databases, closely related to SCOP, that is used to aid research into protein structure and evolution. We released new versions of both SCOP and ASTRAL (1.75B) in January 2013. The new releases are the second in a series of stable SCOP and ASTRAL releases based on SCOP 1.75. New versions of both databases are presented to the public through a single, unified interface (http://scop.berkeley.edu/). New features include a SQL-based infrastructure and build procedure, a fully automated classification scheme for new PDB entries that are similar to previously classified entries, and periodic incremental releases to supplement the stable releases. More than 11,300 new PDB entries have been added since SCOP 1.75, without sacrificing the reliability that SCOP has accumulated through years of careful manual curation. We plan to introduce additional features in a series of stable releases, while a major reclassification (SCOP 2.0) is in progress.
Keyword: Protein Structure and Function Prediction and Anal, Computational aspects
LBR09 - Computational methods to preclude switch-like behavior: analysis of the Biomodels database
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: Hall 15.2
Presenting author: Elisenda Feliu, University of Copenhagen, Denmark
Additional authors: Miguel A Alejo, University of Copenhagen, Denmark
Carsten Wiuf, University of Copenhagen, Denmark
Session Chair:
The number of states a cell can be in at any given time is linked to the flexibility of its decision making and to cell-to-cell variability. In particular, bistable and multistable cellular systems provide mechanisms for rapidly switching between different responses. Identifying whether a system exhibits multistable behavior is, however, challenging. The theoretical determination of small motifs in gene regulatory networks and signaling pathways that can exhibit multistationarity has been the focus of several previous studies. However, it remains unclear to what extent these motifs are actually represented in living cells.
We have developed a computational method that gives a necessary condition for a system to exhibit multistationarity. If a system is multistationary, we can screen all small subnetworks and determine the key components in multistationarity. We have applied the method to 365 models extracted from the publicly available database Biomodels with data precomputed in PoCab. In this way, we have obtained a catalog of small motifs responsible for multistationarity in real systems.
At the conference, the method will be briefly described and the exhaustive analysis of the Biomodels database, including the small structures causing multistationarity, will be presented.
Keyword: Systems Biology and Networks, Computational aspects
LBR10 - Efficient Modeling and Active Learning of Biological Responses: Learning without Prior Knowledge
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 15.2
Presenting author: Armaghan Naik, Carnegie Mellon University, United States
Additional authors: Joshua Kangas, Carnegie Mellon University, United States
Devin Sullivan, Carnegie Mellon University, United States
Christopher Langmead, Carnegie Mellon University, United States
Robert Murphy, Carnegie Mellon University, United States
Session Chair:
High-throughput screening involves determining the effect of many chemical compounds on a given cellular target. As currently practiced, a full set of measurements of all compounds is typically made for each new target, with little use of information from previous screens. To study compound effects on many targets efficiently, a means is needed for determining and exploiting similarities in the effects of compounds and/or the behavior of targets, so that measuring all combinations of compounds and targets is not needed to achieve high accuracy. Here, we describe probabilistic models that can be used to predict results for unmeasured combinations, and active learning algorithms for selecting informative batches of future experiments. Through extensive simulated experiments, we showed that our approaches can produce powerful predictive models and learn them significantly faster than random choice. We further characterized our method’s performance experimentally using a collection of 48 compounds and 48 NIH 3T3 cell clones expressing different GFP-tagged proteins; the learner’s task was to efficiently build a model of the effects of each compound on each clone. Since none of the effects were known before the experiments began, each clone and compound was silently duplicated to make it possible to check how well duplicates were recognized. The learner could request acquisition of batches of image data for specific combinations of drugs and clones using liquid-handling robotics and an automated microscope. Our method achieved 92% accuracy after sampling only 28% of the experiment space.
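The abstract does not state how informative batches are chosen. One common active-learning heuristic, uncertainty sampling, is sketched below under that assumption: given a model's predicted probability that each unmeasured (compound, clone) pair shows an effect, request the pairs whose prediction is most uncertain (highest binary entropy). The names pred, measured and batch_size are hypothetical, and the predictive model that would supply pred is not shown.

```python
import math
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (compound, clone)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_batch(pred: Dict[Pair, float], measured: Set[Pair], batch_size: int) -> List[Pair]:
    """Uncertainty sampling: pick the unmeasured pairs with the most uncertain
    predicted probability of an effect (ties broken by name for determinism)."""
    candidates = [pair for pair in pred if pair not in measured]
    candidates.sort(key=lambda pair: (-binary_entropy(pred[pair]), pair))
    return candidates[:batch_size]


if __name__ == "__main__":
    pred = {("c1", "k1"): 0.5, ("c1", "k2"): 0.9, ("c2", "k1"): 0.45, ("c2", "k2"): 0.05}
    print(select_batch(pred, measured={("c1", "k1")}, batch_size=2))
```

With ("c1", "k1") already measured, a batch of two returns ("c2", "k1") and then ("c1", "k2"), the two remaining predictions closest to 0.5. In a real loop the model would be refit after each batch and pred recomputed.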
Keyword: Systems Biology and Networks, Proteomics
LBR11 - De novo reconstruction of cell cycle progression using Tour-Recovered Automatic models for Cellular Continuums (TRACC) on multiparameter flow cytometry data
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 15.2
Presenting author: Tiffany Chen, Stanford University, United States
Additional authors: Tiffany Chen, Stanford University, United States
Matthew Clutter, Stanford University, United States
Nikesh Kotecha, Stanford University, United States
Karen Sachs, Stanford University, United States
Wendy Fantl, Stanford University, United States
Garry Nolan, Stanford University, United States
Serafim Batzoglou, Stanford University, United States
Session Chair:
Most cell-based drug screening methods identify and evaluate potential drug candidates based on measurements of cell death or target inhibition. With these approaches, the global impact of drug candidates on the cell cycle and on signaling networks is greatly deemphasized, even though quantitative analysis of the cell cycle is fundamental to most anti-cancer drug development. Single-cell multiparameter flow cytometry can simultaneously measure intracellular proteins, including those participating in the cell cycle and signaling pathways. To date, however, no automated, data-driven method exists for processing such biologically complex measurements. To address this need, we developed Tour-Recovered Automatic models for Cellular Continuums (TRACC), a computational methodology for automatically reconstructing the cell cycle de novo from flow cytometry data. TRACC reconstructs cell cycle progression without prior expert knowledge, thus laying a foundation for automated cell cycle analysis.
Keyword: Systems Biology and Networks, Computational aspects
LBR12 - Network-based stratification of tumor mutations
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 15.2
Presenting author: Matan Hofree, UCSD, United States
Additional authors: John Shen, University of California, San Diego, United States
Hannah Carter, University of California, San Diego, United States
Andy Gross, University of California, San Diego, United States
Trey Ideker, University of California, San Diego, United States
Session Chair:
Many forms of cancer consist of multiple subtypes with different molecular causes and clinical outcomes. Somatic tumor genomes provide a rich new source of data for uncovering these subtypes, but have proven difficult to compare, as two tumors rarely share the same mutations. Here, we introduce ‘network-based stratification’ (NBS), which integrates somatic tumor genomes with gene networks. This approach allows cancer to be stratified into informative subtypes by clustering together patients who have mutations within similar network regions. We demonstrate the validity of this approach in simulation. Next, we apply the method to somatic mutation data from three cancer patient cohorts collected as part of The Cancer Genome Atlas (ovarian cancer, OV; breast cancer, BRCA; and uterine cancer, UCEC) and discover robust cluster assignments significantly associated with important clinical phenotypes. In BRCA, we recover subtypes significantly correlated with known subtypes and other clinical markers. In UCEC, subtypes segregate patients into distinct sets enriched for tumor grade and histology. In OV, subtypes are associated with patient survival and acquired resistance to platinum chemotherapy. We use the OV subtypes to define a predictive signature based on gene expression, which successfully recovers the somatic-mutation-derived subtypes in an independent expression cohort. Finally, we use the subtypes derived in each cohort to highlight potentially dysregulated subnetworks characteristic of each mutation-derived subtype. This study provides a proof of principle for the utility of combining somatic mutation genotypes with interaction networks, enabling the discovery of clinically meaningful mutation-based subtypes.
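One way to place patients with mutations "within similar network regions" close together, and the one assumed here, is to smooth each patient's sparse binary mutation profile over the gene network before clustering. A minimal pure-Python sketch, assuming a random walk with restart, an undirected network given as an adjacency list, and a restart parameter alpha = 0.7 (a hypothetical default chosen for illustration); the network TOY_NETWORK is invented, and the subsequent clustering of smoothed profiles is omitted.

```python
from typing import Dict, List, Set

# Hypothetical undirected toy network A-B-C-D as an adjacency list.
TOY_NETWORK: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}


def propagate(
    mutated: Set[str],
    network: Dict[str, List[str]],
    alpha: float = 0.7,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart on a binary mutation profile.

    Iterates F <- alpha * (F spread to neighbors) + (1 - alpha) * F0 until the
    largest change falls below `tol`. Each gene splits its score evenly among
    its neighbors; genes with no neighbors simply lose the spread share.
    """
    f0 = {g: (1.0 if g in mutated else 0.0) for g in network}
    f = dict(f0)
    for _ in range(max_iter):
        spread = {g: 0.0 for g in network}
        for g, nbrs in network.items():
            for n in nbrs:
                spread[n] += f[g] / len(nbrs)
        new_f = {g: alpha * spread[g] + (1 - alpha) * f0[g] for g in network}
        delta = max(abs(new_f[g] - f[g]) for g in network)
        f = new_f
        if delta < tol:
            break
    return f


if __name__ == "__main__":
    scores = propagate({"A"}, TOY_NETWORK)
    print({g: round(s, 3) for g, s in scores.items()})
```

For the four-gene path with only A mutated, the smoothed scores decrease from A to D and still sum to 1, so mutation signal leaks into network neighbors instead of staying on a single gene.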
Keyword: Bioinformatics of Disease and Treatment, Systems Biology and Networks
LBR13 - Comparison of D. melanogaster and C. elegans Developmental Stages by modENCODE RNA-Seq data
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 15.2
Presenting author: Steven Brenner, University of California, Berkeley, United States
Additional authors: Jingyi Jessica Li, University of California, Berkeley, United States
Haiyun Huang, University of California, Berkeley, United States
Peter Bickel, University of California, Berkeley, United States
Steven Brenner, University of California, Berkeley, United States
Session Chair:
Drosophila melanogaster and Caenorhabditis elegans are two well-studied model organisms in developmental biology. Their morphological development differs greatly, yet we postulated that they may nonetheless share underlying developmental programs employing orthologous genes. To address this question, we used modENCODE RNA-Seq data to perform a transcriptome-wide comparison of their developmental time courses. Our approach centers on using stage-associated orthologous genes to link the two organisms. For every stage in each organism, we select stage-associated genes, defined as genes that are relatively highly expressed at that stage compared with other stages. We tested the dependence of a pair of D. melanogaster and C. elegans stages in terms of orthologous gene expression, i.e., the number of orthologous gene pairs associated with both stages.
We first carried out the test on pairs of stages within D. melanogaster and within C. elegans, and found that temporally adjacent stages in both species exhibit high dependence in gene expression, supporting the validity of this approach. When comparing fly with worm, we observed strong collinearity of their developmental time courses from early embryos to late larvae. Another, parallel collinear pattern was found between fly white prepupae through adults and worm late embryos through adults. Investigating the stage-associated genes shared between stages shows that many-to-one fly-worm orthologs are key factors leading to the two collinear patterns. Some orthologs are known to play similar roles in both organisms, and their mapping in this study may help inform their functions in the development of D. melanogaster and C. elegans.
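The dependence test counts ortholog pairs associated with both stages. The abstract does not name the null model; a one-sided hypergeometric test, assumed here, is one natural choice: among all ortholog pairs, it asks how surprising the observed overlap is given how many pairs are associated with each stage separately. The sketch treats ortholog pairs as independent units, which glosses over the many-to-one orthologs discussed above, and the function and gene names are hypothetical.

```python
from math import comb
from typing import Set, Tuple

OrthologPair = Tuple[str, str]  # (fly gene, worm gene)


def hypergeom_sf(x: int, total: int, k: int, n: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(population=total, successes=k, draws=n)."""
    upper = min(k, n)
    hits = sum(comb(k, i) * comb(total - k, n - i) for i in range(x, upper + 1))
    return hits / comb(total, n)


def stage_dependence(
    pairs: Set[OrthologPair], fly_stage_genes: Set[str], worm_stage_genes: Set[str]
) -> Tuple[int, float]:
    """Overlap of ortholog pairs linked to both stages, and its one-sided p-value."""
    fly_hits = {p for p in pairs if p[0] in fly_stage_genes}
    worm_hits = {p for p in pairs if p[1] in worm_stage_genes}
    overlap = len(fly_hits & worm_hits)
    return overlap, hypergeom_sf(overlap, len(pairs), len(fly_hits), len(worm_hits))


if __name__ == "__main__":
    pairs = {("f1", "w1"), ("f2", "w2"), ("f3", "w3"), ("f4", "w4")}
    print(stage_dependence(pairs, {"f1", "f2"}, {"w1", "w2"}))
```

In the toy call, both fly-stage pairs are also worm-stage pairs, giving an overlap of 2 with p = 1/6; with realistic numbers of ortholog pairs, the same computation would be repeated for every fly-worm stage pair and corrected for multiple testing.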
Keyword: Comparative Genomics, Functional Genomics
LBR14 - Phylogenetic quantification of intra-tumour heterogeneity
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.
Room: Hall 15.2
Presenting author: Roland Schwarz, European Molecular Biology Laboratory, United Kingdom
Session Chair:
Intra-tumour heterogeneity (ITH) is currently a focus of cancer research due to its implications for disease progression and resistance development, and its impact on personalised medicine approaches. Understanding the aetiology of ITH involves reconstructing the evolutionary history of the cancer within the patient. Especially with respect to genomic rearrangements, this is impeded by changing cellularity, unknown phasing of genomic variants and the fact that genomic rearrangement events cover large, often overlapping segments of the genome.
In this study we have assembled a novel clinical dataset of 170 copy number (CN) profiles from 20 patients undergoing neoadjuvant chemotherapy for high-grade serous ovarian cancer. Patients were sampled at multiple distinct sites at biopsy, interval debulking surgery and relapse. We have developed MEDICC, a novel phylogenetic method for the reconstruction of evolutionary trees based on genomic rearrangements. Employing state-of-the-art machine learning techniques, we phase parental alleles, reconstruct trees and ancestral genomes, and at the same time numerically quantify the degree of ITH and clonal expansion in each patient. Correlation of these indices with clinical endpoints such as progression-free survival shows how the amount of genomic change in the course of chemotherapy and the degree of clonal expansion determine patient survival times.
Our study is the first to combine rigorous evolutionary methodology with a novel clinical dataset from a large patient cohort to quantify ITH in an unbiased manner. We combine insights from natural language processing with spatial statistics to quantify biologically meaningful indices of cancer progression in a coherent translational setting.
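MEDICC itself works on allele-specific profiles with a transducer-based event model; the sketch below is a deliberately simplified stand-in that only shows what an event-based distance between copy-number profiles measures. It assumes a single integer copy-number profile over ordered segments and events that each add or remove one copy on a contiguous run of segments, with no restriction on losing and later regaining copies (both simplifications are ours). Under those assumptions, each event changes the jump sequence of the zero-padded profile difference by +1 at one position and -1 at another, so the minimum number of events is half its total variation. The function name is hypothetical.

```python
from typing import Sequence


def min_segmental_events(a: Sequence[int], b: Sequence[int]) -> int:
    """Minimum number of single-copy gains/losses on contiguous segment runs
    that turn copy-number profile `a` into `b` (simplified, unconstrained model)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same ordered segments")
    # Pad the per-segment difference with zeros at both ends.
    diff = [0] + [y - x for x, y in zip(a, b)] + [0]
    # Each event adds +1 to one jump and -1 to another, so it removes at most
    # two units of total variation; the bound is always achievable.
    total_variation = sum(abs(diff[i] - diff[i - 1]) for i in range(1, len(diff)))
    return total_variation // 2


if __name__ == "__main__":
    print(min_segmental_events([2, 2, 2, 2], [3, 3, 2, 2]))
    print(min_segmental_events([2, 2, 2], [3, 1, 3]))
```

Going from [2, 2, 2, 2] to [3, 3, 2, 2] takes one gain, while [2, 2, 2] to [3, 1, 3] takes three events. A matrix of such pairwise distances could then feed a distance-based tree-building method.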
Keyword: Bioinformatics of Disease and Treatment, Comparative Genomics
LBR15 - The Yule-Simpson effect casts doubt on DNA methylation differences at functional boundaries
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.
Room: Hall 15.2
Presenting author: Meromit Singer, UC Berkeley, United States
Additional authors: Lior Pachter, University of California, Berkeley, United States
Session Chair:
Genome-wide functional assays based on high-throughput sequencing now allow experimental probing of a wide variety of molecular phenotypes. Among these is DNA methylation, which can be probed at all CpG sites in the genome using bisulfite sequencing. This has allowed methylation extent to be compared across functional regions by first averaging methylation states within region types and then comparing the averages between regions. Such comparisons have become commonplace in genome-wide DNA methylation studies. For example, it has been repeatedly reported that methylation extent is significantly higher in coding regions than in introns or UTRs. We report and characterize a bias present in these seemingly straightforward comparisons that is a special case of the Yule-Simpson effect, and show that it has extensively altered the magnitude and significance of the DNA methylation differences observed and reported in such comparative studies. The bias arises from the dependence of the sparsity of CpG sites on the extent of evolutionary pressure at a region, together with its overall methylation state. We present a correction that uses a matrix completion algorithm based on a methylation model, and show how it affects reported results regarding differences in DNA methylation across functional regions.
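A toy illustration of the Yule-Simpson effect in this setting, with invented numbers that are not data from the study: CpG sites are split into two hypothetical density strata, and intronic sites are more methylated than coding sites within each stratum, yet the pooled average favors coding regions because most coding sites fall in the highly methylated stratum. The sketch only shows the mechanism; the authors' correction uses a model-based matrix completion algorithm, which is not reproduced here.

```python
from typing import Dict, Tuple

# Invented counts for illustration only: (number of CpG sites, mean methylation)
# per region type and per hypothetical CpG-density stratum.
DATA: Dict[str, Dict[str, Tuple[int, float]]] = {
    "coding": {"CpG-sparse": (90, 0.80), "CpG-dense": (10, 0.10)},
    "intron": {"CpG-sparse": (10, 0.85), "CpG-dense": (90, 0.15)},
}


def pooled_mean(strata: Dict[str, Tuple[int, float]]) -> float:
    """Average methylation over all sites of a region type, ignoring strata."""
    n_sites = sum(n for n, _ in strata.values())
    return sum(n * mean for n, mean in strata.values()) / n_sites


if __name__ == "__main__":
    for region, strata in DATA.items():
        within = {s: mean for s, (_, mean) in strata.items()}
        print(region, "pooled:", round(pooled_mean(strata), 2), "within strata:", within)
```

Running it gives pooled means of about 0.73 for coding and 0.22 for intronic sites, the reverse of the within-stratum ordering, which is exactly the kind of reversal a naive region-average comparison cannot detect.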
Keyword: Epigenetics, Computational aspects
LBR16 - Epigenetic mechanisms underlying human T helper cell differentiation
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.
Room: Hall 15.2
Presenting author: Harri Lähdesmäki, Aalto University, Finland
Additional authors: David Hawkins, University of Washington School of Medicine, United States
Antti Larjo, Aalto University, Finland
Subhash Tripathi, University of Turku and Åbo Akademi University, Finland
Ulrich Wagner, Ludwig Institute for Cancer Research, University of California San Diego, United States
Ying Luu, Ludwig Institute for Cancer Research, University of California San Diego, United States
Tapio Lönnberg, University of Turku and Åbo Akademi University, Finland
Sunil Raghav, University of Turku and Åbo Akademi University, Finland
Leonard Lee, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riikka Lund, University of Turku and Åbo Akademi University, Finland
Harri Lähdesmäki, Aalto University, Finland
Bing Ren, Ludwig Institute for Cancer Research, University of California San Diego, United States
Riitta Lahesmaa, University of Turku and Åbo Akademi University, Finland
Session Chair:
Multipotent CD4+ T cells are central to the adaptive immune system. CD4+ T cells can differentiate into functionally distinct effector subtypes such as T helper 1 (Th1), Th2, Th17, and iTreg. In this study, we focused on identifying histone modifications (H3K4me1, H3K27ac, H3K4me3) that define the cell-type-specific functional cis-regulatory repertoire of early differentiating human Th1 and Th2 cells. Additionally, we integrated genome-wide digital gene expression analysis from the Helicos platform to correlate epigenetic information with gene expression. We also overlaid the identified enhancer regions with open chromatin sites (DNase-seq) from fully differentiated T cells to determine whether early enhancers are active only during early lineage specification or remain active in committed Th cells. By analyzing transcription factor binding sites at enhancers, we were able to identify known and novel transcriptional regulators that drive lineage determination. Lastly, following the principle that improper cell fate specification can lead to immunopathogenesis, we found within these lineage-specific enhancers a large number of SNPs from genome-wide association studies (GWAS) that were associated with various autoimmune disorders, including T1D, rheumatoid arthritis, Crohn’s disease, and asthma. Several of these SNPs alter transcription factor binding site motifs, and using DAPA experiments we show, for a subset of such SNPs within these predicted sites, that they influence transcription factor binding. This study provides the first look at how enhancers can contribute to early human T cell lineage specification. Our results also provide insight into how regulatory SNPs may contribute to disease pathogenesis.
Keyword: Epigenetics, Functional Genomics
LBR17 - An assessment of the recovery of curated genetic variants through text mining
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: Hall 10
Presenting author: Karin Verspoor, NICTA, Australia
Additional authors:Antonio Jimeno Yepes, National Information Communications Technology Australia, Australia
Session Chair:
We assess a mutation extraction tool with respect to the task of curating the literature to populate a database of genetic variation information. Our analysis shows that the ability of text mining tools to recover the mutations catalogued in such databases is far lower than would be expected from the typically excellent performance of these tools in intrinsic evaluation. While lack of access to the full text of publications has been argued to explain this phenomenon, we show that the effect persists even when the full-text article indicated as the direct source of a mutation in a curated resource is available for processing. We explore several possible explanations for these results, including difficulties in linking genetic variants to specific genes and the inclusion of data from high-throughput experiments. Our results have implications for the future development of text mining systems for genetic variation.
Keyword: Genetic Variation Analysis, other
LBR18 - Quantification of Cell-to-cell Variability in Protein Spatial Spread from Fluorescence Microscopy of Unsynchronized Budding Yeast
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.Room: Hall 10
Presenting author: Louis-Francois Handfield , University of Toronto, Canada
Additional authors:Alan Moses, University of Toronto, Canada
Session Chair:
Protein abundance and its stochastic (cell-to-cell) variation have been systematically characterized in budding yeast using fluorescently tagged proteins. Subcellular localization can also be systematically determined using supervised machine learning approaches trained to recognize predefined image classes from statistical features. As an alternative, we capture the cell-stage dependence of protein spatial expression within automatically identified cells, using the area of the identified bud as a cell-stage indicator. We show that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Further analysis reveals that this characterization allows 12% of the 4004 proteins to be identified by finding the protein with the closest expression pattern in a replicate experiment. The characterization includes measured stochasticity levels, which correlate with previous reports in the case of stochasticity in protein abundance. Other stochasticity levels, such as that of the compactness of protein expression, are shown to be reproducible. Changes in cell morphology due to the alpha-factor mating pheromone, or changes in the fluorescent markers required for segmentation, have only a limited impact on the measured variability levels. Our results suggest that quantitative, cell-stage-dependent representations of protein spread discriminate protein spatial expression patterns without requiring predefined subcellular location classes. We show that some major quantified deviations, such as high spatial variability, are systematically detected under a range of experimental conditions.
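The replicate-based identification described above can be illustrated with a minimal nearest-neighbour sketch. It assumes that each protein has already been summarized as a numeric feature vector in each replicate and uses Euclidean distance; the representation, the distance measure, and the function name fraction_self_identified are hypothetical choices for illustration, not the authors' pipeline.

```python
import math

def fraction_self_identified(rep1, rep2):
    """Fraction of proteins whose closest profile in rep2 is their own.

    rep1, rep2: dicts mapping protein name -> tuple of numeric features
    (a hypothetical representation); both must contain the same proteins.
    """
    if set(rep1) != set(rep2) or not rep1:
        raise ValueError("replicates must cover the same non-empty protein set")
    names = sorted(rep1)
    hits = 0
    for p in names:
        # Nearest replicate-2 profile to protein p's replicate-1 profile
        # (Euclidean distance; ties resolved by alphabetical order).
        nearest = min(names, key=lambda q: math.dist(rep1[p], rep2[q]))
        hits += nearest == p
    return hits / len(names)


rep1 = {"A": (0.0, 0.0), "B": (5.0, 5.0), "C": (10.0, 0.0)}
rep2 = {"A": (0.1, 0.0), "B": (9.0, 1.0), "C": (5.0, 4.0)}
print(fraction_self_identified(rep1, rep2))  # only A is recovered: 0.333...
```

In the toy data, B and C swap places between replicates, so only A is matched to itself. Any distance or feature set could be substituted; Euclidean distance is simply the most familiar default.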
Keyword: Proteomics
LBR19 - ARepA: automated repository acquisition for standardized high-throughput data retrieval, normalization, and analysis
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.Room: Hall 10
Presenting author: Daniela Boernigen , Harvard School of Public Health, Harvard University, United States
Additional authors:Yo Sup Moon, Harvard School of Public Health, United States
Levi Waldron, Harvard School of Public Health, United States
Eric Franzosa, Harvard School of Public Health, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Session Chair:
Biological databases of high-throughput experimental results provide vast and growing resources for medical and bioinformatic research. Open questions remain about how best to maintain such resources, access them computationally, meta-analyze their contents across hundreds of experiments, and do so reproducibly while following computational best practices.
We present ARepA, an extensible, modular Automated Repository Acquisition system for reproducible biological data acquisition and processing. ARepA allows configurable data access for any organism(s) from the GEO, IntAct, BioGRID, RegulonDB, STRING, Bacteriome, and MPIDB databases. A user can retrieve raw data and metadata from these repositories, normalize data files, and automatically process them in standardized ways (e.g. for network analysis). When retrieving data from six model organisms, ARepA currently produces more than 2M interactions (600K physical interactions, 4K regulatory interactions, 1.5M functional associations) and 2.7K gene expression data sets covering approx. 800K samples, accompanied by corresponding metadata and derived network data.
We include biological examples demonstrating the utility of ARepA for integrative analyses. For human data, ARepA's metadata database allowed us to identify and standardize 12 human prostate cancer gene expression datasets from GEO, which were subsequently meta-analyzed across six different platforms. A subsequent co-expression network analysis correctly recovered the NF-κB signaling pathway along with new candidate genes with roles in prostate cancer. A similar example in mouse integrates 11 gene expression datasets selected by querying ARepA for metadata indicating germ-free and intestinal tissue conditions. Finally, multiple data types from three model microbes were integrated to assess differences in peptide secretion systems.
Keyword: Functional Genomics, Computational aspects
LBR20 - Deciphering the Gene Expression Code via a Combined Synthetic-Computational Biology Approach
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.Room: Hall 10
Presenting author: Tamir Tuller , Tel Aviv University, Israel
Session Chair:
One of the greatest challenges of functional genomics is to decipher how the information encoded in a transcript affects the various aspects of its expression regulation. Since causality cannot be determined from the analysis of endogenous sequence features and expression levels alone, we suggest a novel combined computational-synthetic biology approach. The talk will survey large-scale synthetic biology experiments for understanding three aspects of gene expression: 1) splicing, 2) translation elongation, and 3) translation initiation from out-of-frame codons. In each experiment, a specific library including hundreds of heterologous genes was tailored to tackle the corresponding question, all the library genes were expressed in S. cerevisiae, their expression levels were measured, and the results were computationally analyzed.
Among others, our analyses emphasize the contribution of local folding strength in different parts of the transcript, and the position and distribution of codons to splicing and translation efficiency and fidelity. In addition, we report novel sets of enhancer and silencer sequence motifs that contribute to various aspects of translation and splicing regulation.
I will also explain how the results inferred in the three studies are integrated and compared with existing computational biophysical models of gene expression, and will compare them with the results recently reported by an evolutionary systems biology analysis of endogenous genes.
Keyword: Functional Genomics, Systems Biology and Networks
LBR21 - Experimental characterization of the human non sequence-specific nucleic acid interactome
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.Room: Hall 10
Presenting author: Jacques Colinge , CeMM, Austria
Additional authors:Gerhard Dürnberger, CeMM, Austria
Tilmann Bürckstümmer, CeMM, Austria
Kilian Huber, CeMM, Austria
Roberto Giambruno, CeMM, Austria
Evren Karayel, CeMM, Austria
Thomas Burkard, CeMM, Austria
Ines Kaupe, CeMM, Austria
Andre Müller, CeMM, Austria
Keiryn Bennett, CeMM, Austria
Tobias Doerks, EMBL, Germany
Peer Bork, EMBL, Germany
Andreas Schönegger, CeMM, Austria
Gehard Ecker, Uni Wien, Austria
Hans Lohninger, TU Wien, Austria
Giulio Superti-Furga, CeMM, Austria
Session Chair:
Interactions between proteins and nucleic acids (NAs) play a pivotal role in a wide variety of essential biological processes. Transcription factors that recognize specific DNA motifs constitute only part of the NA-binding proteins (NABPs). In this study, we present the first large-scale effort to systematically map the interactions of human NABPs with generic classes of nucleic acids. Using 25 carefully designed synthetic DNA and RNA oligonucleotides as baits and affinity purification mass spectrometry (AP-MS), we performed pulldowns in three cell lines that yielded more than 10,000 protein-NA interactions involving more than 900 proteins. Bioinformatic analysis allowed us to identify 139 new NABPs, to provide the first experimental evidence for another 98, and to determine 513 specificities of 219 distinct NABPs for different subtypes of NAs.
We successfully validated 7 of 8 selected new specificities, confirming, among others, the affinity of YB-1 for methylated cytosine. YB-1 is overexpressed in tumors and is associated with multidrug resistance. Network analysis of the genes nearest to YB-1 ChIP-seq peaks identified a subnetwork of 73 genes strongly associated with cancer pathways, suggesting a potential epigenetic role for YB-1 in resistant tumors.
We also show that non-sequence-specific DNA-binding proteins interact with nucleic acid chains through an interface that is more geometrically constrained than that of mRNA-binding proteins, which are known to contain more disordered regions.
To extend the experimental data, we developed a machine learning method for automatically inferring nucleic acid binding, using a family of support vector machines (SVMs) to predict NA binding de novo.
Keyword: Systems Biology and Networks, Protein Structure and Function Prediction and Analysis
LBR22 - Sequence Determinants Govern the Translation Efficiency of the Secretory Proteome
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.Room: Hall 10
Presenting author: Michal Linial , The Hebrew University of Jerusalem, Israel
Additional authors:Shelly Mahlab, The Hebrew University, Israel
Session Chair:
Translation must be tightly controlled to cope with the cell's demands and its limited resources. Energetically, translation is the most expensive process in dividing cells. We used the tRNA adaptation index (tAI) as an indirect proxy for translation rate and tested whether sequence determinants encoded along transcripts govern translational efficiency. The secretory proteome comprises about 30% of the proteins in human and other multicellular model systems. Many of these proteins carry at their N-terminus a segment called the signal peptide (SP), which directs translocation to the ER. Indeed, all SP proteins are translated by ER membrane-bound ribosomes. We anticipated that proteins translated by free versus bound ribosomes differ in their overall translation speed. We demonstrate that clusters of poorly adapted codons followed by abundant codons specify the N-terminus of secreted and SP-containing membrane proteins. The phenomenon generalizes to the proteomes of yeast, fly, and worm, despite poor correlation among their codon tAI values. We propose that translation determinants have evolved to match the cellular needs for translational rate. The arrangement of codons along transcripts is crucial for the management of synaptic sites and for the translation of poorly folded proteins. The appearance of low-tAI codons at the N-terminus of SP proteins attenuates the elongation rate. We conclude that processes such as translocation through the ER membrane, processing, maturation, and folding depend on a specific codon arrangement that dictates a delay in translational elongation.
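The tAI comparison above can be sketched as follows. In its common formulation, the tAI of a sequence is the geometric mean of the relative adaptiveness w of its codons, where w is derived from tRNA gene copy numbers and wobble-pairing weights. The sketch assumes a w table is already available (the values below are made up) and compares an N-terminal window with the rest of the sequence; the window length and helper names are hypothetical, since the abstract does not state how the N-terminal region was delimited.

```python
import math

def split_codons(seq):
    """Split a coding sequence into consecutive codons, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def tai(codons, w):
    """Geometric mean of per-codon adaptiveness values w (each 0 < w <= 1).

    Codons absent from w (e.g. stop codons) are skipped.
    """
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))

def n_terminal_vs_rest(seq, w, n_codons=10):
    """tAI of the first n_codons codons versus the remainder of the sequence."""
    codons = split_codons(seq)
    return tai(codons[:n_codons], w), tai(codons[n_codons:], w)


# Hypothetical w values: CTA poorly adapted, CTG well adapted.
w = {"CTA": 0.1, "CTG": 1.0}
head, rest = n_terminal_vs_rest("CTA" * 3 + "CTG" * 3, w, n_codons=3)
print(round(head, 3), round(rest, 3))  # 0.1 1.0
```

A low head value next to a high rest value is the kind of "slow start" pattern the abstract describes for SP proteins; the geometric mean is used because a single very poorly adapted codon should pull the score down strongly.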
Keyword: Proteomics, Sequence Analysis
LBR23 - SH3 Interactome Conserves General Function Over Specific Form
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: Hall 10
Presenting author: David Gfeller , Swiss Institute of Bioinformatics, Switzerland
Additional authors:Xiaofeng Xin, University of Toronto, Canada
Jackie Cheng, University of California Berkeley, United States
Raffi Tonikian, University of Toronto, Canada
Charles Boone, University of Toronto, Canada
Sachdev Sidhu, University of Toronto, Canada
Gary Bader, University of Toronto, Canada
Session Chair:
SH3 domains bind peptides to mediate protein-protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity in a metazoan, the worm Caenorhabditis elegans, using peptide phage display, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid screening and compared it to the equivalent map for yeast. The worm SH3 interactome resembles the analogous yeast network in that it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain-mediated interactions are highly rewired. Our results suggest a model of network evolution in which the general function of the SH3 domain network is conserved over its specific form.
Keyword: Protein Structure and Function Prediction and Analysis, Systems Biology and Networks
OPT01
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT02
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT03
Cancelled
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT04
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT05
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT06
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT07
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT08
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT09
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT10
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT11
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT12
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT13
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT14
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT15
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT16
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT17
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT18
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT19
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT20
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT21
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT22
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT23
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
OPT24
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
PP01 (PT) - Simple Topological Properties Predict Functional Misannotations in a Metabolic Network
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: Hall 4/5
Presenting author: John Pinney, Imperial College London, United Kingdom
Additional authors:Rodrigo Liberal, Imperial College London, United Kingdom
Session Chair: Erik Bongcam-Rudloff
Motivation: Misannotation in sequence databases is an important obstacle for automated gene function annotation tools, which rely extensively on comparison to sequences with known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are therefore needed to help identify misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism’s metabolic network, and a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Obvious problems in the network, such as dead-end or disconnected reactions, can therefore be strong indications of misannotation.
Results: We demonstrate that a machine learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at three different levels. A random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments achieved an accuracy of up to 86% in 5-fold cross-validation experiments. Further cross-validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the enzyme classes present in the training data. The random forest model was applied to several automated genome annotations, achieving an accuracy of 60% in most cases when validated against recent genome-scale metabolic models. We also observe that, when the model is applied to draft metabolic networks for multiple species, predicted annotation quality is clearly negatively correlated with phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes).
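The dead-end reactions mentioned in the motivation are one simple topological signal of this kind. The sketch below flags reactions that touch a metabolite which is only produced or only consumed in the network. It does not reproduce the authors' feature set or their random forest; the reaction data structure, the notion of "external" (exchanged) metabolites, and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict

def dead_end_metabolites(reactions, external=frozenset()):
    """Metabolites that are only produced or only consumed in the network.

    reactions: dict name -> (substrates, products, reversible).
    external: metabolites assumed to be exchanged with the environment,
    which are therefore not counted as dead ends.
    """
    produced, consumed = defaultdict(int), defaultdict(int)
    for substrates, products, reversible in reactions.values():
        for m in substrates:
            consumed[m] += 1
            if reversible:
                produced[m] += 1
        for m in products:
            produced[m] += 1
            if reversible:
                consumed[m] += 1
    metabolites = set(produced) | set(consumed)
    return {m for m in metabolites
            if m not in external and (produced[m] == 0 or consumed[m] == 0)}

def flag_reactions(reactions, external=frozenset()):
    """Reactions that touch at least one dead-end metabolite."""
    dead = dead_end_metabolites(reactions, external)
    return {name for name, (s, p, _) in reactions.items() if dead & (set(s) | set(p))}


# Toy network (hypothetical): A and D are exchanged with the environment.
reactions = {
    "R1": (["A"], ["B"], False),
    "R2": (["B"], ["C"], False),
    "R3": (["C"], ["D"], False),
    "R4": (["B"], ["X"], False),  # X is never consumed
}
print(flag_reactions(reactions, external={"A", "D"}))  # {'R4'}
```

In a classifier, a flag like this would be one input feature alongside others (for example connectivity measures), rather than a verdict on its own.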
Contact: j.pinney@imperial.ac.uk
Keyword: Metabolic networks, Network topology, Enzyme function
PP02 (HT) - Heart Attacks: Leveraging A Cardiovascular Systems Biology Strategy To Predict Future Outcomes
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: Hall 7
Presenting author: Carlo Vittorio Cannistraci , King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Additional authors:Timothy Ravasi, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Enrico Ammirati, San Raffaele Scientific Institute, Vita-Salute San Raffaele University, Italy
Session Chair: Predrag Radivojac
Inflammation is likely involved in ST-elevation acute myocardial infarction (STEMI), and patients with STEMI can present with high levels of circulating interleukin-6 (IL6) at the onset of symptoms. We used machine learning techniques to identify characteristic inflammatory cytokine patterns in the blood of emergency-room patients with STEMI, and observed two functional modules characterizing the reciprocal behaviours of the cytokines in patients with high IL6 levels. Next, using reverse engineering techniques, we inferred which cytokines were crucial within the respective modules. Combining these cytokines with IL6 in a single formula yielded a risk index (a kind of composite biomarker) that outperformed any single cytokine, as well as classical prognostic factors, in predicting cardiac dysfunction at discharge and death at six months.
Our methodology was considered a translational research innovation for defining composite inflammatory markers in cardiology, and our findings have potential implications for risk-oriented patient stratification and the design of immune-modulating therapies.
Keyword: Applied Bioinformatics
PP03 (HT) - Computational identification of a transiently open L1/S3 pocket for reactivation of mutant p53.
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: Hall 14.2
Presenting author: Richard Lathrop , University of California, Irvine, United States
Additional authors:Christopher Wassman, Google Inc., United States
Roberta Baronio, University of California, Irvine, United States
Özlem Demir, University of California, San Diego, United States
Brad Wallentine, University of California, Irvine, United States
Chiung-Kuang Chen, University of California, Irvine, United States
Linda Hall, University of California, Irvine, United States
Faezeh Salehi, University of California, Irvine, United States
Da-Wei Lin, University of California, Irvine, United States
Benjamin Chung, University of California, Irvine, United States
Wesley Hatfield, University of California, Irvine, United States
Richard Chamberlin, University of California, Irvine, United States
Hartmut Luecke, University of California, Irvine, United States
Peter Kaiser, University of California, Irvine, United States
Rommie Amaro, University of California, San Diego, United States
Session Chair: Russell Schwartz
The tumour suppressor p53 is the most frequently mutated gene in human cancer. Reactivation of mutant p53 by small molecules is an exciting potential cancer therapy. Although several compounds restore wild-type function to mutant p53, their binding sites and mechanisms of action remain elusive. Here, computational methods identify a transiently open binding pocket between loop L1 and sheet S3 of the p53 core domain. Mutation of residue Cys124, located at the centre of the pocket, abolishes reactivation of the p53 mutant R175H by PRIMA-1, a known reactivation compound. Ensemble-based virtual screening against this newly revealed pocket selects stictic acid as a potential p53 reactivation compound. In human osteosarcoma cells, stictic acid exhibits dose-dependent reactivation of p21 expression for mutant R175H more strongly than does PRIMA-1. These results identify the L1/S3 pocket as a target for pharmaceutical reactivation of p53 mutants.
Keyword: Applied Bioinformatics, Disease Models & Epidemiology
PP04 (PT) - Stability selection for regression-based models of transcription factor-DNA binding specificity
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: Hall 4/5
Presenting author: Fantine Mordelet , Duke University, United States
Additional authors:John Horton, Duke University, United States
Alexander Hartemink, Duke University, United States
Barbara Engelhardt, Duke University, United States
Raluca Gordan, Duke University, United States
Session Chair: Erik Bongcam-Rudloff
Motivation: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix (PWM) model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.
Results: We propose novel regression-based models of TF-DNA binding specificity, trained using high-resolution in vitro data from custom protein binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max, and Mad2) in their native genomic context. These high-throughput, quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even for paralogous TFs with highly similar PWMs, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step towards better sequence-based models of individual TF-DNA binding specificity.
Availability: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this paper are available in the Gene Expression Omnibus under accession number GSE44604.
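The regression features described above (independent base contributions plus di- and trinucleotides at specific positions) can be sketched as position-specific k-mer indicators scored by a linear model. This is an illustrative encoding under stated assumptions, not the authors' code: the feature naming scheme, the helper names, and the example weights are hypothetical, and the sparse fitting step (the keywords list LASSO) is not shown. With max_k=1, the linear score has the same additive form as a PWM score.

```python
from itertools import product

BASES = "ACGT"

def kmer_features(seq, max_k=2):
    """Binary indicators for every k-mer (k = 1..max_k) at every position.

    Feature names have the form "<position>:<k-mer>". With max_k=1 a linear
    model over these features is additive per position, like a PWM; larger
    max_k adds position-specific di-/trinucleotide terms.
    """
    feats = {}
    for k in range(1, max_k + 1):
        kmers = ["".join(p) for p in product(BASES, repeat=k)]
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            for kmer in kmers:
                feats[f"{i}:{kmer}"] = int(window == kmer)
    return feats

def linear_score(seq, weights, max_k=2):
    """Predicted binding signal as a weighted sum of active features.

    weights would come from a sparse regression fitted to PBM intensities
    (not shown); features without a weight contribute zero.
    """
    return sum(weights.get(name, 0.0) * value
               for name, value in kmer_features(seq, max_k).items())


feats = kmer_features("ACGTA", max_k=2)
print(len(feats))  # 5*4 mononucleotide + 4*16 dinucleotide features = 84
print(linear_score("ACGTA", {"0:A": 1.0, "0:AC": 0.5}))  # 1.5
```

The feature count grows quickly with max_k, which is why a selection step that keeps only a small number of nonzero weights matters for interpretability.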
Keyword: DNA binding specificity, Regression models, LASSO, Protein binding microarray
PP05 (HT) - Of Men and Not Mice: Comparative Genomic Analysis of Human Diseases and Mouse Models
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: Hall 7
Presenting author: Wenzhong Xiao , Massachusetts General Hospital/Harvard Medical School and Stanford University, United States
Session Chair: Predrag Radivojac
A cornerstone of modern biomedical research is the use of mouse models to explore basic disease mechanisms, evaluate new therapeutic approaches, and make decisions about carrying new drug candidates forward into clinical trials. However, few of these human trials have been successful. Here we systematically compared the genomic responses in publicly available datasets from patients with different acute inflammatory diseases and from the corresponding murine models, and show that, although inflammation of different etiologies results in highly similar genomic responses in humans, the responses in mouse models correlate poorly with the human conditions and with one another. Among genes changed significantly in humans, the murine orthologs are close to random in matching their human counterparts. Beyond improving current animal model systems, our study supports giving higher priority to translational research that focuses on the more complex human conditions rather than relying on mouse models to study human inflammatory diseases.
Keyword: Disease Models & Epidemiology, Evolution & Comparative Genomics
PP06 (HT) - Virtual ligand screening against comparative structural models of membrane transporters
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: Hall 14.2
Presenting author: Avner Schlessinger , Mount Sinai School of Medicine, United States
Additional authors:Ethan Geier, University of California, San Francisco, United States
Hao Fan, University of California, San Francisco, United States
Jonathan Gable, University of California, San Francisco, United States
John Irwin, University of California, San Francisco, United States
Kathleen Giacomini, University of California, San Francisco, United States
Andrej Sali, University of California, San Francisco, United States
Session Chair: Russell Schwartz
We describe a structure-based discovery approach to identify small-molecule ligands for pharmacologically important membrane proteins. Here, we focus on LAT-1, a transporter of amino acids, thyroid hormones, and prescription drugs that is highly expressed in the blood-brain barrier (BBB) and in various types of cancer. LAT-1 is important for cancer development as well as for mediating drug and nutrient delivery across the BBB, making it a key drug target. We identify four LAT-1 ligands, including one chemically novel substrate, by comparative modeling, virtual screening, and experimental testing. These results may rationalize the enhanced brain permeability of two drug-like molecules, including the anti-cancer agent acivicin. Two of our hits inhibited proliferation of a cancer cell line through distinct molecular mechanisms, providing useful chemical tools to characterize the role of LAT-1 in cancer metabolism. Finally, our integrated approach is generally applicable to the characterization of other protein families and their interactions with small-molecule ligands.
Keyword: Protein Structure & Function, Applied Bioinformatics
PP07 (PT) - A Graph Kernel Approach for Alignment-Free Domain-Peptide Interaction Prediction with an Application to Human SH3 Domains
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: Hall 4/5
Presenting author: Kousik Kundu , University of Freiburg, Germany
Additional authors:Fabrizio Costa, University of Freiburg, Germany
Rolf Backofen, University of Freiburg, Germany
Session Chair: Erik Bongcam-Rudloff
State-of-the-art experimental data for determining the binding specificities of peptide recognition modules (PRMs) are obtained by high-throughput approaches such as peptide arrays. Most prediction tools applicable to this kind of data are based on an initial multiple alignment of the peptide ligands. Building this alignment can be error-prone, especially for the proline-rich peptides bound by SH3 domains. Here we present a machine learning approach based on an efficient graph-kernel technique to predict the specificity of a large set of 70 human SH3 domains, a very important class of PRMs. The graph-kernel strategy allows us to 1) integrate several types of physico-chemical information for each amino acid, 2) consider high-order correlations between these features, and 3) eliminate the need for an initial peptide alignment. We build specialized models for each human SH3 domain and achieve a predictive performance of 0.73 area under the precision-recall curve (AUC PR), compared to 0.27 AUC PR for state-of-the-art methods based on position weight matrices. We show that better models can be obtained when we use information on non-interacting peptides (negative examples), which is currently not used by the state-of-the-art approaches based on position weight matrices. To this end, we analyze two strategies to identify subsets of high-confidence negative data. The techniques introduced here are general and can also be used for other protein domains that interact with short peptides (e.g., other PRMs).
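The AUC PR figures quoted above summarize how well binders are ranked above non-binders. One common estimator is average precision: the mean of the precision values at each rank where a true binder appears. The abstract does not say which estimator or interpolation the authors used, so the sketch below is a generic illustration with hypothetical labels and scores.

```python
def average_precision(labels, scores):
    """Average precision, one common estimator of the area under the PR curve.

    labels: 1 for an interacting peptide, 0 for a non-interacting one.
    scores: predicted binding scores (higher = more likely to bind).
    Tied scores keep their input order (Python's sort is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # (1/1 + 2/3) / 2 = 0.8333...
```

Unlike ROC AUC, this measure is sensitive to how many non-binders outrank the binders, which is why it is informative when negatives greatly outnumber positives, as with peptide libraries.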
Keyword: PRM: Peptide Recognition Module, SVM: Support Vector Machine, PWM: Position Weight Matrix
PP08 (HT) - Impact of genetic dynamics and single-cell heterogeneity on development of nonstandard personalized medicine strategies for cancer
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: Hall 7
Presenting author: Chen-Hsiang Yeang , Academia Sinica, Taiwan
Additional authors:Robert Beckman, University of California, San Francisco, United States
Gunter Schemmann, World Water and Solar Technologies, United States
Session Chair: Predrag Radivojac
Cancers are heterogeneous and genetically unstable. Current practice in personalized medicine tailors therapy to heterogeneity between cancers of the same organ type. However, it does not yet systematically address heterogeneity within a single individual’s cancer. We developed a mathematical model of personalized cancer therapy incorporating genetic evolutionary dynamics and single-cell heterogeneity, and examined simulated clinical outcomes. Analyses of an illustrative case and of a virtual clinical trial of over 3 million evaluable “patients” demonstrate that augmented nonstandard personalized medicine strategies may lead to superior outcomes compared with the current personalized medicine approach. Current personalized medicine generally focuses on the average, static, and current properties of the sample. In contrast, nonstandard strategies also consider minor subclones, dynamics, and predicted future tumor states. Our methods allow systematic study and evaluation of nonstandard personalized medicine strategies. These findings may, in turn, suggest global adjustments and enhancements to translational oncology research paradigms.
Keyword: Applied Bioinformatics, Disease Models & Epidemiology
PP09 (HT) - Extensive changes in DNA methylation are associated with expression of mutant huntingtin
Date: Sunday, July 21, 11:30 a.m. - 11:55 a.m.Room: Hall 14.2
Presenting author: Christopher Ng , Massachusetts Institute of Technology, United States
Additional authors:Ferah Yildirim, Massachusetts Institute of Technology, United States
Yoon Yap, Massachusetts Institute of Technology, United States
Simona Dalin, Massachusetts Institute of Technology, United States
Bryan Matthews, Massachusetts Institute of Technology, United States
Patricio Velez, Massachusetts Institute of Technology, United States
Adam Labadorf, Massachusetts Institute of Technology, United States
Ernest Fraenkel, Massachusetts Institute of Technology, United States
David Housman, Massachusetts Institute of Technology, United States
Session Chair: Russell Schwartz
With technological advances, it is becoming increasingly clear that DNA methylation has a role in a wide range of biological processes, including neuronal activity, learning, and memory. In this paper, we explored the hypothesis that DNA methylation is altered in Huntington’s disease and used reduced representation bisulfite sequencing (RRBS) to map sites of DNA methylation in cells carrying either wild-type or mutant huntingtin (HTT). We found that a large fraction of the genes that change in expression in the presence of mutant HTT demonstrate significant changes in DNA methylation. Regions with low CpG content, which have previously been shown to undergo methylation changes in response to neuronal activity, were disproportionately affected. Using motif analysis, we identified transcriptional regulators associated with DNA methylation changes, and we confirmed these hypotheses using genome-wide chromatin immunoprecipitation sequencing (ChIP-Seq). Our findings suggest new mechanisms for the effects of polyglutamine-expanded HTT on DNA methylation and transcriptional dysregulation.
Keyword: Gene Regulation & Transcriptomics, Applied Bioinformatics
PP10 (HT) - Systems-based metatranscriptomic analysis
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: Hall 7
Presenting author: Xuejian Xiong , Hospital for Sick Children, Canada
Additional authors:John Parkinson, Hospital for Sick Children, Canada
Daniel Frank, University of Colorado, United States
Charles Robertson, University of Colorado, United States
Stacy Hung, Hospital for Sick Children, Canada
Janet Markle, Hospital for Sick Children, Canada
Jayne Danska, Hospital for Sick Children, Canada
Philippe Poussier, Sunnybrook Health Sciences Centre Research Institute, Canada
Angelo Canty, McMaster University, Canada
Kathy McCoy, University of Bern, Switzerland
Andrew MacPherson, University of Bern, Switzerland
Session Chair: Predrag Radivojac
The emerging science of metagenomics is transforming our understanding of the relationships of microbes with their environments. Moving beyond cataloguing the organisms and genes present, metatranscriptomics offers the exciting prospect of a more mechanistic understanding of these relationships. Exploiting metatranscriptomic data generated on the Illumina platform from microbiomes of increasing complexity, we are developing novel software pipelines to process and interpret these datasets. Key to these analyses is the adoption of protein-protein interaction and other systems-level datasets as frameworks onto which metatranscriptomic data may be integrated and interpreted. In this presentation I will outline some of the significant challenges we have encountered in analysing metatranscriptomic data generated by next-generation sequencing platforms and discuss how these challenges may be addressed.
Keyword: Sequence Analysis, Applied Bioinformatics
PP11 (HT) - Metabolic phenotypic analysis uncovers reduced proliferation associated with oxidative stress in progressed breast cancer
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: Hall 14.2
Presenting author: Livnat Jerby Arnon , Tel Aviv University, Israel
Additional authors:Lior Wolf, Tel Aviv University, Israel
Carsten Denkert, Charité Hospital, Germany
Gideon Y Stein, Beilinson Hospital, Rabin Medical Center, Israel
Mika Hilvo, VTT Technical Research Centre of Finland, Finland
Matej Oresic, VTT Technical Research Centre of Finland, Finland
Tamar Geiger, Tel Aviv University, Israel
Eytan Ruppin, Tel Aviv University, Israel
Session Chair: Russell Schwartz
The importance of metabolic reprogramming in cancer is being increasingly recognized. However, whole metabolic flux measurements in cancer are still scarce. Hence, we developed a novel Metabolic Phenotypic Analysis (MPA) method that profiles the metabolic phenotype of a tumor based on its gene or protein expression. We applied MPA to conduct the first genome-scale study of breast cancer metabolism based on the gene expression of a large cohort of cell lines and clinical samples. The modeling correctly predicted cell lines' growth rates, tumor lipid levels and amino acid biomarkers, outperforming other metabolic modeling methods. MPA revealed that tumor proliferation decreases as tumors acquire metastatic capability. We experimentally validated this "go or grow" dichotomy in vitro and linked the decrease in proliferation to oxidative stress. Finally, we found fundamental metabolic differences between estrogen receptor (ER)+ and ER- tumors. These findings provide new insights into core metabolic aberrations in breast cancer.
Keyword: Applied Bioinformatics, Disease Models & Epidemiology
PP12 (HT) - Mapping the Strategies of Viruses Hijacking Human Host Cells – An Experimental and Computational Comparative Study
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: Hall 4/5
Presenting author: Jacques Colinge , CeMM, Austria
Session Chair: Olga Vitek
It is well known that viral proteins interfere with the innate immune system of the infected host to block detection or prevent a response. Is that all viruses do to human cells? Do they share common strategies? We tried to answer these questions in a pan-viral study that used mass spectrometry to map the protein interactions of 70 viral proteins from 30 viruses known to modulate the innate immune system. In particular, we found that viruses reprogram a broad range of biological functions through interactions with multifunctional general regulators. We proposed that size-limited viral genomes dictate such strategies, a hypothesis we could support by comparing the functional and human interactome impact of diverse viral proteins, which showed non-redundancy within a single genome and convergent evolution within virus families. In ongoing work, we are focusing on a smaller number of viruses whose host interactions have been mapped for almost all of their proteins.
Keyword: Protein Interactions & Molecular Networks, Mass Spectrometry & Proteomics
PP13 (HT) - Ultrashort and progressive 4sU-tagging reveals key characteristics of RNA processing at nucleotide resolution
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: Hall 7
Presenting author: Caroline Friedel , Ludwig-Maximilians-Universität München, Germany
Additional authors:Lukas Windhager, Ludwig-Maximilians-Universität München, Germany
Thomas Bonfert, Ludwig-Maximilians-Universität München, Germany
Kaspar Burger, Helmholtz-Zentrum München, Germany
Zsolt Ruzsics, Ludwig-Maximilians-Universität München, Germany
Stefan Krebs, Ludwig-Maximilians-Universität München, Germany
Stefanie Kaufmann, Ludwig-Maximilians-Universität München, Germany
Georg Malterer, Ludwig-Maximilians-Universität München, Germany
Anne L’Hernault, University of Cambridge, United Kingdom
Markus Schilhabel, Christian-Albrechts-Universität Kiel, Germany
Stefan Schreiber, Christian-Albrechts-Universität Kiel, Germany
Philip Rosenstiel, Christian-Albrechts-Universität Kiel, Germany
Ralf Zimmer, Ludwig-Maximilians-Universität München, Germany
Dirk Eick, Helmholtz-Zentrum München, Germany
Lars Dölken, University of Cambridge, United Kingdom
Session Chair: Ivo Hofacker
Metabolic tagging of newly transcribed RNA by 4-thiouridine (4sU) can reveal the relative contributions of RNA synthesis and decay rates. Recently, we showed that ultra-short 4sU-tagging combined with RNA-seq determines global RNA processing kinetics at nucleotide resolution. This allowed identification of classes of rapidly and slowly spliced/degraded introns characterized by a distinct association with intron length, gene length and splice site strength. For one class of introns, we also observed long-lasting retention in the primary transcript, but efficient secondary splicing/degradation at later time points. Finally, we showed that processing of most small nucleolar (sno)RNA-containing introns is remarkably inefficient, with the majority of introns being spliced and degraded rather than processed into mature snoRNAs. In summary, our study yielded unparalleled insights into the kinetics of RNA processing and provides the tools to study molecular mechanisms of RNA processing and their contribution to gene expression regulation at the nucleotide level.
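The relationship behind the first sentence can be made concrete under a textbook model. Assuming steady-state expression, constant synthesis rate s and first-order decay with rate λ (assumptions made here for illustration; the study itself models processing at nucleotide resolution), labeled RNA L obeys dL/dt = s - λL with L(0) = 0 while total RNA stays at s/λ, so new/total = 1 - exp(-λt) and λ = -ln(1 - new/total) / t. The helper names below are hypothetical.

```python
import math


def decay_rate_from_labeling(new_fraction, labeling_minutes):
    """First-order decay rate (per minute) from the fraction of RNA that is
    newly transcribed (4sU-labeled) after a labeling pulse.

    Assumes steady state: new/total = 1 - exp(-rate * t).
    """
    if not 0.0 < new_fraction < 1.0:
        raise ValueError("new_fraction must lie strictly between 0 and 1")
    if labeling_minutes <= 0:
        raise ValueError("labeling_minutes must be positive")
    return -math.log(1.0 - new_fraction) / labeling_minutes


def half_life_minutes(decay_rate):
    """Half-life implied by a first-order decay rate."""
    return math.log(2.0) / decay_rate


# A transcript that is 50% newly made after a 60-minute pulse has a
# half-life of 60 minutes under this model.
rate = decay_rate_from_labeling(0.5, 60.0)
print(round(half_life_minutes(rate), 6))
```

With ultra-short labeling times the new fraction is small for most transcripts, so the estimate becomes sensitive to measurement noise in that fraction.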
Keyword: Gene Regulation & Transcriptomics, other
PP14 (HT) - Identifying differentially expressed transcripts from RNA-seq data with biological variation
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: Hall 14.2
Presenting author: Peter Glaus , University of Manchester, United Kingdom
Additional authors:Antti Honkela, University of Helsinki, Finland
Magnus Rattray, University of Manchester, United Kingdom
Session Chair: Cenk Sahinalp
Analysing RNA-seq data poses multiple challenges: base mismatches, non-uniform read distribution, reads shared by multiple splice variants and other factors make expression analysis especially difficult. The BitSeq method uses a Bayesian approach to model the read generation and sequencing processes and infers expression estimates for individual transcripts. Transcript expression levels can be used to obtain more accurate gene expression estimates than popular count-based methods, or to identify differentially expressed transcripts or genes. Our differential expression model combines the uncertainty of the expression estimates with variances estimated from biologically replicated experiments to identify significantly differentially expressed transcripts with improved precision.
We present the advantages of using BitSeq on RNA-seq datasets with multi-mapping reads and non-uniform read distribution. Experiments with real and synthetic datasets show that BitSeq produces state-of-the-art results in both expression estimation and differential expression analysis.
Keyword: Gene Regulation & Transcriptomics, Sequence Analysis
PP15 (PT) - Multi-task learning for Host-Pathogen protein interactions
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: Hall 4/5
Presenting author: Meghana Kshirsagar , Carnegie Mellon University , United States
Additional authors:Jaime Carbonell, Carnegie Mellon University, United States
Judith Klein-Seetharaman, University of Pittsburgh School of Medicine, United States
Session Chair: Olga Vitek
Motivation:
An important aspect of infectious disease research involves understanding the differences and commonalities in the infection mechanisms underlying various diseases. Systems-biology-based approaches study infectious diseases by analyzing the interactions between the host species and the pathogen organisms. This work aims to combine the knowledge from experimental studies of host-pathogen interactions in several diseases in order to build stronger predictive models. Our approach is based on a machine learning formalism called 'multi-task learning', which considers the problem of building models across tasks that are related to each other. A 'task' in our scenario is the set of host-pathogen protein interactions involved in one disease. To integrate interactions from several tasks (i.e., diseases), our method exploits the similarity in the infection process across the diseases. In particular, in defining a common structure across the tasks, we use the biological hypothesis that similar pathogens target the same critical biological processes in the host.
Results:
Our current work on host-pathogen protein interaction prediction focuses on human as the host and four bacterial species as pathogens. The multi-task learning technique we develop uses a task-based regularization approach. We find that the resulting optimization problem is a difference of convex (DC) functions, which we optimize with an algorithm based on the convex-concave procedure (CCP). We compare our integrative approach to baseline methods that build models on a single host-pathogen protein interaction dataset. Our results show that our approach outperforms the baselines on the training data. We further analyse the protein interaction predictions generated by the models, which yields some interesting insights.
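The abstract states only that the objective is a difference of convex functions optimized with a CCP-based algorithm. The toy below uses a one-dimensional objective chosen purely for illustration (it is not the multi-task loss) to show the core CCP step: replace the concave part -h by its tangent at the current point, then minimize the resulting convex surrogate. Because the tangent never lies above the convex h, the surrogate upper-bounds f and touches it at the current iterate, so no step can increase f.

```python
import math


def objective(x):
    """Toy DC objective f(x) = g(x) - h(x) with convex g(x) = x**4, h(x) = 2 * x**2."""
    return x**4 - 2 * x**2


def ccp_minimize(x0, iterations=50):
    """Convex-concave procedure for the toy objective above."""
    x = x0
    for _ in range(iterations):
        # Replace h by its tangent at the current iterate: h'(x_k) = 4 * x_k.
        slope = 4.0 * x
        # Minimize the convex surrogate g(x) - slope * x exactly:
        # 4 * x**3 = slope, i.e. x = cube_root(slope / 4).
        x = math.copysign(abs(slope / 4.0) ** (1.0 / 3.0), slope)
    return x


x_star = ccp_minimize(0.5)
print(x_star, objective(x_star))  # converges toward the local minimum at x = 1
```

Like CCP in general, this only guarantees convergence to a stationary point that depends on the starting value (here x = 1 or x = -1), not to a global optimum.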
Keyword: Host-pathogen protein interaction, multi-task learning, machine learning, bacteria hu
PP16 (HT) - Gene expression anti-profiles as a basis for accurate universal cancer signatures
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: Hall 7
Presenting author: Hector Corrada Bravo , University of Maryland, United States
Session Chair: Ivo Hofacker
Gene expression anti-profiles are a new computational approach for developing cancer genomic signatures that specifically take advantage of gene expression heterogeneity. This presentation will describe the biological basis for this method derived from experimental findings suggesting that stochastic across-sample hyper-variability in the expression of specific genes is a stable and general property of cancer. Application of this methodology in screening patients for colon cancer based on expression measurements obtained from peripheral blood samples will be presented. We will also present results from development of a universal cancer anti-profile that accurately distinguishes cancer from normal regardless of tissue type. This method uses single-chip normalization and quality assessment methods so no further retraining of signatures would be required before their application in clinical settings. These results suggest that anti-profiles may be used to develop inexpensive and non-invasive universal cancer screening tests.
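A minimal sketch of one way to turn hyper-variability into a score: define each gene's normal range from normal samples and count how many genes in a new sample fall outside it. The median ± k·MAD interval, the default k = 5 and the function names are assumptions for illustration, not the published procedure; in practice the genes would first be selected for hyper-variability in cancer relative to normal.

```python
from statistics import median


def normal_ranges(normal_samples, k=5.0):
    """Per-gene tolerance interval median +/- k * MAD from normal samples.

    normal_samples: list of samples, each a list of expression values with
    genes in the same order.
    """
    ranges = []
    for gene_values in zip(*normal_samples):
        center = median(gene_values)
        mad = median([abs(value - center) for value in gene_values])
        ranges.append((center - k * mad, center + k * mad))
    return ranges


def anti_profile_score(sample, ranges):
    """Number of genes whose expression falls outside the normal range."""
    return sum(
        1 for value, (low, high) in zip(sample, ranges) if value < low or value > high
    )


normal = [[1.0, 10.0], [1.2, 10.5], [0.8, 9.5], [1.1, 10.2], [0.9, 9.8]]
ranges = normal_ranges(normal)
print(anti_profile_score([1.0, 10.0], ranges), anti_profile_score([3.0, 20.0], ranges))  # 0 2
```

Since the score only counts deviations from a fixed normal reference, a single new sample can be scored without retraining, which matches the single-chip use case described above.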
Keyword: Applied Bioinformatics, Gene Regulation & Transcriptomics
PP17 (PT) - GeneScissors: a comprehensive approach to detecting and correcting spurious transcriptome inference due to RNAseq reads misalignment
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: Hall 14.2
Presenting author: Wei Wang, UCLA, United States
Additional authors:Shunping Huang, UNC Chapel Hill, United States
Jack Wang, UNC Chapel Hill, United States
Xiang Zhang, Case Western Reserve University, United States
Fernando Pardo Manuel De Villena, UNC Chapel Hill, United States
Leonard McMillan, UNC Chapel Hill, United States
Zhaojun Zhang, UNC Chapel Hill, United States
Session Chair: Cenk Sahinalp
Motivation:
RNA-seq techniques provide an unparalleled means of exploring a transcriptome with deep coverage and base-pair-level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations in the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spuriously expressed genes (false positives) and missing truly expressed genes (false negatives). Such errors affect subsequent analyses, such as differential expression analysis. In our study, we observe that about 3.5% of transcripts reported by the TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, about 10.0% of reported transcripts are not annotated in the Ensembl database. These genes could be either novel expressed genes or false discoveries.
Results:
We examine the underlying genomic features that lead to multiple alignments and investigate how they generate systematic errors in RNA-seq analysis. We develop a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. In our simulation study, GeneScissors predicts spurious transcriptome calls due to misalignment with an accuracy close to 90%. It provides a substantial improvement over the widely used TopHat/Cufflinks and MapSplice/Cufflinks pipelines in both precision and F-measure. On real data, GeneScissors reports 53.6% fewer pseudogenes and 0.97% more expressed and annotated transcripts than the TopHat/Cufflinks pipeline. In addition, among the 10.0% of unannotated transcripts reported by TopHat/Cufflinks, GeneScissors finds that more than 16.3% are false positives.
Availability:
The software can be downloaded at http://csbio.unc.edu/genescissors/
Keyword: Pseudogene, RNA-seq, RNA-seq Alignment, RNA-seq Assembling
PP18 (HT) - A Conserved Map of Genetic Interactions Induced by DNA Damage
Cancelled
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: Hall 4/5
Presenting author: Rohith Srivas , University of California, San Diego, United States
Additional authors:Aude Guenole, Leiden University Medical Center, Netherlands
Kees Vreeken, Leiden University Medical Center, Netherlands
Ze Zhong Wang, University of California, San Diego, United States
Shuyi Wang, University of California, San Francisco, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California, San Diego, United States
Haico van Attikum, Leiden University Medical Center, Netherlands
Session Chair: Olga Vitek
To protect the genome, cells have evolved a diverse set of pathways designed to sense, signal, and repair multiple types of DNA damage. To assess the degree of coordination and crosstalk among these pathways, we systematically mapped changes in the cell’s genetic network across a panel of different DNA-damaging agents, resulting in ~1,800,000 differential measurements. Each agent was associated with a distinct interaction pattern, which, unlike single-mutant phenotypes or gene expression data, has high statistical power to pinpoint the specific repair mechanisms at work. The agent-specific networks revealed roles for the histone acetyltransferase Rtt109 in the mutagenic bypass of DNA lesions and for the neddylation machinery in cell cycle regulation and genome stability, while the network induced by multiple agents implicates Irc21, an uncharacterized protein, in checkpoint control and DNA repair. Our multiconditional genetic interaction map provides a unique resource that identifies agent-specific and general DNA damage response pathways.
Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology
PP19 (HT) - Newborn screening for SCID identifies patients with ataxia telangiectasia
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: Hall 7
Presenting author: Steven Brenner , University of California, Berkeley, United States
Additional authors:Jacob Mallott, UCSF, United States
Antonia Kwan, UCSF, United States
Joseph Church, USC, United States
Diana Gonzalez, UCSF, United States
Fred Lorey, Public Health Institute, United States
Ling Tang, UCSF, United States
Rajgopal Srinivasan, Tata Consultancy Services, India
Sadhna Rana, Tata Consultancy Services, India
Uma Sunderam, Tata Consultancy Services, India
Session Chair: Ivo Hofacker
Severe combined immunodeficiency (SCID) is characterized by failure of T lymphocyte development. Newborn screening to identify SCID is now performed in several states. In addition to infants with typical SCID, screening identifies infants with T lymphocytopenia who appear healthy and in whom a SCID diagnosis cannot be confirmed. Deep sequencing was employed to find causes of T lymphocytopenia in such infants. Whole exome sequencing and analysis were performed in infants and their parents. Upon finding deleterious mutations in the ataxia telangiectasia mutated (ATM) gene, we confirmed the diagnosis of ataxia telangiectasia (AT) in two infants. AT is usually not diagnosed until much later in life, after symptoms are manifest. Although there is no current cure for the progressive neurological impairment of AT, early detection permits avoidance of infectious complications, while providing information for families regarding reproductive recurrence risks and increased cancer risks in patients and carriers.
Keyword: Sequence Analysis, Disease Models & Epidemiology
PP20 (PT) - Poly(A) motif prediction using spectral latent features from human DNA sequences
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: Hall 14.2
Presenting author: Bo Xie , Georgia Institute of Technology, United States
Additional authors:Boris Yankovic, King Abdullah University of Science and Technology
Vladimir Bajic, King Abdullah University of Science and Technology
Le Song, Georgia Institute of Technology, United States
Xin Gao, King Abdullah University of Science and Technology
Session Chair: Cenk Sahinalp
Motivation:
Polyadenylation is the addition of a poly(A) tail to an RNA molecule. Identifying DNA sequence motifs that signal the addition of poly(A) tails is essential for improving genome annotation and for better understanding the regulatory mechanisms and stability of mRNA.
Existing poly(A) motif predictors demonstrate that information extracted from the surrounding nucleotide sequences of candidate poly(A) motifs can differentiate true motifs from the false ones to a great extent. A variety of sophisticated features has been explored, including sequential, structural, statistical, thermodynamic and evolutionary properties. However, most of these methods involve extensive manual feature engineering, which can be time-consuming and can require in-depth domain knowledge.
Results:
We propose a novel machine learning method for poly(A) motif prediction by marrying generative learning (hidden Markov models) and discriminative learning (support vector machines). Generative learning provides a rich palette on which the uncertainty and diversity of sequence information can be handled, while discriminative learning allows the performance of the classification task to be directly optimized. Here, we employed hidden Markov models to fit the DNA sequence dynamics and developed an efficient spectral algorithm for extracting latent variable information from these models. These spectral latent features were then fed into support vector machines to fine-tune the classification performance.
We evaluated our proposed method on a comprehensive human poly(A) dataset that consists of 14,740 samples from 12 of the most abundant variants of human poly(A) motifs. Compared with one of the previous state-of-the-art methods in the literature (the random forest model with expert-crafted features), our method reduces the average error rate, false negative rate and false positive rate by 26%, 15% and 35%, respectively. Meanwhile, our method made about 30% fewer prediction errors than the other string kernels. Furthermore, our method can be used to visualize the importance of oligomers and positions in predicting poly(A) motifs, from which we can observe a number of characteristics in the surrounding regions of true and false motifs that have not been reported before.
Availability:
Website: http://sfb.kaust.edu.sa/Pages/Software.aspx
Keyword: Poly(A) motif, classification, generative learning, discriminative learning
PP21 (HT) - Synthetic lethality between gene defects affecting a single non-essential molecular pathway with reversible steps
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: Hall 4/5
Presenting author: Inna Kuperstein , Institut Curie, France
Additional authors:Andrei Zinovyev, Institut Curie, France
Emmanuel Barillot, Institut Curie, France
Wolf-Dietrich Heyer, University of California, Davis, United States
Session Chair: Olga Vitek
Synthetic lethality (SL) is a framework for deciphering molecular pathways and developing new treatment strategies. The canonical explanation of SL considers two genes functioning in parallel, mutually compensatory pathways (between-pathway SL). We classify all known types of synthetic lethal interactions and propose a novel mechanism of SL within a single pathway. This new within-reversible-pathway SL (wrpSL) involves a pathway with reversible steps and kinetic trapping of a toxic intermediate or of an essential resource. Mathematical modeling recapitulates the possibility of kinetic trapping leading to lethality and reveals the potential contributions of synthetic dosage and positive masking interactions in a single pathway. Experimental data on the homologous recombination DNA repair pathway validate the concept. Analysis of yeast gene interactions and pathways suggests that this novel concept is broadly applicable to many biological processes. These observations extend the interpretation of synthetic lethality and contribute to pathway reconstruction and to the improvement of therapeutic approaches.
Keyword: Protein Interactions & Molecular Networks, Disease Models & Epidemiology
PP22 (HT) - BioJS: An Open Source JavaScript Framework for Biological Data Visualization. Bioinformatics
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: Hall 7
Presenting author: Manuel Corpas , The Genome Analysis Centre, United Kingdom
Additional authors:John Gómez, EBI, United Kingdom
Leyla García, EBI, United Kingdom
Gustavo Salazar, University of Cape Town, South Africa
Jose Villaveces, Max Planck Institute, Germany
Swanand Gore, EBI, United Kingdom
Alexander García, Florida State University, United States
Maria Martín, EBI, United Kingdom
Guillaume Launay, Lyon1 University, France
Rafael Alcántara, EBI, United Kingdom
Noemi Del Toro Ayllón, EBI, United Kingdom
Marine Dumousseau, EBI, United Kingdom
Sandra Orchard, EBI, United Kingdom
Sameer Velankar, EBI, United Kingdom
Henning Hermjakob, EBI, United Kingdom
Chenggong Zong, UCLA, United States
Peipei Ping, UCLA, United States
Rafael Jiménez, EBI, United Kingdom
Session Chair: Ivo Hofacker
This presentation first sets the scene for the problem: dynamic web visualization of bioinformatics data, which depends heavily on JavaScript, has so far lacked any coordination of development efforts. Available JavaScript applications are difficult to discover, develop, test, maintain, use, customize, extend or combine. BioJS provides a common specification to document, develop and register JavaScript graphical components in bioinformatics. Next, I will briefly describe how components are developed to comply with our purpose-defined implementation guidelines. Most of the remaining talk is a practical demonstration of representative functionality already available in the BioJS registry. Examples include a) the Sequence component, which visualizes protein sequences in FASTA format in a variety of ways, b) the GeneExpressionSummary component, which links genes to phenotypes, c) the ChEBICompound component and d) the InteractionTable component. To conclude, I briefly show the project portal, how to contribute to this effort and who is involved.
Keyword: Bioimaging & Data Visualization, Databases & Ontologies
PP23 (PT) - Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: Hall 14.2
Presenting author: Dina Hafez , Duke University, United States
Additional authors:Uwe Ohler, Max Delbrück Center for Molecular Medicine, Germany
Jun Zhu, National Institutes of Health
Ting Ni, National Institutes of Health
Sayan Mukherjee, Duke University, United States
Session Chair: Cenk Sahinalp
Motivation:
Pre-mRNA cleavage and polyadenylation is an essential step for 3' end maturation and for the subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage site (polyA site), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with varying 3'UTRs, thus affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered both by the lack of suitable data on the precise location of cleavage sites and by the lack of appropriate tests for determining APAs with significant differences across multiple libraries.
Results:
We applied a tailored paired-end RNA-seq protocol to specifically probe the position of polyA sites in three adult cell types. We specified a linear effects regression model to identify tissue-specific biases indicating regulated alternative polyadenylation; the significance of differences between cell types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual cell types. Predictive models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. We found that the main cis-regulatory elements described for polyadenylation are a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical PAS signal being nearly absent at brain-specific polyA sites. Together, our results contribute to the understanding of the diversity of post-transcriptional gene regulation.
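The abstract does not detail its permutation design. As a generic sketch only (a plain two-group difference-in-means test, not the authors' regression-based statistic), the idea is to shuffle library labels between cell types and ask how often a difference at least as large as the observed one arises by chance. The input values below are hypothetical fractions of reads using a proximal polyA site.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an add-one-corrected p-value estimated from random relabelings.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign libraries to the two cell types.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        if abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b)) >= observed:
            extreme += 1
    # The add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical proximal-site usage in four libraries per cell type.
cell_type_1 = [0.90, 0.85, 0.95, 0.88]
cell_type_2 = [0.10, 0.15, 0.05, 0.12]
print(permutation_test(cell_type_1, cell_type_2))
```

With only 4 + 4 libraries there are 70 distinct splits, so the smallest attainable p-value is about 2/70 however large the effect; this is one reason why the design of such tests across few libraries needs care.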
Keyword: Alternative polyadenylation, tissue-specific regulation, RNA-seq, predictive modeling
PP24 (HT) - Efficient Computation of Gene Tree Probability based on Coalescent Theory under Incomplete Lineage Sorting
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.Room: Hall 4/5
Presenting author: Yufeng Wu , University of Connecticut, United States
Session Chair: Russell Schwartz
Incomplete lineage sorting is a genealogical phenomenon caused by the inherent stochasticity of population genealogical processes. With incomplete lineage sorting, gene tree topologies may differ from the species tree topology and may thus cause difficulty in inferring species phylogeny or population evolutionary history. An established topic in incomplete lineage sorting is computing the probability (called the gene tree probability) of a gene tree topology for a given species tree based on coalescent theory. However, there has been no practical algorithm for computing the gene tree probability for large trees. In this talk, I will present an algorithm for computing the gene tree probability that is much faster than an existing algorithm and can be applied to larger trees. This new algorithm may thus be useful in large-scale phylogenetic studies.
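For intuition about what is being computed, the smallest non-trivial case has a well-known closed form: for species tree ((A,B),C) with internal branch length T in coalescent units, the gene tree matches the species tree with probability 1 - (2/3)e^(-T), and each of the two discordant topologies has probability (1/3)e^(-T). The sketch below evaluates only this three-taxon case; it is not the algorithm presented in the talk, and for larger trees the number of coalescent histories grows quickly, which is what makes efficient algorithms necessary.

```python
import math


def triplet_gene_tree_probabilities(internal_branch):
    """Rooted gene tree topology probabilities for three taxa A, B, C under
    the multispecies coalescent, given species tree ((A,B),C).

    internal_branch: length of the branch ancestral to A and B, in
    coalescent units.
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # The A and B lineages fail to coalesce on the internal branch with
    # probability exp(-T); all three pairs are then equally likely to
    # coalesce first in the ancestral population.
    discordant = math.exp(-internal_branch) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.0, 0.5, 2.0):
    print(t, triplet_gene_tree_probabilities(t))
```

At T = 0 all three topologies are equally likely, which is the extreme form of incomplete lineage sorting; as T grows, the concordant topology dominates.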
Keyword: Population Genomics, Evolution & Comparative Genomics
PP25 (PT) - Predicting protein contact map using evolutionary and physical constraints by integer programming
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.Room: Hall 7
Presenting author: Jinbo Xu , Toyota Technological Institute at Chicago, United States
Additional authors:Zhiyong Wang, Toyota Technological Institute at Chicago
Session Chair: Alex Bateman
Motivation. A protein contact map describes the pairwise spatial and functional relationships of residues in a protein and contains key information for protein 3D structure prediction. Although it has been studied extensively, predicting a contact map from sequence information alone remains very challenging. Most existing methods predict the contact map matrix element by element, ignoring correlations among contacts and the physical feasibility of the whole contact map. A couple of recent methods predict the contact map using mutual information (MI) while enforcing a sparsity restraint (i.e., the contact matrix should be very sparse), but these methods require a very large number of sequence homologs, and the resulting contact map may still be physically infeasible.
Results. This paper presents a novel method for contact map prediction that integrates both evolutionary and physical restraints by machine learning and integer linear programming (ILP). The evolutionary restraints are much more informative than MI, and the physical restraints specify more concrete relationships among contacts than the sparsity restraint. As such, our method greatly reduces the solution space of the contact map matrix and thus significantly improves prediction accuracy. Experimental results show that our method outperforms currently popular methods regardless of how many sequence homologs are available for the protein under consideration.
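The abstract does not give the ILP formulation. As a minimal sketch under stated assumptions, suppose each candidate residue pair (i, j) already has a predicted log-odds score from the machine-learning step, and the only physical restraint is a hypothetical cap on the number of contacts per residue. The 0/1 program below maximizes the summed score of the selected contacts subject to that cap. It is solved by brute-force enumeration, which is only viable for a handful of pairs; a real implementation would pass the same model to an ILP solver.

```python
from itertools import product


def solve_contact_ilp(scores, max_contacts):
    """Exactly solve a toy 0/1 contact-selection program by enumeration.

    scores: dict mapping residue pair (i, j) -> predicted log-odds of contact
    max_contacts: hypothetical physical restraint, max contacts per residue
    Returns (set of selected pairs, total score).
    """
    pairs = list(scores)
    best_value, best_set = 0.0, set()  # selecting nothing is always feasible
    for assignment in product((0, 1), repeat=len(pairs)):
        chosen = [pair for pair, x in zip(pairs, assignment) if x]
        degree = {}
        for i, j in chosen:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        if any(d > max_contacts for d in degree.values()):
            continue  # infeasible: violates the per-residue restraint
        value = sum(scores[pair] for pair in chosen)
        if value > best_value:
            best_value, best_set = value, set(chosen)
    return best_set, best_value
```

With scores {(0, 6): 2.0, (0, 10): 1.5, (6, 12): 1.0, (10, 16): -0.5} and at most one contact per residue, a greedy pick would take (0, 6) first and stop at 2.0, while the exact solution chooses (0, 10) and (6, 12) for 2.5. Solving for the whole map at once lets the restraint change which contacts are predicted.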
Keyword: Protein contact map prediction, integer programming, physical constraint, evolutio
PP26 (HT) - Interpreting Personal Transcriptomes: Personalized Mechanism-Scale Profiling Predicts Survival in Oral, Prostate, Lung and Gastric Cancers
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: Hall 14.2
Presenting author: Yves Lussier , The University of Illinois, United States
Additional authors: Xinan Yang, The University of Chicago, United States
Kelly Regan, Ohio State University, United States
Yong Huang, The University of Chicago, United States
Jianrong Li, The University of Illinois at Chicago, United States
Ezra Cohen, The University of Chicago, United States
Tanguy Zeiwert, The University of Chicago, United States
Session Chair: Serafim Batzoglou
Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic interpretation of expression arrays remains an unmet challenge. We developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional-Analysis-of-Individual-Microarray-Expression, computes mechanism scores using the rank-weighted gene expression of an individual sample. In oral squamous cell carcinoma (OSCC) samples, the overlap of “Oncogenic Mechanisms of OSCC” (deregulated FAIME-derived scores of pathways and biological functions) accurately discriminates clinical samples in two additional datasets (n=35;91, F-accuracy=100%;97%, p<0.001) and predicts patients’ survival in two studies (p=0.0018;p=0.032). Previous approaches that depend on assigning individual samples to groups before selecting features or learning a classifier are limited by design to discrete-class prediction. FAIME is more amenable to clinical deployment because it translates the gene-level measurements of each sample into pathway and molecular function profiles that can be used to analyze continuous phenotypes (e.g., survival time).
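FAIME's exact weighting function is not stated in the abstract. The sketch below only illustrates the general idea of a rank-weighted, single-sample mechanism score under an assumed exponential rank-decay weight: genes are ranked by expression within one sample, and the score is the mean weight of the genes in a pathway minus the mean weight of all other genes. The function name and the weighting are assumptions, not the published definition.

```python
import math


def mechanism_score(expression, gene_set):
    """Single-sample, rank-weighted score for one gene set (hypothetical weighting).

    expression: dict gene -> expression value for ONE sample.
    Genes are ranked from highest to lowest expression and weighted by
    exp(-rank / n), an assumed decay; the score is the mean weight inside the
    gene set minus the mean weight outside it.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)  # rank 0 = highest
    n = len(ranked)
    weight = {gene: math.exp(-rank / n) for rank, gene in enumerate(ranked)}
    inside = [weight[g] for g in ranked if g in gene_set]
    outside = [weight[g] for g in ranked if g not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set must contain some, but not all, measured genes")
    return sum(inside) / len(inside) - sum(outside) / len(outside)
```

A positive score means the pathway's genes sit higher in this sample's expression ranking than the rest. Because each sample yields its own score, the scores can feed directly into continuous-outcome analyses such as survival models.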
Keyword: Applied Bioinformatics, Databases & Ontologies
PP27 (HT) - Deconvolution of targeted protein-protein interaction maps
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.
Room: ICC Lounge 81
Presenting author: Alexey Stukalov , CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Austria
Session Chair: Hagit Shatkay
Guided by current knowledge of the modular structure of protein complexes, we propose BI-MAP, a novel statistical approach for analyzing targeted, medium-scale affinity purification-mass spectrometry (AP-MS) datasets. It allows protein modules, i.e., groups of strongly interacting proteins that are shared by multiple complexes, to be identified with confidence. We show that BI-MAP can be applied to datasets ranging from small, very detailed maps to large, sparse and much noisier ones. In the latter case, analysis of the inferred posterior distribution helps identify robust components that frequently recur in the most probable data models. Detailed performance analysis shows that BI-MAP clearly outperforms alternative algorithms addressing the same problem. A new graphical grammar representing the inferred modules and their interactions provides a convenient visual representation of the very complex underlying data and facilitates its interpretation by biologists. BI-MAP is open source, with exports to R, Cytoscape and GraphML.
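The abstract mentions analyzing the posterior distribution to find components that recur across the most probable models. One generic way to do this, sketched here as an assumption rather than BI-MAP's actual procedure, is to take sampled module assignments (for example, from MCMC) and count how often each pair of proteins falls into the same module. Pairs above a chosen frequency threshold are then treated as robust. The threshold of 0.5 is arbitrary.

```python
from collections import Counter
from itertools import combinations


def co_membership(samples):
    """Fraction of posterior samples in which each protein pair shares a module.

    samples: list of partitions; each partition is a list of modules (sets of protein IDs).
    """
    counts = Counter()
    for partition in samples:
        for module in partition:
            for pair in combinations(sorted(module), 2):
                counts[pair] += 1
    return {pair: n / len(samples) for pair, n in counts.items()}


def robust_pairs(samples, threshold=0.5):
    """Protein pairs co-clustered in at least `threshold` of the samples."""
    return {pair for pair, f in co_membership(samples).items() if f >= threshold}
```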
Keyword: Protein Interactions & Molecular Networks, Mass Spectrometry & Proteomics
PP28 (PT) - IBD-Groupon: An Efficient Method for Detecting Group-wise Identity-by-Descent regions simultaneously in Multiple Individuals based on Pairwise IBD relationships
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 4/5
Presenting author: Dan He , IBM T.J. Watson, United States
Session Chair: Russell Schwartz
Detecting identity-by-descent (IBD) is an important problem in genetics. Most existing methods focus on detecting pairwise IBDs, which have relatively low power to detect short IBDs. Methods that detect IBDs among multiple individuals simultaneously, or group-wise IBDs, perform better at detecting short IBDs. Moreover, group-wise IBDs have a wide range of applications, such as disease mapping and pedigree reconstruction. The existing group-wise IBD detection method is computationally inefficient and can only handle small data sets, such as 20 or 30 individuals with hundreds of SNPs. It also requires the number of IBD groups to be specified in advance, which may not be realistic in many cases, and because of scalability issues it can only handle a small number of IBD groups, such as two or three. Furthermore, it does not take linkage disequilibrium (LD) into consideration. In this work, we developed a very efficient method, IBD-Groupon, which detects group-wise IBDs based on pairwise IBD relationships and addresses all of the drawbacks mentioned above. To our knowledge, our method is the first group-wise IBD detection method that scales to very large data sets, for example hundreds of individuals with thousands of SNPs, while retaining the power to detect short IBDs. Our method does not require the number of IBD groups to be specified; the groups are detected automatically. It also takes LD into consideration, because it is based on pairwise IBDs, into which LD can easily be incorporated.
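The abstract says group-wise IBDs are derived from pairwise IBD relationships but does not give the procedure. One simple illustration, which may differ from IBD-Groupon's actual algorithm, treats individuals as nodes, connects each pair called IBD in a given genomic window, and reports maximal cliques (found here with the Bron-Kerbosch algorithm) as candidate groups sharing a segment. The input format and the minimum group size are assumptions.

```python
def maximal_cliques(adjacency):
    """Bron-Kerbosch enumeration (without pivoting) of the maximal cliques of a graph.

    adjacency: dict node -> set of neighbouring nodes (undirected).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        for v in list(p):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques


def ibd_groups(pairwise_ibd, min_size=3):
    """Candidate group-wise IBD sets in one window from pairwise IBD calls (hypothetical input)."""
    adjacency = {}
    for a, b in pairwise_ibd:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return [c for c in maximal_cliques(adjacency) if len(c) >= min_size]
```

With pairwise calls i1-i2, i2-i3, i1-i3 and i3-i4, individuals i1, i2 and i3 form one group. The pair (i3, i4) is a maximal clique of size two and is dropped by the default minimum group size of three.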
Keyword: Identity-by-Descent, HMM, MCMC, group-wise IBD
PP29 (PT) - ThreaDom: Extracting Protein Domain Boundary Information from Multiple Threading Alignments
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 7
Presenting author: Zhidong Xue , University of Michigan, United States
Additional authors: Dong Xu, University of Michigan
Yan Wang, University of Michigan
Yang Zhang, University of Michigan
Session Chair: Alex Bateman
Motivation: Protein domains are subunits that can fold and function independently. Identification of domain boundary locations is often the first step in protein folding and function annotation. Most current methods deduce domain boundaries by sequence-based analysis, which has low accuracy. There is no efficient method for predicting discontinuous domains, which consist of segments that are separated in sequence. Since template-based methods are the most efficient for protein 3D structure modeling, combining information from multiple threading alignments should increase the accuracy and reliability of computational domain prediction.
Results: We developed a new domain predictor, ThreaDom, which deduces protein domain boundary locations from multiple threading alignments. The core of the method is a domain conservation score that combines composite information from template domain structures and from terminal and internal alignment gaps. Tested on 630 non-redundant sequences without using homologous templates, ThreaDom generates correct single- and multi-domain classifications in 81% of cases, with 78% having the domain linker location assigned within 20 residues. In a second test on 486 proteins with discontinuous domains, ThreaDom achieves an average precision of 84% and a recall of 65% in domain boundary prediction. Finally, on 56 targets from CASP8, ThreaDom achieved domain overlap rates of 73%, 87% and 85% with the target structures for Free Modeling, hard multi-domain and discontinuous-domain proteins, respectively, which are significantly higher than those of most domain predictors in the CASP8 experiment.
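The domain conservation score is not defined in detail in the abstract. A toy version, under a hypothetical per-position encoding of each threading alignment ('D' inside a template domain, 'B' at a template domain boundary or linker, '-' for a gap), scores each query position by the fraction of alignments that place it inside a template domain. Runs of low-scoring positions are then candidate domain linkers. The encoding and the 0.5 threshold are assumptions, not ThreaDom's definition.

```python
def domain_conservation(alignments):
    """Per-position fraction of alignments that place the query residue inside a template domain.

    Hypothetical encoding, one character per query position:
    'D' = aligned inside a template domain, 'B' = aligned to a template
    domain boundary or linker, '-' = alignment gap.
    """
    length = len(alignments[0])
    return [sum(aln[pos] == "D" for aln in alignments) / len(alignments)
            for pos in range(length)]


def low_conservation_segments(scores, threshold=0.5):
    """Contiguous (start, end) runs of positions scoring below threshold: candidate linkers."""
    segments, start = [], None
    for pos, score in enumerate(scores):
        if score < threshold and start is None:
            start = pos
        elif score >= threshold and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments
```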
Keyword: Protein structure prediction, Protein domain prediction, Threading, CASP experim
PP30 (HT) - Compressive Genomics
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: Hall 14.2
Presenting author: Michael Baym , Harvard Medical School, United States
Additional authors: Po-Ru Loh, MIT, United States
Bonnie Berger, MIT, United States
Session Chair: Serafim Batzoglou
The past two decades have seen an exponential increase in sequencing capabilities, outstripping advances in computing power. Extracting new insights from the datasets currently being generated will require not only faster computers but also smarter algorithms. However, most genomes currently sequenced are highly similar to ones already collected, so the amount of novel sequence information is growing much more slowly. We show that this redundancy can be exploited by compressing the data in a way that allows computation directly on the compressed data. This approach reduces the computational task of operating on many similar genomes to slightly more than that of operating on just one. Moreover, its relative advantage over existing algorithms grows as future genomic data accumulate. We demonstrate this compressive architecture by implementing versions of both BLAST and BLAT, and emphasize how compressive genomics, more generally, will enable biologists to keep pace with current data.
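The published approach uses similarity-based compression with a coarse search followed by a fine search, which this sketch does not reproduce. As a deliberately simplified illustration of computing on redundant data, the code below splits genomes into fixed-size blocks, stores each distinct block once with links to every place it occurs, and answers an exact-match query by scanning only the distinct blocks and expanding hits through the links. The block size is an arbitrary assumption, and matches that span a block boundary are missed.

```python
def compress(genomes, block=4):
    """Store each distinct fixed-size block once, with links to every (genome, offset) where it occurs."""
    unique = {}
    for name, seq in genomes.items():
        for offset in range(0, len(seq), block):
            unique.setdefault(seq[offset:offset + block], []).append((name, offset))
    return unique


def search(unique, query):
    """Exact-match search that scans each distinct block once, then expands hits via the links.

    Matches spanning a block boundary are not found in this simplified scheme.
    """
    hits = []
    for chunk, links in unique.items():
        idx = chunk.find(query)
        while idx != -1:
            hits.extend((name, offset + idx) for name, offset in links)
            idx = chunk.find(query, idx + 1)
    return sorted(hits)
```

For two 12-base genomes that differ only in their last base, six blocks collapse to three distinct ones. A search for "CG" scans the shared block "ACGT" once and reports four hits through its links. The work grows with the number of distinct blocks rather than the number of genomes, which is the scaling argument made in the abstract.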
Keyword: Sequence Analysis, Databases & Ontologies
PP31 (PT) - Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.
Room: ICC Lounge 81
Presenting author: Carlo Vittorio Cannistraci , King Abdullah University of Science and Technology, Saudi Arabia
Additional authors: Gregorio Alanis-Lobato, King Abdullah University of Science and Technology
Timothy Ravasi, King Abdullah University of Science and Technology
Session Chair: Hagit Shatkay
Motivation: Most functions within the cell emerge from protein-protein interactions (PPIs), yet determining them experimentally is both expensive and time consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using PPI network topology alone (topological prediction) is difficult but essential when biological prior knowledge is absent or unreliable.
Methods: Network embedding emphasizes relations between network proteins embedded in a low-dimensional space, where protein pairs closer to each other represent potential candidate interactions. Network denoising, which boosts prediction performance, is achieved here by minimum curvilinear embedding (MCE), combined with the shortest path (SP) in the reduced space for assigning likelihood scores to candidate interactions. Furthermore, we introduce: (i) a new valid variant of MCE named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; (iii) two new randomised procedures for prediction evaluation.
Results: We compared our method against several unsupervised and supervised embedding approaches and node-neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming current methods for topological link prediction.
Conclusion: Minimum curvilinearity is a valuable nonlinear framework, which we successfully applied to the embedding of protein networks for unsupervised prediction of novel PPIs. The rationale is that biological and evolutionary prior information is imprinted in the nonlinear patterns hidden behind protein network topology and can be exploited to predict new protein links. The predicted PPIs are good candidates to test in high-throughput experiments or to exploit in systems biology tools, such as those used for network-based inference and prediction of disease-related functional modules.
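In minimum curvilinearity, the distance between two nodes is measured along a minimum spanning tree (MST) rather than directly. The sketch below builds an MST with Prim's algorithm and then accumulates path lengths from a source node along the tree. How the network edges are weighted before this step, and the subsequent embedding and shortest-path scoring, are not shown. The edge weights in the example are hypothetical.

```python
import heapq


def minimum_spanning_tree(weights):
    """Prim's algorithm on an undirected, connected graph.

    weights: dict node -> dict neighbour -> edge weight (assumed symmetric).
    Returns the MST in the same adjacency-dict format.
    """
    start = next(iter(weights))
    in_tree = {start}
    tree = {node: {} for node in weights}
    heap = [(w, start, v) for v, w in weights[start].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(weights):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # edge would close a cycle
        in_tree.add(v)
        tree[u][v] = tree[v][u] = w
        for nxt, w_next in weights[v].items():
            if nxt not in in_tree:
                heapq.heappush(heap, (w_next, v, nxt))
    return tree


def curvilinear_distances(tree, source):
    """Distance from source to every node measured along the tree (minimum curvilinear distance)."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u].items():
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return dist
```

In the example used for testing, the direct A-D weight is 5, but the minimum curvilinear distance from A to D is 3, because the MST routes through B and C.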
Keyword: Network topology, network embedding, topological prediction, nonline
PP32 (PT) - Inference of historical migration rates via haplotype sharing
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 4/5
Presenting author: Pier Francesco Palamara , Columbia University, United States
Additional authors: Itsik Pe'er, Columbia University, United States
Session Chair: Russell Schwartz
Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent (IBD). Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and they can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions have been studied in the context of demographic inference and used to reconstruct recent demographic events in several populations. Here, we extend this framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formalism in several demographic scenarios and provide a freely available software tool for demographic inference.
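Long shared haplotypes come from recent common ancestors, so migration shapes haplotype sharing through the time to the most recent common ancestor. As a toy building block, not the authors' formalism, the code below tracks two lineages backward in time in a symmetric two-deme island model. It assumes each deme holds n haploid lineages (so two lineages in the same deme coalesce with probability 1/n per generation) and that each lineage switches deme with probability m per generation, with migration applied before coalescence. It returns the probability that coalescence happens in each generation.

```python
def coalescence_time_pmf(n, m, start_same_deme, generations):
    """P(two lineages coalesce in generation g), g = 1..generations, in a symmetric two-deme model.

    Assumptions (hypothetical toy model): each deme holds n haploid lineages, each
    lineage moves to the other deme with probability m per generation, migration is
    applied before coalescence, and lineages in the same deme coalesce with probability 1/n.
    """
    p_same, p_diff = (1.0, 0.0) if start_same_deme else (0.0, 1.0)
    keep = (1 - m) ** 2 + m ** 2  # both stay or both move: configuration unchanged
    flip = 2 * m * (1 - m)        # exactly one lineage moves: configuration flips
    pmf = []
    for _ in range(generations):
        same = p_same * keep + p_diff * flip
        diff = p_same * flip + p_diff * keep
        coalesced = same / n
        pmf.append(coalesced)
        p_same, p_diff = same - coalesced, diff
    return pmf
```

Comparing the output for lineages sampled in the same deme with that for lineages sampled in different demes shows how limited migration delays coalescence between demes. Under this model, that delay means fewer long shared haplotypes between demes.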
Keyword: Population genetics, Demographic inference, Identity by descent, Haplot
PP33 (PT) - Protein Threading Using Context-Specific Alignment Potential
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 7
Presenting author: Sheng Wang, Toyota Technological Institute at Chicago, United States
Additional authors: Jinbo Xu, Toyota Technological Institute at Chicago, United States
Feng Zhao, Toyota Technological Institute at Chicago, United States
Jianzhu Ma, Toyota Technological Institute at Chicago, United States
Session Chair: Alex Bateman
Motivation: Template-based modeling (TBM), including homology modeling and protein threading, is the most reliable method for protein 3D structure prediction. However, alignment errors and template selection remain the main bottlenecks for current TBM methods, especially when the proteins under consideration are distantly related.
Results: We present a novel context-specific alignment potential for protein threading, covering both alignment and template selection. Our alignment potential measures the log odds ratio of an alignment being generated from two related proteins versus from two unrelated proteins, integrating both local and global context-specific information. The local alignment potential quantifies how well a sequence residue can be aligned to a template residue based on context-specific information about the residues. The global alignment potential quantifies how well two sequence residues can be placed into two template positions at a given distance, again based on context-specific information. By accounting for correlation among a variety of protein features and making use of context-specific information, our alignment potential is much more sensitive than the widely used context-independent or profile-based scoring functions. Experimental results confirm that our method generates significantly better alignments and threading results than the best profile-based methods on several very large benchmarks. Our method works particularly well for distantly related proteins or proteins with sparse sequence profiles, owing to its effective integration of context-specific, structural and global information.
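The following is a minimal sketch of the local term only. Given hypothetical probability tables for observing a (query context, template context) pair in alignments of related versus unrelated proteins, the local potential is their log ratio, and an alignment's score sums that potential over its aligned pairs. Here a "context" is just residue type plus secondary structure, which is an assumption. The distance-dependent global term and the real feature set are not modeled.

```python
import math


def log_odds_potential(p_related, p_unrelated):
    """Local potential: log P(pair | related proteins) / P(pair | unrelated proteins).

    Keys are (query context, template context) tuples; both tables are hypothetical.
    """
    return {key: math.log(p_related[key] / p_unrelated[key]) for key in p_related}


def alignment_score(aligned_pairs, potential):
    """Sum the local potential over aligned (query context, template context) pairs."""
    return sum(potential[pair] for pair in aligned_pairs)
```

A pair that is four times more likely in related proteins contributes log 4. A pair equally likely under both hypotheses contributes 0, so it neither rewards nor penalizes the alignment.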
Keyword: Protein Threading, Alignment Potential, Protein Pairwise Information
PP34 (PT) - Predicting Drug-Target Interactions Using Restricted Boltzmann Machines
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: Hall 14.2
Presenting author: Jianyang Zeng, Tsinghua University, China
Additional authors: Yuhao Wang, Tsinghua University, China
Session Chair: Serafim Batzoglou
Motivation:
In silico prediction of drug-target interactions plays an important role in identifying and developing new uses for existing or abandoned drugs. Network-based approaches have recently become a popular tool for discovering new drug-target interactions. Unfortunately, most of these approaches can only predict binary interactions between drugs and targets, and information about different types of interactions has not been well exploited for drug-target interaction prediction in previous studies. Yet incorporating additional information about drug-target relationships or drug modes of action can improve the prediction of drug-target interactions. Furthermore, the predicted types of drug-target interactions can broaden our understanding of the molecular basis of drug action.
Results:
We propose the first machine learning approach to integrate multiple types of drug-target interactions and predict unknown drug-target relationships or drug modes of action. We cast the drug-target interaction prediction problem as a two-layer graphical model, called a restricted Boltzmann machine (RBM), and apply a practical learning algorithm to train the model and make predictions. Tests on two public databases show that our RBM model can effectively capture the latent features of a drug-target interaction network and achieves excellent performance in predicting different types of drug-target interactions, with an area under the precision-recall curve (AUPR) of up to 89.6. In addition, we demonstrate that integrating multiple types of drug-target interactions significantly outperforms predictions made either by simply mixing multiple types of interactions without distinction or by using only a single interaction type. Further tests show that our approach can infer a high fraction of novel drug-target interactions that have been validated by experiments reported in the literature or in other databases. These results indicate that our approach is of high practical relevance to drug-target interaction prediction and drug repositioning and can hence advance the drug discovery process.
Availability: Software and datasets are available upon request.
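The code below sketches a generic binary RBM trained with one-step contrastive divergence (CD-1) in plain Python. The data layout is an assumption: each training vector describes one drug, with one visible unit per (target, interaction type) pair, set to 1 when that interaction is known. After training, the reconstruction probability of an unobserved unit can serve as a score for a candidate interaction. The hidden-layer size, learning rate and epoch count are arbitrary. This is not the authors' exact model or training procedure.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (generic sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # mean-field reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])

    def reconstruct(self, v):
        # Probability of each visible unit being on, given v.
        return self.visible_probs(self.hidden_probs(v))
```

For instance, after training on a few drug vectors, reconstructing a vector with a known interaction removed gives a probability for that unit. That probability can be read as the model's score for the missing interaction.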
Keyword: Drug-Target Interaction, Drug Repositioning, Restricted Boltzmann Machine,
PP35 (PT) - Supervised de novo reconstruction of metabolic pathways from metabolome-scale compound sets
Date: Monday, July 22, 11:30 a.m. - 11:55 a.m.
Room: ICC Lounge 81
Presenting author: Masaaki Kotera , Kyoto University, Japan
Additional authors: Yasuo Tabei,
Yoshihiro Yamanishi, Kyushu University, Japan
Toshiaki Tokimatsu, Kyoto University, Japan
Susumu Goto, Kyoto University, Japan
Session Chair: Hagit Shatkay
Motivation: Metabolic pathways are important biochemical reaction networks involving enzymatic reactions among chemical compounds. However, a large number of metabolic pathways are assumed to remain unknown, and many reactions are still missing even from known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge metabolic gaps.
Results: In this paper, we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as “enzymatic-reaction likeness”, i.e., whether or not a pair of compounds can be converted into each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns, and in its large-scale applicability owing to its computational efficiency. We demonstrate the usefulness of the proposed method on the de novo reconstruction of 134 metabolic pathways in KEGG. Our comprehensively predicted reaction networks of 15,698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics.
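The sketch below shows one way to set up such a classifier. The pair encoding is hypothetical: bits present in exactly one compound stand for changed substructures, and bits present in both stand for conserved ones, which keeps the feature vector symmetric in the two compounds. The sparsity-inducing classifier shown is L1-regularized logistic regression trained by proximal gradient descent (ISTA). This is one common choice; the abstract does not say which classifier the authors used. Fingerprints are sets of bit indices, and the regularization and step sizes are arbitrary.

```python
import math


def pair_features(fp_a, fp_b, n_bits):
    """Symmetric feature vector for a compound pair from fingerprint bit sets (hypothetical encoding).

    First block: bits present in exactly one compound (substructures changed).
    Second block: bits present in both compounds (substructures conserved).
    """
    changed = [1.0 if (k in fp_a) != (k in fp_b) else 0.0 for k in range(n_bits)]
    shared = [1.0 if (k in fp_a and k in fp_b) else 0.0 for k in range(n_bits)]
    return changed + shared


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, t):
    # Proximal operator of t * |w|: shrinks toward zero, producing exact zeros.
    return math.copysign(max(abs(value) - t, 0.0), value)


def train_l1_logistic(X, y, lam=0.01, lr=0.5, iters=5000):
    """L1-regularized logistic regression by proximal gradient descent (ISTA)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        p = [sigmoid(b + sum(wk * xk for wk, xk in zip(w, x))) for x in X]
        grad_w = [sum((p[i] - y[i]) * X[i][k] for i in range(n)) / n for k in range(d)]
        grad_b = sum(p[i] - y[i] for i in range(n)) / n
        w = [soft_threshold(w[k] - lr * grad_w[k], lr * lam) for k in range(d)]
        b -= lr * grad_b  # the intercept is not penalized
    return w, b


def predict(w, b, x):
    """Predicted probability that a compound pair is related by an enzymatic reaction."""
    return sigmoid(b + sum(wk * xk for wk, xk in zip(w, x)))
```

The L1 penalty drives uninformative transformation-pattern weights to exactly zero. The surviving nonzero weights then point to the substructure changes that the classifier associates with "enzymatic-reaction likeness".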
Keyword: Metabolic network, de novo metabolic pathway reconstruction, enzymati
PP36 (PT) - Efficient network-guided multi-locus association mapping with graph cuts
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 4/5
Presenting author: Chloé-Agathe Azencott , Max-Planck-Institutes Tübingen, Germany
Additional authors: Dominik Grimm, Max-Planck-Institutes Tübingen, Germany
Mahito Sugiyama, Max-Planck-Institutes Tübingen, Germany
Yoshinobu Kawahara, Osaka University, Japan
Karsten Borgwardt, Max-Planck-Institutes Tübingen, Germany
Session Chair: Russell Schwartz
As an increasing number of genome-wide association studies reveal the limitations of attempting to explain phenotypic heritability by single genetic loci, there has been a recent focus on associating complex phenotypes with sets of genetic loci. While several methods for multi-locus mapping have been proposed, it is often unclear how to relate the detected loci to the growing knowledge about gene pathways and networks. The few methods that take biological pathways or networks into account are either restricted to investigating a limited number of predetermined sets of loci or do not scale to genome-wide settings.
We present SConES, a new efficient method to discover sets of genetic loci that are maximally associated with a phenotype, while being connected in an underlying network. Our approach is based on a minimum cut reformulation of the problem of selecting features under sparsity and connectivity constraints, which can be solved exactly and rapidly.
SConES outperforms state-of-the-art competitors in terms of runtime, scales to hundreds of thousands of genetic loci, and exhibits higher power than other methods in detecting causal SNPs in simulation studies. On flowering time phenotypes and genotypes from Arabidopsis thaliana, SConES detects loci that enable accurate phenotype prediction and that are supported by the literature.
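The minimum cut reformulation can be sketched as follows, assuming the usual form of such objectives (the exact SConES objective is an assumption here). The goal is to choose a set S of loci that maximizes the sum over S of (c_i - eta), minus lam times the total weight of network edges cut by S. Here c_i is a per-locus association score, eta penalizes set size and lam penalizes disconnection. In the flow network:
- each locus with c_i - eta > 0 gets an edge from the source with that capacity;
- each locus with c_i - eta < 0 gets an edge to the sink with capacity eta - c_i;
- each network edge gets capacity lam * w in both directions.

The source side of a minimum cut is then an optimal S. Max flow here uses Edmonds-Karp, and the scores and parameters in the example are made up.

```python
from collections import deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns nodes reachable from source in the final residual graph."""
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u in list(residual):
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0.0)  # reverse edges
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # source side of a minimum cut
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def select_loci(scores, edges, eta, lam):
    """Maximize sum_{i in S}(c_i - eta) - lam * (weight of network edges cut by S) via min cut."""
    source, sink = "_source", "_sink"
    capacity = {source: {}, sink: {}}
    for locus, c in scores.items():
        capacity.setdefault(locus, {})
        gain = c - eta
        if gain > 0:
            capacity[source][locus] = gain
        elif gain < 0:
            capacity[locus][sink] = -gain
    for a, b, w in edges:
        capacity[a][b] = capacity[a].get(b, 0.0) + lam * w
        capacity[b][a] = capacity[b].get(a, 0.0) + lam * w
    return min_cut_source_side(capacity, source, sink) - {source}
```

On a chain a-b-c where b is only weakly associated, a strong connectivity penalty (lam = 1.5) pulls b into the solution, whereas a weak penalty (lam = 0.1) selects only a and c.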
Keyword: Feature selection, statistical genetics, network biology, graph minin
PP37 (HT) - The role of proteins encoded by chimeric RNAs in eukaryotes
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 7
Presenting author: Milana Frenkel-Morgenstern , Spanish National Cancer Research Centre (CNIO), Spain
Additional authors: Alfonso Valencia, Spanish National Cancer Research Centre (CNIO), Spain
Session Chair: Alex Bateman
Chimeric RNAs of two or more genes are distinct from conventional alternatively spliced isoforms because they result from trans-splicing of pre-mRNAs or from gene fusion following translocations. Only a limited number of chimeric transcripts and their associated proteins have been characterized; most result from chromosomal translocations and are associated with cancers. It is therefore important to extend these observations, to catalog the chimeric transcripts expressed in different types of cancers, and to study the potential functions of their corresponding chimeric proteins, including the alterations they produce in protein-protein interaction networks. Indeed, we have already found evidence that chimeric transcripts are translated into functional chimeric proteins, that these proteins can change the cellular localization of their parental proteins, and that they can be identified in cancer patients using specific and unique peptides. Finally, we collected the chimeric transcripts of human, mouse and fly in the ChiTaRS database to study the evolutionary conservation of chimeras.
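Identifying chimeric proteins from unique peptides relies on peptides that span the fusion junction and occur in neither parental protein. The small sketch below enumerates such length-k junction peptides. The sequences and the junction, defined by a cut position in each parent, are hypothetical, and real workflows would also account for proteolytic cleavage rules and the full proteome rather than just the two parents.

```python
def junction_peptides(parent_a, parent_b, cut_a, cut_b, k):
    """Length-k peptides spanning the fusion of parent_a[:cut_a] and parent_b[cut_b:]
    that occur in neither parental protein."""
    chimera = parent_a[:cut_a] + parent_b[cut_b:]
    first = max(0, cut_a - k + 1)            # earliest start whose peptide reaches parent_b's part
    last = min(cut_a - 1, len(chimera) - k)  # latest start that still includes a parent_a residue
    peptides = []
    for start in range(first, last + 1):
        peptide = chimera[start:start + k]
        if peptide not in parent_a and peptide not in parent_b:
            peptides.append(peptide)
    return peptides
```

A junction peptide that happens to occur in a parent (for example "YL" when the second parent already contains it) is discarded, because observing it would not prove that the chimeric protein is present.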
Keyword: Evolution & Comparative Genomics, Sequence Analysis
PP38 (HT) - Navigating chemical and biological space – in search of novel pharmaceuticals
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: Hall 14.2
Presenting author: Paula Petrone , Hoffmann-La Roche, Switzerland
Additional authors: Ben Simms, Novartis NIBR, United States
Anne Mai Wassermann, Novartis NIBR, United States
Eugen Lounkine, Novartis NIBR, United States
Peter Kutchukian, Novartis NIBR, United States
Paul Selzer, Novartis NIBR, United States
Florian Nigsch, Novartis NIBR, United States
Jeremy Jenkins, Novartis NIBR, United States
Allen Cornett, Novartis NIBR, United States
Zhan Deng, Novartis NIBR, United States
John W Davies, Novartis NIBR, United States
Session Chair: Serafim Batzoglou
Typically, virtual screening of compound libraries is based on the assumption that structurally similar compounds are likely to share similar properties and bind to the same group of proteins. This model often fails due to the rugged nature of the activity landscape. Furthermore, similarity in chemical space cannot explain the activity of compounds against a specific pathway or group of pathways. Compounds that induce similar phenotypes yet are structurally diverse are therefore often overlooked in automated searches. Our alternative perspective on virtual screening and library design is based solely on the interactions of compounds with the proteome. Ligands may be quantitatively grouped by the biological closeness of their targets by means of their biological fingerprints. We study similarity and diversity in biological space as necessary ingredients of compound screening libraries. We demonstrate here how compound-target interaction networks can be steered to find novel and biologically relevant chemical matter.
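One common way to compare biological fingerprints, assumed here rather than taken from the abstract, is Tanimoto similarity between the sets of targets (or assays) in which each compound is active. For library design, a MaxMin picker can then favor biological diversity: it repeatedly adds the compound whose smallest distance (1 - similarity) to the already chosen compounds is largest. Compound and target names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of active targets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def maxmin_pick(fingerprints, n_pick, first):
    """Greedy MaxMin selection for biological diversity.

    fingerprints: dict compound -> set of targets it is active against (hypothetical data).
    """
    picked = [first]
    while len(picked) < n_pick:
        candidates = [c for c in fingerprints if c not in picked]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda c: min(1.0 - tanimoto(fingerprints[c], fingerprints[p])
                                     for p in picked))
        picked.append(best)
    return picked
```

Starting from c1, the picker skips c2, whose target profile nearly duplicates c1's, and chooses c3, which hits an unrelated target. Structurally dissimilar compounds with similar biological fingerprints would, conversely, score as near neighbours here.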
Keyword: Protein Interactions & Molecular Networks, other
PP39 (PT) - A framework for scalable parameter estimation of gene circuit models using structural information
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.
Room: ICC Lounge 81
Presenting author: Xin Gao , King Abdullah University of Science and Technology, Saudi Arabia
Additional authors: Ming Fan, King Abdullah University of Science and Technology, Saudi Arabia
Suojin Wang, Texas A&M University, United States
Hiroyuki Kuwahara, King Abdullah University of Science and Technology, Saudi Arabia
Session Chair: Hagit Shatkay
Motivation:
Systematic and scalable parameter estimation is key to constructing complex gene regulatory models and, ultimately, to facilitating an integrative systems biology approach to quantitatively understanding the molecular mechanisms underpinning gene regulation.
Results:
Here, we report a novel framework for efficient and scalable parameter estimation that focuses specifically on modeling of gene circuits.
Exploiting the structure commonly found in gene circuit models, this framework decomposes a system of coupled rate equations into individual equations and efficiently integrates them separately to reconstruct the mean time evolution of the gene products. The parameter estimates are refined by iteratively increasing the accuracy of numerical integration using the model structure. As a case study, we applied our framework to four gene circuit models with complex dynamics, using three synthetic data sets and one time-series microarray data set. We compared our framework to three state-of-the-art parameter estimation methods and found that our approach consistently and efficiently generated higher-quality parameter solutions.
While many general-purpose parameter estimation methods have been applied to modeling gene circuits, our results suggest that more tailored approaches that exploit domain-specific information may be key to reverse-engineering complex biological systems.
Availability:
Website: http://sfb.kaust.edu.sa/Pages/Software.aspx
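One common way to decouple coupled rate equations, assumed here and not necessarily the framework's exact scheme, is to integrate each species' equation on its own while reading the other species' levels from the observed time series (linearly interpolated). The sketch fits a single synthesis rate k for a hypothetical gene X activated by Y, dx/dt = k*y/(K + y) - d*x. It uses forward-Euler integration and a grid search over candidate values of k. The framework's iterative refinement of integration accuracy is not shown.

```python
def interpolate(times, values, t):
    """Linear interpolation of an observed time series at time t."""
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            frac = (t - times[i]) / (times[i + 1] - times[i])
            return values[i] + frac * (values[i + 1] - values[i])
    return values[-1]


def simulate_decoupled(params, x0, times, y_obs, dt=0.01):
    """Forward-Euler integration of dx/dt = k*y/(K + y) - d*x alone, with y read from data."""
    k, d, K = params
    x, trajectory = x0, [x0]
    for i in range(len(times) - 1):
        n_steps = max(1, round((times[i + 1] - times[i]) / dt))
        h = (times[i + 1] - times[i]) / n_steps
        for s in range(n_steps):
            y = interpolate(times, y_obs, times[i] + s * h)
            x += h * (k * y / (K + y) - d * x)
        trajectory.append(x)  # value at the next observation time
    return trajectory


def fit_rate(candidates, d, K, x_obs, times, y_obs):
    """Grid search for the synthesis rate k minimizing squared error against x_obs."""
    def sse(k):
        sim = simulate_decoupled((k, d, K), x_obs[0], times, y_obs)
        return sum((a - b) ** 2 for a, b in zip(sim, x_obs))
    return min(candidates, key=sse)
```

Because each equation is integrated in isolation, fitting one species' parameters never requires simulating the whole coupled system. This is the property that lets such decompositions scale to larger circuits.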
Keyword: Parameter estimation, gene circuits, systems biology, synthetic biology
PP40 (PT) - Identifying proteins controlling key disease signaling pathways
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 4/5
Presenting author: Anthony Gitter , Carnegie Mellon University, United States
Additional authors: Ziv Bar-Joseph, Carnegie Mellon University
Session Chair: Reinhard Schneider
Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often present in only a small subset of the population, and screens are noisy, with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict the downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes.
We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM uses prior information about proteins' likelihood of involvement in a disease (e.g., from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving on other methods proposed for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection.
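The code below is a much-simplified illustration of combining network paths with a prior. It is not SDREM's actual algorithm, which orients network edges and couples the signaling model with DREM's time-series regulatory model. The sketch enumerates all shortest paths from hypothetical source proteins (for example, those contacted by the virus) to hypothetical target transcription factors. It scores each intermediate protein by summing a per-protein prior weight, such as a screen-derived likelihood defaulting to 1.0, over the paths that pass through it.

```python
from collections import deque


def shortest_paths(graph, source, target):
    """All shortest paths from source to target (BFS with predecessor lists).

    graph: dict node -> iterable of neighbours (undirected PPI network).
    """
    dist, preds = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another shortest route into v
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        if node == source:
            paths.append([source] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)

    walk(target, [])
    return paths


def score_intermediates(graph, sources, targets, prior):
    """Score proteins on source-target shortest paths, weighted by a prior (default 1.0)."""
    scores = {}
    for s in sources:
        for t in targets:
            for path in shortest_paths(graph, s, t):
                for node in path[1:-1]:
                    scores[node] = scores.get(node, 0.0) + prior.get(node, 1.0)
    return scores
```

When two routes connect a source to a transcription factor equally well, the prior breaks the tie. This mirrors, in a crude way, how screen data can sharpen pathway predictions.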
Keyword: Pathway inference, viral infection, RNAi screens, genetic interaction
PP41 (PT) - Automated Cellular Annotation for High Resolution Images of Adult C. elegans
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 7
Presenting author: Sarah Aerni , Stanford University, United States
Additional authors: Xiao Liu, Stanford University, United States
Chuong Do, 23andMe, Inc., United States
Samuel Gross, Stanford University, United States
Andy Nguyen, Stanford University School of Medicine, United States
Stephen Guo, Stanford University, United States
Fuhui Long, Howard Hughes Medical Institute, United States
Hanchuan Peng, Allen Institute for Brain Science, United States
Stuart Kim, Stanford University School of Medicine, United States
Serafim Batzoglou, Stanford University, United States
Session Chair: Stefan Kramer
Motivation:
Advances in high-resolution microscopy have recently made it possible to analyze gene expression at the level of individual cells. The fixed lineage of cells in the adult worm C. elegans makes this organism an ideal model for studying complex biological processes such as development and aging. However, annotating individual cells in images of adult C. elegans typically requires expertise and significant manual effort. Automating this task is therefore critical to enabling high-resolution studies of a large number of genes.
Results:
In this paper, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal, and hypodermal cells) in high-resolution images of adult C. elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum flow, and we apply a cross-entropy-based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of automatically annotating microscopy-based images of adult C. elegans.
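Matching detected cells to atlas labels one-to-one is an assignment problem, which is a special case of minimum-cost flow. The brute-force sketch below finds the lowest-cost one-to-one labeling for a tiny, hypothetical cost matrix. It only illustrates the objective: enumerating permutations is infeasible for 154 cells, where a min-cost max-flow solver would be used instead. The learned scoring function that would produce the costs is not modeled.

```python
from itertools import permutations


def assign_labels(cost):
    """Minimum-cost one-to-one assignment by brute force (tiny square matrices only).

    cost[i][j] is a hypothetical mismatch cost of giving atlas label j to detected cell i.
    Returns (label index for each cell, total cost).
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost
```

For the cost matrix [[4, 1, 3], [2, 0, 5], [3, 2, 2]], cell 1 individually prefers label 1 (cost 0). However, the optimal joint labeling gives label 1 to cell 0 and label 0 to cell 1, for a total cost of 5. Capturing that kind of trade-off is why labeling is treated as a global optimization rather than as independent per-cell decisions.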
Keyword: Bioinformatics, Gene expression, Image analysis, Machine learning
PP42 (PT) - Haplotype assembly in polyploid genomes and identical by descent shared tracts
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.
Room: Hall 14.2
Presenting author: Derek Aguiar , Brown University, United States
Additional authors: Sorin Istrail, Brown University, United States
Session Chair: Sean O'Donoghue
Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing these high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (1) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (2) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). In particular, polyploid organisms are becoming the target of many research groups interested in studying the genomics of disease, phylogenetics, botany, and evolution, but theory and methods for polyploid haplotype reconstruction are lacking.
Results: In this work, we present a number of results, extensions, and generalizations of Compass graphs and our HapCompass framework (Aguiar et al. 2012). We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. We present graph theory-based algorithms for the problem of haplotype assembly from sequencing data using our previously developed HapCompass framework for (1) novel implementations of haplotype assembly optimizations (minimum error correction), (2) assembly of a pair of individuals sharing a tract identical by descent, and (3) assembly of polyploid genomes. We demonstrate the accuracy of each method on the 1000 Genomes Project, Pacific Biosciences, and simulated sequence data.
HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/
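The minimum error correction (MEC) objective mentioned above counts how many fragment alleles would have to be flipped so that every sequencing fragment agrees with one of the assembled haplotypes. The sketch below only scores a candidate solution; it is not HapCompass's graph algorithm. Alleles are assumed to be coded 0/1 with `None` for sites a fragment does not cover, and the data in the usage example are made up. Because each fragment is charged against its closest haplotype, the same function works for two haplotypes (diploid) or more (polyploid).

```python
def fragment_cost(fragment, haplotype):
    """Mismatches between a fragment and a haplotype, skipping uncovered sites (None)."""
    return sum(1 for f, h in zip(fragment, haplotype) if f is not None and f != h)


def mec_score(fragments, haplotypes):
    """Minimum error correction score of a haplotype set (any ploidy):
    each fragment is charged the number of flips needed to match its closest haplotype."""
    return sum(min(fragment_cost(frag, hap) for hap in haplotypes) for frag in fragments)


fragments = [[0, 1, None], [None, 1, 1], [1, 0, 0], [1, 1, 0]]
print(mec_score(fragments, [[0, 1, 1], [1, 0, 0]]))             # diploid: 1
print(mec_score(fragments, [[0, 1, 1], [1, 0, 0], [1, 1, 0]]))  # triploid: 0
```

Here the last fragment conflicts with both diploid haplotypes at one site, so the diploid MEC is 1; adding a third haplotype explains it exactly.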
Keyword: Haplotype, haplotype assembly, single individual haplotyping,
TOP
PP43 (HT) - Emerging methods in protein co‐evolution
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.Room: ICC Lounge 81
Presenting author: David Juan , Spanish National Cancer Research Centre, Spain
Additional authors:Florencio Pazos, Spanish National Centre for Biotechnology, Spain
Alfonso Valencia, Spanish National Cancer Research Centre, Spain
Session Chair: Burkhard Rost
Co‐evolution is an essential component of evolution that helps maintain the structure of ecological and molecular networks while allowing species, proteins and genes to change and adapt over time. A wide range of co‐evolution‐inspired computational methods has been designed for protein modeling, detection of binding sites, deciphering protein mechanisms of action, prediction of protein–protein interaction partners, and reconstruction of protein complexes and interaction networks. Notably, recent breakthroughs in the field have markedly improved the capacity to predict interactions between proteins and contacts between protein residues. While co‐evolution‐based approaches have been developed independently over the last several decades, we propose that unifying them under a common framework would be a major step forward in understanding the molecular basis of co‐evolution.
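One of the simplest co-evolution signals for residue contacts is the mutual information (MI) between two columns of a multiple sequence alignment: positions that mutate in a correlated way share information. The sketch below computes raw MI on a toy alignment. It omits sequence weighting, pseudocounts and background corrections that practical methods add, and it is not one of the specific methods the authors review.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


msa = ["AKL", "AKV", "ERL", "ERV"]  # hypothetical alignment
print(column_mi(msa, 0, 1))  # columns 0 and 1 co-vary perfectly: log(2)
print(column_mi(msa, 0, 2))  # columns 0 and 2 vary independently: 0.0
```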
Keyword: Evolution & Comparative Genomics, Sequence Analysis
TOP
PP44 (PT) - Compressive genomics for protein databases
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.Room: Hall 4/5
Presenting author: Noah Daniels , Tufts University, United States
Additional authors:Andrew Gallant, Tufts University, United States
Jian Peng, Massachusetts Institute of Technology, United States
Lenore Cowen, Tufts University, United States
Michael Baym, Harvard Medical School
Bonnie Berger, Massachusetts Institute of Technology, United States
Session Chair: Reinhard Schneider
Motivation: The exponential growth of protein sequence databases has increasingly made the fundamental task of searching for homologs a computational bottleneck. The amount of unique data, however, is not growing nearly as fast; we can exploit this fact to greatly accelerate homology search. Accelerating programs in the popular PSI/DELTA-BLAST family of tools will speed up not only homology search itself but also the large collection of other programs that interact with large protein databases primarily through these tools.
Results: We introduce a suite of homology search tools, powered by compressively-accelerated protein BLAST (CaBLASTP), which are significantly faster than and comparably accurate to all known state-of-the-art tools, including HHblits, DELTA-BLAST, and PSI-BLAST. Further, our tools are implemented in a manner that allows direct substitution into existing analysis pipelines. The key idea is a local similarity-based compression scheme that allows us to operate directly on the compressed data. Importantly, CaBLASTP’s runtime scales almost linearly in the amount of unique data, as opposed to current BLASTP variants, which scale linearly in the size of the full protein database being searched. Our compressive algorithms will speed up many tasks, such as protein structure prediction and orthology mapping, that rely heavily on homology search. Availability: CaBLASTP is available under the GNU General Public License at http://cablastp.csail.mit.edu/
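The idea of searching compressed data can be sketched as a two-stage search. Near-duplicate sequences are grouped under a representative, a query is first compared only against representatives (coarse search), and only members of matching groups are examined in full (fine search). In this toy version, k-mer Jaccard similarity stands in for BLAST alignment scores, and the greedy clustering and all thresholds are hypothetical choices; it is not CaBLASTP's actual compression scheme.

```python
def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def compress(db, k=3, cluster_threshold=0.6):
    """Greedy clustering: a sequence joins the first representative it resembles,
    otherwise it founds a new cluster."""
    clusters = []  # list of (representative k-mer set, [member indices])
    for idx, seq in enumerate(db):
        ks = kmers(seq, k)
        for rep_kmers, members in clusters:
            if jaccard(ks, rep_kmers) >= cluster_threshold:
                members.append(idx)
                break
        else:
            clusters.append((ks, [idx]))
    return clusters


def compressive_search(query, db, clusters, k=3, coarse_threshold=0.3, fine_threshold=0.5):
    q = kmers(query, k)
    hits = []
    for rep_kmers, members in clusters:
        if jaccard(q, rep_kmers) < coarse_threshold:  # coarse step: skip the whole cluster
            continue
        for idx in members:  # fine step: only sequences in promising clusters
            score = jaccard(q, kmers(db[idx], k))
            if score >= fine_threshold:
                hits.append((idx, score))
    return sorted(hits, key=lambda h: -h[1])
```

With `db = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVD", "GSHMLEDPVE"]`, `compress(db)` yields two clusters, and a query equal to the first sequence never touches the second cluster's members. The work saved grows with how redundant the database is, which is the intuition behind scaling with the amount of unique data.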
Keyword: Sequence Analysis, protein search, BLAST
TOP
PP45 (PT) - FuncISH: Learning a functional representation of neural ISH images
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.Room: Hall 7
Presenting author: Noa Liscovitch , Bar Ilan University, Israel
Additional authors:Uri Shalit, Hebrew University of Jerusalem, Israel
Gal Chechik, Stanford University, United States
Session Chair: Stefan Kramer
High spatial resolution imaging datasets of mammalian brains have recently become available in unprecedented amounts. Images now reveal highly complex patterns of gene expression varying on multiple scales. The challenge in analyzing these images is both in extracting the patterns that are most relevant functionally, and in providing a meaningful representation that allows neuroscientists to interpret the extracted patterns.
Here we present FuncISH – a method to learn functional representations of neural in situ hybridization (ISH) images. We represent images using a histogram of local descriptors (SIFT) in several scales, and use this representation to learn detectors of functional (GO) categories for every image. As a result, each image is represented as a point in a low dimensional space whose axes correspond to meaningful functional annotations. The resulting representations define similarities between ISH images that can be easily explained by functional categories.
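A "histogram of local descriptors" is commonly built as a bag of visual words. Each local descriptor (for SIFT, a 128-dimensional vector) is assigned to its nearest codeword in a codebook, and the image is summarized by the normalized counts. The sketch below uses toy 2-D descriptors and a hand-made codebook. It assumes descriptor extraction and codebook learning (typically k-means) happen elsewhere, and it is not the authors' exact pipeline.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda c: sum((d - w) ** 2 for d, w in zip(descriptor, codebook[c])))


def bag_of_words(descriptors, codebook):
    """L1-normalized histogram of codeword assignments for one image."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = [[0, 0], [10, 10]]                      # hypothetical 2-word codebook
descriptors = [[1, 0], [0, 2], [9, 11], [0.5, 0.5]]
print(bag_of_words(descriptors, codebook))         # [0.75, 0.25]
```

A multi-scale representation can be formed by computing one such histogram per scale and concatenating them. Detectors for functional categories would then be trained on these vectors.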
We applied our method to the genomic set of mouse neural ISH images available at the Allen Brain Atlas, finding that the majority of GO biological processes can be inferred from spatial expression patterns with high accuracy. Using functional representations, we predict several gene interaction properties such as protein-protein interactions and cell type specificity more accurately than competing methods based on global correlations. We used FuncISH to identify similar expression patterns of GABAergic neuronal markers that were not previously identified, and to infer new gene function based on image-image similarities.
Keyword: PCA, censored data, censoring, single-cell qPCR, Gaussi
TOP
PP46 (PT) - Using State Machines to Model the IonTorrent Sequencing Process and Improve Read Error-Rates
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.Room: Hall 14.2
Presenting author: David Golan , Tel Aviv University, Israel
Additional authors:Paul Medvedev, The Pennsylvania State University, United States
Session Chair: Sean O'Donoghue
Motivation:
The importance of fast and affordable DNA sequencing methods for present-day life sciences, medicine and biotechnology is hard to overstate. A major player is IonTorrent, a pyrosequencing-like technology that produces flowgrams (sequences of incorporation values), which are converted into nucleotide sequences by a base-calling algorithm. Because it exploits ubiquitous semiconductor technology and innovations in chemistry, IonTorrent has been gaining popularity since its debut in 2011. Despite these advantages, however, IonTorrent read accuracy remains a significant concern.
Results:
We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base call. We therefore propose that base calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram-based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent E. coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads.
Availability:
Binaries and source code of FlowgramFixer are freely available at:
http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html
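Read-wide base calling with a Viterbi algorithm can be sketched as follows. The hidden state at each flow is the homopolymer length (0 to `max_hp`). To make neighboring flows interact, this sketch assumes a hypothetical carry-over model: the expected signal at a flow is its own homopolymer length plus a fraction `leak` of the previous flow's length, with Gaussian noise. The flow order `"TACG"`, `leak` and `sigma` are illustrative assumptions, not FlowgramFixer's actual state machine or parameters. Runtime is linear in the number of flows for a fixed `max_hp`.

```python
FLOW_ORDER = "TACG"  # hypothetical cyclic nucleotide flow order


def emission_score(observed, prev_hp, cur_hp, leak, sigma):
    """Gaussian log-likelihood (up to a constant) of one flow's signal under the
    hypothetical model: expected = current homopolymer length + leak * previous length."""
    expected = cur_hp + leak * prev_hp
    return -((observed - expected) ** 2) / (2.0 * sigma ** 2)


def viterbi_flowgram(obs, max_hp=5, leak=0.1, sigma=0.15):
    """Most likely homopolymer length per flow, decoded jointly over the whole read."""
    states = range(max_hp + 1)
    # first flow: assume nothing was incorporated before it
    score = [emission_score(obs[0], 0, h, leak, sigma) for h in states]
    backptr = []
    for t in range(1, len(obs)):
        new_score, ptr = [], []
        for h in states:
            candidates = [score[p] + emission_score(obs[t], p, h, leak, sigma) for p in states]
            best = max(states, key=candidates.__getitem__)
            new_score.append(candidates[best])
            ptr.append(best)
        score = new_score
        backptr.append(ptr)
    # trace back from the best final state
    h = max(states, key=score.__getitem__)
    path = [h]
    for ptr in reversed(backptr):
        h = ptr[h]
        path.append(h)
    return path[::-1]


def flows_to_read(calls, flow_order=FLOW_ORDER):
    """Expand per-flow homopolymer calls into a nucleotide sequence."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(calls))


calls = viterbi_flowgram([1.0, 0.1, 2.0, 0.2])
print(calls, flows_to_read(calls))  # [1, 0, 2, 0] TCC
```

With `leak=0` the flows decouple and decoding reduces to rounding each flow independently. The joint decoding only pays off once the model lets one flow's signal depend on its neighbors.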
Keyword: Sequencing, Viterbi, IonTorrent, Flowgram, Base calling
TOP
PP47 (HT) - Short Toxin-like Proteins Abound in Cnidaria Genomes
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.Room: ICC Lounge 81
Presenting author: Michal Linial , The Hebrew University of Jerusalem, Israel
Additional authors:Isaak Tirosh, The Hebrew University of Jerusalem, Israel
Manor Askenazi, The Hebrew University of Jerusalem, Israel
Itai Linial, The Hebrew University of Jerusalem, Israel
Session Chair: Burkhard Rost
The publication of Tirosh et al. (2012) addresses a neglected niche in functional genomics. The main finding is the identification of short active sequences that escaped detection by classical alignment-based approaches. This research lies at the interface of computational biology and automatic functional annotation.
Cnidaria is a rich phylum that includes thousands of marine species. In this study, we focused on Nematostella vectensis and Hydra magnipapillata genomes. We present a method for ranking toxin-like candidates. Toxin-like functions were revealed using ClanTox. Among 83,000 proteins from Cnidaria, we found 170 candidates that fulfill the properties of toxin-like-proteins. Remarkably, only 11% of the predicted toxin-like proteins were previously classified as toxins. Our prediction methodology inferred functions for protease inhibitors, membrane pore formation, ion channel blockers and metal binding proteins. We conclude that the evolutionary expansion of toxin-like proteins in Cnidaria contributes to their fitness in the complex environment of the aquatic ecosystem.
Keyword: Evolution & Comparative Genomics, Sequence Analysis
TOP
PP48 (HT) - Predicting the molecular complexity of sequencing libraries
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.Room: Hall 4/5
Presenting author: Andrew Smith , University of Southern California, United States
Session Chair: Reinhard Schneider
Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of genome sequencing. Methods to determine how deeply to sequence, or to predict the benefits of additional sequencing, are almost completely lacking. We introduce an empirical Bayesian method to implicitly model any source of bias and accurately characterize the molecular complexity of a DNA sample or library in almost any sequencing application.
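The abstract does not spell out its model, so the sketch below shows only the classical starting point for this kind of extrapolation, the Good-Toulmin estimator. Let n_j be the number of distinct molecules observed exactly j times. The expected number of new distinct molecules gained by sequencing an additional fraction t of the current depth is then the sum of (-1)^(j+1) t^j n_j. The series is reliable only for t up to about 1 and becomes unstable beyond that, which is why extrapolating further requires more elaborate methods. The read identifiers are toy data.

```python
from collections import Counter


def count_histogram(read_ids):
    """n_j: number of distinct molecules observed exactly j times."""
    per_molecule = Counter(read_ids)
    return Counter(per_molecule.values())


def good_toulmin_new(hist, t):
    """Expected number of NEW distinct molecules if sequencing is extended by a
    fraction t of the current depth (classical Good-Toulmin series; use t <= 1)."""
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in hist.items())


hist = count_histogram(["a", "a", "b", "c", "c", "c", "d"])  # {1: 2, 2: 1, 3: 1}
print(good_toulmin_new(hist, 1.0))  # 2.0 new molecules expected if depth doubles
print(good_toulmin_new(hist, 0.5))  # 0.875
```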
Keyword: Applied Bioinformatics, other
TOP
PP49 (PT) - Automated annotation of gene expression image sequences via nonparametric factor analysis and conditional random fields
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.Room: Hall 7
Presenting author: Iulian Pruteanu-Malinici , Duke University, United States
Additional authors:William Majoros, Duke University, United States
Uwe Ohler, Duke University, United States
Session Chair: Stefan Kramer
Motivation: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible to account jointly for images across multiple time points, so that spatial and temporal dependencies are captured simultaneously.
Methods: We describe a discriminative, undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a nonparametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy, incomplete samples, i.e. it can tolerate data missing from individual time points.
Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared to previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages.
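On a chain of developmental stages, exact inference in an undirected model, which is what the junction tree algorithm reduces to on a chain, is the forward-backward (sum-product) recursion. The sketch below shows how a stage with no image can still receive a confident label from its neighbors: a missing stage simply gets a uniform unary potential. The potentials and the two-label setup are hypothetical; the sketch does not reproduce the paper's features, factor analysis model, or training procedure.

```python
def chain_marginals(unary, pairwise):
    """Sum-product (forward-backward) on a chain of stages with K labels.
    unary: one list of K non-negative potentials per stage, or None for a missing stage.
    pairwise: K x K non-negative potentials between neighbouring stages."""
    T, K = len(unary), len(pairwise)
    phi = [u if u is not None else [1.0] * K for u in unary]  # missing stage -> uniform

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    fwd = [normalize(phi[0])]
    for t in range(1, T):
        fwd.append(normalize([phi[t][k] * sum(fwd[t - 1][j] * pairwise[j][k] for j in range(K))
                              for k in range(K)]))
    bwd = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = normalize([sum(pairwise[k][j] * phi[t + 1][j] * bwd[t + 1][j] for j in range(K))
                            for k in range(K)])
    return [normalize([fwd[t][k] * bwd[t][k] for k in range(K)]) for t in range(T)]


# stages 1 and 3 have evidence for label 0, stage 2 has no image
m = chain_marginals([[0.9, 0.1], None, [0.9, 0.1]], [[0.8, 0.2], [0.2, 0.8]])
print(m[1])  # roughly [0.89, 0.11]: the missing stage inherits its neighbours' label
```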
Keyword: Nonparametric sparse Bayesian factor analysis, undirected graphical models, gene annotation, miss
TOP
PP50 (HT) - A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.Room: Hall 14.2
Presenting author: Mathieu Clément-Ziza , Biotec, Technische Universitaet Dresden, Germany
Additional authors:Paola Picotti, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Henry Lam, The Hong Kong University of Science and Technology, Hong Kong
David Campbell, Institute for Systems Biology, United States
Alexander Schmidt, University of Basel, Switzerland
Eric Deutsch, Institute for Systems Biology, United States
Hannes Röst, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Zhi Sun, Institute for Systems Biology, Seattle, United States
Olivier Rinner, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Lukas Reiter, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Qin Shen, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Jacob Michaelson, Technische Universitaet Dresden, Germany
Andreas Frei, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Simon Alberti, Max Planck Institute of Molecular Cell Biology and Genetics, Germany
Ulrike Kusebauch, Institute for Systems Biology, Seattle, United States
Bernd Wollscheid, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Robert Moritz, Institute for Systems Biology, Seattle, United States
Andreas Beyer, BIOTEC, Technische Universitaet Dresden, Germany
Ruedi Aebersold, Institute of Molecular Systems Biology, ETH Zurich, Switzerland
Session Chair: Sean O'Donoghue
Using a combination of new proteomics methods and novel computational algorithms, we investigated the impact of natural genetic variation on protein concentrations. To accomplish this task, we generated an almost complete reference map of the yeast proteome for shotgun and targeted proteomics. We used this map in a series of shotgun and targeted proteomics experiments in a panel of 78 budding yeast strains to identify protein-QTL, i.e. genomic regions associated with changes in protein abundance. These experiments were informed by computational network analysis. Using a powerful new machine-learning approach, we found that a surprisingly large fraction of protein-QTL interact epistatically with each other.
The network-based analysis facilitated the identification of protein modules, whose members are affected by several independent genetic variants in a coordinated way. This suggests that selective pressure favors the acquisition of sets of polymorphisms that adapt protein abundances at the pathway level.
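The basic idea of mapping a protein-QTL can be sketched as a single-marker scan. For each genomic marker, strains are split by the parental allele they inherited, and protein abundance is compared between the two groups with a Welch t-statistic. Markers with a large |t| are candidate pQTL. This is a textbook baseline on made-up data (each allele group needs at least two strains). It does not reproduce the authors' machine-learning approach, and it models neither multiple testing nor epistasis.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (each of size >= 2)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def marker_scan(genotypes, abundance):
    """genotypes[m][s] in {0, 1}: parental allele of strain s at marker m.
    abundance[s]: protein level in strain s. Returns one t-statistic per marker."""
    scores = []
    for marker in genotypes:
        group0 = [y for g, y in zip(marker, abundance) if g == 0]
        group1 = [y for g, y in zip(marker, abundance) if g == 1]
        scores.append(welch_t(group1, group0))
    return scores


abundance = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]   # hypothetical protein levels in 6 strains
genotypes = [[0, 0, 0, 1, 1, 1],             # marker 0 tracks abundance
             [0, 1, 0, 1, 0, 1]]             # marker 1 does not
print(marker_scan(genotypes, abundance))     # marker 0 scores far higher than marker 1
```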
Keyword: Mass Spectrometry & Proteomics, Gene Regulation & Transcriptomics
TOP
PP51 (PT) - Predicting protein interactions via parsimonious network history inference
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.Room: ICC Lounge 81
Presenting author: Robert Patro , Carnegie Mellon University, United States
Additional authors:Carl Kingsford, Carnegie Mellon University, United States
Session Chair: Burkhard Rost
Motivation: Reconstruction of the network-level evolutionary history of protein-protein interactions provides a principled way to relate interactions in several present-day networks. Here, we present a general framework for inferring such histories and demonstrate how it can be used to determine which interactions existed in the ancestral networks, which present-day interactions we should expect to exist based on evolutionary evidence, and what information extant networks contain about the order of ancestral protein duplications.
Results: Our framework characterizes the space of likely parsimonious network histories. It results in a structure that can be used to find probabilities for a number of events associated with the histories. The framework is based on a directed hypergraph formulation of dynamic programming that we extend to enumerate many optimal and near-optimal solutions. The algorithm is applied to reconstructing ancestral interactions among bZIP transcription factors, imputing missing present-day interactions among the bZIPs and among proteins from 5 herpes viruses, and determining relative protein duplication order in the bZIP family. Our approach reconstructs ancestral interactions more accurately than existing approaches. In cross-validation tests, we find that our approach ranks the majority of the left-out present-day interactions among the top 2% and 17% of possible edges for the bZIP and herpes networks, respectively, making it a competitive approach for edge imputation. It also estimates, from interaction data alone, relative bZIP protein duplication orders that are significantly correlated with sequence-based estimates.
Availability: The algorithm is implemented in C++, is open source, and is available at http://www.cs.cmu.edu/~ckingsf/software/parana2.
Contact: robp@cs.cmu.edu and carlk@cs.cmu.edu
Keyword: Protein interaction evolution, ancestral network reconstruction, interaction pred
TOP
PP52 (HT) - Interpreting genomic data via entropic dissection
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: Hall 4/5
Presenting author: Rajeev Azad , University of North Texas, United States
Session Chair: Reinhard Schneider
Keyword: Sequence Analysis, Applied Bioinformatics
TOP
PP53 (PT) - A High-Throughput Framework to Detect Synapses in Electron Microscopy Images
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: Hall 7
Presenting author: Saket Navlakha , Carnegie Mellon University, United States
Additional authors:Joseph Suhan, Carnegie Mellon University
Alison Barth, Carnegie Mellon University
Ziv Bar-Joseph, Carnegie Mellon University
Session Chair: Stefan Kramer
Motivation: Synaptic connections underlie learning and memory in the brain and are dynamically formed and eliminated during development and in response to stimuli. Quantifying changes in the overall density and strength of synapses is an important prerequisite for studying connectivity and plasticity in these cases or in diseased conditions. Unfortunately, most techniques to detect such changes are either low-throughput (e.g. electrophysiology), prone to error and difficult to automate (e.g. standard electron microscopy), or too coarse (e.g. MRI) to provide accurate and large-scale measurements.
Results: To facilitate high-throughput analyses, we used a 50-year-old experimental technique to selectively stain for synapses in electron microscopy (EM) images, and we developed a machine learning framework to automatically detect synapses in these images. To validate our method, we experimentally imaged brain tissue of the somatosensory cortex in six mice. We detected thousands of synapses in these images and demonstrate the accuracy of our approach using cross-validation with manually labeled data and by comparing against existing algorithms and against tools that process standard EM images. We also used a semi-supervised algorithm that leverages unlabeled data to overcome sample heterogeneity and improve performance. Our algorithms are highly efficient and scalable and are freely available for others to use.
Keyword: Image processing, Machine learning, Semi-supervised, Synapses, Elect
TOP
PP54 (HT) - A probabilistic histone modification map of the human genome and its implications for gene regulation
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: Hall 14.2
Presenting author: Misook Ha , Samsung Advanced Institute of Technology, Korea, Rep
Additional authors:Soondo Hong, Samsung Display Corporation, Korea, Rep
Wen-Hsing Li, University of Chicago, United States
Session Chair: Sean O'Donoghue
Histone modifications play an important role in chromatin structure and gene regulation. To understand the relationship between genome sequence and chromatin structure, we studied DNA sequences at histone modification sites in various human cell types. We found sequence specificity for histone modifications. Using the sequence specificities of H3 and H3K4me3 nucleosomes, we developed a model that computes the probability of H3K4me3 occupation at each base pair from the genome sequence context. A comparison of our predictions with in vivo data suggests that our method performs well. The predicted H3K4me3 sequence signature preferentially occurs at binding sites of transcription regulators involved in chromatin modification activities, including histone acetylases and enhancer- and insulator-associated factors. Clearly, the human genome sequence contains signatures for chromatin modifications essential for gene regulation and development. Our method may be applied to find new regulatory elements that function through chromatin modifications, and impaired chromatin structures that cause disease.
Keyword: Sequence Analysis, Gene Regulation & Transcriptomics
TOP
PP55 (HT) - Identification of small RNA pathway genes using patterns of phylogenetic conservation and divergence.
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: ICC Lounge 81
Presenting author: Yuval Tabach , Massachusetts General Hospital/Harvard Medical School, United States
Session Chair: Burkhard Rost
Small RNAs such as microRNAs and small interfering RNAs (siRNAs) require protein cofactors to promote their biogenesis and mediate their silencing functions. Even though small RNA pathways are widely distributed among animal, plant, fungal, and protist phyla, these pathways diverge or are lost in particular taxonomic clades. We used phylogenetic conservation patterns to identify new small RNA cofactor genes. We compared 86 divergent eukaryotic genome sequences to discern the sets of genes that show similar phylogenetic profiles with known small RNA cofactor genes. The top predictions from this phylogenetic screen were tested for defects in RNA interference and a large fraction of the candidate genes showed defects as strong as validated small RNA cofactor genes, revealing new components in the pathway. RNA splicing components were the most enriched class of new small RNA cofactors identified, suggesting a deep connection between the mechanism of RNA splicing and small RNA-mediated gene silencing.
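Phylogenetic profiling can be sketched as follows. Each gene is a presence/absence vector across genomes, and candidates are ranked by how closely their profiles track those of known cofactors, for example being lost together in the same clades. This sketch scores candidates by their mean Pearson correlation with the known cofactors (for 0/1 vectors this equals the phi coefficient). The 8-genome profiles and gene names are hypothetical (the study compared 86 genomes), and this is not the authors' exact similarity measure.

```python
import math


def pearson(x, y):
    """Pearson correlation; for 0/1 profiles this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_candidates(profiles, known):
    """Rank genes outside `known` by mean correlation with the known cofactors' profiles."""
    scores = {
        gene: sum(pearson(p, profiles[k]) for k in known) / len(known)
        for gene, p in profiles.items()
        if gene not in known
    }
    return sorted(scores, key=scores.get, reverse=True)


profiles = {  # 1 = gene present in that genome (hypothetical)
    "cofactor1": [1, 1, 1, 0, 1, 0, 1, 0],
    "cofactor2": [1, 1, 1, 0, 1, 0, 1, 0],
    "geneA": [1, 1, 1, 0, 1, 0, 1, 0],  # lost in exactly the same genomes
    "geneB": [1, 1, 1, 1, 1, 1, 1, 0],  # partially co-lost
    "geneC": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated pattern
}
print(rank_candidates(profiles, ["cofactor1", "cofactor2"]))  # ['geneA', 'geneB', 'geneC']
```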
Keyword: Evolution & Comparative Genomics, other
TOP
PP56 (PT) - IDBA-Tran: A More Robust de novo de Bruijn Graph Assembler for Transcriptomes with Uneven Expression Levels
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: Hall 4/5
Presenting author: Henry C.M. Leung, The University of Hong Kong
Additional authors:S.M. Yiu, The University of Hong Kong
Xin-Guang Zhu, Shanghai Institutes for Biological Sciences, China
Ming-Zhu Lv, Shanghai Institutes for Biological Sciences, China
Francis Chin, The University of Hong Kong
Yu Peng, The University of Hong Kong, Hong Kong
Session Chair: Debra Goldberg
Motivation: RNA sequencing based on next-generation sequencing technology is an effective approach for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on a reference genome or additional annotated information. However, transcriptome assembly is well known to be the more difficult problem. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which makes it very difficult to identify low-expressed isoforms. Technically, a core issue is to remove erroneous vertices/edges with high multiplicity (produced by high-expressed isoforms) from the de Bruijn graph without removing correct ones with lower multiplicity that correspond to low-expressed isoforms. Failing to do so results in the loss of low-expressed isoforms, or in complicated subgraphs in which transcripts of different genes are mixed together because of the erroneous vertices and edges.
Contributions: Unlike existing tools, which usually remove erroneous vertices/edges whose multiplicities fall below a global threshold, we developed a probabilistic progressive approach with local thresholds to iteratively remove erroneous vertices/edges. This enables us to decompose the graph into disconnected components, each containing only a few genes (often a single gene), while keeping many correct vertices/edges of low-expressed isoforms. Combined with existing techniques, IDBA-Tran is able to assemble both high-expressed and low-expressed transcripts and outperforms existing assemblers in terms of sensitivity and specificity on both simulated and real data.
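The difference between a global and a local multiplicity threshold can be seen on a toy graph. Consider an error branch hanging off a highly expressed isoform with multiplicity 8, and a genuine edge of a weakly expressed isoform with multiplicity 4. No single global cutoff removes the first while keeping the second. A local rule can do so by comparing each edge only with its sibling edges leaving the same vertex. The sketch below is a single pass with a hypothetical 10% ratio. IDBA-Tran's actual method is probabilistic and progressive (iterative), which this sketch does not reproduce.

```python
def prune_global(edges, threshold):
    """Keep edges whose multiplicity reaches one global cutoff."""
    return {e: m for e, m in edges.items() if m >= threshold}


def prune_local(edges, ratio=0.1):
    """Keep an edge unless it is weak relative to the strongest edge leaving the same vertex.
    edges: dict mapping (u, v) -> multiplicity (k-mer coverage)."""
    best_out = {}
    for (u, _), m in edges.items():
        best_out[u] = max(best_out.get(u, 0), m)
    return {(u, v): m for (u, v), m in edges.items() if m >= ratio * best_out[u]}


edges = {
    ("A", "B"): 1000,  # highly expressed isoform
    ("A", "X"): 8,     # sequencing-error branch off that isoform
    ("C", "D"): 4,     # genuine edge of a weakly expressed isoform
}
print(sorted(prune_global(edges, 6)))  # keeps the error edge, loses the weak isoform
print(sorted(prune_local(edges)))      # removes the error edge, keeps the weak isoform
```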
Availability: http://www.cs.hku.hk/~alse/idba_tran
Keyword: Transcriptome assembling, paired-end reads, isoforms, RNA-Seq, de Bruijn gra
TOP
PP57 (HT) - Visual Exploration for Cancer Subtype Analysis
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: Hall 7
Presenting author: Nils Gehlenborg , Harvard Medical School, United States
Additional authors:Alexander Lex, Harvard University, United States
Marc Streit, Johannes Kepler University Linz, Austria
Hans-Joerg Schulz, University of Rostock, Germany
Christian Partl, Graz University of Technology, Austria
Dieter Schmalstieg, Graz University of Technology, Austria
Peter Park, Harvard Medical School, United States
Session Chair: Thomas Lengauer
This talk will introduce the promises and challenges of identifying and characterizing tumor subtypes in cancer genomics data sets from patient cohorts with hundreds of patients and how our visual exploration system Caleydo StratomeX (http://stratomex.caleydo.org) supports these processes. Heterogeneous data sets including multiple genomic (mRNA, miRNA, RPPA, copy number, gene mutations) and clinical data types can be loaded into the software to efficiently generate and confirm hypotheses about tumor subtypes and their functional and clinical effects.
To help analysts identify promising candidate subtypes, StratomeX has been extended with computational methods to rank stratifications and to identify stratifications that provide corroborating evidence for candidate subtypes. This previously unpublished feature, as well as a new interactive website with large heterogeneous data sets from The Cancer Genome Atlas (TCGA), will also be presented.
The talk will demonstrate the utility of StratomeX through a comprehensive case study from TCGA.
Keyword: Bioimaging & Data Visualization, Applied Bioinformatics
TOP
PP58 (HT) - Simulating Delta/Notch Signaling in Somitogenesis and Pancreas Development
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: Hall 14.2
Presenting author: Hendrik Tiedemann , Helmholtz Center Munich, Germany
Additional authors:Elida Schneltzer, Helmholtz Center Munich, Germany
Gerhard Przemeck, Helmholtz Center Munich, Germany
Martin Hrabě De Angelis, Helmholtz Center Munich, Germany
Session Chair: Lonnie Welch
The Delta-Notch signal transduction pathway is involved in numerous processes in embryogenesis and in adult organisms. After the Delta or Jagged ligand binds to Notch receptors on the membrane of neighboring cells, the cleaved-off intracellular domain of Notch activates genes of the Hey/Hes transcription factor family, which show ultradian expression in somitogenesis and in some neural progenitor cells. In somitogenesis, D/N signaling enforces the synchronization of ultradian oscillators and is important for boundary formation; in neurogenesis, it acts by lateral inhibition to give some cells a different developmental fate from their neighbors. Similar processes destine some cells in intestinal crypts, the developing airways of the lung, and the epithelial ducts of the developing pancreas to different fates.
With our gene- and cell-based computer model, we simulated boundary formation in somitogenesis and islet progenitor cell formation in the pancreas, and examined which parameters steer the systems toward lateral inhibition or synchronization, respectively.
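Lateral inhibition can be illustrated with the classic two-cell feedback model of Collier et al. (1996). Notch activity in each cell rises with its neighbor's Delta level, and Delta production in each cell is repressed by its own Notch activity. The sketch below integrates that model with forward Euler. The Hill functions and parameter values (`a`, `b`, `v`) were chosen for illustration, and this is not the authors' gene- and cell-based model, which also covers Hes oscillations and synchronization.

```python
def simulate_two_cells(d0, n0, a=0.01, b=100.0, v=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of a two-cell Delta-Notch lateral-inhibition model.
    n[i]: Notch activity of cell i; d[i]: Delta level of cell i.
    Each cell's Notch is activated by the other cell's Delta."""

    def f(x):  # Notch activation by neighbouring Delta (increasing Hill function)
        return x * x / (a + x * x)

    def g(x):  # repression of Delta by the cell's own Notch activity (decreasing)
        return 1.0 / (1.0 + b * x * x)

    d, n = list(d0), list(n0)
    for _ in range(steps):
        dn = [f(d[1 - i]) - n[i] for i in range(2)]
        dd = [v * (g(n[i]) - d[i]) for i in range(2)]
        n = [n[i] + dt * dn[i] for i in range(2)]
        d = [d[i] + dt * dd[i] for i in range(2)]
    return d, n


d, n = simulate_two_cells([0.08, 0.07], [0.3, 0.3])
print(d)  # cell 0 ends Delta-high (near 1), cell 1 Delta-low (near 0)
```

A small initial difference in Delta is amplified until the two cells adopt opposite states, which is lateral inhibition. A perfectly symmetric start, for example `simulate_two_cells([0.07, 0.07], [0.3, 0.3])`, keeps both cells identical, so some asymmetry (noise) is needed to break the symmetry.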
Keyword: Gene Regulation & Transcriptomics, Protein Interactions & Molecular Networks
TOP
PP59 (HT) - From sequence co-evolution to protein (complex) structure prediction
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: ICC Lounge 81
Presenting author: Martin Weigt , Universite Pierre and Marie Curie, France
Session Chair: Janet Kelso
Biological research has been revolutionized by high-throughput experiments. Unprecedented amounts of large-scale data have to be complemented by computational methods unveiling the information hidden in raw data, to increase our understanding of complex biological processes.
As an example, proteins show a remarkable degree of structural and functional conservation in the course of evolution, despite large sequence divergence. We have developed a statistical-inference approach, Direct Coupling Analysis, to link sequence variability to protein structure. Using sequence alone, we infer directly co-evolving residue pairs to detect native residue-residue contacts. This information is used to guide tertiary and quaternary structure prediction. As a specific case study, I will discuss the auto-phosphorylation complex of histidine kinases, which are involved in the majority of bacterial signal transduction systems. Only a multidisciplinary approach integrating statistical genomics, biophysical protein simulation, and mutagenesis experiments allows us to predict and verify the previously unknown active kinase structure.
Keyword: Sequence Analysis, Protein Structure & Function
TOP
PP60 (PT) - Short Read Alignment with Populations of Genomes
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.Room: Hall 4/5
Presenting author: Victoria Popic , Stanford University, United States
Additional authors:Lin Huang, Stanford University, United States
Serafim Batzoglou, Stanford University, United States
Session Chair: Debra Goldberg
The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes being sequenced in recent years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome, which leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool, BWBBLE, is introduced in this paper. We (1) introduce a new compressed representation of a collection of genomes, which explicitly accounts for the genomic variation observed at every position, and (2) design a new alignment algorithm based on the Burrows-Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias toward any one genome.
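Burrows-Wheeler-based aligners rest on backward search over an FM-index. The pattern is processed right to left, and the range of matching suffixes is narrowed using a table C (how many characters sort before c) and occurrence counts in the transformed text. The sketch below indexes a single toy reference with naive suffix sorting and a linear-time occurrence count. It does not model BWBBLE's multi-genome representation or handle mismatches; it only shows the exact-match primitive that such aligners build on.

```python
from collections import Counter


def build_index(text):
    """Suffix array, BWT and C table for text + '$' (naive construction, toy inputs only)."""
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    bwt = "".join(s[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total  # number of characters in the text smaller than c
        total += counts[c]
    return sa, bwt, C


def occ(bwt, c, i):
    """Occurrences of c in bwt[:i]; real FM-indexes answer this in O(1) with sampled tables."""
    return bwt[:i].count(c)


def backward_search(pattern, bwt, C):
    """Half-open suffix-array interval [lo, hi) of suffixes that start with pattern."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0, 0
        lo = C[c] + occ(bwt, c, lo)
        hi = C[c] + occ(bwt, c, hi)
        if lo >= hi:
            return 0, 0
    return lo, hi


sa, bwt, C = build_index("ACGTACGA")
lo, hi = backward_search("ACG", bwt, C)
print(sorted(sa[lo:hi]))  # [0, 4]: both occurrences of ACG
```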
Keyword: Short read alignment, genome collection, Burrows-Wheeler transform
TOP
PP61 (HT) - The cBio Portal for Cancer Genomics
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: Hall 7
Presenting author: Nikolaus Schultz, Memorial Sloan-Kettering Cancer Center, United States
Additional authors:Jianjiong Gao, Memorial Sloan-Kettering Cancer Center, United States
B. Arman Aksoy, Memorial Sloan-Kettering Cancer Center, United States
Benjamin Gross, Memorial Sloan-Kettering Cancer Center, United States
Gideon Dresdner, Memorial Sloan-Kettering Cancer Center, United States
S. Onur Sumer, Memorial Sloan-Kettering Cancer Center, United States
Ethan Cerami, Memorial Sloan-Kettering Cancer Center, United States
Anders Jacobsen, Memorial Sloan-Kettering Cancer Center, United States
Ugur Dogrusoz, Bilkent University, Turkey
Erik Larsson, University of Gothenburg, Sweden
Chris Sander, Memorial Sloan-Kettering Cancer Center, United States
Session Chair: Thomas Lengauer
The cBio Portal for Cancer Genomics (cbioportal.org) provides an integrated, easy-to-use web resource for exploring, visualizing and analyzing multidimensional cancer genomics data. The portal distills massive molecular profiling data from cancer tissues and cell lines into readily understandable genetic, epigenetic, gene expression and proteomic events. The combination of a convenient query interface and customized data storage enables researchers to interactively explore genetic alterations across samples, genes and pathways and, when clinical data are available, to link these alterations to clinical outcomes. The portal provides graphical summaries of gene-level data from multiple platforms, network visualization and analysis, survival analysis, and patient-centric queries. With its simple yet powerful and flexible interface and its programmatic access, the portal makes complex cancer genomics profiles accessible to researchers and clinicians without requiring bioinformatics expertise, thus facilitating biological discoveries.
Keyword: Bioimaging & Data Visualization, Databases & Ontologies
TOP
PP62 (HT) - ATARiS: Computational quantification of gene suppression phenotypes from multisample RNAi screens
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: Hall 14.2
Presenting author: Aviad Tsherniak, Broad Institute of MIT and Harvard, United States
Additional authors:Diane Shao, Broad Institute, United States
William Hahn, Broad Institute, United States
Jill Mesirov, Broad Institute, United States
Session Chair: Lonnie Welch
Genome-scale RNAi libraries enable the systematic interrogation of gene function. However, the interpretation of RNAi screens is complicated by the observation that RNAi reagents designed to suppress the mRNA transcripts of the same gene often produce a spectrum of phenotypic outcomes due to differential on-target gene suppression or perturbation of off-target transcripts. Here we present ATARiS, a computational method that takes advantage of patterns in RNAi data across multiple samples in order to enrich for RNAi reagents whose phenotypic effects relate to suppression of their intended targets. By summarizing only such reagent effects for each gene, ATARiS produces quantitative, gene-level phenotype values, which provide an intuitive measure of the effect of gene suppression in each sample. This method is robust for datasets that contain as few as ten samples and can be used to analyze screens of any number of targeted genes. ATARiS is available at http://broadinstitute.org/ataris
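The core idea, keeping only reagents whose phenotypes agree across samples and summarizing them into one gene-level value, can be sketched as follows. This is a simplified stand-in assumed for illustration, not the ATARiS algorithm: it keeps each reagent whose profile correlates (Pearson r at or above a hypothetical threshold `min_r`) with at least one other reagent for the same gene, then averages the kept reagents sample by sample.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def gene_level_profile(reagents, min_r=0.5):
    """reagents: {reagent_id: [phenotype score per sample]} for ONE gene.

    Keep reagents that agree (r >= min_r) with at least one other reagent,
    on the assumption that shared patterns reflect on-target suppression,
    then average the kept reagents sample by sample.
    """
    kept = [rid for rid, prof in reagents.items()
            if any(pearson(prof, other) >= min_r
                   for oid, other in reagents.items() if oid != rid)]
    if not kept:
        return kept, None
    n_samples = len(next(iter(reagents.values())))
    values = [mean(reagents[r][s] for r in kept) for s in range(n_samples)]
    return kept, values
```

With two reagents that track each other and a third that behaves erratically (a likely off-target effect), only the first two contribute to the gene-level values.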
Keyword: Gene Regulation & Transcriptomics
TOP
PP63 (HT) - Accurate prediction of peptide-induced dynamical changes within the second PDZ domain of PTP1e
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.
Room: ICC Lounge 81
Presenting author: Elisa Cilia, Université Libre de Bruxelles, Belgium
Additional authors:Tom Lenaerts, Université Libre de Bruxelles, Belgium
Geerten Vuister, University Of Leicester, United Kingdom
Session Chair: Janet Kelso
Experimental NMR relaxation studies have shown that peptide binding induces dynamical changes at the side-chain level throughout the second PDZ domain of PTP1e, thereby identifying the residues involved in long-range communication. Although different computational approaches have identified qualitatively similar subsets of these residues, the accuracy of these predictions has not so far been assessed quantitatively.
We show that our own approach, based on Monte Carlo sampling and information-theoretic analysis, gives significantly more accurate results than earlier methods that addressed the same question. Moreover, we infer a network that clearly captures the residues involved in the process. We further show that these predictions are consistent across both the human and mouse variants of this domain.
Together, these results improve the understanding of intra-protein communication and allostery in PDZ domains, underlining at the same time the necessity of producing similar data sets for further validation purposes.
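The abstract does not spell out its information-theoretic measure; a common choice for coupling between residues is the mutual information between their discretized states across sampled conformations. The sketch below computes a plug-in estimate in bits from two jointly sampled state sequences (for example, hypothetical side-chain rotamer labels per Monte Carlo snapshot); the state encoding and function name are assumptions for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two jointly sampled discrete series.

    xs[i] and ys[i] are the states of two residues in the same sampled conformation.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())
```

Comparing such values between peptide-free and peptide-bound simulations is one way to flag residue pairs whose coupling changes on binding.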
Keyword: Protein Structure & Function
TOP
PP64 (PT) - Design of Shortest Double-Stranded DNA Sequences Covering All K-mers with Applications to Protein Binding Microarrays and Synthetic Enhancers
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 4/5
Presenting author: Yaron Orenstein, Tel-Aviv University, Israel
Additional authors:Ron Shamir, Tel-Aviv University, Israel
Session Chair: Debra Goldberg
Novel technologies can generate large sets of short double-stranded DNA sequences that can be used to measure their regulatory effects. Protein binding microarrays can measure in vitro the binding intensity of a protein to thousands of probes. Synthetic enhancer sequences inserted into an organism's genome allow us to measure in vivo the effect of such sequences on the phenotype. In both applications, using sequence probes that cover all k-mers yields a comprehensive picture of the effect of all possible short sequences on gene regulation. In practice, however, the usable value of k is severely limited by cost and space considerations. A key challenge is therefore to cover all k-mers with a minimal number of probes. The standard way to do this uses the de Bruijn sequence of length 4^k. However, since probes are double-stranded, when a k-mer is included in a probe, its reverse complement k-mer is accounted for as well. Here we show how to efficiently create a shortest possible sequence with the property that it contains each k-mer or its reverse complement, but not necessarily both. The length of the resulting sequence approaches half that of the de Bruijn sequence as k increases. By reducing the total sequence length, experimental limitations can be overcome; alternatively, additional sequences with redundant k-mers of interest can be added.
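The baseline and the target of the optimization can both be checked in a few lines. The sketch below builds the standard de Bruijn sequence with the classic Lyndon-word construction, tests whether a sequence covers every k-mer up to reverse complement, and counts k-mer classes once each k-mer is identified with its reverse complement (for odd k there are no reverse-complement palindromes, so k = 3 gives 32 classes versus 64 k-mers). The paper's algorithm for constructing a near-shortest covering sequence is not reproduced; function names are illustrative.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(_COMPLEMENT)[::-1]

def de_bruijn(k: int, alphabet: str = "ACGT") -> str:
    """Cyclic de Bruijn sequence of order k (length len(alphabet)**k),
    built by the classic Lyndon-word concatenation algorithm."""
    n = len(alphabet)
    a = [0] * (n * k + 1)
    seq = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)

def linear_de_bruijn(k: int) -> str:
    """Unroll the cycle so every k-mer appears in a linear probe sequence."""
    s = de_bruijn(k)
    return s + s[:k - 1]

def kmers(s: str, k: int) -> set:
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def covers_up_to_rc(s: str, k: int) -> bool:
    """True if every k-mer, or its reverse complement, occurs in s."""
    present = kmers(s, k)
    return all(m in present or revcomp(m) in present
               for m in ("".join(p) for p in product("ACGT", repeat=k)))

def n_canonical(k: int) -> int:
    """Number of k-mer classes once a k-mer and its reverse complement are identified."""
    return len({min(m, revcomp(m))
                for m in ("".join(p) for p in product("ACGT", repeat=k))})
```

The gap between `4**k` and `n_canonical(k)` is the room for savings the abstract describes; a sequence that covers every class needs at least `n_canonical(k) + k - 1` bases.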
Keyword: de Bruijn sequence, de Bruijn graph, protein binding microarray, oligo
TOP
PP65 (HT) - Visualizing and Mining Chemical-Biological Space
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 7
Presenting author: Stefan Kramer, Johannes Gutenberg University Mainz, Germany
Additional authors:Andreas Karwath, Johannes Gutenberg University Mainz, Germany
Madeleine Seeland, TU München, Germany
Martin Gütlein, University of Freiburg, Germany
Session Chair: Thomas Lengauer
It is generally agreed that a better understanding of chemical space and its bioactive compounds requires better tools for visualizing and mining structures and their associated activities. In this talk, I will present some progress towards this goal. In the first part, I will present the visualization tool CheS-Mapper (Chemical Space Mapping and Visualization in 3D), which arranges sets of chemical structures in 3D space such that spatially close structures share more properties than distant ones. In the second part, I will present new methods for predicting the bioactivities of compounds. These methods build upon a recently developed scheme that clusters chemical structures by common "scaffolds", i.e., by a single large substructure shared by all members of a cluster. Such structural clustering can substantially improve prediction performance, in particular on heterogeneous sets of structures.
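One generic way to obtain a 3D layout in which nearby structures are similar is classical multidimensional scaling on pairwise fingerprint distances. The sketch below uses Tanimoto distance on toy fingerprints represented as sets of "on" feature bits; CheS-Mapper's own feature computation and embedding options may differ, and the function names are hypothetical.

```python
import numpy as np

def tanimoto_distances(fingerprints):
    """Pairwise 1 - Tanimoto similarity between fingerprints given as sets of 'on' bits."""
    n = len(fingerprints)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = len(fingerprints[i] | fingerprints[j])
            sim = len(fingerprints[i] & fingerprints[j]) / union if union else 1.0
            d[i, j] = d[j, i] = 1.0 - sim
    return d

def classical_mds(d, dims=3):
    """Embed a distance matrix in `dims` dimensions (classical/Torgerson MDS)."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]          # indices of the `dims` largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

Identical fingerprints land on the same point, and structures sharing more bits end up closer together, which is the property the abstract asks of the 3D arrangement.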
Keyword: Applied Bioinformatics, Protein Structure & Function
TOP
PP66 (PT) - Learning Subgroup-Specific Regulatory Interactions and Regulator Independence with PARADIGM
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: Hall 14.2
Presenting author: Andrew J. Sedgewick, University of Pittsburgh, United States
Additional authors:Stephen Benz, Five3 Genomics, LLC
Shahrooz Rabizadeh, Chan Soon-Shiong Institute for Advanced Health
Patrick Soon-Shiong, Chan Soon-Shiong Institute for Advanced Health
Charles Vaske, Five3 Genomics, LLC
Session Chair: Lonnie Welch
High-dimensional “-omics” profiling provides a detailed molecular view of individual cancers; however, understanding the mechanisms by which tumors evade cellular defenses requires deep knowledge of the underlying cellular pathways within each cancer sample. We extended the PARADIGM algorithm (Vaske et al., 2010), a pathway analysis method for combining multiple “-omics” data types, to learn the strength and direction of 9139 gene and protein interactions curated from the literature. Using genomic and mRNA expression data from 1936 samples in The Cancer Genome Atlas (TCGA) cohort, we learned interactions that provided support for, and the relative strength of, 7138 (78%) of the curated links. Gene set enrichment analysis found that genes involved in the strongest interactions were significantly enriched for transcriptional regulation, apoptosis, cell cycle regulation, and response to tumor cells. Within the TCGA breast cancer cohort, we assessed differences in interaction strength between breast cancer subtypes and found interactions associated with the MYC pathway and the ER alpha network to be among the most differential between basal and luminal A subtypes. PARADIGM with the Naive Bayes assumption produced gene activity predictions that, when clustered, found groups of patients with better separation in survival than both the original version of PARADIGM and a version without the assumption. We found that this Naive Bayes assumption was valid for the vast majority of co-regulators, indicating that most co-regulators act independently on their shared target. Availability: http://paradigm.five3genomics.com
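What "co-regulators act independently on their shared target" means can be illustrated with a textbook naive Bayes combination: each regulator's evidence multiplies into the target's state distribution separately. This is not PARADIGM's factor-graph inference; the three target states, the shared likelihood table, and the function name are assumptions made for the example.

```python
def target_posterior(observed, prior, p_active):
    """Naive Bayes posterior over a target's state given its regulators.

    observed: list of booleans, one per regulator (True = regulator active)
    prior:    {target_state: P(state)}
    p_active: {target_state: P(regulator active | state)}, shared by all
              regulators in this toy version
    Regulators are assumed conditionally independent given the target state.
    """
    unnormalized = {}
    for state, p in prior.items():
        for active in observed:
            q = p_active[state]
            p *= q if active else 1.0 - q  # one independent factor per regulator
        unnormalized[state] = p
    z = sum(unnormalized.values())
    return {state: p / z for state, p in unnormalized.items()}
```

With two active regulators, a uniform prior over down/normal/up, and P(active | state) = 0.1, 0.5 and 0.9 respectively, the posterior favours "up" with probability 0.81 / 1.07.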
Keyword: Cancer, pathway, gene expression, copy number, probabilist
TOP
PP67 (HT) - A large‐scale evaluation of computational protein function prediction
Date: Tuesday, July 23, 11:30 a.m. - 11:55 a.m.
Room: ICC Lounge 81
Presenting author: Predrag Radivojac, Indiana University, United States
Session Chair: Janet Kelso
The presentation will first motivate the problem of predicting protein function and outline its challenges, covering both its biological significance and a precise computational formulation. We will then present details of the CAFA experiment as described in the paper (at a level appropriate for a highlight presentation), discuss the current state of the art in protein function prediction, and lay out possible avenues for improving both computational function prediction and its accuracy assessment. Finally, we will briefly discuss the next CAFA challenge, whose start will coincide with the ISMB 2013 conference.
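CAFA summarizes predictor accuracy with a protein-centric maximum F-measure (Fmax) over decision thresholds. The sketch below computes it on toy dictionaries; it assumes annotations have already been propagated to ancestor terms in the ontology (a step CAFA performs but which is omitted here), and the data layout is an assumption for illustration.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure over decision thresholds.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: set of experimentally annotated terms}
    Precision is averaged over proteins with at least one prediction at or
    above the threshold; recall is averaged over all proteins in `truth`.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over proteins that receive predictions keeps a cautious predictor from being penalized in precision for proteins it declines to annotate; those proteins still lower its recall.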
Keyword: Protein Structure & Function
TOP
PP68 (HT) - Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 4/5
Presenting author: Denisa Duma, University of California Riverside, United States
Additional authors:Stefano Lonardi, University of California Riverside, United States
Matthew Alpert, University of California Riverside, United States
Gianfranco Ciardo, University of California Riverside, United States
Timothy J. Close, University of California Riverside, United States
Steve Wanamaker, University of California Riverside, United States
Yaqin Ma, University of California Riverside, United States
Ming-Cheng Luo, University of California Davis, United States
Yonghui Wu, University of California Riverside, United States
Francesca Cordero, University of Torino, Italy
Marco Beccuti, University of Torino, Italy
Serdar Bozdag, Marquette University, United States
Prasanna R. Bhat, University of California Riverside, United States
Burair Alsaihati, University of California Riverside, United States
Josh Resnik, University of California Riverside, United States
Session Chair: Debra Goldberg
The problem of obtaining the full genomic sequence of an organism has been addressed either by a global brute-force approach (whole-genome shotgun, WGS) or by a divide-and-conquer strategy (clone-by-clone). While the advent of NGS instruments made WGS the preferred choice, the clone-by-clone strategy is still relevant, especially for large complex genomes for which clone libraries and physical maps are available. In this paper, we demonstrate the feasibility of the clone-by-clone approach on the gene space of a large, highly repetitive plant genome. The novelty of our approach lies in exploiting the high throughput of NGS instruments by pooling together hundreds of clones using a special type of combinatorial pooling design and a companion decoding algorithm. Our method allows accurate determination of the source clone(s) of each sequenced read. I will present extensive simulations and experimental results on the genomes of rice and barley, as well as new developments on decoding algorithms using Compressive Sensing ideas.
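The principle behind decoding a read's source clone can be sketched with a toy constant-weight design: each clone goes into a distinct set of pools, and a read is attributed to the clones whose entire pool signature lies within the pools where the read was observed. The paper uses its own specific pooling design and decoder, which are not reproduced here; `make_design` and `decode` are hypothetical names.

```python
from itertools import combinations

def make_design(n_clones, n_pools, weight):
    """Assign each clone a distinct set of `weight` pools (a toy constant-weight design)."""
    signatures = list(combinations(range(n_pools), weight))
    if n_clones > len(signatures):
        raise ValueError("not enough distinct pool signatures")
    return [frozenset(s) for s in signatures[:n_clones]]

def decode(observed_pools, design):
    """Candidate source clones: those whose whole signature lies in the observed pools."""
    observed = set(observed_pools)
    return [clone for clone, sig in enumerate(design) if sig <= observed]
```

A read unique to one clone decodes to exactly that clone; a read from a sequence shared by several clones lights up the union of their pools and may admit extra candidates, which is why real designs and decoders must be chosen with the genome's repeat structure in mind.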
Keyword: Sequence Analysis, Applied Bioinformatics
TOP
PP69 (HT) - Designing with the user in mind: how UCD can work for bioinformatics
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 7
Presenting author: Jennifer Cham, European Bioinformatics Institute, United Kingdom
Additional authors:Katrina Pavelin, European Bioinformatics Institute, United Kingdom
Paula de Matos, European Bioinformatics Institute, United Kingdom
Cath Brooksbank, European Bioinformatics Institute, United Kingdom
Graham Cameron, European Bioinformatics Institute, United Kingdom
Hong Cao, European Bioinformatics Institute, United Kingdom
Rafael Alcantara, European Bioinformatics Institute, United Kingdom
Francis Rowland, European Bioinformatics Institute, United Kingdom
Brendan Vaughan, European Bioinformatics Institute, United Kingdom
Silvano Squizzato, European Bioinformatics Institute, United Kingdom
Youngmi Park, European Bioinformatics Institute, United Kingdom
Rodrigo Lopez, European Bioinformatics Institute, United Kingdom
Christoph Steinbeck, European Bioinformatics Institute, United Kingdom
Session Chair: Thomas Lengauer
It is recognised that bioinformatics resources often suffer from usability problems: for example, they can be too complex for the infrequent user to navigate, and they can “lack sophistication” compared to other websites that people use in their daily lives. In this presentation, Dr. Jenny Cham, User-Experience Analyst at the European Bioinformatics Institute, UK, will describe specific case studies to show how user-centred design (UCD) principles can be applied to bioinformatics services.
As well as improved usability, the benefits of UCD can include more effective decision-making for design ideas and technologies during development; enhanced team-working and communication; cost effectiveness; and ultimately a bioinformatics service that more closely meets the needs of its target research community.
Keyword: other
TOP
PP70 (PT) - Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: Hall 14.2
Presenting author: Nicola Bonzanni, VU University Amsterdam, Netherlands
Additional authors:Abhishek Garg, Swiss Institute of Bioinformatics, Switzerland
K. Anton Feenstra, VU University Amsterdam, Netherlands
Judith Schütte, University of Cambridge, United Kingdom
Sarah Kinston, University of Cambridge
Diego Miranda-Saavedra, University of Cambridge
Jaap Heringa, VU University Amsterdam / Netherlands Bioinformatics Centre
Ioannis Xenarios, Swiss Institute of Bioinformatics
Berthold Göttgens, University of Cambridge, United Kingdom
Session Chair: Lonnie Welch
Motivation:
Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements therefore has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes.
Results:
Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as ‘stepping stones’ for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or ‘trigger’ is required to exit the stem cell state, with distinct triggers characterising maturation into the different lineages. By focussing on intermediate states occurring during erythrocyte differentiation, we used the model to predict a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally, thus validating our model.
In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells.
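The state-space analysis described above (strongly connected components and shortest paths over Boolean states) can be sketched on a toy network. The two-gene mutual-inhibition rules used in the usage example are hypothetical and are not the paper's 11-gene model, and asynchronous updating (one gene changes per step) is an assumption rather than the paper's stated update scheme.

```python
from collections import deque
from itertools import product

def async_state_graph(rules):
    """Asynchronous state-transition graph: from each state, update one gene at a time."""
    graph = {}
    for state in product((0, 1), repeat=len(rules)):
        successors = set()
        for i, rule in enumerate(rules):
            new = int(bool(rule(state)))
            if new != state[i]:
                successors.add(state[:i] + (new,) + state[i + 1:])
        graph[state] = successors
    return graph

def strongly_connected_components(graph):
    """Tarjan's algorithm (recursive; adequate for small toy networks)."""
    index, low, stack, on_stack, comps = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(frozenset(comp))

    for v in graph:
        if v not in index:
            visit(v)
    return comps

def attractors(graph):
    """Terminal components: no transition leaves them."""
    return [c for c in strongly_connected_components(graph)
            if all(w in c for v in c for w in graph[v])]

def shortest_path(graph, src, dst):
    """Breadth-first shortest path between two states, or None if unreachable."""
    prev, queue = {src: None}, deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = prev[v]
            return path[::-1]
        for w in graph[v]:
            if w not in prev:
                prev[w] = v
                queue.append(w)
    return None
```

For the toggle switch `[lambda s: not s[1], lambda s: not s[0]]`, the two terminal components are the single stable states (0, 1) and (1, 0), and no path leads from one to the other. In miniature, this mirrors the abstract's point that leaving a stable state requires an external trigger.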
Keyword: Haematopoietic stem cell, cis-regulatory elements, transcription factor netw
TOP
PP71 (HT) - Comparative proteomics reveals a significant bias toward alternative protein isoforms with conserved structure and function
Date: Tuesday, July 23, 12:00 p.m. - 12:25 p.m.
Room: ICC Lounge 81
Presenting author: Michael Liam Tress, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Additional authors:Iakes Ezkurdia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Angela del Pozo, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jose Manuel Rodriguez, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Alfonso Valencia, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Jennifer Harrow, Wellcome Trust Sanger Centre, United Kingdom
Adam Frankish, Wellcome Trust Sanger Centre, United Kingdom
Keith Ashman, Centro Nacional de Investigaciones Oncologicas (CNIO), Spain
Session Chair: Janet Kelso
As part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases we provide a detailed overview of the population of alternatively spliced protein isoforms detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered.
Alternative isoforms generated from interchangeable homologous exons and from short indels were significantly enriched, both in human experiments and parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts.
The evidence of a strong bias towards subtle differences in coding sequence and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints.
Keyword: Mass Spectrometry & Proteomics, Evolution & Comparative Genomics
TOP
PP72 (HT) - Genetic variants in the next generation: detection, reprioritizing and function annotation
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 4/5
Presenting author: Junwen Wang, The University of Hong Kong, China
Additional authors:Feng Xu, The University of Hong Kong, China
Mulin Li, The University of Hong Kong, China
Weixin Wang, The University of Hong Kong, China
Pak Sham, The University of Hong Kong, China
Panwen Wang, The University of Hong Kong, China
Session Chair: Reinhard Schneider
In this talk, I will first introduce FaSD, a fast and accurate genetic variant detection program that we recently developed for NGS data [1]. We assessed this program against several state-of-the-art programs on normal and cancer NGS data and found that FaSD is a fast and highly accurate SNP detection method, particularly when sequencing depth is low.
Next, I will introduce GWASdb, a database we manually curated to catalog the genetic variants (GVs) discovered by GWAS and WGS [2]. In addition, we developed GWASrap, a tool that can re-prioritize genetic variants by combining the GWAS statistical value with a variant prioritization score based on the additive effect principle [3]. Our evaluations demonstrated that this prioritization method is very effective in selecting disease susceptibility regions.
In summary, our algorithm, database and tools will greatly facilitate NGS studies and benefit the scientific community in general.
Keyword: Databases & Ontologies, Sequence Analysis
TOP
PP73 (HT) - Metagenomic inference and biomarker discovery for the gut microbiome in inflammatory bowel disease
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 7
Presenting author: Timothy Tickle, Harvard School of Public Health, United States
Additional authors:Xochitl Morgan, Harvard School of Public Health, United States
Harry Sokol, University of Paris, France
Dirk Gevers, Broad Institute, United States
Kathryn Devaney, Massachusetts General Hospital, United States
Doyle Ward, Broad Institute, United States
Joshua Reyes, Harvard School of Public Health, United States
Samir Shah, Brown University, United States
Neal LeLeiko, Brown University, United States
Scott Snapper, Children's Hospital and Brigham and Women's Hospital, United States
Athos Bousvaros, Children's Hospital and Brigham and Women's Hospital, United States
Joshua Korzenik, Children's Hospital and Brigham and Women's Hospital, United States
Bruce Sands, Mount Sinai School of Medicine, United States
Ramnik Xavier, Massachusetts General Hospital, United States
Curtis Huttenhower, Harvard School of Public Health, United States
Session Chair: Alfonso Valencia
The inflammatory bowel diseases have been consistently linked to dysbiosis in the gut microbiota. This microbial dysfunction has not been fully characterized, however, due to a lack of methods for assessing community functional activity and statistically associating it with disease. In this study, "virtual" metagenomes were inferred using 16S rRNA gene sequencing of 231 biopsies and stool samples. This incorporated analysis of 1,119 microbial genomes and was validated by shotgun metagenomics. A multivariate approach linking microbiome shifts to disease, treatment, or environment recovered dysbioses in ~2% of microbial clades, including depletion of Clades IV and XIVa Clostridia and enrichment of Enterobacteriaceae. However, microbial functional activity was more consistently disrupted in disease, with 12% of pathways associated with IBD. These included decreases in short-chain fatty acid production, oxidative stress, and shifts from amino acid biosynthesis towards transport. These results provide initial methods for assessing biomolecular functions corresponding to changes in microbial community ecology.
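One common way to infer a "virtual" metagenome from 16S data is to correct each taxon's read count for its 16S copy number and then sum the gene-family content of reference genomes, weighted by abundance. The sketch below shows that arithmetic on toy dictionaries; it is a PICRUSt-style simplification assumed for illustration, not necessarily the study's exact procedure, and the taxon and gene-family labels are hypothetical.

```python
def infer_metagenome(otu_counts, copies_16s, gene_content):
    """Predict gene-family abundances from 16S counts.

    otu_counts:   {taxon: 16S read count}
    copies_16s:   {taxon: 16S gene copies per genome}
    gene_content: {taxon: {gene_family: copies per genome}} from reference genomes
    """
    # Correct for 16S copy number to estimate genome (cell) abundance per taxon.
    genomes = {t: c / copies_16s[t] for t, c in otu_counts.items()}
    profile = {}
    for taxon, n_genomes in genomes.items():
        for gene, copies in gene_content.get(taxon, {}).items():
            profile[gene] = profile.get(gene, 0.0) + n_genomes * copies
    return profile
```

Without the copy-number correction, taxa with many 16S operons would be overrepresented in the inferred functional profile.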
Keyword: Disease Models & Epidemiology, Applied Bioinformatics
TOP
PP74 (HT) - Interplay of microRNAs, transcription factors and target genes: linking dynamic expression changes to function
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: Hall 14.2
Presenting author: Petr Nazarov, Centre de Recherche Public de la Sante, Luxembourg
Additional authors:Susanne Reinsbach, University of Luxembourg, Luxembourg
Arnaud Muller, Centre de Recherche Public de la Sante, Luxembourg
Nathalie Nicot, Centre de Recherche Public de la Sante, Luxembourg
Demetra Philippidou, University of Luxembourg, Luxembourg
Laurent Vallar, Centre de Recherche Public de la Sante, Luxembourg
Stephanie Kreis, University of Luxembourg, Luxembourg
Session Chair: Ralf Zimmer
MicroRNAs (miRNAs), small non-coding RNAs that negatively regulate gene expression at the post-transcriptional level, are involved in fine-tuning fundamental cellular processes and are believed to confer robustness to biological responses. Using microarray data, we simultaneously investigated changes in miRNA and mRNA expression over time after activation of the Jak/STAT pathway by IFN-γ stimulation of melanoma cells. We observed that miRNAs responded later (after 24-48 h) than mRNAs (12-24 h) and identified the biological functions involved at each step of the cellular response. Inferring upstream regulators allowed us to identify transcriptional regulators involved in the cellular reaction to IFN-γ stimulation. By linking expression profiles of transcriptional regulators and miRNAs with their annotated functions, we demonstrate the dynamic interplay of miRNAs and upstream regulators with biological functions. Finally, our data revealed network motifs in the form of feed-forward loops involving transcriptional regulators, mRNAs and miRNAs.
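A feed-forward loop is a three-node motif in which a regulator X controls Y, and both X and Y control a common target Z (for example, a transcription factor regulating both a miRNA and that miRNA's target mRNA). The sketch below enumerates such loops in a directed edge list; the node names in the usage example are hypothetical and the code ignores edge signs.

```python
def feed_forward_loops(edges):
    """Return all (x, y, z) with edges x->y, x->z and y->z among three distinct nodes."""
    successors = {}
    for regulator, target in edges:
        successors.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in successors.items():
        for y in x_targets:
            for z in successors.get(y, ()):
                if z in x_targets and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)
```

For the toy edges TF1 -> miR-A, TF1 -> geneX, miR-A -> geneX and TF1 -> geneY, the only loop is (TF1, miR-A, geneX); adding signs (activation versus repression) would further classify loops as coherent or incoherent.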
Keyword: Gene Regulation & Transcriptomics
TOP
PP75 (HT) - Systematic Computational Drug Repositioning
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.
Room: ICC Lounge 81
Presenting author: Philippe Sanseau, GlaxoSmithKline, United Kingdom
Additional authors:Mark Hurle, GlaxoSmithKline, United States
Brent Richards, McGill University, Canada
Lon Cardon, GlaxoSmithKline, United States
Pankaj Agarwal, GlaxoSmithKline, United States
Session Chair: Donna Slonim
Systematic drug repositioning is perhaps one of the best ways for computational biology to show clear translational value in the pharmaceutical and biotech industry. Bioinformatics methods that use genome-wide association studies (GWAS), side-effect data and connectivity map data are proving valuable. We built a computational pipeline to examine the relationship between the disease indications of drugs and genetic findings such as GWAS traits. When a drug's indication differed from the GWAS disease trait, we hypothesized that the drug could potentially be repositioned. We identified almost 100 GWAS genes with at least one associated drug that suggest potential drug repositioning opportunities, and further investigations provided additional evidence for some of these opportunities. We will also show recent developments in connectivity map and side-effect methods for rapidly repositioning drugs and ultimately benefiting patients.
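The core comparison in such a pipeline can be sketched as a join: link each drug to the GWAS traits of the genes it targets, and flag a candidate whenever a trait is not among the drug's current indications. The data structures and names below are hypothetical, and exact string matching of disease names stands in for the disease-term mapping a real pipeline would need.

```python
def repositioning_candidates(gwas, drug_targets, indications):
    """List (drug, gene, trait) triples where a GWAS trait of a drug target
    is not among the drug's current indications.

    gwas:         {gene: set of GWAS traits}
    drug_targets: {drug: set of target genes}
    indications:  {drug: set of indicated diseases}
    """
    hits = []
    for drug, genes in drug_targets.items():
        for gene in genes:
            for trait in gwas.get(gene, ()):
                if trait not in indications.get(drug, set()):
                    hits.append((drug, gene, trait))
    return sorted(hits)
```

When trait and indication agree, the pair instead serves as a positive control that genetics and current use point the same way.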
Keyword: Applied Bioinformatics, Disease Models & Epidemiology
TOP
PP76 (PT) - Information-theoretic evaluation of predicted ontological annotations
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 4/5
Presenting author: Wyatt Clark, Indiana University, United States
Additional authors:Predrag Radivojac, Indiana University, United States
Session Chair: Reinhard Schneider
The development of effective methods for the prediction of ontological annotations is an important goal in computational biology, with protein function prediction and disease gene prioritization gaining wide recognition. While various algorithms have been proposed for these tasks, evaluating their performance is difficult due to problems caused both by the structure of biomedical ontologies and biased or incomplete experimental annotations of genes and gene products. In this work, we propose an information-theoretic framework to evaluate the performance of computational protein function prediction. We use a Bayesian network, structured according to the underlying ontology, to model the prior probability of a protein's function. We then define two concepts, misinformation and remaining uncertainty, that can be seen as information-theoretic analogs of precision and recall. Finally, we propose a single statistic, referred to as semantic distance, that can be used to rank or train classification models. We evaluate our approach by analyzing the performance of three protein function predictors of Gene Ontology terms and provide evidence that we address several weaknesses of currently used metrics. We believe this framework provides useful insights into the performance of protein function prediction tools.
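Following our reading of this framework, each ontology term v carries an information accretion ia(v) = -log P(v | parents of v) under the Bayesian-network prior. Remaining uncertainty (ru) sums ia over true terms that were not predicted, misinformation (mi) sums ia over predicted terms that are not true, and the semantic distance combines them as (ru^k + mi^k)^(1/k). The sketch takes ia values as given (toy numbers in the usage example) and scores a single protein; the exact definitions and any averaging over proteins should be checked against the paper.

```python
def remaining_uncertainty_and_misinformation(true_terms, predicted_terms, ia):
    """Score one protein's prediction.

    ia: {term: information accretion}, i.e. -log2 P(term | its parents) under
        a Bayesian-network prior over the ontology (values supplied by the caller).
    Both term sets are assumed to be closed under the ontology's ancestors.
    """
    ru = sum(ia[t] for t in true_terms - predicted_terms)  # information missed
    mi = sum(ia[t] for t in predicted_terms - true_terms)  # information wrongly added
    return ru, mi

def semantic_distance(ru, mi, k=2):
    """Combine ru and mi into a single score (smaller is better)."""
    return (ru ** k + mi ** k) ** (1.0 / k)
```

Because a deep, specific term carries more information than a generic one, missing or wrongly adding it costs more, which is the weakness of plain precision and recall that the abstract targets.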
Keyword: Gene Ontology, Bayesian Network, Information Content, Protein Fun
TOP
PP77 (PT) - CAMPways: Constrained Alignment Framework for the Comparative Analysis of a Pair of Metabolic Pathways
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 7
Presenting author: Cesim Erten, Kadir Has University
Additional authors:Turker Biyikoglu, Izmir Institute of Technology
Gamze Abaka, Kadir Has University, Turkey
Session Chair: Alfonso Valencia
Given a pair of metabolic pathways, an alignment of the pathways corresponds to a mapping between similar substructures of the pair. Successful alignments may be useful in phylogenetic tree reconstruction and drug design, and more generally may enhance our understanding of cellular metabolism. We consider the problem of providing one-to-many alignments of reactions in a pair of metabolic pathways. We first provide a constrained alignment framework applicable to the problem. We show that the constrained alignment problem is computationally intractable even in a very primitive setting, which justifies designing efficient heuristics. We present our Constrained Alignment of Metabolic Pathways (CAMPWays) algorithm, designed for this purpose. Through extensive experiments involving a large pathway database, we demonstrate that, compared to a state-of-the-art alternative, the CAMPWays algorithm provides better alignment results on metabolic networks as measured by same-pathway inclusion. The execution speed of our algorithm constitutes another important improvement over alternative algorithms.
Keyword: Metabolic pathways, Network alignment, Graph matching, Algorithms
TOP
PP78 (PT) - Integrating sequence, expression and interaction data to determine condition-specific miRNA regulation
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.
Room: Hall 14.2
Presenting author: Hai-Son Le, Carnegie Mellon, United States
Additional authors:Ziv Bar-Joseph, Carnegie Mellon
Session Chair: Ralf Zimmer
Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression post-transcriptionally. MiRNAs have been shown to play an important role in development and disease, and accurately determining the networks they regulate in a specific condition is of great interest. Early work on miRNA target prediction focused on static sequence information. More recently, researchers have combined sequence and expression data to identify such targets in various conditions.
Results: Here we propose a regression-based probabilistic method that integrates sequence, expression and interaction data to identify modules of mRNAs controlled by small sets of miRNAs. We formulate an optimization problem and develop a learning framework to determine module regulation and membership. Applying our method to cancer data, we show that by adding protein interaction data and modeling combinatorial regulation, our method can accurately identify both miRNAs and their targets, improving upon prior methods. We next used our method to jointly analyze several types of cancer and identified both common and cancer-type-specific miRNA regulators.
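The regression component can be illustrated for a single target: model the mRNA's expression across samples as a baseline plus a linear effect of each miRNA, letting only miRNAs with a sequence-predicted site enter the model. This is a simplified, single-gene sketch under those assumptions; the paper's method additionally learns modules, uses protein interaction data, and is probabilistic, none of which is reproduced here.

```python
import numpy as np

def fit_mirna_effects(mrna, mirna, allowed):
    """Sequence-constrained least-squares fit for one target mRNA.

    mrna:    (samples,) expression of the target across samples
    mirna:   (samples, m) expression of m candidate miRNAs
    allowed: boolean mask (m,) marking miRNAs with a predicted binding site
    Returns (baseline, effects); effects are 0 for disallowed miRNAs, and
    negative values indicate repression.
    """
    design = np.column_stack([np.ones(len(mrna)), mirna[:, allowed]])
    coef, *_ = np.linalg.lstsq(design, mrna, rcond=None)
    effects = np.zeros(mirna.shape[1])
    effects[allowed] = coef[1:]
    return coef[0], effects
```

Restricting the regressors to sequence-supported miRNAs is what ties the expression fit back to the static target predictions the motivation mentions.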
Keyword: microRNA, gene regulation, transcriptomics, regulatory netwo
PP79 (PT) - Phylogenetic analysis of multiprobe fluorescence in situ hybridization data from tumor cell populations
Date: Tuesday, July 23, 2:40 p.m. - 3:05 p.m.Room: ICC Lounge 81
Presenting author: Russell Schwartz, Carnegie Mellon University, United States
Additional authors:Stanley Shackney, Intelligent Oncotherapeutics
Kerstin Heselmeyer-Haddad, National Institutes of Health
Thomas Ried, National Institutes of Health
Alejandro Schäffer, National Institutes of Health
Salim Akhter Chowdhury, Carnegie Mellon University, United States
Session Chair: Donna Slonim
Motivation: The development and progression of solid tumors can be attributed to a process of mutations, which typically includes changes in the number of copies of genes or genomic regions. Although comparisons of cells within single tumors show extensive heterogeneity, recurring features of their evolutionary process may be discerned by comparing multiple regions or cells of a tumor. A particularly useful source of data for studying the likely progression of individual tumors is fluorescence in situ hybridization (FISH), which allows one to count copy numbers of several genes in hundreds of single cells. However, novel algorithms for interpreting such data phylogenetically are needed to reconstruct likely evolutionary trajectories from the states of single cells and to facilitate their analysis.
Results: In this paper, we develop phylogenetic methods to infer likely models of tumor progression using FISH copy number data and apply them to a study of FISH data from two cancer types. Statistical analyses of topological characteristics of the tree-based model provide insights into likely tumor progression pathways consistent with the prior literature. Furthermore, tree statistics from the resulting phylogenies can be used as features for prediction methods. This results in improved accuracy, relative to unstructured gene copy number data, at predicting tumor state and future metastasis.
Availability: A package of source code for FISH tree building (FISHtrees) and the data on cervical cancer and breast cancer examined here are publicly available at the site ftp://ftp.ncbi.nlm.nih.gov/pub/FISHtrees.
Keyword: dose-response analysis, antioxidant mechanisms of interactions, simulation
PP80 (HT) - Turning networks into ontologies of gene function
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: Hall 4/5
Presenting author: Janusz Dutkowski, University of California, San Diego, United States
Additional authors:Michael Kramer, University of California San Diego, United States
Michal Surma, Max Planck Institute, Germany
Rama Balakrishnan, Stanford University, United States
J. Michael Cherry, Stanford University, United States
Nevan Krogan, University of California, San Francisco, United States
Trey Ideker, University of California San Diego, United States
Session Chair: Reinhard Schneider
Ontologies are of key importance to many domains of biological research. The Gene Ontology (GO), in particular, has proven instrumental in unifying knowledge about biological processes, cellular components, and molecular functions through a hierarchy of concepts and their interrelationships. However, given only partial biological knowledge and inconsistency in how this knowledge is curated, it has been difficult to construct, extend and validate GO in an unbiased manner. To address this problem we have recently developed a new computational system that infers ontological representations automatically from large-scale maps of gene and protein interactions. The result is a network-extracted ontology (NeXO), which contains 4,123 biological concepts and 5,766 hierarchical concept relations, capturing the majority of known cellular components and identifying approximately 600 new components and relationships. As we show, many new components can be validated using a combination of experimental and bioinformatic approaches, and used directly to update the Gene Ontology structure.
Keyword: Protein Interactions & Molecular Networks, Applied Bioinformatics
PP81 (HT) - Along Signal Paths: Connecting Pathway Annotation to Topological Analyses
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: Hall 7
Presenting author: Gabriele Sales, Università di Padova, Italy
Additional authors:Paolo Martini, Università di Padova, Italy
Enrica Calura, Università di Padova, Italy
Chiara Romualdi, Università di Padova, Italy
Session Chair: Alfonso Valencia
Gene expression analysis increasingly relies on information about pathway topology to enhance result interpretation. However, the connection between pathway annotation and analysis remains limited. Pathway representation formats have grown richer, but at the same time they have gained a great deal of complexity that offers no direct advantage to data modelling. As a result, most analysis methods completely discard the information about topology and instead focus on simple gene lists.
Our recent efforts have been directed at filling this gap between annotation and analysis. We developed a new computational platform that exploits both the richness of the latest pathway data formats (such as BioPAX Level 3) and the sensitivity of topological analyses.
Our software converts topological information into gene networks. From these networks, it can dissect the complexity of a pathway, identifying the portions associated with a biological process and providing easy visualization, access and interpretation of expression data.
Keyword: Protein Interactions & Molecular Networks, Gene Regulation & Transcriptomics
PP82 (PT) - The RNA Newton Polytope and Learnability of Energy Parameters
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: Hall 14.2
Presenting author: Hamidreza Chitsaz, Wayne State University, United States
Additional authors:Elmirasadat Forouzmand, Wayne State University, United States
Session Chair: Ralf Zimmer
Motivation: Computational RNA structure prediction is a mature and important problem that has received a new wave of attention with the discovery of regulatory non-coding RNAs and the advent of high-throughput transcriptome sequencing. Despite nearly four decades of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of state-of-the-art algorithms is still far from satisfactory. So far, researchers have proposed increasingly complex energy models and improved parameter estimation methods, experimental and/or computational, in anticipation of endowing their methods with enough power to solve the problem. Disappointingly, the result has been only modest improvements that do not match expectations. Even recent massively featured machine learning approaches were not able to break the barrier. Why is that?
Approach: The first step towards high-accuracy structure prediction is to pick an energy model that is inherently capable of predicting every known structure to date. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of such an inherent capability. We say that the parameters of an energy model are learnable if and only if there exists at least one set of such parameters that renders every known RNA structure to date the minimum free energy structure. We derive a necessary condition for learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in energy parameters. To the best of our knowledge, this is the first approach towards computing the RNA Newton polytope and towards a systematic assessment of the inherent capabilities of an energy model. The worst-case complexity of our algorithm is exponential in the number of features. However, one could employ dimensionality reduction techniques to avoid the curse of dimensionality.
Results: We demonstrate the application of our theory on a simple energy model consisting of a weighted count of A-U, C-G and G-U base pairs. Our results show that this simple energy model satisfies the necessary condition for more than half (55%) of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database and severely violates the condition for about 13%, which provides a set of hard cases that require further investigation. Of 1350 RNA strands, the observed three-dimensional feature vector for 749 strands lies on the surface of the computed polytope. For 289 RNA strands, the observed feature vector is not on the boundary of the polytope, but its distance from the boundary is at most one. A distance of one essentially means a one-base-pair difference between the observed structure and the closest point on the boundary of the polytope, which need not be the feature vector of a structure. For 171 sequences, this distance is larger than 2, and for only 11 sequences is it larger than 5.
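To make the feature-vector view concrete, here is a minimal Python sketch under stated assumptions. It uses the three-feature model from the Results (counts of A-U, C-G and G-U pairs), so that the energy of a structure s is a linear function w·f(s) of its feature vector f(s), and it enumerates the distinct feature vectors of all non-pseudoknotted structures of a short sequence by a brute-force interval recursion. The minimum hairpin size of 3 and the helper names `feature_vector` and `all_feature_vectors` are illustrative assumptions; this is not the authors' polytope algorithm, which computes the convex hull directly. The RNA Newton polytope is the convex hull of the returned points, and the necessary condition for learnability asks whether the observed structure's vector lies on its boundary.

```python
from functools import lru_cache

# Feature index for each allowed base pair: 0 = A-U, 1 = C-G, 2 = G-U.
PAIR_TYPES = {("A", "U"): 0, ("U", "A"): 0,
              ("C", "G"): 1, ("G", "C"): 1,
              ("G", "U"): 2, ("U", "G"): 2}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def feature_vector(seq, db):
    """Count (A-U, C-G, G-U) pairs in the dot-bracket structure db of seq."""
    counts = [0, 0, 0]
    stack = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            counts[PAIR_TYPES[(seq[i], seq[j])]] += 1
    return tuple(counts)


def all_feature_vectors(seq):
    """Distinct feature vectors over all non-pseudoknotted structures of seq."""

    @lru_cache(maxsize=None)
    def vecs(i, j):
        # Vectors for the subsequence seq[i..j] (inclusive); empty span -> zero vector.
        if i > j:
            return frozenset({(0, 0, 0)})
        out = set(vecs(i + 1, j))  # case 1: position i is unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):  # case 2: i pairs with k
            t = PAIR_TYPES.get((seq[i], seq[k]))
            if t is None:
                continue
            for inner in vecs(i + 1, k - 1):
                for rest in vecs(k + 1, j):
                    v = [a + b for a, b in zip(inner, rest)]
                    v[t] += 1
                    out.add(tuple(v))
        return frozenset(out)

    return set(vecs(0, len(seq) - 1))


print(feature_vector("GGGAAACCC", "(((...)))"))  # (0, 3, 0)
print(sorted(all_feature_vectors("GGGAAACCC")))  # [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
```

For this toy sequence all points lie on a line, so the polytope is a segment and the observed vector (0, 3, 0) is one of its endpoints. For real sequences the points fill a three-dimensional region, and the brute-force recursion above is only practical for short inputs.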
Keyword: RNA structure prediction, Energy parameter estimation, Computational algebra
PP83 (PT) - Automated target segmentation and fast alignment methods for high-throughput classification and averaging of crowded cryo-electron subtomograms
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: ICC Lounge 81
Presenting author: Min Xu, University of Southern California, United States
Additional authors:Frank Alber, University of Southern California
Session Chair: Donna Slonim
Motivation: Cryo-electron tomography allows the imaging of macromolecular complexes in near-living conditions. To enhance the nominal resolution of a structure, it is necessary to align and average individual subtomograms, each containing identical complexes. However, if the sample of complexes is heterogeneous, the subtomograms must first be classified into groups of identical complexes. This task becomes challenging when tomograms contain mixtures of unknown complexes extracted from a crowded environment. Two main challenges must be overcome. First, classification of subtomograms must be performed without knowledge of template structures; however, most alignment methods are too slow to perform reference-free classification of a large number (e.g. tens of thousands) of subtomograms. Second, subtomograms extracted from crowded cellular environments often contain fragments of other structures besides the target complex, whereas alignment methods generally assume that each subtomogram contains only one complex. Automatic methods are needed to identify the target complex in a subtomogram even when its shape is unknown.
Results: In this paper, we propose an automatic and systematic method for the isolation and masking of target complexes in subtomograms extracted from crowded environments. We also propose a fast alignment method using fast rotational matching in real space. Our experiments show that, compared to our previously proposed fast alignment method in reciprocal space, our new method significantly improves the alignment accuracy for highly distorted and especially crowded subtomograms. Such improvements are important for achieving successful and unbiased high-throughput reference-free structural classification of complexes inside whole-cell tomograms.
Keyword: Cryo-electron tomography, Subtomogram alignment, Subtomogram classification
PP84 (PT) - A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: Hall 4/5
Presenting author: Sophia Ananiadou, The University of Manchester
Additional authors:Tomoko Ohta, The University of Manchester
Rafal Rak, The University of Manchester
Andrew Rowley, The University of Manchester
Douglas B. Kell, The University of Manchester
Sampo Pyysalo, The University of Manchester
Makoto Miwa, The University of Manchester, United Kingdom
Session Chair: Reinhard Schneider
Motivation: In order to create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge.
Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text-mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches.
Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The successful query extraction and ranking methods are used to update our existing pathway search system, PathText.
Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/.
Contact: makoto.miwa@manchester.ac.uk
Keyword: Text mining, Pathway, Ranking
PP85 (PT) - A Context-Sensitive Framework for the Analysis of Human Signalling Pathways in Molecular Interaction Networks
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: Hall 7
Presenting author: Alex Lan, Ben Gurion University, Israel
Additional authors:Michal Ziv-Ukelson, Ben Gurion University of the Negev, Israel
Esti Yeger-Lotem, Ben Gurion University, Israel
Session Chair: Alfonso Valencia
A major challenge in systems biology is to reveal the cellular pathways that give rise to specific phenotypes and behaviours. Current techniques often rely on a network representation of molecular interactions, where each node represents a protein or a gene and each interaction is assigned a single static score. However, the use of single interaction scores fails to capture the tendency of proteins to favour different partners under distinct cellular conditions. Here we propose a novel context-sensitive network model, in which gene and protein nodes are assigned multiple contexts based on their Gene Ontology annotations, and their interactions are associated with multiple context-sensitive scores. Using this model, we developed a new approach and a corresponding tool, ContextNet, based on a dynamic programming algorithm for identifying signalling paths linking proteins to their downstream target genes. ContextNet finds high-ranking context-sensitive paths in the interactome, thereby revealing the intermediate proteins in each path and their path-specific contexts. We validated the model using 18,348 manually curated cellular paths derived from the SPIKE database. We next applied our framework to elucidate the responses of human primary lung cells to influenza infection. Top-ranking paths were much more likely to contain infection-related proteins, and this likelihood was highly correlated with path score. Moreover, the contexts assigned by the algorithm pointed to putative as well as previously known responses to viral infection. Thus, context sensitivity is an important extension to current network biology models and can be used efficiently to elucidate cellular response mechanisms.
ContextNet is publicly available at http://netbio.bgu.ac.il/ContextNet.
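As a hedged illustration of the kind of dynamic program described above, the Python sketch below finds a best-scoring walk of at most `max_len` interactions from a source protein to a target gene when each interaction carries a separate score per context. Everything beyond that idea is a hypothetical modelling choice, not ContextNet's actual scoring: scores are additive (for example log-likelihoods), each edge on the walk is labelled with one context, a fixed `switch_penalty` is charged whenever consecutive edges change context, and repeated nodes are not excluded. The function name and the context labels in the example are illustrative.

```python
from math import inf


def best_context_path(edges, source, target, max_len, switch_penalty=1.0):
    """Viterbi-style DP over (node, context) states.

    edges: dict mapping (u, v) -> {context: score}; higher scores are better.
    Returns (score, [(u, v, context), ...]) for the best walk from source to
    target using at most max_len edges, or (-inf, None) if none exists.
    """
    # best[(node, ctx)] = (score, walk) over walks of the current length that
    # end at node and whose last edge was labelled ctx.
    best = {}
    for (u, v), ctx_scores in edges.items():
        if u == source:
            for c, s in ctx_scores.items():
                if s > best.get((v, c), (-inf, None))[0]:
                    best[(v, c)] = (s, [(u, v, c)])

    answer = (-inf, None)
    for length in range(1, max_len + 1):
        # Record any walk of this length that reaches the target.
        for (node, _ctx), (s, walk) in best.items():
            if node == target and s > answer[0]:
                answer = (s, walk)
        if length == max_len:
            break
        # Extend every state by one edge, paying a penalty for a context switch.
        nxt = {}
        for (node, c), (s, walk) in best.items():
            for (u, v), ctx_scores in edges.items():
                if u != node:
                    continue
                for c2, s2 in ctx_scores.items():
                    total = s + s2 - (switch_penalty if c2 != c else 0.0)
                    if total > nxt.get((v, c2), (-inf, None))[0]:
                        nxt[(v, c2)] = (total, walk + [(u, v, c2)])
        best = nxt
    return answer


edges = {
    ("A", "B"): {"immune": 2.0, "growth": 0.5},
    ("B", "T"): {"immune": 1.5, "growth": 3.0},
    ("A", "T"): {"immune": 1.0},
}
print(best_context_path(edges, "A", "T", max_len=2))
# (4.0, [('A', 'B', 'immune'), ('B', 'T', 'growth')])
```

Keeping only the best walk per (node, context) is exact here because the score of an extension depends only on the current node and the last context; adding constraints such as forbidding repeated nodes would break that property.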
Keyword: PPI-Network, Context-Sensitive-Path, Systems-Biology
PP86 (PT) - A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotides distribution
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: Hall 14.2
Presenting author: Vladimir Reinharz, McGill University, Canada
Additional authors:Yann Ponty, CNRS/LIX, Polytechnique, France
Jerome Waldispuhl, McGill University, Canada
Session Chair: Ralf Zimmer
Motivations: The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most current software tools use similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, do not allow the user to explicitly control the nucleotide distribution, such as the GC-content, of their sequences. However, the latter is an important criterion for large-scale applications, as it could presumably be used to design sequences with better transcription rates and/or structural plasticity.
Results: In this paper, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. its running time is comparable to or better than that of local search methods), seed-less (we remove the bias of the seed in local search heuristics), and successfully generates high-quality (i.e. thermodynamically stable) sequences for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches.
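A minimal sketch of GC-weighted sampling, under simplifying assumptions: unlike IncaRNAtion, this Python example ignores the energy model entirely and only shows how a single weight `w` per G or C nucleotide can steer the expected GC-content of sequences compatible with a dot-bracket target. Each base pair is drawn from the six canonical and wobble pairs, each unpaired position from A, C, G, U, with probability proportional to `w` raised to its number of G/C nucleotides; `w` is tuned by bisection so that the expected GC fraction hits a target. All function names are hypothetical.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # canonical and wobble pairs
BASES = "ACGU"


def gc_count(s):
    """Number of G or C characters in s."""
    return sum(ch in "GC" for ch in s)


def pair_table(db):
    """Map each '(' position to its matching ')' position."""
    stack, pt = [], {}
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pt[stack.pop()] = j
    return pt


def _expected_gc(options, w):
    # Expected G/C count of one draw when option o has weight w ** gc_count(o).
    weights = [w ** gc_count(o) for o in options]
    return sum(gc_count(o) * x for o, x in zip(options, weights)) / sum(weights)


def expected_gc_fraction(db, w):
    """Expected GC fraction of a sequence sampled for structure db with weight w."""
    n_pairs, n_unpaired = db.count("("), db.count(".")
    total = n_pairs * _expected_gc(PAIRS, w) + n_unpaired * _expected_gc(BASES, w)
    return total / len(db)


def tune_weight(db, target, lo=1e-6, hi=1e6, iters=200):
    """Bisect on a log scale; the expected GC fraction increases with w."""
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if expected_gc_fraction(db, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5


def sample_sequence(db, w, rng):
    """Draw one sequence compatible with db, biased towards G/C by w."""
    pt = pair_table(db)
    seq = [None] * len(db)
    for i, ch in enumerate(db):
        if ch == "(":
            p = rng.choices(PAIRS, weights=[w ** gc_count(x) for x in PAIRS])[0]
            seq[i], seq[pt[i]] = p[0], p[1]
        elif ch == ".":
            seq[i] = rng.choices(BASES, weights=[w ** gc_count(b) for b in BASES])[0]
    return "".join(seq)


target = "((((....))))"
w = tune_weight(target, 0.7)
rng = random.Random(1)
seqs = [sample_sequence(target, w, rng) for _ in range(2000)]
mean_gc = sum(gc_count(s) for s in seqs) / (len(seqs) * len(target))
print(round(w, 3), round(mean_gc, 2))
```

Because positions are drawn independently given `w`, the empirical GC-content of the sample concentrates around the tuned target; a thermodynamic model would additionally weight each candidate by its stability on the target structure.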
Keyword: RNA, secondary structure, design, weighted sampling, GC
SS03_PartB - Inference, visualization and evaluation of signaling networks: A literature based framework
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.Room: Hall 1
Presenting author: Hayssam Soueidan, NKI-AVL, Netherlands
Session Chair:
Keyword:
SS03_PartD - Using the rxncon framework for network definition, visualisation and modelling
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.Room: Hall 1
Presenting author: Marcus Krantz, Humboldt University, Germany
Session Chair:
Keyword:
SS06_PartA1 - ELIXIR
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.Room: Hall 1
Presenting author: Niklas Blomberg
Session Chair:
Keyword:
SS06_PartA2 - BioMedBridges - providing data and services bridges between the biomedical sciences infrastructures
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.Room: Hall 1
Presenting author: Janet Thornton, EMBL-EBI, United Kingdom
Session Chair:
Keyword:
SS06_PartC3 - ELIXIR Swedish Node
Date: Tuesday, July 23, 2:10 p.m. - 2:35 p.m.Room: Hall 1
Presenting author: Bengt Persson, Linköping University, Sweden
Session Chair:
Keyword:
TT01 -
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: Hall 9
Presenting author: , ,
Session Chair: Geoff Barton
Keyword: TOP
TT02 -
Date: Sunday, July 21, 10:30 a.m. - 10:55 a.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT03 -
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: Hall 9
Presenting author: , ,
Session Chair: Geoff Barton
Keyword: TOP
TT04 -
Date: Sunday, July 21, 11:00 a.m. - 11:25 a.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT05 -
Date: Sunday, July 21, 11:30 a.m. - 11:55 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Geoff Barton
Keyword: TOP
TT06 -
Date: Sunday, July 21, 11:30 a.m. - 12:25 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT07 -
Date: Sunday, July 21, 12:00 p.m. - 12:25 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Geoff Barton
Keyword: TOP
TT08 -
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT09 -
Date: Sunday, July 21, 2:10 p.m. - 2:35 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT10 -
Date: Sunday, July 21, 2:40 p.m. - 3:35 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT11 -
Date: Sunday, July 21, 2:40 p.m. - 3:05 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT12 -
Date: Sunday, July 21, 3:10 p.m. - 3:35 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT13 -
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT14 -
Date: Sunday, July 21, 3:40 p.m. - 4:05 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Dominic Clark
Keyword: TOP
TT15 -
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT16 -
Date: Monday, July 22, 10:30 a.m. - 10:55 a.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT17 -
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT18 -
Date: Monday, July 22, 11:00 a.m. - 11:25 a.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT19 -
Date: Monday, July 22, 11:30 a.m. - 12:25 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT20 -
Date: Monday, July 22, 11:30 a.m. - 11:55 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT21 -
Date: Monday, July 22, 12:00 p.m. - 12:25 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT22 -
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Rodrigo Lopez
Keyword: TOP
TT23 -
Date: Monday, July 22, 2:10 p.m. - 2:35 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT24 -
Date: Monday, July 22, 2:40 p.m. - 3:05 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Johannes Soedling
Keyword: TOP
TT25 -
Date: Monday, July 22, 2:40 p.m. - 3:35 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT26 -
Date: Monday, July 22, 3:10 p.m. - 3:35 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Johannes Soedling
Keyword: TOP
TT27 -
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: Hall 9
Presenting author: , ,
Session Chair: Johannes Soedling
Keyword: TOP
TT28 -
Date: Monday, July 22, 3:40 p.m. - 4:05 p.m.Room: Hall 10
Presenting author: , ,
Session Chair: Christophe Blanchet
Keyword: TOP
TT29
Date: Tuesday, July 23, 10:30 a.m. - 10:55 a.m.Room: Hall 9
Presenting author:
Session Chair: Dominic Clark
Keyword:
TT30
Date: Tuesday, July 23, 11:00 a.m. - 11:25 a.m.Room: Hall 9
Presenting author:
Session Chair: Dominic Clark
Keyword:
TT31
Date: Tuesday, July 23, 11:30 a.m. - 12:25 p.m.Room: Hall 9
Presenting author:
Session Chair: Dominic Clark
Keyword:
TT32
Date: Tuesday, July 23, 2:10 p.m. - 3:05 p.m.Room: Hall 9
Presenting author:
Session Chair: Rodrigo Lopez
Keyword:
TT33
Date: Tuesday, July 23, 3:10 p.m. - 3:35 p.m.Room: Hall 9
Presenting author:
Session Chair: Rodrigo Lopez
Keyword:
TT34
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: ICC Lounge 81
Presenting author:
Session Chair:
Keyword:
TT35
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: Hall 10
Presenting author:
Session Chair: Johannes Soeding
Keyword:
TT36
Date: Tuesday, July 23, 3:40 p.m. - 4:05 p.m.Room: Hall 9
Presenting author:
Session Chair: Rodrigo Lopez
Keyword: