Highlights Track: HL01
The Plurality of Prognostic Gene Signatures for Cancer
Monday, June 29 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Paul Boutros, Ontario Institute for Cancer Research, Canada
Presentation Overview:
Many diseases exhibit highly variable prognosis, with some individuals responding to therapy and others not. Despite this variability, patients are often treated identically. One major goal of modern medicine is to identify biomarkers that predict the optimal therapy for each patient. Although diverse computational techniques have been developed, existing methods have two key weaknesses: they overfit training datasets, and they yield non-overlapping biomarkers that complicate clinical validation. We addressed these weaknesses in the context of lung cancer. First, we developed a non-linear biomarker-identification technique by coupling unsupervised machine learning to gradient-descent optimization. This algorithm identified a six-gene biomarker that was validated on eight independent datasets comprising 589 patients. Second, we devised a technique for estimating the null distribution of biomarkers that can be used to obtain an unbiased ranking of biomarkers; our six-gene biomarker ranks in the 99.98th percentile. More importantly, this analysis reveals that over 500,000 unique lung cancer biomarkers exist. Thus, our work resolves two key problems in biomarker identification: over-fitting and non-overlapping biomarkers. Although we focused on lung cancer, our techniques are directly applicable to other diseases, and we are currently applying them more broadly. Preliminary results from breast cancer and schizophrenia will be presented.
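The abstract does not describe how the null distribution of biomarkers is estimated. As a rough illustration only, and not as the authors' technique, one generic way to rank a signature is to score many randomly drawn gene sets of the same size and report the percentile of the observed score. The scoring function `score_fn` is a hypothetical placeholder (for example, a survival-model statistic computed for a gene set).

```python
import random
from typing import Callable, Sequence


def null_percentile(
    observed_score: float,
    genes: Sequence[str],
    signature_size: int,
    score_fn: Callable[[list], float],
    n_random: int = 10_000,
    seed: int = 0,
) -> float:
    """Percentile of an observed signature score among random same-size gene sets.

    score_fn is a hypothetical, user-supplied prognostic score for a gene set.
    """
    rng = random.Random(seed)
    pool = list(genes)
    # Null distribution: scores of randomly drawn signatures of the same size.
    null_scores = [score_fn(rng.sample(pool, signature_size)) for _ in range(n_random)]
    # Percentage of random signatures scoring strictly below the observed one.
    below = sum(s < observed_score for s in null_scores)
    return 100.0 * below / n_random
```

With a realistic `score_fn`, each null draw requires refitting or re-evaluating a prognostic model, so the number of random signatures is usually the main cost.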
Highlights Track: HL02
Modeling Ecological and Genetic Diversity in Bacteria
Monday, June 29 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Eric Alm, MIT, United States
Presentation Overview:
Defining species boundaries is a major challenge for microbiologists, because bacteria are difficult to study in their natural habitat and can exchange genes across species boundaries (however these may be defined). One promising direction is to combine environmental DNA sequence data with ecological metadata to infer: (i) populations adapted to a habitat or niche; (ii) the structure of the habitat underlying a series of ecological measurements; and (iii) the history of ecological adaptation within a lineage. We describe a quantitative model (AdaptML) that infers the evolutionary history of ecological differentiation for a collection of ocean bacteria, revealing populations specific to different seasons and lifestyles. In more recent work, we have used the predictions of the AdaptML algorithm to target entire populations of bacteria for complete genome sequencing. This population genomic approach is helping to elucidate the microevolutionary processes that shape bacterial species.
Highlights Track: HL03
Whole genome analysis of mtDNA natural evolution in human and in cancer
Monday, June 29 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Eitan Rubin, Ben Gurion University, Israel
Presentation Overview:
mtDNA is an exceptional model genome for bioinformatics: it is the only genome that has been sequenced in 2400 different individuals. In this work we demonstrate how comparative whole-genome analysis reveals strong patterns that recur in both human evolution and cancer.
Highlights Track: HL04
The role of the RNA folding free energy in the evolution of the influenza virus
Monday, June 29 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Panayiotis Benos, University of Pittsburgh, United States
Presentation Overview:
The influenza A virus genome consists of eight single-stranded RNA segments of negative polarity. Current efforts to understand viral host specificity have largely focused on amino acid differences between avian and human isolates. The results presented here demonstrate that the RNA folding free energy (FFE) of the influenza polymerase genes plays a key role in the evolution and host specificity of the virus. In particular, we found that the distribution of FFEs differs significantly between human and avian isolates, with human isolates generally having higher FFEs (less stable RNA structures). When avian polymerase genes are introduced into the human population, their FFEs shift toward higher values over the years. Infection experiments in mammalian cells grown at different temperatures show that human isolates cannot propagate efficiently at higher temperatures, and more recent results (not in the paper) show the converse: avian isolates cannot propagate efficiently at lower temperatures. Taken together, our data suggest for the first time that RNA structure stability is important for the emergence and host shift of influenza A virus. The fact that cell temperature affects virus propagation in mammalian cells has important consequences for prevention and therapeutic strategies.
Highlights Track: HL05
Proteomics-first approach for discovering sub-network targets in cancer
Monday, June 29 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Rod Nibbe, Case Western Reserve University, United States
Presentation Overview:
Using a proteomics-first approach, we identified many targets significant for late-stage human colon cancer. These targets were used to seed a search of a well-annotated protein-protein interaction (PPI) network for subnetworks potentially significant for the late-stage phenotype. We devised a method to score selected subnetworks with an information-theoretic (mutual information) approach, using complementary transcription (microarray) data as a surrogate for subnetwork activity. The subnetworks were pruned to retain significant targets and extended by one hop to infer functional relevance and to inform follow-on experiments. The significant targets in one subnetwork were validated by label-free mass spectrometry or western blot and were found to be coordinately regulated at both the protein and mRNA levels. Overall, the work outlines a novel quantitative approach for extending the results of proteomic profiling to find disease discriminators at the level of protein subnetworks (and thus function), and it drives target selection for in vitro/in vivo verification.
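A minimal sketch of mutual-information scoring, assuming that subnetwork activity is summarized per sample as the mean expression of its member genes and discretized into equal-width bins (both choices are assumptions; the abstract does not give these details). A subnetwork whose activity separates the phenotype classes scores a higher mutual information with the class labels.

```python
import math
from collections import Counter
from statistics import mean


def subnetwork_activity(expr: dict[str, list[float]], genes: list[str]) -> list[float]:
    """Per-sample activity as the mean expression of the subnetwork's genes (assumed summary)."""
    n_samples = len(expr[genes[0]])
    return [mean(expr[g][s] for g in genes) for s in range(n_samples)]


def discretize(values: list[float], n_bins: int = 3) -> list[int]:
    """Equal-width binning of a continuous vector into n_bins levels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant vectors
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: list[int], y: list[int]) -> float:
    """Mutual information I(X;Y) in bits between two discrete vectors."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
```

In practice, significance of a score would be judged against scores of random gene sets or permuted labels; that step is omitted here.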
Highlights Track: HL06
Bayesian Inference of Selection Histories in Six Mammalian Genomes
Monday, June 29 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Tomas Vinar, Comenius University in Bratislava, Slovak Republic
Presentation Overview:
Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of ~16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR
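The FDR threshold is cut off in this abstract, and the surviving text does not state which correction procedure was used. For orientation only, the standard Benjamini-Hochberg step-up procedure is one common way to control the false discovery rate across thousands of gene-level tests, given one p-value per gene:

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return one rejection flag per test, controlling the FDR at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```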
Highlights Track: HL07
Computational approach to model peptide antigenicity
Monday, June 29 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Carlos Camacho, University Of Pittsburgh, United States
Presentation Overview:
It is well known that relatively unstable peptides bearing only partial structural resemblance to a native protein can trigger antibodies that recognize higher-order structures in the native protein. Based on thermodynamic principles and computational modeling, this work shows that the stability of immunogenic protein-like motifs is a critical parameter explaining the diverse humoral immune responses induced by different linear peptide epitopes. In this paradigm, peptides with a minimal amount of stability (ΔG_X < 0 kcal/mol) around a protein-like motif (X) are capable of inducing antibodies with similar affinity for both peptide and native protein; more weakly stable peptides (ΔG_X > 0 kcal/mol) trigger antibodies that recognize the full protein but not the peptide; and unstable peptides (ΔG_X > 8 kcal/mol) fail to generate antibodies against either peptide or protein. Immunization experiments with peptides derived from the autoantigen histidyl-tRNA synthetase verify that selected peptides with varying relative stabilities, as predicted by molecular dynamics simulations, induce antibody responses consistent with this theory. Collectively, these studies provide insight into the structural basis of immunogenicity and, at the same time, validate this form of thermodynamic and molecular modeling as an approach to probe the development and evolution of humoral immune responses.
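As written, the weakly stable class (ΔG_X > 0 kcal/mol) overlaps the unstable class (ΔG_X > 8 kcal/mol). Reading the weakly stable class as 0 ≤ ΔG_X ≤ 8 kcal/mol, which is an assumption, the three regimes reduce to a simple decision rule. The handling of the exact boundary values is likewise assumed.

```python
def predicted_response(delta_g_kcal: float, unstable_cutoff: float = 8.0) -> str:
    """Map the stability of a protein-like motif (dG_X, kcal/mol) to the predicted antibody response.

    Assumption: the weakly stable regime is 0 <= dG_X <= unstable_cutoff.
    """
    if delta_g_kcal < 0:
        return "peptide and native protein"  # minimally stable motif
    if delta_g_kcal <= unstable_cutoff:
        return "native protein only"  # weakly stable motif
    return "neither"  # unstable peptide
```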
Highlights Track: HL08
Insights into corn genes derived from large-scale cDNA sequencing
Monday, June 29 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Nickolai Alexandrov, Ceres, Inc., United States
Presentation Overview:
We present a large portion of the transcriptome of Zea mays, including 31,552 fully sequenced non-redundant cDNA clones. These and other previously sequenced transcripts have been aligned with genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Corn genes, as well as genes of other Poaceae (grass family), can be divided into two classes according to the GC content at the third codon position. Many transcripts with lower GC content have dicot homologs, whereas the high-GC transcripts are more specific to the grasses. The high-GC class is also enriched in intronless genes. This evolutionary divergence may be the result of horizontal gene transfer from species that not only had a different GC content but possibly also lacked introns, perhaps from outside the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein-coding genes in Zea.
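To make "GC content at the third codon position" (often called GC3) concrete, the sketch below computes it for a coding sequence and assigns a two-class label. The 0.75 cutoff is a hypothetical placeholder; the abstract does not say where the boundary between the two classes lies.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]  # trailing partial codon ignored
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def gc3_class(cds: str, threshold: float = 0.75) -> str:
    """Assign a transcript to the 'high' or 'low' GC3 class (threshold is hypothetical)."""
    return "high" if gc3(cds) >= threshold else "low"
```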
Highlights Track: HL09
Inferring pathway activity toward precise disease classification
Monday, June 29 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Eunjung Lee, KAIST, Korea, Rep.
Presentation Overview:
The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in cancer due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure in order to classify disease based on the activity of entire signaling pathways or protein complexes rather than the expression levels of individual genes or proteins. We propose a new pathway-based classification procedure in which markers are encoded not as individual genes, nor as the set of genes making up a known pathway, but as subsets of "condition-responsive genes" (CORGs) within those pathways. Using expression profiles from seven different microarray studies, we show that the accuracy of this method is significantly higher than that of both conventional gene-based and pathway-based diagnostics. Furthermore, the identified CORGs may facilitate the development of effective diagnostic markers and the discovery of molecular mechanisms underlying disease.
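A minimal sketch of CORG selection under stated assumptions. Each gene is z-normalized across samples. Pathway activity is the sum of member z-scores divided by the square root of the number of genes. Member genes are ranked by their t statistic along the pathway's dominant direction of change, and genes are added greedily while the t statistic of the combined activity keeps increasing. This follows our reading of the published procedure; details such as the Welch-style t statistic are assumptions.

```python
import math
from statistics import mean, stdev


def t_score(values: list[float], labels: list[int]) -> float:
    """Welch-style t statistic comparing samples labelled 1 against samples labelled 0."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def zscore(values: list[float]) -> list[float]:
    m, s = mean(values), stdev(values)
    return [(v - m) / (s or 1.0) for v in values]


def activity(z: dict[str, list[float]], genes: list[str]) -> list[float]:
    """Per-sample pathway activity: summed z-scores scaled by sqrt(number of genes)."""
    n_samples = len(z[genes[0]])
    return [sum(z[g][s] for g in genes) / math.sqrt(len(genes)) for s in range(n_samples)]


def select_corgs(expr: dict[str, list[float]], pathway: list[str], labels: list[int]) -> list[str]:
    """Greedily grow the CORG set while the activity's discriminative score increases."""
    z = {g: zscore(expr[g]) for g in pathway}
    scores = {g: t_score(z[g], labels) for g in pathway}
    # Rank genes along the pathway's dominant direction of change.
    direction = 1.0 if sum(scores.values()) >= 0 else -1.0
    ranked = sorted(pathway, key=lambda g: direction * scores[g], reverse=True)
    corgs = [ranked[0]]
    best = abs(t_score(activity(z, corgs), labels))
    for gene in ranked[1:]:
        candidate = abs(t_score(activity(z, corgs + [gene]), labels))
        if candidate <= best:
            break  # adding this gene no longer improves discrimination
        corgs.append(gene)
        best = candidate
    return corgs
```

For example, in a pathway with two informative genes and one uninformative gene, the greedy search keeps the two informative genes and stops before the third.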
Highlights Track: HL10
Evolution of genome structure: what statistics can tell us about the biology of chromosomes
Monday, June 29 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Aaron Darling, University of California-Davis, United States
Presentation Overview:
The profound effect of recombination on genome evolution remains poorly understood. Recombination can give rise to myriad genomic mutations, including large-scale deletions, lateral gene transfers, duplications, and even genome rearrangements. In this talk, I examine patterns in how homologous recombination gives rise to genomic inversions. The process of evolution by inversion has been modeled in a Bayesian statistical framework, and under that model we infer the phylogenetic history of inversions among nine Yersinia genomes. We statistically confirm that inversions are generally short and that bacteria exhibit a "bilateral genomic symmetry" between the origin and terminus of chromosome replication. We contribute to the ongoing controversy over breakpoint reuse rates by discovering that hotspots of breakpoint reuse localize near the origin of replication. We illustrate how statistical confidence intervals can be derived for breakpoint reuse rates. Finally, we discover a canonical configuration for the origin and terminus of replication in Yersinia. Our work highlights how advances in the combinatorial theory of genome rearrangement can lead to novel statistical inference methods, which can in turn offer new insight into genomic biology.
Highlights Track: HL11
Using side effects of medicines to identify drug targets
Monday, June 29 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Michael Kuhn, EMBL Heidelberg, Germany
Presentation Overview:
This talk is based on the study "Drug target identification using side-effect similarity", published in Science (July 2008). It describes a computational method that predicts whether two drugs share targets based on the similarity of their phenotypic side effects. While earlier studies focused on molecular or cellular features, we used a global analysis of drugs and their side effects to predict shared drug targets. Applied to 746 marketed drugs, the analysis revealed a network of 1018 side-effect-driven drug-drug relations, 261 of which are formed by chemically dissimilar drugs from different therapeutic indications. We experimentally tested 20 of these unexpected drug-drug relations and validated 13 implied drug-target relations by in vitro binding assays, 11 of which show inhibition constants of 10 micromolar or less. Nine of these were tested and confirmed in cell assays, documenting the feasibility of using phenotypic information to infer molecular interactions and hinting at new uses for marketed drugs.
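The study's own similarity measure is not described in this abstract. As a simplified stand-in, the sketch below scores drug pairs by the Jaccard index of their side-effect sets and keeps pairs above a hypothetical cutoff. The resulting candidate drug-drug relations are the kind of links that would then be filtered for chemical dissimilarity and tested experimentally.

```python
def side_effect_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index of two side-effect sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(
    side_effects: dict[str, set[str]], cutoff: float = 0.5
) -> list[tuple[str, str, float]]:
    """All drug pairs whose side-effect similarity reaches the (hypothetical) cutoff, best first."""
    drugs = sorted(side_effects)
    pairs = []
    for i, d1 in enumerate(drugs):
        for d2 in drugs[i + 1:]:
            s = side_effect_similarity(side_effects[d1], side_effects[d2])
            if s >= cutoff:
                pairs.append((d1, d2, s))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```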
Highlights Track: HL12
Gene Loss Under Neighbourhood Selection Following Whole Genome Duplication And The Reconstruction Of The Ancestral Diploid
Monday, June 29 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: David Sankoff, University of Ottawa, Canada
Presentation Overview:
How can we construct a phylogeny based on gene order if some of the genomes under study are descendants of whole genome doubling events? We have integrated a "guided genome halving" algorithm and a median genome routine to heuristically solve the small phylogeny problem, and have applied the approach to data containing thousands of sets of homologs among the poplar (tetraploid), grapevine (diploid) and papaya (diploid) genomes. We have been able to reconstruct the last diploid ancestor of poplar before its genome was doubled. We can then follow the evolution of duplicate gene pairs and assess the mechanism that determines which of the two copies is likely to be lost, and whether there is a bias towards losing adjacent genes on the same strand.
Highlights Track: HL13
The Human Phenotype Ontology
Monday, June 29 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Peter Robinson, Charité - Universitätsmedizin Berlin, Germany
Presentation Overview:
We will describe the Human Phenotype Ontology (HPO) and explain how to use it to annotate and analyze human disease. In addition, we will present new methods for performing clinical diagnostics using ontological similarity measures in the HPO, methods for calculating P-values for the resulting scores, and new results on the relationship between phenotypic modules (all genes/proteins related to phenotypic features of the HPO) and the network characteristics of these proteins in the protein interactome.
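The abstract does not name the similarity measure. One widely used ontological similarity, shown here as an illustrative sketch and not as the talk's method, is Resnik similarity. The information content (IC) of a term is the negative log of the fraction of annotated diseases carrying that term or one of its descendants. Two terms score the IC of their most informative common ancestor. A query is compared with a disease by averaging, over query terms, the best match among the disease's terms. The toy ontology in the usage check is hypothetical.

```python
import math


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All ancestors of a term in the ontology DAG, including the term itself."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotations: dict[str, list[str]], parents: dict[str, list[str]]) -> dict[str, float]:
    """IC(t) = -ln(fraction of annotated diseases carrying t or one of its descendants)."""
    counts: dict[str, int] = {}
    for terms in annotations.values():
        implied = set().union(*(ancestors(t, parents) for t in terms))
        for t in implied:
            counts[t] = counts.get(t, 0) + 1
    n = len(annotations)
    return {t: -math.log(c / n) for t, c in counts.items()}


def resnik(t1: str, t2: str, parents: dict[str, list[str]], ic: dict[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic.get(t, 0.0) for t in common), default=0.0)


def best_match_average(query, disease_terms, parents, ic) -> float:
    """Average over query terms of the best Resnik match among the disease's terms."""
    return sum(max(resnik(q, d, parents, ic) for d in disease_terms) for q in query) / len(query)
```

Calculating P-values for such scores would additionally require the distribution of scores for random queries, which is not shown.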
Highlights Track: HL14
Visualizing Genomic Dark Matter: Repeat Probability Clouds in the Human Genome
Monday, June 29 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: David Pollock, University of Colorado School of Medicine, United States
Presentation Overview:
We have no clear idea about where half the human genome comes from. This is the dark matter of the genome. The part of the genome we do know about is about 90% derived from repetitive elements (mostly transposable elements) and 10% derived from protein- or RNA-encoding genes. It is reasonable to presume that the dark matter is made up of the same things in similar proportions, but that the original sequences have mutated so much that we can no longer identify them easily. The newly developed "repeat probability cloud" approach is a means of identifying the signature of repeat structure in large genomes that is invisible to standard approaches. In a similar manner to how dark matter is visualized in the physical universe, we visualize genomic dark matter by its perturbing effect on sequence space. Other advantages of this method are that it is extremely fast and that it does not require prior knowledge of transposable element structure. The application of this technique is expected to revolutionize our understanding of the human genome and its origins in the evolution of ancestral mammalian genomes.
Highlights Track: HL15
Analyzing risk factor of heart disease by a computational lipidology approach
Monday, June 29 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Katrin Huebner, University Heidelberg, Germany
Presentation Overview:
In our work, clinical investigation meets computational simulation to analyze blood lipid values beyond "bad" and "good" cholesterol, in order to understand the in vivo mechanisms leading to atherosclerosis and heart disease. Following an introduction to the clinical relevance of lipoproteins, the experimental setup and a novel modeling setup are explained. The main results presented are virtual lipoprotein profiles that closely match clinical values from healthy subjects, model-based predictions of mimicked disorders in the underlying molecular processes, and the resulting alterations in high-resolution lipoprotein profiles.
Highlights Track: HL16
MotifMap: a human genome-wide map of candidate regulatory motif sites.
Monday, June 29 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Pierre Baldi, UC Irvine, United States
Presentation Overview:
Comprehensive identification of all regulatory elements encoded in the human genome is a fundamental need in biomedical research. So far, only a small fraction of these elements have been identified experimentally, and there is great interest in discovering regulatory elements systematically through computational means. We describe how to use comparative genomics to derive the first comprehensive map of regulatory elements in the human genome, taking advantage of the recent availability of 18 mammalian genomes. We developed a new scoring scheme for detecting regulatory elements, called the Bayesian branch length score (BBLS), which accounts for the phylogenetic relationships among the species being compared and for the motif-matching score in each individual species, while remaining robust to alignment errors and missing sequences. Using BBLS, we predicted 1.5 million regulatory sites in the human genome with an FDR below 50%, corresponding to 380 regulatory motifs in the Transfac database. The method is particularly effective for 155 motifs, for which over 121,000 sites can be mapped with an FDR below 10%.
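BBLS itself is defined in the MotifMap work, and the sketch below does not reproduce it. It only illustrates the underlying idea under stated assumptions. The classic branch length score (BLS) is the total length of the tree branches connecting the species in which a motif instance is present, as a fraction of total tree length. If presence in each species is treated as an independent probability (for example, derived from per-species motif-matching scores), the score becomes an expectation. A branch lies on the connecting subtree exactly when the motif occurs on both sides of it. The tree encoding and species names are hypothetical.

```python
from math import prod


def leaves_below(tree: dict[str, tuple[str, float]]) -> dict[str, set[str]]:
    """Map every node to the set of leaf species beneath it.

    tree maps each non-root node to (parent, branch length to parent).
    """
    parents = {p for p, _ in tree.values()}
    below: dict[str, set[str]] = {n: set() for n in set(tree) | parents}
    for leaf in set(tree) - parents:
        node = leaf
        while True:  # walk from the leaf up to the root
            below[node].add(leaf)
            if node not in tree:
                break
            node = tree[node][0]
    return below


def expected_bls(tree: dict[str, tuple[str, float]], p_motif: dict[str, float]) -> float:
    """Expected normalized BLS, assuming independent per-species motif presence probabilities."""
    below = leaves_below(tree)
    all_leaves = set(p_motif)
    total_length = sum(length for _, length in tree.values())
    expected = 0.0
    for node, (_, length) in tree.items():
        inside, outside = below[node], all_leaves - below[node]
        p_in = 1 - prod(1 - p_motif[s] for s in inside)
        p_out = 1 - prod(1 - p_motif[s] for s in outside)
        # The branch is on the connecting subtree iff the motif occurs on both sides.
        expected += length * p_in * p_out
    return expected / total_length
```

For a tree ((human:1, mouse:1):1, dog:2), a motif certainly present in human and mouse only scores 2/5 = 0.4, the human-mouse path length over the total tree length.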
Highlights Track: HL17
Learning from Resequencing Data: What To Do When the $1000 Genome Arrives?
Tuesday, June 30 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Gregory Kryukov, Brigham & Women's Hospital / Harvard Medical School, United States
Presentation Overview:
We investigated the potential of resequencing all exons in a clinical population to discover genes underlying human complex phenotypes. Computer simulations based on currently available deep resequencing data show that genes meaningfully affecting a human trait can be identified in an unbiased fashion, although large sample sizes would be required to achieve substantial power.
Highlights Track: HL18
Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods
Tuesday, June 30 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Christophe Dessimoz, ETH Zurich, Switzerland
Presentation Overview:
The identification of orthologs, pairs of homologous genes in different species that started diverging through speciation events, is a central problem in genomics, with applications in many research areas, including comparative genomics, phylogenetics, protein function annotation and genome rearrangement. An increasing number of projects aim to infer orthologs from complete genomes, but little is known about their relative accuracy or coverage. Since the exact evolutionary history of entire genomes remains largely unknown, predictions can only be validated indirectly, that is, in the context of the different applications of orthology. The few comparison studies published so far have assessed orthology exclusively on the expectation that orthologs have conserved protein function. In the present work, we introduce methodology to verify orthology in terms of phylogeny, and we perform a comprehensive comparison of nine leading ortholog inference projects and two methods using both phylogenetic and functional tests. The results show large variations in performance among the projects, which indicates that the choice of orthology database can have a strong impact on any downstream analysis.
Highlights Track: HL19
A Mathematical Framework for the Selection of an Optimal Set of Peptides for Epitope-Based Vaccines
Tuesday, June 30 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Oliver Kohlbacher, Eberhard-Karls-Universität, Germany
Presentation Overview:
Due to their manifold advantages (e.g., safety, ease of production, analytical control) and their applicability in personalized medicine, epitope-based vaccines (EVs) have recently been attracting significant interest, in particular as a therapeutic strategy for cancer and infectious diseases. EVs trigger an immune response via target-specific immunogenic peptides (epitopes). A crucial step in the design of an EV is the selection of the epitopes to be included. Depending on the number of candidate epitopes, the diversity of the target population, and the immunological requirements, epitope selection can become a very complex problem, and it poses an interesting and novel bioinformatics challenge. We present a mathematical framework to find an optimal set of epitopes for an EV. Given a set of candidate epitopes, the framework efficiently identifies those most likely to elicit a broad and potent immune response in the target population. We translate the epitope selection problem into an integer linear program, which allows easy adaptation to different variants of the EV design problem. Among the few published computational approaches, ours is the first to identify an optimal epitope set. This mathematical framework will prove to be a valuable tool in vaccine design.
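The abstract does not spell out the integer linear program. One plausible formulation, assumed here for illustration, chooses binary variables x_e for candidate epitopes. The objective maximizes the sum over selected epitopes e and alleles h of p_h * i(e,h), where p_h is the allele frequency in the target population and i(e,h) is a predicted immunogenicity score. The constraints require that exactly k epitopes are chosen and that every allele is targeted by at least one selected epitope. For a handful of candidates, exhaustive enumeration finds the same optimum an ILP solver would. The scores and allele names in the usage check are hypothetical.

```python
from itertools import combinations


def select_epitopes(
    immunogenicity: dict[str, dict[str, float]],
    allele_freq: dict[str, float],
    k: int,
) -> tuple[str, ...]:
    """Choose k epitopes maximizing allele-frequency-weighted immunogenicity,
    requiring every allele in allele_freq to be targeted by a selected epitope.

    Brute force over all k-subsets; practical only for small candidate sets.
    """
    best, best_score = None, float("-inf")
    for subset in combinations(sorted(immunogenicity), k):
        covered = {a for e in subset for a, s in immunogenicity[e].items() if s > 0}
        if not set(allele_freq) <= covered:
            continue  # violates the allele-coverage constraint
        score = sum(
            allele_freq.get(a, 0.0) * s for e in subset for a, s in immunogenicity[e].items()
        )
        if score > best_score:
            best, best_score = subset, score
    if best is None:
        raise ValueError("no epitope set of size k covers all alleles")
    return best
```

In the usage check, the coverage constraint forces the selection of e3. Without that constraint, the two strongest binders of the same allele would score higher.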
Highlights Track: HL20
Prediction of Binding Sites on Proteins Using the Gaussian Network Model
Tuesday, June 30 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Burak Erman, Koc University, Turkey
Presentation Overview:
Residues at the binding sites of the ligand and receptor in several enzyme-inhibitor and antibody-antigen complexes are predicted from the slowest (for the ligand) and fastest (for the receptor) modes of motion, obtained by applying the Gaussian Network Model to the unbound molecules.
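A minimal sketch of how the slowest and fastest Gaussian Network Model (GNM) mode shapes can be computed, assuming C-alpha coordinates, a 7 Å contact cutoff (a commonly used value, assumed here) and a single connected network. The steps are: build the Kirchhoff matrix, diagonalize it, and read off squared eigenvector components per residue. A small pure-Python Jacobi solver keeps the example dependency-free; in practice `numpy.linalg.eigh` would be the idiomatic choice. How residues are then thresholded into predicted binding sites is not stated in the abstract and is left out.

```python
import math


def kirchhoff(coords: list[tuple[float, float, float]], cutoff: float = 7.0) -> list[list[float]]:
    """GNM Kirchhoff matrix: -1 for residue pairs within the cutoff, contact count on the diagonal."""
    n = len(coords)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                g[i][j] = g[j][i] = -1.0
                g[i][i] += 1.0
                g[j][j] += 1.0
    return g


def jacobi_eigen(a: list[list[float]], tol: float = 1e-18, max_sweeps: int = 100):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.

    Returns (eigenvalues, V), where column k of V is the eigenvector of eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def slow_and_fast_profiles(coords, cutoff: float = 7.0):
    """Squared per-residue components of the slowest non-zero mode and the fastest mode."""
    vals, vecs = jacobi_eigen(kirchhoff(coords, cutoff))
    order = sorted(range(len(vals)), key=vals.__getitem__)
    slow, fast = order[1], order[-1]  # order[0] is the trivial zero-frequency mode
    return [row[slow] ** 2 for row in vecs], [row[fast] ** 2 for row in vecs]
```

For a straight five-residue chain with 3.8 Å spacing, the slowest mode has its largest amplitudes at the chain ends, and the fastest mode peaks at the central residue.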
Highlights Track: HL21
The miRNA/siRNA saturation effect - transfection of small RNAs compromises gene regulation by endogenous microRNAs
Tuesday, June 30 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Debora Marks, Harvard Medical School, United States
Presentation Overview:
Transfection of siRNAs or miRNAs (microRNAs) into cells typically lowers the expression of hundreds of genes, as assessed by decreases in protein or mRNA levels, but increases in gene expression have also been observed. One explanation for unexpected up-regulation upon miRNA or siRNA over-expression is a reduction in the effective function of the endogenous microRNAs. This may result from competition between exogenous and endogenous RNAs for the intracellular small RNA protein machinery. We tested the validity of this explanation by computational analysis of more than 150 si/miRNA transfection experiments in 7 different cell types for which genome-wide mRNA changes were measured using microarrays. After verifying the expected down-regulation of genes with UTRs that contain target sites for the exogenously introduced small RNAs, we show that genes with target sites for endogenous miRNAs are significantly up-regulated. Confirming this result, we observe the competition effect in protein expression changes, as well as a striking correlation between the dose and temporal responses of up-regulated genes and those of down-regulated genes after siRNA transfection. These findings have broad implications for the design and interpretation of experiments using small RNAs and for the design of clinical trials using siRNA therapeutics.
Highlights Track: HL22
Sequence Similarity Network Reveals Common Ancestry of Multidomain Proteins
Tuesday, June 30 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Dannie Durand, Carnegie Mellon University, United States
Presentation Overview:
The challenge of homology identification in multidomain families with varied domain architectures is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. We first present an extension of the traditional model of homology to include domain insertions and a manually curated benchmark of well-studied mammalian families. We next introduce Neighborhood Correlation, a novel method that identifies homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. We show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them.
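A minimal sketch of the Neighborhood Correlation idea: two sequences are scored by the Pearson correlation of their similarity profiles against every sequence in the network. Homologs tend to share many neighbors, whereas a pair linked only through a shared inserted domain does not. The score scale (for example, log-transformed BLAST scores) and the floor value for missing pairs are assumptions here, and the toy network in the usage check is hypothetical.

```python
import math


def neighborhood_correlation(
    sim: dict[str, dict[str, float]], x: str, y: str, floor: float = 0.0
) -> float:
    """Pearson correlation of x's and y's similarity profiles across all sequences in the network.

    Missing similarity scores are replaced by `floor` (assumed convention).
    """
    nodes = sorted(sim)
    sx = [sim[x].get(k, floor) for k in nodes]
    sy = [sim[y].get(k, floor) for k in nodes]
    mx, my = sum(sx) / len(sx), sum(sy) / len(sy)
    cov = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    vx = sum((a - mx) ** 2 for a in sx)
    vy = sum((b - my) ** 2 for b in sy)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0
```

In the usage check, A1 has a direct similarity to B (a shared domain), yet their neighborhoods are anti-correlated. A1 and A2 share their neighborhood and correlate strongly.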
Highlights Track: HL23
Models from experiments: combinatorial perturbations of cancer cells
Tuesday, June 30 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Sven Nelander, University of Gothenburg, Sweden
Presentation Overview:
We present a novel method for deriving network models from molecular profiles of perturbed cellular systems. The network models aim to predict quantitative outcomes of combinatorial perturbations, such as drug pair treatments or multiple genetic alterations. Mathematically, we represent the system by a set of nodes, representing molecular concentrations or cellular processes, a perturbation vector and an interaction matrix. After perturbation, the system evolves in time according to differential equations with built-in nonlinearity, similar to Hopfield networks, capable of representing epistasis and saturation effects. For a particular set of experiments, we derive the interaction matrix by minimizing a composite error function, aiming at accuracy of prediction and simplicity of network structure. To evaluate the predictive potential of the method, we performed 21 drug pair treatment experiments in a human breast cancer cell line (MCF7) with observation of phospho-proteins and cell cycle markers. The best derived network model rediscovered known interactions and contained interesting predictions. Possible applications include the discovery of regulatory interactions, the design of targeted combination therapies and the engineering of molecular biological networks.
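The abstract describes the model only qualitatively. The sketch below assumes one Hopfield-like form, dx_i/dt = eps_i * tanh(sum_j w_ij * x_j + u_i) - alpha_i * x_i, where u is the perturbation vector and w is the interaction matrix. The system is integrated by forward Euler from the unperturbed state to an approximate steady state. The composite error is taken as the squared prediction error plus an L1 penalty on w as the simplicity term. The functional form, the tanh nonlinearity and the L1 penalty are all assumptions for illustration. The parameter-fitting step that minimizes this error over w is omitted.

```python
import math


def simulate(w, u, eps, alpha, dt: float = 0.01, steps: int = 5000) -> list[float]:
    """Forward-Euler integration of dx_i/dt = eps_i*tanh(sum_j w_ij*x_j + u_i) - alpha_i*x_i.

    Starts from x = 0 and returns the state after steps * dt time units.
    """
    n = len(u)
    x = [0.0] * n
    for _ in range(steps):
        drive = [sum(w[i][j] * x[j] for j in range(n)) + u[i] for i in range(n)]
        x = [x[i] + dt * (eps[i] * math.tanh(drive[i]) - alpha[i] * x[i]) for i in range(n)]
    return x


def composite_error(predicted, observed, w, lam: float = 0.1) -> float:
    """Squared prediction error plus an L1 penalty on interaction weights (favours sparse networks)."""
    fit = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return fit + lam * sum(abs(v) for row in w for v in row)
```

The saturating tanh term is what lets such a model represent non-additive responses to combined perturbations.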
Highlights Track: HL24
Discovery of a hidden sequence motif conserved in the bacterial type III secretion signal: implications for structure, drug discovery and host-pathogen systems models.
Tuesday, June 30 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Jason McDermott, Pacific Northwest National Laboratory, United States
Presentation Overview:
We describe a novel computational method for identification of type III secreted virulence effectors in bacteria and characterization of a putative secretion signal from apparently unrelated sequences. We will discuss preliminary experimental results supporting our model of a partially disordered structure that may be widely conserved in the absence of sequence similarity and show how these predictions support development of systems models of host-pathogen interactions.
Highlights Track: HL25
Re-examining the connection between the network topology and essentiality
Tuesday, June 30 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Teresa Przytycka, NIH, United States
Presentation Overview:
We investigate the reason for the correlation between degree and essentiality observed in a number of yeast networks. Based on an analysis of six genome-wide protein interaction networks compiled from diverse sources of interaction data, we rejected the previously proposed hypotheses and put forward an alternative explanation. We argued that the majority of hubs are essential due to their involvement in groups of densely connected proteins, many of them presumably protein complexes enriched in essential proteins.
Highlights Track: HL26
Built-in loops allow versatility in domain-domain interactions: Lessons from self-interacting domains
Tuesday, June 30 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Eyal Akiva, The Hebrew University, Israel
Presentation Overview:
The function of most proteins depends on their interaction with other proteins. It was shown that many protein-protein interactions are mediated by protein domains, and that there are distinct domain pairs that are used repeatedly as interaction mediators in various protein contexts. However, not all protein pairs with the corresponding domains that may mediate interaction do interact. It is conceivable that there are intra-domain structural and sequence features that play a role in determining the interaction potential of domains. Here, we discover such features by comparing domains that, on the one hand, mediate homodimerization of proteins and, on the other, occur in different proteins that are monomeric. This comparison uncovered surface loops that can be considered as determinants of the interactions. There are enabling loops, which mediate the domain interactions, and disabling loops that prevent the interactions. The presence of the enabling/disabling loops is consistent with the fulfillment/prevention of the interaction and is highly preserved in evolution. Thus, along with the preservation of structural elements that enable interaction, evolution maintains elements intended to prevent unwanted interactions. Our results extend the hierarchy of attributes that establish the modularity of domain-mediated protein-protein interactions, and provide a novel approach for predicting domain-domain interactions.
Highlights Track: HL27
Regulatory networks define phenotypic classes of human stem cell lines
Tuesday, June 30 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Igor Ulitsky, Tel Aviv University, Israel
Presentation Overview:
Hundreds of different human cell lines from embryonic, fetal and adult sources are referred to as stem cells, even though they range from pluripotent cells (typified by embryonic stem cells, which are capable of virtually unlimited proliferation and differentiation) to adult stem cell lines, which can generate a far more limited repertoire of differentiated cell types. The rapid increase in reports of new sources of stem cells and their anticipated value to regenerative medicine calls for a general, reproducible method for classification of these cells. We have created and analyzed a database of global gene expression profiles that enables the analysis of cultured human stem cells in the context of a wide variety of pluripotent, multipotent and differentiated cell types. We categorized a collection of 150 cell samples, and discovered that pluripotent stem cell lines group together, whereas other cell types, including brain-derived neural stem cell lines, are very diverse. In addition, we uncovered a protein-protein network that is shared by the pluripotent stem cells. Our results offer a new strategy for classifying stem cells and support the idea that pluripotency and self-renewal are under tight control by specific molecular networks.
Highlights Track: HL28
Confirming alternative protein isoforms in Drosophila
Tuesday, June 30 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: Michael Tress, Spanish National Cancer Research Center (CNIO), Spain
Presentation Overview:
Alternative splicing of messenger RNA permits the formation of a wide range of mature RNA transcripts and has the potential to generate a diverse spectrum of functional proteins. While there is extensive evidence for large-scale alternative splicing at the transcript level, there have been no comparable studies validating the existence of alternatively spliced protein isoforms. Two recent large-scale proteomics studies generated extensive, high-quality peptide catalogs from the Drosophila melanogaster proteome. The analysis of these proteomic data confirmed the presence of multiple alternative gene products for over a hundred Drosophila genes and, for the first time, demonstrated the large-scale expression of alternatively spliced gene products. The fact that evidence for alternatively spliced isoforms came from proteomics studies confirms that these alternative isoforms must be expressed in sufficient quantity, and be stable enough in vivo, to be detected. However, the study suggested that many of the alternative gene products are likely to have regions that are disordered in solution, and that specific proteomics methodologies may be required to identify these isoforms. The analysis highlights the growing importance of proteomics in the validation of predicted proteins and points the way towards further research in this area.
Highlights Track: HL29
Biomedical Discovery Acceleration
Tuesday, June 30 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Lawrence Hunter, University of Colorado School of Medicine, United States
Presentation Overview:
Recent technology has made it possible to do experiments that show hundreds or even thousands of genes playing a role in a disease or other biological phenomenon. Interpreting these experimental results in the light of everything that has ever been published about any of those genes is often overwhelming, and the failure to take advantage of all prior knowledge may impede biomedical research. The computer program described in this paper "reads" the biomedical literature and molecular biology databases, "reasons" about what all that information means for this experiment, and "reports" on its findings in a way that makes digesting all of this information far more efficient than was previously possible. Analysis of a large, complex dataset with this tool led rapidly to a novel hypothesis about the role of several genes in the development of the tongue, which was then confirmed experimentally.
Highlights Track: HL30
Automated Analysis of Patterns in Human Protein Atlas Images
Tuesday, June 30 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: Robert Murphy, Carnegie Mellon University, United States
Presentation Overview:
This paper describes the first approach to automatically analyze all major subcellular patterns in tissue images from the Human Protein Atlas.
Highlights Track: HL31
Toward Automated, Practical Provision of Need-Based, High-Utility Text to Diverse Biomedical Users and Database Curators
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: Victoria Hall
Presenting author: Hagit Shatkay, Queen's University, Canada
Presentation Overview:
This work is concerned with bridging the gap between the actual text needs of biomedical users (database curators being one example) and text-mining methods. Biomedical text mining aims to serve a diverse community of scientists by identifying relevant information within scientific text. We note that there is no "average biologist" client; different users have distinct needs. Specifically, evaluation efforts (BioCreative, TREC) noted that database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists often search for high-confidence facts about genes and proteins. We have recently introduced a multi-dimensional categorization and annotation scheme that is applicable to a wide variety of biomedical text while supporting specific biomedical retrieval and extraction tasks, including the identification of methods and experimental evidence. Along with it, we developed a large annotated corpus (10,000 sentences tagged by eight annotators) and trained and tested machine learning classifiers to automatically categorize text based on the tagging scheme. We have also developed models to handle noise and disagreement among annotators. We show that automatic annotation of text along multiple useful dimensions is highly feasible, and that our new framework for scientific sentence categorization is applicable in practice. Among other categories, our classifier accurately identifies methodology and experimental statements.
Highlights Track: HL32
Network modeling of human interactome and phenome
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Rui Jiang, Tsinghua University, China
Presentation Overview:
We model the genome- and phenome-wide human gene-disease relationship using simple regression models, and show that the correlation between protein network distance and disease phenotype similarity accurately predicts disease genes. We perform genome-wide candidate gene prioritization for over 5000 diseases, revealing a comprehensive and modular genetic landscape of human disease. We also provide quantitative evidence supporting the correlation between phenotypic overlap and genetic overlap in human diseases, which implies a concordance between the topology of the disease network and that of the gene network. We then introduce a network alignment technique to compare the topology of the protein network and the disease network, leading to the discovery of 39 disease families and corresponding causal gene networks, as well as a novel network-alignment-based disease gene prediction approach and high-quality predictions for 70 human diseases. The related paper has been featured by Nature Publishing Group in four areas (Genetics, Pathology, Systems Biology and Biotechnology) and has also been highlighted in Nature China. The paper became the journal's most accessed article by September 2008 and is now cited by 7 papers according to ISI. The predicted disease genetic landscape is publicly available and has been visited by over 1000 researchers around the world.
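The core signal described above, that a candidate gene's network distance to known disease genes tracks phenotype similarity, can be illustrated with a small sketch. It is not the authors' regression model: the closeness transform exp(-d^2), the toy network, and the disease/gene assignments are all hypothetical. Each candidate is scored by the Pearson correlation between the query disease's phenotype similarities and the candidate's closeness to each disease's known gene.

```python
import math
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(graph, candidates, known_genes, phenotype_sim):
    """known_genes[j]: a causal gene of disease j (hypothetical input).
    phenotype_sim[j]: similarity between the query disease and disease j."""
    scores = {}
    for gene in candidates:
        dist = bfs_distances(graph, gene)
        # Assumed closeness transform exp(-d^2); unreachable genes get 0.
        closeness = [math.exp(-dist[g] ** 2) if g in dist else 0.0
                     for g in known_genes]
        scores[gene] = pearson(phenotype_sim, closeness)
    ranking = sorted(candidates, key=lambda g: scores[g], reverse=True)
    return ranking, scores


# Toy protein network: a chain G1 - G2 - G3 - G4 - G5.
graph = {}
for a, b in [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Diseases D1, D2, D3 have causal genes G1, G5, G3; the query disease
# resembles D1 most and D2 least.
ranking, scores = rank_candidates(graph, ["G2", "G4"], ["G1", "G5", "G3"],
                                  [0.9, 0.1, 0.5])
print(ranking)  # ['G2', 'G4']
```

G2 sits next to the genes of the diseases the query resembles, so its correlation is positive; G4 sits next to the gene of the least similar disease and scores negatively.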
Highlights Track: HL33
Mitochondrial beta-barrel Outer Membrane Proteins, All Accounted For?
Tuesday, June 30 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Paul Horton, AIST, Computational Biology Research Center, Japan
Presentation Overview:
Mitochondrial β-barrel Outer Membrane Proteins (MBOMPs) are an important class of proteins that includes the essential proteins Tom40 and Sam50 and the highly abundant VDAC protein. Until our challenge, it had been thought that eukaryotic genomes such as yeast would have hundreds of such proteins [Wimley, Cur. Opin. Struct. Bio. 2003], an estimate largely based on analogy to bacteria, which share a common ancestor with mitochondria and have numerous families of β-barrel Outer Membrane Proteins. Interestingly, despite these high estimates and the availability of the complete genomes of many eukaryotic organisms, only 5 families of MBOMPs are currently known: Tom40, Sam50, VDAC, Mdm10 and Mmm2. In our talk we will present substantial evidence to support the provocative hypothesis that the 5 known families of MBOMPs represent all or nearly all of the MBOMPs in the entire eukaryotic world. This conclusion is based on 1) our initial analysis [Imai et al. Cell 2008] of the recently discovered β-signal [Kutik et al. Cell 2008] for MBOMP membrane integration, and 2) further analysis covering all UniProt eukaryotic proteins, as well as a search for possible "bacteria-like" MBOMPs without a β-signal.
Highlights Track: HL34
Network-based prediction of human tissue-specific metabolism
Wednesday, July 1 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Tomer Shlomi, Technion, Israel
Presentation Overview:
A major challenge in studying metabolic processes in mammals is that different tissues are characterized by distinct metabolic functions whose direct in vivo investigation is difficult. Here we present the first computational method that successfully obtains a large-scale, tissue-specific description of human metabolism. Our approach is based on integrating tissue-specific gene and protein expression data with an existing comprehensive reconstruction of the global metabolic network. Applying the method to predict tissue-specific metabolic activity for 10 human tissues reveals that post-transcriptional regulation plays a central role in shaping tissue-specific metabolic activity profiles. The predicted tissue specificity of metabolic disease-causing genes and of metabolite exchange with biofluids are shown to go markedly beyond that manifested in the enzyme expression data, and are validated via large-scale mining of tissue-specificity data. Our results lay down the computational basis for the genome-wide study of normal and abnormal human metabolism in a tissue-specific manner.
Highlights Track: HL35
Context-specific BLAST detects twice as many homologous proteins as BLAST
Wednesday, July 1 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Johannes Soeding, University of Munich, Germany
Presentation Overview:
We present a context-specific approach to sequence comparison that drastically improves sensitivity and alignment quality compared with conventional search methods. Our context-specific version of BLAST, CS-BLAST, achieves a more than two-fold increase in sensitivity at the same specificity and speed; the iterative version, CSI-BLAST, finds as many homologs after two iterations as PSI-BLAST does after five iterations.
Highlights Track: HL36
Leveraging the context-specific coordination of transcript and metabolite concentrations to discover gene-metabolite interactions
Wednesday, July 1 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Patrick Bradley, Princeton University, United States
Presentation Overview:
Metabolite concentrations can regulate gene expression, which can in turn regulate metabolic activity. The extent to which functionally related transcripts and metabolites show similar patterns of concentration changes, however, remains unestablished. We measure and analyze the metabolomic and transcriptional responses of Saccharomyces cerevisiae to carbon and nitrogen starvation. Our analysis demonstrates that transcripts and metabolites show coordinated response dynamics. Furthermore, metabolites and gene products whose concentration profiles are alike tend to participate in related biological processes. To identify specific, functionally related genes and metabolites, we develop an approach based on Bayesian integration of the joint metabolomic and transcriptomic data. This algorithm finds interactions by evaluating transcript-metabolite correlations in light of the experimental context in which they occur and the class of metabolite involved. It effectively predicts known enzymatic and regulatory relationships, including a gene-metabolite interaction central to the glycolytic-gluconeogenetic switch. This work provides quantitative evidence that functionally related metabolites and transcripts show coherent patterns of behavior on the genome scale and lays the groundwork for building gene-metabolite interaction networks directly from systems-level data.
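Why the experimental context matters can be shown with a minimal sketch. It computes plain Pearson correlations separately within each condition; it is not the Bayesian integration described above, and the gene name, metabolite name and time courses are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def context_correlations(transcripts, metabolites):
    """transcripts[context][gene] and metabolites[context][metabolite] are
    time courses measured in the same experimental context.
    Returns {(gene, metabolite): {context: r}}."""
    result = {}
    for context, genes in transcripts.items():
        for gene, g_profile in genes.items():
            for met, m_profile in metabolites[context].items():
                result.setdefault((gene, met), {})[context] = pearson(g_profile, m_profile)
    return result


# Hypothetical time courses under two starvation conditions.
transcripts = {"carbon": {"GENE_A": [1, 2, 3, 4]},
               "nitrogen": {"GENE_A": [4, 3, 2, 1]}}
metabolites = {"carbon": {"met_x": [2, 4, 6, 8]},
               "nitrogen": {"met_x": [1, 2, 3, 4]}}
print(context_correlations(transcripts, metabolites))
# {('GENE_A', 'met_x'): {'carbon': 1.0, 'nitrogen': -1.0}}
```

The same pair is perfectly correlated in one condition and anti-correlated in the other, a pattern that a single correlation pooled over both conditions would blur.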
Highlights Track: HL37
Classification, Evolution, and Assembly of Protein Complexes
Wednesday, July 1 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Emmanuel Levy, Universite de Montreal, Canada
Presentation Overview:
A homomer is formed by self-interacting copies of a protein unit. This is functionally important, as in allostery, and structurally crucial because mis-assembly of homomers is implicated in disease. Homomers are widespread, with 50-70% of proteins with a known quaternary state assembling into such structures. Despite their prevalence, little is known about the mechanisms that drive their formation, both at the level of evolution and assembly in the cell. Here we present an analysis of over 5,000 unique atomic structures and show that the quaternary structure of homomers is conserved in over 70% of protein pairs sharing as little as 30% sequence identity. Where quaternary structure is not conserved, a detailed investigation revealed well-defined evolutionary pathways by which proteins transit between different quaternary structure types. Furthermore, we show, by perturbing subunit interfaces within complexes and by mass spectrometry analysis, that the (dis)assembly pathway mimics the evolutionary pathway. These data represent a molecular analogy to Haeckel's evolutionary paradigm of embryonic development, where an intermediate in the assembly of a complex represents a form that appeared in its own evolutionary history. Our model of self-assembly allows reliable prediction of evolution and assembly of a complex solely from its crystal structure.
Highlights Track: HL38
MeltDB: A software platform for the analysis and integration of Metabolomics Experiment Data
Wednesday, July 1 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Heiko Neuweger, Bielefeld University, Germany
Presentation Overview:
Recent advances in metabolomics have made it possible to measure the levels of hundreds of metabolites, which are the end products of cellular regulatory processes. The automation of sample acquisition and subsequent analysis in high-throughput instruments capable of measuring metabolites poses a challenge for the systematic storage and computational processing of the experimental datasets. Whereas a multitude of specialized software systems exists for individual instruments and preprocessing methods, there is clearly a need for a free, platform-independent system that allows the standardized and integrated storage and analysis of data obtained from metabolomics experiments. Here, we present MeltDB, a web-based, platform-independent system that provides functionality to consistently store, organize and annotate the datasets generated in metabolomics experiments. The system offers functionality for preprocessing mass spectrometry datasets in the netCDF, mzXML and mzData file formats. The results of the preprocessing are visualized and integrated within a functional genomics context, and access to higher-level statistical analysis is provided via the MeltDB web interface.
Highlights Track: HL39
Evolutionary potentials for protein structure and function prediction
Wednesday, July 1 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Francisco Melo, P. Universidad Catolica de Chile, Chile
Presentation Overview:
We describe a new type of potential for protein structure prediction, called 'evolutionary potentials' (EvPs). In contrast to current potentials, which are derived from a set of non-redundant protein structures, the EvPs described here exploit the evolutionary record of all known proteins that adopt a specific fold. This study involved large-scale computations such as the structural comparison and clustering of the complete Protein Data Bank, the sequence-level comparison of the non-redundant proteins against all known proteins in the NCBI database (about 7 million proteins), the building of 3D structure models for all proteins present in each multiple sequence alignment, and the derivation of about 19,000 EvPs. The performance of EvPs was assessed for the task of fold assessment. EvPs were shown to outperform a typical representative potential, and the increase in performance is not a consequence of the amount of information retrieved. As an extension of the already published paper, recent results from using EvPs to detect distantly related proteins that are likely to adopt a similar structure, and to detect key residues for protein function, will be presented for specific, biologically relevant example cases.
Highlights Track: HL40
Histone modifications at human enhancers reflect global cell-type-specific gene expression
Wednesday, July 1 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Gary Hon, University of California, San Diego, United States
Presentation Overview:
Although it is known that gene expression is driven by promoters, enhancers, and insulators, the relative roles of these regulatory elements in this process are not clear. Here we identify these elements in multiple cell types and investigate their roles in cell-type-specific gene expression. We observed that the chromatin state at promoters and CTCF binding at insulators are largely invariant across diverse cell types. In contrast, enhancers are marked with highly cell-type-specific histone modification patterns, correlate strongly with cell-type-specific gene expression programs on a global scale, and are functionally active in a cell-type-specific manner. Our results define over 100,000 potential transcriptional enhancers in the human genome, significantly expanding the current catalogue of human enhancers and highlighting the role of these elements in cell-type-specific gene expression.
Highlights Track: HL41
Comparative analysis of crystal interfaces of homologous proteins
Wednesday, July 1 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Roland Dunbrack, Fox Chase Cancer Center, United States
Presentation Overview:
Many proteins act as homooligomers. Often there is little direct evidence of which interfaces are present in the biologically active oligomer(s). Most such oligomers are inferred from examination of crystal interfaces, but these inferences are mostly hypothetical, based on surface area and the observation of specific kinds of interactions. We have compared interfaces across different crystal forms of proteins and treated the existence of an interface in all or most crystal forms as evidence in favor of biological relevance. We used previously published benchmarks, as well as monomeric and oligomeric structures solved by solution NMR, to validate this assumption. We find that the presence of a similar interface in two or more crystals when sequence identity is less than 90% correlates highly with the benchmark data. The data indicate that, of three publicly available sources of biological assemblies (PDB, PQS, and PISA), PISA is the most consistent with interfaces observed in large numbers of crystal forms. Comparative crystal analysis is better at identifying likely monomers when crystals contain large interfaces that automated methods tend to identify as biologically relevant. The cytosolic sulfotransferases provide a particularly interesting example of this kind of analysis.
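The recurrence criterion stated above (a similar interface in two or more crystals whose sequences share less than 90% identity) can be sketched as a simple filter. This is an illustration under assumptions, not the authors' pipeline: interface matching is assumed to have been done already, so each interface carries a shared label, and the entry names, labels and identity values are hypothetical.

```python
from itertools import combinations


def recurrent_interfaces(entries, identity, max_identity=0.90):
    """entries: list of (entry_id, set_of_interface_labels).
    identity(a, b): sequence identity (0 to 1) between the proteins of entries a and b.
    Keep an interface label if it occurs in at least two entries whose
    sequences share less than `max_identity` identity."""
    supported = set()
    for (id_a, ifaces_a), (id_b, ifaces_b) in combinations(entries, 2):
        if identity(id_a, id_b) >= max_identity:
            continue  # near-identical sequences do not count as independent support
        supported |= ifaces_a & ifaces_b
    return supported


# Hypothetical entries; interface labels are assumed to be pre-matched by similarity.
entries = [
    ("entry_A", {"dimer_X", "contact_1"}),
    ("entry_B", {"dimer_X"}),
    ("entry_C", {"contact_1"}),
]
pair_identity = {
    frozenset({"entry_A", "entry_B"}): 0.45,
    frozenset({"entry_A", "entry_C"}): 0.98,
    frozenset({"entry_B", "entry_C"}): 0.45,
}
result = recurrent_interfaces(entries, lambda a, b: pair_identity[frozenset({a, b})])
print(result)  # {'dimer_X'}
```

Here contact_1 is seen twice, but only in two near-identical sequences, so it is not counted as recurrent.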
Highlights Track: HL42
Exploring the human genome with functional maps
Wednesday, July 1 - 2:15 p.m. - 2:40 p.m.
Room: T1
Presenting author: Curtis Huttenhower, Princeton University, United States
Presentation Overview:
Human genomic data of many types are readily available, but the complexity and scale of human molecular biology make it difficult to integrate this body of data, understand it from a systems level, and apply it to the study of specific pathways or genetic disorders. An investigator could best explore a particular protein, pathway, or disease if given a functional map summarizing the data and interactions most relevant to his or her area of interest. Using a regularized Bayesian integration system, we provide maps of functional activity and interaction networks in over 200 areas of human cellular biology, each including information from ~30,000 genome-scale experiments. Key to these analyses is the ability to efficiently summarize this large data collection from a variety of biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and association of novel genes and pathways with known genetic disorders. Experimental investigation of five specific genes (AP3B1, ATP6AP1, BLOC1S1, LAMP2 and RAB11A) has confirmed novel roles for these proteins in the proper initiation of macroautophagy in human fibroblasts. Our functional maps can be explored using HEFalMp, a web interface allowing interactive visualization and investigation of this large body of information.
Highlights Track: HL43
Disordered flanks prevent peptide aggregation
Wednesday, July 1 - 2:15 p.m. - 2:40 p.m.
Room: T2
Presenting author: Sanne Abeln, FOM Institute for Atomic and Molecular Physics [AMOLF], Netherlands
Presentation Overview:
In their natural cellular environment proteins are dissolved in a concentrated aqueous solution of biomolecules. Even under such crowded conditions, proteins must not aggregate; aggregates may be cytotoxic or compromise the biological function of the peptide. Evolutionary pressure generally ensures that proteins do not aggregate in their natural biochemical environment. A well-known mechanism to prevent aggregation is the folding of proteins. Here we report a different mechanism that can prevent the aggregation of proteins. Recently, it was discovered that many proteins contain regions that are disordered (not folded) in their natural environment. We show with coarse-grained simulations that embedding small hydrophobic binding motifs in disordered regions can prevent aggregation: the disordered regions of different proteins sterically hinder the formation of aggregates. Moreover, our simulations show that the disordered regions have no adverse effect on the biological (signalling) function of the binding motifs, because they do not obstruct the binding and folding of the binding motif on its specific substrate.
Highlights Track: HL44
Benchmarking tools in Metabolic Pathway Analysis
Wednesday, July 1 - 2:45 p.m. - 3:10 p.m.
Room: T1
Presenting author: Luis de Figueiredo, Friedrich-Schiller-Universität Jena, Germany
Presentation Overview:
Metabolic Pathway Analysis is a growing field within Systems Biology. For the reconstruction and prediction of metabolic pathways, the concept of elementary flux modes has turned out to be very useful. It takes into account stoichiometry and mass balance at steady state not only for monomolecular reactions but also for reactions of higher molecularity. Alternative approaches have been proposed that are based on graph theory and neglect the mass balance of co-substrates and byproducts. Here, we present three benchmark examples by which pathway analysis tools can be compared. They concern whether even-chain fatty acids can be converted into sugars in animals, whether Mycoplasma hominis can convert glucose into pyruvate, and whether human red blood cells can salvage hypoxanthine. Moreover, new results and directions will be given for improving stoichiometry-based tools so that they can deal with genome-scale metabolic networks.
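The difference between stoichiometric and purely graph-based reasoning can be illustrated with a toy network (hypothetical, and not one of the three benchmarks above). A graph route from A to B exists, but the conversion consumes ATP; unless some reaction regenerates ATP from ADP, no nonzero nonnegative flux through that route satisfies the steady-state condition S·v = 0 for the internal metabolites. The sketch only checks that condition for a given flux vector; it does not enumerate elementary flux modes.

```python
# Internal (mass-balanced) metabolites of the toy network.
METABOLITES = ["A", "B", "ATP", "ADP"]

# Stoichiometric coefficients of internal metabolites in each reaction
# (negative = consumed, positive = produced). External species such as
# A_ext and B_ext are omitted because they are not mass-balanced.
REACTIONS = {
    "R1_uptake":  {"A": 1},                                 # A_ext -> A
    "R2_convert": {"A": -1, "ATP": -1, "B": 1, "ADP": 1},   # A + ATP -> B + ADP
    "R3_export":  {"B": -1},                                # B -> B_ext
    "R4_regen":   {"ADP": -1, "ATP": 1},                    # ADP -> ATP
}


def net_production(fluxes):
    """Compute S·v: net production rate of each internal metabolite."""
    net = {m: 0.0 for m in METABOLITES}
    for rxn, v in fluxes.items():
        for met, coeff in REACTIONS[rxn].items():
            net[met] += coeff * v
    return net


def is_steady_state(fluxes, tol=1e-9):
    """True if all fluxes are nonnegative (every reaction here is irreversible)
    and every internal metabolite is balanced."""
    if any(v < 0 for v in fluxes.values()):
        return False
    return all(abs(x) <= tol for x in net_production(fluxes).values())


without_regen = {"R1_uptake": 1, "R2_convert": 1, "R3_export": 1}
with_regen = {**without_regen, "R4_regen": 1}
print(is_steady_state(without_regen))  # False: ATP is drained and ADP accumulates
print(is_steady_state(with_regen))     # True
```

A graph-based tool that ignores ATP and ADP would accept the first route; the stoichiometric check rejects it.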
Highlights Track: HL45
From the detection of functional regions towards function annotation in proteins
Wednesday, July 1 - 2:45 p.m. - 3:10 p.m.
Room: T2
Presenting author: Nir Ben-Tal, Tel Aviv University, Israel
Presentation Overview:
The identification of functional regions in proteins may aid in function annotation, mutation analysis and drug discovery. In the talk I will present PatchFinder, a method for the identification of functionally important regions in proteins with known three-dimensional structure. The method is based on the assumption that these regions are often evolutionarily conserved in order to retain the effectiveness of the protein. My colleagues and I compiled the N-Func database of 757 proteins of unknown function whose structure is known, and used PatchFinder to predict their functional regions. In some cases we suggested what the function might be. N-Func and PatchFinder are available as a web server. Obviously, some of the N-Func proteins bind DNA. In my talk I will also present a new method that we developed for discriminating between proteins that do and do not bind DNA. The method is based on various characteristics of the protein and of its (predicted) functional region. We used it to predict DNA-binding proteins in the N-Func database.
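The conservation assumption behind this kind of method can be shown in a minimal sketch. This is not PatchFinder's actual scoring: it simply forms one candidate patch per residue (the residue plus its spatial neighbours within a cutoff) and returns the patch with the highest mean conservation. The coordinates, scores and cutoff are hypothetical, and the sketch assumes that a higher score means more conserved.

```python
import math


def candidate_patches(coords, cutoff):
    """One candidate patch per residue: the residue plus all residues whose
    coordinates lie within `cutoff` (same units as coords) of it."""
    return [[j for j, cj in enumerate(coords) if math.dist(ci, cj) <= cutoff]
            for ci in coords]


def most_conserved_patch(coords, conservation, cutoff=6.0):
    """Return the patch with the highest mean conservation score
    (here, a higher score means more conserved)."""
    return max(candidate_patches(coords, cutoff),
               key=lambda patch: sum(conservation[j] for j in patch) / len(patch))


# Toy "structure": five residues spaced 4 units apart along a line.
coords = [(4.0 * i, 0.0, 0.0) for i in range(5)]
conservation = [0.1, 0.2, 0.3, 0.9, 0.95]
print(most_conserved_patch(coords, conservation))  # [3, 4]
```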
Highlights Track: HL46
Prioritizing functional modules mediating genetic perturbations and phenotypic effects
Wednesday, July 1 - 3:15 p.m. - 3:40 p.m.
Room: T1
Presenting author: Li Wang, University of Southern California, United States
Presentation Overview:
How variation in DNA leads to variation in phenotypes is a question open to interpretation. Here we present a global strategy based on the Bayesian network framework to prioritize the functional modules mediating genetic perturbations and their phenotypic effects among a set of overlapping candidate modules. We take lethality in Saccharomyces cerevisiae and human cancer as two examples to show the superiority of this approach over the traditional hypergeometric enrichment test, which ignores the interrelationships among modules.
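For reference, the traditional baseline mentioned above can be sketched as a one-sided hypergeometric test applied to each module independently. The gene universe, hits and modules below are hypothetical. The example shows the weakness the abstract points to: an overlapping module looks enriched even though its signal comes entirely from the module it overlaps.

```python
from math import comb


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them in the
    module, n genes hit (e.g. lethal), k hits falling inside the module."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def module_enrichment(universe, hits, modules):
    """Test each module on its own, ignoring relationships between modules."""
    N, n = len(universe), len(hits)
    return {name: hypergeom_pvalue(len(genes & hits), N, len(genes), n)
            for name, genes in modules.items()}


universe = {f"g{i}" for i in range(1, 11)}        # 10 genes
hits = {"g1", "g2", "g3"}                          # hypothetical lethal genes
modules = {"module_A": {"g1", "g2", "g3"},
           "module_B": {"g1", "g2", "g3", "g4"}}   # overlaps module_A
print(module_enrichment(universe, hits, modules))
# module_A: p = 1/120; module_B: p = 4/120
```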
Highlights Track: HL47
Fitting multiple components into a cryoEM map of their assembly
Wednesday, July 1 - 3:15 p.m. - 3:40 p.m.
Room: T2
Presenting author: Keren Lasker, Tel Aviv University, Israel
Presentation Overview:
Models of macromolecular assemblies are essential for a mechanistic description of cellular processes. Such models are increasingly obtained by fitting atomic-resolution structures of components into a density map of the whole assembly. Yet, current density-fitting techniques are frequently insufficient for an unambiguous determination of the positions and orientations of all components. In the first part of the talk, we will describe MultiFit, a method for simultaneously fitting atomic structures of components into their assembly density map at resolutions of as low as 25 Å. In MultiFit, the positions and orientations of the components are optimized with respect to a scoring function that includes the quality-of-fit of components in the map, the protrusion of components from the map envelope, and the shape complementarity between pairs of components. The scoring function is optimized by our new exact inference optimizer DOMINO (Discrete Optimization of Multiple INteracting Objects) that efficiently finds the global minimum in a discrete sampling space. In the second part of the talk, we will demonstrate the utility of MultiFit for modeling the configuration of large macromolecular assemblies.
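The structure of the optimization problem (single-component terms such as quality of fit and protrusion, plus pairwise terms such as shape complementarity, over a discrete set of candidate placements) can be illustrated by exhaustive enumeration. This is not DOMINO. Enumeration finds the same global minimum but scales as the product of the candidate counts, which is the cost an exact inference optimizer is meant to avoid. The components, placements and scores are hypothetical.

```python
from itertools import product


def best_assembly(candidates, unary, pairwise):
    """Exhaustively score every combination of discrete placements.

    candidates: {component: [placement, ...]}
    unary(component, placement): single-component score, e.g. fit to the map
        plus a penalty for protruding from the envelope (lower is better)
    pairwise(c1, p1, c2, p2): score for a pair of placed components, e.g. a
        clash penalty or shape-complementarity term (lower is better)
    """
    comps = list(candidates)
    best, best_score = None, float("inf")
    for combo in product(*(candidates[c] for c in comps)):
        score = sum(unary(c, p) for c, p in zip(comps, combo))
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                score += pairwise(comps[i], combo[i], comps[j], combo[j])
        if score < best_score:
            best, best_score = dict(zip(comps, combo)), score
    return best, best_score


# Hypothetical scores: two components, each with two candidate positions.
fit = {("A", "p1"): -1.0, ("A", "p2"): -0.9,
       ("B", "p1"): -1.0, ("B", "p2"): -0.5}
placement, score = best_assembly(
    {"A": ["p1", "p2"], "B": ["p1", "p2"]},
    unary=lambda c, p: fit[(c, p)],
    pairwise=lambda c1, p1, c2, p2: 5.0 if p1 == p2 else 0.0,  # clash if same spot
)
print(placement)  # {'A': 'p2', 'B': 'p1'}
```

Each component on its own prefers position p1, but the pairwise clash term forces them apart, which is why components must be fitted simultaneously rather than one at a time.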
Highlights Track: HL48
Architecture of CpG methylation in the human genome
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: K2
Presenting author: Israel Steinfeld, Technion - Israel Institute of Technology, Israel
Presentation Overview:
Constitutively unmethylated regions in the genome significantly contribute to open chromatin domains within a sea of global transcriptional repression. This constitutive unmethylated status is commonly thought to be directly associated with the density of CpG dinucleotides. We will present data and analysis results from genome-wide CpG-island methylation profiling of multiple human tissue samples. We will show how unmethylated regions (UMRs) seem to be formed during early embryogenesis, not as a result of CpG-ness, but rather through the recognition of specific sequence motifs closely associated with transcription start sites. We will describe the computational methods used in the analysis, including methylation status calling, motif discovery, GO enrichment and machine learning techniques. We will introduce a new class of nonpromoter UMRs that become de novo methylated in a tissue-specific manner during development and present experimental validation of our findings. In short, we show that UMRs influence genome structure and have a dynamic role in development.
Highlights Track: HL49
Robust simplifications of multiscale biochemical networks
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: T1
Presenting author: Andrei Zinovyev, Institut Curie, France
Presentation Overview:
Model reduction, i.e. simplifying complex models into simpler ones while preserving some important (dynamical) features, is a necessary technique in almost all projects on biochemical network modeling, but unfortunately there is not yet a solid methodology that can be systematically applied to systems biology models. There exist several model reduction directions connected with time-scale separation. The textbook notion of the 'limiting' reaction can be applied to a few simple network structures. In this paper we show how the notion of the 'limiting' reaction rate can be applied to large and complex networks. The general view that we develop is that, in a particular time window, only a small dominant subset of reactions determines the behaviour of the biochemical model, but this subset can change with time. We develop an algorithm of model reduction in linear networks, for which the dominant system is unique. We show that hierarchical linear biochemical networks are the only case in which the topology of the network completely determines its dynamical features. We develop a notion of the dominant subsystem for non-linear networks and demonstrate how it can be applied to simplify a complex model of NFkB signalling. Our approach generates a hierarchy of simplified models, some levels of which can be compared with existing models of NFkB signalling.
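The textbook 'limiting' reaction notion referred to above can be illustrated for its simplest case, an irreversible first-order chain A1 -> A2 -> ... -> An. This is only the starting point, not the authors' reduction algorithm for general linear and non-linear networks. Each step takes an exponentially distributed time with mean 1/k, so the exact mean transit time is the sum of the 1/k values; keeping only the slowest step gives the limiting-step estimate. The rate constants are hypothetical.

```python
def mean_transit_time(rates):
    """Exact mean time to traverse an irreversible first-order chain:
    each step's waiting time has mean 1/k, and the means add."""
    return sum(1.0 / k for k in rates)


def limiting_step_estimate(rates):
    """Textbook 'limiting reaction' approximation: keep only the slowest step."""
    return 1.0 / min(rates)


rates = [100.0, 0.1, 50.0, 200.0]   # hypothetical rate constants (1/s)
exact = mean_transit_time(rates)
approx = limiting_step_estimate(rates)
print(exact, approx)  # about 10.035 versus 10.0
```

With well-separated rate constants the single limiting step reproduces the exact value to within about 0.4%; when constants are of similar magnitude, no single step dominates and the approximation degrades.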
Highlights Track: HL50
Rapid sampling of molecular motion with prior information constraints: Insights into channel gating and domain swapping
Wednesday, July 1 - 3:45 p.m. - 4:10 p.m.
Room: T2
Presenting author: Ora Schueler-Furman, The Hebrew University of Jerusalem, Israel
Presentation Overview:
Proteins are active, flexible machines that perform a range of different functions. Innovative experimental approaches may now provide limited partial information about conformational changes along motion pathways of proteins. There is therefore a need for computational approaches that efficiently incorporate prior information into motion prediction schemes. We present PathRover, a framework designed for the integration of prior information into the motion-planning algorithm of Rapidly-exploring Random Trees (RRT). Each suggested motion pathway comprises a sequence of low-energy, clash-free conformations that satisfy a number of prior information constraints, derived from experimental data or from expert intuition. The incorporation of prior information in an outright fashion narrows down the vast search in the typically high-dimensional conformational space, leading to dramatic reduction in running time. Hybridization of low-energy pathways is then performed using a novel algorithm for the efficient alignment and comparison of molecular motion pathways (similar to string matching algorithms). The suggested framework can serve as an effective, complementary tool for Molecular Dynamics, Normal Mode Analysis, and other prevalent techniques for predicting motion in proteins. We used PathRover to explore in detail molecular motions of domain swapping, substrate binding, and ion channel gating.
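The basic Rapidly-exploring Random Trees idea, with prior information expressed as a constraint that every accepted state must satisfy, can be sketched in a 2-D toy space. This is a generic textbook RRT, not PathRover: real conformational spaces are high-dimensional and use energy and clash checks on atomic models, and the pathway hybridization step is not shown. The obstacle, the constraint y >= x - 3 and all parameters are hypothetical.

```python
import math
import random


def rrt(start, goal, is_valid, step=0.5, goal_tol=0.5, goal_bias=0.1,
        max_iter=5000, bounds=(0.0, 10.0), seed=0):
    """Grow a Rapidly-exploring Random Tree from `start` in a 2-D box and
    return a path (list of points) ending within `goal_tol` of `goal`,
    or None if no path is found within `max_iter` iterations."""
    rng = random.Random(seed)
    nodes, parent = [start], [None]
    for _ in range(max_iter):
        # Sample a random point, occasionally the goal itself (goal bias).
        if rng.random() < goal_bias:
            target = goal
        else:
            target = (rng.uniform(*bounds), rng.uniform(*bounds))
        # Extend the nearest tree node towards the sample by at most `step`.
        near_i = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], target))
        near = nodes[near_i]
        d = math.dist(near, target)
        if d == 0.0:
            continue
        t = min(1.0, step / d)
        new = (near[0] + t * (target[0] - near[0]),
               near[1] + t * (target[1] - near[1]))
        mid = ((near[0] + new[0]) / 2, (near[1] + new[1]) / 2)
        # Reject extensions whose endpoint or midpoint violates the
        # clash / prior-information checks (a coarse edge check).
        if not (is_valid(new) and is_valid(mid)):
            continue
        nodes.append(new)
        parent.append(near_i)
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:  # walk back up the tree to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


def is_valid(q):
    x, y = q
    clash_free = math.dist(q, (5.0, 5.0)) > 1.5   # toy circular obstacle
    prior_ok = y >= x - 3.0                        # hypothetical prior constraint
    return clash_free and prior_ok


path = rrt((1.0, 1.0), (9.0, 9.0), is_valid)
print(len(path) if path else "no path found")
```

Because rejected samples are never added to the tree, the constraint prunes the search space directly, which is the mechanism the abstract credits for the reduced running time.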
Highlights Track: HL51
GraphWeb: functional analysis of genomic networks
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: Victoria Hall
Presenting author: Jüri Reimand, University of Tartu, Estonia
Presentation Overview:
Deciphering heterogeneous cellular networks of transcriptional regulation, protein interactions and metabolism is a great challenge of current systems biology. Such networks contain modules of interacting genes and proteins that potentially share regulatory mechanisms and common function. Here, we present GraphWeb (http://biit.cs.ut.ee/graphweb/), a web server for network analysis and module discovery. GraphWeb (i) integrates heterogeneous and multispecies data into networks; (ii) discovers topological network modules and (iii) interprets the modules using Gene Ontology, pathways, regulatory motifs and microRNA targets.
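Steps (i) and (ii) can be pictured with a minimal sketch: edge lists from several data sources are merged, edges supported by too few sources are dropped, and connected components are reported as modules. Connected components are used only as the simplest topological module definition; this is an illustrative stand-in, not the set of algorithms GraphWeb offers, and the gene names and sources are hypothetical.

```python
from collections import defaultdict

def integrate(sources, min_support=1):
    """Merge undirected edge lists; keep edges seen in >= min_support sources."""
    support = defaultdict(int)
    for edges in sources:
        # Deduplicate within a source and ignore edge direction.
        for a, b in set(tuple(sorted(e)) for e in edges):
            support[(a, b)] += 1
    return {e for e, n in support.items() if n >= min_support}

def modules(edges):
    """Connected components of the integrated network (iterative DFS)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in list(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        comps.append(comp)
    return comps

# Hypothetical data sources over hypothetical genes A-F.
ppi = [("A", "B"), ("B", "C"), ("D", "E")]
coexpression = [("A", "B"), ("C", "B"), ("E", "F"), ("D", "E")]
regulation = [("B", "C")]

net = integrate([ppi, coexpression, regulation], min_support=2)
print(sorted(sorted(m) for m in modules(net)))
```

Raising `min_support` trades coverage for confidence; the weakly supported E-F edge is dropped here, so F is not assigned to any module.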
Highlights Track: HL52
Characterizing transcriptome plasticity using whole-genome tiling arrays and machine learning
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: T2
Presenting author: Georg Zeller, Friedrich Miescher Laboratory of the Max Planck Society, Germany
Presentation Overview:
Whole-genome tiling arrays are currently still a cost-effective technology for quantitatively monitoring transcriptomes. We present machine learning methods for data normalization and de novo transcript identification and show that they substantially improve the accuracy of the resulting predictions compared with competing methods. Application to Arabidopsis tiling array data revealed thousands of new transcripts missing from current annotations, including some with a stress-dependent expression pattern. Moreover, we characterized the transcriptomes of mutants impaired in various steps of RNA processing.
Highlights Track: HL53
Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project
Thursday, July 2 - 10:45 a.m. - 11:10 a.m.
Room: T1
Presenting author: Chris Taylor, EMBL-EBI, United Kingdom
Presentation Overview:
Minimum information checklists specify the information that should be provided when reporting research. They promote transparency and data accessibility and support more thorough quality assessment, increasing the value of data sets and, by extension, the competitiveness of their originators and host database(s). However, with no mechanism to coordinate checklist development, both establishing the number of extant checklists and tracking their evolution were challenging exercises. Furthermore, overlaps in scope between checklists and arbitrary decisions on wording and structure almost guaranteed significant incompatibilities. Consequently, representatives of several checklist development projects began the MIBBI (Minimum Information for Biological and Biomedical Investigations) project (http://mibbi.org/). MIBBI has two broad goals: to provide access to checklists and their developers (a 'one-stop shop'), and to foster the development of new, integrated checklist 'suites' by the participant communities. Since publication last year, the MIBBI project has found increasing favour with both journals and funders. The Portal now lists twenty-nine projects, and the Foundry is generating new modules. Tools are also emerging to help researchers follow MIBBI guidelines. Overall, the project has much to offer the bioinformatics community: as significant consumers of research data, bioinformaticians stand to benefit greatly from the standardisation that MIBBI is encouraging.
Highlights Track: HL54
Search and discovery of recurring patterns with interactomes
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: Victoria Hall
Presenting author: Mona Singh, Princeton University, United States
Presentation Overview:
Searching for recurring patterns in biological data has been the backbone of much research and analysis in bioinformatics. For example, within the realm of sequence analysis, the search for recurring or similar patterns has given rise to extensive work on sequence alignments and sequence motif discovery, and has resulted in large sequence motif libraries. Our goal is to begin to develop the analogous techniques for biological networks, by building a framework for searching and mining interaction networks in order to reveal and systematize the recurring protein and interaction patterns within them. I will describe:
(1) Our formalism (network schemas) for representing recurring patterns within interactomes. Network schemas specify descriptions of proteins (e.g., their molecular functions or putative domains) and the topology of interactions among them. They can describe domain-domain interactions, signaling and regulatory pathways, or more complex network patterns.
(2) Our fast algorithms for searching for user-supplied network schemas in arbitrary biological networks.
(3) Our framework for systematically uncovering recurring, over-represented schemas in physical interaction networks.
(4) An application of our methods to the yeast and human interactomes, where we identify hundreds of recurring and over-represented network schemas of various complexity and provide several lines of evidence of their functional importance.
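What it means to search for a network schema can be shown with a minimal sketch. Each schema node is a required annotation (e.g. a domain or molecular function), each schema edge must be realized by an interaction between the matched proteins, and distinct schema nodes map to distinct proteins. Plain backtracking is used for clarity; it is not the fast search algorithm described in the talk, and the proteins, annotations and interactions are hypothetical.

```python
def find_schema_instances(schema_labels, schema_edges, annotations, interactions):
    """Return all assignments of distinct proteins to schema nodes.

    schema_labels: required annotation for each schema node
    schema_edges: (i, j) index pairs into schema_labels
    annotations: dict protein -> set of annotations
    interactions: iterable of undirected protein pairs
    """
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Proteins eligible for each schema node, based on their annotations.
    candidates = [[p for p, ann in annotations.items() if label in ann]
                  for label in schema_labels]
    results = []

    def extend(assignment):
        k = len(assignment)
        if k == len(schema_labels):
            results.append(tuple(assignment))
            return
        # Already-placed schema nodes that must interact with node k.
        linked = [j if i == k else i for i, j in schema_edges
                  if (i == k and j < k) or (j == k and i < k)]
        for p in candidates[k]:
            if p in assignment:
                continue  # each schema node maps to a distinct protein
            if all(assignment[m] in adj.get(p, set()) for m in linked):
                assignment.append(p)
                extend(assignment)
                assignment.pop()

    extend([])
    return results

annotations = {"P1": {"kinase"}, "P2": {"SH2"}, "P3": {"kinase"},
               "P4": {"SH2"}, "P5": {"phosphatase"}}
interactions = [("P1", "P2"), ("P3", "P4"), ("P2", "P5")]
schema = ["kinase", "SH2", "phosphatase"]  # a three-protein path schema
print(find_schema_instances(schema, [(0, 1), (1, 2)], annotations, interactions))
```

Backtracking is exponential in the worst case, which is why dedicated algorithms matter for larger schemas and genome-scale interactomes.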
Highlights Track: HL55
The Computational Exploration of (Alternative) Splicing Mechanisms
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: T2
Presenting author: Michael Sammeth, CRG/IMIM/UPF, Spain
Presentation Overview:
The recent advent of high-throughput methods that allow sequencing of the whole RNA complement of cell populations has given a first impression of the number of novel exons and spliceforms expected to be added to transcriptome annotations over the next years. In this respect, alternative splicing (AS) is increasingly coming to the fore as the molecular mechanism mainly responsible for creating the plethora of different combinations (which ultimately account for organism complexity) from the limited reservoir of genetically inherited information. To keep pace with the flood of data, we have developed a generic model of AS based on a universal definition of its atomic unit, called an 'event', and a method that can efficiently retrieve all such events from huge datasets (i.e., millions of annotations) containing a high level of noise (i.e., sequencing errors). In the highlighted publication we compare these generic events to the ones usually considered in computational analyses across 12 different species. Additionally, we now show several examples in which the characterization of such events provides bona fide propositions for kinetic models that explain the molecular mechanism of splicing in general.
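A simplified version of a generic AS event can be sketched as follows, assuming each transcript is summarized as a sorted list of splice-site positions. Two transcripts are compared pairwise, and every region in which their splice-site chains differ, bounded by consecutive sites that both transcripts share, is reported as one event. The positions are hypothetical, and this sketch ignores site types, strand, transcript ends (so alternative first and last exons are missed) and events involving more than two transcripts, all of which a full implementation would need to handle.

```python
def pairwise_events(sites_a, sites_b):
    """Events between two transcripts given as sorted splice-site positions.

    Each event is (left_flank, right_flank, inner_sites_a, inner_sites_b),
    where the flanks are consecutive splice sites shared by both transcripts.
    """
    shared = sorted(set(sites_a) & set(sites_b))
    events = []
    for left, right in zip(shared, shared[1:]):
        inner_a = tuple(s for s in sites_a if left < s < right)
        inner_b = tuple(s for s in sites_b if left < s < right)
        # Inner sites cannot be shared, so any inner site marks a variation.
        if inner_a != inner_b:
            events.append((left, right, inner_a, inner_b))
    return events

# Hypothetical transcripts: A includes a cassette exon (sites 300/400) and
# uses acceptor 800; B skips that exon and uses the alternative acceptor 820.
tx_a = [200, 300, 400, 500, 700, 800, 900, 1000]
tx_b = [200, 500, 700, 820, 900, 1000]
for event in pairwise_events(tx_a, tx_b):
    print(event)
```

The same routine reports exon skipping and an alternative acceptor without a separate rule for each, which is the appeal of a generic event definition over a fixed catalogue of AS types.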
Highlights Track: HL56
Nominalization and alternations in the language of molecular biology: Implications for text mining
Thursday, July 2 - 11:15 a.m. - 11:40 a.m.
Room: T1
Presenting author: Kevin Cohen, University of Colorado School of Medicine, United States
Presentation Overview:
We present data on the understudied phenomenon of nominalization in the language of molecular biology and demonstrate its implications for the design of biomedical text mining systems.
Highlights Track: HL57
FunCoup: global networks of functional coupling in eukaryotes
Thursday, July 2 - 11:45 a.m. - 12:40 p.m.
Room: Victoria Hall
Presenting author: Erik Sonnhammer, Stockholm Bioinformatics Centre, Sweden
Presentation Overview:
Interactomes computationally predicted via data integration are becoming an increasingly popular tool and context for biological research. However, merging disparate data sources and presenting relevant parts of a global network is not trivial. FunCoup, an optimised Bayesian framework and web resource, was developed to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex membership, and metabolic or signalling links. This capability boosted overall accuracy. The framework was comprehensively tested to optimise the overall confidence and to ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 datasets in seven organisms, and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. The networks, which are the largest interactome reconstructions to date, are freely available for download and query at http://FunCoup.sbc.su.se. The site allows detailed graphical and tabular analysis of subnetworks around query genes, as well as comparative analysis of orthologous networks in multiple species.
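The general principle behind naive-Bayes data integration can be sketched for a single candidate gene pair. Each data set contributes a likelihood ratio LR = P(evidence | coupled) / P(evidence | not coupled); assuming conditional independence between data sets, the log-likelihood ratios add, and the sum can be turned into a posterior with a prior. The probabilities and prior below are hypothetical, and FunCoup's actual training, handling of dependent data, and per-class scoring are not reproduced; in a FunCoup-like setting one such score would be computed separately for each coupling class.

```python
import math

def log_lr(p_given_coupled, p_given_not_coupled):
    """Natural-log likelihood ratio contributed by one data set."""
    return math.log(p_given_coupled / p_given_not_coupled)

def integrated_score(evidence):
    """Sum of log-likelihood ratios (naive independence assumption)."""
    return sum(log_lr(pp, pn) for pp, pn in evidence)

def posterior(score, prior):
    """Combine the summed log-LR with prior odds of coupling."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * math.exp(score)
    return post_odds / (1 + post_odds)

# Hypothetical (P(e|coupled), P(e|not coupled)) for three data sets;
# the third data set is evidence against coupling.
evidence = [(0.30, 0.05), (0.20, 0.10), (0.10, 0.20)]
s = integrated_score(evidence)
print(round(s, 3), round(posterior(s, prior=0.01), 4))
```

Working in log space keeps the contributions additive, so negative evidence simply subtracts from the score rather than requiring a special case.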
Highlights Track: HL58
A new method for high-resolution gene expression analysis
Thursday, July 2 - 11:45 a.m. - 12:10 p.m.
Room: T2
Presenting author: Caroline Friedel, Ludwig Maximilians - University Munich, Germany
Presentation Overview:
We present a novel approach for measuring both RNA transcription and decay in a single experimental setting. We show that this approach increases the sensitivity for differentially expressed genes and temporal kinetics of transcriptional regulation. Furthermore, alterations in transcription and decay can be distinguished and rates of RNA turnover can be determined with superior accuracy. This provides new insights into gene regulation and is important for quantitative systems biology modelling.
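One common way to turn such measurements into RNA turnover rates is sketched below, under assumptions that the abstract does not spell out: newly transcribed RNA is separated from total RNA after labeling for a time t, the transcript is at steady state, and decay is first-order with rate constant k. The newly transcribed fraction is then r = 1 - exp(-k t), which can be inverted for k and the half-life ln 2 / k. The measurement values are hypothetical.

```python
import math

def decay_rate(new_fraction, t_label):
    """Invert r = 1 - exp(-k * t) for k (units: 1 / units of t_label).

    Assumes steady state and first-order decay.
    """
    if not 0 < new_fraction < 1:
        raise ValueError("newly transcribed fraction must lie in (0, 1)")
    return -math.log(1.0 - new_fraction) / t_label

def half_life(k):
    return math.log(2) / k

# Hypothetical measurement: 1 h labeling, 20% of the transcript is new.
k = decay_rate(0.20, t_label=1.0)
print(f"k = {k:.3f} per h, half-life = {half_life(k):.2f} h")
```

At steady state the synthesis rate equals k times the total transcript level, so the same ratio also separates changes in transcription from changes in decay.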
Highlights Track: HL59
Comparative community assessments for applied biomedical text mining: BioCreative II challenge and metaservices.
Thursday, July 2 - 11:45 a.m. - 12:10 p.m.
Room: T1
Presenting author: Florian Leitner, Spanish National Cancer Research Centre (CNIO), Spain
Presentation Overview:
Comparative evaluation of computational tools applied to biological data is crucial for monitoring improvements over time and identifying competitive strategies, while making the tools available through encapsulated, unified services promotes their usage. In the field of biomedical text mining, the BioCreative initiative has contributed substantially to the assessment of text mining tools applied to biologically relevant tasks. The second BioCreative initiative not only had a considerable impact on the development of text mining systems for the extraction of biological entities and protein interactions, but also motivated the implementation of the first text mining meta-server for biology, the BioCreative MetaServer (BCMS). The BCMS is a key infrastructure for the upcoming BioCreative II.5 event, in which text mining developers, biological annotation databases, article authors and publishers contribute to improving information access to full-text articles.
Highlights Track: HL60
Studying alternative splicing regulatory networks through partial correlation analysis
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: Victoria Hall
Presenting author: Liang Chen, University of Southern California, United States
Presentation Overview:
Alternative pre-mRNA splicing is an important gene regulation mechanism for expanding proteomic diversity in higher eukaryotes. The rapid accumulation of high-throughput data provides us with an unprecedented opportunity to understand the complicated alternative splicing regulatory network. However, existing statistical and computational methods still lag behind these advanced technologies. Sorting out the coordinate and combinatorial alternative splicing regulatory network poses a major challenge for the post-genomic era. We developed statistical methods to derive signals from high-throughput exon array data or high-throughput sequencing data in order to understand the "splicing codes" in gene regulation. Partial correlation analysis was proposed to identify association links between co-spliced exons and links between alternative exons and their regulators. The reconstructed splicing regulatory networks can help us better understand the coordinate and combinatorial nature of alternative splicing regulation.
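The standard way to compute partial correlations for a whole set of variables is to invert the covariance matrix and rescale the precision matrix P as rho_ij = -P_ij / sqrt(P_ii P_jj); a near-zero rho_ij suggests that two variables are associated only through the others. The minimal sketch below applies this to simulated data in which one hypothetical regulator drives two exons. It is written in plain Python for self-containment, omits the significance testing a real analysis needs, and is not the authors' exact statistical procedure.

```python
import math
import random

def covariance(data):
    """Sample covariance; data is a list of samples (lists of values)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]

def partial_correlations(data):
    """Partial correlation of each pair given all other variables."""
    prec = invert(covariance(data))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]

# Simulated data: column 0 is a hypothetical regulator, columns 1 and 2 are
# two exons whose inclusion levels both depend on it.
rng = random.Random(1)
samples = []
for _ in range(500):
    r = rng.gauss(0, 1)
    samples.append([r, r + rng.gauss(0, 1), r + rng.gauss(0, 1)])

pc = partial_correlations(samples)
print("exon-exon:", round(pc[1][2], 3), "regulator-exon:", round(pc[0][1], 3))
```

The two exons are clearly correlated marginally (about 0.5 in expectation), but their partial correlation given the regulator is near zero, so only the regulator-exon links survive.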
Highlights Track: HL61
Global Measures of Uncertainty: Long Overdue in Computational Molecular Biology
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: T2
Presenting author: Lee Newberg, Wadsworth Center, United States
Presentation Overview:
High-dimensional (high-D) discrete prediction and estimation problems are arguably the most common inference problems in computational biology, ranging from sequence alignment to network inference. Regardless of the procedure employed, when it delivers a single answer, that answer is a point estimate selected from the set of all possible solutions, the solution ensemble. For high-D discrete spaces, the immense size of these ensembles almost always portends considerable uncertainty in estimation. Nevertheless, our field has paid little attention to global uncertainty measures for these estimates. Specifically, the almost complete absence of confidence limits is a major oversight of our community, most embarrassingly including me. In this talk I describe credibility limits, Bayesian confidence limits (Webb), and two procedures for efficiently obtaining these limits: a sampling-based procedure (Webb) and a more general dynamic programming (DP) algorithm (Newberg). This intentionally provocative talk will focus on the statistical underpinnings of prediction and estimation in discrete high-D spaces, the glaring need to delineate the uncertainty of estimates in these spaces, credibility limits for this delineation, the value of such measures in comparing estimators in the absence of "gold standards", and an illustration of these principles on sequence alignment.
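The idea behind a sampling-based credibility limit can be sketched as follows, assuming draws from the posterior over solutions are already available together with a distance between solutions. The credibility limit at level 1 - alpha is taken here as the smallest radius r such that at least a fraction 1 - alpha of the sampled solutions lie within r of the point estimate. Alignments are encoded as sets of aligned position pairs with a symmetric-difference distance; the samples are hypothetical, and neither the sampler nor the DP algorithm mentioned above is reproduced.

```python
import math

def distance(a, b):
    """Number of aligned pairs present in one alignment but not the other."""
    return len(a ^ b)

def credibility_limit(point_estimate, samples, alpha=0.05):
    """Smallest radius covering at least (1 - alpha) of the posterior samples."""
    d = sorted(distance(point_estimate, s) for s in samples)
    k = math.ceil((1 - alpha) * len(d)) - 1
    return d[k]

# Hypothetical point estimate and posterior samples (alignments as pair sets).
point = {(0, 0), (1, 1), (2, 2), (3, 3)}
shift1 = {(0, 0), (1, 1), (2, 2), (3, 4)}  # distance 2
shift2 = {(0, 0), (1, 1), (2, 3), (3, 4)}  # distance 4
far = {(0, 1), (1, 2), (2, 3), (3, 4)}     # distance 8
samples = [point] * 6 + [shift1] * 2 + [shift2] + [far]

print(credibility_limit(point, samples, alpha=0.1))
```

Reporting such a radius alongside the point estimate states how far the truth plausibly lies from it, which is the kind of global uncertainty measure the talk argues for.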
Highlights Track: HL62
A Complete Neandertal Mitochondrial Genome Sequence Determined by High-Throughput Sequencing
Thursday, July 2 - 12:15 p.m. - 12:40 p.m.
Room: T1
Presenting author: Richard E. Green, Max Planck Institute for Evolutionary Anthropology, Germany
Presentation Overview:
Recent advances in high-throughput sequencing have opened new vistas in ancient DNA genomics. Following these advances, we have embarked on an effort to retrieve and assemble the genome of our closest extinct relative, the Neandertal. As a preamble to the complete nuclear genome, we have recovered, sequenced and assembled several complete mitochondrial genomes from Neandertal fossil bones. The extreme depth of sequencing coverage in these assemblies allows for a comprehensive and qualitative assessment of the issues inherent in ancient DNA sequencing, alignment, and assembly. We encoded this knowledge into a custom assembler for ancient DNA. Using this assembler, we generated six complete Neandertal mtDNA genomes. Comparison within Neandertals and to human and other great ape mtDNA sequences confirms that the Neandertal mtDNA lineage diverged roughly 600,000 years ago. Furthermore, these data reveal the low genetic diversity among late Neandertals and imply an effective population size of fewer than 3,500 females.