ISMB/ECCB 2015 Special Sessions
SST01: Translational Medicine Informatics
SST02: Algorithms, Machine Learning and Data Complexity: From Chromatin Interactions to Nuclear Function
SST03: Towards Unifying a Computational Biology Ecosystem Across Europe and the US
SST04: Crowd-Sourced Benchmarking of Somatic Mutation Calling
Room: The Auditorium
Organizer(s):
Venkata P. Satagopam is a Research Scientist at the Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, actively involved in clinical and translational research. He is a co-founder of the ISCB Student Council, has been involved in the organization of several Student Council symposia, and initiated internships for students from developing nations. He is the co-chair of the ISMB/ECCB 2011 Killer App Award committee, a member of the organizing committee of the ISMB poster session and of Arts & Science, and co-organizer of a similar workshop at ISMB/ECCB 2011 and of a grant writing tutorial. He was also involved in the organization of the ISMB 2012 workshop “P2P – From Postdoc to Principal Investigator” and organized the “JPI – Junior Principal Investigator” meeting at ISMB/ECCB 2013 and ISMB 2014. He is also involved in the organization of the biohackathons, Garuda 5, 6 and is an active participant in several scientific conferences.
Mansoor Saqi is a Senior Researcher at the European Institute for Systems Biology and Medicine in Lyon, France and is working on eTRIKS. Previously he was Principal Investigator in Bioinformatics at the Department of Computational and Systems Biology at Rothamsted Research, UK, and has worked in both academic and industrial settings. His work has covered a number of application domains, including sequence and structural bioinformatics, pathways, data integration and the analysis of integrated biological networks.
Reinhard Schneider is Head of the Bioinformatics Core facility at the Luxembourg Centre for Systems Biomedicine (LCSB) at the University of Luxembourg. Between 1994 and 2010 he worked as a Team Leader at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. Before joining the EMBL he was cofounder of LION bioscience AG, Heidelberg, where he served as Chief Information Officer. Before founding LION bioscience, he worked as a scientist in the Biocomputing department at the EMBL, Heidelberg, where he studied various aspects of protein structures. Dr. Schneider received his Ph.D. in Biology at the University of Heidelberg, Germany, and has published over 90 research papers. He is a member of the executive committee of the International Society for Computational Biology, where he serves as treasurer and chairs the Governance, Fundraising and Finance Committee. He was organizer or co-organizer of several conferences (ISMB 2007 to ISMB 2014, ISCB-Asia/SCCG 2012, ISMB Latin America 2012, VizBi, BBC11, Garuda 5, 6). Besides his academic career, he is involved in several startup projects.
Wei Gu is a Postdoctoral Researcher at the Bioinformatics Core facility at the Luxembourg Centre for Systems Biomedicine (LCSB) at the University of Luxembourg. He is a key player in the IMI projects eTRIKS and AETIONOMY, working on data curation and integration, hosting, analytics and visualization of clinical and multi-omics data. He also plays an important role in other IMI projects (through eTRIKS), namely ONCOTRACK, ABIRISK and APPROACH, in terms of data curation and integration as well as platform deployments. Dr. Gu received his Ph.D. in bioinformatics at the Centre for Bioinformatics (CBI), Saarland University, Germany, in 2008. He has (co-)authored more than 20 scientific publications. Before joining LCSB, he worked as a bioinformatics/biostatistics scientist and a member of the IT support team at CBI Saarland and University Hospital of Saarland.
Irina Roznovat is a Postdoctoral Researcher at the European Institute for Systems Biology and Medicine in Lyon, France and is working on eTRIKS. She holds a PhD in Computer Science (Computational Biology) from Dublin City University, Ireland and has worked on integrating information on genetic/epigenetic interdependencies, signalling pathways, stem cell dynamics, ageing/gender influences and epigenetic inhibitors, to develop a multi-scale computational model for colon cancer dynamics. Her main research interests are complex systems modelling (epigenetics, cancer, neurodegenerative disorders), machine learning, data analysis, concurrent programming. She is a co-organizer of the ‘Empowering Systems Medicine Through Optimal Computational Modelling’ Workshop, held in conjunction with IEEE BIBM 2014, Belfast, UK, Nov. 2014.
In this session, we will discuss the current status of computational biology approaches within the field of clinical and translational medicine. Large amounts of multi ‘omics and clinical data can now be captured for given patient populations. The molecular data, generated from high throughput experiments, includes data relating to gene expression, copy number variation and single nucleotide polymorphisms. Harmonizing retrospective and prospective clinical data from several studies, and applying controlled terminologies and standards to facilitate cross-study comparisons, is a challenge. A variety of computational approaches are currently being used to harmonize and relate molecular data to clinical outcomes in order to better understand disease conditions. These methods also have the potential to be used predictively to help suggest personalised therapeutic strategies for patients.
The session comprises four talks of 20 minutes each, each followed by 5 minutes for discussion or questions. These four talks will (a) give an overview of the importance of computational methods in translational medicine and recent progress in this area, (b) address the context related to data curation, (c) describe the recent European initiative eTRIKS (European Translational Information and Knowledge Management Services), and (d) present an application case that illustrates the value of combining carefully curated clinical data with high dimensional ‘omics data.
Dr. Winston Hide trained at Temple University and did postdoctoral work with Wen-Hsiung Li, with Richard Gibbs, and at the Smithsonian Museum of Natural History in Washington DC. He founded the South African National Bioinformatics Institute in 1996. Recognised with an outstanding achievement award by the ISCB for his work in establishing bioinformatics in Africa, Hide has recently driven strategic development for bioinformatics at Harvard’s School of Public Health and Stem Cell Institute, increasingly focusing on translation. He now leads the MSc Programme for genome medicine at Sheffield and is establishing a centre for genome translation at the University.
Scientific communication and scholarly publishing more generally are evolving rapidly under pressure from funding agencies, regulatory agencies, publishers and the general public to remove obstacles to data access and to facilitate assessment. This requires having data communication standards in place, but that alone is not enough. In this short presentation, we will provide an overview of the landscape of resources relevant to clinical research as well as the latest progress in the field of functional genomics data standards, showing how resources such as Biosharing.org, CDISC standards and the ISA format can be harnessed. Following an outline of the key issues, we will share the experience and the lessons learned, as part of the IMI eTRIKS initiative, from the standardization and curation activities involved in improving and facilitating translational research. We will also discuss the opportunities for collaboration and cooperation with other major initiatives worldwide, such as the NIH Big Data to Knowledge initiative (NIH BD2K).
David Henderson, PhD is Principal Scientist and Liaison Manager in Global External Innovation and Alliances at Bayer Pharma AG in Berlin, Germany. He earned his B.Sc from the University of Edinburgh and his PhD in Molecular Biology from Vanderbilt University. With over 30 years’ experience in drug discovery and development, he has worked on drug development and biomarker studies in clinical trials in oncology, ranging from Phase I to Phase III. In his present position, he is Liaison Manager for Bayer’s contributions to several projects funded by the Innovative Medicines Initiative (IMI-JU) and acts as Coordinator of the OncoTrack consortium.
eTRIKS (European Translational Research, Informatics and Knowledge Management Services) is a public-private partnership made up of 17 pharmaceutical companies, academic institutions and SMEs, jointly financed by the Pharma partners and the Innovative Medicines Initiative of the EU (IMI-JU), with expertise in data management, systems biology, biomedical curation, collaboration and data exchange standards. The goal of the consortium is to leverage the open source tranSMART platform to provide a series of services to translational research and biomarker research programs, enabling disease stratification and biomarker discovery by:
- Driving the adoption of a common open source platform
- Promoting multi-study data harmonisation
- Developing best practice guidelines and resources for the re-use of research data
- Providing advice and support for translational research projects
In this presentation, we shall describe and outline the current state of the eTRIKS project, illustrate how we are working to support ongoing research projects and how we are providing a hub for a growing ecosystem of open source informatics technologies and providers to support the translational research community.
Dr. Anna Goldenberg is a Scientist in the Genetics and Genome Biology program at the SickKids Research Institute and an Assistant Professor in the Department of Computer Science at the University of Toronto. Dr. Goldenberg obtained her PhD in Machine Learning from Carnegie Mellon University, developing efficient methods for structural learning in graphical models with application to social networks. She then immersed herself in the field of computational medicine, first as a postdoc at UPenn and later as a postdoc at UofT's Donnelly Centre for Cellular and Biomolecular Research. Her current research focuses on developing novel machine learning methods for genomic and clinical data integration, addressing heterogeneity and identifying disease mechanisms in complex human diseases.
Organizer(s):
Pietro Lio, University of Cambridge, United Kingdom
Yoli Shavit, University of Cambridge, United Kingdom
Very recently, conspicuous effort has been dedicated to describing and predicting the three-dimensional organization of chromosomes inside the eukaryotic nucleus by generic polymer models [1]. In my talk, I will discuss recent results showing that chromosome structure and dynamics can be quantitatively described by a polymer model of decondensing chromosomes which takes into account only minimal physical ingredients like density, stiffness and topology conservation of the chromatin fiber [2-4]. Then, I will present preliminary results concerning how this model can be (1) employed in order to investigate the origin of the visco-elastic properties of the nucleus [5], and (2) suitably generalized for studying the consequences on chromosome structure arising from a mixed composition (10nm-fibers vs. 30nm-fibers) of the underlying chromatin fiber [6].
References:
[1] A. Rosa and C. Zimmer, Int. Rev. Cell & Mol. Biol. 307, 275 (2014).
[2] A. Rosa and R. Everaers, PLoS Comput. Biol. 4, e1000153 (2008).
[3] A. Rosa et al., Biophys. J. 98, 2410 (2010).
[4] A. Rosa and R. Everaers, Phys. Rev. Lett. 112, 118302 (2014).
[5] M. Valet and A. Rosa, in preparation.
[6] A.-M. Florescu and A. Rosa, in preparation.
Recent papers have investigated the link between replication timing, replication domains and topologically associated domains. Graph theory, machine learning and signal processing are key examples of methods applied to segmenting the genome based on contact frequency profiles, and to linking the resulting segments with replication domains. This presentation will discuss the latest algorithmic advancements and challenges in genome segmentation in the context of investigating the link between replication dynamics and chromatin folding.
An important concept in biology is the link between structure and function. For example, 'sequence makes structure makes function' is a key idea in protein folding. This presentation will address the concept of 'geometry makes function, function makes geometry'. It will present the latest methods for data integration of chromatin conformation and multi-omic data, highlighting key results and open challenges.
Restraint-based modeling of genomes has been recently explored with the advent of Chromosome Conformation Capture (3C-based) experiments. We previously developed a reconstruction method to resolve the 3D architecture of both prokaryotic and eukaryotic genomes using 3C-based data. These models were congruent with fluorescent imaging validation. However, the limits of such methods have not been systematically assessed. Here we propose the first evaluation of a mean-field restraint-based reconstruction of genomes by considering diverse chromosome architectures and different levels of data noise and structural variability. The results show that: first, current scoring functions for 3D reconstruction correlate with the accuracy of the models; second, reconstructed models are robust to noise but sensitive to structural variability; third, the local structure organization of genomes, such as Topologically Associating Domains, results in more accurate models; fourth, to a certain extent, the models capture the intrinsic structural variability in the input matrices; and fifth, the accuracy of the models can be a priori predicted by analyzing the properties of the interaction matrices. In summary, our work provides a systematic analysis of the limitations of a mean-field restraint-based method, which could be taken into consideration in further development of methods as well as their applications.
3C, 4C and more recently 5C and Hi-C data were shown to be important for classification of disease taxonomy (for example, in leukaemia). To date, however, there is little established methodology for automated classification and inference. Thus, while imaging data of nuclear architecture, such as Fluorescence In Situ Hybridization (FISH), are commonly used for clinical applications, chromosome conformation data are still far from being adopted. This presentation will discuss some of the computational and bioinformatics challenges involved in data integration and data mining of chromosome conformation data and in their spatial interpretation, towards potential avenues for clinical applications.
An opportunity to address questions and open discussion with Special Session speakers.
Organizer(s):
Philip Bourne is Associate Director for Data Science at the National Institutes of Health (NIH) and leader of BD2K, a trans-NIH initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement.
Niklas Blomberg is Director of ELIXIR, the European infrastructure for biological information, based at the ELIXIR Hub located alongside the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK.
The session will examine the relationship between the Big Data to Knowledge (BD2K) initiative and ELIXIR. BD2K has been established by the National Institutes of Health (NIH) to enable biomedical research as a digital research enterprise, facilitate discovery and support new knowledge, and to maximise community engagement in the US. ELIXIR, the European infrastructure for biological information, is the distributed effort of partners across Europe to coordinate, sustain and integrate data resources, bio-computing capacity, analysis tools, training and standards for the research community. The two initiatives share many parallels and plan close cooperation over the coming years.
The session will start with representatives from ELIXIR and BD2K talking about their respective priorities and programmes. It will conclude with a panel discussion comparing NIH data management practices and the 'data commons' approach with the more federated European approach. It will assess the added value to be gained in aligning efforts between the initiatives and provide the opportunity for lively discussion and audience questions to senior representatives from BD2K and ELIXIR Nodes.
The panel session will be chaired by Prof. Nicola Mulder from the Institute of Infectious Disease and Molecular Medicine at the University of Cape Town in South Africa.
Room: The Auditorium
Organizer(s):
Dr. Boutros is an independent investigator in the Informatics and Biocomputing Platform of the Ontario Institute for Cancer Research in Toronto. He received his BSc (Chemistry) from the University of Waterloo and his PhD in Medical Biophysics from the University of Toronto. He has received several awards, including the CIHR/Next Generation First Prize. His research group focuses on using new DNA sequencing technologies to improve diagnosis and treatment of prostate cancer. Dr. Boutros co-leads both the Canadian Prostate Cancer Genome Network and the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.
Dr. Lee is a Bioinformatician in the Boutros laboratory of the Ontario Institute for Cancer Research in Toronto. She received her BMath from the University of Waterloo and her PhD from McGill University, both in Computer Science with a focus on Bioinformatics. She also completed a postdoctoral fellowship involving chemical genomics with Drs. Guri Giaever and Corey Nislow at the University of Toronto.
The analysis of cancer genome-sequencing data remains a significant challenge. Accurate and rapid identification of somatic mutations of all types – point-mutations and structural variants – is quickly becoming the key limiting step in data-analysis. However, the lack of accepted benchmarks has slowed the adoption of community standards and hindered the evolution of best-in-class methods through collaborative efforts.
The two largest international cancer genomics efforts – The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) – joined forces to launch the ICGC-TCGA DREAM Somatic Mutation Calling Challenge: a crowd-sourcing effort to identify the best pipelines for detecting mutations in the high-throughput sequencing reads of cancer genomes (https://www.synapse.org/#!Synapse:syn312572). The Challenge is part of the DREAM series of open challenges in computational biology, and is divided into sub-challenges focused on specific aspects of mutation calling.
As organizers of the Challenge, we will present the results of community efforts to create benchmarks for mutation calling. In addition, a winning structural variant detection method, novoBreak, will be presented by Dr. Ken Chen from the University of Texas MD Anderson Cancer Center.
Dr. Boutros is an independent investigator in the Informatics and Biocomputing Platform of the Ontario Institute for Cancer Research in Toronto. He received his BSc (Chemistry) from the University of Waterloo and his PhD in Medical Biophysics from the University of Toronto. He has received several awards, including the CIHR/Next Generation First Prize. His research group focuses on using new DNA sequencing technologies to improve diagnosis and treatment of prostate cancer. Dr. Boutros co-leads both the Canadian Prostate Cancer Genome Network and the ICGC-TCGA DREAM Somatic Mutation Calling Challenge.
Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. The BAMSurgeon tool for simulating cancer genomes, and the results of 248 single nucleotide variant analyses of three in silico tumors created with it, will be presented. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/.
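The abstract above does not say how the ensemble of pipelines was built. As an illustration only, the sketch below assumes a simple majority vote over call sets and scores each set against a known truth set, as is possible for an in silico tumor. The function names, the variant tuples and the toy data are hypothetical and are not taken from the Challenge.

```python
from collections import Counter


def majority_vote(callsets, min_votes=None):
    """Keep variants reported by at least min_votes call sets.

    Assumption: a plain vote count; the Challenge ensembles may differ.
    """
    if min_votes is None:
        min_votes = len(callsets) // 2 + 1  # strict majority by default
    votes = Counter(v for calls in callsets for v in set(calls))
    return {v for v, n in votes.items() if n >= min_votes}


def score(calls, truth):
    """Return (precision, recall, F1) of a call set against the truth set."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (made up): variants as (chromosome, position, ref, alt).
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "A")}
pipeline_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr1", 300, "T", "G")}
pipeline_b = {("chr1", 200, "G", "C"), ("chr2", 50, "C", "A"), ("chr2", 75, "G", "T")}
pipeline_c = {("chr1", 100, "A", "T"), ("chr2", 50, "C", "A"), ("chr3", 10, "A", "C")}

ensemble = majority_vote([pipeline_a, pipeline_b, pipeline_c])
for name, calls in [("a", pipeline_a), ("b", pipeline_b),
                    ("c", pipeline_c), ("ensemble", ensemble)]:
    print(name, score(calls, truth))
```

In this toy case each pipeline misses one true variant and reports one private false positive, so the vote recovers all true variants and discards the false positives. Real call sets have correlated errors, so the gain from voting is not guaranteed.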
Dr. Lee is a Bioinformatician in the Boutros laboratory of the Ontario Institute for Cancer Research in Toronto. She received her BMath from the University of Waterloo and her PhD from McGill University, both in Computer Science with a focus on Bioinformatics. She also completed a postdoctoral fellowship involving chemical genomics with Drs. Guri Giaever and Corey Nislow at the University of Toronto.
The ICGC-TCGA DREAM Somatic Mutation Calling Challenge has highlighted the many difficulties in benchmarking and scoring structural variant detection, and has revealed areas for improving detection algorithms. The Challenge includes 206 structural variant analyses of three in silico tumors (created with BAMSurgeon). Different approaches to scoring these analyses for accuracy, and to defining ensembles of these analyses, will be presented. While different structural variant detection algorithms exhibit characteristic error profiles, errors generally tend to be associated with low mapping quality at the predicted breakpoints.
Dr. Chen has a background in machine learning, statistical signal processing, and cancer genomics. He has developed a set of computational tools such as BreakDancer, TIGRA, CREST, and VarScan that have been applied to characterize individual and population genomics in the Cancer Genome Atlas (TCGA) and the 1000 Genomes Project. He is particularly interested in constructing the genomes and the transcriptomes of various cancer cell populations towards understanding the heterogeneity and the evolution of cancer as a consequence of genetics and environment. He is also interested in developing integrative approaches to identify biomarkers that are useful for clinical decision support.
Structural variation (SV) is a major source of genomic variation and plays an important role in cancer genome evolution. However, due to the methodological limitations in aligning and interpreting short reads spanning breakpoints, current methods cannot achieve high sensitivity and precision. Here, we present a novel algorithm, novoBreak, which comprehensively characterizes a variety of structural breakpoints at base-pair resolution. novoBreak first chops tumor reads into k-mers and indexes them; then, by filtering against reference and normal reads, it derives tumor-specific k-mers (novo-kmers); next, it clusters reads with the same breakpoints based on the novo-kmers and then locally assembles the reads associated with each breakpoint into contigs. After aligning the contigs to the reference, novoBreak identifies the precise breakpoints and infers various types of SVs. novoBreak consistently performed best in the SV breakpoint calling subchallenges in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling challenge. The framework of novoBreak can also be applied to discover germline events, gene fusions in RNA-seq data and SV breakpoints in whole exome data. The wider application of novoBreak is expected to reveal a comprehensive structural landscape that can be linked to novel mechanistic signatures in cancer genomes.
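The first steps described above (chopping reads into k-mers, filtering against reference and normal reads, and grouping reads by shared tumor-specific k-mers) can be sketched as follows. This is a minimal illustration and not novoBreak's implementation: the function names, the `min_count` threshold and the toy sequences are assumptions, reverse complements and sequencing errors are ignored, and the local assembly and contig alignment steps are omitted.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences over a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def novo_kmers(tumor_reads, normal_reads, reference, k, min_count=2):
    """k-mers seen at least min_count times in tumor reads and absent
    from both the reference and the normal reads.

    min_count is a hypothetical guard against one-off sequencing errors.
    """
    background = set(kmers(reference, k))
    for read in normal_reads:
        background.update(kmers(read, k))
    tumor = count_kmers(tumor_reads, k)
    return {km for km, n in tumor.items() if n >= min_count and km not in background}


def reads_by_novo_kmer(tumor_reads, novo, k):
    """Map each novo k-mer to the indices of the tumor reads containing it."""
    groups = {}
    for idx, read in enumerate(tumor_reads):
        for km in kmers(read, k):
            if km in novo:
                groups.setdefault(km, set()).add(idx)
    return groups


# Toy example (made up): the tumor carries a deletion joining "...AACC" to "GGTT...".
reference = "AAAACCCCGGGGTTTT"
normal_reads = ["AAAACCCC", "GGGGTTTT"]
tumor_reads = ["AAACCGGTT", "AACCGGTTT", "CCCCGGGG"]

novo = novo_kmers(tumor_reads, normal_reads, reference, k=4)
groups = reads_by_novo_kmer(tumor_reads, novo, k=4)
print(sorted(novo))
print(groups)
```

In the toy data only the two reads that span the junction contribute k-mers absent from the reference and the normal reads, so they end up grouped together as candidates for a local assembly step.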