ISMB 2014 Workshops

WK01 - Junior Principal Investigators Meeting Sunday, July 13, 2014, 10:30 a.m. – 12:25 p.m.
Room: 310


Organizer(s):

Geoff Macintyre, University of Melbourne, Australia
Jeroen de Ridder, Delft University of Technology, Netherlands
Venkata Satagopam, University of Luxembourg, Luxembourg
Manuel Corpas, The Genome Analysis Centre, United Kingdom
Yana Bromberg, Rutgers University, United States

 

Presentation Overview:

 

Building on the success of the JPI meeting in 2013 (http://goo.gl/g7qbxj), the JPI workshop will bring together scientists who recently started or expect to start their own research group. It will serve the purpose of launching a community of junior PIs within the broader field of Computational Biology, providing the ideal platform for networking opportunities and the sharing of valuable experiences (http://goo.gl/kGr0Af).

Present your research

During the workshop you will get the chance to present your research in a 2-3 minute elevator pitch using a single presentation slide. This will facilitate networking opportunities and help connect you with potential collaborators. If you want to present, you must submit a presentation slide (see details below).
Submissions will be accepted until 5pm Saturday July 12.

Hear from some of the best

Predrag Radivojac, Chad Myers, Curtis Huttenhower and Burkhard Rost will provide their tips on how to successfully set up and lead your lab. They will form a panel discussion for participants to ask questions.

Discuss the challenges of being a Junior PI with your peers

There will be a series of round-table discussions at the JPI lunch (immediately following the workshop) at P.F. Chang’s China Bistro, 800 Boylston St, Boston. Each participant can have an individual bill for lunch and mains cost $10-20. During this session we aim to learn from each other by discussing topics that are relevant to being a Junior PI. Examples: Grant writing, hiring and supervising, setting up collaborations, leading a team, and the role of social networks/media in maximizing research impact. Use the submission form to suggest topics relevant to you. Register your interest in attending the lunch during slide submission (details below).

 

Part A: Introduction and elevator pitches (10:30 a.m.-10:55 a.m.)
Session Description:

Aided by a single slide, all delegates receive 2-3 minutes of speaking time to introduce themselves and their research. This will promote future networking opportunities and in-depth scientific discussion.

Submission guidelines

Submission website: https://www.easychair.org/conferences/?conf=jpiismb2014. Submissions will be accepted until 5pm Saturday July 12.

Please include the following information in the abstract box on the submission website:

  • a short bio about you and your research to share with participants (maximum 200 words)

  • any topic recommendations for the round table discussions

  • whether you will attend the JPI lunch following the workshop to continue round table discussions

 

Part B: Elevator pitches by participants (continued) (11:00 a.m.-11:25 a.m.)
Part C: Panel Discussion: how to run a successful lab. (11:30 a.m.-11:55 a.m.)
Speaker: Predrag Radivojac, Indiana University, United States
Speaker: Chad Myers, University of Minnesota, United States
Speaker: Curtis Huttenhower, Harvard School of Public Health, United States
Speaker: Burkhard Rost, Technical University Munich, Germany
Part D: Panel discussion (continued) and Round table discussions (12:00 p.m.-12:25 p.m.)
Session Description:

To facilitate discussions we will organize a moderated round table discussion in which each table will discuss a certain subject. The goal of the discussion is to jointly identify a set of lessons or insights that will then be shared with the rest of the delegates during the plenary feedback session.
We will define a selected list of topics and assign delegates to the subjects according to their interests.
Each group will discuss the subject at hand and share their experiences. Each group has a discussion leader and someone taking minutes. Potential subjects are:
● Grant writing
● Hiring and supervising
● Setting up collaborations
● Leading a team
● The role of social networks/media in maximizing research impact
Suggestions for additional topics will be solicited prior to the workshop. The round table discussions will continue into lunch. Each participant can have an individual bill for lunch, and mains cost $10-20.



 

WK02 - Bioinformatics Core Facilities Sunday, July 13, 2014, 3:05 p.m. – 5:00 p.m.
Room: 310


Organizer(s):

David Sexton, Novartis Institutes for Biomedical Research, United States
Brent Richter, Partners Healthcare, United States
Simon Andrews, Babraham Institute, United Kingdom
Matthew Eldridge, Cancer Research UK Cambridge Institute, United Kingdom
Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research, Switzerland

 

Presentation Overview:

 

The Bioinfo-core group is an international organization that brings together computational biologists working in and running bioinformatics core facilities.

The group was formed in 2002 and draws its membership from over 200 facilities in 23 countries. The group presented workshops at the last five ISMB conferences and met previously in BoF sessions.

Bioinformatics core facilities undertake work encompassing the full breadth of modern computational biology. The ISMB conference has proved to be the ideal opportunity for core facilities to acquaint themselves with the state of the art in a wide range of topics at a single meeting. However, the environment in which core facilities operate means that they view the field from a perspective which requires them to weigh the scientific merits of analyses against factors such as ease of use, ease of interpretation, ability to scale, and robustness. Their exposure to varied datasets from many different sources allows them to critically evaluate software and methods from a viewpoint not normally available to research groups. They also have specific interests relating to the running of large multi-user facilities that the workshop could address. Although these topics might be covered in other ISMB sessions, our discussions offer a different perspective, being based around the specific requirements and experiences of core facilities. We are building on a format that has worked very well over the last several years and that received high reviews. The proposed themes and talks are topical and different from those in previous years.

 

Part A: “Core on a Budget” vs “Enterprise-Level Bioinformatics” (3:05 p.m.-3:30 p.m.)
Speaker: Alastair Kerr, Wellcome Trust Centre for Cell Biology, United Kingdom
Speaker: Michael Poidinger, Singapore Immunology Network, Singapore
Session Description:

Is there a strong case for using commercial software in a bioinformatics core or are free open source alternatives not only viable but preferable? Here two differing bioinformatics core facilities will make a case for their choices in both hardware and software infrastructure. The  discussion will focus on practicality, scientific validity, ideology and reproducibility, but will also cover constraints of budget and time as well as the dynamics between users and core facility staff.

Alastair Kerr, running a core on a budget, maintains his own servers and has made the decision to empower users to do at least a basic amount of analysis themselves. He is an advocate for free open source software and data, finding that many such solutions are preferable to commercial alternatives. In this talk he will detail such software that has been used for interfacing with users as well as some areas where commercial or semi-commercial products are still used. Like most core facilities, he has had to deal with the influx of large quantities of deep-sequencing data and will describe the associated budget-focused choices in hardware that were made.

Michael Poidinger has previous experience in pharma Discovery IT and currently runs a core group for an academic/governmental translational immunology research institute.  The group is very well funded by central funds with little need for cost recovery and focusses more on a collaborative rather than service model. At the same time his team of 8 handles 200+ people in 28 groups together with numerous external partnerships, so throughput and efficiency are also of concern.  The group is founded on two commercial products, Pipeline Pilot from Accelrys and Spotfire from Tibco/Perkin Elmer.  In this talk he will describe how the software first seen in the pharma environment has been brought over to an academic one, and the improvements in productivity that these tools provide.

Part B: Open Forum Discussion (3:35 p.m.-4:00 p.m.)
Speaker: David Sexton, Novartis Institutes for Biomedical Research, United States
Speaker: Matthew Eldridge, Cancer Research UK Cambridge Institute, United Kingdom
Session Description:

The discussion will pick up on the themes introduced by the speakers. We anticipate that participants will represent a diverse cross-section of bioinformatics groups with varying levels of resources, both in terms of staffing and operating budgets. Some groups will have limited capacity to implement comprehensive bioinformatics infrastructure and are perhaps more likely to deploy open source or freely available software and analysis workflows developed by the wider academic community. These groups may have developed innovative approaches to overcome the constraints they face. Conversely, organizations that have been able to invest in enterprise-level offerings from bioinformatics software vendors will be able to provide an insight into how well these solutions work in practice, which should be of interest to those making “buy versus build” decisions. In between these two extremes will be groups that have sufficient personnel to develop in-house solutions but where there may be significant duplication of effort to achieve the same end goals. For example, many groups have developed their own workflow engines, sequencing LIMS or systems for tracking analysis projects, but did the specific requirements of their institute or organization really justify this? In the open forum discussion we will explore the choices made by different groups.

Part C: Breaking topics in core facilities (4:05 p.m.-4:30 p.m.)
Speaker: Simon Andrews, Babraham Institute, United Kingdom
Speaker: Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research, Switzerland
Session Description:

This part of the workshop will be a structured discussion which will cover several different topics spanning the technical, scientific and managerial aspects of core facilities.  In contrast to the previous session there will be no named invited speakers and the entire session will be based around moderated group discussions.

The rationale for this session was constructed around observations made at previous workshops, namely that:

  1. The sessions for a workshop are planned several months in advance of the conference, and it is often the case that new technologies arise in the intervening time which would merit discussion but are then difficult to include.

  2. Managers of core facilities often have short topics on which they would like to solicit the opinion of the bioinfo-core community, but the previous structure of the workshops has not allowed for this.

  3. The discussions in workshops are often led by a more vocal subgroup of the whole community and we wanted to find ways to allow more people to become involved.

The aim of the session, therefore, is to end up with a set of community-submitted, late-breaking topics for discussion, which will be presented by the moderators in the workshop. Each topic will be the subject of a short, focussed discussion (5-10 minutes) so that we can cover a range of different areas of core facility management and address the interests of as many people as possible. Within the constrained time we may not manage complete coverage of an individual topic, but those which promote the most discussion could be taken forward for online coverage or be included in future conference calls or workshops as more major items.

The practical way in which we envisage this session to work would be that several weeks before the meeting we would put out a request to the bioinfo-core mailing list asking for submissions of short topics which they would find interesting. The topics could be a simple question, a suggestion of an area for discussion or a specific observation which they thought was interesting. Topics can be supplemented with a single explanatory powerpoint slide if needed. All submissions will be presented anonymously unless people specifically say that they are happy to have their name attached.

Around a fortnight before the meeting the workshop organisers will get together to go through the submissions. They will put them into rough groupings based on similar areas of interest and will deduplicate and rationalise them to tie together similar themes so that discussion can flow more naturally from one topic to another. Finally, they will put the deduplicated topics in order of popularity. The finalised topics for the session can then be advertised.

At the meeting, two moderators will be responsible for managing the session. They will work their way through the questions, putting them out to group discussion and deciding when enough time has been spent on a topic to move on to the next one. We would aim for each topic to get around 5 minutes of discussion.

After the session, the whole set of discussion topics and associated notes from the discussion will be put up on our bioinfo-core wiki so that any topics which were not sufficiently well covered in the meeting can be continued online.

Part D: (4:35 p.m.-5:00 p.m.)

 

WK03 - Workshop on Education in Bioinformatics (WEB 2014) : The Online World of Bioinformatics Education Monday, July 14, 2014, 10:30 a.m. – 12:25 p.m.
Room: 310


Organizer(s):

Bio:

Michelle Brazas, PhD is the Coordinator of the Canadian Bioinformatics Workshops (bioinformatics.ca) and Manager of Bioinformatics Education at the Ontario Institute for Cancer Research. She also teaches bioinformatics in Nigeria as part of her outreach programs. She also holds a position on the Global Organisation for Bioinformatics Learning, Education & Training executive.

Bio:

Patricia Palagi, PhD is the Coordinator of the SIB Education Activities and coordinator of the Bioinformatics track of the Master’s in Biology common to the Universities of Geneva and Lausanne. She teaches bioinformatics for proteomics and programming languages, and organises a variety of bioinformatics training, workshops and events for a wide audience. She also holds a position on the Global Organisation for Bioinformatics Learning, Education & Training operational board.

Bio:

Vicky Schneider, PhD is part of the Senior Management Team at TGAC, where she is Head of the Training and Outreach Team. She also facilitates and delivers actual training for a variety of audiences on bioinformatics and related subjects. She also holds a position on the Global Organisation for Bioinformatics Learning, Education & Training executive.

Bio:

Fran Lewitter, Ph.D. is the Director of Bioinformatics and Research Computing at Whitehead Institute for Biomedical Research. Her group develops materials and provides training to biologists in the Institute. In addition, Fran is a member of the ISCB board of directors and the chair of the Education committee. She also holds a position on the Global Organisation for Bioinformatics Learning, Education & Training executive.

 

Presentation Overview:

 

Relative to traditional classroom based learning environments, online teaching and learning, especially in bioinformatics, presents its own opportunities and challenges. While teaching through the web has the potential to reach a much larger audience, is this style of teaching of benefit to the learner? What are the benefits and differences between self-directed learning using web-based materials and being taught through large online classrooms as in MOOCs? Do bioinformatics games offer any teaching capacity?

Online learning in bioinformatics represents an exciting new opportunity for bioinformatics training programs. Since it takes considerable effort to put together online teaching material, understanding if and when a shift in bioinformatics training resources is warranted is important. Through a series of presentations and discussions, this workshop aims to bring awareness to the challenges of offering online bioinformatics training and to provide a platform for sharing online training strategies as experienced by existing online learning programs in bioinformatics.

This workshop will consist of presentations on the topics of 1) massive open online courses in bioinformatics; 2) online gaming in bioinformatics; 3) self-directed learning in bioinformatics; and conclude with a panel debate on 4) the merits and pitfalls of online learning in bioinformatics.

 

Part A: Details on running a MOOC in Bioinformatics and Biostatistics (10:30 a.m.-10:55 a.m.)
Bio:

Dr. Love is the Teaching Fellow for the Data Analysis for Genomics course offered through HarvardX, which is taught by Dr. Rafael Irizarry. Dr. Love, in the group of Dr. Irizarry, develops statistical software for the free, open source bioinformatics platform Bioconductor, which was used extensively during the Data Analysis for Genomics course.

Session Description:

Dr. Love will speak on how to produce content for an open online course, the mechanics of running a course, how such courses can (or cannot) be used to teach bioinformatics, and what resources are available for someone interested in creating an online course.

Part B: Games that teach, games that learn: an introduction to serious games for bioinformatics (11:00 a.m.-11:25 a.m.)
Bio:

Dr. Good’s research focuses on games with a purpose in crowdsourcing biological knowledge. To this end, his research has contributed to the development of a suite of interactive video games that allow for play, learning and contributions to science: genegames.org.

Session Description:

Dr. Good will speak on online games for bioinformatics and how they can (or cannot) be used to teach bioinformatics.

Part C: The MOOC Universe from the Learners' Perspective (11:30 a.m.-11:55 a.m.)
Bio:

Dr. Searls is an independent consultant in bioinformatics and former Senior VP of Bioinformatics at GlaxoSmithKline Pharmaceuticals. He recently authored “Ten Simple Rules for Online Learning” and “An Online Bioinformatics Curriculum”.

Session Description:

This talk will review the state of the MOOC universe as experienced by those seeking to study computational biology and related fields. It will include assessments of the current coverage of key topics, the quality and consistency of course offerings, and specific opportunities for improvement in both. Particular attention will be paid to the strengths and weaknesses of the online format with regard to achieving proficiency at various levels in bioinformatics and computational biology.

Part D: Panel Discussion: The Merits and Pitfalls of Online Learning for Bioinformatics (12:00 p.m.-12:25 p.m.)
Speaker: Michael Love, Harvard School of Public Health, United States
Speaker: Benjamin Good, The Scripps Research Institute, United States
Speaker: David Searls, University of Pennsylvania, United States
Speaker: Nicholas Provart, University of Toronto, Canada
Session Description:

The panel will include the 3 speakers plus a remote speaker (to test online learning systems!), and will debate the merits and pitfalls of online learning for bioinformatics. Do online learning mechanisms adequately teach bioinformatics, or must online learning be supplemented with other modes of learning? Where does online learning in bioinformatics need to go?


 

WK04 - Harnessing Genetic Diversity to Enable Personalized Medicine Monday, July 14, 2014, 2:10 p.m. – 4:05 p.m.
Room: Ballroom C


Organizer(s):

Frank Emmert-Streib, Queen's University Belfast, United Kingdom
Sol Efroni, Bar Ilan University, Israel
Christos Hatzis, Yale University, United States

 

Presentation Overview:

 

The post-human genome era has spurred efforts to generate data on all molecular and cellular levels at an unprecedented depth and breadth across all major cancers. Such efforts have opened enormous new possibilities for medical research to help uncover the molecular causes of complex diseases with the help of high-throughput data. On the downside, analysis and interpretation of the available data, e.g., from NGS technologies, is not straightforward and requires the development of dedicated analysis methods and visualization tools. Yet an even greater challenge is to integrate the multiple layers of available data and synthesize the composite picture of each individual tumor that could inform a more personalized and precise disease management. The goal of this workshop is to provide a forum to disseminate the newest developments in systems biomedicine.


For more information about the workshop please visit our web page.

 

Part A: (2:10 p.m.-2:35 p.m.)
Speaker: Gad Getz, Broad Institute, United States
Part B: (2:40 p.m.-3:05 p.m.)
Speaker: Manolis Kellis, MIT, United States
Part C: (3:30 p.m.-4:05 p.m.)
Speaker: Atul Butte, Stanford School of Medicine, United States

 

WK05 - Trends in genomic data analysis with R/Bioconductor Tuesday, July 15, 2014, 10:30 a.m. – 12:25 p.m.
Room: 310


Organizer(s):

Bio:

Levi Waldron is an assistant professor of Biostatistics at the City University of New York, with interests in prediction modeling and meta-analysis for genomic data.  He is a recent addition to the Bioconductor Technical Advisory Board, and is committed to the development and promotion of the Bioconductor project.

 

Presentation Overview:

 

The Bioconductor project is a leading development and analysis environment for bioinformatics, supported by a core of dedicated programmers and a broad contributing scientific community. The project is evolving rapidly along with sequencing technologies and the quantity of available genome annotation, and this workshop provides ISMB attendees with the inside track on the most recent and upcoming trends in Bioconductor. The workshop will begin with a high-level tour of leading Bioconductor packages and capabilities across a wide variety of disciplines, then will cover current advances for 1) accessing genomic annotation data such as ENCODE and the UCSC genome browser through the AnnotationHub architecture, 2) data and algorithm element designs for integrative analysis of large genomic data and annotation that permit scalable resource utilization at run-time, and 3) analysis of RNA-seq data. The workshop features the project leader Martin Morgan, co-founder and Core member Vince Carey, Advisory Board member Levi Waldron, and post-doctoral fellow Michael Love (Rafael Irizarry lab). This workshop is intended for a wide audience and will be valuable for beginner to experienced analysts of genomic data.

 

Part A: An overview of genomic data analysis in Bioconductor (10:30 a.m.-10:55 a.m.)
Bio:

Levi Waldron is an assistant professor of Biostatistics at the City University of New York, with interests in prediction modeling and meta-analysis for genomic data.  Levi is a recent addition to the Bioconductor Technical Advisory Board, and is committed to the development and promotion of the Bioconductor project.

Session Description:

Bioconductor is a collection of more than 700 individually code-reviewed software packages, hundreds more annotation and experiment data packages, and specialized data structures for various domains. This talk will provide users with an up-to-date, high-level overview of Bioconductor’s offerings and recent developments in the domains of gene expression, DNA variant calling, flow cytometry, proteomics and metabolomics. It will also review options available to users wishing to analyze public data from sources such as The Cancer Genome Atlas, the Gene Expression Omnibus, and ArrayExpress, and data distributed by Bioconductor itself.

Part B: Genomic data and annotation through AnnotationHub (11:00 a.m.-11:25 a.m.)
Bio:

Dr. Martin Morgan trained as an evolutionary geneticist, and has been working with R / Bioconductor for the last 10 years. Dr. Morgan, based at the Fred Hutchinson Cancer Research Center, is current leader of the Bioconductor software project for the analysis and comprehension of high-throughput genomic data. Diverse topics addressed by Dr. Morgan's group include the design and analysis of sequence and microarray experiments, quality metrics of sequence-related data, efficient representation and manipulation of sequence data, and forward-looking approaches to representation, analysis, and use of whole-genome sequences.

Session Description:

Bioconductor provides a wealth of genomic annotation data as pre-compiled packages: gene-centric, transcript-centric, functional, and technology-specific. However, as the size and richness of genomic annotations increase, such as those generated by the ENCODE project, a more flexible and scalable approach is needed. AnnotationHub is a database-driven, client-server architecture that meets these needs and is more convenient for creators and users of genomic annotation. Annotations are added through simple “recipes” linked to versioned data and metadata; users can retrieve snapshots from any point in the history of a particular data resource, or search for data resources based on any of the associated metadata. A beta version already provides more than 10,000 ENCODE tracks as prepared Bioconductor objects, and we expect AnnotationHub to overtake the current package-based model of providing annotation and experimental data in Bioconductor over the next several years. This presentation will introduce the architecture and usage of AnnotationHub, for both users and contributors.

Part C: Scalable integrative bioinformatics with Bioconductor (11:30 a.m.-11:55 a.m.)
Bio:

Vincent Carey is Associate Professor of Medicine (Biostatistics) at the Channing Division of Medicine, Brigham and Women's Hospital, Harvard Medical School.  He is co-founder and core member of Bioconductor, and has given invited courses on statistical analysis of genome-scale data on four continents.

Session Description:

We discuss how to decompose data and algorithm elements so that available hardware configurations, be they modest or vast, can be used to maximize throughput of bioinformatic analyses. We prefer algorithm and data structure designs that do not require developers to make strict advance commitments regarding memory requirements and degree of parallelism achievable. Instead, we want something close to run-time determination of suitable approaches to resource utilization. Ultimately such determinations should be autonomous; at present, user-driven selection of subproblem size and compute resource size and structure is needed, but the software and the software interface should allow great flexibility in mapping the solution process to the resources according to these selections. We will discuss a number of use cases, including the evaluation of different procedures for inference on eQTL. The required computations are intensive and involve substantial volumes of experimental and annotation data. An Amazon EC2 machine instance (AMI) has been created with all necessary software and data to illustrate the entire procedure, and we describe how to use the open StarCluster configuration tools to create and use clusters based on this AMI to solve various genomic discovery and interpretation problems with Bioconductor.
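
As an illustration only, the idea of choosing subproblem size and compute resource size at run time can be sketched in a few lines of Python. This is not Bioconductor code and not the speakers' implementation; the helper names `chunked`, `run_in_chunks` and `square_all`, and the parameters `chunk_size` and `n_workers`, are hypothetical. The sketch assumes the work can be split into independent chunks whose results are simply concatenated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of at most chunk_size items (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def run_in_chunks(
    items: Sequence[float],
    work: Callable[[Sequence[float]], List[float]],
    chunk_size: int,
    n_workers: int,
) -> List[float]:
    """Apply `work` to each chunk and concatenate the results in input order.

    chunk_size (subproblem size) and n_workers (compute resource size) are
    chosen by the caller at run time; the algorithm itself commits to neither.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        partial_results = pool.map(work, chunked(items, chunk_size))
        return [value for part in partial_results for value in part]


def square_all(chunk: Sequence[float]) -> List[float]:
    """Stand-in for an expensive per-chunk computation (assumption)."""
    return [x * x for x in chunk]


if __name__ == "__main__":
    data = list(range(10))
    # Same analysis, two different resource mappings selected by the user.
    small = run_in_chunks(data, square_all, chunk_size=3, n_workers=2)
    large = run_in_chunks(data, square_all, chunk_size=10, n_workers=1)
    print(small == large)
```

The same call works on a laptop (one worker, large chunks) or a larger machine (many workers, small chunks), which is the kind of flexibility the abstract asks of the software interface.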

Part D: RNA-Seq workflows in Bioconductor (12:00 p.m.-12:25 p.m.)
Bio:

Michael Love is a postdoctoral fellow in the Department of Biostatistics at the Dana-Farber Cancer Institute and Harvard School of Public Health, in the group of Dr. Rafael Irizarry. Dr. Love's research focuses on statistical inference of biological signal from high-throughput sequencing read counts, working currently on the DESeq2 method for RNA-Seq and on sparse representations of coverage from genomic assays.

Session Description:

Bioconductor contains a number of software and annotation packages which simplify an RNA-Seq analysis workflow. This session will cover a standard pipeline: obtaining gene models and counting RNA-Seq reads in genes and exons; performing quality assessment and exploratory data analysis using transformed counts; performing differential expression analysis at the gene or exon level; and finally annotating results and facilitating downstream analysis. Additionally, new DESeq2 functionality will be demonstrated which allows for moderated effect size estimates for an individual treatment relative to a number of other treatments.


 

WK06 - What Bioinformaticians need to know about digital publishing beyond the PDF2 Tuesday, July 15, 2014, 2:00 p.m. – 3:55 p.m.
Room: Ballroom C


Organizer(s):

Bio:

After training in Biochemistry at Imperial College and a PhD on the Molecular Pathology of Ocular Melanoma at the Royal London Hospital, his research mainly focused on cancer cell and molecular biology. After postdocs at the WHO International Agency for Research in Cancer in Lyon and Queen Mary University of London, he was senior scientific editor for the BMC Genomics and Bioinformatics journals at BMC before moving in 2010 to Shenzhen/Hong Kong to set up the GigaScience journal, database and integrated data analysis platform for the BGI.

Bio:

Jun Zhao is a Lecturer in the School of Computing and Communications, InfoLab21, at Lancaster University. She is interested in applying Semantic Web technologies for publishing research data (including images) to the Web. She is an active member in the Linked Data and Research Object research communities.

Bio:

Dr. Marco Roos is Research Scientist in Human Genetics at Leiden University Medical Center, and co-director of the Biosemantics Group. Marco also holds a position at the Informatics Institute of the University of Amsterdam, and leads the Interoperability Task Force of the Netherlands Bioinformatics Centre (NBIC). His research interests are e-Science (Semantic Web and workflows) and the application of knowledge discovery techniques for elucidating molecular mechanisms in the cell, in particular the role of epigenetics.

Bio:

Barend Mons is professor in Biosemantics at the Department of Human Genetics at the Leiden University Medical Centre, and Scientific Director of the Netherlands Bioinformatics Centre (NBIC). He is the initiator of ConceptWiki.org, an inventor of Knowlet technology for knowledge discovery, and a driving force behind the Concept Web Alliance. He also co-founded the companies Collexis and Knewco. Mons published over 70 peer reviewed articles and holds three patents in semantic technology.

Bio:

Carole Goble is a full professor in Computer Science at the University of Manchester, UK. She has an international reputation in Semantic Web, Distributed computing, and Social Computing for scientific collaboration. She is a member of the Software Sustainability Institute UK. She directs the myGrid project, known for Taverna, myExperiment, Biocatalogue, and SEEK. In 2008 Carole was awarded the inaugural Microsoft Jim Gray award for outstanding contributions to e-Science. In 2010 she was elected a Fellow of the Royal Academy of Engineering. In 2012 she was nominated for the Benjamin Franklin award for open science in Biology.

Bio:

Amye Kenall is BioMed Central’s Journal Development Manager of Open Data initiatives and journals. She works closely with repositories and open source research tools in order to better the link between data and publication. She is responsible for spearheading BioMed Central open data initiatives and policy, developing the open data portfolio, and acting as the recognised point of contact for all open data initiatives internally and externally.

Bio:

I am Associate Director at the University of Oxford e-Research Centre and I also work at Nature Publishing Group as data consultant and Honorary Academic Editor for Scientific Data, an open access data publication platform.

 

Presentation Overview:

 

Data, software and written communication in the life sciences are rapidly increasing in volume, frequency, and sophistication. There is presently an explosion of alternative publication formats explicitly created for digital environments, and online resources are rapidly becoming essential tools for the life sciences, blurring the line between data production and communication. Furthermore, innovative platforms for crowd-sourced knowledge discovery are emerging, where a vast landscape of findings and hypotheses can be made public in real-time data streams such as Twitter, blogs, and wikis.

Current peer review processes are increasingly questioned, with a number of studies pointing out flaws in the system, a growing replication gap, a lack of incentives, and technical challenges in reviewing code and data. The default peer review infrastructure as it stands today does not allow for quick validation of studies through review of the data and analysis tools behind them. This has resulted in a number of journals and initiatives starting to implement different approaches to peer review and testing of software, including the Mozilla Science Lab's "Code as a Research Object" project.

Following last year's popular workshop, the aim is to inform ISMB participants of rapid changes and new opportunities in scientific communication.

 

Part A: Alternative models of Peer Review (2:00 p.m.-2:25 p.m.)
Bio:

Journal Development Manager at BioMed Central, and editor of Biome Magazine.

Session Description:

Academics often talk of the inefficiencies in current peer-review systems, and various reforms have been suggested in past years. Recently a number of journals and initiatives have looked at new approaches to peer review to address these concerns: journal policies that reduce the number of review iterations to speed up publication, open peer review to increase transparency of the process, post-publication peer review with new ways to surface quality research, and de-coupled review services that move responsibility for peer review from the publishing journal to the research community.

This presentation will explore some of these new approaches, as well as established variations on the traditional model such as Biology Direct (a journal edited by Eugene Koonin, David Lipman and Laura Landweber that, since its launch in 2006, has aimed to return the responsibility in peer review to the authors and reviewers by allowing authors to choose reviewers and by including named reviewer reports in the published manuscript as a guide to the paper). Transitioning into our next section on reproducibility, we ask whether peer review of software and data is needed and explore whether cloud computing might provide the infrastructure for a more painless peer review of software and data.

Part B: Recent insights from experiments in reproducibility: What's the plan again? (2:30 p.m.-2:55 p.m.)
Bio:

Philippe Rocca-Serra, after an engineering degree from the University of Rennes, received his PhD in Molecular Genetics from the University of Bordeaux. He worked at EMBL-EBI, helping to establish the European microarray archive. He has 10 years of practice in data management and has been an active member of several standardization efforts aimed at promoting open data and an open science vision. He is technical coordinator of the ISA project, a member of the OBO Foundry editorial board, and participates in resource development as part of the OBI project.

Bio:

Alejandra is a Sr. Software Engineer in the Oxford e-Research Centre (OeRC), University of Oxford, UK, working in the Data Sharing Infrastructure team. Her main projects are the Investigation/Study/Assay (ISA) infrastructure and the BioSharing catalogue. Before joining OeRC, Alejandra was Senior Research Associate in Computational and Systems Medicine at University College London (UCL). At UCL, she was previously at the Department of Computer Science, working on knowledge management for cancer research data. She was awarded a PhD in Computer Science from Queen's University Belfast, UK, and a Licentiateship from Universidad Nacional de Rosario, Argentina.

Session Description:

Good data stewardship starts with good plans, including a good description of the experimental plan. In spite of the big data buzz, for most scientists sound experimental setup is at the core of their work, with a good level of replication giving strength to evidence. Yet very few make the most of experimental design information for managing, reporting, reviewing and publishing scientific data. In this session, we would like to explore how simple principles derived from experimental design practice can greatly enhance data generation, data quality, data reporting, data review and data publication. We will also draw on experience gained from an experiment in reproducibility to touch on issues associated with reporting findings.

Part C: Trends in data publishing (3:00 p.m.-3:25 p.m.)
Bio:

Barend Mons is professor in Biosemantics at the Department of Human Genetics at the Leiden University Medical Centre, and Scientific Director of the Netherlands Bioinformatics Centre (NBIC). He is the initiator of ConceptWiki.org, an inventor of Knowlet technology for knowledge discovery, and a driving force behind the Concept Web Alliance. He also co-founded the companies Collexis and Knewco. Mons has published over 70 peer-reviewed articles and holds three patents in semantic technology.

Session Description:

In the big data era, our funding bodies are starting to require a proper data stewardship plan for any project that produces data with public funding. What does this mean for the bioinformatics community? What does this mean for getting credit for the valuable work that we do to produce, preprocess and interpret data? What tools will we have available to increase the value of data, especially once data and method sharing become a must? In this session Barend Mons addresses the issues raised by new trends in data publishing, the data models and tools that support the bioinformatician, and the opportunities these present for biological research.

Part D: Software Review Panel: Q&A and audience participation (3:30 p.m.-3:55 p.m.)
Bio:

I head up the Mozilla Science Lab, an initiative dedicated to using the power of the web to change the way we do science. Strong believer in the promise of open science, interoperability, and good whisky (not necessarily in that order). I also chair big data events in London, advise the UK government on digital infrastructure, and bring data scientists and NGOs together to hack for social good at DataKindUK.

Bio:

Carole Goble is a full professor in Computer Science at the University of Manchester, UK. She has an international reputation in Semantic Web, Distributed computing, and Social Computing for scientific collaboration. She is a member of the Software Sustainability Institute UK. She directs the myGrid project, known for Taverna, myExperiment, Biocatalogue, and SEEK. In 2008 Carole was awarded the inaugural Microsoft Jim Gray award for outstanding contributions to e-Science. In 2010 she was elected a Fellow of the Royal Academy of Engineering. In 2012 she was nominated for the Benjamin Franklin award for open science in Biology.

Bio:

Philip E. Bourne PhD is the Associate Director for Data Science (ADDS) at the National Institutes of Health. Formerly he was Associate Vice Chancellor for Innovation and Industry Alliances, a Professor in the Department of Pharmacology and Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California San Diego, Associate Director of the RCSB Protein Data Bank and an Adjunct Professor at the Sanford Burnham Institute.

Session Description:

This final session will bring together a panel of representatives from research and community initiatives, including Kaitlin Thaney (as panel host), Carole Goble, and Phil Bourne, to present a range of new approaches to peer review and publishing of software. The panel will explore questions around software publishing and reuse, giving the audience the opportunity to participate. Questions to be explored include but are not limited to:

  • The practice of assigning identifiers to code so that it can be integrated into scholarly publications is becoming more of a reality, but what level of responsibility do reviewers now have for ensuring that code is sound?
  • What information is needed for the code to be picked up, forked and run by someone outside the authors' lab?
  • What are the minimal metadata fields needed? The infrastructure (documentation, APIs, etc.)?
  • How can we best surface the information researchers need straight away to understand, use and build on the code made available?
  • In terms of code and best practice for reuse, what can we start doing right now? What do we need to aim for in the future?