Late Breaking Research Presentation Schedule

LBR01                                                                                Sunday, July 11: 10:45 a.m. - 11:10 a.m.
Down-regulation by microRNAs depends on target mRNA abundance
Room: 305
Presenting author: Aaron Arvey, MSKCC, United States

Additional authors:
Aaron Arvey, MSKCC, United States
Christina Leslie, MSKCC, United States
Debora Marks, Harvard Medical School, United States

Abstract:
Post-transcriptional regulation by microRNAs and siRNAs depends on systems-level properties as well as on characteristics of individual binding sites in target mRNA molecules. Simple chemical kinetics predicts that the level of microRNA regulation will depend upon the concentration of mRNA transcripts with target sites in the cell; that is, target abundance acts as a rate-limiting step in degrading target transcripts. To test this, we analyze 143 microRNA and siRNA transfection experiments and show that down-regulation by miRNAs and siRNAs depends on total target mRNA abundance. Comparing pairs of miRNAs with high and low target abundance, we show that similar sites can result in very different amounts of regulation as a result of differential target abundance. Our conclusion is that more global properties, such as mRNA target abundance, need to be considered in addition to local determinants. Furthermore, the paradigm of microRNA and siRNA targeting should shift away from the simple discretization of 'target' or 'not a target' and towards a more quantitative framework. This has consequences for microRNA target prediction, siRNA design and small RNA therapeutics.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17033.pdf

ISMB 2010 Blog

LBR02                                                                                Sunday, July 11: 11:15 a.m. - 11:40 a.m.
Decipher the protein cofactors for small RNA function by comprehensive phylogenetic analysis, protein interactions, expression data, high throughput screens and other data sets
Room: 305
Presenting author: Yuval Tabach, Harvard/MGH, United States

Abstract:
Small RNAs are short (~21-30nt) single-stranded RNA molecules which regulate chromatin structure, chromosome segregation, transcription, RNA processing, RNA stability, and translation. Systematically identifying the protein components of the small RNA world and understanding how they integrate into other molecular pathways represents a major challenge in cell biology. In an effort to more comprehensively identify genes that contribute to small RNA pathways, we experimentally and computationally collected different genome-scale data sets including: protein-protein interaction, phylogenetic profiles, expression databases, RNAi phenotypes, genetic interactions, interolog interactions, mass spectrometry data and RNAi screens to detect genes involved in microRNA and siRNA. Alone, each dataset has limitations and biases that result in limited sensitivity and specificity.
Combining these different resources provides clues to gene function and increases our confidence in the prediction of new tiny RNA genes while reducing the false prediction rate. We integrated the data to generate a list of genes that are highly implicated as functioning in small RNA pathways. The results of this integrative approach show high sensitivity and specificity and suggest, from many noisy datasets, novel genes that are highly likely to be part of the tiny RNA pathways.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17006.pdf

LBR03                                                                                Sunday, July 11: 11:45 a.m. - 12:10 p.m.
mRNA half-life affects susceptibility to RNAi
Room: 305
Presenting author: Erik Larsson, Memorial Sloan-Kettering Cancer Center, United States

Abstract:
The RNAi pathway participates in basic cellular processes and has enabled the development of si/shRNAs (small RNAs) as powerful investigational tools and potential therapeutics. We hypothesized that the turnover rates of mRNAs influence their susceptibility to small RNA perturbation. This is based on the idea that transcripts with short half-lives are already under strong destabilizing post-transcriptional regulation and should therefore be less vulnerable to the addition of novel negative regulatory factors. By reanalysis of mRNA stability data and small RNA overexpression experiments, we show that short-lived transcripts are consistently difficult to silence using siRNAs and less affected by microRNA regulation and siRNA off-target effects. mRNA half-life is therefore a target-inherent factor that may be considered when predicting microRNA and siRNA on-target/off-target effects.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16971.pdf

LBR04                                                                                Sunday, July 11: 12:15 p.m. - 12:40 p.m.
Marginal Cost of Discovery: An Economic Alternative to False Discovery Rate
Room: 305
Presenting author: S. Joshua Swamidass, Washington University, United States

Additional authors:
S. Joshua Swamidass, Washington University, United States
Joshua Bittker, The Broad Institute of MIT/Harvard, United States
Nicole Bodycombe, The Broad Institute of MIT/Harvard, United States
Sean Ryder, University of Massachusetts, United States
Paul Clemons, The Broad Institute of MIT/Harvard, United States

Abstract:
How many initial positives ('hits') from a high-throughput screen should be sent for confirmatory experiments? Analytical answers to this question are derived from statistics alone and aim to fix, for example, the false-discovery rate (FDR) at a predetermined tolerance. In contrast, we argue that this question is essentially economic, not statistical, and is amenable to an economic analysis that admits an optimal solution. This solution, in turn, suggests a novel tool for deciding the number of hits to confirm, the marginal cost of discovery (MCD), which meaningfully quantifies the local economic trade-off between true and false positives, yielding an economically optimal experimental strategy. The MCD and this economic protocol were validated with retrospective simulations and a prospective experiment using a screen for small molecule inhibitors of a DNA-binding protein. A prospective supply curve could be accurately constructed, and its height (the MCD) enabled the correct identification of 157 additional actives which had been erroneously labeled inactive because the screeners had stopped confirming actives too soon. Although the introduction of this method is couched in the context of chemical screening, the framework is general enough to be applied in any context where FDR is commonly used. Furthermore, new experiments will be presented which show how this framework can be used to optimize the yield of more complex end-points from these experiments (for example, to discover more novel scaffolds with confirmed activity), a task beyond the scope of FDR.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16980.pdf
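
One way to read the marginal cost of discovery described in the abstract above is as the expected cost of confirming one more true active at a given point in a ranked hit list. The sketch below is only an illustration under assumptions that are not stated in the abstract: a fixed cost per confirmatory experiment, a predicted probability that each ranked hit is active, and a stopping rule that continues while the MCD stays at or below the value assigned to one confirmed active. All function names and numbers are hypothetical.

```python
def marginal_cost_of_discovery(cost_per_test, p_active):
    """Expected cost of one more confirmed active when the next hit
    confirms with probability p_active (assumed definition)."""
    if p_active <= 0:
        return float("inf")
    return cost_per_test / p_active


def hits_to_confirm(p_active_ranked, cost_per_test, value_per_active):
    """Count how many hits to send for confirmation.

    Walks down a list ranked from most to least likely active and stops
    at the first hit whose MCD exceeds the value of a confirmed active
    (assumed stopping rule).
    """
    n = 0
    for p in p_active_ranked:
        if marginal_cost_of_discovery(cost_per_test, p) > value_per_active:
            break
        n += 1
    return n


# Hypothetical screen: four ranked hits, 10 cost units per confirmatory test,
# and a confirmed active valued at 60 cost units.
print(hits_to_confirm([0.9, 0.5, 0.2, 0.05], 10.0, 60.0))
```

Under this reading, a fixed FDR cutoff and the economic cutoff can disagree: the stopping point moves whenever the cost per test or the value of an active changes, even if the ranked probabilities stay the same.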

LBR05                                                                                Sunday, July 11: 2:30 p.m. - 2:55 p.m.
Systematic discovery of novel motifs which modulate microRNA regulation
Room: 305
Presenting author: Anders Jacobsen, Memorial Sloan Kettering Cancer Center, United States

Additional authors:
Anders Jacobsen, Memorial Sloan Kettering Cancer Center, United States

Abstract:
MicroRNAs and siRNAs destabilize mRNAs through base-pairing with the mRNA. However, the gene expression changes after perturbations of these small RNAs are only partially explained by predicted small RNA targeting. MicroRNA/siRNA targeting may be modulated by other mRNA sequence elements such as binding sites for the hundreds of RNA binding proteins expressed in any cell. This aspect of small RNA regulation has not yet been systematically explored. Across a panel of published experiments using rigorous computational methods, we systematically investigated to what extent sequence motifs (words) in 3'UTRs correlate with expression changes following transfection of small RNAs. We discover hundreds of motifs, in addition to the microRNA target sites, that are significantly correlated with up- or down-regulation in all transfection experiments. The most significantly overrepresented motif in down-regulated mRNAs is a novel binding motif, UUUUAAA, recently discovered for the HuD RNA binding protein. Surprisingly, the most significantly overrepresented motif in up-regulated mRNAs is the heptanucleotide AU-rich element (ARE), UAUUUAU, which is known to affect mRNA stability via at least twenty different ARE binding proteins. We confirm this perturbed ARE-stability signal in other types of published experiments and we show that destabilization mediated by the transfected miRNA is generally attenuated and augmented by ARE and HuD motifs, respectively. This is the first global assessment of candidate co-regulatory 3'UTR motifs that modulate regulation by microRNAs. Our results suggest that microRNA and siRNA binding sites should not be considered in isolation when interpreting and predicting effects of these small RNAs in vivo.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16968.pdf

LBR06                                                                                Sunday, July 11: 3:00 p.m. - 3:25 p.m.
An Alternative Nuclear mRNA Export Pathway Modulated by 5'UTR Introns
Room: 305
Presenting author: Frederick Roth, Harvard Medical School, United States

Additional authors:
Frederick Roth, Harvard Medical School, United States

Abstract:
In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5' end of the transcript during splicing. The signal sequence coding region (SSCR) has been shown to support an alternative route for mRNA export (ALREX) that does not depend on splicing. However, the vast majority of SSCR-containing genes also have introns, so the potential interplay between these export mechanisms remains unclear. We now provide evidence that introns in the 5' untranslated region (5'UTR) interfere with ALREX: specific nucleotide signatures are present in SSCRs when they are from 5'UTR-intron-lacking (5UI-), but not 5'UTR-intron-containing (5UI+) genes. Furthermore, we show experimentally that SSCRs from 5UI- genes promote mRNA export, while 5UI+ SSCRs do not. Unexpectedly, 5'UTR introns are also depleted among genes with a mitochondrial-targeting sequence coding region (MSCR). We further discovered that MSCRs from 5UI- genes exhibit nucleotide signatures associated with ALREX and promote mRNA export in vivo, whereas MSCRs from 5UI+ genes neither exhibit these signatures nor promote export. We computationally identify novel motifs associated with the ALREX pathway and validate these predictions experimentally. We provide the first known genome-wide regulatory role specific to 5' UTR introns. Our results suggest that many human genes, including a subset of those encoding secretory and mitochondrial proteins, share a common regulatory mechanism at the level of mRNA export via the ALREX pathway.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17013.pdf

LBR07                                                                                Sunday, July 11: 3:30 p.m. - 3:55 p.m.
Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites
Room: 305
Presenting author: Doron Betel, Memorial Sloan-Kettering Cancer Center, United States

Abstract:
Accurate prediction of microRNA targets is a challenging computational problem, impeded by incomplete biological knowledge and the scarcity of experimentally validated targets. The primary determinant for regulation, near-perfect base pairing in the seed region of the microRNA (positions 2-7), gives poor specificity as a prediction rule. To reduce false predictions, most computational methods restrict to perfect seed matches that are evolutionarily conserved, despite experimental evidence that neither constraint holds in general. Here we present a machine learning approach for target prediction that does not impose these restrictions on seed complementarity or conservation. Our algorithm, called mirSVR, trains a support vector regression model on features of the predicted microRNA::mRNA duplex and contextual features. In a large-scale evaluation on independent transfection and inhibition experiments, mirSVR is competitive with leading target prediction methods for predicting genes that are deregulated at the mRNA or protein levels. Our approach expands the scope of target prediction in several important ways. First, mirSVR incorporates conservation as a feature, rather than a filter, and identifies many functional but non-conserved sites. Second, mirSVR provides a unified scoring model for all target sites without relying on predefined seed classes or restricting to perfect seed complementarity ('canonical sites'). Third, mirSVR predicts regulation by multiple endogenously expressed microRNAs as tested on genome-wide data from AGO IP experiments. Finally, mirSVR extends to sites with non-canonical seed pairing and correctly identifies a significant number of experimentally determined non-canonical sites from recent CLIP data. Target predictions and mirSVR scores are available at www.microRNA.org.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17027.pdf
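
The seed rule mentioned in the abstract above (pairing at microRNA positions 2-7) can be illustrated with a short scan for perfect 6-mer seed matches in a 3'UTR. This is only the simple baseline rule that the abstract says gives poor specificity; it is not mirSVR and makes no claim about how mirSVR is implemented. The function names are hypothetical, sequences are assumed to be RNA written 5' to 3', and the example 3'UTR is made up.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Return the mRNA 6-mer that pairs perfectly with miRNA positions 2-7.

    Positions are 1-based from the 5' end of the microRNA, so the seed is
    mirna[1:7]; the matching site is its reverse complement.
    """
    seed = mirna[1:7]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of perfect seed matches in a 3'UTR,
    including overlapping occurrences."""
    site = seed_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"      # let-7a mature sequence
toy_utr = "AAAUACCUCGGGUACCUCA"       # hypothetical 3'UTR fragment
print(seed_site(let7a), find_seed_matches(let7a, toy_utr))
```

A scan like this returns every perfect match regardless of conservation or sequence context, which is why it over-predicts when used alone.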

LBR08                                                                                Sunday, July 11: 4:00 p.m. - 4:25 p.m.
Detecting Trans-Splicing Events and Non-co-Linear Transcripts in Transcriptome Assemblies
Room: 305
Presenting author: Inanc Birol, Genome Sciences Centre BC Cancer Agency, Canada

Additional authors:
Inanc Birol, BC Genome Sciences Centre, Canada

Abstract:
Most methods to study transcriptome sequencing data from high throughput sequencing technologies are based on alignment of the reads to a reference genome or transcriptome. Two of the limitations of this approach are the requirement of a good reference genome and its proper annotation. However, even for the well-studied genomes, alignment-based analysis is a biased approach, with the null hypothesis of the experimental sequence being identical to the sequence of the reference genome; hence such analyses tend to miss large scale events, such as novel transcripts or splicing. In this work, we describe an assembly-based analysis method as an unbiased discovery tool, with a special emphasis on trans-splicing events and non-co-linear transcripts. Ever since its inception, the central dogma of molecular biology has been under revision, and lately, there is building evidence against the tidy co-linear splicing paradigm being the only route to building mature transcripts. Although exceptions to it were accepted to be common phenomena in nematode and kinetoplastid transcriptomes, resequencing experiments on mammalian transcriptomes using high throughput sequencing technologies all but ignored them, mostly due to their difficult-to-decipher signature in short read alignments. We report that improved specificity of alignment gained by assembled contigs makes the analysis of transcriptome sequencing data more potent, and helps us detect novel transcription events, including trans-splicing and non-co-linear transcription.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17018.pdf

LBR09                                                                                Monday, July 12: 10:45 a.m. - 11:10 a.m.
Allele-specific copy number analysis of breast carcinomas
Room: 305
Presenting author: Peter Van Loo, VIB and University of Leuven, Belgium

Additional authors:
Peter Van Loo, Oslo University Hospital, Norway
Silje Nordgard, Oslo University Hospital, Norway
Ole Christian Lingjærde, University of Oslo, Norway
Hege Russnes, Oslo University Hospital, Norway
Inga Rye, Oslo University Hospital, Norway
Wei Sun, University of North Carolina, United States
Victor Weigman, University of North Carolina, United States
Peter Marynen, University of Leuven, Belgium
Anders Zetterberg, Karolinska Institutet, Sweden
Bjørn Naume, Oslo University Hospital, Norway
Charles Perou, University of North Carolina, United States
Anne-Lise Børresen-Dale, Oslo University Hospital, Norway
Vessela Kristensen, Oslo University Hospital, Norway

Abstract:
We present the first allele-specific copy number analysis of the in vivo breast cancer genome. We describe a novel bioinformatics approach, ASCAT (Allele-Specific Copy number Analysis of Tumors), to accurately dissect the allele-specific copy number of solid tumors, simultaneously estimating and adjusting for both tumor ploidy and non-aberrant cell admixture. This allows calculation of 'Tumor Profiles' (genome-wide allele-specific copy-number profiles) from which gains, losses, copy-number-neutral events and LOH can accurately be determined. In an early-stage breast carcinoma cohort, we observe aneuploidy (>2.7n) in 45% of the cases and an average non-aberrant cell admixture of 49%. By aggregation of Tumor Profiles across our cohort, we obtain genomic frequency distributions of gains and losses, as well as first-time genome-wide views of LOH and copy-number-neutral events in breast cancer. In addition, the Tumor Profiles reveal differences in aberrant tumor cell fraction, ploidy, gains, losses, LOH and copy-number-neutral events between the five previously identified molecular breast cancer subtypes. Basal-like breast carcinomas have a significantly higher frequency of LOH compared to other subtypes, and their Tumor Profiles show large-scale loss of genomic material during tumor development, followed by a whole-genome duplication, resulting in near-triploid genomes. Finally, from the Tumor Profiles, we construct a genome-wide map of allelic skewness in breast cancer, indicating loci where one allele is preferentially lost while the other allele is preferentially gained. We hypothesize that these alternative alleles have a different influence on breast carcinoma development.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16992.pdf

LBR10                                                                                Monday, July 12: 11:15 a.m. - 11:40 a.m.
Genome-wide nucleosome positioning and DNA methylation in the malaria parasite: Insights into differentiation and virulence
Room: 305
Presenting author: Karine Le Roch, University of California, United States

Additional authors:
Nadia Ponts, UCR, United States
Elena Harris, UCR, United States
Elisandra Rodrigues, UCR, United States
Jacques Prudhomme, UCR, United States
Glenn Hicks, UCR, United States
Gary Hardiman, UCSD, United States
Stefano Lonardi, UCR, United States
Karine Le Roch, UCR, United States

Abstract:
In eukaryotic cells, chromatin reorganizes within promoters of active genes to allow the transcription machinery to access DNA. In this model, promoter-specific transcription factors bind DNA to initiate the production of mRNA in a tightly regulated manner. In the case of the human malaria parasite, Plasmodium falciparum, specific transcription factors are apparently underrepresented with regard to the size of the genome, and mechanisms underlying transcriptional regulation are controversial. To understand the importance of chromatin structure and epigenetics in the parasite development, we generated genome-wide maps of nucleosome occupancy across the parasite erythrocytic cycle using two complementary assays to extract protein-free DNA and histone-bound DNA. To identify a connection between nucleosome positioning and epigenetics, we created a whole-genome view of DNA methylation using bisulfite conversion. These three techniques were coupled to high-throughput sequencing. Altogether, we discovered a massive change in chromatin structure that is a critical process in the malaria parasite survival strategy. Furthermore, when DNA methylation was compared to the transcriptome, genome-wide nucleosome positioning and post-translational histone modification, we discovered a pattern of hyperactive transcription that shares common features with pluripotent embryonic stem cells, with the exception of telomere regions that contain genes involved in virulence. We further detected sharp transitions of methylation occurring at exon-intron boundaries. As a whole, our data suggest that sharp changes in nucleosome positioning and DNA methylation regulate virulence, transcription elongation and alternative splicing. Our results highlight the value of high-resolution nucleosome and methylation maps for the investigation of infectious organisms.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17022.pdf

LBR11                                                                                Monday, July 12: 11:45 a.m. - 12:10 p.m.
Towards the prediction of protein interaction partners using physical docking
Room: 305
Presenting author: Mark Wass, Imperial College London, United Kingdom

Additional authors:
Mark Wass, Imperial College London, United Kingdom
Gloria Fuentes, CNIO, Spain
Florencio Pazos, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
Alfonso Valencia, CNIO, Spain

Abstract:
We demonstrate that it is possible to use protein docking algorithms to detect interaction partners, something previously thought beyond their scope. This is done by comparing the docking models of known interactors with those of non-interacting proteins. Our approach can be developed into methods for interactome prediction.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16860.pdf

LBR12                                                                                Monday, July 12: 12:15 p.m. - 12:40 p.m.
Interrogating Genetic Interaction Networks with High-Capacity Sequencing
Room: 305
Presenting author: Joseph Mellor, Harvard Medical School, United States

Abstract:
When two genes are mutated together, a surprising phenotype sometimes emerges compared to the phenotype of either gene mutation alone. This phenomenon serves to define genetic interaction (GI), and has broadly shaped our understanding of nearly all known biological systems. Several recent GI survey methods such as SGA have exploited the yeast deletion collection to screen large numbers of genetic interactions. We have developed a new approach for screening genetic interactions in yeast, termed "barcode fusion genetics" (BFG). This method couples existing strategies for efficient generation of multiply deleted yeast strains with the enormous throughput of highly parallel sequencing. We use in vitro encapsulated PCR to amplify fused combinations of barcode tags from millions of single yeast cells isolated in a water-in-oil emulsion. Highly parallel sequencing of millions of these fused tags then allows the abundance of every double mutant strain to be accurately quantified. We applied the BFG strategy to a submatrix of 60x74 genes involved in transcription elongation. We measured strain abundances in this pool at multiple time points, and derived empirical growth rates from the changes in relative abundance of each strain. From these growth rates, we were able to calculate scores representing the strength of genetic interactions between pairs of deletion alleles. Among other observations, we were able to recover several known sick or synthetically lethal (SSL) gene pairs based on existing literature, suggesting potential to expand the BFG approach to larger pools of mutants and other conditions.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17011.pdf
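
The step from strain abundances to interaction scores described in the abstract above can be sketched as follows. The abstract does not give the formulas, so everything here is an assumption made for illustration: growth rate is taken as the least-squares slope of log relative abundance over time, rates are assumed to be expressed relative to wild type, and the interaction score is the deviation of the double-mutant rate from the sum of the two single-mutant rates (equivalent to a multiplicative null model on fitness). Function names and data are hypothetical.

```python
import math


def growth_rate(times, abundances):
    """Least-squares slope of log(relative abundance) against time."""
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def interaction_score(rate_ab, rate_a, rate_b):
    """Deviation of the double-mutant rate from the additive expectation.

    Rates are assumed to be log-scale and relative to wild type; a negative
    score means the double mutant grows worse than expected.
    """
    return rate_ab - (rate_a + rate_b)


# Hypothetical relative abundances of one double-mutant strain at three time points.
times = [0.0, 1.0, 2.0]
abundances = [1.0, math.exp(-0.9), math.exp(-1.8)]
rate_ab = growth_rate(times, abundances)
print(interaction_score(rate_ab, -0.2, -0.3))
```

A strongly negative score under this null model would flag a candidate sick or lethal pair; real pipelines would also need replicate handling and normalization, which this sketch leaves out.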

LBR13                                                                                Monday, July 12: 2:30 p.m. - 2:55 p.m.
RGASP: The RNASeq Genome Annotation Assessment Project
Room: 305
Presenting author: Felix Kokocinski, Wellcome Trust Sanger Institute, United Kingdom

Additional authors:
Felix Kokocinski, Wellcome Trust Sanger Institute, United Kingdom
Josep Abril, University of Barcelona, Spain
Gary Williams, Wellcome Trust Sanger Institute, United Kingdom
Ali Mortazavi, California Institute of Technology, United States
Sandrine Dudoit, University of California, United States
Mark Gerstein, Yale University, United States
Alexandre Reymond, University of Lausanne, Switzerland
Tom Gingeras, Cold Spring Harbor Laboratory, United States
Barbara Wold, California Institute of Technology, United States
Roderic Guigo, Center for Genomic Regulation, Spain
Tim Hubbard, Wellcome Trust Sanger Institute, United Kingdom
Jennifer Harrow, Wellcome Trust Sanger Institute, United Kingdom

Abstract:
RNASeq data is revolutionizing eukaryotic transcriptomics, highlighting the extent to which different loci are expressed and alternatively spliced. At the same time, the large amount of data produced is proving challenging for bioinformatics pipelines making use of it. Following the successful format of the EGASP workshop in 2005 (Guigo et al., 2006), the RNASeq Genome Annotation Assessment Project (RGASP) was launched to assess the current progress of automatic gene building using RNASeq as its primary dataset. The goals of this community effort are to assess the success of computational methods to correctly map RNASeq data onto the genome, assemble transcripts and quantify their abundance in particular datasets. The input data originated from different sequencing platforms from Human, Drosophila and C. elegans. For these three organisms, which are also analyzed as part of the (mod)ENCODE project, high quality genome annotation is available, which served as the reference for the analysis. We will present the findings resulting from the analysis of submitted predictions on different levels, demonstrating the state of the art of gene prediction technology using RNASeq data.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17459.pdf

LBR14                                                                                Monday, July 12: 3:00 p.m. - 3:25 p.m.
The mutation spectrum revealed by paired genome sequences from a lung cancer patient
Room: 305
Presenting author: William Lee, Genentech, Inc., United States

Additional authors:
William Lee, Genentech, Inc., United States

Abstract:
Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small cell lung carcinomas in smokers being the predominant form of the disease. While previous studies have identified important common somatic mutations in lung cancers, they primarily have focused on a limited set of genes and hence provide a constrained view of the mutational spectrum. Recent cancer sequencing efforts have leveraged next-generation sequencing technologies to provide a genome-wide view of mutations in leukemia, breast cancer, and cancer cell lines. Here we present the first complete sequences of a primary lung tumor (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identified a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variations (SNVs). We validated 530 somatic SNVs in this tumor, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of novel somatic mutations and yield an estimated 17.7 per Mb genome-wide somatic mutation rate. Interestingly, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kb upstream of all protein-coding genes. Additionally, we observe a higher rate of amino acid-changing mutations in kinase genes. This report presents the most comprehensive view of somatic alterations in a single lung tumor, and provides the first picture of distinct selective pressures present within the tumor environment.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17097.pdf

LBR15                                                                                Monday, July 12: 3:30 p.m. - 3:55 p.m.
Quantification of Alternative Splicing from Paired-End Reads
Room: 305
Presenting author: David Rossell, Institute for Research in Biomedicine of Barcelona, Spain

Abstract:
We consider the problem of estimating and comparing splice variant abundance across groups using paired-end RNA-Seq experiments. Previous methods focused on data from single-end experiments. Paired-end data allows determining the exons where each RNA fragment starts and ends, and is therefore much more informative about alternative splicing than single-end data. Some technical challenges are that (i) the RNA fragment length is unknown for reads spanning several exons, (ii) there are technical biases towards the 3' and 5' ends, and (iii) the number of splice variants is generally unknown. We formulate a Bayesian model which estimates the fragment length distribution from the data and allows eliminating the 3' and 5' biases by treating those read counts as missing data. Further, computing residuals based on the model informs about splicing variants to be added to the set under consideration. The model output can be used in a straightforward manner to compare splice variant-specific expression levels across groups. As most of the formulas can be obtained in closed form, the approach is computationally efficient. Preliminary results show that the approach is able to estimate splice variant abundance in a wider set of cases than methods based on single-end data, and with increased precision.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16994.pdf

LBR16                                                                                Monday, July 12: 4:00 p.m. - 4:25 p.m.
Elucidating the Intrinsic Sequence Specificity of DNase I using High-throughput Sequencing
Room: 305
Presenting author: Allan Lazarovici, Columbia University, United States

Additional authors:
Allan Lazarovici, Columbia University, United States

Abstract:
The enzyme DNase I is widely used to probe interactions between proteins and DNA both in vitro and in vivo. DNase I is widely believed to lack significant sequence specificity; however, the intrinsic sequence specificity of DNase I has never been accurately characterized. To address this, we coupled next-generation sequencing with a maximum-likelihood framework based on the Poisson distribution to model the cleavage specificity of DNase I on purified yeast and human genomic DNA as a function of local sequence context. At least three base pairs up- and downstream of the cleavage site contribute to the cleavage rate, and the unprecedented depth of information provided by high-throughput sequencing allowed us to model accurately and comprehensively the interactions between base positions within this range. We find that the rate at which DNA is cleaved is strand-specific and varies by more than two orders of magnitude between different sequence combinations. Our analysis also reveals a marked dependency between the first and second nucleotide positions downstream of the cleavage site, which is likely related to the local geometry of the DNA minor groove. Finally, comparison of the predicted specificities of DNase I inferred independently from yeast and human genomic DNA cleavage patterns exposed species-specific differences in cleavage rates that were isolated to positions flanking CpG dinucleotides, indicating that the enzymatic efficiency of DNase I may be significantly modulated by DNA methylation status.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17015.pdf
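Illustrative sketch (not the authors' implementation): the Poisson maximum-likelihood idea in the LBR16 abstract can be shown in its simplest form by treating every sequence context as an independent parameter. If the number of cuts observed in a context is Poisson-distributed with mean equal to the rate times the number of occurrences of that context, the maximum-likelihood rate is simply cuts divided by occurrences. The function names, the single-strand treatment, and the toy input are assumptions made for illustration; the model described in the abstract also captures interactions between positions and strand specificity, which this sketch does not.

```python
from collections import Counter


def context_at(seq, pos, flank=3):
    """Return the 2*flank bases around a cut between seq[pos-1] and seq[pos].

    Returns None when the window would run off either end of the sequence.
    """
    if pos - flank < 0 or pos + flank > len(seq):
        return None
    return seq[pos - flank:pos + flank]


def estimate_cleavage_rates(seq, cut_positions, flank=3):
    """Per-context Poisson MLE of the relative cleavage rate.

    Assumes cuts in context s are Poisson with mean rate_s * n_s, where n_s
    is the number of times s occurs in seq; the MLE is then cuts_s / n_s.
    Rates are relative: sequencing depth is not modeled here.
    """
    # Count how often each context occurs (the "exposure" of the Poisson model).
    occurrences = Counter()
    for pos in range(flank, len(seq) - flank + 1):
        occurrences[seq[pos - flank:pos + flank]] += 1

    # Count observed cuts per context, skipping cuts too close to the ends.
    cuts = Counter()
    for pos in cut_positions:
        ctx = context_at(seq, pos, flank)
        if ctx is not None:
            cuts[ctx] += 1

    return {ctx: cuts[ctx] / n for ctx, n in occurrences.items()}
```

With flank=3 the contexts are hexamers, matching the three base pairs on each side of the cleavage site mentioned in the abstract; a smaller flank is convenient for toy inputs.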

LBR17                                                                                Tuesday, July 13: 10:45 a.m. - 11:10 a.m.
Integrative Structure Determination of Protein-Protein Complexes Using SAXS, EM and NMR
Room: 305
Presenting author: Dina Schneidman, UCSF, United States

Additional authors:
Dina Schneidman, UCSF, United States

Abstract:
Proteomics studies are providing vast amounts of data about putative components of macromolecular assemblies, protein interactions, and processes in which they are involved. To understand these processes, we have to describe the structures of the participating complexes. Atomic structure determination of these complexes remains challenging. However, low-resolution data (e.g., small-angle X-ray scattering (SAXS) profiles and electron microscopy (EM) maps) are generally easier to obtain. We aim to develop computational methods that can combine such data for solving the protein assembly puzzle. We present a method for modeling binary protein complexes starting from structures of individual components and low-resolution structural information. We test and combine several types of data from practical experimental approaches that produce structural information of varying quantity and quality: a radial distribution function of the complex from a SAXS profile, a two-dimensional projection of the complex from EM micrographs, a three-dimensional density map of the complex from single-particle EM, and residue content of the protein interface from 'Fast Mapping' by Nuclear Magnetic Resonance spectroscopy. The method was benchmarked on a large number of binary complexes of known structure using simulated data. It was also applied to modeling an antibody-antigen complex of unknown structure using experimental data.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16974.pdf

LBR19                                                                                Tuesday, July 13: 11:45 a.m. - 12:10 p.m.
Quantitative functional annotation of H. sapiens genes
Room: 305
Presenting author: Murat Tasan, Harvard Medical School, United States

Abstract:
Despite the wealth of human genomic and proteomic evidence, a surprisingly small fraction of genes have clear, documented associations with specific functions, and new functions continue to be found for 'characterized' genes. In addition to archival annotation, there is a need to guide ongoing experimentation by summarizing shades of gray in current knowledge. We assembled an integrated collection of diverse genomic and proteomic evidence for 21341 H. sapiens genes. This resource was used to train inferential models combining 'guilt-by-profiling' and 'guilt-by-association' approaches to quantitatively annotate each gene to 4333 Gene Ontology (GO) terms. Performance was evaluated by cross-validation, prospective validation, and careful evaluation of the biological literature. As part of the modeling process we constructed twelve distinct functional-linkage networks (FLNs), each capturing one of twelve types of functional relationship between human genes. We demonstrate the utility of human FLNs by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies (GWAS). All of our predictions are made available to the community via a searchable online resource (http://func.med.harvard.edu). Thus, we have established a genome-scale quantitative functional annotation resource for human genes.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17005.pdf

LBR20                                                                                Tuesday, July 13: 12:15 p.m. - 12:40 p.m.
High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions
Room: 305
Presenting author: Phaedra Agius, Memorial Sloan-Kettering Cancer Center, United States

Additional authors:
Phaedra Agius, Memorial Sloan-Kettering Cancer Center, United States
Aaron Arvey, Memorial Sloan-Kettering Cancer Center, United States
William Chang, Memorial Sloan-Kettering Cancer Center, United States
William Stafford Noble, University of Washington, United States
Christina Leslie, Computational Biology Program, United States

Abstract:
Accurately modeling DNA sequence preferences of transcription factors (TFs) and using them to predict in vivo TF genomic binding sites is key to deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs. Recently, protein binding microarray (PBM) experiments have emerged as a new source of high-resolution data on in vitro TF binding specificities. PBM data has been analyzed either by estimating PSSMs or via rank statistics on probe intensities, where sequence patterns are assigned enrichment scores (E-scores). This representation is informative but unwieldy because every TF is assigned thousands of scored sequence patterns. We have developed a novel, flexible and discriminative framework for learning TF binding preferences from high-resolution in vitro and in vivo data. Using a novel k-mer based string kernel called the di-mismatch kernel, we trained support vector regression (SVR) models on PBM data to learn the mapping from probe sequences to binding intensities. Our compact and expressive SVR models can scan genomic regions to predict in vivo occupancy. Using data for yeast and mouse TFs, our SVR models better predicted probe intensity than E-scores or PSSMs. Moreover, SVR scores for yeast, mouse, and human genomic regions gave improved predictions for genomic occupancy as measured by ChIP-chip and ChIP-seq experiments. Finally, we trained our model directly on ChIP-seq data and found greatly improved in vivo occupancy predictions, and by comparing a TF's in vitro and in vivo models, we identified cofactors and disambiguated direct and indirect binding.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17001.pdf

LBR21                                                                                Tuesday, July 13: 2:15 p.m. - 2:40 p.m.
Analysis of functional profiles of eukaryotic genomes reveals strong trends related to morphological complexity
Room: 305
Presenting author: Christian Zmasek, Sanford-Burnham Medical Research Institute, United States

Additional authors:
Qing Zhang, Sanford-Burnham Medical Research Institute, United States
Adam Godzik, Sanford-Burnham Medical Research Institute, United States

Abstract:
In this work we investigate the genomic manifestation of the highly variable morphological complexity of eukaryotes, such as the difference between Trichoplax adhaerens, which has only 4 different cell types, and mammals, with around 210 different cell types, as well as its evolutionary origins and causes. For this purpose, we used more than one hundred completely sequenced eukaryotic genomes to reconstruct the genome content of putative ancestral species at all major divergence points, including the last eukaryotic common ancestor (LECA), on the level of protein domains defined by the Pfam database. We show that the numbers of distinct protein domains are remarkably constant over large parts of the eukaryotic tree of life and, counter-intuitively, in general domain losses outnumber domain gains. Only at the root of the animal sub-tree and at the root of the vertebrate sub-tree do we see domain gains consistently outnumbering domain losses. Functionally, domains involved in regulation are predominantly gained during animal evolution at the cost of domains with metabolic functions. In contrast, other groups of eukaryotes with the potential for multicellularity, plants and fungi, do not exhibit such an increase in regulatory domains. We show that clustering of genomes according to their functional profiles results in an organization remarkably similar to the eukaryotic tree of life. Finally, we show that it is likely that metabolic functions lost during animal evolution are being replaced ('outsourced') by the metabolic capabilities of symbiotic organisms (such as gut microbes).
Presentation PDF: http://www.iscb.org/uploaded/css/64/17026.pdf

LBR22                                                                                Tuesday, July 13: 2:45 p.m. - 3:10 p.m.
Exploring Disease Interactions Using Combined Gene and Phenotype Networks
Room: 305
Presenting author: Nitesh Chawla, University of Notre Dame, United States

Additional authors:
Darcy Davis, University of Notre Dame, United States
Nitesh Chawla, University of Notre Dame, United States

Abstract:
Faced with unsustainable costs and enormous amounts of under-utilized data, health care needs more efficient practices, research, and tools to harness the benefits of data. These methods should create a feedback loop where computational tools guide and facilitate research, leading to improved biological knowledge and clinical standards, which in turn should generate better data. We build and analyze disease interaction networks based on data collected from previous genetic association studies and on patient medical histories, spanning over 12 years, acquired from a regional hospital. By exploring both individual and combined interactions between these two levels of disease data, we provide insight into the interplay between genetics and clinical realities. Our results show a marked difference between the well-defined structure of genetic relationships and the chaotic co-morbidity network, but also highlight clear interdependencies. Additionally, we use significant patterns in the data to locate good target sites for further association research.
Presentation PDF: http://www.iscb.org/uploaded/css/64/17016.pdf

LBR23                                                                                Tuesday, July 13: 3:15 p.m. - 3:40 p.m.
Predicting Selective Drug Targets in Cancer through Metabolic Modeling
Room: 305
Presenting author: Livnat Jerby, Tel-Aviv University, Israel

Additional authors:
Ori Folger, Tel-Aviv University, Israel
Livnat Jerby, Tel-Aviv University, Israel
Christian Frezza, The Beatson Institute for Cancer Research, United Kingdom
Eyal Gottlieb, The Beatson Institute for Cancer Research, United Kingdom
Eytan Ruppin, Tel-Aviv University, Israel
Tomer Shlomi, Technion, Israel

Abstract:
The interest in studying metabolic alterations in cancer and their potential role as novel targets for therapy has been rejuvenated in recent years, in light of the decreasing number of newly released anticancer drugs. We report the development of the first genome-scale network model of cancer metabolism, validated by correctly identifying genes essential for cellular proliferation in cancer cell-lines. The model predicts 52 anticancer drug targets whose inhibition selectively affects cancer cells, of which 40% are targeted by known approved or experimental anticancer drugs. It further predicts combinations of synthetic lethal drug targets, whose synergy is validated using available drug efficacy and gene expression measurements across the NCI-60 cancer cell-line collection. Finally, potential personalized treatment strategies that depend on an individual's germline and their somatic mutations are compiled.
Presentation PDF: http://www.iscb.org/uploaded/css/64/16982.pdf

LBR24                                                                                Tuesday, July 13: 3:45 p.m. - 4:10 p.m.
Genetic interactions reveal the evolutionary trajectories of duplicate genes
Room: 305
Presenting author: Chad Myers, University of Minnesota, United States

Additional authors:
Benjamin VanderSluis, University of Minnesota, United States
Jeremy Bellay, University of Minnesota, United States
Gabriel Musso, University of Toronto, Canada
Michael Costanzo, University of Toronto, Canada
Franco Vizeacoumar, University of Toronto, Canada
Balazs Papp, Institute of Biochemistry, Biological Research Center, Hungary
Anastasia Baryshnikova, University of Toronto, Canada
Charles Boone, University of Toronto, Canada
Chad Myers, University of Minnesota, United States

Abstract:
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. High-throughput genetic interaction technology in S. cerevisiae provides a new perspective for addressing these questions through quantitative measurements of epistasis between pairs of duplicated genes and more generally, through the study of duplicates' epistatic interactions across the rest of the genome. In this study, we present a model for the effects of duplicate redundancy on genetic interaction networks, and demonstrate why functionally overlapping duplicate pairs often have disparate genetic interaction profiles. In particular, we show that genetic interactions related to the duplicate pair's common function are 'shielded' by genetic redundancy and do not manifest themselves in pairwise interaction studies. Detectable genetic interactions with duplicate genes instead reflect their divergent functional roles. Furthermore, we find that duplicate genes are highly imbalanced in their number of interactions with other genes, providing compelling evidence for the asymmetric model of evolution. These asymmetric patterns of genetic interactions are predictive of differences in sequence evolution rates, protein-protein interaction degree, single mutant fitness defects, and single mutant sensitivity to chemical environments.
