A Functional Annotation Project for Novel and Uncharacterised Genes

William Wilson¹, Emily Hodges², Ivana Novak, Claes Wahlestedt, Christer Höög, Boris Lenhard
¹bill.wilson@cgb.ki.se, Karolinska Institute; ²emily.hodges@cgb.ki.se, Karolinska Institute

With the completion of sequencing of the human and other genomes, annotation projects aim to accurately assign functions and relationships to the open reading frames predicted from genome sequences. We have attempted to streamline this process and increase its accuracy by applying an integrated bioinformatics–laboratory approach to the functional annotation of a database of novel genes. As the first set of genes for the study, we have taken human protein-coding genes found to contain evolutionarily conserved novel protein domains, and are analysing them in an annotation pipeline that integrates data from diverse sources such as informatics, RNA profiling, protein expression and subcellular localisation, as well as various cell-based assay systems. We show examples of how the results of both experimental and computational analyses are stored in a collaborative web-accessible database, which provides a constant source of feedback between researchers. By applying this integrated approach to gene annotation, we strive to design more focused experiments for individual genes and to develop a series of tools to aid accurate annotation. By focusing on how individual novel genes and groups of novel genes are annotated, we hope to arrive at a set of rules and procedures that can be used to automate parts of novel gene annotation.