New features for microRNA gene finding

Uwe Ohler¹, Chris Burge², David Bartel
¹ohler@mit.edu, MIT; ²cburge@mit.edu, MIT

MicroRNAs (miRNAs) are a class of tiny RNA molecules, about 21 nucleotides long, that are implicated in the post-transcriptional regulation of specific target genes. They are excised from a ~75-nucleotide precursor hairpin, which is itself derived from a longer primary transcript. miRNAs are conserved across a variety of multicellular eukaryotes and are thought to play crucial roles in animal and plant development.
Computational gene-finding approaches such as MirScan have used features of the precursor hairpin structure and its conservation across species, and have estimated the total number of miRNA genes at around 100 for C. elegans and about twice as many for vertebrates. miRNA genes may lie within introns of protein-coding genes (on either the sense or the antisense strand) as well as in intergenic regions far from any known gene. In a few instances, several hairpins are clustered together in what are probably polycistronic transcripts.
Examining the known C. elegans miRNAs, we searched for conserved motifs upstream and downstream of the precursor hairpin that might be involved in miRNA transcription or processing. Using comparative genomics, we identified a significant, well-conserved motif upstream of the majority of miRNAs located in intergenic regions, but not upstream of intronic miRNAs, protein-coding genes, or polII-transcribed snRNAs. These properties, together with the motif's preferred distance from the miRNAs, make it a likely promoter element involved in miRNA transcription.
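As an illustration of how a candidate upstream motif can be scored, the following is a minimal sketch of a generic position weight matrix (PWM) scan, assuming the motif is summarized as per-position base counts. The count matrix, the uniform background, the pseudocount, and the helper names (`log_odds_matrix`, `best_hit`, `conserved_hit`) are all hypothetical; the abstract does not give the motif or the scoring method actually used. The sketch reports the best-scoring window, its distance from the hairpin, and whether a hit is present in both orthologous upstream regions.

```python
import math

# HYPOTHETICAL 4-nt count matrix (one dict per motif position).
# Made-up numbers for illustration only; this is NOT the motif found in the study.
COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 7},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 7, "C": 1, "G": 1, "T": 1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background
PSEUDOCOUNT = 0.5  # assumed pseudocount added to every cell


def log_odds_matrix(counts, background, pseudo):
    """Convert per-position counts into log2(P(base | motif) / P(base | background))."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudo
        matrix.append({
            base: math.log2((column[base] + pseudo) / total / background[base])
            for base in "ACGT"
        })
    return matrix


MATRIX = log_odds_matrix(COUNTS, BACKGROUND, PSEUDOCOUNT)


def best_hit(upstream, matrix):
    """Return (score, offset, distance) for the best-scoring window.

    `upstream` is assumed to end immediately 5' of the hairpin, so `distance`
    is the number of nucleotides from the motif start to the hairpin.
    """
    width = len(matrix)
    seq = upstream.upper()
    best = (float("-inf"), -1, -1)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[j][window[j]] for j in range(width))
        if score > best[0]:
            best = (score, i, len(seq) - i)
    return best


def conserved_hit(upstream_a, upstream_b, matrix, threshold):
    """True if both orthologous upstream regions contain a window scoring >= threshold."""
    return (best_hit(upstream_a, matrix)[0] >= threshold
            and best_hit(upstream_b, matrix)[0] >= threshold)


if __name__ == "__main__":
    print(best_hit("CCCCTGCACCCC", MATRIX))
    print(conserved_hit("CCCCTGCACCCC", "GGTGCAGG", MATRIX, threshold=4.0))
```

In practice the threshold would need to be calibrated against a background set (for example, upstream regions of protein-coding genes or shuffled sequence), and collecting the `distance` values over many hits is one way to look for a preferred motif-to-hairpin spacing.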
RNA genes of different classes are transcribed by different RNA polymerases. No signal resembling the polI or polIII transcription termination motifs, or the polII polyadenylation signal, was found downstream of miRNA precursors. However, the apparent length of the primary transcripts makes transcription by polIII, which typically produces short transcripts, unlikely. In addition, polII-transcribed snRNA genes have been reported to terminate without polyadenylation signals, and this also appears to be the case for miRNA transcripts.
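One simple way to test for termination signals downstream of precursors is to count how often such signals occur in miRNA downstream regions and compare that rate with a background set. The sketch below assumes two commonly cited signal definitions that the abstract does not specify: the canonical AATAAA polyadenylation hexamer for polII, and a run of four or more T's on the sense strand for polIII termination. polI termination is not modeled. The 200-nt window and the function names are hypothetical choices for illustration.

```python
import re

# ASSUMED signal definitions (not taken from the abstract):
POLYA_SIGNAL = re.compile("AATAAA")    # canonical polII polyadenylation hexamer
POL3_TERMINATOR = re.compile("T{4,}")  # run of >= 4 T's on the sense strand


def downstream_signals(downstream, window=200):
    """Return start offsets of each signal type within the first `window` nt."""
    region = downstream[:window].upper()
    return {
        "polyA": [m.start() for m in POLYA_SIGNAL.finditer(region)],
        "polIII": [m.start() for m in POL3_TERMINATOR.finditer(region)],
    }


def fraction_with_signal(sequences, kind, window=200):
    """Fraction of sequences with at least one hit of the given signal type."""
    if not sequences:
        return 0.0
    hits = sum(1 for s in sequences if downstream_signals(s, window)[kind])
    return hits / len(sequences)


if __name__ == "__main__":
    print(downstream_signals("GCAATAAAGCTTTTTG"))
    print(fraction_with_signal(["AATAAAC", "GGGG", "CTTTTC"], "polyA"))
```

A signal would only be of interest if `fraction_with_signal` on miRNA downstream regions clearly exceeded the same fraction on a matched background set; comparable rates are consistent with the negative result described above.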
We will also report how much two additional features improve the accuracy of miRNA gene finding: (1) the degree of conservation upstream and downstream of candidate hairpin structures, and (2) the presence of the well-conserved motif upstream of them.
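To show how such features could be folded into an existing candidate score, here is a minimal sketch that adds per-feature log-odds terms to a hairpin score, assuming the features are conditionally independent (a naive-Bayes-style combination). It is not the authors' scoring scheme. The conservation bins, all probability tables, the use of one shared table for upstream and downstream conservation, and the function names are hypothetical; real tables would be estimated from known miRNAs (positives) and background hairpins (negatives).

```python
import math

# HYPOTHETICAL binned feature frequencies, for illustration only.
CONSERVATION_EDGES = [0.5, 0.8]   # identity bins: [0, 0.5), [0.5, 0.8), [0.8, 1.0]
P_CONS_POS = [0.2, 0.3, 0.5]      # P(bin | miRNA)
P_CONS_NEG = [0.6, 0.3, 0.1]      # P(bin | background hairpin)
P_MOTIF_POS = {True: 0.6, False: 0.4}
P_MOTIF_NEG = {True: 0.05, False: 0.95}


def percent_identity(aln_a, aln_b):
    """Fraction of aligned columns without gaps whose bases are identical."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def bin_index(value, edges):
    """Number of bin edges that `value` meets or exceeds."""
    return sum(value >= e for e in edges)


def combined_score(hairpin_score, up_identity, down_identity, has_motif):
    """Hairpin score plus log2 likelihood ratios for the additional features."""
    score = hairpin_score
    for identity in (up_identity, down_identity):
        b = bin_index(identity, CONSERVATION_EDGES)
        score += math.log2(P_CONS_POS[b] / P_CONS_NEG[b])
    score += math.log2(P_MOTIF_POS[has_motif] / P_MOTIF_NEG[has_motif])
    return score


if __name__ == "__main__":
    up = percent_identity("ACGT-A", "ACCTTA")
    print(up)
    print(combined_score(0.0, up, 0.9, has_motif=True))
```

Under these assumptions, the improvement from the new features could be measured by ranking known miRNAs and background hairpins with and without the extra terms and comparing sensitivity at a fixed number of predictions.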