New features for microRNA gene finding
Uwe Ohler1, Chris Burge2, David Bartel
1ohler@mit.edu, MIT; 2cburge@mit.edu, MIT
MicroRNAs (miRNAs) are a class of tiny RNA molecules, about 21 nucleotides
long, which are implicated in the post-transcriptional regulation of
specific target genes. They are excised from a ~75 nucleotide long
precursor hairpin structure, which is derived from a larger primary
transcript. miRNAs are conserved in a variety of eukaryotic
multicellular organisms and are thought to play crucial roles during
animal and plant development.
Computational gene-finding approaches such as MiRscan have used
features of the precursor hairpin structure and its conservation
across species, and have estimated the total number of miRNA genes at
around 100 for C. elegans and about twice as many for vertebrate
organisms. miRNA genes may be located in introns of protein-coding
genes (on either sense or anti-sense strand) as well as in intergenic
regions far away from any known gene. In a few instances, several
hairpins are clustered together in probable polycistronic transcripts.
Looking at the known C. elegans miRNAs, we attempted to identify
conserved motifs upstream and downstream of the precursor hairpin
structure that might be involved in miRNA transcription or
processing. Using comparative genomics, we identified a significant
and well-conserved motif upstream of the majority of miRNAs located in
intergenic regions, but not upstream of intronic miRNAs,
protein-coding genes, or polII-transcribed snRNAs. These features,
together with its preferred distance from the miRNAs, make this motif a
likely promoter element involved in the transcription of miRNAs.
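
The upstream motif search can be illustrated with a minimal Python sketch.
It is not the method used in this work: it assumes the motif is given as a
plain consensus string, allows a fixed number of mismatches, and reports
the distance of each match from the hairpin. The function names and the
motif "ACGTAC" are hypothetical placeholders; the actual motif is not
given in this abstract.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(upstream, motif, max_mismatches=1):
    """Return the distance (in nt) from each motif match to the hairpin.

    `upstream` is assumed to be the sequence immediately 5' of the
    precursor hairpin, so its last base is adjacent to the hairpin start.
    The distance is measured from the first base of the match.
    """
    k = len(motif)
    hits = []
    for i in range(len(upstream) - k + 1):
        if hamming(upstream[i:i + k], motif) <= max_mismatches:
            hits.append(len(upstream) - i)
    return hits


def fraction_with_motif(upstreams, motif, max_mismatches=1):
    """Fraction of upstream regions that contain at least one match."""
    if not upstreams:
        return 0.0
    n_hits = sum(1 for s in upstreams if find_motif(s, motif, max_mismatches))
    return n_hits / len(upstreams)


if __name__ == "__main__":
    # Toy sequences and a placeholder motif, for illustration only.
    regions = ["TTTTACGTACGGGGG", "GGGGGGGGGGGG"]
    print(find_motif(regions[0], "ACGTAC", max_mismatches=0))
    print(fraction_with_motif(regions, "ACGTAC", max_mismatches=0))
```

Comparing the fraction returned for intergenic miRNAs with the fraction
for intronic miRNAs or protein-coding genes, and inspecting the
distribution of reported distances, mirrors the kind of evidence
described above. A position weight matrix would be a more realistic
motif model than a consensus string.
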
RNA genes of different classes are transcribed by different RNA
polymerases. No signal resembling the polI or polIII transcription
termination motifs or the polII polyadenylation site was found
downstream of miRNA precursors. However, the apparent length of the
primary transcript makes it unlikely that miRNAs are transcribed by
polIII, and polII snRNA genes have been reported to be terminated
without polyadenylation signals, which also appears to be the case for
miRNA transcripts.
We will also report on how much two additional features improve the
accuracy of miRNA gene finding: (1) the amount of upstream and
downstream conservation, and (2) the presence of the well-conserved
motif upstream of candidate hairpin structures.
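
One way such a comparison could be set up is sketched below in Python.
This is an illustration under stated assumptions, not the scoring scheme
used in this work: the additive form of the score, the weights `w_cons`
and `w_motif`, and all function names are hypothetical. Accuracy is
summarized here as sensitivity and specificity at a fixed score
threshold.

```python
def combined_score(hairpin_score, flank_conservation, has_motif,
                   w_cons=1.0, w_motif=1.0):
    """Add the two extra features to a hairpin-based score.

    The additive form and the default weights are assumptions made for
    illustration; in practice they would be estimated from training data.
    """
    motif_bonus = w_motif if has_motif else 0.0
    return hairpin_score + w_cons * flank_conservation + motif_bonus


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when predicting 'miRNA' for score >= threshold.

    `labels` holds True for known miRNA hairpins and False for background
    hairpins.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    tn = sum(1 for s, y in pairs if not y and s < threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

Evaluating `sensitivity_specificity` once on hairpin scores alone and
once on `combined_score` values for the same labeled candidates gives a
direct measure of how much the extra features help at a chosen
threshold.
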