Identification of putative transcription factor binding sites conserved across orthologous human, mouse and rat sequences
Alex Gout1, Tim Beissbarth2, Joelle Michaud, Catherine Carmichael, Matthew Ritchie, Gordon
K. Smyth, Terry Speed, Hamish S. Scott
1gout@wehi.edu.au, The Walter and Eliza Hall Institute of Medical Research; 2beissbarth@wehi.edu.au, The Walter and Eliza Hall Institute of Medical Research
Transcription factors bind transcription factor binding sites (TFBS) in
regulatory elements leading to interaction with the basal transcription
apparatus (TATA-binding protein, TFIIA, TFIIB, TFIIF, TFIIE, TFIIH and RNA
polymerase II) and transcriptional initiation of the target gene. Sequence
data from genome projects and data generated from high throughput genomic
techniques, such as microarrays, both require additional annotation to
allow biological interpretation and to add value to the dataset. For
example, knowledge of the patterns of TFBSs within the upstream regions
of differentially expressed genes from microarray experiments may provide
insight into transcriptional regulation and transcription factor
interactions, and may help identify regulatory genetic networks.
However, to date, no readily queryable resource exists for observing
potential TFBSs present within such large datasets. We have constructed
a database of predicted TFBSs that are conserved between pairs or triples
of orthologous human, mouse and rat genes. After the upstream sequences
were obtained, the Match search tool was used in conjunction with the
Transfac database of TFBS matrices to identify potential TFBSs. Following
an alignment of the orthologous upstream regions using MAVID, the
positions of potential TFBSs were adjusted to alignment coordinates and
then used to determine whether each TFBS was positionally conserved across
the two or three species.
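
The position adjustment and conservation test described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each predicted site is recorded as a matrix identifier plus a 0-based start position in the ungapped upstream sequence, that the alignment is available as equal-length gapped strings (one per species), and that a site counts as positionally conserved when the same matrix is predicted at the same alignment column in every species. The function names, the tolerance parameter and the matrix identifiers (M_A, M_B) are hypothetical and are used only for illustration.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int]  # (matrix identifier, 0-based start in the ungapped sequence)


def ungapped_to_column(aligned: str, gap: str = "-") -> List[int]:
    """Return, for each ungapped sequence index, its column in the alignment."""
    return [col for col, ch in enumerate(aligned) if ch != gap]


def conserved_sites(
    aligned: Dict[str, str],
    hits: Dict[str, List[Hit]],
    tolerance: int = 0,
) -> List[Hit]:
    """Report sites whose matrix and alignment column agree in all species.

    The first species in `aligned` serves as the reference; the returned
    positions are alignment columns, not sequence positions.
    """
    species = list(aligned)
    adjusted = {}
    for sp in species:
        col_of = ungapped_to_column(aligned[sp])
        # Adjust each hit from sequence coordinates to alignment coordinates.
        adjusted[sp] = {(m, col_of[pos]) for m, pos in hits.get(sp, [])}

    reference, others = species[0], species[1:]
    result = []
    for matrix, col in sorted(adjusted[reference]):
        if all(
            any(m == matrix and abs(c - col) <= tolerance for m, c in adjusted[o])
            for o in others
        ):
            result.append((matrix, col))
    return result


# Toy two-species example with made-up sequences and hits.
alignment = {
    "human": "ACGT--TGACGT",
    "mouse": "ACGTAATGACGT",
}
predicted = {
    "human": [("M_A", 4), ("M_B", 0)],
    "mouse": [("M_A", 6), ("M_B", 2)],
}
print(conserved_sites(alignment, predicted))  # [('M_A', 6)]
```

In this toy example the M_A site starts at position 4 in the human sequence and position 6 in the mouse sequence, but both map to alignment column 6 once the two-base gap in the human sequence is taken into account, so it is reported as conserved. The M_B sites map to different columns and are not. A non-zero tolerance would relax the requirement of an exact column match.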