Common transcription factor binding sites in the regulatory regions of a cluster of genes statistically linked to the HOX gene HB24

Mar Bellido1, Whipple Neely2, Fan W, Beppu L, Zhao LP, Radich JP
1mbellido@fhcrc.org, Fred Hutchinson Cancer Research Center; 2whipple@fhcrc.org, Fred Hutchinson Cancer Research Center

The homeobox genes encode proteins which are oncofetal antigens involved in embryonic development and in hematopoiesis. These proteins have a well-conserved DNA-binding domain, the homeodomain, and the genes are organized into 4 clusters (HOXA, -B, -C, -D). Deregulation of these genes perturbs normal hematopoietic development. We used publicly available microarray data and a statistical approach to identify unknown target genes that cooperate with HOX genes in the development of AML. The strategy consisted of finding genes whose expression values were linked to a fixed set of HOX genes. For that purpose, we took the expression values of all HOX genes (n=21) present on the Affymetrix HU95 chip, as reported by Golub et al. in a study of 25 patients with AML [1]. We detected all genes statistically associated (p value <= 0.01) with each of the 21 HOX genes using the software Geneplus, produced by Enodar Biologic (www.enodar.com, Seattle, WA), and obtained a set of 537 genes. To detect associations among these 537 genes, we used a regression model that incorporated both gene-specific and chip-specific effects and computed the association between genes in terms of the statistical significance of the regression coefficients, expressed as a standardized z-score. We used a cutoff of z > 8 to declare a statistical association between 2 genes and converted the z-score into an estimate of the number of false discoveries. This analysis revealed different clusters of genes linked with each HOX gene. The HOX gene HB24 encodes a protein selectively expressed in CD34 cells which plays a role in T-cell activation and may be important for the immune response to neoplasms.
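The pairwise association step described above can be illustrated with a minimal sketch. It is not the Geneplus implementation, whose exact model is not given here. It assumes that gene-specific and chip-specific effects are removed by double centering the expression matrix, that each gene pair is then related by a simple least-squares slope, and that the z-score is the slope divided by its standard error. The function names (`pairwise_z_scores`, `linked_pairs`) and the use of the absolute z-score are illustrative assumptions, and the conversion to a number of false discoveries is not reproduced.

```python
import numpy as np


def pairwise_z_scores(expr):
    """Return a genes x genes matrix of slope z-scores.

    expr: array of shape (n_genes, n_chips) with expression values.
    Assumption: gene (row) and chip (column) effects are additive and
    are removed by double centering before the pairwise regressions.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes, n_chips = expr.shape
    resid = (expr
             - expr.mean(axis=1, keepdims=True)   # gene-specific effect
             - expr.mean(axis=0, keepdims=True)   # chip-specific effect
             + expr.mean())                       # grand mean added back once
    z = np.zeros((n_genes, n_genes))
    for i in range(n_genes):
        x = resid[i]
        sxx = x @ x
        if sxx == 0.0:
            continue  # constant gene: no slope can be estimated
        for j in range(n_genes):
            if i == j:
                continue
            y = resid[j]
            beta = (x @ y) / sxx                  # least-squares slope
            rss = np.sum((y - beta * x) ** 2)     # residual sum of squares
            se = np.sqrt(rss / (n_chips - 2) / sxx)
            z[i, j] = beta / se if se > 0 else np.sign(beta) * np.inf
    return z


def linked_pairs(z, cutoff=8.0):
    """List gene index pairs (i, j), i < j, whose |z| exceeds the cutoff."""
    n = z.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(z[i, j]) > cutoff]
```

With 25 chips, a z-score above 8 under this sketch corresponds to a residual correlation of roughly 0.86 or higher between two genes, which gives a sense of how strict the cutoff is.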
The HB24 gene was statistically linked to other immunoregulatory genes such as interferon, tumor necrosis factor, dendritic cell antigens and the human pre-B cell enhancing factor, to the PIM1 and JUNB proto-oncogenes, and to other “key” genes such as the urokinase-type plasminogen receptor and the fatty acid CoA ligase. We analyzed the regulatory regions of 29 genes belonging to the HB24 cluster to find potential common transcription factor binding sites (TFBS). Human sequences of 1000 or 2000 bp, starting at position -1000 or -2000, respectively, were selected. The program RepeatMasker was used to mask repetitive regions of DNA, and the sequences were aligned with their orthologous mouse sequences using the Ensembl software. Only regions that showed conservation between the human and mouse sequences were manually selected for subsequent analysis. We used the Match algorithm (linked to TRANSFAC) to scan the conserved regions for TFBS. Five out of 29 TFBS were highly represented among different genes (PAX4, HNF4, HNF1, USF and CREL), and PAX4 was the most frequently detected. The PAX4 and HNF1 proteins contain a homeodomain, which makes them good candidates to cooperate functionally with HOX genes. HNF4 is a zinc finger protein that regulates HNF1, USF is a helix-loop-helix protein, and CREL is a homolog of a viral oncogene (v-REL) that functions in cellular transformation. The statistical approach described above, combined with the analysis of the regulatory sequences of the resulting gene groups, may provide a good strategy for identifying potential new therapeutic targets.

1. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999 Oct 15;286(5439):531-7.
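The final step of the analysis, finding TFBS that recur across the genes of the cluster, can be sketched as a simple tally. This is an illustration under stated assumptions and not the procedure actually used: it assumes the Match hits have already been reduced to a list of factor names per gene, and the function names, the `min_genes` threshold and the example gene labels are hypothetical.

```python
from collections import Counter


def tfbs_gene_counts(hits_by_gene):
    """Count, for each factor, the number of genes with at least one site.

    hits_by_gene: dict mapping a gene label to a list of factor names
    found in its conserved regulatory regions (repeats allowed).
    """
    counts = Counter()
    for factors in hits_by_gene.values():
        counts.update(set(factors))  # each factor counts once per gene
    return counts


def shared_factors(hits_by_gene, min_genes):
    """Return (factor, n_genes) pairs found in at least min_genes genes,
    sorted by decreasing gene count and then by factor name."""
    counts = tfbs_gene_counts(hits_by_gene)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(factor, n) for factor, n in ranked if n >= min_genes]


# Hypothetical toy input; the gene labels and hits are not from the study.
toy_hits = {
    "gene_A": ["PAX4", "PAX4", "HNF1"],
    "gene_B": ["PAX4", "USF"],
    "gene_C": ["HNF1", "PAX4"],
}
print(shared_factors(toy_hits, min_genes=2))
```

Counting each factor once per gene keeps a single gene with many repeated sites from dominating the ranking, which matches the idea of a site being represented among different genes.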