We have developed a system for the rapid, automated detection of sequences from putative archaeal pathogens in human EST data, using the sequence alignment tools BLAT and MEGABLAST. Initial analysis using all current archaeal DNA sequences from GenBank (14 657 records) and the human EST subset of the EMBL database (release 74) identified several EST sequences of archaeal origin, all of which derive from contamination by DNA from the Pfu polymerase gene. In addition, a number of ESTs derive from genes of bacterial pathogens that have significant similarity to archaeal counterparts. To date, we have identified no sequences that may come from an as-yet unknown archaeal pathogen.
Our search strategy can be adapted to detect sequences from any non-human group of interest (e.g. bacteria, fungi). The system can be automated to conduct searches each month as new EST and archaeal sequence data are released. A web front-end for exploration and analysis of the results will be available at http://psychro.bioinformatics.unsw.edu.au.
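As an illustration of the kind of post-processing such a search involves, the following minimal Python sketch filters alignment hits to shortlist ESTs that match a non-human reference set. It assumes the hits are available in the common 12-column tab-separated BLAST tabular layout (query, subject, percent identity, alignment length, ..., e-value, bit score). The names `Hit`, `parse_tabular` and `candidate_ests`, and the identity and length thresholds, are hypothetical choices made for this example; they are not the parameters used in the system described above.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One alignment between an EST (query) and a reference sequence (subject)."""
    query: str
    subject: str
    identity: float  # percent identity over the aligned region
    length: int      # alignment length in bases
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tab-separated alignment output, skipping comments and blanks."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # ignore malformed rows rather than guess at their meaning
        yield Hit(fields[0], fields[1], float(fields[2]),
                  int(fields[3]), float(fields[10]))


def candidate_ests(hits: Iterable[Hit],
                   min_identity: float = 95.0,   # assumed threshold
                   min_length: int = 100         # assumed threshold
                   ) -> Dict[str, Hit]:
    """Keep, for each EST, its lowest e-value hit that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity < min_identity or hit.length < min_length:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return best
```

Retargeting such a filter to another non-human group would, under these assumptions, only require running the alignment against a different reference set; the filtering step itself is unchanged. Shortlisted ESTs would still need manual review to separate known contaminants from genuinely novel sequences.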