IBISS - the Interactive Bovine In Silico SNP database

Rachel Hawken¹, Wes Barris², Brian Dalrymple
¹Rachel.Hawken@csiro.au, CSIRO Livestock Industries; ²Wes.Barris@csiro.au, CSIRO Livestock Industries

An interactive bovine in silico SNP database has been constructed. This database contains all available bovine EST and mRNA sequences in the public domain. The central program used for the construction of this database was stackPACK (http://www.sanbi.ac.za/Dbases.html#stackpack), which was used to cluster all sequences and align contigs within each cluster. An iterative process of clustering and analysis was used to identify chimeric sequences, which were removed from the final analysis. In the absence of substantial genomic sequence data from Bos taurus, human data sets (human reference gene sequences and human working draft chromosome sequences) were used extensively in the further analysis of the clustered sequences. Each separate consensus sequence or singleton produced by stackPACK was treated as a model mRNA. Model mRNAs were annotated using the description lines from the top BLAST hits to the human mRNA and protein RefSeq sets. FASTY searches of the human RefSeq protein dataset were used to identify the most likely protein sequence, allowing for frameshifts in the EST consensus sequence. If no significant hits were identified, ESTScan2 was used to determine the most likely theoretical open reading frame. These analyses generated a set of model protein sequences, which were aligned back to the relevant consensus nucleotide sequence using NAP. For each model mRNA sequence, several features have been highlighted in our database. One feature is the identification of putative SNPs by examining the multiple sequence alignment for each cluster. SNPs predicted to change amino acids were identified by comparing the protein predictions of flanking nucleotide sequences using BLAST matching of the two distinct sequences against the human protein RefSeq dataset. The putative gene structure (intron-exon boundaries) of each model mRNA has also been identified. This was accomplished using BLAT searches of each model mRNA against the human chromosome sequence database.
This database is a substantial time-saving tool: it provides the gene sequence (nucleotide and amino acid) for all known bovine sequences in the public domain, together with the putative structure of each model mRNA. This information enables primers flanking putative SNPs to be designed in a fraction of the time previously required for the same task.
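
The abstract does not state how putative SNPs were called from each cluster's multiple sequence alignment. As an illustration only, the following minimal Python sketch scans alignment columns and reports positions where a second base is seen in more than one read. The function name putative_snps and the thresholds min_depth and min_minor_count are hypothetical and are not taken from IBISS.

```python
from collections import Counter


def putative_snps(aligned_reads, min_minor_count=2, min_depth=4):
    """Scan the columns of a gapped multiple alignment for candidate SNPs.

    aligned_reads: equal-length strings, with '-' for gaps and 'N' for
    unknown bases. The thresholds are illustrative assumptions, not
    values used by IBISS.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    snps = []
    for pos in range(length):
        # Count only unambiguous bases; gaps and Ns are ignored.
        counts = Counter(
            read[pos].upper()
            for read in aligned_reads
            if read[pos].upper() in "ACGT"
        )
        if sum(counts.values()) < min_depth:
            continue  # too few reads cover this column
        ranked = counts.most_common(2)
        # Require the second most common base to occur in several reads,
        # so that isolated sequencing errors are not reported.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            major, minor = ranked[0][0], ranked[1][0]
            snps.append((pos, major, minor, dict(counts)))
    return snps


if __name__ == "__main__":
    cluster = [
        "ACGTACGT",
        "ACGTACGT",
        "ACGAACGT",
        "ACGAACGT",
        "ACG-ACGT",
        "ACGTACGA",
    ]
    for pos, major, minor, counts in putative_snps(cluster):
        print(pos, major, minor, counts)
```

Requiring the minor base in at least two reads is one simple way to separate candidate polymorphisms from single-read sequencing errors, which are common in EST data. A production pipeline would likely also weigh base quality and library of origin.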