Incorporating Sequence and Biochemical Information in TOPS models - For Biologically Significant Pattern Matching and Pattern Discovery in Protein
Mallika Veeramalai1, David Gilbert2, David R Westhead
1mallika@dcs.gla.ac.uk, Bioinformatics Research Centre, Dept. of Computing Science, University of Glasgow; 2drg@dcs.gla.ac.uk,
The TOPS (Topology of Protein Structure) database contains abstract 2D
spatial representations (TOPS cartoons) of the secondary structure elements (SSEs) of protein structures.
TOPS diagrams are derived from these TOPS cartoons.
Instead of representing spatial positions by elements in a plane, a TOPS
diagram contains information about the grouping of beta-strands in beta-sheets
(two adjacent elements in a beta-sheet are connected by an H-bond, which can
be either parallel or anti-parallel) and also information about the
orientation of elements (any two SSEs can be connected by a chirality
relation, either left- or right-handed). Based on these TOPS diagrams, very fast pattern matching
and pattern discovery algorithms for protein topologies were developed.
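
To make the diagram structure concrete, the following is a minimal Python sketch of a TOPS-style diagram as a labelled graph, together with a naive order-preserving pattern matcher. It is an illustration only, not the published TOPS matching algorithm: the names `TopsDiagram` and `matches` are hypothetical, the encoding of SSEs as a string of 'E' (strand) and 'H' (helix) is an assumption, and the brute-force search makes no attempt to be fast.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class TopsDiagram:
    """Hypothetical, simplified TOPS-style diagram (assumed encoding)."""
    sses: str                                         # SSEs in N- to C-terminal order, e.g. "EHEE"
    hbonds: dict = field(default_factory=dict)        # (i, j) -> 'P' (parallel) or 'A' (anti-parallel), i < j
    chiralities: dict = field(default_factory=dict)   # (i, j) -> 'L' or 'R', i < j


def matches(pattern, target):
    """Yield every order-preserving placement of pattern SSEs onto target SSEs
    for which SSE types agree and every pattern edge exists in the target
    with the same label. Naive exhaustive search, for illustration only."""
    n = len(pattern.sses)
    for positions in combinations(range(len(target.sses)), n):
        if any(pattern.sses[k] != target.sses[p] for k, p in enumerate(positions)):
            continue
        hbonds_ok = all(
            target.hbonds.get((positions[i], positions[j])) == label
            for (i, j), label in pattern.hbonds.items()
        )
        chirality_ok = all(
            target.chiralities.get((positions[i], positions[j])) == label
            for (i, j), label in pattern.chiralities.items()
        )
        if hbonds_ok and chirality_ok:
            yield positions


# Toy target: strand, helix, strand, strand (made-up data, not a real protein).
target = TopsDiagram(
    sses="EHEE",
    hbonds={(0, 2): "P", (2, 3): "A"},
    chiralities={(0, 2): "R"},
)
# Pattern: two strands joined by a parallel H-bond with right-handed chirality.
pattern = TopsDiagram(sses="EE", hbonds={(0, 1): "P"}, chiralities={(0, 1): "R"})

found = list(matches(pattern, target))
```

In this toy case the only placement that satisfies both edge constraints maps the pattern strands to target elements 0 and 2. Extra elements between matched SSEs are allowed here; whether inserts should be permitted is a design choice of the sketch.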
However, because the representation is abstract, significant biological information can be lost. Incorporating
sequence information (in the form of PSSM/HMM profiles) and biochemical features such as ligand-binding sites and
active sites into the TOPS graph-based representation of protein structure will increase its biological
significance. The results should be valuable to efforts to predict protein structure and function from
sequence, problems that remain key challenges of direct relevance to projects in structural and functional
genomics.
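
One way to picture such an enriched representation is to attach a sequence segment and a set of feature labels to each SSE node, and to require both to agree before two nodes are allowed to match. The sketch below is an assumption-laden illustration, not the authors' implementation: `AnnotatedSSE`, `pssm_score` and `node_compatible` are hypothetical names, the PSSM is simplified to a list of per-position residue-to-score dictionaries of the same length as the segment, and the scores and threshold are made up.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSSE:
    """Hypothetical SSE node carrying sequence and biochemical annotations."""
    kind: str                            # 'E' (strand) or 'H' (helix)
    sequence: str = ""                   # residues of the element (one-letter codes)
    features: frozenset = frozenset()    # e.g. {"ligand-binding", "active-site"}


def pssm_score(pssm, segment):
    """Sum per-position profile scores for a segment.
    Simplifying assumption: profile and segment have equal length;
    returns None otherwise. Unlisted residues score 0.0."""
    if len(pssm) != len(segment):
        return None
    return sum(column.get(residue, 0.0) for column, residue in zip(pssm, segment))


def node_compatible(pattern_node, target_node, pssm=None, threshold=0.0):
    """A target node is accepted if the SSE type agrees, it carries every
    feature the pattern node requires, and (when a profile is supplied)
    its sequence scores at or above the threshold."""
    if pattern_node.kind != target_node.kind:
        return False
    if not pattern_node.features <= target_node.features:
        return False
    if pssm is not None:
        score = pssm_score(pssm, target_node.sequence)
        return score is not None and score >= threshold
    return True


# Made-up example data.
wanted = AnnotatedSSE("E", features=frozenset({"ligand-binding"}))
strand_a = AnnotatedSSE("E", "VIV", frozenset({"ligand-binding", "active-site"}))
strand_b = AnnotatedSSE("E", "VIV")  # same sequence, no annotated sites
toy_pssm = [{"V": 2.0}, {"I": 1.5}, {"V": 2.0, "L": 1.0}]

ok_a = node_compatible(wanted, strand_a, toy_pssm, threshold=4.0)
ok_b = node_compatible(wanted, strand_b, toy_pssm, threshold=4.0)
```

Here `strand_a` is accepted and `strand_b` is rejected for lacking the required ligand-binding annotation, even though both have the same topology and sequence. A purely topological match could not make that distinction.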
The TOPS database is accessible at
http://www.tops.leeds.ac.uk