Towards modulating protein-protein interactions: Clustering protein surfaces to identify biologically-relevant structural space to focus molecular design
Stephen Long1, Mark Smythe2, Peter Adams, Darryn Bryant and Tran Trung Tran
1sml@maths.uq.edu.au, School of Physical Sciences, Institute for Molecular Biosciences, The University of Queensland; 2M.Smythe@protagonist.com.au, Institute for Molecular Bioscience, The University of Queensland and Protagonist Pty. Ltd.
Identifying small molecules that modulate protein-protein interactions
continues to be a major challenge for drug discovery. This is
presumably a consequence of a different paradigm (large flat surfaces of
protein-protein interactions compared to cavities of existing
therapeutic targets), the immense size of the chemical universe
(10^60 drug-like molecules) and a lack of knowledge of small
molecules that modulate protein-protein interactions as a starting
point for drug discovery.
To focus molecular selection processes for the discovery of molecules
from the vast chemical universe that have the potential to modulate
protein-protein interactions, we have clustered protein contact surfaces
and identified common side chain positions of proteins involved in
molecular recognition events. Our thesis is based on the well-known
structure-function relation of medicinal chemistry. Consequently,
identifying molecules that match common protein side chain shapes should
significantly impact the discovery of molecules to modulate protein
functions. To achieve this, a database of homologous protein-protein complexes was
created. From this database, two datasets were extracted, each
representing the interaction region of pairs of proteins of these
complexes. The first was produced by extracting residues that form
contacts (satisfying a maximum distance criterion) across the protein
interface. The second dataset contains isolated regions of the Connolly
surface of each protein that likewise satisfy a maximum distance
criterion and are in "contact" with the opposing interacting protein.
This dataset also contains information about the electrostatic charge
of these interacting surfaces.
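As an illustration of the maximum distance criterion used for the first dataset, the following minimal sketch selects residues on each protein that have at least one atom within a cutoff of the opposing protein. The atom record layout, the function name contact_residues and the 4.5 angstrom default cutoff are assumptions made for this example; the abstract does not state the cutoff or data format actually used.

```python
from math import dist

# Hypothetical atom record: (residue_id, (x, y, z)), coordinates in angstroms.
def contact_residues(chain_a, chain_b, cutoff=4.5):
    """Return the residue ids of each chain that have any atom
    within `cutoff` of an atom in the other chain.

    The 4.5 default is an illustrative assumption, not a value
    taken from the abstract.
    """
    in_a, in_b = set(), set()
    for res_a, xyz_a in chain_a:
        for res_b, xyz_b in chain_b:
            if dist(xyz_a, xyz_b) <= cutoff:
                in_a.add(res_a)
                in_b.add(res_b)
    return in_a, in_b


if __name__ == "__main__":
    chain_a = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
    chain_b = [("B1", (3.0, 0.0, 0.0)), ("B2", (30.0, 0.0, 0.0))]
    print(contact_residues(chain_a, chain_b))
```

The all-pairs loop is quadratic in the number of atoms; for a large database of complexes a spatial grid or k-d tree would normally replace it, but the selection rule stays the same.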
The aim of this research is to cluster features (or motifs) of each of
these datasets, hence extracting general information about the structure
of these interacting surfaces. Clustering these datasets, however, is a
significant challenge. Their large size rules out existing
clustering packages, which are too computationally expensive. To
overcome the limitation of computation time, a quick and simple method
to structurally compare our motifs was implemented, and an algorithm that
efficiently scans the search space was developed and
parallelised over a cluster of processors.
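The abstract does not specify the comparison method or the search algorithm, so the following is only a sketch of one cheap approach of this general kind, not the authors' implementation. It assumes each motif is a small set of 3D points of equal size, compares motifs through their sorted inter-point distances (which need no superposition step), and groups them with a single-pass greedy "leader" clustering. The names signature, dissimilarity and leader_cluster, and the threshold parameter, are hypothetical.

```python
from itertools import combinations
from math import dist

def signature(points):
    """Sorted inter-point distances of a motif.

    Unchanged by rotation and translation, so two motifs can be
    compared without superimposing them.
    """
    return sorted(dist(p, q) for p, q in combinations(points, 2))

def dissimilarity(sig_a, sig_b):
    """Root-mean-square difference between two equal-length signatures."""
    if len(sig_a) != len(sig_b):
        raise ValueError("motifs must contain the same number of points")
    if not sig_a:
        return 0.0
    total = sum((a - b) ** 2 for a, b in zip(sig_a, sig_b))
    return (total / len(sig_a)) ** 0.5

def leader_cluster(motifs, threshold):
    """Assign each motif to the first cluster whose leader (its first
    member) lies within `threshold`; otherwise start a new cluster.
    Returns a list of clusters, each a list of motif indices.
    """
    leaders = []   # signature of each cluster's first member
    clusters = []  # motif indices per cluster
    for i, motif in enumerate(motifs):
        sig = signature(motif)
        for k, lead in enumerate(leaders):
            if dissimilarity(sig, lead) <= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(sig)
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    motifs = [
        [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
        [(5, 5, 5), (6, 5, 5), (5, 6, 5)],  # same shape, translated
        [(0, 0, 0), (4, 0, 0), (0, 4, 0)],  # larger triangle
    ]
    print(leader_cluster(motifs, threshold=0.1))
```

A distance signature of this kind cannot distinguish a motif from its mirror image and is not a complete shape descriptor, so it is best read as a fast pre-filter. Because each comparison is independent of the others, the inner loop over leaders is the natural place to distribute work across processors.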