Into protein universe - a global representation of the protein fold space

Jingtong Hou1, Gregory E. Sims2, Chao Zhang, Sung-Hou Kim
1JTHou@lbl.gov, UC Berkeley; 2gsims1997@yahoo.com, UC Berkeley

One of the principal goals of the structural genomics initiative is to identify the total repertoire of protein folds and obtain a global view of the "protein structure universe." Here, we present a 3D map of the protein fold space in which structurally related folds are represented by spatially adjacent points. Pairwise structural similarities of protein fold domains from SCOP release 1.55 were measured, and the resulting high-dimensional data were embedded into three dimensions using a metric matrix distance geometry method. This representation reveals a high-level organization of the fold space that is intuitively interpretable. The shape of the fold space and the overall distribution of the folds are defined by three dominant trends: secondary structure class, chain topology, and protein domain size. Random coil-like structures of small proteins and peptides are mapped to a region where the three trends converge, offering an interesting perspective on both the demography of fold space and the evolution of protein structures. In the web-based version of this 3D map, researchers can submit protein structure coordinates and examine their structural relationships with other fold structures in the "protein universe." The protein fold space is available at http://pro.lbl.gov/~jingtong/foldspace.
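The embedding step can be illustrated with a minimal sketch of a metric matrix (classical multidimensional scaling) embedding. This is not the authors' implementation: the abstract does not say how structural similarity scores were converted to distances, so the sketch assumes a symmetric pairwise distance matrix is already available, and the function names `embed_metric_matrix` and `pairwise_distances` are hypothetical.

```python
import numpy as np


def embed_metric_matrix(dist, dim=3):
    """Embed points in `dim` dimensions from a symmetric distance matrix.

    Assumption: `dist` holds pairwise distances. How similarity scores
    become distances is not specified in the abstract.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain the metric (Gram)
    # matrix of inner products about the centroid.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the `dim` largest.
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:dim]
    # Negative eigenvalues can occur for non-Euclidean input; clip to zero.
    top_vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(top_vals)


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy check on synthetic points, not on SCOP data.
rng = np.random.default_rng(0)
true_points = rng.normal(size=(10, 3))
dist = pairwise_distances(true_points)
coords = embed_metric_matrix(dist, dim=3)
```

When the input distances are exactly Euclidean in three dimensions, as in the toy check, the embedded coordinates reproduce the original distances up to rotation and reflection. For real structural dissimilarities the three leading eigenvectors give only an approximation, and the size of the discarded eigenvalues indicates how much is lost.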