Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration

Lan V. Zhang¹, Sharyl L. Wong², Oliver D. King, Frederick P. Roth
¹lan_zhang@student.hms.harvard.edu, Harvard Medical School; ²sharyl_wong@student.hms.harvard.edu, Harvard Medical School

Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to determine which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS have substantial false-positive rates. In addition, Y2H is more likely to detect transient interactions and is therefore less predictive of whether two proteins are present in the same complex. Here, using a probabilistic decision tree approach, we integrated high-throughput protein interaction data with other gene and protein pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS data alone or in combination. Among the top predictions not already annotated as CCPs in a reference set of protein complexes, a significant fraction were found to physically interact according to a separate database (Yeast Proteome Database, YPD), suggesting that our approach is promising for detecting previously unknown CCPs.
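The abstract does not specify how the probabilistic decision tree is built, so the following is only a minimal illustrative sketch of the general idea and not the authors' implementation. It assumes that each protein pair is described by binary features (the names `y2h`, `apms` and `coexpr` are hypothetical). It also assumes that splits are chosen by information gain, and that each leaf reports the fraction of training pairs in that leaf that are CCPs, which is then read as P(co-complexed). The training rows and labels below are invented toy values, not data from the study.

```python
from math import log2

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def build_tree(rows, labels, features, depth=0, max_depth=3, min_leaf=2):
    """Grow a tree on binary features; every leaf stores P(CCP) for its pairs."""
    prob = sum(labels) / len(labels)
    if depth == max_depth or len(set(labels)) == 1 or not features:
        return {"leaf": prob}
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in features:
        left = [y for x, y in zip(rows, labels) if not x[f]]
        right = [y for x, y in zip(rows, labels) if x[f]]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # skip splits that would create tiny leaves
        split_h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - split_h > best_gain:
            best, best_gain = f, base - split_h
    if best is None:
        return {"leaf": prob}
    rest = [f for f in features if f != best]
    children = {}
    for value in (False, True):
        sub = [(x, y) for x, y in zip(rows, labels) if bool(x[best]) == value]
        xs, ys = zip(*sub)
        children[value] = build_tree(list(xs), list(ys), rest, depth + 1, max_depth, min_leaf)
    return {"feature": best, "children": children}

def predict_proba(tree, row):
    """Follow the pair's feature values down to a leaf and return P(CCP)."""
    while "leaf" not in tree:
        tree = tree["children"][bool(row[tree["feature"]])]
    return tree["leaf"]

# Hypothetical toy training set: one dict of binary features per protein pair.
rows = [
    {"y2h": 1, "apms": 1, "coexpr": 1},
    {"y2h": 0, "apms": 1, "coexpr": 1},
    {"y2h": 0, "apms": 1, "coexpr": 0},
    {"y2h": 1, "apms": 0, "coexpr": 1},
    {"y2h": 1, "apms": 0, "coexpr": 0},
    {"y2h": 0, "apms": 0, "coexpr": 1},
    {"y2h": 0, "apms": 0, "coexpr": 0},
    {"y2h": 0, "apms": 0, "coexpr": 0},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = annotated CCP in the (toy) reference set

tree = build_tree(rows, labels, ["y2h", "apms", "coexpr"])
print(predict_proba(tree, {"y2h": 1, "apms": 0, "coexpr": 1}))
```

Because each leaf returns a probability rather than a hard class, candidate pairs can be ranked by their predicted P(CCP). Pairs at the top of that ranking correspond to the "top predictions" that the abstract checks against an independent source such as YPD.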