Predicting Co-Complexed Protein Pairs Using Genomic and Proteomic Data Integration
Lan V. Zhang1, Sharyl L. Wong2, Oliver D. King, Frederick P. Roth
1lan_zhang@student.hms.harvard.edu, Harvard Medical School; 2sharyl_wong@student.hms.harvard.edu, Harvard Medical School
Identifying all protein-protein interactions in an organism is a
major objective of proteomics. A related goal is to know which protein
pairs are present in the same protein complex. High-throughput methods
such as yeast two-hybrid (Y2H) and affinity purification coupled with
mass spectrometry (APMS) have been used to detect interacting proteins
on a genomic scale. However, both Y2H and APMS methods have substantial
false positive rates. In addition, Y2H is more likely to detect
transient interactions and is thus less predictive of whether two
proteins are present in the same protein complex. Here, using a
probabilistic decision tree approach, we integrated high-throughput
protein interaction data with other gene and protein pair
characteristics to predict co-complexed pairs (CCPs) of proteins. Our
predictions proved more sensitive and specific than those based on
Y2H or APMS data, alone or in combination. Among top predictions not
already annotated as CCPs in a reference set of protein complexes, a
significant fraction were reported as physically interacting in a separate
database (the Yeast Proteome Database, YPD), indicating that our approach is
promising for detecting unknown CCPs.
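
As a rough illustration of what a probabilistic decision tree over integrated pair-level evidence might look like, the Python sketch below grows a small tree on binary features and returns a smoothed probability at each leaf instead of a hard class label. It is not the authors' implementation: the feature names (y2h, apms, coexpr), the toy protein pairs and labels, the Gini splitting criterion, and the Laplace smoothing are all assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical toy data, invented for illustration only. Each protein pair has
# binary evidence features and a label (1 = co-complexed in a reference set).
PAIRS = [
    ({"y2h": 1, "apms": 1, "coexpr": 1}, 1),
    ({"y2h": 0, "apms": 1, "coexpr": 1}, 1),
    ({"y2h": 0, "apms": 1, "coexpr": 0}, 1),
    ({"y2h": 1, "apms": 1, "coexpr": 0}, 0),
    ({"y2h": 1, "apms": 0, "coexpr": 0}, 0),
    ({"y2h": 0, "apms": 0, "coexpr": 1}, 0),
    ({"y2h": 0, "apms": 0, "coexpr": 0}, 0),
    ({"y2h": 1, "apms": 0, "coexpr": 1}, 0),
]
FEATURES = ["y2h", "apms", "coexpr"]


def leaf_probability(labels):
    """Laplace-smoothed fraction of positive pairs at a leaf (assumed smoothing)."""
    return (sum(labels) + 1) / (len(labels) + 2)


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(rows, features, max_depth=2):
    """Recursively split on the binary feature with the lowest weighted Gini."""
    labels = [y for _, y in rows]
    if max_depth == 0 or not features or len(set(labels)) <= 1:
        return {"prob": leaf_probability(labels)}

    def split_cost(feature):
        left = [y for x, y in rows if x[feature] == 0]
        right = [y for x, y in rows if x[feature] == 1]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)

    best = min(features, key=split_cost)
    left_rows = [r for r in rows if r[0][best] == 0]
    right_rows = [r for r in rows if r[0][best] == 1]
    if not left_rows or not right_rows:
        # The chosen feature does not separate the rows; stop here.
        return {"prob": leaf_probability(labels)}

    remaining = [f for f in features if f != best]
    return {
        "feature": best,
        0: build_tree(left_rows, remaining, max_depth - 1),
        1: build_tree(right_rows, remaining, max_depth - 1),
    }


def predict_proba(tree, pair_features):
    """Walk from the root to a leaf and return its co-complex probability."""
    while "prob" not in tree:
        tree = tree[pair_features[tree["feature"]]]
    return tree["prob"]


TREE = build_tree(PAIRS, FEATURES)

if __name__ == "__main__":
    candidate = {"y2h": 0, "apms": 1, "coexpr": 1}
    print("root split:", TREE["feature"])
    print("P(co-complexed):", predict_proba(TREE, candidate))
```

Ranking candidate pairs by the returned probability, rather than by a yes/no call, is what would allow sensitivity and specificity to be traded off at different score thresholds and compared against single-source predictions.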