CAPRI: Critical Assessment of PRediction of Interactions

Community wide experiment on the comparative evaluation of protein-protein docking for structure prediction

Hosted By EMBL/EBI-MSD Group

Announcement

The First protein-protein structure interactions prediction Experiment

CAPRI
COMPARATIVE EVALUATION OF PROTEIN-PROTEIN DOCKING

1) INTRODUCTION

In June 2001, a meeting organised by Ilya Vakser and Sandor
Vajda was held in Charleston, SC. Several groups involved
in the development of protein-protein docking algorithms were
present. In addition, John Moult, who is involved with organising
the CASP blind evaluation of protein structure prediction,
was at the meeting. It was agreed

2.1) Organising committee

The organising committee consists of

Joel Janin
Mike Sternberg
Lynn Ten Eyck

who will consult with John Moult. The committee will aim to establish links with the experimental community to obtain targets.

2.2) Targets

The major problem facing the docking evaluation is the difficulty of obtaining targets, as far fewer complexes are solved than individual components. In addition, for the evaluation, the experimentalist may have to provide coordinates of the components (see below), which would effectively make public his coordinates prior to publication.

The following targets would be used, in decreasing order
of preference:

2.3 Evaluation Strategy

Unlike protein structure prediction, the season will be
an extended session, starting as soon as possible and ending
roughly 1 Sept 2002. An assessor will need to be selected
who will present the evaluation at a date to be announced.
The longer term nature of the prediction season raises two
issues.

2.4 Evaluation criteria

We need to agree the rules for evaluation. As the broad community of potential users is not familiar with RMS deviation of superposed coordinates, the number of correct protein-protein contacts will be used.

Actually, most docking work published so far is evaluated using RMS of superposed coordinates, and it makes structural sense. I don't know if it is a good idea to switch to residue contacts.

Our suggestion, for discussion, is that

A residue-residue contact occurs if any two atoms (one from each residue) are closer than 4 A.
A good prediction has at least 60% of the true residue-residue contacts in the model.
A helpful prediction has at least 25% of the true residue-residue contacts in the model.

In addition, as a measure (but not the evaluation criterion), the RMS of C alpha and all atoms at the interface will be reported. First the model of the larger component will be superposed using C alpha atoms on the bound larger component. The interface is defined as any residue in one component having at least one atom within 10 A of an atom in the other component. The RMS quoted is for the interface residues of the smaller component.

Given the present status of the algorithms, submission of
just one model will rarely provide a good prediction. We need
to discuss the scoring of several submitted models. One suggestion
is that each group submits 10 models ranked in order of preference.
The rank of the first good prediction and the rank of the first
helpful model are determined. The best score from the following
table is the group's result for their prediction.
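
The contact-based criteria proposed in 2.4 can be sketched in code, for discussion. This is only an illustration under stated assumptions: the data layout (a dict from residue identifier to a list of atom coordinates) and all function names are hypothetical, and a "helpful" rank is taken here to mean the first model reaching the 25% threshold, which would also count a good model. The rank-to-score table is not reproduced, so only the ranks are computed.

```python
from itertools import product

# Thresholds taken from the proposal in 2.4; everything else is illustrative.
CONTACT_CUTOFF = 4.0      # Angstroms
GOOD_FRACTION = 0.60      # at least 60% of true contacts
HELPFUL_FRACTION = 0.25   # at least 25% of true contacts


def residue_contacts(component_a, component_b, cutoff=CONTACT_CUTOFF):
    """Return the set of (residue_a, residue_b) pairs in contact.

    Each component is assumed to be a dict mapping a residue identifier
    to a list of (x, y, z) atom coordinates. Two residues are in contact
    if any atom of one is closer than `cutoff` to any atom of the other.
    """
    cutoff_sq = cutoff * cutoff
    contacts = set()
    for (res_a, atoms_a), (res_b, atoms_b) in product(
        component_a.items(), component_b.items()
    ):
        if any(
            sum((p - q) ** 2 for p, q in zip(atom_a, atom_b)) < cutoff_sq
            for atom_a in atoms_a
            for atom_b in atoms_b
        ):
            contacts.add((res_a, res_b))
    return contacts


def fraction_correct(true_contacts, model_contacts):
    """Fraction of the true residue-residue contacts found in the model."""
    if not true_contacts:
        return 0.0
    return len(true_contacts & model_contacts) / len(true_contacts)


def classify(fraction):
    """Label a model as 'good', 'helpful' or 'neither' from its fraction."""
    if fraction >= GOOD_FRACTION:
        return "good"
    if fraction >= HELPFUL_FRACTION:
        return "helpful"
    return "neither"


def first_ranks(fractions):
    """Return (rank of first good model, rank of first helpful model).

    `fractions` lists the fraction of correct contacts for each submitted
    model in the group's order of preference. Ranks are 1-based; None
    means no model reached the threshold. Assumption: a good model also
    counts as helpful, since it passes the 25% threshold.
    """
    first_good = next(
        (i for i, f in enumerate(fractions, 1) if f >= GOOD_FRACTION), None
    )
    first_helpful = next(
        (i for i, f in enumerate(fractions, 1) if f >= HELPFUL_FRACTION), None
    )
    return first_good, first_helpful
```

For a group submitting 10 ranked models, `first_ranks` would be called with the 10 fractions obtained by comparing each model's contacts with those of the experimental complex.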
The final score for all targets is the sum of scores.

The above distinction between a "good" model and a "helpful" model may be too abrupt. How about:
The final score = sum ( coefficient for the rank * percent correctly predicted residue contacts if it is larger than 20%)

2.5 Manual / Fully-automated Predictions

In keeping with CASP rules, manual intervention, including use of the literature, can be used in producing the entries. However, we would like to move towards evaluation of fully automated server predictions. At present such servers are not available. By 2 Jan 2002, we should decide if a server evaluation is practicable.

2.6 CASP Team

We require the CASP team at Lawrence Livermore to set up the appropriate e-mail / web software to handle target entries and submissions.

3) BENCHMARK

3.1) Committee

A set of people for the benchmark needs to be established.

3.2) Objectives

The aim is for all groups with docking algorithms to deposit their program(s) with Zhiping Weng at Boston University. She will run these algorithms on a benchmark set. The results of this docking will be used to generate decoy sets for further developments. We need a volunteer to run Zhiping's programs. Sandor Vajda, who will not take part in the initial generation, will generate the decoy set. An evaluator will be identified.

3.3) Algorithms

The algorithm must be implemented under Linux or on an SGI. The submission must be in the form of a script that can be run in the form

run myprogram protein1 protein2 outputfile number_of_output_complexes

The output is a set of pdb files for the complexes. We will need to work out exact format details.

For one protein-protein complex the algorithm must take no longer than 2 weeks on a 1GHz Linux box or ? on the ?MHz SGI (Zhiping to provide details). The algorithms need to be sent and implemented at Boston by 2 Jan 2002.

I thought we agreed that the CPU limit was 1 week on a Linux box?
I would say: The algorithm must take no longer than 1 week on a 1GHz 1GB-RAM Linux processor (CPU time). We find an R10000 SGI to be typically slower than a 1GHz Linux box. So I would say the algorithm must not take longer than 2 weeks on an R10000 SGI processor.

Each algorithm can only include information from the two sets of coordinates. No functional site prediction, multiple sequence alignment data or protein-type specific information can be used. For antibodies, we will cut it to the Fv region. The algorithms must produce a ranked list of model coordinates.

In addition, Zhiping will implement and apply to all entries a biological filter generated to reflect some knowledge of the binding site on one of the proteins. This mainly concerns antibodies.

My group can define the CDR regions using sequence information, if that is a good idea.

3.4) The targets

The targets will be a set of non-homologous complexes (or homologous complexes if the binding modes are totally different). Unbound coordinates for at least one of the starting proteins (with side chains) must be available. The docking of monomers into multimers will NOT be included. Joel Janin will vet the list for biological validity.

A list of unbound/unbound docking systems is available at the ICRF (www.bmm.icnet.uk). Other groups have their own test systems. However, alternative coordinates or homologues should be used to prevent bias in favour of any group's data set.

Or targets could be selected from different groups.

3.5) Evaluation

The fraction of correct contacts will be scored. I suggest the same table to score the results both after the initial docking and after the biological filter.

3.6) End of benchmark

We would aim to finish the benchmark and evaluation in time to discuss at in Dec 2002.

4) ACTION POINTS

i) Discuss this document

I would be happy to be on this

- evaluation of CAPRI

I would be happy to be on this. Actually we have generated a list with 46 targets.
I sent the list to Joel and he added a few more. I will forward the email to everyone in just a minute.

- group to run Zhiping's program