Consultative Document for Deposition of Structure Factors
20 November 1998

Summary

This document describes a policy for the deposition of experimental data from X-ray crystallographic experiments on macromolecules. It contains a substantial amount of detail.

The current PDB policy for the deposition of structure factors states:

It is very important that the structure factors are deposited. PDB will soon begin using these data for structure validation. You may choose to delay release of your structure factors or NMR restraints for up to four years from the date of publication. You must notify the PDB when your paper is published. If you wish the hold to be removed earlier, you must notify the PDB. PDB has chosen to follow the IUCr guidelines, which state that coordinates may be held (before release) no longer than one (1) year and structure factors no longer than four (4) years from the date of publication. PDB is applying the same guideline to NMR restraints data, allowing a maximum hold of four (4) years.

See also the Protein Data Bank quarterly newsletter (January 1998) for an article on structure factors and the PDB by Joel Sussman.

The initiative to archive structure factors from X-ray diffraction studies was the result of strenuous efforts by Joel Sussman of the PDB; see, for example, the Nature note: Baker, E. N., Blundell, T. L., Vijayan, M., Dodson, E., Dodson, G., Gilliland, G. L. & Sussman, J. L. (1996). Crystallographic Data Deposition. Nature 379, 202.

At present approximately 85% of X-ray entries are deposited with structure factors, and therefore the PDBe recommendation is:

The PDBe Group Policy Proposal is that for X-ray diffraction entries structure factor deposition should be mandatory. The PDBe recommends continuing the hold policy adopted by the wwPDB.

The issues are then as follows:
1. The format of deposited structure factors and 4. The format and information content of released structure factor files

The current PDBe policy for AutoDep states:

We encourage you to send your structure factors in ASCII format. If you are using CCP4, we suggest that you run the MTZ2various routine to convert the binary file into ASCII (see the CCP4 web page for mtz2various, which produces reflection files for MULTAN, SHELX, TNT, X-PLOR, CIF or other ASCII formats). For complete information on the CIF format that PDB is now using for structure factors, see the structure factor CIF dictionary on PDBe's web site (here the PDBe mirror URL is given). See also Protein Data Bank quarterly newsletter for
The current PDB distribution format for released structure factors is the result of an extensive cleanup of the legacy structure factor files held at the PDB. Before the PDB designed a standard, a large number of different formats had been deposited. This work was initially carried out largely by Dr. Jiansheng Jiang at the PDB. The format is an extension of mmCIF, with added information extracted from the entries. Further work is currently being carried out by the RCSB and the PDBe, much of it prompted by Gerard DVD Kleywegt, Alwyn Jones and Mark Harris at Uppsala University, where a service, the Uppsala Electron Density Server, processes all structure factors. It is hoped that most of the structure factor file problems found by this work will be corrected and the files re-released. These files are available from the PDB via
ftp ftp.ebi.ac.uk
cd pub/databases/rcsb/pdb-remediated/data/structures/all/structure_factors

The format adopted uses mmCIF data names for the structure factor values. The essential CELL and SYMMETRY information, together with other annotations, is presented as PDB-format records written as mmCIF comments. An example PDB file is available.

The PDBe Group Policy Proposal is that both deposition files and redistribution files be in mmCIF format throughout. mmCIF tokens should be used for the structure factor data items and for the associated annotation, which includes the cell dimensions, the space group symmetry and the relationship between the structure factor file and the coordinate file for an entry. Software currently in use for macromolecular crystallography is already capable of producing this format for deposition. For example, CNS has a macro that produces a deposition file containing the associated annotation and the structure factors, as in the example CNS output [this is the work of Paul Adams and Ralf Grosse-Kunstleve].

Note: the current X-PLOR and CNS structure factor files are not encouraged as deposition files, as they are in essence macros to be read by the programs' control scripts and do not contain the essential cell and symmetry information.

For CCP4, the mtz2various program will produce a file in CIF format, for example using the script:

mtz2various hklin 1718.mtz hklout 1718.cif <<eof
OUTPUT CIF data_1718
LABI FP=mut1718_F SIGFP=mut1718_SIGF
END
eof

which gives this output.

The deposition of binary CCP4 MTZ files has been raised before. There are problems in handling deposited binary MTZ files; these are:
The PDBe Group Policy Proposal is therefore NOT to support the deposition of binary MTZ files with the current CCP4 implementation. It advocates that it is reasonable to request files in the well-defined mmCIF format, for which conversion software already exists. These methods are simple to use and give clearly defined, automatically parsable information.

2. The information content of the deposited file(s)

The PDBe Group Policy Proposal is to request that the minimum information content be h, k, l, Fobs and sigma(Fobs). Ideally, enough additional information would be supplied to regenerate the final electron density map and to reproduce the final refinement. The PDBe accepts that the nature and extent of the additional data items should be decided by the crystallographers who produce the data. A complete list of the currently defined mmCIF structure factor data names is given here, and links to the full definitions provided by the NDB mmCIF web documents are given below; other links may be found from any PDB mirror site by looking at the file mmcif.html (the PDBe mirror URL is given here). An uploaded file containing any of these data tags and their values will be processed automatically by both the RCSB and PDBe deposition services now under development. The PDB's current AutoDep 2.1 service would also be able to handle this type of information, as the structure factor files are processed by annotators.

3. The use of structure factors in validation at the time of deposition

Currently, structure factors are not used in the PDB's AutoDep deposition procedure. However, work is under way to use them with Alwyn Jones' density server software. The PDBe Group Policy Proposal is to use structure factors at the point of deposition only to check that the coordinates match the structure factors: that the cell dimensions and space group symmetry agree, and that a standard R-factor calculation gives a value comparable with the deposited value.
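The R-factor comparison proposed above can be sketched as follows. This is a minimal illustration, not the procedure used by any deposition centre: it assumes the calculated amplitudes Fcalc have already been computed from the deposited coordinates (with the deposited cell and space group) by a crystallographic program, and are keyed by (h, k, l) like the observed amplitudes. The function names `r_factor` and `matches_deposited`, the single least-squares scale factor and the 0.02 tolerance are all illustrative assumptions.

```python
from typing import Dict, Tuple

HKL = Tuple[int, int, int]  # Miller indices (h, k, l)


def r_factor(f_obs: Dict[HKL, float], f_calc: Dict[HKL, float]) -> float:
    """Conventional R = sum|Fobs - k*Fcalc| / sum Fobs.

    Only reflections present in both dictionaries are used. The single
    scale factor k is the least-squares value sum(Fobs*Fcalc) / sum(Fcalc^2);
    anisotropic scaling and bulk-solvent corrections are deliberately omitted.
    """
    common = set(f_obs) & set(f_calc)
    if not common:
        raise ValueError("no reflections in common")
    denom = sum(f_calc[hkl] ** 2 for hkl in common)
    if denom == 0.0:
        raise ValueError("all calculated amplitudes are zero")
    k = sum(f_obs[hkl] * f_calc[hkl] for hkl in common) / denom
    residual = sum(abs(f_obs[hkl] - k * f_calc[hkl]) for hkl in common)
    return residual / sum(f_obs[hkl] for hkl in common)


def matches_deposited(r_recomputed: float, r_deposited: float,
                      tolerance: float = 0.02) -> bool:
    """Hypothetical acceptance rule: the recomputed R must lie within
    `tolerance` of the value quoted by the depositor."""
    return abs(r_recomputed - r_deposited) <= tolerance
```

For example, `matches_deposited(r_factor(f_obs, f_calc), 0.198)` would compare a recomputed R with a deposited value of 0.198. A real check would also have to use the depositor's resolution limits and reflection selection (for example, excluding test-set reflections), which this sketch ignores.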
At some stage, validation at deposition may be extended to make further use of structure factors, giving the depositor an opportunity to comment on a possible gross difference between the density correlation and expected values. In all cases, however, we are keenly aware that a structure determination was carried out to solve a particular problem. It can be argued that no structure is ever finished and that one can always tinker with the refinement. The deposited coordinates are a model from a particular experiment, produced for a particular set of reasons. Validation has two aims: first, to point out at deposition time any extreme geometrical deviations that may be corrected; and second, to give confidence levels at the global, per-chain, per-residue and per-atom level that can be used in selective search methods and in evaluating the properties of a hit list. Structure factor validation results and density correlation factors can be held in the relational database; they are not held in the PDB-format flat file.

Note: all PDBe validation procedures will be those recommended by research initiatives such as the CRITQUAL initiative [CRITQUAL, an EU-supported network, CT96-0189: Coordinator Wilson (York), Jones (Uppsala), Kaptein (Utrecht), Lamzin (EMBL-HH), Thornton (London), Vriend (EMBL-HD), Wodak (Brussels)]; this includes any decision to use tools such as the SFCHECK procedure. See, for example, the Uppsala Electron Density Server and its use of SFCHECK.

5. The use of structure factors to derive 'confidence levels' within a relational database for use as search criteria and analysis of data

The PDBe Group Policy Proposal is to make available the SQL for its relational database and all application software used to derive the information held in the database tables. The PDBe will treat derived structure factor information in much the same way that, for example, B-values can be used as a measure of model quality.
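As an illustration of the kind of per-residue 'confidence level' that could be derived from structure factors and stored for searching, the sketch below computes a real-space density correlation: the Pearson correlation coefficient between observed and model-calculated electron density values sampled at the same grid points around one residue. It assumes both maps have already been calculated and sampled, which is the substantial part of the work and is not shown. The function names and the 0.8 cut-off in `flag_residues` are hypothetical choices for illustration, not values from this proposal or from the Uppsala Electron Density Server.

```python
import math
from typing import Dict, List, Sequence, Tuple


def density_correlation(rho_obs: Sequence[float],
                        rho_calc: Sequence[float]) -> float:
    """Pearson correlation between observed and calculated density values
    sampled at the same grid points (e.g. the points covering one residue)."""
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    n = len(rho_obs)
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0.0 or var_c == 0.0:
        raise ValueError("flat density: correlation is undefined")
    return cov / math.sqrt(var_o * var_c)


def flag_residues(samples: Dict[str, Tuple[List[float], List[float]]],
                  threshold: float = 0.8) -> List[str]:
    """Return the residue labels whose correlation falls below `threshold`
    (a hypothetical cut-off chosen only for illustration)."""
    return [residue for residue, (obs, calc) in samples.items()
            if density_correlation(obs, calc) < threshold]
```

A value near 1 indicates that the model density tracks the observed density closely. Per-residue values of this kind could be stored in database tables alongside B-values and used as optional search criteria, in the spirit of section 5.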
Search methods will be available that can use, for example, density correlation values as optional selection criteria.

6. The method for the deposition of multiple sets of structure factors, e.g. MAD or MIR data sets related to the coordinates held for an entry

The deposition of derivative and MAD data sets would be welcome at the deposition centres. However, the data have to be labelled in such a way as to be useful. Simply uploading to the deposition tool a number of files created by the current software converters (e.g. MTZ2VARIOUS) would not give the deposition archive centres sufficient information to relate the various data sets automatically and correctly to the coordinates and experimental method(s) given in the annotated PDB entry. The data in the different files need to be tagged with the correct relationships. To solve this, the PDBe harvest concept is now being pursued by all the deposition centres, to encourage software developers to allow common labels for a project to be incorporated.

The mmCIF structure does not allow multiple data sets within the same data_ block (multiple data_ blocks are allowed within the same file). The CCP4 convention for handling multiple data sets, which allows several F_obs values to be related to the same h, k, l columns, has no equivalent in either mmCIF or CIF rules; that is, one cannot present data in the form:

data_my_entry
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_[native]
_refln.F_meas_sigma_[native]
_refln.F_meas_[Derivative_Pb]
_refln.F_meas_sigma_[Derivative_Pb]
_refln.F_meas_[Derivative_Hg]
_refln.F_meas_sigma_[Derivative_Hg]

Instead, one would be required to deposit in mmCIF as:
data_my_entry_native
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas
_refln.F_meas_sigma
data_my_entry_Derivative_Pb
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas
_refln.F_meas_sigma
data_my_entry_Derivative_Hg
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas
_refln.F_meas_sigma
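To make the multi-block layout concrete, the sketch below splits such a file into its data_ blocks and returns, for each block, its single-valued items (for example cell dimensions) and the rows of its _refln loop, keyed by data name. It is a deliberately minimal reader under stated assumptions: whitespace-separated values, no quoted strings or semicolon-delimited text fields, and at most one loop_ per block; '#' comments are discarded. The function name `read_cif_blocks` is hypothetical; real files should be read with a full CIF parser.

```python
from typing import Dict, List, Optional, Tuple

# (single-valued items, rows of the loop_) for one data_ block
Block = Tuple[Dict[str, str], List[Dict[str, str]]]


def read_cif_blocks(text: str) -> Dict[str, Block]:
    """Split a simple mmCIF text into its data_ blocks (values kept as strings)."""
    # Drop comments: everything from '#' to the end of a line.
    text = "\n".join(line.split("#", 1)[0] for line in text.splitlines())
    blocks: Dict[str, Block] = {}
    name: Optional[str] = None     # current data block name
    items: Dict[str, str] = {}     # e.g. _cell.length_a -> value
    cols: List[str] = []           # loop_ column names
    vals: List[str] = []           # flat list of loop_ values
    pending: Optional[str] = None  # item name still waiting for its value
    in_loop = False

    def close_block() -> None:
        # Store the finished block, grouping the loop values into rows.
        if name is None:
            return
        if cols and len(vals) % len(cols):
            raise ValueError(f"data_{name}: loop values do not fill whole rows")
        rows = ([dict(zip(cols, vals[i:i + len(cols)]))
                 for i in range(0, len(vals), len(cols))] if cols else [])
        blocks[name] = (items, rows)

    for tok in text.split():
        if tok.startswith("data_"):
            close_block()
            name, items, cols, vals = tok[len("data_"):], {}, [], []
            pending, in_loop = None, False
        elif tok == "loop_":
            in_loop = True
        elif tok.startswith("_"):
            if in_loop and not vals:
                cols.append(tok)   # still reading the loop header
            else:
                in_loop = False    # a new item name ends the loop
                pending = tok
        elif pending is not None:
            items[pending] = tok
            pending = None
        elif in_loop:
            vals.append(tok)
        else:
            raise ValueError(f"value {tok!r} outside any item or loop")
    close_block()
    return blocks
```

Applied to the three-block layout above, the result would have the keys my_entry_native, my_entry_Derivative_Pb and my_entry_Derivative_Hg, each with its own list of rows, so the relationship between data sets is carried by the block names rather than by extra columns.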
Alternatively, each derivative may be deposited using items from the mmCIF _phasing_mir category group, for example:

loop_
_phasing_mir_refln.index_h
_phasing_mir_refln.index_k
_phasing_mir_refln.index_l
_phasing_mir_refln.der_id
_phasing_mir_refln.F_meas_au
_phasing_mir_refln.F_calc_au
_phasing_mir_refln.phase_calc
_phasing_mir_refln.F_meas_sigma_au
0 0 4 HgCl4 197.8 1351.0 -180.0 1.9
0 0 4 AgNO3_1 206.7 1462.0 -180.0 12.3
0 0 4 AgNO3_2 367.3 1551.0 -180.0 36.7

Complete examples are given in the PDB documentation (see above). Within each data_ block there will be the correct cell dimensions and symmetry for that data set (and other unique properties such as the wavelength).

CCP4 are developing a new MTZLIB with an extended header that connects each column to a project_name (the in-house equivalent of a PDB ID code) and a data_set_name (the in-house unique identifier for each data set associated with the project_name). The extended header will also carry the cell dimensions and symmetry for each data set. Once this MTZLIB version is in use by research groups, the automatic deposition of multiple data sets should become simple.

7. Adapting to future changes in refinement protocols where multiple data sets are refined together

The methods used to determine macromolecular structures are continually improving and changing. Future depositions may involve more joint refinement methods, for example electron microscopy used together with X-ray diffraction. Another example is the development of software that refines multiple sets of structure factors and coordinates in the same run. For, say, a mutant and native pair of structures, this can give a refinement that reinforces the common features and accentuates the differences. In such cases, the idea of a single PDB entry as one set of coordinates plus the one set of structure factors against which they were refined no longer maps easily onto the experiment.
The mmCIF structure is also not currently capable of holding more than one set of information per data block by using the same category twice. The PDBe development database will be flexible and can be extended to map this type of future deposition. Even in the short term, other innovations in refinement may well require changes to the database and to the export format(s); the flexible design of the PDBe relational database is intended to anticipate such developments.

If you have any comments about this draft, please contact the PDBe Group at pdbhelp@ebi.ac.uk.