Model Data Blocks and Diffraction Data Blocks

Dale Tronrud (DALE@nickel.uoregon.edu)
Mon, 16 Oct 1995 23:40:50 -0700 (PDT)
Previous message: Peter Keller: "Globally-defined categories?"
	   I want to discuss some ideas I have about the separation
	of the mmCIF data into a diffraction data block and a model
	data block.  I know the committee will not want to hear such
	basic matters being questioned at this time.  All I want
	to accomplish is to make the community aware of the problems.

	   As I understand the situation, the mmCIF committee and the
	PDB have decided to allow the diffraction data to be stored
	in one file and models to be stored in another.  I think it
	is very reasonable to divide the data in this fashion because
	of the different natures of the two kinds of data.  However,
	whenever you split data you have to examine each data group
	and decide which direction it must go.  I disagree with the
	details of this split as it appears to be implemented.

	   First, we must recognize that the mapping between these two
	data blocks is many to many.  For a given diffraction data set
	there will be many models.  For a given model there may be
	several data sets upon which it is based (e.g., an X-ray data
	block and a neutron data block, or X-ray and NMR data).  As I
	understand the current division for these files, the diffraction
	file contains the ID of the model data block but the model data
	block does not contain a pointer to the diffraction data block.

	   It is difficult to place a table of models in the diffraction
	data block because this file could be constructed prior to the
	solution of the structure (and the structure might never be solved
	leaving an empty list).  In my work I generate many models which
	need to be passed between program packages and would in a perfect
	world be in mmCIF.  The table in the diffraction mmCIF file would
	have to be updated almost daily.  I suggest that the list of
	models which depend on a diffraction data block be optional.  In
	an archive where both model and diffraction data blocks are stored
	you could put in a table complete within that restricted universe
	of models.  In the lab you would not maintain this table.
	
	   However, it would be easy to have the diffraction data blocks
	listed in the model data block because the programs need to know
	that information anyway.  This field should be mandatory for
	any model based on diffraction data.  However, because there may
	be multiple diffraction data blocks, the definition of this
	dependency must be in a loop construction.
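	A minimal Python sketch of this arrangement, assuming each model
	data block is represented as a dictionary carrying its loop of
	diffraction data block IDs (all block names here are hypothetical).
	The model-to-data links are stored once, in the models, and an
	archive can derive the optional diffraction-to-model table for its
	own closed set of models whenever it wants one.

```python
from collections import defaultdict

# Hypothetical model data blocks.  Each carries a mandatory loop listing
# the diffraction data blocks it was refined against.
model_blocks = {
    "model_A": {"diffraction_blocks": ["xray_1"]},
    "model_B": {"diffraction_blocks": ["xray_1", "neutron_1"]},
    "model_C": {"diffraction_blocks": ["xray_2"]},
}


def build_reverse_index(models):
    """Derive the optional diffraction -> models table.

    The result is complete only for the models passed in, i.e. for the
    restricted universe of one archive; a lab would not store it at all.
    """
    index = defaultdict(list)
    for model_id, block in sorted(models.items()):
        for diffraction_id in block["diffraction_blocks"]:
            index[diffraction_id].append(model_id)
    return dict(index)


print(build_reverse_index(model_blocks))
# {'xray_1': ['model_A', 'model_B'], 'neutron_1': ['model_B'], 'xray_2': ['model_C']}
```

	Adding a new model touches only its own block; the diffraction
	data block never has to be rewritten.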

	   Because many models will be based upon any particular diffraction
	file, the calculated F's cannot be stored with the observed F's.
	While I recognize some (but not much) utility in storing the
	calculated F's, if you are going to have them they must be in
	the model's data block -- they are a property of the model alone.
	The many-to-many relationship between model and diffraction data
	cannot be represented when the Fc's are in the diffraction data
	file.
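	As an illustration only, here is a small Python data model in which
	the calculated F's live in the model block, keyed first by the
	diffraction block they were computed against.  The class names and
	the numbers are made up; the point is that two models can each
	carry Fc for the same data set without the diffraction block ever
	changing.

```python
from dataclasses import dataclass, field


@dataclass
class DiffractionBlock:
    """Observed data only: nothing stored here depends on any model."""
    block_id: str
    f_obs: dict  # (h, k, l) -> observed amplitude


@dataclass
class ModelBlock:
    """A model carries its own calculated F's, one table per data set."""
    model_id: str
    diffraction_blocks: list = field(default_factory=list)
    f_calc: dict = field(default_factory=dict)  # block ID -> {(h, k, l): Fc}

    def add_f_calc(self, diffraction_id, values):
        # Record the dependency and the Fc table together, in the model.
        if diffraction_id not in self.diffraction_blocks:
            self.diffraction_blocks.append(diffraction_id)
        self.f_calc[diffraction_id] = dict(values)


# Made-up amplitudes, for illustration only.
xray_1 = DiffractionBlock("xray_1", {(1, 0, 0): 120.0, (0, 1, 0): 85.0})
model_a = ModelBlock("model_A")
model_b = ModelBlock("model_B")
model_a.add_f_calc("xray_1", {(1, 0, 0): 118.2, (0, 1, 0): 90.1})
model_b.add_f_calc("xray_1", {(1, 0, 0): 121.5, (0, 1, 0): 83.7})
```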

	   The second problem is deciding which data groups go into each
	data block.

	   Currently the data collection, data reduction, and agreement
	statistics are stored in the model file.  These data belong in
	the diffraction file.  They do not change when the model changes
	and it would be redundant to write the same values over and over
	again.  You also would have to place it all in loops to cover each
	diffraction data set.  With this information in the diffraction
	data block life becomes much simpler.  You do not have to place
	your statistics inside loops, nor do you have to deal with the
	confusion of listing diffraction intensities from multiple crystals
	in a single loop.

	   For structures currently in the PDB without deposited structure
	factors one could construct small diffraction data blocks to
	contain the statistics but without structure factors.  These
	would be like the current PDB files which contain no coordinates.
	However they would contain the proper cross links (hyperlinks?)
	and data dependencies.  Their presence would make clear the huge
	gaps in deposition of these data and might encourage some people
	to deposit older diffraction data sets.

	   The final point I would like to make is the most outlandish,
	but does come as a natural progression of these thoughts.  The
	cell constants are not a property of the model and should not
	be stored in the model's data block.  If you have a model which
	was refined against two diffraction patterns, in all likelihood
	you will have two different sets of cell constants.  In the
	models already deposited the sets will be very similar but there
	are cases where people have refined models with constrained NCS
	(noncrystallographic symmetry) between nonisomorphic crystal forms.
	In such a case each diffraction pattern would have not only
	unrelated cell constants but different space groups.  The cell
	constants and space group belong in the diffraction data block.

	   Placing the cell constants in the diffraction data block
	immediately solves another problem.  What are the cell constants
	of an NMR structure?  The concept does not apply.  If you
	place the cell constants in the diffraction data block you can
	make their presence mandatory and not affect NMR or theoretical
	models' validation.  Currently the provision of the cell constants
	cannot be mandatory.
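	As a rough illustration, a validator under this scheme might look
	like the following Python sketch.  It assumes blocks are
	dictionaries keyed by mmCIF-style item names; the
	`diffraction_blocks` key standing in for the model's loop of
	references is hypothetical, and the cell and space group values are
	invented.  The cell check applies only to diffraction blocks, so an
	NMR model passes without any cell at all.

```python
CELL_ITEMS = (
    "_cell.length_a", "_cell.length_b", "_cell.length_c",
    "_cell.angle_alpha", "_cell.angle_beta", "_cell.angle_gamma",
    "_symmetry.space_group_name_H-M",
)
DIFFRACTION_METHODS = {"X-RAY DIFFRACTION", "NEUTRON DIFFRACTION"}


def validate_diffraction_block(block):
    """Cell constants and space group are mandatory in every diffraction block."""
    return [f"missing {item}" for item in CELL_ITEMS if item not in block]


def validate_model_block(block, known_diffraction_ids):
    """Models need no cell; diffraction-based models must cite their data."""
    errors = []
    refs = block.get("diffraction_blocks", [])  # hypothetical reference loop
    if block.get("_exptl.method") in DIFFRACTION_METHODS and not refs:
        errors.append("diffraction-based model lists no diffraction blocks")
    for ref in refs:
        if ref not in known_diffraction_ids:
            errors.append(f"unknown diffraction block {ref!r}")
    return errors


# Invented example blocks.
xray_1 = {
    "_cell.length_a": 50.0, "_cell.length_b": 60.0, "_cell.length_c": 70.0,
    "_cell.angle_alpha": 90.0, "_cell.angle_beta": 90.0,
    "_cell.angle_gamma": 90.0, "_symmetry.space_group_name_H-M": "P 21 21 21",
}
nmr_model = {"_exptl.method": "SOLUTION NMR"}
xray_model = {"_exptl.method": "X-RAY DIFFRACTION",
              "diffraction_blocks": ["xray_1"]}

print(validate_diffraction_block(xray_1))            # []
print(validate_model_block(nmr_model, {"xray_1"}))   # []
print(validate_model_block(xray_model, {"xray_1"}))  # []
```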

	   Related to the cell constants is the deorthogonalization 
	matrix.  In fact the deorthogonalization matrix is a composite
	of two things, the cell constants and the convention.  The cell
	constants are a function of the diffraction data block (which
	indicates that there cannot be a single deorthogonalization matrix
	because there may be more than one crystal type).  This implies that
	the deorthogonalization matrix should be in the diffraction data
	block.  However it is possible that differing conventions might be
	used in different models implying that the orthogonalization
	convention should be in the model data block.  Since mmCIF seems to
	want the matrix and not its convention you must have a loop
	construction in the model data block which identifies each
	diffraction data block and the deorthogonalization convention used
	to move the model into that crystal's coordinate system.
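	To make the "cell constants plus convention" point concrete: under
	one commonly used convention (a along x, c* along z), the
	orthogonalization matrix O, which takes fractional to Cartesian
	coordinates, is built from the six cell constants alone, and the
	deorthogonalization matrix is its inverse.  A different convention
	rotates the Cartesian frame and so gives a different matrix from the
	same cell.

```latex
\mathbf{O} =
\begin{pmatrix}
a & b\cos\gamma & c\cos\beta \\
0 & b\sin\gamma & c\,\dfrac{\cos\alpha - \cos\beta\cos\gamma}{\sin\gamma} \\
0 & 0 & \dfrac{V}{ab\sin\gamma}
\end{pmatrix},
\qquad
V = abc\sqrt{1 - \cos^2\alpha - \cos^2\beta - \cos^2\gamma
             + 2\cos\alpha\cos\beta\cos\gamma}

\mathbf{x}_{\mathrm{frac}} = \mathbf{O}^{-1}\,\mathbf{x}_{\mathrm{Cart}}
```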

	   It would be cleaner to simply list the convention and not the
	matrix but I don't know of a good way to do this in general.
	Currently mmCIF has the cell constants, the convention, and the
	matrix listed (or listable).  This information is redundant and
	should be consistent with itself.  Without a standard form to
	describe the convention I would not like to be assigned the job
	of writing the validation software.
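	For the one convention where the form is well known, the check
	itself is short.  The Python sketch below assumes the a-along-x,
	c*-along-z convention, uses a made-up label string for it (there is
	no standard vocabulary to draw on, which is exactly the problem), and
	refuses any other convention rather than guessing.

```python
import math

CONVENTION_1 = "a||x, c*||z"  # hypothetical label, not an mmCIF value


def deorthogonalization_matrix(a, b, c, alpha, beta, gamma,
                               convention=CONVENTION_1):
    """Cartesian -> fractional matrix from cell constants (angles in degrees)."""
    if convention != CONVENTION_1:
        raise ValueError(f"unsupported convention: {convention}")
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    v = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    # Upper-triangular orthogonalization matrix O (fractional -> Cartesian).
    o11, o12, o13 = a, b * cg, c * cb
    o22, o23 = b * sg, c * (ca - cb * cg) / sg
    o33 = v / (a * b * sg)
    # Closed-form inverse of an upper-triangular 3x3 matrix.
    return [
        [1 / o11, -o12 / (o11 * o22), (o12 * o23 - o13 * o22) / (o11 * o22 * o33)],
        [0.0, 1 / o22, -o23 / (o22 * o33)],
        [0.0, 0.0, 1 / o33],
    ]


def matrix_is_consistent(listed, cell, convention=CONVENTION_1, tol=1e-4):
    """True if a listed matrix agrees with the one implied by cell + convention."""
    expected = deorthogonalization_matrix(*cell, convention=convention)
    return all(abs(listed[i][j] - expected[i][j]) <= tol
               for i in range(3) for j in range(3))


listed = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
print(matrix_is_consistent(listed, (10.0, 10.0, 10.0, 90.0, 90.0, 90.0)))  # True
print(matrix_is_consistent(listed, (10.0, 10.0, 12.0, 90.0, 90.0, 90.0)))  # False
```

	Every additional convention would need its own branch here, which is
	why a standard way of naming conventions matters more than the
	arithmetic.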

	   If there is interest in this approach I could put more time 
	into filling in the details.


							Dale Tronrud