Re: mmCIF: A few little things, and one big problem.
Paula Fitzgerald (paula_fitzgerald@Merck.Com)
Thu, 10 Aug 95 16:58:51 EDT
Peter Keller wrote:
> Hi Paula,
>
> I thought that I had just about finished with playing around with the
> dictionary, but in tightening up my cifdic->symbol table code, I have
> opened up a whole new can of worms.
>
> Three small points:
>
> line 1795: '_struct_sheet_gen.label_seq_id' should be
> '_struct_site_gen.label_seq_id'
>
> lines 13200 and 13231: item _phasing_MIR.entry_id is repeated on the
> following lines.
>
> lines 21374 and 21391: '_refln_A_meas' is defined as an alias for both
> _refln.A_meas and _refln.A_meas_au. By analogy with _refln.B_meas*, the
> latter alias should be deleted.
These are all fixed.
> Now for the big point (and this is related to my recent mailing to mmddl
> news, and John's reply). I agree very strongly with the convention you
> have adopted, of providing a save frame definition for every single item.
> I see this as equivalent to compulsory declarations of identifiers in a
> programming language such as C, and without it, it would be impossible to
> find spelling errors such as the one above. BUT, this does create the
> possibility of defining characteristics of a child data item in two
> places, and this has happened for about 260 data items. Take
> _phasing_mad_clust.expt_id as an example. Its declaration is:
>
> save__phasing_mad_clust.expt_id
> _item_description.description
> ; This data item is a pointer to _phasing_mad_expt.id in the
> PHASING_MAD_EXPT category.
> ;
> _item.name '_phasing_mad_clust.expt_id'
> _item.category_id phasing_mad_clust
> _item.mandatory_code yes
> _item_type.code char
> save_
>
> However, _item.category_id and _item.mandatory_code are also defined for
> this item in the save frame of the parent item:
>
> save__phasing_mad_expt.id
> .....
> loop_
> _item.name
> _item.category_id
> _item.mandatory_code
> '_phasing_mad_expt.id' phasing_mad_expt yes
> '_phasing_mad_clust.expt_id' phasing_mad_clust yes
> ......
> ......
> _item_type.code char
> save_
>
> As far as I can see, the multiply-defined items don't conflict, but there
> is nothing to prevent them from doing so, and in the long run, it will
> make the dictionary harder to maintain. As I understand the DDL (and
> John's reply to my comments about _item.mandatory_code on mmddlnews), all
> the characteristics of a child item should be defined in the save frame of
> the parent item, and the placeholder save frame should only contain
> _item_description.description and _item.name. Put another way, it should
> be possible to remove the save frame declaration of the child from the
> dictionary entirely, without losing any information.
>
> [He wrote: It was agreed that in order to provide an easier integration
> with older dictionaries that there be a placeholder definition for every
> item in the mmCIF dictionary. This really results in a large number of
> essentially redundant definitions for data items that are children of
> other items. In these cases only the definition of the data item and
> perhaps the item name have been specified in the mmcif dictionary.]
>
> To conform to this view, the two save frames above would need to be
> changed to:
>
> save__phasing_mad_clust.expt_id
> _item_description.description
> ; This data item is a pointer to _phasing_mad_expt.id in the
> PHASING_MAD_EXPT category.
> ;
> _item.name '_phasing_mad_clust.expt_id'
> save_
>
>
> and
>
> save__phasing_mad_expt.id
> .....
> loop_
> _item.name
> _item.category_id
> _item.mandatory_code
> '_phasing_mad_expt.id' phasing_mad_expt yes
> '_phasing_mad_clust.expt_id' phasing_mad_clust yes
> ......
> ......
> loop_
> _item_type.name
> _item_type.code
> '_phasing_mad_expt.id' char
> '_phasing_mad_clust.expt_id' char
> ......
> save_
>
>
> Note that I have moved _item_type.code out of the child's save frame, and
> into the parent's. Even aliases should be taken out of the placeholder
> save frames, and put in a
> loop_ _item_aliases.name _item_aliases.alias_name construct.
>
> I don't know what tools you are using, but if the thought of doing all
> this is too much for you, I'd be quite happy to help (I could adapt my own
> code to do a lot of it quite easily). As a priority, the multiply defined
> items should be removed (virtually all of them are for
> _item.mandatory_code), and then I could think about moving the others.
> Using the data model which I have put together, I don't believe that it
> would be hard for me to do.
>
> Please let me know what you think.
> Regards,
> Peter.
>
>
> ========================================================================
> Peter Keller. \
> Dept. of Biology and \ "Not even the greatest nonsense is beyond
> Biochemistry, \ the reach of human invention."
> University of Bath, \
> Bath, BA2 7AY, UK. \ --- Ryszard Kapuscinski
> ------------------------------\-----------------------------------------
> Tel. (+44/0)1225 826826 x 4302 | Email: P.A.Keller@bath.ac.uk (Internet)
> Fax. (+44/0)1225 826449 | P.A.Keller%bath.ac.uk@UKACRL (BITNET)
> ========================================================================
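Peter's distinction between harmless duplication and a real conflict can be made concrete with a small check. What follows is a minimal Python sketch, not Peter's or Paula's actual tooling. It assumes a hypothetical in-memory model in which each save frame maps a DDL2 attribute (other than _item.name) to the item names it describes. It also assumes that the parent frame's unlooped _item_type.code applies only to _phasing_mad_expt.id. For a single child item, the check reports which attributes are redundant, which conflict, and which would be lost if the child's save frame were deleted.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model of a DDL2 save frame: each attribute other
# than _item.name maps the item names it describes to their values. A looped
# parent frame simply lists several item names under one attribute.
Frame = Dict[str, Dict[str, str]]

# Under the strict reading, the placeholder frame keeps only these
# (plus _item.name, which this model treats as implicit).
PLACEHOLDER_ONLY = {"_item_description.description"}


def audit_child(child_name: str, child: Frame,
                parent: Frame) -> Tuple[List[str], List[str], List[str]]:
    """Classify every attribute set in the child's own save frame.

    redundant:   also set for child_name in the parent frame, same value
    conflicting: also set in the parent frame, but with a different value
    to_move:     set only in the child frame, so it would be lost if the
                 child frame were deleted; it belongs in the parent frame
    """
    redundant, conflicting, to_move = [], [], []
    for attr, values in child.items():
        if attr in PLACEHOLDER_ONLY or child_name not in values:
            continue
        in_parent = parent.get(attr, {})
        if child_name not in in_parent:
            to_move.append(attr)
        elif in_parent[child_name] == values[child_name]:
            redundant.append(attr)
        else:
            conflicting.append(attr)
    return sorted(redundant), sorted(conflicting), sorted(to_move)


# The _phasing_mad_clust.expt_id example from the message above.
CHILD = "_phasing_mad_clust.expt_id"
child_frame = {
    "_item_description.description": {
        CHILD: "This data item is a pointer to _phasing_mad_expt.id "
               "in the PHASING_MAD_EXPT category."},
    "_item.category_id": {CHILD: "phasing_mad_clust"},
    "_item.mandatory_code": {CHILD: "yes"},
    "_item_type.code": {CHILD: "char"},
}
parent_frame = {
    "_item.category_id": {"_phasing_mad_expt.id": "phasing_mad_expt",
                          CHILD: "phasing_mad_clust"},
    "_item.mandatory_code": {"_phasing_mad_expt.id": "yes", CHILD: "yes"},
    "_item_type.code": {"_phasing_mad_expt.id": "char"},
}

print(audit_child(CHILD, child_frame, parent_frame))
```

For this example the two _item attributes come out as redundant and nothing conflicts. _item_type.code is flagged as the one attribute that would have to move into the parent frame, which matches the change Peter proposes.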
I'm not going to go into a discussion of why we decided to carry _item.name
and _item.mandatory_code in the stand-alone definitions for each of the data
items that are also defined as children in a parent tree. In fact, although I
can remember the discussion about adding _item.name, I can't recall why it
was necessary to add _item.mandatory_code.
But we did, and the only thing I care about (at least on this list) at this
point is that we do what we do consistently. The example that Peter points
out (_chem_link.type_comp_1) strikes me as simply a violation of consistency
and not a fundamental theoretical issue. I have fixed that problem, and
a number of others just like it (most of which I added inadvertently in the
headlong rush to get things pulled together for Montreal), and have declared
yet another version.
[I don't want to stifle creativity on the more basic issues that Peter has
raised, but I suggest that a continuing discussion along those lines is
more appropriate for the DDL list than for this one].
The audit trail for the new changes:
0.7.23 1995-08-10
;
Changes (PMDF):
+ Changed _struct_sheet_gen.label_seq_id to _struct_site_gen.label_seq_id
in _atom_site.label_seq_id tree
+ Removed duplicate entry of _phasing_MIR.entry_id in _entry.id tree
+ Removed alias in definition of _refln.A_meas_au
+ Removed _item.category_id from
_chem_link.type_comp_1
_chem_link.type_comp_2
_phasing_mad_clust.expt_id
_phasing_mad_set.clust_id
_phasing_mad_set.expt_id
_phasing_mad_set.set_id
_phasing_mad_ratio.expt_id
_phasing_mad_ratio.clust_id
_phasing_mad_ratio.wavelength_1
_phasing_mad_ratio.wavelength_2
+ Removed _item_type.code from most of the above (it wasn't there in all
of them).
+ Added _item.mandatory_code to _phasing_mir_der.der_set_id
+ Corrected _item.name for _phasing_mad_ratio.wavelength_2
;
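The convention behind these changes can be checked mechanically. Under it, a child item's stand-alone save frame carries _item.name and _item.mandatory_code (plus its description) and nothing else. Below is a minimal Python sketch under stated assumptions. The input format, a mapping from each child item to the set of attribute names in its own save frame, is hypothetical. Treating _item_description.description as allowed but not required is also an assumption. Only the _phasing_mad_clust.expt_id frame contents come from Peter's message. The other two entries are illustrative, loosely modeled on the audit trail, and were not extracted from the dictionary.

```python
from typing import Dict, List, Set

# Assumed convention: attributes every child placeholder frame must carry,
# and the full set it may carry (the description is allowed, not required).
REQUIRED = {"_item.name", "_item.mandatory_code"}
ALLOWED = REQUIRED | {"_item_description.description"}


def convention_violations(
        child_frames: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Report, per child item, attributes its frame should not carry
    ("extra") and required attributes it lacks ("missing")."""
    report = {}
    for item, attrs in sorted(child_frames.items()):
        extra = sorted(attrs - ALLOWED)
        missing = sorted(REQUIRED - attrs)
        if extra or missing:
            report[item] = {"extra": extra, "missing": missing}
    return report


# Illustrative frame contents (hypothetical except for the first entry).
frames = {
    "_phasing_mad_clust.expt_id": {
        "_item_description.description", "_item.name",
        "_item.category_id", "_item.mandatory_code", "_item_type.code"},
    "_phasing_mir_der.der_set_id": {
        "_item_description.description", "_item.name"},
    "_phasing_mad_set.expt_id": {
        "_item_description.description", "_item.name",
        "_item.mandatory_code"},
}

for item, problems in convention_violations(frames).items():
    print(item, problems)
```

With these inputs, the first item is flagged for carrying _item.category_id and _item_type.code. The second is flagged for lacking _item.mandatory_code, and the third passes.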
Bye for now - Paula
********************************************************************************
Dr. Paula M. D. Fitzgerald ______________ voice and FAX: (908) 594-5510
Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
P.O. Box 2000, Ry50-105 ______________ or bean@merck.com
Rahway, NJ 07065 USA
(for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)
********************************************************************************