Re: struct_conn

Paula Fitzgerald (paula_fitzgerald@Merck.Com)
Tue, 19 Mar 96 14:32:20 EST


Herb Bernstein wrote:

> struct_conn has ptnrn_label_comp_id, ...label_asym_id and
> ..label_seq_id as mandatory.  It would seem sufficient for these data
> items to be implicit, since ...label_atom_id is also mandatory.
> It is more likely the various ...auth_... tokens would be given
> to improve readability of a connectivity section than the
> ..label_ tokens, once the essential atom_id had been given.  Note
> similar problems exist in other categories, which could be resolved
> by making both the ...label_... and ...auth_... tokens implicit,
> rather than having one be mandatory and the other not.

And then added:

> Sorry, I goofed on the last message about struct_conn, since atom_id
> is the intra-residue atom_id, not the global atom_site.id, but the
> basic point remains -- it would be desirable, and sufficient, to
> specify either the ...label_ token or the ...auth_ token.

This raises an important issue, and deserves discussion, although when Helen,
John and I talked about this last night we didn't feel that we should change
anything in the way the dictionary is currently set up.

If you only knew how long we had sweated about all of this.  There are many
forces at play in the way we finally worked this out.  For instance,
_atom_site.id was created solely to solve formal problems of compatibility
with the core CIF.  It is intended to be an arbitrary (and probably
strictly numerical) index, and it will change every time you insert a new
atom into the list.  We don't think it provides an informative
way of setting up the various structure description tables (for instance,
_struct_conn).

The various components of the _label identifiers are what we really want
people to use - we would like to see a world evolve in which there are
consistent nomenclatures for residue names, in which sequences are numbered
sequentially and numerically, and in which each "entity" (by our definition
of entity) is numbered from 1 to n.  That is what the _label identifiers must
do, and that is why we have made them the mandatory data items in the
structure description tables.

But we realize that we live in a world of independent nomenclatures, of
authors who want to use homology numbering schemes, and of authors who want
molecule 1 to be numbered 1 to 314 and molecule 2 to be numbered 1001 to
1314.  This is why we created the alternative _auth atom identifiers.  But we
quite deliberately made these identifiers a second set - the authors can use
them, and we are sure they will, but they must also use the more
database-friendly _label identifiers.
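The policy described above (the _label items mandatory, the _auth items an optional second set) can be sketched as a small data model.  This is only a hypothetical Python illustration, not part of the dictionary or of any CIF library: the class ConnPartner and the function label_key are invented names, the field names merely mirror the _struct_conn.ptnr1_* data items, and the residue values are made up to echo the "molecule 2 numbered 1001 to 1314" example.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of one _struct_conn partner (illustrative only).
@dataclass
class ConnPartner:
    # _label identifiers: required, so omitting any of them raises TypeError
    label_comp_id: str
    label_asym_id: str
    label_seq_id: int        # entity numbering, 1 to n
    label_atom_id: str       # intra-residue atom name
    # _auth identifiers: optional second set for the author's own conventions
    auth_comp_id: Optional[str] = None
    auth_asym_id: Optional[str] = None
    auth_seq_id: Optional[str] = None


def label_key(p: ConnPartner) -> tuple:
    """Database-friendly key built only from the mandatory _label items."""
    return (p.label_comp_id, p.label_asym_id, p.label_seq_id, p.label_atom_id)


# Made-up example: the author numbers molecule 2 from 1001, so author
# residue 1057 corresponds to label_seq_id 57 in the entity numbering.
p = ConnPartner("CYS", "B", 57, "SG",
                auth_comp_id="CYS", auth_asym_id="B", auth_seq_id="1057")
# The _auth items may be left out entirely; the _label items may not.
q = ConnPartner("CYS", "A", 57, "SG")
```

Under Herb's alternative, both sets would instead default to None, with a check that at least one complete set was supplied.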

This doesn't really answer Herb's question directly, but I hope it gives you
all the flavor of what we were trying to do, and why.

Paula (writing for all of us)

********************************************************************************
 Dr. Paula M. D. Fitzgerald  ______________ voice and FAX: (908) 594-5510
   Merck Research Laboratories ______________ email: paula_fitzgerald@merck.com
     P.O. Box 2000, Ry50-105     ______________ or bean@merck.com           
       Rahway, NJ 07065  USA 
         (for express mail use 126 E. Lincoln Ave. instead of P. O. Box 2000)  
********************************************************************************