
IIMS Workshop - FINAL REPORT - SECTION III: SCHEMATIC DESCRIPTION OF THE PROJECT



Overall objectives of the project:

The overall goal of this project is to develop a prototype system that integrates the results of 3D-EM with existing data models from X-ray and NMR methods. Integrating the results of the three major experimental techniques for three-dimensional structure determination into a single standardised database, to be made public from the EBI, represents a strategic change in the present definition of the information content of biological databases. Further aims are: (1) to develop validation techniques for data from 3D-EM projects. These will certainly take a different form for close-to-atomic-resolution cryo-electron microscopy data than for other areas, but general quality measures will emerge from this joint approach; (2) to simplify data capture for 3D-EM techniques by developing and testing data harvesting technology that eases the deposition of data to the new integrated database; (3) to provide access to the prototype system and validation methods through focused access to the WWW servers at the partners' sites, as well as more general public access at the EBI.

Experimental approach and working method:

The aim is to make a logical analysis of the steps involved in all 3D-EM experiments and results, in order to produce a data model that represents the complete system. The particular data items and structures that have been characterised are then defined in a physical model dictionary using mmCIF and XML terms. This data model for 3D-EM experiments has been integrated into the E-MSD, an Oracle relational database containing a similar representation for the Protein Data Bank structures. The metadata representation is the basis for developing a web-based submission tool, validation tools, and a search and retrieval interface.

Achievements and results:

Complete meta data description of a 3D-EM experiment
Additional templates for describing 3D-EM experiments for the PDB
Modifications to the PDB submission tool http://autodep.ebi.ac.uk to accept submissions from 3D-EM studies that involve fitted coordinates
First draft of an approach to 3D-EM volume map submission and storage.
Fully operational Electron Microscopy Volume map deposition system and data base.
Complete XML metadata description for the basic EM experiment
International workshop held with software developers

The five most relevant publications emanating from the project:

Gordon Conference Presentation June 2001 http://iims.ebi.ac.uk/final/D16_poster_grc_2001.pdf

Complete mmCIF dictionary for IIMS data items http://iims.ebi.ac.uk/iims_dictionary/iims.dic

New electron microscopy database and deposition system, M. Tagari, R. Newman, M. Chagoyen, J.-M. Carazo and K. Henrick TRENDS in Biochemical Sciences, 27, 589 (2002).

Workshop promoting Software Development in the field of High Resolution Electron Microscopy Structure Determination, November 15-16, 2002, Genome Campus, Hinxton Hall, Cambridge, UK.

EMDep: a web-based system for the deposition and validation of high-resolution electron microscopy macromolecular structural information, K. Henrick, R. Newman, M. Tagari and M. Chagoyen, J. Structural Biology (2003), in press.

Supplementary

Supplementary File List for IIMS Final Report
(see attached CD)

D1.doc
D16_grc_workshop_flier.pdf
D16_paper_emdep.pdf
D16_poster03_grc_2003.ppt
D16_poster_grc_2001.pdf
D16_workshop.html
D2.doc
D2.txt
D3.cif
D3.dic
D3.xls
D3.xsd
D3_image.html
D3_map.html
D3_xml.html
D5_dbSchema_em_3drecon.pdf
D5_dbSchema_em_assembly.pdf
D5_dbSchema_em_data.pdf
D5_dbSchema_em_exp.pdf
D5_dbSchema_em_fitting.pdf
D5_dbSchema_processing.pdf
D5_dbSchema_refs.pdf
D6.pdf
M1_emd_xml_schema.pdf
M3.doc
M3_EMDB_PDB.doc
M3_EMDEP_Entries.doc
D7_emdep_software.tar

Project Meeting

Discussion Meeting of Project: Integration of Information about Macromolecular Structure (IIMS)

Participants present 23rd March 2001.

Participant Number 1 (EBI): Project Co-ordinator: Dr. Geoff Barton (GB)
Dr. Richard Newman (RN)

Participant Number 2 (CSIC): Dr. Jose-Maria Carazo (JC)
Ms. Monica Chagoyen Quiles (MC)

Participant Number 3 (UOXF.H5): Mr. Rishi Matadeen (RM)

Participant Number 4 (MPG): Dr. Michael Radermacher (MR)

Agenda:

  1. Comments on Discussion Document 3DEM Metadescriptors (previously circulated).
  2. Decide on those IIMS descriptors from (1) above for 3D-EM which are mandatory and those which are optional.
  3. Timetable in terms of presentation to wider EM community.

Meeting Report:

1. Metadescriptors

Discussion of which metadescriptors should be mandatory and which optional began by working through the Discussion Document of 20th Feb 2001 from Monica Chagoyen (MC) and Richard Newman (RN).

A list of mandatory and optional data-items for the 3D-EM/MSD was generated, the areas requiring further discussion were outlined, and plans were made to complete the 3D-EM Descriptors list.

2. Recruitment

Positions for Partners 3 and 4 were unfilled following initial advertisement and it was agreed to readvertise the positions.

3. Timetable

Plans were drawn up to present examples of 3D-EM data-deposition, with the mandatory and optional data-items, at the 3DEM Gordon Conference from 24-29th June, 2001.

In addition, it was agreed that the IIMS project be represented at the Gordon Conference in a general discussion, in the form of a joint meeting with the RCSB and EBI, together with a poster presentation.

4. File Format

Consideration of the image format, and of header information additional to that currently used in CCP4 format image files, will also be pursued. The aim here would be to be able to store useful machine-generated information during the processing of EM data.

Release Time:
Michael Radermacher (MR) raised the issue of the release period and thought it vital to have a lock-in period of up to five years for volume data and structure factors.

This was agreed by the participants.

It was also agreed that journals should be approached to advise that a PDB code be necessary before publication of 3D-EM structures, as is the case for X-ray and NMR structures.

3D-EM Metadescriptors:
This resulted in the document (cf. below), which will be circulated to all participants for approval. Additionally, MR recommended that a helical structures specialist be consulted on the parameters for helices. RN agreed to consult Dr. Tony Crowther (LMB, Cambridge) on the subject.

Required Metadescriptors for the 3D-EM MSD

Remarks within /* ... */ are for comment, please.

MANDATORY DATA-ITEMS:

GENERAL INFORMATION

Author(s)

Corresponding author:

Name

Affiliation

Reference paper

Release information for volume data: (choose one of the following)

[ ] immediate release

[ ] release on publication

[ ] lock-in

please, give desired hold period (up to 5 years): _____ years
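The release choice above amounts to a small validation rule: exactly one policy, with a hold period only for the lock-in option and never more than five years. A minimal Python sketch of that rule follows. The class name, field names and policy strings are hypothetical, and the sketch assumes the hold period is given in whole years; neither is specified by the form.

```python
from dataclasses import dataclass
from typing import Optional

MAX_HOLD_YEARS = 5  # upper limit stated on the form


@dataclass
class ReleaseChoice:
    # Hypothetical policy labels: "immediate", "on_publication" or "lock_in".
    policy: str
    # Hold period in whole years (assumption); only used with "lock_in".
    hold_years: Optional[int] = None

    def is_valid(self) -> bool:
        """Check the choice against the rules stated on the form."""
        if self.policy in ("immediate", "on_publication"):
            return self.hold_years is None
        if self.policy == "lock_in":
            return (
                self.hold_years is not None
                and 1 <= self.hold_years <= MAX_HOLD_YEARS
            )
        return False


accepted = ReleaseChoice("lock_in", hold_years=3).is_valid()
rejected = ReleaseChoice("lock_in", hold_years=6).is_valid()
```

Here `accepted` is true and `rejected` is false, because six years exceeds the five-year maximum.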

DATA INFORMATION

EM map / volume

file /* did we agree finally on CCP4 format? RN -yes but extend header information-work in progress */

title

size:

number of columns

number of rows

number of sections

spacing (nm):

along columns

along rows

along sections

enforced symmetry

volume origin (or center position): x, y, z /* which is the standard coordinate system we adopt for 3D-EM MSD maps? We need to define the relationship between (x,y,z) and (col,row,sec) unambiguously, or refer always to col,row,sec

RN: We have attached a suggested co-ord scheme at end of this document */

/* Pending Crystallographic issues. Relevant points here:

1. We are interested in getting from the authors a map of the biologically relevant unit (as is the case of the PDB/MSD for atomic structures)

2. To be homogeneous across the database, maps should be represented in a rectangular coordinate system.

3. Crystallographers work usually with the unit cell as the coordinate frame:

MC- Questions:

- both in real and reciprocal space? */RN yes but real space co-ords submitted to PDB

- which is the content of a map reconstructed by crystallography? (unit cell and the asymmetric unit?) */RN asym. unit

- are we asking also for these maps? */RN yes for 2D and 3D xtals

MC- Do we solve it if we ask for the map of the biological unit (in a rectangular coordinate frame) AND the final structure factors? In that case, structure factors for crystallography should be also required.

*/RN-our EM volume map should be treated like the atomic coords map submitted to the PDB by X-ray crystallographers and NMR spectroscopists; structure factors should be optional, as for X-ray structures

*/RN-we should bear in mind the size of the volume map and how long it would take to transfer; if we could apply symmetry operators to the asymmetric unit and recreate the biological entity that way, it would reduce transmission times for deposition-*

  • resolution estimate:
  • value along rows (A)
  • value along columns (A)
  • value along sections (A)
  • /* the exact way of describing how these numbers were obtained depends on the reconstruction schema */

For single particles:

  • FSC curve
  • Pictures/Illustrations (at least one): /* for immediate release */
  • file
  • Three orthogonal slice images: /* for immediate release */

row slice image file corresponds to row number in volume

column slice image file corresponds to column number in volume

section slice image file corresponds to section number in volume

BIOLOGICAL INFORMATION

Macromolecular complex:

Name

Aggregation state: /* a better name for this? */

(choose one of the following)

[ ] 3D crystal

[ ] 2D crystal

[ ] helix

[ ] icosahedral

[ ] single particles

[ ] individual structure

Name

Type (e.g. protein, RNA, DNA, lipid, sugar, nucleotide, metal ligand, etc.)

Source:

[ ] engineered source (i.e. expression system)

[ ] natural source

Name of (natural or gene) source organism

Engineered mutation(s) [y/n]

Additional information for (icosahedral?) viruses /* Steve, please check */

empty (y/n)

enveloped (y/n)

EXPERIMENTAL DETAILS

Microscope type:

[ ] TEM

[ ] STEM

Specimen temperature range: /* we need to work on the options allowed here */RN OK now?

[ ] helium temperature

[ ] nitrogen

[ ] other

Reconstruction method:

[ ] Lattice line fitting (i.e. crystallography)

[ ] Layer line (i.e. helical)

[ ] Spherical harmonics (i.e. icosahedral)

[ ] Other (including different single particle approaches and tomographic methods)

Crystal parameters:

unit cell data for 2D:

(gamma) (x, y)

plane group symmetry (for 2D crystals)

unit cell data for 3D:

(alpha,beta,gamma)

(x,y,z)

space group symmetry (for 3D crystals)

common phase origin ("undefined" value should be allowed)

/* how is it expressed? coordinate points in the 3D space??? */

Helical parameters: /* to be defined (Richard) */RN helix repeat C

layerline data:

rho (reciprocal radial coordinate)

amplitude, phase.

layerline value n (the Bessel function order)

layerline number l

Icosahedral parameters: /* to be defined (Steve) */

(RN: PDB PROCEDURE FOR PREPARING ATOMIC COORDINATE DATA

1. ORGANIZATION OF COORDINATE SECTION

The complete asymmetric unit must be deposited except for virus coat proteins.)

CONVENTIONS FOR 3D-EM Data at the 3D-EM MSD

1. We propose to adopt the same Coordinate system as the PDB data, i.e. The Right-Handed Cartesian Coordinate System (X, Y, Z) where

X = Y x Z

Y = Z x X

Z = X x Y

Graphically:

(out of page) Z .-------------- Y

|

|

|

|

X
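The three cross-product identities above can be checked numerically for the unit axes. A minimal Python sketch follows; the helper name `cross` is ours and is not part of any PDB tool.

```python
def cross(u, v):
    """Return the cross product u x v of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


X = (1, 0, 0)
Y = (0, 1, 0)
Z = (0, 0, 1)

# All three identities hold together only for a right-handed frame.
is_right_handed = (
    cross(Y, Z) == X and cross(Z, X) == Y and cross(X, Y) == Z
)
```

Swapping any two axes reverses the sign of the products, so the same check fails for a left-handed frame.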

Basic Matrix operations for rotation and translation in PDB are given by:

R: rotation matrix

T: translation matrix

|X'|   |R11 R12 R13|   |X|   |T1|
|Y'| = |R21 R22 R23| * |Y| + |T2|
|Z'|   |R31 R32 R33|   |Z|   |T3|

In PDB files, operations of this type are found, for example, in the MTRIXn records for transformations expressing non-crystallographic symmetry:

MTRIX1 = (M11, M12, M13, V1)

MTRIX2 = (M21, M22, M23, V2)

MTRIX3 = (M31, M32, M33, V3)

We propose also to adopt this convention for the representation of relative orientations.
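As an illustration of this convention, the sketch below applies three MTRIX-style rows (Mi1, Mi2, Mi3, Vi) to a point, i.e. it evaluates X' = R*X + T row by row. The function name and the numerical values are hypothetical examples, not taken from any deposited entry.

```python
def apply_mtrix(rows, point):
    """Apply x' = M x + V, where each row is (Mi1, Mi2, Mi3, Vi)."""
    return tuple(
        m1 * point[0] + m2 * point[1] + m3 * point[2] + v
        for (m1, m2, m3, v) in rows
    )


# Hypothetical operator: 90-degree rotation about Z, then a shift of 10 along X.
mtrix = [
    (0.0, -1.0, 0.0, 10.0),  # MTRIX1
    (1.0, 0.0, 0.0, 0.0),    # MTRIX2
    (0.0, 0.0, 1.0, 0.0),    # MTRIX3
]

moved = apply_mtrix(mtrix, (1.0, 2.0, 3.0))  # (8.0, 1.0, 3.0)
```

Keeping each row as a single 4-tuple mirrors the record layout, so rotation and translation always travel together.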

2. Mapping 3D-EM volumes in Cartesian Coordinate System.

Each 3D-EM map has its own coordinate system in terms of the rectangular grid formed by its voxels (i.e., in terms of columns (fastest), rows and sections (slowest) when represented in a 1D array).
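The "columns fastest, sections slowest" ordering fixes where each voxel sits in the 1D array. A short Python sketch of the index arithmetic follows, assuming 0-based indices; the function names are ours.

```python
def voxel_offset(col, row, sec, n_cols, n_rows):
    """Offset of voxel (col, row, sec) in a 1D array with columns fastest."""
    return col + n_cols * (row + n_rows * sec)


def voxel_indices(offset, n_cols, n_rows):
    """Inverse of voxel_offset: recover (col, row, sec) from a 1D offset."""
    sec, rest = divmod(offset, n_cols * n_rows)
    row, col = divmod(rest, n_cols)
    return col, row, sec


# Example grid with 4 columns and 3 rows per section.
offset = voxel_offset(col=1, row=2, sec=1, n_cols=4, n_rows=3)  # 21
indices = voxel_indices(offset, n_cols=4, n_rows=3)             # (1, 2, 1)
```

The number of sections is not needed for either direction, since sections vary slowest.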

Our proposal for mapping a 3D-EM Volume grid in the PDB Coordinate system is the following:

A. (col, row, sec) along (Y, X, Z)

B. Plus additional information on

- StartingVoxel = 0 or 1 (first voxel is labelled as 0 or as 1). Our proposal: first voxel labelled as 0.

Additionally, the author should provide the following information:

- Spacing = (size_col, size_row, size_sec) (in nm or A)

- OriginMapping = 0 or 0.5 (at voxel vertex or center)

- VolumeOrigin (or center position): (O_col, O_row, O_sec)

This will position the final 3D-EM map unambiguously within the same coordinate system as the atomic resolution data, allowing other geometric elements (e.g. symmetry axes) to be defined within this coordinate frame, as well as the relative orientations between two structures (two 3D-EM maps, or a 3D-EM map and an atomic model).
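One way the four items (StartingVoxel, Spacing, OriginMapping, VolumeOrigin) could combine into a Cartesian position is sketched below in Python. The proposal does not spell out the formula, so this is an assumption: VolumeOrigin is taken to be in voxel units and to mark the grid position that coincides with the Cartesian origin. The function and parameter names are hypothetical.

```python
def voxel_to_cartesian(index, spacing, volume_origin,
                       starting_voxel=0, origin_mapping=0.5):
    """Map a (col, row, sec) voxel index to Cartesian (X, Y, Z).

    ASSUMPTION: volume_origin is in voxel units and marks the grid
    position that coincides with the Cartesian origin.
    """
    along = [
        (i - starting_voxel + origin_mapping - o) * s
        for i, s, o in zip(index, spacing, volume_origin)
    ]
    col_pos, row_pos, sec_pos = along
    # Proposal A: (col, row, sec) run along (Y, X, Z).
    return (row_pos, col_pos, sec_pos)


xyz = voxel_to_cartesian(
    index=(2, 0, 1),                # (col, row, sec)
    spacing=(2.0, 2.0, 4.0),        # same length unit as the output
    volume_origin=(0.0, 0.0, 0.0),
)  # (1.0, 5.0, 6.0)
```

Note the swap in the return value: the column position becomes Y and the row position becomes X, as stated in item A.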

RN: PDB COORDINATE SYSTEMS AND TRANSFORMATIONS

a. The coordinates distributed by the Protein Data Bank give the atomic positions measured in Angstroms along three orthogonal directions. Unless otherwise specified, the default axial system (detailed below) will be assumed.

b. If a, b, c describe the crystallographic cell edges and A, B, C are unit vectors in the default orthogonal Angstrom system, then:

1) A, B, C and a, b, c have the same origin.

2) A is parallel to a.

3) B is parallel to (a X b) X A (cross product between C and A).

4) C is parallel to a X b (i.e., c*) (cross product between a and b).

c. The matrix which premultiplies the column vector of the fractional crystallographic coordinates to yield the distributed coordinates in the A, B, C system is:

| a    b cos(gamma)    c cos(beta)                                        |
| 0    b sin(gamma)    c (cos(alpha) - cos(beta) cos(gamma)) / sin(gamma) |
| 0    0               V / (a b sin(gamma))                               |

where V = a b c (1 - cos**2(alpha) - cos**2(beta) - cos**2(gamma) + 2 cos(alpha) cos(beta) cos(gamma))**(1/2)
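The matrix in (c) can be built directly from the six cell parameters. The Python sketch below does so and applies it to one fractional coordinate. The function names are ours, angles are assumed to be in degrees, and the cell values are a made-up example.

```python
import math


def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix taking fractional to orthogonal coordinates (angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit cell volume V, as defined in the text.
    volume = a * b * c * math.sqrt(
        1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg
    )
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def frac_to_orth(matrix, frac):
    """Premultiply the fractional column vector by the matrix."""
    return tuple(sum(m * f for m, f in zip(row, frac)) for row in matrix)


# Hypothetical orthorhombic cell: the result is close to (5, 10, 15).
cell = orthogonalization_matrix(10.0, 20.0, 30.0, 90.0, 90.0, 90.0)
centre = frac_to_orth(cell, (0.5, 0.5, 0.5))
```

Because the cosine of 90 degrees is not exactly zero in floating point, results for right-angled cells should be compared with a tolerance rather than for exact equality.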

d. You need to supply along with the coordinates:

1) A transformation from the submitted to the orthogonal coordinates that will be distributed by the PDB.

2) A transformation from the submitted to fractional crystallographic coordinates.

e. The distributed entry will contain:

1) ORIGX - transformation from the distributed to the submitted coordinates.

2) SCALE - transformation from the distributed to the fractional coordinates. If the submitted coordinates are fractions of the unit cell edges or are in the default orthogonal system, the ORIGX and SCALE transformations will be given default values.

f. The MTRIX transformations express approximate or exact non-crystallographic symmetry elements in the structure. Provide these in the space of the submitted coordinates. These transformations will be transformed so that they operate in the distributed coordinate system.
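Item e.2 describes SCALE as the transformation from distributed to fractional coordinates. In the default axial system this is the inverse of the matrix in (c), and because that matrix is upper triangular the inverse can be applied by back substitution. A minimal Python sketch follows, with a hypothetical function name and a made-up orthorhombic cell.

```python
def orth_to_frac(matrix, xyz):
    """Solve M f = xyz for f by back substitution (M is upper triangular)."""
    frac = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        # Contribution of the fractional components already solved.
        known = sum(matrix[i][j] * frac[j] for j in range(i + 1, 3))
        frac[i] = (xyz[i] - known) / matrix[i][i]
    return tuple(frac)


# Hypothetical orthorhombic cell with edges 10, 20, 30 Angstroms:
# the fractional-to-orthogonal matrix is then diagonal.
cell_matrix = [
    [10.0, 0.0, 0.0],
    [0.0, 20.0, 0.0],
    [0.0, 0.0, 30.0],
]

frac = orth_to_frac(cell_matrix, (5.0, 10.0, 15.0))  # (0.5, 0.5, 0.5)
```

Solving the triangular system avoids forming the inverse matrix explicitly, although a distributed entry would list the inverse itself in its SCALE records.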

 


The project is funded by the European Commission as the IIMS,
contract-no. QLRI-CT-2000-31237 under the RTD programme
"Quality of Life and Management of Living Resources"
