Validating XMLs

This document describes the metadata validation performed during Webin submissions. The following conventions are used in this document:

  • '@': denotes an XML attribute
  • CAPITAL: capital letters denote an XML tag
  • submission.xml: the XML file containing the Submission object
  • study.xml: the XML file containing the Study objects
  • sample.xml: the XML file containing the Sample objects
  • experiment.xml: the XML file containing the Experiment objects
  • run.xml: the XML file containing the Run objects
  • analysis.xml: the XML file containing the Analysis objects

Common XML validation

  1. The submitted XMLs are validated against the latest XML schemas.
  2. The @center_name and @broker_name values must be identical to the information registered in the submission account.
  3. The @alias is mandatory and must be unique within a submission account.
  4. If the submission account is a broker account and if the @broker_name has been given then it must match the broker name registered for the submission account.
  5. If the submission account is a broker account and if the @broker_name has not been given then the @broker_name attribute will be created based on information from the submission account.
  6. If the submission account is a normal account and if the @center_name has been given then it must match the center name registered for the submission account.
  7. If the submission account is a normal account and if the @center_name has not been given then the @center_name attribute will be created based on information from the submission account.
  8. The @center_name must be one of the pre-registered centers.
  9. The @broker_name must be one of the pre-registered brokers.
  10. The @refname and @refcenter used in the RefNameGroup XML type (e.g. in <EXPERIMENT_REF> in run.xml) must uniquely identify a previously submitted Object or an Object within the same submission. Alternatively, @accession can be used to refer to other Objects, in which case the @accession must refer to a previously submitted Object.
  11. If an IDENTIFIERS/PRIMARY_ID value is provided, it must match the @accession. If it is not provided, IDENTIFIERS/PRIMARY_ID will be created with the @accession value.
  12. If an IDENTIFIERS/SUBMITTER_ID value is provided, it must match the @alias. If it is not provided, IDENTIFIERS/SUBMITTER_ID will be created with the @alias value.
  13. If an IDENTIFIERS/SUBMITTER_ID@namespace value is provided, it must match the @center_name. If it is not provided, IDENTIFIERS/SUBMITTER_ID@namespace will be created with the @center_name value.
  14. If <*_SET> (e.g. <SUBMISSION_SET>) is missing from the submitted XML then this element will be automatically added. All Objects are expected to be nested within a <*_SET> element.
  15. New objects can only be added using the <ADD> action in submission.xml.
  16. Existing objects can only be updated using the <MODIFY> action in submission.xml.
  17. The <SUPPRESS> action in submission.xml is not supported.
  18. If <ADD> action is used in submission.xml then the @aliases in the objects being submitted must be unique and must not already exist for the given @center_name.
  19. If <MODIFY> action is used in submission.xml then the @aliases in the objects being updated must already have been submitted by the @center_name.
  20. Objects mirrored from NCBI are verified to have @accession values with either the SR or DR prefix.
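Rules 11 to 13 and 18 to 19 describe simple fill-or-check logic. The following is a minimal sketch of that logic using Python's standard xml.etree.ElementTree module. The function names, the error messages, the placement of a newly created IDENTIFIERS element, and the `existing` set (a stand-in for the aliases already registered for a given @center_name) are all hypothetical and are not part of the Webin service.

```python
import xml.etree.ElementTree as ET


def apply_identifier_rules(obj: ET.Element) -> list[str]:
    """Rules 11-13: check IDENTIFIERS against @accession, @alias and
    @center_name, creating any missing values in place."""
    errors = []
    ids = obj.find("IDENTIFIERS")
    if ids is None:
        # Assumption: IDENTIFIERS is placed as the first child of the object.
        ids = ET.Element("IDENTIFIERS")
        obj.insert(0, ids)

    accession = obj.get("accession")
    primary = ids.find("PRIMARY_ID")
    if primary is None:
        if accession is not None:  # only possible once an accession exists
            primary = ET.Element("PRIMARY_ID")
            primary.text = accession
            ids.insert(0, primary)  # PRIMARY_ID precedes SUBMITTER_ID
    elif (primary.text or "").strip() != accession:
        errors.append("IDENTIFIERS/PRIMARY_ID does not match @accession")

    alias = obj.get("alias")
    submitter = ids.find("SUBMITTER_ID")
    if submitter is None:
        submitter = ET.SubElement(ids, "SUBMITTER_ID")
        submitter.text = alias
    elif (submitter.text or "").strip() != alias:
        errors.append("IDENTIFIERS/SUBMITTER_ID does not match @alias")

    center = obj.get("center_name")
    namespace = submitter.get("namespace")
    if namespace is None:
        if center is not None:
            submitter.set("namespace", center)
    elif namespace != center:
        errors.append("IDENTIFIERS/SUBMITTER_ID@namespace does not match @center_name")
    return errors


def check_action_aliases(action: str, aliases: list[str], existing: set[str]) -> list[str]:
    """Rules 18-19: <ADD> needs new, unique aliases; <MODIFY> needs known ones.
    `existing` is a hypothetical stand-in for the center's registered aliases."""
    errors = []
    if action == "ADD":
        seen = set()
        for alias in aliases:
            if alias in seen:
                errors.append(f"duplicate alias in submission: {alias}")
            elif alias in existing:
                errors.append(f"alias already exists: {alias}")
            seen.add(alias)
    elif action == "MODIFY":
        errors += [f"unknown alias: {a}" for a in aliases if a not in existing]
    return errors
```

For example, `check_action_aliases("ADD", ["a", "a", "b"], {"b"})` reports both the repeated alias "a" and the already registered alias "b".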

Submission XML validation

  1. Only a single <SUBMISSION> is allowed in submission.xml.
  2. The @source and @schema attributes are mandatory for the <ADD>, <MODIFY> and <VALIDATE> actions.
  3. The @HoldUntilDate in <HOLD> must use the date format yyyy-mm-dd.
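The three submission.xml rules can be checked with a short sketch like the one below, assuming the usual SRA layout SUBMISSION/ACTIONS/ACTION/<ADD|MODIFY|VALIDATE|HOLD>. The code only inspects element names and attributes, so it does not depend on the exact nesting. The function names and messages are illustrative, and rejecting impossible dates such as 2030-02-30 goes beyond the stated format rule (an assumption).

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_yyyy_mm_dd(value: str) -> bool:
    """True if value is a real calendar date written as yyyy-mm-dd."""
    if not _DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def validate_submission_xml(text: str) -> list[str]:
    errors = []
    root = ET.fromstring(text)
    # Rule 1: accept a bare <SUBMISSION> or one wrapped in <SUBMISSION_SET>.
    subs = [root] if root.tag == "SUBMISSION" else root.findall("SUBMISSION")
    if len(subs) != 1:
        errors.append(f"expected one <SUBMISSION>, found {len(subs)}")
    for sub in subs:
        for el in sub.iter():
            if el.tag in ("ADD", "MODIFY", "VALIDATE"):
                # Rule 2: @source and @schema are mandatory for these actions.
                for attr in ("source", "schema"):
                    if not el.get(attr):
                        errors.append(f"<{el.tag}> is missing @{attr}")
            elif el.tag == "HOLD":
                # Rule 3: @HoldUntilDate, when present, must be yyyy-mm-dd.
                hold = el.get("HoldUntilDate")
                if hold is not None and not is_yyyy_mm_dd(hold):
                    errors.append(f"@HoldUntilDate {hold!r} is not yyyy-mm-dd")
    return errors
```

For example, a submission.xml whose <ADD> names study.xml but omits @schema produces the message "<ADD> is missing @schema".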

Study XML validation

  1. The @existing_study_type is mandatory for <STUDY_TYPE>.
  2. <STUDY_TITLE> is mandatory.
  3. If @existing_study_type is "Other", then @new_study_type must be provided within <STUDY_TYPE>.
  4. <STUDY_ABSTRACT> must be provided.
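A minimal sketch of the study rules follows, assuming the SRA layout STUDY/DESCRIPTOR/{STUDY_TITLE, STUDY_TYPE, STUDY_ABSTRACT}. Elements are located with ".//" searches, so the exact nesting does not matter. The function name and messages are hypothetical, and a missing <STUDY_TYPE> element is assumed to be caught by schema validation rather than here.

```python
import xml.etree.ElementTree as ET


def validate_study(study: ET.Element) -> list[str]:
    errors = []
    study_type = study.find(".//STUDY_TYPE")
    if study_type is not None:  # a missing element is left to schema validation
        existing = study_type.get("existing_study_type")
        if not existing:
            # Rule 1: @existing_study_type is mandatory.
            errors.append("<STUDY_TYPE> requires @existing_study_type")
        elif existing == "Other" and not study_type.get("new_study_type"):
            # Rule 3: "Other" must be qualified by @new_study_type.
            errors.append('@new_study_type is required when @existing_study_type="Other"')
    # Rules 2 and 4: title and abstract must be present and non-empty.
    for tag in ("STUDY_TITLE", "STUDY_ABSTRACT"):
        node = study.find(f".//{tag}")
        if node is None or not (node.text or "").strip():
            errors.append(f"<{tag}> is mandatory")
    return errors
```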

Sample XML validation

  1. An assigned INSDC Taxonomic identifier can't be unassigned.
  2. The <TAXON_ID> must be provided with a valid associated <SCIENTIFIC_NAME> and <COMMON_NAME>.
  3. If the Sample object being submitted is connected to a Study object, the Sample object will have the same status as that Study object (e.g. the sample will become private if the associated study is private).
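The first two sample rules can be sketched as follows, assuming the SRA layout SAMPLE/SAMPLE_NAME/{TAXON_ID, SCIENTIFIC_NAME, COMMON_NAME}. The TAXONOMY dictionary is a hypothetical stand-in for a real INSDC taxonomy lookup. Rule 1 is interpreted here, as an assumption, to mean that a sample whose taxon identifier was previously assigned must still provide one. The status inheritance in rule 3 is not modelled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical stand-in for an INSDC taxonomy lookup:
# TAXON_ID -> (SCIENTIFIC_NAME, COMMON_NAME).
TAXONOMY = {"9606": ("Homo sapiens", "human")}
NAME_TAGS = ("TAXON_ID", "SCIENTIFIC_NAME", "COMMON_NAME")


def validate_sample(sample: ET.Element, previous_taxon_id: Optional[str] = None) -> list[str]:
    """Rules 1-2; previous_taxon_id is the id already assigned, if any."""
    errors = []
    name = sample.find("SAMPLE_NAME")
    values = {}
    for tag in NAME_TAGS:
        node = name.find(tag) if name is not None else None
        values[tag] = (node.text or "").strip() if node is not None else ""
        if not values[tag]:
            errors.append(f"<{tag}> must be provided")

    taxon = values["TAXON_ID"]
    if previous_taxon_id and not taxon:
        errors.append("an assigned TAXON_ID cannot be unassigned")
    if taxon:
        expected = TAXONOMY.get(taxon)
        if expected is None:
            errors.append(f"unknown TAXON_ID {taxon}")
        else:
            # Only compare names that were actually provided.
            for tag, want in zip(NAME_TAGS[1:], expected):
                if values[tag] and values[tag] != want:
                    errors.append(f"<{tag}> {values[tag]!r} does not match TAXON_ID {taxon}")
    return errors
```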

Experiment XML validation

  1. The @NOMINAL_LENGTH is mandatory for <PAIRED> experiments.
  2. If the Experiment Object being submitted refers to a public Study object, then the Experiment object will have the same status as the Study object.
  3. If <SPOT_DESCRIPTOR/READ_SPEC> with RELATIVE_ORDER is given then @follows_read_index and @precedes_read_index must be provided.
  4. <SPOT_LENGTH> must exist for ILLUMINA and for ABI_SOLID platforms.
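Rules 1, 3 and 4 for experiments are structural and can be sketched as below. The sketch assumes the SRA layout, in which <PAIRED> sits under LIBRARY_LAYOUT, <RELATIVE_ORDER> under SPOT_DESCRIPTOR/.../READ_SPEC, and the platform is a child of <PLATFORM> (e.g. <ILLUMINA>). The function name and messages are hypothetical. Rule 2 (status inheritance) is not modelled.

```python
import xml.etree.ElementTree as ET

SPOT_LENGTH_PLATFORMS = {"ILLUMINA", "ABI_SOLID"}


def validate_experiment(exp: ET.Element) -> list[str]:
    errors = []
    # Rule 1: paired libraries need @NOMINAL_LENGTH.
    for paired in exp.iter("PAIRED"):
        if not paired.get("NOMINAL_LENGTH"):
            errors.append("<PAIRED> requires @NOMINAL_LENGTH")
    # Rule 3: a relative read order must name both neighbouring reads.
    for order in exp.iter("RELATIVE_ORDER"):
        for attr in ("follows_read_index", "precedes_read_index"):
            if order.get(attr) is None:
                errors.append(f"<RELATIVE_ORDER> requires @{attr}")
    # Rule 4: ILLUMINA and ABI_SOLID experiments need <SPOT_LENGTH>.
    platform = exp.find(".//PLATFORM")
    platforms = {child.tag for child in platform} if platform is not None else set()
    needs = platforms & SPOT_LENGTH_PLATFORMS
    if needs and exp.find(".//SPOT_LENGTH") is None:
        errors.append(f"<SPOT_LENGTH> is required for {', '.join(sorted(needs))}")
    return errors
```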

Run XML validation

  1. The @run_center must be one of the pre-registered center names.
  2. The data file denoted by @filename in run.xml must be unique within a submission.
  3. The <SPOT_DESCRIPTOR>, <PLATFORM> and <PROCESSING> elements are not supported in run.xml.
  4. In <FILE>, the @checksum_method, @checksum, @filename and @filetype attributes are mandatory.
  5. The <READ_LABEL> must exist in the referenced experiment.
  6. The Run Object being submitted will get the same status as the associated Study object.
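Rules 1 to 5 for runs can be sketched over a whole <RUN_SET>, so that file names are checked for uniqueness across the submission. The inputs `registered_centers` and `read_labels` are hypothetical stand-ins for the pre-registered center names and for the READ_LABEL values declared in each referenced experiment, keyed by the <EXPERIMENT_REF> @refname (or @accession). The layout RUN/DATA_BLOCK/FILES/FILE is assumed from the SRA schema, and rule 6 (status) is not modelled.

```python
import xml.etree.ElementTree as ET

REQUIRED_FILE_ATTRS = ("filename", "filetype", "checksum_method", "checksum")
UNSUPPORTED_IN_RUN = ("SPOT_DESCRIPTOR", "PLATFORM", "PROCESSING")


def validate_runs(
    run_set: ET.Element,
    registered_centers: set[str],
    read_labels: dict[str, set[str]],
) -> list[str]:
    """Rules 1-5 for every <RUN> in one submission's run.xml."""
    errors = []
    seen_files = set()  # rule 2: @filename is unique across the whole submission
    for run in run_set.iter("RUN"):
        alias = run.get("alias")
        center = run.get("run_center")
        if center is not None and center not in registered_centers:
            errors.append(f"{alias}: unknown @run_center {center}")
        for tag in UNSUPPORTED_IN_RUN:
            if run.find(tag) is not None:  # direct children of <RUN> only
                errors.append(f"{alias}: <{tag}> is not supported")
        ref = run.find("EXPERIMENT_REF")
        key = (ref.get("refname") or ref.get("accession")) if ref is not None else None
        labels = read_labels.get(key, set())
        for file_el in run.iter("FILE"):
            missing = [a for a in REQUIRED_FILE_ATTRS if not file_el.get(a)]
            if missing:
                errors.append(f"{alias}: <FILE> is missing " + ", ".join("@" + a for a in missing))
            name = file_el.get("filename")
            if name:
                if name in seen_files:
                    errors.append(f"{alias}: duplicate @filename {name}")
                seen_files.add(name)
            for label in file_el.iter("READ_LABEL"):
                text = (label.text or "").strip()
                if text not in labels:
                    errors.append(f"{alias}: READ_LABEL {text} is not defined in the referenced experiment")
    return errors
```

Collecting every error rather than stopping at the first one mirrors how a submitter would want to fix all problems in a single pass.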