Submission updates


For EGA submitters: subscribe to the EGA submitter announcement list to receive the latest updates.


Changes to requirements for sample metadata submissions effective from 15th July 2013

'Gender', 'Donor ID' (anonymised subject identifier) and 'Phenotype' should be provided for all samples submitted to the EGA. 

Gender should be provided as 'male', 'female' or 'unknown'. Where gender is unknown due to sex chromosome aneuploidies, custom tag pairs named 'other' should be provided, giving the sex karyotype, for example, 'XXY'.
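The gender rule above can be checked before submission. The following is a minimal sketch, not an EGA tool: the function name `check_gender` is hypothetical, and exact lowercase matching is an assumption, since the fields are currently free text.

```python
# Values taken from the text above; exact, lowercase matching is an assumption.
ALLOWED_GENDERS = {"male", "female", "unknown"}


def check_gender(gender):
    """Return a list of problems found in a sample's gender value (hypothetical helper)."""
    problems = []
    if gender not in ALLOWED_GENDERS:
        problems.append(
            f"gender must be one of {sorted(ALLOWED_GENDERS)}, got {gender!r}"
        )
    return problems


# A sample with a sex chromosome aneuploidy would use 'unknown' here and
# carry the karyotype (for example 'XXY') in a custom tag named 'other'.
print(check_gender("unknown"))
print(check_gender("F"))
```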

Phenotypes should be provided for all characteristics that are critical for facilitating analysis (for example, defining your tumour and non-tumour samples and/or defining disease state using controlled ontology terms).

The EGA recommends using the Experimental Factor Ontology Database for describing your sample phenotypes.

Gender, Donor ID and Phenotype are currently optional free text fields, with a view to the fields being mandatory in the near future.

Gender and Phenotype fields may be subject to controlled vocabulary in the future.

Submission documentation has been updated accordingly.


Changes to Analysis Schema effective from 15th July 2013



  • Allowed values for EXPERIMENT_TYPE are:
    • Whole genome sequencing
    • Exome sequencing
    • Genotyping by array
    • Curation
  • Added 'readme_file' filetype used for SEQUENCE_VARIATION and REFERENCE_ALIGNMENT.
  • Added 'vcf_aggregate' filetype used only for SEQUENCE_VARIATION.
  • Added 'tabix' filetype used only for SEQUENCE_VARIATION.
  • File suffix for 'vcf' and 'vcf_aggregate' filetypes must be '.vcf.gz'.
  • File suffix for 'tabix' filetype must be '.tbi'.
  • Only one file of type 'vcf' or 'vcf_aggregate' is allowed in an analysis.
  • Added 'other' filetype used only for SEQUENCE_VARIATION.
  • Any number of files with 'other' filetype are allowed in an analysis.
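The file suffix and file count rules above can be sketched as a local pre-submission check. This is an illustration under stated assumptions, not the Webin validator: the function `check_analysis_files` and its input shape (a list of filename and filetype pairs) are hypothetical.

```python
# Required suffix per filetype, as stated in the rules above.
REQUIRED_SUFFIX = {
    "vcf": ".vcf.gz",
    "vcf_aggregate": ".vcf.gz",
    "tabix": ".tbi",
}


def check_analysis_files(files):
    """Check (filename, filetype) pairs for one analysis; return a list of problems.

    Hypothetical helper: only the suffix and count rules from the text are checked.
    """
    problems = []
    for name, filetype in files:
        suffix = REQUIRED_SUFFIX.get(filetype)
        if suffix is not None and not name.endswith(suffix):
            problems.append(f"{name}: filetype '{filetype}' requires suffix '{suffix}'")
    # At most one 'vcf' or 'vcf_aggregate' file per analysis.
    vcf_like = [name for name, filetype in files if filetype in ("vcf", "vcf_aggregate")]
    if len(vcf_like) > 1:
        problems.append("only one file of type 'vcf' or 'vcf_aggregate' is allowed in an analysis")
    return problems


files = [
    ("calls.vcf.gz", "vcf"),
    ("calls.vcf.gz.tbi", "tabix"),
    ("notes.txt", "other"),
    ("extra.txt", "other"),  # any number of 'other' files is allowed
]
print(check_analysis_files(files))
```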


SAMPLE_PHENOTYPE analysis type

  • Added new analysis type: SAMPLE_PHENOTYPE. 
  • Only allowed for EGA submissions.
  • Added 'phenotype_file' filetype to be used only for SAMPLE_PHENOTYPE. 
  • Each SAMPLE_PHENOTYPE analysis must have one 'phenotype_file'.
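The 'phenotype_file' requirement can be expressed as a one-line check. In this sketch the function name is hypothetical, and "must have one" is read as exactly one file, which is an assumption about the rule's intent.

```python
def has_one_phenotype_file(filetypes):
    """Return True if the list of filetypes contains exactly one 'phenotype_file'.

    Hypothetical helper; 'exactly one' is an assumed reading of the rule.
    """
    return filetypes.count("phenotype_file") == 1


print(has_one_phenotype_file(["phenotype_file"]))
print(has_one_phenotype_file([]))
```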

New XML schema effective from 15th July 2013: SRA.analysis.xsd

The Webin interface will be updated to accommodate these changes.


Changes to dataset schema effective from 15th July 2013

  • DATASET_TYPE is mandatory.
  • Allowed values for DATASET_TYPE are:
    • Whole genome sequencing
    • Exome sequencing
    • Genotyping by array
    • Transcriptome profiling by high-throughput sequencing
    • Transcriptome profiling by array
    • Amplicon sequencing
    • Methylation binding domain sequencing
    • Methylation profiling by high-throughput sequencing
    • Phenotype information
    • Study summary information
    • Genomic variant calling
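A dataset's DATASET_TYPE can be compared against the list above before submission. The sketch below is illustrative only: the helper name is hypothetical, it covers just the values listed for the 15th July 2013 schema, and exact string matching is an assumption.

```python
# Allowed DATASET_TYPE values as listed above for the 15th July 2013 schema.
ALLOWED_DATASET_TYPES = frozenset({
    "Whole genome sequencing",
    "Exome sequencing",
    "Genotyping by array",
    "Transcriptome profiling by high-throughput sequencing",
    "Transcriptome profiling by array",
    "Amplicon sequencing",
    "Methylation binding domain sequencing",
    "Methylation profiling by high-throughput sequencing",
    "Phenotype information",
    "Study summary information",
    "Genomic variant calling",
})


def is_allowed_dataset_type(value):
    """Return True if value exactly matches a listed DATASET_TYPE (hypothetical helper)."""
    return value in ALLOWED_DATASET_TYPES


print(is_allowed_dataset_type("Exome sequencing"))
print(is_allowed_dataset_type("exome"))
```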

New XML schema effective from 15th July 2013: EGA.dataset.xsd

The Webin interface will be updated to accommodate these changes.


Changes to SRA XML Schema on 11th of August 2014


Added the following: 

  • New platform OXFORD_NANOPORE
  • 'MinION' as instrument_model for platform OXFORD_NANOPORE
  • 'GridION' as instrument_model for platform OXFORD_NANOPORE 
  • 'HiSeq X Ten' as instrument_model for platform ILLUMINA  
  • 'NextSeq 500' as instrument_model for platform ILLUMINA  
  • 'Illumina HiSeq 1500' as instrument_model for platform ILLUMINA
  • New library strategy 'RAD-Seq': RAD (Restriction site Associated DNA) Sequencing is a method for sampling the genomes of multiple individuals in a population using next generation DNA sequencing.
  • New library selection 'Oligo-dT': selects primarily messenger RNA, which is polyadenylated, so these transcripts can be captured with oligo-dT beads (mRNA-seq).
  • New library selection 'Inverse rRNA selection': removes the ribosomal transcripts by inverse selection; they are captured by annealing with specific oligos bound to beads and then discarded (total RNA-seq).


  • Added EXPERIMENT_REF as a new optional element 
  • For REFERENCE_ALIGNMENT and SEQUENCE_VARIATION the ASSEMBLY element is made optional
  • For REFERENCE_ALIGNMENT and SEQUENCE_VARIATION the SEQUENCE element is made optional
  • Added 'transcriptomics' to SEQUENCE_VARIATION/EXPERIMENT_TYPE element


  • Made DATASET_TYPE optional.
  • Made TITLE mandatory.
  • Added 'Chromatin accessibility profiling by high-throughput sequencing' as DATASET_TYPE
  • Added 'Histone modification profiling by high-throughput sequencing' as DATASET_TYPE


  • Added 'main_contact' as attribute for CONTACT: accepts boolean value