Multiple sample BAM reads

The metadata required for multiple sample BAM reads must be submitted using a combination of Webin and XML submission to the REST server. This page describes the guidelines for that workflow.

Large-scale and/or frequent submitters may wish to consider submitting all their metadata programmatically to our REST server.

If only multi-sample BAM or CRAM alignment files are submitted, without the original unaligned FASTQ files, then please ensure that the BAM or CRAM files also contain the unaligned reads.  This is critical to enable primary re-analysis and re-alignment of the dataset using new tools or future genome assemblies.
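
As a rough way to check this before upload, the sketch below counts unmapped reads in SAM-format text, for example the output of `samtools view -h` run on your BAM or CRAM file. It relies only on the SAM flag bit 0x4 (read unmapped); the function name `count_unmapped` and the example records are illustrative and not part of any EGA tool.

```python
def count_unmapped(sam_lines):
    """Return (total, unmapped) read counts for an iterable of SAM text lines."""
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        total += 1
        if flag & 0x4:  # 0x4 = read unmapped
            unmapped += 1
    return total, unmapped


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
total, unmapped = count_unmapped(example)
print(f"{unmapped} of {total} reads are unmapped")  # prints "1 of 2 reads are unmapped"
```

A count of zero unmapped reads does not prove that reads were discarded, but it is a good prompt to confirm how the alignment file was produced.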

**Metadata submitted as XML or through the Webin tool will be made publicly available to view on the EGA website and other EBI resource/partner websites**



The metadata objects required for read submissions are as follows, with the accession prefix for each given in brackets:

Study (EGAS): Information about the sequencing study

Samples (EGAN): Information about the sequencing samples

Analysis (EGAZ): References the BAM file associated with the samples; each analysis can reference multiple samples that are linked to a single file

DAC (EGAC): Contains information about the Data Access Committee (DAC)

Policy (EGAP): Contains the Data Access Agreement (DAA); associated with DAC

Dataset (EGAD): Contains the collection of analysis data files to be subject to controlled access; associated with Policy
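
When tracking accessions in your own scripts, the prefixes above can be used to tell object types apart. The following is a minimal sketch; it assumes that an accession is a prefix followed by 11 digits, which is inferred from the placeholder accessions shown on this page rather than from a formal specification. The function name and the example accession are hypothetical.

```python
import re

# Prefixes as listed above.
ACCESSION_PREFIXES = {
    "EGAS": "Study",
    "EGAN": "Sample",
    "EGAZ": "Analysis",
    "EGAC": "DAC",
    "EGAP": "Policy",
    "EGAD": "Dataset",
}

# Assumption: four-letter prefix followed by exactly 11 digits.
_ACCESSION_PATTERN = re.compile(r"^(EGA[A-Z])(\d{11})$")


def classify_accession(accession):
    """Return the object type for an accession, or None if it is not recognised."""
    match = _ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return None
    return ACCESSION_PREFIXES.get(match.group(1))


# Placeholder accession, not a real record.
print(classify_accession("EGAZ00000000001"))
```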

**Study, samples, DAC and policy metadata can all be registered prior to uploading files**

**The Analysis object must be submitted as an analysis XML to the REST server; all other objects may be submitted using Webin**


1) Register your Study, Samples, DAC and Policy using Webin

Go to the EGA Webin page and log in using your submission account name and password.  

Components must be registered individually, and can be registered in any order. 

Study (EGAS)
Samples (EGAN)
Data Access Committee (DAC) (EGAC)
Data access policy (EGAP)


Register your Study (EGAS)

  • Go to the New Submission tab
  • Choose Register study (project), click Next and complete the web form
  • Click submit to accession your study

To use the study accession number in a publication, we suggest the following format:

"Sequence data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGASXXXXXXXXXXX."


Register your Samples (EGAN)

All samples must have 'gender', 'subject_id' and 'phenotype' attributes.

gender should be described as 'male', 'female' or 'unknown'.  If 'unknown' due to a known sex chromosome aneuploidy, please create a user-defined attribute called 'Sex chromosome karyotype' and add the appropriate value, for example, 'XXY'.

subject_id should be a de-identified subject handle.  If unknown, please add 'unknown' to the field.

Phenotypes should, where possible, be an Experimental Factor Ontology accession.  If a term cannot be found to describe your phenotype, please use free text.  All sample phenotypes considered important for further analysis of the data should be provided (for example, tumour type).  Additional phenotype attributes can be created by defining your own attributes; use the notation 'phenotype2', 'phenotype3', etc.
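
If you prepare sample records in bulk, it can help to check them against the rules above before registering them. The following is a minimal sketch, assuming each sample is held as a plain dictionary of attribute names to values; the function name `validate_sample` and the example values are hypothetical, and this is not how Webin itself validates samples.

```python
REQUIRED_ATTRIBUTES = ("gender", "subject_id", "phenotype")
ALLOWED_GENDERS = {"male", "female", "unknown"}


def validate_sample(attributes):
    """Return a list of problems found in one sample's attributes (empty if none)."""
    problems = []
    for name in REQUIRED_ATTRIBUTES:
        # Treat absent or blank values as missing.
        if not str(attributes.get(name, "")).strip():
            problems.append(f"missing required attribute '{name}'")
    gender = attributes.get("gender")
    if gender and gender not in ALLOWED_GENDERS:
        problems.append(f"gender must be one of {sorted(ALLOWED_GENDERS)}, got '{gender}'")
    return problems


# Illustrative sample only; all values are made up.
sample = {
    "gender": "unknown",
    "Sex chromosome karyotype": "XXY",
    "subject_id": "subject-001",
    "phenotype": "breast carcinoma",
    "phenotype2": "stage II",
}
print(validate_sample(sample))                 # no problems expected
print(validate_sample({"gender": "M"}))        # reports missing fields and the invalid gender
```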

  • Go to the New Submission tab
  • Choose Register samples and click Next







Register your Data Access Committee (DAC) (EGAC)

Further information on the role of your DAC can be found here.

  • Go to the New Submission tab
  • Choose Register Data Access Committee (DAC), click Next and follow the online prompts


Register your Data access policy (EGAP)

Your Data access policy provides the terms and conditions of data use; this is also referred to as the Data Access Agreement (DAA).

Completion of a DAA by the applicant/s should form part of the application process to the Data Access Committee (DAC).

  • Go to the New Submission tab
  • Choose Register Data access policy, click Next and follow the online prompts


2) Submit your Submission and Analysis (EGAZ) XML to the REST server

Webin does not currently support the submission of Analysis objects (aligned BAM files).  We are working on adding this functionality to Webin, but in the meantime all submitters must complete an Analysis XML and upload it to the REST server.

Below you will find a step-by-step guide to the process.  Please contact the EGA Helpdesk should you require additional support.

i) Prepare a Submission XML and an Analysis XML - click on the links to be taken to a description and example of each XML.  The latest XML schemas can be found here.

ii) Upload your Submission XML and Analysis XML to the REST production server:

   **The field marked 'Location in the drop box' can be left blank**

iii) Upon successful submission to REST you will obtain analysis accessions (EGAZXXXXXXXXXXX) for use in your dataset. Be sure to keep a copy of these accessions for use later.
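
As a rough illustration of step i), the sketch below assembles a skeleton Analysis XML in which one BAM file is linked to a study and to several samples. The element and attribute names (`ANALYSIS_SET`, `ANALYSIS`, `STUDY_REF`, `SAMPLE_REF`, `ANALYSIS_TYPE`, `REFERENCE_ALIGNMENT`, `FILES`, `FILE`) are assumptions about the schema and must be checked against the latest published schema and the linked examples. The skeleton is deliberately incomplete (for instance, it omits the reference assembly and any checksums), and the accessions and filename are placeholders.

```python
import xml.etree.ElementTree as ET


def build_analysis_xml(alias, study_accession, sample_accessions, bam_filename):
    """Return a skeleton Analysis XML string (element names are assumed, not verified)."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)

    # One study reference, then one sample reference per sample in the BAM file.
    ET.SubElement(analysis, "STUDY_REF", accession=study_accession)
    for accession in sample_accessions:
        ET.SubElement(analysis, "SAMPLE_REF", accession=accession)

    # Placeholder for the alignment details; required child elements are omitted here.
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_ALIGNMENT")

    # The single file that all referenced samples share.
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(files, "FILE", filename=bam_filename, filetype="bam")

    return ET.tostring(analysis_set, encoding="unicode")


# Placeholder accessions and filename, for illustration only.
xml_text = build_analysis_xml(
    alias="my_multisample_alignment",
    study_accession="EGAS00000000001",
    sample_accessions=["EGAN00000000001", "EGAN00000000002"],
    bam_filename="multisample.bam",
)
print(xml_text)
```

Always validate the finished file against the current schema before uploading it in step ii).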


3) Submit your dataset (EGAD) using Webin

The dataset describes the data files, defined by the run (EGARXXXXXXXXXXX) and analysis (EGAZXXXXXXXXXXX) accessions, that make up the dataset, and links the collection of data files to a specified Data Access Committee and Data access policy.

As a result, you must have registered your Analysis, Data Access Committee (DAC) and Data access policy before submitting your Dataset.

Please consider how many datasets your submission should consist of; for example, a case control study is likely to consist of at least two datasets.  In addition, we suggest that multiple datasets should be created for studies using the same samples but different sequence technologies.  Please contact the EGA Helpdesk for further assistance.
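
One way to plan this split is to group your analysis accessions by study arm and sequencing technology before registering anything. The following is a minimal sketch under that assumption; the record fields (`accession`, `group`, `technology`), the function name and all values are hypothetical and simply stand in for whatever bookkeeping you already keep.

```python
from collections import defaultdict


def plan_datasets(records):
    """Group analysis accessions into candidate datasets by (group, technology)."""
    datasets = defaultdict(list)
    for record in records:
        key = (record["group"], record["technology"])
        datasets[key].append(record["accession"])
    return dict(datasets)


# Placeholder records, for illustration only.
records = [
    {"accession": "EGAZ00000000001", "group": "case", "technology": "exome"},
    {"accession": "EGAZ00000000002", "group": "control", "technology": "exome"},
    {"accession": "EGAZ00000000003", "group": "case", "technology": "whole genome"},
]
for (group, technology), accessions in plan_datasets(records).items():
    print(f"{group} / {technology}: {accessions}")
```

Each resulting group is a candidate dataset; whether to merge or split further remains a judgement for the submitter and the DAC.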

  • Go to the New Submission tab
  • Choose Submit Dataset and click Next
  • Select/Register Data Access Committee (DAC) and Data access policy
  • Register your dataset

  • After submitting your dataset you should contact the EGA Helpdesk to provide a release date for your dataset.

Datasets are automatically held (i.e. not released) unless they are affiliated to a study that has already been released.

**Metadata submitted as XML or through the Webin tool will be made publicly available to view on the EGA website and other EBI resource/partner websites**


What happens after the submission of a dataset?

All datasets affiliated to unreleased studies are automatically placed on hold until the authorised submitter or DAC contact instructs us to release the study.

Datasets affiliated to released studies will automatically be released.

When your study is released, the named DAC contacts will be given access to the EGA DAC admin tools to create and manage EGA accounts with access permissions to the dataset/s affiliated to the study.

Further information regarding the role of the Data Access Committee can be found here.

Finally, your data is archived within our databases and prepared for encrypted distribution upon the request of permitted EGA account holders.

We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.