Submit to EGA


Why submit your data to the EGA?

The European Genome-phenome Archive (EGA), hosted at the European Bioinformatics Institute (EBI), offers services for archiving, processing and distributing all types of potentially identifiable human genetic and phenotypic data.

1.     Data sharing policies

Journals and funders increasingly require researchers to have a data sharing plan:

Wellcome Trust's "Policy on data management and sharing"

Nature "Availability of data and materials"

Public Library of Science (PLoS) "Sharing of Materials, Methods, and Data"


2.     Experience

The EBI has run public databases that disseminate data to the wider scientific community for many years. 


3.     Security

The EGA is designed to provide an appropriate archive for data on subjects who have consented to the use of their individual genetic data for biomedical research, but not for unlimited public data release.


4.     Flexibility

Data can be submitted to the EGA prior to publication, at other significant milestones, and at study close, in accordance with the Toronto statement.


Is the EGA the right archive for my data?

The most suitable EBI archive for your data depends on the type of data you wish to submit and on whether the data requires public or controlled access. Public access is defined as complete and open access to all files submitted. Controlled access, in the context of the EGA, requires a formal application to be made to access the submitted data files.


Links for submissions to the ENA, EVA, ArrayExpress, BioSD and DGVa archives.


Should your submission be subject to controlled access?

Controlled access data is defined by the original informed consent agreements signed by the participants involved in your study. These consents prevent the derived data files from being distributed through open and public access. Controlled access data often consists of human data derived from medical research and consortium projects. All data submitted to the EGA MUST be subject to controlled access as defined by the original informed consents. If in doubt, consult the informed consent agreements that apply to your study.

Controlled access does not correspond to holding a release prior to publication.  All EBI archive resources enable you to hold a submission before publication.


How is access to submitted data controlled at the EGA?

As part of the submission process, submitted data files are packaged into datasets. Access to each dataset is controlled by a Data Access Committee (DAC), which must be registered as part of the submission process. A DAC may consist of one or more committee members who are responsible for making data access decisions in response to applications from individuals wishing to access data. A DAC may be responsible for approving access to a single dataset or to multiple datasets.

 An overview of the EGA data distribution model

Detailed information on the creation and operation of DACs can be found here.


How are data access decisions passed to the EGA?

A named individual within the Data Access Committee (DAC), referenced on the DAC Access Policy document, is given access to the EGA DAC admin tools. These tools enable EGA accounts to be created and managed with access permissions for the dataset/s that fall under the responsibility of the DAC.
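The relationship between a DAC, its datasets and approved accounts can be illustrated with a small sketch. This is only a toy model of the description above, not the EGA DAC admin tools or any EGA interface; the class name, method names, dataset identifiers and account address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccessCommittee:
    """Toy model of a DAC (hypothetical; not an EGA API)."""
    name: str
    # Identifiers of the datasets this DAC is responsible for (made up here).
    datasets: set = field(default_factory=set)
    # Maps an account to the set of dataset identifiers it may access.
    permissions: dict = field(default_factory=dict)

    def grant(self, account, dataset_id):
        """Record an approved application for one of this DAC's datasets."""
        if dataset_id not in self.datasets:
            raise ValueError(f"{dataset_id} is not controlled by {self.name}")
        self.permissions.setdefault(account, set()).add(dataset_id)

    def can_access(self, account, dataset_id):
        """Return True only if the account was approved for this dataset."""
        return dataset_id in self.permissions.get(account, set())


# One DAC responsible for two datasets; access is approved per dataset.
dac = DataAccessCommittee("Example DAC", {"DATASET_A", "DATASET_B"})
dac.grant("researcher@example.org", "DATASET_A")
```

In this sketch, approval for one dataset does not extend to the DAC's other datasets, which mirrors the per-dataset permissions described above.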


What data types can be submitted to the EGA?

Data types accepted by the EGA can be split into three categories: Sequence, Array-based and Phenotypes.


All manufacturer-specific raw data formats for the major next generation sequencing platforms are accepted, including aligned BAM files and variation files in VCF format.

All array-based technologies are accepted, which may include the raw data, intensity and analysis files, and there are no restrictions on data formats accepted. 


Are there any sample specific requirements for EGA?

All samples submitted to the EGA must include the attributes of gender, donor ID (anonymised individual identifier) and phenotype information critical for facilitating analysis (for example, defining tumour and non-tumour samples and/or defining disease state) using controlled ontology terms.

The EGA recommends using the Experimental Factor Ontology Database for describing your sample phenotypes.
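Before registering samples, it may help to check locally that each record carries the required attributes. The following is a minimal sketch under stated assumptions: the field names (`alias`, `gender`, `donor_id`, `phenotype`) and the example values are hypothetical and do not reflect the EGA metadata schema, and the plain-text phenotype values stand in for the controlled ontology terms you would use in practice.

```python
# Hypothetical field names for the three required sample attributes.
REQUIRED_ATTRIBUTES = ("gender", "donor_id", "phenotype")


def missing_sample_attributes(sample):
    """Return the required attribute names that are absent or blank."""
    missing = []
    for name in REQUIRED_ATTRIBUTES:
        value = sample.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Made-up example records; the second has an empty donor ID.
samples = [
    {"alias": "sample_1", "gender": "female",
     "donor_id": "DONOR_0001", "phenotype": "tumour"},
    {"alias": "sample_2", "gender": "male",
     "donor_id": "", "phenotype": "non-tumour"},
]

report = {s["alias"]: missing_sample_attributes(s) for s in samples}
for alias, missing in report.items():
    status = "OK" if not missing else "missing: " + ", ".join(missing)
    print(alias, status)
```

A check like this only confirms that the attributes are present; it does not validate that the phenotype values are valid ontology terms.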


Key stages for all EGA submissions

The stages below provide an overview of an EGA submission. Detailed descriptions of the key submission stages for your data type can be found on the sequence, array-based and phenotype submission pages.

Subscribe to the EGA submitter announcement list to receive EGA submission updates.




Complete a submission request form to provide details of the data type (sequence/array-based/phenotype) and estimated size of your submission.

Please inform us if your submission is associated with an existing project consortium, such as the International Cancer Genome Consortium (ICGC).

Submission, archiving and data processing leading to distribution can take several weeks.

Please contact us in advance to ensure that your data is ready for release when required.

Please note: The EGA operates a queuing system for submission processing.  As a result, one submission CANNOT be prioritised over another. 




Receive submission pack, which will include:

i)   Submission account log-in for uploading your files and registering your metadata

ii)  Template for metadata (for array-based submissions ONLY)

iii) Web links to submission documentation relevant to your submission

iv)  Submission statements and DAC Access Policy documents for completion and return to EGA Helpdesk




Prepare your files using EgaCryptor and upload the EGA-compliant files using FTP or Aspera.




Register metadata* using Webin, which may include details of your study, samples, experiments, runs/analysis, policy and dataset/s 

*metadata required is dependent on the data type submitted

 --Metadata provided will be made publicly available to view on the EGA website and other EBI resource/partner websites--




What happens after the key submission stages have been completed?

All registered studies are automatically placed on hold until the named submission or DAC contact instructs our Helpdesk to release the study.

When your study is released, the named DAC contacts will be given access to the EGA DAC Admin tools to create and manage EGA accounts with access permissions to the dataset/s associated with the study.

Data is archived within our databases and prepared for encrypted distribution to DAC-approved users.

We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.