Pathogen surveillance data submission instructions
Introduction
This page provides instructions for submitters of genome-scale pathogen sequence data to the European Nucleotide Archive (ENA). It includes a minimal checklist of information to be reported for such studies, developed in collaboration with the Global Microbial Identifier (GMI) initiative, and points users to detailed instructions for each category of submission. Its scope is submissions of data from next-generation sequencing platforms used in high-throughput, genome-scale surveys of pathogens in clinical, animal and environmental samples.
Checklist of required fields
The checklist presented here is intended as a practical aid for those preparing data for submission to the ENA. We do not claim that the information marked as mandatory is necessarily sufficient to reproduce experimental findings; the broader MIxS reporting standard framework exists for that purpose.
Broadly, a submission consists of the sequence data themselves (raw sequence reads are mandatory; assembly information is optional) and contextual data. The figure below lays out the fields, indicating which are mandatory and which are recommended. With the exception of the sequence_reads, taxon and organism_name fields, the information reported in these fields should be provided as extended attributes of the sample record (TAG:VALUE pairs), using the field names given in the figure. How to do this for each submission route is described in the submission instructions.
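As an illustration of how contextual data end up as TAG:VALUE pairs, the sketch below builds a single sample record in the XML layout used for programmatic submission (SAMPLE_SET, SAMPLE, SAMPLE_NAME, SAMPLE_ATTRIBUTES). It assumes that the taxon field maps to TAXON_ID and organism_name to SCIENTIFIC_NAME. The function name build_sample_xml and the attribute names in the usage example are hypothetical placeholders, not the field names defined in the checklist figure.

```python
import xml.etree.ElementTree as ET

# Fields that, per the checklist, are NOT reported as sample attributes.
NON_ATTRIBUTE_FIELDS = {"sequence_reads", "taxon", "organism_name"}


def build_sample_xml(alias, title, taxon_id, organism_name, attributes):
    """Return a SAMPLE_SET XML string for one sample (hypothetical helper).

    taxon_id and organism_name go into SAMPLE_NAME (assumed mapping);
    every other contextual field becomes a SAMPLE_ATTRIBUTE TAG:VALUE pair.
    """
    clash = NON_ATTRIBUTE_FIELDS & set(attributes)
    if clash:
        raise ValueError(f"not allowed as sample attributes: {sorted(clash)}")

    sample_set = ET.Element("SAMPLE_SET")
    sample = ET.SubElement(sample_set, "SAMPLE", alias=alias)
    ET.SubElement(sample, "TITLE").text = title

    name = ET.SubElement(sample, "SAMPLE_NAME")
    ET.SubElement(name, "TAXON_ID").text = str(taxon_id)
    ET.SubElement(name, "SCIENTIFIC_NAME").text = organism_name

    # Each contextual field becomes one TAG:VALUE pair.
    attrs = ET.SubElement(sample, "SAMPLE_ATTRIBUTES")
    for tag, value in attributes.items():
        attr = ET.SubElement(attrs, "SAMPLE_ATTRIBUTE")
        ET.SubElement(attr, "TAG").text = tag
        ET.SubElement(attr, "VALUE").text = str(value)

    return ET.tostring(sample_set, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical attribute names; use the names given in the checklist figure.
    print(build_sample_xml(
        alias="isolate_001",
        title="Escherichia coli isolate 001",
        taxon_id=562,
        organism_name="Escherichia coli",
        attributes={"collection_date": "2014-03-01", "country": "United Kingdom"},
    ))
```

Rejecting sequence_reads, taxon and organism_name up front mirrors the instruction that these three fields are reported elsewhere rather than as sample attributes.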
We present this checklist as a living document, which we expect to edit and update over time in response to emerging methods and practices and to community feedback. We welcome feedback at datasubs@ebi.ac.uk.
Submission routes
Both interactive and programmatic tools are available for submitting data to ENA. For submissions of raw data alone, refer to the instructions for programmatic submission or use the interactive submission tool. For submissions that include assembly information alongside raw read data, refer to the instructions for assembly submission. We welcome submission enquiries and requests for assistance at datasubs@ebi.ac.uk.
Discovery and retrieval
Data submitted as part of pathogen surveillance studies under the Global Microbial Identifier initiative are labelled with the keyword 'GMI' in ENA study records. Sample records that meet the minimal requirements for contextual data are labelled with the keyword 'GMI:MDM:<version>'. ENA search services can be used to discover and retrieve these records.
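To make the labelling scheme concrete, the sketch below filters records by these keywords once they have been retrieved. It assumes each record is a plain dictionary with hypothetical "accession" and "keywords" keys; this is not the output format of any ENA search service. The version string "1.1" in the usage example is likewise a placeholder for <version>.

```python
import re

# Matches the minimal-data-requirement keyword and captures <version>.
MDM_PATTERN = re.compile(r"^GMI:MDM:(?P<version>.+)$")


def is_gmi_study(keywords):
    """True if a study record carries the exact keyword 'GMI'."""
    return "GMI" in keywords


def mdm_versions(keywords):
    """Return every <version> found in 'GMI:MDM:<version>' keywords."""
    return [m.group("version") for k in keywords if (m := MDM_PATTERN.match(k))]


def select_mdm_samples(records):
    """Return accessions of sample records meeting the minimal requirements."""
    return [r["accession"] for r in records if mdm_versions(r.get("keywords", []))]


if __name__ == "__main__":
    # Hypothetical records, standing in for results from an ENA search.
    samples = [
        {"accession": "sample_A", "keywords": ["GMI:MDM:1.1"]},
        {"accession": "sample_B", "keywords": []},
    ]
    print(select_mdm_samples(samples))
```

Matching 'GMI' exactly keeps study-level labelling separate from the sample-level 'GMI:MDM:<version>' keyword.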


