The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. More information about ENA is available here.
The Sequence Read Archive (SRA) is the part of ENA that provides a repository for sequence read and analysis data (such as alignments and variation calls).
Data submitted to ENA is associated with taxids, either directly or indirectly via samples. Please contact firstname.lastname@example.org to request taxids for newly sequenced organisms.
Environmental samples, to which sequencing is applied without isolation of individual species or strains from the sample, are labelled in ENA with one of a range of 'environmental sequence' taxids, which relate to the sampling environment rather than to a conventional taxonomic grouping. The list of these taxids is available in the 'Navigation' tab of the following ENA Browser page: http://www.ebi.ac.uk/ena/data/view/Taxon:408169
Alternatively, the list can be obtained as a result of an ENA Advanced Search.
Which taxid should I use for a mixed community of organisms (e.g. for 16S rRNA sequencing)?
All samples of mixed communities for the purposes of metagenomics, metatranscriptomics, diversity analysis, etc., must use environmental taxids. Please refer to the previous answer for instructions on how to obtain the list of environmental taxids.
The submission process consists of two steps. First, upload only your data files into your drop box. Then submit the metadata, either interactively using SRA Webin or programmatically using SRA Rest. More information about submissions is available here.
We only accept the GZIP and BZIP2 compression formats. In particular, we do not accept 7-ZIP or TAR compressed files. More information about supported FASTQ file formats is available here.
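Before uploading, it can help to confirm that each file really is GZIP or BZIP2 compressed. The Python sketch below is not an ENA tool; it is an illustration that assumes the standard leading "magic" bytes of each format are enough to identify it. It only inspects the outermost layer, so a gzipped TAR archive (.tar.gz) is reported as "gzip" even though ENA does not accept TAR files.

```python
import gzip
import shutil
from pathlib import Path

# Leading "magic" bytes that identify each container format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def detect_compression(path):
    """Return 'gzip', 'bzip2', '7z', 'tar' or None (uncompressed or unknown).

    Only the outermost layer is checked: a .tar.gz file is reported as 'gzip'.
    """
    with open(path, "rb") as handle:
        header = handle.read(262)  # enough to reach the tar 'ustar' marker at offset 257
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    if header.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if header[257:262] == b"ustar":
        return "tar"
    return None


def gzip_file(src, dst=None):
    """Compress src with gzip and return the path of the resulting .gz file."""
    dst = Path(dst) if dst else Path(str(src) + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

A check such as `detect_compression(path) in ("gzip", "bzip2")` then flags files that still need recompressing, for example with `gzip_file`.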
ENA does not currently accept multiplexed data. Submitted data files must only contain reads from a single sample. Demultiplexing can be done using tools such as sfffile.
Further instructions for demultiplexing SFF files are available here.
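The FAQ points to sfffile for SFF data. For FASTQ files whose reads begin with an inline barcode, the idea behind demultiplexing can be sketched in Python as below. This is a swapped-in illustration rather than an ENA-endorsed method: the barcode-to-sample mapping is hypothetical, only exact prefix matches are accepted, and well-formed four-line records are assumed. Dedicated demultiplexing tools also tolerate sequencing errors in barcodes.

```python
from collections import defaultdict


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        plus = next(it)
        qual = next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def demultiplex(lines, barcodes):
    """Group FASTQ records by the barcode at the start of each read.

    `barcodes` is a hypothetical mapping of barcode sequence -> sample name.
    The barcode is trimmed from both sequence and quality; reads that match
    no barcode are collected under the key 'unassigned'.
    """
    per_sample = defaultdict(list)
    for header, seq, plus, qual in read_fastq(lines):
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                per_sample[sample].append((header, seq[n:], plus, qual[n:]))
                break
        else:
            per_sample["unassigned"].append((header, seq, plus, qual))
    return dict(per_sample)
```

Each sample's records can then be written to a separate file, compressed and submitted on its own, so that every data file contains reads from a single sample.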
I tried to upload my files using FTP but failed because of '553 Could not create file' error. What should I do?
Most command line FTP clients use active FTP data transfer, which may be blocked by local firewalls in their default configurations. Other FTP clients, such as FileZilla, use passive FTP data channels, which firewalls allow by default.
Please contact your local systems administrator to allow active FTP data transfers, or use an FTP client that supports passive FTP data transfer. Instructions on using FileZilla to upload data are available here.
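If you script your uploads, Python's standard `ftplib` module lets you choose the data-transfer mode explicitly. The sketch below assumes you already know the ENA upload host and your Webin credentials; both appear only as placeholders. `set_pasv(True)` selects passive mode, and `storbinary` switches the connection to binary mode, which keeps compressed files byte-for-byte intact.

```python
import ftplib
import os


def upload_file(ftp, local_path, passive=True):
    """Upload one file in binary mode over an already logged-in ftplib.FTP connection.

    Returns the server's reply to the STOR command.
    """
    ftp.set_pasv(passive)  # True = passive mode: the client opens the data connection
    remote_name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        # storbinary issues TYPE I (binary) before transferring the file.
        return ftp.storbinary(f"STOR {remote_name}", handle)


def upload_all(host, user, password, paths, passive=True):
    """Log in to `host` (placeholder) and upload each path; returns the server replies."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        return [upload_file(ftp, p, passive) for p in paths]
```

For example, `upload_all("<upload host>", "<Webin username>", "<password>", ["reads_1.fastq.gz"])`, with the placeholders replaced by your own details.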
You can edit your records using SRA Webin.
I have received an error notification by email about my submission: "we have found problems with one or more of your submitted files"
The email will include a table of your files and the associated errors. The table has the following columns:
FILE_NAME | ERROR | MD5 | FILE_SIZE | DATE | RUN_ID/ANALYSIS_ID
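If many files are listed, the table can be processed programmatically. The Python sketch below assumes the rows arrive as plain text in the same pipe-delimited layout as the header line above; the actual email formatting may differ, so treat `parse_error_table` as a hypothetical starting point.

```python
# Column names as listed in the notification email's header row.
COLUMNS = ["FILE_NAME", "ERROR", "MD5", "FILE_SIZE", "DATE", "RUN_ID/ANALYSIS_ID"]


def parse_error_table(text):
    """Turn a pipe-delimited error table into a list of dicts keyed by column name.

    Assumes one row per line with exactly six columns; the header row and any
    line that does not split into six cells (e.g. blank lines) are skipped.
    """
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != len(COLUMNS) or cells[0] == "FILE_NAME":
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows
```

Each parsed row's `MD5` value can then be compared with the checksum of your local copy of the file named in `FILE_NAME`.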
The most common errors in the ERROR column are "missing file" and "invalid file checksum". During your submission you provided an MD5 checksum for each file (or, if you used our ENA uploader client, it was calculated automatically and you may not have noticed).
If the file transfer is interrupted or not 100% complete, the uploaded file will no longer have the same MD5 checksum as the original file. We need to detect these cases because we cannot process broken or partial files. "missing file" simply means that your FTP area contains no file that both has the name given in the FILE_NAME column AND has an MD5 checksum matching the registered one (MD5 column).
Take one of the following two actions, depending on the cause:
- The transfer was not 100% successful and the file is corrupted. Solution: upload the file again and make sure that your FTP client is in binary mode (especially for compressed files).
- You may have registered the wrong MD5 checksum. Check your local copy of the file and compare its MD5 checksum with the one in the MD5 column of the table.
If they do not match, update the metadata for the file. You can do this in Webin by finding the run in the 'Runs' tab and selecting 'edit', or by finding the analysis (e.g. genome assembly) in the 'Analyses' tab and selecting 'edit'.
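To decide which of the two actions applies, compute the MD5 checksum of your local copy (the same value the `md5sum` command prints) and compare it with the MD5 column. A minimal Python sketch, assuming the registered checksum is copied from the table: if the checksums match, the registered value is correct and the uploaded copy is the problem; if they differ, the registered value needs updating. The `diagnose` helper and its messages are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diagnose(path, registered_md5):
    """Suggest which action applies, given the MD5 registered for `path`."""
    local = file_md5(path)
    if local == registered_md5.strip().lower():
        # Registered checksum is right, so the uploaded copy must be broken.
        return "checksums match: re-upload the file in binary mode"
    # Local file differs from what was registered, so the metadata is wrong.
    return f"checksums differ (local {local}): update the registered MD5 in Webin"
```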
For example, to find RNA-Seq read data for Bos taurus:
- Go to the Advanced Search in the ENA Browser
- Select 'Read' domain
- In the 'Taxon name' field, enter: Bos taurus
- In the 'Library strategy' field, select: RNA-Seq
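The same search can also be run programmatically. The sketch below builds a query URL for the ENA Portal API; the endpoint, the `tax_eq` and `library_strategy` query terms and the `run_accession`/`fastq_ftp` field names are assumptions taken from that API's documentation rather than from this FAQ, so check them before relying on the results. 9913 is the NCBI taxonomy ID of Bos taurus.

```python
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API search service.
PORTAL_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_read_search_url(tax_id, library_strategy, fields=("run_accession", "fastq_ftp")):
    """Build a read_run search URL mirroring the Advanced Search steps above.

    The query syntax and field names are assumptions about the Portal API.
    """
    query = f'tax_eq({tax_id}) AND library_strategy="{library_strategy}"'
    params = {
        "result": "read_run",
        "query": query,
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return f"{PORTAL_SEARCH_URL}?{urlencode(params)}"
```

`build_read_search_url(9913, "RNA-Seq")` returns a URL that can be opened in a browser or fetched with `urllib.request.urlopen` to obtain a tab-separated list of matching runs.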
Genome assembly questions
For questions related to genome assembly submission, please go here.