Frequently asked questions

Do you have a question about ENA? Below is a selection of frequently asked questions.

If you do not see your question here, please contact us.

General questions

What is ENA?

The European Nucleotide Archive (ENA) provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. More information about ENA is available here.

What is SRA?

The Sequence Read Archive (SRA) is the part of ENA that provides a repository for sequence read and analysis data (such as alignments and variation calls).

Taxonomy-related questions

How do I request a new taxid?

Data submitted to ENA is associated with taxids, either directly or indirectly via samples. To request taxids for newly sequenced organisms, please contact datasubs@ebi.ac.uk.

Which taxid should I use for environmental samples?

Environmental samples, which are sequenced without isolating individual species or strains, are labelled in ENA with one of a range of 'environmental sequence' taxids. These taxids relate to the sampling environment rather than to a conventional taxonomic grouping. The list of these taxids is available in the 'Navigation' tab of the following ENA Browser page: http://www.ebi.ac.uk/ena/data/view/Taxon:408169

Alternatively, the list can be obtained as a result from the ENA Advanced Search.

Which taxid should I use for a mixed community of organisms (e.g. for 16S rRNA sequencing)?

All samples of mixed communities for the purposes of metagenomics, metatranscriptomics, diversity analysis, etc., must use environmental taxids. Please refer to the previous answer for instructions on how to obtain the list of environmental taxids.

Submission-related questions

Can I upload my read data files and XML into my drop box?

No. The submission process consists of two steps. First, upload only your data files into your drop box. Then submit the metadata, either interactively using SRA Webin or programmatically using SRA Rest. More information about submissions is available here.

Which compression formats do you accept for Fastq files?

We only accept the GZIP and BZIP2 compression formats. In particular, we do not accept 7-Zip or TAR compressed files. More information about supported Fastq file formats is available here.
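To check a file before uploading it, you can look at its first bytes: GZIP files start with the bytes 1f 8b, BZIP2 files with the ASCII characters "BZh", 7-Zip archives with "7z" followed by bc af 27 1c, and POSIX tar archives carry "ustar" at byte offset 257. The Python sketch below classifies a file this way; the function names are our own and are not part of any ENA tool.

```python
# File signatures ("magic numbers") at the start of each format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"


def detect_compression(header: bytes) -> str:
    """Classify data from its first bytes: 'gzip', 'bzip2', '7z', 'tar' or 'unknown'."""
    if header.startswith(GZIP_MAGIC):
        return "gzip"
    if header.startswith(BZIP2_MAGIC):
        return "bzip2"
    if header.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    # POSIX (ustar) tar archives store the string "ustar" at byte offset 257.
    if header[257:262] == b"ustar":
        return "tar"
    return "unknown"


def is_accepted_fastq_compression(path: str) -> bool:
    """Return True if the file at `path` looks GZIP- or BZIP2-compressed."""
    with open(path, "rb") as handle:
        header = handle.read(512)  # enough bytes to reach the tar signature
    return detect_compression(header) in {"gzip", "bzip2"}
```

Note that a gzipped tar archive (.tar.gz) still begins with the GZIP signature, so this header check alone will not flag it, and an uncompressed Fastq file is reported as 'unknown'.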

Can I submit multiplexed read data? If not, how do I demultiplex?

ENA does not currently accept multiplexed data. Submitted data files must only contain reads from a single sample. Demultiplexing can be done using tools such as sfffile.

Further instructions for demultiplexing SFF files are available here.
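sfffile works on SFF files. For Fastq data with an inline barcode at the start of each read, the underlying idea can be sketched in Python as below. This is a minimal illustration, not a replacement for a dedicated demultiplexing tool: it assumes unwrapped four-line Fastq records, exact barcode matches and a hypothetical barcode-to-sample table, and the function names are our own.

```python
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

# One Fastq record: header, sequence, separator ('+') line, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield four-line Fastq records (assumes sequences are not line-wrapped)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def demultiplex(records: Iterable[Record],
                barcodes: Dict[str, str]) -> Dict[str, List[Record]]:
    """Bin reads by an exact barcode match at the start of the sequence.

    `barcodes` maps barcode sequence -> sample name. The barcode is trimmed
    from both the sequence and the quality string; reads without a matching
    barcode go to the 'unassigned' bin.
    """
    bins: Dict[str, List[Record]] = {name: [] for name in barcodes.values()}
    bins["unassigned"] = []
    for header, seq, sep, qual in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((header, seq[n:], sep, qual[n:]))
                break
        else:  # no barcode matched
            bins["unassigned"].append((header, seq, sep, qual))
    return bins


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in four-line Fastq format."""
    for record in records:
        handle.write("\n".join(record) + "\n")


if __name__ == "__main__":
    # Hypothetical two-sample example with 4-nt barcodes.
    fastq = io.StringIO("@r1\nACGTTTTT\n+\nIIIIIIII\n@r2\nTGCAGGGG\n+\nIIIIIIII\n")
    bins = demultiplex(read_fastq(fastq), {"ACGT": "sample_A", "TGCA": "sample_B"})
    for sample, recs in bins.items():
        print(sample, len(recs))
```

Each sample bin can then be written to its own file with write_fastq, so that every submitted file contains reads from a single sample. If one barcode is a prefix of another, the first matching entry in the table wins, so such barcode sets need extra care.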

I tried to upload my files using FTP but failed because of '553 Could not create file' error. What should I do?

Most command line FTP clients use passive FTP data transfer that may be blocked by local firewalls in their default configurations. Other FTP clients, such as FileZilla, use active FTP data channels, which are by default allowed by firewalls.

Please contact your local systems administrator to allow passive FTP data transfers, or use an FTP client that supports active FTP data transfer. Instructions on using FileZilla to upload data are available here.
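If you script your uploads, Python's standard ftplib module lets you choose the data transfer mode explicitly: it uses passive mode by default, and set_pasv(False) switches it to active mode. Its storbinary method also puts the connection into binary mode before sending. The sketch below is illustrative only: the host name, user name and password are placeholders for the values in your ENA submission instructions, and the ftp_class parameter exists only so the function can be tested without a network connection.

```python
import ftplib
import os


def upload_file(path: str, host: str, user: str, password: str,
                passive: bool = False, ftp_class=ftplib.FTP) -> None:
    """Upload one file in binary mode; active FTP unless passive=True."""
    ftp = ftp_class(host)  # connects to the server
    try:
        ftp.login(user, password)
        # False = active mode (the server opens the data connection back
        # to the client); ftplib's own default is passive mode.
        ftp.set_pasv(passive)
        with open(path, "rb") as handle:
            # storbinary sends "TYPE I" first, so the file is transferred
            # byte for byte and its md5 checksum is preserved.
            ftp.storbinary(f"STOR {os.path.basename(path)}", handle)
    finally:
        ftp.quit()
```

For example, upload_file("reads_1.fastq.gz", "webin.ebi.ac.uk", "Webin-NNNNN", password) would upload one file in active mode; the host and account name here are assumptions, so check them against the current ENA upload instructions.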

How can I edit my submitted records?

You can edit your records using SRA Webin.

I have received an error notification by email about my submission: "we have found problems with one or more of your submitted files"

The email will include a table of your files and the associated errors. The table has the following columns:
FILE_NAME | ERROR | MD5 | FILE_SIZE | DATE | RUN_ID/ANALYSIS_ID

The most common errors in the ERROR column are "missing file" and "invalid file checksum". During your submission you provided an md5 checksum for each file (or, if you used our ENA uploader client, it was calculated automatically and you may not have noticed).

If the file transfer is interrupted or incomplete, the file will no longer have the same md5sum as the original file. We need to catch these cases because we cannot process broken or partial files. "missing file" simply means that your FTP area contains no file that both has the name shown in the FILE_NAME column AND has an md5sum matching the one that was registered (MD5 column).

Depending on the cause, take one of the following two actions:

  1. The transfer was not 100% successful and the file is corrupted. Solution: upload the file again, making sure that your FTP client is in binary mode (especially for compressed files).
  2. You may have registered the wrong md5sum. Check your local copy of the file and compare the md5sum with the one in the MD5 column of the table.

    If the checksums do not match, update the metadata for the file. You can do this in Webin by finding the run in the 'Runs' tab and selecting 'edit', or by finding the analysis (e.g. a genome assembly) in the 'Analyses' tab and selecting 'edit'.

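To decide between the two actions, compute the md5 checksum of your local copy (on the command line, md5sum on Linux or md5 on macOS). The Python sketch below does the same with the standard hashlib module, reading the file in chunks so that large Fastq files do not need to fit in memory; the function names are our own. If the local checksum matches the MD5 column, the upload was most likely incomplete and action 1 applies; if it does not match, action 2 applies.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal md5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registered(path: str, registered_md5: str) -> bool:
    """Compare a local file's md5 with the value from the MD5 column."""
    return file_md5(path) == registered_md5.strip().lower()
```

For example, matches_registered("reads_1.fastq.gz", value_from_md5_column) returns True when your local copy is identical to the file you registered (the file name here is hypothetical).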

Search-related questions

How do I search for all RNA-Seq experiments from bovine samples?

  1. Go to the Advanced Search in the ENA Browser
  2. Select 'Read' domain
  3. In the 'Taxon name' field, enter: Bos taurus
  4. In the 'Library strategy' field, select: RNA-Seq
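The same search can also be run programmatically. The sketch below builds a query URL for the ENA Portal API, a REST search service that is separate from the Advanced Search page described above and may post-date this FAQ; treat the endpoint, field names and query syntax as assumptions to check against the current ENA documentation. 9913 is the NCBI taxonomy ID of Bos taurus.

```python
import urllib.parse

# Assumption: ENA Portal API search endpoint (check current ENA docs).
PORTAL_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"
BOS_TAURUS_TAXID = 9913


def rna_seq_search_url(tax_id: int = BOS_TAURUS_TAXID,
                       fields=("run_accession", "sample_accession",
                               "library_strategy"),
                       limit: int = 100) -> str:
    """Build a search URL for RNA-Seq read runs from one taxon."""
    # tax_eq matches the taxon exactly; tax_tree(...) would also include
    # subordinate taxa.
    query = f'tax_eq({tax_id}) AND library_strategy="RNA-Seq"'
    params = {
        "result": "read_run",       # search the read (run) domain
        "query": query,
        "fields": ",".join(fields),
        "format": "tsv",
        "limit": limit,             # maximum number of rows to return
    }
    return f"{PORTAL_SEARCH_URL}?{urllib.parse.urlencode(params)}"
```

Fetching the resulting URL (for example with urllib.request.urlopen) should return one tab-separated row per matching run.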

Genome assembly questions

For questions related to genome assembly submission, please go here.
