Expanded INSDC sequence accession formats in December 2018

18 Oct 2018 - 10:30

By the end of 2018, ENA and the other INSDC members will expand the accession formats used for sequences. Almost all of the accession numbers available in the current, shorter formats have already been assigned; the longer formats will open up new accession ranges and give us greater capacity.

The expanded format for Whole Genome Shotgun (WGS), Transcriptome Shotgun Assembly (TSA), and Targeted Locus Study (TLS) sequencing projects will use a six-letter prefix, followed by a two-digit version number and then 7, 8, or 9 digits. For example, in AAAAAA020000001 the prefix is AAAAAA, the version number is 02, and the remaining seven digits (0000001) identify the individual sequence.
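As an illustration only, the sketch below splits an accession in the expanded WGS/TSA/TLS layout into its three parts. It assumes uppercase letters and that any trailing ".N" version suffix has already been stripped; the names `EXPANDED_WGS` and `parse_expanded_wgs` are hypothetical, and the pattern covers only the expanded six-letter format, not the current shorter one.

```python
import re
from typing import Optional, Tuple

# Expanded WGS/TSA/TLS accession: six-letter prefix, two-digit version number,
# then a 7-, 8-, or 9-digit number identifying the sequence.
# Assumption: letters are uppercase and any ".N" suffix has been stripped.
EXPANDED_WGS = re.compile(r"([A-Z]{6})(\d{2})(\d{7,9})")


def parse_expanded_wgs(accession: str) -> Optional[Tuple[str, int, int]]:
    """Return (prefix, version, sequence_number), or None if the string does not match."""
    match = EXPANDED_WGS.fullmatch(accession)
    if match is None:
        return None
    prefix, version, number = match.groups()
    return prefix, int(version), int(number)


print(parse_expanded_wgs("AAAAAA020000001"))  # ('AAAAAA', 2, 1)
print(parse_expanded_wgs("AAAAAA02000001"))   # None: only 6 digits after the version
```

Because the version number is always exactly two digits, the split between version and sequence number is unambiguous; `fullmatch` is used so that trailing characters (including a newline) cause a rejection rather than a partial match.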

Nucleotide sequences outside WGS, TSA, and TLS projects currently use a “2+6” format: a two-letter prefix followed by six digits. This will be expanded to a “2+8” format, with eight digits after the prefix.
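A minimal sketch of a check that accepts both the current and the expanded nucleotide formats, assuming uppercase letters and no ".N" version suffix. `NUCLEOTIDE` and `nucleotide_format` are hypothetical names, the test strings are format-valid values chosen for illustration, and only the two formats named above are handled.

```python
import re
from typing import Optional

# Nucleotide accessions outside WGS/TSA/TLS: two uppercase letters followed by
# six digits (current "2+6") or eight digits (expanded "2+8").
# Assumption: any ".N" sequence-version suffix has already been removed.
NUCLEOTIDE = re.compile(r"[A-Z]{2}(\d{6}|\d{8})")


def nucleotide_format(accession: str) -> Optional[str]:
    """Return "2+6" or "2+8" for a matching accession, else None."""
    match = NUCLEOTIDE.fullmatch(accession)
    if match is None:
        return None
    return "2+6" if len(match.group(1)) == 6 else "2+8"


for acc in ["AB123456", "AB12345678", "AB1234567"]:
    print(acc, nucleotide_format(acc))
```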

Protein sequences currently use a “3+5” format: a three-letter prefix followed by five digits. This will be expanded to a “3+7” format, with seven digits after the prefix.
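The same approach works for protein accessions. This is again a sketch under stated assumptions (uppercase letters, version suffix removed), and `PROTEIN` and `is_protein_accession` are hypothetical names.

```python
import re

# Protein accessions: three uppercase letters followed by five digits
# (current "3+5") or seven digits (expanded "3+7").
# Assumption: any ".N" version suffix has already been removed.
PROTEIN = re.compile(r"[A-Z]{3}(?:\d{5}|\d{7})")


def is_protein_accession(accession: str) -> bool:
    """True if the accession matches either the 3+5 or the 3+7 protein format."""
    return PROTEIN.fullmatch(accession) is not None


print(is_protein_accession("ABC12345"))    # True  (3+5)
print(is_protein_accession("ABC1234567"))  # True  (3+7)
print(is_protein_accession("AB12345678"))  # False (2+8 nucleotide layout)
```

One consequence worth checking in existing code: the expanded nucleotide (2+8) and protein (3+7) formats are both 10 characters long, so any logic that tells them apart by total length alone would need to look at the number of leading letters instead.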

You will need to adjust any processing that parses or validates accession numbers so that it accepts these expanded formats. No pre-existing accessions will be affected by these changes. Please write to the ENA helpdesk (datasubs@ebi.ac.uk) with any questions about the new formats.
