News

1 Jul 2014 ENA release 120

Release 120 of assembled/annotated sequences from the European Nucleotide Archive (ENA) is now available on the EBI public ftp server at ftp://ftp.ebi.ac.uk/pub/databases/embl/release/ and also at ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/.

It contains 441,582,575 sequences comprising 883,726,016,221 nucleotides. You can see the full release notes at: http://bit.ly/LKFtrE.

ENA captures, preserves and presents the world's nucleotide sequence data. New content is included in ENA on a continuous basis and is distributed daily from our browser and RESTful service. The ENA assembled/annotated sequence release provides a quarterly snapshot of this important subset of ENA content.

Please note that we also provide the release in a new FTP location, ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/.
From R121 (September, 2014), this will be the primary FTP location for ENA releases.

See http://www.ebi.ac.uk/ena/about/news/change-ena-release-ftp-location-september-2014 for full details.

23 May 2014 Change to date format for advanced search

From Monday 16th June 2014 the date format supported by ENA's advanced search will change to ISO format, as shown below. Note that ISO date ranges and times will not be supported for searching, but date ranges will be given where available in the tabulated reports.
 
Single date format (for search and report):  YYYY-MM-DD
Example:  collection_date < 2014-04-01
 
Date range format (for report only):  YYYY-MM-DD/YYYY-MM-DD
Example: 2010-01-01/2010-01-31
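
As a minimal sketch of the two formats above, the Python snippet below builds a search clause with a single ISO date and splits a report value that may hold a date range. The helper names are hypothetical and not part of any ENA client; only the YYYY-MM-DD and YYYY-MM-DD/YYYY-MM-DD formats and the collection_date example come from this announcement.

```python
from datetime import date


def search_clause(field, operator, day):
    """Build a search clause with a single ISO date (YYYY-MM-DD)."""
    return f"{field} {operator} {day.isoformat()}"


def parse_report_range(text):
    """Split a report value into (start, end) dates.

    A single date is returned as a range that starts and ends on the same day.
    """
    start, _, end = text.partition("/")
    first = date.fromisoformat(start)
    last = date.fromisoformat(end) if end else first
    return first, last


print(search_clause("collection_date", "<", date(2014, 4, 1)))
print(parse_report_range("2010-01-01/2010-01-31"))
```

Ranges are only parsed here, never built for searching, because the announcement states that ranges appear in reports only.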

20 May 2014 Update to the ENA SAMPLE checklist

From 10th of June 2014 the ENA SAMPLE checklist XML will be updated and the older version will be deprecated.

The updated ENA SAMPLE checklist XML will:

(a) support the version 4.0 of the Genomic Standards Consortium standard on Minimum Information about any (x) Sequence (GSC MIxS),
     which includes a new environmental package, the GSC MIxS built environment, as well as an update to attribute tags, attribute
     definitions and recommended measurement units. Even though this is a major GSC MIxS update, it does not change the compliance
     of existing MIGS, MIMS or MIMARKS records.

(b) activate the 'GSC MIxS miscellaneous natural or artificial environment' checklist designed for description of molecular samples where
     no other GSC-defined environmental package is appropriate.

(c) introduce the 'ENA Tara Oceans' checklist designed for description of molecular samples acquired during the Tara Oceans Expedition.

(d) update the 'ENA Micro B3' checklist designed for description of molecular samples acquired during the Ocean Sampling Day (OSD) Campaign.

(e) incorporate the recent upgrade to the 'ENA GMI report' checklist designed for description of pathogen samples for the
     Global Microbial Identifier (GMI) reporting system.

13 May 2014 MD5 sequence checksums in Coding flat files

MD5 sequence checksums are planned to be included in Coding flat files on 11th of June.

The MD5 checksum will be available on the DR line, for example:

DR MD5; 5830b6060dc4ec4602fc6e5629505072.

Please contact datasubs@ebi.ac.uk if you have any questions or concerns about this change.
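
For readers who want to use the checksum, the following Python sketch reads the value from a DR MD5 line and compares it with a locally computed digest. The function names are hypothetical. The way the sequence is normalised before hashing (upper case, no whitespace) is an assumption, since the announcement does not specify it.

```python
import hashlib
import re

# Matches a line such as: "DR MD5; 5830b6060dc4ec4602fc6e5629505072."
MD5_LINE = re.compile(r"^DR\s+MD5;\s*([0-9a-f]{32})\.?\s*$")


def md5_from_dr_line(line):
    """Return the checksum from a DR MD5 line, or None for any other line."""
    match = MD5_LINE.match(line)
    return match.group(1) if match else None


def sequence_md5(sequence):
    """MD5 of a sequence string.

    Assumption: the sequence is hashed as upper-case ASCII without whitespace.
    The announcement does not state the normalisation, so verify it first.
    """
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


def checksum_matches(dr_line, sequence):
    """True when the DR MD5 value equals the locally computed digest."""
    return md5_from_dr_line(dr_line) == sequence_md5(sequence)
```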

27 Mar 2014 Retirement of legacy file report URL in July 2014

From 1st July 2014, we will be retiring the following report URLs:
http://www.ebi.ac.uk/ena/data/view/reports/sra/fastq_files/<accession>
http://www.ebi.ac.uk/ena/data/view/reports/sra/submitted_files/<accession>

In their place, you should use the new URL outlined here.

This new method of fetching file information provides flexibility as to which data are contained within the report (you decide which data you want to be returned). The new file report URL will generate the reports faster than the older URLs, and is also expected to be more reliable. The format of the report output, described here, will also undergo some changes at this point.

18 Mar 2014 Change of ENA release FTP location in September 2014

Starting from March 2014 (release 119) onwards the release of assembled/annotated sequences from the European Nucleotide Archive (ENA) is available both in a new FTP location:

ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release

as well as in the old FTP location:

ftp://ftp.ebi.ac.uk/pub/databases/embl/release

The directory structure in the new FTP location is slightly different from the old FTP location.

Starting from September 2014 (release 121) onwards the old FTP location will be symlinked to the new FTP location. Consequently, in September 2014 the directory structure in the old FTP location will change to match the one in the new FTP location.

In the new FTP location, documents and auxiliary files are made available in a separate sub-directory:

ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/doc

The CON (constructed) and STD (standard) sequences will also be in separate sub-directories:

ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/con

ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release/std
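
Scripts that mirror the release may find it convenient to build these URLs in one place. The Python sketch below does that for the three sub-directories named above. The function name and any file name passed to it are hypothetical; the layout inside each sub-directory is not described in this announcement.

```python
NEW_RELEASE_ROOT = "ftp://ftp.ebi.ac.uk/pub/databases/ena/sequence/release"

# Sub-directories named in the announcement.
SUBDIRS = {"doc", "con", "std"}


def release_url(subdir, filename=""):
    """Return the URL of a sub-directory, or of a file inside it."""
    if subdir not in SUBDIRS:
        raise ValueError(f"unknown release sub-directory: {subdir!r}")
    parts = [NEW_RELEASE_ROOT, subdir]
    if filename:
        parts.append(filename)
    return "/".join(parts)


print(release_url("std"))
```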

If you have any questions or concerns about the change please do not hesitate to contact datasubs@ebi.ac.uk.

17 Mar 2014 ENA release 119

Release 119 of assembled/annotated sequences from the European Nucleotide Archive (ENA) is now available on the EBI public ftp server at

ftp://ftp.ebi.ac.uk/pub/databases/embl/release/.

It contains 393,460,058 sequences comprising 783,467,257,469 nucleotides. You can see the full release notes at: http://bit.ly/LKFtrE.

ENA captures, preserves and presents the world's nucleotide sequence data. New content is included in ENA on a continuous basis and is distributed daily from our browser and RESTful service. The ENA assembled/annotated sequence release provides a quarterly snapshot of this important subset of ENA content.

19 Feb 2014 NGS data CRAMming, Archiving & Exploring course on April 2nd 2014

The NGS data CRAMming, Archiving & Exploring course takes place at EMBL-EBI, Cambridge on 2nd of April 2014: http://www.ebi.ac.uk/training/course/ENA_April2014

Learn about data standards, the ENA data model, compression of NGS data, large-scale data management, submission tools, data retrieval, APIs and more. Please pass this invitation on if appropriate. To avoid disappointment, please book your place as early as possible.

14 Feb 2014 CRAM 2.1

The CRAM specification has been updated to version 2.1 to allow for an "end of file" marker. The change is expected to be compatible with existing tools.

Please see the specification for more details:
https://github.com/enasequence/cramtools/blob/master/CRAMFIleFormat2.1.pdf?raw=true.

The Java implementation has been updated accordingly: https://github.com/enasequence/cramtools.

A freshly built jar file can be downloaded here: https://github.com/enasequence/cramtools/blob/master/cramtools-2.1.jar?raw=true.

Please note that for files with pre-2.1 versions there may be a warning about a missing EOF marker. For files with version 2.1 or later a missing EOF marker should cause an error. Please report any issues on the GitHub issue tracker (https://github.com/enasequence/cramtools/issues?state=open) or to the cram-dev@ebi.ac.uk mailing list.
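
The rule for a missing EOF marker can be summarised as a small decision function. This Python sketch is illustrative only: the function name is hypothetical, it is not taken from the cramtools code, and detecting the marker in a real file is out of scope here.

```python
def eof_check(version, has_eof_marker):
    """Return 'ok', 'warning' or 'error' for a CRAM file.

    version is a (major, minor) tuple, for example (2, 1).
    A missing marker is an error from version 2.1 onwards and only a
    warning for earlier versions.
    """
    if has_eof_marker:
        return "ok"
    return "error" if version >= (2, 1) else "warning"


print(eof_check((2, 0), False))
print(eof_check((2, 1), False))
```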

4 Feb 2014 Change to coding results in ENA Browser

On Thursday 6th February (between 10am and 11am), the advanced search is being updated to replace the current single coding result with two new results: one for release and one for update. This will impact searches performed on the coding and marker domains, as well as the results listed in the Taxonomy and Project portals.

For programmatic users, note that the current coding result ID (sequence_coding) will no longer be supported and in its place you should use the coding_release and/or coding_update result IDs.

29 Jan 2014 Webin authentication

From 29 January 2pm onwards submitters may log in to all of ENA's submission systems (Webin) using the same user name and password.

We recommend the use of the Webin submission account name (Webin-<number>) as the user name. However, an e-mail address or an era-drop-<number> FTP account name is also supported.

All new submitters will be instructed to use our new FTP server at webin.ebi.ac.uk (port 8021) to upload files.

Users of the Webin Data Upload application will be transparently directed to this new FTP service.

We will also continue to support file uploads using the existing era-drop-<number> accounts at ftp.sra.ebi.ac.uk.

Programmatic submitters may choose between the existing authentication method:

auth=ERA era-drop-<number> <password digest>

and a new method:

auth=ENA <user> <password>

The latter will allow authentication using the Webin-<number> account name or an e-mail address associated with the submission account.
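
A programmatic client could choose between the two methods from the form of the account name. The Python sketch below shows one way to do that. The function name is hypothetical, and the e-mail test (presence of "@") is a deliberately crude assumption rather than a real address validator.

```python
import re

ERA_DROP = re.compile(r"^era-drop-\d+$")
WEBIN = re.compile(r"^Webin-\d+$")


def auth_scheme(user):
    """Pick the auth keyword for a programmatic submission.

    era-drop-<number> accounts use ERA; Webin-<number> names and e-mail
    addresses use ENA.
    """
    if ERA_DROP.match(user):
        return "ERA"
    # Assumption: anything containing "@" is treated as an e-mail address.
    if WEBIN.match(user) or "@" in user:
        return "ENA"
    raise ValueError(f"unrecognised account name: {user!r}")
```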

Please note that password reset requests made through the interactive Webin application will not affect the password required to authenticate as the era-drop-<number> FTP user through the programmatic interface (auth=ERA) or through ftp.sra.ebi.ac.uk.

If you have any questions or experience any difficulties after the maintenance please contact us at datasubs@ebi.ac.uk.

10 Jan 2014 Change in CDS FTP products

CDS flat file and fasta FTP products are being replaced by new products. The old products will be deprecated by the end of March 2014.

We have introduced new coding flat file and fasta products in:

ftp://ftp.ebi.ac.uk/pub/databases/ena/coding

By the end of March 2014, these are intended to replace the existing CDS products in:

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds

The format of the flat files remains the same.

The primary improvement is the separation of release and daily update products to allow
faster continuous mirroring of the data.

Coding data in the most recent release is available here (updated every three months):

ftp://ftp.ebi.ac.uk/pub/databases/ena/coding/release

Coding data created or updated after the most recent release is available here (updated daily):

ftp://ftp.ebi.ac.uk/pub/databases/ena/coding/update

Please note that the suffix for flat files in the new product is: cds.gz (was dat.gz in the old product).

The suffix for fasta files in the new product is: cds.fasta.gz.
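
Mirroring scripts that select files by suffix will need to account for the new names. The Python sketch below classifies a file name using only the suffixes given above. The function name is hypothetical, and the "old product" label is only meaningful for files taken from the old CDS directory.

```python
def coding_file_kind(filename):
    """Classify a file name by the suffixes given in the announcement."""
    # Check the fasta suffix first so it is never mistaken for a flat file.
    if filename.endswith("cds.fasta.gz"):
        return "fasta (new product)"
    if filename.endswith("cds.gz"):
        return "flat file (new product)"
    if filename.endswith("dat.gz"):
        return "flat file (old product)"
    return "unknown"
```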

If you have any questions or concerns please do not hesitate to contact us on datasubs@ebi.ac.uk.

More information about FTP products is available from:

http://www.ebi.ac.uk/ena/about/sequence_download

17 Dec 2013 ENA release 118

Release 118 of assembled/annotated sequences from the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) is now available on the EBI public ftp server at ftp://ftp.ebi.ac.uk/pub/databases/embl/release/. It contains 332,944,272 sequences comprising 714,448,074,322 nucleotides. You can see the full release notes at: http://bit.ly/LKFtrE.

ENA captures, preserves and presents the world's nucleotide sequence data. New content is included in ENA on a continuous basis and is distributed daily from our browser and RESTful service. The ENA assembled/annotated sequence release provides a quarterly snapshot of this important subset of ENA content.

29 Oct 2013 First bulk CRAM submission to ENA

The first large-scale read data set in the CRAM compressed format has been submitted, undergone processing and been made public at ENA. Using CRAM in lossless mode, the submission represents a pre-publication data release from the Wellcome Trust Sanger Institute and comprises around 4,000 run records covering a number of pathogen species. Data are available for download in both CRAM and FASTQ formats. The data set in CRAM format consumes 80% of the disk space or network bandwidth for download required for its gzipped FASTQ equivalent.

An example of a study in this data set can be seen here.

Further information on CRAM sequence data compression technology is available here. The starting point for data submissions to ENA, including those in CRAM format, is here.

12 Sep 2013 ENA release 117

Release 117 of assembled/annotated sequences from the European Nucleotide Archive (ENA) is now available on the EBI public ftp server at ftp://ftp.ebi.ac.uk/pub/databases/embl/release/.
It contains 324,201,244 sequences comprising 670,004,320,378 nucleotides. You can see the full release notes at: http://bit.ly/LKFtrE.

ENA captures, preserves and presents the world's nucleotide sequence data. New content is included in ENA on a continuous basis and is distributed daily from our browser and RESTful service.
The ENA assembled/annotated sequence release provides a quarterly snapshot of this important subset of ENA content.

2 Jul 2013 Change to EMBL-Bank flat file AC lines

Use of AC lines to denote 'WGS/TSA set membership' in addition to 'record replacement' has been a frequent cause of confusion among our users. To solve this we have moved WGS/TSA master accession numbers from AC lines to DR lines.
E.g.:
Before the change:
AC CAJV010000001; CAJV010000000; CAJV000000000;

After the change:
AC CAJV010000001;
DR ENA; CAJV010000000; SET.
DR ENA; CAJV000000000; SET.
The DR line format is:

DR ENA; <WGS/TSA master accession number with set version>; SET
DR ENA; <WGS/TSA master accession number without set version>; SET
If you have any questions please do not hesitate to contact us at datasubs@ebi.ac.uk.
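
Parsers that previously read set membership from AC lines can collect it from the DR lines instead. The Python sketch below extracts the WGS/TSA master accessions from the example record shown in this item. The function name is hypothetical, and the pattern assumes accessions consist of upper-case letters and digits, as in the example.

```python
import re

# Matches: "DR ENA; CAJV010000000; SET."
SET_LINE = re.compile(r"^DR\s+ENA;\s*([A-Z0-9]+);\s*SET\.?\s*$")


def set_accessions(lines):
    """Return the WGS/TSA master accessions listed on DR ENA ... SET lines."""
    found = []
    for line in lines:
        match = SET_LINE.match(line)
        if match:
            found.append(match.group(1))
    return found


record = [
    "AC CAJV010000001;",
    "DR ENA; CAJV010000000; SET.",
    "DR ENA; CAJV000000000; SET.",
]
print(set_accessions(record))
```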

27 Jun 2013 ENA release 116

Release 116 of assembled/annotated sequences from the European Nucleotide Archive (ENA) is now available on the EBI public ftp server at ftp://ftp.ebi.ac.uk/pub/databases/embl/release/.
It contains 309,513,621 sequences comprising 616,248,681,624 nucleotides. You can see the full release notes at: http://bit.ly/LKFtrE.

ENA captures, preserves and presents the world's nucleotide sequence data. New content is included in ENA on a continuous basis and is distributed daily from our browser and RESTful service. The ENA assembled/annotated sequence release provides a quarterly snapshot of this important subset of ENA content.

13 Jun 2013 CRAM 2.0 launch
26 Mar 2013 CRAM 2.0 pre-launch

CRAM 2.0 release candidate made available and two month review period announced.

Today sees the pre-launch of CRAM 2.0 and an announcement of the expected full launch date as the 1st of June, 2013.

This pre-launch comprises the publication of a release candidate for CRAM 2.0, our sequence data compression format, its supporting software toolkit and the CRAM reference registry.

CRAM 2.0 contains numerous improvements over CRAM 1.0 made possible by active community participation. Minor modifications to the format are possible during the two-month review period that now begins.

CRAM 1.0 will be superseded by CRAM 2.0 at the time of full launch.

More information about CRAM can be found here.

13 Feb 2013 Changes to EMBL-CDS FTP products

The introduction of ENA Advanced Search and other recent service improvements have made a number of EMBL-CDS FTP products redundant. Any remaining users are asked to contact datasubs@ebi.ac.uk to discuss alternative ways to retrieve this information.

If you have any questions please do not hesitate to contact datasubs@ebi.ac.uk.

Changes to EMBL-CDS flat file format

  • Removal of CRC32 checksum from SQ lines. After this change the EMBL-CDS flat file SQ line will be identical to EMBL-Bank flat file SQ line.
  • Removal of OX line. Please use /db_xref="taxon:" in the source feature instead as in EMBL-Bank flat files.
  • Removal of /func_characterised qualifier.
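
Code that used to read the taxon from the OX line can read it from the source feature instead. The Python sketch below pulls taxon identifiers out of /db_xref="taxon:" qualifiers. The function name and the sample feature text are hypothetical and shown for illustration only.

```python
import re

TAXON = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_ids(feature_text):
    """Return every taxon identifier found in /db_xref="taxon:" qualifiers."""
    return [int(value) for value in TAXON.findall(feature_text)]


# Hypothetical source feature, laid out like a flat file feature table.
sample = 'FT   source          1..1155\nFT                   /db_xref="taxon:9606"\n'
print(taxon_ids(sample))
```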

Changes to EMBL-CDS FTP reports

The following reports will be discontinued.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/numbers.txt

This file contains the number of protein_ids and the number of unique CRC32 checksums, for example:

protein_ids# crc32#
------------ --------
    41700210 29242014

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/README.txt

The README file explains the reports in the directory.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_exon.gr.gz

A grouping by shared exons. For example:

ID   A00033.1:1..1155
CAA00001
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_gene.gr.gz

A grouping by gene name. For example:

ID   00011b03
CAR63565
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_inference.gr.gz

A grouping by /inference qualifier. For example:

ID   alignment:clustalx:2.0:insd|p29289.1, insd|p29290.1,insd|p29291.1
ACS36539
ACS36540
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_species.gr.gz

A grouping by organism name, as seen in the OS line of the entry. For example:

ID   'Azulnota' sp. 'punctana'
AEZ87687
AEZ87688
//
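
For users who still hold copies of the grouping reports, the record layout shown in the examples above (an ID line, one accession per line, then //) can be read with a few lines of Python. This is a sketch based only on those examples: the function name is hypothetical and the full format is described in the README, not here.

```python
def parse_groups(lines):
    """Parse 'ID   <key>' / accession / '//' records into a dict.

    Returns {key: [accession, ...]}.
    """
    groups = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if key is None and line.startswith("ID"):
            # First line of a record: everything after "ID" is the group key.
            key = line[2:].strip()
            groups.setdefault(key, [])
        elif line == "//":
            key = None
        elif line and key is not None:
            groups[key].append(line)
    return groups


example = [
    "ID   'Azulnota' sp. 'punctana'",
    "AEZ87687",
    "AEZ87688",
    "//",
    "ID   00011b03",
    "CAR63565",
    "//",
]
print(parse_groups(example))
```

The files themselves are gzip-compressed, so in practice the lines would come from something like gzip.open(path, "rt") from the Python standard library.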

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_characterised.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the /func_characterised qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_complete.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the CDS sequence is complete.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_experimental.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the /experiment qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_characterised.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the /func_characterised qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_complete.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the CDS sequence is complete.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_experimental.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the /experiment qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_withNs.lis.gz

List of CDS accession numbers extracted from the wgs section that contain at least one completely ambiguous codon.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_withNs.lis.gz

List of CDS accession numbers extracted from the non-wgs section that contain at least one completely ambiguous codon.

20 Nov 2012 CRAM launches...

Today sees the launch of the CRAM compression software and format. CRAM provides not only powerful compression through its lossless and lossy models, but also supports full computational access to data in compressed form.

See the full news item here and find out more about CRAM here.

9 Nov 2012 RESTful and Query builder interfaces to ENA launched in beta

New tools to discover and retrieve the world’s nucleotide sequence data have been launched today, in early beta, based on a new custom data warehouse that brings together the diverse content of ENA.

The Query builder function, a web interface for the construction of powerful queries, provides interactive access, while the RESTful interface supports programmatic calls to the warehouse. The Query builder is available from the ‘Advanced search’ tab on any search results page and is documented, along with the RESTful interface here.

These two interfaces will in due course be complemented with a third warehouse-based search interface that offers more intuitive access with much of the search power.

We encourage users to send feedback on this and other ENA services to datasubs@ebi.ac.uk.

3 Aug 2012 Approaching full production for CRAM

Since our proof-of-principle publication of reference-based sequence read data compression in 2011 (http://genome.cshlp.org/content/21/5/734), the EBI has been moving towards a production release of CRAM, a framework technology comprising file format and toolkit in which we combine highly efficient and tunable compression with a data format that is directly available for computational use (http://www.ebi.ac.uk/ena/about/cram_toolkit). Details of our plans are provided here.

13 Jul 2012 Just published: 'The future of DNA sequence archiving'

Our latest paper has been published in the inaugural issue of Gigascience. In this commentary, our starting point is the existence of a viable compression method, CRAM - currently in late beta and soon to be released in full production - in which compression can be applied with varying levels of intensity to different data sets. Our goal with the paper is to stimulate the broadest possible community discussion about exactly where the most aggressive forms of compression should be applied and where the approach should be far more cautious.

25 Apr 2012 CRAM 0.8 released

CRAM toolkit 0.8 has been released. More information, with download, installation and usage instructions, is available here.

25 Apr 2012 ENA policy relating to compression of submitted data

ENA details its policy on use of CRAM raw sequence data compression during the archiving process.

7 Mar 2012 CRAM toolkit 0.7 released

CRAM toolkit 0.7 has been released. More information, with download, installation and usage instructions, is available here.

2 Mar 2012 The future of sequence archiving

The future of sequence archiving and the role of data compression are explored in a new paper from EBI to be published in Gigascience. Planned for the first issue of the journal, the paper is available pre-publication here. In the paper, we propose a way forward through the use of a graded system in which the ease of reproduction of a sequencing-based experiment and the relative availability of a sample for resequencing are used to define the level of lossy compression to apply to the stored data.

13 Feb 2012 CRAM toolkit 0.6 released

CRAM toolkit version 0.6 has been released. More information, with download, installation and usage instructions, is available here.

20 Jan 2012 ENA training videos now available on the EMBL-EBI YouTube Channel

Train yourself on Sequence Read Archive submissions theory and practice with ENA training videos now uploaded to the EMBL-EBI YouTube Channel for simpler access. We believe that such videos are useful additions to our support services and plan to cover more areas of ENA as time goes by. Contact us at datasubs@ebi.ac.uk with suggestions of ENA services that you would like to see covered.

 

6 Jan 2012 ENA data go Galactic

Next generation sequence data from ENA's Sequence Read Archive (SRA) are now available as a data source in the Galaxy analysis system.

SRA data can be browsed and selected for upload using the Galaxy 'Get Data' tool. The 'EBI SRA' data source forwards users to ENA's pages to upload data into Galaxy.

In ENA, links are provided to Galaxy that upload data into the system and launch a session. For example, the following URL provides a list of all fastq files associated with SRA study ERP000591: http://www.ebi.ac.uk/ena/data/view/ERP000591. The Galaxy file upload links are available in the column 'Galaxy'.

4 Nov 2011 CRAM toolkit 0.5 released

CRAM toolkit version 0.5 has been released with support for several new quality budget models, a random access index and the Picard API. More information, with download, installation and usage instructions, is available here.

3 Aug 2011 CRAM toolkit 0.3 released

CRAM toolkit version 0.3 has been released with support for read quality masking (selective preservation of base quality scores). More information, with download, installation and usage instructions, is available here.

16 Mar 2011 ArchiveBAM 1.0 specification

The ArchiveBAM 1.0 specification has been published. SRA submitters are advised to submit their data using the BAM format.

The specification is available here.

11 Mar 2011 ENA User Survey 2011 now available

Have your say and help us to improve ENA in our brief survey at http://www.surveymonkey.com/s/ENA_User_Survey_2011.

7 Mar 2011 Next Generation Sequencing Workshop at EBI

Spaces still available on the SRA user course: EBI Affiliated Course - Next Generation Sequencing Workshop 2011. The course will take place at EBI on the 4th-6th April 2011.

Further details are available here.

26 Jan 2011 EMBL-EBI and Sequence Read Archive

EMBL-EBI will continue to support the Sequence Read Archive for raw data.

See here for further details.

6 Dec 2010 MIENS specification

The MIENS specification for describing marker genes was published in Nature Precedings.

The full article is available here.

9 Nov 2010 SRA article

An article describing the Sequence Read Archive (SRA) services was published in the Nucleic Acids Research 2011 Database Issue.

The full article is available here.

23 Oct 2010 ENA article

An article describing the European Nucleotide Archive (ENA) services was published in the Nucleic Acids Research 2011 Database Issue.

The full article is available here.