CRAM reference registry

The CRAM reference registry provides access to reference sequences used in CRAM files. Reference registration is intended for groups working with data that are ultimately destined for publication, while reference retrieval is intended for broad use. Reference sequences can be retrieved from the registry only by MD5 or SHA1 checksum. Please note that all sequences in the registry are fully public, but a sequence cannot be retrieved without knowing its MD5 or SHA1 checksum. The CRAM reference registry is provided free of charge. Please contact us at datasubs@ebi.ac.uk should you require write access to the registry.

Retrieval of reference sequences

Reference sequences can be retrieved by MD5 or SHA1 checksum using the following URLs:
www.ebi.ac.uk/ena/cram/md5/<hashvalue>
www.ebi.ac.uk/ena/cram/sha1/<hashvalue>
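
The URL patterns above can be exercised with a short Python sketch. The function names, the "https://" scheme and the lowercasing of the checksum are assumptions made for illustration; the page itself only gives the host and path. The format of the response body is not described here, so the sketch returns the raw bytes unchanged.

```python
from urllib.request import urlopen

# Host and path come from the URLs listed above; the "https://" scheme is an assumption.
BASE_URL = "https://www.ebi.ac.uk/ena/cram"

# Length of the hexadecimal digest for each supported checksum method.
HEX_LENGTH = {"md5": 32, "sha1": 40}


def reference_url(checksum: str, method: str = "md5") -> str:
    """Build the retrieval URL for a hexadecimal MD5 or SHA1 checksum."""
    method = method.lower()
    if method not in HEX_LENGTH:
        raise ValueError("method must be 'md5' or 'sha1'")
    # Assumption: the registry expects lowercase hexadecimal digests.
    checksum = checksum.strip().lower()
    is_hex = all(c in "0123456789abcdef" for c in checksum)
    if len(checksum) != HEX_LENGTH[method] or not is_hex:
        raise ValueError(f"not a valid {method} hex digest: {checksum!r}")
    return f"{BASE_URL}/{method}/{checksum}"


def fetch_reference(checksum: str, method: str = "md5", timeout: float = 30.0) -> bytes:
    """Download the registry response for a checksum and return the raw bytes."""
    with urlopen(reference_url(checksum, method), timeout=timeout) as response:
        return response.read()
```

Validating the digest length before making the request catches the common mistake of passing an MD5 value to the SHA1 URL, or the reverse.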

Calculation of MD5 or SHA1 checksum

Reference sequences are uppercased for checksum calculation (as in BAM). The following alphabet is supported:

Code Base
A A
C C
G G
T T
U U
R A or G
Y C, T or U
K G, T or U
M A or C
S C or G
W A, T or U
B not A (i.e. C, G, T or U)
D not C (i.e. A, G, T or U)
H not G (i.e. A, C, T or U)
V neither T nor U (i.e. A, C or G)
N A, C, G, T or U
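
As an illustration, the checksum calculation might be sketched in Python as follows. The uppercasing and the alphabet come from the text above. Removing whitespace such as line breaks before hashing is an assumption, and the function name is hypothetical.

```python
import hashlib

# The codes listed in the table above.
SUPPORTED_CODES = frozenset("ACGTURYKMSWBDHVN")


def reference_checksums(sequence: str) -> dict:
    """Return the MD5 and SHA1 hex digests of an uppercased reference sequence."""
    # Assumption: whitespace (for example line breaks from wrapped FASTA)
    # is not part of the sequence and is removed before hashing.
    normalised = "".join(sequence.split()).upper()
    unsupported = sorted(set(normalised) - SUPPORTED_CODES)
    if unsupported:
        raise ValueError(f"unsupported codes in sequence: {unsupported}")
    data = normalised.encode("ascii")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

Because the sequence is uppercased first, `reference_checksums("acgt")` and `reference_checksums("ACGT")` give the same digests.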

Programmatic submission of reference sequences

Reference sequences must be submitted in gzipped Fasta files using Analysis XML. Most common Fasta file representations are supported. Empty lines are ignored. Header lines are treated as separators between sequences but otherwise ignored. Wrapped sequences are supported. More information about programmatic submission can be found here.

An example of an acceptable Fasta file is:

>...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>...
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
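
The parsing rules described above (empty lines ignored, header lines acting only as separators, wrapped sequences joined) can be sketched in Python. This is a minimal reading of those rules and not the registry's own parser. The function name is hypothetical, and skipping a header that has no sequence lines after it is an assumption.

```python
from typing import Iterable, Iterator


def iter_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield one joined sequence per FASTA record, ignoring header text."""
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # Empty lines are ignored.
            continue
        if line.startswith(">"):
            # A header only separates sequences; its content is not used.
            if chunks:
                yield "".join(chunks)
                chunks = []
            continue
        # Wrapped sequences: consecutive lines belong to the same record.
        chunks.append(line)
    if chunks:
        yield "".join(chunks)
```

For a gzipped file, the lines can be supplied by opening it in text mode with `gzip.open(path, "rt")` from the standard library.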

The submitted Analysis XML must be of analysis type REFERENCE_SEQUENCE, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
  <ANALYSIS alias="TODO: unique name for analysis" >
    <TITLE>TODO: a title for the analysis</TITLE> 
    <ANALYSIS_TYPE>
      <REFERENCE_SEQUENCE/>
    </ANALYSIS_TYPE>
    <FILES>
      <FILE filename="TODO: FILENAME.fasta.gz" filetype="fasta" checksum_method="MD5"
        checksum="TODO: MD5 CHECKSUM"/>
    </FILES>
  </ANALYSIS>
</ANALYSIS_SET>
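
The template above could be filled in programmatically, for example with the following Python sketch. The element and attribute names are copied from the example. The function names are hypothetical, and computing the MD5 checksum over the gzipped file exactly as it will be uploaded is an assumption.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_analysis_xml(alias: str, title: str, filename: str, checksum: str) -> bytes:
    """Build a REFERENCE_SEQUENCE Analysis XML document as UTF-8 bytes."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)
    ET.SubElement(analysis, "TITLE").text = title
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_SEQUENCE")
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(
        files,
        "FILE",
        filename=filename,
        filetype="fasta",
        checksum_method="MD5",
        checksum=checksum,
    )
    return ET.tostring(analysis_set, encoding="UTF-8", xml_declaration=True)
```

A typical call passes `file_md5("FILENAME.fasta.gz")` as the `checksum` argument, so that the value in the XML matches the file being submitted.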
