CRAM reference registry

The CRAM reference registry provides access to reference sequences used in CRAM files. Registration of reference sequences is intended for groups working with data that are ultimately destined for publication, while retrieval is intended for broad use. Reference sequences can be retrieved only by MD5 or SHA1 checksum. All sequences in the registry are fully public, but a sequence cannot be retrieved without knowing its MD5 or SHA1 checksum. The CRAM reference registry is provided free of charge. Please contact us at datasubs@ebi.ac.uk should you require write access to the registry.

Retrieval of reference sequences

Reference sequences can be retrieved by MD5 or SHA1 checksum using the following URLs:

www.ebi.ac.uk/ena/cram/md5/<hashvalue>
www.ebi.ac.uk/ena/cram/sha1/<hashvalue>
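
As a minimal sketch of how a client might use these URLs, the Python snippet below builds the retrieval URL for a checksum and fetches the response. The https scheme, the function names reference_url and fetch_reference, and the idea that the response body can be read as raw bytes are assumptions made for illustration; they are not confirmed by this page.

```python
import re
import urllib.request

# Assumption: the registry is reachable over https at the host and path shown above.
BASE_URL = "https://www.ebi.ac.uk/ena/cram"

# Number of hexadecimal digits in each supported checksum type.
HEX_LENGTH = {"md5": 32, "sha1": 40}


def reference_url(checksum: str, method: str = "md5") -> str:
    """Build the retrieval URL for a checksum (hypothetical helper)."""
    method = method.lower()
    if method not in HEX_LENGTH:
        raise ValueError(f"unsupported checksum method: {method!r}")
    checksum = checksum.strip()
    pattern = r"[0-9a-fA-F]{%d}" % HEX_LENGTH[method]
    if not re.fullmatch(pattern, checksum):
        raise ValueError(f"not a valid {method} hex digest: {checksum!r}")
    return f"{BASE_URL}/{method}/{checksum}"


def fetch_reference(checksum: str, method: str = "md5", timeout: float = 30.0) -> bytes:
    """Download the response body for a checksum (hypothetical helper)."""
    url = reference_url(checksum, method)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Validating the digest length before making a request catches the common mistake of passing an MD5 value to the SHA1 URL or the reverse.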

 

Calculation of MD5 or SHA1 checksum

Reference sequences are uppercased for checksum calculation (as in BAM). The following alphabet is supported:

Code  Base
A     A
C     C
G     G
T     T
U     U
R     A or G
Y     C, T or U
K     G, T or U
M     A or C
S     C or G
W     A, T or U
B     not A (i.e. C, G, T or U)
D     not C (i.e. A, G, T or U)
H     not G (i.e. A, C, T or U)
V     neither T nor U (i.e. A, C or G)
N     A, C, G, T or U
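
The checksum calculation described above can be sketched in Python as follows. The function name sequence_checksums is hypothetical, and two details are assumptions rather than statements from this page: that whitespace such as line wrapping is removed before hashing, and that the sequence is hashed as ASCII bytes.

```python
import hashlib

# The codes listed in the alphabet table above.
SUPPORTED_CODES = frozenset("ACGTURYKMSWBDHVN")


def sequence_checksums(sequence: str) -> dict:
    """Return MD5 and SHA1 hex digests of an uppercased sequence (hypothetical helper)."""
    # Assumption: whitespace (e.g. line wrapping) is not part of the sequence.
    normalized = "".join(sequence.split()).upper()
    unsupported = set(normalized) - SUPPORTED_CODES
    if unsupported:
        raise ValueError(f"unsupported codes: {sorted(unsupported)}")
    data = normalized.encode("ascii")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

Because the sequence is uppercased first, "acgt" and "ACGT" give the same checksums.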

Programmatic submission of reference sequences

Reference sequences must be submitted in gzipped Fasta files using Analysis XML. Most common Fasta file representations are supported. Empty lines are ignored. Header lines are treated as separators between sequences but otherwise ignored. Wrapped sequences are supported. More information about programmatic submission can be found here.

An example of an acceptable Fasta file is:

>...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>...
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
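
To illustrate the parsing rules stated above (empty lines ignored, header lines used only as separators, wrapped sequences joined), here is a minimal Python sketch that reads a gzipped Fasta file. The function name read_sequences is hypothetical and the code is not the registry's own parser. One assumption: a header that is not followed by any sequence lines yields nothing.

```python
import gzip
import io
from typing import BinaryIO, Iterator, List


def read_sequences(stream: BinaryIO) -> Iterator[str]:
    """Yield each sequence in a gzipped Fasta stream as one unwrapped string."""
    parts: List[str] = []
    with gzip.open(stream, mode="rt", encoding="ascii") as text:
        for line in text:
            line = line.strip()
            if not line:
                continue  # empty lines are ignored
            if line.startswith(">"):
                # A header only separates sequences; its content is not used.
                if parts:
                    yield "".join(parts)
                    parts = []
                continue
            parts.append(line)  # wrapped lines are joined into one sequence
    if parts:
        yield "".join(parts)
```

For example, with a hypothetical file name, the sequences could be listed like this:

```python
with open("reference.fasta.gz", "rb") as handle:
    for sequence in read_sequences(handle):
        print(len(sequence))
```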

 

The submitted Analysis XML must be of analysis type REFERENCE_SEQUENCE, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
  <ANALYSIS alias="TODO: unique name for analysis">
    <TITLE>TODO: a title for the analysis</TITLE>
    <ANALYSIS_TYPE>
      <REFERENCE_SEQUENCE/>
    </ANALYSIS_TYPE>
    <FILES>
      <FILE filename="TODO: FILENAME.fasta.gz" filetype="fasta" checksum_method="MD5"
        checksum="TODO: MD5 CHECKSUM"/>
    </FILES>
  </ANALYSIS>
</ANALYSIS_SET>
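
An Analysis XML of this shape could be generated with a short Python script, sketched below. The helper names file_md5 and build_analysis_xml are hypothetical. The sketch also assumes that the checksum attribute holds the MD5 of the submitted gzipped file itself, not of the sequence it contains.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_analysis_xml(alias: str, title: str, filename: str, checksum: str) -> bytes:
    """Build a REFERENCE_SEQUENCE Analysis XML document as UTF-8 bytes."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)
    ET.SubElement(analysis, "TITLE").text = title
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_SEQUENCE")
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(
        files,
        "FILE",
        filename=filename,
        filetype="fasta",
        checksum_method="MD5",
        checksum=checksum,
    )
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(analysis_set, encoding="UTF-8", xml_declaration=True)
```

Building the document with an XML library rather than string formatting means that special characters in the alias or title are escaped automatically.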
