CRAM reference registry

The CRAM reference registry provides access to the reference sequences used in CRAM files. The reference registration component of the service is intended for groups working with data that are ultimately destined for publication, while the reference retrieval component is intended for broad use. Reference sequences can be retrieved from the registry only by their MD5 or SHA1 checksum. All sequences in the registry are public, but a sequence cannot be retrieved without knowing its MD5 or SHA1 checksum. The CRAM reference registry is free to use. If you require write access to the registry, please contact us at datasubs@ebi.ac.uk.

Retrieval of reference sequences

Reference sequences can be retrieved by their MD5 or SHA1 checksum using the following URLs:

www.ebi.ac.uk/ena/cram/md5/<hashvalue>
www.ebi.ac.uk/ena/cram/sha1/<hashvalue>
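As a minimal sketch of retrieval, the Python below builds the two URL forms from a checksum and downloads the sequence with the standard library. It assumes the service is reached over HTTPS (the URLs are listed without a scheme), that digests are written as lowercase hexadecimal, and that the response body is the plain sequence text; the helper names registry_url and fetch_reference are hypothetical.

```python
import re
import urllib.request

# Assumption: the registry is served over HTTPS; the page lists the URLs without a scheme.
REGISTRY_BASE = "https://www.ebi.ac.uk/ena/cram"

# Hex digest lengths: MD5 is 128 bits (32 hex digits), SHA1 is 160 bits (40 hex digits).
DIGEST_LENGTHS = {"md5": 32, "sha1": 40}


def registry_url(algorithm: str, digest: str) -> str:
    """Build a retrieval URL for an MD5 or SHA1 digest (hypothetical helper)."""
    algorithm = algorithm.lower()
    if algorithm not in DIGEST_LENGTHS:
        raise ValueError(f"unsupported checksum type: {algorithm}")
    # Assumption: digests are sent as lowercase hexadecimal.
    digest = digest.lower()
    if not re.fullmatch(r"[0-9a-f]{%d}" % DIGEST_LENGTHS[algorithm], digest):
        raise ValueError(f"not a valid {algorithm} hex digest: {digest}")
    return f"{REGISTRY_BASE}/{algorithm}/{digest}"


def fetch_reference(algorithm: str, digest: str, timeout: float = 60.0) -> str:
    """Download a reference sequence (assumes the body is the raw sequence text)."""
    with urllib.request.urlopen(registry_url(algorithm, digest), timeout=timeout) as resp:
        return resp.read().decode("ascii")
```

For example, fetch_reference("md5", digest) could be called with the digest taken from the M5 tag of an @SQ line in a CRAM header. Validating the digest locally avoids sending requests that cannot match any sequence.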


Calculation of MD5 or SHA1 checksum

Reference sequences are converted to upper case before the checksum is calculated (as in BAM). The following alphabet is supported:

Code  Base(s)
A     A
C     C
G     G
T     T
U     U
R     A or G
Y     C, T or U
K     G, T or U
M     A or C
S     C or G
W     A, T or U
B     not A (i.e. C, G, T or U)
D     not C (i.e. A, G, T or U)
H     not G (i.e. A, C, T or U)
V     neither T nor U (i.e. A, C or G)
N     any base (A, C, G, T or U)
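The checksum calculation can be sketched in Python with hashlib. Beyond upper-casing, this sketch assumes the normalisation rule from the SAM specification for the M5 tag (characters outside the printable range '!' to '~', such as spaces and line breaks, are removed), since the page only says "as in BAM". Rejecting characters that are not in the table above is our own choice for illustration; the function names are hypothetical.

```python
import hashlib

# Upper-case IUPAC codes from the table above.
ALLOWED_CODES = frozenset("ACGTURYKMSWBDHVN")


def normalise(sequence: str) -> str:
    """Drop characters outside '!'..'~' (spaces, newlines), then upper-case.

    Assumption: this follows the SAM specification's M5 normalisation.
    """
    return "".join(ch for ch in sequence if "!" <= ch <= "~").upper()


def validate(sequence: str) -> None:
    """Raise if the normalised sequence uses codes outside the supported alphabet."""
    bad = sorted(set(sequence) - ALLOWED_CODES)
    if bad:
        raise ValueError(f"characters outside the supported alphabet: {bad}")


def reference_checksums(sequence: str) -> dict:
    """Return the hex MD5 and SHA1 digests of a normalised reference sequence."""
    norm = normalise(sequence)
    validate(norm)
    data = norm.encode("ascii")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

Because of the normalisation step, "acgt\nACGT" and "ACGTACGT" produce the same digests, which is what makes the checksum usable as a lookup key regardless of case or line wrapping.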

Programmatic submission of reference sequences

Reference sequences must be submitted as gzipped Fasta files using Analysis XML. Most common Fasta file layouts are supported: empty lines are ignored, header lines (starting with '>') are treated as separators between sequences and are otherwise ignored, and wrapped (multi-line) sequences are supported. More information about programmatic submission can be found in the ENA programmatic submission documentation.

An example of an acceptable Fasta file is:

>...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>...
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
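The parsing rules above can be sketched as a small Python reader that applies them to a file like this example. It is an illustration of the described behaviour, not the registry's own parser; the function names are hypothetical, and skipping a header that has no sequence lines after it is our own choice.

```python
import gzip
from typing import Iterable, Iterator, List


def parse_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence per record from Fasta-formatted lines."""
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if line.startswith(">"):
            # A header only separates records; its text is ignored.
            if chunks:
                yield "".join(chunks)
                chunks = []
            continue
        chunks.append(line)  # wrapped sequence lines are concatenated
    if chunks:
        yield "".join(chunks)


def read_gzipped_fasta(path: str) -> List[str]:
    """Read every sequence from a gzipped Fasta file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return list(parse_fasta_lines(handle))
```

Each yielded sequence can then be checksummed individually, since every sequence in the file is registered under its own MD5 and SHA1 values.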


The submitted Analysis XML must be of analysis type REFERENCE_SEQUENCE, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
  <ANALYSIS alias="TODO: unique name for analysis" >
    <TITLE>TODO: a title for the analysis</TITLE> 
    <ANALYSIS_TYPE>
      <REFERENCE_SEQUENCE/>
    </ANALYSIS_TYPE>
    <FILES>
      <FILE filename="TODO: FILENAME.fasta.gz" filetype="fasta" checksum_method="MD5"
        checksum="TODO: MD5 CHECKSUM"/>
    </FILES>
  </ANALYSIS>
</ANALYSIS_SET>
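The TODO fields in this template can be filled in programmatically. The sketch below uses Python's xml.etree.ElementTree to produce the same element structure and computes the MD5 checksum of the file with hashlib. It assumes the checksum attribute should hold the MD5 of the gzipped file exactly as uploaded; the helper names file_md5 and reference_analysis_xml are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of the file's bytes as uploaded (here, the gzipped Fasta file itself)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def reference_analysis_xml(alias: str, title: str, filename: str, md5: str) -> str:
    """Build a REFERENCE_SEQUENCE Analysis XML document (hypothetical helper)."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)
    ET.SubElement(analysis, "TITLE").text = title
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_SEQUENCE")
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(files, "FILE", filename=filename, filetype="fasta",
                  checksum_method="MD5", checksum=md5)
    # ElementTree escapes attribute values, so aliases and titles need no manual quoting.
    body = ET.tostring(analysis_set, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The output is compact rather than indented, which is still well-formed XML. Generating the document rather than editing the template by hand keeps the filename and checksum in step with the file actually uploaded.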
