CRAM reference registry

The CRAM reference registry provides access to the reference sequences used in CRAM files. The reference registration component of the service is intended for groups working with data that are ultimately destined for publication, while the reference retrieval component is intended for broad use. Reference sequences can be retrieved from the registry only by MD5 or SHA1 checksum. All sequences in the registry are fully public, but a sequence cannot be retrieved without knowing its MD5 or SHA1 checksum. The CRAM reference registry is provided free of charge. Please contact us at datasubs@ebi.ac.uk should you require write access to the registry.

Retrieval of reference sequences

Reference sequences are retrieved by MD5 or SHA1 checksum using the following URLs, where <hashvalue> is the checksum of the sequence:

https://www.ebi.ac.uk/ena/cram/md5/<hashvalue>
https://www.ebi.ac.uk/ena/cram/sha1/<hashvalue>
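The URL patterns above can be filled in programmatically. The following is a minimal Python sketch, not an official client: the function names reference_url and fetch_reference are hypothetical, and it assumes the checksum is supplied as a lowercase hexadecimal string (32 characters for MD5, 40 for SHA1). The format of the response body is not described here, so the sketch returns it unmodified.

```python
from urllib.request import urlopen

# Base of the two retrieval URLs listed above.
BASE_URL = "https://www.ebi.ac.uk/ena/cram"

# Length of a hexadecimal digest for each supported checksum method.
HEX_LENGTH = {"md5": 32, "sha1": 40}


def reference_url(checksum: str, method: str = "md5") -> str:
    """Build the retrieval URL for a checksum (hypothetical helper)."""
    method = method.lower()
    if method not in HEX_LENGTH:
        raise ValueError("method must be 'md5' or 'sha1'")
    # Assumption: the registry is queried with lowercase hex digests.
    checksum = checksum.strip().lower()
    is_hex = all(c in "0123456789abcdef" for c in checksum)
    if len(checksum) != HEX_LENGTH[method] or not is_hex:
        raise ValueError(f"not a valid {method} hex digest: {checksum!r}")
    return f"{BASE_URL}/{method}/{checksum}"


def fetch_reference(checksum: str, method: str = "md5", timeout: float = 30.0) -> bytes:
    """Download the registry response for a checksum and return the raw bytes."""
    with urlopen(reference_url(checksum, method), timeout=timeout) as response:
        return response.read()
```

Only reference_url is free of network access; fetch_reference raises urllib.error.HTTPError if the registry does not return a successful response, for example when the checksum is not registered.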

Calculation of MD5 or SHA1 checksum

Reference sequences are converted to uppercase before the checksum is calculated (as in BAM). The following alphabet is supported:

Code Base
A A
C C
G G
T T
U U
R A or G
Y C, T or U
K G, T or U
M A or C
S C or G
W A, T or U
B not A (i.e. C, G, T or U)
D not C (i.e. A, G, T or U)
H not G (i.e. A, C, T or U)
V neither T nor U (i.e. A, C or G)
N any base (A, C, G, T or U)
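The calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the registry's own implementation: the function name reference_checksum is hypothetical, the input is assumed to be a single sequence with line breaks already removed, and rejecting codes outside the table is a choice made for this sketch.

```python
import hashlib

# The codes from the table above.
SUPPORTED_CODES = frozenset("ACGTURYKMSWBDHVN")


def reference_checksum(sequence: str, method: str = "md5") -> str:
    """Return the hex checksum of an uppercased sequence (hypothetical helper)."""
    if method not in ("md5", "sha1"):
        raise ValueError("method must be 'md5' or 'sha1'")
    if not sequence.isascii():
        raise ValueError("sequence must contain ASCII characters only")
    # Uppercase first, so 'acgt' and 'ACGT' give the same checksum.
    upper = sequence.upper()
    unsupported = sorted(set(upper) - SUPPORTED_CODES)
    if unsupported:
        # Assumption of this sketch: unsupported codes are an error.
        raise ValueError(f"unsupported codes: {''.join(unsupported)}")
    return hashlib.new(method, upper.encode("ascii")).hexdigest()
```

For example, reference_checksum("acgtn") and reference_checksum("ACGTN") return the same digest, and passing method="sha1" gives the 40-character SHA1 form.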

Programmatic submission of reference sequences

Reference sequences must be submitted in gzipped Fasta files using Analysis XML. Most common Fasta file representations are supported. Empty lines are ignored. Header lines are treated as separators between sequences but are otherwise ignored. Wrapped sequences (sequences split across several lines) are supported. More information about programmatic submission can be found here.

An example of an acceptable Fasta file is:

>...
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
>...
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
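The Fasta handling rules described above (empty lines ignored, header lines used only as separators, wrapped sequences joined) can be sketched in Python. This is a minimal illustration, not the registry's parser: the names iter_sequences and read_gzipped_fasta are hypothetical, and skipping a header that has no sequence lines after it is an assumption of this sketch.

```python
import gzip
from typing import Iterable, Iterator, List


def iter_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield one joined sequence per Fasta record (hypothetical helper)."""
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # empty lines are ignored
        if line.startswith(">"):
            # A header only separates sequences; its text is not used.
            if parts:
                yield "".join(parts)
                parts = []
            continue
        parts.append(line)  # wrapped sequence lines are concatenated
    if parts:
        yield "".join(parts)


def read_gzipped_fasta(path: str) -> List[str]:
    """Read all sequences from a gzipped Fasta file."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return list(iter_sequences(handle))
```

Applied to the example file above, iter_sequences yields two sequences, one for each header line.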

The submitted Analysis XML must be of analysis type REFERENCE_SEQUENCE, e.g.:

<?xml version="1.0" encoding="UTF-8"?>
<ANALYSIS_SET>
  <ANALYSIS alias="TODO: unique name for analysis">
    <TITLE>TODO: a title for the analysis</TITLE>
    <ANALYSIS_TYPE>
      <REFERENCE_SEQUENCE/>
    </ANALYSIS_TYPE>
    <FILES>
      <FILE filename="TODO: FILENAME.fasta.gz" filetype="fasta" checksum_method="MD5"
        checksum="TODO: MD5 CHECKSUM"/>
    </FILES>
  </ANALYSIS>
</ANALYSIS_SET>
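An Analysis XML document with the structure shown above can be generated rather than edited by hand. The Python sketch below uses only the standard library. The function names file_md5 and build_analysis_xml are hypothetical, and it assumes that the checksum attribute holds the MD5 of the submitted gzipped file itself, which is distinct from the checksum of the uppercased sequence used for retrieval.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_analysis_xml(alias: str, title: str, filename: str, checksum: str) -> bytes:
    """Build a REFERENCE_SEQUENCE Analysis XML document (hypothetical helper)."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)
    ET.SubElement(analysis, "TITLE").text = title
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_SEQUENCE")
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(
        files,
        "FILE",
        filename=filename,
        filetype="fasta",
        checksum_method="MD5",
        checksum=checksum,
    )
    # Serialise with an XML declaration, as in the example above.
    return ET.tostring(analysis_set, encoding="UTF-8", xml_declaration=True)
```

A typical call would be build_analysis_xml(alias, title, "FILENAME.fasta.gz", file_md5("FILENAME.fasta.gz")), with the alias and title chosen by the submitter.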
