CRAM reference registry
The CRAM reference registry provides access to reference sequences used in CRAM files. The registration component of the service is intended for groups working with data that are ultimately destined for publication. The retrieval component is intended for broad use. Sequences can be retrieved only by their MD5 or SHA1 checksum. All sequences in the registry are fully public, but none can be retrieved without knowing its checksum. The CRAM reference registry is free to use. Please contact us at firstname.lastname@example.org if you require write access to the registry.
Retrieval of reference sequences
Reference sequences can be retrieved by their MD5 or SHA1 checksum using the following URLs:
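The retrieval URLs are not reproduced on this page, so the following Python sketch takes the URL pattern as a parameter. `EXAMPLE_TEMPLATE` and `registry.example.org` are hypothetical placeholders. `build_url`, `matches_checksum` and `fetch_reference` are illustrative helper names. The sketch also assumes the response body is the sequence itself. It recomputes the digest (after uppercasing and dropping whitespace) to confirm that the download matches the checksum used to request it.

```python
import hashlib
import re
import urllib.request

# Hypothetical placeholder, not the registry's real address: substitute the
# retrieval URL pattern the service documents for MD5 or SHA1 lookups.
EXAMPLE_TEMPLATE = "https://registry.example.org/{algorithm}/{checksum}"

# MD5 digests are 32 hex characters, SHA1 digests are 40.
_DIGEST_PATTERN = {
    "md5": re.compile(r"[0-9a-f]{32}"),
    "sha1": re.compile(r"[0-9a-f]{40}"),
}


def build_url(template: str, algorithm: str, checksum: str) -> str:
    """Fill the template after checking the checksum is a well-formed digest."""
    algorithm, checksum = algorithm.lower(), checksum.lower()
    pattern = _DIGEST_PATTERN.get(algorithm)
    if pattern is None:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not pattern.fullmatch(checksum):
        raise ValueError(f"not a valid {algorithm} digest: {checksum}")
    return template.format(algorithm=algorithm, checksum=checksum)


def matches_checksum(sequence: bytes, algorithm: str, checksum: str) -> bool:
    """Recompute the digest over the uppercased sequence, ignoring whitespace."""
    normalized = b"".join(sequence.split()).upper()
    return hashlib.new(algorithm.lower(), normalized).hexdigest() == checksum.lower()


def fetch_reference(template: str, algorithm: str, checksum: str) -> bytes:
    """Download a sequence by checksum and reject it if the digest differs."""
    url = build_url(template, algorithm, checksum)
    with urllib.request.urlopen(url) as response:
        body = response.read()
    if not matches_checksum(body, algorithm, checksum):
        raise ValueError(f"retrieved sequence does not match {checksum}")
    return body
```

Once the template is replaced with the registry's documented MD5 or SHA1 URL pattern, `fetch_reference(template, "md5", digest)` returns the sequence bytes. With the placeholder template it will not reach the registry.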
Calculation of MD5 or SHA1 checksum
Reference sequences are uppercased before the checksum is calculated, as for the MD5 (M5) checksums of reference sequences in BAM headers. The following alphabet of IUPAC nucleotide codes is supported:
| Code | Meaning |
|---|---|
| R | A or G |
| Y | C, T or U |
| K | G, T or U |
| M | A or C |
| S | C or G |
| W | A, T or U |
| B | not A (i.e. C, G, T or U) |
| D | not C (i.e. A, G, T or U) |
| H | not G (i.e. A, C, T or U) |
| V | neither T nor U (i.e. A, C or G) |
| N | any base (A, C, G, T or U) |
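The following Python snippet sketches the calculation described above. It uppercases a sequence, checks it against the supported alphabet, and hashes it with MD5 and SHA1. Two details are assumptions and are not stated on this page. First, whitespace such as line breaks is removed before hashing, following the SAM/BAM M5 convention. Second, the plain nucleotides A, C, G, T and U are accepted alongside the ambiguity codes in the table. `reference_checksums` is a hypothetical helper name.

```python
import hashlib

# Plain nucleotides plus the IUPAC ambiguity codes from the table above.
# Accepting A, C, G, T and U explicitly is an assumption of this sketch.
ALLOWED = set("ACGTU" + "RYKMSWBDHVN")


def normalize(sequence: str) -> str:
    """Uppercase the sequence and drop whitespace such as line breaks.

    Dropping whitespace follows the SAM/BAM M5 convention; it is an
    assumption here rather than something this page states.
    """
    cleaned = "".join(sequence.split()).upper()
    unexpected = set(cleaned) - ALLOWED
    if unexpected:
        raise ValueError(f"unsupported symbols: {''.join(sorted(unexpected))}")
    return cleaned


def reference_checksums(sequence: str) -> dict:
    """Return the MD5 and SHA1 hex digests of the normalized sequence."""
    data = normalize(sequence).encode("ascii")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

Because of the normalization step, `reference_checksums("acgt\nACGT")` and `reference_checksums("ACGTACGT")` return the same digests.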
Programmatic submission of reference sequences
Reference sequences must be submitted as gzipped FASTA files using Analysis XML. Most common FASTA file layouts are supported. Empty lines are ignored. Header lines (lines starting with ">") are treated as separators between sequences but are otherwise ignored. Wrapped sequences are supported. More information about programmatic submission can be found here.
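A small Python reader can illustrate the FASTA rules above. It sketches how such a parser might behave and is not the registry's own implementation. `read_sequences` and `read_gzipped_fasta` are hypothetical names. A header that is immediately followed by another header yields no record. Letter case is left untouched, because uppercasing belongs to the checksum step.

```python
import gzip
from typing import Iterable, Iterator


def read_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence per FASTA record.

    Empty lines are skipped, header lines ('>') only separate records,
    and wrapped sequence lines are concatenated.
    """
    current = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if line.startswith(">"):
            if current:
                yield "".join(current)
            current = []  # the header text itself is discarded
            continue
        current.append(line)  # wrapped lines are joined into one sequence
    if current:
        yield "".join(current)


def read_gzipped_fasta(path: str) -> list:
    """Read a gzipped FASTA file in text mode and return its sequences."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return list(read_sequences(handle))
```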
An example of an acceptable FASTA file:

    >...
    ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
    >...
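To prepare a file in this format, the Python sketch below writes records as gzipped FASTA and computes the MD5 of the resulting file. The function names, the header text and the 60-column wrap are illustrative choices. The rules above say that header content is ignored and that wrapped sequences are accepted. The sketch assumes the MD5 needed for submission is that of the compressed `.fasta.gz` file as uploaded.

```python
import gzip
import hashlib


def write_gzipped_fasta(path: str, records: dict, width: int = 60) -> None:
    """Write {name: sequence} as a gzipped FASTA file, wrapping long lines."""
    with gzip.open(path, "wt", encoding="ascii") as handle:
        for name, sequence in records.items():
            handle.write(f">{name}\n")  # header text is ignored by the registry
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of the file bytes as stored on disk (the compressed file)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

gzip records a modification time in its header, so rewriting the same records produces a different MD5. Compute the checksum after the final write and upload that exact file.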
The submitted Analysis XML must be of analysis type REFERENCE_SEQUENCE, e.g.:
    <?xml version="1.0" encoding="UTF-8"?>
    <ANALYSIS_SET>
        <ANALYSIS alias="TODO: unique name for analysis">
            <TITLE>TODO: a title for the analysis</TITLE>
            <ANALYSIS_TYPE>
                <REFERENCE_SEQUENCE/>
            </ANALYSIS_TYPE>
            <FILES>
                <FILE filename="TODO: FILENAME.fasta.gz" filetype="fasta" checksum_method="MD5" checksum="TODO: MD5 CHECKSUM"/>
            </FILES>
        </ANALYSIS>
    </ANALYSIS_SET>
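The template above can also be generated programmatically. The Python sketch below builds the same element structure with the standard library's `xml.etree.ElementTree`. `reference_analysis_xml` is a hypothetical helper, and its arguments replace the TODO fields in the template. Pretty-printing uses `ET.indent`, which requires Python 3.9 or later and is skipped on older versions.

```python
import xml.etree.ElementTree as ET


def reference_analysis_xml(alias: str, title: str, filename: str, md5: str) -> str:
    """Build a REFERENCE_SEQUENCE Analysis XML mirroring the template above."""
    analysis_set = ET.Element("ANALYSIS_SET")
    analysis = ET.SubElement(analysis_set, "ANALYSIS", alias=alias)
    ET.SubElement(analysis, "TITLE").text = title
    analysis_type = ET.SubElement(analysis, "ANALYSIS_TYPE")
    ET.SubElement(analysis_type, "REFERENCE_SEQUENCE")  # the required analysis type
    files = ET.SubElement(analysis, "FILES")
    ET.SubElement(
        files,
        "FILE",
        filename=filename,
        filetype="fasta",
        checksum_method="MD5",
        checksum=md5,
    )
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(analysis_set)
    body = ET.tostring(analysis_set, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```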