Changes to EMBL-CDS FTP products

Date: Wed, 2013-02-13

The introduction of ENA Advanced Search and other recent service improvements have made a number of EMBL-CDS FTP products redundant. Any remaining users are asked to contact datasubs@ebi.ac.uk to discuss alternative ways to retrieve this information.

If you have any questions please do not hesitate to contact datasubs@ebi.ac.uk.

Changes to EMBL-CDS flat file format

  • Removal of the CRC32 checksum from SQ lines. After this change, the EMBL-CDS flat file SQ line will be identical to the EMBL-Bank flat file SQ line.
  • Removal of the OX line. Please use the /db_xref="taxon:" qualifier in the source feature instead, as in EMBL-Bank flat files.
  • Removal of the /func_characterised qualifier.
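
For users who currently read the organism taxon from the OX line, the following is a minimal sketch of reading it from the /db_xref="taxon:" qualifier instead. The function name taxon_ids and the sample entry fragment are illustrative assumptions, not part of any ENA tool, and the sketch assumes the qualifier sits on a single FT line.

```python
import re

# Matches a /db_xref="taxon:<id>" qualifier on a feature table (FT) line.
TAXON_RE = re.compile(r'/db_xref="taxon:(\d+)"')


def taxon_ids(flat_file_text):
    """Return the taxon IDs found on FT lines, in order of appearance."""
    ids = []
    for line in flat_file_text.splitlines():
        if line.startswith("FT"):
            match = TAXON_RE.search(line)
            if match:
                ids.append(int(match.group(1)))
    return ids


# Illustrative fragment of a source feature (hypothetical entry).
sample = '''FT   source          1..1155
FT                   /organism="Homo sapiens"
FT                   /db_xref="taxon:9606"
'''

print(taxon_ids(sample))  # [9606]
```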

Changes to EMBL-CDS FTP reports

The following reports will be discontinued.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/numbers.txt

This file contains the number of protein_ids and the number of unique CRC32 checksums, for example:

protein_ids# crc32#
------------ --------
    41700210 29242014

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/README.txt

The README file explains the reports in the directory.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_exon.gr.gz

A grouping by shared exons. For example:

ID   A00033.1:1..1155
CAA00001
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_gene.gr.gz

A grouping by gene name. For example:

ID   00011b03
CAR63565
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_inference.gr.gz

A grouping by /inference qualifier. For example:

ID   alignment:clustalx:2.0:insd|p29289.1, insd|p29290.1,insd|p29291.1
ACS36539
ACS36540
//

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/by_species.gr.gz

A grouping by organism name, as seen in the OS line of the entry. For example:

ID   'Azulnota' sp. 'punctana'
AEZ87687
AEZ87688
//
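
Judging from the examples above, the four grouping reports share one record layout: an ID line carrying the grouping key, one accession per line, and a closing // line. Users holding archived copies could read them with something like the sketch below. The function name parse_groups is hypothetical, and the layout is inferred from the examples rather than from a formal specification.

```python
def parse_groups(lines):
    """Yield (group_key, [accessions]) for each record in a grouping report."""
    key, members = None, []
    for raw in lines:
        line = raw.strip()
        if raw.startswith("ID "):
            # Start of a record: everything after the "ID" tag is the key.
            key = raw[2:].strip()
            members = []
        elif line == "//":
            # End of a record: emit it if an ID line was seen.
            if key is not None:
                yield key, members
            key, members = None, []
        elif line:
            members.append(line)


# Records copied from the examples in this announcement.
sample = """ID   'Azulnota' sp. 'punctana'
AEZ87687
AEZ87688
//
ID   00011b03
CAR63565
//
""".splitlines()

groups = dict(parse_groups(sample))
print(groups["00011b03"])  # ['CAR63565']
```

A gzip-compressed copy can be passed in directly as a text-mode file object, for example gzip.open(path, "rt") from the Python standard library.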

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_characterised.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the /func_characterised qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_complete.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the CDS sequence is complete.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_experimental.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the /experiment qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_characterised.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the /func_characterised qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_complete.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the CDS sequence is complete.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_experimental.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the /experiment qualifier is present.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_wgs_withNs.lis.gz

List of CDS accession numbers extracted from the wgs section of the database where the CDS sequence contains at least one completely ambiguous codon.

ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/lists/cds_withNs.lis.gz

List of CDS accession numbers extracted from the non-wgs section of the database where the CDS sequence contains at least one completely ambiguous codon.