
EMBL - Archived Changes

Date Changes Status
June 2008 New Transcriptome Shotgun Assembly (TSA) dataclass
From June 2008, EMBL will introduce a new dataclass for Transcriptome Shotgun Assembly (TSA) data. TSA entries will be available as part of future releases, update products and srs (srs.ebi.ac.uk) during 2008. The structure of TSA entries is similar to that of TPA entries, but with a modified AH line; instead of 'TPA_SPAN', the term 'LOCAL_SPAN' will be used.
Done
December 2008 AH line change in TPA entries
In December 2008, the AH line in TPA entries will be changed such that 'TPA_SPAN' is replaced with 'LOCAL_SPAN', making the term uniform across all dataclasses with AH lines.
Done
October 2008 Changes to /mol_type qualifier
In October 2008, a new /mol_type qualifier value, 'transcribed RNA' will be added and values 'snoRNA', 'snRNA', 'scRNA', 'pre-RNA' and 'tmRNA' will be dropped.
Done
Now and December 2008 Change to information content of AC lines
In some entries, AC lines cite not only primary or primary and secondary accession numbers, but also accession numbers of CON entries that have been assembled from the entry in which the AC line appears. To make usage of the AC line consistent, we will disallow references to CON entries henceforth and remove legacy instances by December 2008.
Done
December 2008 Removal of <er> publication type in RL lines
In December 2008, the electronic publication token in RL lines ('<er>') will become invalid. Legacy records will be converted to conventional article citations where possible.
Done
15th of October /specific_host qualifier will become /host qualifier
For clarity, the /specific_host qualifier will be replaced with a new qualifier, /host, on the 15th of October. Legacy records will be updated to reflect this change.
Done
October 2008 Removal of the /virion qualifier
The /virion qualifier will become illegal in October 2008. The /proviral qualifier will remain in use.
Done
October 2008 Change to the /frequency qualifier
In October 2008, we will permit value formats for the /frequency qualifier in addition to decimal fractions in order to represent sample size; an example of such a value format is '2 in 25'.
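
Readers who process /frequency values may want to normalise both forms to a single number. The following is a minimal sketch only: the helper name parse_frequency is hypothetical, and the sample-size form is assumed always to follow the pattern '<count> in <total>', based solely on the example '2 in 25' above.

```python
import re
from fractions import Fraction

# Assumed pattern for the sample-size form, inferred only from the example '2 in 25'.
_SAMPLE_SIZE = re.compile(r"^\s*(\d+)\s+in\s+(\d+)\s*$")


def parse_frequency(value: str) -> Fraction:
    """Return a /frequency value as an exact fraction (hypothetical helper)."""
    match = _SAMPLE_SIZE.match(value)
    if match:
        count, total = int(match.group(1)), int(match.group(2))
        if total == 0:
            raise ValueError("sample size must be non-zero")
        return Fraction(count, total)
    # Otherwise treat the value as a decimal fraction such as '0.08'.
    return Fraction(value.strip())
```

Using Fraction rather than float keeps the sample-size form exact, so '2 in 25' and '0.08' compare equal.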
Done
October 2008 Removal of /cons_splice
In October 2008, the /cons_splice qualifier will become illegal.
Done
October 2008 Expected further feature table format changes
In October 2008, we expect further minor changes to a number of qualifiers. In particular, we expect to add the new qualifier, /mating_type and to modify usage of /gene, /germline, /inference. Details of changes will be made available on this page shortly.
Done
December 2007 New citation cross-reference resource, AGRICOLA, to appear on RX line
AGRICOLA is a bibliographic database of over 4 million publications and resources encompassing all aspects of agriculture and allied disciplines, including animal and veterinary sciences, entomology, plant sciences, forestry, aquaculture and fisheries, farming and farming systems, agricultural economics, extension and education, food and human nutrition, and earth and environmental sciences. AGRICOLA is maintained by the US National Agricultural Library (NAL) of the US Department of Agriculture (USDA). Please see http://agricola.nal.usda.gov/ for more details.
Done
December 2007 Change to FTP organisation, affecting ANN dataclass entries
Annotated constructed (ANN dataclass) entries have been included in the EMBL release from release 92. To reflect this, from December 2007, the annotated_con FTP directory (ftp://ftp.ebi.ac.uk/pub/databases/embl/annotated_con) will be organised in the same way as the CON and TPA directories (ftp://ftp.ebi.ac.uk/pub/databases/embl/con and ftp://ftp.ebi.ac.uk/pub/databases/embl/tpa, respectively). Release files will no longer be present in the annotated_con directory (but rather in the release directory as rel_ann_*.dat). List and cumulative files will be present in the annotated_con directory.
Done
December 2007 Change to WGS Master flatfile CON linetype
Currently, ANN and CON accession ranges in WGS Masters are reported separately. Because ANN and CON accession numbers can be interleaved, this significantly increases the size of some WGS Masters (e.g. AACY00000000). Therefore, ANN accession ranges will be merged with CON accession ranges in a single line, starting with the text 'CON'.
Done
October 2007 New feature "tmRNA" and qualifier /tag_peptide
New feature "tmRNA" (definition: "transfer messenger RNA") is going to be introduced into the Feature Table document in October 2007, to be implemented in December 2007. "tmRNA" feature will have a new qualifier /tag_peptide.
Done
October/December 2007 New qualifiers /culture_collection and /bio_material
New qualifiers /culture_collection and /bio_material will be introduced into the Feature Table document in October 2007 and implemented in December 2007.

New qualifier structures:

/culture_collection="<institution-code>:[<collection-code>:]<culture_id>"
where the <collection-code> token is optional

/bio_material="[<institution-code>:[<collection-code>:]]<material_id>"
where the <collection-code> and <institution-code> tokens are optional
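
The two structures can be split into their tokens as sketched below. This is an illustration under stated assumptions, not a reference implementation: the names StructuredId and parse_structured_value are hypothetical, and the code assumes that no token contains a ':' character, which the structure definitions do not state.

```python
from typing import NamedTuple, Optional


class StructuredId(NamedTuple):
    institution_code: Optional[str]
    collection_code: Optional[str]
    identifier: str


def parse_structured_value(value: str, institution_required: bool) -> StructuredId:
    """Split a '<institution-code>:<collection-code>:<id>' style value.

    Set institution_required=True for /culture_collection and False for
    /bio_material. Assumes tokens never contain ':' themselves.
    """
    parts = value.strip().split(":")
    if len(parts) == 1:
        if institution_required:
            raise ValueError(f"institution code missing in {value!r}")
        return StructuredId(None, None, parts[0])
    if len(parts) == 2:
        # Two tokens: institution code plus identifier, no collection code.
        return StructuredId(parts[0], None, parts[1])
    if len(parts) == 3:
        return StructuredId(parts[0], parts[1], parts[2])
    raise ValueError(f"too many ':'-separated tokens in {value!r}")
```

For example, with the made-up value "INST:COLL:123" the function returns institution code "INST", collection code "COLL" and identifier "123".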
Done
September 2007 Annotated CON (ANN) entries will be included in the quarterly release, starting from release 92 in September 2007.
Done
September 2007 Line type change for expanded CON dataclass entries
CC lines are currently in use for the representation of assembly information in expanded CON dataclass entries. For consistency between unexpanded and expanded CON dataclass entries, assembly information will be represented in CO lines in expanded CON dataclass entries.
Done
October 2007 "old_sequence" feature becomes illegal for new entries
"old_sequence" feature becomes illegal for new entries starting from October 2007, with the new edition of the Feature Table Document
Done
October 2007 "5'clip" and "3'clip" features become illegal
"5'clip" and "3'clip" features become illegal starting from October 2007, with the new edition of the Feature Table Document. Existing instances of those features are going to be retrofitted.
Done
October 2007 /organism qualifier becomes illegal on "misc_recomb" feature
Qualifier /organism becomes illegal on "misc_recomb" feature starting from October 2007, with the new edition of the Feature Table Document
Done
October 2007 /operon qualifier becomes legal on "protein_bind" feature
Qualifier /operon becomes legal on "protein_bind" feature starting from October 2007, with the new edition of the Feature Table Document
Done
October 2007 /specimen_voucher qualifier becomes structured
The /specimen_voucher qualifier becomes structured starting from October 2007.

New qualifier structure: /specimen_voucher="[<institution-code>:[<collection-code>:]]<specimen_id>"

Where both <collection-code> and <institution-code> tokens are optional.

Because the first and second tokens are optional, no retrofit is required for existing entries.
Done
October/December 2007 New feature "ncRNA" and qualifier /ncRNA_class
New feature "ncRNA" (definition: "a non-protein-coding gene, other than ribosomal RNA and transfer RNA, the functional molecule of which is the RNA transcript") is going to be introduced into the Feature Table document in October 2007, to be implemented in December 2007. This feature replaces scRNA, snRNA, snoRNA features; it also replaces misc_RNA feature where it is currently used to annotate microRNAs. "ncRNA" feature will have mandatory qualifier /ncRNA_class.
Done
March 2007 New XML attribute
A new XML attribute, projectAccession, will be introduced into the EMBL XML entry element to hold the INSDC-assigned ID for sequencing projects.

At the same time, EMBL XML will start supporting the entry types ANN (annotated constructed entry) and TPA (Third Party Annotation entry).
Done
March 2007 March release of EMBL database - New line type for project IDs

A new line type, with the two-character line type code PR, will be introduced into EMBL flatfiles with the March release of the EMBL database. The line will contain the INSDC-assigned ID for the sequencing project.

Line structure for the PR lines:

PR Project:17285;

where "17285" is the project identifier (integer)
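
A parser can pick up the project identifier from such a line as sketched below. The function name project_id_from_pr_line is hypothetical, and the amount of whitespace after the PR line code is assumed to be flexible.

```python
import re
from typing import Optional

# Pattern follows the example 'PR Project:17285;'.
# Assumption: one or more whitespace characters separate 'PR' from 'Project:'.
_PR_LINE = re.compile(r"^PR\s+Project:(\d+);\s*$")


def project_id_from_pr_line(line: str) -> Optional[int]:
    """Return the integer project identifier, or None if this is not a PR line."""
    match = _PR_LINE.match(line)
    return int(match.group(1)) if match else None
```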
Done
December 2006 Creation of a new division - Transgenic (TGN)
A new database taxonomic division, Transgenic (TGN), will be created in the December 2006 release. Entries representing transgenic organisms (indicated by the inclusion of the /transgenic qualifier in one of the source features), currently stored in the Synthetic (SYN) division, will be stored in the new TGN division.
Done
December 2006 New qualifier /mobile_element and dropping of two existing qualifiers
New qualifier /mobile_element will be introduced in December 2006 to hold the type and the name or identifier of the mobile element described by the parent feature. At the same time, two less generic qualifiers, /transposon and /insertion_seq, will be dropped and all existing instances of them will be retrofitted to use the new qualifier.
Done
October 2006 Amino Acid Abbreviation Change
A single-letter amino acid abbreviation "O" will be used to represent pyrrolysine in the CDS translation starting from October 2006.
Done
October 2006 Usage of qualifier /operon
Qualifier /operon will become valid on the "rRNA" feature.
Done
19 June 2006 EMBL database release 87 will become public on Monday, 19th June (afternoon)
The data will be published in the FTP directory ftp://ftp.ebi.ac.uk/pub/databases/embl/release. Changes to the release file names and to the ID line (described below) will be implemented in this release. The daily data distribution will start producing files with the new-style ID line once release 87 is public; the first distribution is scheduled to start on Monday, 19th June, and the first daily files with the new-format ID line will appear on the FTP site on Tuesday, 20th June.
Done
June 2006 Release file names change

Starting from EMBL release 87 (June 2006), the naming of the release data files will change in accordance with the new ID line structure (see the relevant item). Data will be split according to data class and taxonomic division. The data file names will look as follows:

rel_dtc_tax_nn_rRN.dat

where
"dtc" is a three-letter lowercase abbreviation of the data class
"tax" is a three-letter lowercase abbreviation of the taxonomic division
"nn" is the number of the file in a particular sequence (starting from "01")
"RN" is the number of the release to which the file belongs

Examples:

rel_est_hum_01_r87.dat
rel_htg_mus_04_r87.dat

cum_est_hum_01_r87.dat
cum_htg_mus_04_r87.dat

Data class list: EST, GSS, HTC, HTG, PAT, STS, STD, TPA, CON
Taxonomic division list: HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV, INV, SYN, UNC, VRL, PHG
File size will be kept under 4 GB by regulating the number of entries in each file.

File name change doesn't affect WGS data, indexes and accompanying documentation.
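
Scripts that mirror the release can decompose the new file names as sketched below. The names ReleaseFile and parse_release_file_name are hypothetical, and the "rel" and "cum" prefixes are taken from the examples above.

```python
import re
from typing import NamedTuple


class ReleaseFile(NamedTuple):
    prefix: str        # 'rel' or 'cum', as in the examples
    dataclass: str     # e.g. 'EST'
    division: str      # e.g. 'HUM'
    file_number: int   # position of the file in its sequence, starting at 1
    release: int       # release number, e.g. 87


_NAME = re.compile(r"^(rel|cum)_([a-z]{3})_([a-z]{3})_(\d+)_r(\d+)\.dat$")


def parse_release_file_name(name: str) -> ReleaseFile:
    """Decompose a file name of the form rel_dtc_tax_nn_rRN.dat."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a new-style release file name: {name!r}")
    prefix, dtc, tax, nn, rn = match.groups()
    return ReleaseFile(prefix, dtc.upper(), tax.upper(), int(nn), int(rn))
```

The data class and division are upper-cased so that they can be compared directly with the lists above.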
Done
June 2006 ID line changes

As of release 87 (available since June 2006), the format of the EMBL flat file has changed: the ID line now has a different structure (see below) and the SV line has been removed.

The changes affecting the ID line structure are:

  • All tokens will be separated by a semicolon.
  • The entry name will not be displayed; the primary accession number will appear in its place.
  • The sequence version will be indicated.
  • The topology will be a separate token and will be indicated for both circular and linear molecules.
  • Both the data class and the taxonomic divisions will be displayed.
This is an example of the new ID line:
ID   CD789012; SV 4; linear; genomic DNA; HTG; MAM; 500 BP.
       (1)     (2)     (3)      (4)       (5)  (6)   (7)

The tokens represent:
  1. Primary accession number.
  2. 'SV' + sequence version number.
  3. Topology: 'circular' or 'linear'.
  4. Molecule type.
  5. Data class (ANN, CON, PAT, EST, GSS, HTC, HTG, MGA, WGS, TPA, STS or STD; "normal" entries will have STD for standard).
  6. Taxonomic division (HUM, MUS, ROD, PRO, MAM, VRT, FUN, PLN, ENV, INV, SYN, UNC, VRL, PHG).
  7. Sequence length + 'BP.'.
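
The seven tokens listed above can be read from a new-style ID line as sketched below. The names IdLine and parse_id_line are hypothetical, and the sketch validates only what the token list states (the 'SV' prefix, the two topology values and the 'BP.' suffix).

```python
from typing import NamedTuple


class IdLine(NamedTuple):
    accession: str
    sequence_version: int
    topology: str
    molecule_type: str
    dataclass: str
    division: str
    length_bp: int


def parse_id_line(line: str) -> IdLine:
    """Parse a new-style (release 87 onwards) ID line into its seven tokens."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    tokens = [token.strip() for token in line[2:].split(";")]
    if len(tokens) != 7:
        raise ValueError(f"expected 7 tokens, found {len(tokens)}")
    accession, sv, topology, mol_type, dataclass, division, length = tokens
    if not sv.startswith("SV "):
        raise ValueError("second token must be 'SV' followed by the version")
    if topology not in ("circular", "linear"):
        raise ValueError(f"unexpected topology: {topology!r}")
    if not length.endswith(" BP."):
        raise ValueError("last token must end with 'BP.'")
    return IdLine(accession, int(sv[3:]), topology, mol_type,
                  dataclass, division, int(length[:-4]))
```

Applied to the example line above, this yields accession CD789012, sequence version 4, topology linear, molecule type genomic DNA, data class HTG, division MAM and length 500.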

The entry name will not be displayed any more in the ID line. Since EMBL release 3 (Dec 1983) the stable identifier of an entry has been the primary accession number.

A mapping file (entryname to accession number) will be provided in the future for those entries where the entryname doesn't coincide with the accession number.

To give users a test dataset, one file with new-style ID lines called new_id_line.test.gz was provided together with the March release of the EMBL database. The file should be used for testing purposes only, i.e. the data contained in it shouldn't be considered a part of the release; the data will not be included into any of the release statistics.

In order to facilitate the changeover two small utilities were released: 'new2oldID.pl' and 'old2newID.pl'. They can be used to convert EMBL flat files from the old to the new format and vice-versa.

The converters can be picked up from ftp://ftp.ebi.ac.uk/pub/databases/embl/tools

In the same directory, a new version of SynCron tools for maintaining synchronised copies of the EMBL database updates can be found. Note : This version of SynCron will work only with the new ID line format. Please switch to it now that EMBL release 87 is public.

 

Done
April 2006 Changes to the Feature Table Document: Chapter 3.5 "Location"
  • the use of range (.) descriptor within location spans will no longer be legal from April 2006.
Done
From October 2005 Changes to the Feature Table Document: Chapter 3.5 "Location"
  • combinations of "join" and "order" operators in one location will be illegal from October 2005
  • the use of two identical location construction operators within one location will be illegal from October 2005
  • the usage of '^' will be restricted to adjacent nucleotides from October 2005
  • the use of range (.) descriptor within location spans will no longer be legal from April 2006.
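
A rough pre-check for the four rules above could look like the sketch below. It is an interpretation, not the validator used for the database: the function name location_problems is hypothetical, "identical operators" is read as "join" or "order" appearing more than once, a dot directly followed by digits and ':' is assumed to belong to an accession.version prefix of a remote location, and adjacency across the origin of a circular molecule is not handled.

```python
import re
from typing import List


def location_problems(location: str) -> List[str]:
    """Return human-readable problems found in a feature location string."""
    problems = []
    # Rule 1: "join" and "order" must not be combined in one location.
    if "join(" in location and "order(" in location:
        problems.append("join and order combined in one location")
    # Rule 2 (as interpreted here): the same operator must not appear twice.
    for operator in ("join", "order"):
        if location.count(operator + "(") > 1:
            problems.append(f"operator '{operator}' used more than once")
    # Rule 3: '^' may only sit between adjacent nucleotides.
    for left, right in re.findall(r"(\d+)\^(\d+)", location):
        if int(right) - int(left) != 1:
            problems.append(f"'{left}^{right}' is not between adjacent nucleotides")
    # Rule 4: a single '.' between digits is the retired range descriptor.
    # The lookahead skips the assumed accession.version prefix, e.g. 'J00194.1:'.
    if re.search(r"\d\.\d+(?!\d*:)", location):
        problems.append("range (.) descriptor used")
    return problems
```

An empty list means that none of these particular rules was violated; it does not mean the location is otherwise valid.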
Done
March 2006 File names change in ftp://ftp.ebi.ac.uk/pub/databases/embl/new/

Starting from the EMBL release 86 going public (March 2006) the cumulative (cumulative.dat.gz) file in:

ftp://ftp.ebi.ac.uk/pub/databases/embl/new/

will be split into smaller files according to the data class and the taxonomic division.

At the same time, daily distribution files (files with names like r##u###.dat.gz in ftp://ftp.ebi.ac.uk/pub/databases/embl/new/) will be renamed.

The page http://www.ebi.ac.uk/embl/Documentation/changesdetails.html contains further details.

Attention : the date for this change has been changed from December 2005 to March 2006
Done
March 2006 Release indices to be discontinued : March 2006 release of EMBL database

All release indices (files with names like *.ndx) apart from division.ndx are going to be discontinued starting from the March release of EMBL database. Feedback is sought from users (http://www.ebi.ac.uk/support/)
Done
March 2006 Qualifier order change : March 2006 release of EMBL database

The order in which qualifiers are printed within a feature is going to be changed in the March release of EMBL database. Further details will be posted at http://www.ebi.ac.uk/embl/Documentation/changesdetails.html
Done
March 2006 Changes - "source" feature to be added to all entries in EMBLCDS dataset
Shortly after the release 86 of the EMBL database in March 2006, all EMBLCDS entries will include the "source" feature with all the relevant biological source information derived from the parent EMBL entry. EMBLCDS dataset can be found at ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/ and is accessible via EBI SRS and webservices.
Done
Dec 2005 ORG division to be dissolved

From Dec 2005, the ORG division of the EMBL database will be dissolved. The entries that currently form the ORG division will be redirected into the appropriate taxonomic divisions. For continuity, an org.dat file will be created and placed in ftp://ftp.ebi.ac.uk/pub/databases/embl/misc/ after each release.
Done
December 2005 The following new qualifiers were introduced into the "Feature Table document" in October 2005:
/experiment
/inference
/ribosomal_slippage
/trans_splicing
/lat_lon
/collected_by
/collection_date
/identified_by
/PCR_primers
/rpt_unit_seq
/rpt_unit_range

For the complete document, see http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

The new qualifiers and other changes are going to be fully implemented in December 2005, with the next release of the EMBL database.
Done
December 2005 TPA data to be included into the release

TPA (Third Party Annotation) data will be included into the EMBL release dataset starting from December Release 85.
Please see more information about the TPA at http://www.ebi.ac.uk/embl/Documentation/third_party_annotation_dataset.html
Done
September 2005 New prefix for the RL line

A new prefix (misc) will be introduced to mark those citations where no ISSN is assigned to the publication, such as proceedings and abstracts.

Example:

RL (misc) Proc. 7th Int. Symp. Biolumin. Chemilumin. 7:142-145(1993).
Done
June release of EMBL database Weekly index files to be dropped

After the June 2005 release of the EMBL database weekly index files (files with names like DD-MMM-YYYY.NEW and file newentries.ndx in directory ftp://ftp.ebi.ac.uk/pub/databases/embl/new/) are not going to be produced any longer. If you think that your work is going to be affected by this, please write to us, using the feedback form at http://www.ebi.ac.uk/support/ (please select "EMBL" from the list of options).
Done
June release of EMBL database MEDLINE identifiers to be dropped

Starting from the June 2005 release of EMBL database, MEDLINE identifiers will only be printed in the flatfiles when no corresponding PUBMED id is available.
In the majority of cases this means that the flatfiles will contain no MEDLINE identifiers, only PUBMED ones.
Done
June release of EMBL database A new division - "ENV" (Environmental) will be created in EMBL database.

Entries that have "environmental samples" in the taxonomic lineage and/or "/environmental_sample" qualifier will be placed in this division.
Done
March release of EMBL database InterPro cross-references will be added to all of EMBL database.

Cross-referencing to InterPro is done via UniProt and UniParc.
UniParc is used to check for 100% protein sequence identity between the contents of the /translation qualifier in the EMBL entry and the sequence in the UniProt entry; cross-references to InterPro are only inherited from those UniProt entries where the sequences are identical.

Done
March release of EMBL database Secondary accession number ranges in AC line

Starting from the next release, consecutive secondary accession numbers in EMBL database flatfiles will be shown as accession number ranges.

Example

AC line that now appears:

AC Y00001; X00001; X00002; X00003; X00004; X00005;

will appear:

AC Y00001; X00001-X00005;

A mixture of ranges and single accession numbers will be possible.

AC Y00001; X00001-X00005; X00008; Z00001-Z00005;

The first item in the AC line is the primary accession number; the primary accession number of a given entry will not be displayed as a part of a range.

Note: lists of accession numbers will continue to be syntactically legal in EMBL flatfiles
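
Software that expects a plain list of accession numbers can expand the ranges again, as in the sketch below. The function name expand_ac_line is hypothetical, and the code assumes that both ends of a range share the same letter prefix and the same number of digits, as in the examples above.

```python
import re
from typing import List

# Assumption: an accession number is a run of capital letters followed by digits.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_ac_line(line: str) -> List[str]:
    """Return every accession number on an AC line, expanding any ranges."""
    if not line.startswith("AC"):
        raise ValueError("not an AC line")
    accessions = []
    for token in line[2:].split(";"):
        token = token.strip()
        if not token:
            continue
        if "-" not in token:
            accessions.append(token)
            continue
        first, last = token.split("-", 1)
        start, end = _ACCESSION.match(first), _ACCESSION.match(last)
        if (start is None or end is None
                or start.group(1) != end.group(1)
                or len(start.group(2)) != len(end.group(2))):
            raise ValueError(f"cannot expand range {token!r}")
        prefix, width = start.group(1), len(start.group(2))
        for number in range(int(start.group(2)), int(end.group(2)) + 1):
            # Zero-pad so that the expanded numbers keep their original width.
            accessions.append(f"{prefix}{number:0{width}d}")
    return accessions
```

The first element of the returned list is the primary accession number, since it always comes first on the AC line and is never part of a range.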

Done
18th Jan 2005 InterPro cross-references will be added to EMBL CDS database.
(ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/)

Cross-referencing to InterPro is done via UniProt and UniParc.
UniParc is used to check for 100% protein sequence identity between the contents of the /translation qualifier in the EMBL entry and the sequence in the UniProt entry; cross-references to InterPro are only inherited from those UniProt entries where the sequences are identical.

Done
18th Jan 2005 CDS dataset will be split into subsets

CDS dataset in ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/ will be split into two subsets : CDSs from the standard section of EMBL database and CDSs from WGS section.

The files will be named as follows:

cds.dat.gz - file containing all CDSs from the standard entries
cds_wgs.dat.gz - file containing all CDSs from WGS section

The split files will be called

cds_*.dat.gz
cds_wgs_*.dat.gz

respectively

Done
18th Jan 2005 CRC32 will be introduced into EMBL CDS flatfiles
(ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/)

CRC32 will be introduced into the SQ line in the EMBL CDS flatfiles:

SQ Sequence 714 BP; 183 A; 184 C; 184 G; 163 T; 0 other; <crc32> CRC32;

where <crc32> is the crc32 checksum

NOTE: this change concerns EMBL CDSs only. The structure of the normal EMBL flatfile remains the same.
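
Parsers of the CDS flatfiles will need to accept the extra token. The sketch below reads the SQ line shown above and checks that the base counts add up to the stated length. The function name parse_cds_sq_line is hypothetical; because the textual form of the checksum is not specified here, the checksum is returned as an unparsed string, and no attempt is made to recompute it.

```python
import re
from typing import Dict, Tuple

_SQ_LINE = re.compile(
    r"^SQ\s+Sequence\s+(\d+)\s+BP;\s+(\d+)\s+A;\s+(\d+)\s+C;\s+(\d+)\s+G;"
    r"\s+(\d+)\s+T;\s+(\d+)\s+other;\s+(\S+)\s+CRC32;\s*$"
)


def parse_cds_sq_line(line: str) -> Tuple[int, Dict[str, int], str]:
    """Return (length, base counts, checksum token) from an EMBL CDS SQ line."""
    match = _SQ_LINE.match(line)
    if match is None:
        raise ValueError("not an EMBL CDS SQ line with a CRC32 token")
    length = int(match.group(1))
    # Groups 2 to 6 hold the counts for A, C, G, T and other, in that order.
    counts = dict(zip(("A", "C", "G", "T", "other"), map(int, match.groups()[1:6])))
    if sum(counts.values()) != length:
        raise ValueError("base counts do not add up to the sequence length")
    return length, counts, match.group(7)
```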

Done


