UniProtKB/TrEMBL - Old Release Notes

N.B. From release 27, the UniProtKB/TrEMBL release notes are included in the UniProt release notes.

UniProtKB/TrEMBL Release Notes
Release 26, March 2004

EMBL Outstation
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Cambridge CB10 1SD
United Kingdom

Telephone: (+44 1223) 494 444
Fax: (+44 1223) 494 468
Electronic mail address:
WWW server:

Swiss Institute of Bioinformatics (SIB)
Centre Medical Universitaire
1, rue Michel Servet
1211 Geneva 4

Telephone: (+41 22) 702 50 50
Fax: (+41 22) 702 58 58
Electronic mail address:
WWW server:


UniProtKB/TrEMBL has been prepared by:

o Maria Jesus Martin, Claire O'Donovan, Nicola Althorpe, Rolf
Apweiler, Daniel Barrell, Kirsty Bates, Paul Browne, Kirill
Degtyarenko, Ruth Eberhardt, Gill Fraser, Alexander Fedetov, Andre
Hackmann, Alexander Kanapin, Youla Karavidopoulou, Paul Kersey, Ernst
Kretschmann, Kati Laiho, Minna Lehvaslaiho, Michele Magrane, Michelle
McHale, Virginie Mittard, Nicola Mulder, John F. O'Rourke, Markiyan
Oliynyk, Sandra Orchard, Astrid Rakow, Sandra van den Broek, Eleanor
Whitfield and Allyson Williams at the EMBL Outstation - European
Bioinformatics Institute (EBI) in Hinxton, UK;
o Amos Bairoch, Alexandre Gattiker, Karine Michoud, Isabelle Phan and
Sandrine Pilbout at the Swiss Institute of Bioinformatics in Geneva,

Copyright Notice
TrEMBL copyright (c) 2004 EMBL-EBI
This manual and the database it accompanies may be copied and
redistributed freely, without advance permission, provided
that this copyright statement is reproduced with each copy.


If you want to cite TrEMBL in a publication please use
the following reference:

Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro
S., Gasteiger E., Huang H., Lopez R., Magrane M., Martin M.J.,
Natale D.A., O'Donovan C., Redaschi N. and Yeh L.L.
UniProt: the Universal Protein knowledgebase
Nucleic Acids Res. 32: D115-D119 (2004)

1. Introduction

UniProtKB/TrEMBL is a computer-annotated protein sequence database
complementing the Swiss-Prot Protein Knowledgebase. UniProtKB/TrEMBL contains
the translations of all coding sequences (CDS) present in the
EMBL/GenBank/DDBJ Nucleotide Sequence Databases and also protein
sequences extracted from the literature or submitted to Swiss-Prot,
which are not yet integrated into UniProtKB/Swiss-Prot. For all UniProtKB/TrEMBL entries, UniProtKB/Swiss-Prot accession numbers have been assigned.

2. Why a complement to UniProtKB/Swiss-Prot?

The ongoing gene sequencing and mapping projects have dramatically
increased the number of protein sequences to be incorporated into
UniProtKB/Swiss-Prot. We do not want to dilute the quality standards of UniProtKB/Swiss-Prot by incorporating sequences without proper sequence analysis and
annotation but we do want to make the sequences available as quickly as
possible. UniProtKB/TrEMBL achieves this second goal, and is a major step in the
process of speeding up subsequent upgrading of annotation to the standard
UniProtKB/Swiss-Prot quality. To address the problem of redundancy, the translations
of all coding sequences (CDS) in the EMBL Nucleotide Sequence Database
already included in Swiss-Prot have been removed from UniProtKB/TrEMBL.

3. The Release

This UniProtKB/TrEMBL release has been produced in synch with Swiss-Prot release 43. It was created from the EMBL Nucleotide Sequence Database release 77 and
updates until the 26-January-2004 and contains 1'069649 entries and
335'331'748 amino acids.

UniProtKB/TrEMBL is organized in subsections:

arc.dat (Archaea): 1850 entries
arp.dat (Complete Archaeal proteomes): 30436 entries
fun.dat (Fungi): 26545 entries
hum.dat (Human): 33713 entries
inv.dat (Invertebrates): 116930 entries
mam.dat (Other Mammals): 14217 entries
mhc.dat (MHC proteins): 9401 entries
org.dat (Organelles): 87676 entries
phg.dat (Bacteriophages): 9349 entries
pln.dat (Plants): 95077 entries
pro.dat (Prokaryotes): 94217 entries
prp.dat (Complete Prokaryotic Proteomes): 293602 entries
rod.dat (Rodents): 42282 entries
unc.dat (Unclassified): 456 entries
vrl.dat (Viruses): 96447 entries
vrt.dat (Other Vertebrates): 20986 entries
vrv.dat (Retroviruses): 96465 entries

67'168 new entries have been integrated in UniProtKB/TrEMBL. The sequences of
2443 UniProtKB/TrEMBL entries have been updated and the annotation has been
updated in 441'151 entries.

In the document deleteac.txt, you will find a list of all accession numbers
which were previously present in UniProtKB/TrEMBL, but which have now been deleted from
the database.

4. Format Differences Between UniProtKB/Swiss-Prot and UniProtKB/TrEMBL

The format and conventions used by UniProtKB/TrEMBL follow as closely as possible
that of UniProtKB/Swiss-Prot. Hence, it is not necessary to produce an additional
user manual and extensive release notes for UniProtKB/TrEMBL. The information given
in the UniProtKB/Swiss-Prot release notes and user manual are in general valid for
UniProtKB/TrEMBL. The differences are mentioned below.

The general structure of an entry is identical in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. The data class used in UniProtKB/TrEMBL (in the ID line) is always 'PRELIMINARY', whereas in UniProtKB/Swiss-Prot it is always 'STANDARD'.

Differences in line types present in UniProtKB/Swiss-Prot and TrEMBL:

The ID line (IDentification):

The entry name used in UniProtKB is the same as the Accession Number of the

The DT line (DaTe)

The format of the DT lines that serve to indicate when an entry was
created and updated are identical to that defined in UniProtKB/Swiss-Prot; but the
DT lines in UniProtKB/TrEMBL refer to the UniProtKB/TrEMBL release. The difference is
shown in the example below.

DT lines in a UniProtKB/Swiss-Prot entry:

DT 01-JAN-1988 (Rel. 06, Created)
DT 01-JUL-1989 (Rel. 11, Last sequence update)
DT 01-AUG-1992 (Rel. 23, Last annotation update)

DT lines in a UniProtKB/TrEMBL entry:

DT 01-NOV-1996 (UniProtKB/TrEMBLrel. 01, Created)
DT 01-NOV-1999 (UniProtKB/TrEMBLrel. 12, Last sequence update)
DT 01-MAR-2004 (UniProtKB/TrEMBLrel. 26, Last annotation update)

5. Weekly updates of UniProtKB/TrEMBL and non-redundant data sets

5.1 UniProtKB/TrEMBL updates

Biweekly cumulative updates of TrEMBL are available by anonymous FTP and
from the EBI SRS server.

5.2 UniProtKB

We also produce biweekly a complete non-redundant protein sequence
collection by providing three compressed files: uniprot.sprot.dat.gz and
uniprot.trembl.dat.gz are in /pub/databases/uniprot/knowledgebase and
uniprot.trembl_new.dat.gz is in /pub/databases/uniprot/knowledgebase/new
on the EBI ftp server.

This set of non-redundant files is especially important for two types of
(i) Managers of similarity search services. They can now provide what is
currently the most comprehensive and non-redundant data set of protein
(ii) Anybody wanting to update their full copy of UniProtKB/Swiss-Prot + UniProtKB/TrEMBL to their own schedule without having to wait for full releases of UniProtKB/Swiss-Prot or UniProtKB/TrEMBL (UniProtKB).

5.3 XML

A version of UniProtKB/Swiss-Prot and UniProtKB/TrEMBL in XML format has been developed and is provided with this release. More information is available at and the data can be
downloaded from

We would welcome any feedback from the user community.

5.4 Varsplic Expand

We also provide Varsplic Expand which is a program to generate
" expanded" sequences from Swiss-Prot and TrEMBL records i.e. sequences
including the variants specified by the varsplic, variant and conflict
annotations. New records are produced in either pseudo-Swiss-Prot or
FASTA format for each specified variant. More information and the data is
available at

6. Access/Data Distribution

FTP server:
SRS server:

UniProtKB/TrEMBL is also available on the UniProtKB/Swiss-Prot CD-ROM.
UniProtKB/Swiss-Prot + UniProtKB/TrEMBL is searchable on the following servers at the EBI:

Scanps (
MPSrch (

For each UniProtKB/TrEMBL release, a synchronized version of the concurrent UniProtKB/Swiss-Prot release is distributed at

7. General announcements and Forthcoming changes

7.1 What's new

We have introduced new resources for users to enable us to communicate
effectively between releases about what is new in UniProtKB/Swiss-Prot and UniProtKB/TrEMBL and what is planned for the future.
These are available at: