UniParc at the EBI
The UniProt Archive (UniParc) is part of the UniProt project. It is a non-redundant archive of protein sequences extracted from the following public databases: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, PIR-PSD, EMBL, EMBL WGS, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, H-Invitational Database, TROME database, European Patent Office proteins, United States Patent and Trademark Office (USPTO) proteins and Japan Patent Office proteins.
UniParc contains only protein sequences. All other information about a protein must be retrieved from the source databases using the database cross-references. Each unique sequence is stored only once and is given a stable identifier. The format of the identifier is UPI followed by ten hexadecimal digits, e.g. UPI000000000A.
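The identifier format described above can be checked mechanically. The following is a minimal sketch in Python. The function name is_uniparc_id is hypothetical, and the restriction to upper-case hexadecimal digits is an assumption based on the example UPI000000000A.

```python
import re

# "UPI" followed by exactly ten hexadecimal digits, as described in the text.
# Assumption: the letters A-F appear in upper case, as in UPI000000000A.
UPI_PATTERN = re.compile(r"UPI[0-9A-F]{10}")


def is_uniparc_id(identifier: str) -> bool:
    """Return True if the string has the shape of a UniParc identifier."""
    return UPI_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("UPI000000000A", "UPI00000000A", "UPI000000000G"):
        print(candidate, is_uniparc_id(candidate))
```

A check of this kind only confirms the shape of the identifier; it does not tell you whether the identifier exists in UniParc.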
UniParc proteins are linked to their source databases by database cross-references. Each cross-reference links one protein in UniParc to an accession number in a source database. A cross-reference is active as long as the sequence identified by the source accession number remains unchanged. When the sequence is modified or removed in the source database, the cross-reference from UniParc becomes inactive. Active cross-references can be used to access the source databases directly, but inactive cross-references can only be used to access sequence archives, such as the Sequence Version Archive.
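The life cycle of a cross-reference can be illustrated with a small Python sketch. It is not the UniParc data model: the class CrossReference, its fields, and the two helper functions are hypothetical names chosen for illustration, and the accession used in the example is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class CrossReference:
    """One link from a UniParc entry to an accession in a source database."""
    upi: str          # UniParc identifier, e.g. "UPI000000000A"
    database: str     # name of the source database
    accession: str    # accession number in that source database
    active: bool = True


def apply_source_update(xref: CrossReference, sequence_unchanged: bool) -> None:
    """Deactivate the cross-reference if the source sequence was modified or removed."""
    if not sequence_unchanged:
        xref.active = False


def lookup_target(xref: CrossReference) -> str:
    """Active links resolve in the source database, inactive ones only in a sequence archive."""
    return "source database" if xref.active else "sequence archive"


if __name__ == "__main__":
    xref = CrossReference("UPI000000000A", "UniProtKB/TrEMBL", "EXAMPLE1")
    print(lookup_target(xref))
    apply_source_update(xref, sequence_unchanged=False)
    print(lookup_target(xref))
```

The sketch never reactivates a cross-reference, because the text above describes only the transition from active to inactive.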
UniParc is available for text- and sequence-based searches. Sequences that are no longer part of any source database are excluded from sequence-based searches, but they remain available for text-based SRS searches. Performing a similarity search against UniParc is equivalent to performing the same search against all databases cross-referenced in UniParc, as UniParc contains all proteins from its source databases. Sequence similarity searches can be done using FASTA, BLAST or Mpsrch.
Why is UniParc not available via ftp?
UniParc is a non-redundant protein sequence archive containing both active and dead sequences. It is species-merged: sequences are handled purely as strings, so all sequences that are 100% identical over their whole length are merged into one record, regardless of species.
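Species merging by exact string identity can be sketched as follows. This is an illustration only, not the UniParc loading procedure: the function merge_identical and the record layout (database, accession, species, sequence) are hypothetical, the accessions are placeholders, and sequences are assumed to be compared exactly as given.

```python
from collections import defaultdict


def merge_identical(records):
    """Group source records by exact sequence string.

    Each record is a (database, accession, species, sequence) tuple.
    Records whose sequences are identical over the whole length end up
    under the same key, whatever species they come from.
    """
    merged = defaultdict(list)
    for database, accession, species, sequence in records:
        merged[sequence].append((database, accession, species))
    return dict(merged)


if __name__ == "__main__":
    records = [
        ("DB1", "ACC1", "Homo sapiens", "MKTAYIAK"),
        ("DB2", "ACC2", "Mus musculus", "MKTAYIAK"),  # same string, other species
        ("DB1", "ACC3", "Homo sapiens", "MKTAYIAKQ"),  # one residue longer
    ]
    for sequence, sources in merge_identical(records).items():
        print(sequence, sources)
```

The first two records collapse into a single entry with two sources, while the third stays separate because it differs in length.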
UniParc records do not carry any annotation, since annotation is only true in the real context of the sequence. For example, proteins with the same sequence may have different functions depending on species, tissue, developmental stage, etc. None of this context-dependent information is present in UniParc. Rather, it is the purpose of the UniProt Knowledgebase to provide this annotation.
Therefore, links to the UniProt Knowledgebase are provided in the results of sequence similarity searches against UniParc, so that users can access the relevant annotation.
The unavailability of annotation, the merging of sequences into a single record and the presence of both active and inactive sequences make UniParc unsuitable for any kind of large-scale parsing and manipulation. Hence, UniParc is not made available via ftp.
Accessing the database
Query the database