
UniParc at the EBI


UniProt Archive (UniParc) is part of the UniProt project. It is a non-redundant archive of protein sequences extracted from the following public databases: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, PIR-PSD, EMBL, EMBL WGS, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, H-Invitational Database, TROME database, European Patent Office proteins, United States Patent and Trademark Office (USPTO) proteins and Japan Patent Office proteins.

UniParc contains only protein sequences. All other information about a protein must be retrieved from the source databases using the database cross-references. Each unique sequence is stored only once and is given a stable identifier. The identifier consists of the prefix UPI followed by ten hexadecimal digits, e.g. UPI000000000A.
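The identifier format described above can be checked with a short pattern match. The following is a minimal sketch, not an official validator: the function name `is_uniparc_id` is hypothetical, and the restriction to upper-case hexadecimal digits is an assumption based on the example UPI000000000A.

```python
import re

# "UPI" followed by exactly ten hexadecimal digits.
# Assumption: digits are upper-case, as in the example UPI000000000A.
UPI_PATTERN = re.compile(r"UPI[0-9A-F]{10}")


def is_uniparc_id(identifier: str) -> bool:
    """Return True if the string has the shape of a UniParc identifier."""
    return UPI_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("UPI000000000A", "UPI00000000A", "UPI000000000G"):
        print(candidate, is_uniparc_id(candidate))
```

A check like this only confirms the shape of the identifier; it says nothing about whether the identifier actually exists in UniParc.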

UniParc proteins are linked to their source databases by database cross-references. Each cross-reference links one protein in UniParc to an accession number in a source database. A cross-reference is active as long as the sequence identified by the source accession number remains unchanged. When the sequence is modified or removed in the source database, the cross-reference from UniParc becomes inactive. Active cross-references can be used to access the source databases directly, but inactive cross-references can only be used to access sequence archives, such as the Sequence Version Archive.
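The active/inactive rule for cross-references can be illustrated with a small data model. This is a sketch under stated assumptions, not the UniParc schema: the class `CrossReference`, its field names and the function `refresh_status` are hypothetical, and the accession and sequences used are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrossReference:
    """Hypothetical model of one link from a UniParc entry to a source entry."""
    upi: str          # UniParc identifier
    database: str     # name of the source database
    accession: str    # accession number in the source database
    active: bool = True


def refresh_status(xref: CrossReference,
                   archived_sequence: str,
                   source_sequence: Optional[str]) -> CrossReference:
    """Keep the cross-reference active only while the source sequence is unchanged.

    source_sequence is None when the entry was removed from the source database.
    """
    xref.active = source_sequence == archived_sequence
    return xref


if __name__ == "__main__":
    # Made-up accession and sequences, for illustration only.
    xref = CrossReference("UPI000000000A", "EMBL", "EXAMPLE1")
    print(refresh_status(xref, "MKTAYIAK", "MKTAYIAK").active)  # unchanged
    print(refresh_status(xref, "MKTAYIAK", "MKTAYIAR").active)  # modified
    print(refresh_status(xref, "MKTAYIAK", None).active)        # removed
```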

UniParc is available for text- and sequence-based searches. Sequences that are no longer part of any source database are excluded from sequence-based searches, but they remain available for text-based SRS searches. Performing a similarity search against UniParc is equivalent to performing the same search against all databases cross-referenced in UniParc, as UniParc contains all proteins from its source databases. Sequence similarity searches can be done using FASTA, BLAST or MPsrch.

Why is UniParc not available via ftp?

UniParc is a non-redundant protein sequence archive containing both active and dead sequences. It is species-merged: sequences are handled simply as strings, so sequences from different species that are 100% identical over their whole length are merged into a single record.
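Treating sequences as plain strings means that identity is decided by exact string equality, regardless of species. The following is a minimal sketch of that idea, assuming an in-memory dictionary keyed by the sequence; the class `SequenceArchive`, its methods, the sequential numbering of identifiers and the accessions shown are all hypothetical and do not describe how UniParc is actually implemented.

```python
class SequenceArchive:
    """Toy non-redundant archive: one record per distinct sequence string."""

    def __init__(self):
        self._upi_by_sequence = {}  # sequence string -> identifier
        self._xrefs_by_upi = {}     # identifier -> list of (database, accession)

    def add(self, sequence, database, accession):
        """Store a sequence once and record where it came from."""
        upi = self._upi_by_sequence.get(sequence)
        if upi is None:
            # Assumption: identifiers are numbered sequentially in hexadecimal.
            upi = f"UPI{len(self._upi_by_sequence) + 1:010X}"
            self._upi_by_sequence[sequence] = upi
            self._xrefs_by_upi[upi] = []
        self._xrefs_by_upi[upi].append((database, accession))
        return upi

    def cross_references(self, upi):
        """Return all source entries that share this sequence."""
        return list(self._xrefs_by_upi[upi])


if __name__ == "__main__":
    archive = SequenceArchive()
    # Made-up accessions: the same sequence reported for two species.
    first = archive.add("MKTAYIAK", "UniProtKB/Swiss-Prot", "EXAMPLE_HUMAN")
    second = archive.add("MKTAYIAK", "RefSeq", "EXAMPLE_MOUSE")
    print(first == second, archive.cross_references(first))
```

Because both source entries collapse into one record, nothing in the record itself says which species a given function or annotation would apply to.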

UniParc records do not carry any annotation, because annotation is valid only in the real context of the sequence. For example, proteins with the same sequence may have different functions depending on species, tissue, developmental stage, etc. None of this context-dependent information is present in UniParc. Rather, it is the purpose of the UniProt Knowledgebase to provide this annotation.

Therefore, links to the UniProt Knowledgebase are provided in the results of sequence similarity searches against UniParc, so that users can access the relevant annotation.

The absence of annotation, the merging of sequences into a single record and the presence of both active and inactive sequences make UniParc unsuitable for any kind of large-scale parsing and manipulation. Hence, UniParc is not made available via ftp.


Accessing the database


Query the database


Link              Explanation
UniParc Search    Database queries
SRS               Database queries
dbfetch           Entry and sequence retrieval
Wu-BLAST2         Sequence similarity searches
NCBI-BLAST2       Sequence similarity searches
FASTA             Sequence similarity searches
SSEARCH           Sequence similarity searches
MPsrch            Sequence similarity searches
ScanPS            Sequence similarity searches


