The UniProt Archive (UniParc) is part of the UniProt project. It is a non-redundant archive of protein sequences extracted from public databases: UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, PIR-PSD, EMBL, EMBL WGS, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, H-Invitational Database, TROME database, European Patent Office proteins, United States Patent and Trademark Office (USPTO) proteins, and Japan Patent Office proteins.

UniParc contains only protein sequences. All other information about a protein must be retrieved from the source databases using the database cross-references. Each unique sequence is stored only once and is assigned a stable identifier. The identifier consists of the prefix UPI followed by ten hexadecimal digits, e.g. UPI000000000A.
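
The identifier format described above can be checked with a short regular expression. The following Python sketch is illustrative only: the function name is hypothetical, and accepting only uppercase hexadecimal digits is an assumption based on the example identifier, not a documented rule.

```python
import re

# "UPI" followed by exactly ten hexadecimal digits, as described in the text.
# Restricting the digits to uppercase A-F is an assumption drawn from the
# example UPI000000000A.
UPI_PATTERN = re.compile(r"UPI[0-9A-F]{10}")


def is_uniparc_id(identifier: str) -> bool:
    """Return True if the string has the shape of a UniParc identifier."""
    return UPI_PATTERN.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("UPI000000000A", "UPI00000000A", "UPI000000000G"):
        print(candidate, is_uniparc_id(candidate))
```

A check like this only confirms that a string is well formed; it says nothing about whether the identifier actually exists in UniParc.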

UniParc proteins are linked to their source databases by database cross-references. Each cross-reference links one protein in UniParc to an accession number in a source database. A cross-reference is active as long as the sequence identified by the source accession number remains unchanged. When the sequence is modified or removed in the source database, the cross-reference from UniParc becomes inactive. Active cross-references can be used to access the source databases directly, whereas inactive cross-references can only be used to access sequence archives, such as the Sequence Version Archive.
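
The relationship between a UniParc entry and its cross-references can be pictured with a small data model. The Python sketch below is a simplified illustration and not UniParc's actual schema: the class names, field names, and accession values are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrossReference:
    """Link from one UniParc entry to an accession in a source database."""
    database: str
    accession: str
    active: bool = True  # active while the source sequence is unchanged


@dataclass
class UniParcEntry:
    """One unique sequence, stored once under a stable identifier."""
    upi: str
    sequence: str
    xrefs: List[CrossReference] = field(default_factory=list)

    def source_sequence_changed(self, database: str, accession: str) -> None:
        # When the source modifies or removes the sequence, the matching
        # cross-reference becomes inactive; it is kept, not deleted.
        for xref in self.xrefs:
            if xref.database == database and xref.accession == accession:
                xref.active = False

    def active_xrefs(self) -> List[CrossReference]:
        return [xref for xref in self.xrefs if xref.active]


# Hypothetical entry with two made-up source accessions.
entry = UniParcEntry(
    upi="UPI000000000A",
    sequence="MKTAYIAKQR",
    xrefs=[
        CrossReference("UniProtKB/Swiss-Prot", "P12345"),
        CrossReference("RefSeq", "NP_000001"),
    ],
)
entry.source_sequence_changed("RefSeq", "NP_000001")
print([xref.accession for xref in entry.active_xrefs()])
```

In this model an inactive cross-reference stays attached to the entry, which mirrors the archival role described above: the link is retained as history even though it no longer points at a current record.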

UniParc is available for text- and sequence-based searches. Sequences that are no longer part of any source database are excluded from sequence-based searches but remain available for text-based searches. Performing a similarity search against UniParc is equivalent to performing the same search against all databases cross-referenced in UniParc, as UniParc contains all proteins from its source databases. Sequence similarity searches can be done using FASTA or BLAST.

