Downloading a proteome set for a specific organism
You can use UniProt to download the protein sets of completely sequenced organisms (also known as ‘proteomes’). For example, let’s download the proteome for Escherichia coli strain K12. Go to the UniProt website and either choose ‘Proteomes’ from the search selection drop-down or click on the ‘Species Proteomes’ tile to go directly to the Proteomes page (Figure 74).

In either case, type Escherichia coli into the search box and click on ‘Search’ (Figure 75).

The results page lists ‘Escherichia coli (strain K12)’ as the top hit. A red icon next to the name shows that this is a Reference Proteome. Reference Proteomes have been selected to cover well-studied model organisms and other proteomes of interest for biomedical research (Figure 76). Hint: you can make the search more specific by searching for “Escherichia coli” (i.e. put quotes around your search phrase), which tells the database that the words must appear as an exact phrase.
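The same search can be expressed as a URL, which makes the effect of the quotes easier to see. The sketch below is a minimal illustration, not an official client: it assumes the UniProt REST API exposes a proteomes search endpoint at the address shown, and the function name `proteome_search_url` and its parameters are invented for this example. It only builds the URL, so check the address against the UniProt help pages before relying on it.

```python
from urllib.parse import urlencode

# Assumption: address of the proteomes search endpoint of the UniProt REST API.
PROTEOMES_SEARCH = "https://rest.uniprot.org/proteomes/search"


def proteome_search_url(organism: str, exact_phrase: bool = True, size: int = 5) -> str:
    """Build a proteomes search URL (hypothetical helper for illustration).

    With exact_phrase=True the organism name is wrapped in double quotes,
    which mirrors typing "Escherichia coli" into the search box.
    """
    query = f'"{organism}"' if exact_phrase else organism
    params = {"query": query, "format": "json", "size": size}
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return f"{PROTEOMES_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(proteome_search_url("Escherichia coli"))
    print(proteome_search_url("Escherichia coli", exact_phrase=False))
```

Pasting either printed URL into a browser should return the matching proteomes as JSON, provided the assumed endpoint is correct.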

The results also show information on the completeness and quality of each proteome. The BUSCO score quantifies the completeness of the genomic data in terms of expected gene content, based on single-copy orthologs. The Complete Proteome Detector (CPD) statistically analyses each proteome against a group of closely related proteomes to determine its completeness. More details are available on the UniProt website.

You can click on the Proteome identifier in the ‘Entry’ column of the results table to see the full proteome entry (Figure 77).

If you scroll down to the ‘Components’ section, you can see a download button (Figure 78). It lets you download either the full proteome or individual components: separate chromosomes, for species that have more than one, or organelle genomes such as the mitochondrial genome.
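If you need to repeat the download, it can also be scripted. The following is a minimal sketch under stated assumptions rather than a definitive recipe: it assumes the UniProt REST API offers a streaming endpoint for UniProtKB at the address shown, that proteins can be selected with a `proteome:` query field, and that `UP000000625` is the identifier of the Escherichia coli (strain K12) proteome (confirm it in the ‘Entry’ column of the results table). The helper names are invented for this example. It fetches the full proteome only, not individual components.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

# Assumption: address of the UniProtKB streaming endpoint of the UniProt REST API.
UNIPROTKB_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_fasta_url(proteome_id: str, compressed: bool = False) -> str:
    """Build a URL that requests every protein of one proteome as FASTA."""
    params = {"query": f"proteome:{proteome_id}", "format": "fasta"}
    if compressed:
        # Assumption: the endpoint accepts compressed=true and returns gzip data.
        params["compressed"] = "true"
    return f"{UNIPROTKB_STREAM}?{urlencode(params)}"


def download_proteome(proteome_id: str, out_path: str) -> None:
    """Stream the FASTA response to a local file without loading it into memory."""
    url = proteome_fasta_url(proteome_id)
    with urllib.request.urlopen(url) as response, open(out_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


if __name__ == "__main__":
    # Assumed identifier for E. coli K12; verify it on the proteome entry page.
    download_proteome("UP000000625", "ecoli_k12_proteome.fasta")
```

Streaming the response to disk keeps memory use low, which matters for larger proteomes. For a one-off download, the button on the proteome entry page is simpler.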
