
Downloading a proteome set for a specific organism

You can use UniProt to download protein sets for completely sequenced organisms (also known as ‘proteomes’). For example, let’s download the proteome for Escherichia coli strain K12. Go to the UniProt website and either choose ‘Proteomes’ from the search selection drop-down or click on the ‘Species Proteomes’ tile to go directly to the Proteomes page (Figure 74).

Figure 74 Dataset selection drop-down.

In either case, type Escherichia coli into the search box and click on ‘Search’ (Figure 75).

Figure 75 Proteomes search for ‘Escherichia coli’.

You get a results page with ‘Escherichia coli (strain K12)’ as the top hit. Hint: you can make the search more specific by putting quotes around your search phrase (“Escherichia coli”), which tells the database that the words must appear as an exact phrase. A red icon next to the name shows that this is a Reference Proteome. Reference Proteomes have been selected to cover well-studied model organisms and other proteomes of interest for biomedical research (Figure 76).

Figure 76 Complete proteomes found as a result of searching for Escherichia coli.
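
The same search can also be run without the web interface. The following is a minimal sketch in Python, assuming the UniProt REST endpoint `https://rest.uniprot.org/proteomes/search` and its `query`, `format` and `size` parameters; none of these are described in this tutorial, so check them against the UniProt API documentation before relying on them. The function name `build_proteome_search_url` is hypothetical. The sketch only builds the search URL, and shows how quoting the phrase changes the query.

```python
from urllib.parse import urlencode

# Assumed endpoint for proteome searches (not taken from this tutorial).
PROTEOMES_SEARCH = "https://rest.uniprot.org/proteomes/search"


def build_proteome_search_url(organism, exact_phrase=True, size=5):
    """Build a proteome search URL (hypothetical helper).

    With exact_phrase=True the organism name is wrapped in double quotes,
    mirroring the hint above about exact-phrase searches.
    """
    query = f'"{organism}"' if exact_phrase else organism
    params = {"query": query, "format": "json", "size": size}
    return f"{PROTEOMES_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print both variants so the difference in the encoded query is visible.
    print(build_proteome_search_url("Escherichia coli"))
    print(build_proteome_search_url("Escherichia coli", exact_phrase=False))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching proteomes as JSON if the endpoint behaves as assumed.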

You also see additional information relating to the completeness and quality of the proteome. The BUSCO score quantifies genomic data completeness in terms of expected gene content based on single-copy orthologs. The Complete Proteome Detector (CPD) statistically analyses each proteome against a group of closely related proteomes in order to determine completeness. More details are available on the UniProt website.

You can click on the Proteome identifier in the ‘Entry’ column of the results table to see the full proteome entry (Figure 77).

Figure 77 Proteome entry for ‘Escherichia coli (strain K12)’.

If you scroll down to the ‘Components’ section, you can see a download button (Figure 78). This lets you download either the full proteome or individual components, such as a single chromosome in species with several chromosomes, or an organelle genome such as the mitochondrial genome.

Figure 78 Components in the entry for ‘Escherichia coli (strain K12)’.
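
The download can also be scripted. The following is a minimal sketch in Python, assuming the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/stream` and the query field `proteome:`; neither is described in this tutorial. It also assumes that `UP000000625` is the identifier shown in the ‘Entry’ column for Escherichia coli (strain K12), so verify it on the website first. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

# Assumed endpoint for streaming UniProtKB entries (not taken from this tutorial).
UNIPROTKB_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def build_proteome_download_url(proteome_id, fmt="fasta"):
    """Build a URL for all UniProtKB entries in one proteome (hypothetical helper)."""
    params = {"query": f"proteome:{proteome_id}", "format": fmt}
    return f"{UNIPROTKB_STREAM}?{urlencode(params)}"


def download_proteome(proteome_id, out_path, fmt="fasta"):
    """Save the proteome to out_path. Needs network access."""
    urlretrieve(build_proteome_download_url(proteome_id, fmt), out_path)
    return out_path


if __name__ == "__main__":
    # UP000000625 is assumed to be the E. coli K12 proteome identifier;
    # confirm it in the 'Entry' column of the Proteomes results table.
    print(build_proteome_download_url("UP000000625"))
```

This sketch retrieves the full proteome only. Downloading a single component, as the download button allows, would need an additional filter that is not covered here.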