UniProt
Trainer: Aurelien Luciani
Overview:
UniProt is a comprehensive, expert-led, publicly available database of protein sequence, function and variation information.
This session will give an overview of programmatic access to the UniProt database using Python. It will cover key aspects of protein entry searches, data filtering and batch downloads, and give examples of further processing of downloaded target data.
Following a brief introduction to UniProt services and where to find relevant documentation and help features, the session will focus on worked examples. These will include how to programmatically search and retrieve protein entries and sequences. We will then show how to filter the results and access features of interest.
The session will also cover programmatic examples of the UniProt ID mapping service, batch downloads, filtering, and processing data.
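As a rough illustration of the kind of programmatic search described above, the following is a minimal sketch using only the Python standard library and the public UniProt REST search endpoint. It is not taken from the session materials: the helper names (`build_search_url`, `parse_tsv`, `search`) are hypothetical, and the query and field names shown in the usage note are assumptions that should be checked against the UniProt documentation.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Public UniProtKB search endpoint of the UniProt REST API.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields, size=25):
    """Build a search URL asking for selected columns in TSV format.

    Hypothetical helper: `query` uses UniProt query syntax and `fields`
    is a list of return-field names.
    """
    params = {
        "query": query,
        "fields": ",".join(fields),
        "format": "tsv",
        "size": size,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_tsv(text):
    """Turn a TSV response body into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def search(query, fields, size=25):
    """Run the search (needs network access) and return the parsed rows."""
    with urlopen(build_search_url(query, fields, size), timeout=30) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

With network access, a call such as `search("gene:BRCA1 AND reviewed:true AND organism_id:9606", ["accession", "gene_names", "length"], size=5)` should return one dict per matching entry. Keeping URL construction and parsing separate from the network call makes each part easy to check on its own.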
Learning outcomes:
By the end of the session you will be able to:
- Identify the different routes to access UniProt data and choose the most appropriate one for your workflow
- List the UniProt services such as UniProtKB, Proteomes, UniParc and UniRef
- Find documentation and help resources to guide your programmatic access
- Retrieve full UniProtKB entries or specific fields using Python
- Filter entries by annotation types and other target characteristics
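The last two outcomes can be sketched as follows: retrieve one full UniProtKB entry as JSON and keep only the features of a given annotation type. This is a hedged sketch rather than the session's own code. The helper names (`fetch_entry`, `features_of_type`) are hypothetical, and the JSON layout assumed here (a top-level `features` list whose items carry `type`, `description` and `location.start.value` / `location.end.value`) should be verified against a real response.

```python
import json
from urllib.request import urlopen


def fetch_entry(accession):
    """Download one UniProtKB entry as JSON (needs network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def features_of_type(entry, feature_type):
    """Return description and coordinates for features of one type.

    Assumes the entry layout described in the lead-in; missing keys
    yield None or an empty string instead of raising an error.
    """
    results = []
    for feature in entry.get("features", []):
        if feature.get("type") != feature_type:
            continue
        location = feature.get("location", {})
        results.append(
            {
                "description": feature.get("description", ""),
                "start": location.get("start", {}).get("value"),
                "end": location.get("end", {}).get("value"),
            }
        )
    return results
```

With network access, `features_of_type(fetch_entry("P05067"), "Domain")` would list the domain annotations of that entry, assuming the layout above holds.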
Materials:
- Presentation slides
- Interactive activity
- Notebook file (if Google Colab is blocked for you)
Recorded session: