|
PICR - Protein Identifier Cross-Reference Service
Using PICR is very simple, with very few options that need setting.
Main Search Page Options
The main search form is divided into four main sections:
PICR can be used to map protein identifiers, protein sequences, or BLAST sequences, so adjust the data type
selector accordingly in the Input Data section. You can paste a list
of protein identifiers (one per line), protein sequences in FASTA format, or BLAST sequences, also in FASTA format.
Alternatively, you can upload a file containing this data by clicking on
the
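As a rough illustration of the two pasted input layouts described above, the following Python sketch splits a block of text into either a list of identifiers (one per line) or a list of FASTA records. The function names `parse_identifiers` and `parse_fasta` are hypothetical and are not part of PICR. How PICR itself parses its input is not described here.

```python
def parse_identifiers(text):
    """Return the non-empty lines of text, one identifier per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Checking pasted data locally in this way before submitting it can catch simple problems, such as a sequence that is missing its `>` header line.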
The Input Parameters section can be used to refine your search. By default, PICR does not restrict mappings by taxonomy. To obtain mappings for a specific organism, select it from the pull-down list. If the organism you want is not in the list, type a partial name in the space provided to query the NEWT taxonomy through the Ontology Lookup Service (OLS). A list containing the required organism should appear. Any value selected from that list overrides the choice made in the species list above.
Select which databases you wish to map to from the Mapping Databases section. You can map to any number of databases. Note that a single choice can refer to more than one database. For example, selecting Ensembl will attempt to map to all species-specific Ensembl releases; the same applies to Vega, Trome and RefSeq. Selecting Swiss-Prot and TrEMBL will also include the splice variant databases of each source database. Selecting the Ensembl Genomes database brings up a list of taxon-specific databases to search. If your output type is Simple HTML, "UniProt 'best guess'" will be available as a mapping database. "UniProt 'best guess'" returns the identifier from the longest matching UniProt entry from (in order of precedence) the following subsections of UniProt: Swiss-Prot, TrEMBL, Swiss-Prot varsplic, and TrEMBL varsplic. For more information on the difference between Swiss-Prot and TrEMBL, see the UniProt FAQs. For more information on the varsplic databases, see this article.
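The "best guess" rule above can be read in more than one way. The sketch below assumes one reading: the highest-precedence subsection that has any match wins, and the longest entry is chosen within that subsection. The function name `best_guess` and the tuple layout are hypothetical, and PICR's actual selection logic may differ.

```python
# Precedence of UniProt subsections as listed in the text (highest first).
PRECEDENCE = ["Swiss-Prot", "TrEMBL", "Swiss-Prot varsplic", "TrEMBL varsplic"]


def best_guess(matches):
    """Pick one accession from (accession, subsection, sequence_length) tuples.

    Assumption: the highest-precedence subsection with any match wins, and
    the longest entry is chosen within that subsection.
    """
    for subsection in PRECEDENCE:
        candidates = [m for m in matches if m[1] == subsection]
        if candidates:
            return max(candidates, key=lambda m: m[2])[0]
    return None
```

Under this reading, a shorter Swiss-Prot entry is preferred over a longer TrEMBL entry, because subsection precedence is applied before length.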
When you select the BLAST mapping option, the BLAST options panel appears. In this panel you can select which BLAST database to use and whether to filter results by identity. Note that when filtering by species in the input parameters, returned BLAST results will be filtered by species as well.
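The combined effect of the identity and species filters described above can be sketched as follows. This is only an illustration: the function name `filter_hits` and the dictionary keys `identity` and `taxon_id` are hypothetical, and the sketch assumes that a hit must pass both filters to be kept. Sorting by identity, highest first, is also an assumption made for readability.

```python
def filter_hits(hits, min_identity=None, taxon_id=None):
    """Keep hits that pass the optional identity and species filters.

    hits: list of dicts with 'accession', 'identity' (percent) and 'taxon_id'.
    Returns the surviving hits sorted by identity, highest first.
    """
    kept = []
    for hit in hits:
        if min_identity is not None and hit["identity"] < min_identity:
            continue  # below the identity threshold
        if taxon_id is not None and hit["taxon_id"] != taxon_id:
            continue  # belongs to a different species
        kept.append(hit)
    return sorted(kept, key=lambda h: h["identity"], reverse=True)
```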
You can fine-tune the BLAST query by clicking on Show advanced BLAST options and filling in any desired options. NB: the defaults should be fine for most searches and should only be changed if you know what you are doing.
Executing A Search
Once all search parameters have been selected, select the desired output format and click
on the
Searches will try to collate information from multiple databases and may involve SOAP queries to the NCBI. While your search is being executed, a progress bar will be displayed and refreshed every 2 seconds. Once your search is done, the appropriate result page will be shown.
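The refresh behaviour described above is a simple polling pattern. The sketch below shows that pattern in general terms, assuming a caller-supplied `check_status` function that returns a `(done, progress)` pair. Both `wait_for_result` and `check_status` are hypothetical names, and nothing here calls PICR itself.

```python
import time


def wait_for_result(check_status, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll check_status() every `interval` seconds until it reports completion.

    check_status: hypothetical callable returning (done, progress_percent).
    Returns the number of polls made, or raises TimeoutError if the limit
    is reached before the search finishes.
    """
    for poll in range(1, max_polls + 1):
        done, _progress = check_status()
        if done:
            return poll
        sleep(interval)  # wait before refreshing again
    raise TimeoutError("search did not finish within the polling limit")
```

The `sleep` parameter is injectable so that the loop can be exercised without actually waiting.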
Selecting BLAST accessions
If you are searching by BLAST sequences, an intermediate page will come up allowing you to select which accessions to submit to the cross-reference search. For each BLAST fragment that was submitted, a list of the top results, in order of identity, is presented. Choose the accession which best matches the submitted data and click Proceed to Mapping.
The program returns to the in-progress page to perform cross-reference mapping.
Understanding The Results
Simple HTML view
The table is organized such that each row is a submitted accession or sequence and each column represents a selected mapping database. An empty cell means that no mappings could be found to the corresponding database for the search parameters you entered.
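The row and column layout described above can be sketched as a small pivot from a mapping dictionary to CSV text. The function name `mappings_to_csv`, the `submitted` column header, and the `;` separator used inside a cell are assumptions made for illustration. The exact layout of the CSV file that PICR produces is not specified here.

```python
import csv
import io


def mappings_to_csv(submitted, databases, mappings):
    """Render mappings as CSV text, one row per submitted accession.

    submitted: list of submitted accessions (row order)
    databases: list of selected mapping databases (column order)
    mappings:  dict of (accession, database) -> list of mapped identifiers
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["submitted"] + databases)
    for accession in submitted:
        row = [accession]
        for database in databases:
            # An empty cell means no mapping was found for this database.
            row.append(";".join(mappings.get((accession, database), [])))
        writer.writerow(row)
    return buffer.getvalue()
```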
By default, PICR only returns mappings to active database entries, though many more might
be available. PICR queries the UniProt Archive (UniParc), which is a historical archive
of all known protein entries from over 60 protein sequence databases. When entries are
deleted or made obsolete in the source databases, they are never deleted from UniParc but
are marked as inactive. PICR can include these inactive mappings in the results if the
Entries that can map to an active Swiss-Prot or TrEMBL entry may also have additional mappings, which will be shown in blue. These mappings are obtained from the UniProt Knowledgebase and, while valid, might not have 100% sequence identity to the submitted accession. Once a search has been done, results can be saved in CSV format or another search can be started.
A dialog box will be shown prompting you to save or open your file.
If the submitted accession or sequence is not present in the UniProt Archive, it cannot be mapped at this time.
The detailed HTML view contains additional information not shown in the simple HTML view. Mappings are done on the basis of 100% sequence identity. As such, one protein accession (P29375 in this example) can map to more than one protein sequence. Each sequence will have a UPI (UniParc Protein Identifier) as well as multiple cross-references. Each cross-reference will contain:
|