News
January 2009: Addition of SEGUIDs

We have implemented the SEGUID algorithm to generate sequence-based unique identifiers, as described in Babnigg, 2006.
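A SEGUID (SEquence Globally Unique IDentifier) is usually described as the SHA-1 digest of the upper-cased sequence, Base64-encoded, with the trailing "=" padding removed. The sketch below follows that description. It assumes the input contains residues only (no FASTA header, no whitespace), and the function name `seguid` is our own choice, not part of PICR.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style checksum for a sequence.

    Assumes `sequence` holds residues only: no FASTA header, no whitespace.
    """
    # SEGUIDs are case-insensitive, so hash the upper-cased residues.
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    # Standard Base64 of the 20-byte digest, trailing '=' padding stripped.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    print(seguid("MKVLAAGIVALLLA"))
```

Because a SHA-1 digest is always 20 bytes, every SEGUID produced this way is 27 characters long, and identical sequences always receive identical identifiers regardless of which database they came from.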

January 2009: New databases and bugfixes

We have added the KIPO database as well as several ENSEMBL genome databases to the mapping algorithm. We have also improved the PICR codebase to make it more robust. PICR has been the victim of its own success in recent months and has had problems coping with the demands put on the web application. These problems should now be resolved.

July 2007: Updated CSV and XLS formats

The CSV and XLS output formats have been improved to be more usable and to fit within the size limits imposed by the Microsoft Excel format. If you are submitting large data sets for mapping, we recommend choosing the XLS format for batch downloads.

June 2007: Added functionality

We have added a new output format for mapping results: you may now save your results as a Microsoft Excel workbook (XLS file). This file can be opened in MS Excel or OpenOffice. This feature is still experimental, so please contact us if you experience problems.

June 2007: Added functionality

You may now submit an unlimited number of protein identifiers or sequences. Please be aware that if you submit more than 500 items at one time, you will be prompted for your email address, and a notification will be emailed to you when your job is done. You will then be able to download a CSV file containing your results.

April 2007: BETA Release

A BETA version of the Protein Identifier Cross-Referencing (PICR) service has now been released and we invite your feedback! Please email us any questions or comments you may have.


PICR - Protein Identifier Cross-Reference Service

PICR is simple to use, and only a few options need to be set.

Main Search Page Options


The main search form is divided into four main sections:

  • Input Data
  • Input Parameters
  • Mapping Databases
  • Output Parameters

PICR can map either protein identifiers or protein sequences, so set the data type selector in the Input Data section accordingly. You can paste either a list of protein identifiers (one per line) or protein sequences in FASTA format. Alternatively, you can upload a file containing this data by clicking the browse button and selecting the appropriate file. Please note that at most 100 protein accessions or sequences can be mapped at one time, and the maximum size of an uploaded file is 2 MB.
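If you prepare input programmatically, a quick local check against the limits stated above (100 items per search, 2 MB per file) can save a failed submission. The sketch below counts FASTA records when the input starts with ">" and otherwise counts non-blank lines as identifiers. The function names, and reading "2 MB" as 2 × 1024 × 1024 bytes, are assumptions for illustration only.

```python
MAX_ITEMS = 100                     # per-search limit stated on this page
MAX_UPLOAD_BYTES = 2 * 1024 * 1024  # "2 MB"; binary megabytes are an assumption


def count_items(text: str) -> int:
    """Count FASTA records if the input looks like FASTA,
    otherwise count non-blank lines (one identifier per line)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines and lines[0].startswith(">"):
        return sum(1 for line in lines if line.startswith(">"))
    return len(lines)


def check_input(text: str) -> list:
    """Return a list of problems; an empty list means the input is within limits."""
    problems = []
    size = len(text.encode("utf-8"))
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"input is {size} bytes; the limit is {MAX_UPLOAD_BYTES}")
    n = count_items(text)
    if n > MAX_ITEMS:
        problems.append(f"{n} items submitted; at most {MAX_ITEMS} can be mapped at once")
    return problems
```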

Limit By Species

The Input Parameters section can be used to refine your search. By default, PICR does not restrict mappings based on taxonomic information. To obtain mappings for a specific organism, select it from the pull-down list. If the organism is not in the list, type part of its name in the space provided to query the NEWT taxonomy through the Ontology Lookup Service (OLS); a list of matching organisms should appear. A value selected this way overrides the choice made in the pull-down species list.
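The override rule above can be expressed as a small function: an organism picked through the OLS lookup wins over the pull-down choice, and no selection means no taxonomic restriction. Representing organisms as integer taxonomy IDs, and the dictionary layout of the mapping records, are assumptions made for this sketch; they do not describe PICR internals.

```python
from typing import Optional


def effective_taxon(pulldown_taxid: Optional[int],
                    lookup_taxid: Optional[int]) -> Optional[int]:
    """Return the taxon filter to apply.

    A value chosen via the OLS lookup overrides the pull-down choice;
    None means "do not restrict by species" (the default).
    """
    return lookup_taxid if lookup_taxid is not None else pulldown_taxid


def restrict_by_taxon(mappings, taxid):
    """Keep only mappings whose hypothetical 'taxid' field matches the filter."""
    if taxid is None:
        return list(mappings)
    return [m for m in mappings if m.get("taxid") == taxid]
```

For example, choosing human (9606) in the pull-down and then mouse (10090) through the lookup yields a mouse-only filter.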

Mapping Databases

Select the databases you wish to map to in the Mapping Databases section. You can map to any number of databases. Note that some choices refer to more than one database. For example, selecting Ensembl will attempt to map to all species-specific Ensembl releases; the same applies to Vega, Trome and RefSeq. Selecting SwissProt or TREMBL will also include the splice-variant database of that source.
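One way to picture this behavior is as an expansion table from user-facing choices to the databases actually searched. The table below is hypothetical and deliberately incomplete: the database names are illustrative, and Vega, Trome and RefSeq would need similar entries.

```python
# Hypothetical expansion table: user-facing choice -> databases searched.
# Names are illustrative only; the real set of releases is not listed here.
EXPANSIONS = {
    "SWISSPROT": ["SWISSPROT", "SWISSPROT_VARSPLIC"],  # plus splice variants
    "TREMBL": ["TREMBL", "TREMBL_VARSPLIC"],           # plus splice variants
    "ENSEMBL": ["ENSEMBL_HUMAN", "ENSEMBL_MOUSE"],     # one per species release
}


def expand_choices(choices):
    """Expand selected choices into underlying databases,
    dropping duplicates while keeping the selection order."""
    expanded = []
    for choice in choices:
        # Choices without an entry map only to themselves.
        for db in EXPANSIONS.get(choice, [choice]):
            if db not in expanded:
                expanded.append(db)
    return expanded
```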

Executing A Search

Once all search parameters have been selected, select the desired output format and click on the search button.

  • Simple HTML will return a simple HTML table.
  • Detailed HTML will return a more detailed HTML table.
  • CSV will return a comma-separated value file containing the same information as the simple HTML view.

Searches collate information from multiple databases and may involve SOAP queries to the NCBI. While your search is running, a progress bar is displayed and refreshed every 2 seconds. Once your search is done, the appropriate result page is shown.
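The progress bar follows a simple polling pattern: check the job's status, update the display, wait, and repeat until the job is finished. The sketch below shows that pattern with the same 2-second interval. The `get_progress` callable, returning a fraction between 0 and 1, is a hypothetical stand-in for whatever status check a client would use, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(get_progress: Callable[[], float],
                 interval: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until the job reports completion; return the number of polls.

    `get_progress` is hypothetical and returns a fraction in [0, 1].
    The default interval mirrors the 2-second progress-bar refresh.
    """
    polls = 0
    while True:
        progress = get_progress()
        polls += 1
        print(f"progress: {progress:.0%}")
        if progress >= 1.0:
            return polls
        sleep(interval)  # wait before the next status check
```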


Understanding The Results

Simple HTML view


In the table, each row represents a submitted accession or sequence and each column represents a selected mapping database. An empty cell means that no mapping to the corresponding database could be found for the search parameters you entered.
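The same row-per-query, column-per-database grid carries over to the CSV output. The sketch below renders such a grid with Python's `csv` module. The input layout (a dict keyed by accession and database), the "query" header label, and joining several mapped identifiers with spaces inside one cell are all assumptions, not the documented PICR file format.

```python
import csv
import io


def results_to_csv(accessions, databases, mappings):
    """Render results as CSV: one row per submitted accession, one column per
    selected database; an empty cell means no mapping was found.

    `mappings` is a hypothetical dict {(accession, database): [mapped ids]}.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["query"] + list(databases))
    for acc in accessions:
        row = [acc]
        for db in databases:
            # Several mapped identifiers share one cell, separated by spaces.
            row.append(" ".join(mappings.get((acc, db), [])))
        writer.writerow(row)
    return buf.getvalue()
```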


By default, PICR only returns mappings to active database entries, though many more might be available. PICR queries the UniProt Archive (UniParc), a historical archive of all known protein entries from over 60 protein sequence databases. When entries are deleted or made obsolete in their source databases, they are never removed from UniParc but are marked as inactive. PICR can include these inactive mappings in the results if the "Return only active mappings" box is unchecked in the search options. Inactive mappings are shown in red in both HTML result views but cannot be distinguished from active mappings in the CSV view.
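Because UniParc keeps inactive entries rather than deleting them, the "Return only active mappings" option amounts to a filter on an active flag, and the red highlighting in the HTML views keys off the same flag. The `Mapping` record and both function names below are hypothetical, chosen only to illustrate that logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mapping:
    accession: str  # accession in the target database
    database: str
    active: bool    # False once the entry is deleted/obsoleted at its source


def select_mappings(mappings, only_active=True):
    """Mirror the 'Return only active mappings' option: inactive entries
    stay in UniParc, so they are dropped explicitly when only_active is set."""
    return [m for m in mappings if m.active or not only_active]


def html_color(m: Mapping) -> Optional[str]:
    """Inactive mappings are highlighted in red in the HTML views."""
    return "red" if not m.active else None
```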

Entries that map to an active SwissProt or TREMBL entry may also have additional mappings, which are shown in blue. These mappings are obtained from the UniProt Knowledgebase and, while valid, might not have 100% sequence identity to the submitted accession.

Once a search has completed, you can save the results in CSV format or start another search.


A dialog box will be shown prompting you to save or open your file.


If the submitted accession or sequence is not present in the UniProt Archive, it cannot be mapped at this time.

Detailed HTML view

The detailed HTML view contains additional information not shown in the simple HTML view. Because mappings are based on 100% sequence identity, a single protein accession such as P29375 can map to more than one protein sequence, for instance when the entry's sequence has been revised over time. Each sequence has a UPI (UniParc Protein Identifier) as well as multiple cross-references. Each cross-reference contains:

  • the source database
  • the versioned accession
  • whether the cross-reference is active or deleted
  • the NEWT taxonomy ID (if available)
  • the corresponding NCBI GI number (if available)
  • the date the entry was added to UniParc
  • the date the entry was last seen or deleted
The same color coding described above applies here.
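The fields listed above map naturally onto a record type, with cross-references grouped under the UPI of the sequence they share, as the detailed view does. The `CrossReference` class and `group_by_upi` function are hypothetical names for this sketch, and the UPI and accession values in any example data are placeholders rather than real identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CrossReference:
    upi: str                 # UniParc Protein Identifier of the matched sequence
    database: str            # source database
    accession: str           # versioned accession
    active: bool             # active or deleted
    taxon_id: Optional[int]  # NEWT taxonomy ID, if available
    gi: Optional[int]        # NCBI GI number, if available
    added: date              # date the entry was added to UniParc
    last_seen: date          # date the entry was last seen or deleted


def group_by_upi(xrefs):
    """Group cross-references under the UPI of the sequence they share."""
    groups = defaultdict(list)
    for x in xrefs:
        groups[x.upi].append(x)
    return dict(groups)
```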
