InterPro BioMart Help Page - the MartView Interface

Introduction

BioMart is developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI).

For quick help with the BioMart interface, a short help page is linked both from here and from the InterPro BioMart page itself (use the Help button at the top of the BioMart MartView interface).

BioMart allows you to retrieve public InterPro data from a query-optimised data warehouse that is synchronised with the main InterPro database shortly after each InterPro release. The BioMart interface allows you to build simple or complex queries, with total control over both how the data is filtered (to restrict which records are included) and also which attributes (equivalent to columns in a spreadsheet) are included in the results. This allows you to avoid the need to search through a massive table of results, much of which may be irrelevant to you, and focus specifically on the information that is important.

You can specify the format of the results with options such as:

  • HTML table in your browser
  • comma separated values (plain text format)
  • tab separated values (plain text format)
  • Microsoft Excel spreadsheet

You can also have the results returned to you as a compressed file if the number of results is expected to be large.

This page describes the details of the InterPro BioMart.

Selecting a BioMart Database

BioMart allows data to be queried and joined from separate BioMart databases. This is under the control of the BioMart developer. At the time of writing, the InterPro BioMart includes links to both the Reactome curated database of biological pathways and the PRIDE Proteomics Identifications Database, which is a repository of protein and peptide identifications arising from mass spectrometry. It is likely that additional BioMart databases will be linked to from the InterPro BioMart as they become available.

The links from the InterPro BioMart to both the Reactome and the PRIDE BioMart are based upon common UniProt protein accessions. For example, if you build a query against the InterPro BioMart, perhaps for the proteins that match a particular InterPro entry, you can additionally choose to see data from Reactome describing the metabolic pathways that these proteins are part of, or find details of identifications of these proteins in the PRIDE database. More details are given at the foot of this page.
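As a rough illustration of what linking on a common protein accession means, the sketch below joins two small row lists on a shared accession column. The row contents, the column names and the choice of an inner join (proteins without a pathway are dropped) are assumptions made for this example, not a description of how BioMart links databases internally.

```python
from collections import defaultdict

# Placeholder rows only: the accessions, entries and pathway names are invented.
interpro_rows = [
    {"protein_accession": "PROT_A", "entry": "ENTRY_1"},
    {"protein_accession": "PROT_B", "entry": "ENTRY_1"},
    {"protein_accession": "PROT_C", "entry": "ENTRY_2"},
]
pathway_rows = [
    {"protein_accession": "PROT_A", "pathway": "pathway 1"},
    {"protein_accession": "PROT_A", "pathway": "pathway 2"},
    {"protein_accession": "PROT_C", "pathway": "pathway 3"},
]


def join_on_accession(left_rows, right_rows, key="protein_accession"):
    """Inner-join two lists of row dicts on a shared accession column."""
    # Index the right-hand rows by accession so each lookup is cheap.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        # A protein found in several pathways yields several joined rows.
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


linked = join_on_accession(interpro_rows, pathway_rows)
for row in linked:
    print(row["protein_accession"], row["entry"], row["pathway"])
```

Note that one protein can appear in more than one joined row, which is one reason a linked query can return more rows than the number of proteins selected.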

Both the InterPro BioMart itself and the linked BioMart databases appear in the "- CHOOSE DATABASE -" pull down list on the InterPro BioMart "MartView" interface.

Screenshot: Choose a Database

You can start by querying any of these databases. The following instructions on this page focus on the "InterPro BioMart", so to follow through the page you should start by selecting this option.

Selecting a Data Set

Assuming that you have selected the InterPro BioMart database in the step above, you now have a choice of two datasets, as illustrated below.

Screenshot: Choose a Dataset

You should select one dataset depending on how you are building your filters (see Building Filters below). If you are querying by protein, select the "Protein Matches" dataset. If your filter is focused on InterPro entries or member database signatures, select the "InterPro Entries" dataset.

If, on the other hand, you cannot decide based upon the filter (e.g. you wish to build a complex filter based on both proteins and entries / signatures), you are advised to select the "InterPro Entries" dataset, which provides the largest range of attributes for inclusion in the results.

If you wish to build a filter based upon protein taxonomy you should use the "Protein Matches" dataset.
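The advice above can be summarised as a small decision rule. The sketch below is only a restatement of that advice: the function name and the filter labels ("protein", "taxonomy", "entry", "signature") are invented for this example, and giving taxonomy priority is an assumption based on the taxonomy filters being listed only for the "Protein Matches" dataset.

```python
def choose_dataset(filters_on):
    """Suggest a dataset from the kinds of filter you plan to build.

    filters_on: an iterable of labels taken from
    {"protein", "taxonomy", "entry", "signature"} (labels invented here).
    """
    kinds = set(filters_on)
    # Taxonomy filters are only listed for the "Protein Matches" dataset.
    if "taxonomy" in kinds:
        return "Protein Matches"
    # A purely protein-based query also belongs in "Protein Matches".
    if kinds and kinds <= {"protein"}:
        return "Protein Matches"
    # Entry / signature filters, mixed filters, or no filter at all.
    return "InterPro Entries"


print(choose_dataset({"protein"}))
print(choose_dataset({"protein", "entry"}))
print(choose_dataset({"taxonomy", "signature"}))
```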

Building Filters

You can build simple or complex filters to restrict the records that are returned to you from the BioMart. It is possible to specify several different criteria for the records returned, all of which must be met for each record returned to you in the results.
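To make the "all criteria must be met" rule concrete, here is a minimal sketch of AND logic across several filter criteria. The record fields and values are hypothetical and stand in for whatever filters you select; this is not BioMart's own implementation.

```python
# Hypothetical records; the field names are invented for this sketch.
records = [
    {"entry_type": "Domain", "signature_db": "Pfam", "is_fragment": "N"},
    {"entry_type": "Domain", "signature_db": "Pfam", "is_fragment": "Y"},
    {"entry_type": "Family", "signature_db": "Pfam", "is_fragment": "N"},
]


def matches_all(record, criteria):
    """Return True only if the record satisfies every criterion (AND logic)."""
    return all(record.get(field) == wanted for field, wanted in criteria.items())


criteria = {"entry_type": "Domain", "is_fragment": "N"}
selected = [r for r in records if matches_all(r, criteria)]
print(len(selected), "of", len(records), "records pass both criteria")
```

Adding a criterion can only keep the result the same size or make it smaller, because every record must pass every test.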

To build your filter, click on the "Filters" heading in the left panel of the BioMart interface. You can then define your filter on the right hand panel.

You will see several expandable sections from which you can select filter criteria (you can select any number and combination of filters from all the sections). To open a section, click on the plus (+) icon.


Filters That Accept Multiple Values

Each filter item is described in the tables below, one table for each dataset. Note that some filters allow you to specify more than one item. In that case, the filter returns all records that match any of the items specified (OR logic). Filters that accept multiple values are indicated below. To select multiple items, follow these instructions:

  • For fields that accept a typed, pasted or uploaded list, you can separate values using either white space or commas as illustrated in the two figures below:
    Screenshot: White space separated options

    Screenshot: Comma separated options
    You may also upload a text file from your computer containing information formatted in the same way by clicking on the browse button.

  • For fields that contain a list of possible values, you can select multiple values by holding the CTRL key (Microsoft Windows & Linux) or Apple & Shift key (Apple Mac) while clicking on the values that you wish to select as illustrated below:
    Screenshot: Multiple Select
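If you prepare a value list in a script before pasting or uploading it, the sketch below shows one way to split text on commas or white space and to apply the "match any item" (OR) rule. The function names are invented for this example, the accessions are format examples only, and the exact parsing rules used by the BioMart interface may differ.

```python
import re


def parse_value_list(text):
    """Split a typed, pasted or uploaded list on commas and/or white space."""
    return [value for value in re.split(r"[,\s]+", text.strip()) if value]


def matches_any(value, allowed_values):
    """OR logic: the value passes if it equals any item in the list."""
    return value in set(allowed_values)


# The same three accessions written with commas, spaces and a line break.
pasted = "IPR012335, IPR000001\nIPR000002"
accessions = parse_value_list(pasted)
print(accessions)
print(matches_any("IPR000001", accessions))
print(matches_any("IPR999999", accessions))
```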

Filters for the "InterPro Entries" Dataset


| Section | Filter | Multiple Values Allowed | Description and Notes |
|---|---|---|---|
| InterPro Entry Filters | InterPro Entry ID (IPR######) | Yes | |
| | InterPro Entry Name | Yes | |
| | InterPro Entry Short Name | Yes | |
| | InterPro Entry Type | Yes | |
| Member Database Signature Filters | Signature Accession | Yes | |
| | Signature ID (Name) | Yes | |
| | Source Signature Database | Yes | |
| Protein Signature Match Filters | UniProtKB Protein Accession | Yes | |
| | UniProtKB Protein ID (Name) | Yes | |
| | Source Protein Database | No | Either the UniProtKB/Swiss-Prot database of manually curated proteins, or the UniProtKB/TrEMBL database of automatically annotated proteins. |
| | Match Location on Protein Sequence | No | Includes only fully-contained matches. |
| | Include Protein Fragments? | No | If you do not want to filter based upon whether or not the proteins are fragments, do not select (check) this filter. To include only proteins that are fragments, select "Y"; to include only full-length proteins, select "N". |
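The note "includes only fully-contained matches" for the Match Location filter can be read as an interval test: a match passes only if it lies entirely inside the region you specify. The sketch below assumes inclusive start and stop positions; the function name and the inclusive-coordinate convention are assumptions for illustration.

```python
def is_fully_contained(match_start, match_stop, region_start, region_stop):
    """Return True if the match lies entirely within the region (inclusive)."""
    return region_start <= match_start and match_stop <= region_stop


# A region covering positions 1 to 100 of the protein sequence.
print(is_fully_contained(10, 50, 1, 100))    # match entirely inside the region
print(is_fully_contained(90, 120, 1, 100))   # match overlaps the region end
```

Under this reading, a match that merely overlaps the region is excluded.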

Filters for the "Protein Matches" Dataset


| Section | Filter | Multiple Values Allowed | Description and Notes |
|---|---|---|---|
| Protein Filters | UniProtKB Protein Accession | Yes | |
| | UniProtKB Protein ID (Name) | Yes | |
| | Source Protein Database | No | Either the UniProtKB/Swiss-Prot database of manually curated proteins, or the UniProtKB/TrEMBL database of automatically annotated proteins. |
| | Sequence Checksum (CRC64) | Yes | |
| | Sequence Length | No | |
| | Include Protein Fragments? | No | If you do not want to filter based upon whether or not the proteins are fragments, do not select (check) this filter. To include only proteins that are fragments, select "Y"; to include only full-length proteins, select "N". |
| | NCBI Taxonomy ID | Yes | This is a hierarchical query - for example, if you include the value '40674' for 'Mammalia', your results will include all proteins that are annotated as any kind of mammal. |
| | Model Organism Species | Yes | The full list of species described in the Ensembl database. |
| | Taxonomic Kingdom | Yes | |
| | Taxonomic Phylum | Yes | |
| | Taxonomic Class | Yes | |
| Signature Match Filters | Signature Accession | Yes | The member database specific identifier for the signature / model / pattern. |
| | Signature ID (Name) | Yes | The member database specific name for the signature / model / pattern. |
| | Source Signature Database | Yes | Filter to include only the selected InterPro member database(s). |
| | Match Status | Yes | One of the values '?' for unknown status, 'F' for false positive and 'T' for true positive. |
| InterPro Match Filters | InterPro Entry Accession | Yes | |
| | InterPro Entry Name | Yes | |
| | InterPro Entry Type | Yes | |
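The hierarchical behaviour of the NCBI Taxonomy ID filter can be pictured as walking up a taxonomy tree from each protein's taxon until the requested ID is found or the root is reached. The sketch below uses a deliberately tiny parent map in which all intermediate ranks are omitted, and the protein accessions are placeholders; it illustrates the idea and is not how the data warehouse stores lineages.

```python
# Toy parent map (child taxon ID -> parent taxon ID). Real lineages contain
# many intermediate ranks, which are collapsed here to keep the sketch short.
PARENT = {
    9606: 40674,   # Homo sapiens -> Mammalia (intermediate ranks omitted)
    10090: 40674,  # Mus musculus -> Mammalia (intermediate ranks omitted)
    40674: 1,      # Mammalia -> root (intermediate ranks omitted)
    562: 1,        # Escherichia coli -> root (intermediate ranks omitted)
}


def is_within(taxon_id, ancestor_id):
    """Return True if taxon_id equals ancestor_id or descends from it."""
    current = taxon_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = PARENT.get(current)  # None once we step past the root
    return False


# Placeholder protein accessions paired with the taxon they are annotated to.
proteins = [("PROT_A", 9606), ("PROT_B", 10090), ("PROT_C", 562)]
mammalian = [acc for acc, taxon in proteins if is_within(taxon, 40674)]
print(mammalian)
```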

Specifying Attributes

The InterPro BioMart, in common with all BioMart interfaces, allows you to specify precisely which data items are included in the results. Data items are called "Attributes" in BioMart, being equivalent to a column of data in a spreadsheet.

To select attributes, click on "Attributes" in the left panel of the BioMart window. You can then check the check-boxes adjacent to each attribute that you wish to include in the results.

The InterPro BioMart attributes selection page is very simple, consisting of a single page of attributes split into sections. You can select any number of attributes from any of the sections. The following tables describe these sections and list each attribute, one table for each dataset.


Attributes for the "InterPro Entries" Dataset


| Section | Attribute | Description and Notes |
|---|---|---|
| InterPro Entry | InterPro Entry Accession | e.g. IPR012335 |
| | InterPro Entry Short Name | |
| | InterPro Entry Name | |
| | InterPro Entry Type | |
| | InterPro Abstract | |
| Protein Matches | UniProtKB Protein Accession | |
| | UniProtKB Protein ID (Name) | |
| | Match Start Position | |
| | Match Stop Position | |
| | Match Score | |
| | Source Protein Database | |
| | NCBI Taxonomy ID | |
| | Protein Taxonomic Name | |
| | Taxonomy Name (Species) | |
| | Sequence checksum (CRC-64) | |
| | Is a Fragment? | |
| | Match Status | |
| | Sequence Length | |
| Signature Attributes | Signature Accession | |
| | Signature ID (Name) | |
| | Source Signature Database | |
| Related Parent / Child InterPro Entries | Parent Entry Accession | |
| | Parent Entry Name | |
| | Parent Entry Short Name | |
| | Parent Entry Type | |
| | Child Entry Accession | |
| | Child Entry Name | |
| | Child Entry Short Name | |
| | Child Entry Type | |
| Related Contains / Found-in InterPro Entries | Found In Entry Accession | |
| | Found In Entry Name | |
| | Found In Entry Short Name | |
| | Found In Entry Type | |
| | Contains Entry Accession | |
| | Contains Entry Name | |
| | Contains Entry Short Name | |
| | Contains Entry Type | |
| GO (Gene Ontology) Terms | GO ID | |
| | GO Term Name | |
| | GO Root Term (Process / Component / Function) | |
| Other Database Cross References | Database Name | |
| | Cross Reference Accession | |
| | Cross Reference Name | |

Attributes for the "Protein Matches" Dataset

| Attribute Tab | Section | Attribute | Description and Notes |
|---|---|---|---|
| InterPro Protein Dataset Attributes | Protein Attributes | UniProtKB Protein Accession | |
| | | UniProtKB Protein ID (Name) | |
| | | Source Protein Database | |
| | | Sequence Checksum (CRC64) | |
| | | Sequence Length | |
| | | Is a Fragment ? (Y or N) | |
| | | NCBI Taxonomy ID | |
| | | Taxonomy Name (Species) | |
| | | Taxonomy Full Name | |
| | Signature Match Attributes | Signature Accession | |
| | | Signature ID (Name) | |
| | | Source Signature Database | |
| | | Start Position | |
| | | Stop Position | |
| | | Match Status | |
| | | Match Score | |
| | InterPro Match Attributes | InterPro Entry ID | |
| | | InterPro Entry Short Name | |
| | | InterPro Entry Name | |
| | | InterPro Entry Type | |
| InterPro Supermatch Attributes | Protein Attributes | UniProtKB Protein Accession | |
| | | UniProtKB Protein ID (Name) | |
| | | Source Protein Database | |
| | | Sequence Checksum (CRC64) | |
| | | Sequence Length | |
| | | Is a Fragment ? (Y or N) | |
| | | NCBI Taxonomy ID | |
| | | Taxonomy Name (Species) | |
| | | Taxonomy Full Name | |
| | InterPro Supermatch Attributes | InterPro Entry ID (Supermatch) | |
| | | InterPro Entry Short Name (Supermatch) | |
| | | InterPro Entry Name (Supermatch) | |
| | | InterPro Entry Type (Supermatch) | |
| | | Start Position (Supermatch) | |
| | | Stop Position (Supermatch) | |

Formatting Results

When you have finished selecting the attributes to display and defining the filters that restrict the data, you are ready to preview and retrieve the results of your query.

If you click on the 'Count' button at the top of the BioMart interface, you will be presented with a count of the number of InterPro entries or matching proteins, depending upon the dataset that you are querying. This is not necessarily the same as the number of results (rows of data) that will be returned. Depending upon the attributes that you have selected, the number of results may be several orders of magnitude greater than the stated count.
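The gap between the count and the number of result rows can be illustrated with a small sketch. It assumes that each result row is one combination of the selected attribute values, as is typical when tables are joined; this behaviour is an assumption for illustration, and the protein and GO term labels are placeholders.

```python
from itertools import product

# One InterPro entry, with placeholder proteins and GO terms attached to it.
entry = "IPR012335"
proteins = ["PROT_A", "PROT_B", "PROT_C"]
go_terms = ["GO_TERM_1", "GO_TERM_2"]

# The count reports entries; the result table has one row per combination.
entry_count = 1
rows = [(entry, protein, go) for protein, go in product(proteins, go_terms)]

print("count:", entry_count)
print("result rows:", len(rows))
```

With thousands of matching proteins per entry, selecting protein-level attributes multiplies the row count in the same way.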

Clicking on the 'Results' button will return the first ten result rows in HTML format. You can then specify the format for your data set, as illustrated in the image below.


Screenshot: BioMart Results

Click on the "Rows as HTML" pull-down list. This allows you to select from the formats described below.


| Format | Description |
|---|---|
| HTML | Formatted table for viewing in an internet browser |
| CSV | Comma separated values (plain text file) |
| TSV | Tab separated values (plain text file) |
| XLS | Microsoft Excel spreadsheet |

When you have selected the most appropriate format, you can export the results as a file for viewing in your internet browser (HTML, CSV or TSV only), or as a compressed file if you expect the number of results to be large.

For very large result sets, you can request an email to be sent to you with a link to the results. Click on the "File" pull-down list and select the "Compressed web file (notify by email)" item. Finally, enter your email address in the "Email notification to" text box and click on "Go".
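Once you have exported results in TSV format, they can be processed with a short script. The sketch below groups rows by the value in the first column. It assumes the export is non-empty and starts with a header row, and the sample text is made up for illustration (only the column names and the IPR012335 accession come from this page), so check it against your own export.

```python
import csv
import io
from collections import defaultdict


def group_by_first_column(tsv_text):
    """Parse TSV text with a header row and group rows by their first column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumes the first line holds the column names
    groups = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            groups[row[0]].append(dict(zip(header[1:], row[1:])))
    return header, dict(groups)


# Made-up sample standing in for the contents of an exported TSV file.
sample = (
    "InterPro Entry Accession\tUniProtKB Protein Accession\tMatch Start Position\n"
    "IPR012335\tPROT_A\t10\n"
    "IPR012335\tPROT_B\t25\n"
)
header, groups = group_by_first_column(sample)
for accession, matches in groups.items():
    print(accession, len(matches), "rows")
```

To read a real export, pass the file contents instead of the sample string, for example `open(path, newline="").read()`, where `path` is the location of your downloaded file.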


InterPro 35.0