We kindly ask users to submit NO MORE THAN 30 JOBS AT A TIME AND NOT TO SUBMIT MORE JOBS UNTIL YOU HAVE OBTAINED RESULTS FOR THE LAST 30. Many people use these services, and a fair share policy has been implemented that allows us to block users who submit jobs in a manner that prevents others from using the service. This block may affect access to the EMBL-EBI Web Services for an entire organisation or a class B or C subnet. Also make sure you USE A REAL EMAIL ADDRESS in your submissions. Using a fake email address means we cannot contact you and will very likely result in your jobs being killed and your IP address, organisation or entire domain being blacklisted. We do apologise for any inconvenience this may cause.
Unlike the standalone version of InterProScan (see https://code.google.com/p/interproscan/), the InterProScan web services (InterProScan 5 (REST), InterProScan 5 (SOAP)) only accept a single sequence as input. This limitation is a result of extensive throughput testing, which has shown that the InterProScan services can process more sequences if each job contains only a single sequence. Thus, to process multiple sequences, each sequence has to be submitted individually. There are a number of ways to do this:
Use a tool such as Blast2GO, which implements methods for batch submission of InterProScan jobs.
Most of the sample clients for the InterProScan web services have an option which enables processing of a set of input sequences in FASTA sequence format. So a large set of sequences can be processed serially, in parallel (with the corresponding client option), or broken up into sections and processed in parallel.
The query sequences can be submitted individually, say by breaking the set of sequences into files containing a single sequence each (for example using EMBOSS seqretsplit) and then running a job for each file.
Use a workflow system (Workflows) to manage the jobs.
See FAQ: How can I analyse multiple sequences? for details of additional approaches.
Note: if running jobs in parallel the restrictions detailed above apply.
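As an illustration of the single-sequence, 30-jobs-at-a-time approach described above, the following minimal Python sketch splits a FASTA-formatted input into individual records and processes them in batches of at most 30. It is not one of the sample clients: `submit` and `fetch_result` are hypothetical callables that you would supply yourself (for example, thin wrappers around whichever client or web service call you use), and `fetch_result` is assumed to block until the job has finished.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

MAX_ACTIVE_JOBS = 30  # limit stated in the usage policy above

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records: Iterable, size: int = MAX_ACTIVE_JOBS) -> Iterator[List]:
    """Group records into lists of at most `size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_all(
    records: Iterable[Record],
    submit: Callable[[str, str, str], str],   # hypothetical: (email, header, sequence) -> job ID
    fetch_result: Callable[[str], str],       # hypothetical: blocks until the job is done
    email: str,
) -> Dict[str, str]:
    """Submit one job per sequence, never holding more than 30 unfinished jobs."""
    results = {}
    for batch in batches(records):
        # Submit the whole batch, one sequence per job.
        jobs = [(header, submit(email, header, seq)) for header, seq in batch]
        # Collect every result before the next batch is submitted.
        for header, job_id in jobs:
            results[header] = fetch_result(job_id)
    return results
```

Results are keyed by FASTA header here, so the headers are assumed to be unique. Pass a real email address as `email`, in line with the usage policy above.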
Due to resource limitations the InterProScan service no longer accepts nucleotide sequence submissions.
To process nucleotide sequences using InterProScan:
Translate your nucleotide sequence: the standalone version of InterProScan 4.x uses EMBOSS sixpack and InterProScan 5 uses EMBOSS getorf to perform the translation and filter the resulting open reading frame (ORF) sequences by length. Alternative tools such as EMBOSS transeq are also available, but may require an additional filtering step to limit the ORF sequences to those above a certain length. These tools are available in Soaplab and as part of EMBOSS.
Filter ORFs by sequence length: short sequences (<80 aa) are unlikely to have any signature matches, so unless there is additional evidence that the sequence occurs, short sequences can be discarded. The EMBOSS sixpack and EMBOSS getorf tools provide options to perform length filtering when performing the translation.
Significant hits from sequence similarity searches: the signatures used by InterProScan are based on known protein sequences, so a useful filtering step is to perform a BLAST or FASTA sequence similarity search with the ORF translations against the UniProtKB or UniParc protein sequence databases and keep only the sequences which have hits with E-values <0.001. Where an exact match to the sequence is found, you can go directly to the InterPro Matches databases (available in dbfetch and from the EMBL-EBI FTP) to get the signature matches for the sequence.
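The two filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ORF translations are assumed to be already loaded into a dictionary of name to protein sequence, and the similarity search results are assumed to be in the 12-column tab-separated BLAST+ format (`-outfmt 6`), where column 1 is the query ID and column 11 is the E-value. The function names are hypothetical; the thresholds (80 aa and 0.001) are the ones given in the text.

```python
from typing import Dict, Iterable, Set

MIN_ORF_LENGTH = 80   # residues; sequences shorter than this are discarded
MAX_EVALUE = 0.001    # keep only queries with a hit below this E-value


def filter_by_length(orfs: Dict[str, str], min_length: int = MIN_ORF_LENGTH) -> Dict[str, str]:
    """Keep ORF translations with at least `min_length` residues.

    A trailing stop symbol ('*') is not counted as a residue.
    """
    return {
        name: seq
        for name, seq in orfs.items()
        if len(seq.rstrip("*")) >= min_length
    }


def queries_with_hits(blast_tabular: Iterable[str], max_evalue: float = MAX_EVALUE) -> Set[str]:
    """Return the query IDs that have at least one hit with E-value < max_evalue.

    Assumes 12-column BLAST+ tabular lines: query ID in column 1,
    E-value in column 11. Blank lines and '#' comment lines are skipped.
    """
    keep = set()
    for line in blast_tabular:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) < max_evalue:
            keep.add(fields[0])
    return keep


def select_for_interproscan(orfs: Dict[str, str], blast_tabular: Iterable[str]) -> Dict[str, str]:
    """Apply the length filter, then keep only ORFs with a significant hit."""
    long_enough = filter_by_length(orfs)
    significant = queries_with_hits(blast_tabular)
    return {name: seq for name, seq in long_enough.items() if name in significant}
```

The sequences returned by `select_for_interproscan` are the ones worth submitting, one per job, subject to the job limits described earlier.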
Note: the standalone version of InterProScan can perform the translation and ORF length filtering as part of the submission and is recommended if you need to perform large numbers of analyses and have access to the required resources. For details see: