Using The UniProtJAPI
UniProt Knowledgebase Release [2012_01]
Access to all the UniProtJAPI services is provided through an enumeration called UniProtJAPI.
It has a single instance, factory, which provides the remote data access services.
To retrieve an entry, use EntryRetrievalService.
//Create entry retrieval service
EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();
It is also possible to retrieve a set of entries matching a query. For UniProtKB you will use UniProtQueryService.
// Create UniProt query service
UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();
For UniParc you will use UniParcQueryService.
// Create UniParc query service
UniParcQueryService uniParcQueryService = UniProtJAPI.factory.getUniParcQueryService();
It is possible to find out what version of UniProt you are using.
//Use the factory to print out the version.
System.out.println("UniProt Version = " + UniProtJAPI.factory.getVersion());
UniProtJAPI jars will not change unless a major format change occurs. This normally only happens during a major point release.
It is easy to retrieve a single UniProt Knowledgebase entry.
//Create entry retrieval service
EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();
//Retrieve UniProt entry by its accession number
UniProtEntry entry = (UniProtEntry) entryRetrievalService.getUniProtEntry("Q96AZ6");
//If no entry with the given accession number is found, entry will be null
if (entry != null) {
    System.out.println("entry = " + entry.getUniProtId().getValue());
}
//Retrieve UniRef entry by its ID
UniRefEntry uniRefEntry = entryRetrievalService.getUniRefEntry("UniRef90_Q12979-2");
if (uniRefEntry != null) {
    System.out.println("Representative Member Organism = " +
            uniRefEntry.getRepresentativeMember().getSourceOrganism().getValue());
}
It is possible to query and retrieve sets of entries matching your condition. You will need to use
UniProtQueryService to execute your queries, and the utility class UniProtQueryBuilder to build them.
You can query for data with a specific gene or protein name, EC number, keyword, etc. Moreover, you can
restrict your search to reviewed (SwissProt set) or unreviewed (TrEMBL set) entries. A full listing of the
possible index fields is available in the UniProtJAPI documentation.
// Create UniProt query service
UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();
// Example1: get entries with protein name "ClpC"
Query query = UniProtQueryBuilder.buildProteinNameQuery("ClpC");
EntryIterator<UniProtEntry> entryIterator = uniProtQueryService.getEntryIterator(query);
int resultSize = entryIterator.getResultSize();
System.out.println("resultSize = " + resultSize);
for (UniProtEntry uniProtEntry : entryIterator) {
    System.out.println("Primary Acc = " +
            uniProtEntry.getPrimaryUniProtAccession().getValue());
}
// Example2: get accession numbers of SwissProt entries with protein EC number "EC 3.1.6.-"
Query query2 = UniProtQueryBuilder.buildECNumberQuery("EC 3.1.6.-");
Query queryReviewed = UniProtQueryBuilder.setReviewedEntries(query2);
AccessionIterator accs = uniProtQueryService.getAccessions(queryReviewed);
System.out.println("uniProtAccessions count = " + accs.getResultSize());
You can retrieve entries which were created or updated between given dates.
//Build query to select entries which were updated during a given period
Date startDate = DateFormat.getDateInstance(DateFormat.DEFAULT).parse("09-Jan-2007");
Date endDate = DateFormat.getDateInstance(DateFormat.DEFAULT).parse("23-Jan-2007");
Query dateQuery = UniProtQueryBuilder.buildUpdatedQuery(startDate, endDate);
//Query for SwissProt data set
Query reviewedQuery = UniProtQueryBuilder.setReviewedEntries(dateQuery);
AccessionIterator accessions = uniProtQueryService.getAccessions(reviewedQuery);
System.out.println("Updated entries = " + accessions.getResultSize());
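The date strings in the example above are written as day, abbreviated month name and year. Whether DateFormat.getDateInstance(DateFormat.DEFAULT) accepts that form depends on the default locale of the JVM, so the call may throw a ParseException on some machines. Below is a minimal sketch of a locale-independent alternative that uses only the standard library. The names QueryDates and parseDay are illustrative and are not part of UniProtJAPI.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of UniProtJAPI.
class QueryDates {

    // Parses dates such as "09-Jan-2007" with a fixed pattern and English
    // month names, independent of the JVM's default locale.
    static Date parseDay(String text) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
        format.setLenient(false); // reject impossible dates such as 31-Feb-2007
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Expected a date like 09-Jan-2007 but got: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date startDate = parseDay("09-Jan-2007");
        Date endDate = parseDay("23-Jan-2007");
        System.out.println("start before end: " + startDate.before(endDate));
    }
}
```

The resulting Date objects could then be passed to UniProtQueryBuilder.buildUpdatedQuery(startDate, endDate) as in the example above.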
It is also possible to supply a list of protein identifiers.
// Create UniProt query service
UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();
//Create a list of accession numbers (both primary and secondary are acceptable)
List<String> accList = new ArrayList<String>();
accList.add("O60243");
accList.add("Q8IZP7");
accList.add("P02070");
//Isoform IDs are acceptable as well
accList.add("Q4R572-1");
//as well as entry IDs
accList.add("14310_ARATH");
Query query = UniProtQueryBuilder.buildIDListQuery(accList);
EntryIterator<UniProtEntry> entries = uniProtQueryService.getEntryIterator(query);
for (UniProtEntry entry : entries) {
    System.out.println("entry.getUniProtId() = " + entry.getUniProtId());
}
If you work with sets of entries, there is a convenience method that lets you generate different EntryIterators and then apply set operations to them to create new ones.
// Build a human iterator
EntryIterator<UniProtEntry> humanIterator = uniProtQueryService.getEntryIterator(UniProtQueryBuilder.buildOrganismQuery("human"));
//.... do something
System.out.println(humanIterator.getResultSize());
// Build a kinase iterator
EntryIterator<UniProtEntry> kinaseIterator = uniProtQueryService.getEntryIterator(UniProtQueryBuilder.buildProteinNameQuery("kinase"));
//.... do something
System.out.println(kinaseIterator.getResultSize());
// now I want to intersect these two sets
EntryIterator<UniProtEntry> intersection = uniProtQueryService.getEntryIterator(humanIterator,SetOperation.And,kinaseIterator);
//.... do something
System.out.println(intersection.getResultSize());
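Conceptually, SetOperation.And keeps only the entries that are present in both result sets. The sketch below illustrates that behaviour on plain sets of accession strings using the standard library. It does not show how the remote service computes the result, and the accession values and the names SetOperationSketch and intersect are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustration only: models result sets as sets of accession strings.
class SetOperationSketch {

    // Returns the elements present in both sets, leaving the inputs unchanged.
    static Set<String> intersect(Set<String> left, Set<String> right) {
        Set<String> result = new LinkedHashSet<String>(left);
        result.retainAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Made-up placeholders, not real UniProt accessions.
        Set<String> human = new LinkedHashSet<String>(Arrays.asList("ACC1", "ACC2", "ACC3"));
        Set<String> kinase = new LinkedHashSet<String>(Arrays.asList("ACC2", "ACC3", "ACC4"));
        System.out.println(intersect(human, kinase));
    }
}
```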
In addition to being able to retrieve the entire UniProt object (UniProtKB, UniParc or UniRef), you can also
retrieve a single attribute of that object. The attribute is described by an expression over the UniProt
object model. Currently the UniProtJAPI supports two standard expression languages and an additional non-standard
flat file expression format.
The two supported expression languages are OGNL and JEXL, both described below.
UniProtJAPI supports two methods for accessing these attributes from the objects, either via a single entry or through an AttributeIterator over a query.
//Create entry retrieval service
EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();
//Retrieve list of keyword objects from a UniProt entry by its accession number
Object attribute = entryRetrievalService.getUniProtAttribute("P99999", "ognl:keywords");
// Cast the object to UniProt Keywords
final List<Keyword> keywords = (List<Keyword>) attribute;
// Create UniProt query service
UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();
// Example1: get entries with protein name "ClpC"
Query query = UniProtQueryBuilder.buildProteinNameQuery("ClpC");
//Retrieve the attribute iterator
AttributeIterator<UniProtEntry> it = uniProtQueryService.getAttributes(query, "ognl:keywords");
// Iterate over all the attributes
for (Attribute attribute : it) {
    // The attribute is a key-value pair; the key is the primary accession
    String accession = attribute.getAccession();
    // Cast the value portion of the key-value pair to UniProt Keywords
    final List<Keyword> keywords = (List<Keyword>) attribute.getValue();
}
OGNL stands for Object Graph Navigation Language. It is an expression and binding language for getting and
setting properties of Java objects. Normally the same expression is used for both getting and
setting the value of a property.
# List of keywords via the getKeywords method
ognl:keywords
# List of secondary UniProt Accessions
ognl:secondaryUniProtAccessions
#List of Entry Citations
ognl:citations
# List of all Journal Articles. Use the @ to access static methods and enumerations.
ognl:getCitations(@uk.ac.ebi.kraken.interfaces.uniprot.citations.CitationTypeEnum@JOURNAL_ARTICLE)
# List of all comments
ognl:comments
# List of values (Strings) for Cofactor comments. Use {} to apply the expression to each element of the list.
ognl:getComments(@uk.ac.ebi.kraken.interfaces.uniprot.comments.CommentType@COFACTOR).{value}
# List of String values representing the GO IDs of all the entry's GO terms.
ognl:goTerms.{goId.value}
# The entry organism.
ognl:organism
# The scientific name of the entry
ognl:organism.scientificName
# The sequence as a string, this is common across UniProtKB and UniParc entries
ognl:sequence.value
# Access chain features
ognl:features.{? #this instanceof uk.ac.ebi.kraken.interfaces.uniprot.features.ChainFeature}
# Get the number of database cross-references.
ognl:databaseCrossReferences.size
# Conditions are also allowed: if the entry is a SwissProt entry, return the description; otherwise return the genes
ognl:type==@uk.ac.ebi.kraken.interfaces.uniprot.UniProtEntryType@SWISSPROT?description:genes
For a full language reference and further documentation on OGNL, see http://www.ognl.org/2.6.9/Documentation/html/LanguageGuide/apa.html
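The two curly-brace forms used in the expressions above are OGNL's projection (list.{expression}) and selection (list.{? condition}). As a rough analogy, and assuming Java 8 or later, they behave like map and filter on a stream. The sketch below uses hypothetical stand-in classes (Feature, ChainFeature) rather than the UniProtJAPI interfaces, and the names OgnlAnalogy, project and select are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustration of OGNL projection and selection using Java streams.
class OgnlAnalogy {

    // Hypothetical stand-ins for model objects; not the UniProtJAPI types.
    static class Feature {
        private final String description;
        Feature(String description) { this.description = description; }
        public String getDescription() { return description; }
    }

    static class ChainFeature extends Feature {
        ChainFeature(String description) { super(description); }
    }

    // Analogue of the projection "features.{description}".
    static List<String> project(List<Feature> features) {
        return features.stream()
                .map(Feature::getDescription)
                .collect(Collectors.toList());
    }

    // Analogue of the selection "features.{? #this instanceof ChainFeature}".
    static List<Feature> select(List<Feature> features) {
        return features.stream()
                .filter(f -> f instanceof ChainFeature)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new ChainFeature("chain A"), new Feature("site"), new ChainFeature("chain B"));
        System.out.println(project(features));
        System.out.println(select(features).size());
    }
}
```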
Java Expression Language (JEXL) is an expression language engine.
# Number of keywords.
jexl:keywords.size()
# Get the list of ProteinNames
jexl:description.getProteinNames()
For a full language reference and further documentation on JEXL, see http://jakarta.apache.org/commons/jexl/
The flat file format is a simple expression language for accessing the flat file line types.
# Return the ID line
ff:id
# Return the DE line
ff:de
It is also possible to query and retrieve sets of UniParc entries matching your condition. You will need to use
UniParcQueryService to execute your queries, and the utility class UniParcQueryBuilder to build them.
You can query for data with a specific gene or protein name, EC number, keyword, etc. Moreover, you can
restrict your search to reviewed (SwissProt set) or unreviewed (TrEMBL set) entries.
// Create UniParc query service
UniParcQueryService uniParcQueryService = UniProtJAPI.factory.getUniParcQueryService();
// Example1: get entries with "P99999" as a cross-reference
Query query = UniParcQueryBuilder.buildCrossReferenceQuery("P99999");
EntryIterator<UniParcEntry> entryIterator = uniParcQueryService.getEntryIterator(query);
int resultSize = entryIterator.getResultSize();
System.out.println("resultSize = " + resultSize);
for (UniParcEntry uniParcEntry : entryIterator) {
    System.out.println("UniParc ID = " +
            uniParcEntry.getUniParcId().getValue());
}
You can retrieve entries which have a sequence length between two values.
//Build query to select entries whose sequence is between 50 and 100 amino acids long
Query lengthQuery = UniParcQueryBuilder.buildSequenceLengthQuery(50, 100);
AccessionIterator accessions = uniParcQueryService.getAccessions(lengthQuery);
System.out.println("Matching entries = " + accessions.getResultSize());
Coming soon: it will be possible to query and retrieve sets of UniRef entries matching your condition. Currently the only query service
available through the UniRefQueryService is the Blast service.
//Access the UniRefQueryService
UniRefQueryService uniRefQueryService = UniProtJAPI.factory.getUniRefQueryService();
Coming soon: it will be possible to query and retrieve sets of UniMes entries matching your condition. Currently the only query service
available through the UniMesQueryService is the Blast service.
//Access the UniMesQueryService
UniMesQueryService uniMesQueryService = UniProtJAPI.factory.getUniMesQueryService();
The UniProt Knowledgebase object model has a fine-grained level of detail, driven mainly by the flat file format. Some
important points to note about the object model are that the objects are mutable but will not accept null values. Similarly, null values are never returned from
object accessors; empty objects are returned instead.
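The null-handling contract described above can be sketched as follows. This is only an illustration of the pattern using a hypothetical class (ProteinNameSketch), not the library's implementation. In particular, signalling a rejected null with an exception is an assumption made for this sketch.

```java
import java.util.Objects;

// Hypothetical value object illustrating a null-free accessor contract.
class ProteinNameSketch {

    // Starts as an empty value rather than null.
    private String value = "";

    // Never returns null; an unset name is reported as an empty string.
    String getValue() {
        return value;
    }

    // Rejects null (assumption: by throwing NullPointerException).
    void setValue(String value) {
        this.value = Objects.requireNonNull(value, "value must not be null");
    }

    public static void main(String[] args) {
        ProteinNameSketch name = new ProteinNameSketch();
        // Safe without a null check because the accessor returns an empty value.
        System.out.println("length before set = " + name.getValue().length());
        name.setValue("ClpC");
        System.out.println("value after set = " + name.getValue());
    }
}
```

With a contract like this, callers can chain accessors without guarding each step against null.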
The UniProtKB object model is the most complex of the 3 databases. It contains all the annotation information found in the flat files. For further information see the JavaDoc
//Get the UniProtId
UniProtId uniProtId = entry.getUniProtId();
//Get the secondary accession numbers for this entry
for (UniProtAccession accession : entry.getSecondaryUniProtAccessions()) {
    System.out.println("Accession = " + accession.getValue());
}
For further information see the JavaDoc
//Get the UniParcId
UniParcId uniParcId = entry.getUniParcId();
//Get the sequence object
Sequence sq = entry.getSequence();
String crc64 = sq.getCRC64();
int sqLength = sq.length();
//Get the String representation of the UniParc Sequence
String aminoAcids = sq.getValue();
For further information see the JavaDoc
//Get the UniRef Representative Member
UniRefRepresentativeMember repMember = entry.getRepresentativeMember();
System.out.println("Protein Name = " + repMember.getProteinName().getValue());
For further information see the JavaDoc
//Get the UniMes Peptide Id
PeptideId peptideId = entry.getPeptideId();
System.out.println("UniMes entry peptide Id = " + peptideId.getValue());
WU-Blast2 stands for Washington University Basic Local Alignment Search Tool Version 2.0. The emphasis of this tool is to find regions of sequence similarity quickly, with minimum loss of sensitivity. This will yield functional and evolutionary clues about the structure and function of your novel sequence. Dr Warren Gish at Washington University released this first "gapped" version of BLAST allowing for gapped alignments and statistics.
The UniProtKB database can be queried using our Blast service. Below is a simple example of how to use this service. For more information
on the possible parameters, please see the documentation.
//Get the UniProt Service. This is how to access the blast service
UniProtQueryService service = UniProtJAPI.factory.getUniProtQueryService();
//Create a blast input with a Database and sequence
BlastInput input = new BlastInput(DatabaseOptions.UNIPROT_ARCHAEA, sequence);
//Submitting the input to the service will return a job id
String jobid = service.submitBlast(input);
//Use this jobid to check the service to see if the job is complete
while (service.checkStatus(jobid) != JobStatus.DONE) {
    try {
        //Sleep a bit before the next request
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
//The blast data contains the job information and the hits with entries
BlastData<UniProtEntry> blastResult = service.getResults(jobid);
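The polling loop above waits indefinitely if the job never reaches JobStatus.DONE. Below is a minimal sketch of a bounded wait, written against a generic status check so that it runs without the library and assuming Java 8 or later. The names BoundedPoller and waitUntil are hypothetical. In real use the supplier would wrap a call such as service.checkStatus(jobid) == JobStatus.DONE.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition a limited number of times.
class BoundedPoller {

    // Checks isDone up to maxAttempts times, sleeping delayMillis between checks.
    // Returns true as soon as isDone reports true; returns false if the attempts
    // run out or the waiting thread is interrupted.
    static boolean waitUntil(BooleanSupplier isDone, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isDone.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so callers can react to it.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated job that reports completion on the third status check.
        final int[] checks = {0};
        boolean finished = waitUntil(() -> ++checks[0] >= 3, 10, 10L);
        System.out.println("finished = " + finished + " after " + checks[0] + " checks");
    }
}
```

Only fetch the results when the method returns true; a false return means the caller should report a timeout or retry later.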
The BlastInput accepts any number of blast parameters in any order. Below are the enums you can use and their default values.
//Default options
uk.ac.ebi.kraken.model.blast.parameters.BlastVersionOption.BLASTP
uk.ac.ebi.kraken.model.blast.parameters.SimilarityMatrixOptions.BLOSUM_62
uk.ac.ebi.kraken.model.blast.parameters.ExpectedThreshold.TEN
uk.ac.ebi.kraken.model.blast.parameters.ViewFilterOptions.NO
uk.ac.ebi.kraken.model.blast.parameters.FilterOptions.NONE
uk.ac.ebi.kraken.model.blast.parameters.MaxNumberResultsOptions.ONE_HUNDRED_FIFTY
uk.ac.ebi.kraken.model.blast.parameters.ScoreOptions.ONE_HUNDRED
uk.ac.ebi.kraken.model.blast.parameters.SensitivityValue.NORMAL
uk.ac.ebi.kraken.model.blast.parameters.SortOptions.PVALUE
uk.ac.ebi.kraken.model.blast.parameters.StatsOptions.KAP
uk.ac.ebi.kraken.model.blast.parameters.FormatOptions.DEFAULT
uk.ac.ebi.kraken.model.blast.parameters.TopcomboN.ONE
//If, say, you wish to select a different stats option and sort option, pass them to the constructor
BlastInput input = new BlastInput(DatabaseOptions.UNIPROT_HUMAN, sequence, StatsOptions.POISSON, SortOptions.HIGHSCORE);
//Or perhaps you want to re-run the same input against UniRef100
input.setDatabase(DatabaseOptions.UNIREF_100);
//Make sure you run it against the correct query service, in this case the UniRefQueryService.
UniRefQueryService service = UniProtJAPI.factory.getUniRefQueryService();
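To illustrate how a constructor can accept any number of parameters in any order while still applying defaults, here is a sketch that keys each option by its enum type. It is a guess at the general technique, not the actual BlastInput implementation. The types OptionBag, SortOption and StatsOption are hypothetical stand-ins, with defaults chosen to mirror the PVALUE and KAP defaults listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical container for order-independent enum options with defaults.
class OptionBag {

    enum SortOption { PVALUE, HIGHSCORE }
    enum StatsOption { KAP, POISSON }

    // One value per enum type; the enum's class acts as the key.
    private final Map<Class<?>, Enum<?>> options = new LinkedHashMap<Class<?>, Enum<?>>();

    OptionBag(Enum<?>... overrides) {
        // Register the defaults first.
        options.put(SortOption.class, SortOption.PVALUE);
        options.put(StatsOption.class, StatsOption.KAP);
        // Each override replaces the default of the same enum type,
        // so the order of the arguments does not matter.
        for (Enum<?> option : overrides) {
            Class<?> type = option.getDeclaringClass();
            if (!options.containsKey(type)) {
                throw new IllegalArgumentException("Unsupported option: " + option);
            }
            options.put(type, option);
        }
    }

    // Returns the current value for the given option type.
    <T extends Enum<T>> T get(Class<T> type) {
        return type.cast(options.get(type));
    }

    public static void main(String[] args) {
        OptionBag custom = new OptionBag(StatsOption.POISSON, SortOption.HIGHSCORE);
        System.out.println(custom.get(SortOption.class) + " " + custom.get(StatsOption.class));
    }
}
```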