
Using The UniProtJAPI

UniProt Knowledgebase Release [2012_01]


Initialization of services

Access to all UniProtJAPI services is provided through an enumeration called UniProtJAPI. Its instance, factory, provides the remote data access services.

To retrieve entries, use EntryRetrievalService.

    //Create entry retrieval service
    EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();
	

It is also possible to retrieve a set of entries matching a query. For UniProtKB, use UniProtQueryService.

    // Create UniProt query service
    UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();
	

For UniParc you will use UniParcQueryService.

    // Create UniParc query service
    UniParcQueryService uniParcQueryService = UniProtJAPI.factory.getUniParcQueryService();
	

Determine version of UniProt

It is possible to find out what version of UniProt you are using.

    //Use the factory to print out the version.
    System.out.println("UniProt Version = " + UniProtJAPI.factory.getVersion());

UniProtJAPI jars will not change unless a major format change occurs. This normally only happens during a major point release.


Fetching a single entry

It is easy to retrieve a single UniProt Knowledgebase entry:

    //Create entry retrieval service
    EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();

    //Retrieve UniProt entry by its accession number
    UniProtEntry entry = (UniProtEntry) entryRetrievalService.getUniProtEntry("Q96AZ6");

    //If no entry with the given accession number is found, entry will be null
    if (entry != null) {
      System.out.println("entry = " + entry.getUniProtId().getValue());
    }

    //Retrieve UniRef entry by its ID
    UniRefEntry uniRefEntry = entryRetrievalService.getUniRefEntry("UniRef90_Q12979-2");

    if (uniRefEntry != null) {
      System.out.println("Representative Member Organism = " +
       uniRefEntry.getRepresentativeMember().getSourceOrganism().getValue());
    }
              
    

Querying UniProtKB data

It is possible to query and retrieve sets of entries matching your conditions. Use UniProtQueryService to execute queries and the utility class UniProtQueryBuilder to build them.

Field based queries

You can query for data with a specific gene or protein name, EC number, keyword, etc. You can also restrict your search to reviewed (SwissProt set) or unreviewed (TrEMBL set) entries. A full listing of the possible index fields is given in the UniProtJAPI documentation.

    // Create UniProt query service
    UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();

    // Example1: get entries with protein name "ClpC"
    Query query = UniProtQueryBuilder.buildProteinNameQuery("ClpC");

    EntryIterator<UniProtEntry> entryIterator = uniProtQueryService.getEntryIterator(query);

    int resultSize = entryIterator.getResultSize();
    System.out.println("resultSize = " + resultSize);

    for (UniProtEntry uniProtEntry : entryIterator) {

      System.out.println("Primary Acc = " +
       uniProtEntry.getPrimaryUniProtAccession().getValue());

    }
    // Example2: get accession numbers of SwissProt entries with protein EC number "EC 3.1.6.-"
    Query query2 = UniProtQueryBuilder.buildECNumberQuery("EC 3.1.6.-");
    Query queryReviewed = UniProtQueryBuilder.setReviewedEntries(query2);

    AccessionIterator accs = uniProtQueryService.getAccessions(queryReviewed);

    System.out.println("uniProtAccessions count = " + accs.getResultSize());


	

Entry history queries

You can retrieve entries which were created or updated between given dates.


    //Build a query selecting entries that were updated during a given period.
    //Parse with an explicit pattern and locale (java.text.SimpleDateFormat, java.util.Locale):
    //DateFormat.DEFAULT is locale-dependent and rejects "09-Jan-2007" on many JVMs.
    SimpleDateFormat format = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
    Date startDate = format.parse("09-Jan-2007");
    Date endDate = format.parse("23-Jan-2007");

    Query dateQuery = UniProtQueryBuilder.buildUpdatedQuery(startDate, endDate);

    //Restrict the query to the SwissProt data set
    Query reviewedQuery = UniProtQueryBuilder.setReviewedEntries(dateQuery);

    AccessionIterator accessions = uniProtQueryService.getAccessions(reviewedQuery);
    System.out.println("Updated entries = " + accessions.getResultSize());
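Date parsing that relies on DateFormat's default style depends on the JVM's locale. A locale-independent alternative is sketched below with java.time. It assumes that buildUpdatedQuery only needs java.util.Date values and that treating each day as midnight UTC is acceptable; the time zone the service expects is not documented here. UpdateWindow and toDate are illustrative names.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

public class UpdateWindow {

    /** Converts a calendar day to a java.util.Date at midnight UTC (assumed convention). */
    public static Date toDate(LocalDate day) {
        return Date.from(day.atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    public static void main(String[] args) {
        // ISO-8601 parsing does not depend on the JVM's default locale
        Date startDate = toDate(LocalDate.parse("2007-01-09"));
        Date endDate = toDate(LocalDate.parse("2007-01-23"));
        System.out.println(startDate.toInstant() + " .. " + endDate.toInstant());
        // These values could then be passed to UniProtQueryBuilder.buildUpdatedQuery(startDate, endDate)
    }
}
```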


Getting entries for a list of accession numbers

It is also possible to retrieve entries for a list of protein identifiers.


    // Create UniProt query service
    UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();


    //Create a list of accession numbers (both primary and secondary are acceptable)
    List<String> accList = new ArrayList<String>();
    accList.add("O60243");
    accList.add("Q8IZP7");
    accList.add("P02070");
    //Isoform IDs are acceptable as well
    accList.add("Q4R572-1");
    //as are entry IDs
    accList.add("14310_ARATH");

    Query query = UniProtQueryBuilder.buildIDListQuery(accList);


    EntryIterator<UniProtEntry> entries = uniProtQueryService.getEntryIterator(query);
    for (UniProtEntry entry : entries) {
      System.out.println("entry.getUniProtId() = " + entry.getUniProtId());
    }
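Since such a list may mix primary and secondary accessions, isoform IDs and entry IDs, a client might want to check identifiers before submitting them. The sketch below classifies strings using the accession format published by UniProt; the entry-ID pattern is only an approximation, and the class IdentifierKind is hypothetical, not part of UniProtJAPI.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class IdentifierKind {

    // UniProtKB accession format (6 or 10 characters) as documented by UniProt
    private static final String ACC =
        "(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})";
    private static final Pattern ACCESSION = Pattern.compile(ACC);
    // Isoform identifier: an accession followed by "-" and a number
    private static final Pattern ISOFORM = Pattern.compile(ACC + "-[0-9]+");
    // Approximate entry ID pattern: mnemonic (or accession), "_", species code
    private static final Pattern ENTRY_ID = Pattern.compile("[A-Z0-9]{1,10}_[A-Z0-9]{1,5}");

    /** Returns "accession", "isoform", "entry id" or "unknown". */
    public static String classify(String id) {
        if (ACCESSION.matcher(id).matches()) return "accession";
        if (ISOFORM.matcher(id).matches()) return "isoform";
        if (ENTRY_ID.matcher(id).matches()) return "entry id";
        return "unknown";
    }

    public static void main(String[] args) {
        for (String id : Arrays.asList("O60243", "Q8IZP7", "P02070", "Q4R572-1", "14310_ARATH")) {
            System.out.println(id + " -> " + classify(id));
        }
    }
}
```

Unknown strings can then be reported before the query is built, rather than silently matching nothing.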

Set operations on entry iterators

If you work with sets of entries, a convenience method lets you generate different EntryIterators and then apply set operations to them to create new ones.


    // Build a human iterator
    EntryIterator<UniProtEntry> humanIterator = uniProtQueryService.getEntryIterator(UniProtQueryBuilder.buildOrganismQuery("human"));
    //.... do something
    System.out.println(humanIterator.getResultSize());


    // Build a kinase iterator
    EntryIterator<UniProtEntry> kinaseIterator = uniProtQueryService.getEntryIterator(UniProtQueryBuilder.buildProteinNameQuery("kinase"));
    //.... do something
    System.out.println(kinaseIterator.getResultSize());


    // Intersect these two sets
    EntryIterator<UniProtEntry> intersection = uniProtQueryService.getEntryIterator(humanIterator,SetOperation.And,kinaseIterator);
    //.... do something
    System.out.println(intersection.getResultSize());
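Conceptually, combining two result sets with SetOperation.And keeps the entries present in both. The sketch below illustrates that semantics on plain java.util sets of accession strings, with union and difference added for comparison. It does not call UniProtJAPI: which other SetOperation values the library offers is not shown above, so or and andNot are only general set operations here, and the accession strings are placeholders.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class AccessionSetOps {

    /** Accessions present in both sets (what an "And" combination yields conceptually). */
    public static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** Accessions present in either set. */
    public static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    /** Accessions in the first set but not in the second. */
    public static Set<String> andNot(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder accession sets standing in for two query results
        Set<String> human = new TreeSet<>(Arrays.asList("ACC1", "ACC2", "ACC3"));
        Set<String> kinase = new TreeSet<>(Arrays.asList("ACC2", "ACC3", "ACC4"));
        System.out.println("and    = " + and(human, kinase));
        System.out.println("or     = " + or(human, kinase));
        System.out.println("andNot = " + andNot(human, kinase));
    }
}
```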



Fetching Attributes of Entries

In addition to retrieving an entire UniProt object (UniProtKB, UniParc or UniRef), you can retrieve just a given attribute of that object. The attribute is described by an expression over the UniProt object model. Currently the UniProtJAPI supports two standard expression languages and an additional, non-standard flat file expression format.

The two supported expression languages are:

  • OGNL
  • JEXL

UniProtJAPI supports two ways of accessing these attributes: via a single entry, or through an AttributeIterator over a query.

Accessing an attribute for a single entry

    //Create entry retrieval service
    EntryRetrievalService entryRetrievalService = UniProtJAPI.factory.getEntryRetrievalService();

    //Retrieve the list of keyword objects from a UniProt entry by its accession number
    Object attribute = entryRetrievalService.getUniProtAttribute("P99999", "ognl:keywords");

    //Cast the object to a list of UniProt keywords (unchecked cast)
    @SuppressWarnings("unchecked")
    final List<Keyword> keywords = (List<Keyword>) attribute;
            

Accessing attributes for a set of UniProt entries

    // Create UniProt query service
    UniProtQueryService uniProtQueryService = UniProtJAPI.factory.getUniProtQueryService();

    // Example1: get entries with protein name "ClpC"
    Query query = UniProtQueryBuilder.buildProteinNameQuery("ClpC");
    //Retrieve the attribute iterator
    AttributeIterator<UniProtEntry> it = uniProtQueryService.getAttributes(query, "ognl:keywords");

    // Iterate over all the attributes
    for (Attribute attribute : it) {
        // Each attribute is a key-value pair; the key is the primary accession
        String accession = attribute.getAccession();
        // Cast the value portion of the pair to a list of UniProt keywords (unchecked cast)
        @SuppressWarnings("unchecked")
        final List<Keyword> keywords = (List<Keyword>) attribute.getValue();
    }
        

Expression Languages

OGNL

OGNL stands for Object Graph Navigation Language. It is an expression and binding language for getting and setting properties of Java objects. Normally the same expression is used for both getting and setting the value of a property.

    # List of keywords via the getKeywords method
    ognl:keywords  
    # List of secondary UniProt Accessions
    ognl:secondaryUniProtAccessions   
    #List of Entry Citations
    ognl:citations        
    # List of all journal articles. Use @ to access static methods and enumerations.
    ognl:getCitations(@uk.ac.ebi.kraken.interfaces.uniprot.citations.CitationTypeEnum@JOURNAL_ARTICLE)
    # List of all comments
    ognl:comments
    # List of values (Strings) for cofactor comments. Use .{...} to apply the expression to each element of the list.
    ognl:getComments(@uk.ac.ebi.kraken.interfaces.uniprot.comments.CommentType@COFACTOR).{value}
    # List of String values representing the GO IDs of all the entry's GO terms.
    ognl:goTerms.{goId.value}
    # The entry organism.
    ognl:organism
    # The scientific name of the entry
    ognl:organism.scientificName
    # The sequence as a string, this is common across UniProtKB and UniParc entries
    ognl:sequence.value
    # Chain features only. Use .{? ...} to select the elements of a list that match a condition.
    ognl:features.{? #this instanceof uk.ac.ebi.kraken.interfaces.uniprot.features.ChainFeature}
    # Number of database cross-references.
    ognl:databaseCrossReferences.size
    # Conditional expressions are also allowed: if the entry is a SwissProt entry, return the description, otherwise return the genes.
    ognl:type==@uk.ac.ebi.kraken.interfaces.uniprot.UniProtEntryType@SWISSPROT?description:genes
        

For the full language reference and further documentation on OGNL, see http://www.ognl.org/2.6.9/Documentation/html/LanguageGuide/apa.html
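Projection (.{...}), selection (.{? ...}) and the conditional operator behave much like a map, a filter and Java's ternary operator. To make the semantics concrete, the sketch below expresses three of the OGNL examples above with Java streams. GoTerm, Feature, ChainFeature and SignalFeature are hypothetical stand-ins, not the real object-model interfaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OgnlAnalogues {

    // Hypothetical stand-ins for object-model types (not the UniProtJAPI interfaces)
    public static class GoTerm {
        final String goId;
        public GoTerm(String goId) { this.goId = goId; }
    }

    public static class Feature { }
    public static class ChainFeature extends Feature { }
    public static class SignalFeature extends Feature { }

    /** Analogue of the projection ognl:goTerms.{goId.value}: map each element. */
    public static List<String> goIds(List<GoTerm> goTerms) {
        return goTerms.stream().map(t -> t.goId).collect(Collectors.toList());
    }

    /** Analogue of the selection ognl:features.{? #this instanceof ...ChainFeature}: keep matches. */
    public static List<Feature> chains(List<Feature> features) {
        return features.stream().filter(f -> f instanceof ChainFeature).collect(Collectors.toList());
    }

    /** Analogue of the conditional type==SWISSPROT ? description : genes. */
    public static Object descriptionOrGenes(boolean swissProt, String description, List<String> genes) {
        return swissProt ? description : genes;
    }

    public static void main(String[] args) {
        List<GoTerm> terms = Arrays.asList(new GoTerm("GO:0005524"), new GoTerm("GO:0004672"));
        System.out.println(goIds(terms));
        List<Feature> features = Arrays.asList(new SignalFeature(), new ChainFeature());
        System.out.println("chain features: " + chains(features).size());
    }
}
```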

JEXL

Java Expression Language (JEXL) is an expression language engine.

    # Number of keywords.
    jexl:keywords.size()
    #  Get the list of ProteinNames
    jexl:description.getProteinNames()
    

For the full language reference and further documentation on JEXL, see http://jakarta.apache.org/commons/jexl/

Flat File Expression

A simple expression language for accessing the flat file line types:

    # Return the ID line
    ff:id
    # Return the DE line
    ff:de
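As a rough model of what ff:id and ff:de return, the sketch below pulls lines of a given type out of UniProt flat-file text, where annotation lines start with a two-letter line code followed by three spaces. It assumes the expression selects lines by that code; whether the library strips the code or joins continuation lines is not documented above. FlatFileLines is a hypothetical helper and the entry text is a made-up example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FlatFileLines {

    /**
     * Returns the data portion (column 6 onwards) of every line whose
     * two-letter line code matches {@code code}, e.g. "ID" or "DE".
     */
    public static List<String> linesOfType(String flatFile, String code) {
        List<String> result = new ArrayList<>();
        String prefix = code.toUpperCase(Locale.ROOT) + "   ";
        for (String line : flatFile.split("\\R")) {
            if (line.startsWith(prefix)) {
                result.add(line.substring(prefix.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up entry text for illustration only
        String entry = "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.\n"
                     + "DE   RecName: Full=Example protein;\n"
                     + "DE   AltName: Full=Alternative name;\n"
                     + "OS   Homo sapiens (Human).\n";
        System.out.println(linesOfType(entry, "ID"));
        System.out.println(linesOfType(entry, "de"));
    }
}
```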
    

Querying UniParc data

It is also possible to query and retrieve sets of UniParc entries matching your conditions. Use UniParcQueryService to execute queries and the utility class UniParcQueryBuilder to build them.

Field based queries

You can query UniParc data by specific fields, for example by database cross-reference:

    // Create UniParc query service
    UniParcQueryService uniParcQueryService = UniProtJAPI.factory.getUniParcQueryService();

    // Example1: get entries with "P99999" as a cross-reference
    Query query = UniParcQueryBuilder.buildCrossReferenceQuery("P99999");

    EntryIterator<UniParcEntry> entryIterator = uniParcQueryService.getEntryIterator(query);

    int resultSize = entryIterator.getResultSize();
    System.out.println("resultSize = " + resultSize);

    for (UniParcEntry uniParcEntry : entryIterator) {

      System.out.println("UniParc ID = " +
       uniParcEntry.getUniParcId().getValue());

    }
	

Sequence length queries

You can retrieve entries whose sequence length lies between two values.


    //Build a query selecting entries whose sequence is between 50 and 100 amino acids long
    Query lengthQuery = UniParcQueryBuilder.buildSequenceLengthQuery(50, 100);

    AccessionIterator accessions = uniParcQueryService.getAccessions(lengthQuery);
    System.out.println("Matching entries = " + accessions.getResultSize());



Querying UniRef data

Coming soon: it will be possible to query and retrieve sets of UniRef entries matching your conditions. Currently the only query service available through UniRefQueryService is the Blast service.

    //Access the UniRefQueryService
    UniRefQueryService uniRefQueryService = UniProtJAPI.factory.getUniRefQueryService();
    

Querying UniMes data

Coming soon: it will be possible to query and retrieve sets of UniMes entries matching your conditions. Currently the only query service available through UniMesQueryService is the Blast service.

    //Access the UniMesQueryService
    UniMesQueryService uniMesQueryService = UniProtJAPI.factory.getUniMesQueryService();
    

UniProt Knowledgebase Object Model

The UniProt Knowledgebase object model is fine-grained; its level of detail is driven mainly by the flat file format. Two important points to note: the objects are mutable but do not accept null values, and object accessors never return null, returning empty objects instead.
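This null-handling convention can be illustrated with a small hypothetical class (NullFreeGene is not a UniProtJAPI type): setters reject null with Objects.requireNonNull, and fields start out as empty values, so callers can test isEmpty() instead of checking for null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical class illustrating "no nulls accepted, no nulls returned". */
public class NullFreeGene {

    private String name = "";                         // empty value instead of null
    private List<String> synonyms = new ArrayList<>(); // empty list instead of null

    public String getName() {
        return name;                                  // never null
    }

    public void setName(String name) {
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    public List<String> getSynonyms() {
        return synonyms;                              // never null, possibly empty
    }

    public void setSynonyms(List<String> synonyms) {
        this.synonyms = new ArrayList<>(Objects.requireNonNull(synonyms, "synonyms must not be null"));
    }

    public static void main(String[] args) {
        NullFreeGene gene = new NullFreeGene();
        System.out.println("synonyms on a fresh object: " + gene.getSynonyms().size());
        try {
            gene.setName(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```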


UniProtKB

The UniProtKB object model is the most complex of the three databases. It contains all the annotation information found in the flat files. For further information, see the JavaDoc.


    //Get the UniProtId
    UniProtId uniProtId = entry.getUniProtId();

    //Get the secondary accession numbers for this entry
    for (UniProtAccession accession : entry.getSecondaryUniProtAccessions()) {
      System.out.println("Accession = " + accession.getValue());
    }


UniParc

For further information, see the JavaDoc.


    //Get the UniParcId
    UniParcId uniParcId = entry.getUniParcId();

    //Get the sequence object
    Sequence sq = entry.getSequence();
    String crc64 = sq.getCRC64();
    int sqLength = sq.length();
    //Get the String representation of the UniParc sequence
    String aminoAcids = sq.getValue();
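The checksum returned by getCRC64() can also be computed locally, for example to compare a sequence you hold against a UniParc entry. The sketch below assumes UniProt uses the SWISS-PROT style CRC64 (reversed ISO 3309 polynomial, initial value 0, result printed as 16 upper-case hexadecimal digits); SwissProtCrc64 is a hypothetical helper, not a library class.

```java
import java.nio.charset.StandardCharsets;

public class SwissProtCrc64 {

    // Reversed form of the ISO 3309 polynomial x^64 + x^4 + x^3 + x + 1
    private static final long POLY64REV = 0xD800000000000000L;
    private static final long[] TABLE = new long[256];

    static {
        // Precompute the CRC of every possible byte value
        for (int i = 0; i < 256; i++) {
            long part = i;
            for (int j = 0; j < 8; j++) {
                part = ((part & 1) != 0) ? (part >>> 1) ^ POLY64REV : part >>> 1;
            }
            TABLE[i] = part;
        }
    }

    /** Returns the CRC64 checksum of an amino-acid sequence as 16 upper-case hex digits. */
    public static String crc64(String sequence) {
        long crc = 0L;
        for (byte b : sequence.getBytes(StandardCharsets.US_ASCII)) {
            crc = (crc >>> 8) ^ TABLE[(int) ((crc ^ b) & 0xFF)];
        }
        return String.format("%016X", crc);
    }

    public static void main(String[] args) {
        System.out.println(crc64("MKTAYIAKQR"));
    }
}
```

If the assumption holds, comparing crc64(mySequence) with sq.getCRC64() is a cheap first check before comparing full sequences.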


UniRef

For further information, see the JavaDoc.


    //Get the UniRef Representative Member
    UniRefRepresentativeMember repMember = entry.getRepresentativeMember();
    System.out.println("Protein Name = " + repMember.getProteinName().getValue());

UniMes

For further information, see the JavaDoc.


    //Get the UniMes Peptide Id
    PeptideId peptideId = entry.getPeptideId();
    System.out.println("UniMes entry peptide Id = " + peptideId.getValue());


Blast against UniProt Knowledgebase

WU-Blast2 stands for Washington University Basic Local Alignment Search Tool Version 2.0. The emphasis of this tool is to find regions of sequence similarity quickly, with minimum loss of sensitivity. This will yield functional and evolutionary clues about the structure and function of your novel sequence. Dr Warren Gish at Washington University released this first "gapped" version of BLAST allowing for gapped alignments and statistics.

The UniProtKB database can be queried using our Blast service. Below is a simple example of how to use this service. For more information on the possible parameters, see the documentation.


    //Get the UniProt query service; this is how to access the Blast service
    UniProtQueryService service = UniProtJAPI.factory.getUniProtQueryService();
    //Create a Blast input with a database and a sequence (a String of amino acids)
    BlastInput input = new BlastInput(DatabaseOptions.UNIPROT_ARCHAEA, sequence);
    //Submitting the input to the service returns a job id
    String jobid = service.submitBlast(input);
    //Use this job id to poll the service until the job is complete
    while (service.checkStatus(jobid) != JobStatus.DONE) {
      try {
        //Wait a bit before the next request
        Thread.sleep(5000);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
    //The Blast data contains the job information and the hits with entries
    BlastData<UniProtEntry> blastResult = service.getResults(jobid);
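A polling loop that runs until the job reports DONE can wait forever if a job stalls. A variant with a time budget is sketched below. JobStatus here is a local stand-in with only two values, the status check is passed in as a Supplier, and PollUntilDone is a hypothetical name; in real code the supplier would wrap a call such as service.checkStatus(jobid).

```java
import java.util.function.Supplier;

public class PollUntilDone {

    // Local stand-in; the real JobStatus enum may define other values
    public enum JobStatus { RUNNING, DONE }

    /**
     * Polls the status supplier until it reports DONE or the time budget is spent.
     * Returns true if the job finished in time, false on timeout or interruption.
     */
    public static boolean waitForDone(Supplier<JobStatus> status, long intervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (status.get() != JobStatus.DONE) {
            if (System.currentTimeMillis() >= deadline) {
                return false;                          // give up instead of looping forever
            }
            try {
                Thread.sleep(intervalMillis);          // back off between requests
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated job that reports DONE on the third status check
        int[] calls = {0};
        boolean finished = waitForDone(() -> ++calls[0] >= 3 ? JobStatus.DONE : JobStatus.RUNNING, 10, 1000);
        System.out.println("finished = " + finished + " after " + calls[0] + " checks");
    }
}
```

Only call getResults when waitForDone returns true; otherwise report the timeout.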

Blast Parameters

BlastInput accepts any number of Blast parameters, in any order. Below are the enums you can use, with their default values:


    //Default options
    uk.ac.ebi.kraken.model.blast.parameters.BlastVersionOption.BLASTP
    uk.ac.ebi.kraken.model.blast.parameters.SimilarityMatrixOptions.BLOSUM_62
    uk.ac.ebi.kraken.model.blast.parameters.ExpectedThreshold.TEN
    uk.ac.ebi.kraken.model.blast.parameters.ViewFilterOptions.NO
    uk.ac.ebi.kraken.model.blast.parameters.FilterOptions.NONE
    uk.ac.ebi.kraken.model.blast.parameters.MaxNumberResultsOptions.ONE_HUNDRED_FIFTY
    uk.ac.ebi.kraken.model.blast.parameters.ScoreOptions.ONE_HUNDRED
    uk.ac.ebi.kraken.model.blast.parameters.SensitivityValue.NORMAL
    uk.ac.ebi.kraken.model.blast.parameters.SortOptions.PVALUE
    uk.ac.ebi.kraken.model.blast.parameters.StatsOptions.KAP
    uk.ac.ebi.kraken.model.blast.parameters.FormatOptions.DEFAULT
    uk.ac.ebi.kraken.model.blast.parameters.TopcomboN.ONE



    //To select, say, a different stats option and sort option, pass them to the constructor
    BlastInput input = new BlastInput(DatabaseOptions.UNIPROT_HUMAN, sequence, StatsOptions.POISSON, SortOptions.HIGHSCORE);
    //Or re-run the same input against UniRef100
    input.setDatabase(DatabaseOptions.UNIREF_100);
    //Make sure you submit it to the matching query service, in this case the UniRefQueryService
    UniRefQueryService service = UniProtJAPI.factory.getUniRefQueryService();
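One way a constructor can accept options in any order is to start from the defaults and let each supplied enum value replace the default for its own enum type. The sketch below shows that pattern with two local enums mirroring StatsOptions and SortOptions from the list above. It is an assumption about the design, not a description of how BlastInput is implemented, and BlastOptionsSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlastOptionsSketch {

    // Local subsets of two option enums (the real enums have more values)
    public enum StatsOptions { KAP, POISSON }
    public enum SortOptions { PVALUE, HIGHSCORE }

    // One entry per option type, keyed by the enum class
    private final Map<Class<?>, Enum<?>> options = new LinkedHashMap<>();

    /** Starts from the defaults, then lets each supplied option replace the default of its type. */
    public BlastOptionsSketch(Enum<?>... overrides) {
        options.put(StatsOptions.class, StatsOptions.KAP);
        options.put(SortOptions.class, SortOptions.PVALUE);
        for (Enum<?> option : overrides) {
            options.put(option.getDeclaringClass(), option);
        }
    }

    public Enum<?> get(Class<?> optionType) {
        return options.get(optionType);
    }

    public static void main(String[] args) {
        // Order does not matter: each value replaces the default of its own enum type
        BlastOptionsSketch input = new BlastOptionsSketch(SortOptions.HIGHSCORE, StatsOptions.POISSON);
        System.out.println(input.get(StatsOptions.class) + ", " + input.get(SortOptions.class));
    }
}
```

Keying by enum type means that passing two values of the same type keeps only the last one, which is one reasonable rule but not necessarily the library's.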

