Gene Expression Atlas APIs

Introduction

REST APIs

Note: As of Expression Atlas release 2.0.21, the REST API interface as described below is deprecated. It will be replaced by a new interface when the new Expression Atlas goes live in Q4 2013.

We urge all users of our current API to contact us at atlas-feedback@ebi.ac.uk so that we can consider their requirements when designing the new REST API.

We would like to apologise for any inconvenience the transition to the new REST API causes.

The Atlas REST API provides all the results available in the main web application in a pragmatic, easy-to-use form: simple HTTP GET queries as input and either JSON or XML as output.

All REST API URLs now have the following general form:

http://www.ebi.ac.uk/gxa/api/deprecated?<query>[&format=<json|xml>][&indent]
  • The format parameter specifies the output format and can be either xml or json.
  • The indent option returns pretty-printed output, which is useful for debugging because the results can be read in a text editor. DO NOT use this option in production, as it increases bandwidth use and significantly increases response and processing time.
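
As a minimal sketch of the general form above, the following helper assembles an API URL from a query string plus the optional format and indent arguments. The function name buildAtlasUrl and its argument layout are illustrative assumptions, not part of the Atlas API; only the endpoint and parameter names come from this page.

```javascript
// Endpoint taken from the general URL form above.
const ATLAS_API = "http://www.ebi.ac.uk/gxa/api/deprecated";

// buildAtlasUrl is a hypothetical helper, not part of the Atlas API.
// query:  an already URL-escaped query string, e.g. "geneIs=ASPM"
// format: "json" or "xml" (optional)
// indent: true appends the bare indent flag (debugging only)
function buildAtlasUrl(query, format, indent) {
    let url = ATLAS_API + "?" + query;
    if (format) {
        if (format !== "json" && format !== "xml") {
            throw new Error("format must be 'json' or 'xml'");
        }
        url += "&format=" + format;
    }
    if (indent) {
        url += "&indent";
    }
    return url;
}

console.log(buildAtlasUrl("geneIs=ASPM", "json", true));
```

Leaving indent off by default keeps production requests small, in line with the warning above.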

The JSON output format also supports JSONP-style callbacks, which should be specified with the argument callback=callbackName. In this case the resulting JSON output will be wrapped in a callback function call, ready to be evaluated by the requesting client's <script> tag. For example:

http://www.ebi.ac.uk/gxa/api/deprecated?<query>&format=json&callback=processResult

will render

processResult({ ...query result data... })

This can be useful for creating cross-site AJAX mashups using Atlas data, as in this HTML/JavaScript code example:

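        // Base URL of the Atlas installation; assumed here to match the URLs used above.
        var atlasHomeUrl = "http://www.ebi.ac.uk/gxa";

        // JSONP callback invoked by the script that the API returns.
        function processData(data) {
            if (data.error) {
                alert(data.error);
                return;
            }

            alert('Experiment accession: ' + data.experimentInfo.accession + ', description: ' + data.experimentInfo.description);
        }

        // Injects a <script> tag whose src is the API URL; the response then calls processData.
        function queryExperiment(experimentId) {
            var head = document.getElementsByTagName('head');
            var script = document.createElement('script');
            script.type = "text/javascript";
            // URL-encode the accession before placing it in the query string.
            script.src = atlasHomeUrl + "/api/deprecated?experiment=" + encodeURIComponent(experimentId) + "&format=json&callback=processData";
            head[0].appendChild(script);
        }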
        ...
        <input type="button" value="Show experiment" onclick="queryExperiment('E-AFMX-1')">

Gene queries

The gene query interface resembles the original "structured query" interface available through the web and allows gene property queries to be combined with conditions. Query parameters are passed as a set of HTTP GET name=value pairs and should be URL-escaped as usual (%-based encoding, like %20%30%40).

Gene filter parameters have the format of gene<Property>Is=... or gene<Property>IsNot=..., where <Property> should be one of:

  • Name - Gene name
  • Goterm - Gene Ontology Term
  • Interproterm - InterPro Term
  • Disease - Gene-Disease Association
  • Keyword - Gene Keyword
  • Protein - Protein
  • Dbxref - Other Database Cross-Refs
  • Embl - EMBL-Bank ID
  • Ensfamily - Ensembl Family
  • Ensgene - Ensembl Gene ID
  • Ensprotein - Ensembl Protein ID
  • Enstranscript - Ensembl Transcript ID
  • Goid - Gene Ontology ID
  • Image - IMAGE ID
  • Interproid - InterPro ID
  • Locuslink - Entrez Gene ID
  • Omimid - OMIM ID
  • Orf - ORF
  • Refseq - RefSeq ID
  • Unigene - UniGene ID
  • Uniprot - UniProt Accession
  • Hmdb - HMDB ID
  • Chebi - ChEBI ID
  • Cas - CAS
  • Uniprotmetenz - Uniprotmetenz
  • Gene - Gene Name or Identifier
  • Synonym - Gene Synonym

All fields are searched if <Property> is omitted.

Examples:

http://www.ebi.ac.uk/gxa/api/deprecated?geneGotermIs=spindle+pole&geneDiseaseIs=heart+disease
http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=ASPM
http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=ENSG00000012048
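
The gene<Property>Is / gene<Property>IsNot naming rule can be sketched as a small helper. The name geneFilter and its signature are hypothetical and only illustrate how the parameter names are composed; the property names are the ones listed above.

```javascript
// geneFilter is a hypothetical helper, not part of the Atlas API.
// property: one of the <Property> names listed above, or "" to search all fields
// value:    the text to search for
// negate:   true yields gene<Property>IsNot instead of gene<Property>Is
function geneFilter(property, value, negate) {
    const name = "gene" + property + (negate ? "IsNot" : "Is");
    // encodeURIComponent writes a space as %20; form-style query parsers
    // normally treat that the same as the "+" used in the examples above.
    return name + "=" + encodeURIComponent(value);
}

const geneQuery = [
    geneFilter("Goterm", "spindle pole", false),
    geneFilter("Disease", "heart disease", true)
].join("&");

console.log(geneQuery);
```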

Condition queries

Condition queries can be combined with gene queries to filter result genes according to their expression statistics. Condition parameters have the format <any|none|up|down|updown>In<Factor>=...

where:

  • 'any' refers to any expression out of UP, DOWN or non-differential expression
  • 'none' refers to non-differential expression only

and <Factor> is optional (if omitted, all condition properties are searched) and is one of:

  • Age - Age
  • Biometric - Biometric measurement
  • Cell_line - Cell line
  • Cell_type - Cell type
  • Clinical_history - Clinical history
  • Clinical_information - Clinical info
  • Clinical_treatment - Clinical treatment
  • Compound - Compound treatment
  • Developmental_stage - Developmental stage
  • Disease_location - Disease location
  • Disease_staging - Disease staging
  • Disease - Disease
  • Dose - Dose
  • Ecotype - Ecotype
  • Environmental_history - Environmental history
  • Family_history - Family history
  • Genetic_modification - Gene modification
  • Genotype - Genotype
  • Growth_condition - Growth condition
  • Histology - Histology
  • Individual_genetic_characteristics - Genotype
  • Individual - Individual
  • Infect - Infection
  • Initial_time_point - Initial time
  • Injury - Injury
  • Light - Light
  • MaterialType - Material type
  • Media - Media
  • Observation - Observation
  • Organism - Organism
  • Organism_part - Organism part
  • Organism_status - Organism status
  • Performer - Performer
  • Phenotype - Phenotype
  • Population - Population
  • Protocol_type - Protocol type
  • Qcdescrtype - QC description
  • Replicate - Replicate
  • RNAi - RNAi
  • Sex - Sex
  • Source_provider - Source provider
  • Strain - Strain
  • T_cell_type - Target cell type
  • Temperature - Temperature
  • Test - Test
  • Test_result - Test result
  • Test_type - Test type
  • Time - Time
  • Tumor_grading - Tumor grading
  • Vehicle - Vehicle
  • Generation - Generation

All factors are searched if <Factor> is omitted.

To get genes that are up or down in only one specified value of a factor, and expressed differently in all other values of the same factor, specify:

  • upOnlyIn<factor>=<factor value>
  • downOnlyIn<factor>=<factor value>

This feature DOES NOT work for EFO or the "any" factor; an existing factor must be specified.

Note: up or down only in factor functionality has been temporarily removed and will be re-instated in a future Atlas release.

It is also possible to limit results to those genes which are up- or down-regulated in more than a specified number of experiments in the specified condition:

  • up<N>In<factor>=<factor value>
  • down<N>In<factor>=<factor value>

Examples:

http://www.ebi.ac.uk/gxa/api/deprecated?upIn=liver
http://www.ebi.ac.uk/gxa/api/deprecated?updownInOrganism_part=heart
http://www.ebi.ac.uk/gxa/api/deprecated?up3In=heart
http://www.ebi.ac.uk/gxa/api/deprecated?down5InSex=male
http://www.ebi.ac.uk/gxa/api/deprecated?downInOrganism_part=kidney&upInSex=male
http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=p53&downInOrganism_part=kidney&upInSex=male
http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=aspm&noneInOrganism_part=heart&upInSex=male
http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=aspm&anyInOrganism_part=heart&upInSex=male
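
The condition parameter naming, including the up<N>In / down<N>In variants, can be sketched as follows. The helper name conditionParam and its argument order are assumptions made for illustration; the expression keywords and factor names come from the lists above.

```javascript
// conditionParam is a hypothetical helper, not part of the Atlas API.
// expression:     "any", "none", "up", "down" or "updown"
// factor:         a <Factor> name from the list above, or "" to search all factors
// value:          the factor value, e.g. "heart"
// minExperiments: optional N for the up<N>In / down<N>In forms
function conditionParam(expression, factor, value, minExperiments) {
    const allowed = ["any", "none", "up", "down", "updown"];
    if (!allowed.includes(expression)) {
        throw new Error("unknown expression: " + expression);
    }
    let prefix = expression;
    if (minExperiments !== undefined) {
        // The page documents the <N> form for up and down only.
        if (expression !== "up" && expression !== "down") {
            throw new Error("<N> is only documented for up and down");
        }
        prefix += minExperiments;
    }
    return prefix + "In" + factor + "=" + encodeURIComponent(value);
}

const conditionQuery = [
    conditionParam("down", "Organism_part", "kidney"),
    conditionParam("up", "Sex", "male")
].join("&");

console.log(conditionQuery);
console.log(conditionParam("up", "", "heart", 3));
```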

Organisms

Users can limit output genes to specified organisms using one or more species parameters. Examples:

http://www.ebi.ac.uk/gxa/api/deprecated?upIn=liver&species=Mus+Musculus
http://www.ebi.ac.uk/gxa/api/deprecated?upIn=liver+heart&species=Mus+Musculus&species=Homo+Sapiens

Note: the above query is interpreted in Atlas as 'up either in liver or in heart' and genes in species 'either Mus Musculus or Homo Sapiens'. Atlas stores genes at species rather than sequence level, hence a query for genes in species 'both Mus Musculus and Homo Sapiens' would yield no results in Atlas.
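
Repeating the species parameter is straightforward with the standard URLSearchParams class, whose append method keeps duplicate names. This is only a sketch; the function name upInWithSpecies is a hypothetical choice, and upIn is used simply because it appears in the examples above.

```javascript
const ATLAS_ENDPOINT = "http://www.ebi.ac.uk/gxa/api/deprecated";

// upInWithSpecies is a hypothetical helper, not part of the Atlas API.
// factorValues: values joined by spaces, interpreted by Atlas as "either of"
// speciesList:  one species parameter is appended per entry
function upInWithSpecies(factorValues, speciesList) {
    const params = new URLSearchParams();
    params.append("upIn", factorValues.join(" "));
    for (const species of speciesList) {
        params.append("species", species);
    }
    // URLSearchParams serialises spaces as "+", as in the examples above.
    return ATLAS_ENDPOINT + "?" + params.toString();
}

console.log(upInWithSpecies(["liver", "heart"], ["Mus Musculus", "Homo Sapiens"]));
```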

The currently supported organisms are:

  • Arabidopsis thaliana
  • Bacillus subtilis
  • Caenorhabditis elegans
  • Danio rerio
  • Drosophila melanogaster
  • Homo sapiens
  • Mus musculus
  • Rattus norvegicus
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe

Limiting number of result genes

There are two parameters to control result output:

  • start specifies the index of the first gene to return; the default value is 0.
  • rows limits the number of genes to output (starting from start); the default value is 50 and the maximum is 200.

Using these two parameters it is possible to get gene results in "pages". To calculate the exact number of pages, use the totalResults value from the output (see below), which is the total number of genes found by the query.

Examples:

http://www.ebi.ac.uk/gxa/api/deprecated?upIn=liver&start=200&rows=50
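
The paging arithmetic can be sketched as below. The helper pageStarts is hypothetical, and the total of 230 genes is an invented figure used only to show the calculation.

```javascript
// pageStarts is a hypothetical helper, not part of the Atlas API.
// totalResults: the totalResults value from a previous response
// rows:         the page size (the documented maximum is 200)
// Returns the start value to use for each page.
function pageStarts(totalResults, rows) {
    if (rows < 1 || rows > 200) {
        throw new Error("rows must be between 1 and 200");
    }
    const pages = Math.ceil(totalResults / rows);
    const starts = [];
    for (let page = 0; page < pages; page++) {
        starts.push(page * rows);
    }
    return starts;
}

// 230 is an illustrative total, not real Atlas data.
for (const start of pageStarts(230, 50)) {
    console.log("upIn=liver&start=" + start + "&rows=50");
}
```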

Output

The query output structure is self-explanatory and can be tested in a plain web browser by pointing it at an API URL and adding an &indent argument. Here is an example of the result set rendered by the gene query http://www.ebi.ac.uk/gxa/api/deprecated?geneIs=ENSG00000159216&format=json&indent:

{
    "results" : [
        {
            "gene" : {
                "id" : "ENSG00000159216",
                "name" : "RUNX1",
                "orthologs" : [
                    "B0414.2",
                    "ENSMUSG00000022952",
                    "ENSRNOG00000001704"
                ],
                "ensemblGeneId" : "ENSG00000159216",
                "goTerms" : [
                    "ATP binding",
                    "chloride ion binding",
                    "nucleus",
                    ...
                ],
                "interProIds" : [
                    "IPR000040",
                    "IPR012346",
                    ...
                ],
                "interProTerms" : [
                    "Runx inhibition",
                    "Transcription factor, Runt-related, RUNX",
                    ...
                ],
                "keyword" : [
                    "3D-structure",
                    "Alternative splicing",
                    ...
                ],
                "diseases" : [
                    "Leukemia, acute myeloid",
                    "Platelet disorder, familial, with associated myeloid malignancy",
                    "Rheumatoid arthritis, susceptibility to"
                ],
                "uniprotIds" : [
                    "Q01196"
                ],
                "synonyms" : [
                    "AML1",
                    "CBFA2",
                    "RUNX1"
                ],
                "goIds" : [
                    "GO:0003700",
                    ...
                ],
                "emblIds" : [
                    "AP000330",
                    ...
                ],
                "ensemblProteinIds" : [
                    "ENSP00000300305",
                    ...
                ],
                "omimiIds" : [
                    "151385",
                    "601399"
                ],
                "refseqIds" : [
                    "NM_001001890",
                    "NM_001122607",
                    "NM_001754"
                ],
                "unigeneIds" : [
                    "Hs.149261",
                    ...
                ]
            },
            "expressions" : [
                {
                    "ef" : "RNAi",
                    "efv" : "SIM2s",
                    "experiments" : [
                        {
                            "pvalue" : 2.44739812717842E-6,
                            "expression" : "UP",
                            "accession" : "E-MEXP-101"
                        }
                    ],
                    "upExperiments" : 1,
                    "downExperiments" : 0,
                    "upPvalue" : 2.4473981738992734E-6,
                    "downPvalue" : 0.0
                },
                {
                    "efoTerm" : "T cell",
                    "efoId" : "EFO_0000208",
                    "experiments" : [
                        {
                            "pvalue" : 0.0209469933341716,
                            "expression" : "DOWN",
                            "accession" : "E-AFMX-5"
                        },
                        {
                            "pvalue" : 0.0261270675699093,
                            "expression" : "DOWN",
                            "accession" : "E-GEOD-6053"
                        }
                    ],
                    "upExperiments" : 0,
                    "downExperiments" : 2,
                    "upPvalue" : 0.0,
                    "downPvalue" : 0.020946992561221123
                },
                ....
            ]
        }
    ],
    "totalResults" : 1,
    "numberOfResults" : 1,
    "startingFrom" : 0
}
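
As a sketch of how a client might walk this structure, the snippet below summarises the expressions of each result gene. The sample object is a trimmed copy of the output above; the function name summarize is a hypothetical choice, and the assumption that each expression entry carries either ef/efv or efoTerm/efoId is drawn only from this sample.

```javascript
// Trimmed copy of the sample response shown above.
const sampleResponse = {
    results: [{
        gene: { id: "ENSG00000159216", name: "RUNX1" },
        expressions: [
            { ef: "RNAi", efv: "SIM2s", upExperiments: 1, downExperiments: 0 },
            { efoTerm: "T cell", efoId: "EFO_0000208", upExperiments: 0, downExperiments: 2 }
        ]
    }],
    totalResults: 1,
    numberOfResults: 1,
    startingFrom: 0
};

// summarize is a hypothetical helper: one line of text per expression entry.
function summarize(response) {
    const lines = [];
    for (const result of response.results) {
        for (const e of result.expressions) {
            // In the sample, an entry has either ef/efv or efoTerm/efoId.
            const label = e.ef ? e.ef + "=" + e.efv : e.efoTerm + " (" + e.efoId + ")";
            lines.push(result.gene.name + " " + label +
                ": up " + e.upExperiments + ", down " + e.downExperiments);
        }
    }
    return lines;
}

summarize(sampleResponse).forEach(function (line) { console.log(line); });
```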

The XML output has a similar structure and is meant to be processed either with XSLT or with software modules such as Perl's or Ruby's XML::Simple, which transform it into a tree of objects.

<?xml version="1.0" encoding="utf-8"?>
<atlasResponse>
    <results>
        <result>
            <gene>
                <id>ENSG00000159216</id>
                <name>RUNX1</name>
                <orthologs>
                    <ortholog>B0414.2</ortholog>
                    <ortholog>ENSMUSG00000022952</ortholog>
                    <ortholog>ENSRNOG00000001704</ortholog>
                </orthologs>
                <ensemblGeneId>ENSG00000159216</ensemblGeneId>
                <goTerms>
                    <goTerm>ATP binding</goTerm>
                    <goTerm>chloride ion binding</goTerm>
                    ...
                </goTerms>
                <interProIds>
                    <interProId>IPR000040</interProId>
                    <interProId>IPR012346</interProId>
                    ...
                </interProIds>
                <interProTerms>
                    <interProTerm>Runx inhibition</interProTerm>
                    <interProTerm>Transcription factor, Runt-related, RUNX</interProTerm>
                    ...
                </interProTerms>
                <keywords>
                    <keyword>3D-structure</keyword>
                    <keyword>Alternative splicing</keyword>
                    ...
                </keywords>
                <diseases>
                    <disease>Leukemia, acute myeloid</disease>
                    <disease>Platelet disorder, familial, with associated myeloid malignancy</disease>
                    <disease>Rheumatoid arthritis, susceptibility to</disease>
                </diseases>
                <uniprotIds>
                    <uniprotId>Q01196</uniprotId>
                </uniprotIds>
                <synonyms>
                    <synonym>AML1</synonym>
                    <synonym>CBFA2</synonym>
                    <synonym>RUNX1</synonym>
                </synonyms>
                <goIds>
                    <goId>GO:0003700</goId>
                    <goId>GO:0005515</goId>
                    ...
                </goIds>
                <emblIds>
                    <emblId>AP000330</emblId>
                    <emblId>BC136381</emblId>
                    ...
                </emblIds>
                <ensemblProteinIds>
                    <ensemblProteinId>ENSP00000300305</ensemblProteinId>
                    ...
                </ensemblProteinIds>
                <omimiIds>
                    <omimiId>151385</omimiId>
                    <omimiId>601399</omimiId>
                </omimiIds>
                <refseqIds>
                    <refseqId>NM_001001890</refseqId>
                    <refseqId>NM_001122607</refseqId>
                    <refseqId>NM_001754</refseqId>
                </refseqIds>
                <unigeneIds>
                    <unigeneId>Hs.149261</unigeneId>
                    ...
                </unigeneIds>
            </gene>
            <expressions>
                <expression>
                    <ef>RNAi</ef>
                    <efv>SIM2s</efv>
                    <experiments>
                        <experiment>
                            <pvalue>2.44739812717842E-6</pvalue>
                            <expression>UP</expression>
                            <accession>E-MEXP-101</accession>
                        </experiment>
                    </experiments>
                    <upExperiments>1</upExperiments>
                    <downExperiments>0</downExperiments>
                    <upPvalue>2.4473981738992734E-6</upPvalue>
                    <downPvalue>0.0</downPvalue>
                </expression>
                <expression>
                    <efoTerm>GLI56</efoTerm>
                    <efoId>EFO_0001104</efoId>
                    <experiments>
                        <experiment>
                            <pvalue>0.00340485089589589</pvalue>
                            <expression>UP</expression>
                            <accession>E-GEOD-4717</accession>
                        </experiment>
                    </experiments>
                    <upExperiments>1</upExperiments>
                    <downExperiments>0</downExperiments>
                    <upPvalue>0.0034048508387058973</upPvalue>
                    <downPvalue>0.0</downPvalue>
                </expression>
                ...
            </expressions>
        </result>
    </results>
    <totalResults>1</totalResults>
    <numberOfResults>1</numberOfResults>
    <startingFrom>0</startingFrom>
</atlasResponse>

Experiments query

To search for experiments, use a set of conditions similar to those of the gene structured query. All conditions are joined with the AND operator.

  • experiment=<keyword1 keyword2 ...> matches experiments containing one of the space-separated keywords in their accession, type or description
  • experimentHasFactor=<factor1 factor2 ...> matches experiments which have one of the space-separated factors as an experimental factor
  • experimentHas<Factor>=<value1 value2 ...> matches experiments which have one of the space-separated values as the value of experimental factor <Factor> for any of the assays
  • experimentHasAnyFactor=<value1 value2 ...> matches experiments which have one of the space-separated values as the value of any experimental factor for any of the assays

A special keyword value experiment=listAll can be used to list all available experiments.

The query may be accompanied by one or more geneIs=<GeneIdentifier> arguments to get expression details of the specified genes in matching experiments. <GeneIdentifier> can also be the magic value topN, where N is the number of most highly expressed genes to return for each experiment.

As with gene queries, there are two parameters to control result output:

  • start specifies the index of the first experiment to return; the default value is 0.
  • rows limits the number of experiments to output (starting from start); as for gene queries, the default value is 50 and the maximum is 200.

There are also two additional parameters to control which subset of the experiment's genes, from a list sorted by best expression statistics first, should be output:

  • offset specifies the starting position of the gene subset in the results; the default value is 0.
  • limit limits the number of genes to output (starting from offset); the default and maximum value is 10.

Two further options control which sections are output:

  • experimentInfoOnly limits the output to just the experimentInfo section (see the output example below); this is useful if you need to retrieve a list of experiment accessions and titles only.
  • experimentExpressionsOnly outputs just gene expression values and the experiment design; note that the default experiment output includes just the geneExpressionStatistics section (see the example output below).

Using the start and rows parameters it is possible to get experiment results in "pages". To calculate the exact number of pages, use the totalResults value from the output, which is the total number of experiments found by the query.

Examples:

http://www.ebi.ac.uk/gxa/api/deprecated?experiment=E-MTAB-1066
http://www.ebi.ac.uk/gxa/api/deprecated?experiment=cell
http://www.ebi.ac.uk/gxa/api/deprecated?experimentHasOrganism_part=lung
http://www.ebi.ac.uk/gxa/api/deprecated?experimentHasDisease_state=normal&experiment=cancer&start=10&rows=1
http://www.ebi.ac.uk/gxa/api/deprecated?experimentHasFactor=cell_type&experiment=cycle
http://www.ebi.ac.uk/gxa/api/deprecated?experiment=E-MTAB-1066&geneIs=FBgn0037534&geneIs=FBgn0004431&format=xml
http://www.ebi.ac.uk/gxa/api/deprecated?experiment=listAll&experimentInfoOnly
http://www.ebi.ac.uk/gxa/api/deprecated?experiment=E-MTAB-1066&offset=10&limit=8&format=xml
http://www.ebi.ac.uk/gxa/api/deprecated?experiment=E-MTAB-1066&offset=10&limit=10&experimentExpressionsOnly&format=xml
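
The experiment query options above can be combined programmatically. The sketch below uses a hypothetical helper, experimentQuery, whose option names (genes, infoOnly, expressionsOnly and so on) are invented for illustration; only the URL parameter names come from this page.

```javascript
const EXPERIMENT_ENDPOINT = "http://www.ebi.ac.uk/gxa/api/deprecated";

// experimentQuery is a hypothetical helper, not part of the Atlas API.
// options.experiment:      keywords, an accession, or "listAll"
// options.genes:           optional list of gene identifiers (one geneIs each)
// options.offset / limit:  gene subset controls (documented maximum limit is 10)
// options.infoOnly:        adds the experimentInfoOnly flag
// options.expressionsOnly: adds the experimentExpressionsOnly flag
// options.format:          "json" or "xml"
function experimentQuery(options) {
    const parts = ["experiment=" + encodeURIComponent(options.experiment)];
    for (const gene of options.genes || []) {
        parts.push("geneIs=" + encodeURIComponent(gene));
    }
    if (options.offset !== undefined) {
        parts.push("offset=" + options.offset);
    }
    if (options.limit !== undefined) {
        if (options.limit > 10) {
            throw new Error("limit cannot exceed 10");
        }
        parts.push("limit=" + options.limit);
    }
    if (options.infoOnly) {
        parts.push("experimentInfoOnly");
    }
    if (options.expressionsOnly) {
        parts.push("experimentExpressionsOnly");
    }
    if (options.format) {
        parts.push("format=" + options.format);
    }
    return EXPERIMENT_ENDPOINT + "?" + parts.join("&");
}

console.log(experimentQuery({ experiment: "listAll", infoOnly: true }));
console.log(experimentQuery({ experiment: "E-MTAB-1066", offset: 10, limit: 8, format: "xml" }));
```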

The output is a list of experiment details. The geneExpressions section contains the actual microarray expression values for all design elements for the specified gene(s). The assayIds array contains a series of assay IDs in the same order as the expression values in the corresponding designElement arrays.

The geneExpressionStatistics section contains Atlas expression analytics results for design elements where they are available.

<?xml version="1.0" encoding="utf-8"?>
<atlasResponse>
 <results>
  <result>
    <geneExpressions>
        <arrayDesign accession="A-AFFY-33">
            <assayIds>0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ... </assayIds>
            <genes>
                <gene id="ENSG00000160766">
                    <designElement id="17599">19.6 62.3 132.8 101.2 373.6 47.2 114.6 50.8 65.9 21.4 26.4 24.4 ... </designElement>
                </gene>
            </genes>
        </arrayDesign>
        <arrayDesign accession="A-AFFY-40">
            <assayIds>158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 ... </assayIds>
            <genes/>
        </arrayDesign>
    </geneExpressions>
    <experimentInfo>
        <accession>E-AFMX-5</accession>
        <description>Transcription profiling of human cell lines and tissues (GNF/Novartis)</description>
        <types>
            <type>organism_part_comparison_design</type>
        </types>
    </experimentInfo>
    <experimentOrganisms>
        <organism>Homo sapiens</organism>
    </experimentOrganisms>
    <experimentDesign>
        <experimentalFactors>
            <experimentalFactor>disease_state</experimentalFactor>
            <experimentalFactor>cell_type</experimentalFactor>
            <experimentalFactor>organism_part</experimentalFactor>
        </experimentalFactors>
        <sampleCharacteristics>
            <sampleCharacteristic>sex</sampleCharacteristic>
            <sampleCharacteristic>age</sampleCharacteristic>
            <sampleCharacteristic>observation</sampleCharacteristic>
            <sampleCharacteristic>disease_state</sampleCharacteristic>
            <sampleCharacteristic>clinical_history</sampleCharacteristic>
            <sampleCharacteristic>developmental_stage</sampleCharacteristic>
            <sampleCharacteristic>t_cell_type</sampleCharacteristic>
            <sampleCharacteristic>organism_part</sampleCharacteristic>
        </sampleCharacteristics>
        <arrayDesigns>
            <arrayDesign>
                <accession>A-AFFY-33</accession>
            </arrayDesign>
            <arrayDesign>
                <accession>A-AFFY-40</accession>
            </arrayDesign>
        </arrayDesigns>
        <samples>
            <sample>
                <id>0</id>
                <sampleCharacteristics>
                    <sex/>
                    <age/>
                    <disease_state/>
                    <observation/>
                    <clinical_history/>
                    <developmental_stage>adult</developmental_stage>
                    <t_cell_type/>
                    <organism_part>thymus</organism_part>
                </sampleCharacteristics>
                <relatedAssays>
                    <assayId>26</assayId>
                </relatedAssays>
            </sample>
            <sample>
                <id>1</id>
                <sampleCharacteristics>
                    <sex/>
                    <age/>
                    <disease_state/>
                    <observation/>
                    <clinical_history/>
                    <developmental_stage>adult</developmental_stage>
                    <t_cell_type/>
                    <organism_part>thymus</organism_part>
                </sampleCharacteristics>
                <relatedAssays>
                    <assayId>42</assayId>
                </relatedAssays>
            </sample>
            ...
        </samples>
        <assays>
            <assay>
                <id>0</id>
                <factorValues>
                    <disease_state/>
                    <cell_type/>
                    <organism_part>lung</organism_part>
                </factorValues>
                <arrayDesign>A-AFFY-33</arrayDesign>
                <relatedSamples>
                    <sampleId>112</sampleId>
                </relatedSamples>
            </assay>
            <assay>
                <id>1</id>
                <factorValues>
                    <disease_state/>
                    <cell_type/>
                    <organism_part>heart</organism_part>
                </factorValues>
                <arrayDesign>A-AFFY-33</arrayDesign>
                <relatedSamples>
                    <sampleId>62</sampleId>
                </relatedSamples>
            </assay>
            ...
        </assays>
    </experimentDesign>
    <geneExpressionStatistics>
        <arrayDesign accession="A-AFFY-33">
            <genes>
                <gene id="ENSG00000160766">
                    <designElement id="17599">
                        <expression>
                            <ef>organism_part</ef>
                            <efv>appendix</efv>
                            <stat>
                                <expression>UP</expression>
                                <pvalue>0.00892422770197655</pvalue>
                                <tstat>3.42359698327164</tstat>
                            </stat>
                        </expression>
                        ...
                    </designElement>
                </gene>
            </genes>
        </arrayDesign>
        <arrayDesign accession="A-AFFY-40">
            <genes/>
        </arrayDesign>
    </geneExpressionStatistics>
  </result>
  ...
 </results>
 <totalResults>20</totalResults>
 <numberOfResults>10</numberOfResults>
 <startingFrom>0</startingFrom>
</atlasResponse>
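
Because assayIds and each designElement list their entries in the same order, a client can pair them by position. The sketch below assumes the two element bodies have already been extracted as whitespace-separated text; the helper name pairExpressionValues is hypothetical, and the input is the first few values from the sample above.

```javascript
// pairExpressionValues is a hypothetical helper, not part of the Atlas API.
// assayIdsText:      text content of an <assayIds> element
// designElementText: text content of a <designElement> element
// Returns one { assayId, value } object per position present in both lists.
function pairExpressionValues(assayIdsText, designElementText) {
    const ids = assayIdsText.trim().split(/\s+/).map(Number);
    const values = designElementText.trim().split(/\s+/).map(Number);
    const count = Math.min(ids.length, values.length);
    const pairs = [];
    for (let i = 0; i < count; i++) {
        pairs.push({ assayId: ids[i], value: values[i] });
    }
    return pairs;
}

// First four entries from the A-AFFY-33 sample above.
console.log(pairExpressionValues("0 1 2 3", "19.6 62.3 132.8 101.2"));
```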

Error output

If an error occurs while processing the query, the output will contain an error message.

For example, for JSON:

{
    "error" : "Empty query specified"
}

For XML:

<?xml version="1.0" encoding="utf-8"?>
<atlasResponse>
    <error>Empty query specified</error>
</atlasResponse>
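
A client consuming JSON output might check for the error field before using the result. The following is a minimal sketch; the function name unwrapAtlasJson is a hypothetical choice, and the input is the JSON error example above.

```javascript
// unwrapAtlasJson is a hypothetical helper, not part of the Atlas API.
// Parses a JSON response body and throws if it carries an error message.
function unwrapAtlasJson(text) {
    const data = JSON.parse(text);
    if (data.error) {
        throw new Error("Atlas API error: " + data.error);
    }
    return data;
}

try {
    unwrapAtlasJson('{ "error" : "Empty query specified" }');
} catch (e) {
    console.log(e.message);
}
```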

API links

On every Atlas result page you can find a "REST API" link showing the REST API URLs that correspond to the results being displayed, so it is easy to integrate interactive results into your program for further processing.

Atlas Resources