Welcome to the RNASeq-er API: a gateway to systematically updated analysis of public RNA-seq data

The RNASeq-er REST API provides easy access to the results of the systematically updated and continually growing analysis of public RNA-seq data in the European Nucleotide Archive (ENA). The analysis of each sequencing run is performed by the EMBL-EBI's Gene Expression Team using the iRAP pipeline.

1. Get Started
2. What does the RNASeq-er pipeline do?
3. How is the RNASeq-er analysis performed?
4. How to use the RNASeq-er API?
5. What are the main classes of API calls?
      5.1. Analysis Results Per Run
            5.1.1. Making per-run API calls
            5.1.2. Results of per-run API calls
      5.2. Analysis Results Per Study
            5.2.1. Making per-study API calls
            5.2.2. Results of per-study API calls
      5.3. Sample Attributes Per Run
            5.3.1. Making sample attributes per-run API calls
            5.3.2. Results of sample attributes per-run API calls
      5.4. Baseline Expression Per Gene
            5.4.1. Making baseline expression per-gene API calls
            5.4.2. Results of baseline expression per-gene API calls
      5.5. Mapping Quality Statistics For All Organisms
            5.5.1. Retrieving the mean and standard deviation of mapping quality for all organisms


1. Get Started

The examples in section 5 give a quick overview of the kind of queries that you can perform with the RNASeq-er API.

2. What does the RNASeq-er pipeline do?

The RNASeq-er pipeline automatically discovers new public RNA-seq runs in the European Nucleotide Archive (ENA) for over 270 species on a daily basis, analyses them with the iRAP pipeline, retrieves their metadata from ArrayExpress and BioSamples, and automatically annotates that metadata to the Experimental Factor Ontology (EFO) using the mapping tool Zooma.

Figure: RNASeq-er REST API

3. How is the RNASeq-er analysis performed?

The analysis of each sequencing run is performed using the iRAP pipeline. The main steps that iRAP follows during the RNA-seq analysis are the following:

Figure: iRAP pipeline. Representation of the main steps followed by iRAP in the analysis of each sequencing run. The RNASeq-er API provides the FTP locations for CRAM, bigWig and bedGraph files per ENA run and the gene and exon quantification matrices (raw counts, FPKM, TPM) per ENA study.


4. How to use the RNASeq-er API?

A REST API works much like a website: a client sends a request to a server and gets data back over the HTTP protocol. In the RNASeq-er API, all calls use the HTTP GET method, so you retrieve data without modifying it. The result of an API call is the data matching the specified query.
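Because every call is a plain HTTP GET, any HTTP client will do. The following is a minimal Python sketch using only the standard library; the helper names `fetch_text` and `parse_tsv` are illustrative and not part of the API, and the sketch assumes that tsv responses start with a header row.

```python
import csv
import io
import urllib.request


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Send an HTTP GET request and return the response body as text."""
    # GET only reads data from the server; nothing is modified.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Turn a tab-delimited body with a header row into a list of dicts."""
    # Assumption: the first line of a tsv response holds the column names.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)
```

For example, `parse_tsv(fetch_text("http://www.ebi.ac.uk/fg/rnaseq/api/tsv/getOrganismsMappingQuality"))` would retrieve the mapping quality statistics described in section 5.5 as a list of rows.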

The first thing you need to know is how to construct the URL. Let’s look at the following URL to explain how to use the RNASeq-er API. To see the results, paste this URL into a web browser:

Figure: getOrganism API call

Let’s break down that URL and see how it’s made up:

Figure: Explanation of the getOrganism API call

As a result of this API call you will see a list of plants in two columns:

Figure: Results of the getOrganism API call
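The URLs quoted later on this page (sections 5.4.2 and 5.5.1) share the same layout: the base address http://www.ebi.ac.uk/fg/rnaseq/api, then the output format, then the name of the call and its arguments. Here is a small sketch that assembles URLs this way, assuming that layout holds for every call and that the JSON format segment is spelled `json`:

```python
from urllib.parse import quote

# Base address and format segment as they appear in the example URLs on
# this page; "json" as the JSON segment is an assumption.
BASE_URL = "http://www.ebi.ac.uk/fg/rnaseq/api"
FORMATS = {"tsv", "json"}


def build_api_url(fmt: str, *segments: object) -> str:
    """Join the base address, the output format and the call's path segments."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}, got {fmt!r}")
    # Percent-encode each segment so values containing '/' or spaces stay intact.
    parts = [quote(str(segment), safe="") for segment in segments]
    return "/".join([BASE_URL, fmt, *parts])
```

For instance, `build_api_url("tsv", "getSampleAttributesByCondition", 1517)` reproduces the example URL given in section 5.4.2.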


5. What are the main classes of API calls?

The main classes of API calls for the RNASeq-er REST API are the following:

  1. Analysis Results Per Run (getRun...): to request the results of the alignment (CRAM, bedGraph and bigWig files) per run (getRun/SRR1042759) or for all runs in a particular ENA study (getRunByStudy/SRP033494).
  2. Analysis Results Per Study (getStudy...): to retrieve the results of the gene/exon expression quantification (raw counts, gene/exon FPKM and gene/exon TPM) for all runs in a particular ENA study.
  3. Sample Attributes Per Run (getSampleAttributes...): to retrieve all attributes and their ontology annotations for all the samples in a particular ENA study.
  4. Baseline Expression Per Gene (getExpression...): to retrieve the median expression of a gene across all runs corresponding to a given condition (such as organism part, cell type, developmental stage, sex or strain).

Figure: Main classes of API calls. Examples of the four classes of API calls for the RNASeq-er REST API.


5.1. Analysis Results Per Run

5.1.1. Making per-run API calls

When using per-run API calls, you need to specify the format of the data returned (tsv or JSON) and the minimum percentage of reads mapped to the reference genome (mapping quality):

Figure: Making per-run API calls

Let's try the following examples:

Example 1. Give me the location of the results of the RNA-seq alignment (CRAM, bedGraph and bigWig files) for all runs in ENA study SRP049001 in which at least 70% of the reads were successfully mapped to the reference genome, in tab-delimited format.

Figure: Example 1 of per-run API calls

Example 2. Give me the location of the results of the RNA-seq alignment (CRAM/bedGraph/bigWig) for all runs in ENA from Solanum lycopersicum, whatever the mapping quality, in tab-delimited format.

Figure: Example 2 of per-run API calls

Example 3. Give me the location of the results of the RNA-seq alignment (CRAM/bedGraph/bigWig) for all runs in ENA on samples of human lung, whatever the mapping quality, in tab-delimited format.

Figure: Example 3 of per-run API calls
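The three examples differ only in the call and its arguments, the format and the mapping-quality threshold. Below is a Python sketch of a per-run URL builder under stated assumptions: the threshold is placed as a path segment right after the format, the call names `getRunsByStudy`, `getRunsByOrganism` and `getRunsByOrganismCondition` and the lowercase, underscore-separated organism names are hypothetical, and a threshold of 0 stands for "whatever the mapping quality". Check these against the live API before relying on them.

```python
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/fg/rnaseq/api"


def per_run_url(fmt: str, min_mapped_percent: int, call: str, *args: str) -> str:
    """Build a per-run URL, assuming the layout base/format/min-mapped/call/args."""
    if fmt not in ("tsv", "json"):
        raise ValueError("format must be 'tsv' or 'json'")
    if not 0 <= min_mapped_percent <= 100:
        raise ValueError("minimum mapped reads must be a percentage (0-100)")
    parts = [fmt, str(min_mapped_percent), call, *args]
    return "/".join([BASE_URL, *(quote(p, safe="") for p in parts)])


# Hypothetical call names and argument spellings for Examples 1-3.
example_1 = per_run_url("tsv", 70, "getRunsByStudy", "SRP049001")
example_2 = per_run_url("tsv", 0, "getRunsByOrganism", "solanum_lycopersicum")
example_3 = per_run_url("tsv", 0, "getRunsByOrganismCondition", "homo_sapiens", "lung")
```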

5.1.2. Results of per-run API calls

Here is the result for the first run of study SRP049001, returned by the per-run API call in Example 1:

Figure: Result 1 of per-run API calls

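If you retrieve runs with a mapping-quality threshold of 0, you can still apply stricter thresholds locally without calling the API again. Here is a sketch of that filter for a tsv response; the column names `RUN_IDS` and `MAPPING_QUALITY` are assumptions, so pass the header names from your own result.

```python
import csv
import io


def runs_passing_quality(tsv_text: str, min_quality: float,
                         run_col: str = "RUN_IDS",
                         quality_col: str = "MAPPING_QUALITY") -> list[str]:
    """Return the run IDs whose mapping-quality value is at least min_quality.

    The default column names are hypothetical; override them to match the
    header of the response you received.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = {run_col, quality_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in response: {sorted(missing)}")
    kept = []
    for row in reader:
        try:
            quality = float(row[quality_col])
        except (TypeError, ValueError):
            continue  # skip rows with an empty or non-numeric quality value
        if quality >= min_quality:
            kept.append(row[run_col])
    return kept
```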

5.2. Analysis Results Per Study

5.2.1. Making per-study API calls

When using per-study API calls, you only need to specify the format of the data returned (tsv or JSON). There is no need to choose a mapping quality, because the expression data include all runs in a given study.

Figure: Making per-study API calls

Let's try the following examples:

Example 1. Give me the location of the results of the RNA-seq analysis (gene/exon quantification as raw counts, FPKM and TPM) for all runs in ENA study SRP049001, in tab-delimited format.

Figure: Example 1 of per-study API calls

Example 2. Give me the location of the results of the RNA-seq analysis (gene/exon quantification as raw counts, FPKM and TPM) for all studies in ENA with runs from Arabidopsis thaliana, in tab-delimited format.

Figure: Example 2 of per-study API calls
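Per-study URLs need no mapping-quality segment. Here is a minimal sketch that follows the URL layout of the examples in sections 5.4.2 and 5.5.1 (base address, format, call, arguments); the call names `getStudy` and `getStudiesByOrganism` and the organism spelling `arabidopsis_thaliana` are hypothetical placeholders.

```python
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/fg/rnaseq/api"


def per_study_url(fmt: str, call: str, *args: str) -> str:
    """Build a per-study URL: format, call and arguments, no mapping quality."""
    if fmt not in ("tsv", "json"):
        raise ValueError("format must be 'tsv' or 'json'")
    return "/".join([BASE_URL, *(quote(p, safe="") for p in (fmt, call, *args))])


# Hypothetical call names for Examples 1 and 2.
example_1 = per_study_url("tsv", "getStudy", "SRP049001")
example_2 = per_study_url("tsv", "getStudiesByOrganism", "arabidopsis_thaliana")
```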

5.2.2. Results of per-study API calls

To retrieve the location of the results of the RNA-seq analysis (gene/exon quantification as raw counts, FPKM and TPM) for all studies in ENA with runs from Solanum tuberosum, in tab-delimited format, run the following per-study API call:

Figure: Example 3 of per-study API calls

This call returns the analysis results for all studies in ENA on the specified organism, Solanum tuberosum (potato). Here is the result for the first study returned by this per-study API call:

Figure: Result 1 of per-study API calls

Figure: Result 2 of per-study API calls
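According to the iRAP figure caption in section 3, per-study results give the FTP locations of the gene and exon quantification matrices. Here is a sketch that collects every FTP location in a tsv response, grouped by column, without assuming any column names; it does assume that locations are returned as `ftp://` URLs.

```python
import csv
import io


def ftp_locations(tsv_text: str) -> dict[str, list[str]]:
    """Map each column name to the FTP URLs found in that column.

    Any cell whose value starts with ftp:// is treated as a file location.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found: dict[str, list[str]] = {}
    for row in reader:
        for column, value in row.items():
            # Extra, unnamed fields arrive as lists and are ignored here.
            if isinstance(value, str) and value.startswith("ftp://"):
                found.setdefault(column, []).append(value)
    return found
```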

5.3. Sample Attributes Per Run

5.3.1. Making sample attributes per-run API calls

When using sample attributes per-run API calls, you only need to specify the format of the data returned (tsv or JSON):

Figure: Making sample attributes per-run API calls

Let's try the following example:

Example 1. Give me the sample attributes and their ontology annotations for all runs in ENA study SRP047482, in tab-delimited format.

Figure: Example 1 of sample attributes per-run API calls

5.3.2. Results of sample attributes per-run API calls

Here are the sample attributes and their ontology annotations for all runs in study SRP047482, returned by the corresponding sample attributes per-run API call:

Figure: Result 1 of sample attributes per-run API calls
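If your result lists one attribute per row for each run (an assumption; check the header of your response), you can regroup it into one record per run. In this sketch the column names `RUN_ID`, `TYPE` and `VALUE` are hypothetical defaults.

```python
import csv
import io
from collections import defaultdict


def attributes_by_run(tsv_text: str, run_col: str = "RUN_ID",
                      type_col: str = "TYPE",
                      value_col: str = "VALUE") -> dict[str, dict[str, str]]:
    """Group attribute rows into {run_id: {attribute_type: value}}.

    Assumes one attribute per row; the default column names are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped: dict[str, dict[str, str]] = defaultdict(dict)
    for row in reader:
        # A repeated attribute type for the same run overwrites the earlier value.
        grouped[row[run_col]][row[type_col]] = row[value_col]
    return dict(grouped)
```

If a run has the same attribute type more than once, the last value read wins; collect lists instead if you need every value.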

5.4. Baseline Expression Per Gene

5.4.1. Making baseline expression per-gene API calls

When using baseline expression per-gene API calls, you need to specify the format of the data returned (tsv or JSON) and the minimum number of runs per condition that you want to include in the analysis:

Figure: Making baseline expression per-gene API calls

Let's try the following example:

Example 1. Give me the median expression (in TPM) and the coefficient of variation for the human gene SFTPC for all the conditions studied in at least 25 sequencing runs each, in tab-delimited format.

Figure: Example 1 of baseline expression per-gene API calls
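To make the two reported statistics concrete, the sketch below computes them from per-run TPM values grouped by condition, keeping only conditions studied in at least `min_runs` runs and sorting by median, highest first. It illustrates the idea and is not the API's implementation; defining the coefficient of variation as the sample standard deviation divided by the mean is an assumption, since this page does not give the formula.

```python
import statistics


def summarize_conditions(tpm_by_condition: dict[str, list[float]],
                         min_runs: int) -> list[tuple[str, float, float]]:
    """Return (condition, median TPM, CV) for conditions with enough runs,
    sorted by median expression, highest first.

    CV is taken here as sample standard deviation / mean (an assumption).
    """
    summary = []
    for condition, tpms in tpm_by_condition.items():
        # stdev needs at least two values, so never go below two runs.
        if len(tpms) < max(min_runs, 2):
            continue
        mean = statistics.mean(tpms)
        cv = statistics.stdev(tpms) / mean if mean else float("nan")
        summary.append((condition, statistics.median(tpms), cv))
    summary.sort(key=lambda item: item[1], reverse=True)
    return summary
```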

5.4.2. Results of baseline expression per-gene API calls

Here are the median expression (TPM) and the coefficient of variation for the human gene SFTPC for all the combined conditions studied in at least 25 sequencing runs each, sorted by median expression, highest first:

Figure: Result 1 of baseline expression per-gene API calls

The result of this kind of call also includes a column called 'ALL_SAMPLE_ATTRIBUTES', which contains the API URL that lists all sample attributes and ontology annotations for all runs studying the reported condition. For example: http://www.ebi.ac.uk/fg/rnaseq/api/tsv/getSampleAttributesByCondition/1517
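Each ALL_SAMPLE_ATTRIBUTES value is itself an API URL, so it can be fetched with a further GET request. Here is a sketch that collects these links from a tsv baseline expression response; only the column name comes from this page, and the helper itself is illustrative.

```python
import csv
import io


def sample_attribute_links(tsv_text: str) -> list[str]:
    """Collect the ALL_SAMPLE_ATTRIBUTES URL from every row, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "ALL_SAMPLE_ATTRIBUTES" not in (reader.fieldnames or []):
        raise KeyError("response has no ALL_SAMPLE_ATTRIBUTES column")
    return [row["ALL_SAMPLE_ATTRIBUTES"] for row in reader
            if row["ALL_SAMPLE_ATTRIBUTES"]]
```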

5.5. Mapping Quality Statistics For All Organisms

5.5.1. Retrieving the mean and standard deviation of mapping quality for all organisms

The API call http://www.ebi.ac.uk/fg/rnaseq/api/tsv/getOrganismsMappingQuality returns the mean and the standard deviation of mapping quality for each organism available in the API.
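To show what these two numbers summarize, here is a sketch that computes the mean and standard deviation of mapping quality per organism from (organism, percentage of reads mapped) pairs. It uses the population standard deviation; whether the API uses the population or the sample definition is not stated here.

```python
import statistics
from collections import defaultdict


def mapping_quality_by_organism(
        runs: list[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Group (organism, mapping-quality %) pairs into {organism: (mean, sd)}.

    Uses the population standard deviation, so an organism with a single run
    gets 0.0 (a choice made for this sketch).
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for organism, quality in runs:
        grouped[organism].append(quality)
    return {org: (statistics.mean(q), statistics.pstdev(q))
            for org, q in grouped.items()}
```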
