
AltSplice Mouse Release 1 (Nov 2004)

Introduction

The data in this release is generated for 'known genes' as annotated in Ensembl 24.33.1. The extracted nucleotide region includes the gene region as defined in Ensembl, extended at both the 5' and 3' ends by 3000 bases. Mouse EST and mRNA (transcript) sequences are mapped to these extended gene regions. Transcript-confirmed introns and exons are delineated from these alignments. The matching transcript sequences are then classified into groups such that each group represents an isoform splice pattern. Each group is represented by a representative transcript structure, called a transcript class.

These isoform transcript structures are compared with one another to delineate the alternative events. The presented data thus lists all the alternative splice events seen in the observed transcripts for a gene. The basic events identified in this work are: exon isoforms (extension/truncation of an exon), cassette exons (an exon is present in one transcript but absent in an isoform of the transcript), alternating exons (exons are used in alternative transcripts in a mutually exclusive manner), and intron retention (a nucleotide region is used as an exon in one transcript while it is an intron in an alternative transcript). The latter three events (namely cassette exon, alternating exon, intron retention) are further characterised as 'complex' or 'simple' depending on whether the 5' and/or 3' flanking exons also undergo modifications (e.g. the flanking exon may be extended or truncated, or the exon that flanks a retained intron is cassetted or alternated).
SNPs have been mapped to our data and we display them for features as well as for individual events.
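
As a rough illustration of the intron retention definition above, the following Python sketch flags introns of one transcript structure that lie entirely inside an exon of another. The function names and coordinates are hypothetical and chosen only for illustration; this is not the AltSplice pipeline itself.

```python
def introns(exons):
    """Derive introns from a sorted list of (start, end) exons."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def retained_introns(spliced, other):
    """Introns of `spliced` that lie entirely inside a single exon of `other`."""
    return [(s, e) for s, e in introns(spliced)
            if any(ex_start < s and e < ex_end for ex_start, ex_end in other)]


# Hypothetical coordinates, not taken from the database.
transcript_a = [(100, 200), (300, 400), (500, 600)]
transcript_b = [(100, 400), (500, 600)]
print(retained_introns(transcript_a, transcript_b))
```

Here the region 201..299 is an intron in the first structure but part of an exon in the second, so it is reported as retained.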

In this release we have integrated AEdb with AltSplice (for both human and mouse entries). Entries that are common to AltSplice and AEdb are linked, and this is indicated in the display pages returned by queries to AltSplice and/or AEdb. Queries can be restricted to common entries. AltSplice exons and splice events that have experimental evidence from AEdb are marked as such. In addition, we have built a wrapper that passes queries on to both AEdb and AltSplice.

We have further implemented SplicePatternViewer to visualise the isoform splice patterns. Annotation pertaining to the expression states of the isoforms is being added to the data.

We will carry out further work on this data, both to extend its annotation through various means and to implement an Oracle version of the database along with query tools. As soon as these are available we will update this page to reflect the new data and/or tools.

Documentation

Concise documentation of the procedure followed to produce this data and the naming conventions used is available in PDF format. The document is a work in progress and will be updated from time to time to reflect new developments.

Statistics

Gene set

Start-up gene set (Ensembl 24.33.1) 20718 mouse 'known' genes
After cleanup 19586 genes
No. of genes with one or more confirmed intron/exon features 15069

Grand totals of genes, transcript sequences, transcript classes, and events

Genes 15069
EST/mRNA sequences 668051

Confirmed introns 146006
Confirmed exons 111099

Total number of transcript structures 617217
Avg. contexts per unique exon 186004 / 111099 = 1.7
Avg. contexts per unique intron 243901 / 146006 = 1.7

Genes with >1 splice pattern 11434
Genes with delineated events 7305

Total number of exon events 15629
Exon Isoform events 4428
Cassette exon events 8100
Alternating exon events 414
Intron retention events 2687

Intron Isoform events 9619
Total number of intron events 20820
Events per gene 3.5
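
The derived figures in this table can be cross-checked with a few lines of Python. The way the totals combine (exon events as the sum of the four exon-level types, intron events as intron isoforms plus the three shared types, and events per gene as exon events plus intron isoform events over genes with events) is an inference from the numbers, not something the release notes state.

```python
# Figures copied from the statistics table.
exon_contexts, unique_exons = 186004, 111099
intron_contexts, unique_introns = 243901, 146006
genes_with_events = 7305

exon_isoform, cassette, alternating, retention = 4428, 8100, 414, 2687
intron_isoform = 9619

# Assumed composition of the two totals (inferred, see lead-in).
total_exon_events = exon_isoform + cassette + alternating + retention
total_intron_events = intron_isoform + cassette + alternating + retention

# Assumed definition: each event counted once across both views.
events_per_gene = (total_exon_events + intron_isoform) / genes_with_events

print(round(exon_contexts / unique_exons, 1))
print(round(intron_contexts / unique_introns, 1))
print(total_exon_events, total_intron_events)
print(round(events_per_gene, 1))
```

Under these assumptions the computed values agree with the table: 1.7 contexts per exon and per intron, 15629 exon events, 20820 intron events, and 3.5 events per gene.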

Distribution statistics

Various distributions are located on the distribution statistics page (genes per classes, intron types, event types, length of retained introns and cassette exons, effective length change of exon isoforms).

Data files

Copyright Notice
This ASD database has been generated within a research programme financed by the European Community (1). ExonHit Therapeutics SA has been granted commercial exploitation rights to this database under the EU funded consortium, of which it is a member, and the database is protected by EC Directive (2).

ExonHit currently holds an option to an exclusive license to the contents of the Alternative Splicing Database for commercial use including without limitation, providing services, and designing products (Commercial Rights). Consequently, such Commercial Rights are not currently available to anyone else. For up to date contact information for ExonHit Therapeutics SA, please see its website: www.exonhit.com.

The database is covered by paragraphs 2-17 of the EBI's normal terms of use, to be found at www.ebi.ac.uk/Information/termsofuse

By clicking the button below, or hyperlinking to pages beyond this notice, you indicate your agreement with these terms.

1. DIRECTIVE 96/9/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 1996 on the legal protection of databases.

2. DIRECTIVE 96/9/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 1996 on the legal protection of databases.

Examples and formats of the data files.

Gene file

The gene file has a standard FASTA format with a header, and a section right after the header that lists the sequence. The header will show the Ensembl gene identifier, and various flags. The important flag is the 'ext' flag which indicates how many bases we have added up- and downstream to the listed sequence.

Example:

>ENSG00000170613 chrom: 5 strand: -1 orientation: reversed ext: 3000 map_start: 3001(local) => 161664496(ensembl) map_end: 6941(local) => 161668436(ensembl)
TGCTATATTCCTGCACCTAAAACAGGGTCTGGCACAAAGTAAACAAT
TAATTATATTAATGGAGTGAATGGATAAATTTATGCTGCTTTGCATT
TGTATGTTTGTATTCTATCTGTCCTATTAGTGTCACCAGTCTAGTCC
:
GTCTTTGAAGCAGAGGAAA
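
A minimal Python sketch of how such a header might be parsed is given below. The function name `parse_gene_header` and the returned dictionary layout are illustrative choices, and the regular expressions assume that every header follows the layout of the single example shown here.

```python
import re


def parse_gene_header(header):
    """Split a gene-file FASTA header into its identifier and flags."""
    info = {"gene_id": header.lstrip(">").split()[0]}
    # Simple "key: value" flags.
    for key in ("chrom", "strand", "orientation", "ext"):
        match = re.search(rf"\b{key}: (\S+)", header)
        info[key] = match.group(1) if match else None
    # Coordinate pairs of the form "N(local) => M(ensembl)".
    for key in ("map_start", "map_end"):
        match = re.search(rf"{key}: (\d+)\(local\) => (\d+)\(ensembl\)", header)
        info[key] = (int(match.group(1)), int(match.group(2))) if match else None
    return info


header = (">ENSG00000170613 chrom: 5 strand: -1 orientation: reversed ext: 3000 "
          "map_start: 3001(local) => 161664496(ensembl) "
          "map_end: 6941(local) => 161668436(ensembl)")
info = parse_gene_header(header)
print(info["gene_id"], info["ext"], info["map_start"], info["map_end"])
```

In this example the local start (3001) is one past the 'ext' value, and the local and Ensembl spans have the same length, which is consistent with a 3000-base upstream extension.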

 

Transcript file

The transcript file lists all the ESTs and mRNAs that were used in determining the individual introns and exons. Besides the transcript identifier and version, each entry gives a pointer to the gene in which the transcript confirmed a feature, the description of the transcript, and the alignment as we found it.

Example:

K057543.1 [ENSG00000170613] Homo sapiens cDNA FLJ32981 fis, clone TESTI3000002, weakly similar to L.mexicana lmsap2 gene for secreted acid phosphatase 2 (SAP2). g(3004..3705)e(1..702),g(5610..6928)e(703..2021)
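
The alignment at the end of the line can be read as a list of matched blocks. The Python sketch below assumes that `g(...)` gives coordinates on the extended gene sequence and `e(...)` the corresponding coordinates on the EST/mRNA; this reading is inferred from the example and is not spelled out here. The function name `parse_alignment` is hypothetical.

```python
import re

BLOCK = re.compile(r"g\((\d+)\.\.(\d+)\)e\((\d+)\.\.(\d+)\)")


def parse_alignment(text):
    """Return (gene_start, gene_end, est_start, est_end) for each block."""
    return [tuple(int(x) for x in m.groups()) for m in BLOCK.finditer(text)]


alignment = "g(3004..3705)e(1..702),g(5610..6928)e(703..2021)"
blocks = parse_alignment(alignment)

for g_start, g_end, e_start, e_end in blocks:
    # In an ungapped block the two spans should have equal length.
    print(g_start, g_end, g_end - g_start + 1, e_end - e_start + 1)

# Gaps between consecutive gene-side blocks are the implied introns.
implied_introns = [(a[1] + 1, b[0] - 1) for a, b in zip(blocks, blocks[1:])]
print(implied_introns)
```

For the example line, both blocks have matching gene and transcript lengths, and the gap between them corresponds to a single implied intron at 3706..5609.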

Reference transcript structure file

The reference transcript structure file lists the transcript structure from the Ensembl annotation that we chose as the point of reference for numbering the features and for comparing the new features that we found. Each gene has only one reference transcript structure; the file gives the Ensembl gene ID, the Ensembl transcript ID, and the complete reference transcript structure. UFR and DFR are features added by us that denote the upstream and downstream flanking regions that were added to the gene sequence.

Example:

>ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),e3(8502..8584 83),i3(8585..9299 715),e4(9300..11583 2284),DFR(11584..14583 3000)
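
A short Python sketch for reading such a line follows. It assumes that each feature is written as `name(start..end length)`, as in the example; the function name `parse_reference_structure` is an illustrative choice.

```python
import re

FEATURE = re.compile(r"(\w+)\((\d+)\.\.(\d+) (\d+)\)")


def parse_reference_structure(line):
    """Return gene ID, transcript ID and a list of (name, start, end, length)."""
    gene_id, transcript_id, structure = line.lstrip(">").split(" ", 2)
    features = [(name, int(start), int(end), int(length))
                for name, start, end, length in FEATURE.findall(structure)]
    return gene_id, transcript_id, features


line = (">ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),"
        "i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),"
        "e3(8502..8584 83),i3(8585..9299 715),e4(9300..11583 2284),"
        "DFR(11584..14583 3000)")
gene_id, transcript_id, features = parse_reference_structure(line)

# Sanity checks: stated lengths match the coordinates, and features are contiguous.
lengths_ok = all(end - start + 1 == length for _, start, end, length in features)
contiguous = all(a[2] + 1 == b[1] for a, b in zip(features, features[1:]))
print(gene_id, transcript_id, len(features), lengths_ok, contiguous)
```

For the example, all nine features have lengths consistent with their coordinates and tile the extended gene region without gaps.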

Intron file

The intron file lists all the introns that were determined by matching the ESTs/mRNAs against the genes listed in the gene file.

Example:

>ENSG00000128891 (3018..4965)
TYPE: GT-AG
ELM: UFR(3018..4964 4964)e1(1..1 257)
NUMT: 24
FSDE: cgcccctcccgatttcctccgggctacaggcgacagagctgagccaagcgtttactgggcagctgttacg
FSDI: GTAAGTGAGGAGGGGCTGGGGTGCCCAGCGTTTTGGATCTCCCACTCTGGCCCGGCCCCGGAATACCACA
FSAI: AGCCACTGTGCTCAACCTTATGCTGTATTCTTAAAGCCAGTTCTTACTCACTTGAGCTTCTGTTTTATAG
FSAE: ctcagattccaaatgaaaatgtttgagagcgctgactctacagccacaagatctggccaggatctctggg
CNTX: ~2940..3017,4966..5221,10621..10777,13866..~14025
CNTX: ~2977..3017,4966..5221,10621..10777,35691..~35902
CNTX: ~2958..3017,4966..5221,5986..~6012
CNTX: ~2974..3017,4966..5221,10621..~10815
BPPPT: PPT(-67, -57), PPT(-54, -38), BP(-50,4.17), BP(-36,3.3), PPT(-29, -17), BP(-24,4.67), BP(-20,4.67), BP(-15,3.42), PPT(-13, -2), BP(-3,3.09)
END


The first line lists the Ensembl id and the start/end of the intron. TYPE indicates the type of the intron, which is one of GT-AG, GC-AG, or AT-AC. ELM shows how this feature relates to the reference feature; in the above example the intron covers part of the upstream flanking region and the first exon. NUMT is the number of transcripts that confirmed this intron. FSDE and FSAE are the 70 bases of the upstream and downstream flanking exons adjacent to this intron, respectively. FSDI and FSAI are the 70 bases of intronic sequence on the donor and acceptor side of the intron, respectively.
The CNTX lines show in which context (read: isoform splice pattern) this intron was observed. BPPPT indicates the branchpoint position and scores and the polypyrimidine tract positions within the intron.
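
The record layout lends itself to a simple key/value parser. The Python sketch below reads one record (abridged from the example above) and checks that the donor and acceptor dinucleotides named in TYPE match the start of FSDI and the end of FSAI. The function name `parse_record` and the returned dictionary are hypothetical; the parser assumes one `KEY: value` pair per line and a closing `END` line.

```python
def parse_record(text):
    """Parse one intron/exon record into a dict; CNTX lines are collected in a list."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    gene_id, span = lines[0].lstrip(">").split()
    start, end = (int(x) for x in span.strip("()").split(".."))
    record = {"gene_id": gene_id, "start": start, "end": end, "CNTX": []}
    for ln in lines[1:]:
        if ln == "END":
            break
        key, _, value = ln.partition(":")
        if key == "CNTX":
            record["CNTX"].append(value.strip())
        else:
            record[key] = value.strip()
    return record


# Abridged copy of the example record (FSDE, FSAE, BPPPT and two CNTX lines omitted).
example = """
>ENSG00000128891 (3018..4965)
TYPE: GT-AG
NUMT: 24
FSDI: GTAAGTGAGGAGGGGCTGGGGTGCCCAGCGTTTTGGATCTCCCACTCTGGCCCGGCCCCGGAATACCACA
FSAI: AGCCACTGTGCTCAACCTTATGCTGTATTCTTAAAGCCAGTTCTTACTCACTTGAGCTTCTGTTTTATAG
CNTX: ~2958..3017,4966..5221,5986..~6012
CNTX: ~2974..3017,4966..5221,10621..~10815
END
"""
record = parse_record(example)
donor, acceptor = record["TYPE"].split("-")
donor_ok = record["FSDI"].upper().startswith(donor)
acceptor_ok = record["FSAI"].upper().endswith(acceptor)
print(record["gene_id"], record["start"], record["end"], donor_ok, acceptor_ok)
```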

Exon file

The exon file lists all exons that were confirmed by the ESTs/mRNAs in the transcript file.

Example:

>ENSG00000174815 (20264..20442)
TYPE: GT-AG
ELM: DFR(2581..2759 3000)
NUMT: 2
FSAI: gggattgtcctcagaaatctaggtgcagagtgggagaaagggttagcgatcatctctctgtgttctccag
FSAE: GTCCCTATGCCTCCCCCACGTTCCTCCCGACGGCTCCGAGCTGGCACTCTGGAGGCCCTGGTCAGACACC
FSDE: TGTCAGCCTTCCTGGCTACCCACCGGGCCTTCACCTCCACGCCTGCCTTGCTAGGGCTTATGGCTGACAG
FSDI: gtcagagtcataagggacgcagggtagtggagtatctgcccggatttcctaaagccgcaacatcccacca
CNTX: ~19382..19826,19925..20008,20264..20442,20576..~20627
CNTX: ~19654..20008,20264..20442,20576..~20627
END


The first line lists the Ensembl id and the start/end of the exon. The second line gives the dinucleotides from the introns at the donor (3' end of exon) and acceptor (5' end of exon) sites. FSAI lists 70 bases of intronic sequence at the acceptor site. FSAE lists 70 bases of exonic sequence at the acceptor site. FSDE lists 70 bases of exonic sequence at the donor site. FSDI lists 70 bases of intronic sequence at the donor site. CNTX lines show in which context (read: isoform splice pattern) this exon was observed.
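
To show how the CNTX lines relate to the exon itself, the Python sketch below turns each context into a list of exon coordinates and locates the exon from the header line within it. The meaning of the `~` prefix is not defined in this section; the sketch assumes it marks a transcript end that is not fully defined and simply strips it. The function name `parse_context` is hypothetical.

```python
def parse_context(text):
    """Turn a CNTX value into a list of (start, end) exons, dropping '~' markers."""
    exons = []
    for segment in text.split(","):
        start, end = segment.split("..")
        # Assumption: '~' flags an open transcript end; it is ignored here.
        exons.append((int(start.lstrip("~")), int(end.lstrip("~"))))
    return exons


exon = (20264, 20442)
contexts = [
    "~19382..19826,19925..20008,20264..20442,20576..~20627",
    "~19654..20008,20264..20442,20576..~20627",
]

flanking_introns = []
for text in contexts:
    exons = parse_context(text)
    i = exons.index(exon)
    # Introns immediately upstream and downstream of the exon in this context.
    upstream = (exons[i - 1][1] + 1, exon[0] - 1)
    downstream = (exon[1] + 1, exons[i + 1][0] - 1)
    flanking_introns.append((upstream, downstream))

print(flanking_introns)
```

In both contexts of the example the exon is flanked by the same two introns, 20009..20263 and 20443..20575.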

Events file

The events file lists for every gene all the events that were observed among the transcript classes (see documentation). Every event is annotated with (i) the splice patterns of isoform transcript classes (from which the event is delineated); (ii) the different manipulations of the event as seen between all combinations of di-transcript classes; and (iii) the participating introns/exons in the event.

For each gene, the Ensembl identifier is given, followed by a listing of representatives of the transcript classes, any possible relation between the classes, and the grouped events themselves.

Example:

>ENSG00000170312
Class 1 ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804
Class 2 ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
Class 3 ~9226..9384,10187..10310,12583..12753,16413..16576,16671..16812,18400..~18527
Class 4 ~9226..9384,10187..10310,16413..16576,16671..16812,18400..~18491
Class 5 ~12194..12753,16413..~16450
Class 6 ~9336..9384,10187..~10405
Classes with staggered overlap + same structure : (3 & 1), (3 & 2)
Classes with staggered overlap only : (2 & 1), (4 & 1), (4 & 2), (4 & 3)

Type : INTRON ISOFORM (II-5P)
Struct : 3094..4663 (intron) <=> 3192..4663 (intron)
Length change: -98, 0 (-98)
Occurs in: (2 & 1)

(2 & 1) :
3094..4663 (intron) <=> 3192..4663 (intron)
e2 (4664..4725) <=> e2 (4664..4725) [0, 0]
23 Confirm. EST's <=> 4 Confirm. EST's

TRPT-ISO1: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
TRPT-ISO2: ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804

Type : CASSETTE EXON (SCE)
Cassette exons: 12583..12753 [171 b]
Occurs in: (4 & 2) is Complete SCE, (4 & 1) is Complete SCE, (4 & 3) is Complete SCE

(4 & 2) (4 & 1) (4 & 3) :
10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)
e1 (10187..10310) <=> e1 (10187..10310) [0, 0]
e2 (16413..16576) <=> e2'' (16413..~16454) [0, -122]
3 Confirm. EST's <=> 27 Confirm. EST's


TRPT-ISO1: ~9226..9384,10187..10310,16413..16576,16671..16812,18400..~18491
TRPT-ISO2: ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804
TRPT-ISO2: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
TRPT-ISO2: ~9226..9384,10187..10310,12583..12753,16413..16576,16671..16812,18400..~18527


Each event block consists of:

  • a basic type (e.g. EXON ISOFORM, CASSETTE EXON, etc) and a list of all detailed forms in which such a basic type exists
  • an event specific structure, e.g. location changes for an exon isoform, or a list of cassette exons.
  • for exon and intron isoforms a length change (donor/acceptor side and total)
  • between which transcript class combinations this event occurs, and exactly in which form. Exon isoforms can be part of another event (e.g. CCE complex cassette exon) as the flanking exons, part of none if they flank only an intron isoform, or part of unknown if the intron it flanks is not completely defined in the other transcript class.
  • a (grouped) list of flanking features and class combinations
  • a list of transcript isoforms, with the reference (isoform 1) pertaining to the left-hand side of the structure relationship (listed as part of the above flanking features or as part of the event specific structure in case of exon/intron isoforms).

A more detailed description of the events file and our naming conventions can be found in the documentation.
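
As an illustration of how the cassette exon in the example relates to the class structures, the Python sketch below compares Class 4 and Class 2 of ENSG00000170312. It is a simplified reconstruction under stated assumptions, not the procedure used to build the events file: an intron of the skipping class is reported when the including class has an intron starting at the same position, another ending at the same position, and at least one exon in between. The function names are hypothetical.

```python
def introns(exons):
    """Derive introns from a sorted list of (start, end) exons."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def cassette_exons(skipping, including):
    """Find exons of `including` that sit inside a single intron of `skipping`."""
    inc_introns = introns(including)
    inc_starts = {s for s, _ in inc_introns}
    inc_ends = {e for _, e in inc_introns}
    events = []
    for s, e in introns(skipping):
        if (s, e) in inc_introns:
            continue  # identical intron, no event
        if s in inc_starts and e in inc_ends:
            inner = [ex for ex in including if s < ex[0] and ex[1] < e]
            if inner:
                events.append(((s, e), inner))
    return events


# Class 4 and Class 2 from the example, with the '~' end markers dropped.
class_4 = [(9226, 9384), (10187, 10310), (16413, 16576), (16671, 16812), (18400, 18491)]
class_2 = [(3016, 3093), (4664, 4725), (9228, 9384), (10187, 10310),
           (12583, 12753), (16413, 16454)]

events = cassette_exons(class_4, class_2)
print(events)
```

The single event found corresponds to the cassette exon 12583..12753 (171 bases) and the intron pair 10311..16412 versus 10311..12582, 12754..16412 listed in the example.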

Splice pattern sequence file

This FASTA formatted file lists all the sequences of the observed splice patterns, together with the structure of the splice pattern.

Example (some sequence deleted for brevity):

>ENSG00000136114 SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)
AGGTGTTTTTGGGGAAAAAAATCACAATCTGGACGTGAGAAAGGACATGAGGAGACTAAAG
ACCTGGGATTTTGTCAATCAGAATGAAACCAATGTTGAAAGACTTTTCAAATCTATTGTTG
.
.
TAACTATTTGTACCGTAGGACAGAATGTGAGGAGGAAGTAACACACAGAGGAGGATGTGTG
TGTATGCATGTGTTTGAATTCACAAGGAAGAAATTATTTATCTTGAGCTTTTTCCTTTGTT
ATTCAATTTCTATTGATTTATTAGTAATAACAATGATAATAAAATGTAAATGAGCAAA
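
The header of a splice pattern entry can be taken apart as in the Python sketch below. The field layout (gene ID, `SP:` number, `STRUCTURE:` exon list, optional gene name) is inferred from the single example, the function name `parse_splice_pattern_header` is hypothetical, and the `~` markers are stripped on the assumption that they flag open transcript ends.

```python
import re

HEADER = re.compile(r">(\S+) SP:(\d+) STRUCTURE:(\S+)(?: (.*))?")


def parse_splice_pattern_header(header):
    """Return gene ID, splice pattern number, exon list and trailing label."""
    gene_id, sp, structure, label = HEADER.match(header).groups()
    exons = []
    for segment in structure.split(","):
        start, end = segment.split("..")
        exons.append((int(start.lstrip("~")), int(end.lstrip("~"))))
    return gene_id, int(sp), exons, label


header = ">ENSG00000136114 SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)"
gene_id, sp, exons, label = parse_splice_pattern_header(header)

# Total exonic span implied by the structure.
exonic_length = sum(end - start + 1 for start, end in exons)
print(gene_id, sp, exons, label, exonic_length)
```

The exonic span implied by this structure is 2727 bases; whether the listed sequence has exactly that length is not stated here and would need to be checked against the data file.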

 
