
AltSplice Human Release 2 (April 2005)

Introduction

The data in this release is generated for annotated genes from Ensembl release2_27.35a.1. For each gene, the extracted nucleotide region comprises the gene region as defined in Ensembl, extended by 3000 bases at both the 5' and 3' ends. Human EST and mRNA (transcript) sequences are mapped to these extended gene regions, and transcript-confirmed introns and exons are delineated from the resulting alignments. The matching transcript sequences are further classified into groups such that each group represents one isoform splice pattern. Each group is represented by a representative transcript structure, called a splice pattern. The isoform peptide sequences encoded by the splice patterns have also been derived and are included in the database.

These isoform splice patterns are compared with one another to delineate the alternative splice events, so the data lists all alternative splice events seen in the observed transcripts of a gene. The basic events identified in this work are: exon isoforms (extension/truncation of an exon), cassette exons (an exon present in one transcript is absent from an alternative isoform), alternating exons (exons used in alternative transcripts in a mutually exclusive manner), and intron retention (a nucleotide region used as an exon in one transcript is an intron in an alternative transcript). The latter three events (cassette exon, alternating exon, intron retention) are further characterised as 'complex' or 'simple' depending on whether the 5' and/or 3' flanking exons also undergo modifications (e.g. a flanking exon is extended or truncated, or the exon that flanks a retained intron is itself cassetted or alternated).
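To make the cassette-exon definition concrete, the sketch below compares two splice patterns given as lists of exon coordinates. It reports exons of one pattern that the other pattern skips with a single intron sharing the same donor and acceptor, which is roughly the 'simple' case. The function names and the 1-based, inclusive (start, end) convention are our assumptions, not the procedure used to build AltSplice. The data are classes 4 and 1 from the events-file example further down, with the '~' ends dropped.

```python
# Sketch only: detect cassette exons whose skipping intron shares both
# splice sites with the including pattern (assumed 1-based, inclusive coords).

def introns(exons):
    """Introns implied by consecutive exons of one splice pattern."""
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


def cassette_exons(skipping, including):
    """Exons of `including` that `skipping` jumps over with one intron."""
    found = []
    inc = introns(including)
    for donor, acceptor in introns(skipping):
        # Two consecutive introns of `including` that start and end where the
        # skipping intron does enclose exactly one exon: the cassette exon.
        for (d1, a1), (d2, a2) in zip(inc, inc[1:]):
            if d1 == donor and a2 == acceptor:
                found.append((a1 + 1, d2 - 1))
    return found


# Classes 4 and 1 from the events-file example ('~' ends dropped).
class4 = [(9226, 9384), (10187, 10310), (16413, 16576), (16671, 16812), (18400, 18491)]
class1 = [(3122, 3191), (4664, 4725), (9228, 9384), (10187, 10310),
          (12583, 12753), (16413, 16576), (16671, 16804)]

for start, end in cassette_exons(class4, class1):
    print(f"cassette exon {start}..{end} [{end - start + 1} b]")
# prints: cassette exon 12583..12753 [171 b]
```

This reproduces the cassette exon 12583..12753 [171 b] reported in that example. Complex cases, where the flanking splice sites also move, would need the exon-isoform comparison on top of this.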

Introns and exons are annotated for splice signals such as donor/acceptor site strength, branch points, and polypyrimidine tracts. Exons, introns, and events conserved between orthologous human and mouse genes have been identified and are annotated in the database. SNP positions and alleles have been mapped to our data and are displayed for isoform splice patterns as well as for individual events. Annotation of the expression states of the isoforms is being added, and subtractive library expression queries can now be raised from the interfaces. Further work will annotate this data with additional features.

In this release we have integrated AEdb with AltSplice (for both human and mouse entries). Entries common to AltSplice and AEdb are linked, and this is indicated on the pages returned by queries to AltSplice and/or AEdb. Queries can be raised for common entries. AltSplice exons and splice events that have experimental evidence from AEdb are marked as such. In addition, we have built a wrapper that passes queries on to both AEdb and AltSplice.

We have further implemented a SplicePatternViewer to visualise the isoform splice patterns. AltSplice data can also be seen from the geneview and contigview pages of Ensembl.

Documentation

Concise documentation of the procedure followed to produce this data, and of the naming conventions used, is available in PDF format. The document is a work in progress and will be updated periodically to reflect new developments.

Statistics

Gene set

Start-up gene set (Ensembl 19.34b2) 22216 human genes
After cleanup 21796 genes
No. of genes with one or more confirmed intron/exon features 16293

Grand totals of genes, transcript sequences, transcript classes, and events

Genes                                   16293
EST/mRNA sequences                      915500

Confirmed introns                       184731
Confirmed exons                         137195

Total number of transcript structures   898295
Avg. contexts per unique exon           289646 / 137195 = 2.1
Avg. contexts per unique intron         366019 / 184731 = 2.0

Genes with >1 splice pattern            13572
Genes with delineated events            9945

Total number of exon events             33338
Exon isoform events                     7575
Cassette exon events                    18815
Alternating exon events                 1678
Intron retention events                 5270

Intron isoform events                   13874
Total number of intron events           39637
Events per gene                         4.0
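The averages and the exon-event total in the table can be recomputed from the listed counts. The sketch below does only that arithmetic. The variable names are ours.

```python
# Recompute derived figures from the counts listed in the table above.
exon_contexts, unique_exons = 289646, 137195
intron_contexts, unique_introns = 366019, 184731

avg_exon = round(exon_contexts / unique_exons, 1)        # contexts per unique exon
avg_intron = round(intron_contexts / unique_introns, 1)  # contexts per unique intron

exon_event_types = {
    "exon isoform": 7575,
    "cassette exon": 18815,
    "alternating exon": 1678,
    "intron retention": 5270,
}
total_exon_events = sum(exon_event_types.values())

print(avg_exon, avg_intron, total_exon_events)  # 2.1 2.0 33338
```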

Distribution statistics

Various distributions are located on the distribution statistics page (genes per classes, intron types, event types, length of retained introns and cassette exons, effective length change of exon isoforms).

Data files

Copyright Notice
This ASD database has been generated within a research programme financed by the European Community (note 1). ExonHit Therapeutics SA has been granted commercial exploitation rights to this database under the EU-funded consortium of which it is a member, and the database is protected by EC Directive 96/9/EC (note 2).

ExonHit currently holds an option to an exclusive license to the contents of the Alternative Splicing Database for commercial use, including, without limitation, providing services and designing products ("Commercial Rights"). Consequently, such Commercial Rights are not currently available to anyone else. For up-to-date contact information for ExonHit Therapeutics SA, please see its website: www.exonhit.com.

The database is covered by paragraphs 2-17 of the EBI's normal terms of use, to be found at www.ebi.ac.uk/Information/termsofuse.html

By clicking the button below, or hyperlinking to pages beyond this notice, you indicate your agreement with these terms.

1. DIRECTIVE 96/9/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 1996 on the legal protection of databases.

2. DIRECTIVE 96/9/EC OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 1996 on the legal protection of databases.

Examples and formats of the data files.

Gene file

The gene file is in standard FASTA format: a header line followed by the sequence. The header gives the Ensembl gene identifier and various flags. The most important is the 'ext' flag, which indicates how many bases were added upstream and downstream of the Ensembl gene region in the listed sequence.

Example:

>ENSG00000170613 chrom: 5 strand: -1 orientation: reversed ext: 3000 map_start: 3001(local) => 161664496(ensembl) map_end: 6941(local) => 161668436(ensembl)
TGCTATATTCCTGCACCTAAAACAGGGTCTGGCACAAAGTAAACAAT
TAATTATATTAATGGAGTGAATGGATAAATTTATGCTGCTTTGCATT
TGTATGTTTGTATTCTATCTGTCCTATTAGTGTCACCAGTCTAGTCC
:
GTCTTTGAAGCAGAGGAAA
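The header flags can be split out with two regular expressions, as in the minimal sketch below. The field layout is taken from the example header only, and `parse_gene_header` is a hypothetical helper. The file does not state how local positions map onto a reverse-strand gene, so the sketch does not convert coordinates. It only checks that the gene region starts right after the 'ext' flank and that the local and Ensembl spans have the same length.

```python
import re


def parse_gene_header(header):
    """Split a gene-file header into its flags (layout taken from the example)."""
    info = {"gene_id": header.lstrip(">").split()[0]}
    # Plain 'key: value' flags.
    for key, value in re.findall(r"\b(chrom|strand|orientation|ext): (\S+)", header):
        info[key] = value
    # 'map_start: 3001(local) => 161664496(ensembl)' style flags.
    for key, local, ensembl in re.findall(
            r"\b(map_start|map_end): (\d+)\(local\) => (\d+)\(ensembl\)", header):
        info[key] = (int(local), int(ensembl))
    if "ext" in info:
        info["ext"] = int(info["ext"])
    return info


header = (">ENSG00000170613 chrom: 5 strand: -1 orientation: reversed ext: 3000 "
          "map_start: 3001(local) => 161664496(ensembl) "
          "map_end: 6941(local) => 161668436(ensembl)")
info = parse_gene_header(header)
(start_local, start_ens), (end_local, end_ens) = info["map_start"], info["map_end"]

print(start_local == info["ext"] + 1)                  # True: gene starts after the flank
print(end_local - start_local == end_ens - start_ens)  # True: spans have equal length
```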


Reference transcript structure file

The reference transcript structure file gives, for each gene, the Ensembl-annotated transcript structure chosen as the point of reference for numbering features and for comparison with the new features we found. Each gene has exactly one reference transcript structure; the file gives the Ensembl gene ID, the Ensembl transcript ID, and the complete reference transcript structure, with each feature written as name(start..end length). UFR and DFR are features added by us to denote the upstream and downstream flanking regions that were added to the gene sequence.

Example:

>ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),e3(8502..8584 83),i3(8585..9299 715),e4(9300..11583 2284),DFR(11584..14583 3000)
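Because each feature is written as name(start..end length), a reference line can be parsed with one regular expression and checked for consistency. The sketch below does that. `parse_reference` is a hypothetical helper, not part of the AltSplice distribution. For the example line, every stated length equals end - start + 1, and the features tile the extended gene sequence without gaps.

```python
import re

# name(start..end length), e.g. "e1(3001..3102 102)"
FEATURE = re.compile(r"(\w+)\((\d+)\.\.(\d+) (\d+)\)")


def parse_reference(line):
    """Return (gene_id, transcript_id, [(name, start, end, length), ...])."""
    gene_id, transcript_id, structure = line.lstrip(">").split(maxsplit=2)
    features = [(name, int(s), int(e), int(n)) for name, s, e, n in FEATURE.findall(structure)]
    return gene_id, transcript_id, features


line = (">ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),"
        "i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),e3(8502..8584 83),"
        "i3(8585..9299 715),e4(9300..11583 2284),DFR(11584..14583 3000)")
gene_id, transcript_id, features = parse_reference(line)

# Each stated length equals end - start + 1, and the features tile the sequence.
lengths_ok = all(n == e - s + 1 for _, s, e, n in features)
contiguous = all(b[1] == a[2] + 1 for a, b in zip(features, features[1:]))
print(transcript_id, len(features), lengths_ok, contiguous)  # ENST00000301765 9 True True
```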

Transcript file

The transcript file lists all ESTs and mRNAs that were used to determine the individual introns and exons. Besides the transcript identifier and version, each entry gives the gene in which the transcript confirmed a feature (in square brackets), the description of the transcript, and the alignment as we found it, written as aligned gene (g) and transcript (e) coordinate ranges.

Example:

K057543.1 [ENSG00000170613] Homo sapiens cDNA FLJ32981 fis, clone TESTI3000002, weakly similar to L.mexicana lmsap2 gene for secreted acid phosphatase 2 (SAP2). g(3004..3705)e(1..702),g(5610..6928)e(703..2021)
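The alignment part of an entry can be read as a list of aligned blocks. The minimal sketch below assumes one simple reading: when two consecutive blocks are contiguous on the transcript side but separated on the gene side, the gene-side gap is a candidate intron. This is an assumption on our part. Any further criteria the pipeline applies (for example splice-site checks) are not described here, and the helper names are hypothetical.

```python
import re

# One aligned block: g(gene_start..gene_end)e(transcript_start..transcript_end)
BLOCK = re.compile(r"g\((\d+)\.\.(\d+)\)e\((\d+)\.\.(\d+)\)")


def alignment_blocks(entry):
    """Return aligned blocks as (g_start, g_end, t_start, t_end) integer tuples."""
    return [tuple(int(x) for x in groups) for groups in BLOCK.findall(entry)]


def implied_introns(blocks):
    """Gene-side gaps between blocks that are adjacent on the transcript side."""
    gaps = []
    for (_, g_end, _, t_end), (g_start, _, t_start, _) in zip(blocks, blocks[1:]):
        if t_start == t_end + 1 and g_start > g_end + 1:
            gaps.append((g_end + 1, g_start - 1))
    return gaps


entry = ("K057543.1 [ENSG00000170613] Homo sapiens cDNA FLJ32981 fis, clone TESTI3000002, "
         "weakly similar to L.mexicana lmsap2 gene for secreted acid phosphatase 2 (SAP2). "
         "g(3004..3705)e(1..702),g(5610..6928)e(703..2021)")
accession, gene = re.match(r"(\S+) \[(\S+)\]", entry).groups()
blocks = alignment_blocks(entry)
print(accession, gene, implied_introns(blocks))  # K057543.1 ENSG00000170613 [(3706, 5609)]
```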

Intron file

This file lists all the introns that were determined by matching the ESTs/mRNAs against the genes listed in the gene file.

Example:

>ENSG00000128891 (3018..4965)
TYPE: GT-AG
ELM: UFR(3018..4964 4964)e1(1..1 257)
NUMT: 24
FSDE: cgcccctcccgatttcctccgggctacaggcgacagagctgagccaagcgtttactgggcagctgttacg
FSDI: GTAAGTGAGGAGGGGCTGGGGTGCCCAGCGTTTTGGATCTCCCACTCTGGCCCGGCCCCGGAATACCACA
FSAI: AGCCACTGTGCTCAACCTTATGCTGTATTCTTAAAGCCAGTTCTTACTCACTTGAGCTTCTGTTTTATAG
FSAE: ctcagattccaaatgaaaatgtttgagagcgctgactctacagccacaagatctggccaggatctctggg
CNTX: ~2940..3017,4966..5221,10621..10777,13866..~14025
CNTX: ~2977..3017,4966..5221,10621..10777,35691..~35902
CNTX: ~2958..3017,4966..5221,5986..~6012
CNTX: ~2974..3017,4966..5221,10621..~10815
BPPPT: PPT(-67, -57), PPT(-54, -38), BP(-50,4.17), BP(-36,3.3), PPT(-29, -17), BP(-24,4.67), BP(-20,4.67), BP(-15,3.42), PPT(-13, -2), BP(-3,3.09)
END


The first line gives the Ensembl gene ID and the start/end coordinates of the intron. TYPE gives the intron type, which is one of GT-AG, GC-AG, or AT-AC. ELM shows how this feature relates to the reference features; in the example above, the intron covers part of the upstream flanking region and the first exon. NUMT is the number of transcripts that confirm this intron. FSDE and FSAE give 70 bases of the upstream and downstream flanking exons, respectively, adjacent to the intron. FSDI and FSAI give 70 bases of intronic sequence at the donor and acceptor ends of the intron, respectively.
Each CNTX line gives a context (i.e. an isoform splice pattern) in which this intron was observed. BPPPT gives the branch point positions and scores, and the polypyrimidine tract positions, within the intron.
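The record layout (a header line, TAG: value lines, END) is regular enough to parse generically. The sketch below uses an abridged copy of the example record, with the ELM and FS* lines omitted. It checks that the intron appears between consecutive exons in every CNTX line and pulls the branch points and polypyrimidine tracts out of BPPPT. Dropping '~' before parsing and the helper names are our assumptions. The BPPPT positions are left exactly as given, since the file does not say what they are measured from.

```python
import re

RECORD = """\
>ENSG00000128891 (3018..4965)
TYPE: GT-AG
NUMT: 24
CNTX: ~2940..3017,4966..5221,10621..10777,13866..~14025
CNTX: ~2958..3017,4966..5221,5986..~6012
BPPPT: PPT(-67, -57), PPT(-54, -38), BP(-50,4.17), BP(-36,3.3), PPT(-29, -17), BP(-24,4.67), BP(-20,4.67), BP(-15,3.42), PPT(-13, -2), BP(-3,3.09)
END
"""


def parse_record(text):
    """Parse one record: '>gene (start..end)', 'TAG: value' lines, 'END'."""
    lines = text.strip().splitlines()
    gene_id, coords = lines[0].lstrip(">").split()
    start, end = (int(x) for x in coords.strip("()").split(".."))
    record = {"gene_id": gene_id, "start": start, "end": end, "CNTX": []}
    for line in lines[1:]:
        if line == "END":
            break
        tag, value = line.split(":", 1)
        if tag == "CNTX":
            record["CNTX"].append(value.strip())  # a feature can have many contexts
        else:
            record[tag] = value.strip()
    return record


def context_introns(context):
    """Introns between consecutive exons of a CNTX structure ('~' dropped)."""
    exons = [tuple(int(x) for x in part.replace("~", "").split(".."))
             for part in context.split(",")]
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def branch_points(bpppt):
    """[(position, score), ...] for every BP(...) entry."""
    return [(int(p), float(s)) for p, s in re.findall(r"BP\((-?\d+),\s*([\d.]+)\)", bpppt)]


def pyrimidine_tracts(bpppt):
    """[(start, end), ...] for every PPT(...) entry."""
    return [(int(a), int(b)) for a, b in re.findall(r"PPT\((-?\d+),\s*(-?\d+)\)", bpppt)]


rec = parse_record(RECORD)
intron = (rec["start"], rec["end"])
print(all(intron in context_introns(c) for c in rec["CNTX"]))  # True
print(max(branch_points(rec["BPPPT"]), key=lambda bp: bp[1]))  # (-24, 4.67)
```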

Exon file

The exon file lists all exons that were confirmed by the ESTs/mRNAs in the transcript file.

Example:

>ENSG00000174815 (20264..20442)
TYPE: GT-AG
ELM: DFR(2581..2759 3000)
NUMT: 2
FSAI: gggattgtcctcagaaatctaggtgcagagtgggagaaagggttagcgatcatctctctgtgttctccag
FSAE: GTCCCTATGCCTCCCCCACGTTCCTCCCGACGGCTCCGAGCTGGCACTCTGGAGGCCCTGGTCAGACACC
FSDE: TGTCAGCCTTCCTGGCTACCCACCGGGCCTTCACCTCCACGCCTGCCTTGCTAGGGCTTATGGCTGACAG
FSDI: gtcagagtcataagggacgcagggtagtggagtatctgcccggatttcctaaagccgcaacatcccacca
CNTX: ~19382..19826,19925..20008,20264..20442,20576..~20627
CNTX: ~19654..20008,20264..20442,20576..~20627
END


The first line gives the Ensembl gene ID and the start/end coordinates of the exon. TYPE gives the dinucleotides of the flanking introns at the donor (3' end of the exon) and acceptor (5' end of the exon) sites. FSAI and FSAE give 70 bases of intronic and exonic sequence, respectively, at the acceptor site; FSDE and FSDI give 70 bases of exonic and intronic sequence, respectively, at the donor site. Each CNTX line gives a context (i.e. an isoform splice pattern) in which this exon was observed.
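The TYPE line of an exon record can be cross-checked against its intronic flanks. The donor dinucleotide should be the first two bases of FSDI, and the acceptor dinucleotide the last two bases of FSAI. The sketch below performs that check with values copied from the example, and also confirms that the exon occurs in each listed context. The helper names are hypothetical, and dropping '~' is our assumption.

```python
def exon_structure(context):
    """'~19654..20008,20264..20442' -> [(19654, 20008), (20264, 20442)]; '~' dropped."""
    return [tuple(int(x) for x in part.replace("~", "").split(".."))
            for part in context.split(",")]


def flank_dinucleotides(fsai, fsdi):
    """TYPE implied by the intronic flanks: donor = first 2 bases of FSDI,
    acceptor = last 2 bases of FSAI, written donor-acceptor as in TYPE."""
    return f"{fsdi[:2].upper()}-{fsai[-2:].upper()}"


# Fields from the exon example above.
exon = (20264, 20442)
TYPE = "GT-AG"
FSAI = "gggattgtcctcagaaatctaggtgcagagtgggagaaagggttagcgatcatctctctgtgttctccag"
FSDI = "gtcagagtcataagggacgcagggtagtggagtatctgcccggatttcctaaagccgcaacatcccacca"
CNTX = [
    "~19382..19826,19925..20008,20264..20442,20576..~20627",
    "~19654..20008,20264..20442,20576..~20627",
]

print(flank_dinucleotides(FSAI, FSDI) == TYPE)       # True
print(all(exon in exon_structure(c) for c in CNTX))  # True
print(exon[1] - exon[0] + 1)                         # 179 (exon length)
```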


Splice pattern file

The EST/mRNA sequences that map to a gene are grouped into classes. A class is composed of all the ESTs/mRNAs confirming the same splice pattern, and the longest EST/mRNA sequence in each class is chosen as its representative. Where two classes overlap, the overlapping region may contain different introns; if it does, the two classes represent alternative splice patterns. Classes that do not overlap represent different regions of the gene, not alternative splice patterns. Every EST/mRNA identifier is followed by the numbers of the classes to which the sequence belongs (e.g. BG121617-1,2 belongs to classes 1 and 2). Non-representative ESTs/mRNAs can belong to more than one class.

Example:
>ENSG00000137274
CLASS 1
X81372-1    ~3081..3120,3558..3785,7930..8033,11515..11681,13318..13471,21635..21766,24659..24782,36761..~37301
BG121617-1,2    ~21651..21766,24659..24782,36761..~37286
BM455725-1,2    ~21705..21766,24659..24782,36761..~37206
BU933348-1,2,3    ~24663..24782,36761..~37155
BU594882-1,2,3    ~24715..24782,36761..~37181
BG714668-1    ~2649..3120,3558..~3776
CLASS 2
AJ617684-2    ~3001..3120,7930..8033,11515..11681,13318..13471,21635..21766,24659..24782,36761..~37301
CLASS 3
AL832502-3    ~7928..8033,11515..11681,13318..13471,21635..21766,22314..24782,36761..~37301
CLASS 4
BU902205-4    ~11217..11681,13318..13471,21635..~21695
END
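The file does not spell out the exact membership rule. One guess reproduces every class label in the example: within the span of an EST, the EST and the class representative must have exactly the same introns. The sketch below implements that guess. It is a hypothetical rule for illustration, not the AltSplice classification procedure, and the '~' ends are simply dropped.

```python
def exons_of(structure):
    """'~3081..3120,3558..3785' -> [(3081, 3120), (3558, 3785)]; '~' dropped."""
    return [tuple(int(x) for x in part.replace("~", "").split(".."))
            for part in structure.split(",")]


def introns_of(exons):
    return {(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])}


def fits_class(est, representative):
    """Guessed rule: inside the EST's span, the EST and the class
    representative must have exactly the same introns."""
    est_exons = exons_of(est)
    span_start, span_end = est_exons[0][0], est_exons[-1][1]
    rep_introns = {(s, e) for s, e in introns_of(exons_of(representative))
                   if s <= span_end and e >= span_start}  # introns overlapping the span
    return introns_of(est_exons) == rep_introns


CLASSES = {
    1: "~3081..3120,3558..3785,7930..8033,11515..11681,13318..13471,21635..21766,24659..24782,36761..~37301",
    2: "~3001..3120,7930..8033,11515..11681,13318..13471,21635..21766,24659..24782,36761..~37301",
    3: "~7928..8033,11515..11681,13318..13471,21635..21766,22314..24782,36761..~37301",
    4: "~11217..11681,13318..13471,21635..~21695",
}
ESTS = {
    "BG121617": "~21651..21766,24659..24782,36761..~37286",
    "BU933348": "~24663..24782,36761..~37155",
    "BG714668": "~2649..3120,3558..~3776",
}

for acc, structure in ESTS.items():
    member_of = [c for c, rep in CLASSES.items() if fits_class(structure, rep)]
    print(acc, member_of)
# BG121617 [1, 2]
# BU933348 [1, 2, 3]
# BG714668 [1]
```

Under this rule, an EST that carries exonic sequence where a class has an intron is excluded. For example, BU902205 covers 11217..11514, which is intronic in classes 1 to 3, so it ends up in class 4 only.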

Splice pattern sequence file

This FASTA formatted file lists all the sequences of the observed splice patterns, together with the structure of the splice pattern.

Example (some sequence deleted for brevity):

>ENSG00000136114 SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)
AGGTGTTTTTGGGGAAAAAAATCACAATCTGGACGTGAGAAAGGACATGAGGAGACTAAAG
ACCTGGGATTTTGTCAATCAGAATGAAACCAATGTTGAAAGACTTTTCAAATCTATTGTTG
.
.
TAACTATTTGTACCGTAGGACAGAATGTGAGGAGGAAGTAACACACAGAGGAGGATGTGTG
TGTATGCATGTGTTTGAATTCACAAGGAAGAAATTATTTATCTTGAGCTTTTTCCTTTGTT
ATTCAATTTCTATTGATTTATTAGTAATAACAATGATAATAAAATGTAAATGAGCAAA
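The STRUCTURE field determines the length of the spliced sequence. For the example header, this is 141 + 963 + 1623 = 2727 bases. The sketch below computes that length. It also shows how a splice-pattern sequence could be assembled from a gene-file sequence, assuming the coordinates are 1-based, inclusive positions in that sequence. Whether the file's sequences are exactly such concatenations, including at the '~' ends, is our assumption. The toy sequence is made up for illustration.

```python
import re


def exon_ranges(structure):
    """'~6418..6558,10907..11869' -> [(6418, 6558), (10907, 11869)]; '~' dropped."""
    return [tuple(int(x) for x in part.replace("~", "").split(".."))
            for part in structure.split(",")]


def spliced_length(structure):
    return sum(end - start + 1 for start, end in exon_ranges(structure))


def splice(gene_seq, structure):
    """Concatenate exon slices of a gene-file sequence (assumed 1-based, inclusive)."""
    return "".join(gene_seq[start - 1:end] for start, end in exon_ranges(structure))


header = ">ENSG00000136114 SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)"
structure = re.search(r"STRUCTURE:(\S+)", header).group(1)
print(spliced_length(structure))                     # 2727
print(splice("AAAACCCCGGGG", "~1..2,5..6,11..~12"))  # AACCGG (toy sequence)
```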

Peptide sequence file

This FASTA-formatted file lists all the peptide sequences that could be derived from the observed splice patterns, together with the splice pattern number (e.g. SP2; see the events file) and the gene symbol, if known.

Example (some sequence deleted for brevity):

>ALTS_HUM_PP:ENSG00000004399_SP2_17861 PLXND1
MKELLVDLIDASAAKNPKLMLRRTESVVEKMLTNWMSICMYSCLRETVGEPFFLLLCAIKQQINKGSIDAITGKARYTLS
EEWLLRENIEAKPRNLNVSFQGCGMDSLSVRAMDTDTLTQVKEKILEAFCKNVPYSQWPRAEDVDLEWFASSTQSYILRD
LDDTSVVEDGRKKLNTLAHYKIPEGASLAMSLIDKKDNTLGRVKDLDTEKYFHLVLPTDELAEPKKSHRQSHRKKVLPEI
YLTRLLSTKGTLQKFLDDLFKAILSIREDKPPLAVKYFFDFLEEQAEKRGISDPDTLHIWKTNRWRPSSPVLGEHPEEPP
VCL
>ALTS_HUM_PP:ENSG00000004469_SP3_23085 KCNK4
MRSTTLLALLALVLLYLVSGALVFRALEQPHEQQAQRELGEVREKFLRAHPCVSDQELGLLIKEVADALGGGADPETNST
SNSSHSAWDLGSAFFFSGTIITTIGYGNVALRTDAGRLFCIFYALVGIPLFGILLAGVGDRLGSSLRHGIGHIEAIFLKW
HVPPELVRVLSAMLFLLIGCLLFVLTPTFVFCYMEDWSKLEAIYFVIVTLTTVGFGDYVAGADPRQDSPAYQPLVWFWIL
LGLAYFASVLTTIGNWLRVVSRRTRAEMGGLTAQAASWTGTVTARVTQRAGPAAPPPEKEQPLLPPPPCPAQPLGRPRSP
SPPEKAQPPSPPTASALDYPSENLAFIDESSDTQSERGCPLPRAPRGRRRPNPPRKPVRPRGPGRPRDKGVPV
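A small FASTA reader plus a header regular expression is enough to split the peptide file into gene, splice pattern, and symbol. The sketch below assumes the header layout seen in the example, ALTS_HUM_PP:<gene>_SP<n>_<number> [symbol]. The meaning of the trailing number is not documented here, so it is kept as an opaque "number" field. The sample sequences are shortened from the example.

```python
import re

# Assumed header layout: >ALTS_HUM_PP:<gene>_SP<n>_<number> [symbol]
HEADER = re.compile(r">ALTS_HUM_PP:(ENSG\d+)_SP(\d+)_(\d+)(?:\s+(\S+))?")


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header):
    m = HEADER.match(header)
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    gene_id, sp, number, symbol = m.groups()  # symbol is None when absent
    return {"gene_id": gene_id, "splice_pattern": int(sp), "number": int(number), "symbol": symbol}


sample = """>ALTS_HUM_PP:ENSG00000004399_SP2_17861 PLXND1
MKELLVDLIDASAAKNPKLM
LRRTESVVEK
>ALTS_HUM_PP:ENSG00000004469_SP3_23085 KCNK4
MRSTTLLALLALVLLYLVSG
"""
for header, peptide in read_fasta(sample.splitlines()):
    info = parse_header(header)
    print(info["gene_id"], f"SP{info['splice_pattern']}", info["symbol"], len(peptide))
```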

Events file

The events file lists, for every gene, all the events observed among the transcript classes (see the documentation). Every event is annotated with (i) the splice patterns of the isoform transcript classes from which the event is delineated; (ii) the different forms the event takes across all pairs of transcript classes; and (iii) the introns/exons participating in the event.

For each gene, the Ensembl identifier is given, followed by the representative structures of the transcript classes, any relations between the classes, and the grouped events themselves.

Example:

>ENSG00000170312
Class 1 ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804
Class 2 ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
Class 3 ~9226..9384,10187..10310,12583..12753,16413..16576,16671..16812,18400..~18527
Class 4 ~9226..9384,10187..10310,16413..16576,16671..16812,18400..~18491
Class 5 ~12194..12753,16413..~16450
Class 6 ~9336..9384,10187..~10405
Classes with staggered overlap + same structure : (3 & 1), (3 & 2)
Classes with staggered overlap only : (2 & 1), (4 & 1), (4 & 2), (4 & 3)

Type : INTRON ISOFORM (II-5P)
Struct : 3094..4663 (intron) <=> 3192..4663 (intron)
Length change: -98, 0 (-98)
Occurs in: (2 & 1)

(2 & 1) :

3094..4663 (intron) <=> 3192..4663 (intron)
e2 (4664..4725) <=> e2 (4664..4725) [0, 0]
23 Confirm. EST's <=> 4 Confirm. EST's

TRPT-ISO1: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
TRPT-ISO2: ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804

Type : CASSETTE EXON (SCE)
Cassette exons: 12583..12753 [171 b]
Occurs in: (4 & 2) is Complete SCE, (4 & 1) is Complete SCE, (4 & 3) is Complete SCE

(4 & 2) (4 & 1) (4 & 3) :

10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)
e1 (10187..10310) <=> e1 (10187..10310) [0, 0]
e2 (16413..16576) <=> e2'' (16413..~16454) [0, -122]
3 Confirm. EST's <=> 27 Confirm. EST's


TRPT-ISO1: ~9226..9384,10187..10310,16413..16576,16671..16812,18400..~18491
TRPT-ISO2: ~3122..3191,4664..4725,9228..9384,10187..10310,12583..12753,16413..16576,16671..~16804
TRPT-ISO2: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
TRPT-ISO2: ~9226..9384,10187..10310,12583..12753,16413..16576,16671..16812,18400..~18527

Each event block consists of:

  • a basic type (e.g. EXON ISOFORM, CASSETTE EXON, etc) and a list of all detailed forms in which such a basic type exists
  • an event specific structure, e.g. location changes for an exon isoform, or a list of cassette exons.
  • for exon and intron isoforms a length change (donor/acceptor side and total)
  • between which transcript class combinations this event occurs, and in exactly which form. An exon isoform can be part of another event as a flanking exon (e.g. of a CCE, complex cassette exon), part of none if it flanks only an intron isoform, or part of unknown if the intron it flanks is not completely defined in the other transcript class.
  • a (grouped) list of flanking features and class combinations
  • a list of transcript isoforms, with the reference (isoform 1) pertaining to the left-hand side of the structure relationship (listed as part of the above flanking features or as part of the event specific structure in case of exon/intron isoforms).
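In the example above, "Length change: -98, 0 (-98)" and brackets such as [0, -122] appear to be the shifts at the 5' (start) and 3' (end) boundaries, going from the left-hand feature to the right-hand one, followed by their sum. This reading is inferred from the two examples only, and `length_change` is a hypothetical helper. The sketch below reproduces both values.

```python
def length_change(left, right):
    """(5'-side, 3'-side, total) length change going from the left-hand
    feature to the right-hand one; features are (start, end) in local coords."""
    five_prime = left[0] - right[0]   # moving the start rightwards shortens the feature
    three_prime = right[1] - left[1]  # moving the end leftwards shortens the feature
    return five_prime, three_prime, five_prime + three_prime


# INTRON ISOFORM (II-5P): 3094..4663 <=> 3192..4663
print(length_change((3094, 4663), (3192, 4663)))      # (-98, 0, -98)
# Flanking exon of the cassette event: e2 (16413..16576) <=> e2'' (16413..~16454)
print(length_change((16413, 16576), (16413, 16454)))  # (0, -122, -122)
```

The total always equals the difference between the right-hand and left-hand feature lengths, which is why only the two boundary shifts need to be stored.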

A more detailed description of the events file and our naming conventions can be found in the documentation.
