AltSplice Mouse Release 1 (Nov 2004)

Introduction

The data in this release are generated for 'known genes' as annotated in Ensembl 24.33.1. The extracted nucleotide region comprises the gene region as defined in Ensembl, extended at both the 5' and 3' ends by 3000 bases. Mouse EST and mRNA (transcript) sequences are mapped to these extended gene regions. Transcript-confirmed introns and exons are delineated from these alignments. The matching transcript sequences are then classified into groups such that each group represents an isoform splice pattern. Each group is represented by a representative transcript structure, called a transcript class. These isoform transcript structures are compared with
one another to delineate the alternative events. Thus
the presented data lists all the alternative splice
events as seen in the observed transcripts for a gene.
The basic events that are identified in this work
are: exon isoforms (extension/truncation of an exon),
cassette exons (an exon is present in one transcript
but absent in an isoform of the transcript), alternating
exons (exons are used in alternative transcripts in
a mutually exclusive manner), and intron retention
(a nucleotide region is used as an exon in a transcript
while it is an intron in an alternative transcript).
The latter three events (namely cassette exon, alternating
exon, intron retention) are further characterised
as 'complex' or 'simple' depending on whether the
5' and/or 3' flanking exons also undergo modifications
(e.g. the flanking exon may be extended or truncated
or the exon that flanks a retained intron is cassetted
or alternated).

In this release we have integrated AEdb with AltSplice (for both human and mouse entries). Entries that are common to AltSplice and AEdb are linked, and are marked as such on the display pages returned by queries to AltSplice and/or AEdb. Queries can be made for common entries. AltSplice exons and splice events that have experimental evidence in AEdb are marked as such. In addition, we have built a wrapper that passes queries on to both AEdb and AltSplice. We have also implemented SplicePatternViewer to visualise the isoform splice patterns. Annotation of the expression states of the isoforms is being added to the data.

We will carry out further work on this data, annotating it by various means and implementing an Oracle version of the database along with query tools. As soon as these become available we will update this page to reflect the new data and/or tools.

Documentation

Concise documentation of the procedure followed to produce this data, and of the naming conventions used, is available in PDF format. The document is a work in progress and will be updated from time to time to reflect new developments.

Statistics

Gene set
Grand totals of genes, transcript sequences, transcript classes, and events
Distribution statistics
Various distributions are available on the distribution statistics page (genes per classes, intron types, event types, length of retained introns and cassette exons, effective length change of exon isoforms).

Data files
Examples and formats of the data files.

Gene file

The gene file is in standard FASTA format: a header line followed directly by the sequence. The header shows the Ensembl gene identifier and various flags. The most important is the 'ext' flag, which indicates how many bases we added upstream and downstream to the listed sequence. Example:
Transcript file

The transcript file lists all the ESTs and mRNAs that were used to determine the individual introns and exons. Besides the transcript identifier and version, it gives a pointer to the gene in which the transcript confirmed a feature, the description of the transcript, and the alignment as we found it. Example:

K057543.1 [ENSG00000170613] Homo sapiens cDNA FLJ32981 fis, clone TESTI3000002, weakly similar to L.mexicana lmsap2 gene for secreted acid phosphatase 2 (SAP2). g(3004..3705)e(1..702),g(5610..6928)e(703..2021)

Reference transcript structure file

The reference transcript structure file lists the transcript structure from the Ensembl annotation that we chose as the point of reference for numbering the features and for comparing the new features that we found. Each gene has only one reference transcript structure; the file gives the Ensembl gene ID, the Ensembl transcript ID, and the complete reference transcript structure. UFR and DFR are features added by us to denote the upstream and downstream flanking regions that were added to the gene sequence. Example:

>ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),e3(8502..8584 83),i3(8585..9299 715),e4(9300..11583 2284),DFR(11584..14583 3000)

Intron file

The intron file lists all the introns that were determined by matching the ESTs/mRNAs against the genes listed in the gene file. Example:

>ENSG00000128891 (3018..4965)

Exon file

The exon file lists all exons that were confirmed by the ESTs/mRNAs in the transcript file. Example:

>ENSG00000174815 (20264..20442)

Events file

The events file lists, for every gene, all the events that were observed among the transcript classes (see documentation).
Every event is annotated with (i) the splice patterns of the isoform transcript classes from which the event is delineated; (ii) the different manifestations of the event as seen between all pairwise combinations of transcript classes; and (iii) the introns/exons participating in the event. For each gene, the Ensembl identifier is given, followed by a listing of representatives of the transcript classes, any possible relation between the classes, and the grouped events themselves. Example:

>ENSG00000170312
Type : INTRON ISOFORM
(II-5P) (2 & 1) :
3094..4663 (intron) <=> 3192..4663 (intron)
TRPT-ISO1: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
Type : CASSETTE EXON
(SCE) (4 & 2) (4 & 1) (4 & 3) :
10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)
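
The coordinate lines in the events example can be read mechanically. The following Python sketch is an illustration only, not part of the AltSplice tools: the function names are hypothetical, and it assumes that an event line always has the form `ranges <=> ranges` with 1-based inclusive `start..end` coordinates, as in the example above. For a simple cassette exon it infers the skipped exon as the gap between the two shorter introns; for an intron isoform it reports how far each intron boundary moves.

```python
import re

# Matches coordinate pairs such as "10311..16412".
RANGE = re.compile(r"(\d+)\.\.(\d+)")


def parse_ranges(text):
    """Return every 'start..end' pair found in text as (start, end) ints."""
    return [(int(a), int(b)) for a, b in RANGE.findall(text)]


def parse_event_line(line):
    """Split an event line at '<=>' and parse the ranges on each side."""
    left, right = line.split("<=>")
    return parse_ranges(left), parse_ranges(right)


def cassette_exon(line):
    """Infer the skipped exon from a simple cassette exon event line.

    Assumption: one side holds a single long intron and the other side
    holds two introns that share its outer boundaries.
    """
    left, right = parse_event_line(line)
    (long_intron,), (first, second) = sorted((left, right), key=len)
    if first[0] != long_intron[0] or second[1] != long_intron[1]:
        raise ValueError("introns do not share their outer boundaries")
    # The exon occupies the bases between the two shorter introns.
    return first[1] + 1, second[0] - 1


def intron_isoform_shift(line):
    """Return (start shift, end shift) between the two intron isoforms."""
    left, right = parse_event_line(line)
    (a,), (b,) = left, right
    return b[0] - a[0], b[1] - a[1]


cassette = "10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)"
isoform = "3094..4663 (intron) <=> 3192..4663 (intron)"

print(cassette_exon(cassette))        # (12583, 12753)
print(intron_isoform_shift(isoform))  # (98, 0)
```

The inferred exon 12583..12753 is the same exon that appears in the TRPT-ISO1 splice pattern of the example, which is a useful consistency check when reading the file.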
A more detailed description of the events file and our naming conventions can be found in the documentation.

Splice pattern sequence file

This FASTA-formatted file lists the sequences of all the observed splice patterns, together with the structure of each splice pattern. Example (some sequence deleted for brevity):

>ENSG00000136114 SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)
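
As a rough illustration of how such a header could be processed, the Python sketch below splits the example header into its parts and derives the implied introns and the spliced length. It is not an official parser: the names `Exon`, `parse_header`, `introns` and `spliced_length` are hypothetical, the field layout is assumed from the single example shown here, and coordinates are taken to be 1-based and inclusive (as the lengths in the reference transcript structure example suggest). The meaning of the `~` marker is not defined on this page, so the sketch only records where it occurs; see the documentation for its interpretation.

```python
import re
from typing import NamedTuple


class Exon(NamedTuple):
    start: int
    end: int
    tilde_start: bool  # True if '~' precedes the start coordinate
    tilde_end: bool    # True if '~' precedes the end coordinate


# Assumed layout: >GENE SP:<n> STRUCTURE:<exon list> <free-text remainder>
HEADER = re.compile(r"^>(\S+)\s+SP:(\d+)\s+STRUCTURE:(\S+)\s*(.*)$")
EXON = re.compile(r"^(~?)(\d+)\.\.(~?)(\d+)$")


def parse_header(line):
    """Return (gene id, splice pattern number, exons, remainder)."""
    m = HEADER.match(line.strip())
    if m is None:
        raise ValueError("unrecognised splice pattern header: %r" % line)
    gene, sp, structure, rest = m.groups()
    exons = []
    for part in structure.split(","):
        em = EXON.match(part)
        if em is None:
            raise ValueError("unrecognised exon range: %r" % part)
        exons.append(Exon(int(em.group(2)), int(em.group(4)),
                          em.group(1) == "~", em.group(3) == "~"))
    return gene, int(sp), exons, rest


def introns(exons):
    """Introns are the gaps between consecutive exons."""
    return [(a.end + 1, b.start - 1) for a, b in zip(exons, exons[1:])]


def spliced_length(exons):
    """Total exon length, assuming inclusive coordinates."""
    return sum(e.end - e.start + 1 for e in exons)


header = (">ENSG00000136114 SP:2 "
          "STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)")
gene, sp, exons, rest = parse_header(header)
print(gene, sp, rest)         # ENSG00000136114 2 THSD1 (HUGO)
print(introns(exons))         # [(6559, 10906), (11870, 30311)]
print(spliced_length(exons))  # 2727
```

Under these assumptions, the spliced length should equal the length of the sequence that follows the header, which gives a simple sanity check when loading the file.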