AltSplice - Mouse Release 3 (February 2006)

Introduction

The data in this release is generated for annotated genes from Ensembl 37.34e. The extracted nucleotide region is the gene region as defined in Ensembl, extended at both the 5' and 3' ends by 10000 bases. Mouse EST and mRNA (transcript) sequences are mapped to these extended gene regions, and transcript-confirmed introns and exons are delineated from the alignments. The matching transcript sequences are further classified into groups such that each group represents one isoform splice pattern. Each group is represented by a representative transcript structure, called a splice pattern. The isoform peptide sequences expressed by the splice patterns have been delineated and are presented as part of the database. Such isoform splice patterns are compared
with one another to delineate the alternative events. Thus the
presented data lists all the alternative splice events as seen in the
observed transcripts for a gene. The basic events that are identified
in this work are: exon isoforms (extension/truncation of an exon),
cassette exons (an exon is present in one transcript but absent in an
isoform of the transcript), alternating exons (exons are used in
alternative transcripts in a mutually exclusive manner), and intron
retention (a nucleotide region is used as an exon in a transcript while
it is an intron in an alternative transcript). The latter three events
(namely cassette exon, alternating exon, intron retention) are further
characterised as 'complex' or 'simple' depending on whether the 5'
or/and 3' flanking exons also undergo modifications (e.g. the flanking
exon may be extended or truncated or the exon that flanks a retained
intron is cassetted or alternated). Introns and exons are annotated for splice signals such as donor/acceptor site strength, branch points, and polypyrimidine tracts. Exons, introns and events conserved in the orthologous human and mouse genes have been identified and are annotated in the database. SNP positions and alleles have been mapped to our data and are displayed for isoform splice patterns as well as for individual events. Further work will annotate this data with various other features.

In this release we have integrated AEdb with AltSplice (for both human and mouse entries). Entries common to AltSplice and AEdb are linked, and this is indicated on the display pages returned by queries to AltSplice and/or AEdb. Queries can be raised for common entries. AltSplice exons and splice events that have experimental evidence in AEdb are marked as such. In addition, we have built a wrapper that passes queries on to both AEdb and AltSplice. We have also implemented a SplicePatternViewer to visualise the isoform splice patterns. AltSplice data can also be viewed from the geneview and contigview pages of Ensembl.

Documentation

Concise documentation of the procedure followed to produce this data, and of the naming conventions used, is available in PDF format. The document is a work in progress and will be updated from time to time to reflect new developments.

Statistics
Distribution statistics

Various distributions are shown on the distribution statistics page: genes per class, intron types, event types, lengths of retained introns and cassette exons, and the effective length change of exon isoforms.
Data files

In this release the following data files are available:
Examples and formats of the data files

Gene file

The gene file is in standard FASTA format: a header followed by the sequence. The header shows the Ensembl gene identifier and various flags. The important flag is 'ext', which indicates how many bases we have added upstream and downstream of the listed sequence. Example:

>ENSMUSG00000053600
chrom: 17 strand: 1 orientation: same ext: 3000 map_start: 3001(local)=>31583594(ensembl) map_e

Reference transcript structure file

The reference transcript structure file lists the transcript structure from the Ensembl annotation that we chose as the point of reference for numbering features and for comparing the new features we found. Each gene has only one reference transcript structure; the file gives the Ensembl gene ID, the Ensembl transcript ID, and the complete reference transcript structure. UFR and UDR are features we added to denote the upstream and downstream flanking regions appended to the gene sequence. Example:

>ENSMUSG00000053600
ENSMUST00000039132
UFR(1..3000 3000),e1(3001..3088 88),i1(3089..13073 9985),e2(13074..

Transcript file

The transcript file lists all the ESTs and mRNAs that were used to determine the individual introns and exons. Besides the transcript identifier and version, it gives a pointer to the gene in which the transcript confirmed a feature, the description of the transcript, and the alignment as we found it. Example:

AY195875.1 [ENSMUSG00000053600] Mus musculus KRAB box containing zinc finger protein mRNA, complete cds.
g(13073..13200)e(34..161),g(13387..13444)e(162..219),g(14319..16157)e(220..2057)

Intron file

This file lists all the introns that were determined by matching the ESTs/mRNAs against the genes listed in the gene file. Example:

>ENSMUSG00000053600
(3089..13073)

Exon file

The exon file lists all exons confirmed by the ESTs/mRNAs in the transcript file. Example:

>ENSMUSG00000053600
(13074..13200)

Splice pattern file

The EST/mRNA sequences that map to a gene are grouped into classes. A class is composed of all the ESTs/mRNAs confirming the same splice pattern, and the longest EST/mRNA sequence in each class is chosen as its representative. Where two classes overlap, the overlapping region may contain different introns; in that case the classes represent alternative splice patterns. Classes that do not overlap represent different regions of the gene and are not alternative splice patterns. Every EST/mRNA identifier is followed by the identifiers of the classes to which the sequence belongs. A non-representative EST/mRNA may belong to more than one class.

CLASS 1
BC084731-1 ~2728..3498,30090..30277,34505..34552,38231..38342,40594..40731,46757..46953,48105..48185,60862..60970,70799..70919,76030..76120,85067..~86178
AK173082-1 ~3319..3498,30090..30277,34505..34552,38231..38342,40594..40731,46757..46953,48105..48185,60862..60970,70799..70919,76030..76120,85067..~86175
BC022158-1 ~30179..30277,34505..34552,38231..38342,40594..40731,46757..46953,48105..48185,60862..60970,70799..70919,76030..76120,85067..~86161
BE375165-1 ~30171..30277,34505..34552,38231..38342,40594..40731,46757..~46928
BI414565-1 ~34507..34552,38231..38342,40594..40731,46757..46953,48105..~48186
BU057653-1 ~48093..48185,60862..60970,70799..70919,76030..76120,85067..~85129
BQ887651-1 ~48093..48185,60862..60970,70799..70919,76030..76120,85067..~85383
BI331480-1 ~30180..30277,34505..34552,38231..38342,40594..~40731
CK626709-1,2 ~60876..60970,70799..70919,76030..76120,85067..~85380
AA798337-1,2,4 ~76035..76120,85067..~85294
AA823099-1,2,4 ~76068..76120,85067..~85445
CLASS 2
BC039629-2 ~56602..56676,60862..60970,70799..70919,76030..76120,85067..~86174
BI101709-2 ~56603..56676,60862..60970,70799..70919,76030..76120,85067..~85501
BY095625-2 ~56577..56676,60862..60970,70799..70919,76030..~76078
AI304194-2 ~56620..56676,60862..60970,70799..70919,76030..~76122
CLASS 3
BB652948-3 ~38178..38342,40594..40731,46757..46953,48105..~48186
CLASS 4
CA495201-4 ~56547..56676,60862..60970,76030..76120,85067..~85518

Splice pattern sequence file

This FASTA-formatted file lists the sequences of all the observed splice patterns, together with the structure of each splice pattern. Example (some sequence deleted for brevity):

>ENSMUSG00000053600
SP:1 STRUCTURE:~2973..3088,13074..13200,13387..13444,14319..~14352 KRIM1

Peptide sequence file

This FASTA-formatted file lists all the peptide sequences that could be derived from the observed splice patterns, together with the splice pattern number (e.g. SP2; see the events file) and the gene symbol, if known. Example (some sequence deleted for brevity):

>ALTS_MUS_PP:ENSMUSG00000000001_SP1_5928 GNAI3
MGCTLSAEDKAAVERSKMIDRNLREDGEKAAKEVKLLLLGAGESGKSTIVKQMKIIHEDGYSEDECKQYKVVVYSNTIQSIIAIIRAMGRLKIDFGESARADDARQLFVL
AGSAEEGVMTSELAGVIKRLWRDGGVQACFSRSREYQLNDSASYYLNDLDRISQTNYIPTQQDVLRTRVKTTGIVETHFTFKELYFKMFDVGGQRSERKKWIHCFEGVTA
IIFCVALSDYDLVLAEDEEMNRMHESMKLFDSICNNKWFTDTSIILFLNKKDLFEEKIKRSPLTICYPEYTGSNTYEEAAAYIQCQFEDLNRRKDTKEVYTHFTCATDTK
NVQFVFDAVTDVIIKNNLKECGLY

Events file

The events file lists, for every gene, all the events that were observed among the transcript classes (see documentation). Every event is annotated with (i) the splice patterns of the isoform transcript classes from which the event is delineated; (ii) the different manipulations of the event as seen between all pairs of transcript classes; and (iii) the introns/exons participating in the event. For each gene, the Ensembl identifier is given, followed by a listing of the representatives of the transcript classes, any relations between the classes, and the grouped events themselves. Example:

>ENSMUSG00000053600
Class 1 ~2973..3088,13074..13200,13387..13444,14319..~14352
Class 2 ~3004..3088,8773..8885,13074..13200,13387..~13442
Type : CASSETTE EXON (SCE)
Cassette exons: 8773..8885 [113 b]
Occurs in: (2 & 1) is Complete SCE
(2 & 1) SCE:
3089..8772,8886..13073 (introns) <=> 3089..13073 (intron) [~-82]
e1 (~3004..3088) <=> e1' (~2973..3088) [31, 0]
e3 (13074..13200) <=> e3 (13074..13200) [0, 0]
1 Confirm. EST's <=> 18 Confirm. EST's
TRPT-ISO1: ~3004..3088,8773..8885,13074..13200,13387..~13442
TRPT-ISO2: ~2973..3088,13074..13200,13387..13444,14319..~14352

Each event block consists of:
A more detailed description of the events file and our naming conventions can be found in the documentation.

A further example, for the human gene ENSG00000170312:

>ENSG00000170312
Type : INTRON ISOFORM (II-5P)
(2 & 1) :
3094..4663 (intron) <=> 3192..4663 (intron)
TRPT-ISO1: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
Type : CASSETTE EXON (SCE)
(4 & 2) (4 & 1) (4 & 3) :
10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)
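To make the cassette exon notation concrete, the sketch below re-derives the cassette exon of the ENSMUSG00000053600 example from the two class structures alone. It is an illustration under stated assumptions, not the AltSplice procedure: the function names are hypothetical, the '~' partial-end markers are simply dropped, and only the basic case is handled (one exon in one pattern whose two flanking introns together span a single intron of the other pattern). It does not attempt the simple/complex classification described in the introduction.

```python
# Hypothetical sketch, not AltSplice code: detect single cassette exons
# between two splice patterns written in the events-file notation.


def parse_structure(text):
    """Parse '~2973..3088,13074..13200,...' into (start, end) exon tuples.

    Assumption: '~' only marks a partial (unconfirmed) end and is dropped.
    """
    exons = []
    for block in text.split(","):
        start, end = block.replace("~", "").split("..")
        exons.append((int(start), int(end)))
    return exons


def introns(exons):
    """Return the gaps between consecutive exons as (start, end) introns."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def cassette_exons(with_exon, without_exon):
    """Exons of `with_exon` that split one intron of `without_exon` in two."""
    skipped = set(introns(without_exon))
    split = introns(with_exon)
    found = []
    for (up_start, up_end), (down_start, down_end) in zip(split, split[1:]):
        # The two flanking introns together span one intron of the other
        # pattern, so the exon between them is included in one pattern
        # and skipped in the other.
        if (up_start, down_end) in skipped:
            found.append((up_end + 1, down_start - 1))
    return found


# Class representatives from the ENSMUSG00000053600 events example.
CLASS_1 = parse_structure("~2973..3088,13074..13200,13387..13444,14319..~14352")
CLASS_2 = parse_structure("~3004..3088,8773..8885,13074..13200,13387..~13442")

for start, end in cassette_exons(CLASS_2, CLASS_1):
    print(f"Cassette exons: {start}..{end} [{end - start + 1} b]")
```

Run as-is, it reports the exon 8773..8885 of 113 bases, matching the 'Cassette exons' line of the events example, and the intron pairs it compares are the 3089..8772,8886..13073 <=> 3089..13073 pair listed there.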
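The rule stated for the splice pattern file (overlapping classes whose overlap contains different introns are alternative splice patterns; non-overlapping classes are not) can be sketched as a pairwise check. This is a minimal sketch with hypothetical function names, not the grouping code behind AltSplice. It compares the intronic intervals of both patterns after clipping them to the shared region, and it treats '~' partial ends as exact coordinates. Because of that, an EST whose partial end overshoots an exon boundary by a single base would also be flagged, which a real implementation would presumably tolerate. Class membership lists such as '-1,2,4' are not modelled.

```python
# Hypothetical sketch: decide whether two splice patterns are alternative,
# following the rule given for the splice pattern file.


def parse_structure(text):
    """Parse '~2728..3498,30090..30277,...' into (start, end) exon tuples."""
    exons = []
    for block in text.split(","):
        start, end = block.replace("~", "").split("..")  # '~' dropped
        exons.append((int(start), int(end)))
    return exons


def introns(exons):
    """Return the gaps between consecutive exons as (start, end) introns."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def introns_in_window(exons, lo, hi):
    """Introns of a pattern clipped to lo..hi; introns outside are dropped."""
    clipped = []
    for start, end in introns(exons):
        start, end = max(start, lo), min(end, hi)
        if start <= end:
            clipped.append((start, end))
    return clipped


def are_alternative(pattern_a, pattern_b):
    """True if the patterns overlap and the overlap holds different introns."""
    lo = max(pattern_a[0][0], pattern_b[0][0])
    hi = min(pattern_a[-1][1], pattern_b[-1][1])
    if lo > hi:
        return False  # no overlap: different regions of the gene
    return introns_in_window(pattern_a, lo, hi) != introns_in_window(pattern_b, lo, hi)


# Entries taken from the splice pattern file example.
BC084731 = parse_structure(
    "~2728..3498,30090..30277,34505..34552,38231..38342,40594..40731,"
    "46757..46953,48105..48185,60862..60970,70799..70919,76030..76120,85067..~86178")
BE375165 = parse_structure("~30171..30277,34505..34552,38231..38342,40594..40731,46757..~46928")
BB652948 = parse_structure("~38178..38342,40594..40731,46757..46953,48105..~48186")
BI331480 = parse_structure("~30180..30277,34505..34552,38231..38342,40594..~40731")
BY095625 = parse_structure("~56577..56676,60862..60970,70799..70919,76030..~76078")

print("class 1 rep vs BE375165 (class 1):", are_alternative(BC084731, BE375165))
print("class 1 rep vs class 3:", are_alternative(BC084731, BB652948))
print("BY095625 (class 2) vs BI331480 (class 1):", are_alternative(BY095625, BI331480))
```

Here BE375165 shares every intron of the class 1 representative within their overlap, so the check returns False. Class 3 is flagged because its first exon starts at 38178, inside the class 1 intron 34553..38230. BY095625 and BI331480 do not overlap at all, so they are treated as different regions of the gene rather than as alternatives.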