AltSplice Human Release 3 (December 2005)

Introduction

The data in this release is generated for annotated genes from Ensembl 36.35i. The extracted nucleotide region covers the gene region as defined in Ensembl, extended by 3000 bases at both the 5' and 3' ends. Human EST and mRNA (transcript) sequences are mapped to these extended gene regions, and transcript-confirmed introns and exons are delineated from the resulting alignments. The matching transcript sequences are further classified into groups such that each group represents one isoform splice pattern. Each group is represented by a representative transcript structure, called a splice pattern. The isoform peptide sequences encoded by the splice patterns have been delineated and are presented as part of the database. Such isoform splice patterns are compared
with one another to delineate the alternative events. Thus the
presented data lists all the alternative splice events as seen in the
observed transcripts for a gene. The basic events that are identified
in this work are: exon isoforms (extension/truncation of an exon),
cassette exons (an exon is present in one transcript but absent in an
isoform of the transcript), alternating exons (exons are used in
alternative transcripts in a mutually exclusive manner), and intron
retention (a nucleotide region is used as an exon in a transcript while
it is an intron in an alternative transcript). The latter three events
(namely cassette exon, alternating exon, intron retention) are further
characterised as 'complex' or 'simple' depending on whether the 5'
and/or 3' flanking exons also undergo modifications (e.g. the flanking
exon may be extended or truncated or the exon that flanks a retained
intron is cassetted or alternated). Introns and exons are annotated for splice signals such as donor/acceptor site strength, branch points, and polypyrimidine tracts. Exons, introns, and events conserved between orthologous human and mouse genes have been identified and are annotated in the database. SNP positions and alleles have been mapped to our data and are displayed for isoform splice patterns as well as for individual events. Annotation pertaining to the expression states of the isoforms is being added to the data. Subtractive library expression queries can now be raised from the interfaces. We will carry out further work on annotating this data with various other features.

In this release we have integrated AEdb with AltSplice (for both human and mouse entries). Entries common to AltSplice and AEdb are associated, and this association is indicated on the display pages resulting from queries to AltSplice and/or AEdb. Queries can be raised for common entries. AltSplice exons and splice events that have experimental evidence in AEdb are marked as such. In addition, we have built a wrapper that passes queries on to both AEdb and AltSplice. We have further implemented a SplicePatternViewer to visualise the isoform splice patterns. AltSplice data can also be viewed from the geneview and contigview pages of Ensembl.

Documentation

Concise documentation of the procedure followed to produce this data and of the naming conventions used is available in PDF format. The document is a work in progress and will be updated from time to time to reflect new developments.

Statistics

Gene set
Distribution statistics

Various distributions are available on the distribution statistics page (genes per class, intron types, event types, lengths of retained introns and cassette exons, effective length change of exon isoforms).

Data files
Examples and formats of the data files.

Gene file

The gene file is in standard FASTA format: a header followed by the sequence. The header shows the Ensembl gene identifier and various flags. The most important is the 'ext' flag, which indicates how many bases have been added upstream and downstream of the listed gene sequence. Example:

>ENSG00000170613
chrom: 5 strand: -1 orientation: reversed ext: 3000 map_start:

Reference transcript structure file

The reference transcript structure file lists, for each gene, the Ensembl transcript structure that we chose as the point of reference for numbering features and for comparing the new features we found. Each gene has exactly one reference transcript structure; the file gives the Ensembl gene ID, the Ensembl transcript ID, and the complete reference transcript structure. UFR and DFR are features added by us to denote the upstream and downstream flanking regions that were added to the gene sequence. Example:

>ENSG00000167987 ENST00000301765 UFR(1..3000 3000),e1(3001..3102 102),i1(3103..7631 4529),e2(7632..7803 172),i2(7804..8501 698),e3(8502..8584 83),i3(8585..9299 715),e4(9300..11583 2284),DFR(11584..14583 3000)

Transcript file

The transcript file lists all ESTs and mRNAs that were used to determine the individual introns and exons. Besides the transcript identifier and version, it gives a pointer to the gene in which the transcript confirmed a feature, the description of the transcript, and the alignment as we found it. Example:

K057543.1 [ENSG00000170613] Homo sapiens cDNA FLJ32981 fis, clone TESTI3000002, weakly similar to L.mexicana lmsap2 gene for secreted acid phosphatase 2 (SAP2). g(3004..3705)e(1..702),g(5610..6928)e(703..2021)

Intron file

The intron file lists all introns that were determined by matching the ESTs/mRNAs against the genes listed in the gene file. Example:

>ENSG00000128891
(3018..4965)

Exon file

The exon file lists all exons that were confirmed by the ESTs/mRNAs in the transcript file. Example:

>ENSG00000174815
(20264..20442)

Splice pattern file

The different EST/mRNA sequences that map to a gene are grouped into classes. A class is composed of all the ESTs/mRNAs confirming the same splice pattern, and the longest EST/mRNA sequence in each class is chosen as its representative. Where classes overlap, the overlapping region may contain different introns; if so, the respective classes represent alternative splice patterns. Classes that do not overlap with one another represent different regions of the gene and do not represent alternative splice patterns. Every EST/mRNA identifier is followed by the identifiers of the classes to which the sequence belongs. Non-representative ESTs/mRNAs may belong to more than one class.
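The grouping rule for the splice pattern file can be illustrated with a short Python sketch. It is not the procedure used to build this release. It assumes, from the transcript file example, that each g(start..end)e(start..end) pair is an ungapped aligned block in gene and transcript coordinates (the lengths in the example agree) and that the gap between consecutive blocks is an intron. It further assumes that only introns lying entirely within the overlap of two classes are compared. The names parse_alignment, introns and relation, and the label "consistent" for overlapping classes with identical introns, are hypothetical.

```python
import re
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based (start, end)

# One aligned block, e.g. "g(3004..3705)e(1..702)". Assumption: g(...) is the
# position on the extended gene sequence and e(...) the position on the EST/mRNA.
BLOCK_RE = re.compile(r"g\((\d+)\.\.(\d+)\)e\((\d+)\.\.(\d+)\)")


def parse_alignment(field: str) -> List[Interval]:
    """Return the gene-side blocks of a transcript-file alignment string."""
    blocks: List[Interval] = []
    for gs, ge, es, ee in BLOCK_RE.findall(field):
        g, e = (int(gs), int(ge)), (int(es), int(ee))
        # Assumption: blocks are ungapped, so both sides have equal length.
        if g[1] - g[0] != e[1] - e[0]:
            raise ValueError(f"block lengths differ: {g} vs {e}")
        blocks.append(g)
    return blocks


def introns(blocks: List[Interval]) -> List[Interval]:
    """Treat the gene region between consecutive aligned blocks as an intron."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(blocks, blocks[1:])]


def relation(a: List[Interval], b: List[Interval]) -> str:
    """Compare two classes, each given by its representative's blocks (hypothetical helper)."""
    lo = max(a[0][0], b[0][0])
    hi = min(a[-1][1], b[-1][1])
    if lo > hi:
        return "non-overlapping"  # different regions of the gene, not alternative

    def inside(blocks: List[Interval]) -> Set[Interval]:
        # Assumption: only introns lying wholly inside the overlap are compared,
        # since a transcript end cannot confirm an intron it does not span.
        return {i for i in introns(blocks) if lo <= i[0] and i[1] <= hi}

    return "alternative" if inside(a) != inside(b) else "consistent"


ref = parse_alignment("g(3004..3705)e(1..702),g(5610..6928)e(703..2021)")
print(introns(ref))                          # [(3706, 5609)]
with_extra_exon = [(3004, 3705), (4000, 4100), (5610, 6928)]  # hypothetical class
print(relation(ref, with_extra_exon))        # alternative
print(relation(ref, [(8000, 8200)]))         # non-overlapping
```

Comparing only fully contained introns is a conservative choice: a transcript that ends inside another class's intron is neither counted as agreeing nor as disagreeing with it.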
Splice pattern sequence file

This FASTA-formatted file lists the sequences of all observed splice patterns, together with the structure of each splice pattern. Example (some sequence deleted for brevity):

>ENSG00000136114
SP:2 STRUCTURE:~6418..6558,10907..11869,30312..~31934 THSD1 (HUGO)

Peptide sequence file

This FASTA-formatted file lists all peptide sequences that could be derived from the observed splice patterns, together with the splice pattern number (e.g. SP2; see the events file) and the gene symbol, if known. Example (some sequence deleted for brevity):

>ALTS_HUM_PP:ENSG00000004399_SP2_17861
PLXND1

Events file

The events file lists, for every gene, all the events observed among the transcript classes (see the documentation). Every event is annotated with (i) the splice patterns of the isoform transcript classes from which the event is delineated; (ii) the different forms of the event as seen between all pairs of transcript classes; and (iii) the introns/exons participating in the event. For each gene, the Ensembl identifier is given, followed by a listing of the representatives of the transcript classes, any relations between the classes, and the grouped events themselves. Example:

>ENSG00000170312
Type : INTRON ISOFORM (II-5P) (2 & 1) : 3094..4663 (intron) <=> 3192..4663 (intron)
TRPT-ISO1: ~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454
Type : CASSETTE EXON (SCE) (4 & 2) (4 & 1) (4 & 3) : 10311..16412 (intron) <=> 10311..12582,12754..16412 (introns)
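Both example events can be recovered from the listed coordinates. The Python sketch below is a minimal illustration under stated assumptions, not the method used to produce the events file. It assumes that '~' marks a transcript end that is not a confirmed splice site and that coordinates run 5' to 3' along the gene sequence, so an intron pair differing at its start corresponds to the II-5P tag in the example. It handles only a single skipped exon. The helper names, and the '3P' label for the mirror case, are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based (start, end)


def parse_structure(text: str) -> List[Interval]:
    """Parse a structure such as '~3016..3093,4664..4725' into exons.

    Assumption: '~' marks a transcript end that is not a confirmed splice
    site; the marker is simply dropped here.
    """
    exons: List[Interval] = []
    for part in text.split(","):
        start, end = part.replace("~", "").split("..")
        exons.append((int(start), int(end)))
    return exons


def introns(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive exons."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]


def cassette_exons(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Exons skipped by an intron of class A but included in class B.

    Simple case only: B must have two consecutive introns that start and end
    exactly where A's intron does; the exon between them is the cassette exon.
    """
    found: List[Interval] = []
    for s, e in a:
        for (s1, x), (y, e2) in zip(b, b[1:]):
            if s1 == s and e2 == e:
                found.append((x + 1, y - 1))
    return found


def intron_isoforms(a: List[Interval], b: List[Interval]) -> List[Tuple[str, Interval, Interval]]:
    """Intron pairs that share exactly one end.

    Assumption: coordinates run 5' to 3', so a differing start is a 5' change
    (tagged '5P', cf. II-5P in the example) and a differing end a 3' change.
    """
    out: List[Tuple[str, Interval, Interval]] = []
    for s, e in a:
        for s2, e2 in b:
            if e == e2 and s != s2:
                out.append(("5P", (s, e), (s2, e2)))
            elif s == s2 and e != e2:
                out.append(("3P", (s, e), (s2, e2)))
    return out


iso1 = parse_structure(
    "~3016..3093,4664..4725,9228..9384,10187..10310,12583..12753,16413..~16454"
)
print(cassette_exons([(10311, 16412)], introns(iso1)))  # [(12583, 12753)]
print(intron_isoforms([(3094, 4663)], [(3192, 4663)]))
# [('5P', (3094, 4663), (3192, 4663))]
```

The two checks are kept separate on purpose. Run naively over whole intron chains, intron_isoforms would also report the introns flanking the cassette exon (10311..16412 against 10311..12582) as sharing one end. Grouping such overlapping observations into a single event, and deciding between 'simple' and 'complex', is left out of this sketch.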
A more detailed description of the events file and our naming conventions can be found in the documentation.