Summary Table

Distribution of lengths for all 1097 sequences in the dataset.

 

Distribution of E-values

Cumulative E-value hits to Swiss-Prot, PDB and TrEMBL for all 1097 sequences in the dataset.

 

 

EBI/UCL Statistics calculated on a reduced dataset, taking only one spliced variant where multiple variants exist. This reduced dataset contains 434 sequences.
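The reduction to one variant per gene can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the longest variant is kept (as the later captions state) and that a mapping from sequence ID to gene is available. The names `longest_variants`, `seqs` and `gene_of` are hypothetical.

```python
from typing import Dict


def longest_variants(seqs: Dict[str, str], gene_of: Dict[str, str]) -> Dict[str, str]:
    """Return, for each gene, the ID of its longest spliced variant.

    seqs    maps sequence ID -> amino acid sequence (hypothetical input).
    gene_of maps sequence ID -> gene identifier (hypothetical input).
    Ties are resolved in favour of the variant seen first.
    """
    best: Dict[str, str] = {}
    for seq_id, seq in seqs.items():
        gene = gene_of[seq_id]
        current = best.get(gene)
        if current is None or len(seq) > len(seqs[current]):
            best[gene] = seq_id
    return best


if __name__ == "__main__":
    example_seqs = {"g1.a": "MKT", "g1.b": "MKTAYIAK", "g2.a": "MSS"}
    example_genes = {"g1.a": "g1", "g1.b": "g1", "g2.a": "g2"}
    print(longest_variants(example_seqs, example_genes))
```

Applied to the full dataset, a selection of this kind would reduce the 1097 sequences to one representative per gene.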

 

Number of sequences with hits at 100% sequence identity (95% overlap) to Swiss-Prot, PDB and TrEMBL. The dataset of longest spliced variants was used (434 sequences).
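A filter of this kind might look like the sketch below. It assumes that "overlap" means the aligned length divided by the query length, which the caption does not define. The `Hit` record and its field names are hypothetical, not taken from any search tool's output format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class Hit:
    """One database hit for a query sequence (hypothetical record layout)."""
    query_id: str
    database: str            # e.g. "Swiss-Prot", "PDB" or "TrEMBL"
    percent_identity: float  # 0-100
    aligned_length: int      # residues of the query covered by the alignment
    query_length: int


def is_exact_hit(hit: Hit, min_identity: float = 100.0, min_overlap: float = 0.95) -> bool:
    """True if the hit meets the identity and overlap thresholds.

    Overlap is taken as aligned_length / query_length (an assumption).
    """
    overlap = hit.aligned_length / hit.query_length
    return hit.percent_identity >= min_identity and overlap >= min_overlap


def count_exact_by_database(hits: Iterable[Hit]) -> Dict[str, int]:
    """Count distinct query sequences with at least one exact hit per database."""
    found: Dict[str, Set[str]] = {}
    for hit in hits:
        if is_exact_hit(hit):
            found.setdefault(hit.database, set()).add(hit.query_id)
    return {db: len(ids) for db, ids in found.items()}
```

Counting distinct query IDs rather than hits avoids counting a sequence twice when it matches several entries in the same database.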

 

 

PsiPred Predicted Secondary Structure Composition

Protein secondary structure composition was calculated from the percentage of helix and strand residues for the dataset of longest spliced variants (434 sequences). Cutoffs were derived empirically from domains assigned in the CATH database (Michie AD et al. (1996). J. Mol. Biol., 262, 168-185). Assignments here relate to the whole chain and are derived from predicted, not observed, secondary structure.
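The classification step can be illustrated with the sketch below. It assumes a per-residue prediction string using H (helix), E (strand) and C (coil). The cutoff values and the decision rule are placeholders chosen for illustration only; they are not the empirically derived cutoffs referred to above, and the function names are hypothetical.

```python
from typing import Tuple


def ss_fractions(ss: str) -> Tuple[float, float]:
    """Return the fractions of helix (H) and strand (E) residues in a chain."""
    if not ss:
        raise ValueError("empty secondary structure string")
    n = len(ss)
    return ss.count("H") / n, ss.count("E") / n


def composition_class(ss: str, helix_cut: float = 0.10, strand_cut: float = 0.10) -> str:
    """Assign a whole-chain class from helix and strand content.

    helix_cut and strand_cut are placeholder thresholds, not published values.
    """
    helix, strand = ss_fractions(ss)
    has_helix = helix >= helix_cut
    has_strand = strand >= strand_cut
    if has_helix and has_strand:
        return "alpha-beta"
    if has_helix:
        return "mainly alpha"
    if has_strand:
        return "mainly beta"
    return "few secondary structures"


if __name__ == "__main__":
    print(composition_class("HHHHHHHHCC"))
```

Because the input is a prediction over the whole chain, a multi-domain protein with one helical and one strand-rich domain would be labelled alpha-beta under a rule like this.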

 

Trans-membrane Helices Predicted by MEMSAT

A total of 232 of the 434 longest spliced variants have no predicted TM helices. Sequences with a single predicted TM helix should also be treated with caution, as the helix may be a membrane anchor.

 

Sequences with Predicted Disordered Regions (Disopred)

The dataset of longest spliced variants was used (434 sequences). 373 sequences had no predicted regions of disorder at all; these sequences are not shown on the pie chart. 22 sequences (purple) had predicted regions of disorder that were less than 30 residues long. Predicted disordered regions of more than 30 residues have a false positive rate of less than 1%.
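The grouping by disordered region length can be sketched as follows. The sketch assumes a per-residue mask string in which `*` marks a residue predicted as disordered and `.` an ordered one; this encoding and the function names are assumptions made for illustration. A region of exactly 30 residues is not covered by the description above and is treated here as short.

```python
from itertools import groupby
from typing import List


def disordered_region_lengths(mask: str, marker: str = "*") -> List[int]:
    """Return the length of every contiguous run of disordered residues."""
    return [
        sum(1 for _ in run)
        for is_disordered, run in groupby(mask, key=lambda c: c == marker)
        if is_disordered
    ]


def disorder_category(mask: str, long_cut: int = 30) -> str:
    """Label a sequence by its longest predicted disordered region.

    "none"  - no disordered residues predicted
    "short" - longest region is at most long_cut residues
    "long"  - longest region exceeds long_cut residues
    """
    lengths = disordered_region_lengths(mask)
    if not lengths:
        return "none"
    return "long" if max(lengths) > long_cut else "short"


if __name__ == "__main__":
    print(disorder_category("....***....."))
```

Using the longest region rather than the total number of disordered residues matches the way the length threshold is stated: it applies to individual regions, not to the sequence as a whole.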