Comment[ArrayExpressAccession] E-GEOD-41338 MAGE-TAB Version 1.1 Public Release Date 2012-12-22 Investigation Title RNA-seq of poly-A enriched total RNA of brain, liver, kidney, heart and skeletal muscle samples from 5 vertebrate species: mouse, chicken, frog, lizard and pufferfish Comment[Submitted Name] The evolutionary landscape of alternative splicing in vertebrate species Experiment Description How species with similar repertoires of protein coding genes differ so dramatically at the phenotypic level is poorly understood. From comparing the transcriptomes of multiple organs from vertebrate species spanning ~350 million years of evolution, we observe significant differences in alternative splicing complexity between the main vertebrate lineages, with the highest complexity in the primate lineage. Moreover, within as little as six million years, the splicing profiles of physiologically-equivalent organs have diverged to the extent that they are more strongly related to the identity of a species than they are to organ type. Most vertebrate species-specific splicing patterns are governed by the highly variable use of a largely conserved cis-regulatory code. However, a smaller number of pronounced species-dependent splicing changes are predicted to remodel interactions involving factors acting at multiple steps in gene regulation. These events are expected to further contribute to the dramatic diversification of alternative splicing as well as to other gene regulatory changes that contribute to phenotypic differences among vertebrate species. 
mRNA profiles of several organs (brain, liver, kidney, heart, skeletal muscle) in multiple vertebrate species (mouse, chicken, lizard, frog, pufferfish) generated by deep sequencing using Illumina HiSeq Term Source Name ArrayExpress EFO Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl Person Last Name Barbosa-Morais Barbosa-Morais Kutter Watt Odom Blencowe Person First Name Nuno Nuno Claudia Stephen Duncan Benjamin Person Mid Initials L L T J Person Email nuno.barbosa.morais@utoronto.ca Person Affiliation University of Toronto Person Phone 4169784633 Person Fax 4169465545 Person Address The Donnelly Centre, University of Toronto, 160 College Street, Room 908, Toronto, ON, Canada Person Roles submitter Protocol Name P-GSE41338-1 P-GSE41338-2 Protocol Description RNA samples were isolated from brain, liver, kidney, heart, and skeletal muscle from four vertebrate species: mouse (Mus musculus C57Bl/6), chicken (Gallus gallus), African clawed frog (Xenopus tropicalis), and anole lizard (Anolis carolinensis). Brain RNA was also isolated from pufferfish (Tetraodon nigroviridis). RNA samples analyzed by RNA-Seq comprised pools of at least two females and two males from each species. A subset of tissues corresponded to those from a previous study (7). Mouse tissues (2.5 months old) were obtained from the Cambridge Research Institute under Home Office license PPL 80/2197. Lizard tissues (1.5 months old) were obtained from the Michigan State University (East Lansing, USA). Frog brain and liver (2 months old) were obtained from the Wellcome Trust/Cancer Research UK Gurdon Institute (Cambridge, UK). The Tc1 mouse and corresponding wild-type Tc0 (2.5 months old, males only) strains used in this study were previously described (29, 36). All tissues were freshly dissected and flash-frozen in liquid N2 prior to RNA isolation. About 20 ug of each tissue was homogenized in 700 ul QIAzol Lysis Reagent (Qiagen) using ceramic beads (Precellys). 
RNA was extracted according to the manufacturer's recommendations. Briefly, 140 ul of chloroform was added to the homogenate. After phase separation, 450 ul of isopropanol was added to the upper, aqueous phase. The yield and quality of the total RNA were monitored by spectrophotometry at 260, 280, and 230 nm using a Bioanalyzer Eukaryote Total RNA Nano Series II chip (Agilent). 10 ug of extracted total RNA was DNase-treated (Turbo DNase, Ambion), and polyadenylated RNA was enriched twice from total RNA using the polyATtract mRNA isolation system IV (Promega). RNA was reverse transcribed and converted into double-stranded cDNA (SuperScript cDNA synthesis kit, Invitrogen), which was sheared by sonication, followed by end repair, A-tailing, and paired-end adapter (Illumina) ligation. Prior to PCR amplification, cDNA was UNG-treated to maintain strand specificity. cDNA was amplified by 15 cycles of PCR and size selected (200-300 bp). After passing quality control on a Bioanalyzer 1000 DNA chip (Agilent), libraries were optically interrogated using an Illumina Cluster Station and were sequenced on the Illumina Genome Analyzer II (GAII) or HiSeq (paired-end, 72-76 bp) following the manufacturer's protocols. Sequencing reads were extracted from the image files generated by GAII or HiSeq and post-processed using the standard GA pipeline software v1.4 (Illumina). The quality of reads was controlled by measuring their alignability to the respective genome and using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Full genomic sequences for the analyzed species were downloaded from the UCSC Genome Browser database. Full transcriptomic sequences for all species were downloaded from Ensembl. For each gene, a canonical transcript was selected for gene expression analysis based on the hierarchy derived from the BioMart associated transcript names. 
For the cases in which such information was not available, the longest protein-coding transcript was selected as the gene representative. For each species, we assembled the sequences of all canonical transcripts (as defined above). In order to calculate the effective number of unique mappable positions in each transcript (i.e. the effective length), we performed the following steps. For each read length k, we extracted the L-k+1 (L being the transcript length) k-mer sequences from each canonical transcript and then aligned the full set of k-mers against the respective genome using Bowtie, allowing for a maximum of two mismatches. k-mers with no or one unique genomic alignment were then likewise aligned back to the canonical transcriptome. For each transcript, the number of such k-mers having a unique transcriptomic alignment was determined. This corresponds to the transcript's effective number of unique mappable positions for k-mer mRNA-Seq reads. For each sample, the corresponding mRNA-Seq data were aligned against the respective genome using Bowtie, allowing for a maximum of two mismatches. Reads with one unique genomic alignment were then aligned against the canonical transcriptome and, for each transcript, the number of reads with one unique transcriptomic alignment was counted. Gene expression levels were determined as reads per thousand mappable positions of target transcript sequence per million of reads, where the reads uniquely align to the analyzed transcriptome. This procedure for estimating gene expression levels is a corrected version of the widely used RPKM (reads per kilobase of target transcript sequence per million of total reads) metric, and is referred to as "cRPKM" [Labbe RM et al. Stem Cells (2012)]. 
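The arithmetic behind the cRPKM definition above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`kmer_count`, `crpkm`) and the example numbers are hypothetical, and the denominator is assumed to be the number of reads that align uniquely to the canonical transcriptome, as the description states.

```python
def kmer_count(transcript_length: int, k: int) -> int:
    """Number of k-mers in a transcript of length L, i.e. L - k + 1.

    Returns 0 when the transcript is shorter than k.
    """
    return max(transcript_length - k + 1, 0)


def crpkm(unique_reads: int, mappable_positions: int, total_unique_reads: int) -> float:
    """Reads per thousand mappable positions per million uniquely aligned reads.

    unique_reads: reads with one unique alignment to this transcript.
    mappable_positions: the transcript's effective length (count of
        uniquely mappable k-mers), assumed to be computed beforehand.
    total_unique_reads: reads in the sample that align uniquely to the
        canonical transcriptome (assumed denominator).
    """
    if mappable_positions <= 0 or total_unique_reads <= 0:
        return 0.0  # no mappable positions or no reads: expression undefined, report 0
    return unique_reads / (mappable_positions / 1e3) / (total_unique_reads / 1e6)


# Hypothetical example values, not taken from the dataset.
print(kmer_count(1000, 76))
print(crpkm(500, 2000, 10_000_000))
```

Unlike plain RPKM, the length term here is the number of uniquely mappable positions rather than the full transcript length, so transcripts with repetitive or ambiguous regions are not penalized for positions where no read could be assigned uniquely.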
Genome_build: mm9, galGal3, anoCar2, xenTro2, tetNig2 Supplementary_files_format_and_content: Tab-delimited text files include cRPKM values (as defined above), as well as the mappability and the total read count for each canonical transcript, for each Sample. To generate bigWig files we aligned the mRNA-Seq data against the respective genomes using TopHat; genomeCoverageBed (part of the BEDTools suite) was run on the resulting BAM files for the quantification of genomic coverage, outputting bedGraph files; bedGraphToBigWig (downloaded from UCSC) was then used to generate bigWig files from the bedGraph files. Protocol Type nucleic acid library construction protocol feature_extraction Experimental Factor Name organism organism part Experimental Factor Type organism organism part Publication Title The evolutionary landscape of alternative splicing in vertebrate species. Publication Author List Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, Kim T, Misquitta-Ali CM, Wilson MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ PubMed ID 23258890 Publication DOI 10.1126/science.1230612 Comment[SecondaryAccession] GSE41338 Comment[GEOReleaseDate] 2012-12-22 Comment[ArrayExpressSubmissionDate] 2012-10-04 Comment[GEOLastUpdateDate] 2013-01-30 Comment[AEExperimentType] RNA-seq of coding RNA Comment[SecondaryAccession] SRP015997 Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR579545-SRR579565 SDRF File E-GEOD-41338.sdrf.txt