Comment[ArrayExpressAccession]	E-MTAB-1505
MAGE-TAB Version	1.1
Investigation Title	GENCODE PCR-Seq Batch XI
Comment[Submitted Name]	GENCODE PCR-Seq Batch XI
Experiment Description	As part of the ENCODE consortium, the GENCODE project is producing a reference gene set through manual and automated gene prediction. Selected transcript models are verified experimentally by RT-PCR amplification of at least one of their unique splice junctions followed by sequencing. The experiment targets are manually annotated transcripts with novel or putative status, non-pseudogene biotype and unique splice junctions not validated previously. For Batch XI, 966 splice junctions from GENCODE13 (released July 2012) and 640 splice junctions from GENCODE14 (released October 2012) were chosen for experimental verification.
Experimental Design	organism_part_comparison_design	reference_design	co-expression_design
Comment[AEExperimentType]	genotyping by high throughput sequencing
Comment[AEExperimentDisplayName]	RT-PCR of eight different human tissues (GENCODE PCR-Seq Batch XI)
Experimental Factor Name	organism part
Experimental Factor Type	organism part
Public Release Date	2013-02-19
Person Last Name	Gonzalez	Hubbard	Reymond	Guigo
Person First Name	Jose	Tim	Alexandre	Roderic
Person Mid Initials	M
Person Email	jmg@sanger.ac.uk	th@sanger.ac.uk	Alexandre.Reymond@unil.ch	roderic.guigo@crg.cat
Person Phone	+44-1223-496827	+44-1223-496876
Person Address	Wellcome Trust Genome Campus, Hinxton, UK	Wellcome Trust Genome Campus, Hinxton, UK	Lausanne, Switzerland	Barcelona, Spain
Person Affiliation	Wellcome Trust Sanger Institute	Wellcome Trust Sanger Institute	University of Lausanne	Centre for Genomic Regulation (CRG)
Person Roles	submitter	investigator	investigator	investigator
PubMed ID	22955982	22955987
Publication Author List	Howald C, Tanzer A, Chrast J, Kokocinski F, Derrien T, Walters N, Gonzalez JM, Frankish A, Aken BL, Hourlier T, Vogel JH, White S, Searle SMJ, Harrow J, Hubbard T, Guigo R, Reymond A	Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken B, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, Rodriguez JM, Ezkurdia I, van Baren J, Brent M, Haussler D, Kellis M, Valencia A, Reymond A, Gerstein M, Guigo R, Hubbard T
Publication Title	Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome	GENCODE: The reference human genome annotation for The ENCODE project
Publication Status	published	published
Protocol Name	P-MTAB-28724	P-MTAB-28725	P-MTAB-28726	P-MTAB-24825	P-MTAB-23508	P-MTAB-31106
Protocol Type	extraction	specified_biomaterial_action	specified_biomaterial_action	sequencing	scanning	nucleic acid library construction protocol
Protocol Description	Human polyA+ RNAs were purchased from Clontech. The RNAs were isolated from tissue samples using a modified guanidinium thiocyanate extraction method followed by polyA+ RNA selection with two rounds of oligo(dT)-cellulose columns.	We experimentally assessed three categories of gene models: (1) spliced models in which one primer could be placed within 75 nt of a junction, so that about half of the sequencing reads cover the junction (“Multi-span”); (2) spliced models in which this was unfeasible (“Multi”); and (3) monoexonic genes (“Mono”). In the case of “Multi-span” primers, the “junction primer” is positioned within an exon not more than 65 bp away from the targeted junction to ensure that sequencing reads cross the junction by a minimum of 10 nt, while the second primer maps within an adjacent exon. To increase the fraction of junctions we could possibly test, we then designed primer pairs further away from the junction (“Multi” primers). Primers were designed using Primer3. We used parameters minimizing the formation of primer dimers and maximizing primer “stickiness”. Primer pairs were further filtered for mapping within repeat regions and alternative priming within 30 kb in duplications and paralogous sequences (maximum of two tolerated mismatches).	First-strand cDNA samples were prepared from human poly(A)+ RNAs (BD-Clontech) with the SuperScript III kit (Invitrogen). Amplifications were performed in a final volume of 12.5 μL with JumpStart REDTaq ReadyMix (Sigma-Aldrich) and a primer concentration of 0.4 μM in 384-well plate format on an automated Evoware platform (TECAN) combined with a Tetrad2 thermocycler (Bio-Rad), which allows four plates to be processed in parallel. Because monoexonic amplification is sensitive to genomic DNA contamination, monoexonic models were assessed by amplification of cDNA in which a dNTP analog was incorporated using the mRNA Selective PCR Kit (TAKARA). Aliquots of up to 5000 RT-PCR reactions were pooled together and the sequencing library was prepared with no fragmentation using the TruSeq DNA protocol.	DNA fragments were sequenced on an Illumina HiSeq2000 sequencer and 2 multiplexed libraries were loaded per lane. The sequencing was performed at the University of Lausanne, Lausanne, Switzerland.	Illumina pipeline export files were transformed into fastq format by applying this Unix command: awk '{print "@"$1"_"$2":"$3":"$4":"$5":"$6"#"$7"/"$8"\n"$9"\n+\n"$10}' file_export.txt > file.fastq	Because monoexonic amplification is sensitive to genomic DNA contamination, monoexonic models were assessed by amplification of cDNA in which a dNTP analog was incorporated using the mRNA Selective PCR Kit (TAKARA).
Protocol Hardware				Illumina HiSeq 2000
Term Source Name	ArrayExpress
Term Source File	http://www.ebi.ac.uk/arrayexpress
Comment[SecondaryAccession]	ERP002245
Comment[SequenceDataURI]	http://www.ebi.ac.uk/ena/data/view/ERR231576-ERR231583
SDRF File	E-MTAB-1505.sdrf.txt
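
The export-to-FASTQ conversion described in the scanning protocol can also be sketched in Python. This is only an illustration, not part of the submitted protocol. It assumes the ten fields named in the awk command are joined with the literal separators that appear between them (`_`, `:`, `#`, `/`), and that fields are split on whitespace as awk does by default. The function names and the sample line are hypothetical.

```python
def export_line_to_fastq(line):
    """Turn one line of an Illumina export file into a 4-line FASTQ record.

    Assumption: fields are whitespace-separated (awk's default splitting) and
    fields 1-8 form the read name, field 9 is the sequence, field 10 the
    quality string.
    """
    fields = line.split()
    if len(fields) < 10:
        # Unlike awk, which would silently print empty fields, fail loudly.
        raise ValueError("expected at least 10 fields, got %d" % len(fields))
    f1, f2, f3, f4, f5, f6, f7, f8, seq, qual = fields[:10]
    header = "@%s_%s:%s:%s:%s:%s#%s/%s" % (f1, f2, f3, f4, f5, f6, f7, f8)
    return "%s\n%s\n+\n%s\n" % (header, seq, qual)


def convert_export(lines):
    """Convert an iterable of export lines to FASTQ text, skipping blank lines."""
    return "".join(export_line_to_fastq(l) for l in lines if l.strip())


if __name__ == "__main__":
    # Hypothetical export line, made up for illustration only.
    sample = "HWI-ST000\t1\t2\t1101\t1500\t2000\tACGTAC\t1\tACGTN\tIIII#\textra"
    print(convert_export([sample]), end="")
```

To convert a real file, the lines of `file_export.txt` could be passed to `convert_export` and the result written to `file.fastq`. Note that whitespace splitting, in awk as in this sketch, shifts the fields if any of the first ten columns is empty.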