Comment[ArrayExpressAccession]	E-GEOD-47650
MAGE-TAB Version	1.1
Public Release Date	2013-06-05
Investigation Title	Sequence analysis of scnRNAs
Comment[Submitted Name]	Sequence analysis of scnRNAs
Experiment Description	The developmentally regulated 26- to 32-nt siRNAs (scnRNAs) are loaded to the Argonaute protein Twi1p and display a strong bias for uracil at the 5' end. In this study, we used deep sequencing to analyze loaded and unloaded populations of scnRNAs. We show that the size of the scnRNA is determined during a pre-loading process, whereas their 5' uracil bias is attributed to both pre-loading and loading processes. We also demonstrate that scnRNAs have a strong bias for adenine at the third base from the 3' terminus, suggesting that most scnRNAs are direct Dicer products. Furthermore, we show that the thermodynamic asymmetry of the scnRNA duplex does not affect the guide and passenger strand decision. Finally, we show that scnRNAs frequently have templated uracil at the last base without a strong bias for adenine at the second base, indicating non-sequential production of scnRNAs from substrates. These findings provide a biochemical basis for the varying attributes of scnRNAs, which should help improve our understanding of the production and turnover of scnRNAs in vivo. We compared Twi1p-loaded scnRNAs to scnRNAs before they have been loaded into Twi1p by deep sequencing to understand how the two processes, the production of siRNAs by Dicer and the loading of siRNAs into Argonaute, shape the population of siRNAs in vivo.
Term Source Name	ArrayExpress	EFO
Term Source File	http://www.ebi.ac.uk/arrayexpress/	http://www.ebi.ac.uk/efo/efo.owl
Person Last Name	Mochizuki	Mochizuki	Kurth
Person First Name	Kazufumi	Kazufumi	Henriette
Person Mid Initials			M
Person Email	kazufumi.mochizuki@imba.oeaw.ac.at
Person Affiliation	IMBA Vienna
Person Address	IMBA Vienna, Dr. Bohr-Gasse 3, Vienna, Austria
Person Roles	submitter
Protocol Name	P-GSE47650-2	P-GSE47650-1
Protocol Description	Sequence reads were filtered as described (Schoeberl et al. 2012). To study the base composition and thermodynamic stability of small RNAs, reads containing any bases other than A, C, G or T were excluded and the remaining reads were collapsed to non-redundant datasets. To study thermodynamic stability, 29 nt RNA sequences were mapped to the LMR genomic region (corresponding to bases 64,932-165,807 of the supercontig 231 in the 2nd assembly of the Tetrahymena thermophila micronuclear genome, obtained from Broad Institute Web site: http://www.broadinstitute.org/annotation/genome/Tetrahymena.1/MultiHome.html) without allowing any mismatches. Then, the passenger strand scnRNA sequences were estimated. The local thermodynamic stabilities (kcal/mol) of 4 base pairs at the ends of the guide and passenger duplexes were calculated using the free energy parameters at 37°C in 1 M NaCl (Freier et al. 1986) without including the initiation value. To study potential trimming and tailing reactions on small RNAs, 24-32 nt RNA sequences were mapped to the LMR genomic region by allowing two mismatches, and the mismatched positions in the scnRNAs were analyzed. Genome_build: LMR genomic region (corresponding to bases 64,932-165,807 of the supercontig 231 in the 2nd assembly of the Tetrahymena thermophila micronuclear genome, obtained from Broad Institute Web site: http://www.broadinstitute.org/annotation/genome/Tetrahymena.1/MultiHome.html) Supplementary_files_format_and_content: The .txt files represent abundance counts of the scnRNAs.	Wild-type Tetrahymena thermophila strains B2086 and CU428 and the complete TWI1 KO strains (Mochizuki and Gorovsky 2004) were used for total RNA preparation. Twi1p was immunoprecipitated (Noto et al. 2010) either from wild-type cells using an anti-Twi1p antibody (Noto et al. 2010) or from the strains expressing HA-Twi1p and HA-Twi1p-hAGO2Lmut (Mochizuki and Kurth in press) using an anti-HA antibody (16B12, Covance). The cDNA libraries were constructed as described (Schoeberl et al. 2012). For multiplex sequencing, the oligonucleotides listed below were used. Single-read 36- or 50-base sequences were generated using the Illumina GAII or HiSeq2000, respectively. Sequence reads were filtered as described (Schoeberl et al. 2012). The 3' adapter for multiplex sequencing: 5'-App-NNN NAG ATC GGA AGA GCA CAC GTC T-3ddC; The 5' adapter for multiplex sequencing: 5'-ACA CUC UUU CCC UAC ACG ACG CUC UUC CGA UCU NNN-3'; The reverse-transcription primer for multiplex sequencing: 5'-CAA GCA GAA GAC GGC ATA CGA GAT BBB BBB GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T-3' ("BBB BBB" = 6-base barcodes for multiplex sequencing: 5'-CGT GAT-3', 5'-ACT TCG-3', 5'-GCC TAA-3', 5'-TGG TCA-3', 5'-CAC TGT-3', 5'-ATT GGC-3', 5'-GAT CTG-3', 5'-TCA AGT-3', 5'-CTG ATC-3', or 5'-AAG CTA-3'); The PCR amplification primer for multiplex sequencing: 5'-AAT GAT ACG GCG ACC ACC GAC AGG TTC AGA GTT CTA CAG TCC GAT CT-3' The RNA was gel-fractionated, and a small RNA population, including scnRNAs (26–32 nt), was used to construct the cDNA library for deep sequencing.
Protocol Type	normalization data transformation protocol	nucleic acid library construction protocol
Experimental Factor Name	STRAIN OR LINE	GENOTYPE	IP ANTIBODY
Experimental Factor Type	strain or line	genotype	ip antibody
Comment[SecondaryAccession]	GSE47650
Comment[GEOReleaseDate]	2013-06-05
Comment[ArrayExpressSubmissionDate]	2013-06-05
Comment[GEOLastUpdateDate]	2013-06-05
Comment[AEExperimentType]	RNA-seq of non coding RNA
Comment[SecondaryAccession]	SRP024243
Comment[SequenceDataURI]	http://www.ebi.ac.uk/ena/data/view/SRR880684-SRR880687
SDRF File	E-GEOD-47650.sdrf.txt
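
The read-processing step in the data transformation protocol (excluding reads that contain any base other than A, C, G or T, then collapsing the remainder to a non-redundant dataset with abundance counts) could be sketched roughly as follows. This is a minimal illustration, not the submitters' pipeline: the function name `collapse_reads` and the optional `min_len`/`max_len` arguments are hypothetical, and the upstream filtering of Schoeberl et al. 2012 is assumed to have been applied already.

```python
from collections import Counter

# Only unambiguous DNA bases are accepted, as stated in the protocol.
VALID_BASES = set("ACGT")


def collapse_reads(reads, min_len=None, max_len=None):
    """Drop reads with non-ACGT bases and count each distinct sequence.

    `min_len` / `max_len` are hypothetical conveniences for selecting a
    size window (for example 24-32 nt); they are not part of the record.
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        # Skip empty lines and reads containing N or any other symbol.
        if not seq or not set(seq) <= VALID_BASES:
            continue
        if min_len is not None and len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        counts[seq] += 1
    return counts


if __name__ == "__main__":
    example = ["ACGT", "ACGT", "ACGN", "TTTT"]
    for seq, n in collapse_reads(example).items():
        print(seq, n)
```

Each key of the returned counter is one non-redundant sequence and each value is its abundance, which is the kind of table the supplementary .txt files are described as holding.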