Comment[ArrayExpressAccession] E-GEOD-53693
MAGE-TAB Version 1.1
Public Release Date 2014-04-04
Investigation Title Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation
Comment[Submitted Name] Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation
Experiment Description Identification of the coding elements in the genome is a fundamental step to understanding the building blocks of living systems. Short peptides (< 100 aa) have emerged as important regulators of development and physiology, but their identification has been limited by their size. We have leveraged the periodicity of ribosome movement on the mRNA to define actively translated ORFs by ribosome footprinting. This approach identifies several hundred translated small ORFs in zebrafish and human. Computational prediction of small ORFs from codon conservation patterns corroborates and extends these findings and identifies conserved sequences in zebrafish and human, suggesting functional peptide products (micropeptides). These results identify micropeptide-encoding genes in vertebrates, providing an entry point to define their function in vivo. Ribosome profiling experiments at five timepoints across zebrafish development in WT embryos
Term Source Name ArrayExpress EFO
Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Person Last Name Giraldez Bazzini Johnstone Christiano Mackowiak Obermayer Fleming Vejnar Lee Rajewsky Walther Giraldez
Person First Name Antonio Ariel Timothy Romain Sebastian Benedikt Elizabeth Charles Miler Nikolaus Tobias Antonio
Person Mid Initials J.
A G D S E T C J
Person Email antonio.giraldez@yale.edu
Person Affiliation Yale University
Person Phone 203 785 5423
Person Address Genetics, Yale University, 333 Cedar Street, New Haven, CT, USA
Person Roles submitter
Protocol Name P-GSE53693-3 P-GSE53693-2 P-GSE53693-1
Protocol Description Base calling was performed using CASAVA-1.8.2. The Illumina TruSeq adaptor sequence was then trimmed from raw reads by aligning its sequence, requiring 100% match of the first five base-pairs and a minimum alignment score of 60 (Matches:5, Mismatches:-4, Gap opening:-7, Gap extension:-7). Trimmed reads were then depleted of rRNA, tRNA, snRNA, snoRNA, and misc_RNA from Ensembl and RepeatMasker annotations using strand-specific alignment performed with Bowtie2 v2.1.0 with default parameters. Filtered reads were aligned strand-specific to the Zebrafish Zv9 genome assembly using Tophat v2.0.8 with default parameters and the exon-junction coordinates from Ensembl r70. Using the spliced version of each transcript (ensembl r73, Pauli et al lincRNAs, Ulitsky et al lincRNAs) we defined all ORFs as stop codon to most distal in-frame AUG codon without an intervening stop. Read coverage and counts were calculated using 28 & 29nt reads. For position-related calculation of ORFScore and Frame1 coverage, we counted reads only with the +12 position within the interior of the ORF (defined as the region excluding the reads aligning to the start and stop(-1) codons). RPKM values were calculated using input mRNA per transcript, normalizing WITHIN but not between transcript sets (total reads mapping to ensembl r73 or either lincRNA set). ORFScore and coverage values were calculated as outlined in Bazzini et al, 2013. Sequencing data were combined for each timepoint.
Genome_build: Zv9/danRer7
Supplementary_files_format_and_content: allinfo_final_ORFs.txt :: orfID=unique ID of ORF; geneID; transcriptID; startLoc & stopLoc=1-based transcript coordinates of ORF; txCDSStart&txCDSEnd=start and end of annotated CDS; Source=transcript source; orfFrame=ORFScore; orfRate=# of reads in ORF; covF1Prop=proportion of in-frame positions with aligning read(s); tx_Input_RPKM=RPKM of transcript from input mRNA
For ribosome profiling, 50 wild type embryos for each condition were collected at a given stage. Embryos were lysed using 800ul of a mammalian cell lysis buffer containing 100ug/ml Cycloheximide as per the manufacturer's instruction (ARTseq Ribosome Profiling Kit, RPHMR12126, Epicentre). For nuclease treatment, 3ul of ARTseq Nuclease was used. Ribosome protected fragments were run and 28-29nt fragments were gel purified as previously described in (Bazzini et al., 2012) and cloned according to the manufacturer's protocol (ARTseq Ribosome Profiling Kit, RPHMR12126, Epicentre). For RNA input, total RNA was isolated from 400 ul of the 800 ul of clarified extract before ARTseq Nuclease treatment. Poly-A selection was done according to manufacturer guidelines (Dynabeads mRNA Purification Kit, Cat no.610.06), and RNA fragmented using the ARTseq Ribosome Profiling Kit Mammalian protocol. Both RPF and RNA input fragments were cloned according to the ARTseq Ribosome Profiling Kit, Mammalian. The final PCR was carried out with an initial 15 second denaturation at 98 °C, followed by 9-12 cycles of 15 seconds at 98 °C, 5 seconds at 55 °C and extension at 72 °C for 10 s. Reactions were separated on a non-denaturing 8% polyacrylamide TBE gel and DNA fragments of the correct size were extracted and sequenced.
Embryos are incubated at 28 °C
Protocol Type normalization data transformation protocol nucleic acid library construction protocol growth protocol
Experimental Factor Name STAGE SEPARATION
Experimental Factor Type Stage separation
Publication Title Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation.
Publication Author List Bazzini AA, Johnstone TG, Christiano R, Mackowiak SD, Obermayer B, Fleming ES, Vejnar CE, Lee MT, Rajewsky N, Walther TC, Giraldez AJ
PubMed ID 24705786
Publication DOI 10.1002/embj.201488411
Comment[SecondaryAccession] GSE53693
Comment[GEOReleaseDate] 2014-04-04
Comment[ArrayExpressSubmissionDate] 2013-12-27
Comment[GEOLastUpdateDate] 2014-04-11
Comment[AEExperimentType] RNA-seq of coding RNA
Comment[AEExperimentType] RNA-seq of non coding RNA
Comment[AdditionalFile:Data1] GSE53693_allinfo_final_ORFs.txt
Comment[SecondaryAccession] SRP034750
Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR1062197-SRR1062818
SDRF File E-GEOD-53693.sdrf.txt