Comment[ArrayExpressAccession] E-GEOD-51861
MAGE-TAB Version 1.1
Public Release Date 2013-10-30
Investigation Title RNA-seq (Illumina and PacBio) of hESC
Comment[Submitted Name] RNA-seq (Illumina and PacBio) of hESC
Experiment Description We used PacBio data to identify more reliable transcripts from hESC, which in turn allow gene/transcript abundance to be estimated better from Illumina data. PacBio long reads and Illumina short reads were generated from the same hESC cell line, H1. PacBio reads were error-corrected with Illumina reads to identify transcripts. rSeq was used to estimate gene/transcript abundance of the identified transcriptome.
Term Source Name ArrayExpress EFO
Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Person Last Name Au Au Underwood Sebastiano Lee Williams Van Bakel Schadt Wong
Person First Name Kin Fai Kin-Fai Jason Vittorio Lawrence Brian Honoratus Eric Wing
Person Mid Initials A H
Person Email kinfai@stanford.edu
Person Affiliation Stanford
Person Address Statistics, Stanford, James H. Clark Center, 1st Floor East, 318 Campus Drive, Stanford, USA
Person Roles submitter
Protocol Name P-GSE51861-3 P-GSE51861-4 P-GSE51861-2 P-GSE51861-1
Protocol Description Illumina reads were aligned to reference genome hg19 by SpliceMap. PacBio reads were corrected by LSC. Gene isoforms were identified by IDP. Gene/transcript abundance was estimated by rSeq (optimized and integrated into IDP). Genome_build: hg19. Supplementary_files_format_and_content: gtf and fasta file. Illumina reads were aligned to reference genome hg19 by SpliceMap. PacBio reads were corrected by LSC. Gene isoforms were identified by IDP. Gene/transcript abundance was estimated by rSeq (optimized and integrated into IDP). Genome_build: hg19. Supplementary_files_format_and_content: tab-delimited text file including RPKM values for each transcript. Total RNA was prepared by TRIzol extraction (Ambion; http://www.ncbi.nlm.nih.gov/pubmed/2440339) and treated with RNase-free DNase I to degrade contaminating gDNA. This was followed by a second extraction with acid-phenol-chloroform, and the RNA was precipitated in 0.5 M ammonium acetate and 2.5 volumes of ethanol. The integrity of the RNA was tested by a Bioanalyzer RNA Nano assay (Agilent). PolyA RNA was purified from the total RNA (>100 µg) using magnetic oligo-dT beads (Dynal) according to the manufacturer's recommended protocol. Two different manufacturers' kits and accompanying protocols for generating full-length cDNAs were employed: the Clontech SMARTer PCR kit and the Invitrogen Superscript Full-Length cDNA kit. Each utilizes a retroviral reverse transcriptase to initiate cDNA synthesis at the 3' polyA tail, but the two kits use different mechanisms to capture the 5' end information. The Clontech method relies upon template-switching to an exogenous tag oligonucleotide when the reverse transcriptase reaches the 5' end of the molecule. The Invitrogen method employs an RNase I digestion and subsequent 7-methyl-guanosine affinity step to purify the RNA-DNA hybrids that represent a full copy of the transcript. This is followed by an RNA ligase step to add a defined sequence to the 5' ends of the polyA RNAs, prior to reverse transcription. For each method, 1 µg of polyA RNA served as the input material for the recommended protocol, and the manufacturer's methods were followed through second-strand cDNA synthesis. At this point, PCR (12-15 cycles; Phusion DNA polymerase, New England Biolabs) was used to amplify the double-stranded cDNA with specific primers directed to the defined sequences present at the 5' and 3' ends of the cDNAs. Long extension times of 3.5 minutes ensured that longer molecules could be captured as well. The cDNA libraries were assayed by fluorimetry and Bioanalyzer assays. Both the Clontech and Invitrogen kits are reproducible. For example, housekeeping genes of different sizes (ACTB, 1808 bp; GAPDH, 1310 bp; SARS, 2291 bp; IARS, NM_002161, 4406 bp and NM_013417, 4510 bp) are detected by full-length long reads from the two preps. Male human embryonic stem cells (H1) were used for our study. The cells were routinely cultured in feeder-free conditions on Matrigel (BD) and in mTeSR1 (Stem Cell Technologies). RNA-seq was performed on cells between passage 50 and 55. The undifferentiated state of the hESCs was assayed by immunofluorescence for the transcription factors OCT4 and NANOG and for the surface markers SSEA4, TRA-1-60 and TRA-1-81 (Figure S9). Pluripotency was assessed by teratoma assay, which revealed the capacity of the hESCs to form in vivo derivatives of the three germ layers.
Protocol Type normalization data transformation protocol normalization data transformation protocol nucleic acid library construction protocol growth protocol
Comment[SecondaryAccession] GSE51861
Comment[GEOReleaseDate] 2013-10-30
Comment[ArrayExpressSubmissionDate] 2013-10-29
Comment[GEOLastUpdateDate] 2013-11-01
Comment[AEExperimentType] RNA-seq of coding RNA
Comment[AdditionalFile:Data1] GSE51861_File_Formats_README.txt
Comment[AdditionalFile:Data2] GSE51861_isoform.gtf
Comment[AdditionalFile:Data3] GSE51861_isoform_RPKM.txt
Comment[SecondaryAccession] SRP032367
Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR1020625
SDRF File E-GEOD-51861.sdrf.txt
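The protocol description mentions a tab-delimited file of RPKM values for each transcript. As a minimal sketch of what an RPKM value expresses, the Python below applies the standard definition (reads per kilobase of transcript per million mapped reads) to plain read counts. It is not rSeq's estimator, which this record does not describe; the function name `rpkm`, the transcript names, and all numbers are hypothetical and chosen only for illustration.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Illustrative only; this is the textbook definition, not the rSeq model.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical inputs: transcript name -> (mapped reads, transcript length in bp)
counts = {
    "tx_A": (500, 2000),
    "tx_B": (500, 1000),
}
total_mapped = 10_000_000  # hypothetical library size

values = {name: rpkm(n, length, total_mapped) for name, (n, length) in counts.items()}

for name, value in values.items():
    print(f"{name}\t{value:.2f}")
```

With equal read counts, the shorter transcript receives the higher RPKM, which is the length normalization that makes values comparable across transcripts of different sizes.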