Comment[ArrayExpressAccession] E-GEOD-46323
MAGE-TAB Version 1.1
Public Release Date 2013-07-12
Investigation Title Evaluating the Impact of Sequencing Depth on Transcriptome Profiling in Human Adipose
Comment[Submitted Name] Evaluating the Impact of Sequencing Depth on Transcriptome Profiling in Human Adipose
Experiment Description Recent advances in RNA sequencing (RNA-Seq) have enabled the discovery of novel transcriptomic variation that cannot be detected with traditional microarray-based methods. Tissue- and cell-specific transcriptome changes during pathophysiological stress, in disease cases versus controls, and in response to therapies are of particular interest to investigators studying cardiometabolic diseases. Thus, knowledge of the relationship between sequencing depth and detection of transcriptomic variation is needed for designing RNA-Seq experiments and for interpreting the results of analyses. Using deeply sequenced RNA-Seq data derived from adipose of a healthy individual before and after systemic administration of endotoxin (LPS), we investigated the sequencing depths needed for studies of gene expression and alternative splicing (AS). We found that ~100 million (M) filtered reads were needed to detect expressed genes and AS events. However, the sequencing depth required to detect LPS-modulated differential expression (DE) and differential alternative splicing (DAS) was much higher. To detect 80% of events, ~300M filtered reads were needed for DE analysis, whereas at least 400M filtered reads were necessary for detecting DAS. Although the majority of expressed genes and AS events can be detected with modest sequencing depths (~100M filtered reads), the estimated gene expression levels and exon/intron inclusion levels were less accurate at those depths. We report the first study that evaluates the relationship between RNA-Seq depth and the ability to detect DE and DAS in human adipose.
Our results suggest that a much higher sequencing depth is needed to reliably identify DAS events than DE genes. Random sampling of the RNA-seq data at different depths for gene expression and alternative-splicing analysis
Term Source Name ArrayExpress EFO
Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Person Last Name Liu Liu Ferguson Xue Silverman Gregory Reilly Li
Person First Name Yichuan Yichuan Jane Chenyi Ian Brian Muredach Mingyao
Person Mid Initials M
Person Email geo@ncbi.nlm.nih.gov
Person Affiliation University of Pennsylvania
Person Address University of Pennsylvania, 423 Guardian Drive, Philadelphia, USA
Person Roles submitter
Protocol Name P-GSE46323-2 P-GSE46323-1
Protocol Description Generate gene expression estimates with Cufflinks v1.3.0. Generate alternative splicing files with MATS 2.1.0. Genome_build: hg19. Supplementary_files_format_and_content: Standard Cufflinks outputs, tab-delimited text files including FPKM values and FDR-adjusted p-values. Supplementary_files_format_and_content: A_adipose_gene_exp.xls contains the differentially expressed genes from Cuffdiff at various read depths. Read depths ranged from 5M reads to 500M reads (full set). In the Excel file, each tab holds the Cuffdiff result for one read depth, e.g. adipose_100M holds the differentially expressed gene results for adipose tissue at a read depth of 100M reads. Supplementary_files_format_and_content: A_adipose_gene_exp_simluation_100M.xls contains the differentially expressed genes at a fixed read depth (100M reads here) across 10 simulation rounds. The purpose is to evaluate sampling variation and to see whether the sequenced reads are representative. Each tab in the Excel file contains the differentially expressed genes for one simulation round, giving 10 tabs in this file. The same description applies to A_adipose_gene_exp_simulation_10M.xls; the only difference is that the simulated read depth is 10M reads.
Supplementary_files_format_and_content: MATS_A_adipose.xls contains the alternative-splicing (AS) results from Multivariate Analysis of Transcript Splicing (MATS). Like A_adipose_gene_exp.xls, it contains the AS results for adipose tissue at various depths from 5M to 500M reads. Each tab holds the AS results for one depth. Supplementary_files_format_and_content: MATS_A_adipose_simulation_100M.xls and MATS_A_adipose_simulation_10M.xls contain the AS results for 10 simulation rounds at read depths of 100M and 10M reads. As in A_adipose_gene_exp_simluation_100M.xls and A_adipose_gene_exp_simulation_10M.xls, each tab contains the AS result for one simulation round, giving 10 tabs in total. For adipose tissue, RNA was extracted using the RNeasy lipid total RNA mini kit (Qiagen, Valencia, CA). Extracted RNA samples underwent quality control (QC) assessment using the Agilent Bioanalyzer (Agilent, Santa Clara, CA). Poly-A library preparation and RNA sequencing were performed at the Penn Genome Frontiers Institute’s High-Throughput Sequencing Facility per Illumina’s (San Diego, CA) standard protocols. Briefly, we generated first-strand cDNA using random hexamer-primed reverse transcription, followed by second-strand cDNA synthesis using RNase H and DNA polymerase, and ligation of sequencing adapters using the Illumina paired-end sample preparation kit. Fragments of ~350 bp were selected by gel electrophoresis, followed by 15 cycles of PCR amplification. The prepared libraries were then sequenced on Illumina’s HiSeq 2000 using four lanes per sample (~456 million to 701 million 2 × 101 bp paired-end reads per sample).
Protocol Type normalization data transformation protocol nucleic acid library construction protocol
Experimental Factor Name TREATMENT
Experimental Factor Type treatment
Publication Title Evaluating the impact of sequencing depth on transcriptome profiling in human adipose.
Publication Author List Liu Y, Ferguson JF, Xue C, Silverman IM, Gregory B, Reilly MP, Li M
PubMed ID 23826166
Publication DOI 10.1371/journal.pone.0066883
Comment[SecondaryAccession] GSE46323
Comment[GEOReleaseDate] 2013-07-12
Comment[ArrayExpressSubmissionDate] 2013-04-23
Comment[GEOLastUpdateDate] 2013-08-26
Comment[AEExperimentType] RNA-seq of coding RNA
Comment[AdditionalFile:Data1] GSE46323_A_adipose_gene_exp.xls
Comment[AdditionalFile:Data2] GSE46323_A_adipose_gene_exp_simluation_100M.xls
Comment[AdditionalFile:Data3] GSE46323_A_adipose_gene_exp_simluation_10M.xls
Comment[AdditionalFile:Data4] GSE46323_MATS_A_adipose.xls
Comment[AdditionalFile:Data5] GSE46323_MATS_A_adipose_simulation_100M.xls
Comment[AdditionalFile:Data6] GSE46323_MATS_A_adipose_simulation_10M.xls
Comment[AdditionalFile:Data7] GSE46323_README_processed_data_column_descriptions.txt
Comment[SecondaryAccession] SRP021478
Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR833716-SRR833731
SDRF File E-GEOD-46323.sdrf.txt
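
The record describes random sampling of the full read set to fixed depths (5M to 500M reads) and 10 repeated simulation rounds at 10M and 100M reads, but it does not say how the sampling was implemented. The following is only a minimal Python sketch of that idea under stated assumptions: reads are represented by an in-memory list of read identifiers rather than FASTQ or BAM files, and the function names `subsample_reads` and `simulation_rounds` are hypothetical, not taken from the study.

```python
import random


def subsample_reads(read_ids, depth, seed):
    """Draw `depth` reads uniformly at random, without replacement.

    Assumption: `read_ids` is a list of read identifiers held in memory.
    If `depth` meets or exceeds the number of reads, the full set is returned.
    """
    if depth >= len(read_ids):
        return list(read_ids)
    rng = random.Random(seed)  # seeded so each draw is reproducible
    return rng.sample(read_ids, depth)


def simulation_rounds(read_ids, depth, n_rounds=10, base_seed=0):
    """Repeat the subsampling `n_rounds` times with a different seed per round."""
    return [
        subsample_reads(read_ids, depth, seed=base_seed + i)
        for i in range(n_rounds)
    ]


if __name__ == "__main__":
    # Toy stand-in for a deeply sequenced sample (hypothetical read names).
    reads = [f"read{i}" for i in range(1000)]

    # One subsample per target depth, analogous to one tab per depth.
    for depth in (50, 100, 500):
        print(depth, len(subsample_reads(reads, depth, seed=42)))

    # Ten rounds at a fixed depth, analogous to the 10-tab simulation files.
    rounds = simulation_rounds(reads, depth=100)
    print(len(rounds), "rounds of", len(rounds[0]), "reads each")
```

Comparing the results obtained from each round at the same depth gives a rough sense of sampling variation, which is the stated purpose of the simulation files. For real paired-end data, both mates of a pair would need to be kept or dropped together.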