Comment[ArrayExpressAccession]	E-GEOD-46428
MAGE-TAB Version	1.1
Public Release Date	2013-05-01
Investigation Title	High-throughput sequencing of methylated cytosine enriched by modification-dependent restriction endonuclease MspJI
Comment[Submitted Name]	High-throughput sequencing of methylated cytosine enriched by modification-dependent restriction endonuclease MspJI
Experiment Description	In this study, we combine MspJI digestion and electrophoretic band selection with next-generation high-throughput sequencing to detect 5-methylcytosines in the Arabidopsis genome. We developed a bioinformatics workflow that assigns the CNNR sites recognized by MspJI to the reference genome, and used it to systematically assess the method. Based on this assessment, we present a feasible, reliable and economical method for generating a detailed map of the plant methylome. The MspJI-digested fragments were extracted. A sequencing library was then constructed according to the Illumina protocol and sequenced on an Illumina HiSeq 2000. Repeatability and reproducibility were assessed using two samples from the same individual. The specificity and sensitivity of the method were examined by comparing our data with whole-genome bisulfite sequencing (WGBS) data downloaded from GEO (GSE15922: GSM399600). The WGBS data ('aerial_tissues_BS_seq_CNNR.gff') were generated from the GSM399600 sample supplementary files ('aerial_tissues_BS_seq_alignment_batch-*.gff.gz'). They were used for the comparison with our MspJI-seq data, including the sensitivity and specificity analysis.
Term Source Name	ArrayExpress	EFO
Term Source File	http://www.ebi.ac.uk/arrayexpress/	http://www.ebi.ac.uk/efo/efo.owl
Person Last Name	Xia	Huang	Lu
Person First Name	Yudong	Xiaojun	Hanlin
Person Email	xiayudong@genomics.org.cn
Person Affiliation	BGI-Shenzhen
Person Address	BGI-Shenzhen, No. 11, Beishan Industrial Zone, Yantian District, Shenzhen, China
Person Roles	submitter
Protocol Name	P-GSE46428-4	P-GSE46428-1	P-GSE46428-3	P-GSE46428-2
Protocol Description	The library products were sequenced on an Illumina HiSeq 2000. First, reads in which more than 30% of bases were 'N', or more than 10% of bases had a low quality value (quality value < 20), were discarded. The sequencing adapters were then trimmed from the clean reads. The reads were mapped to the TAIR9 reference sequence using SOAPaligner (soap2.20, http://soap.genomics.org.cn/index.html) with default parameters. Digested sites were detected from uniquely mapped reads, and the methylation of CNNR sites was determined with a Perl script. Regular-expression matching in Perl was used to identify the CNNR sites within the mapped reads, and the cytosines in the CNNR sites identified by MspJI-seq were called as methylated cytosines. The WGBS data (aerial_tissues_BS_seq_CNNR.gff) generated from the GSM399600 supplementary files (GSM399600_aerial_tissues_BS_seq_alignment_batch-*.gff.gz) were used for the comparison with our MspJI-seq data, including the sensitivity and specificity analysis. Genome_build: TAIR9. Supplementary_files_format_and_content: ara_rep*_MspJI_site.txt file format: 1. chromosome; 2. coordinate of the single 'C'; 3. strand of the single 'C'; 4. tags; 5. consensus sequence type for the single 'C'; 6. the actual sequence context; 7. three-base sequence context. See 'README.txt' for more information.	Arabidopsis leaves were cut, rinsed with ddH2O, soaked in ethanol for 1-2 minutes and blotted dry with absorbent paper before extract preparation.	1.5 µg of genomic DNA was digested at 37 °C for 16 h with 12 U of MspJI enzyme (NEB) in the presence of 0.8 µM double-stranded DNA activator (Invitrogen) in a 30 µl reaction volume. The digestion conditions were adapted for the Arabidopsis genome from the original NEB protocol. The digested DNA was run on a 15% native polyacrylamide gel (PAGE). A narrow band containing all visible fragments of around 28-35 bp was excised, using a 10 bp DNA ladder (NEB) as a size reference. DNA was isolated by the crush-and-soak method (Sambrook J: Gel Electrophoresis of DNA and Pulsed-Field Agarose. In Molecular Cloning. Volume 2. 3rd edition. New York: Cold Spring Harbor Laboratory Press; 2001) and purified by ethanol precipitation. The recovered DNA was used to construct a sequencing library according to the Illumina Paired-End protocol, including DNA end repair, 'A' base addition, adapter ligation and PCR amplification. Phenol:chloroform extraction and ethanol precipitation were used to purify the product of each step. PCR was performed with JumpStart™ Taq DNA Polymerase (Sigma) for 6 cycles. Products of 148-155 bp were recovered from a 2% agarose gel, using a 50 bp DNA ladder (NEB) as a size reference, and purified with the QIAquick Gel Extraction Kit (Qiagen). The library was analyzed on a Bioanalyzer system (Agilent, Santa Clara, USA) before sequencing on an Illumina HiSeq 2000.	Arabidopsis thaliana (ecotype Columbia) plants were grown at 28 °C under 2,000 lx light for 45 days.
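The read-filtering rule in the data-processing protocol discards a read if more than 30% of its bases are 'N' or more than 10% of its bases have a quality value below 20. A minimal Python sketch of that rule follows. It is an illustration, not the submitters' pipeline, and it makes several assumptions:

- Reads are held as (sequence, quality string) pairs.
- Quality characters use Phred+33 encoding. Older Illumina output may use Phred+64, so the offset is a parameter.
- "More than" is read as a strict inequality.
- The function name `passes_filter` is hypothetical.

```python
def passes_filter(seq: str, qual: str, phred_offset: int = 33,
                  max_n_frac: float = 0.30, max_lowq_frac: float = 0.10,
                  min_q: int = 20) -> bool:
    """Return True if a read survives the N-content and low-quality filters.

    Assumption: each quality character encodes a Phred score as
    ord(char) - phred_offset (Phred+33 by default).
    """
    if not seq or len(seq) != len(qual):
        raise ValueError("sequence and quality must be non-empty and of equal length")
    # Fraction of ambiguous 'N' bases in the read.
    n_frac = seq.upper().count("N") / len(seq)
    # Fraction of bases whose Phred score falls below min_q.
    lowq_frac = sum(1 for c in qual if ord(c) - phred_offset < min_q) / len(seq)
    # "More than 30% N" / "over 10% low quality" -> reject only above the limits.
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac
```

For example, in Phred+33 'I' encodes Q40 and '#' encodes Q2. A 10 bp read with two '#' qualities is therefore 20% low quality and would be dropped.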
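The protocol identifies CNNR sites within mapped reads by regular-expression matching in Perl. The same idea can be sketched in Python; this is an illustrative stand-in, not the original Perl script. The sketch assumes:

- Each read is given as its forward-strand sequence.
- Offsets are 0-based.
- Only unambiguous A/C/G/T bases fill the "NN" positions.
- The function name `find_cnnr_sites` is hypothetical.

A zero-width lookahead lets overlapping motifs all be reported. A cytosine in CNNR context on the reverse strand appears on the forward strand as YNNG, with the paired G three bases downstream.

```python
import re

# Zero-width lookaheads so overlapping motifs (e.g. in "CCAGG") are all reported.
PLUS_CNNR = re.compile(r"(?=C[ACGT]{2}[AG])")   # C N N R on the forward strand
MINUS_CNNR = re.compile(r"(?=[CT][ACGT]{2}G)")  # reverse-strand CNNR seen as Y N N G


def find_cnnr_sites(seq: str) -> list[tuple[int, str]]:
    """Return sorted (offset, strand) pairs for cytosines in CNNR context.

    Offsets are 0-based forward-strand positions: for '+' the offset is the C
    itself; for '-' it is the G paired with the reverse-strand C.
    """
    s = seq.upper()
    sites = [(m.start(), "+") for m in PLUS_CNNR.finditer(s)]
    sites += [(m.start() + 3, "-") for m in MINUS_CNNR.finditer(s)]
    return sorted(sites)
```

For instance, `find_cnnr_sites("CCAGG")` finds two forward-strand sites (offsets 0 and 1) and two reverse-strand sites (offsets 3 and 4). Mapping these offsets to genome coordinates would additionally require the read's alignment position and strand from the aligner output.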
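The supplementary files ara_rep*_MspJI_site.txt are described as having seven columns. A minimal reader for that layout might look like the sketch below. It assumes the following; README.txt is the authoritative description:

- Fields are tab-separated, with no header line.
- The coordinate and tags columns are integers.
- The class and function names (`MspJISite`, `parse_site_line`, `read_sites`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MspJISite:
    chromosome: str        # 1. chromosome
    coordinate: int        # 2. coordinate of the single 'C' (assumed integer)
    strand: str            # 3. strand of the single 'C'
    tags: int              # 4. tags (assumed integer count)
    consensus_type: str    # 5. consensus sequence type for the single 'C'
    sequence_context: str  # 6. the actual sequence context
    context3: str          # 7. three-base sequence context


def parse_site_line(line: str) -> MspJISite:
    """Parse one record, assuming 7 tab-separated fields in the documented order."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    chrom, coord, strand, tags, ctype, context, context3 = fields
    return MspJISite(chrom, int(coord), strand, int(tags), ctype, context, context3)


def read_sites(path: str) -> list[MspJISite]:
    """Read every non-blank line of a site file (hypothetical helper)."""
    with open(path) as handle:
        return [parse_site_line(line) for line in handle if line.strip()]
```

If the real files carry a header row or a different delimiter, only `parse_site_line` and the line filter in `read_sites` would need to change.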
Protocol Type	normalization data transformation protocol	sample treatment protocol	nucleic acid library construction protocol	growth protocol
Comment[SecondaryAccession]	GSE46428
Comment[GEOReleaseDate]	2013-05-01
Comment[ArrayExpressSubmissionDate]	2013-04-26
Comment[GEOLastUpdateDate]	2013-05-02
Comment[AEExperimentType]	methylation profiling by high throughput sequencing
Comment[AdditionalFile:Data1]	GSE46428_README.txt
Comment[AdditionalFile:Data2]	GSE46428_aerial_tissues_BS_seq_CNNR.gff
Comment[SecondaryAccession]	SRP021537
Comment[SequenceDataURI]	http://www.ebi.ac.uk/ena/data/view/SRR835460-SRR835461
SDRF File	E-GEOD-46428.sdrf.txt
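The sensitivity and specificity of MspJI-seq were assessed against the WGBS CNNR data (GSE46428_aerial_tissues_BS_seq_CNNR.gff). The exact definitions used are not given in this record. The sketch below uses the standard confusion-matrix definitions and assumes:

- Sites are keyed as (chromosome, coordinate, strand).
- The WGBS reference has already been split into methylated and unmethylated CNNR sites. The calling threshold for that split is not stated here and is an assumption.
- Calls outside the reference are not scored.
- The name `sensitivity_specificity` is hypothetical.

```python
Site = tuple[str, int, str]  # (chromosome, coordinate, strand): assumed site key


def sensitivity_specificity(called: set[Site],
                            ref_methylated: set[Site],
                            ref_unmethylated: set[Site]) -> tuple[float, float]:
    """Score MspJI-seq calls against a WGBS reference.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Only sites present in the reference are scored.
    """
    if ref_methylated & ref_unmethylated:
        raise ValueError("a reference site cannot be both methylated and unmethylated")
    tp = len(called & ref_methylated)      # methylated in WGBS and called
    fn = len(ref_methylated - called)      # methylated in WGBS but missed
    fp = len(called & ref_unmethylated)    # unmethylated in WGBS but called
    tn = len(ref_unmethylated - called)    # unmethylated in WGBS and not called
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Using sets keeps the comparison a simple intersection and difference of site keys. It also means both datasets must share the same coordinate convention (TAIR9, same base offset) before they are compared.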