Comment[ArrayExpressAccession] E-GEOD-57085
MAGE-TAB Version 1.1
Public Release Date 2015-04-24
Investigation Title Genome-Wide Specificity of DNA-Binding, Gene Regulation, and Chromatin Remodeling by TALE- and CRISPR/Cas9-Based Transcription Factors
Comment[Submitted Name] Genome-Wide Specificity of DNA-Binding, Gene Regulation, and Chromatin Remodeling by TALE- and CRISPR/Cas9-Based Transcription Factors
Experiment Description Synthetic DNA-binding proteins have found broad application in gene therapies and as tools for interrogating biology. Engineered proteins based on the CRISPR/Cas9 and TALE systems have been used to alter genomic DNA sequences, control transcription of endogenous genes, and modify epigenetic states. Although the activity of these proteins at their intended genomic target sites has been assessed, the genome-wide effects of their action have not been extensively characterized. Additionally, the role of chromatin structure in determining the binding of CRISPR/Cas9 and TALE proteins to their target sites and the regulation of nearby genes is poorly understood. Characterization of the activity of these proteins using modern high-throughput genomic methods would provide valuable insight into the specificity and off-target effects of CRISPR- and TALE-based genome engineering tools. We have analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators targeted to the promoters of two different endogenous human genes in HEK293T cells using a variety of high-throughput DNA sequencing methods. In particular, we assayed the DNA-binding specificity of these proteins and their effects on the epigenome. DNA-binding specificity was evaluated by ChIP-seq, and RNA-seq was used to measure the specificity of these activators in perturbing the transcriptome.
Additionally, DNase-seq was used to identify the chromatin state at target sites of the synthetic transcriptional activators and the genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these genome engineering technologies are highly specific in both binding to their promoter target sites and inducing expression of downstream genes when multiple activators bind to a single promoter. Moreover, we show that these synthetic activators are able to induce the expression of silent genes in heterochromatic regions of the genome by opening regions of closed chromatin and decreasing DNA methylation. Interestingly, the transcriptional activation domain was not necessary for DNA-binding or chromatin remodeling in these regions, but was critical to inducing gene expression. This study shows that these CRISPR- and TALE-based transcriptional activators are exceptionally specific. Although we detected limited binding at off-target sites in the genome and changes to genome structure, these off-target events did not lead to any detectable changes in gene regulation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. HEK293T cells were transfected in triplicate with plasmids expressing synthetic transcription factors. The synthetic TFs were either (a) a dCas9-VP64 fusion protein and a targeting guide RNA (gRNA), or (b) a TALE-VP64 fusion protein engineered to bind to a specific target site in the genome. As a control, cells were transfected with plasmids expressing GFP. After transfection, ChIP-seq was used to identify both on-target and off-target binding sites for the synthetic TFs.
Term Source Name ArrayExpress EFO
Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Person Last Name Reddy Polstein Perez-Pinera Kocak Vockley Bledsoe Safi Song Crawford Reddy Gersbach
Person First Name Timothy Lauren Pablo Dewran Christopher Peggy Alexias Lingyun Gregory Timothy Charles
Person Mid Initials E R D M E E A
Person Email tim.reddy@duke.edu
Person Affiliation Duke University
Person Address Department of Biostatistics & Bioinformatics, Duke University, 2347 CIEMAS, 101 Science Drive, Durham, NC, USA
Person Roles submitter
Protocol Name P-GSE57085-4 P-GSE57085-1 P-GSE57085-3 P-GSE57085-2
Protocol Description Base-calling was performed on instrument using CASAVA software. Reads were aligned to the hg19 version of the human genome using bowtie version 0.12.9 with the --best alignment option. Binding sites were called for each replicate using MACS (v1.4) relative to a pooled GFP control library, with the parameters -g hs -p 1e-4 --nomodel --shiftsize=65. Reproducible binding sites were identified by calculating the Irreproducible Discovery Rate (IDR) for each pair of replicates, accepting only binding sites with an IDR < 0.05. Binding sites with IDR < 0.05 in any pairwise comparison were merged into a single set of binding sites per experiment. Binding sites found in the ENCODE blacklist (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz) were removed from further analysis. The number of sequencing reads in each reproducible binding site in each replicate and each control was tabulated. The counts were loaded into R and analyzed for differential binding relative to control (i.e. HA-ChIP-seq for GFP-transfected cells) using DESeq. Within DESeq, size factors were set based on total sequencing depth for the replicate; dispersions were estimated using the pooled method and a local fit; and a False Discovery Rate of 0.1% was used for final reporting.
Genome_build: hg19 Supplementary_files_format_and_content: bedgraph files of the genome-wide sequenced tag density, split by strand. HEK293T cells were transfected with Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. Transfection efficiencies were routinely higher than 80% as determined by fluorescence microscopy following delivery of a control eGFP expression plasmid. dCas9-VP64 expression plasmid was transfected at a mass ratio of 3:1 to the total amount of gRNA expression plasmid consisting of a mixture of equal amounts of the four gRNAs. 10 million cells were crosslinked for 10 minutes in 1% formaldehyde followed by 5 minutes of quenching in 0.125 M glycine. Cells were then washed with cold PBS pH 7.4, and collected in the presence of cold cell lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP-40, and Roche Complete Protease Inhibitor Cocktail Cat #11836145001). Nuclei were pelleted by centrifugation at 2,000 rpm for 5 minutes at 4°C, and the supernatant discarded. Sheared chromatin was then prepared by sonicating the nuclei in the presence of RIPA buffer (1 x PBS pH 7.4, 1% NP-40, 0.5% NaDOC, Roche Protease Inhibitor Cocktail), followed by centrifugation to remove cell debris. The chromatin was then immunoprecipitated with an anti-HA antibody (Covance MMS-101P). After immunoprecipitation, the formaldehyde crosslinks were reversed by heating overnight at 65°C, and the DNA was collected using Qiagen PCR cleanup columns. DNA ends were prepared for Illumina HiSeq sequencing using the NEBNext Ultra DNA Library Prep Kit for Illumina (NEB Cat# E7370L). The adapted DNA was then size selected using SPRI beads for a size of 150-300 bp, and PCR-amplified to make the final sequencing library. Samples were sequenced on an Illumina HiSeq 2000.
HEK293T cells were obtained from the American Type Culture Collection (ATCC) through the Duke University Cancer Center Facilities and were maintained in DMEM supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37°C with 5% CO2.
Protocol Type normalization data transformation protocol sample treatment protocol nucleic acid library construction protocol growth protocol
Experimental Factor Name guide rna transfected gene
Experimental Factor Type guide rna transfected gene
Comment[SecondaryAccession] GSE57085
Comment[GEOReleaseDate] 2015-04-24
Comment[ArrayExpressSubmissionDate] 2014-04-25
Comment[GEOLastUpdateDate] 2015-04-29
Comment[AEExperimentType] ChIP-seq
Comment[SecondaryAccession] SRP041457
Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR1262992-SRR1263006
SDRF File E-GEOD-57085.sdrf.txt