Comment[ArrayExpressAccession]	E-GEOD-56720
MAGE-TAB Version	1.1
Public Release Date	2014-05-05
Investigation Title	Nascent elongating transcript sequencing (NET-seq) for Escherichia coli and Bacillus subtilis reveals a consensus pause sequence enriched at translation start sites.
Comment[Submitted Name]	Nascent elongating transcript sequencing (NET-seq) for Escherichia coli and Bacillus subtilis reveals a consensus pause sequence enriched at translation start sites.
Experiment Description	Transcription by RNA polymerase (RNAP) is interrupted by pauses that play diverse regulatory roles. Although individual pauses have been studied in vitro, the determinants of pauses in vivo and their distribution throughout the bacterial genome remain unknown. Using nascent transcript sequencing we identify a 16 nt consensus pause sequence in E. coli that accounts for known regulatory pause sites as well as ~20,000 new in vivo pause sites. In vitro single-molecule and ensemble analyses demonstrate that these pauses result from RNAP/nucleic-acid interactions that inhibit next-nucleotide addition. The consensus sequence also leads to pausing by RNAPs from diverse lineages and is enriched at translation start sites in both E. coli and B. subtilis. Our results thus implicate a conserved mechanism unifying known and newly identified pause events. Examination of nascent transcripts in E. coli and B. subtilis. 6 samples of E. coli NET-seq, 1 sample of E. coli mRNA-seq, and 1 sample of B. subtilis NET-seq.
Term Source Name	ArrayExpress	EFO
Term Source File	http://www.ebi.ac.uk/arrayexpress/	http://www.ebi.ac.uk/efo/efo.owl
Person Last Name	Larson	Larson	Mooney	Peters	Windgassen	Nayak	Gross	Block	Greenleaf	Landick	Weissman
Person First Name	Matthew	M	R	J	T	D	C	S	W	R	J
Person Mid Initials	H H A M A M J S
Person Email	matthew.larson@ucsf.edu
Person Affiliation	University of California, San Francisco
Person Address	Cellular & Molecular Pharmacology, University of California, San Francisco, 1700 4th St, San Francisco, CA, USA
Person Roles	submitter
Protocol Name	P-GSE56720-8	P-GSE56720-4	P-GSE56720-3	P-GSE56720-5	P-GSE56720-2	P-GSE56720-6	P-GSE56720-7	P-GSE56720-1
Protocol Description	Basecalls performed using Casava versions 1.6 or 1.7. Sequenced reads were trimmed for adaptor sequence. Trimmed reads were aligned using Bowtie v0.12.7 against the reference genome using parameters -a -v 3 -m 1. Bowtie alignments against the reference genome were converted to wiggle files. The position of each alignment was mapped to the 3' end of the nascent transcript. After alignment and mapping of the reads, pause detection was performed using custom scripts written in Python 2.7. Positions with counts greater than four standard deviations above the mean were scored as pauses. Genome_build: B. subtilis: NC_000964 Supplementary_files_format_and_content: wiggle files with two columns: first column containing chromosome positions and second column containing the number of reads mapped to the position (see publication for details). Supplementary_files_format_and_content: The fas.txt contains two lines for each pause; the first line denotes the gene, strand, pause index, and relative peak height. The second line denotes the sequence 15 nt upstream and downstream of the pause position.	Basecalls performed using Casava versions 1.6 or 1.7. Sequenced reads were trimmed for adaptor sequence. Trimmed reads were aligned using Bowtie v0.12.7 against the reference genome using parameters -a -v 3 -m 1. Bowtie alignments against the reference genome were converted to wiggle files. The position of each alignment was mapped to the 3' end of the nascent transcript. Genome_build: E. coli: NC_000913 Supplementary_files_format_and_content: wiggle files with two columns: first column containing chromosome positions and second column containing the number of reads mapped to the position (see publication for details).	Basecalls performed using Casava versions 1.6 or 1.7. Sequenced reads were trimmed for adaptor sequence. Trimmed reads were aligned using Bowtie v0.12.7 against the reference genome using parameters -a -v 3 -m 1. Bowtie alignments against the reference genome were converted to wiggle files. The position of each alignment was mapped to the 3' end of the nascent transcript. After alignment and mapping of the reads, pause detection was performed using custom scripts written in Python 2.7. Positions with counts greater than four standard deviations above the mean were scored as pauses. Genome_build: E. coli: NC_000913 Supplementary_files_format_and_content: wiggle files with two columns: first column containing chromosome positions and second column containing the number of reads mapped to the position (see publication for details). Supplementary_files_format_and_content: The fas.txt contains two lines for each pause; the first line denotes the gene, strand, pause index, and relative peak height. The second line denotes the sequence 15 nt upstream and downstream of the pause position.	For analysis of lacZ transcription, IPTG was added to a final concentration of 1 mM. For bicyclomycin (BCM) treated samples, cells were grown from an initial OD of 0.01 in the absence of BCM. BCM was added to a final concentration of 20 ug/mL when the cells reached OD 0.05. Cell culture was rapidly filtered in 250 mL increments at 37 °C over 0.22 M-NM-