Project PXD012247

PRIDE Assigned Tags:
Technical Dataset



Capillary zone electrophoresis-tandem mass spectrometry with activated ion electron transfer dissociation for large-scale top-down proteomics


Capillary zone electrophoresis (CZE)-tandem mass spectrometry (MS/MS) has recently been recognized as an efficient approach for top-down proteomics because of its high-capacity separation and highly sensitive detection of proteoforms. However, the commonly used collision-based dissociation methods often cannot provide extensive fragmentation of proteoforms for thorough characterization. Activated ion electron transfer dissociation (AI-ETD), which combines infrared photoactivation with concurrent ETD, has shown better performance for proteoform fragmentation than higher-energy collisional dissociation (HCD) and standard ETD. Here, we present the first application of CZE-AI-ETD on an Orbitrap Fusion Lumos mass spectrometer for large-scale top-down proteomics of Escherichia coli (E. coli) cells. CZE-AI-ETD outperformed CZE-ETD in proteoform and protein identifications (IDs) and reached proteoform and protein IDs comparable to those of CZE-HCD. CZE-AI-ETD tended to generate better expectation values (E-values) of proteoforms than CZE-HCD and CZE-ETD, indicating higher quality of MS/MS spectra from AI-ETD with respect to the number of sequence-informative fragment ions generated. CZE-AI-ETD showed great reproducibility in proteoform and protein IDs, with relative standard deviations of less than 4% and 2%, respectively (n=3). Coupling size exclusion chromatography (SEC) to CZE-AI-ETD identified 3028 proteoforms and 387 proteins from E. coli cells with 1% spectrum-level and 5% proteoform-level false discovery rates (FDRs). These data represent the largest top-down proteomics dataset using the AI-ETD method to date. Single-shot CZE-AI-ETD of one SEC fraction identified 957 proteoforms and 253 proteins. N-terminal truncations, signal peptide cleavage, N-terminal methionine removal, and various post-translational modifications, including protein N-terminal acetylation, methylation, S-thiolation, disulfide bonds, and lysine succinylation, were detected.

Sample Processing Protocol

Escherichia coli (E. coli) K-12 MG1655 was cultured in Lysogeny broth (LB) medium (37 °C) while shaking (225 rpm) until the OD600 reached 0.7. E. coli cells were harvested through centrifugation (4,000 rpm, 10 min) and washed three times with phosphate-buffered saline (PBS). The E. coli cells were homogenized with a Homogenizer 150 from Fisher Scientific (Pittsburgh, PA) and then lysed in a lysis buffer containing 8 M urea, phosphatase inhibitor, and protease inhibitor cocktail, with the assistance of sonication on ice for 15 min using a Branson Sonifier 250 from VWR Scientific (Batavia, IL). Following centrifugation (18,000 x g, 20 min), the supernatant containing extracted proteins was collected. A BCA assay was performed on a small aliquot of the extracted proteins to determine the protein concentration, while the leftover proteins were stored at -80 °C for later use. The E. coli sample (~780 µg of proteins) was desalted using a C4 trap column (Bio-C4, 3 µm, 300 Å, 4.0 mm i.d., 10 mm long) from Sepax Technologies, Inc. (Newark, DE) on a 1260 Infinity II HPLC system from Agilent (Santa Clara, CA). The eluate containing the E. coli proteins was collected and lyophilized. The E. coli protein sample was redissolved in 50 mM NH4HCO3 (pH 8.0), and an aliquot (~117 µg) with ~2 mg/mL protein concentration was used for the single-shot CZE-MS/MS experiments. The leftover E. coli proteins (~663 µg) were fractionated with size exclusion chromatography (SEC), followed by CZE-MS/MS analyses.
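The sample amounts quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the approximate figures in the protocol. The aliquot volume is not stated in the protocol; it is derived here from the quoted mass and concentration, and the variable names are illustrative.

```python
# Approximate figures taken from the protocol text (all values are "~").
total_ug = 780.0                # desalted E. coli protein
single_shot_ug = 117.0          # aliquot used for single-shot CZE-MS/MS
concentration_mg_per_ml = 2.0   # concentration of that aliquot

# Protein left over for SEC fractionation.
sec_ug = total_ug - single_shot_ug

# 1 mg/mL equals 1 µg/µL, so mass (µg) / concentration gives volume in µL.
# This volume is derived, not stated in the protocol.
single_shot_volume_ul = single_shot_ug / concentration_mg_per_ml

print(f"SEC input: ~{sec_ug:.0f} µg")
print(f"Single-shot aliquot volume: ~{single_shot_volume_ul:.1f} µL")
```

The leftover mass matches the ~663 µg reported for SEC fractionation.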

Data Processing Protocol

For the raw files from the SEC-fractionated E. coli sample, we employed the TopFD (TOP-down mass spectrometry feature detection) and TopPIC (TOP-down mass spectrometry based proteoform identification and characterization) pipeline for database search.[34] The 14 raw files corresponding to the 14 SEC fractions were analyzed. First, the 14 raw files were converted into 14 mzML files with the msConvert tool.[35] Then, spectral deconvolution was performed with TopFD to generate msalign files for database search using TopPIC (version 1.2.2). The E. coli (strain K12) UniProt database (UP000000625, 4313 entries, version June 28, 2018) was used for database search. The database search parameters were as follows. The maximum number of unexpected modifications was 2. The precursor and fragment mass error tolerances were 15 ppm. The maximum mass shift of unknown modifications was 500 Da. The FDRs were estimated using the target-decoy approach.[32,33] To reduce the redundancy of proteoform IDs, we treated proteoforms identified by multiple MS/MS spectra as a single proteoform ID if those MS/MS spectra corresponded to the same proteoform feature reported by TopFD, or if the proteoforms were from the same protein and differed in mass by less than 1.2 Da. Two rounds of analyses were performed. In the first round, we used TopPIC to search each raw file against the E. coli proteome database separately. We then combined all the proteoform-spectrum matches (PrSMs) identified from the 14 data files and filtered the PrSM IDs with a 1% spectrum-level FDR.
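The two filtering rules described above (a 1% spectrum-level FDR on the pooled PrSMs, and merging of redundant proteoform IDs) can be illustrated with a minimal sketch. This is not TopPIC's implementation. The `PrSM` record, its field names, and both functions are hypothetical; the FDR is estimated as decoys divided by targets along an E-value ranking; feature identifiers are assumed to be unique across all runs; and the merge is applied transitively (single linkage), which the protocol does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrSM:
    """Hypothetical proteoform-spectrum match record (not a TopPIC type)."""
    spectrum_id: int
    protein: str
    feature_id: int   # assumed unique across all runs
    mass: float       # proteoform mass in Da
    e_value: float
    is_decoy: bool


def filter_by_fdr(prsms, max_fdr=0.01):
    """Return target PrSMs within the loosest E-value cutoff whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(prsms, key=lambda p: p.e_value)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PrSMs to keep
    for i, p in enumerate(ranked, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def merge_redundant(prsms, mass_tol=1.2):
    """Group PrSMs into proteoform IDs: same feature, or same protein
    with a mass difference below mass_tol (Da). Uses union-find, so
    the grouping is transitive (an assumption of this sketch)."""
    parent = list(range(len(prsms)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(prsms):
        for j in range(i + 1, len(prsms)):
            b = prsms[j]
            same_feature = a.feature_id == b.feature_id
            close_mass = (a.protein == b.protein
                          and abs(a.mass - b.mass) < mass_tol)
            if same_feature or close_mass:
                parent[find(j)] = find(i)

    groups = {}
    for i, p in enumerate(prsms):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


# Toy data, invented purely for illustration.
example = [
    PrSM(1, "P1", 10, 10000.0, 1e-20, False),
    PrSM(2, "P1", 11, 10000.9, 1e-18, False),  # within 1.2 Da of spectrum 1
    PrSM(3, "P1", 12, 10042.0, 1e-15, False),  # distinct proteoform
    PrSM(4, "P2", 13, 8000.0, 1e-10, False),
    PrSM(5, "D1", 14, 9000.0, 1e-3, True),     # decoy hit
    PrSM(6, "P3", 15, 7000.0, 1e-2, False),    # ranked below the decoy
]
kept = filter_by_fdr(example, max_fdr=0.01)
proteoform_groups = merge_redundant(kept)
```

In this toy input, the target ranked below the decoy is discarded by the FDR cutoff, and the two P1 matches that differ by 0.9 Da collapse into one proteoform ID.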


Liangliang Sun, Michigan State University
Liangliang Sun, Department of Chemistry, Michigan State University, 578 S Shaw Lane, East Lansing, Michigan 48824, United States (lab head)

Submission Date


Publication Date



Not available


Q Exactive


Not available


acetylated residue


Not available

Experiment Type

Top-down proteomics


McCool EN, Lodge JM, Basharat AR, Liu X, Coon JJ, Sun L. Capillary Zone Electrophoresis-Tandem Mass Spectrometry with Activated Ion Electron Transfer Dissociation for Large-scale Top-down Proteomics. J Am Soc Mass Spectrom. 2019. PubMed: 31073891