Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS: PTM-SWATH-MS Gold Standard data set
The PTM-SWATH-MS Gold Standard data set consists of a previously published (Soste et al., 2014, PMID:25194849) set of 579 unpurified, synthetic, heavy-isotope-labeled phosphopeptides (Thermo Scientific Biopolymers). These phosphopeptides represent biologically relevant sequences from S. cerevisiae proteins whose phosphorylation status has been found to change under various conditions. The complete peptide set contains a mixture of singly and doubly phosphorylated sequences with, on average, more than 3 modifiable residues per peptide (serines, threonines or tyrosines, often in close proximity to each other). All peptides were mixed in equal volumes (concentrations are unknown because the peptides are unpurified). The resulting peptide mix was either analyzed directly in DDA mode for assay library generation, or spiked into a human cell line background proteome in a 13-step dilution series and analyzed in SWATH mode to generate the SWATH Gold Standard data set.
Sample Processing Protocol
DDA: The synthetic phosphopeptide mix (without added background) was measured in technical triplicates on an AB Sciex 5600+ TripleTOF mass spectrometer operated in DDA mode. The mass spectrometer was interfaced with an Eksigent NanoLC Ultra 2D Plus HPLC system as previously described (refs. 14, 62, 63). Peptides were directly injected onto a 20-cm PicoFrit emitter (New Objective, self-packed to 20 cm with Magic C18 AQ 3-μm 200-Å material) and then separated using a 120-min gradient from 2–35% (buffer A: 0.1% (v/v) formic acid, 2% (v/v) acetonitrile; buffer B: 0.1% (v/v) formic acid, 90% (v/v) acetonitrile) at a flow rate of 300 nL/min. MS1 spectra were collected in the range 360–1,460 m/z for 500 ms. The 20 most intense precursors with charge state 2–5 that exceeded 250 counts per second were selected for fragmentation, and MS2 spectra were collected in the range 50–2,000 m/z for 150 ms. The precursor ions were dynamically excluded from reselection for 20 s.
DIA: The 13-step dilution series of the synthetic heavy phosphopeptide mix (spiked into a constant human background) was measured in technical triplicates in SWATH-MS mode on the same LC-MS/MS system used for the DDA measurements (refs. 14, 62, 63). In SWATH-MS mode, the SCIEX 5600 plus TripleTOF instrument was specifically tuned to optimize the quadrupole settings for the selection of 64 variable-width precursor ion selection windows. The 64-variable-window scheme was optimized based on a normal human cell lysate sample, covering the precursor mass range of 400–1,200 m/z.
The effective isolation windows can be considered as being 399.5~408.2, 407.2~415.8, 414.8~422.7, 421.7~429.7, 428.7~437.3, 436.3~444.8, 443.8~451.7, 450.7~458.7, 457.7~466.7, 465.7~473.4, 472.4~478.3, 477.3~485.4, 484.4~491.2, 490.2~497.7, 496.7~504.3, 503.3~511.2, 510.2~518.2, 517.2~525.3, 524.3~533.3, 532.3~540.3, 539.3~546.8, 545.8~554.5, 553.5~561.8, 560.8~568.3, 567.3~575.7, 574.7~582.3, 581.3~588.8, 587.8~595.8, 594.8~601.8, 600.8~608.9, 607.9~616.9, 615.9~624.8, 623.8~632.2, 631.2~640.8, 639.8~647.9, 646.9~654.8, 653.8~661.5, 660.5~670.3, 669.3~678.8, 677.8~687.8, 686.8~696.9, 695.9~706.9, 705.9~715.9, 714.9~726.2, 725.2~737.4, 736.4~746.6, 745.6~757.5, 756.5~767.9, 766.9~779.5, 778.5~792.9, 791.9~807, 806~820, 819~834.2, 833.2~849.4, 848.4~866, 865~884.4, 883.4~899.9, 898.9~919, 918~942.1, 941.1~971.6, 970.6~1006, 1005~1053, 1052~1110.6, 1109.6~1200.5 m/z (each window overlaps its neighbor by 1 m/z). SWATH MS2 spectra were collected from 50 to 2,000 m/z. The collision energy (CE) was optimized for each window according to the calculation for a charge 2+ ion centered upon the window, with a spread of 15 eV. An accumulation time (dwell time) of 50 ms was used for all fragment-ion scans in high-sensitivity mode, and for each SWATH-MS cycle a survey scan in high-resolution mode was also acquired for 250 ms, resulting in a duty cycle of ~3.45 s. For each MS injection, 2 μg of protein was loaded onto the HPLC column.
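The window scheme and the stated cycle time can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the window boundaries are copied from the list above, the cycle time is taken as one 250 ms survey scan plus one 50 ms fragment-ion scan per window, and the names `WINDOWS_TEXT` and `parse_windows` are introduced here for illustration.

```python
# Isolation windows copied from the text above, as "start~end" pairs in m/z.
WINDOWS_TEXT = (
    "399.5~408.2, 407.2~415.8, 414.8~422.7, 421.7~429.7, 428.7~437.3, "
    "436.3~444.8, 443.8~451.7, 450.7~458.7, 457.7~466.7, 465.7~473.4, "
    "472.4~478.3, 477.3~485.4, 484.4~491.2, 490.2~497.7, 496.7~504.3, "
    "503.3~511.2, 510.2~518.2, 517.2~525.3, 524.3~533.3, 532.3~540.3, "
    "539.3~546.8, 545.8~554.5, 553.5~561.8, 560.8~568.3, 567.3~575.7, "
    "574.7~582.3, 581.3~588.8, 587.8~595.8, 594.8~601.8, 600.8~608.9, "
    "607.9~616.9, 615.9~624.8, 623.8~632.2, 631.2~640.8, 639.8~647.9, "
    "646.9~654.8, 653.8~661.5, 660.5~670.3, 669.3~678.8, 677.8~687.8, "
    "686.8~696.9, 695.9~706.9, 705.9~715.9, 714.9~726.2, 725.2~737.4, "
    "736.4~746.6, 745.6~757.5, 756.5~767.9, 766.9~779.5, 778.5~792.9, "
    "791.9~807, 806~820, 819~834.2, 833.2~849.4, 848.4~866, 865~884.4, "
    "883.4~899.9, 898.9~919, 918~942.1, 941.1~971.6, 970.6~1006, "
    "1005~1053, 1052~1110.6, 1109.6~1200.5"
)


def parse_windows(text):
    """Turn 'a~b, c~d, ...' into a list of (start, end) tuples of floats."""
    windows = []
    for item in text.split(","):
        start, end = item.strip().split("~")
        windows.append((float(start), float(end)))
    return windows


windows = parse_windows(WINDOWS_TEXT)

# Overlap between consecutive windows: upper edge minus the next lower edge.
overlaps = [
    prev_end - next_start
    for (_, prev_end), (next_start, _) in zip(windows, windows[1:])
]

# Cycle time in ms: one survey scan plus one fragment-ion scan per window.
SURVEY_SCAN_MS = 250
FRAGMENT_SCAN_MS = 50
cycle_ms = SURVEY_SCAN_MS + FRAGMENT_SCAN_MS * len(windows)

print("number of windows:", len(windows))
print("overlap range (m/z):", round(min(overlaps), 3), "to", round(max(overlaps), 3))
print("cycle time (s):", cycle_ms / 1000)
```

With 64 windows this gives 250 ms + 64 × 50 ms = 3,450 ms, matching the ~3.45 s quoted above; instrument overhead between scans is not modeled.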
Data Processing Protocol
Assay library generation using DDA data: All original (Soste et al., 2014, PMID:25194849) and raw instrument data acquired in DDA mode were centroided and converted to mzXML using qtofpeakpicker (ProteoWizard 3.0.10200) as described previously (Schubert et al., PMID:25675208). The phosphopeptide sequences were supplemented with a set of contaminant proteins, iRT peptide sequences and pseudo-reverse decoys. The files were searched using Comet (2015.02) with the default parameters for high mass accuracy instruments: peptide mass tolerance: 20 ppm (monoisotopic), isotope error enabled, fully tryptic digestion with max 2 missed cleavages, variable M (Oxidation), variable K (Label:13C(6)15N(2)), variable R (Label:13C(6)15N(4)), variable STY (Phospho), max variable mods: 5. PeptideProphet (TPP 4.8.0; ref. 69) with parameters -dDECOY_ -OAPdlIwt was run independently per file and iProphet was used to combine all results. SpectraST (TPP 4.8.0) was used to generate a spectral library of all peptide identifications at iProphet FDR 1% with the following parameters: -cP0.8437 -c_IRR -c_IRTirtkit.txt -cICID-QTOF -c_RDYDECOY -cAC -cM. OpenMS was used for all following steps: ConvertTSVToTraML was used to convert the SpectraST MRM file to a TraML. OpenSwathAssayGenerator was applied to the TraML with the following parameters: -swath_windows_file swath64.txt -allowed_fragment_charges 1,2,3,4 -enable_ms1_uis_scoring -max_num_alternative_localizations 20 -enable_identification_specific_losses -enable_identification_ms2_precursors. OpenSwathDecoyGenerator was applied to append decoys to the assays using the following parameters: -method shuffle -append -mz_threshold 0.1 -remove_unannotated. All OpenMS tools were executed using the modified chemistry parameters for phosphorylation (OpenMS.phospho.params) (see Supplementary Data).
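For readers who want to reproduce the two OpenMS steps, the following sketch assembles the command lines in Python without running them. The tool names and option names are taken from the paragraph above. The `-in`/`-out` options and all file names except swath64.txt are assumptions added for illustration, the handling of the modified chemistry parameters (OpenMS.phospho.params) is not shown, and the option names reflect the OpenMS version used at the time, so they may differ in current releases.

```python
import shlex

# Hypothetical file names (not given in the text); only swath64.txt is named there.
LIBRARY_TRAML = "library.TraML"
ASSAYS_TRAML = "library_assays.TraML"
ASSAYS_DECOYS_TRAML = "library_assays_decoys.TraML"

# OpenSwathAssayGenerator options as listed in the text.
assay_cmd = [
    "OpenSwathAssayGenerator",
    "-in", LIBRARY_TRAML,        # assumption: standard TOPP input option
    "-out", ASSAYS_TRAML,        # assumption: standard TOPP output option
    "-swath_windows_file", "swath64.txt",
    "-allowed_fragment_charges", "1,2,3,4",
    "-enable_ms1_uis_scoring",
    "-max_num_alternative_localizations", "20",
    "-enable_identification_specific_losses",
    "-enable_identification_ms2_precursors",
]

# OpenSwathDecoyGenerator options as listed in the text.
decoy_cmd = [
    "OpenSwathDecoyGenerator",
    "-in", ASSAYS_TRAML,         # assumption: standard TOPP input option
    "-out", ASSAYS_DECOYS_TRAML, # assumption: standard TOPP output option
    "-method", "shuffle",
    "-append",
    "-mz_threshold", "0.1",
    "-remove_unannotated",
]

# Print shell-quoted command lines for inspection; nothing is executed here.
for cmd in (assay_cmd, decoy_cmd):
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` later, and avoids the stray spaces inside option names that creep in when commands are copied from formatted text.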
Assay library generation using DIA data: All raw instrument data acquired in DIA mode were centroided and converted using the AB SCIEX MS Data Converter (Beta Version 1.3) and msconvert (ProteoWizard 3.0.7162) using the suggested parameters. The signal extraction module of DIA-Umpire (1.2, 2014.10) was applied to the 13-step dilution series SWATH-MS data set using recommended parameters. The ORF protein translation FASTA database for yeast was obtained from the Saccharomyces Genome Database (2015-02-24) and appended with the non-redundant reviewed human protein FASTA obtained from UniProtKB/Swiss-Prot (2015-02-23), the iRT peptide sequences and pseudo-reverse decoys. The other steps were conducted as described above.
OpenSWATH / pyprophet: OpenSwathWorkflow was run with the following parameters: -min_upper_edge_dist 1 -mz_extraction_window 0.05 -rt_extraction_window 600 -extra_rt_extraction_window 100 -min_rsq 0.95 -min_coverage 0.6 -use_ms1_traces -enable_uis_scoring -Scoring:uis_threshold_peak_area 0 -Scoring:uis_threshold_sn -1 -Scoring:stop_report_after_feature 5 -tr_irt hroest_DIA_iRT.TraML. The following subset of scores was used on MS2-level: xx_swath_prelim_score library_corr yseries_score xcorr_coelution_weighted massdev_score norm_rt_score library_rmsd bseries_score intensity_score xcorr_coelution log_sn_score isotope_overlap_score massdev_score_weighted xcorr_shape_weighted isotope_correlation_score xcorr_shape. All MS1 and UIS scores were used for pyprophet. pyprophet was run on a concatenated file of all 13 runs with the following parameters: --final_statistics.emp_p --qvality.enable --qvality.generalized --ms1_scoring.enable --uis_scoring.enable --semi_supervised_learner.num_iter=20 --xeval.num_iter=20 --ignore.invalid_score_columns.
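To make the two main extraction settings concrete, the sketch below converts them into lower and upper bounds around a target. It assumes that both values are full window widths centered on the target (so 0.05 m/z means ±0.025 m/z and 600 s means ±300 s around the expected retention time) and that the m/z window is given in m/z units rather than ppm. The function names and the example values are hypothetical, and the extra RT window is deliberately left out.

```python
# Settings from the OpenSwathWorkflow call described above.
MZ_EXTRACTION_WINDOW = 0.05   # assumed to be a full width in m/z units
RT_EXTRACTION_WINDOW = 600.0  # assumed to be a full width in seconds


def mz_bounds(target_mz, width=MZ_EXTRACTION_WINDOW):
    """Return (lower, upper) m/z limits of a window centered on target_mz."""
    half = width / 2.0
    return target_mz - half, target_mz + half


def rt_bounds(expected_rt, width=RT_EXTRACTION_WINDOW):
    """Return (lower, upper) RT limits in seconds around the expected RT."""
    half = width / 2.0
    return expected_rt - half, expected_rt + half


# Hypothetical fragment ion at m/z 500.25, expected to elute at 1800 s.
mz_low, mz_high = mz_bounds(500.25)
rt_low, rt_high = rt_bounds(1800.0)

print(f"m/z window: {mz_low:.3f} to {mz_high:.3f}")
print(f"RT window:  {rt_low:.0f} s to {rt_high:.0f} s")
```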
George Rosenberger, Columbia University
Ruedi Aebersold, ETH Zurich (lab head): Prof. Dr. Ruedi Aebersold, Institute of Molecular Systems Biology, Head of Department of Biology, HPT E 78, Auguste-Piccard-Hof 1, CH-8093 Zurich, Switzerland
Rosenberger G, Liu Y, Röst HL, Ludwig C, Buil A, Bensimon A, Soste M, Spector TD, Dermitzakis ET, Collins BC, Malmström L, Aebersold R. Inference and quantification of peptidoforms in large sample cohorts by SWATH-MS. Nat Biotechnol. 2017 Jun 12 PubMed: 28604659