DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics
This dataset consists of 44 raw MS files: 27 DIA (SWATH) and 15 DDA runs acquired on a TripleTOF 5600, and two raw files acquired on a Q Exactive. The composition of the dataset is described in the manuscript by Tsou et al., "DIA-Umpire: comprehensive computational framework for data independent acquisition proteomics", Nature Methods, in press. Raw files are deposited here in ProteomeXchange and are associated with the DIA-Umpire processed data. All DIA-Umpire processed results for each sample, together with the DDA results, are deposited in separate folders. See also the "DataSampleID.xlsx" file associated with this Readme. Internal reference from the Gingras lab ProHits implementation: Project 94, Export version VS2 (Tsou_DIA-Umpire).
Sample Processing Protocol
Sample preparation
The Proteomics Dynamic Range Standard (UPS2) sample was acquired from Sigma-Aldrich (St. Louis, MO), the MassPREP E. coli Digest Standard was acquired from Waters (Milford, MA), and the MS compatible human protein extract digest was from Promega (Madison, WI). The UPS2 samples were reduced with 5 mM TCEP (tris(2-carboxyethyl)phosphine), alkylated with 50 mM iodoacetamide, and digested overnight with 1 µg trypsin (Promega, Madison, WI) in 100 mM Tris pH 8 at 37°C. UPS2, E. coli, and human peptides were acidified with formic acid and loaded at various concentrations, alone or in combination, onto an in-house made 75 µm x 12 cm analytical column emitter packed with 3 µm ReproSil-Pur C18-AQ (Dr. Maisch HPLC GmbH, Germany). A NanoLC-Ultra 1D plus (Eksigent, Dublin, CA) nano-pump was used to deliver a 90 minute gradient from 2% to 35% acetonitrile with 0.1% formic acid, followed by a 30 minute wash with 80% acetonitrile prior to re-equilibration to 2% acetonitrile with 0.1% formic acid. The AP-MS sample preparation was previously described (Lambert et al., Nature Methods, 2013), and the DIA files were previously published as part of that manuscript (they are included here simply for ease of library regeneration). The DIA/SWATH runs for the AP samples were performed with the same parameters as those described in Lambert et al., except that the 50 ms MS1 scan described in that manuscript was replaced by a 250 ms MS1 scan.
Mass spectrometry analysis
Each sample was analyzed in duplicate (E. coli, human) or in triplicate (UPS2) on a TripleTOF 5600 instrument (AB SCIEX, Concord, Ontario, Canada), once using DDA and once using DIA (SWATH) with an extended ion accumulation time of 250 ms for MS1 scans. UPS2 samples were also analyzed using SWATH with the previously reported MS1 survey scan ion accumulation time of 50 ms.
The DDA run consisted of one 250 ms MS1 TOF survey scan covering 400–1300 Da followed by ten data dependent 100 ms MS/MS scans (1 Da isolation window, scan range 100–2000 Da), with precursors excluded for 15 s after being selected for fragmentation once (dynamic exclusion option). The SWATH run consisted of one 250 ms or 50 ms MS1 TOF survey scan followed by 34 sequential MS2 windows of 25 Da covering a mass range of 400–1250 Da at 95 ms per SWATH scan. The DIA run (Thermo Q Exactive Plus) consisted of one MS survey scan (17500 resolution, target 3e6, max fill time 50 ms) every 10 scans, and 24 sequential MS2 windows of 26 amu (17500 resolution, target 5e5, max fill time 80 ms) covering a mass range of 400–1000 Da. The DDA run (Thermo Q Exactive Plus) consisted of one MS survey scan (70000 resolution, target 1e6, max fill time 30 ms) followed by fifteen MS/MS scans (2 Da isolation, 17500 resolution, target 1e5, max fill time 125 ms), with precursors excluded for 20 s after being selected once.
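The TripleTOF SWATH scheme described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original acquisition method: it assumes the 34 windows are contiguous and non-overlapping (the stated numbers, 34 x 25 Da from 400 Da, are consistent with this, but the actual instrument method may have used a small overlap), and it ignores any instrument overhead between scans. The function names are hypothetical.

```python
def swath_windows(start_mz=400.0, width=25.0, n_windows=34):
    """Return (lower, upper) m/z bounds for each isolation window.

    Assumption: windows are contiguous with no overlap.
    """
    return [(start_mz + i * width, start_mz + (i + 1) * width)
            for i in range(n_windows)]


def cycle_time_ms(ms1_ms, ms2_ms, n_windows):
    """Nominal duty cycle: one MS1 survey scan plus n_windows MS2 scans.

    Assumption: no inter-scan overhead is included.
    """
    return ms1_ms + n_windows * ms2_ms


windows = swath_windows()
print(windows[0], windows[-1])     # (400.0, 425.0) (1225.0, 1250.0)
print(cycle_time_ms(250, 95, 34))  # 3480 (250 ms MS1 setting)
print(cycle_time_ms(50, 95, 34))   # 3280 (50 ms MS1 setting)
```

Under these assumptions, extending the MS1 accumulation time from 50 ms to 250 ms lengthens the nominal cycle from about 3.3 s to about 3.5 s.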
Data Processing Protocol
Data conversion and computational analysis
The .wiff raw files from the AB SCIEX 5600 TripleTOF were converted into mzML format by the AB MS Data Converter (AB SCIEX version 1.3 beta) using the “centroid” option, and the resulting mzML files were further converted into mzXML format by msconvert.exe from the ProteoWizard package (version 3.0.4462) using default parameters. The .raw files from the Thermo Q Exactive Plus were converted directly into mzXML files by msconvert.exe from ProteoWizard.
DIA mzXML files were processed by DIA-Umpire, which applies a series of computational algorithms. Processing begins with a two-dimensional (m/z – retention time) feature detection algorithm that detects all possible precursor and fragment ion signals in MS1 and DIA MS2 data, as well as unfragmented precursor ions in DIA MS2 data. For each detected precursor feature, sets of fragment peaks are grouped and stored as precursor-fragment groups. The pseudo MS/MS spectra generated from these groups, together with the DDA spectra, were searched with X! Tandem, Comet, and MSGF+, followed by PeptideProphet, iProphet, and ProteinProphet analyses. The resulting peptide and protein identification lists were filtered using the computed peptide and protein probabilities to achieve a 1% false discovery rate (FDR), estimated using the target-decoy approach. Identified peptides and proteins were then quantified using both MS1 precursor feature intensities and fragment ion intensities. Results at the protein, peptide ion, and fragment levels are stored in corresponding summary tables.
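The probability-based filtering to 1% FDR mentioned above can be illustrated with a short sketch. This is not the DIA-Umpire or Trans-Proteomic Pipeline implementation: the function name, the input layout (a list of probability and decoy-flag pairs), and the simple decoys/targets FDR estimate are all assumptions made for illustration.

```python
def probability_threshold_for_fdr(identifications, max_fdr=0.01):
    """Return the lowest probability cutoff whose estimated FDR is <= max_fdr.

    identifications: iterable of (probability, is_decoy) pairs.
    Assumption: FDR is estimated as (#decoys / #targets) among all
    identifications at or above the cutoff.
    Returns None if no cutoff satisfies the requirement.
    """
    ranked = sorted(identifications, key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    i = 0
    while i < len(ranked):
        prob = ranked[i][0]
        # Consume all identifications tied at this probability together,
        # so the cutoff never splits a group of equal scores.
        while i < len(ranked) and ranked[i][0] == prob:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = prob
    return best_cutoff


# Toy data (fabricated for illustration only, not from this dataset).
toy = ([(0.99, False)] * 200 + [(0.95, True)] +
       [(0.90, False)] * 50 + [(0.50, True)] * 5)
print(probability_threshold_for_fdr(toy))  # 0.9
```

In the toy example, accepting everything with probability 0.9 or higher gives 1 decoy among 250 targets (0.4%), whereas lowering the cutoff to 0.5 would give 6 decoys (2.4%), so 0.9 is the lowest cutoff returned.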
Tsou CC, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras AC, Nesvizhskii AI. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods. 2015 Jan 19. PubMed: 25599550.