Increasingly, biochemical co-fractionation-based approaches are used to study interactomes and protein complexes at high throughput. These methods facilitate the qualitative assignment and prediction of hundreds of putative cellular assemblies in one experiment, without depending on genetic engineering to introduce affinity tags. The present dataset consists of a native proteome extracted by mild lysis from the HEK293 cell line and fractionated into 80 fractions by high-resolution size exclusion chromatography (SEC), with each fraction analyzed by bottom-up SWATH mass spectrometry. In a multi-tiered targeted analysis, quantitative complex assembly information of the proteome is reconstructed from these fragment-ion-level chromatographic data in three steps: (i) peptide-centric detection of peptide analytes within each SEC fraction based on fragment ion co-elution groups; (ii) protein-centric detection of protein elution in SEC based on fragment peptide co-elution groups in SEC; (iii) complex-centric detection and quantification of protein complexes and complex variants based on component subunit protein co-elution groups in SEC. The data delineate a global picture of quantitative complex formation within a human proteome, including deconvolution of novel sub-versions and assembly intermediates of critical cellular complexes with essential functions.
Sample Processing Protocol
HEK293 cells were obtained from ATCC and cultured in DMEM (supplemented with 10 % FCS, 50 µg/mL penicillin, 50 µg/mL streptomycin) to 80 % confluency before flow harvest. Cells were collected by centrifugation at 500×g and shock-frozen in liquid nitrogen. Cell pellets were thawed by resuspension in lysis buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 50 mM NaF, 200 µM Na3VO4, 1 mM PMSF, 1× protease inhibitor mix (Sigma-Aldrich P8340) and 0.5 % IGEPAL CA-630 (Sigma-Aldrich I8896)) and incubated on ice for 10 min. Lysates were cleared by 15 min of ultracentrifugation (100,000×g, 4 °C, Beckman Coulter Optima TLX, Ti-100) and the lysis buffer was exchanged to SEC buffer (50 mM HEPES pH 7.5, 150 mM NaCl) over a 30 kDa molecular weight cut-off membrane (Amicon Ultra-15, Millipore, MA, USA) at an overall ratio of 1:50, split into three dilution and re-concentration steps of 1:2, 1:5 and 1:5. Proteins were concentrated to 25–30 mg/mL (as judged by OD280). Potential precipitate was spun out by 5 min of centrifugation at 16,900×g and 4 °C immediately before protein-level fractionation by SEC. SEC fractionation was performed on an Agilent 1100 milliliter flow HPLC system (Agilent, CA, USA) using a Yarra-SEC-4000 column (pore size 500 Å, dimensions 300×7.8 mm, Phenomenex, CA, USA) in 50 mM HEPES pH 7.5, 150 mM NaCl, with the temperature controlled at 4 °C and at a flow rate of 500 µL/min. 1 mg of concentrated lysate was fractionated per run, collecting 96 fractions (0.19 min/fraction; retention time range 10–28 min; elution volume 5–14 mL), of which 81 were considered for further analysis. Fractions from two consecutive runs were pooled to yield the final set of fractions, which were then subjected to bottom-up proteomic analysis. An aliquot of the unfractionated mild extraction proteome sample (~25 µg by OD280, 1/40th of SEC input volume) was diluted to fraction volume (186 µL) with SEC buffer and included in peptide sample preparation for LC-MS analysis.
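The fraction collection figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming a constant flow rate and back-to-back fraction collection; the constant and function names are illustrative and not part of the original workflow.

```python
# Values taken from the SEC fractionation description above.
FLOW_RATE_UL_PER_MIN = 500.0   # SEC flow rate
FRACTION_TIME_MIN = 0.19       # collection time per fraction
N_FRACTIONS = 96               # fractions collected per run
RUNS_POOLED = 2                # consecutive runs pooled per fraction


def fraction_volume_ul(flow_ul_per_min, minutes_per_fraction):
    """Volume collected in one fraction at constant flow."""
    return flow_ul_per_min * minutes_per_fraction


def elution_volume_ml(flow_ul_per_min, retention_time_min):
    """Cumulative elution volume reached at a given retention time."""
    return flow_ul_per_min * retention_time_min / 1000.0


per_run_fraction_ul = fraction_volume_ul(FLOW_RATE_UL_PER_MIN, FRACTION_TIME_MIN)
pooled_fraction_ul = per_run_fraction_ul * RUNS_POOLED
collection_span_min = N_FRACTIONS * FRACTION_TIME_MIN

print(f"volume per fraction and run: {per_run_fraction_ul:.1f} uL")
print(f"volume per pooled fraction:  {pooled_fraction_ul:.1f} uL")
print(f"collection span:             {collection_span_min:.2f} min")
print(f"elution volume at 10 min:    {elution_volume_ml(FLOW_RATE_UL_PER_MIN, 10.0):.1f} mL")
print(f"elution volume at 28 min:    {elution_volume_ml(FLOW_RATE_UL_PER_MIN, 28.0):.1f} mL")
```

Under these assumptions the collection span is about 18 min, matching the 10–28 min window, and the pooled fraction volume comes out close to the 186 µL fraction volume quoted for the unfractionated aliquot; the source does not explain the small difference.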
Proteomic samples were supplemented with 1 % (m/v) sodium deoxycholate and heated to 95 °C for 10 min. Denatured proteins were reduced with 5 mM tris(carboxyethyl)phosphine (Sigma-Aldrich, 20 min at RT; added from a stock solution in 50 mM ammonium bicarbonate buffer titrated to pH 8.8 to avoid precipitation of sodium deoxycholate), followed by alkylation with 10 mM iodoacetamide (Sigma-Aldrich, 20 min at RT, in the dark). Proteins were digested with 0.2 µg sequencing-grade porcine trypsin (Promega) by incubation at 37 °C overnight. Digestion was stopped and deoxycholate precipitated by addition of trifluoroacetic acid to 1 % (v/v). Precipitated deoxycholate was spun out by centrifugation (10 min, 3220×g) before desalting and clean-up of the supernatant on a 96-well C18 spin column plate (10–100 µg capacity, The Nest Group) according to the manufacturer's instructions; ~30 µL of supernatant was left behind to avoid transfer of precipitated deoxycholate. Peptide eluates were evaporated to dryness in a heated speedvac and re-suspended by 5 min of sonication in 18 µL of 2 % acetonitrile and 0.1 % formic acid in water. To each fraction, the iRT retention time kit (Biognosys AG, Switzerland) was added according to the manufacturer's instructions, albeit at a 1:20 instead of 1:10 ratio owing to the large injection volumes in subsequent LC-MS analysis. Peptide samples generated from the proteomic sub-fractions were analysed via LC-MS/MS in both DDA and DIA acquisition mode, side by side for each chromatographic fraction, proceeding from early- to late-eluting fractions and with intermittent re-calibration runs of E. coli β-galactosidase standards.
The LC setup and gradient were identical for the alternating measurements in DDA and DIA mode, employing an Eksigent NanoLC Ultra 2D Plus HPLC system (Eksigent AS2-1) to deliver, at 300 nL/min flow, a 120-min gradient from 2–35% (buffer A: 0.1% (v/v) formic acid, 2% (v/v) acetonitrile; buffer B: 0.1% (v/v) formic acid, 90% (v/v) acetonitrile) after direct injection onto a PicoFrit emitter (75 µm inner diameter, New Objective) packed with a 20 cm column bed of Magic C18 AQ stationary phase (3-µm bead size, 200-Å pore size, Michrom/Bischoff Chromatography). Data-dependent acquisition operated with the following parameters. MS1 survey spectra were acquired for the range of 360–1,460 m/z with a 500 ms fill time cap. The top 20 most intense precursors of charge state 2–5 were selected for CID fragmentation, and MS2 spectra were collected for the range of 50–2,000 m/z with a 100 ms fill time cap and dynamic exclusion of precursor ions from reselection for 15 s, essentially as described (ref. 121). Data-independent acquisition used an updated SWATH scheme with 64 variable windows, calibrated to align ion current per SWATH window in a typical human cell lysate, essentially as described (ref. 57). SWATH cycles (64 × 50 ms accumulation time) were interspersed with MS1 survey scans for the range of 360–1,460 m/z with a 250 ms fill time cap, resulting in an overall cycle time of 3498 ms. The MS2 mass range was set to 200–2,000 m/z.
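The relation between the SWATH window scheme and the reported cycle time can be made explicit with a short calculation. This is a minimal sketch using the accumulation and fill times stated above; attributing the remainder to per-scan instrument overhead is an assumption, and the 30 s chromatographic peak width used in the last step is a hypothetical value chosen only for illustration.

```python
# Values stated in the acquisition description above.
N_SWATH_WINDOWS = 64
SWATH_ACCUMULATION_MS = 50
MS1_FILL_TIME_CAP_MS = 250
REPORTED_CYCLE_MS = 3498


def nominal_cycle_time_ms(n_windows, accumulation_ms, ms1_ms):
    """Sum of all MS2 accumulation times plus one MS1 survey scan."""
    return n_windows * accumulation_ms + ms1_ms


def points_per_peak(peak_width_s, cycle_ms):
    """Number of sampling points across a chromatographic peak."""
    return peak_width_s * 1000.0 / cycle_ms


nominal_cycle_ms = nominal_cycle_time_ms(
    N_SWATH_WINDOWS, SWATH_ACCUMULATION_MS, MS1_FILL_TIME_CAP_MS
)  # 64 * 50 + 250 = 3450 ms

# Remainder not covered by the stated accumulation times; assumed here to be
# instrument overhead, which the source does not itemise.
unaccounted_ms = REPORTED_CYCLE_MS - nominal_cycle_ms  # 48 ms

# Hypothetical 30 s peak width, not a value from the source.
example_points = points_per_peak(30.0, REPORTED_CYCLE_MS)

print(f"nominal cycle time: {nominal_cycle_ms} ms")
print(f"unaccounted time:   {unaccounted_ms} ms")
print(f"points across a 30 s peak: {example_points:.1f}")
```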
Data Processing Protocol
DDA data analysis
DDA-MS data were processed using the MaxQuant software package (version 126.96.36.199) with the human canonical SwissProt reference database (build Aug-2014) and the following parameters: enzyme specificity: Trypsin/P; missed cleavages: 2; fixed modifications: Carbamidomethyl (C); variable modifications: Oxidation (M) and Acetyl (Protein N-term); first search: enabled (histone subset DB); second peptide search: enabled; match between runs: enabled; match type: from and to (experimental design: SEC fractions were defined as sequential fractions as collected; each fraction was considered a separate experiment; resulting in matching from/to ±1 chromatographic fractions); LFQ: disabled. The peptide.txt output was considered for further analysis, using precursor ‘Intensity’ quantitative values as ‘MS1 quantification’ and spectral count values from the ‘Experiment’ column for ‘spectral counting quantification’.
DIA/SWATH-MS data analysis
The DIA/SWATH-MS data were analyzed by targeted query of all (>200,000) precursors for 10,322 human proteins contained in the combined human assay library (CAL, Rosenberger et al. 2014 and http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000954). The analysis employed, sequentially, the tools OpenSWATH, pyProphet, and TRIC using the iPortal framework. Notably, a fixed scoring function was trained on DIA data from the unfractionated sample and then employed to consistently score peak groups in the DIA data obtained along the chromatographic fractions. First, the global scorer was trained on the unfractionated sample using the following parameters:
ISOTOPIC_GROUPING = false
ALIGNER_DSCORE_CUTOFF = 1
WORKFLOW = openswath_2014-12-01-154112__2015-12-02-191222 imsbtools/20150319 applicake@2064420 msproteomicstools@c10a2b8 openms@4bca6fc
ALIGNER_REALIGN_METHOD = splineR_external
WINDOW_UNIT = Thomson
TRAML = /cluster/apps/imsbtools/20150319/dss_client/store/1/3CAF2B31-FA17-4025-A50C-B7C23F19D4F0/ad/30/37/PDB-PAN_HUMAN-4-64/original/20140815170149_PHL4_64/phl004_canonical_s64_osw_decoys.TraML
ALIGNER_MAX_RT_DIFF = auto_3medianstdev
MIN_UPPER_EDGE_DIST = 1
ALIGNER_TARGETFDR = 0.01
IRTTRAML = /cluster/apps/imsbtools/stable/files/hroest_DIA_iRT.TraML
MPR_MAINVAR = xx_swath_prelim_score
COMMENT = SWATHextraction_PHL4_SECINPUT_STDsettings
MIN_COVERAGE = 0.6
MIN_RSQ = 0.95
PARENT-DATA-SET-CODES = 20140927004415575-979553, 20141001060028291-980545, 20141001095128199-980567, PDB-PAN_HUMAN-4-64
RT_EXTRACTION_WINDOW = 600
MPR_VARS = library_corr yseries_score xcorr_coelution_weighted massdev_score norm_rt_score library_rmsd bseries_score intensity_score xcorr_coelution log_sn_score isotope_overlap_score massdev_score_weighted xcorr_shape_weighted isotope_correlation_score xcorr_shape
ALIGNER_FRACSELECTED = 0
EXTRACTION_WINDOW = 0.05
MPR_NUM_XVAL = 10
REQUANT_METHOD = allTrafo
ALIGNER_METHOD = global_best_overall
DO_CHROMML_REQUANT = false
Second, the scorer was applied in OpenSWATH analysis of the SEC fractions with the following parameters:
ISOTOPIC_GROUPING = false
ALIGNER_DSCORE_CUTOFF = 1
WORKFLOW = openswath_2016-03-22-131417__2016-05-19-184838 imsbtools/20150319 applicake@c67bdd3 msproteomicstools@c10a2b8 openms@4bca6fc
ALIGNER_REALIGN_METHOD = splineR_external
WINDOW_UNIT = Thomson
TRAML = /IMSB/sonas/biol_imsb_aebersold_scratch-2/datasets/PDB-PAN_HUMAN-4-64/phl004_canonical_s64_osw_decoys.TraML
ALIGNER_MAX_RT_DIFF = auto_3medianstdev
MIN_UPPER_EDGE_DIST = 1
ALIGNER_TARGETFDR = 0.05
MPR_LDA_PATH = /IMSB/ra/heuselm/SWATH_Models/SEC-SWATH_Input_heuselm_L141001_001/heuselm_L141001_001_scorer.bin
IRTTRAML = /cluster/apps/imsbtools/stable/files/hroest_DIA_iRT.TraML
MPR_MAINVAR = xx_swath_prelim_score
COMMENT = HEKSEC_4_PHL4_modelSECInput
MIN_COVERAGE = 0.6
MIN_RSQ = 0.95
RT_EXTRACTION_WINDOW = 600
MPR_VARS = library_corr yseries_score xcorr_coelution_weighted massdev_score norm_rt_score library_rmsd bseries_score intensity_score xcorr_coelution log_sn_score isotope_overlap_score massdev_score_weighted xcorr_shape_weighted isotope_correlation_score xcorr_shape
ALIGNER_FRACSELECTED = 0
EXTRACTION_WINDOW = 0.05
MPR_NUM_XVAL = 10
ALIGNER_METHOD = global_best_overall
MPR_WT_PATH = /IMSB/ra/heuselm/SWATH_Models/SEC-SWATH_Input_heuselm_L141001_001/heuselm_L141001_001_weights.txt
DO_CHROMML_REQUANT = false
The run-wise pyProphet results were aligned using TRIC with a loose target FDR of 5 %, followed by downstream filtering and strict FDR control with additional, chromatography-based discriminant scores. TRIC processing selected peak groups for alignment using an m_score (q-value) cutoff of 0.3939430 %. For the aligned values a cutoff of 5.0 % was employed.
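Because the two parameter sets above differ in only a few settings, it can help to compare them programmatically. The following is a minimal sketch, assuming the parameters are available as `KEY = value` text in which keys are upper-case tokens; `parse_params` and `diff_params` are hypothetical helper names and not part of OpenSWATH, pyProphet, TRIC or iPortal. Only a shortened excerpt of each parameter set is used.

```python
import re

# A key is an upper-case token (letters, digits, "_" or "-") followed by " = ".
# The lookbehind keeps the pattern from matching inside paths or other tokens.
KEY_PATTERN = re.compile(r"(?<![\w/.@-])([A-Z][A-Z0-9_-]*) = ")


def parse_params(dump):
    """Split a 'KEY = value KEY = value ...' dump into a dictionary.

    Works on a single flattened line as well as on one parameter per line.
    """
    matches = list(KEY_PATTERN.finditer(dump))
    params = {}
    for i, match in enumerate(matches):
        # The value runs from the end of this key to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(dump)
        params[match.group(1)] = dump[match.end():end].strip()
    return params


def diff_params(first, second):
    """Return {key: (value_in_first, value_in_second)} for differing keys."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k))
            for k in keys if first.get(k) != second.get(k)}


# Shortened excerpts of the two parameter sets listed above.
training_dump = (
    "ISOTOPIC_GROUPING = false ALIGNER_DSCORE_CUTOFF = 1 "
    "ALIGNER_TARGETFDR = 0.01 MIN_COVERAGE = 0.6 MIN_RSQ = 0.95 "
    "RT_EXTRACTION_WINDOW = 600 REQUANT_METHOD = allTrafo "
    "ALIGNER_METHOD = global_best_overall"
)
fractions_dump = (
    "ISOTOPIC_GROUPING = false ALIGNER_DSCORE_CUTOFF = 1 "
    "ALIGNER_TARGETFDR = 0.05 MIN_COVERAGE = 0.6 MIN_RSQ = 0.95 "
    "RT_EXTRACTION_WINDOW = 600 ALIGNER_METHOD = global_best_overall"
)

training = parse_params(training_dump)
fractions = parse_params(fractions_dump)
differences = diff_params(training, fractions)

for key, (train_value, fraction_value) in differences.items():
    print(f"{key}: training={train_value!r} fractions={fraction_value!r}")
```

For the excerpts shown, the comparison flags the alignment target FDR (0.01 for scorer training, 0.05 for the SEC fractions) and the REQUANT_METHOD entry, which appears only in the training set.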
Corresponding dataset(s) in other omics resources