Univariate and classification analysis reveals potential diagnostic biomarkers for early-stage ovarian cancer type 1 and type 2
Biomarkers for early detection of ovarian tumors are urgently needed. Tumors of the ovary grow within cysts, and most are benign. Surgical sampling is the only way to ensure accurate diagnosis, but it often leads to morbidity and loss of female hormones. The present study explored the deep proteome in well-defined sets of ovarian tumors, FIGO stage I, Type 1 (low-grade serous, mucinous, endometrioid; n=9), Type 2 (high-grade serous; n=9), and benign serous (n=9) using TMT-LC-MS/MS. We evaluated new bioinformatics tools in the discovery phase. This selection process involved different normalizations, a combination of univariate statistics, and logistic model tree and naïve Bayes tree classifiers. We identified 142 proteins by this combined approach. One biomarker panel and nine individual proteins were validated in cyst fluid and serum: transaldolase-1, fructose-bisphosphate aldolase A (ALDOA), transketolase, ceruloplasmin, mesothelin, clusterin, tenascin-XB, laminin subunit gamma-1, and mucin-16. Six of the proteins were significant (p<0.05) in cyst fluid, while ALDOA was the only protein significant in serum. The biomarker panel achieved ROC AUCs of 0.96 in cyst fluid and 0.57 in serum. We conclude that classification algorithms complement traditional statistical methods by selecting combinations that may be missed by standard univariate tests.
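To make the ROC AUC values quoted above easier to interpret, the following is a minimal sketch of how an AUC can be computed from classifier scores as the probability that a randomly chosen malignant sample scores higher than a randomly chosen benign one. The function name and the score values are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Rank-based ROC AUC: fraction of (positive, negative) pairs in which
    the positive sample scores higher; ties count as half a win."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical panel scores (NOT study data), used only to show the call.
malignant_scores = [0.9, 0.8, 0.75, 0.6]
benign_scores = [0.7, 0.4, 0.3, 0.2]
auc = roc_auc(malignant_scores, benign_scores)
print(f"AUC = {auc:.4f}")
```

An AUC near 0.5, such as the 0.57 reported for serum, means the panel separates the groups little better than chance, whereas 0.96 indicates near-complete separation.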
Sample Processing Protocol
All samples were thawed and filtered (0.22 μm filter, 10 min, 12 000 rpm), and the flow-through was diluted 1:50 with MQ water prior to total protein concentration measurement (Pierce 660 Protein Assay, Thermo Scientific). Samples (500 μg total protein) were depleted using ProteoPrep Albumin and IgG Depletion Sample Prep Kit (Protea) or ProteoPrep Immunoaffinity Albumin & IgG Depletion Kit (Sigma-Aldrich) according to the manufacturer’s protocol, and total protein concentration was determined. Equal protein amounts from nine samples (3 benign, 3 type 1 OC, 3 type 2 OC) were pooled into a representative reference sample that was used on each 10-plex TMT (Fig. 1A). Aliquots containing 30 μg of each sample and the reference sample were digested with trypsin using the filter-aided sample preparation method. Briefly, protein samples were reduced with 100 mM dithiothreitol at 60°C for 30 min, transferred to 30 kDa MWCO Pall Nanosep centrifugal filters (Sigma-Aldrich), washed with 8 M urea repeatedly, and alkylated with 10 mM methyl methane thiosulfonate. Digestion was performed in 200 mM TEAB, 1% sodium deoxycholate (SDC) buffer at 37°C by addition of Pierce MS grade trypsin (Thermo Fisher Scientific) twice in a ratio of 1:25 relative to protein amount, and incubated overnight. Peptides were collected by centrifugation. Digested peptides were labeled using TMT 10-plex isobaric mass tagging reagents (Thermo Scientific). SDC was removed by acidification with 10% TFA. The peptides were further purified and fractionated into 17 fractions by strong cation exchange chromatography (ÄKTA-system, Amersham-Pharmacia) on a PolySULFOETHYL A column (100 × 2.1 mm, 5 μm, 300 Å, PolyLC Inc.) from 20% to 40% B over 40 minutes. Solvent A was 25 mM ammonium formate, pH 2.8, and solvent B was 500 mM ammonium formate, pH 2.8. The fractions were desalted using PepClean C18 spin columns (Thermo Fisher Scientific) prior to LC-MS/MS.
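The quantities in this protocol imply a few simple calculations, sketched below. The sketch assumes that the 1:25 ratio is trypsin to protein by weight, that solvents A and B mix linearly during the cation exchange gradient, and that the 27 study samples were distributed evenly across the three 10-plex sets. The function names are illustrative only.

```python
def trypsin_per_addition_ug(protein_ug, ratio=25.0):
    """Trypsin mass for one addition at a 1:ratio enzyme-to-protein ratio
    (assumed here to be weight/weight)."""
    return protein_ug / ratio


def ammonium_formate_mM(percent_b, solvent_a_mM=25.0, solvent_b_mM=500.0):
    """Salt concentration at a given %B, assuming linear mixing of A and B."""
    frac_b = percent_b / 100.0
    return solvent_a_mM * (1.0 - frac_b) + solvent_b_mM * frac_b


def channels_per_set(n_samples, n_sets, n_reference=1):
    """Channels used per TMT set if samples are split evenly (assumption)."""
    if n_samples % n_sets:
        raise ValueError("samples do not divide evenly across sets")
    return n_samples // n_sets + n_reference


trypsin_ug = trypsin_per_addition_ug(30.0)   # 30 ug aliquot per sample
salt_start = ammonium_formate_mM(20.0)       # gradient start, 20% B
salt_end = ammonium_formate_mM(40.0)         # gradient end, 40% B
channels = channels_per_set(27, 3)           # 9 + 9 + 9 samples, 3 sets

print(f"trypsin per addition: {trypsin_ug:.1f} ug")
print(f"SCX gradient: {salt_start:.0f} to {salt_end:.0f} mM ammonium formate")
print(f"channels per 10-plex set: {channels}")
```

Under these assumptions, each set holds nine samples plus the pooled reference, which fills all ten channels of a 10-plex.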
Fractions were analyzed on a Q Exactive (QE) MS coupled to an Easy-nLCII (Thermo Fisher Scientific) and on an Orbitrap Fusion Tribrid MS interfaced to an Easy nanoLC1000 (Thermo Fisher Scientific). Peptides were separated on an in-house constructed analytical column (300 × 0.075 mm ID, 1.8 μm Reprosil-Pur C18-AQ particles from Dr. Maisch) using a gradient of 7% to 80% B over 90 min at a flow of 200 nL/min. Solvent A was 0.2% formic acid in water and solvent B was 0.2% formic acid in acetonitrile. The first MS scans were performed at resolutions of 140 000 and 60 000 with a mass range of m/z 400–1800 for the Q Exactive and Fusion, respectively. MS/MS analysis was performed in a data-dependent mode, with the top 10 (QE) or top speed 3 s (Fusion) of the most abundant doubly or multiply charged precursor ions in each MS scan selected for fragmentation (MS2) by stepped high energy collision dissociation (HCD) at NCE values of 25, 35, and 45 (QE) and a combination of collision-induced dissociation (CID) at 30% and HCD at 60% (Fusion). For MS2 scans, isolation windows of 2.0 and 1.6 Da were used and the resolution for detection of fragment ions was 35 000 (QE) and 60 000 (Fusion), respectively. The second MS analyses were performed with the Fusion; the precursor ion mass spectra were acquired at a resolution of 120 000, m/z range 350–1500, and fragmentation analysis was performed in a data-dependent multinotch mode with a top speed cycle of 3 s for the most intense doubly or multiply charged precursor ions. MS2 spectra for identification were generated by CID at 30% in the ion trap, followed by multinotch (simultaneous) isolation of the top 10 MS2 fragment ions selected for MS3 by HCD at 55% and detection in the Orbitrap at a resolution of 60 000, m/z range 100–500. All fractions were analyzed twice. Dynamic exclusion was set to 30 seconds in both studies, enabling most of the co-eluting precursors to be selected for MS/MS.
Data Processing Protocol
The data files for the different sets were merged for identification and relative quantification using Proteome Discoverer version 1.4 (Thermo Fisher Scientific). The search was performed against the Human Swissprot Database (Swiss Institute of Bioinformatics, Switzerland) using the Mascot 220.127.116.11 (Matrix Science) search engine. Tryptic peptides with one missed cleavage were accepted. Methionine oxidation was set as a variable modification; cysteine alkylation and TMT labels on peptide N-termini and lysines were selected as fixed modifications. We used precursor mass tolerances of 5 ppm or 10 ppm with fragment mass tolerances of 800 mmu or 500 mmu, respectively. The reference sample was used as the denominator when calculating the ratios. The detected peptide threshold in the software was set to a 1% false discovery rate by searching against a reversed database, and identified proteins were grouped by sharing the same sequences to minimize redundancy. Only peptides unique to a given protein were considered for identification of the proteins, excluding those common to other isoforms or proteins of the same family. Protein abundance ratios from the three 10-plex TMT sets were merged into a single table, referred to as the raw TMT dataset, containing all the samples.
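The ratio calculation and merging step described above can be illustrated with a small sketch. It is not the Proteome Discoverer implementation: the data layout, the channel name "ref", the protein and sample identifiers, and the choice to mark proteins missing from a set as None are all assumptions made for illustration.

```python
def ratios_to_reference(intensities, reference_channel="ref"):
    """Divide each sample channel by the pooled reference channel."""
    ref = intensities[reference_channel]
    if ref <= 0:
        raise ValueError("reference intensity must be positive")
    return {ch: value / ref
            for ch, value in intensities.items()
            if ch != reference_channel}


def merge_tmt_sets(tmt_sets, reference_channel="ref"):
    """Merge per-set ratios into one protein-by-sample table.

    tmt_sets: list of {protein: {channel: intensity}} dicts, one per set.
    Proteins not quantified in a set get None for that set's samples
    (an assumption; the source does not state how gaps were handled).
    """
    per_set = [
        {prot: ratios_to_reference(chs, reference_channel)
         for prot, chs in tmt_set.items()}
        for tmt_set in tmt_sets
    ]
    samples = sorted({s for ps in per_set for row in ps.values() for s in row})
    proteins = sorted({p for ps in per_set for p in ps})
    table = {}
    for prot in proteins:
        row = dict.fromkeys(samples)          # every sample starts as None
        for ps in per_set:
            row.update(ps.get(prot, {}))      # fill in ratios where present
        table[prot] = row
    return table


# Hypothetical intensities for two small sets (NOT study data).
set_1 = {"P1": {"ref": 100.0, "s1": 150.0, "s2": 50.0},
         "P2": {"ref": 200.0, "s1": 100.0, "s2": 400.0}}
set_2 = {"P1": {"ref": 80.0, "s3": 40.0}}
raw_table = merge_tmt_sets([set_1, set_2])
for protein, row in raw_table.items():
    print(protein, row)
```

Because every set carries the same pooled reference, dividing by it puts ratios from different sets on a common scale before they are combined.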
Sahlgrenska Academy Proteomics Core, University of Gothenburg
Karin Sundfeldt, Professor, Department of Obstetrics and Gynecology at Institute of Clinical Sciences, Kvinnokliniken SU Östra 416 85 Gothenburg, Sweden (lab head)
Marcišauskas S, Ulfenborg B, Kristjansdottir B, Waldemarson S, Sundfeldt K. Univariate and classification analysis reveals potential diagnostic biomarkers for early stage ovarian cancer Type 1 and Type 2. J Proteomics. 2019 PubMed: 30710757