Proteogenomics of malignant melanoma cell lines: searching for genetically encoded point mutations in LC-MS/MS data
In this study we performed a proteogenomic analysis of 9 malignant melanoma cell lines. The main objectives were to identify variants originating from point mutations and to analyze how filtering of the exome data affects the outcome of variant identification.
Sample Processing Protocol
8 primary cell lines of human malignant melanoma were studied. Two sets of labels are used: 1, 4, 5, 6, 7-3, 8, 604, 605, 606 for the exome sequencing samples and P, SI, ME, 82, 335, KIS, KOR, G for the protein samples, respectively. Single nucleotide polymorphisms (SNPs) and indels were annotated for each sample from the exome sequencing data using the GATK software from the Broad Institute and were used to generate customized protein sequence databases. For each cell line, 6 databases with different confidence thresholds were generated: raw database (no filtering); lev0 (default thresholds applied according to the Broad Institute recommendations); lev1 to lev4 (stricter filtering, corresponding to higher thresholds on SNP/indel quality depth and other metrics reported by GATK). Proteomic data were acquired by HPLC-MS/MS analysis of whole-cell tryptic digests, with 6 biological replicates per cell line. A high-resolution Q Exactive Orbitrap mass spectrometer and a 4-hour LC gradient were used for the analysis.
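As an illustration of how an annotated point mutation becomes a searchable variant peptide, the following minimal Python sketch applies a single amino acid substitution to a protein sequence and keeps the tryptic peptides that differ from the wild type. The toy sequence, the function names, the simplified cleavage rule (after K or R unless followed by P, no missed cleavages) and the minimum peptide length are assumptions made for this example and do not describe the actual database generation pipeline.

```python
def apply_substitution(protein, position, ref, alt):
    """Return the protein with one residue replaced (position is 1-based)."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the sequence")
    return protein[:position - 1] + alt + protein[position:]


def tryptic_peptides(protein, min_length=6):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = set(), 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            if i + 1 - start >= min_length:
                peptides.add(protein[start:i + 1])
            start = i + 1
    # Keep the C-terminal peptide if the protein does not end in K or R.
    if len(protein) - start >= min_length:
        peptides.add(protein[start:])
    return peptides


def variant_peptides(protein, position, ref, alt, min_length=6):
    """Tryptic peptides present in the mutant protein but not in the wild type."""
    mutant = apply_substitution(protein, position, ref, alt)
    return tryptic_peptides(mutant, min_length) - tryptic_peptides(protein, min_length)


# Hypothetical toy protein with a G12D substitution.
toy_protein = "MSAEKLLDVGSGTGKTPEELR"
print(variant_peptides(toy_protein, 12, "G", "D"))
```

In this toy case only the peptide spanning position 12 changes, so it is the only sequence that would be added to a variant database.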
Data Processing Protocol
Data were processed separately with the X!Tandem and MSGF+ search engines. Each search was followed by separate post-search filtering of variant peptide identifications to 1% FDR based on the target-decoy approach, using MP score [Ivanov M. V., et al. (2014) Journal of Proteome Research, 13(4), 1911–20] and in-house scripts based on the pyteomics library [Goloborodko A.A., et al. (2013) JASMS, 24(2):301-4]. The searches were performed against the sequence databases of the particular cell line and against the databases of the other cell lines ("all vs. all") with different ...
Variant peptides identified using databases with different confidence thresholds
We compared the results of variant peptide identification using customized databases with different confidence thresholds. The cell line under study was 82 (exome sample 6), and the variant identifications were counted in the union of the 6 biological replicates. The assumption was that the bigger databases, corresponding to lower confidence thresholds (or no threshold in the case of the raw database), contain more false entries that cannot be confirmed at the proteome level. However, the number of identified variant peptides turned out to grow almost linearly with the database size.
The effect of the database size on the number of false identifications was studied by searching one LC-MS/MS replicate of the same cell line against its own databases and against the united databases of all other cell lines. Variant peptides from the other databases that are also present in the database of cell line 6 (level 0 or raw) were excluded. When the variants present in database 6lev0 were excluded from the "wrong database search" results, a substantial number of variant identifications remained, whereas excluding the variants present in the raw database of cell line 6 left only a few (0 to 3) false variant identifications, even for large variant database sizes. This means that the list of variant sequences in the level 0 database is not complete, and the use of unfiltered databases is preferable.
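The 1% FDR filtering and the "wrong database" control described above can be sketched in Python as follows. This is a minimal, generic target-decoy filter on an arbitrary "higher is better" score. It does not reproduce the MP score or the in-house pyteomics-based scripts, and the function names, the tuple layout and the peptide labels are assumptions made for this example.

```python
def filter_to_fdr(psms, fdr=0.01):
    """Generic target-decoy filtering.

    psms: iterable of (peptide, score, is_decoy) tuples, higher score is better.
    Returns the set of target peptides above the lowest score cutoff at which
    the decoy/target ratio is still within the requested FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)
    targets = decoys = cutoff = 0
    for rank, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = rank  # remember the deepest rank that still passes
    return {peptide for peptide, _, is_decoy in ranked[:cutoff] if not is_decoy}


def control_false_variants(identified_variants, own_database_variants):
    """Variants found in the search against other cell lines' databases that
    are absent from the cell line's own database count as false matches."""
    return set(identified_variants) - set(own_database_variants)


# Hypothetical variant PSMs from a search against other cell lines' databases.
psms = [
    ("VARIANTA", 50.0, False),
    ("VARIANTB", 45.0, False),
    ("VARIANTC", 40.0, False),
    ("DECOYONE", 20.0, True),
    ("VARIANTD", 10.0, False),
]
accepted = filter_to_fdr(psms, fdr=0.01)
own_raw_database = {"VARIANTA", "VARIANTB"}  # hypothetical content of the raw database
print(control_false_variants(accepted, own_raw_database))
```

The smaller the remaining set after subtracting the cell line's own database, the more complete that database is with respect to the variants actually observed at the proteome level.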
An open search, that is, a search with a wide precursor mass window and a narrow fragment mass tolerance, is used to define the modification pattern in LC-MS/MS datasets. In this work, such a search was performed using X!Tandem with a precursor mass tolerance of 500 Da in order to find the most frequent mass shifts mimicking single amino acid substitutions. Identified variant peptides containing such substitutions are likely to be false matches. "Open search percent" was calculated as the ratio (in %) of the number of PSMs with a certain mass shift to the total number of PSMs with mass shifts corresponding to any amino acid substitution (excluding the zero mass shift).
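The "open search percent" defined above can be computed along the following lines. This is a sketch under stated assumptions: the residue masses are the standard monoisotopic values, while the 0.01 Da matching tolerance, the grouping of near-identical shifts and the function names are choices made for this example rather than parameters taken from the study.

```python
from collections import Counter

# Standard monoisotopic residue masses, in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_shift_values():
    """Distinct non-zero mass shifts produced by single amino acid substitutions."""
    deltas = sorted(
        mass_b - mass_a
        for a, mass_a in RESIDUE_MASS.items()
        for b, mass_b in RESIDUE_MASS.items()
        if a != b
    )
    values = []
    for delta in deltas:
        if abs(delta) < 1e-3:  # zero shift (L <-> I) is excluded
            continue
        if not values or delta - values[-1] > 1e-3:  # merge equal shifts
            values.append(delta)
    return values


def open_search_percent(psm_mass_shifts, tolerance=0.01):
    """Percent of PSMs per substitution-like mass shift, relative to all PSMs
    whose mass shift matches any amino acid substitution."""
    values = substitution_shift_values()
    counts = Counter()
    for observed in psm_mass_shifts:
        nearest = min(values, key=lambda value: abs(value - observed))
        if abs(nearest - observed) <= tolerance:
            counts[nearest] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {round(shift, 4): 100.0 * n / total for shift, n in counts.items()}


# Hypothetical precursor mass shifts (Da) of PSMs from an open search.
shifts = [14.0157, 14.0155, 15.9949, 0.0, 250.0]
print(open_search_percent(shifts))
```

In this toy input, the zero shift and the 250 Da shift match no substitution and are ignored, so the percentages are computed over the three remaining PSMs.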