Project PXD008425

PRIDE Assigned Tags:
Technical Dataset



A protein standard that emulates homology for the characterization of protein inference algorithms


A natural way to benchmark the performance of an analytical experimental setup is to use samples of known content, and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides. Hence, there is a need for mechanisms that infer proteins from a list of detected peptides. Today's commercially available samples of known content do not expose these complications in protein inference, as their contained proteins deliberately are selected for producing tryptic peptides that are unique to a single protein. For a realistic benchmark of protein inference procedures, there is, therefore, a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, based on E. coli expressed protein fragments.

Sample Processing Protocol

o generate the datasets, the PrEST sequences of the Human Proteome Atlas-project was scanned for 191 overlapping pairs of PrEST sequences. From these pairs, two pools, A and B, were created. Each pool contained only one of the PrESTs of each pair. A third pool was created by mixing the pool A and B, resulting in a pool A+B. An amount of 1.8 pmol of each PrEST was added to either the pool A or pool B, and an amount of 0.9 pmol was added to the pool A+B. A total protein amount of 10 $\mu$g from each of the pools were reduced with dithiothreitol and alkylated with iodoacetamide prior to trypsin digestion overnight and each pool was mixed into a background of a tryptic digest of 100 ng \textit{Escherichia coli} [BL21(DE3) strain], resulting in three mixtures, Mixture A, Mixture B, and Mixture A+B. A per sample amount of 1.1 $\mu$g of each of three Mixtures were analyzed in triplicate by LC-MS/MS in random order. The digests were loaded onto an Acclaim PepMap 100 trap column (75 $\mu$m $\times$ 2 cm, C18, 3 $\mu$m, 100 \AA), washed for 5 minutes at 0.25 $\mu$L/min with mobile phase A [95\% H$_2$O, 5\% DMSO, 0.1\% formic acid (FA)] and thereafter separated using a PepMap 803 C18 column (50 cm $\times$ 75 $\mu$m, 2 $\mu$m, 100 \AA) directly connected to a Thermo Scientific Q-Exactive HF mass spectrometer. The gradient went from 3\% mobile phase B [90\% acetonitrile (ACN), 5\% H$_2$O, 5\% DMSO, 0.1\% FA] to 8\% B in 3 min, followed by an increase up to 30\% B in 78 minutes, thereafter an increase to 43\% B in 10 min followed by a steep increase to 99\% B in 7 min at a flow rate of 0.25 $\mu$L/min. Data were acquired in data-dependent (DDA) mode, with each MS survey scan followed by five MS/MS HCD scans (AGC target 3e6, max fill time 150 ms, mass window of 1.2 \textit{m}/\textit{z} units, the normalized collision energy setting stepped from 30 to 24 to 18 regardless of charge state), with 30 s dynamic exclusion. Both MS and MS/MS were acquired in profile mode in the Orbitrap, with a resolution of 60,000 for MS, and 30,000 for MS/MS.

Data Processing Protocol

Left for the user of the set. The set should be used for benchmarking purpose.


Lukas Käll, KTH
Lukas Käll, Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology ( lab head )

Submission Date


Publication Date



Not available


Q Exactive


Not available


Not available

Experiment Type

Shotgun proteomics


    The M, Edfors F, Perez-Riverol Y, Payne SH, Hoopmann MR, Palmblad M, Forsström B, Käll L. A protein standard that emulates homology for the characterization of protein inference algorithms. J Proteome Res. 2018 PubMed: 29631402