Project PXD000561



A draft map of the human proteome


The availability of the human genome sequence has transformed biomedical research over the past decade. However, an equivalent map for the human proteome with direct measurements of proteins and peptides does not exist yet. Here, we report a draft map of the human proteome based on high-resolution Fourier transform mass spectrometry-based proteomics technology. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary hematopoietic cells, resulted in the identification of proteins encoded by more than 17,000 genes, accounting for ~84% of the total annotated protein-coding genes in humans. This large human proteome catalog (available as an interactive web-based resource at will complement available human genome and transcriptome data to accelerate biomedical research in health and disease. The authors request that those considering use of this dataset for commercial purposes contact

Sample Processing Protocol

Samples from 17 adult tissues, 7 fetal tissues and 6 hematopoietic cell types were lysed in lysis buffer containing 4% SDS, 100 mM DTT and 100 mM Tris pH 7.5, then homogenized, sonicated, heated for 10-15 min at 75 °C, cooled and centrifuged at 2,000 rpm for 10 min. The protein concentration of the cleared lysate was estimated using the BCA assay, and equal amounts from three donors were pooled for further fractionation. Proteins from SDS lysates were separated by SDS-PAGE, and in-gel digestion was carried out using trypsin. The peptides were extracted, vacuum dried and stored at -80 °C until further analysis. For in-solution digestion, 400 µg of protein was reduced with DTT, alkylated with IAA and digested with trypsin. The peptide digests were then desalted using Sep-Pak C18 columns (Waters Corporation, Milford, MA), lyophilized and fractionated by high-pH reverse-phase chromatography on an XBridge C18 column (5 µm, 250 x 4.6 mm; Waters, Milford, MA). In total, 96 fractions were collected and concatenated into 24 fractions, which were vacuum dried and stored at -80 °C until LC-MS analysis.
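The protocol does not state how the 96 collected fractions were combined into 24. The sketch below illustrates one commonly used scheme, fixed-interval pooling, in which every 24th fraction goes into the same pool. The scheme itself and the function name `concatenate_fractions` are assumptions made for illustration, not details taken from this dataset.

```python
def concatenate_fractions(n_collected=96, n_pooled=24):
    """Assign fraction numbers 1..n_collected to n_pooled pools.

    Assumed scheme (not stated in the protocol): fixed-interval pooling,
    so pool 1 receives fractions 1, 25, 49 and 73, and so on.
    """
    if n_collected % n_pooled != 0:
        raise ValueError("n_collected must be a multiple of n_pooled")
    pools = {pool: [] for pool in range(1, n_pooled + 1)}
    for fraction in range(1, n_collected + 1):
        pool = (fraction - 1) % n_pooled + 1
        pools[pool].append(fraction)
    return pools


pools = concatenate_fractions()
print(pools[1])   # fractions combined into pool 1
print(pools[24])  # fractions combined into pool 24
```

Pooling early, middle and late eluting fractions together spreads peptides of differing hydrophobicity across each pooled sample, which is the usual motivation for this kind of concatenation.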

Data Processing Protocol

Mass spectrometry data from all LC-MS analyses were searched against the Human RefSeq50 database (33,833 entries, along with common contaminants) using the Sequest and Mascot (version 2.2) search algorithms through Proteome Discoverer 1.3 (Thermo Scientific, Bremen, Germany). Enzyme specificity was set to trypsin, with a maximum of one missed cleavage allowed. The minimum peptide length was set to 6 amino acids. Carbamidomethylation of cysteine was specified as a fixed modification, and oxidation of methionine, acetylation of protein N-termini and cyclization of N-terminal glutamine were included as variable modifications. The mass tolerance was set to 10 ppm for parent ions and 0.05 Da for fragment ions. The data were also searched against a decoy database, and only MS/MS identifications below a 1% false discovery rate (FDR) threshold were considered for further analysis.
To enable identification of novel peptides and correction of existing gene annotations in the human genome, six alternative databases were used, namely: 1) a six-frame translated genome database; 2) three-frame translated RefSeq mRNA sequences from NCBI; 3) a three-frame translated pseudogene database with sequences derived from NCBI and Gerstein's pseudogenes; 4) three-frame translated non-coding RNAs from NONCODE; and 5) an N-terminal UTR database of RefSeq mRNA sequences from NCBI. Peak list files of unmatched MS/MS spectra were extracted from the protein database search results and searched against these databases using the X!Tandem search engine. A decoy database was created for each database by reversing the sequences of the target database. The following parameters were common to all searches: 1) precursor mass error set at 10 ppm; 2) fragment mass error set at 0.05 Da; 3) carbamidomethylation of cysteine defined as a fixed modification; 4) oxidation of methionine defined as a variable modification; and 5) only tryptic peptides with up to 2 missed cleavages considered.
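To make the enzyme and tolerance settings above concrete, here is a minimal sketch of in-silico tryptic digestion with a missed-cleavage limit and a minimum peptide length, plus a parts-per-million tolerance check. It is not the code used by Sequest, Mascot or X!Tandem. The cleavage rule (cut after K or R unless the next residue is P) is a common convention assumed here, and the function names and the example sequence are hypothetical.

```python
def cleavage_pieces(sequence):
    """Split a protein after K or R, except when the next residue is P.

    The 'not before proline' rule is an assumption; search engines
    differ in how they define trypsin specificity.
    """
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def tryptic_peptides(sequence, max_missed=1, min_length=6):
    """Enumerate peptides with at most max_missed internal cleavage sites."""
    pieces = cleavage_pieces(sequence)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


def within_ppm(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Return True if the relative mass error is within tol_ppm."""
    error_ppm = abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return error_ppm <= tol_ppm


example = "MASLTEKGAVLDRPEELLKTW"  # made-up sequence for illustration
one_missed = tryptic_peptides(example, max_missed=1)
two_missed = tryptic_peptides(example, max_missed=2)
print(one_missed)
print(len(two_missed))
```

Raising the limit from one to two missed cleavages, as in the alternative database searches, enlarges the set of candidate peptides per protein and therefore the search space.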
Sequences of common contaminants, including the trypsin used as the protease, were appended to the databases, and the search engine was installed locally.
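The target-decoy approach described above can be sketched as follows. The decoy construction by sequence reversal follows the protocol. The FDR estimate used here (decoy hits divided by target hits above a score cutoff) is one common formulation and is an assumption, since the protocol does not give the exact formula. The names `reverse_decoys` and `score_threshold_at_fdr`, the `REV_` prefix and the toy scores are hypothetical.

```python
def reverse_decoys(targets, prefix="REV_"):
    """Build decoy entries by reversing each target sequence."""
    return {prefix + accession: seq[::-1] for accession, seq in targets.items()}


def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms is a list of (score, is_decoy) pairs; higher scores are better.
    FDR is estimated as decoys / targets among matches at or above the
    cutoff (an assumed formulation). Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


decoy_db = reverse_decoys({"P1": "MASLTEK"})

# Toy data: 200 target matches scoring 101..300 and three decoy matches.
psms = [(float(score), False) for score in range(101, 301)]
psms += [(150.5, True), (120.5, True), (110.5, True)]
cutoff = score_threshold_at_fdr(psms, max_fdr=0.01)
print(decoy_db, cutoff)
```

Matches scoring at or above the returned cutoff would be retained; in practice the decoy matches themselves are removed from the final list.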


Akhilesh Pandey, Johns Hopkins University
Akhilesh Pandey, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205 USA (lab head)

Submission Date


Publication Date



    Pinto SM, Manda SS, Kim MS, Taylor K, Selvan LD, Balakrishnan L, Subbannayya T, Yan F, Prasad TS, Gowda H, Lee C, Hancock WS, Pandey A. Functional annotation of proteome encoded by human chromosome 22. J Proteome Res. 2014 Mar 26 PubMed: 24669763

    Kim MS, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S, Thomas JK, Muthusamy B, Leal-Rojas P, Kumar P, Sahasrabuddhe NA, Balakrishnan L, Advani J, George B, Renuse S, Selvan LD, Patil AH, Nanjappa V, Radhakrishnan A, Prasad S, Subbannayya T, Raju R, Kumar M, Sreenivasamurthy SK, Marimuthu A, Sathe GJ, Chavan S, Datta KK, Subbannayya Y, Sahu A, Yelamanchi SD, Jayaram S, Rajagopalan P, Sharma J, Murthy KR, Syed N, Goel R, Khan AA, Ahmad S, Dey G, Mudgal K, Chatterjee A, Huang TC, Zhong J, Wu X, Shaw PG, Freed D, Zahari MS, Mukherjee KK, Shankar S, Mahadevan A, Lam H, Mitchell CJ, Shankar SK, Satishchandra P, Schroeder JT, Sirdeshmukh R, Maitra A, Leach SD, Drake CG, Halushka MK, Prasad TS, Hruban RH, Kerr CL, Bader GD, Iacobuzio-Donahue CA, Gowda H, Pandey A. A draft map of the human proteome. Nature. 2014 May 29;509(7502):575-81 PubMed: 24870542