Early Pleistocene enamel proteome sequences from Dmanisi resolve Stephanorhinus phylogeny
Ancient DNA (aDNA) sequencing has enabled reconstruction of speciation, migration, and admixture events for extinct taxa. Outside the permafrost, however, irreversible post-mortem degradation has so far limited aDNA recovery to the past ~0.5 million years (Ma). In contrast, multiple analyses have suggested the presence of protein residues in Cretaceous fossil remains. Similarly, tandem mass spectrometry (MS) has allowed sequencing of ~1.5 Ma old collagen type I (COL1), though with limited phylogenetic use. In the absence of molecular evidence, the speciation of several extinct Early and Middle Pleistocene species remains contentious. In this study, we address the phylogenetic relationships of the Eurasian Pleistocene Rhinocerotidae using a ~1.77 Ma old dental enamel proteome of a Stephanorhinus specimen from the Dmanisi archaeological site in Georgia (South Caucasus). Molecular phylogenetic analyses place the Dmanisi Stephanorhinus as a sister group to the clade formed by the woolly rhinoceros (Coelodonta antiquitatis) and Merck’s rhinoceros (S. kirchbergensis). We show that Coelodonta evolved from an early Stephanorhinus lineage and that the latter includes at least two distinct evolutionary lines. As such, the genus Stephanorhinus is currently paraphyletic and requires systematic revision. We demonstrate that Early Pleistocene dental enamel proteome sequencing overcomes the limits of ancient collagen- and aDNA-based phylogenetic inference. It also provides additional information about the sex and taxonomic assignment of the specimens analysed. Dental enamel, the hardest tissue in vertebrates, is highly abundant in the fossil record. Our findings reveal that palaeoproteomic investigation of this material can push biomolecular investigation further back into the Early Pleistocene.
Sample Processing Protocol
All sample preparation procedures for palaeoproteomic analysis were conducted in laboratories dedicated to the analysis of ancient DNA and ancient proteins, in clean rooms fitted with filtered ventilation and positive pressure, in line with recent recommendations for ancient protein analysis. A negative “extraction blank” control sample, not containing any starting material, was prepared, processed and analysed together with each batch of ancient samples. The external surface of bone and dentine samples was gently removed, and the remaining material was subsequently powdered. Fragments of enamel, occasionally mixed with small amounts of dentine, were removed from teeth with a cutting disc and subsequently crushed to a rough powder. Ancient protein residues were extracted from mineralised samples of 23 large fauna specimens from Dmanisi using three different extraction protocols, hereinafter referred to as “A”, “B” and “C”:
EXTRACTION PROTOCOL A - FASP. Tryptic peptides were generated using a filter-aided sample preparation (FASP) approach (doi:10.1038/nmeth.1322), as previously performed on ancient samples (doi:10.1111/zoj.12084).
EXTRACTION PROTOCOL B - GuHCl SOLUTION AND DIGESTION. Bone or dentine powder was demineralised in 1 mL 0.5 M EDTA pH 8.0. After removal of the supernatant, all demineralised pellets were re-suspended in a 300 µL solution containing 2 M guanidine hydrochloride (GuHCl, Thermo Scientific), 100 mM Tris pH 8.0, 20 mM 2-chloroacetamide (CAA) and 10 mM tris(2-carboxyethyl)phosphine (TCEP) in ultrapure H2O (doi:10.1038/nmeth.2834, 10.1002/anie.201713020). Mass spectrometry-grade rLysC (Promega P/N V1671) enzyme, 0.2 µg, was added and the samples were then incubated for 3-4 hours at 37 °C under agitation. Samples and negative controls were subsequently diluted to 0.6 M GuHCl, and 0.8 µg of mass spectrometry-grade Trypsin (Promega P/N V5111) was added. The entire amount of extracted proteins was digested.
Next, samples and negative controls were incubated overnight under mechanical agitation at 37 °C. On the following day, samples were acidified, and the tryptic peptides were immobilised on Stage-Tips (doi:10.1021/pr200721u).
EXTRACTION PROTOCOL C - DIGESTION-FREE ACID DEMINERALISATION. Approximately 250 mg of dental enamel powder was demineralised in 1.2 M HCl at room temperature, after which the solubilised protein residues were directly cleaned, concentrated and immobilised on Stage-Tips, as described above. The sample prepared on Stage-Tip “1217” was processed with 10% TFA instead of 1.2 M HCl. All the other parameters and procedures were identical to those used for all the other samples extracted with protocol C.
Samples were analysed by nanoflow liquid chromatography coupled to tandem mass spectrometry (nanoLC-MS/MS) on an EASY-nLC™ 1000 or 1200 system connected to a Q-Exactive, a Q-Exactive Plus, or a Q-Exactive HF (Thermo Scientific, Bremen, Germany) mass spectrometer using a 50 cm column (75 μm inner diameter) packed with 1.9 μm C18 beads (Dr. Maisch, Germany).
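The dilution step in protocol B (from 2 M to 0.6 M GuHCl, starting from 300 µL) can be checked with the usual C1·V1 = C2·V2 relation. The following is a minimal sketch, not part of the original protocol: the function name is hypothetical, volumes are assumed to be additive, the pellet volume is ignored, and the diluent itself is not specified in the text.

```python
def diluent_volume_ul(stock_molar: float, target_molar: float,
                      start_volume_ul: float) -> float:
    """Return the diluent volume (in µL) needed to reach target_molar.

    Uses C1 * V1 = C2 * V2 and assumes that volumes are additive.
    """
    if not 0 < target_molar <= stock_molar:
        raise ValueError("target must be positive and not exceed the stock")
    final_volume_ul = stock_molar * start_volume_ul / target_molar
    return final_volume_ul - start_volume_ul


# Values taken from protocol B: 300 µL at 2 M GuHCl, diluted to 0.6 M.
start_ul = 300.0
added_ul = diluent_volume_ul(2.0, 0.6, start_ul)
print(f"diluent to add: ~{added_ul:.0f} µL")
print(f"final volume:   ~{start_ul + added_ul:.0f} µL")
```

Under these assumptions, roughly 700 µL of diluent brings the 300 µL extract to a final volume of about 1 mL at 0.6 M GuHCl.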
Data Processing Protocol
Raw data files generated during MS/MS spectral acquisition were searched on a workstation using MaxQuant, version 220.127.116.11, and the commercial tool PEAKS, version 7.5. A two-stage peptide-spectrum matching approach was adopted. For each MaxQuant and PEAKS search, enzymatic digestion was set to “unspecific” and the following variable modifications were included: oxidation (M), deamidation (NQ), N-term Pyro-Glu (Q), N-term Pyro-Glu (E), hydroxylation (P), phosphorylation (S). The error tolerance was set to 5 ppm for the precursor and, for the fragment ions, to 20 ppm in MaxQuant and 0.05 Da in PEAKS. For searches of data generated from sample fractions partially or exclusively digested with trypsin, another MaxQuant and PEAKS search was conducted using the “enzyme” parameter set to “Trypsin/P”. Carbamidomethylation (C) was set: (i) as a fixed modification, for searches of data generated from sets of sample fractions exclusively digested with trypsin, or (ii) as a variable modification, for searches of data generated from sets of sample fractions partially digested with trypsin. For searches of data generated from sample fractions that were not digested, carbamidomethylation (C) was not included as a modification, either fixed or variable.
Raw files were initially searched (MQ1) against a target/reverse database of collagen and enamel proteins retrieved from the UniProt and NCBI Reference Sequence Database (RefSeq) archives, taxonomically restricted to mammalian species. A database of partial “COL1A1” and “COL1A2” sequences from cervid species was also included. The results from this preliminary analysis were used for a first, provisional reconstruction of protein sequences. For specimens whose dataset showed a narrower, though not fully resolved, initial taxonomic placement, a second MaxQuant search (MQ2) was performed using a new protein database taxonomically restricted to the “order” taxonomic rank as determined after MQ1.
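Two of the search settings above lend themselves to a short illustration: the fragment tolerance is relative in MaxQuant (20 ppm) but absolute in PEAKS (0.05 Da), and the handling of carbamidomethylation (C) depends on how a set of sample fractions was digested. The sketch below is illustrative only. The function names and the digestion labels are hypothetical, and nothing here reproduces the actual MaxQuant or PEAKS configuration files.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Convert a relative tolerance in ppm to an absolute one in Da at a given m/z."""
    return mz * ppm / 1e6


def da_to_ppm(mz: float, da: float) -> float:
    """Convert an absolute tolerance in Da to a relative one in ppm at a given m/z."""
    return da / mz * 1e6


def carbamidomethylation_mode(digestion: str) -> str:
    """Map the digestion state of a fraction set to the carbamidomethylation (C) setting.

    The labels are hypothetical shorthand for the three cases in the text.
    """
    modes = {
        "trypsin_only": "fixed",       # fractions exclusively digested with trypsin
        "partly_trypsin": "variable",  # fractions partially digested with trypsin
        "undigested": "none",          # digestion-free fractions
    }
    if digestion not in modes:
        raise ValueError(f"unknown digestion state: {digestion!r}")
    return modes[digestion]


# Compare the two fragment tolerances across a few example m/z values.
for mz in (500.0, 1000.0, 2500.0):
    print(f"m/z {mz:7.1f}: 20 ppm = {ppm_to_da(mz, 20.0):.4f} Da, "
          f"0.05 Da = {da_to_ppm(mz, 0.05):.1f} ppm")
```

The two fragment tolerances coincide only at m/z 2500. Below that value, 0.05 Da is the more permissive setting, so the two tools do not apply strictly equivalent fragment windows.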
For the MQ2 matching of the MS/MS spectra from the Stephanorhinus specimen (16635), partial sequences of serum albumin and enamel proteins from Sumatran (Dicerorhinus sumatrensis), Javan (Rhinoceros sondaicus), Indian (Rhinoceros unicornis), woolly (Coelodonta antiquitatis), and black rhinoceros (Diceros bicornis) were also added to the protein database. All the protein sequences from these species were reconstructed from each species’ draft genome sequences (Dalen and Gilbert, unpublished data, Supplementary Information). The datasets re-analysed with the MQ2 “advanced” search were also processed with the PEAKS software using the entire workflow (PEAKS de novo to PEAKS SPIDER) in order to detect hitherto unreported single amino acid polymorphisms (SAPs). Any amino acid substitution detected by the “SPIDER” homology search algorithm was validated by repeating the MaxQuant search (MQ3). In MQ3, the protein database used for MQ2 was modified to include the amino acid substitutions detected by the “SPIDER” algorithm.
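The MQ3 step amounts to editing database sequences so that they carry the substitutions proposed by SPIDER before the search is repeated. A minimal sketch of that idea follows. The helper names, the FASTA header and the example sequence are all invented for illustration and are not taken from the study's database; the text does not say whether variant entries replaced the MQ2 entries or were added alongside them.

```python
def apply_saps(sequence: str, saps: list[tuple[int, str, str]]) -> str:
    """Return a copy of sequence with single amino acid substitutions applied.

    Each substitution is (position, expected_residue, new_residue), with
    1-based positions. The expected residue is checked first so that a
    mismatched coordinate raises an error instead of silently corrupting
    the sequence.
    """
    residues = list(sequence)
    for position, expected, replacement in saps:
        index = position - 1
        if not 0 <= index < len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[index] != expected:
            raise ValueError(
                f"expected {expected} at position {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


def fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Made-up ten-residue sequence and substitutions, for illustration only.
reference = "MKTLILLSAP"
variant = apply_saps(reference, [(3, "T", "A"), (9, "A", "S")])
print(fasta_record("example_protein_variant (hypothetical)", variant), end="")
```

Checking the expected residue before substituting is a deliberate choice: an off-by-one position fails loudly rather than producing a plausible but wrong database entry.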