Re-annotated Nicotiana benthamiana gene models for improved proteomics and reverse genetics
Nicotiana benthamiana is an important model organism and representative of the Solanaceae (Nightshade) family. N. benthamiana has a complex ancient allopolyploid genome with 19 chromosomes and an estimated genome size of 3.1 Gb. Several draft assemblies of the N. benthamiana genome have been generated; however, many of the gene models in these draft assemblies appear incorrect. Here we present a nearly non-redundant database of improved N. benthamiana gene models based on gene annotations from well-annotated genomes in the Nicotiana genus. We show that the new predicted proteome is more complete than the previous proteomes and more sensitive and accurate in proteomics applications, while maintaining a reasonably low gene number (~43,000). As a proof of concept, we use this proteome to compare the leaf extracellular (apoplastic) proteome to a total extract of leaves. Several gene families are more abundant in the apoplast. For one of these apoplastic protein families, the subtilases, we present a phylogenetic analysis illustrating the utility of this database. Besides proteome annotation, this database will aid the research community with improved target gene selection for genome editing and off-target prediction for gene silencing.
Sample Processing Protocol
Protein digestion and sample clean-up - For each apoplastic fluid (AF) and total extract (TE) sample, a volume corresponding to 15 µg of protein (based on Bradford assay) was taken. Dithiothreitol (DTT) was added to a concentration of 40 mM, and the volume was adjusted to 250 µL with MS-grade water (Sigma). Proteins were precipitated by the addition of 4 volumes of ice-cold acetone, followed by a 1 hr incubation at -20°C and subsequent centrifugation (18 000× g, 4°C, 20 min). The pellet was dried at room temperature (RT) for 5 min and resuspended in 25 µL 8 M urea, followed by a second precipitation with chloroform/methanol. The pellet was dried at RT for 5 min and resuspended in 25 µL 8 M urea. Protein reduction and alkylation were achieved by sequential incubation with DTT (final 5 mM, 30 min, RT) and iodoacetamide (IAM; final 20 mM, 30 min, RT, dark). Non-reacted IAM was quenched by raising the DTT concentration to 25 mM. Protein digestion was started by addition of 1000 ng LysC (Wako Chemicals GmbH) and incubation for 3 hr at 37°C while gently shaking (800 rpm). The samples were then diluted with ammonium bicarbonate (ABC, final concentration 80 mM) to a final urea concentration of 1 M. 1000 ng sequencing-grade trypsin (Promega) was added and the samples were incubated overnight at 37°C while gently shaking (800 rpm). Protein digestion was stopped by addition of formic acid (FA, final 5% v/v). Tryptic digests were desalted on home-made C18 StageTips (Rappsilber et al., 2007) by passing the solution over 2-disc StageTips in 150 µL aliquots by centrifugation (600-1200× g). Bound peptides were washed with 0.1% FA and subsequently eluted with 80% acetonitrile (ACN). Samples were dried using a vacuum concentrator (Eppendorf), and the peptides were resuspended in 20 µL 0.1% FA solution. LC-MS/MS - The samples were analysed as described in Grosse-Holz et al. (2018).
Briefly, samples were run on an Orbitrap Elite instrument (Thermo) (Michalski et al., 2011) coupled to an EASY-nLC 1000 liquid chromatography (LC) system (Thermo) operated in the one-column mode. Peptides were directly loaded on a fused silica capillary (75 µm × 30 cm) with an integrated PicoFrit emitter (New Objective) analytical column packed in-house with Reprosil-Pur 120 C18-AQ 1.9 µm resin (Dr. Maisch), taking care not to exceed the set pressure limit of 980 bar (usually at around 0.5-0.8 µL/min). The analytical column was encased by a column oven (Sonation; 45°C during data acquisition) and attached to a nanospray flex ion source (Thermo). Peptides were separated on the analytical column by running a 140-min gradient of solvent A (0.1% FA in water; Ultra-Performance Liquid Chromatography (UPLC) grade) and solvent B (0.1% FA in ACN; UPLC grade) at a flow rate of 300 nL/min (gradient: start with 7% B; gradient 7% to 35% B for 120 min; gradient 35% to 100% B for 10 min and 100% B for 10 min). The mass spectrometer was operated using Xcalibur software (version 2.2 SP1.48) in positive ion mode. Precursor ion scanning was performed in the Orbitrap analyzer (FTMS; Fourier Transform Mass Spectrometry) in the scan range of m/z 300-1800 and at a resolution of 60000 with the internal lock mass option turned on (lock mass was 445.120025 m/z, polysiloxane) (Olsen et al., 2005). Product ion spectra were recorded in a data-dependent manner in the ion trap (ITMS) in a variable scan range and at a rapid scan rate. The ionization potential was set to 1.8 kV. Peptides were analysed by a repeating cycle of a full precursor ion scan (1.0 × 10^6 ions or 50 ms) followed by 15 product ion scans (1.0 × 10^4 ions or 50 ms). Peptides exceeding a threshold of 500 counts were selected for tandem mass (MS2) spectrum generation. Collision-induced dissociation (CID) energy was set to 35% for the generation of MS2 spectra.
Dynamic ion exclusion was set to 60 seconds with a maximum list of excluded ions consisting of 500 members and a repeat count of one. Ion injection time prediction, preview mode for the Fourier transform mass spectrometer (FTMS, the orbitrap), monoisotopic precursor selection and charge state screening were enabled. Only charge states higher than 1 were considered for fragmentation.
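As an illustration only, the 140-min solvent gradient described in the LC-MS/MS settings above can be written as a piecewise-linear function of time. This is a minimal Python sketch; the names GRADIENT and percent_b are hypothetical and not part of the instrument method, and strictly linear ramps between the stated breakpoints are an assumption.

```python
# Breakpoints (time in min, % solvent B) taken from the gradient description:
# 7% B at start, 7-35% B over 120 min, 35-100% B over 10 min, 100% B for 10 min.
GRADIENT = [(0.0, 7.0), (120.0, 35.0), (130.0, 100.0), (140.0, 100.0)]


def percent_b(t_min: float) -> float:
    """Return % solvent B at time t_min, assuming linear ramps between breakpoints."""
    if t_min < GRADIENT[0][0] or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the 0-140 min gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current segment.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")
```

For example, percent_b(60) gives the composition halfway through the shallow 7-35% ramp, where most peptides would be expected to elute.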
Data Processing Protocol
Peptide and Protein Identification - Peptide spectra were searched in MaxQuant (version 220.127.116.11) using the Andromeda search engine (Cox et al., 2011) with default settings and with label-free quantification and match-between-runs activated (Cox and Mann, 2008; Cox et al., 2014) against the databases specified in the text, including a known contaminants database. Included modifications were carbamidomethylation (static) and oxidation (M) and acetylation (Protein N-term) (both dynamic). Precursor mass tolerance was set to ±20 ppm (first search) and ±4.5 ppm (main search), while the MS/MS match tolerance was set to ±0.5 Da. The peptide spectrum match FDR and the protein FDR were set to 0.01 (based on a target-decoy approach) and the minimum peptide length was set to 7 amino acids. Protein quantification was performed in MaxQuant (Tyanova et al., 2016), based on unique and razor peptides including all modifications. Proteomics processing in R - Identified protein groups were filtered to remove reverse hits, contaminant proteins and those only identified by matching, and only those protein groups identified in at least 3 out of 4 biological replicates of either AF or TE were selected. The LFQ values were log2 transformed, and missing values were imputed using a minimal distribution as implemented in imputeLCMD (v2.0) (Lazar, 2015). A moderated t-test as implemented in Limma (v3.34.3) (Ritchie et al., 2015; Phipson et al., 2016), with p-values adjusted using the Benjamini–Hochberg (BH) procedure, was used to identify protein groups significantly differing between AF and TE. Bona fide apoplastic protein groups were those significantly enriched in AF samples (p≤0.01, log2 fold change ≥1.5) and those only detected in AF. Protein groups significantly depleted in AF samples (p≤0.01, log2 fold change ≤-1.5) and those only detected in TE were considered intracellular. The remainder was considered both apoplastic and intracellular.
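The replicate filter and the three-way classification described above can be sketched as follows. This is an illustrative Python sketch, not the R code used for the analysis. The function names are hypothetical. It assumes that the log2 fold change is defined as AF minus TE, that the p-value passed in is the BH-adjusted one, and that "only detected in AF" means no TE replicate has a value.

```python
from typing import Optional, Sequence


def passes_replicate_filter(af: Sequence[Optional[float]],
                            te: Sequence[Optional[float]],
                            min_reps: int = 3) -> bool:
    """Keep a protein group quantified in >= min_reps replicates of AF or of TE.

    Missing LFQ values are represented as None (an assumption of this sketch).
    """
    n_af = sum(v is not None for v in af)
    n_te = sum(v is not None for v in te)
    return n_af >= min_reps or n_te >= min_reps


def classify(log2_fc: float, p_adj: float, n_af: int, n_te: int,
             p_cut: float = 0.01, fc_cut: float = 1.5) -> str:
    """Assign 'apoplastic', 'intracellular' or 'both' using the stated thresholds.

    n_af and n_te are the numbers of replicates with a measured value.
    """
    if n_af > 0 and n_te == 0:
        return "apoplastic"      # only detected in AF
    if n_te > 0 and n_af == 0:
        return "intracellular"   # only detected in TE
    if p_adj <= p_cut and log2_fc >= fc_cut:
        return "apoplastic"      # significantly enriched in AF
    if p_adj <= p_cut and log2_fc <= -fc_cut:
        return "intracellular"   # significantly depleted in AF
    return "both"
```

In this sketch the presence/absence rule is checked before the statistical thresholds, so a protein group with no measured TE values is called apoplastic regardless of the imputed fold change.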
Majority proteins were annotated with SignalP, PFAM, MEROPS (v12) (Rawlings et al., 2018), GO, and UniProt keyword identifiers. A BH-adjusted hypergeometric test was used to identify those terms that were either depleted or enriched (p≤0.05) in the bona fide AF protein groups as compared to bona fide AF-depleted proteins or protein groups present in both the AF and TE.
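A term-enrichment test of this kind can be sketched with the Python standard library alone. The sketch below is illustrative and is not the code used for the analysis: the function names are hypothetical, and the choice of background set (the N and K arguments) is left to the caller, since the exact background is only described in prose above.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enriched_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail p-value P(X >= k).

    N: protein groups in the background, K: background groups carrying the term,
    n: groups in the set of interest, k: groups in that set carrying the term.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def hypergeom_depleted_p(k: int, n: int, K: int, N: int) -> float:
    """Lower-tail p-value P(X <= k), for testing depletion of a term."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1)) / comb(N, n)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Terms would then be reported as enriched or depleted when the corresponding adjusted p-value is at or below 0.05, matching the threshold stated above.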
Kourelis J, Kaschani F, Grosse-Holz FM, Homma F, Kaiser M, van der Hoorn RAL. A homology-guided, genome-based proteome for improved proteomics in the alloploid Nicotiana benthamiana. BMC Genomics. 2019; 20(1):722. PubMed: 31585525