Comment[ArrayExpressAccession] E-GEOD-41749
MAGE-TAB Version 1.1
Public Release Date 2013-04-17
Investigation Title Species- and condition-specific adaptation of the transcriptional landscapes in Candida albicans and Candida dubliniensis
Comment[Submitted Name] Species- and condition-specific adaptation of the transcriptional landscapes in Candida albicans and Candida dubliniensis
Experiment Description Although Candida albicans and Candida dubliniensis are most closely related, the two species behave significantly differently with respect to morphogenesis and virulence. In order to gain further insight into the divergent routes for morphogenetic adaptation in both species, we investigated qualitative along with quantitative differences in the transcriptomes of both organisms by cDNA deep sequencing. Following genome-associated assembly of sequence reads we were able to generate experimentally verified databases containing 6016 and 5972 genes for C. albicans and C. dubliniensis, respectively. About 95% of the transcriptionally active regions (TARs) contain open reading frames while the remaining TARs most likely represent non-coding RNAs. Comparison of our annotations with publicly available gene models for C. albicans and C. dubliniensis confirmed approximately 95% of already predicted genes, but also revealed previously unknown TARs in both species. Qualitative cross-species analysis of these databases revealed, in addition to 5802 orthologs, 399 and 49 species-specific protein-coding genes for C. albicans and C. dubliniensis, respectively. Furthermore, quantitative transcriptional profiling using RNA-Seq revealed significant differences in the expression of orthologs across both species. We defined a core subset of 84 hyphal-specific genes required for both species, as well as a set of 42 genes that seem to be specifically induced during hyphal morphogenesis in C. albicans. Species-specific adaptation in C. albicans and C.
dubliniensis is governed by individual genetic repertoires but also by altered regulation of conserved orthologs on the transcriptional level. We investigated qualitative along with quantitative differences in the transcriptomes of both organisms by cDNA deep sequencing. In a first step, we reevaluated the in silico predicted gene models by collecting experimental data using FLX technology for sequencing strand-specific and normalized cDNA libraries derived from blastospores and hyphae. In the second step, quantitative RNA-Seq (GAIIx) was applied to C. albicans hyphal cells and C. dubliniensis blastospore and hyphal cells to complement reevaluation of the gene models with FLX data as well as to measure differential gene expression across the species with two biological replicates.
Term Source Name ArrayExpress EFO
Term Source File http://www.ebi.ac.uk/arrayexpress/ http://www.ebi.ac.uk/efo/efo.owl
Person Last Name Sohn Grumaz Lorenz Stevens Lindemann Schoeck Retey Rupp
Person First Name Kai Christian Stefan Philip Elena Ulrike Julia Steffen
Person Email chg@igb.fhg.de
Person Affiliation Fraunhofer IGB
Person Address MBT, Fraunhofer IGB, Nobelstr. 12, Stuttgart, Germany
Person Roles submitter
Protocol Name P-GSE41749-3 P-GSE41749-10 P-GSE41749-6 P-GSE41749-2 P-GSE41749-9 P-GSE41749-8 P-GSE41749-5 P-GSE41749-7 P-GSE41749-4 P-GSE41749-1
Protocol Description Basecalls for both sequencing technologies were performed using standard software supplied by either Roche for FLX sequencing or by Illumina for GAIIx sequencing. For FLX reads the barcodes at the 5'-end (C. albicans: 5'-CGAGAC-3' and C. dubliniensis: 5'-CGTCGT-3') and the remaining sequences from the B adapter at the 3'-end (5'-CTGAGACTGCCAAGGCACACAGGGGATAGG-3') were trimmed. The processed reads were then aligned with BLASTN (ncbi-blast-2.2.23).
Only unique and non-spliced reads with at least 90% identity over the whole length of the read passed the quality filter. For intron-spanning reads we also performed BLAT alignments (BLAT Suite 0.34). GAIIx reads were mapped with their full length of 75 sequenced bases with SOAP2 (version 2.20) using a seed length of 40 bp and allowing 5 mismatches, with the exception of sample C.dubliniensis_GAIIx_YPS_blastospores-2.fastq, where the first base of each sequence was trimmed, resulting in 74 bp reads. As SOAP2 is not compatible with intron-spanning reads, we mapped the remaining unmapped reads with TOPHAT (version 1.3.1). Genome_build: C. albicans: Assembly21 at CGD (latest update 2010-06-15, http://www.candidagenome.org/download/sequence/Assembly21); C. dubliniensis: Assembly CD36 at NCBI (latest update 2010-04-09, ftp://ftp.ncbi.nih.gov/genomes/Fungi/Candida_dubliniensis_CD36) Supplementary_files_format_and_content: BLASTN/BLAT output files were generated in tabular format, then converted (shell script) into *.gff3 and finally into *.bedGraph. Standard SOAP output files were converted similarly. Standard TOPHAT output files were converted with BEDtools (bam-to-bed) into *.bed file format. Scores represent coverage of raw mapped reads. Basecalls were performed using standard software supplied by Illumina for HiSeq2000 sequencing. Reads generated with HiSeq2000 were mapped with TOPHAT (version 1.4.1) using default settings. Genome_build: C. albicans: Assembly21 at CGD (latest update 2010-06-15, http://www.candidagenome.org/download/sequence/Assembly21); C. dubliniensis: Assembly CD36 at NCBI (latest update 2010-04-09, ftp://ftp.ncbi.nih.gov/genomes/Fungi/Candida_dubliniensis_CD36) Supplementary_files_format_and_content: TOPHAT output files were converted with BEDtools (bam-to-bed) into *.bed file format. Non-junction reads were converted from *.bed files into *.bedGraph files by a custom Python script.
Scores represent coverage of raw mapped reads. The cDNA libraries for GAIIx sequencing were prepared according to Illumina's mRNA-Seq Sample Prep Kit protocol. Each of the six samples was loaded on one lane. Thus, the sequencing run was performed on a fully loaded flow cell with single-end 76 bp reads and resulted in approximately 30 million reads per sample. After 4 h of incubation under the corresponding conditions, C. dubliniensis and C. albicans cells were harvested by centrifugation and immediately frozen with liquid nitrogen. Disruption was carried out using a Mixer Mill MM 200 (RETSCH) with a shaking frequency of 30/s. The resulting powder was resuspended in lysis buffer RLT (QIAGEN, Hilden, Germany) supplemented with 0.01% v/v of β-mercaptoethanol. The extraction of total RNA was performed according to QIAGEN's Mechanical Disruption Protocol for the isolation of total RNA from yeast using the RNeasy Midi Kit. After precipitation of the RNA by addition of 0.1 volume of 3M NaAc pH 5.3 and 2.5 volume of 100% EtOH, the concentration and integrity of total RNA were analyzed on the Agilent 2100 Bioanalyzer using the RNA Nano kit. For the normalized cDNA libraries for FLX Titanium sequencing, equal amounts of approximately 25 µg of total RNA per condition were pooled together per species. To remove genomic contaminants, another purification step was performed using QIAGEN's RNeasy Mini Plus Kit. From the pooled total RNA, poly(A)+-RNA was prepared according to standard protocols (Ref.). First-strand cDNA synthesis was carried out with an N6 randomized primer. Then 454 adapters A (5'-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3') and B (5'-CTGAGACTGCCAAGGCACACAGGGGATAGG-3') were ligated to the 5' and 3' ends of the cDNAs, respectively, to obtain strand-specificity of the transcripts. Additionally, the C. albicans and the C.
dubliniensis samples were barcoded at the 5'-end of the fragments with 5'-CGAGAC-3' and 5'-CGTCGT-3', respectively. The cDNAs were amplified with 16 cycles of PCR. Normalization was carried out by one cycle of denaturation and reassociation of the cDNA. Reassociated ds-cDNA was separated from the remaining ss-cDNA (normalized cDNA) by passing the mixture over a hydroxylapatite column. After hydroxylapatite chromatography, the ss-cDNA was amplified by 9 PCR cycles. For Titanium sequencing the cDNA in the size range of 500–700 bp was eluted from preparative agarose gels. Aliquots of the size-fractionated cDNAs were analyzed by capillary electrophoresis with the Shimadzu MultiNA microchip system. The two normalized and barcoded cDNA libraries were pooled in equal amounts and were sequenced on a full flow cell with the Titanium chemistry, resulting in about 500,000 reads per species. After 4 h of incubation under the corresponding conditions, C. dubliniensis and C. albicans cells were harvested by centrifugation and immediately frozen with liquid nitrogen. Disruption was carried out using a Mixer Mill MM 200 (RETSCH) with a shaking frequency of 30/s. The resulting powder was resuspended in lysis buffer RLT (QIAGEN, Hilden, Germany) supplemented with 0.01% v/v of β-mercaptoethanol. The extraction of total RNA was performed according to QIAGEN's Mechanical Disruption Protocol for the isolation of total RNA from yeast using the RNeasy Midi Kit. After precipitation of the RNA by addition of 0.1 volume of 3M NaAc pH 5.3 and 2.5 volume of 100% EtOH, the concentration and integrity of total RNA were analyzed on the Agilent 2100 Bioanalyzer using the RNA Nano kit. After 4 h of incubation under the corresponding conditions, C. dubliniensis and C. albicans cells were harvested by centrifugation and immediately frozen with liquid nitrogen.
Disruption was carried out using a Mixer Mill MM 200 (RETSCH) with a shaking frequency of 30/s. The resulting powder was resuspended in lysis buffer RLT (QIAGEN, Hilden, Germany) supplemented with 0.01% v/v of β-mercaptoethanol. The extraction of total RNA was performed according to QIAGEN's Mechanical Disruption Protocol for the isolation of total RNA from yeast using the RNeasy Midi Kit. After precipitation of the RNA by addition of 0.1 volume of 3M NaAc pH 5.3 and 2.5 volume of 100% EtOH, the concentration and integrity of total RNA were analyzed on the Agilent 2100 Bioanalyzer using the RNA Nano kit. The cDNA libraries for HiSeq2000 sequencing were prepared according to Illumina's TruSeq RNA Sample Prep Kit v2 protocol. The sequencing run was performed on a single-read flow cell with 50 bp reads and resulted in approximately 10-15 million reads per sample. C. albicans_YPD-1, C. albicans_YPD-2, C. albicans_YPD-3 samples were grown in YPD at 30°C for 4 hours. C. albicans_YPS-3 and C. dubliniensis_YPS-3 samples were grown in YPD supplemented with 10% fetal calf serum at 37°C for 4 hours. C. dubliniensis_WS-3 sample was grown in water supplemented with 10% fetal calf serum at 37°C for 4 hours. Cells were grown in YPD supplemented with 10% fetal calf serum at 37°C for 4 hours. Cells were grown in water supplemented with 10% fetal calf serum at 37°C for 4 hours. Blastospore cells were grown in YPD supplemented with 10% fetal calf serum at 37°C for 4 hours while hyphal cells were induced in water supplemented with 10% fetal calf serum at 37°C for 4 hours. Blastospore cells were grown in YPD at 30°C for 4 hours while hyphal cells were induced in YPD supplemented with 10% fetal calf serum at 37°C for 4 hours.
Protocol Type normalization data transformation protocol normalization data transformation protocol nucleic acid library construction protocol nucleic acid library construction protocol nucleic acid library construction protocol growth protocol growth protocol growth protocol growth protocol growth protocol
Experimental Factor Name STRAIN OR LINE GROWTH CONDITION ORGANISM MORPHOLOGY
Experimental Factor Type strain or line growth condition organism morphology
Publication Title Species and condition specific adaptation of the transcriptional landscapes in Candida albicans and Candida dubliniensis.
Publication Author List Grumaz C, Lorenz S, Stevens P, Lindemann E, Schöck U, Retey J, Rupp S, Sohn K
PubMed ID 23547856
Publication DOI 10.1186/1471-2164-14-212
Comment[SecondaryAccession] GSE41749
Comment[GEOReleaseDate] 2013-04-17
Comment[ArrayExpressSubmissionDate] 2012-10-22
Comment[GEOLastUpdateDate] 2013-04-18
Comment[AEExperimentType] RNA-seq of coding RNA
Comment[SecondaryAccession] SRP016578
Comment[SequenceDataURI] http://www.ebi.ac.uk/ena/data/view/SRR604746-SRR604753,SRR771361-SRR771366
SDRF File E-GEOD-41749.sdrf.txt