ANNOUNCEMENT
Saprochaete suaveolens is a fermentative yeast from the Magnusiomyces/Saprochaete clade (phylum Ascomycota, subphylum Saccharomycotina). It has been isolated from nutrient-rich sources, including industrial wastes, brewery water, process water from wheat-starch production plants, effluent milk, maize mash, soybean flakes, figs, and dragon fruits, and some strains were isolated from patients with pulmonary infections (1–3). It produces large amounts of volatile organic compounds with an intense fruity odor (3–5).
The S. suaveolens strain NRRL Y-17571 was originally isolated from water in a brewery (1). Its genome was assembled from a combination of long reads (MinION, Oxford Nanopore Technologies) and short reads (HiSeq 2000, Illumina). DNA was isolated from a culture grown overnight in yeast extract-peptone-dextrose (YPD) medium (1% [wt/vol] yeast extract, 2% [wt/vol] peptone, 1% [wt/vol] glucose) at 28°C using a standard protocol and purified using the DNeasy mini spin column (Qiagen) for HiSeq 2000 analysis or Genomic-tip 100/G (Qiagen) for MinION analysis (6). Total cellular RNA was extracted with hot acidic phenol (7) from a midexponential-phase culture grown in yeast extract-peptone-galactose (YPGal) medium (1% [wt/vol] yeast extract, 2% [wt/vol] peptone, 2% [wt/vol] galactose) at 28°C and purified with the RNeasy minikit (Qiagen).
We obtained 204,824 long reads (mean, 9,011 nucleotides [nt]; longest read, 211,620 nt) totaling 1.8 Gbp (∼74× coverage) with a MinION Mk-1B device on an R9.4.1 flow cell with an SQK-LSK109 kit; the reads were base called with ONT Albacore (v. 2.3.1). A paired-end (2 × 101 nt) TruSeq PCR-free DNA library was sequenced on a HiSeq 2000 platform at Macrogen, Korea, which yielded 64,378,402 reads (6.4 Gbp, ∼262× coverage). RNA-Seq was performed with a NovaSeq 6000 system at Macrogen, Korea, which yielded 42,932,052 reads from a TruSeq mRNA V2 nonstranded paired-end (2 × 101 nt) library.
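The yields and coverage values above can be cross-checked with simple arithmetic. The sketch below is only an illustration: the read counts, read lengths, and coverage values are copied from the text, while the genome size is not stated there, so the "implied" sizes are back-calculated estimates. The function names are hypothetical. Because the reported figures are rounded, the agreement is expected to be approximate.

```python
# Sanity check of the sequencing yields quoted in the text.
# Inputs are the reported read counts, read lengths, and coverages.
# The genome size is NOT given in the text; it is back-calculated here.

def total_bases(n_reads: int, read_len_nt: float) -> float:
    """Total yield in bases: number of reads times (mean) read length."""
    return n_reads * read_len_nt

def implied_genome_size(total_bp: float, coverage: float) -> float:
    """Genome size implied by coverage = total bases / genome size."""
    return total_bp / coverage

# MinION: 204,824 reads with a mean length of 9,011 nt, ~74x coverage.
nanopore_bp = total_bases(204_824, 9_011)
nanopore_size = implied_genome_size(nanopore_bp, 74)

# HiSeq 2000: 64,378,402 reads of 101 nt each, ~262x coverage.
illumina_bp = total_bases(64_378_402, 101)
illumina_size = implied_genome_size(illumina_bp, 262)

print(f"MinION yield:   {nanopore_bp / 1e9:.2f} Gbp")
print(f"HiSeq yield:    {illumina_bp / 1e9:.2f} Gbp")
print(f"Implied genome: {nanopore_size / 1e6:.1f} Mbp (MinION), "
      f"{illumina_size / 1e6:.1f} Mbp (HiSeq)")
```

Both data sets point to a genome of roughly 25 Mbp, which suggests that the two coverage figures were computed against the same assembly size.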
Table 1 presents candidate genome assemblies. The final assembly is based on miniasm, which had the smallest number of contigs and did not show apparent assembly artifacts. To further improve this assembly, we removed contigs containing fragments of mitochondrial DNA (mtDNA) and rRNA genes, individually polished rRNA gene repeats, and replaced regions upstream and downstream of rRNA gene repeats with 505 bp from DBG2OLC and 309 bp from Canu assemblies, respectively. The nuclear genome has a GC content of 39.5% and likely consists of at least 7 chromosomes, because both ends of 4 contigs and one end of 6 contigs are terminated by telomeric repeats with a predominant motif CA₃G₅₋₇. About 2% of the genome (508 kbp) is covered by simple and low-complexity repeats identified with RepeatMasker v. 4.0.7 (8).
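The chromosome estimate follows from counting telomeric contig ends: each linear chromosome has two telomeres, so the number of telomeric ends divided by two gives a lower bound. The sketch below illustrates that counting argument, together with one possible way to flag telomeric contig ends. It is not the procedure used by the authors. The regular expressions assume that the predominant motif reads as C, three A residues, and five to seven G residues, that at least two tandem copies are required, and that the reverse complement may appear at the opposite contig end. The function names and the 200-nt window are hypothetical choices.

```python
import re

# Assumed reading of the predominant telomeric motif (C, AAA, 5 to 7 G)
# and its reverse complement; two or more tandem copies are required.
TELOMERE_FWD = re.compile(r"(?:CAAAG{5,7}){2,}")
TELOMERE_REV = re.compile(r"(?:C{5,7}TTTG){2,}")

def telomeric_ends(seq: str, window: int = 200) -> tuple[bool, bool]:
    """Report whether the first and last `window` nt contain the motif."""
    seq = seq.upper()

    def has_motif(fragment: str) -> bool:
        return bool(TELOMERE_FWD.search(fragment)
                    or TELOMERE_REV.search(fragment))

    return has_motif(seq[:window]), has_motif(seq[-window:])

def min_chromosomes(both_ends: int, one_end: int) -> int:
    """Lower bound on chromosome number from telomeric contig ends.

    Contigs capped at both ends contribute two telomeres, contigs capped
    at one end contribute one; every chromosome needs two telomeres.
    """
    telomeres = 2 * both_ends + one_end
    return -(-telomeres // 2)  # ceiling division

# Counts reported in the text: 4 contigs with both ends capped,
# 6 contigs with one end capped.
print(min_chromosomes(4, 6))

# Toy contig (made-up sequence) carrying the motif at both ends.
toy = "CCCCCTTTG" * 3 + "ACGT" * 100 + "CAAAGGGGGG" * 3
print(telomeric_ends(toy))
```

With the reported counts, 2 × 4 + 6 = 14 telomeric ends, which corresponds to the stated minimum of 7 chromosomes.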
RNA-Seq reads processed with Trimmomatic v. 0.36 (9) were assembled into transcripts with Trinity v. 2.8.3 (10). We trained Augustus v. 3.2.3 (11) on the Magnusiomyces capitatus data set (12) and, using RNA-Seq transcripts aligned to the reference with blat v. 34x1 (13), we predicted 8,119 protein-coding genes.
The genome sequence of S. suaveolens will provide a basis for understanding metabolic pathways involved in the production of volatile organic compounds, suitable as flavors and aromas in the food industry, and genetic traits associated with the ability to colonize humans.
ACKNOWLEDGMENTS
We thank Cletus P. Kurtzman and James Swezey (Agricultural Research Service, Peoria, IL, USA) for providing us with the yeast strain.
Nanopore sequencing, genome assembly, and genome annotation were performed during the hackathon at the #NGSchool2018: Nanopore sequencing & personalised medicine (September 16 to 23, 2018) bioinformatics school organized in Lublin, Poland (https://ngschool.eu/2018), supported by International Visegrad Fund project 21810033.
The computations were done with the help of cloud services and resources from national e-infrastructure providers through the Training Infrastructure of the EGI Federation. The project was supported by grants from the Slovak Research and Development Agency (no. APVV-14-0253 to J.N.) and VEGA (no. 1/0684/16 to B.B. and no. 1/0458/18 to T.V.). This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 665778 (to L.P.P.). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.