Open access
Genomics and Proteomics
Announcement
23 September 2021

Draft Genome Sequence of a Diploid and Hybrid Candida Strain, Candida sanyaensis UCD423, Isolated from Compost in Ireland

ABSTRACT

Candida sanyaensis is a CUG-Ser1 clade yeast that is associated with soil. Assembly of short-read and long-read data shows that C. sanyaensis has a diploid and hybrid genome, with approximately 97% identity between the haplotypes. The haploid genome size is approximately 15.4 Mb.

ANNOUNCEMENT

The yeast Candida sanyaensis is a member of the subphylum Saccharomycotina and the phylum Ascomycota and is closely related to Candida sojae and Candida tropicalis (1). The species was originally isolated from soil samples from Hainan Island in south China and Taiwan. C. sanyaensis UCD423 was isolated from a wormery in Dublin by two passages of compost material in 9 ml liquid yeast extract-peptone-dextrose (YPD) medium containing chloramphenicol (30 μg/ml) and ampicillin (100 μg/ml) and culture on YPD plates at room temperature, similar to a method reported previously (2). The species was identified from the internal transcribed spacer (ITS) sequence, which is 99% identical to that of C. sanyaensis (1).
For Illumina sequencing, total genomic DNA was extracted from a YPD culture and purified by extraction with phenol-chloroform-isoamyl alcohol. Libraries were generated from 1 μg genomic DNA and sequenced by BGI Tech Solutions Co. (Hong Kong), as described by Morio et al. (3). A total of 150 bases were sequenced from each end with an Illumina HiSeq 4000 instrument, yielding 6.5 million spots. For long-read sequencing, genomic DNA was extracted using the Qiagen Genomic-tip 100G kit. The sequencing library was generated from 400 ng of DNA using a rapid barcoding kit (SQK-RBK004) from Oxford Nanopore Technologies (ONT), following the manufacturer’s instructions. This library was mixed with three libraries from unrelated projects, purified with AMPure XP magnetic beads (Beckman Coulter), and eluted in 12 μl Tris-EDTA (TE) buffer, of which 10 μl was used for sequencing on a FLO-MIN106 flow cell primed with kit EXP-FLP002 on a MinION 1B sequencer using MinKNOW software v4.1.22 (ONT). Base calling (using the fast model [dna_r9.4.1_450bps_fast.cfg]) and demultiplexing were performed using Guppy v4.2.2 (ONT). The total number of MinION reads was 561,441, with a read N50 of 11,647 bp.
Low-quality reads (Q scores of <15) were removed from the Illumina data using Skewer v0.2.2 (4), and reads were assembled using SPAdes v3.14.0 (5) with default parameters. The MinION data were filtered using NanoFilt v2.7.1 (6), removing reads with quality scores of <10 and lengths of <10 kb. Reads were assembled using Canu v2.0 (7) with default parameters for a haploid assembly or with the options corOutCoverage = 200 and batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50 for a diploid assembly, assuming a genome size of 18 Mb. Assembly statistics were visualized using QUAST (8) (Table 1). Very different assemblies were obtained depending on the method and parameters used (Table 1).
TABLE 1
TABLE 1 Assembly statistics for C. sanyaensis (for scaffolds of >500 bp)
ParameterData for assembly with:
SPAdesCanu (haploid)Canu (diploid)
No. of contigsa9,833144207
Total length (bp)b17,851,34424,222,50031,277,210
Size of largest contig (bp)177,2562,922,7472,944,088
N50 (bp)b2,9831,214,081281,063
L501,342612
GC content (%)31.131.631.5
a
The SPAdes assembly of the short-read data alone is highly fragmented. The haploid Canu assembly of the long-read data has the smallest number of contigs.
b
For the long-read data alone, the haploid assembly has a smaller genome size and a greater N50 value than the diploid assembly.
Using the diploid parameters with Canu resulted in an assembly that largely kept two haplotypes separate (Fig. 1). The first haplotype (haplotype A) is represented by the largest 10 scaffolds, ranging from 2.94 Mb to 369 kb (Fig. 1). The second haplotype (haplotype B) is fragmented into much smaller contigs, each of which matches a contig from the first haplotype (Fig. 1). Therefore, C. sanyaensis has a diploid genome and the haplotypes differ by ∼3.3%, suggesting that the genome results from hybridization between related but not identical parents (9). The final diploid assembly of the MinION data was error corrected by incorporating the Illumina data, using nine rounds with Pilon v1.23 (10).
FIG 1
FIG 1 C. sanyaensis UCD423 has a diploid genome. Similarity between the two haplotypes of C. sanyaensis UCD423 was visualized with Circos (11) and Circoletto (12) as described by O’Brien et al. (13), using the diploid assembly after polishing with Pilon. The 10 scaffolds from haplotype A are shown on the inner ring on the left. Scaffolds of  ≥100 kb from haplotype B are shown on the outer ring on the right. Smaller scaffolds are omitted for clarity. Sequences with similarity were identified by BLASTN, and alignments are plotted as links between the two haplotypes. The matches are colored with respect to the scaffold in the first haplotype. Scaffold 8 contains the ribosomal DNA (rDNA) and has few matches with scaffolds from the second haplotype outside the rDNA region, even when scaffolds of <100 kb are included. Assembly of the two haplotypes might have collapsed around the rDNA locus, possibly because of loss of heterozygosity. Matches to scaffold 56 suggest that scaffold 5 and scaffold 8 should be joined. Similarly, matches to scaffold 13 suggest that scaffold 4 and scaffold 10 should be joined. The total length of a single haplotype is approximately 15.4 Mb. The average sequence identity between the haplotypes is 96.72%, as calculated from scaffolds of >100 kb using the average nucleotide identity (ANI) calculator described by Rodriguez and Konstantinidis (14) with default parameters.

Data availability.

This whole-genome shotgun project has been deposited in DDBJ/ENA/GenBank under accession number CAJVQF00000000. The raw reads from Illumina sequencing are available under SRA accession number ERR6313261 and those from MinION sequencing under SRA accession number ERR6310792. These data are also available under project PRJEB46370. The ITS sequence is at accession number MZ507576.

ACKNOWLEDGMENTS

This work was supported by undergraduate teaching resources from University College Dublin. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Thanks go to Rebeca Clavero for isolating the yeast.

REFERENCES

1.
Hui FL, Niu QH, Ke T, Li YX, Lee CF. 2013. Candida sanyaensis sp. nov., an ascomycetous yeast species isolated from soil. Antonie Van Leeuwenhoek 103:47–52.
2.
Sylvester K, Wang QM, James B, Mendez R, Hulfachor AB, Hittinger CT. 2015. Temperature and host preferences drive the diversification of Saccharomyces and other yeasts: a survey and the discovery of eight new yeast species. FEMS Yeast Res 15:fov002.
3.
Morio F, O’Brien CE, Butler G. 2020. Draft genome sequence of the yeast Kazachstania telluris CBS 16338 isolated from forest soil in Ireland. Mycopathologia 185:587–590.
4.
Jiang H, Lei R, Ding SW, Zhu S. 2014. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 15:182.
5.
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477.
6.
De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. 2018. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34:2666–2669.
7.
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. 2017. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736.
8.
Gurevich A, Saveliev V, Vyahhi N, Tesler G. 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29:1072–1075.
9.
Gabaldon T. 2020. Hybridization and the origin of new yeast lineages. FEMS Yeast Res 20:foaa040.
10.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963.
11.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome Res 19:1639–1645.
12.
Darzentas N. 2010. Circoletto: visualizing sequence similarity with Circos. Bioinformatics 26:2620–2621.
13.
O’Brien CE, Zhai B, Ola M, Ó Cinnéide E, O’Connor Í, Rolling T, Miranda E, Babady NE, Hohl TM, Butler G. 2021. Identification of a novel Candida metapsilosis isolate suggests ongoing hybridization. bioRxiv 2021.07.15.452539.
14.
Rodriguez-R LM, Konstantinidis KT. 2016. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ 4:e1900v1.

Information & Contributors

Information

Published In

cover image Microbiology Resource Announcements
Microbiology Resource Announcements
Volume 10Number 3823 September 2021
eLocator: 10.1128/mra.00761-21
Editor: Jason E. Stajich, University of California, Riverside

History

Received: 28 July 2021
Accepted: 25 August 2021
Published online: 23 September 2021

Contributors

Authors

Adam Ryan
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Eoin Ó Cinnéide
School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
Sean A. Bergin
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Ghozlan Alhajeri
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Hawraa Almotawaa
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Isabelle Daly
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Sophia Heneghan
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Kellie Horan
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Roslyn Kavanagh
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Christopher Keane
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Aaron Martin
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Ada McDonagh
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Julia O’Leary
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Matthieu Osborne
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Emma Watson
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland
Kevin P. Byrne
School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
Kenneth H. Wolfe
School of Medicine, Conway Institute, University College Dublin, Dublin, Ireland
School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Dublin, Ireland

Editor

Jason E. Stajich
Editor
University of California, Riverside

Metrics & Citations

Metrics

Note:

  • For recently published articles, the TOTAL download count will appear as zero until a new month starts.
  • There is a 3- to 4-day delay in article usage, so article usage will not appear immediately after publication.
  • Citation counts come from the Crossref Cited by service.

Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. For an editable text file, please select Medlars format which will download as a .txt file. Simply select your manager software from the list below and click Download.

View Options

Figures and Media

Figures

Media

Tables

Share

Share

Share the article link

Share with email

Email a colleague

Share on social media

American Society for Microbiology ("ASM") is committed to maintaining your confidence and trust with respect to the information we collect from you on websites owned and operated by ASM ("ASM Web Sites") and other sources. This Privacy Policy sets forth the information we collect about you, how we use this information and the choices you have about how we use such information.
FIND OUT MORE about the privacy policy