INTRODUCTION
On 11 March 2020, the World Health Organization declared the ongoing pandemic caused by SARS-CoV-2, which causes COVID-19, a global emergency (
1). Similar to other viral infections, patients may be more susceptible to microbial secondary infections, which can complicate disease management strategies and result in adverse patient outcomes (
2,
3). For example, approximately one quarter of patients infected with the H1N1 influenza virus during the 2009 pandemic were also infected with bacteria or fungi (
4,
5). Among COVID-19 patients, one study found that ∼17% of individuals also have bacterial infections (
6) and another that ∼40% of patients with severe COVID-19 pneumonia were also infected with filamentous fungi from the genus
Aspergillus (
7). A third study reported that ∼26% of patients with acute respiratory distress syndrome-associated COVID-19 were also infected with
Aspergillus fumigatus and had high rates of mortality (
8). Other studies from around the world have also reported high incidences of
Aspergillus infections among patients with COVID-19 (
9–12). Taken together, these findings have prompted some to suggest routine clinical testing for secondary infections of
Aspergillus fungi among COVID-19 patients (
13,
14). Despite the prevalence of microbial infections and their association with adverse patient outcomes, these secondary infections are only beginning to be understood.
Invasive pulmonary aspergillosis is caused by tissue infiltration of
Aspergillus species after inhalation of their asexual spores (
Fig. 1); more than 250,000 aspergillosis infections are estimated to occur annually and have high mortality rates (
15). The major etiological agent of aspergillosis is
A. fumigatus (
16), although a few other
Aspergillus species are also known to cause aspergillosis (
17–20). Numerous factors are known to be associated with
A. fumigatus pathogenicity, including its ability to grow at the human body temperature (37°C) and withstand oxidative stress (
21–32). Disease management of
A. fumigatus is further complicated by resistance to antifungal drugs among strains (
33–36) and
A. fumigatus strains have been shown to exhibit strain heterogeneity with respect to virulence and pathogenicity-associated traits (
18,
32,
37–40). However, it remains unclear whether the genomic and pathogenicity-related phenotypic characteristics of CAPA isolates are similar to or distinct from those of previously studied clinical strains of
A. fumigatus.
To address this question and gain insight into the pathobiology of
A. fumigatus CAPA isolates, we examined the genomic and phenotypic characteristics of four CAPA isolates obtained from four critically ill patients of two different centers in Cologne, Germany (
8) (
Table 1). All patients were submitted to intensive care units due to moderate to severe respiratory distress syndrome (ARDS). Genome-scale phylogenetic (or phylogenomic) analyses revealed CAPA isolates formed a monophyletic group closely related to reference strains Af293 and A1163. Examination of the mutational spectra of 206 genes known to modulate virulence in
A. fumigatus (which are here referred to as genetic determinants of virulence) revealed several putative loss-of-function (LOF) mutations. Notably, CAPA isolate D had the most putative LOF mutations among genes whose null mutants are known to increase virulence. The profiles of pathogenicity-related traits and of secondary metabolites of the CAPA isolates were similar to those of reference
A. fumigatus strains Af293 and CEA17 or CEA10, which are parental strains of A1163 (
41). One notable exception was that CAPA isolate D was significantly more virulent than other strains in an invertebrate model of disease, but on par with two other clinical strains of
A. fumigatus. These results suggest that the genomes of
A. fumigatus CAPA isolates contain nearly complete and intact repertoires of genetic determinants of virulence and have phenotypic profiles that are broadly expected for
A. fumigatus clinical isolates. However, we did find evidence for genetic and phenotypic strain heterogeneity. These results suggest the CAPA isolates show similar phenotypic profiles as
A. fumigatus clinical strains Af293 and A1163 and expand our understanding of CAPA.
MATERIALS AND METHODS
Patient information and ethics approval.
Patients were included into the FungiScope global registry for emerging invasive fungal infections (
www.ClinicalTrials.gov, NCT 01731353). The clinical trial is approved by the Ethics Committee of the University of Cologne, Cologne, Germany (study ID: 05-102) (
73). Since 2019, patients with invasive aspergillosis were also included.
DNA quality control, library preparation, and sequencing.
Sample DNA concentration was measured by Qubit fluorometer and DNA integrity and purity by agarose gel electrophoresis. For each sample, 1 to 1.5 μg of genomic DNA was randomly fragmented by Covaris and fragments with average sizes of 200 to 400 bp were selected by Agencourt AMPure XP-Medium kit. The selected fragments were end-repaired, 3′ adenylated, ligated to adapters, and amplified by PCR. Double-stranded PCR products were recovered by the AxyPrep Mag PCR cleanup kit, and then heat denatured and circularized by using the splint oligonucleotide sequence. The single-strand circlular DNA (ssCir DNA) products were formatted as the final library and went through further QC procedures. The libraries were sequenced on the MGISEQ2000 platform.
Genome assembly and annotations.
Short-read sequencing data of each sample were assembled using MaSuRCA, v3.4.1 (
74). Each
de novo genome assembly was annotated using the MAKER genome annotation pipeline, v2.31.11 (
75), which integrates three
ab initio gene predictors: AUGUSTUS, v3.3.3 (
76), GeneMark-ES, v4.59 (
77), and SNAP, v2013-11-29 (
78). Fungal protein sequences in the Swiss-Prot database (release 2020_02) were used as homology evidence for the genome annotation. The MAKER annotation process occurs in an iterative manner as described previously (
79). In brief, for each genome, repeats were first soft-masked using RepeatMasker v4.1.0 (
http://www.repeatmasker.org) with the library Repbase library release-20181026 and the “-species” parameter set to “
Aspergillus fumigatus.” GeneMark-ES was then trained on the masked genome sequence using the self-training option (“–ES”) and the branch model algorithm (“–fungus”), which is optimal for fungal genome annotation. On the other hand, an initial MAKER analysis was carried out where gene annotations were generated directly from homology evidence, and the resulting gene models were used to train both AUGUSTUS and SNAP. Once trained, the
ab initio predictors were used together with homology evidence to conduct a first round of full MAKER analysis. Resulting gene models supported by homology evidence were used to retrain AUGUSTUS and SNAP. A second round of MAKER analysis was conducted using the newly trained AUGUSTUS and SNAP parameters, and once again the resulting gene models with homology supports were used to retrain AUGUSTUS and SNAP. Finally, a third round of MAKER analysis was performed using the new AUGUSTUS and SNAP parameters to generate the final set of annotations for the genome. The completeness of
de novo genome assemblies and
ab initio gene predictions was assessed using BUSCO, v4.1.2 (
80) using 4,191 preselected “nearly” universally single-copy orthologous genes from the
Eurotiales database (eurotials_odb10.2019-11-20) in OrthoDB, v10.1 (
81).
Polymorphism identification.
To characterize and examine the putative impact of polymorphisms in the genomes of the CAPA isolates, we identified single nucleotide polymorphisms (SNPs), insertion-deletion polymorphisms (indels), and copy number (CN) polymorphisms. To do so, reads were first quality-trimmed and mapped to the genome of
A. fumigatus Af293 (RefSeq assembly accession: GCF_000002655.1) following a previously established protocol (
82). Specifically, reads were first quality-trimmed with Trimmomatic v0.36 (
83) using the following parameters: leading, 10; trailing, 10; slidingwindow, 4:20; minlen, 50. The resulting quality-trimmed reads were mapped to the
A. fumigatus Af293 genome using the Burrows-Wheeler Aligner (BWA) v0.7.17 (
84) with the mem parameter. Thereafter, mapped reads were converted to a sorted bam and mpileup format for polymorphism identification using SAMtools v.1.3.1 (
85).
To identify SNPs and indels, mpileup files were used as input into VarScan v2.3.9 (
86), with the mpileup2snp and mpileup2indel functions, respectively. To ensure only confident SNPs and indels were identified, a Fisher’s exact test
P value threshold of 0.05 and minimum variant allele frequency of 0.75 were used. The resulting Variant Call Format files were used as input to snpEff v.4.3t (
87), which predicted their functional impacts on gene function as high, moderate, or low. To identify CN variants, the sorted bam files were used as input into Control-FREEC v9.1 (
88,
89). The coefficientOfVariation parameter was set to 0.062 and window size was automatically determined by Control-FREEC. To ensure high confidence in CN variant identification, a
P value threshold of 0.05 was used for both the Wilcoxon rank sum and Kolmogorov Smirnov tests.
To identify evidence of putative pseudogenization between reference strains A1163 and Af293, we used a previously established approach (
19,
90). More specifically, we compared lengths of gene pairs as a proxy for pseudogenization. A gene was considered a putative pseudogene in one of the strains if the gene was 70% the length of its reciprocal best blast hit in the other strain.
Maximum likelihood molecular phylogenetics.
To taxonomically identify the species of
Aspergillus sequenced, we conducted molecular phylogenetic analysis of two different loci and two different data sets. In the first analysis, the nucleotide sequence of the alpha subunit of translation elongation factor EF-1,
tef1 (NCBI accession:
XM_745295.2), from the genome of
A. fumigatus Af293 was used to extract other fungal
tef1 sequences from NCBI’s fungal nucleotide reference sequence database (downloaded July 2020) using the blastn function from NCBI’s BLAST+, v2.3.0 (
91).
Tef1 sequences were extracted from the CAPA isolates by identifying their best BLAST hit. Sequences from the top 100 best BLAST hits in the fungal nucleotide reference sequence database and the four
tef1 sequences from the CAPA isolates were aligned using MAFFT v7.402 (
92) using previously described parameters (
93) with slight modifications. Specifically, the following parameters were used: –op 1.0 –maxiterate 1000 –retree 1 –genafpair. The resulting alignment was trimmed using ClipKIT v0.1 (
19) with default “gappy” mode. The trimmed alignment was then used to infer the evolutionary history of
tef1 sequences using IQ-TREE2 (
94). The best-fitting substitution model—TIM3 with empirical base frequencies, allowing for a proportion of invariable sites, and a discrete Gamma model (
95,
96) with four rate categories (TIM3+F+I+G4)—was determined using the Bayesian information criterion. In the second analysis, the same process was used to conduct molecular phylogenetic analysis using calmodulin nucleotide sequences from
Aspergillus section
Fumigati species and
Aspergillus clavatus, an outgroup taxon, using sequences from NCBI that were made available elsewhere (
97). For calmodulin sequences, the best-fitting substitution model was TNe (
98) with a discrete Gamma model with four rate categories (TNe+G4). Bipartition support was assessed using 5,000 ultrafast bootstrap support approximations (
99).
To determine which strains of
A. fumigatus the CAPA isolates were most similar to, we conducted phylogenomic analyses using the 50
Aspergillus proteomes. To do so, we first identified orthologous groups of genes across all 50
Aspergillus strains using OrthoFinder 2.3.8 (
100). OrthoFinder takes as input the proteome sequence files from multiple genomes and conducts all-versus-all sequence similarity searches using DIAMOND v0.9.24.125 (
101). Our input included 50 total proteomes, where 47 were
A. fumigatus, two were
A. fischeri, and one was
A. oerlinghausenensis (
40,
42,
45). OrthoFinder then clusters sequences into orthologous groups of genes using the graph-based Markov Clustering Algorithm (
102). To maximize the number of single-copy orthologous groups of genes found across all input genomes, clustering granularity was explored by running 41 iterations of OrthoFinder that differed in their inflation parameter. Specifically, iterations of OrthoFinder inflation parameters were set to 1.0 to 5.0 with a step of 0.1. The lowest number of single-copy orthologous groups of genes was 3,399 when using an inflation parameter of 1.0; the highest number was 4,525 when using inflation parameter values of 3.8 and 4.1. We used the groups inferred using an inflation parameter of 3.8.
Next, we built the phylogenomic data matrix and reconstructed evolutionary relationships among the 50
Aspergillus genomes. To do so, the protein sequences from 4,525 single-copy orthologous groups of genes were aligned using MAFFT v7.402 (
92), with the following parameters: –bl 62 –op 1.0 –maxiterate 1000 –retree 1 –genafpair. Next, nucleotide sequences were threaded onto the protein alignments using function thread_dna in PhyKIT v0.0.1 (
103). The resulting codon-based alignments were then trimmed using ClipKIT v0.1 (
104), using the gappy mode. The resulting aligned and trimmed alignments were then concatenated into a single matrix with 7,133,367 sites using the PhyKIT function create_concat. To reconstruct the evolutionary history of the 50
Aspergillus genomes, a single best-fitting model of sequence substitution and rate heterogeneity was estimated across the entire matrix using IQ-TREE2 v.2.0.6 (
94). The best-fitting model was determined to be a general time reversible model with empirical base frequencies and invariable sites with a discrete Gamma model with four rate categories (GTR+F+I+G4) (
96,
105–107) using the Bayesian information criterion. During tree search, the number of candidate trees maintained during maximum likelihood tree search was increased from five to ten. Five independent searches were conducted and the tree with the best log-likelihood score was chosen as the “best” phylogeny. Bipartition support was evaluated using 5,000 ultrafast bootstrap approximations (
99).
Biosynthetic gene cluster prediction.
To predict BGCs in the genomes of
A. fumigatus strains Af293 and the CAPA isolates, gene boundaries inferred by MAKER were used as input into antiSMASH v4.1.0 (
108). Using a previously published list of genes known to encode BGCs in the genome of
A. fumigatus Af293 (
42), BLAST-based searches using an expectation value threshold of 1 × 10
−10 were used to identify BGCs implicated in modulating host biology using NCBI’s BLAST+ v2.3.0 (
91). Among predicted BGCs that did not match the previously published list, we further examined their evolutionary history if at least 50% of genes showed similarity to species outside the genus
Aspergillus, which is information provided in the antiSMASH output. Using these criteria, no evidence suggestive of horizontally acquired BGCs from distant relatives was detected.
Characterization of biosynthesized secondary metabolites.
(i) General experimental procedures. HRESIMS experiments utilized a Thermo LTQ Orbitrap XL mass spectrometer equipped with an electrospray ionization source. A Waters Acquity UPLC (Waters Corp.) was utilized using a BEH C18 column (1.7 μm; 50 mm × 2.1 mm) set to a temperature of 40°C and a flow rate of 0.3 ml/min. The mobile phase consisted of a linear gradient of CH3CN-H2O (both acidified with 0.1% formic acid), starting at 15% CH3CN and increasing linearly to 100% CH3CN over 8 min, with a 1.5 min hold before returning to the starting condition.
(ii) Growth and extraction of fungal cultures. To identify the chemical differences between the various A. fumigatus strains and isolates (Af293, CEA10, CAPA isolates A, B, C, and D), they were grown in a clinically relevant growth condition (37°C) and extracted for chemometric analysis. Czapek Dox agar (Sigma-Aldrich) petri plates were inoculated with the asexual spores of each strain in biological triplicates. Subsequently, the plates were incubated at 37°C in the dark for 3 days. The cultures were extracted by chopping and transferring the agar to 20-ml scintillation vials, adding 10 ml of acetone, thoroughly shaking, and then letting the samples sit for 4 h. Last, the cultures were filtered and evaporated to dryness under nitrogen gas.
(iii) Metabolomics analyses. Principal-component analysis (PCA) was performed on the ultraperformance liquid chromatography-mass spectrometry (UPLC-MS) data. Untargeted UPLC-MS data sets for each sample were individually aligned, filtered, and analyzed using MZmine 2.20 software (
https://sourceforge.net/projects/mzmine/) (
109). Peak detection was achieved using the following parameters: noise level (absolute value), 1.5 × 10
5; minimum peak duration, 0.05 min;
m/z variation tolerance, 0.05; and
m/z intensity variation, 20%. Peak list filtering and retention time alignment algorithms were used to refine peak detection. The join algorithm integrated all sample profiles into a data matrix using the following parameters:
m/z and retention time balance set at 10.0 each,
m/z tolerance set at 0.001, and RT tolerance set at 0.5 min. The resulting data matrix was exported to Excel (Microsoft) for analysis as a set of
m/z-retention time pairs with individual peak areas detected in quadruplicate analyses. Samples that did not possess detectable quantities of a given marker ion were assigned a peak area of zero to maintain the same number of variables for all sample sets. Ions that did not elute between 2 and 8 min and/or had an
m/z ratio less than 100 or greater than 1,200 Da were removed from analysis. Relative standard deviation was used to quantify variance between the technical replicate injections, which may differ slightly based on instrument variance. A cutoff of 1.0 was used at any given
m/z-retention time pair across the technical replicate injections of one biological replicate and, if the variance was greater than the cutoff, it was assigned a peak area of zero (
110). Final chemometric analysis, including data filtering and PCA, was conducted using Python. The PCA plots were generated using data from the averaged biological replicates from the petri dish cultures. Each biological replicate was plotted using averaged peak areas obtained across four replicate injections (technical replicates). The principal components (PC) were generated and processed via Scikit Learn decomposition and Pandas v0.25.3, Python libraries. The PCA data were plotted using Altair v4.1.0, Python graphing libraries. These data were converted into a dataframe via Pandas, and the PCs were created from the dataframe using Scikit Learn decomposition. The PCA scores and loadings plots were then plotted using the PCs dataframe that was generated from Scikit Learn.
Infection of Galleria mellonella.
Survival curves (
n ≥ 20/strain) were generated for
Galleria mellonella infected with CAPA isolates A, B, C, and D. Phosphate-buffered saline (PBS) without asexual spores (conidia) was administered as a negative control. A log-rank test was used to examine strain heterogeneity, followed by pairwise comparisons with the Benjamini-Hochberg multitest correction (
111). All the selected larvae of
Galleria mellonella were in the final (sixth) instar larval stage of development, weighing 275 to 330 mg. Fresh conidia from each strain were harvested from minimal medium (MM) plates in PBS solution and filtered through a Miracloth (Calbiochem). For each strain, the spores were counted using a hemocytometer and the stock suspension was done at 2 × 10
8 conidia/ml. The viability of the administered inoculum was determined by plating a serial dilution of the conidia on MM medium at 37°C. A total of 5 μl (1 × 10
6 conidia/larva) from each stock suspension was inoculated per larva. The control group was composed of larvae inoculated with 5 μl of PBS to observe the killing due to physical trauma. The inoculum was performed using Hamilton syringe (7000.5KH) via the last left proleg. After infection, the larvae were maintained in petri dishes at 37°C in the dark and were scored daily. Larvae were considered dead by presenting the absence of movement in response to touch.
Growth assays.
To examine growth conditions of the CAPA isolates and reference strains Af293 and CEA17, plates were inoculated with 10
4 spores per strain and allowed to grow for 5 days on solid MM or MM supplemented with various concentrations of osmotic (sorbitol, NaCl), cell wall (Congo red, calcofluor white, and caspofungin), and oxidative stress agents (menadione and t-butyl) at 37°C. MM had 1% (wt/vol) glucose, original high-nitrate salts, trace elements, and a pH of 6.5; trace elements, vitamins, and nitrate salt compositions follow standards described elsewhere (
112). To correct for strain heterogeneity in growth rates, radial growth in centimeters in the presence of stressors was divided by radial growth in centimeters in the absence of the stressor.
To determine the MICs of antifungal drugs in the CAPA isolates and reference strains Af293 and CEA17, strains were grown in 96-well plates at a concentration of 10
4 spores/well in 200 μl of RPMI 1640 supplemented with increasing concentrations of amphotericin B, voriconazole, itraconazole, and posaconazole, according to the protocol elaborated by the Clinical and Laboratory Standards Institute (
71). The MIC was defined as the lowest concentration of drugs that visually inhibited 100% fungal growth. Three independent experiments were carried out for each antifungal drug.
Data availability.
Newly sequenced genomes assemblies, annotations, and raw short reads have been deposited to NCBI’s GenBank database under BioProject accession
PRJNA673120.
Supplementary data, including tables, figures, and files, additional copies of genome assemblies, annotations, and gene coordinates, raw data, including the genome assembly and annotations of all analyzed
Aspergillus genomes, the aligned and trimmed phylogenetic and phylogenomic data matrices, polymorphisms identified in the present project, and predicted BGCs have been uploaded to figshare (
10.6084/m9.figshare.13118549).
ACKNOWLEDGMENTS
We thank Joan Bennett, Amelia Barber, and Jorge Amich for reviewing this work and for their constructive feedback. We thank the Rokas and Goldman laboratories for support of this work and their helpful insight.
J.L.S. and A.R. are supported by the Howard Hughes Medical Institute through the James H. Gilliam Fellowships for Advanced Study Program. A.R. received additional support from a Discovery Grant from Vanderbilt University, the Burroughs Wellcome Fund, the National Science Foundation (DEB-1442113), and the National Institutes of Health/National Institute of Allergy and Infectious Diseases (R56 AI146096). G.H.G. and A.D. thank Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant numbers 2016/07870-9 and 2020/06151-4, and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), both from Brazil. R.A.C.S. was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant numbers 2017/21983-3 and 2019/07526-4. X.Z. is supported by the Key-Area Research and Development Program of Guangdong Province (2018B020206001). F.F. has a Clinician Scientist Position supported by the Deans Office, Faculty of Medicine, University of Cologne. C.V. is supported by FAPESP grant number 2018/00715-3. S.L.K. was supported by the National Center for Complementary and Integrative Health, a component of the National Institutes of Health, under award number F31 AT010558.
O.A.C. is supported by the German Federal Ministry of Research and Education, is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (CECAD, EXC 2030, 390661388), and has received research grants from, is an advisor to, or received lecture honoraria from Actelion, Allecra Therapeutics, Al-Jazeera Pharmaceuticals, Amplyx, Astellas, Basilea, Biosys, Cidara, Da Volterra, Entasis, F2G, Gilead, Grupo Biotoscana, IQVIA, Janssen, Matinas, Medicines Company, MedPace, Melinta Therapeutics, Menarini, Merck/MSD, Mylan, Nabriva, Noxxon, Octapharma, Paratek, Pfizer, PSI, Roche Diagnostics, Scynexis, and Shionogi. P. K. has received nonfinancial scientific grants from Miltenyi Biotec GmbH, Bergisch Gladbach, Germany, and the Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany, and received lecture honoraria from or is advisor to Akademie für Infektionsmedizin e.V., Astellas Pharma, European Confederation of Medical Mycology, Gilead Sciences, GPR Academy Ruesselsheim, MSD Sharp & Dohme GmbH, Noxxon N.V., and University Hospital, LMU Munich outside the submitted work. A. R. is a Scientific Consultant for LifeMine Therapeutics, Inc. N.H.O. is on the Scientific Advisory Board of Mycosynthetix, Inc.