INTRODUCTION
The human infant nasopharynx is an important site of colonization for disease-associated commensal bacteria, including
Haemophilus influenzae,
Moraxella catarrhalis,
Staphylococcus aureus, and
Streptococcus pneumoniae. Colonization is highly dynamic and develops with age (
1–3). The infant bacterial microbiota is influenced by the mode of delivery (
4), breastfeeding (
5), socio-economic status (
6), ethnicity (
7), innate immunity receptor gene polymorphisms (
8), and geolocation (
9). Bacterial colonization may also follow seasonal patterns and be impacted by viral co-infection (
10–12). Differences in microbiota profiles may affect susceptibility to respiratory tract infections (
3).
Nasopharyngeal colonizers are an important reservoir of antimicrobial resistance (AMR). Antibiotic administration for acute otitis media reduces colonization by antimicrobial-susceptible organisms (
13,
14). AMR may also increase in nasopharyngeal organisms because of exposure to antibiotics used for prophylaxis or treatment of infections in a distant site (
15). For example, prophylaxis to prevent
Pneumocystis jirovecii infection in HIV-positive infants results in colonization by co-trimoxazole-resistant pneumococci (
16).
Studies of the nasopharyngeal microbiota have historically been culture based with a focus on a limited range of organisms with pathogenic potential (
17–19). Expanding species detection based on multiple colony picks and conventional bacteriological techniques is possible but labor intensive (
20). Culture-based microbiota studies may be improved by the application of matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) for bacterial identification. MALDI-TOF MS systems are now commonplace in diagnostic microbiology laboratories and are capable of high-throughput, cost-effective, accurate, and rapid identification of a broad range of bacterial and fungal species (
21). A small study of broncho-alveolar lavage fluid that compared culture followed by MALDI-TOF MS with 16S sequencing performed directly on the specimen found reasonable concordance, with mostly fastidious anaerobes being missed by culture (e.g.,
Prevotella sp.) and some readily culturable organisms being missed by sequencing (
22).
In recent years, molecular approaches, i.e., amplicon sequencing of the 16S rRNA gene or full metagenomic sequencing, have become more commonplace. However, the low biomass of the nasopharynx and associated samples renders this approach challenging (
23,
24). Successful sequence-based studies from Australia (
1), the Netherlands (
11), and Thailand (
25) have all revealed a fairly small number of dominant taxa.
Targeted metagenomics, defined here as culture of specimens followed by extraction and sequencing of DNA from the culture plates, is a potentially important and cost-effective way to improve the resolution of sequencing for key species. It has been applied recently to determine pneumococcal colonization diversity in a cohort of mother-infant pairs on the Thailand-Myanmar border (
26), and changes in the pneumococcal resistome over time in South African infants (
27).
In this proof-of-concept study, a combined culture-based workflow using MALDI-TOF MS and targeted metagenomic sequencing was assessed using a collection of nasopharyngeal swabs (NPS) from a cohort of Cambodian children presenting to a hospital out-patient department with minor illnesses. Colonization by major lineages of
H. influenzae,
M. catarrhalis,
S. aureus, and
S. pneumoniae was explored in greater detail using the recently described, and well benchmarked, mSWEEP pipeline (
28).
MATERIALS AND METHODS
Study population
Children aged 5 months to 4 years were recruited from the Angkor Hospital for Children (AHC) out-patient department. Children were eligible for study enrolment if they presented to the hospital with a minor illness not requiring hospital admission and had not knowingly received a systemic antibiotic in the preceding 4 weeks. The sample size was not calculated formally for this proof-of-concept study, conducted in a population with a known high prevalence of pneumococcal colonization (
29,
30). Recruitment was purposive, aiming to capture 25 children who were prescribed amoxicillin during their out-patient visit and 75 children who were not prescribed an antibiotic.
Angkor Hospital for Children is a non-governmental healthcare organization located in Siem Reap, northern Cambodia (
31). The hospital provides free primary- to tertiary-level care to children <16 years old without geographic restrictions. AHC has 82 beds, with 117,232 out-patient visits and 3,189 admissions recorded in 2018. Cambodia is a lower middle-income South East Asian country, with a tropical climate. In 2018, the under-five mortality was 27.7/1,000 live births (
32). The
H. influenzae type b (Hib) vaccine was introduced in 2010 and the 13-valent pneumococcal conjugate vaccine (PCV13) followed in 2015 (
33). Vaccine coverage was 92% (Hib) and 84% (PCV13) among one-year olds in 2018 (
34).
Study procedures
At the enrolment visit, demographic, immunization, and current illness data were recorded and a flocked nylon NPS was collected (Medical Wire & Equipment, Corsham, UK). Children were followed up at six time points, at 2-week intervals, until 12 weeks post-enrolment. At each follow-up visit, details of recent illness and medications were recorded, vital signs were taken, and a further NPS was collected.
NPS were placed immediately into 1 mL sterile skim milk-tryptone-glucose-glycerol broth (STGG) and kept in a cool box before vortex mixing and separation into two 0.5-mL aliquots which were stored at −80°C within 8 hours of collection (
35). Onsite laboratory processing is summarized in Fig. S1.
Culture-based detection of colonization
NPS-STGG specimens (swab aliquot #1, containing the swab tip) were thawed, and 10 µL was cultured onto chocolate agar (CA) and 5% sheep blood agar + 5 mg/L gentamicin (BA-CN) plates. Growth was assessed after overnight incubation at 37°C in 5% CO2. Plates with poor growth were discarded and culture repeated using 100 µL NPS-STGG. Bacteriological media were prepared in-house using locally sourced antibiotic-free citrated sheep blood and commercial reagents (Oxoid, Basingstoke, UK), with the use of appropriate quality controls.
All discrete colony morphotypes from the CA plate were identified by MALDI-TOF MS (VITEK MS, Knowledge Base V3.2.0; bioMerieux, Marcy-l’Étoile, France). The in vitro diagnostic (IVD) mode was used primarily, and colonies were re-tested using the research use only (RUO) mode if an acceptable result was not obtained in the IVD mode. An acceptable result was defined as return of a single organism name with an associated confidence level. If >1 organism name from the same genus was returned, the result was recorded at the genus level (e.g., “Streptococcus sp.”). Where multiple genera were offered for a given colony pick, attempts were made to purify and retest. If identification by MALDI-TOF MS failed repeatedly, only the Gram stain result for the colony was recorded (e.g., “Gram positive cocci”).
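The reporting rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's software: the function names, the input format (a list of organism names offered for one colony pick), and the "PURIFY_AND_RETEST" and "UNIDENTIFIED" labels are hypothetical.

```python
from typing import List, Optional


def genus_of(name: str) -> str:
    """Return the genus, taken here as the first word of the organism name."""
    return name.split()[0]


def resolve_identification(candidates: List[str],
                           gram_result: Optional[str] = None) -> str:
    """Apply the reporting rule to the names offered for one colony pick.

    candidates:  organism names returned by MALDI-TOF MS (hypothetical format).
    gram_result: Gram stain description, used only when no name is available.
    """
    unique = sorted(set(candidates))
    if len(unique) == 1:
        # A single organism name is an acceptable species-level result.
        return unique[0]
    genera = {genus_of(name) for name in unique}
    if len(unique) > 1 and len(genera) == 1:
        # Several names from one genus: report at genus level.
        return f"{genera.pop()} sp."
    if len(genera) > 1:
        # Several genera offered: the colony should be purified and retested.
        return "PURIFY_AND_RETEST"
    # No name returned: fall back to the Gram stain result.
    return gram_result or "UNIDENTIFIED"


print(resolve_identification(["Streptococcus mitis", "Streptococcus oralis"]))
```

In this sketch the fallback to the Gram stain result is reached only when no organism name is offered; in practice that decision followed repeated failed attempts.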
Specific target species were characterized in greater detail. Beta-lactamase activity was determined for
H. influenzae isolates using Cefinase disks (BBL, Becton Dickinson, Franklin Lakes, NJ, USA). Methicillin resistance was determined for
S. aureus isolates by cefoxitin disk diffusion testing (Oxoid), following 2018 Clinical and Laboratory Standards Institute guidelines (
36).
S. pneumoniae were followed up from the selective BA-CN culture plate. Identification of the dominant alpha-hemolytic colony morphotype was confirmed by MALDI-TOF MS and optochin disk susceptibility (Oxoid). The serotype was determined by latex agglutination, with confirmation by the Quellung reaction where required (
30). The penicillin minimum inhibitory concentration (MIC) was determined using the Etest method (bioMerieux) with non-susceptible defined as an MIC of ≥0.12 µg/mL (
36).
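As a minimal illustration of the penicillin breakpoint stated above (non-susceptible at an MIC of ≥0.12 µg/mL), the following Python sketch classifies a set of MIC values. The function names and the example MIC values are hypothetical and are not study data.

```python
from typing import Sequence

# Breakpoint stated in the text: non-susceptible if MIC >= 0.12 µg/mL.
PENICILLIN_NS_MIC = 0.12


def is_penicillin_non_susceptible(mic_ug_per_ml: float) -> bool:
    """Return True when the MIC meets or exceeds the breakpoint."""
    return mic_ug_per_ml >= PENICILLIN_NS_MIC


def non_susceptible_fraction(mics: Sequence[float]) -> float:
    """Fraction of isolates classified as penicillin non-susceptible."""
    if not mics:
        raise ValueError("at least one MIC value is required")
    return sum(is_penicillin_non_susceptible(m) for m in mics) / len(mics)


# Illustrative values only (not from the study).
example_mics = [0.016, 0.064, 0.12, 2.0]
print(non_susceptible_fraction(example_mics))
```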
Detection of viral infections
Enrolment visit swabs were tested by PCR to detect the presence of influenza A, influenza B, and respiratory syncytial virus (RSV). Briefly, RNA was extracted from 200 µL thawed NPS-STGG (swab aliquot #2) using the Qiagen Viral RNA Mini Kit and a QIAcube instrument (Qiagen, Hilden, Germany). Multiplex real-time PCR was done using the Fast-Track Diagnostics FLU/HRSV RUO Kit (Siemens Healthcare, Erlangen, Germany) on a Bio-Rad CFX96 thermocycler (Bio-Rad, Hercules, CA, USA). All extraction and PCR work followed the manufacturer’s instructions.
Targeted metagenomic sequencing-based detection of colonization
At the same time as the primary culture work, a further 100 µL thawed NPS-STGG (swab aliquot #1) was cultured on chocolate agar at 37°C in 5% CO2. Following overnight incubation, all colonies from the plate were scraped into 1 mL sterile phosphate buffered saline and centrifuged at full speed for 5 minutes to yield a cell pellet. Following storage at −80°C, DNA was extracted from the cell pellets using the Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA), following the manufacturer’s instructions. DNA yield and quality were assessed using a BioPhotometer D30 (Eppendorf, Hamburg, Germany), before shipping to the Wellcome Sanger Institute for sequencing on the Illumina HiSeq4000 platform [150 bp paired-end reads, median 15.8 million reads per sample with inter-quartile range (IQR) 14.6–17.2 million]. Read accession numbers are summarized in Table S1. The sequencing-based analysis was performed blinded to the culture and MALDI-TOF MS-based results. The mSWEEP pipeline (version 1.3.2) was run in accordance with the instructions on GitHub (
https://github.com/PROBIC/mSWEEP). In short, a reference database of 5,510 taxa was first constructed (Table S2) and indexed with Themisto (version 0.1.0;
k = 31). The reads were then pseudoaligned against this index, also with Themisto. Lastly, mSWEEP was run with the pseudoalignments and the Themisto index as input to obtain the abundances. The reference included 91
Haemophilus influenzae, 55
Moraxella catarrhalis, 2
Moraxella canis, 13
Neisseria meningitidis, 3,041
Streptococcus pneumoniae, 2,239
Staphylococcus aureus, 447 other
Streptococcaceae genomes, and single genomes from
Cutibacterium granulosum,
Corynebacterium pseudodiphtheriticum,
Corynebacterium accolens,
Haemophilus parainfluenzae,
Neisseria bacilliformis,
Neisseria cinerea,
Neisseria dentiae,
Neisseria elongata subsp.
glycolytica,
Neisseria flavescens,
Neisseria lactamica,
Neisseria mucosa,
Neisseria perflava,
Neisseria polysaccharea,
Neisseria sicca,
Neisseria subflava,
Neisseria weaveri, “
Candidatus Ornithobacterium hominis” (
37),
Staphylococcus argenteus,
Staphylococcus epidermidis,
Staphylococcus haemolyticus,
Staphylococcus saprophyticus, and
Staphylococcus schweitzeri.
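Downstream handling of the abundance estimates can be sketched as follows. This Python sketch assumes that the abundance output is a tab-separated text file with optional comment lines beginning with "#", followed by one reference group and one abundance per line; that layout, the lineage names, and the reporting threshold are assumptions made for illustration and were not taken from the study.

```python
from typing import Dict, List, Tuple


def parse_abundances(text: str) -> Dict[str, float]:
    """Parse 'group<TAB>abundance' lines, skipping blank and '#' comment lines.

    The file layout is an assumption, not a documented guarantee.
    """
    abundances: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group, value = line.split("\t")[:2]
        abundances[group] = float(value)
    return abundances


def groups_above(abundances: Dict[str, float],
                 threshold: float) -> List[Tuple[str, float]]:
    """Return groups at or above a (hypothetical) threshold, largest first."""
    hits = [(g, a) for g, a in abundances.items() if a >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical example content; lineage names and values are invented.
example = (
    "# header line (assumed)\n"
    "Spn_lineage_A\t0.70\n"
    "Hinf_lineage_B\t0.29\n"
    "Mcat_lineage_C\t0.01\n"
)
print(groups_above(parse_abundances(example), threshold=0.05))
```

Because the sequenced DNA came from an enrichment culture, values parsed in this way should be read as relative rather than absolute abundances.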
Data management and analysis
Clinical and culture-based laboratory data were recorded on paper forms and single entered into an Access 2016 database (Microsoft, Redmond, WA, USA). Automated checks for missing and out-of-range values were implemented in R (R Foundation for Statistical Computing, Vienna, Austria).
Analyses were done using R v4.2.0 (
38), with add-on packages tidyverse v1.3.1 for data visualization (
39), pheatmap v1.0.12 for hierarchical clustering (
40), and vegan v2.6-2 for calculation of diversity indices, non-metric multidimensional scaling plots, and PERMANOVA (
41). Between-group comparisons of proportion data were done using the chi-squared or Fisher’s exact test, as appropriate. Between-group comparisons of continuous data were done using Student’s
t-test or Wilcoxon rank sum test, as appropriate.
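For readers unfamiliar with the diversity indices mentioned above, the following sketch shows how two common indices are calculated from per-sample counts. The study used the vegan package in R; this is a separate Python illustration, not the study's code, and it assumes the Shannon index with natural logarithms and the Simpson index in its 1 − D form (the vegan defaults). The specific indices reported in the study are not stated in this section.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts: Sequence[float]) -> float:
    """Simpson diversity in the form 1 - sum(p_i ** 2)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Illustrative counts for one sample with four equally abundant taxa.
even_sample = [1, 1, 1, 1]
print(shannon_index(even_sample), simpson_index(even_sample))
```

For four equally abundant taxa, the Shannon index equals ln 4 and the Simpson index equals 0.75; a sample containing a single taxon gives zero for both.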
DISCUSSION
This proof-of-concept study has demonstrated the practical utility of a combined culture, MALDI-TOF MS plus targeted metagenomic sequencing approach to the analysis of nasopharyngeal swabs. Data presented highlight the diversity and longitudinal dynamics of the aerobic nasopharyngeal bacterial microbiota in young Cambodian children. Over a 3-month period, most children were colonized by the major respiratory tract pathogens S. pneumoniae, H. influenzae, and M. catarrhalis. The mSWEEP pipeline revealed considerable within-species diversity, which was striking given the short follow-up time.
The aerobic nasopharyngeal bacterial microbiota was dominated by a small number of genera and species. Despite methodologic differences, this finding is similar to previous 16S or full metagenomic sequencing studies. Teo and colleagues identified six dominant genera (
Moraxella [31.2%],
Streptococcus [15.5%],
Corynebacterium [13.5%],
Staphylococcus [10.3%],
Haemophilus [9.7%], and
Alloiococcus/
Dolosigranulum [8.8%]) in a study of 234 Australian infants over the first year of life (
1). A study of 96 Dutch children found great inter-individual variability, but just 30 operational taxonomic units (OTUs) represented almost 98% of all sequencing reads (
11). Few comparable studies have been done in lower income settings. However, a longitudinal 16S-based study of 21 refugee infants on the Thailand-Myanmar border found that colonization was dominated by five taxa (
Moraxella,
Streptococcus,
Haemophilus,
Corynebacterium, and “
Candidatus Ornithobacterium hominis” [
37]), with 15 OTUs accounting for 98.6% of the microbiota. In this cohort, the microbiota developed over time, but there was relatively less inter-individual variation.
Lineage-level data are of critical importance for tracking bacterial outbreaks (
42), vaccine impact (
43), and AMR (
44,
45). Traditional colony pick whole genome sequencing (WGS) remains appropriate for analyzing isolates from clinical infection episodes. However, this approach has limitations where multiple strains may co-exist, such as in the nasopharynx. Multiple colony picks are unlikely to capture full diversity (
46), and metagenomic sequencing has become the preferred approach, although its cost and the low biomass of the nasopharynx make this challenging. The present study, in agreement with recent findings from the Thailand-Myanmar border which focused entirely on pneumococcal co-colonization (
26), confirms the potential for targeted metagenomic sequencing from an initial bacterial culture plate, an approach which resolves both the cost and biomass issues.
There were several limitations to the study. The sample size was relatively small and the follow-up duration relatively short, limiting the possibilities for definitive analysis of the associations between clinical and environmental factors and the microbiota. Anaerobic culture was not attempted, resulting in an absence of such organisms from the data set, which limited the possibility for detection of inter-species interactions. The selection of a single chocolate agar plate as the enrichment step for mSWEEP work may have also resulted in sub-optimal detection of some aerobic species, notably
S. pneumoniae, which was more frequently detected by culture and MALDI-TOF where an additional selective blood agar plate culture was included to ensure pneumococcal colonies were identifiable for serotyping. The enrichment culture step resulted in an absence of absolute abundances for the targeted metagenomic sequencing data, limiting the granularity of analyses of inter-species interactions and temporal colonization dynamics. The MALDI-TOF MS identification results were not confirmed directly by conventional or molecular microbiology, except for
S. pneumoniae, which prevented comment on the overall accuracy of MALDI-TOF MS for identification of upper respiratory organisms. This is of relevance given the plethora of closely related streptococcal species which have been a challenge to identify definitively, including by MALDI-TOF MS (
47,
48). Addition of colony pick WGS data would have been valuable. Finally, the mSWEEP database was incomplete, resulting in comparisons between MALDI-TOF MS and mSWEEP being limited to 21 species. Additionally, the small number of reference genomes available for
H. influenzae and
M. catarrhalis made it impossible to accurately resolve strain-level carriage dynamics, which was also true to a lesser degree for
S. aureus. Moving forward, demix_check, a recently described add-on tool to the mSWEEP and mGems pipelines, will help improve removal of spurious multiple colonization detections arising from the lack of suitable references. Ongoing efforts to sequence large collections of
H. influenzae and
M. catarrhalis carriage isolates will also improve future database coverage and thus strain-level identification. Despite these limitations, the study has demonstrated the value of this analytic approach for the study of the dominant and disease-associated members of the nasopharyngeal microbiota. The culture + MALDI-TOF component provided an assessment of the breadth of colonization while the culture + mSWEEP work assessed intra-species diversity. Future studies should select the workflow component appropriate for the scientific question to be addressed.
Conclusions
Culture of nasopharyngeal swabs followed by MALDI-TOF MS and targeted metagenomic sequencing was an effective method to determine major components of the bacterial microbiota and within-species diversity. Used at scale, this approach will be useful for determination of impacts on the bacterial microbiota of environmental factors and clinical interventions, such as antibiotics and vaccines.
ACKNOWLEDGMENTS
The authors are grateful for the critical review of the manuscript by David Aanensen and Mogens Kilian.
This research was funded in whole, or in part, by the Wellcome Trust (grant numbers 206194 and 220211). J.C. was funded by the ERC (grant number 742154) and by Norwegian Research Council FRIPRO (grant number 299941).
For the purpose of open access, the author has applied a CC BY public copyright license to any Author-Accepted Manuscript version arising from this submission.
The authors have no competing interests to declare.