INTRODUCTION
Fourier-transform infrared (FT-IR) spectroscopy is an analytical technique in which the interaction of infrared light with the bacterial cell provides a biochemical fingerprint of its main macromolecular composition. Its high resolution, short time-to-result, lower cost compared with whole-genome sequencing (WGS), and procedural simplicity make it an attractive tool for bacterial discrimination and typing (
1).
Several studies have demonstrated sufficient resolution for strain typing in various clinically relevant species, corroborated by WGS analysis (
2 – 7). Moreover, we have provided a comprehensive analysis of the molecular features at the basis of spectral discrimination in different bacterial species (
Klebsiella pneumoniae,
Acinetobacter baumannii,
Salmonella enterica, and
Escherichia coli), contributing to the establishment of reliable genotypic–phenotypic correlations that support strain typing (
2,
8 – 10). Subsequent studies have demonstrated its usefulness for outbreak management and infection control (
11 – 15).
The application of FT-IR spectroscopy in routine clinical microbiology laboratories for outbreak investigation has been facilitated by the IR Biotyper integrated system launched by Bruker Daltonics (Germany) in 2017. This equipment is based on transmission FT-IR, an acquisition mode where the infrared beam crosses the sample to reach the detector (
16). It requires the preparation of a standardized bacterial suspension that needs to be dried uniformly before spectrum acquisition (
17,
18). The available software package allows comparative spectral analysis by using a clustering method to infer clonal relatedness; however, defining the cutoff requires expertise because it varies with the data set or the species analyzed (
3,
18,
19). Besides, several authors reported difficulties in standardization and reproducibility, with the variation being associated with culture conditions (culture media and incubation time) (
17,
19). Available studies have demonstrated reliability under specific and variable experimental conditions, but evaluation of reproducibility between instruments is still lacking. FT-IR instruments with the attenuated total reflection (ATR) acquisition mode, in which the infrared light interacts with the sample through an evanescent wave before reaching the detector, are also widely used (
2,
8 – 10,
20,
21). In this case, a bacterial colony is applied directly to the ATR crystal, and the spectra are acquired immediately, avoiding the additional reagents and time needed to prepare a bacterial suspension. For these reasons, it has been associated with lower cost, a simpler procedure, and higher reproducibility (
1).
K. pneumoniae is a critical pathogen identified by the World Health Organization and the European Centre for Disease Prevention and Control, for which the increasing rates of resistance to last-line beta-lactams and other antibiotics are mainly due to nosocomial spread (
22). For these reasons, a reliable and quick method for strain typing is especially critical for early and effective infection control. In previous studies, we used FT-IR ATR to demonstrate the accuracy of the technique for
K. pneumoniae (Kp) capsular (K)-typing by directly testing one bacterial colony (no suspension is required) and using an in-house spectral database and a machine-learning classification model (
2,
11,
12,
23). The workflow used requires expertise in FT-IR, multivariate data analysis algorithms, and high-level programming knowledge to work on MATLAB (MathWorks, USA). Over the last years, we have shown that this workflow can reliably support outbreak control (
11,
12) and epidemiological surveillance in humans (
23) and animals (
24,
25).
However, the translation of FT-IR to routine microbiology laboratory workflows for quick bacterial typing depends on (i) high resolution and accuracy for identification of bacterial lineages defined by reference typing methods; (ii) solid demonstration of spectral reproducibility between instruments and over time; (iii) ability to provide meaningful and immediate information for infection control and surveillance; and (iv) an automated workflow accessible to non-expert users. In this study, we developed a quick, automated, and reproducible FT-IR ATR workflow for the identification of clinically relevant multidrug-resistant (MDR) K. pneumoniae lineages, which provides meaningful information to support outbreak management and epidemiological surveillance in a simple and user-friendly manner on the same day that the isolate is detected in the laboratory.
RESULTS
Improvement of database coverage and robustness
The updated random forest (RF) classification model includes >2,000 spectra from 293 K. pneumoniae isolates belonging to 33 different KL-types. This represents a ~70% increase compared to our previous database (from 19 to 33 KL-types) due to the inclusion of 14 new KL-types (KL9, KL21, KL13, KL15, KL22.37, KL23-like, KL25, KL30, KL38, KL45, KL81/KL120, KL102, and KL125/KL114) in the current model (
Table 1) (
2). In addition, we improved the robustness of most previously existing classes by increasing the number of isolates per class by 130% on average (range, 13%–300%), especially for those that were poorly represented before (KL2, KL27, KL63, or KL107). The updated RF classification model, including the 33 KL-types, achieved 90% correct predictions in an internal cross-validation step (
Fig. S1).
In most classes (
n = 21/33; 64%), each KL-type was linked to a unique ST (e.g., KL19-ST15, KL107-ST302, KL64-ST147, KL105-ST11, and KL62-ST348) and, occasionally, unique cgMLST types representing lineages circulating in wide geographic areas (
Table S1). In some cases (KL23, KL24, KL27, and KL38), isolates were associated with diverse ST, O-, or cgMLST-types (
Table 2). We thus hypothesized that FT-IR could discriminate O-antigen variation. To evaluate this possibility, we created partial-least squares discriminant analysis (PLS-DA) models to improve discrimination within KL23/KL38, KL24, and KL27. These models yielded 96%–100% correct predictions in an internal cross-validation test (
Fig. 2; Fig. S2). Furthermore, among isolates predicted
in silico as KL23 by KAPTIVE (
https://kaptive.holtlab.net), FT-IR distinguished a variant profile (designated KL23-like), suggesting capsular biochemical variation (
Table 2). Thus, by increasing the discriminatory power within KL-types, we have the potential to distinguish by FT-IR up to 36 K. pneumoniae lineages that are frequently associated with multidrug resistance patterns, high transmissibility, colonization, and/or persistence (
Table S1).
Validation of the RF classification model
The positive controls from the validation set (n = 204 isolates) represented 22 out of the 33 KL-types. Most (90%; n = 183/204) of these isolates were identified correctly, and a large proportion of these (95%; n = 175/184) yielded a probability score (P1) >25% and a P1-P2 difference >10% (Table S2). Henceforth, we set these parameters to distinguish true-positive (TP) from false-positive (FP) results. Accordingly, Simpson’s index of diversity for FT-IR was 0.894 (CI = 0.872–0.916), the adjusted Rand was 0.911, and the adjusted Wallace was 0.947 (0.926–0.968). False-negative results represented 12% of the sample, and a fraction of these (32%; n = 8/25) corresponded to isolates that were correctly identified but did not meet the set criteria. The remainder belonged to various KL-types (n = 9), with KL23-like (31%; n = 4/13) and KL30 (27%; n = 3/11) more frequently misidentified. As explained above, differentiation of KL23-like isolates has already been improved in a specific model (96% correct predictions, Fig. S2C). Of note, 56% of false-negative results were correctly identified when re-evaluated after re-isolation on Columbia agar plates with 5% sheep blood.
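The acceptance rule described above (P1 >25% and P1-P2 >10%) can be illustrated with a minimal Python sketch. This is not the Clover MS Data Analysis implementation; the function name `call_kl_type` and the example probabilities are hypothetical, and the rule is assumed to apply to the per-class probabilities returned by the classifier for a single isolate.

```python
from typing import Dict, Optional, Tuple

P1_MIN = 0.25      # top-class probability (P1) must exceed 25%
MARGIN_MIN = 0.10  # difference between P1 and the runner-up (P2) must exceed 10%


def call_kl_type(probs: Dict[str, float]) -> Tuple[Optional[str], float, float]:
    """Return (accepted KL-type or None, P1, P1 - P2) for one isolate.

    `probs` maps each KL-type in the model to its predicted probability.
    A prediction is accepted only if both thresholds are exceeded.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best, p1 = ranked[0]
    p2 = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = p1 - p2
    accepted = best if (p1 > P1_MIN and margin > MARGIN_MIN) else None
    return accepted, p1, margin


# Hypothetical probabilities, for illustration only.
confident = call_kl_type({"KL24": 0.62, "KL27": 0.20, "KL2": 0.18})
ambiguous = call_kl_type({"KL24": 0.40, "KL27": 0.35, "KL2": 0.25})
```

In this sketch, an isolate whose best prediction fails either threshold returns `None`, which corresponds to an isolate that is not assigned to any KL-type in the model.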
The negative controls from the validation set consisted of 76 isolates belonging to 33 different KL-types/wzi alleles absent from the RF model (n = 1–15 isolates each; average 4). Using the set criteria, 92% (n = 70/76) of these were correctly excluded. False positives (n = 6) were recorded for five different KL-types. All of them yielded high scores and represent KL-types not yet characterized biochemically (e.g., two isolates with the cps genotyped as KL139 were classified as KL27) (Table S2).
Considering the whole validation set, we obtained an accuracy rate of 89%, a sensitivity of 88%, and a specificity of 92% with the established workflow. Colony application, spectral acquisition, and automated KL prediction through Clover MS Data Analysis software yielded a time-to-result of 5 min/isolate.
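For readers who wish to retrace these rates, the sketch below recomputes them from confusion counts. The counts are an assumption inferred from the text rather than taken from a stated table: 25 false negatives among 204 positive controls (hence 179 true positives), and 70 true negatives and 6 false positives among 76 negative controls.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn)                 # positives correctly identified
    specificity = tn / (tn + fp)                 # negatives correctly excluded
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return accuracy, sensitivity, specificity


# Counts inferred from the text (assumption): 204 positive and 76 negative controls.
acc, sens, spec = diagnostic_metrics(tp=179, fn=25, tn=70, fp=6)
print(f"accuracy={acc:.0%} sensitivity={sens:.0%} specificity={spec:.0%}")
# prints: accuracy=89% sensitivity=88% specificity=92%
```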
Repeatability and reproducibility of the workflow
We obtained 98% of correct predictions from biological replicates over time (Equipment 1) and 100% from spectra obtained in a different FT-IR instrument (Equipment 2) (Table S3).
Using Columbia Agar plates with 5% sheep blood and the same prediction scores (P1 ≥ 25% and P1–P2 > 10%), the accuracy (86%) and sensitivity (87%) were similar to those obtained with Mueller-Hinton, but the specificity was lower (83%) (Table S4). Accordingly, Simpson’s index of diversity for FT-IR was 0.903 (CI = 0.872–0.934), the adjusted Rand was 0.831, and the adjusted Wallace was 0.881 (0.901–0.947). Sporadic false-negative (n = 10) and false-positive (n = 4) results were observed for strains belonging to 10 different KL-types. Of note, a high proportion (60%) of false negatives were correctly predicted when tested on Mueller-Hinton medium.
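As a reminder of what Simpson's index of diversity measures, the sketch below computes it from a list of type assignments using the standard formula D = 1 - Σ n_j(n_j - 1) / [N(N - 1)], where n_j is the number of isolates assigned to type j and N is the total number of isolates. The function name and the example labels are hypothetical, and the sketch does not reproduce the confidence intervals or the adjusted Rand and Wallace coefficients reported above.

```python
from collections import Counter
from typing import Iterable


def simpson_diversity(type_labels: Iterable[str]) -> float:
    """Simpson's index of diversity for one typing method.

    Probability that two isolates drawn at random (without replacement)
    are assigned to different types.
    """
    counts = list(Counter(type_labels).values())
    n_total = sum(counts)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical assignments: three KL24, two KL27, and one KL2 isolate.
example = simpson_diversity(["KL24"] * 3 + ["KL27"] * 2 + ["KL2"])
```

For the hypothetical example, 8 of the 30 ordered pairs share a type, so the index is 1 - 8/30, or about 0.73.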
DISCUSSION
In this study, we developed a quick, automated, and reproducible FT-IR ATR workflow for typing up to 36 clinically relevant
K. pneumoniae lineages frequently associated with multidrug resistance. Though KL2 is included in the model, hypervirulent genetic backgrounds and other typical KL-types (e.g., KL1) were not represented since they are infrequent in nosocomial infections in Europe (
28). The method is based on the recognition of biochemical patterns associated with the KL-type and, in some cases, of variable KL- and O-type combinations of specific lineages within the same (ST15-KL24 or ST15-KL112) or different STs (ST15-KL24-O1 from ST45-KL24-O2 or ST11-KL27-O2 from ST392-KL27-O4). These data correlate well with those of lineages defined by whole-genome sequencing, though not always at the core genome MLST level (this study) (
4,
29). Since intrahospital transmission is dominated by a few highly transmissible clonal lineages carrying the same capsular locus (
2,
22,
30 – 33), assigning isolates to the same KL-type is highly suggestive of genetic relatedness and enough to support effective and real-time infection control (
11,
12).
In fact, the high accuracy and sensitivity (~88%) obtained ensure that few closely related isolates eventually involved in an outbreak will be missed, while most (if not all) unrelated isolates are discarded. In the context of an outbreak or cluster investigation, early elimination of isolates that are different from each other, as soon as they are detected by an antibiogram or other phenotypic methods, is a very useful tool for infection control teams. For these reasons, we propose the use of FT-IR as a screening tool for clustering and identification of closely related isolates ahead of WGS (i) to support early infection control measures based on typing information obtained at the same time as bacterial identification and (ii) to reduce the number of isolates to be sequenced by WGS for a deeper epidemiological analysis. This would decrease the workload, time, and cost associated with typing (
Fig. 1), not only for
K. pneumoniae but also for other species of public health interest such as
S. enterica,
A. baumannii, or
E. coli for which proof-of-concept studies are available (
8 – 10,
18). Furthermore, the method can also be useful for public health surveillance in humans (
23), animals (
24,
25,
34), or water environments (
35).
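The screening step proposed above can be sketched as a simple grouping of isolates by their accepted KL-type prediction, so that only groups with two or more isolates are put forward for WGS. This is an illustrative sketch under stated assumptions, not part of the published workflow: the function name, the `min_size` parameter, and the isolate identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def screen_for_clusters(
    predictions: Dict[str, Optional[str]], min_size: int = 2
) -> Dict[str, List[str]]:
    """Group isolates by accepted KL-type and keep groups of at least `min_size`.

    `predictions` maps an isolate identifier to its accepted KL-type, or to
    None when the prediction did not meet the acceptance criteria.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for isolate_id, kl_type in predictions.items():
        if kl_type is not None:
            groups[kl_type].append(isolate_id)
    return {kl: sorted(ids) for kl, ids in groups.items() if len(ids) >= min_size}


# Hypothetical isolates, for illustration only.
candidates = screen_for_clusters(
    {"iso1": "KL24", "iso2": "KL24", "iso3": "KL107", "iso4": None, "iso5": "KL24"}
)
```

Here the three KL24 isolates form the only candidate cluster, while the single KL107 isolate and the unassigned isolate would not be prioritized for sequencing on the basis of FT-IR alone.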
Pattern recognition techniques are increasingly being explored in other microbiology diagnostic areas such as MALDI-TOF MS-based species differentiation or antibiotic resistance prediction (
36 – 39). Similar strategies have also been used for FT-IR-based serotyping in
Streptococcus pneumoniae (
40),
S. enterica (
41), or
Staphylococcus aureus (
6) and differentiation of
Enterococcus sp. (
21) and yeasts (
20). These applications are based on machine-learning classification models (using O or K antigens as classes) that are trained on a well-characterized spectral data set and then validated by challenging them with new input data. We used RF considering the low risk of overfitting to the training set and the ease of determining feature importance (
42). We are aware that speed of analysis might be compromised in larger data sets, but improved RF algorithms might represent a solution (
43). Hence, machine-learning will be crucial for future developments of the method, which include (i) the expansion of current databases for other lineages, including hypervirulent
K. pneumoniae, as well as for other clinically relevant bacteria, (ii) validation in larger samples and in real-time contexts, and (iii) exploring the adequacy and limits of spectral databases that represent a historical record of an institution or a given geographic region. Therefore, a classifier, such as the one created here, must be periodically retrained and adapted to accommodate the
K. pneumoniae lineages prevalent in the local area and the specific needs and strategies of a given setting or institution, and to remain responsive to changes in the bacterial population over time. Accordingly, larger-scale validation studies are currently underway to optimize spectral databases and/or models, which will be openly shared with the scientific community to foster continued improvement and innovation. Once a machine-learning algorithm is trained and accessible through a user-friendly platform, users can employ the established workflow to obtain typing information without the need for expertise in spectral data analysis, similar to the experience with MALDI-TOF MS.
Notably, we showed that the spectral information required for typing is stable across time, instruments, and culture media. Robustness of ATR FT-IR has been previously demonstrated for yeast identification (
44) and is associated with direct colony analysis and the use of a classification model, which prevents the inconsistencies associated with sample preparation and cluster cutoff definition described for IR Biotyper (Bruker Daltonics, Germany) (
17 – 19). False-negative results belonged to scattered isolates from different KL-types, most of which were correctly predicted in a different culture medium. Thus, to maximize both speed and sensitivity, we recommend testing directly on Columbia Agar with 5% sheep blood and re-testing poorly predicted isolates on Mueller-Hinton after overnight culturing. Moreover, when misidentifications occurred with highly related KL-types, subsequent models improved discrimination and accuracy (e.g., KL23-like), a strategy that has been used previously (
45). On the other hand, most false-positive results were obtained for non-characterized KL-types, suggesting a high relatedness to known capsules. Hence, FT-IR spectral information can also be used to confirm or disregard
in silico predictions based on capsule genotype (
cps) (
46) or eventually to depict evolutionary events involving the capsule that can occur
in vivo (
47 – 49).
The workflow developed is comparable to that of MALDI-TOF MS, using the bacterial colony directly and yielding a result in <5 min, including from Columbia Agar cultures isolated directly from the clinical sample. Not only do the simplicity of the protocol and the automated data analysis make this technique suitable for non-expert users, but the extraordinarily short time-to-result also represents a great advantage over in-house whole-genome sequencing (usually 48–72 h) (
Fig. 1). Furthermore, the possibility of obtaining typing information on the same day the bacteria are identified is a major asset for infection control. The Clover MS Data analysis software is simple and flexible and does not require knowledge of spectral data analysis, allowing non-expert users to type through a user-friendly workflow (
36,
37). Developments from this study (e.g., spectral processing workflow, algorithm development, and data visualization) have already been incorporated into the software, which is available to potential users by subscription. Different entry-level FT-IR ATR instruments from different manufacturers (e.g., PerkinElmer, Thermo-Fisher, and Shimadzu) can be used. The cost of these instruments is lower than that of other specialized equipment (IR Biotyper, MALDI-TOF MS, Illumina, and MinION), and the costs of the reagents are negligible, making the method especially attractive for low-resource settings (
44).
In conclusion, we demonstrated that FT-IR ATR spectroscopy is an accurate, quick, and reproducible tool that provides meaningful information at a very early stage (at the same time as bacterial identification) to support infection control and public health surveillance. Furthermore, the high robustness of the established workflow together with the availability of spectral databases and/or ML models through flexible and user-friendly platforms (Clover MS Data analysis or others) will facilitate adoption of the method and provide opportunities to enhance and consolidate real-time applications at a global level.