INTRODUCTION
The arrival of SARS-CoV-2 over a short interval from late 2019 to early 2020 overwhelmed medical care providers worldwide. The rapid development of SARS-CoV-2 assays was a key element of the pandemic response, first for limiting transmission and soon after for identifying patients who could benefit from convalescent plasma and remdesivir therapy. Optimal allocation of these initially scarce agents required physicians to decide which patients were most likely to benefit. Risk estimation was made more urgent by the need to administer antiviral agents early, often before respiratory failure was evident (1–6), as is typical of respiratory virus infections (7). As we endeavor to minimize morbidity and mortality in the next pandemic, rapidly ascertaining disease risk is an important area of emphasis.
Early in the COVID-19 pandemic, assessment of disease progression risk understandably relied upon readily accessible baseline clinical characteristics such as age, sex, or diabetes diagnosis (8). A recent retrospective chart review of 969 inpatients during the early pandemic indicates that further prognostic information was available from clinical data beyond these baseline characteristics (9). New laboratory biomarkers of risk also have the potential to improve prognostic accuracy, but with a new disease, it is difficult to ascertain which of the thousands of possible analytes are most useful. One potential source of risk-associated laboratory biomarkers is the metabolome, the population of small molecules whose presence and abundance in patients reflect a broad array of physiological processes and exposures (10). Of note, clinical and research laboratories currently have the potential to expeditiously identify and deploy metabolome-derived prognostic signatures using the same liquid chromatography-mass spectrometry (LC-MS) profiling approaches currently applied to identify inborn errors of metabolism or drug and toxin exposures. These capabilities could be useful early in an epidemic, when limitations imposed by geography and infectious agent precautions (specimen shipping, phlebotomist exposure, etc.) necessitate on-site discovery and validation.
In this study, we used LC-MS to identify metabolomic correlates of severe COVID-19 progression in patients presenting for evaluation at a U.S. academic medical center during local SARS-CoV-2 arrival, prior to vaccine availability. We obtained patient urine specimens upon emergency department admission and assessed their temporal association with severity-defining endpoints of respiratory failure or death. A metabolomic signature associated with severe COVID-19 progression in subjects enrolled early in the study was validated via blinded analysis in later enrollees. The severity-associated signature is composed of three metabotypes, two of which preceded onset of severe disease and one of which preferentially appeared after intubation. This prototype assay exhibits a stronger prognostic potential for severe outcomes than a panel of clinical risk factors.
DISCUSSION
In this study, we describe an accessible, patient-based approach to rapid prognostic LC-MS/MS assay development for a new pandemic infection. If completed in time for use with COVID-19 patients in 2020, this approach might have improved triage decisions and optimized allocation of limited convalescent plasma and remdesivir. Entities charged with pandemic preparedness should consider expanding diagnostic efforts beyond pathogen identification alone to include prognostic assays compatible with existing laboratory equipment.
Relative to previously published metabolomic studies in COVID-19 patients, this study is distinguished by its clinically valid cohorts of patients presenting for medical evaluation, use of instrumentation accessible to clinical laboratories, feature selection yielding a simplified metabolite signature, blinded prospective validation, and careful attention to the relationship between metabolites and endpoint timing (20–23). In the context of a new pandemic, accessibility of specimens prior to severe symptoms is challenging. With limited resources, we elected to implement this study in a hospital setting where tests were available, specimens could be safely collected, and outcomes could be reliably discerned. Because we reasoned that stable predictive markers would persist in patients after intubation, we elected to include subjects with post-intubation specimens in our cohorts. Inclusion of these specimens ensured timely recruitment of subjects and permitted temporal analyses (Fig. 5) to identify lagging, non-predictive indicators such as metabotype 3. When the 20 post-intubation specimens (32% of subjects) were excluded from the validation cohort, the model exhibited a non-significant trend (receiver operating characteristic area under the curve [ROC AUC] 68.2%, 95% CI: 49.4–84.6) toward prediction of severe outcomes. Given that metabotypes 1 and 2 are not temporally associated with intubation (Fig. 5), we regard this result as a probable Type II error. We advise future biomarker studies to carefully monitor the timing of specimen collection relative to key events in infection pathophysiology. Future studies may also consider longer-term follow-up to identify markers for post-infectious sequelae. Any of the three severity-associated metabotypes identified here could conceivably modify the risk of post-COVID conditions, which are affected by the course of early infection (24–27).
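To illustrate why a confidence interval that spans 50% is read as a non-significant trend, the following minimal sketch computes a ROC AUC and a percentile bootstrap interval. It is not the analysis used in this study: the bootstrap is assumed here purely for illustration (the method behind the reported interval is not described in this section), and the function names, scores, and labels are hypothetical.

```python
import random


def roc_auc(scores, labels):
    """AUC as the probability that a random severe case (label 1)
    scores higher than a random non-severe case (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (illustrative assumption)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    aucs = []
    while len(aucs) < n_boot:
        sample = [rng.choice(idx) for _ in idx]  # resample subjects with replacement
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples lacking one outcome group
            aucs.append(roc_auc([scores[i] for i in sample], ys))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical model scores and outcomes (1 = severe), not study data.
scores = [0.2, 0.4, 0.35, 0.8, 0.6, 0.1, 0.7, 0.3, 0.55, 0.45]
labels = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
auc = roc_auc(scores, labels)
lower, upper = bootstrap_auc_ci(scores, labels)
print(f"AUC {auc:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With small cohorts the interval is wide, so a lower bound below 0.5 can reflect limited sample size rather than the absence of a true association, which is the sense in which the text invokes a Type II error.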
Several aspects of our approach are critical to its utility in a future pandemic. Primary among these is prompt establishment of an approved protocol for collecting specimens and longitudinal clinical characteristics. In the present study, this began in early March 2020, anticipating a surge of cases and a limited supply of antiviral therapies. The delay of the present study resulted in part from the recent development of the data analysis methods used here (12, 13, 28) and the need to curate the patient data, though other methods could have potentially been considered (11, 29). The present report was also delayed by efforts to determine the identities of metabolites used in the prototype assay, which we felt would be useful for interpreting the results. We calculate that our present approach would require 100–200 patients in a similar future pandemic, assuming that prognostic metabolites exist and exhibit log-normal distributions (Fig. S24). More advanced mass spectrometers with reduced data variability could reduce study size requirements. Specimen accrual and disease-specific patient data collection are likely to be rate-limiting steps.
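The dependence of study size on effect size and data variability can be sketched with a textbook two-group normal approximation applied to log-transformed abundances (a log-normal analyte is normal on the log scale). This is not the calculation behind Fig. S24; the function name and the input values below are hypothetical, and the approximation ignores unequal group sizes and the multiple comparisons inherent in untargeted profiling.

```python
import math
from statistics import NormalDist


def n_per_group(delta_log, sigma_log, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect a mean difference
    delta_log between two groups with common SD sigma_log (both on the
    log-abundance scale), using a two-sided z-test approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma_log / delta_log) ** 2)


# Hypothetical inputs: group means differ by half a standard deviation.
print(n_per_group(delta_log=0.5, sigma_log=1.0))
# Halving the measurement variability, with the same absolute difference.
print(n_per_group(delta_log=0.5, sigma_log=0.5))
```

Because the required number scales with the square of sigma_log divided by delta_log, halving the variability cuts the requirement roughly fourfold, consistent with the point that lower-variance instrumentation could reduce study size.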
Each of the three severe COVID-19 metabotypes suggests unique insights into physiologic processes or exposures that affect COVID-19 severity. The levetiracetam metabolite defining metabotype 1 was unexpected and raises important questions about its association with COVID-19 severity. Of the two subjects newly treated with levetiracetam prior to sample collection, only one had a generalized seizure attributed to COVID-19. The majority (22/24, 92%) of subjects with detectable levetiracetam metabolite had been receiving it for seizure prophylaxis. Levetiracetam may be a marker for patient conditions that predispose to COVID-19 progression, a direct biological modifier of COVID-19 progression risk, or a combination thereof. Indeed, early levetiracetam efficacy trials conducted before the COVID-19 pandemic identified a positive association between levetiracetam administration and other respiratory infections (30). The levetiracetam metabolite does not exclusively drive the present model, which remains accurate even after holding out patients in whom it was detected (Fig. S21). Further study, such as a retrospective patient data analysis, would be necessary to determine the degree to which seizure disorders or related conditions, levetiracetam, or a class effect among anticonvulsants contribute to COVID-19 severity risk.
The nature of metabotype 2 remains unclear. While previously detected in human feces (31), 1-phenyl-2-hexanone is not clearly an intestinal or microbiome-derived product. Notably, metabotype 2 exhibits a distinctive association with age, a frequently applied proxy for COVID-19 risk. Its trend toward superiority over age as a severity predictor (Fig. S25) raises the possibility that it represents an age-associated physiological process that modifies COVID-19 risk. Given the widely noted role of age as a risk factor in the COVID-19 pandemic, the nature of this metabotype merits additional study.
Because metabotype 3 was preferentially detected in patients with established respiratory failure, we did not regard it as a strong prognostic candidate. This limits the metabotype’s potential to inform COVID-19 antiviral therapy, which is effective in most patients only when administered early in disease, as with other acute respiratory viruses (4, 6, 32, 33). Hydroxyenterodiol, the tentatively identified feature defining metabotype 3, is derived from enterodiol, which is produced by the intestinal microbiota. It is plausible that intestinal inflammation in some forms of severe COVID-19 leads to increased enterodiol permeability and increased hydroxylation by host P450 enzymes. If metabotype 3 represents an inflammatory endotype in COVID-19 patients, it may instead help inform selection of immune-modulating therapies (34, 35).
Limitations of this single-site study include its generalizability to other sites and its prospective applicability, both likely concerns for any evolving pandemic illness. At the time of this writing, population-wide vaccine and convalescent immunity to SARS-CoV-2 and antiviral supply are more robust than in 2020, limiting the current clinical value of the prototype assay. This could change rapidly if a new SARS-CoV-2 variant emerges that evades current immunity or antiviral agents. Continued virus evolution and improved treatment strategies may also lessen the prospective validity of an assay based on results from a single site early in the pandemic. An intrinsic limitation of metabolomic data analysis is that choices made during machine learning may influence the set of biomarkers discovered, especially when multiple near-optimal sets exist. Preliminary tests (Fig. S22 and S23) suggest that the biomarker signature presented here is relatively robust to these choices. More sophisticated but less widely available instrumentation may have identified different or more complex signatures with superior sensitivity or specificity but more limited applicability. In a future pandemic, some of these limitations may be remedied by coordination between sites and access to improved instrumentation.
ACKNOWLEDGMENTS
This study utilized samples from the Washington University School of Medicine’s COVID-19 biorepository, which is supported by the following: the Barnes-Jewish Hospital Foundation, the Siteman Cancer Center grant P30 CA091842 from the National Cancer Institute of the National Institutes of Health, and the Washington University Institute of Clinical and Translational Sciences grant UL1TR002345 from the National Center for Advancing Translational Sciences of the National Institutes of Health. This research was also supported by The Longer Life Foundation (J.P.H.), National Institutes of Health R01DK111930 (J.P.H.), an HHMI Gilliam Award (GT11504, A.L.H. and P.J.M.), and a James S. McDonnell Foundation Complex Systems Scholar Award (#220020315, P.J.M.). The content is solely the responsibility of the authors and does not necessarily represent the view of the NIH or any other agency funding this research.
We are grateful to Adriana Rauseo Acevedo and Rachel Presti for their efforts in data management and patient recruitment.
J.P.H., L.R.M., and J.I.R. developed the original concept and designed the overall study approach. J.A.O. and C.W.G. conducted enrollment, specimen collection, and clinical data collection. L.R.M. and J.I.R. conducted specimen processing and mass spectrometric analyses, prepared the data, and interpreted spectra. J.I.R., A.L.H., L.R.M., P.J.M., and J.P.H. analyzed the data and wrote the manuscript.