The enormous progress in biotechnological and computational techniques over the last few decades has revolutionized our understanding of microbial communities. In particular, studies based on amplicon and metagenomic sequencing have further clarified the fact that microbiomes are not static entities, but dynamic ecosystems whose constituent members interact in myriad ways with each other and with their environments (
1–6). These sequence-based approaches have catalyzed early efforts to map microbial interrelationships and represent an invaluable first step in identifying the organisms participating in these interactions. However, these approaches have largely been limited to inferring microbial associations (e.g., via co-occurrences) (
7,
8). While these associations may partially reflect causal interactions between microbes, the full landscape of interdependencies is likely much richer and more nuanced in ways that we are just starting to grasp (
9–13). Indeed, interactions may be defined and measured in many different ways (
14,
15), for example, by evaluating direct contact between cells (
16,
17), physical proximity (
18,
19), the cost of producing exchanged metabolites (
20–22), and the type of chemical mediators involved (
23,
24). These factors are crucial in determining the emergence and consequences of an interaction beyond its ecological classification (mutualism, competition, etc.), and can provide a more complete view of microbial ecosystem properties, which is helpful for building mathematical models of community dynamics.
Advances in metabolomics, transcriptomics, and high-throughput culturing platforms are beginning to produce a growing body of data on the mechanisms and environmental dependencies exhibited by microbial interactions (
25–28). While this wealth of information has the potential to enhance our knowledge of specific microbial interrelationships, it poses the new challenge of finding an appropriate framework to describe interactions in a way that efficiently encompasses their diversity and complexity (
Fig. 1A) (
29–31).
Addressing this challenge would allow us to dynamically and continuously combine diverse sources of data (
Fig. 1B and
C) to yield insights that cannot be obtained from individual data sets. For example, one could use an open interaction database to fetch experimentally grounded parameters (e.g., metabolite uptake/secretion rates) for simulating the dynamics of microbial food webs and to predict the general conditions that determine the stability of a community (
Fig. 1D) (
32). As another example, network scientists and ecologists could use the same database to assess how widespread particular interactions are (
33), or to identify network structures that are common across biomes or taxonomic groups, refining our understanding of how microbial ecosystems assemble (
Fig. 1E) (
34,
35). These inferences could then be used to clarify the connection between causal interactions and observed co-occurrence patterns—distinct data types whose combination can advance the understanding of how different microbial relationships affect community function, dynamics, and resilience. Finally, simultaneous mining of multiple data sets would enable searches for examples of specific interactions (
Fig. 1E), to (i) identify interactions that occur robustly irrespective of biome and experimental details, (ii) facilitate the bottom-up design of synthetic consortia by complementing existing approaches (
36–38), or (iii) help create experimentally-verified data sets for benchmarking microbial inference methods (
39).
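As a minimal sketch of use case (i) above, the following Python snippet groups interaction reports by participants and ecological outcome and keeps those observed in several biomes. The record layout, the species names, and the `min_biomes` threshold are illustrative assumptions, not data or conventions taken from an existing database.

```python
from collections import defaultdict

# Hypothetical reports: (participants, ecological outcome, biome).
# All values are placeholders, not results from any published study.
records = [
    (("Species A", "Species B"), "mutualism", "soil"),
    (("Species B", "Species A"), "mutualism", "marine"),
    (("Species A", "Species C"), "competition", "soil"),
    (("Species A", "Species B"), "mutualism", "host-associated"),
]


def robust_interactions(records, min_biomes=2):
    """Return interactions reported with the same outcome in >= min_biomes biomes."""
    biomes = defaultdict(set)
    for participants, outcome, biome in records:
        # A frozenset makes the key independent of participant order.
        key = (frozenset(participants), outcome)
        biomes[key].add(biome)
    return {key: sorted(seen) for key, seen in biomes.items() if len(seen) >= min_biomes}


robust = robust_interactions(records)
for (participants, outcome), found_in in robust.items():
    print(sorted(participants), outcome, found_in)
```

Using an unordered key is only appropriate for symmetric outcomes; directed interactions (e.g., parasitism) would need a key that preserves which participant plays which role.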
More generally, the prevalence and effects of specific interaction attributes (e.g., dependence on specific resources (
40), strain-level physiological differences (
41), and definitions of ecological outcomes (
8,
42)) across organisms and ecosystems could be quantified and compared, contributing to an enhanced understanding of the general ecological principles that govern living systems.
Despite this promising prospect, several factors complicate the integration of microbial interaction data. Chief among them is that most interaction data are not accessible outside the original study in which they were reported, and often appear in arbitrarily formatted tables. For this reason, efforts have been made to create interaction databases with standardized formatting (
Table 1) (
43–47). While these represent useful resources for finding specific interacting participants from diverse microbiomes, such databases are often limited to one or a few types of data. Nonetheless, these efforts follow in the rich history of endeavors within biology aimed at standardizing and sharing data and computational models of biological systems (
47–51). For example, one of the most prominent early demonstrations of the need for standardization was the explosion of high-throughput data generated from DNA microarrays at the end of the 1990s. It is particularly telling that a commentary article accompanying the paper that proposed MIAME (Minimum Information About a Microarray Experiment) (
50) was titled “Microarray Standards at Last” (
52), capturing the acknowledged need for appropriate reporting standards at the time. As suggested by the title, MIAME was not created overnight, but rather entailed a careful process that integrated viewpoints from multiple stakeholders to create a useful and accepted reporting framework that enhanced the reproducibility of results and the drawing of broader insights from integrated sources of data. To begin a similar journey, we propose that a greater focus on reporting interaction attributes and mechanisms using standardized formats will open up important opportunities for the microbiome field, and outline specific steps that can be taken to reach this potential.
A FAIR REPRESENTATION OF MICROBIAL INTERACTIONS
We specifically envision the adoption of data sharing and stewardship practices that would enable microbial interaction data to fulfill the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR). These principles, first formally presented in 2016 to address growing challenges in scientific data management, serve to guide efforts that aim to improve access to reliable and reproducible scholarly data (
53) and have already been adopted as an important component of microbiome data management (
1,
6,
54–58). We therefore focus our present discussion on two concrete efforts that can be initiated in order to make data on intermicrobial interactions more FAIR, namely (i) the creation and/or adoption of open web infrastructures for cataloging and making data from disparate sources available to the community (hence Findable and Accessible), and (ii) the adoption of a minimal set of metadata requirements that are human- and machine-readable (i.e., Interoperable and Reusable).
1: An open web catalog for findable and accessible microbial interaction data.
It is difficult to imagine how to make progress on Findability and Accessibility without a centralized resource that is capable of capturing the wide breadth of interactions currently available only in individual publications or split into type-specific repositories (
Fig. 1B and
Table 1), and designed to be able to grow to accommodate newly generated data. Such a resource could be generated via multiple strategies, including the following:
A. Integrating existing repositories of microbial interaction data into an established infrastructure. Several existing infrastructures could in principle serve this purpose. To illustrate the potential advantages and challenges of this strategy, we consider as an example Global Biotic Interactions (GloBI) (
47), a metadatabase for sharing and analyzing species interaction data. It provides a searchable platform to identify specific interactions based on the organisms involved and the relationships between them (e.g., X preys on, hosts, is symbiont of Y), and can therefore serve as a structure for integrating a wider breadth of microbial interaction data and their attributes. Nonetheless, because GloBI treats the species as its finest taxonomic level, it may not easily capture the strain- or mutant-specific interactions common in microbial ecology research (
59). Integration into GloBI would also require amending existing metadata items not applicable to microbes, as well as the ontologies of interactions (apart from “interacts with,” “parasitizes,” “ecologically co-occurs with”) to match features of microbial interactions.
B. Using existing database-building tools and available metadata. For example, one could use the recently published tool mako (
35) to create a database by importing network files or deposited sequences to be analyzed. If these input files contain metadata on nodes, edges, or samples from which sequences were obtained, the metadata in question can be readily propagated into the database. However, mako is currently limited to undirected interactions and to creating a local interaction database, which complicates continuous online access and editing.
C. Establishing a new database specific to microbial interactions. Such a database could be designed from the ground up to more flexibly store multiple types of microbial interactions (e.g., co-occurrences, causal interactions, and higher-order interactions), as well as their attributes. This approach could apply several modalities for importing and organizing data in a standardized or automated way, facilitating the incorporation of data from individual studies as they are published. Recent manually compiled catalogs of interactions and their attributes (
14,
15), as well as tools to convert them into searchable online resources (
cpauvert.github.io/mi-atlas) may serve as small-scale examples for planning such a larger-scale resource.
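To make strategy C more concrete, the sketch below shows one possible relational layout, built with Python's standard `sqlite3` module, in which an interaction can have any number of participants, so that pairwise and higher-order interactions share the same tables. The table names, column names, and allowed values are assumptions made for illustration and do not correspond to the schema of GloBI, mako, or any other existing resource.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for demonstration only
conn.executescript("""
CREATE TABLE interaction (
    id     INTEGER PRIMARY KEY,
    kind   TEXT NOT NULL
           CHECK (kind IN ('co-occurrence', 'causal', 'higher-order')),
    source TEXT              -- persistent identifier of the report, e.g., a DOI
);
CREATE TABLE participant (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    taxon          TEXT NOT NULL,
    strain         TEXT,     -- optional strain-level resolution
    effect         TEXT CHECK (effect IN ('positive', 'negative', 'neutral'))
);
""")


def add_interaction(conn, kind, source, participants):
    """Insert one interaction and its (taxon, strain, effect) participants."""
    cur = conn.execute(
        "INSERT INTO interaction (kind, source) VALUES (?, ?)", (kind, source)
    )
    conn.executemany(
        "INSERT INTO participant VALUES (?, ?, ?, ?)",
        [(cur.lastrowid, taxon, strain, effect) for taxon, strain, effect in participants],
    )
    return cur.lastrowid


# Placeholder entries; names and sources are hypothetical.
add_interaction(conn, "causal", "placeholder-reference",
                [("Species A", "strain 1", "positive"), ("Species B", None, "negative")])
add_interaction(conn, "higher-order", "placeholder-reference",
                [("Species A", None, "positive"), ("Species B", None, "positive"),
                 ("Species C", None, "neutral")])

rows = conn.execute("""
    SELECT i.id, i.kind, COUNT(*) FROM interaction i
    JOIN participant p ON p.interaction_id = i.id
    GROUP BY i.id ORDER BY i.id
""").fetchall()
print(rows)
```

Storing participants in their own table, rather than as two fixed columns, is what lets the same layout hold interactions that cannot be described pairwise.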
2: A minimal set of metadata requirements for interoperability and reusability.
While a centralized database would lay a foundation for FAIR microbial interaction data, its impact would remain limited if its contents cannot be easily updated by scientists and accessed by humans and machines. We therefore also advocate for the inclusion of metadata with reports of interactions as a way to promote interoperability and reusability. While convergence to specific guidelines will require significant community discussions and buy-in from stakeholders, we propose that the following four categories of metadata could serve as a starting point: microbial entities, interaction inference methods, interaction context, and attributes. These are described in detail in
Table 2 and are outlined as follows:
A. Microbial entities. The species name (and strain name, if relevant) of each microbe participating in an interaction should be provided, e.g., in a comma-separated list, along with its taxonomic accession number and, where available, its sequence identifier (
Table 2A). These lists would also accommodate interactions that cannot be easily described via a pairwise representation. Interaction attributes and effects specific to each participant could be matched with each identifier.
B. Interaction inference methods. Although challenging to standardize, documentation of the methods used to identify an interaction represents highly relevant metadata. As a first step, the evidence for the interaction in question can be broadly categorized using the Evidence and Conclusion Ontology (
60), which would indicate whether experimental or computational methods (or both) were used. We also propose a more specific metadata item for the type of computational or experimental method used (e.g., simulation, microscopy, cultivation, and sampling) (
Table 2B). Lastly, the relevant publication, code, detailed protocols, and other literature-based evidence should be accessible via persistent identifiers (e.g., DOIs).
C. Interaction context. The environmental context of the interaction, such as the biome (e.g., host-associated, synthetic), could be documented using the Environment Ontology (
61) or propagated from the samples used to infer the interaction. Cultivation conditions could also be integrated following the standards established by databases of bacterial isolates (
57) and extended to co-cultures. Relevant metadata are proposed in
Table 2C with an emphasis on linking values to existing resources such as the Gene Ontology for cellular components, the Chemical Entities of Biological Interest Ontology for compounds, and the Genomic Standards Consortium (
51) for the oxygen status of the environment.
D. Interaction attributes. Defining an interaction’s type (e.g., cooperation, antagonism, association, pairwise or higher-order) is also not trivial, but could be guided by incorporating existing ontologies such as those in the active list maintained by the OBO Foundry (
62). Several other frameworks exist to describe interaction types such as GloBI, Population and Community Ontology (
63), and Interaction Network Ontology (
64). It nonetheless remains to be seen whether these ontologies are appropriate for describing all known attributes of microbial interactions, or if a larger set defined by the community is needed. In the meantime, we propose the inclusion of the ecological effect experienced by each participant or by each set of participants (positive/negative/neutral) (
42), as well as an indication of whether the described association is a co-occurrence, accompanied by the strength of the associated metric. Lastly, we propose the inclusion of any known interaction dependencies (e.g., on spatial structure or physical contact) and any additional user-defined keywords that provide further relevant information not captured in the previous items (
Table 2D).
As an example of how these metadata can be compiled for different data types (
Fig. 1B), we have used them to describe three interactions gathered from the literature (
Table 3).
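The four metadata categories could, for instance, be carried in a single machine-readable record. The Python sketch below is one hypothetical encoding: the class and field names are our own shorthand for categories A to D rather than the items listed in Table 2, and all values are placeholders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Participant:
    name: str                  # species (and strain, if relevant)
    taxon_id: str = ""         # taxonomic accession number, if known
    effect: str = "neutral"    # positive / negative / neutral


@dataclass
class InteractionRecord:
    entities: list     # A. microbial entities (any number of participants)
    evidence: dict     # B. interaction inference methods
    context: dict      # C. interaction context
    attributes: dict   # D. interaction attributes

    def to_json(self):
        """Serialize the record, including nested participants, to JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; none of these are taken from Table 3.
record = InteractionRecord(
    entities=[Participant("Species A", effect="positive"),
              Participant("Species B", effect="negative")],
    evidence={"category": "experimental", "method": "cultivation",
              "identifier": "placeholder-DOI"},
    context={"biome": "synthetic"},
    attributes={"type": "antagonism", "dependencies": ["physical contact"],
                "keywords": []},
)
print(record.to_json())
```

Keeping the participants in a list, with the effect attached to each one, mirrors the suggestion that attributes specific to each participant be matched with its identifier.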
OUTLOOK
The practices, standards, and use cases we have outlined here are by no means exhaustive, but are rather meant to catalyze further discussion on ways to improve the access to and usability of data on microbial interactions and their attributes. We believe the time is opportune for such discussions to take place, not only due to the rapidly growing body of data on microbial interactions and their mechanisms, but also because of a growing momentum within the microbiome community to improve the reliability and reproducibility of research outputs. This momentum is exemplified by government-funded initiatives such as the National Microbiome Data Collaborative (NMDC, USA) (
56,
65) and the National Research Data Infrastructure (NFDI4Microbiota, Germany;
https://nfdi4microbiota.de), which advocate for the adoption of reporting standards for microbiome data. As with existing accepted data reporting standards, any proposed global framework for describing microbial interactions must be shaped by its various stakeholders, including computational and empirical researchers, industry representatives, funding agencies, educational users, and publishers. Such involvement would enable any formalism to be flexible and broadly embraced, as opposed to a rigid standard with little endorsement or room for growth.
Bearing these considerations in mind, we suggest the following roadmap toward FAIR microbial interaction data. First, we call for increased discussions within the scientific community to select and prioritize the interaction features that are most useful to report. These can be carried out via dedicated workshops that, in addition to biologists, could include philosophers of biology interested in microorganisms, as well as physicists and mathematicians who can help define the qualitative and quantitative nature of intermicrobial interactions and their important attributes. This first community-driven effort could lead to the creation of a reporting standard, extending our suggestions in
Table 2 to a more mature “Minimal Information for Intermicrobial Interactions” definition similar to those for publishing microarray data (
50) and genome sequences (
51,
66,
67), or for assessing the quality of genome-scale metabolic models (
68). Second, these metadata suggestions could be implemented in usable file formats modeled on existing ones such as SBML (
49,
69) or BIOM (
70), which enable the standardized export and sharing of genome-scale models and count data, respectively. To this end, data scientists and bioinformaticians could take part in hackathons to develop a toolbox of standardized file formats, converter scripts, and validators that streamlines the adoption of microbial interaction metadata. Third, we envision teams of investigators and students gathering for “annota-thons” to collaboratively extract knowledge from the microbial interaction literature and use this toolbox to compile the relevant metadata, ensuring that an open web catalog of microbial interactions truly relies on known published material. Last, the rise of an open community willing to quickly share the protocols and methods of scientific projects enabled by FAIR microbial interaction data resources would provide further incentives for the adoption of standard formats, creating a positive feedback loop that could accelerate benefits for the whole community and pave the way for major integrative and collaborative advances in microbiome research.
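As an illustration of the kind of validator such a hackathon might produce, the following Python sketch checks a record, represented as a plain dictionary, against a few rules. The required keys, the allowed effect values, and the rules themselves are assumptions chosen for demonstration; an actual standard would be defined by the community.

```python
REQUIRED_CATEGORIES = ("entities", "evidence", "context", "attributes")
ALLOWED_EFFECTS = {"positive", "negative", "neutral"}


def validate(record):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for category in REQUIRED_CATEGORIES:
        if category not in record:
            problems.append(f"missing category: {category}")
    for index, entity in enumerate(record.get("entities", [])):
        if not entity.get("name"):
            problems.append(f"entities[{index}]: missing name")
        effect = entity.get("effect")
        if effect is not None and effect not in ALLOWED_EFFECTS:
            problems.append(f"entities[{index}]: unknown effect '{effect}'")
    # Evidence should point to a persistent identifier such as a DOI.
    if "evidence" in record and not record["evidence"].get("identifier"):
        problems.append("evidence: missing persistent identifier")
    return problems


# Hypothetical records with placeholder values.
good = {
    "entities": [{"name": "Species A", "effect": "positive"},
                 {"name": "Species B", "effect": "negative"}],
    "evidence": {"identifier": "placeholder-DOI"},
    "context": {"biome": "synthetic"},
    "attributes": {"type": "antagonism"},
}
bad = {
    "entities": [{"name": "", "effect": "beneficial"}],
    "evidence": {},
}
print(validate(good))
print(validate(bad))
```

Returning every problem at once, rather than stopping at the first, is a deliberate choice here: it lets annotators correct a record in a single pass.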
ACKNOWLEDGMENTS
We thank the editors and the reviewers for their positive comments and for their helpful recommendations. A.R.P. is funded by a James S. McDonnell Postdoctoral Fellowship. C.P. is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—460129525. D.K. was partially funded by the Kilachand Multicellular Design Program graduate fellowship. D.S. acknowledges funding from the U.S. Department of Energy, Office of Science, Office of Biological & Environmental Research through the Microbial Community Analysis and Functional Evaluation in Soils Science Focus Area Program (m-CAFEs) under contract number DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory, from the Human Frontier Science Program (HFSP Research Grant RGP0060/2021), from the NSF Center for Chemical Currencies of a Microbial Planet (C-CoMP, publication #011), from the NIH National Cancer Institute (grant R21CA260382), and from the NIH National Institute on Aging, award number UH2AG064704. Figure elements created with Biorender.com.