Open access
Open Peer Review
Genetics and Molecular Biology
Methods and Protocols
5 January 2023

PPNet: Identifying Functional Association Networks by Phylogenetic Profiling of Prokaryotic Genomes

ABSTRACT

Identification of microbial functional association networks allows interpretation of biological phenomena and a greater understanding of the molecular basis of pathogenicity and also underpins the formulation of control measures. Here, we describe PPNet, a tool that uses genome information and analysis of phylogenetic profiles with binary similarity and distance measures to derive large-scale bacterial gene association networks of a single species. As an exemplar, we have derived a functional association network in the pig pathogen Streptococcus suis using 81 binary similarity and dissimilarity measures, which demonstrates excellent performance based on the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and a derived overall scoring method. Selected network associations were validated experimentally by using bacterial two-hybrid experiments. We conclude that PPNet, a publicly available tool (https://github.com/liyangjie/PPNet), can be used to construct microbial association networks from easily acquired genome-scale data.
IMPORTANCE This study developed PPNet, the first tool that can be used to infer large-scale bacterial functional association networks of a single species. PPNet includes a method for assigning the uniqueness of a bacterial strain using the average nucleotide identity and the average nucleotide coverage. PPNet collected 81 binary similarity and distance measures for phylogenetic profiling and then evaluated and divided them into four groups. PPNet can effectively capture gene networks that are functionally related to phenotype from publicly available prokaryotic genomes, as well as provide valuable results for downstream analysis and experimental testing.

INTRODUCTION

The identification of functional association networks, i.e., correlative genes encoding protein complexes or involved in common biological processes, allows novel virulence gene associations and mechanisms of pathogenicity to be elucidated (1, 2). In addition, networks of functional association can be used to predict the function(s) of uncharacterized proteins (3). Although genomewide surveys of functional links (e.g., by protein complex purification, double mutant phenotyping, or correlated gene expression) remain experimentally challenging in many organisms (4), advances in modern experimental technologies using high-throughput biology, such as next-generation sequencing and microarrays, have made it possible to capture the complex interplay between molecules.
Gene coexpression networks (GCNs), namely, transcript-transcript association networks, are typically generated by high-throughput methods for differential coexpression analysis of gene expression data generated, for example, by microarray or transcriptome sequencing, and are usually represented as an undirected graph (5, 6). GCNs in bacteria are typically constructed from transcriptome data by identifying gene sets or modules that exhibit similar expression behavior across various environmental conditions, such as the invasion of host cells and tissues, heat shock, anaerobic stress, or iron restriction (5, 7–9). However, there are some limitations in terms of expression-based network inference in bacteria. For example, GCNs are typically established under specific experimental conditions, and not all transcriptional regulatory networks will be functional. In addition, because of the high cost of library construction and sequencing, publicly available transcriptomic data for some bacterial species are limited or not available, especially for bacterial field isolates (10).
While transcriptome data for some bacterial species are nonexistent or of limited availability, there are >360,000 sequenced bacterial genomes currently accessible (https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/), and these provide a convenient and cost-effective resource for constructing association networks based on the phylogenetic profiling method (3). Comparing the phylogenetic distributions is an effective way to predict the functional associations between nonhomologous genes, an approach first introduced by Pellegrini et al. (3). Typically, this method uses phylogenetic profiling between different species (11–13) and is rarely used within the same species. An important reason is that the evolutionary distance between strains of the same species is too small, which prevents functional associations from being identified via phylogenetic profile analysis of core genes. Similarly, if there are too many isolates of the same strain, this hampers the construction of functional association networks. Another reason is that many species lack sufficient genomic data for comparison. However, recent advances, such as the recognition of considerable phenotypic variation within a single species, e.g., in physiological-biochemical characteristics, pathogenicity, and antibiotic resistance, and the current and rapidly increasing availability of whole-genome sequence data, create an environment facilitating the identification of functional association networks through phylogenetic profiling.
Here, we present PPNet (https://github.com/liyangjie/PPNet), the first tool for deriving large-scale bacterial association networks of one species based on phylogenetic profiling. To demonstrate the utility of our approach, we used it to identify a virulence-related gene network in the zoonotic bacterial pathogen Streptococcus suis. PPNet demonstrated excellent performance based on the evaluation measures used (the AUROC [area under the receiver operating characteristic curve], the AUPR [area under the precision-recall curve], and the overall score), and specific networks were validated by bacterial two-hybrid analysis. The results suggest that PPNet offers a general approach to constructing microbial association networks by drawing upon easily acquired genome-scale data.

RESULTS

PPNet overview.

PPNet is implemented as a Python script and can be easily installed on Linux, MacOS, and Windows Subsystem for Linux (WSL) platforms. An overview of the PPNet workflow is shown in Fig. 1; further details are provided in Materials and Methods below. Briefly, PPNet requires both genome sequence and knowledge of the phenotype (e.g., pathogenic or nonpathogenic) of all strains as the input data. The first step is to perform quality control for all genomes to reduce data redundancy and biased genomes. Next, genome annotation of the high-quality genome data set is performed, and predicted protein sequences for each genome are extracted for gene clustering. Then, for all orthologs, a preliminary phylogenetic profile is generated across all isolates. In addition, PPNet divides strains into two groups based on phenotypic information provided by the user and compares the distribution of each ortholog from different phenotypic groups; only the phylogenetic profile of orthologs with significantly different distributions is selected for network inference (Fig. 2a). Finally, PPNet calculates the association coefficients among the genes based on the similarity of their phylogenetic profiles. By default, PPNet will list the association coefficient between each pair of genes that is less than or equal to the first percentile, and the output list can be further visualized by Cytoscape (14).
FIG 1
FIG 1 Schematic representation of the PPNet workflow. Taking genome data and grouping information of strains as the input, each genome goes through a set of filtration steps, including the removal of poor-quality genomes based on N50 and the removal of duplicate genomes based on the ANI and ANC, with the thresholds for each step set by the user. Next, the obtained high-quality genomes are automatically annotated, and a preliminary phylogenetic profile is constructed. The phylogenetic profile is represented by a binary matrix, where each row represents an ortholog, each column represents a strain, and the “1” or “0” in each row refers to the presence or absence of the ortholog in each strain, respectively. The preliminary phylogenetic profile is then filtered by using the Fisher exact test; only the phylogenetic profile of orthologs with significant differences in distribution across strain groups is retained. Finally, the association coefficients among the orthologs are calculated based on the similarity of their phylogenetic profiles. These results are saved as the output and can be imported to Cytoscape for visualization.
FIG 2
FIG 2 Differences between virulent and nonvirulent serotypes. (a) Venn diagram showing the premise for using the virulent serotype collection (red) and nonvirulent serotype collection (blue) of S. suis genomes. (b) DAPC was used to evaluate the separation between the isolates of the virulent serotype collection (red) and the nonvirulent serotype collection (blue) of S. suis, using the presence/absence data for genes in the accessory genome. (c) Phylogenetic tree of 551 S. suis strains based on the binary presence or absence of accessory genes. The outer ring provides information about the virulent (red) and nonvirulent (blue) serotypes. (d) Heatmap visualizing the distribution of VRDGs in the genomes of S. suis. The presence (dark blue areas) or absence (light blue areas) of 1,060 VRDGs is shown in the heatmap. Each row indicates a VRDG, and rows were clustered by hierarchical clustering based on the VRDG distribution. Each column indicates a strain, assigned to the virulent group (red) or the nonvirulent group (blue). A colored strip from red to yellow shown on the left side of the heatmap corresponds to –log10 (Padj) values from low to high. The Padj values are the adjusted P values computed by the Fisher exact test under the null hypothesis that the presence or absence of this gene is unrelated to virulence and adjusted by false discovery rate.

S. suis virulence-related gene association network.

To demonstrate its usefulness, PPNet was used to infer the virulence-related gene association network of S. suis from publicly available data. A total of 1,288 published S. suis genome sequences, including 43 complete and 1,245 draft genomes, were obtained from the National Center for Biotechnology Information (NCBI) FTP server (see Table S1 in the supplemental material). Based on the average nucleotide identity (ANI) and serotype, 34 genomes were identified as not being derived from S. suis and were removed from further analyses. Thus, 1,254 S. suis genomes were used as input data for PPNet.
After quality control and removal of redundant genome data by PPNet, 551 nonredundant and high-quality genomes were finally used for subsequent analysis. The preliminary phylogenetic profile created by PPNet contained 15,722 orthologs, with 1,141 and 14,581 genes assigned to the core (present in 99% of isolates) and accessory (variably present) genomes, respectively.
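The core/accessory split described above can be sketched in a few lines. This is only an illustration, not PPNet's or Roary's own code: the function name `split_core_accessory`, the dictionary layout, and the reading of "present in 99% of isolates" as "present in at least 99% of isolates" are assumptions.

```python
def split_core_accessory(profile, core_fraction=0.99):
    """Split orthologs into core and accessory sets.

    profile maps an ortholog ID to a list of 0/1 presence flags,
    one flag per strain (hypothetical layout, for illustration only).
    """
    core, accessory = [], []
    for ortholog, flags in profile.items():
        prevalence = sum(flags) / len(flags)  # fraction of strains carrying the ortholog
        if prevalence >= core_fraction:       # assumed reading: "at least 99%"
            core.append(ortholog)
        else:
            accessory.append(ortholog)
    return core, accessory


# Toy profile over 200 strains (made-up data).
toy_profile = {
    "geneA": [1] * 200,             # present everywhere
    "geneB": [1] * 199 + [0],       # present in 99.5% of strains
    "geneC": [1] * 150 + [0] * 50,  # variably present
}
core, accessory = split_core_accessory(toy_profile)
```

Under these assumptions, `geneA` and `geneB` are assigned to the core genome and `geneC` to the accessory genome.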
Capsular serotype and virulence of S. suis are known to be related (15). Accordingly, 323 S. suis strains of serotypes 1 to 5, serotypes 7 to 9, serotype 1/2, serotype 14, serotype 16, serotype 24, and serotype Chz were categorized here as the virulent group, while 228 strains of other serotypes, including serotype 6, serotypes 10 to 13, serotype 15, serotypes 17 to 19, serotype 21, serotype 23, serotype 25, serotypes 27 to 31, novel capsular polysaccharide loci (NCL), and nontypeable strains, were classified as the nonvirulent group (16–30). In addition, to determine the genetic diversity present in the accessory genomes associated with virulent and nonvirulent serotypes, we performed discriminant analysis of principal components (DAPC), which showed a clear-cut separation between virulent and nonvirulent serotypes in terms of accessory genomes (Fig. 2b), suggesting that molecular serotyping was feasible for classification of virulent strains. Further, a phylogenetic tree (Fig. 2c) was constructed according to the binary presence or absence of accessory genes (31). The red and blue columns in the figure represent virulent and nonvirulent serotype strains, respectively. A group separation was observed between virulent and nonvirulent serotypes with a few exceptions, suggesting that the virulent phenotype classification through molecular serotyping was associated with the accessory genome.
To obtain more valuable phylogenetic profiles, PPNet identifies virulence-related differential genes (VRDGs) by comparing the distribution of genes from virulent and nonvirulent genomes (Fig. 2a). Each gene receives its own null hypothesis of no association with virulence, and a Fisher exact test is performed (see Materials and Methods). VRDGs are defined as gene families that are overrepresented in virulent genomes. A total of 1,060 VRDGs were identified, and phylogenetic profiles of VRDGs were used to infer an association network (see Table S2). Figure 2d shows that VRDGs were predominantly present in virulent genomes compared to nonvirulent genomes. Finally, PPNet generated a total of 81 virulence-related gene association networks based on 81 binary similarity and dissimilarity measures (32, 33) (see Table S3).

Performance of network association inference methods.

To evaluate these 81 networks of S. suis, the gene interaction network of S. suis 05ZYH33 in the STRING database (v11.0) was set as the gold standard for performance evaluation (34). We assessed the performance of the methods used for S. suis based on the AUROC, the AUPR (35), and the overall score, all of which have been used to summarize the performance of networks (36) (Fig. 3a; see also Table S3). The overall score and the performance of each network for all 81 applied binary similarity and dissimilarity measures are shown in Fig. 3a. Measures were classified into the same cluster when they yielded the same AUROC, AUPR, and overall score. Equation 60 (see Table S3), $S_{\text{OCHIAI-II}} = ad/\sqrt{(a+b)(a+c)(b+d)(c+d)}$, was found to give the highest overall score (see Fig. 3a).
FIG 3
FIG 3 Performance of network inference methods. (a) Assessment of network inference methods listed in Table S3 in the supplemental material. Performance for the association networks of S. suis constructed by different binary similarity coefficients is indicated by the area under the receiver operating characteristic curve (AUROC) (blue), the area under the precision-recall curve (AUPR) (yellow), and the overall score (orange). The cluster included all the binary similarity coefficients sharing the same value of AUROC, AUPR, and the overall score. (b) A scatterplot depicts the minimum distance versus AUROC for the 81 binary similarity and distance measures. According to the distribution, the final 43 dots obtained from 81 equations were divided into four groups (G1, G2, G3, and G4) through hierarchical clustering (see Fig. S1), represented by four different colored boxes. OCHIAI-II similarity (equation 60 [see panel a]), present in the first group, had a relatively short minimum distance and the second-highest AUROC value. (c and d) ROC (c) and PR (d) curves, as determined by OCHIAI-II similarity (equation 60 [see panel a]).
In order to assess the capability of each equation, a scatterplot of minimum distances of the ROC (receiver operating characteristic) curve to the theoretical optimum point and AUROCs corresponding to the 81 equations for constructing the gene interaction networks of S. suis was generated (Fig. 3b). Based on the scatterplot, the final 43 dots obtained from 81 equations were divided into four groups (G1, G2, G3, and G4) according to the result of hierarchical clustering (see Fig. S1). The well-performing equations (see Fig. 3a) with least minimum distances and the highest AUROC scores were obtained in G1, which consisted of equations 60, 51, 52, 53, 54, etc. (see Fig. 3b). The ROC and PR (precision-recall) curves generated using OCHIAI-II similarity are displayed in Fig. 3c and d.
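The two quantities plotted in the scatterplot can be illustrated with a short sketch. It assumes that the ROC curve is available as a list of (false-positive rate, true-positive rate) points and that the "theoretical optimum point" is the corner FPR = 0, TPR = 1; the function names and the toy curve are hypothetical and are not taken from PPNet.

```python
from math import hypot


def auroc(points):
    """Trapezoidal area under a ROC curve given as (fpr, tpr) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def min_distance_to_optimum(points):
    """Smallest Euclidean distance from any ROC point to (fpr=0, tpr=1)."""
    return min(hypot(fpr, 1.0 - tpr) for fpr, tpr in points)


# Made-up ROC curve for one similarity measure.
roc_points = [(0.0, 0.0), (0.1, 0.6), (0.4, 0.9), (1.0, 1.0)]
area = auroc(roc_points)
distance = min_distance_to_optimum(roc_points)
```

A measure that performs well in this sense combines a large area with a small distance, which is the property used above to single out group G1.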

Functional enrichment of S. suis virulence-related gene association networks.

To gain insights into S. suis virulence-related gene interactions, the association network was built at a cutoff of 3,215 edges with 753 genes by OCHIAI-II similarity, corresponding to an estimated precision of 50% based on the gold standard of all predicted and experimentally validated interactions from the STRING database (34, 36) (see Fig. S2). We found that the S. suis virulence gene-related network has a modular structure; to determine whether there was a functional association between genes within these modules, we analyzed the identified modules in S. suis SC19 strain for enrichment of Gene Ontology terms. Of 52 network modules, 21 were highly enriched in molecular function (Fig. 4). For example, 17 S. suis virulence-related genes were highly enriched in multiple molecular functions, which included protein-N (PI)/phosphohistidine-sugar phosphotransferase activity (B9H01_05885; B9H01_05880 and B9H01_05890), kinase activity (B9H01_05910; B9H01_05890 and B9H01_05900), lyase activity (B9H01_05850; B9H01_05870), and d-glucosamine phosphotransferase system (PTS) permease activity (B9H01_05885; B9H01_05880). The functions of the other genes of this module included gluconate 5-dehydrogenase, M13 family metallopeptidase, muramidase-released protein, preprotein translocase subunit YajC, RpiB/LacA/LacB family sugar-phosphate isomerase, bifunctional 4-hydroxy-2-oxoglutarate aldolase/2-dehydro-3-deoxy-phosphogluconate aldolase, LacI family DNA-binding transcriptional regulator, DUF5590 domain-containing protein, and a putative protein of unknown function. The data indicate that PTS systems are closely involved in virulence with other enriched modules, e.g., kinase, lyase, and d-glucosamine PTS permease activities. The inferred association networks also provide a list of functional predictions for uncharacterized S. suis genes, which can inform further study of complex regulatory networks.
FIG 4
FIG 4 VRDGs association network of S. suis SC19. The VRDGs association network of SC19 connects 1,508 interactions with 329 genes for S. suis, which was extracted from the whole network of Fig. S1. Gene Ontology term enrichment was performed for the network modules, and gray genes are those with no enrichment.

Experimental support for selected identified network associations.

To validate the network predicted by PPNet, we experimentally tested all of the two- and three-gene interaction modules identified in the virulence-related gene association networks constructed by OCHIAI-II similarity, as predicted for S. suis SC19, using bacterial two-hybrid analysis. We selected 17 pairs of predicted two-gene interactions and six groups of predicted three-gene interactions and tested each of them individually by bacterial two-hybrid analyses (see Fig. S3). In total, 35 pairs of interactions among 52 genes were tested (Fig. 5; see also Fig. S3).
FIG 5
FIG 5 Evaluation of the VRDG association network by bacterial two-hybrid analysis. Interactions between the genes of the two-gene and three-gene interaction modules of S. suis SC19, fused to pKT25 and pUT18, were assessed by MacConkey-maltose indicator plate assays and by β-galactosidase activity assays in E. coli BTH101. BTH101 with pKT25 and pUT18 was the negative control. The inset displays the results of the MacConkey-maltose indicator plate assay. The “Mx” in the figure represents module x in Fig. S2 in the supplemental material. Positive colonies are red, and negative colonies are colorless. The bottom broken line indicates the cutoff value (ODc = 238.1481) for determining positive β-galactosidase activity, defined as three times the negative-control value. Error bars indicate the standard deviations (n = 3 biological replicates).
Predicted interactions were considered confirmed if they showed red colonies on MacConkey-maltose indicator plates (37). A well-defined difference between positive and negative results was displayed for all of the predicted interactions. Of 35 pairs, 21 (60%) showed positive results (Fig. 5). Two novel interactions among 10 estimated interactions that were not predicted or identified in the STRING database were positive in bacterial two-hybrid analyses. Using β-galactosidase assays, we also quantified the extent of protein interactions (37). Consistent with the results of bacterial two-hybrid analyses, all positive interactions showed high β-galactosidase activity indicative of interaction (Fig. 5).
Overall, the results indicate that PPNet can be used to predict novel virulence gene association networks, is complementary to those predicted in the STRING database, and provides an important theoretical starting point for studying the pathogenic mechanisms or other biological pathways of pathogenic bacteria.

DISCUSSION

Here, we describe PPNet for the prediction of functional associations between nonhomologous genes, which can effectively capture gene networks that are functionally related to phenotype (e.g., pathogenicity, antibiotic resistance, or thermophily), including operons, protein complexes, and transcription factors and their target genes. Several features distinguish PPNet from previous approaches (11–13). First, it can utilize genome data from a single species as the input. Phylogenetic profiling across multiple microbial organisms identifies fewer shared genes, with lower levels of similarity, than profiling within one bacterial species (38). The more abundant, highly similar homologous genes found among multiple strains of one species should allow the construction of meaningful association networks. Second, the functional association networks identified by PPNet are closely linked to the phenotype, allowing a better understanding of the mechanisms that underlie phenotypic differences. Third, a total of 81 binary similarity and distance measures were packaged in PPNet for users to choose from, since the choice of an appropriate similarity or distance measure is necessary for dealing with multivariate data represented by binary feature vectors (39). In order to demonstrate the utility of our approach, we used PPNet to determine whether virulence-related gene association networks could be identified from the publicly available genomes of the zoonotic pathogen S. suis.
Many bacteria have fast mutation rates, which are common in nature or hosts. Here, we chose S. suis as an example because many genomes are available and the pathogenicity of different isolates can be variable. Moreover, as a nonmodel organism, more than half of the genes of its pangenome are uncharacterized and its pathogenic mechanisms are not fully understood. In a previous study, S. suis was divided into three groups (nonclinical, systemic, and respiratory) based on clinical data to investigate its genetic basis of disease (40). However, clinical information is missing for many genomes and, because S. suis is an opportunistic pathogen, an isolate cannot be classified as nonpathogenic or pathogenic simply because it was recovered from a clinically healthy or diseased animal. Previous studies show that different serotypes of S. suis have different pathogenic potential; strains isolated from diseased pigs mainly belong to certain serotypes (16–30). Hence, we determined here the serotypes of strains molecularly and then assigned them as virulent or nonvirulent based on the established relationship between serotype and virulence potential. DAPC showed a quite clear genetic difference in the accessory genomes between virulent and nonvirulent serotypes (Fig. 2b), suggesting our approach was valid.
Another program for determining genes associated with phenotypes is Kover, a k-mer-based software using machine learning algorithms that allows users to find k-mers (sequences of length k) that are associated with phenotype (41). Kover recognizes k-mer presence/absence rather than gene presence/absence and is convenient for testing other types of representations for genomic variants, such as single nucleotide polymorphisms (SNPs) and unitigs. However, Kover is less user friendly since users need to further annotate through sequence alignment to identify the cognate gene. Our approach and Scoary (42) both use the Fisher exact test to assess the significance of the presence/absence of genes associated with the different phenotypes, although these approaches do not detect SNPs in the orthologs of accessory genes because they are classified according to sequence similarity. In addition, Scoary considers the effect of population structure on gene distribution. However, Scoary is considered stringent, resulting in too few predictions (43), and is therefore not conducive to subsequent network inference based on phylogenetic profiles.
A total of 35 of 1,060 VRDGs identified in this study coincided with those of the 71 confirmed and putative S. suis virulence factors summarized previously (44). Of these, 24 VRDGs were involved in the virulence-related gene association networks on the basis of the genome of the SC19 strain of S. suis (Fig. 4). For example, treR has been shown in vitro to be related to virulence characteristics (45–47). It was also found that 32 genes were involved in interactions with treR in our association network (see Fig. S4). Three gene modules, including the ABC-type multidrug transporter gene ccmA (48) and the ABC-type amino acid transporter gene hisM (45) modules, were identified as positive by bacterial two-hybrid experiments in this study (see Fig. S4). In addition, the proposed virulence gene scrR was reported to be a repressor protein that is part of the sucrose operon (45). This gene was also found in our association networks, with three VRDGs being connected with it, suggesting that the VRDGs may be part of or allied to the sucrose operon (see Fig. S4). We also found another four genes associated with the srtF pilus gene (49–51) (see Fig. S4), suggesting that these genes may be related to flagellum synthesis.
PPNet also sought to determine whether any strains were replicates. The genomic data of S. suis strains downloaded from the NCBI FTP server potentially contain replicate strains because of the isolation methods used. For example, in one study, six colonies were selected per swab of the same pig in China, and three were selected in the United Kingdom. Isolates with alpha-hemolytic activity and positive biochemical results were stored as S. suis and then sent for whole-genome sequencing (52), leading to a possibility that multiple colonies isolated from the same pig are the same strain. Also, swabs from multiple pigs could be from the same pig farm. Although thousands of S. suis strains have had their whole genomes sequenced, there is region and serotype bias. For example, a total of 379 isolates from Vietnam were isolated and sequenced in 2015 and 2016 (see Table S1), suggesting the presence of multiple replicated genomes, which has the potential to confound statistical or probability analysis (53). Therefore, we reasoned that it was prudent to carry out a preliminary screening for potential replicated genomes in the NCBI database.
The rationale for our approach is that, under evolutionary pressure, functionally related genes, such as those encoding proteins that form a complex or that carry out reactions in the same biochemical pathway, tend to be present together or absent together in the genomes of different strains within a species. The outcome of network inference from binary data therefore differs from, and can be highly complementary to, expression-based network inference. Binary similarity and distance measures play an important role in the processing of binary data (32). Tremendous efforts have been made to find the most meaningful binary similarity and distance measures, which have been proposed in various fields, including biology, ethnology, taxonomy, image retrieval, geology, and chemistry (32, 54). Here, 81 binary similarity and distance measures were used, including 76 similarity and dissimilarity measures used over the last century that showed a meaningful performance in their respective fields (54) and five new binary similarity coefficients (33). To evaluate the different similarity and distance measures, we used STRING, a recognized gold standard that collects, scores, and integrates all publicly available sources of protein-protein interaction information (34). Based on ROC curve analysis (36, 55), PR curves (36, 56), the overall score (35, 57), and clustering measures (32, 58), we found that OCHIAI-II similarity (equation 60) was the best for determining potential association networks of S. suis (see Fig. 3a).
In conclusion, our study has developed PPNet, a powerful tool with 81 binary similarity and dissimilarity measures for network inference. We evaluated PPNet from several aspects by constructing a functional association network for S. suis, which exhibited excellent performance based on evaluation measures including AUROC, AUPR, and overall score; selected examples were validated by bacterial two-hybrid analysis.
One potential disadvantage of our binary similarity and distance measure approach is that the gene association networks identified do not include core genes. However, it should be noted that the accessory genome, believed to be important in phenotypic variation and genome evolution (59), is logically much bigger than the core genome, as in this study, especially as more genomes are added (Fig. 2b) (60), suggesting that this potential disadvantage is mitigated. Further work remains to be done to choose a more appropriate gene phylogenetic profile to build the functional association network and to determine the threshold for judging whether two genes are related.

MATERIALS AND METHODS

Genome selection.

To obtain more reliable genomic data, PPNet first calculates the N50 values for all genomes; those with an N50 of <10,000 are considered to be poorly sequenced or assembled and are excluded from subsequent analysis. In addition, PPNet identifies genomes derived from the same strain according to the average nucleotide identity (ANI) and average nucleotide coverage (ANC) for each pair of genomes. MUMmer is used to align the input sequences, and the ANI and ANC are calculated by PYANI (61, 62). PPNet requires a threshold for ANI and a threshold for ANC; genomes with ANI > threshold (ANI) and |1 – ANC| < threshold (ANC) are considered redundant genomes. Then, among the replicated genomes, the genome with the maximum N50 is chosen as the representative strain for further study. By default, PPNet will first test the numbers of nonredundant genomes identified by ANI at different thresholds and then select the inflection point as the threshold for ANI. The ANC threshold is then determined in the same way.
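The filtering rules in this section can be sketched as follows. This is a minimal illustration rather than PPNet's implementation: the function names are hypothetical, ANI and ANC are assumed to be expressed as fractions between 0 and 1, and the threshold values shown in the usage lines are placeholders, not PPNet defaults.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of the
    longest-first contigs reaches half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def passes_quality(contig_lengths, min_n50=10_000):
    """Genomes with an N50 below 10,000 are excluded."""
    return n50(contig_lengths) >= min_n50


def is_redundant_pair(ani, anc, ani_threshold, anc_threshold):
    """Redundancy rule: ANI > threshold(ANI) and |1 - ANC| < threshold(ANC)."""
    return ani > ani_threshold and abs(1 - anc) < anc_threshold


def pick_representative(genome_n50):
    """Among redundant genomes, keep the one with the largest N50."""
    return max(genome_n50, key=genome_n50.get)


# Usage with made-up values; both thresholds are placeholders.
example_n50 = n50([80, 70, 50, 40, 30, 20])
redundant = is_redundant_pair(0.99995, 0.998, ani_threshold=0.9999, anc_threshold=0.01)
representative = pick_representative({"SS001": 50_000, "SS002": 120_000})
```

The inflection-point selection of the two thresholds is not shown, since the text does not specify how the inflection point is located.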

Construction and filtering of the phylogenetic profile.

PPNet uses Prokka (63) to automate the annotation of all genomes and then extracts the GFF3 format annotation files from the output files as the input files for Roary (31). To construct the phylogenetic profile, we adopted the default setting of Roary, which splits paralogs from homologous groups into groups of true orthologs by using conserved gene neighborhood information.
In order to obtain a more valuable phylogenetic profile for network prediction, PPNet requires strain grouping information and creates a 2 × 2 contingency table for each gene, the levels being presence and absence for the trait and gene, respectively, with counts of the numbers of isolates in each cell. For each gene, we assume the null hypothesis that it is independent of the phenotype (here, virulence) and use the Fisher exact test to compute P values. Finally, P values are corrected by using the false discovery rate. Genes with an adjusted P value (Padj) of <0.05 are considered phenotype-related differential genes, and the phylogenetic profile of these differential genes is used for network inference.
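The per-gene test and correction can be sketched in pure Python. Two details are assumptions here because the text does not state them: the test is taken to be two-sided, and the false discovery rate correction is taken to be the Benjamini-Hochberg procedure. The function names and the example counts are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout: rows are the phenotype groups, columns are gene
    presence/absence, and each cell counts isolates.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the same margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum the tables that are no more likely than the observed one.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up gene: present in 300 of 323 virulent and 50 of 228 nonvirulent strains.
p_gene = fisher_exact_two_sided(300, 23, 50, 178)
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted value falls below 0.05 would then be retained as phenotype-related differential genes.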

Network inference.

To investigate the relationships between phenotype-related differential genes, PPNet constructs association networks from the binary presence/absence distribution of these genes. Specifically, four variables (a, b, c, and d) are defined as follows: a is the number of genes present in group 1 and group 2; b and c are the numbers of genes present in group 1 but not present in group 2 and vice versa, respectively; and d is the number of genes absent from both group 1 and group 2. Subsequently, 81 binary similarity and distance measures are used to construct phenotype-related differential gene networks. A detailed description of all binary similarity and distance coefficients is given in Table S3.
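As an illustration of how the four counts feed a similarity measure, the following Python sketch scores every pair of binary profiles. It assumes that the two vectors being compared are the presence/absence profiles of two genes across the same ordered set of genomes, which is the usual setup in phylogenetic profiling. The Jaccard and simple matching coefficients are used as familiar examples from the binary-similarity literature; whether and how they appear in Table S3 is not confirmed here, and all function names are hypothetical.

```python
from itertools import combinations


def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors x and y."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)          # present in both
    b = sum(1 for i, j in zip(x, y) if i and not j)      # present in x only
    c = sum(1 for i, j in zip(x, y) if not i and j)      # present in y only
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # absent from both
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard similarity: a / (a + b + c); ignores joint absences."""
    return a / (a + b + c) if (a + b + c) else 0.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient: (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


def pairwise_scores(profiles, measure=jaccard):
    """Score every gene pair and return edges sorted by decreasing score."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        counts = contingency_counts(profiles[g1], profiles[g2])
        edges.append((g1, g2, measure(*counts)))
    return sorted(edges, key=lambda edge: edge[2], reverse=True)
```

Any other coefficient that is a function of a, b, c, and d can be passed as `measure` without changing the rest of the sketch.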

Genome sequences of S. suis.

Draft or complete genome sequences of 1,288 S. suis strains were downloaded from the NCBI FTP server (April 2019; ftp://ftp.ncbi.nlm.nih.gov/genomes/). To facilitate subsequent analysis, the 1,288 S. suis strains were renamed SS001 to SS1288. (Details of the 1,288 S. suis strains can be found in Table S1.) To determine whether all of the genomic data collected belonged to S. suis, we used the Python module PYANI (62) to calculate the ANI among the 1,288 genomes (64). The ANI was obtained by using MUMmer (NUCmer) to align the input sequences (61). If the ANI between a strain and any serotype reference strain of S. suis (see Table S4) was >95%, the strain was designated a member of S. suis (64).
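The species-membership rule amounts to a simple filter, sketched below in Python. The input layout (a dictionary of ANI values to each reference strain, expressed as fractions rather than percentages) and the function name are assumptions for illustration only.

```python
def species_members(ani_to_references, threshold=0.95):
    """Return genomes whose ANI to at least one reference strain exceeds the threshold.

    ani_to_references: hypothetical dict mapping genome -> {reference: ANI as a fraction}.
    """
    return sorted(
        genome
        for genome, anis in ani_to_references.items()
        if any(value > threshold for value in anis.values())
    )
```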

Molecular serotyping of S. suis.

We used the genome sequences of all S. suis strains as queries in BLASTn searches against a nucleotide BLAST database of all serotype-specific genes in the CPS synthesis locus (17, 65–68) for molecular serotyping. The serotype-specific gene analysis identified the 33 classic serotypes, except for two pairs of serotypes, (i) serotypes 1 and 14 and (ii) serotypes 2 and 1/2, which have no genes with antigenic differences between them (17) (see Table S5).

Grouping of S. suis into virulent and nonvirulent groups.

To find the virulence-related genes of S. suis, we divided the S. suis strains into virulent and nonvirulent groups according to epidemiological surveys based on their serotypes (16–30). Specifically, strains of serotypes 1 to 5, serotypes 7 to 9, serotype 1/2, serotype 14, serotype 16, serotype 24, and serotype Chz were considered highly virulent, while the remaining strains of other serotypes, including serotypes 10 to 13, serotype 15, serotypes 17 to 19, serotype 21, serotype 23, serotype 25, and serotypes 27 to 31, were classified as nonvirulent (16–30). In addition, NCL and nontypeable strains were also classified as nonvirulent since they are mainly isolated from healthy carrier pigs (16).
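The grouping rule can be written as a small lookup, sketched here in Python. The serotype labels are assumed to be strings such as "2", "1/2", or "Chz"; the exact labels produced by the serotyping step, and the labels used for NCL and nontypeable strains, are hypothetical.

```python
# Serotypes treated as highly virulent in the text: 1 to 5, 7 to 9, 1/2, 14, 16, 24, and Chz.
VIRULENT_SEROTYPES = {
    "1", "2", "3", "4", "5", "7", "8", "9", "1/2", "14", "16", "24", "Chz",
}


def virulence_group(serotype):
    """Assign a strain to the virulent or nonvirulent group from its serotype label.

    Every label outside VIRULENT_SEROTYPES, including NCL and nontypeable
    labels, falls into the nonvirulent group, as described in the text.
    """
    return "virulent" if serotype in VIRULENT_SEROTYPES else "nonvirulent"
```

Because both members of each indistinguishable pair (serotypes 1 and 14; serotypes 2 and 1/2) fall in the virulent group, the ambiguity in molecular serotyping does not affect the group assignment in this sketch.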

DAPC.

Discriminant analysis of principal components (DAPC) was implemented in the R package adegenet (69, 70) to determine whether genetically related individuals were closely grouped. In this study, we used the presence or absence of accessory genes in 551 S. suis genotyped isolates to determine the differences between the high- and low-virulence groups as classified by virulent/nonvirulent molecular serotypes, as described above. After identifying the optimal number of principal components (PCs) by cross-validation, we retained 100 PCs based on the preliminary data, which accounted for approximately 82.73% of the total genetic variability, and all discriminant functions were retained (40).
The phylogenetic tree was built with FastTree from the S. suis accessory genome to group isolates based on the presence or absence of genes in their accessory genomes (71). This phylogenetic tree was visualized with iTOL and rerooted using the midpoint rooting method (72).

Network performance.

In order to evaluate the performance of the binary similarity and distance measures, we used STRING (https://string-db.org) as the gold standard. A total of 81 similarity coefficients were assessed by using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and an overall score that summarizes the performance across the 81 networks (35, 36). The overall score is defined as the mean of the log-transformed network-specific P values, as used in the previous DREAM challenge (35, 57).
$$\text{overall score} = -\frac{\log_{10} P_{\mathrm{ROC}} + \log_{10} P_{\mathrm{PR}}}{2}$$
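A short worked example may help fix the scale of the overall score. The Python sketch below reads the score as the negated mean of the two log10-transformed P values, following the DREAM convention (35, 57); the sign is an assumption of this sketch, and the function name and the P values in the usage note are invented for illustration.

```python
from math import log10


def overall_score(p_auroc, p_aupr):
    """Negated mean of the log10-transformed AUROC and AUPR P values."""
    return -(log10(p_auroc) + log10(p_aupr)) / 2
```

For instance, hypothetical P values of 1e-4 for the AUROC and 1e-2 for the AUPR give an overall score of 3, so smaller P values yield larger scores.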
The negative-control group was generated randomly by setting the expected probability of interaction to 0.5 using the Python random package (36). The minimum distance from the ROC curve to the theoretical optimum point and the AUROC were used to evaluate the performance of the groups of equations (32), and hierarchical clustering was performed by using the hclust function in the R stats package.
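The ROC-based quantities can be sketched in plain Python as follows. The sketch takes the theoretical optimum point to be (false-positive rate 0, true-positive rate 1) and computes the AUROC by the trapezoidal rule; both are standard conventions rather than details confirmed by the text. The function names, and the seeded generator used for the random control, are hypothetical.

```python
import random
from math import hypot


def roc_points(scores, labels):
    """ROC points (FPR, TPR) obtained by lowering the score cutoff step by step."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores are admitted together so the curve does not depend on their order.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def min_distance_to_optimum(points):
    """Smallest Euclidean distance from any ROC point to (FPR = 0, TPR = 1)."""
    return min(hypot(fpr, tpr - 1.0) for fpr, tpr in points)


def random_control(n_pairs, p=0.5, seed=0):
    """Random 0/1 interaction calls with expected probability p (seeded for repeatability)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n_pairs)]
```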
To visualize the VRDG association networks of S. suis based on the binary similarity coefficient with the best overall score, we constructed high-confidence networks at an estimated precision of 50%. The network modules were annotated with Gene Ontology terms by eggNOG-mapper (73, 74) and enriched by ClusterProfiler (75).
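One way to choose a score cutoff for a target precision is sketched below in Python. It assumes that precision is estimated against gold-standard labels (1 for a pair supported by the gold standard, 0 otherwise); how the precision was estimated in practice is not described here, so both the approach and the function name are illustrative assumptions.

```python
def cutoff_at_precision(scores, labels, target=0.5):
    """Lowest score cutoff at which edges scoring at or above it have precision >= target.

    Returns None if no cutoff reaches the target precision.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best = None
    tp = 0
    i = 0
    n = len(ranked)
    while i < n:
        j = i
        # Admit all edges tied at this score before measuring precision.
        while j < n and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        if tp / j >= target:
            best = ranked[i][0]
        i = j
    return best
```

Edges with a score at or above the returned cutoff would then form the high-confidence network.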

Experimental materials and design.

A total of 14 pairs of predicted two-gene interactions, eight groups of predicted three-gene interactions, and a pair of positive controls were tested (see Fig. S3). Among these interactions, six pairs of novel interactions between two-gene modules and ten pairs of novel interactions among three-gene modules, which were not predicted in the STRING database, were selected for experimental validation.
Strains and plasmids used in this study are listed in Table S6 in the supplemental material. S. suis strains were grown in tryptic soy broth supplemented with 10% bovine serum at 37°C under vigorous agitation. Escherichia coli BTH101 strain was grown aerobically in lysogeny broth at 37°C. Bacterial two-hybrid analyses, including a MacConkey-maltose indicator plate assay and β-galactosidase assays, were performed as described in the manual of the bacterial adenylate cyclase two-hybrid system kit (Euromedex).
Each interaction pair was scored on MacConkey-maltose indicator plate assays on a minimum of three individual occasions, and β-galactosidase assays were performed at least three times.

Data availability.

Reannotated genome data in GFF format of 551 nonredundant S. suis strains for construction of the pangenome have been deposited at Cyverse (https://de.cyverse.org/dl/d/8354EE88-08A6-46D5-9241-FC5BB91E349C/gff_file.zip).

ACKNOWLEDGMENTS

We thank Qi Huang (College of Animal Medicine, Huazhong Agricultural University, Wuhan, China) for providing strains and plasmids used in the bacterial two-hybrid analyses.
This study was supported by grants from the National Key Research and Development Program of China (2021YFD1800400), the Natural Science Foundation of Hubei Province (2021CFA016), the Hubei Province Natural Science Foundation for Distinguished Young Scholars (2020CFA060), the Applied Basic Research Project of Wuhan (grant 2020020601012254), and the UK Biotechnology and Biological Sciences Research Council (BB/S019901/1).

Supplemental Material

File (spectrum.03871-22-s0001.pdf)
File (spectrum.03871-22-s0002.xlsx)

REFERENCES

1.
Guala D, Ogris C, Müller N, Sonnhammer ELL. 2020. Genome-wide functional association networks: background, data, and state-of-the-art resources. Brief Bioinform 21:1224–1237.
2.
De Smet R, Marchal K. 2010. Advantages and limitations of current network inference methods. Nat Rev Microbiol 8:717–729.
3.
Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. 1999. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 96:4285–4288.
4.
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, et al. 2013. A large-scale evaluation of computational protein function prediction. Nat Methods 10:221–227.
5.
Weirauch MT. 2011. Gene coexpression networks for the analysis of DNA microarray data, p 215–250. In Applied statistics for network biology. Wiley-VCH Verlag GmbH, Weinheim, Germany.
6.
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. 2018. Gene coexpression analysis for functional classification and gene-disease predictions. Brief Bioinform 19:575–592.
7.
Ibraim IC, Parise MTD, Parise D, Sfeir MZT, De Paula Castro TL, Wattam AR, Ghosh P, Barh D, Souza EM, Góes-Neto A, Gomide ACP, Azevedo V. 2019. Transcriptome profile of Corynebacterium pseudotuberculosis in response to iron limitation. BMC Genomics 20:1–24.
8.
Vergara-Irigaray M, Fookes MC, Thomson NR, Tang CM. 2014. RNA-seq analysis of the influence of anaerobiosis and FNR on Shigella flexneri. BMC Genomics 15:438–422.
9.
Rastrojo A, Corvo L, Lombraña R, Solana JC, Aguado B, Requena JM. 2019. Analysis by RNA-seq of transcriptomic changes elicited by heat shock in Leishmania major. Sci Rep 9:1–18.
10.
Wangsanuwat C, Heom KA, Liu E, O’Malley MA, Dey SS. 2020. Efficient and cost-effective bacterial mRNA sequencing from low input samples through ribosomal RNA depletion. BMC Genomics 21:717.
11.
Cheng Y, Perocchi F. 2015. ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling. Nucleic Acids Res 43:W160–W168.
12.
Franceschini A, Lin J, von Mering C, Jensen LJ. 2016. SVD-phy: improved prediction of protein functional associations through singular value decomposition of phylogenetic profiles. Bioinformatics 32:1085–1087.
13.
Date SV, Marcotte EM. 2003. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol 21:1055–1062.
14.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498–2504.
15.
Gottschalk M, Segura M. 2000. The pathogenesis of the meningitis caused by Streptococcus suis: the unresolved questions. Vet Microbiol 76:259–272.
16.
Segura M, Fittipaldi N, Calzas C, Gottschalk M. 2017. Critical Streptococcus suis virulence factors: are they all really critical? Trends Microbiol 25:585–599.
17.
Liu Z, Zheng H, Gottschalk M, Bai X, Lan R, Ji S, Liu H, Xu J. 2013. Development of multiplex PCR assays for the identification of the 33 serotypes of Streptococcus suis. PLoS One 8:e72070.
18.
Kataoka Y, Sugimoto C, Nakazawa M, Morozumi T, Kashiwazaki M. 1993. The epidemiological studies of Streptococcus suis infections in Japan from 1987 to 1991. J Vet Med Sci 55:623–626.
19.
Wei Z, Li R, Zhang A, He H, Hua Y, Xia J, Cai X, Chen H, Jin M. 2009. Characterization of Streptococcus suis isolates from the diseased pigs in China between 2003 and 2007. Vet Microbiol 137:196–201.
20.
Kim D, Han K, Oh Y, Kim CH, Kang I, Lee J, Gottschalk M, Chae C. 2010. Distribution of capsular serotypes and virulence markers of Streptococcus suis isolated from pigs with polyserositis in Korea. Can J Vet Res 74:314–316.
21.
Aarestrup FM, Jorsal SE, Jensen NE. 1998. Serological characterization and antimicrobial susceptibility of Streptococcus suis isolates from diagnostic samples in Denmark during 1995 and 1996. Vet Microbiol 60:59–66.
22.
Gottschalk M, Lacouture S, Bonifait L, Roy D, Fittipaldi N, Grenier D. 2013. Characterization of Streptococcus suis isolates recovered between 2008 and 2011 from diseased pigs in Québec, Canada. Vet Microbiol 162:819–825.
23.
Wisselink HJ, Smith HE, Stockhofe-Zurwieden N, Peperkamp K, Vecht U. 2000. Distribution of capsular types and production of muramidase-released protein (MRP) and extracellular factor (EF) of Streptococcus suis strains isolated from diseased pigs in seven European countries. Vet Microbiol 74:237–248.
24.
Messier S, Lacouture S, Gottschalk M. 2008. Distribution of Streptococcus suis capsular types from 2001 to 2007. Can Vet J 49:461–462.
25.
Gottschalk M, Segura M, Xu J. 2007. Streptococcus suis infections in humans: the Chinese experience and the situation in North America. Anim Health Res Rev 8:29–45.
26.
Kerdsin A, Dejsirilert S, Sawanpanyalert P, Boonnark A, Noithachang W, Sriyakum D, Simkum S, Chokngam S, Gottschalk M, Akeda Y, Oishi K. 2011. Sepsis and spontaneous bacterial peritonitis in Thailand. Lancet 378:960.
27.
Gottschalk M, Xu J, Calzas C, Segura M. 2010. Streptococcus suis: a new emerging or an old neglected zoonotic pathogen? Future Microbiol 5:371–391.
28.
Kerdsin A, Oishi K, Sripakdee S, Boonkerd N, Polwichai P, Nakamura S, Uchida R, Sawanpanyalert P, Dejsirilert S. 2009. Clonal dissemination of human isolates of Streptococcus suis serotype 14 in Thailand. J Med Microbiol 58:1508–1513.
29.
Nghia HDT, Ngo TH, Le DL, Campbell J, To SD, Chau NVV, Mai NTH, Tran TH, Spratt B, Farrar J, Schultsz C. 2008. Human case of Streptococcus suis serotype 16 infection. Emerg Infect Dis 14:155–157.
30.
Vilaichone R-K, Vilaichone W, Nunthapisud P, Wilde H. 2002. Streptococcus suis infection in Thailand. J Med Assoc Thai 85(Suppl 1):S109–S117.
31.
Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MTG, Fookes M, Falush D, Keane JA, Parkhill J. 2015. Roary: rapid large-scale prokaryote pangenome analysis. Bioinformatics 31:3691–3693.
32.
Wijaya SH, Afendi FM, Batubara I, Darusman LK, Altaf-Ul-Amin M, Kanaya S. 2016. Finding an appropriate equation to measure similarity between binary vectors: case studies on Indonesian and Japanese herbal medicines. BMC Bioinformatics 17:1–19.
33.
Consonni V, Todeschini R. 2012. New similarity coefficients for binary data. Match 68:581–592.
34.
Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P, Jensen LJ, Von Mering C. 2019. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47:D607–D613.
35.
Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, Clarke ND, Altan-Bonnet G, Stolovitzky G. 2010. Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One 5:e9202.
36.
Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, Allison KR, Kellis M, Collins JJ, Aderhold A, Stolovitzky G, Bonneau R, Chen Y, Cordero F, Crane M, Dondelinger F, Drton M, Esposito R, Foygel R, De La Fuente A, Gertheiss J, Geurts P, Greenfield A, et al. 2012. Wisdom of crowds for robust gene network inference. Nat Methods 9:796–804.
37.
Cao Z, Casabona MG, Kneuper H, Chalmers JD, Palmer T. 2017. The type VII secretion system of Staphylococcus aureus secretes a nuclease toxin that targets competitor bacteria. Nat Microbiol 2:1–11.
38.
Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D. 2004. Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol 5:R35.
39.
Kosman E, Leonard KJ. 2005. Similarity coefficients for molecular markers in studies of genetic relationships between individuals for haploid, diploid, and polyploid species. Mol Ecol 14:415–424.
40.
Weinert LA, Chaudhuri RR, Wang J, Peters SE, Corander J, Jombart T, Baig A, Howell KJ, Vehkala M, Välimäki N, Harris D, Chieu TTB, Van Vinh Chau N, Campbell J, Schultsz C, Parkhill J, Bentley SD, Langford PR, Rycroft AN, Wren BW, Farrar J, Baker S, Hoa NT, Holden MTG, Tucker AW, Maskell DJ, BRaDP1T Consortium. 2015. Genomic signatures of human and animal disease in the zoonotic pathogen Streptococcus suis. Nat Commun 6:6740.
41.
Drouin A, Letarte G, Raymond F, Marchand M, Corbeil J, Laviolette F. 2019. Interpretable genotype-to-phenotype classifiers with performance guarantees. Sci Rep 9:1–13.
42.
Brynildsrud O, Bohlin J, Scheffer L, Eldholm V. 2016. Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol 17:1–9.
43.
Levy A, Salas Gonzalez I, Mittelviefhaus M, Clingenpeel S, Herrera Paredes S, Miao J, Wang K, Devescovi G, Stillman K, Monteiro F, Rangel Alvarez B, Lundberg DS, Lu TY, Lebeis S, Jin Z, McDonald M, Klein AP, Feltcher ME, Rio TG, Grant SR, Doty SL, Ley RE, Zhao B, Venturi V, Pelletier DA, Vorholt JA, Tringe SG, Woyke T, Dangl JL. 2017. Genomic features of bacterial adaptation to plants. Nat Genet 50:138–150.
44.
Fittipaldi N, Segura M, Grenier D, Gottschalk M. 2012. Virulence factors involved in the pathogenesis of the infection caused by the swine pathogen and zoonotic agent Streptococcus suis. Future Microbiol 7:259–279.
45.
Wilson TL, Jeffers J, Rapp-Gabrielson VJ, Martin S, Klein LK, Lowery DE, Fuller TE. 2007. A novel signature-tagged mutagenesis system for Streptococcus suis serotype 2. Vet Microbiol 122:135–145.
46.
Wu T, Chang H, Tan C, Bei W, Chen H. 2009. The orphan response regulator RevSC21 controls the attachment of Streptococcus suis serotype-2 to human laryngeal epithelial cells and the expression of virulence genes. FEMS Microbiol Lett 292:170–181.
47.
de Greeff A, Buys H, van Alphen L, Smith HE. 2002. Response regulator important in pathogenesis of Streptococcus suis serotype 2. Microb Pathog 33:185–192.
48.
Vanier G, Fittipaldi N, Slater JD, Domínguez-Punaro MDLC, Rycroft AN, Segura M, Maskell DJ, Gottschalk M. 2009. New putative virulence factors of Streptococcus suis involved in invasion of porcine brain microvascular endothelial cells. Microb Pathog 46:13–20.
49.
Fittipaldi N, Gottschalk M, Vanier G, Daigle F, Harel J. 2007. Use of selective capture of transcribed sequences to identify genes preferentially expressed by Streptococcus suis upon interaction with porcine brain microvascular endothelial cells. Appl Environ Microbiol 73:4359–4364.
50.
Takamatsu D, Nishino H, Ishiji T, Ishii J, Osaki M, Fittipaldi N, Gottschalk M, Tharavichitkul P, Takai S, Sekizaki T. 2009. Genetic organization and preferential distribution of putative pilus gene clusters in Streptococcus suis. Vet Microbiol 138:132–139.
51.
Fittipaldi N, Takamatsu D, Domínguez-Punaro MDLC, Lecours MP, Montpetit D, Osaki M, Sekizaki T, Gottschalk M. 2010. Mutations in the gene encoding the ancillary pilin subunit of the Streptococcus suis srtF cluster result in pili formed by the major subunit only. PLoS One 5:e8426.
52.
Zou G, Zhou J, Xiao R, Zhang L, Cheng Y, Jin H, Li L, Zhang L, Wu B, Qian P, Li S, Ren L, Wang J, Oshota O, Hernandez-Garcia J, Wileman TM, Bentley S, Weinert L, Maskell DJ, Tucker AW, Zhou R. 2018. Effects of environmental and management-associated factors on prevalence and diversity of Streptococcus suis in clinically healthy pig herds in China and the United Kingdom. Appl Environ Microbiol 84:e02590-17.
53.
Chen Q, Zobel J, Verspoor K. 2017. Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study. Database 2017:baw163.
54.
Choi S, Choi S, Cha S. 2010. A survey of binary similarity and distance measures. J Syst Cybern Informatics 2010:43–48.
55.
Metz CE. 1978. Basic principles of ROC analysis. Semin Nucl Med 8:283–298.
56.
Manning CD, Schütze H, Weikurn G. 1999. Foundations of statistical natural language processing. MIT Press, Cambridge, MA.
57.
Marbach D, Prill RJ, Schaffter T, Mattiussi C, Floreano D, Stolovitzky G. 2010. Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA 107:6286–6291.
58.
Mejia IR, Batyrshin I. 2018. Towards a classification of binary similarity measures, p 325–335. In Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Springer Verlag, New York, NY.
59.
Yao W, Li G, Zhao H, Wang G, Lian X, Xie W. 2015. Exploring the rice dispensable genome using a metagenome-like assembly strategy. Genome Biol 16:187.
60.
Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O’Connor KJB, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome.” Proc Natl Acad Sci USA 102:13950–13955.
61.
Kurtz S, Shumway M, Antonescu C, Salzberg SL, Phillippy A, Smoot M, Delcher AL. 2004. Versatile and open software for comparing large genomes. Genome Biol 5:R12.
62.
Pritchard L, Glover RH, Humphris S, Elphinstone JG, Toth IK. 2016. Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal Methods.
63.
Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069.
64.
Richter M, Rosselló-Móra R. 2009. Shifting the genomic gold standard for the prokaryotic species definition. Proc Natl Acad Sci USA 106:19126–19131.
65.
Pan Z, Ma J, Dong W, Song W, Wang K, Lu C, Yao H. 2015. Novel variant serotype of Streptococcus suis isolated from piglets with meningitis. Appl Environ Microbiol 81:976–985.
66.
Qiu X, Bai X, Lan R, Zheng H, Xu J. 2016. Novel capsular polysaccharide loci and new diagnostic tools for high-throughput capsular gene typing in Streptococcus suis. Appl Environ Microbiol 82:7102–7112.
67.
Zheng H, Ji S, Liu Z, Lan R, Huang Y, Bai X, Gottschalk M, Xu J. 2015. Eight novel capsular polysaccharide synthesis gene loci identified in nontypeable Streptococcus suis isolates. Appl Environ Microbiol 81:4111–4119.
68.
Zheng H, Qiu X, Roy D, Segura M, Du P, Xu J, Gottschalk M. 2017. Genotyping and investigating capsular polysaccharide synthesis gene loci of non-serotypeable Streptococcus suis isolated from diseased pigs in Canada. Vet Res 48:10–10.
69.
Jombart T. 2008. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403–1405.
70.
Jombart T, Ahmed I. 2011. adegenet 1.3–1: new tools for the analysis of genome-wide SNP data. Bioinformatics 27:3070–3071.
71.
Price MN, Dehal PS, Arkin AP. 2010. FastTree 2: approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490.
72.
Letunic I, Bork P. 2019. Interactive Tree of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 47:W256–W259.
73.
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, Bork P. 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol 34:2115–2122.
74.
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, Von Mering C, Bork P. 2019. EggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47:D309–D314.
75.
Yu G, Wang LG, Han Y, He QY. 2012. ClusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16:284–287.

Information & Contributors

Information

Published In

Microbiology Spectrum
Volume 11, Number 1, 14 February 2023
eLocator: e03871-22
Editor: Sébastien P. Faucher, McGill University
PubMed: 36602356

History

Received: 21 September 2022
Accepted: 1 December 2022
Published online: 5 January 2023

Keywords

  1. functional association network inference
  2. phylogenetic profiling
  3. prokaryotic genome
  4. Streptococcus suis
  5. dereplication

Contributors

Authors

Yangjie Li
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
College of Informatics, Huazhong Agricultural University, Wuhan, China
Bin Ma
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
Kexin Hua
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
Huimin Gong
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
Rongrong He
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
Dingren Bi
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
Section of Paediatric Infectious Disease, Imperial College London, St Mary’s Campus, London, United Kingdom
State Key Laboratory of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
College of Animal Medicine, Huazhong Agricultural University, Wuhan, China
Hubei Provincial Key Laboratory of Preventive Veterinary Medicine, Huazhong Agricultural University, Wuhan, China

Editor

Sébastien P. Faucher
Editor
McGill University

Reviewer

ad hoc peer reviewer
University of Maryland, Baltimore

Notes

The authors declare no conflict of interest.
