Open access
Environmental Microbiology
Research Article
19 October 2022

Comparative Genomic Insights into the Evolution of Halobacteria-Associated “Candidatus Nanohaloarchaeota”


Members of the phylum “Candidatus Nanohaloarchaeota,” a representative lineage within the DPANN superphylum, are characterized by their nanosized cells and symbiotic lifestyle with Halobacteria. However, the development of the symbiosis remains unclear. Here, we propose two novel families, “Candidatus Nanoanaerosalinaceae” and “Candidatus Nanohalalkaliarchaeaceae” in “Ca. Nanohaloarchaeota,” represented by five dereplicated metagenome-assembled genomes obtained from hypersaline sediments or related enrichment cultures of soda-saline lakes. Phylogenetic analyses reveal that the two novel families are placed at the root of the family “Candidatus Nanosalinaceae,” including the cultivated taxa. The two novel families prefer hypersaline sediments, and the acid shift of predicted proteomes indicates a “salt-in” strategy for hypersaline adaptation. They contain a lower proportion of putative horizontal gene transfers from Halobacteria than “Ca. Nanosalinaceae,” suggesting a weaker association with Halobacteria. Functional prediction and historical events reconstruction disclose that they exhibit divergent potentials in carbohydrate and organic acid metabolism and environmental responses. Globally, comparative genomic analyses based on the new families enrich the taxonomic and functional diversity of “Ca. Nanohaloarchaeota” and provide insights into the evolutionary process of “Ca. Nanohaloarchaeota” and their symbiotic relationship with Halobacteria.
IMPORTANCE The DPANN superphylum is a group of archaea widely distributed in various habitats. They generally have small cells and have a symbiotic lifestyle with other archaea. The archaeal symbiotic interaction is vital to understanding microbial communities. However, the formation and evolution of the symbiosis between the DPANN lineages and other diverse archaea remain unclear. Based on phylogeny, habitat distribution, hypersaline adaptation, host prediction, functional potentials, and historical events of “Ca. Nanohaloarchaeota,” a representative phylum within the DPANN superphylum, we report two novel families representing intermediate stages, and we infer the evolutionary process of “Ca. Nanohaloarchaeota” and their Halobacteria-associated symbiosis. Altogether, this research helps in understanding the evolution of symbiosis in “Ca. Nanohaloarchaeota” and provides a model for the evolution of other DPANN lineages.


The DPANN superphylum (an acronym of five candidate phylum names, “Candidatus Diapherotrites,” “Candidatus Parvarchaeota,” “Candidatus Aenigmarchaeota,” “Candidatus Nanoarchaeota,” and “Candidatus Nanohaloarchaeota”) is a group of archaea with nanosized cells and small genomes (1–3). Despite different classifications in the Genome Taxonomy Database (GTDB) and NCBI taxonomy database (4), more and more lineages are classified in the DPANN superphylum, including “Candidatus Micrarchaeota” (5), “Candidatus Woesearchaeota” (6), “Candidatus Pacearchaeota” (6), “Candidatus Huberarchaeota” (7), “Candidatus Mamarchaeota” (8), and “Candidatus Undinarchaeota” (9). The DPANN group formed at the early stage of archaeal evolution (9–13). A general symbiotic lifestyle is proposed from the reduced metabolic potentials of most members (3, 5, 6, 14). The symbiosis was demonstrated based on the cocultures of DPANN lineages and their hosts (2, 15–19). Many lineages, like “Ca. Nanoarchaeota,” “Ca. Huberarchaeota,” “Ca. Nanohaloarchaeota,” and “Ca. Aenigmarchaeota,” were predicted to exchange genes with their respective hosts via horizontal gene transfer (HGT) (9, 19–22). However, it remains unknown how the DPANN lineages form symbioses with diverse taxa.
“Ca. Nanohaloarchaeota” is one of the first five phyla in the DPANN group (1). This phylum is widely distributed in (hyper)saline habitats (3, 23–27) by harnessing the energetically favorable “salt-in” strategy (23, 25) like their host Halobacteria (28). “Ca. Nanohaloarchaeota” cells were revealed to have a tight symbiotic relationship with the class Halobacteria, as demonstrated by their cocultures (17, 18). In some “Ca. Nanohaloarchaeota” genomes, long “SPEARE” proteins containing serine protease, adhesion, and restriction endonuclease domains were supposed to function in attachment and invasion of hosts (17). Remarkably, the nanohaloarchaeon “Candidatus Nanohalobium constans” LC1Nh exhibits mutualistic symbiosis with the host under conditions with glycogen or starch as a carbon source (18). “Ca. Nanohaloarchaeota” shares 21% of sisterhood relationships with Halobacteria (9), and similar HGT events have also been reported (22). However, because of the protein adaptation to high salinity in the cytoplasm, some of the close relationships may be the result of compositional biases from convergent evolution (9, 23, 25). Based on the cocultivation and genomic prediction, “Ca. Nanohaloarchaeota” were considered aerotolerant anaerobes with a lifestyle of sugar fermentation, while the hosts generally perform aerobic respiration (3, 18, 23, 24). In addition, all these reports on “Ca. Nanohaloarchaeota” were focused on one family, i.e., “Candidatus Nanosalinaceae” (see Results and Discussion).
In this research, we report five dereplicated metagenome-assembled genomes (MAGs) of two novel families named “Candidatus Nanoanaerosalinaceae” and “Candidatus Nanohalalkaliarchaeaceae” in “Ca. Nanohaloarchaeota.” They were obtained from hypersaline sediments or enrichment cultures from soda-saline lakes. Furthermore, we performed comparative genomic analyses on habitat distribution, amino acid composition, and functional gene prediction. These results provide insights into the evolution of “Ca. Nanohaloarchaeota” symbiosis with Halobacteria.


Acquisition of the genomes of two novel families within the phylum “Ca. Nanohaloarchaeota.”

We performed metagenomic analyses on enrichment cultures of five deep sediments of a soda-saline lake in Inner Mongolia, China (described in Materials and Methods), and reanalyzed our previously reported metagenomes of brine and sediment samples from the same natural environments (24, 29). In this research, we obtained 10 MAGs affiliated with “Ca. Nanohaloarchaeota” (see Table S1 in the supplemental material). In the description below, we mainly follow the GTDB taxonomy unless otherwise specified. The taxonomic annotation based on the GTDB database (release 202) reveals that all 10 genomes as well as 16 “Ca. Nanohaloarchaeota” genomes deposited in public databases belong to the order “Candidatus Nanosalinales” (Table S1), of which 19 (five are obtained in this study) are affiliated with the family “Ca. Nanosalinaceae,” and the other 7 are not classified (including the two novel families described below).
In the phylogenetic tree based on the 122 single-copy conserved proteins in the GTDB, the phylum “Ca. Nanohaloarchaeota” members are closely related to EX4484-52, “Ca. Aenigmatarchaeota,” PWEA01, and QMZS01 (see Fig. S1 in the supplemental material), and the five phyla (“Ca. Nanohaloarchaeota,” “Ca. Aenigmatarchaeota,” and three unidentified phyla [NATU]) are located in the DPANN group (see Fig. S6). The phylum “Ca. Nanohaloarchaeota” is classified into four family-level lineages with bootstrap supports of more than 75%, including “Ca. Nanosalinaceae,” “Ca. Nanoanaerosalinaceae,” “Ca. Nanohalalkaliarchaeaceae,” and AB_1215_Bin_137 (Fig. 1a). The five representative MAGs of novel families are shown in Table 1. In fact, AB_1215_Bin_137 is a MAG obtained from Guaymas Basin (Gulf of California) sediment samples (30). It is classified as a member of the order “Ca. Nanosalinales” by GTDB-Tk (Table S1), but it has a relatively long distance from the other three families (Fig. 1a), and its 16S rRNA gene shares similarities of less than 82.0% (order threshold) (31) with them (Fig. S2a). Therefore, it may represent a novel order. The families “Ca. Nanoanaerosalinaceae” and “Ca. Nanohalalkaliarchaeaceae” are placed at the root of “Ca. Nanosalinaceae” (Fig. 1a). The average amino acid identity (AAI) values of the predicted proteome between the genomes of the families “Ca. Nanoanaerosalinaceae” and “Ca. Nanosalinaceae” are generally less than 45.0%, the family boundary according to the previous literature (32); those among genomes of the same family are generally greater than 45.0% (Fig. 1b). The 16S rRNA gene identity analysis backs the classification of “Ca. Nanoanaerosalinaceae” (Fig. S2a) according to the taxonomic thresholds of sequence identities for order and family (82.0% and 86.5%, respectively) (31). Similarly, the MAG NHA21 of the family “Ca. Nanohalalkaliarchaeaceae” has AAI with “Ca. Nanosalinaceae” below 45% but approximately 45% AAI with “Ca. Nanoanaerosalinaceae” (Fig. 1b). Although no 16S rRNA gene is annotated in NHA21 for further evidence, considering the significantly different donor profile of HGT and average isoelectric point (ApI) from “Ca. Nanoanaerosalinaceae” (described in “Symbiotic host prediction of the two novel families” and “Acidic proteomes of the two novel families for hypersaline adaptation”), we propose the novel family “Ca. Nanohalalkaliarchaeaceae” represented by NHA21. Phylogenetic analyses based on ribosome proteins and the 16S rRNA gene are in good agreement with the taxonomy of “Ca. Nanohaloarchaeota” (Fig. S7 to S10). In conclusion, we propose two new families which form a branch in the phylogenetic trees and are located at the root of the “Ca. Nanosalinaceae.” Of all 26 “Ca. Nanohaloarchaeota” genomes (10 obtained in this study), six shared more than 95% AAI and average nucleotide identity (ANI) for species boundaries (32), and they were removed in further research for lower completeness or higher contamination (Fig. 1; Fig. S2b). We observed that all 26 “Ca. Nanohaloarchaeota” genomes had low completeness values of 70.40 to 87.31% (Table S1), but a representative member with a higher completeness for each species could be reasonably selected.
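The dereplication described above (keeping one representative per group of genomes that exceed the 95% species boundary) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the greedy strategy, the ranking rule (higher completeness first, then lower contamination), the function name `dereplicate`, and the toy genomes `A`, `B`, and `C` with their values are all hypothetical. Only the 95% boundary comes from the text.

```python
from typing import Dict, FrozenSet, List, Tuple

SPECIES_THRESHOLD = 95.0  # percent identity; species boundary cited in the text


def dereplicate(
    quality: Dict[str, Tuple[float, float]],
    identity: Dict[FrozenSet[str], float],
    threshold: float = SPECIES_THRESHOLD,
) -> List[str]:
    """Greedy dereplication (assumed strategy).

    quality  maps genome name -> (completeness %, contamination %).
    identity maps frozenset({a, b}) -> pairwise identity in percent.
    """
    # Rank best first: higher completeness, then lower contamination (assumption).
    ranked = sorted(quality, key=lambda g: (-quality[g][0], quality[g][1], g))
    kept: List[str] = []
    for genome in ranked:
        # A genome is redundant if it exceeds the threshold with a kept representative.
        redundant = any(
            identity.get(frozenset((genome, rep)), 0.0) > threshold for rep in kept
        )
        if not redundant:
            kept.append(genome)
    return kept


# Hypothetical toy values, not taken from Table S1.
toy_quality = {"A": (85.0, 0.0), "B": (78.0, 1.0), "C": (80.0, 0.5)}
toy_identity = {
    frozenset(("A", "B")): 98.2,
    frozenset(("A", "C")): 52.0,
    frozenset(("B", "C")): 51.5,
}
representatives = dereplicate(toy_quality, toy_identity)
```

In this toy case `B` is dropped because it shares more than 95% identity with the higher-quality genome `A`, while `C` is kept as a separate representative.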
FIG 1 Phylogeny of the phylum “Ca. Nanohaloarchaeota.” (a) Phylogenomic tree based on the 122 single-copy ubiquitous proteins in GTDB. It was obtained by pruning the tree in Fig. S1. Briefly, the best-fit model of LG+F+G4 was chosen, and a consensus tree based on ultrafast bootstrap approximation of 1,000 times is presented. (b) Average AAI matrix for the genomes of “Ca. Nanohaloarchaeota.” The data are rounded by omitting decimal fractions smaller than 0.5 and counting all others (including 0.5) as 1. The background (from yellow to orange) is colored according to the threshold AAIs of species, genus, and family (95.0, 65.0, and 45.0%, respectively). The genomes sharing an AAI of more than 95.0% with other genomes of higher completeness or low contamination are marked by red, and they were abandoned in subsequent research.
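The rounding rule and AAI thresholds given in the legend can be expressed compactly. The sketch below is illustrative: the function names are hypothetical, and treating each threshold as a strict "greater than" boundary is an assumption, since the text only says "more than" or "less than" for individual comparisons.

```python
from decimal import Decimal, ROUND_HALF_UP

# Threshold AAIs (percent) for species, genus, and family, as listed in the legend.
AAI_THRESHOLDS = (("species", 95.0), ("genus", 65.0), ("family", 45.0))


def round_half_up(value: float) -> int:
    """Round so that fractions of 0.5 and above count as 1, as in the legend."""
    return int(Decimal(str(value)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def shared_rank(aai: float) -> str:
    """Return the lowest rank two genomes are inferred to share at this AAI."""
    for rank, threshold in AAI_THRESHOLDS:
        if aai > threshold:  # strict comparison is an assumption
            return rank
    return "different families"
```

`Decimal` with `ROUND_HALF_UP` is used because Python's built-in `round` rounds halves to the nearest even integer, so `round(44.5)` gives 44 rather than the 45 implied by the legend.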
TABLE 1 Genomic features of representative MAGs affiliated with “Ca. Nanoanaerosalinaceae” and “Ca. Nanohalalkaliarchaeaceae”
MAG | WGS accession no. | Size (bp) | No. of contigs | N50 (bp) | GC (mol%) | Completeness (%) | Contamination (%) | Taxonomy at family level | Reference
NHA21 | JALDAF000000000 | 1,130,266 | 78 | 19,632 | 56.09 | 78.94 | 0.00 | Nanohalalkaliarchaeaceae | This study
NHA20 | JALDAE000000000 | 840,242 | 60 | 21,270 | 41.20 | 79.91 | 0.00 | Nanoanaerosalinaceae | This study
NHA23 | JALDAH000000000 | 849,024 | 82 | 17,982 | 38.05 | 78.50 | 0.93 | Nanoanaerosalinaceae | This study
NHA24 | JALDAI000000000 | 762,957 | 106 | 9,734 | 38.12 | 76.53 | 2.80 | Nanoanaerosalinaceae | This study
Notably, the taxonomy is different from the previous report (18), in which the taxa of the family “Ca. Nanosalinaceae” were classified into three classes. A possible reason is that the other lineages placed at the root of “Ca. Nanohaloarchaeota” were not included in that phylogenomic analysis. In addition, we found that some “Ca. Nanohaloarchaeota” genomes are incorrectly classified in the NCBI taxonomy database. Ten assemblies were in the “Ca. Nanohaloarchaeota” (under taxid 1462430) of the DPANN group, but 29 were in the class “Ca. Nanohaloarchaea” (under taxid 1051663) of Euryarchaeota. It is clear that they share a very close relationship with each other. The misinterpretation was considered the result of inadequate outgroup representation (1). Moreover, the well-researched “Ca. Nanohaloarchaeota” taxa belong to the family “Ca. Nanosalinaceae” (Fig. 1), including the two cocultures with Halobacteria (17, 18). In addition, the class Halobacteria (as host of “Ca. Nanohaloarchaeota”) is affiliated with the phylum Halobacteriota (Fig. S6 to S8), whose subordinates belong to the phylum Euryarchaeota of the classical taxonomy (4).

The two novel families prefer the habitats of hypersaline sediment.

The five representative genomes affiliated with the novel families “Ca. Nanoanaerosalinaceae” and “Ca. Nanohalalkaliarchaeaceae” were obtained from hypersaline and alkaline sediment samples or the related enrichment cultures (Table S6). They seem to have a habitat preference different from that of “Ca. Nanosalinaceae,” which were generally reported to occur in hypersaline brines (3, 23, 24). Therefore, the relative abundance of the 20 dereplicated “Ca. Nanohaloarchaeota” genomes was estimated in the metagenomes of soda-saline lake samples (24, 29) or the enrichment cultures (this study) with different salinities.
On the whole, almost all the MAGs not obtained from our samples could not be detected (Table S2). Therefore, we counted only the MAGs that were assembled from our soda-saline lake samples. The result shows that “Ca. Nanosalinaceae” are detected predominantly in the hypersaline brine (more than 20% salinity) and marginally in the hypersaline surface sediment (Fig. 2). The detection of “Ca. Nanosalinaceae” in the surface sediment might be the result of brine interfusion. Therefore, this result is mainly in agreement with the previous reports (3, 23, 24). MAG NHA25 was an exception detected in the enrichment culture of hypersaline sediment with acetate added, and a probable cause might be that it did not harbor the SOD2 gene (discussed below).
FIG 2 Relative abundance of the three families of the phylum “Ca. Nanohaloarchaeota” in soda-saline lake samples and the related enrichment cultures. The relative abundance was estimated based on the percentage of the reads that mapped onto the MAG obtained from the soda-saline lake samples, because most MAGs assembled from other samples were not detected (Table S2). NHA21, “Ca. Nanohalalkaliarchaeaceae”; NHA24, NHA20, NHA23, and NHA-2, “Ca. Nanoanaerosalinaceae”; NHA25, NHA26, NHA29, and NHA-1, “Ca. Nanosalinaceae.” The metagenomes of brine, surface sediment, and deep sediment were published in previous reports (24, 29). In their sample names, HC and DK represent Hutong Qagan Lake and Habor Lake, respectively, and the numbers represent the salinities of brines; in the brine and surface sediment, W and S represent water (or brine) and sediment, respectively; in the deep sediment, the numbers following “Sed” show the salinities of the pore water. The enrichment culture is described in Materials and Methods.
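The abundance measure in the legend, the percentage of metagenomic reads that map onto each MAG, amounts to a simple ratio. The following sketch assumes read counts per MAG are already available from a read mapper; the function name, the placeholder MAG labels, and the counts are hypothetical and are not values from Table S2.

```python
from typing import Dict


def relative_abundance(mapped_reads: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Percentage of all reads in a metagenome that map onto each MAG."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {mag: 100.0 * n / total_reads for mag, n in mapped_reads.items()}


# Hypothetical counts for two placeholder MAGs in one sample.
toy_counts = {"mag_a": 2500, "mag_b": 0}
toy_abundance = relative_abundance(toy_counts, total_reads=1_000_000)
```

A MAG with zero mapped reads, like `mag_b` here, corresponds to the "not detected" cases discussed in the text.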
Apparently, the “Ca. Nanoanaerosalinaceae” are found in hypersaline surface sediment but not in brine and deep sediment samples, while “Ca. Nanohalalkaliarchaeaceae” are detected in the enrichment cultures of deep sediment (Fig. 2; Table S2). In fact, a metagenome captures only a snapshot of the true microbial community. The mixture of deep sediment samples of four sites from each pond could reduce the spatial heterogeneity (29), but some “Ca. Nanohaloarchaeota” taxa were reported to experience a diel cycle in relative abundance (33). Consequently, considering the fact that the enrichment cultures are under anaerobic conditions and from deep sediment, “Ca. Nanohalalkaliarchaeaceae” could be inferred to inhabit hypersaline deep sediment, although they were not detected in the natural environments (Fig. 2; Table S2). Overall, the three families seemingly prefer different hypersaline habitats; i.e., “Ca. Nanosalinaceae,” “Ca. Nanoanaerosalinaceae,” and “Ca. Nanohalalkaliarchaeaceae” colonized brine, surface sediment, and deep sediment, respectively. Notably, our samples are from saline and alkaline environments. However, “Ca. Nanoanaerosalinaceae” and “Ca. Nanohalalkaliarchaeaceae” might also exist in neutral hypersaline sediment, in view of the presence of the family “Ca. Nanosalinaceae” in both alkaline and neutral brine (23–25, 27). The investigation of neutral hypersaline sediment is necessary to answer this question.

Symbiotic host prediction of the two novel families.

Considering the gene exchange between “Ca. Nanosalinaceae” and Halobacteria as a result of their close association with each other in natural habitats (9), we predicted putative HGTs using the HGTector tool, which is founded on sequence homology search hit distribution statistics (34). First, we created a GTDB archaeal taxonomy-based database (described in Materials and Methods), because the default database follows NCBI taxonomy, in which 29 genomes were incorrectly classified in “Ca. Nanohaloarchaea” (taxid 1051663) of Euryarchaeota. Afterward, the HGT events were predicted in the 14 representative genomes of the family “Ca. Nanosalinaceae” by following the tutorials. The result shows that HGT events could be found in them (Table S9), and these genomes harbor high proportions (46.25 to 79.25%) of genes horizontally acquired from Halobacteria (Fig. 3). Although the tools are different, a conclusion similar to that in the previous report could be drawn (9).
FIG 3 Horizontal gene transfer inference in the genomes of the phylum “Ca. Nanohaloarchaeota.” The taxonomic percentage of horizontal gene transfer donors for each strain was inferred using the HGTector pipeline with a GTDB taxonomy-based database based on the archaeal genomes in GTDB release 202. The genomes that are affiliated with the families “Ca. Nanosalinaceae,” “Ca. Nanoanaerosalinaceae,” and “Ca. Nanohalalkaliarchaeaceae” (NHAA) are marked.
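The "taxonomic percentage of donors" shown in the figure is, in essence, a tally of predicted donor taxa per genome. The sketch below assumes the donor assigned to each putative HGT has already been extracted into a plain list; it does not parse HGTector output, and the function name and toy donor list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def donor_percentages(donors: Iterable[str]) -> Dict[str, float]:
    """Percentage of putative HGT events attributed to each donor taxon."""
    counts = Counter(donors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical donor assignments for one genome (not data from Table S9).
toy_donors = ["Halobacteria"] * 3 + ["unassigned Archaea"] * 4 + ["Nanoarchaeota"]
toy_percentages = donor_percentages(toy_donors)
```

Expressing donors as percentages rather than raw counts makes genomes with different numbers of predicted HGT events comparable, which is how the families are contrasted in the text.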
Furthermore, this database was used for HGT prediction of other lineages. The result reveals that NHA21 of “Ca. Nanohalalkaliarchaeaceae” harbors 47.86% of horizontally acquired genes from Halobacteria. Although “Ca. Nanosalinaceae” acquire significantly higher percentages of horizontally transferred genes from Halobacteria than NHA21, as estimated using the t test (P < 0.05) (details shown in Text S1 in the supplemental material), we still believe that NHA21 may maintain an association with Halobacteria, because the proportion of HGTs, i.e., almost half, was considered high. Another clue is that most of the other half of horizontally acquired genes are from unassigned Archaea (Fig. 3), indicating that the symbiosis plays a role almost equal to that of the natural environment as the source of gene gains in NHA21. The class Halobacteria were generally reported to be predominant in the brine of saline lake or saltern (23, 25), while it also exhibited high abundance in some hypersaline sediments (35, 36), including soda-saline sediment in our previous research (24, 29). Some Halobacteria have the capability of strict or facultative anaerobic respiration based on elemental sulfur, dimethyl sulfoxide (DMSO), or nitrate (37–40). The diverse anaerobic respirations provide a metabolic basis for the presence of Halobacteria in the hypersaline sediment. Although it is difficult to predict the detailed taxon of Halobacteria for NHA21 as hosts, our data revealed at least one indication that NHA21 might form close interactions with Halobacteria.
In “Ca. Nanoanaerosalinaceae,” only 12.94 to 18.33% of the HGT events are found from Halobacteria (Fig. 3), significantly lower than “Ca. Nanosalinaceae” and “Ca. Nanohalalkaliarchaeaceae” (Text S1). Compared to none of the 228 HGT events in AB_1215_Bin_137 detected with Halobacteria as donors, “Ca. Nanoanaerosalinaceae” members harbor moderate proportions of horizontally acquired genes from Halobacteria (Fig. 3; Table S9). However, the main donors of horizontally acquired genes in “Ca. Nanoanaerosalinaceae” genomes were predicted as unassigned archaea over Halobacteria (Fig. 3). This suggests that the environment predominantly shapes the genomic features. Even so, considering the general symbiosis in DPANN (2, 3, 5, 6, 14–19), limited metabolic potentials in “Ca. Nanoanaerosalinaceae” (see Fig. 5; Table S4, described below), and a slightly higher proportion of HGTs from Halobacteria than other Halobacteriota, PWEA01, and “Ca. Nanoarchaeota” (Fig. 3), we infer that Halobacteria is the most probable host for “Ca. Nanoanaerosalinaceae,” but the connection might not be as close as that of “Ca. Nanosalinaceae” and “Ca. Nanohalalkaliarchaeaceae.”

Acidic proteomes of the two novel families for hypersaline adaptation.

It is widely believed that “Ca. Nanosalinaceae” members adopt the “salt-in” strategy to resist the high osmotic pressure of hypersaline environments (23, 25), while AB_1215_Bin_137 inhabiting deep-sea sediment (30) may not be obligated to adapt to the extreme osmotic pressure. During the evolution of the salt-in strategy, the isoelectric point profiles of the predicted proteome became acid shifted (23, 25), because the negatively charged amino acids can maintain the stability and activity of proteins under hypersaline conditions (41). Consequently, we compared the isoelectric point profiles and amino acid compositions of the predicted proteomes of the two novel families and reference lineages. Our results support the idea that the isoelectric point profiles of the predicted proteomes of the family “Ca. Nanosalinaceae” were acid shifted (Fig. S3), and their average isoelectric points range from 4.87 to 5.59 (Table S3). They are close to the three Halobacteria references (4.71 to 4.83). Notably, AB_1215_Bin_137, which does not adopt the salt-in strategy, displays a nonacidic proteome and an average isoelectric point of 8.10 (Fig. 4a; Table S3). In “Ca. Nanohalalkaliarchaeaceae,” NHA21 displays an isoelectric point profile similar to that of LC1Nh, one representative strain of “Ca. Nanosalinaceae” (Fig. 4a). Correspondingly, its average isoelectric point is 5.04, lower than those of most members of “Ca. Nanosalinaceae” (Table S3). Results of the t test indicate that the difference is significant (Text S1). Furthermore, the amino acid composition was compared. NHA21 has a high proportion of glutamate and aspartate in its proteome as Halobacteria and “Ca. Nanosalinaceae” (Fig. 4b; Table S3). These data support the idea that NHA21 employs the salt-in strategy.
FIG 4 Comparison of isoelectric point profiles and amino acid compositions. (a) Isoelectric point profiles of the predicted proteomes of “Ca. Nanoanaerosalinaceae,” “Ca. Nanohalalkaliarchaeaceae,” and reference species. The y axis shows the frequencies of proteins in the proteomes at each isoelectric point. The isoelectric point of each protein was predicted based on the amino acid sequence. The isoelectric point profiles with a bin width of 0.1 are shown. Haloferax mediterranei ATCC 33500 (GCA_000306765.2) and Escherichia coli O157:H7 strain Sakai (GCA_000008865.2) represent acid-shifted salt-in halophiles and nonhalophiles, respectively. Numbers in parentheses are average isoelectric points (more details are presented in Table S3). (b) Percentage of acidic amino acid glutamate, aspartate, and both. The composition was calculated from the predicted proteome based on the genome sequence. D, aspartate; E, glutamate; D+E, the sum of glutamate and aspartate.
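An isoelectric point profile of the kind described in the legend (per-protein pI predicted from sequence, then binned at a width of 0.1) can be approximated as below. The source does not name the pI predictor, so this sketch substitutes a generic Henderson-Hasselbalch charge model solved by bisection. The pKa table is one illustrative set and is an assumption, as are all function names and the toy peptides.

```python
import math
from collections import Counter
from typing import Dict, List

# Illustrative side-chain and terminal pKa values (assumption, not from the source).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERMINUS, PKA_C_TERMINUS = 8.6, 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a protein at a given pH under the simple model above."""
    counts = Counter(sequence.upper())
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue, pka in PKA_POSITIVE.items():
        positive += counts[residue] / (1.0 + 10 ** (ph - pka))
    for residue, pka in PKA_NEGATIVE.items():
        negative += counts[residue] / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if net_charge(sequence, middle) > 0:
            low = middle  # still positively charged: pI lies at higher pH
        else:
            high = middle
    return (low + high) / 2.0


def pi_profile(proteins: List[str], bin_width: float = 0.1) -> Dict[float, float]:
    """Frequency of proteins per pI bin, keyed by the lower edge of each bin."""
    if not proteins:
        return {}
    bins = Counter(math.floor(isoelectric_point(p) / bin_width) for p in proteins)
    return {round(b * bin_width, 1): n / len(proteins) for b, n in sorted(bins.items())}


# Hypothetical toy peptides: one acidic, one basic.
toy_profile = pi_profile(["DDDDEEEE", "KKKKRRRR"])
```

Because net charge decreases monotonically with pH in this model, bisection is sufficient; an acid-shifted proteome would show most of its frequency mass in the low-pI bins.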
In “Ca. Nanoanaerosalinaceae,” the isoelectric point profiles are also acid shifted, but the degree is weak (Fig. 4a). Their average isoelectric points range from 5.69 to 6.16 (Table S3). t tests reveal that they are significantly higher than those of “Ca. Nanosalinaceae” but not significantly different from those of salt-in Salinibacter ruber and salt-out Spiribacter salinus (Text S1). However, they also contain a high proportion (more than 16%) of acidic amino acids (Fig. 4b; Table S3). It seems that “Ca. Nanoanaerosalinaceae” may maintain moderately acidic proteomes and moderate concentrations of intracellular inorganic salt.
In Halobacteria such as Haloarcula hispanica, Haloferax mediterranei, and Natronococcus occultus (representatives of the three orders), the mole percentages of glutamate and aspartate are higher than that in Escherichia coli, and levels of the two amino acids are almost equal. In “Ca. Nanosalinaceae” and “Ca. Nanoanaerosalinaceae” members, glutamate is unexpectedly much more abundant than aspartate, although both amino acid residues are also richer than in E. coli (Fig. 4b; Table S3). The result suggests that “Ca. Nanosalinaceae” and “Ca. Nanoanaerosalinaceae” achieve the salt-in strategy with more glutamate accumulation in the proteomes, and they are different from their Halobacteria hosts. We infer that most “Ca. Nanohaloarchaeota” lineages prefer glutamate, considering that AB_1215_Bin_137 contains about 12% acidic amino acids (close to nonhalophiles), and glutamate is likewise much more abundant than aspartate (Table S3); the characteristic of more glutamate is conserved in most “Ca. Nanohaloarchaeota.” More than half of the representative genomes have a glutamate dehydrogenase gene (gdhA) for glutamate biosynthesis (Fig. 5a; Table S4; described below); it might be a metabolic basis for the more glutamate. An exception is NHA21 of “Ca. Nanohalalkaliarchaeaceae,” which has almost equivalent levels of glutamate and aspartate in its proteome (Fig. 4b; Table S3). This suggests that “Ca. Nanohalalkaliarchaeaceae” may evolve along a different path from “Ca. Nanoanaerosalinaceae,” although they share a close phylogenetic relationship and the same habitat of hypersaline sediments. The two novel families may also employ the salt-in strategy to balance the osmotic pressure in hypersaline habitats.
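The comparison of glutamate and aspartate levels above rests on residue percentages computed across a predicted proteome. A minimal sketch follows, assuming the proteome is available as a list of amino acid strings; the function name, the added E/D ratio, and the toy sequences are hypothetical and are not values from Table S3.

```python
from collections import Counter
from typing import Dict, Iterable


def acidic_composition(proteins: Iterable[str]) -> Dict[str, float]:
    """Percentages of aspartate (D), glutamate (E), and both in a proteome."""
    counts: Counter = Counter()
    for sequence in proteins:
        # Count letters only, so stop symbols such as '*' are ignored.
        counts.update(ch for ch in sequence.upper() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues found")
    asp = 100.0 * counts["D"] / total
    glu = 100.0 * counts["E"] / total
    ratio = glu / asp if asp else float("inf")
    return {"D": asp, "E": glu, "D+E": asp + glu, "E/D": ratio}


# Hypothetical toy proteome of two short peptides.
toy_composition = acidic_composition(["MDEEK", "AEDLE*"])
```

An E/D ratio near 1 would match the pattern described for the Halobacteria references and NHA21, whereas a ratio well above 1 would match the glutamate-rich pattern described for the other “Ca. Nanohaloarchaeota” lineages.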
FIG 5 Functional potentials of the phylum “Ca. Nanohaloarchaeota” lineages. (a) Dot plot showing the presence or absence of genes involved in metabolism and environmental response in the members of the two novel families and the percentage in the family “Ca. Nanosalinaceae.” Solid and hollow dots indicated presence and absence in the genomes, respectively. Transparent blues show the percentage of genes in 14 “Ca. Nanosalinaceae” genomes. (b) Reconstruction of functional potentials in three families of “Ca. Nanohaloarchaeota.” The process was estimated based on the genes involved in genetic information processing, metabolism, and environmental stress response. Solid dots with different colors indicate the presence of the process or gene(s) in the three families, i.e., “Ca. Nanohalalkaliarchaeaceae” (NHAA) represented by NHA21, “Ca. Nanoanaerosalinaceae,” and “Ca. Nanosalinaceae,” while hollow dots indicate the absence of the process or gene(s). Glc, glucose; Glc-1P, glucose 1-phosphate; Glc-6P, glucose 6-phosphate; F-6P, fructose 6-phosphate; F-1,6P2, fructose 1,6-bisphosphate; DHAP, dihydroxyacetone phosphate; G-3P, glyceraldehyde 3-phosphate; PEP, phosphoenolpyruvate; Glu, glutamate; OAA, oxaloacetate; Cit, citrate; Ict, isocitrate; 2-OG, 2-oxoglutarate; Scn-CoA, succinyl-CoA; Scn, succinate; Fmr, fumarate; Che, chemotaxis; GlcN-6P, glucosamine 6-phosphate; GlcN-1P, glucosamine 1-phosphate; GlcNAc-1P, N-acetylglucosamine 1-phosphate; UDP-GlcNAc, UDP-N-acetylglucosamine; PHB, poly-hydroxybutyrate. Enzyme abbreviations are listed in Table S4.

The novel families contain divergent potentials of metabolism and environmental response.

Cluster analysis based on the 3,007 orthogroups of “Ca. Nanohaloarchaeota” and the closely related phylum EX4484-52 reveals the functional difference among the three families (Fig. S4), so we compared their functional potentials. Generally, all genomes of “Ca. Nanohalalkaliarchaeaceae” and “Ca. Nanoanaerosalinaceae” have genes involved in the DNA replication apparatus, RNA polymerase complex, multiple ribosome proteins, and aminoacyl-tRNA biosynthesis, like those of “Ca. Nanosalinaceae” (Table S4). Meanwhile, they all lacked the complete genes for the electron transfer chain and the de novo biosynthesis of amino acids (except glutamate), purine, pyrimidine, and terpenoid for cell membranes (Fig. 5; Table S4). It is obvious that most “Ca. Nanohaloarchaeota” members have only draft genomes. Theoretically, the absence of a gene in a genome might be a false-negative result from the incompleteness of the genome. Nevertheless, the data above also support the idea that “Ca. Nanohaloarchaeota” members may have a symbiotic lifestyle. We found that the different genes were mainly involved in metabolism and environmental response, and then we estimated historical events (originations, duplications, transfers, and losses) by using the amalgamated likelihood estimation (ALE) approach (42, 43) and selecting the orthogroups that achieved a threshold of 0.3 in the raw reconciliation frequencies to avoid misses of true events (44).
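The event selection step mentioned above, retaining orthogroup events whose raw reconciliation frequency reaches 0.3, can be illustrated as a simple filter. The record layout below is hypothetical and is not the ALE output format; the inclusive comparison is an assumption, and the toy records are invented for illustration.

```python
from typing import List, NamedTuple


class EventFrequency(NamedTuple):
    orthogroup: str   # orthogroup identifier (hypothetical naming)
    branch: str       # branch of the species tree, e.g. "53->51"
    event: str        # "origination", "duplication", "transfer", or "loss"
    frequency: float  # raw reconciliation frequency


def select_events(records: List[EventFrequency], threshold: float = 0.3) -> List[EventFrequency]:
    """Keep events whose frequency reaches the threshold (inclusive by assumption)."""
    return [record for record in records if record.frequency >= threshold]


# Hypothetical records, not results from the study.
toy_records = [
    EventFrequency("OG_a", "53->51", "origination", 0.82),
    EventFrequency("OG_b", "52->50", "loss", 0.30),
    EventFrequency("OG_c", "54->52", "transfer", 0.12),
]
toy_selected = select_events(toy_records)
```

A relatively permissive cutoff such as 0.3 keeps more candidate events at the cost of more uncertain ones, which is consistent with the stated aim of avoiding misses of true events.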
We first reconstructed glycolysis or gluconeogenesis and related pathways, which are reported to be involved in the symbiosis of the nanohaloarchaeon “Candidatus Nanohalobium constans” LC1Nh (18). However, many members of the families “Ca. Nanohalalkaliarchaeaceae” (represented by NHA21) and “Ca. Nanoanaerosalinaceae” lack some genes, including phosphoglycerate mutase genes (gpmI or gpmB), a glucose-6-phosphate isomerase gene (gpi), and some coding genes involved in alpha-glycan (such as glycogen) utilization (Fig. 5; Table S4). NHA21 does not have the ADP-dependent phosphofructokinase gene (pfkC) and pyruvate water dikinase gene (pps) but harbors a fructose-1,6-bisphosphatase (KEGG orthology identifier K01622) gene for gluconeogenesis and oxaloacetate decarboxylase (Na+ extruding; oad) for phosphoenolpyruvate (PEP) biosynthesis from pyruvate. Generally, the two novel families are not so proficient in carbohydrate metabolism as “Ca. Nanosalinaceae.” Therefore, we suppose that the carbohydrate metabolism may drive the development of the close association between “Ca. Nanosalinaceae” and Halobacteria. The observations support the idea that the genes gpi, fbp, gap2, eno, and pps, involved in glycolysis or gluconeogenesis, were estimated originations from node 53 to node 51 (Fig. 6a and b). These events occurred at the node of the last common ancestor (LCA) of most “Ca. Nanosalinaceae” members with the exception of J07AB56. Actually, the complete genome of J07AB56 is necessary to give a definite answer.
FIG 6 History event approximation in the phylum “Ca. Nanohaloarchaeota.” (a) Ancestral reconstruction tree of “Ca. Nanohaloarchaeota.” The consensus tree of ultrafast bootstrap approximation based on 124 single-copy ubiquitous orthogroups in representative genomes is exhibited. The historical events are approximated based on the species tree and 1,629 gene UFBOOT trees. The radii of the black circles at the nodes represent their (inferred) genome sizes. Some internal nodes of interest are marked by Arabic numerals. The bar charts above the horizontal branches represent the numbers of duplications, transfers, originations, and losses (bar heights in the legend correspond to 300 events each). Some branches were extended with dashed lines to fit the width of the bar charts. The branches leading to the phylum EX4484-52 were collapsed. (b) Dot plot showing the functional annotation of the historical event at the interest nodes. The orthogroups that achieved a threshold of 0.3 in the raw reconciliation frequencies are reported. The putative functions of orthogroups were estimated by medoid sequences, which have the highest sum of similarity scores with all other sequences based on the BLOSUM62 substitution matrix.
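The medoid selection described in the legend, choosing the sequence with the highest summed similarity to all other members of an orthogroup, can be sketched as follows. The legend scores similarity with the BLOSUM62 matrix; to stay self-contained, this example swaps in a toy score that counts identical positions, which is a stand-in and not the authors' scoring. Function names and sequences are hypothetical.

```python
from typing import Callable, Sequence


def identity_score(a: str, b: str) -> int:
    """Toy similarity: number of identical positions (stand-in for BLOSUM62 scoring)."""
    return sum(x == y for x, y in zip(a, b))


def medoid(sequences: Sequence[str], score: Callable[[str, str], float] = identity_score) -> str:
    """Return the sequence with the highest summed similarity to all others."""
    if not sequences:
        raise ValueError("empty orthogroup")

    def total_similarity(i: int) -> float:
        return sum(score(sequences[i], sequences[j]) for j in range(len(sequences)) if j != i)

    best_index = max(range(len(sequences)), key=total_similarity)
    return sequences[best_index]


# Hypothetical orthogroup of four short peptides.
toy_orthogroup = ["MKVLA", "MKVLG", "MRVLG", "TTTTT"]
toy_medoid = medoid(toy_orthogroup)
```

Passing a different `score` function, for example one based on a substitution matrix, changes only the similarity measure and leaves the selection logic unchanged.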
Obviously, NHA21 distinguishingly harbors many genes involved in organic acid conversion (including IDH3, sucC, sdhAB, mdh, and pckA) and polyhydroxybutyrate (PHB) biosynthesis (Fig. 5). Considering the incompleteness of the citrate cycle in NHA21, the organic acid metabolism-related genes may participate in energy production and reducing power balance. Similarly, we also found organic acid metabolism in the other two families. We found that in some “Ca. Nanosalinaceae” members, malate dehydrogenase (ME2), acetate coenzyme A (acetate-CoA) ligase (ADP forming) (acdA), and pyruvate dehydrogenase (pdh) genes are located together and even form a gene cluster (Fig. S5a). These genes coupled the metabolism of malate, pyruvate, and acetate. Generally, pyruvate is considered a key nutrient in hypersaline environments, and it can be excreted by some members of the Halobacteria fed with glycerol (45). Meanwhile, glycerol is an important osmotic stabilizer produced by Dunaliella, a primary producer in hypersaline ecosystems (46). Therefore, pyruvate might be consumed with acetate and malate as products. In this process, ATP was generated, and reducing power was balanced (Fig. S5b). In “Ca. Nanoanaerosalinaceae,” the ME2 gene is lacking, and other genes are not linked (Table S4). However, they commonly have lactate dehydrogenase gene (ldhA), whose protein product could take on the role of ME2 in NAD+ regeneration (Fig. S5c). Overall, there are metabolic variances of organic acids among the three families (Fig. 5). We found that organic acid metabolism-related genes show complex evolutionary events (Fig. 6), and this might lead to the metabolic diversity of organic acids.
In addition to metabolism, some environmental responses are different. The subunits SecYDF of the Sec-dependent protein export system are present in most members of all three families, while the TatC subunit of the twin-arginine translocation system is present in “Ca. Nanohalalkaliarchaeaceae” (represented by NHA21) as well as some members of the family “Ca. Nanosalinaceae” but not in “Ca. Nanoanaerosalinaceae” members (Fig. 5a; Table S4). The Tat system exports the folded protein, and this process can avoid protein denaturation during the unfolding and refolding under hypersaline conditions (47). Consistently, TatC is found to exist in “Ca. Nanohalalkaliarchaeaceae” and some “Ca. Nanosalinaceae” with a more acidic proteome (implying high intracellular salinity [see above]). tatC was an estimated origination from node 55 to node 54, and then it might be lost from node 52 to node 50 (Fig. 6a and b). The evolutionary analysis backs the functional prediction.
Furthermore, we observed that chemotaxis-related genes (cheAWCDBY) and zinc transporter genes (znuABC) are present in “Ca. Nanohalalkaliarchaeaceae” and “Ca. Nanoanaerosalinaceae,” while they are absent from all 14 representative genomes of “Ca. Nanosalinaceae” (Fig. 5a). Zinc has been reported to be important for bacterial chemotaxis (48), and it may play a similar role in the novel archaeal lineages. Both “Ca. Nanohalalkaliarchaeaceae” and “Ca. Nanoanaerosalinaceae” survive in hypersaline sediment (Fig. 2), and the sediment is less habitable for Halobacteria than brine (29, 36). Chemotaxis is therefore possibly significant in seeking out the Halobacteria hosts. Conversely, an Fe-Mn family superoxide dismutase gene (SOD2) is annotated in the family “Ca. Nanosalinaceae” but not in “Ca. Nanoanaerosalinaceae” (Fig. 5; Table S4). Accordingly, superoxide dismutase may play a part in the response to the reactive oxygen species superoxide in aerobic environments. The evolutionary result predicted that the chemotaxis-related genes cheWDY were originations from node 54 to node 52, while the SOD2 gene was an origination from node 51 to node 49 (Fig. 6a and b).

Deduction of the symbiotic evolution of “Ca. Nanohaloarchaeota” with Halobacteria.

The two novel families are located at the root of the “Ca. Nanosalinaceae.” Meanwhile, they exhibit differences in habitat distribution, closeness of connection with Halobacteria (inferred from HGTs from Halobacteria), proteome acidification, and functional potentials (described above). Most DPANN lineages inhabit marine or low-salt environments (9–11, 14, 49). Therefore, it is reasonable that the adaptational evolution into extreme hypersaline habitats occurs in “Ca. Nanosalinales.” Following this idea, we consider that the acidification of predicted proteomes of members in “Ca. Nanohaloarchaeota” gradually took place during this process. “Ca. Nanosalinaceae” represents a developing direction of adaptation to extreme hypersaline environments. In fact, the two novel families are also descendants inhabiting modern environments, and they maintain a certain degree of proteome acidification from their LCAs. Therefore, we emphasize that the two novel families are not the intermediate stage itself but may have inherited similar characteristics.
All three families were found in hypersaline habitats, i.e., brine, surface sediment, and deep sediment (Fig. 2). They are selected by different ecological niches, but they can still be used to analyze phylogenetic and evolutionary relationships. One of the reasons is that “Ca. Nanosalinaceae” members tolerate the oxygen in the brine but, like the two novel families, cannot perform aerobic respiration (Fig. 5; Table S4). In addition, as a member of “Ca. Nanosalinaceae,” NHA25 seems to prefer anaerobic environments (Table S2), and it is the only one that does not have the SOD2 gene (Table S4). Possibly, the absence of this gene restricts NHA25 to anaerobic environments. In other words, habitat preference for brine or sediment may depend on a few genes. In addition, the ecological niches of the haloarchaeal hosts should also be considered; i.e., they thrive in both brine and sediment. Some Halobacteria are able to perform both aerobic respiration and anaerobic respiration (39, 40), and they could exist as hosts for these families. It is also possible that the three “Ca. Nanohaloarchaeota” families live with different haloarchaeal lineages, and their common ancestor might have first established a physical connection with one Halobacteria ancestor, or they may share a molecular apparatus for interaction. Further evolutionary analysis on Halobacteria or cultivation-based research may give a definite answer to the question. In any case, however, Halobacteria lineages could be regarded as the common hosts for the evolution of the three “Ca. Nanohaloarchaeota” families.
Based on this research, we propose that the close association between “Ca. Nanosalinaceae” and their Halobacteria hosts might have developed from a weaker connection like that of “Ca. Nanoanaerosalinaceae.” Similar to hypersaline adaptation, “Ca. Nanoanaerosalinaceae” may maintain the weak interaction with Halobacteria, while “Ca. Nanosalinaceae” have strengthened it. In addition, “Ca. Nanohalalkaliarchaeaceae,” represented by NHA21, also show a close association like “Ca. Nanosalinaceae.” NHA21 was most likely to establish an association with the order “Candidatus Halarchaeoplasmatales” (the new name for “Candidatus Haloplasmatales” [29]), because the order was reported to be the most abundant archaeal lineage in some hypersaline deep sediments (29, 36, 50). Nevertheless, they also form a close relationship with Halobacteria (Fig. 3). This might be the result of evolution from the weak connection represented by “Ca. Nanoanaerosalinaceae.” Apparently, “Ca. Nanohalalkaliarchaeaceae” has an evolutionary path different from that of “Ca. Nanosalinaceae” (Fig. 6). The convergent evolution toward a close connection indicates the advantage of “Ca. Nanohaloarchaeota” symbiosis with Halobacteria. In conclusion, the comparative genomic analysis of the novel lineages provides evidence of symbiotic evolution, although the metabolic prediction and evolutionary approximation could be further improved by relying on complete genomes in the future. Moreover, coculture of the novel families with Halobacteria would possibly offer supplementary insights.


We report two novel families, “Ca. Nanoanaerosalinaceae” and “Ca. Nanohalalkaliarchaeaceae.” The two families prefer hypersaline sediment habitats and are placed at the root of the phylogenetic trees of “Ca. Nanosalinaceae.” In addition, “Ca. Nanoanaerosalinaceae” contain a lower proportion of genes horizontally acquired from Halobacteria, while both novel families exhibit distinct proteomic characteristics for hypersaline adaptation and different functional potentials, including carbohydrate metabolism, organic acid metabolism, chemotaxis, reactive oxygen species response (SOD2), and hypersaline adaptation (Tat protein translocation system). The two novel families broaden the archaeal diversity of “Ca. Nanohaloarchaeota” and provide insights into symbiotic evolution. This research will also provide a model for similar studies in other DPANN lineages.

Description of the family “Ca. Nanohalalkaliarchaeaceae” and taxa classified in the family.

Description of “Ca. Nanohalalkaliarchaeaceae” fam. nov. (’ae, N.L. fem. n. Nanohalalkaliarchaeum, Candidatus generic name; -aceae, designating a family; N.L. fem. pl. n. Nanohalalkaliarchaeaceae, the Nanohalalkaliarchaeum family). The type genus of the family is “Candidatus Nanohalalkaliarchaeum.”
Description of “Ca. Nanohalalkaliarchaeum” gen. nov. (’um. Gr. masc. n. nanos, a dwarf; Gr. masc. n. hals (gen. halos), salt of the sea; N.L. neut. n. alkali, from Arabic al, “the,” and Arabic n. qaliy, ashes of saltwort; N.L. neut. n. archaeum, archaeon; from Gr. adj. archaîos -a -on, ancient; N.L. neut. n. Nanohalalkaliarchaeum, a dwarf haloalkaliphilic archaeon). The type species of the genus is “Candidatus Nanohalalkaliarchaeum halalkaliphilum.”
Description of “Ca. Nanohalalkaliarchaeum halalkaliphilum” sp. nov. (’phi.lum. Gr. masc. n. hals (gen. halos), salt; N.L. neut. n. alkali, alkali; N.L. adj. philus -a -um, friend, loving; from Gr. adj. philos -ê -on, loving; N.L. neut. adj. halalkaliphilum, salt and alkali loving). The type material is the metagenome-assembled genome NHA21, whose GenBank accession number (whole genome sequencing [WGS]) is JALDAF000000000.

Description of the family “Ca. Nanoanaerosalinaceae” and taxa classified in the family.

Description of “Ca. Nanoanaerosalinaceae” fam. nov. (’ae, N.L. fem. n. Nanoanaerosalina, Candidatus generic name; -aceae, designating a family; N.L. fem. pl. n. Nanoanaerosalinaceae, the Nanoanaerosalina family). The type genus of the family is the genus “Candidatus Nanoanaerosalina.”
Description of “Ca. Nanoanaerosalina” gen. nov. (’na. Gr. masc. n. nanos, a dwarf; Gr. pref. an-, not; Gr. masc. n. aêr (gen. aeros), air; N.L. masc. adj. salinus, saline; N.L. fem. n. Nanoanaerosalina, a dwarf saline organism not living in air). The type species of the genus is “Candidatus Nanoanaerosalina halalkaliphila.”
Description of “Ca. Nanoanaerosalina halalkaliphila” sp. nov. (’ Gr. masc. n. hals (gen. halos), salt; N.L. neut. n. alkali, alkali; N.L. adj. philus -a -um, friend, loving; from Gr. adj. philos -ê -on, loving; N.L. fem. adj. halalkaliphila, salt and alkali loving). The type material is the metagenome-assembled genome NHA20, whose GenBank accession number (WGS) is JALDAE000000000.


Enrichment culture and metagenomic sequencing.

Deep sediment samples from five crystallizer ponds of different salinities (1, 3, 15, 24, and 33%) described in the previous study (29) were used for enrichment culture. In brief, 5 g of four mixed sediment samples (from the same pond) was combined with 5 mL of the corresponding brine and 5 mL of sterile medium with the same salinity as the brine. The five media were composed of (per liter) 0.2 mg MgCl2·6H2O, 0.05 g KH2PO4, 2 g KCl, and variable concentrations of NaCl, NaHCO3, and CaCl2 for samples from different ponds. Their concentrations (per liter) were 10 g, 2 g, and 1.2 mg, respectively, for the 1% samples; 30 g, 3 g, and 0.4 mg for 3%; 150 g, 13 g, and 6.5 mg for 15%; 240 g, 28 g, and 7.7 mg for 24%; and 370 g, 5 g, and 4.1 mg for 33%. After sterilization, pH values were typically about 10.0. After premixing the samples with brine and media, 2.42 g/L Na2MoO4·2H2O, 5 mg/L kanamycin, 20 mg/L ampicillin, 2.4 g/L Na2S·9H2O, and either 0.52 g/L sodium formate dihydrate (indicated with “F” in sample names), 0.68 g/L sodium acetate trihydrate (indicated with “A”), or no substrate were added (final concentrations; details shown in Table S5). A total of 15 samples were anaerobically and statically incubated at 30°C without light for about 210 days. Then, microbial cells with insoluble matter were collected using centrifugation at 4°C. Total DNA was extracted from the samples using a PowerSoil DNA isolation kit (MoBio, CA, United States) for metagenome sequencing. However, there was not enough DNA from the sample of 1% salinity with no substrate. Therefore, 14 qualified DNA samples were used for subsequent library construction and metagenomic sequencing on the Illumina HiSeq X Ten platform to generate 150-bp paired-end reads (Table S5).
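The enrichment design described above (five salinities, three substrate conditions, one failed DNA extraction) can be tallied with a short sketch. The sample labels are hypothetical: only the “F” and “A” suffixes come from the text, and the “N” suffix for the no-substrate condition is an assumption made for illustration.

```python
from itertools import product

# Salinities (%) of the five crystallizer ponds, as stated in the text.
SALINITIES = [1, 3, 15, 24, 33]
# "F" = formate and "A" = acetate (from the text); "N" = no substrate (hypothetical label).
SUBSTRATES = ["F", "A", "N"]


def enrichment_samples():
    """Return a hypothetical label for every salinity x substrate combination."""
    return [f"S{sal}{sub}" for sal, sub in product(SALINITIES, SUBSTRATES)]


all_samples = enrichment_samples()
# The 1% salinity enrichment without substrate yielded too little DNA.
sequenced = [s for s in all_samples if s != "S1N"]

print(len(all_samples), len(sequenced))
```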

Contig assembly and genome binning.

Quality control of raw reads of each metagenome was performed using the read_qc module with default parameters of the metaWRAP v1.2.2 pipeline (51). Clean reads generated for each sample were individually assembled into contigs using the assembly module of metaWRAP with the default assembler MEGAHIT v1.1.3 (52), and short contigs (<1,000 bp) were removed. Three different metagenomic tools for genome binning, CONCOCT v1.0.0 (53), MetaBAT2 v2.12.1 (54), and MaxBin2 v2.2.6 (55), integrated into the binning module of metaWRAP, were used to recover initial MAGs from each metagenome. In addition, to obtain more “Ca. Nanohaloarchaeota” MAGs, 18 metagenomes of brine and surface sediment samples from our previous study (24) were reanalyzed by following the genome-binning approach described above. Then, all MAGs obtained from the same metagenome were individually refined with a minimum completeness of 50% or 30% and a maximum contamination of 10% or 20% using the bin_refinement module in metaWRAP. Subsequently, the best representative genomes were chosen from all the refined MAGs using dRep v3.2.0 with a threshold of 99% ANI (56). Ten more “Ca. Nanohaloarchaeota” MAGs were obtained in this study (Table S6).
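As an illustration of the completeness and contamination cutoffs used during refinement, the following minimal sketch filters a list of bins. It is not metaWRAP code. The `Mag` class, the bin names, and the quality values are hypothetical, and the pairing of the two settings (50% with 10%, 30% with 20%) is an assumption, since the text does not state how the values were combined.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_refinement(mag, min_completeness=50.0, max_contamination=10.0):
    """True if a bin meets both the completeness and contamination cutoffs."""
    return (mag.completeness >= min_completeness
            and mag.contamination <= max_contamination)


# Made-up bins for illustration only.
mags = [Mag("bin.1", 92.5, 1.2), Mag("bin.2", 41.0, 3.0), Mag("bin.3", 75.0, 15.0)]

strict = [m.name for m in mags if passes_refinement(m)]               # 50% / 10%
relaxed = [m.name for m in mags if passes_refinement(m, 30.0, 20.0)]  # 30% / 20%
print(strict, relaxed)
```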

Genome collection, quality estimation, and function prediction.

The genome sequences of “Ca. Nanohaloarchaeota” and its closely related lineages, including the phyla “Ca. Aenigmarchaeota,” EX4484-52, PWEA01, and QMZS01, were collected (Table S1) by 9 October 2021. First, 232 genomes were downloaded from GenBank of the National Center for Biotechnology Information (NCBI) under the following taxonomy IDs: 743724 for the phylum “Ca. Aenigmarchaeota,” 1462430 for the phylum “Ca. Nanohaloarchaeota” in the DPANN group, 1051663 for the class “Nanohaloarchaea” in the phylum Euryarchaeota, and 2565780 for the unclassified DPANN group. Two genomes of “Candidatus Nanohaloarchaeum antarcticus” (17) were obtained via the identification numbers 2643221421 and 2791354821 in the Integrated Microbial Genomes (IMG) database. Additionally, 148 archaeal genomes from our previous studies (24, 29) were also used. In addition, 2,285 archaeal representative genomes in GTDB were downloaded (Table S7) for phylogeny analysis. The relative abundance of MAGs was estimated individually by mapping clean reads to MAGs with the genome module in CoverM v0.6.0 using the following parameters: --min-read-percent-identity, 0.95; --min-read-aligned-percent, 0.75.
Classification was generally preanalyzed using the classify workflow in GTDB-Tk (v1.7.0) (57) based on release 202 of the Genome Taxonomy Database. Subsequently, basic statistics (including contig number, genome size, N50, N90, and G+C content) of genomes were generated using the script (last modified 25 July 2019) in the BBTools suite. The genomes were annotated using Prokka (version 1.13) with the settings Archaea for annotation mode and RNAmmer (v1.2) for rRNA prediction (58, 59). In this process, Prodigal (v2.6.3) was used to find protein-coding features (60). The numbers of genes, coding sequences (CDSs), rRNAs, and tRNAs were retrieved from the Prokka output. Genome completeness and contamination were estimated using the lineage-specific workflow in CheckM (v1.1.3) (61). The isoelectric point of each protein was predicted using a protein isoelectric point calculator (62). Functions of CDSs in each genome were predicted using the Diamond method in eggNOG-mapper (v2.0.0) (63, 64). From the output, Clusters of Orthologous Genes (COG) categories (65) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers were retrieved. Metabolic pathways were reconstructed using the online tool KEGG Mapper with KO annotation. Carbohydrate-active enzymes were annotated using dbCAN2 (v2.0.6) based on the CAZy database version CAZyDB.07312019 (66, 67).

Phylogeny and sequence identity analyses.

The phylogenetic trees were based on 122 archaeal ubiquitous single-copy proteins, ribosome proteins, and 16S rRNA genes with 2,285 archaeal genomes except NATU lineages in GTDB as an outgroup. Briefly, the multiple-sequence-alignment file of the 122 archaeal ubiquitous single-copy proteins was produced using the classify workflow in GTDB-Tk (57). The consensus tree of ultrafast bootstrap approximation was reconstructed using IQ-TREE (multicore version 1.6.12) with an ultrafast bootstrap of 1,000 and standard model selection followed by tree inference (68). The best-fit model was chosen according to the Bayesian information criterion. The archaeal ribosome proteins were inferred from the genomes and then aligned, masked, and trimmed using AMPHORA2 with default options (69). The aligned and trimmed amino acid sequences of Rpl2p, Rpl3p, Rpl4lp, Rpl5p, Rpl6p, Rpl14p, Rpl18p, Rpl22p, Rpl24p, Rps3p, Rps8p, Rps10p, Rps17p, and Rps19p were concatenated using AMAS (70). The phylogenomic tree was reconstructed using IQ-TREE with the same settings, and the best-fit model was chosen. The archaeal 16S rRNA gene sequences were predicted using RNAmmer (v1.2) (59). Those 1,450 to 1,700 bp in length and of high quality (score above 1,100 and no ambiguous sites) were selected (Table S8), referring to previous literature (59, 71). The sequences were aligned using MUSCLE v3.8.31 with default options (72). The phylogenetic reconstruction was also done using the same method, and the best-fit model was chosen. The phylogenetic trees were visualized using the Interactive Tree of Life (iTOL, version 6.5) (73). The archaeal trees were generally reconstructed by setting the root at the node between the DPANN group and other lineages. The phylogenetic trees of the NATU clade were reconstructed by following the same approach.
The identity between two 16S rRNA gene sequences was estimated using Nucleotide-Nucleotide BLAST 2.6.0+ (74). AAI between two sets of predicted coding sequences from the genomes (referred to as the predicted proteome here) was calculated using the online AAI calculator. ANI between two genomes was computed using FastANI (version 1.33) with default options (75).
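The 16S rRNA selection criteria stated above (1,450 to 1,700 bp, score above 1,100, no ambiguous sites) amount to a simple filter. The sketch below is illustrative only: the function name is hypothetical, the score is assumed to be the value reported by the rRNA predictor, and any character other than A, C, G, T, or U is treated as an ambiguous site.

```python
def is_high_quality_16s(seq, score, min_len=1450, max_len=1700, min_score=1100):
    """Apply the length, score, and ambiguity criteria to one predicted 16S sequence."""
    seq = seq.upper()
    # Any symbol outside the unambiguous nucleotide alphabet counts as ambiguous.
    has_ambiguous = any(base not in "ACGTU" for base in seq)
    return (min_len <= len(seq) <= max_len
            and score > min_score
            and not has_ambiguous)


# Toy sequences built by repetition, purely to exercise the filter.
good = "ACGT" * 375            # 1,500 bp, no ambiguous sites
with_n = good[:-1] + "N"       # same length, one ambiguous site
too_short = "ACGT" * 300       # 1,200 bp

print(is_high_quality_16s(good, 1500.0),
      is_high_quality_16s(with_n, 1500.0),
      is_high_quality_16s(too_short, 1500.0))
```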

Horizontal gene transfer event prediction.

The putative HGT events were computed using the HGTector pipeline version 2.0b3 (34). First, the GTDB taxonomy-based database for HGTector analysis was built by considering that different genomes of “Ca. Nanohaloarchaeota” were assigned to Euryarchaeota or the DPANN group. In brief, the taxonomy file was downloaded from GTDB release 202 and then reformatted into NCBI taxdump style using a Python 3 script provided by HGTector contributors. Three files were produced, including names.dmp and nodes.dmp. All proteins of 2,339 archaeal representatives in GTDB were directly downloaded from GenBank or predicted based on the genomes using Prodigal (v2.6.3) (60). A local, GTDB-based prot.accession2taxid file was created based on the produced file and all proteins. The database was built from all the proteins and the GTDB-based taxonomy files using the makedb command in Diamond v0.9.26.127 (64). After that, a batch homology search was performed using a Diamond method for each proteome. HGT events were predicted using the analyze command with the following settings: “self” group, “Ca. Nanohaloarchaeota” (whose taxid in the local GTDB-based taxonomy was 18); “close” group, “Ca. Nanohaloarchaeota” and EX4484-52 (taxid, 20); maximum number of hits, 12; maximum E-value cutoff, 1e−8; minimum percent identity cutoff, 30%; minimum percent query coverage cutoff, 50%; bandwidth for Gaussian KDE, auto. Donor’s taxonomy of HGT-derived genes was deciphered using the lineage command in TaxonKit (v0.9.0) (76).
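To make the hit cutoffs listed above concrete, the following sketch applies the E-value, identity, coverage, and hit-number limits to a list of homology hits. It is a simplified illustration and not HGTector's implementation: the `Hit` class, the subject names, and the values are hypothetical, and ranking the retained hits by E-value is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str
    evalue: float
    identity: float  # percent identity
    coverage: float  # percent of the query covered


def filter_hits(hits, max_evalue=1e-8, min_identity=30.0,
                min_coverage=50.0, max_hits=12):
    """Keep hits passing all three cutoffs, then retain at most max_hits of the best."""
    kept = [h for h in hits
            if h.evalue <= max_evalue
            and h.identity >= min_identity
            and h.coverage >= min_coverage]
    kept.sort(key=lambda h: h.evalue)  # smallest E-value first (assumed ranking)
    return kept[:max_hits]


# Made-up hits for one query protein.
hits = [
    Hit("p1", 1e-50, 80.0, 95.0),
    Hit("p2", 1e-5, 60.0, 90.0),   # fails the E-value cutoff
    Hit("p3", 1e-20, 25.0, 90.0),  # fails the identity cutoff
    Hit("p4", 1e-12, 45.0, 40.0),  # fails the coverage cutoff
    Hit("p5", 1e-9, 35.0, 55.0),
]
print([h.subject for h in filter_hits(hits)])
```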

Comparative genomic analysis for ancestral reconstruction.

The comparative genomic analysis was performed by referring to the published research (44, 77, 78). Briefly, 3,007 orthogroups were found from 32,540 CDSs in the genome set (20 “Ca. Nanohaloarchaeota” and 9 EX4484-52 genomes) using OrthoFinder version 2.4.0 with default settings (79). The gene count matrix of orthogroups in the genomes was used to compare the functional profile. Nonmetric multidimensional scaling (NMDS) analysis was performed using the metaMDS function with default options in R package vegan v2.5-7.
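NMDS operates on a matrix of pairwise dissimilarities between genomes. To our knowledge, metaMDS uses Bray-Curtis dissimilarity by default, and the sketch below computes that quantity for two gene count profiles. It is written in Python rather than R for illustration, it omits the data transformations that metaMDS may apply automatically, and the count vectors are made up.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of orthogroups")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    # Two empty profiles are treated as identical.
    return numerator / denominator if denominator else 0.0


# Gene counts of three orthogroups in two hypothetical genomes.
genome_a = [2, 0, 1]
genome_b = [1, 1, 1]
print(round(bray_curtis(genome_a, genome_b), 3))
```

A value of 0 means identical profiles and 1 means no shared orthogroups.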
We retained 1,629 orthogroups with 4 or more genes according to the principles in previous literature (77). The genes of each orthogroup were aligned using MAFFT v7.407 with the L-INS-i method, which has high accuracy (80). Alignment columns were removed using the heuristic automated1 method of trimAl version 1.2rev59 (81), and then the sequences containing too many gaps were discarded with the following options: minimum overlap of a position with other positions, 0.3; minimum percentage of the satisfied positions, 50. For the reconstruction of the species tree, 124 single-copy orthogroups that were present in no fewer than 18 “Ca. Nanohaloarchaeota” genomes (≥90%) and no fewer than 5 EX4484-52 genomes (>50%) were manually selected. The aligned and trimmed sequences were concatenated using AMAS (70). Correspondingly, the phylogenomic tree was reconstructed using IQ-TREE, and the best-fit model of LG+F+I+G4 was chosen. The root of this species tree was reset using the midpoint method, a built-in function in iTOL (73). To obtain the UFBOOT trees of the 1,629 orthogroups, we used IQ-TREE with the same settings (-m, LG+G; -bb, 1000; -wbtl) as in the previous research (77). The frequencies of duplications, transfers (gene transfers from lineages inside the species tree), losses, and originations (gene transfers from the lineages outside the species tree, or true gene originations), as well as copy numbers of the 1,629 orthogroups at each node, were inferred using maximum-likelihood estimation (ALEml_undated command) in ALE v0.4 (42). The number of each event and genome size were inferred by parsing the .uml_rec files of the ALE output using the set of Python scripts named ALE helper (78). The details of orthogroups and the output files can be accessed via the link in “Data availability.” The ancestral reconstruction tree was visualized using ETE Toolkit version 3.1.2 (82).
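The two orthogroup selection rules described above can be written as simple predicates. The selection in this study was done manually, so the sketch below only restates the criteria. The genome labels are hypothetical, and treating “single-copy” as “no genome carries more than one copy” is an assumption.

```python
def has_enough_genes(counts, min_genes=4):
    """Orthogroups with 4 or more genes are kept for gene tree inference."""
    return sum(counts.values()) >= min_genes


def is_species_tree_marker(counts, ingroup, outgroup,
                           min_ingroup=18, min_outgroup=5):
    """Single-copy orthogroup present in enough ingroup and outgroup genomes.

    counts maps a genome label to the gene copy number in that genome.
    """
    if any(n > 1 for n in counts.values()):
        return False  # at least one genome has a duplicate copy
    n_in = sum(1 for g in ingroup if counts.get(g, 0) == 1)
    n_out = sum(1 for g in outgroup if counts.get(g, 0) == 1)
    return n_in >= min_ingroup and n_out >= min_outgroup


# Hypothetical labels: 20 "Ca. Nanohaloarchaeota" and 9 EX4484-52 genomes.
ingroup = [f"nha_{i}" for i in range(20)]
outgroup = [f"ex_{i}" for i in range(9)]

# An orthogroup found once in 18 ingroup and 5 outgroup genomes.
orthogroup = {g: 1 for g in ingroup[:18] + outgroup[:5]}
print(is_species_tree_marker(orthogroup, ingroup, outgroup))
```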
The orthogroups that achieved a threshold of 0.3 in the raw reconciliation frequencies were counted. This threshold is relaxed but necessary to avoid missing many true events (44). A medoid sequence was selected from each orthogroup as the representative for functional annotation. The medoid sequences have the highest sum of similarity scores with all other sequences based on the BLOSUM62 substitution matrix using Protein-Protein BLAST 2.6.0+ (74).
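The medoid definition and the frequency threshold above can be sketched as follows. This is an illustration under stated assumptions: the pairwise scores are made-up numbers standing in for BLAST similarity scores, the sequence and orthogroup names are hypothetical, and “achieved a threshold of 0.3” is read as a frequency of at least 0.3.

```python
def medoid(names, pair_score):
    """Return the sequence with the highest summed similarity to all others.

    pair_score maps frozenset({a, b}) to a symmetric similarity score.
    """
    def total(name):
        return sum(pair_score[frozenset((name, other))]
                   for other in names if other != name)
    return max(names, key=total)


def reported_orthogroups(frequencies, threshold=0.3):
    """Orthogroups whose raw reconciliation frequency reaches the threshold."""
    return sorted(og for og, freq in frequencies.items() if freq >= threshold)


# Made-up similarity scores for three sequences of one orthogroup.
names = ["seqA", "seqB", "seqC"]
pair_score = {
    frozenset(("seqA", "seqB")): 50.0,
    frozenset(("seqA", "seqC")): 20.0,
    frozenset(("seqB", "seqC")): 40.0,
}
print(medoid(names, pair_score))  # seqB has the largest total (50 + 40)
print(reported_orthogroups({"OG1": 0.29, "OG2": 0.3, "OG3": 0.95}))
```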

Statistical analysis.

Student’s t test or the Wilcoxon test was performed using the functions in the library ggpubr to compare the means between any two families’ ApI, the ratio of HGTs from Halobacteria, and G+C content. Briefly, the normal distribution of data was first tested to decide between the parametric t test and the nonparametric Wilcoxon test. The comparisons between NHA21 (representing “Ca. Nanohalalkaliarchaeaceae”) and the other two families were performed using a one-sample test, while the comparisons between “Ca. Nanoanaerosalinaceae” and “Ca. Nanosalinaceae” were performed using an unpaired two-sample test.

Data availability.

The “Ca. Nanohaloarchaeota” genomes are available from the NCBI under the BioProject identifier PRJNA797678. DNA sequencing data have been deposited in BioProject with the identifiers PRJNA549802 and PRJNA679647. Metagenomic sequencing data of 14 enrichment samples are deposited in BioProject with the identifier PRJNA769545. Raw data (including protein files, tree files, horizontal gene transfer analysis, comparative genomics files, etc.) generated in this study, Fig. S6 to S10, and Table S5 to S9 are available at


This study was funded by the National Natural Science Foundation of China (no. 91751201 and 32000046) and partially supported by the Senior User Project of RV KEXUE (no. KEXUE2019GZ05) provided by the Center for Ocean Mega-Science, Chinese Academy of Sciences.
We thank the reviewers for constructive comments to help us improve the manuscript.
We declare no competing interests.



Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu WT, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T. 2013. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499:431–437.
Huber H, Hohn MJ, Rachel R, Fuchs T, Wimmer VC, Stetter KO. 2002. A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature 417:63–67.
Narasingarao P, Podell S, Ugalde JA, Brochier-Armanet C, Emerson JB, Brocks JJ, Heidelberg KB, Banfield JF, Allen EE. 2012. De novo metagenomic assembly reveals abundant novel major lineage of Archaea in hypersaline microbial communities. ISME J 6:81–93.
Rinke C, Chuvochina M, Mussig AJ, Chaumeil PA, Davin AA, Waite DW, Whitman WB, Parks DH, Hugenholtz P. 2021. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat Microbiol 6:946–959.
Baker BJ, Comolli LR, Dick GJ, Hauser LJ, Hyatt D, Dill BD, Land ML, Verberkmoes NC, Hettich RL, Banfield JF. 2010. Enigmatic, ultrasmall, uncultivated Archaea. Proc Natl Acad Sci USA 107:8806–8811.
Castelle CJ, Wrighton KC, Thomas BC, Hug LA, Brown CT, Wilkins MJ, Frischkorn KR, Tringe SG, Singh A, Markillie LM, Taylor RC, Williams KH, Banfield JF. 2015. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr Biol 25:690–701.
Probst AJ, Ladd B, Jarett JK, Geller-McGrath DE, Sieber CMK, Emerson JB, Anantharaman K, Thomas BC, Malmstrom RR, Stieglmeier M, Klingl A, Woyke T, Ryan MC, Banfield JF. 2018. Differential depth distribution of microbial function and putative symbionts through sediment-hosted aquifers in the deep terrestrial subsurface. Nat Microbiol 3:328–336.
Castelle CJ, Banfield JF. 2018. Major new microbial groups expand diversity and alter our understanding of the tree of life. Cell 172:1181–1197.
Dombrowski N, Williams TA, Sun J, Woodcroft BJ, Lee JH, Minh BQ, Rinke C, Spang A. 2020. Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution. Nat Commun 11:3939.
Spang A, Caceres EF, Ettema TJ. 2017. Genomic exploration of the diversity, ecology, and evolution of the archaeal domain of life. Science 357:eaaf3883.
Baker BJ, De Anda V, Seitz KW, Dombrowski N, Santoro AE, Lloyd KG. 2020. Diversity, ecology and evolution of Archaea. Nat Microbiol 5:887–900.
Dombrowski N, Lee JH, Williams TA, Offre P, Spang A. 2019. Genomic diversity, lifestyles and evolutionary origins of DPANN archaea. FEMS Microbiol Lett 366:fnz008.
Moody ER, Mahendrarajah TA, Dombrowski N, Clark JW, Petitjean C, Offre P, Szöllősi GJ, Spang A, Williams TA. 2022. An estimate of the deepest branches of the tree of life from ancient vertically-evolving genes. eLife 11:e66695.
Castelle CJ, Brown CT, Anantharaman K, Probst AJ, Huang RH, Banfield JF. 2018. Biosynthetic capacity, metabolic variety and unusual biology in the CPR and DPANN radiations. Nat Rev Microbiol 16:629–645.
Jahn U, Gallenberger M, Paper W, Junglas B, Eisenreich W, Stetter KO, Rachel R, Huber H. 2008. Nanoarchaeum equitans and Ignicoccus hospitalis: new insights into a unique, intimate association of two archaea. J Bacteriol 190:1743–1750.
St John E, Liu Y, Podar M, Stott MB, Meneghin J, Chen Z, Lagutin K, Mitchell K, Reysenbach AL. 2019. A new symbiotic nanoarchaeote (Candidatus Nanoclepta minutus) and its host (Zestosphaera tikiterensis gen. nov., sp. nov.) from a New Zealand hot spring. Syst Appl Microbiol 42:94–106.
Hamm JN, Erdmann S, Eloe-Fadrosh EA, Angeloni A, Zhong L, Brownlee C, Williams TJ, Barton K, Carswell S, Smith MA, Brazendale S, Hancock AM, Allen MA, Raftery MJ, Cavicchioli R. 2019. Unexpected host dependency of Antarctic Nanohaloarchaeota. Proc Natl Acad Sci USA 116:14661–14670.
La Cono V, Messina E, Rohde M, Arcadi E, Ciordia S, Crisafi F, Denaro R, Ferrer M, Giuliano L, Golyshin PN, Golyshina OV, Hallsworth JE, La Spada G, Mena MC, Merkel AY, Shevchenko MA, Smedile F, Sorokin DY, Toshchakov SV, Yakimov MM. 2020. Symbiosis between nanohaloarchaeon and haloarchaeon is based on utilization of different polysaccharides. Proc Natl Acad Sci USA 117:20223–20234.
Sakai HD, Nur N, Kato S, Yuki M, Shimizu M, Itoh T, Ohkuma M, Suwanto A, Kurosawa N. 2022. Insight into the symbiotic lifestyle of DPANN archaea revealed by cultivation and genome analyses. Proc Natl Acad Sci USA 119:e2115449119.
Podar M, Anderson I, Makarova KS, Elkins JG, Ivanova N, Wall MA, Lykidis A, Mavromatis K, Sun H, Hudson ME, Chen W, Deciu C, Hutchison D, Eads JR, Anderson A, Fernandes F, Szeto E, Lapidus A, Kyrpides NC, Saier MH, Jr, Richardson PM, Rachel R, Huber H, Eisen JA, Koonin EV, Keller M, Stetter KO. 2008. A genomic analysis of the archaeal system Ignicoccus hospitalis-Nanoarchaeum equitans. Genome Biol 9:R158.
Li YX, Rao YZ, Qi YL, Qu YN, Chen YT, Jiao JY, Shu WS, Jiang H, Hedlund BP, Hua ZS, Li WJ. 2021. Deciphering symbiotic interactions of “Candidatus Aenigmarchaeota” with inferred horizontal gene transfers and co-occurrence networks. mSystems 6:e00606-21.
Feng Y, Neri U, Gosselin S, Louyakis AS, Papke RT, Gophna U, Gogarten JP. 2021. The evolutionary origins of extreme halophilic archaeal lineages. Genome Biol Evol 13:evab166.
Vavourakis CD, Ghai R, Rodriguez-Valera F, Sorokin DY, Tringe SG, Hugenholtz P, Muyzer G. 2016. Metagenomic insights into the uncultured diversity and physiology of microbes in four hypersaline soda lake brines. Front Microbiol 7:211.
Zhao D, Zhang S, Xue Q, Chen J, Zhou J, Cheng F, Li M, Zhu Y, Yu H, Hu S, Zheng Y, Liu S, Xiang H. 2020. Abundant taxa and favorable pathways in the microbiome of soda-saline lakes in Inner Mongolia. Front Microbiol 11:1740.
Ghai R, Pasic L, Fernandez AB, Martin-Cuadrado AB, Mizuno CM, McMahon KD, Papke RT, Stepanauskas R, Rodriguez-Brito B, Rohwer F, Sanchez-Porro C, Ventosa A, Rodriguez-Valera F. 2011. New abundant microbial groups in aquatic hypersaline environments. Sci Rep 1:135.
La Cono V, Bortoluzzi G, Messina E, La Spada G, Smedile F, Giuliano L, Borghini M, Stumpp C, Schmitt-Kopplin P, Harir M, O’Neill WK, Hallsworth JE, Yakimov M. 2019. The discovery of Lake Hephaestus, the youngest athalassohaline deep-sea formation on Earth. Sci Rep 9:1679.
Zhaxybayeva O, Stepanauskas R, Mohan NR, Papke RT. 2013. Cell sorting analysis of geographically separated hypersaline environments. Extremophiles 17:265–275.
Gunde-Cimerman N, Plemenitaš A, Oren A. 2018. Strategies of adaptation of microorganisms of the three domains of life to high salt concentrations. FEMS Microbiol Rev 42:353–375.
Zhou H, Zhao D, Zhang S, Xue Q, Zhang M, Yu H, Zhou J, Li M, Kumar S, Xiang H. 2022. Metagenomic insights into the environmental adaptation and metabolism of Candidatus Haloplasmatales, one archaeal order thriving in saline lakes. Environ Microbiol 24:2239–2258.
Castelle CJ, Meheust R, Jaffe AL, Seitz K, Gong XZ, Baker BJ, Banfield JF. 2021. Protein family content uncovers lineage relationships and bacterial pathway maintenance mechanisms in DPANN Archaea. Front Microbiol 12:660052.
Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer KH, Whitman WB, Euzéby J, Amann R, Rosselló-Móra R. 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12:635–645.
Konstantinidis KT, Rossello-Mora R, Amann R. 2017. Uncultivated microbes in need of their own taxonomy. ISME J 11:2399–2406.
Andrade K, Logemann J, Heidelberg KB, Emerson JB, Comolli LR, Hug LA, Probst AJ, Keillar A, Thomas BC, Miller CS, Allen EE, Moreau JW, Brocks JJ, Banfield JF. 2015. Metagenomic and lipid analyses reveal a diel cycle in a hypersaline microbial ecosystem. ISME J 9:2697–2711.
Zhu Q, Kosoy M, Dittmar K. 2014. HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers. BMC Genomics 15:717.
Emmerich M, Bhansali A, Losekann-Behrens T, Schroder C, Kappler A, Behrens S. 2012. Abundance, distribution, and activity of Fe(II)-oxidizing and Fe(III)-reducing microorganisms in hypersaline sediments of Lake Kasin, Southern Russia. Appl Environ Microbiol 78:4386–4399.
Vavourakis CD, Andrei A-S, Mehrshad M, Ghai R, Sorokin DY, Muyzer G. 2018. A metagenomics roadmap to the uncultured genome diversity in hypersaline soda lake sediments. Microbiome 6:168.
Sorokin DY, Messina E, Smedile F, La Cono V, Hallsworth JE, Yakimov MM. 2021. Carbohydrate-dependent sulfur respiration in halo(alkali)philic archaea. Environ Microbiol 23:3789–3808.
Sorokin DY, Kublanov IV, Gavrilov SN, Rojo D, Roman P, Golyshin PN, Slepak VZ, Smedile F, Ferrer M, Messina E, La Cono V, Yakimov MM. 2016. Elemental sulfur and acetate can support life of a novel strictly anaerobic haloarchaeon. ISME J 10:240–252.
Torregrosa-Crespo J, Martinez-Espinosa RM, Esclapez J, Bautista V, Pire C, Camacho M, Richardson DJ, Bonete MJ. 2016. Anaerobic metabolism in Haloferax genus: denitrification as case of study. Adv Microb Physiol 68:41–85.
Qi Q, Ito Y, Yoshimatsu K, Fujiwara T. 2016. Transcriptional regulation of dimethyl sulfoxide respiration in a haloarchaeon, Haloferax volcanii. Extremophiles 20:27–36.
Lanyi JK. 1974. Salt-dependent properties of proteins from extremely halophilic bacteria. Bacteriol Rev 38:272–290.
Szollosi GJ, Rosikiewicz W, Boussau B, Tannier E, Daubin V. 2013. Efficient exploration of the space of reconciled gene trees. Syst Biol 62:901–912.
Szollosi GJ, Davin AA, Tannier E, Daubin V, Boussau B. 2015. Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philos Trans R Soc Lond B Biol Sci 370:20140335.
Martijn J, Schon ME, Lind AE, Vosseberg J, Williams TA, Spang A, Ettema TJG. 2020. Hikarchaeia demonstrate an intermediate stage in the methanogen-to-halophile transition. Nat Commun 11:5490.
Oren A. 2015. Pyruvate: a key nutrient in hypersaline environments? Microorganisms 3:407–416.
Oren A. 2017. Glycerol metabolism in hypersaline environments. Environ Microbiol 19:851–863.
Cai L, Zhao DH, Hou J, Wu JH, Cai SF, Dassarma P, Xiang H. 2012. Cellular and organellar membrane-associated proteins in haloarchaea: perspectives on the physiological significance and biotechnological applications. Sci China Life Sci 55:404–414.
Sanders L, Andermann TM, Ottemann KM. 2013. A supplemented soft agar chemotaxis assay demonstrates the Helicobacter pylori chemotactic response to zinc and nickel. Microbiology (Reading) 159:46–57.
Liu XB, Li M, Castelle CJ, Probst AJ, Zhou ZC, Pan J, Liu Y, Banfield JF, Gu JD. 2018. Insights into the ecology, evolution, and metabolism of the widespread Woesearchaeotal lineages. Microbiome 6:102.
Eder W, Ludwig W, Huber R. 1999. Novel 16S rRNA gene sequences retrieved from highly saline brine sediments of Kebrit Deep, Red Sea. Arch Microbiol 172:213–218.
Uritskiy GV, DiRuggiero J, Taylor J. 2018. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6:158.
Li DH, Liu CM, Luo RB, Sadakane K, Lam TW. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676.
Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods 11:1144–1146.
Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. 2019. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7:e7359.
Wu Y-W, Simmons BA, Singer SW. 2016. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32:605–607.
Olm MR, Brown CT, Brooks B, Banfield JF. 2017. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 11:2864–2868.
Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. 2019. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927.
Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069.
Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35:3100–3108.
Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055.
Kozlowski LP. 2016. IPC—Isoelectric Point Calculator. Biol Direct 11:55.
Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, Mende DR, Letunic I, Rattei T, Jensen LJ, von Mering C, Bork P. 2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47:D309–D314.
Buchfink B, Reuter K, Drost HG. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18:366–368.
Galperin MY, Wolf YI, Makarova KS, Vera Alvarez R, Landsman D, Koonin EV. 2021. COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 49:D274–D281.
Henrissat B, Davies G. 1997. Structural and sequence-based classification of glycoside hydrolases. Curr Opin Struct Biol 7:637–644.
Zhang H, Yohe T, Huang L, Entwistle S, Wu PZ, Yang ZL, Busk PK, Xu Y, Yin YB. 2018. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 46:W95–W101.
Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274.
Wu M, Scott AJ. 2012. Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28:1033–1034.
Borowiec ML. 2016. AMAS: a fast tool for alignment manipulation and computing of summary statistics. PeerJ 4:e1660.
Land ML, Hyatt D, Jun SR, Kora GH, Hauser LJ, Lukjancenko O, Ussery DW. 2014. Quality scores for 32,000 genomes. Stand Genomic Sci 9:20.
Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797.
Letunic I, Bork P. 2021. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49:W293–W296.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421.
Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat Commun 9:5114.
Shen W, Ren H. 2021. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J Genet Genomics 48:844–850.
Huang WC, Liu Y, Zhang X, Zhang CJ, Zou D, Zheng S, Xu W, Luo Z, Liu F, Li M. 2021. Comparative genomic analysis reveals metabolic flexibility of Woesearchaeota. Nat Commun 12:5281.
Sheridan PO, Raguideau S, Quince C, Holden J, Zhang L, Williams TA, Gubry-Rangin C, Thames Consortium. 2020. Gene duplication drives genome expansion in a major lineage of Thaumarchaeota. Nat Commun 11:5494.
Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238.
Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780.
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol Biol Evol 33:1635–1638.

Information & Contributors


Published In

mSystems
Volume 7, Number 6, 20 December 2022
eLocator: e00669-22
Editor: Thulani P. Makhalanyane, University of Pretoria
PubMed: 36259734


Received: 24 July 2022
Accepted: 23 September 2022
Published online: 19 October 2022


  1. “Candidatus Nanohaloarchaeota”
  2. DPANN superphylum
  3. symbiosis
  4. horizontal gene transfer
  5. evolution
  6. comparative genomics



Dahe Zhao
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
Shengjie Zhang
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
Sumit Kumar
Enzyme and Microbial Biochemistry Lab, Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi, India
Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
Heng Zhou
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
Qiong Xue
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
Wurunze Sun
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
Jian Zhou
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
Hua Xiang
State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China


Thulani P. Makhalanyane
University of Pretoria


Dahe Zhao, Shengjie Zhang, and Sumit Kumar contributed equally to this work. Author order was determined in order of the time spent participating in the research.
The authors declare no conflict of interest.
