Surveillance and prevalence.
To understand the prevalence and distribution of coronavirus in bats, 985 individuals belonging to 35 species, from three bat families, were sampled at 82 sites in 15 provinces of China (Fig. 1; Table 1). All samples were collected from apparently healthy individuals and tested for coronavirus by RT-PCR amplification of a 440-bp fragment of the RNA-dependent RNA polymerase (RdRp) gene. A total of 64 samples (6.5%) from 19 of the 82 sites, located in 12 provinces, tested positive (Fig. 1). Ten of the 35 species tested were found to harbor coronavirus. Of the positive samples, 57 (89%) came from six species of the family Vespertilionidae, and the remaining 7 came from four species of the family Rhinolophidae. The Vespertilionidae and Rhinolophidae accounted for 48% and 47% of all samples, respectively (Table 1).
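The headline percentages in this paragraph follow directly from the raw counts. A minimal Python check, using only the numbers stated above:

```python
# Counts taken from the surveillance summary above.
TOTAL_SAMPLED = 985
TOTAL_POSITIVE = 64
POSITIVE_VESPERTILIONIDAE = 57


def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


overall_prevalence = percent(TOTAL_POSITIVE, TOTAL_SAMPLED)
vesper_share = percent(POSITIVE_VESPERTILIONIDAE, TOTAL_POSITIVE)
rhino_positive = TOTAL_POSITIVE - POSITIVE_VESPERTILIONIDAE  # remaining positives

print(f"Overall prevalence: {overall_prevalence}%")  # 6.5%
print(f"Share of positives from Vespertilionidae: {vesper_share}%")  # 89.1%
print(f"Positives from Rhinolophidae: {rhino_positive}")  # 7
```

The 89.1% share rounds to the 89% quoted in the text.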
Two colonies of bats, from different sampling sites, had positive rates well above the overall rate: a Miniopterus schreibersi colony had a 55% (11/20) positive rate, and a Pipistrellus abramus colony had a 35% (11/31) positive rate. All positive samples were anal swabs, and none of the throat swabs were positive, suggesting that the gastrointestinal tract is the principal site of coronavirus replication in these bats.
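The study reports point estimates only. To illustrate why the two colonies stand out, the sketch below adds Wilson score 95% intervals (an assumption of this example, not an analysis from the study) and compares each colony's lower bound with the overall 64/985 rate. It treats swabs as independent binomial draws, which clustering within a colony may violate.

```python
import math


def wilson_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = positives / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


OVERALL_RATE = 64 / 985  # overall positive rate across all sampled bats

colonies = {
    "Miniopterus schreibersi colony": (11, 20),
    "Pipistrellus abramus colony": (11, 31),
}

for name, (pos, n) in colonies.items():
    low, high = wilson_interval(pos, n)
    above = low > OVERALL_RATE  # is the whole interval above the overall rate?
    print(f"{name}: {pos}/{n} = {pos / n:.0%}, 95% CI {low:.2f}-{high:.2f}, "
          f"lower bound above overall rate: {above}")
```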
Conversely, several well-sampled taxa were entirely negative for coronavirus: 84 individuals of the genus Hipposideros (58 of them Hipposideros armiger), 101 Rhinolophus pusillus, and 37 individuals from two genera of the family Pteropodidae.
To assess the overall diversity of the coronaviruses detected in bats, a preliminary phylogenetic analysis was performed on the RdRp fragments obtained by RT-PCR. The viruses fell into previously recognized coronavirus groups, including the SARS-CoV group, and into one putative novel group. Of the 65 viruses, only three were closely related to SARS-CoV (putative group 4) and 40 clustered with group 1 viruses, while the remaining 22 formed a separate group (putative group 5) that was most closely related to group 2 viruses, although this relationship had no statistical support (Fig. 2). None of the coronaviruses characterized in this study were phylogenetically related to group 3 viruses.
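Distance-based phylogenetic methods start from pairwise differences between aligned sequences. As an illustration only (this section does not state which distance or tree-building method was used), the sketch below computes uncorrected p-distances, the proportion of differing sites, over a hypothetical toy alignment. Skipping columns with a gap is one common convention among several.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


# Hypothetical toy alignment; the real RdRp fragments are about 440 bp.
alignment = {
    "virus_A": "ATGCTGACCA",
    "virus_B": "ATGCTGACTA",
    "virus_C": "TTGCAGTCCA",
}

for (name1, s1), (name2, s2) in combinations(alignment.items(), 2):
    print(f"{name1} vs {name2}: {p_distance(s1, s2):.2f}")
```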
Genetic analysis revealed species-specific host restriction of coronaviruses in bats. In every coronavirus-positive species except one, the viruses from that species clustered together with high bootstrap support (Fig. 2). The exception was R. ferrumequinum, which tested positive for group 1, 4, and 5 viruses. Furthermore, where the same bat species was sampled in different provinces, its coronaviruses still clustered together (Fig. 2). Species specificity was also evident in a single cave in Guangxi, where Miniopterus schreibersi and Myotis ricketti were both positive for group 1 coronaviruses (represented by BtCoV/A911/05 and BtCoV/821/05, respectively), yet the viruses from the two species did not cluster together in the phylogenetic analysis (Fig. 2).
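The clustering criterion described above (all viruses from one host species forming a single clade) can be checked mechanically on a tree. The sketch below is hypothetical: the nested-tuple tree format, the `host|virus` leaf labels, and the toy topology are invented for illustration and are not data from this study. It assumes a rooted tree, because whether a group forms a clade depends on where the tree is rooted.

```python
from typing import Union

# A tree is either a leaf label (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> set:
    """Return the leaf set of every node (leaves included) in a rooted tree."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def host_of(leaf: str) -> str:
    """Hypothetical label convention: 'host|virus'."""
    return leaf.split("|")[0]


def monophyletic_hosts(tree: Tree) -> dict:
    """Map each host to True if all of its viruses form exactly one clade."""
    all_leaves = leaves(tree)
    node_sets = clades(tree)
    hosts = sorted({host_of(leaf) for leaf in all_leaves})
    return {
        host: frozenset(leaf for leaf in all_leaves if host_of(leaf) == host) in node_sets
        for host in hosts
    }


# Hypothetical toy topology; the "Rfer" viruses are deliberately split.
toy_tree = (
    (("Msch|v1", "Msch|v2"), ("Mric|v3", "Mric|v4")),
    (("Rfer|v5", "Tpac|v6"), "Rfer|v7"),
)
print(monophyletic_hosts(toy_tree))
```

In the toy tree, only the host whose viruses are interleaved with another host's virus fails the check, mimicking the kind of exception reported for R. ferrumequinum.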
These findings suggest that genetically divergent coronaviruses are commonly present in, and specific to, different species of bats in China.
Genome organization.
Based on the preliminary phylogenetic analysis of the RdRp gene (Fig. 2), four strains representing the diversity of bat coronaviruses detected in this study were selected for full-genome sequencing: BtCoV/Tylonycteris pachypus/Guangdong/133/2005 (BtCoV/133/05), BtCoV/Rhinolophus ferrumequinum/Hubei/273/2004 (BtCoV/273/04), BtCoV/R. macrotis/Hubei/279/2004 (BtCoV/279/04), and BtCoV/Scotophilus kuhlii/Hainan/512/2005 (BtCoV/512/05). An additional five viruses were selected for partial sequencing of the RdRp, helicase (HEL), and spike (S) genes: BtCoV/S. kuhlii/Hainan/515/2005 (BtCoV/515/05), BtCoV/S. kuhlii/Hainan/527/2005 (BtCoV/527/05), BtCoV/Pipistrellus pipistrellus/Hainan/434/05 (BtCoV/434/05), BtCoV/P. abramus/Sichuan/355/2005 (BtCoV/355/05), and BtCoV/Myotis ricketti/Yunnan/701/2005 (BtCoV/701/05). Sequences generated in this study were analyzed together with all coronavirus sequence data available in public databases. The genome organization of the bat coronaviruses is compared with that of representative strains of other coronaviruses in Fig. 3 and Table 2.
All four bat coronaviruses had the classic coronavirus genome organization, in which the replicase gene and the structural protein genes are arranged in the order 5′-ORF1a-ORF1b-S-E-M-N-3′ (Fig. 3). Genome size varied among these bat coronaviruses, from 28.2 kb for BtCoV/512/05 to 30.3 kb for BtCoV/133/05.
Putative ORFs encoding nonstructural or accessory proteins were deduced and analyzed only if transcription-regulating sequences (TRSs) were present close to, and upstream of, a potential initiating methionine codon. These ORFs varied considerably among the bat coronaviruses. The genome organizations of BtCoV/273/04 and BtCoV/279/04 were essentially identical and resembled that of SARS-CoV. The genome organization of BtCoV/512/05 was most similar to that of porcine epidemic diarrhea virus (PEDV), whereas that of BtCoV/133/05 differed from those of all known coronaviruses (Fig. 3).
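The ORF-prediction rule above (keep a candidate ORF only if a TRS lies close to and upstream of its start codon) can be sketched as a simple scan. This is a hypothetical illustration, not the authors' pipeline: the 100-nt search window, the 30-aa minimum length, forward-strand-only scanning, and taking only the first ATG after each TRS are all assumptions. The TRS used in the check is the UUAACGAACUU motif reported for BtCoV/133/05; the test sequence is synthetic.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def trs_orfs(genome: str, trs: str, max_gap: int = 100, min_aa: int = 30) -> list[tuple[int, int, int]]:
    """Return (atg_start, orf_end, aa_length) for forward-strand ORFs whose
    first ATG lies within max_gap nt downstream of a TRS match.

    Coordinates are 0-based; orf_end is exclusive and includes the stop codon.
    aa_length counts codons from the ATG up to, but not including, the stop.
    """
    seq = genome.upper().replace("U", "T")  # accept RNA or DNA input
    motif = trs.upper().replace("U", "T")
    hits = []
    # A zero-width lookahead lets overlapping TRS matches be found.
    for match in re.finditer(f"(?={re.escape(motif)})", seq):
        trs_end = match.start() + len(motif)
        atg = seq.find("ATG", trs_end, trs_end + max_gap + 3)
        if atg == -1:
            continue  # no candidate start codon close enough to this TRS
        for pos in range(atg, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                aa_length = (pos - atg) // 3
                if aa_length >= min_aa:
                    hits.append((atg, pos + 3, aa_length))
                break  # stop at the first in-frame stop codon
    return hits


# Synthetic example: TRS, 9-nt spacer, then a 41-codon ORF and a stop codon.
genome = "GG" + "TTAACGAACTT" + "A" * 9 + "ATG" + "GCT" * 40 + "TAA" + "CC"
print(trs_orfs(genome, "UUAACGAACUU"))
```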
In all coronaviruses, approximately the 5′ two-thirds of the genome consists of two large replicase ORFs, ORF1a and ORF1b, which encode the replicase polyproteins pp1a and pp1ab (14). The proteolytic processing end products and putative functional domains of the replicase polyproteins were identified. The nonstructural proteins nsp1 and nsp2 were the most variable among these bat coronaviruses, whereas the papain-like protease (PL), 3C-like protease (3CL), RdRp, metal-binding (MB), and HEL functional domains were conserved in all genomes except that of BtCoV/133/05 (Fig. 3). Coronaviruses generally employ two papain-like proteases, PL1 and PL2, to process the N-proximal regions of the replicase polyproteins. Both PL1 and PL2 were identified in BtCoV/512/05, but only one PL domain was identified in BtCoV/273/04 and BtCoV/279/04. Notably, in BtCoV/133/05 both nsp1 and nsp2 were highly divergent from those of other coronaviruses, and no PL domain could be identified in any of its nonstructural proteins (Fig. 3; Table 2).
ORFs located between the S and E genes and between the M and N genes were predicted and numbered according to their order in the genome (Fig. 3; Table 2). BtCoV/273/04, BtCoV/279/04, and BtCoV/512/05 each have a single ORF (ORF3) between the S and E genes. In BtCoV/273/04 and BtCoV/279/04, ORF3 is predicted to encode similar proteins of 274 amino acids (aa), each with two predicted transmembrane (TM) helices in the N-terminal region. BLAST and Pfam searches failed to identify any sequences similar to this protein. In BtCoV/512/05, ORF3 encodes a predicted 224-aa protein, also with two predicted TM domains in the N-terminal region.
The region between the S and E genes in BtCoV/133/05 is, at 2,013 bp, the longest among all known coronaviruses (Fig. 3). It contains three predicted ORFs (ORF3a, ORF3b, and ORF3c) encoding proteins of 91, 285, and 227 aa, respectively. Each ORF is preceded by a conserved TRS: UUAACGAACUU followed by a 9-nucleotide spacer and then AUG for ORF3a, and UUAACGAACUU immediately followed by AUG for ORF3b and ORF3c. The ORF3c-encoded protein contains three TM domains, but no matching proteins could be identified.
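The two TRS arrangements described above differ only in the spacer between the TRS and the AUG. A small regex-based sketch can report that spacer length for each TRS occurrence. The 20-nt maximum spacer and the synthetic test sequence are assumptions for illustration, not sequence data from BtCoV/133/05.

```python
import re

TRS = "UUAACGAACUU"  # conserved TRS reported upstream of ORF3a, ORF3b, and ORF3c


def trs_spacers(rna: str, trs: str = TRS, max_spacer: int = 20) -> list[tuple[int, int]]:
    """Return (trs_start, spacer_length) for each TRS followed, within max_spacer nt, by AUG.

    The lookahead allows overlapping matches; the lazy quantifier picks the
    nearest downstream AUG.
    """
    pattern = re.compile(f"(?={re.escape(trs)}([ACGU]{{0,{max_spacer}}}?)AUG)")
    return [(m.start(), len(m.group(1))) for m in pattern.finditer(rna.upper())]


# Synthetic sequence: TRS + 9-nt spacer + AUG, then TRS immediately followed by AUG.
rna = "GG" + TRS + "CCAGGCCAG" + "AUGGCU" + TRS + "AUGCC"
print(trs_spacers(rna))
```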
In BtCoV/273/04 and BtCoV/279/04, the region between the M and N genes is 1,085 and 1,095 bp long, respectively, and contains three ORFs (ORF6, ORF7, and ORF8) encoding proteins of 63, 122, and 122 aa, respectively (Fig. 3). ORF7 is predicted to have two TM domains, one in the N-terminal and one in the C-terminal region, while ORF8 is predicted to have a single TM helix. BLAST and Pfam searches failed to identify sequences similar to any of the three predicted proteins. This region between the M and N genes is absent in BtCoV/133/05 and BtCoV/512/05 (Fig. 3). In BtCoV/273/04, BtCoV/279/04, and other SARS-like CoVs, the region between the M and N genes has a gene organization similar to that of infectious bronchitis virus (IBV) (22, 46). Analysis of this region in a representative IBV (NC_001451) revealed a much shorter region (692 bp) containing two ORFs (ORF6 and ORF7) predicted to encode proteins of 65 and 82 aa, respectively. However, unlike in BtCoV/273/04 and BtCoV/279/04, no conserved TRSs were identified upstream of these ORFs in IBV.
Downstream of the N gene in BtCoV/512/05 lies a 387-bp sequence (ORF10) predicted to encode a 129-aa protein with a putative N-terminal signal peptide and three TM domains. This region is absent in all other known coronaviruses, including BtCoV/133/05, BtCoV/273/04, and BtCoV/279/04 (Fig. 3). No matching protein was identified in GenBank or Pfam.
The hemagglutinin esterase protein, which is present in group 2 coronaviruses (6) and was presumably obtained by horizontal gene transfer from influenza C virus (48), was not present in any of the bat coronaviruses analyzed in this study. In the 3′ untranslated region, a stem-loop II-like (s2m) motif (15) was recognized in BtCoV/273/04 and BtCoV/279/04 but not in BtCoV/133/05 or BtCoV/512/05 (Fig. 3). This motif is also present in group 3 coronaviruses and SARS-CoV but not in other coronaviruses (34, 37).
Phylogenetic analysis.
To further define the evolutionary history of these novel BtCoVs, each of the major genes was analyzed phylogenetically. In all genes analyzed, as represented by the HEL and S gene trees, the bat coronaviruses did not form a single group (Fig. 5). As in the preliminary analysis, five groups, each with 100% bootstrap support, were apparent (Fig. 2 and 5). The same relationships were recovered in all genes analyzed, with the exception of the group 1 bat CoVs (BtCoV/512/05, BtCoV/515/05, and BtCoV/527/05) and the putative group 5 viruses (represented by BtCoV/133/05).
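Bootstrap values such as the 100% support quoted above come from resampling alignment columns with replacement, rebuilding the tree for each replicate, and counting how often a clade recurs. The sketch below uses UPGMA on uncorrected p-distances purely for brevity; it is a swapped-in, simplified method, not the tree-building method used in the study (which this section does not state). The toy alignment, the replicate count, and the function names are hypothetical.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_clades(seqs: dict[str, str]) -> set[frozenset]:
    """Build a UPGMA tree from p-distances and return the leaf sets of its internal nodes."""
    clusters = {frozenset([name]): 1 for name in seqs}  # cluster -> number of leaves
    dist = {
        frozenset([frozenset([x]), frozenset([y])]): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }
    found = set()
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = tuple(pair)
        merged = a | b
        found.add(merged)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        for c in clusters:  # size-weighted average distance to the new cluster
            d_ac = dist.pop(frozenset([a, c]))
            d_bc = dist.pop(frozenset([b, c]))
            dist[frozenset([merged, c])] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return found


def bootstrap_support(seqs: dict[str, str], clade: set[str],
                      replicates: int = 100, seed: int = 0) -> float:
    """Fraction of column-resampled replicates whose UPGMA tree contains `clade`."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]  # resample columns with replacement
        replicate = {name: "".join(s[i] for i in cols) for name, s in seqs.items()}
        if target in upgma_clades(replicate):
            hits += 1
    return hits / replicates


# Hypothetical toy alignment (20 columns).
toy = {
    "BtCoV_x": "ACGTACGTACGTACGTACGT",
    "BtCoV_y": "ACGTACGTACGTACGTACGA",
    "other_1": "TCGAACGTTCGTAGGTACCT",
    "other_2": "TCGAACCTTCGTAGGTACCA",
}
print(bootstrap_support(toy, {"BtCoV_x", "BtCoV_y"}, replicates=200))
```

UPGMA assumes a molecular clock, which is one reason published analyses typically use neighbor-joining, maximum likelihood, or Bayesian methods instead.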
In the HEL, N, and E gene phylogenies, putative group 5 viruses fall as the sister group to the SARS and SARS-like CoV group (putative group 4), which also contains two bat coronaviruses from this study (BtCoV/273/04 and BtCoV/279/04). However, in the S, M, and RdRp gene analyses, group 5 viruses are most closely related to group 2 coronaviruses.
In all genes analyzed except the S gene, group 1 bat coronaviruses were most closely related to PEDV (bootstrap support, 99%), and these viruses clustered with HCoV-NL63 and HCoV-229E (Fig. 5A). In the S gene tree, group 1 bat coronaviruses still clustered with PEDV, but this group was most closely related to coronaviruses from other domestic animals (Fig. 5B). The relationship of group 1 bat coronaviruses to PEDV, transmissible gastroenteritis virus, and feline coronaviruses suggests that virus transmission may occur among bats, livestock, and companion animals, presenting a possible pathway for human infection.
None of the viruses sequenced in this study is the direct progenitor of SARS-CoV. Notably, within putative group 4 the SARS-like viruses from bats clustered together, separate from SARS-CoV isolates from other mammalian hosts (Fig. 5), suggesting that other intermediate hosts or viruses were involved in the emergence of SARS.
Taken together, these phylogenetic findings demonstrate that bats harbor a relatively high diversity of coronaviruses, including a distinct lineage (putative group 5) that may represent a novel coronavirus group. These relationships are consistent with the results of the genomic and sequence similarity analyses.