INTRODUCTION
The majority of emerging infectious diseases (EIDs) of humans are zoonoses, and the majority of these originate in wildlife (
1–3). These diseases are largely viral (e.g., severe acute respiratory syndrome [SARS] and Nipah virus) and represent a significant global health threat. Analyses of trends in EIDs suggest that the rate of infectious disease emergence is increasing (
3) and that the emergence of new viruses is not yet constrained by the richness (number of viruses) or diversity (genetic variability) of unknown viruses in wildlife, which is thought to be high. Systematically measuring viral richness, abundance, and diversity (here termed “virodiversity”) in wildlife is hindered by the large number of host species (e.g., around 5,500 mammals), their global distribution and often remote habitats (
4), and the expense of collection, sampling, and viral identification or discovery (
5), and it has not yet been achieved for even a single host species. In this study, we repeatedly sampled a mammalian host known to harbor emerging zoonotic pathogens (the Indian Flying Fox,
Pteropus giganteus) and used PCR with degenerate primers targeting nine viral families to discover a large number and diversity of viruses. We then adapted the techniques normally used to estimate biodiversity in vertebrates and plants to estimate the total viral richness within these nine families in
P. giganteus. Our analyses demonstrate proof-of-concept and provide the first statistically supported estimates of the unknown viral richness of a mammalian host and the sampling effort required to achieve it.
DISCUSSION
In this study, we combined virological and ecological techniques to describe the virodiversity of a known zoonotic reservoir, P. giganteus, including the first-ever estimate of viral richness and the sampling effort required to detect any proportion of it. The article also includes a phylogenetic description of 55 viruses identified in this species and an ecological description of the positive and negative associations between them. This unprecedented description of the potential zoonotic pool not only establishes a framework for comparison of virodiversity among host populations in different geographic regions and ecological settings but also provides important a priori knowledge should a novel pathogen emerge or have emerged. For example, sequence data from this study are already being used to develop serological assays to test people with a range of symptoms and known high risk of exposure to bats. This will allow us to identify cases of bat-to-human viral spillover and, potentially, their health consequences.
Estimate of total viral diversity (richness) in P. giganteus in Bangladesh.
Previous attempts have been made to predict the total number of unknown viruses in humans by analyzing temporal trends in viral discovery (
13,
14). However, these assessments principally consider the emergence of disease-causing agents rather than an ecological assessment of virodiversity. Here, we measured viral richness using nonparametric species discovery curves, which are commonly used in biodiversity studies (
10,
13,
15–17) and rely on the frequency of rarely occurring species to measure the completeness of discovery (
15) and make statistical estimations of the undiscovered fraction (
10). Our estimate of the viral carrying capacity of
P. giganteus as 58 viruses (for the nine viral families assayed) was robustly supported, with the cumulative number of new viruses slowing toward an asymptotic trajectory and with statistical estimators showing reliably asymptotic behavior. Given that our total discovery effort revealed 55 of the estimated 58 viruses, we suggest that most have now been identified and that the remaining viruses in
P. giganteus in Bangladesh are extremely rare. However, we make the qualification that this estimation of richness is likely a minimum and that additional viruses will almost certainly be found through the expansion of viral family testing and the use of high-throughput sequencing.
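To illustrate how an incidence-based estimator uses rarely detected viruses, the following is a minimal sketch of the bias-corrected Chao 2 formula, S_obs + ((m − 1)/m) · Q1(Q1 − 1)/(2(Q2 + 1)), where m is the number of samples, Q1 is the number of viruses detected in exactly one sample, and Q2 is the number detected in exactly two. The function name and the toy detection matrix are assumptions made for illustration; the exact estimator variant and software used in the study are not specified here.

```python
from typing import Sequence


def chao2_bias_corrected(incidence: Sequence[Sequence[int]]) -> float:
    """Bias-corrected Chao 2 richness estimate from a samples x viruses 0/1 matrix."""
    m = len(incidence)            # number of samples screened
    n_viruses = len(incidence[0])
    # Number of samples in which each virus was detected.
    counts = [sum(row[j] for row in incidence) for j in range(n_viruses)]
    s_obs = sum(1 for c in counts if c > 0)   # observed richness
    q1 = sum(1 for c in counts if c == 1)     # viruses seen in exactly one sample
    q2 = sum(1 for c in counts if c == 2)     # viruses seen in exactly two samples
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


# Hypothetical data: 4 samples (rows) screened for 5 viruses (columns).
toy = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
print(chao2_bias_corrected(toy))  # 6.125
```

In the toy data, three of the five observed viruses occur in a single sample, so the estimator adds roughly one undetected virus to the observed total. When no virus is detected exactly once, the estimate equals the observed richness.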
While we cannot infer a great deal about the biology or taxonomic relatedness of the undiscovered diversity, our estimate of viral richness does allow useful considerations of the efficacy of viral discovery efforts in this species. In general, surveys for undiscovered diversity present diminishing returns, since the commonly occurring species are quickly identified, while the rare species require an increasingly large sampling effort. Here, the Chao 2 estimator predicted that 85% of the total richness could be achieved if another ~500 samples were screened but that a further ~5,500 would then be required to find the remaining 15%. Considering that only very rare viruses would exist within this final fraction and assuming that this rarity reduces both exposure and the chance of transmission, the public health advantage gained by knowledge of their biological properties or taxonomy may not be sufficiently high to justify the cost of their discovery. Similar estimations were made for each of the viral families individually. For example, 13 HVs were discovered in a total of 1,741 samples, with an estimated 8,503 additional samples needed to identify just 2 more predicted viruses (100% of the estimated). Equally, 11 PMVs were identified in 1,108 samples, with one additional virus projected from a further 773 samples. In both cases, an extremely limited return is expected from a costly discovery effort.
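The diminishing returns described above can be illustrated with a sample-based rarefaction curve. The sketch below computes the expected number of viruses detected in n samples drawn without replacement from the samples already screened, using the standard analytical expectation: a virus detected in k of M samples is missed by a random subset of n samples with probability C(M − k, n)/C(M, n). The function name and the detection counts are hypothetical, and this interpolates within existing data rather than reproducing the extrapolation reported in the study.

```python
from math import comb
from typing import Sequence


def expected_richness(detections: Sequence[int], total_samples: int, n: int) -> float:
    """Expected number of viruses seen in n of total_samples samples.

    detections[i] is the number of samples in which virus i was detected.
    """
    denom = comb(total_samples, n)
    # A virus found in k samples is missed when all n drawn samples
    # come from the other (total_samples - k) samples.
    missed = sum(comb(total_samples - k, n) / denom for k in detections)
    return len(detections) - missed


# Hypothetical detection counts for 5 viruses across 4 samples.
detections = [4, 2, 1, 1, 1]
curve = [expected_richness(detections, 4, n) for n in range(1, 5)]
gains = [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]
print(curve)
print(gains)
```

Each additional sample adds fewer new viruses on average than the one before it, which is the pattern that makes the final, rare fraction of richness disproportionately expensive to find.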
Estimates of unknown viral diversity (richness) in all mammals and the cost of discovery.
Mammals are the reservoir hosts of the majority of emerging zoonoses (
2,
3,
18). If we assume that all 5,486 described mammalian species (
19) harbor an average of 58 viruses in the nine families of interest (as estimated here in
P. giganteus) and that these viruses exhibit 100% host specificity, the total richness of mammalian viruses awaiting discovery is approximately 320,000. We used the data on expenditures for surveillance and pathogen discovery in this study to calculate the direct cost of discovering all 58 viruses in
P. giganteus (see the supplemental material for details of this cost analysis). We estimate this cost to be $1.2 million, including collection and laboratory testing of 7,079 samples. Assuming expenditures to be equal for all host species, the cost of sampling and viral discovery for all mammalian viruses would be approximately $6.3 billion. Accounting for diminishing returns means that discovering 85% of the estimated diversity would be disproportionately cheaper at approximately $1.4 billion. Our estimates of virodiversity and cost of discovery are preliminary; however, we include them to demonstrate (i) how a systematic estimation of total viral diversity could be used to inform better surveillance through strategic resource allocation and (ii) that, given our cost estimates, the discovery of the majority of potential zoonotic viruses is not an unattainable goal over the next few decades. The generation of sequence data will not, of course, in itself prevent pandemics. However, it does provide data that refine our knowledge of the functional relationship between host and viral diversity, including traits associated with increased risk of spillover and subsequent emergence (e.g., viruses closely related to and sharing receptor binding domains with known lethal agents [
20]) and, also, facilitates the development of rapid diagnostic tests for intervention and control.
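The extrapolation above is simple multiplication, and a short sketch makes the inputs explicit. All values are taken from the text; the function and variable names are assumptions made for illustration. Note that the rounded per-species cost of $1.2 million gives a total of about $6.6 billion, slightly above the reported $6.3 billion, presumably because the reported total is based on the unrounded cost given in the supplemental material.

```python
def extrapolate(species: int = 5_486,
                viruses_per_species: int = 58,
                cost_per_species_usd: float = 1.2e6) -> tuple:
    """Scale the single-host estimates to all described mammals.

    Assumes equal richness and equal discovery cost for every host
    species and no viral sharing between species.
    """
    total_viruses = species * viruses_per_species
    total_cost_usd = species * cost_per_species_usd
    return total_viruses, total_cost_usd


total_viruses, total_cost_usd = extrapolate()
print(total_viruses)                   # 318188
print(round(total_cost_usd / 1e9, 2))  # 6.58 (billions of dollars)

# Per-species cost implied by the reported $6.3 billion total.
implied_cost = 6.3e9 / 5_486
print(round(implied_cost / 1e6, 2))    # 1.15 (millions of dollars)
```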
Several important limitations must be considered in our extrapolations, including (i) the assumption that a mean of 58 viruses per species is a reasonable estimate and that host populations are panmictic with respect to viral transmission (such that expanded geographic sampling would not influence viral detections), (ii) the assumption that viruses are not shared by more than one host species, (iii) the restriction of this estimate to viruses within the nine families, (iv) the sensitivity and specificity of our tests, and (v) the assumption that a similar mean cost of sample collection is incurred across all species. Clearly, many of these limitations and assumptions require additional exploration. For example, while including more viral families in our survey would increase the viral richness estimate, accounting for species turnover (viral sharing between species) would reduce it. Also, while the cost of sample collection in Bangladesh is relatively low because of logistical simplicities, in some regions (e.g., tropical montane forests of Africa and Southeast Asia), the cost of transportation is much higher. Better estimates of the total number of viruses in mammals (and the cost of their discovery) will be achieved iteratively as other hosts are more extensively sampled and tested, additional viral families are included, and the limits of viral detection improve.
Novel viruses.
The current study significantly enhances our knowledge of the viruses harbored by
P. giganteus, for which only two viruses had been previously described, Nipah virus and a GB virus-like flavivirus (
21,
22). A total of 50/55 of the viruses discovered in this study were considered novel, while 5/55 have been reported previously (PgBoV-1 and -2, PgCoV-3 and -4, and PgPMV-11). Additional discussion of the 50 novel viruses is provided in the supplemental material. Here, we discuss a number of important limitations that must be considered in the interpretation of these results. First, the use of consensus PCR limits surveillance and discovery to viruses related to those targeted in these assays. Second, variations in virus concentration can also influence the probability of detection. Third, we evaluated the diversity of viruses in a limited set of compartments and tissues, and unbiased sequencing was not used as a secondary method of capturing diversity. The classification of viruses is also significant, as redefining the genetic limits between one virus and another would change the total number and prevalence of viruses discovered and would impact our estimations of viral richness. We have tried to address this by using monophyletic clades as a taxonomic surrogate, which obviates the variable and polythetic criteria set by the ICTV for species demarcations.
Coinfection.
The identification of coexisting microbes is important to a description of virodiversity because of the positive and negative associations that can occur between them (
23–28). Here, we report a large number of intra- and interfamilial cooccurrences in
P. giganteus and show that as many as five different viruses can exist in a single sample. Not only does this reveal information about the carrying capacity and composition of discrete viral niches within an individual bat, it also demonstrates the number of different viruses that could potentially spill over to a new host from a single exposure event.
The most common intrafamilial codetections were observed within the family
Herpesviridae, supporting previous studies demonstrating coinfection of HVs in bats (
29). Statistically supported associations were observed between PgHV-10, -11, and -13, which phylogenetically cluster within a presumptive new genus of the betaherpesvirus subfamily. It is not known whether these detections represent coinfection of the same cell or a group of viral variants with segregated cell tropism. It is also unknown why these viruses should so readily coexist, though ecological mechanisms such as simultaneous transmission (codispersal), the availability of requisite resources, and/or shared benefits associated with host immunomodulation by one or more of these viruses may explain the observed cooccurrence. Recombination is also a possible consequence of coinfection and is a common feature in the ecology and evolution of herpesviruses (
30–36). PgHV-11 was identified as a recombinant lineage derived from the strongly associated PgHV-10 and PgHV-13, and all three viruses were detected in the same sample or compartment (throat) multiple times, suggesting that true coinfection does occur, albeit with unknown frequency.
A negative association was also observed between PgHV-12 and -13, suggesting that mechanisms might also exist to reduce cooccurrence. These two viruses are very closely related, and we speculate that cooccurrence may offer little benefit to the viral population because of increased competition for resources coupled with minimal potential for fitness gains via recombination. Even though previous studies showed a lack of immune recognition in betaherpesviruses (
37), we suggest that immune recognition might act as an effective mechanism for reducing the coexistence of closely related viruses by preventing sequential infections. Such a mechanism would not completely preclude cooccurrence due to codispersal and would therefore explain why some codetections were still observed between these two viruses.
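One common way to judge whether two viruses are codetected more or less often than expected is to treat the number of shared samples as hypergeometric under independent occurrence. The sketch below is an illustration under that assumption and is not necessarily the test used in this study; the function name and the counts are hypothetical.

```python
from math import comb


def cooccurrence_pvalues(n_samples: int, n_a: int, n_b: int, n_both: int) -> tuple:
    """Return (p_le, p_ge): probabilities of observing at most, or at least,
    n_both shared samples if viruses A and B occur independently."""
    lo = max(0, n_a + n_b - n_samples)   # fewest shared samples possible
    hi = min(n_a, n_b)                   # most shared samples possible
    if not lo <= n_both <= hi:
        raise ValueError("n_both is inconsistent with the marginal counts")

    def prob(j: int) -> float:
        # Hypergeometric probability of exactly j shared samples.
        return comb(n_a, j) * comb(n_samples - n_a, n_b - j) / comb(n_samples, n_b)

    p_le = sum(prob(j) for j in range(lo, n_both + 1))
    p_ge = sum(prob(j) for j in range(n_both, hi + 1))
    return p_le, p_ge


# Hypothetical counts: 10 samples, each virus found in 5, always together.
p_le, p_ge = cooccurrence_pvalues(10, 5, 5, 5)
print(p_ge)  # small value: codetected more often than expected by chance
```

A small p_ge suggests a positive association, and a small p_le suggests a negative one. Neither distinguishes coinfection of the same cell from codispersal or segregated cell tropism.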
Viral spillover.
Our discovery efforts revealed five viruses that appear to represent spillover events. These included two human bocaviruses (PgBoV-1 and -2), an avian adenovirus (PgAdV-2), a human/bovine betacoronavirus (PgCoV-3), and an avian gammacoronavirus (PgCoV-4). Each of these viruses was observed only once and showed a strong phylogenetic association with viruses found in humans, birds, or ruminants. The interface by which these viruses were able to move from these disparate hosts into bats is unclear. However, on several occasions, we have observed
P. giganteus in Bangladesh drinking from bodies of water (rivers and ponds) that are used by people, livestock, domestic animals, and wildlife for drinking, bathing, and, in some cases, sewage disposal, and we hypothesize that shared water sources may be a source of exposure. Viral spillover (and/or host switching) is a process for which the concept of virodiversity in defined animal host populations might be particularly important. Such processes precede many emergence events (
3,
38); however, there is almost certainly additional asymptomatic movement of viruses between hosts, the frequency and impact of which remain poorly understood.
An additional consideration is that any of the 55 viruses found in
P. giganteus may have already spilled over into the human population. Annual outbreaks of Nipah virus in Bangladesh demonstrate that human exposure to viruses from these bats persists (
39–44), and there are a significant number of undiagnosed morbidities and mortalities in this region that may well have resulted from the spillover of one of these other viruses. Subclinical movement is equally possible, as demonstrated with Tioman virus in Malaysia (
45,
46), and investigating these spillover events may help to refine our understanding of disease emergence in novel hosts.
Conclusions.
Our work illustrates the power of using ecological approaches to characterize virodiversity and estimate viral richness and can be considered part of a strategy to better target surveillance to identify agents that pose zoonotic risks before they emerge in people (
3). The projected $1.4 billion cost of discovering 85% of the estimated diversity is far less than the economic impact of even a single pandemic like SARS, which has been estimated at $16 billion (
47). If annualized over a 10-year period, the discovery of 85% of mammalian viral diversity would cost just $140 million/year, which is both a one-off cost and a fraction of the cost of globally coordinated pandemic control programs such as the “One World, One Health” program, estimated at $1.9 billion to $3.4 billion per year, recurring (
64). While these programs will not themselves prevent the emergence of new zoonotic viruses, they will further contribute to pandemic preparedness by enhancing our understanding of viral ecology and the mechanisms of disease emergence and by providing sequences and other insights that reduce the morbidity, mortality, and economic impact of emerging infectious diseases by expediting recognition and intervention.
ACKNOWLEDGMENTS
We acknowledge funding from the United States Agency for International Development (USAID) Emerging Pandemic Threats PREDICT project, cooperative agreement number GHN-A-OO-09-00010-00, an NIAID nonbiodefense Emerging Infectious Disease Research Opportunities award (R01-AI079231 [P.D.]), an NIH/NSF Ecology and Evolution of Infectious Diseases award from the Fogarty International Center (R01-TW005869 [P.D.]), K08AI067549 (J.H.E.), and NIH-AI57158 (W.I.L.).
We thank the Bangladesh Forest Department and Ministry of Environment and Forest for permission to catch bats and conduct this study. We thank Pitu Biswas, Gofur Sheikh, and Jim Desmond (EcoHealth Alliance), Jahangir Hossain, Emily Gurley, Salah Uddin Khan, Ausraful Islam, and Najmul Haider (ICDDR,B), as well as Kawthar Muhammad (CII), for their help with sample collection and project management.
The contents of this paper are the responsibility of the authors and do not necessarily reflect the views of USAID or the United States Government.
Author contributions were as follows. Study design was by S.J.A., K.A.M., I.N.-M., N.C.A., T.L.B., W.B.K., T.G., S.P.L., S.S.M., J.A.K.M., P.D., and W.I.L. Laboratory experiments were performed by S.J.A., I.N.-M., M.D.S.-L., and R.O.-F. Phylogenetic analyses were performed by S.J.A., T.G., and W.I.L. Ecological analyses were performed by S.J.A., K.A.M., C.M.Z.-T., A.S., P.H., T.L.B., P.D., and W.I.L. Samples were collected by J.H.E., A.I., S.A.K., K.J.O., and P.D. The paper was written by S.J.A., K.A.M., I.N.-M., R.O.-F., N.C.A., A.S., T.L.B., J.H.E., K.J.O., T.G., S.P.L., S.S.M., J.A.K.M., P.D., and W.I.L.