The Siberian “ice viruses” are contaminants.
Before turning to the phylogenetic evidence presented in Fig. 1, some aspects of the results presented by Zhang et al. (21) merit clarification. In their original paper (20), the authors misidentified the two published sequences that were most closely related to the sequences supposedly recovered from Siberian lake ice and water. Specifically, these two strains, labeled in Fig. 3 of Zhang et al. (20) as Av U38242 (Tokyo/3/67) and Av U08904 (A/WS/33), were described by the authors as avian in origin, when in fact they represent human influenza A virus strains. This was corrected in an erratum (21), which made clear that both these strains are human in origin. Incidentally, the WS/33 strain was derived from a strain of the first human influenza virus ever isolated, by (and from) Wilson Smith, in the United Kingdom, in 1933 (
The erratum, however, perpetuated an error in the GenBank entry of the sequence labeled by Zhang et al. (21) as Tokyo/3/67 (accession number U38242). Sequence U38242 is not from Tokyo or from 1967 (Fig. 1). Indeed, the H1N1 subtype did not circulate in humans between 1957 and 1977; the actual Tokyo/3/67 is a subtype H2N2 isolate. The sequence published under the accession number U38242 is an H1, not H2, sequence and, as described in the Los Alamos National Laboratory influenza virus database ( ), it was likely derived from a laboratory H1 × N2 reassortant strain whose HA gene came from a WS/33-derived source (Fig.
Hence, the claim (21) that the “ice viruses” are related to Asian strains that circulated in birds or humans in the 1960s is incorrect. The two closest relatives of the “ice viruses” were not only human viruses but were, specifically, WS/33-derived strains. The erratum did not address how an apparently human virus from the 1930s came to be in two Siberian lakes, an observation that undermines the notion of migratory birds depositing influenza viruses into the lakes.
A rigorous phylogenetic reanalysis of the “ice viruses” provides an explanation (Fig. 1). First, the sequences described by Zhang et al. as Lake Park “ice viruses” and the one supposedly originating from water from Lake Edoma are positioned on the human H1N1 lineage. It is worth noting that distinct and strongly supported human, swine, and avian H1N1 lineages emerged when BMCMC or ML methods, and relatively realistic substitution models, were used (Fig. 1). In contrast, the neighbor-joining and maximum parsimony trees inferred by Zhang et al. (20) obscured the existence of these distinct lineages, making it more difficult to detect the problematic placement of putatively avian influenza virus strains on the human H1N1 lineage. For example, the Brevig-Mission/1/18 “Spanish flu” sequence was placed on the swine lineage in their tree, presumably due to the lack of an explicit, realistic model of nucleotide substitution.
Furthermore, the “ice viruses” are not merely on the human lineage; they form a monophyletic clade with published sequences from laboratory strains derived from the original Wilson Smith 1933 isolate. The unequivocal support for this clade (posterior probability, 1.0; bootstrap support, 100%) and the intermingling of “ice virus” and WS/33-derived sequences (Fig. 1) indicate that the “ice virus” (and the Lake Edoma virus) sequences are ultimately derived from the ancestor of the WS/33 sequences: the original Wilson Smith isolate itself. No other interpretation for the origin of these sequences is supported by these phylogenetic results.
Although the “ice virus” sequences are characterized by fairly long terminal branches, indicating considerable evolutionary change from the WS/33 ancestor, other WS/33-derived sequences exhibit equally long branches. For example, WSN/1933 (CY010788) and Wilson-Smith/1933 (DQ508905), as shown in Fig. 1, fall among Zhang et al.'s sequences and have also accumulated a considerable genetic distance from the WS/33 ancestor. Such long branches are expected for laboratory-adapted viruses that have experienced many rounds of replication during growth in cell culture or chicken eggs; the topological pattern, nevertheless, clearly indicates that they diverged from the WS/33 ancestor.
Finally, perhaps the clearest indication of the source of the “ice viruses” is the single clone of the positive control reference strain used by Zhang et al. (20): WS/33, clone p1.9 (Fig. 1). It is phylogenetically indistinguishable from the “ice viruses” in that it too falls in the WS/33 clade but also has a long terminal branch (Fig.
In their Materials and Methods section, Zhang et al. (20) stated that they employed a nested PCR approach with multiple positive controls included for every set of RT-PCR experiments. Hence, every supposedly positive result from a Siberian ice or lake water sample came from a tube that had been manipulated in the presence of a concentrated WS/33 positive control and potentially vast amounts of first-round PCR product from the control. Evidently, approximately 4% of the time this led to contamination of test samples by the positive control. The alternative explanation, that two separate Siberian lakes actually contained laboratory-adapted, human H1N1 influenza A viruses derived from the same source as Zhang et al.'s positive control and that they were deposited there by migratory birds, is unworthy of serious consideration.
This instance of accidental contamination of ice samples with positive control sequences calls into question other results by the same group in which they purportedly identified ancient viruses preserved in ice (3). For example, Castello et al. (3) reported tobacco mosaic virus (TMV) amplification from ice cores up to 140,000 years old and many times that old according to unpublished results (20). These “ice viruses” were virtually identical to modern TMV sequences, a finding that is at odds with the considerable accumulation of nucleotide substitutions expected after 140,000 years of evolution in such a rapidly evolving virus. Moreover, the experimental design involved purified TMV positive controls and a nested PCR approach. Each “ancient” sequence was thus generated in the presence of an obvious source of PCR contamination (in addition to residual TMV RNA, which can potentially be present on the hands of anyone who has recently held a cigarette). The authors explained the presence of modern-looking RNA virus sequences in 140,000-year-old ice cores as being “due to ancient forms continually returning to the atmosphere and hydrosphere from glacial meltwater or from ablated glacial surfaces.” Contamination is a more parsimonious explanation.
These “ice virus” examples stand in contrast to the phylogenetic patterns observed with authentic “fossil” sequences, such as the 1918 Spanish flu strains (Fig. 1) and the human immunodeficiency virus type 1 sequence recovered from a frozen blood sample from 1959 (23). In these cases, the authenticity of the putatively ancient viral sequences is strongly supported by the fact that they had accumulated substantially fewer nucleotide substitutions than their modern relatives, as illustrated by their short root-to-tip branch lengths. This is exactly what would be expected if they were sampled many generations earlier than modern strains, whose subsequent evolutionary change is recorded in their longer branches.
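The branch-length argument can be made concrete as a simple root-to-tip check. The following is a minimal sketch, not the analysis behind Fig. 1: it assumes that root-to-tip distances have already been read off a rooted tree, fits a straight line to tips whose sampling dates are trusted, and reports any tip whose distance implies a sampling year far from the claimed one. All tip names, distances, and the 10-year tolerance are made-up illustrative values.

```python
from statistics import mean


def fit_clock(reference):
    """Least-squares line: root-to-tip distance = slope * year + intercept."""
    years = [year for year, _ in reference.values()]
    dists = [dist for _, dist in reference.values()]
    mean_year, mean_dist = mean(years), mean(dists)
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (d - mean_dist) for y, d in zip(years, dists))
    slope = sxy / sxx
    return slope, mean_dist - slope * mean_year


def out_of_phase(candidates, reference, tolerance_years=10.0):
    """Return {name: implied_year} for tips whose branch length disagrees with their date."""
    slope, intercept = fit_clock(reference)
    flagged = {}
    for name, (claimed_year, dist) in candidates.items():
        implied_year = (dist - intercept) / slope
        if abs(implied_year - claimed_year) > tolerance_years:
            flagged[name] = implied_year
    return flagged


# Hypothetical tips: name -> (claimed sampling year, root-to-tip distance in subs/site).
trusted = {
    "ref_1935": (1935, 0.051),
    "ref_1950": (1950, 0.096),
    "ref_1977": (1977, 0.177),
    "ref_1990": (1990, 0.216),
    "ref_2000": (2000, 0.246),
}
candidates = {
    "ordinary_1985": (1985, 0.200),
    "short_branch_1990": (1990, 0.040),
    "long_branch_1917": (1917, 0.243),
}

for name, implied in out_of_phase(candidates, trusted).items():
    print(f"{name}: branch length implies sampling around {implied:.0f}")
```

In this toy data set, a tip that is too short for its claimed date corresponds to a "primitive" sequence reported from a late sample, and a tip that is too long corresponds to a modern-looking sequence reported from an early sample. A real analysis would need to account for rate variation and for uncertainty in the tree itself.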
Other anomalies.
Similar laboratory contamination artifacts may explain other cases of influenza viruses that appear perplexingly out of place on phylogenetic trees. For example, Bikour et al. (2) reported a 1930-like swine influenza A (H1N1) virus supposedly present during an outbreak of respiratory disease in swine in Quebec in 1990-1991. The HA sequence from that isolate, swine/St-Hyacinthe/148/90, was indeed more like a 1930 swine virus than what one might have expected in the 1990s (Fig. 1). In fact, it is virtually identical to swine/Iowa/15/30 and is nested among published sequences from this 1930 isolate. Like WS/33, swine/Iowa/15/30 is a common laboratory reference strain, and it was used as a positive control by Bikour et al. (2). In other words, this appears to be another case of laboratory contamination with a reference strain present in the same laboratory reporting the extraordinary result. The authors suggested that influenza viruses “can be maintained for long periods in swine, perhaps in geographically isolated pockets.” That idea fails to explain, mechanistically, how any strain of influenza virus replicating in swine could exhibit complete evolutionary stasis over 60 years. Accidental contamination of one culture with a reference strain, or simple PCR contamination, on the other hand, easily explains the pattern. As with the more recent “Korean pig flu” scare (8), this seemingly exceptional result appears to reflect carelessness with a positive control sample.
Similarly, the human-swine H1N1 hybrid virus reportedly circulating among humans and animals in Alma Ata in the 1980s (7) had an HA gene sequence closely related to a published swine/Iowa/15/30 sequence (Fig. 1). This suggests a laboratory error of one kind or another: perhaps this was another case of simple contamination within the lab reporting the sequence, and no such virus actually circulated. Alternatively, this might represent a genuine escape of an experimental swine/Iowa/15/30 HA-containing virus which then temporarily circulated in humans. (A similar unintentional release of an archival influenza virus isolate may have led to the reemergence of the H1N1 subtype in humans in the 1970s [15].) Either way, the presence of a 1930-like virus at a much later period bears the unmistakable stamp of human-influenced, not natural, processes.
Anchlan et al. (1) reported influenza virus isolates from humans in Mongolia, in 1988 and 1991, with close similarities to another common lab strain, PR/8/34, derived from the original Puerto Rico isolate from 1934. Again, the phylogenetic analysis indicates that the Mongolian isolates from the 1980s and 1990s are nested within the clade of sequences derived from the original PR/8/34 isolate (Fig. 1). In this case, there is some evidence in favor of PR/8/34-related viruses actually circulating in Mongolia in the 1980s and 1990s. First, a (perhaps not completely) UV light-inactivated reassortant vaccine (PR/8/34 × USSR/77), prepared in Leningrad in 1978, was apparently used in the Mongolian population around 1978 (19). The Mongolian isolate from 1988 was found to be a reassortant between PR/8/34 and USSR/77, while the one from 1991 was PR/8/34-like in all genes (1). Moreover, Anchlan et al. found that 12% of sera from various parts of Mongolia apparently contained antibodies against PR/8/34 (1). However, the sequences recovered in 1988 and 1991 were virtually identical to published PR/8/34 sequences and had apparently not accumulated the roughly 10 years' worth of substitutions expected since the experimental vaccine was administered. This suggests that laboratory contamination, rather than an escaped vaccine strain, is the more likely explanation for these results.
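The expectation of about a decade's worth of change can be put in rough numbers. The sketch below is illustrative only: it assumes a strict molecular clock with Poisson-distributed substitutions, and both the rate (2 × 10^-3 substitutions per site per year) and the fragment length (1,000 nucleotides) are assumed round values, not estimates from this study.

```python
import math


def expected_substitutions(rate_per_site_per_year, sites, years):
    """Expected number of substitutions on one lineage under a strict clock."""
    return rate_per_site_per_year * sites * years


def prob_at_most(k, expected):
    """Poisson probability of observing k or fewer substitutions."""
    return sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k + 1)
    )


# Assumed, illustrative inputs (not values from the paper).
RATE = 2e-3   # substitutions per site per year
SITES = 1000  # length of the sequenced fragment, in nucleotides
YEARS = 10    # approximate time since the vaccine was administered

lam = expected_substitutions(RATE, SITES, YEARS)
print(f"expected substitutions: {lam:.1f}")
print(f"P(no substitutions at all): {prob_at_most(0, lam):.2e}")
print(f"P(2 or fewer substitutions): {prob_at_most(2, lam):.2e}")
```

Under these assumptions a lineage that had been replicating for a decade would be expected to differ from its source at many sites, so a virtually identical sequence is very unlikely. The conclusion is not sensitive to the exact rate chosen as long as it is of the same order of magnitude.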
Anchlan et al. (1) stated that “mutational and evolutionary rates of the Mongolian strains seem to be significantly lower when compared to the rates of human influenza A strains isolated in other parts of the world… . Thus, viruses from remote areas might keep the potential to reappear in the human population after several years to cause a pandemic.” Assuming that these isolates were derivatives of an incompletely inactivated vaccine (rather than simple contamination), their apparently low evolutionary rate (their similarity to 1930s-era strains) would instead be a straightforward result of recent human exposure to a laboratory strain isolated in 1934. As with each of the above cases, there is no need to invoke evolutionary stasis or natural abiotic reservoirs. Rather, this case and the others involve laboratory contamination or escape of viruses that had been in cold storage in freezers for several decades and were thus out of phase with viral lineages that had accumulated changes without interruption over those decades.
To our knowledge, aside from the “Korean pig flu” case, which seems to represent another example of laboratory contamination by a WS/33-derived reference strain (8), there are no further examples of human or swine H1N1 that appear to involve primitive viruses circulating at the “wrong” time. Hence, every anomaly in the human and swine H1N1 lineages is apparently explained by human error of one kind or another, whether it be a labeling error, contamination of cell culture or RT-PCR, or unintentional escape of a laboratory strain from an earlier era.
Avian influenza virus from 1917?
Regarding the modern-looking avian virus sequence from 1917 (10), despite the unquestionable success those authors have had recovering ancient human influenza viruses, it is practically impossible to avoid the conclusion that this sequence represents an artifact of contamination by an avian influenza virus isolate from the late 1990s. In this case, instead of an unexpectedly short branch suggestive of a primitive virus circulating in the present, the virus ostensibly representative of 1917 exhibited an unexpectedly long branch length, suggestive of a genetically modern virus circulating in the past. Fanning et al. (10) concluded that there has been little or no evolutionary change in avian influenza virus genes over nearly a century. This idea is more radical than is sometimes appreciated, because even if there has been extremely strong negative selection on avian influenza virus proteins, an RNA virus with such a high mutation rate and replication rate would still be expected to accumulate many synonymous substitutions over such a long interval. This is not merely a theoretical assertion; it is now clear that avian influenza virus, like the human and swine varieties, evolves to a molecular clock (
Crucially, the “1917” strain is identical in the available HA1 fragment to four strains isolated from wild birds in Ohio in 1999 (Fig. 1), which were not available for analysis when the “1917” strain was published. Moreover, its partial HA2 sequence is 99.7% identical to the Ohio 1999 strains, and its partial NP gene is 100% identical (data not shown). Inspection of the branch lengths of the avian H1N1 lineage in Fig. 1 reveals the unmistakable pattern of a molecular clock, with genetic distance increasing with year of sampling. This means that the long and modern-looking branch leading to the supposedly 1917 virus cannot be explained, as Fanning et al. (10) proposed, by evolutionary stasis in avian flu virus from 1917 until the present. The Ohio 1999 strains are clearly descendants of an ancestor from the mid-1990s, and the whole avian lineage appears to emanate from an ancestor that existed around the 1950s. If the “1917” virus really existed, then the avian H1N1 lineage must have reevolved the 1917-like form over the last half-century or so.
Extensive convergent evolution is highly improbable. Given the strong purifying selection in this lineage, it is virtually impossible: almost all the nucleotide substitutions that have occurred in the avian clade depicted in Fig. 1 are silent at the amino acid level and could have had little or no exposure to the forces of natural selection. The idea that the strains that circulated among birds in Ohio in 1999 had retraced the evolutionary pathway to match the “1917” strain across a multitude of silent substitutions is untenable. Rather, the RT-PCR analysis of the 1917 strain was evidently compromised by an isolate closely related to the Ohio 1999 ones. It follows that the “1917” sequence tells us nothing about the origins or emergence of the 1918 Spanish flu pandemic or the potential for future such pandemics.
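Identity figures such as those quoted for the HA1, HA2, and NP fragments are simple to recompute from an alignment. The following is a minimal sketch, assuming that the two sequences are already aligned to the same length and that positions with a gap or an ambiguous base (N) in either sequence are skipped. The sequences are toy strings, not real influenza data.

```python
def percent_identity(seq_a, seq_b, skip="-N"):
    """Percent of comparable aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # gap or ambiguous base: position is not comparable
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical).
archival = "ATGGCAATCC-TGACAGGN"
modern = "ATGGCAATCCATGACAGGA"
variant = "ATGGCTATCCATGACAGGA"

print(percent_identity(archival, modern))
print(percent_identity(modern, variant))
```

Identity alone does not distinguish silent from amino acid-changing differences. Arguing that a match spans many silent sites, as in the text, would additionally require comparing the sequences codon by codon.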
Cooper and Poinar (5) proposed nine criteria for authenticating putative ancient DNA/RNA findings (extensive use of negative controls, cloning, independent replication, etc.). More recently, Gilbert et al. (11) argued in favor of a “cognitive” approach, suggesting that researchers validate the authenticity of their results on a case-by-case basis rather than using a checklist. With regard to viral work in general and in light of the above cases specifically, researchers wishing to provide convincing authentication might want to give special consideration to the following recommendations. (i) Researchers should physically separate pre- and post-PCR activities (11); specifically, they should avoid nested PCR when attempting to authenticate the existence of viral nucleic acids from challenging sources. (ii) Researchers should ensure that putative sequences from ancient/degraded sources could not have arisen from virus cultures becoming contaminated by another strain from the same laboratory (17). (iii) Researchers should assess the quality and reliability of viral nucleic acids directly and by assaying host DNA/RNA (12). (iv) Researchers should use positive controls judiciously or not at all. If used, they should be genetically distinguishable from putative ancient/degraded sequences with 100% reliability. (v) Researchers should ensure that exceptional results are reproducible. (vi) Researchers should perform a thorough phylogenetic analysis, making use of state-of-the-art methods that employ explicit, data-supported nucleotide substitution models and a rigorous statistical framework, and should report estimates of confidence (e.g., bootstrap support for ML, posterior probabilities for BMCMC) for key nodes on the tree(s) inferred. It is perhaps worth emphasizing that viral data sets sampled over a wide time interval can often provide a uniquely powerful test of authenticity; branches that are shorter or longer than expected for a sequence from a particular date can reveal possible errors (
The sorts of laboratory artifacts detected here have the potential to divert attention away from actual influenza virus reservoirs and the biological processes governing the emergence and spread of this pathogen. It is therefore important that every effort be made to avoid them in the future.