Whole-genome sequencing (WGS) of microbial pathogens from clinical samples is a highly sensitive tool used to gain a deeper understanding of the biology, epidemiology, and drug resistance mechanisms of many infections. However, WGS of organisms which exhibit low densities in their hosts is challenging due to high levels of host genomic DNA (gDNA), which leads to very low coverage of the microbial genome. WGS of Plasmodium vivax, the most widely distributed form of malaria, is especially difficult because of low parasite densities and the lack of an ex vivo culture system. Current techniques used to enrich P. vivax DNA from clinical samples require significant resources or are not consistently effective. Here, we demonstrate that selective whole-genome amplification (SWGA) can enrich P. vivax gDNA from unprocessed human blood samples and dried blood spots for high-quality WGS, allowing genetic characterization of isolates that would otherwise have been prohibitively expensive or impossible to sequence. We achieved an average genome coverage of 24×, with up to 95% of the P. vivax core genome covered by ≥5 reads. The single-nucleotide polymorphism (SNP) characteristics and drug resistance mutations seen were consistent with those of other P. vivax sequences from a similar region in Peru, demonstrating that SWGA produces high-quality sequences for downstream analysis. SWGA is a robust tool that will enable efficient, cost-effective WGS of P. vivax isolates from clinical samples that can be applied to other neglected microbial pathogens.
IMPORTANCE Malaria is a disease caused by Plasmodium parasites that caused 214 million symptomatic cases and 438,000 deaths in 2015. Plasmodium vivax is the most widely distributed species, causing the majority of malaria infections outside sub-Saharan Africa. Whole-genome sequencing (WGS) of Plasmodium parasites from clinical samples has revealed important insights into the epidemiology and mechanisms of drug resistance of malaria. However, WGS of P. vivax is challenging due to low parasite levels in humans and the lack of a routine system to culture the parasites. Selective whole-genome amplification (SWGA) preferentially amplifies the genomes of pathogens from mixtures of target and host gDNA. Here, we demonstrate that SWGA is a simple, robust method that can be used to enrich P. vivax genomic DNA (gDNA) from unprocessed human blood samples and dried blood spots for cost-effective, high-quality WGS.
Malaria is a mosquito-borne infection caused by protozoan parasites of the Plasmodium genus. Of the six Plasmodium species known to infect humans (1–3), P. vivax is the most widely distributed, causing approximately half of all clinical cases of malaria outside Africa (4). Whole-genome sequencing (WGS) of Plasmodium parasites from clinical samples has revealed important insights into the biology, epidemiology, and mechanisms of drug resistance of malaria (5–12). For P. vivax, WGS of clinical isolates has the potential to uncover mechanisms underlying some of the unique aspects of this parasite’s biology, such as distinguishing between reinfection and relapse due to activation of dormant liver parasites. However, WGS of P. vivax from clinical samples is challenging, mainly due to low parasite densities in clinical samples compared to those of Plasmodium falciparum and the lack of a robust ex vivo culture system.
Multiple techniques have been developed to enrich Plasmodium genomic DNA (gDNA) from clinical samples, including leukocyte depletion (13–15), hybrid selection with RNA baits (16, 17), short-term ex vivo culture (18), adaptation to growth in splenectomized monkeys (9, 19), and single-cell sequencing (20). The majority of these techniques require significant labor and resources. While leukocyte depletion is the most cost-effective, it requires sample processing within 6 h of sample collection, which is not feasible at many field sites and is not always effective. Two recent studies that performed WGS on over 400 clinical isolates of P. vivax (15, 17) employed hybrid selection and leukocyte depletion to enrich P. vivax gDNA from clinical samples. Pearson et al. used leukocyte depletion on 292 clinical samples and had to eliminate 144 (49%) of their samples from further population genetics analysis due to low quality, which often occurs due to contaminating human DNA (15). Hupalo et al. used hybrid selection to enrich their samples, with 31 out of 170 sequences (21%) removed from further analysis due to low quality (17). Although more frequently successful, the hybrid selection technique requires either expensive synthetic RNA baits or a large amount of pure P. vivax DNA to create the RNA baits, which is difficult to obtain. In addition, hybrid selection can introduce bias, since it is approximately half as efficient at capturing regions with GC contents of >50% (16).
An alternative method is selective whole-genome amplification (SWGA). SWGA has been used to enrich submicroscopic DNA levels of the ape Plasmodium parasites, P. reichenowi and P. gaboni, from whole-blood samples (21, 22) and P. falciparum genomes from dried blood spots (23) for WGS. SWGA preferentially amplifies the genomes of pathogens from complex mixtures of target and host DNA (Fig. 1) (24). SWGA does not require separation of target DNA from background DNA, making it an attractive option for pathogens that cannot be amplified in culture. DNA amplification is carried out by the highly processive, strand-displacing phi29 DNA polymerase and a set of pathogen-specific primers that target short (6 to 12 nucleotide) motifs that are common in the pathogen genome and uncommon in the host genome. The strand displacement function of phi29 results in the amplification of genomic regions where primers bind frequently, leading to the preferential amplification of genomes with frequent primer-binding sites. Here, we show that SWGA efficiently enriches P. vivax gDNA from unprocessed human blood samples and dried blood spots for cost-effective, high-quality whole-genome sequencing.
Primer design and optimization using P. vivax-infected whole-blood samples.
To perform SWGA on P. vivax gDNA from unprocessed human blood samples, we designed primers that specifically amplified this parasite’s DNA using a previously published approach for P. falciparum (22). Briefly, we identified the most frequently occurring motifs of 6 to 12 nucleotides in length in the P. vivax Salvador-1 (Sal-1) reference genome. We selected the top 10,000 primers of each length, yielding a total of 70,000 primers for further analysis. We filtered these primers based on characteristics such as melting temperature (18 to 32°C), ability to homodimerize (no greater than 3 consecutive matches), binding frequencies on the human genome and Sal-1 genome (less frequent than once every 500,000 bp and more frequent than once every 50,000 bp, respectively), and infrequent binding of the human mitochondrial genome (less than 4 binding sites). Next, we removed primers predicted to bind the Sal-1 subtelomeres. These filters resulted in a pool of 222 primers. We separated these primers into 6 sets that were predicted not to form heterodimers and identified the top set (pvset1) of 10 primers using a selection algorithm described previously (22). gDNA from an unprocessed P. vivax-infected whole-blood sample (MRL2) was subjected to SWGA with primer set pvset1 prior to shotgun sequencing (see Fig. S1 in the supplemental material). SWGA significantly increased the percentage of reads that mapped to the P. vivax Sal-1 reference genome, from 0.7% to 73.5% (see Fig. S1A), and improved the genome coverage obtained from ~80 million bp of sequencing from 1.5% to 58% (see Fig. S1B).
We observed that the genome coverage obtained per base pair sequenced was lower than that achieved with SWGA of P. falciparum gDNA using the same primer set design methods (22). Visual inspection of the P. vivax genome coverage from samples subjected to SWGA revealed that coverage gaps were typically in regions with comparatively higher GC content (Fig. 2A). The P. falciparum genome is extremely AT rich, with only 19.4% of bases consisting of G’s or C’s, while the GC content for the P. vivax genome is 42.3% (9). The P. vivax genome also has an isochore structure: internal chromosomal areas have a high GC content and subtelomeres and centromeres have lower GC contents (25). Since phi29 DNA polymerase pauses more frequently during strand displacement and primer extension in regions with a high GC content of DNA (26), we hypothesized that differences in base composition could explain the more uneven amplification of the P. vivax genome compared to that of P. falciparum.
We thus designed primer sets specifically targeting regions of the P. vivax Sal-1 reference genome with high GC content and poor coverage using the swga program (https://github.com/eclarke/swga ; unpublished data), a program that identifies and scores SWGA primer sets (see Fig. S2 in the supplemental material). Primers were designed to bind regions of the P. vivax Sal-1 genome that had even AT/GC composition, were longer than 195,000 bp, and had low sequence coverage when amplified with pvset1. We identified 1,939 primer sets (consisting of up to 15 primers) with minimal human genome binding and maximal P. vivax genome binding and scored them based on evenness of binding, as well as mean distance between primer binding sites in the foreground and background genomes. The primer set with the best score, pvset1920, was chosen for subsequent testing. SWGA of an unprocessed human blood sample with pvset1920 yielded overall superior P. vivax genome coverage compared to that of SWGA with pvset1 (Fig. 3). Visual inspection of these post-SWGA P. vivax sequences revealed that pvset1920 achieved improved coverage particularly in regions with high GC content (Fig. 2B), with troughs in coverage in genomic regions of lower GC content, which include the centromeres and subtelomeres.
Having developed a method that worked well for SWGA of P. vivax gDNA from whole blood, we tested whether the method could also be applied to gDNA extracted from dried blood spot samples. Dried blood spots are a common method of storing patient and parasite DNA that utilizes a smaller volume of blood and does not require immediate cold storage. DNA extracted from dried blood spots can have variable quality depending on the method of collection and storage (27). SWGA has been used to enrich P. falciparum DNA from dried blood spots for WGS (23), with an average of 48.1% ± 3.5% of the genome covered at ≥5× for samples with an average parasite density of 73,601 parasites/µl ± 19,399 (1.5% parasitemia). Since P. vivax clinical samples generally have lower parasite densities, we wondered if it would be feasible to obtain significant genome coverage on P. vivax from dried blood spots with SWGA. We extracted DNA from blood spots obtained from symptomatic patients in Peru and performed SWGA with pvset1920, achieving 73% and 42% genome coverage at 1× on initial testing for samples with a high (sample C, 56,790 parasites/µl; 1.1%) and low (sample L, 2,572 parasites/µl; 0.05%) level of parasitemia, respectively (see Fig. S3 in the supplemental material).
We finally tested whether an enzymatic digest to remove contaminating human DNA could further improve P. vivax genome coverage. Modification-dependent restriction endonucleases (MDREs), such as MspJI and FspEI, which specifically recognize cytosine C-5 methylation or hydroxymethylation (28), have been used to selectively degrade human DNA in P. falciparum clinical samples. Enzyme digest of DNA extracted from clinical samples with >80% human contamination has previously been shown to enrich P. falciparum DNA ~ninefold for more efficient WGS (29). However, when we performed a digest with MspJI and FspEI enzymes on gDNA extracted from whole blood obtained from patients with P. vivax infection, we observed either no change or markedly decreased genome coverage in the 5 enzyme-digested samples (see Fig. S4 in the supplemental material).
SWGA and WGS of P. vivax from patient samples.
To test the utility of SWGA for variant calling and population genetics analysis of P. vivax from unprocessed clinical blood samples, we used primer set pvset1920 to perform SWGA on P. vivax gDNA from 18 whole-blood and 4 dried blood spot samples collected from symptomatic patients with P. vivax infection in Peru. Since the whole-blood samples had not been leukocyte filtered, they had significant contamination with human DNA, with less than 1.5% of reads mapping to the Sal-1 reference genome in unamplified samples (not shown). For all samples, SWGA significantly increased the proportion of reads that mapped to the P. vivax Sal-1 reference genome, resulting in higher genome coverage and higher percentages of callable total- and core-genome regions (covered by ≥5 reads) (Table 1). Comparison of the SWGA-amplified samples to 10 leukocyte-filtered samples from a field study in Peru which were sequenced to a similar depth (1.5 ± 0.2 billion bp sequenced for SWGA samples versus 1.5 ± 0.5 billion bp for leukocyte-filtered samples) showed that SWGA yields a twofold increase in the percentage of sequencing reads that map to the P. vivax genome and an average 5× P. vivax core-genome coverage of 60.1% ± 26.0%, compared to 43.7% ± 41.4% for leukocyte-filtered samples (10). For the 4 dried blood spot samples, we achieved an average 5× core-genome coverage of 54.0% ± 34.6%.
TABLE 1 Sequencing statistics for P. vivax sequences from clinical samples that underwent selective whole-genome amplificationa
Sequencing statistics were determined using the Genome Analysis Toolkit’s (GATK) DepthofCoverage tool. The core genome was defined by coordinates determined in the large-scale P. vivax sequencing study by Pearson et al. (15). The leukocyte-filtered sequencing statistics presented here are from P. vivax clinical samples obtained from a previously published study in Peru (10).
NA, not applicable.
Covered by ≥5 reads.
There was a trend toward improved mean coverage and percentage of the genome callable in samples with higher parasite densities (see Fig. S5 in the supplemental material). This is consistent with previous SWGA results for P. falciparum (22) and results from other P. vivax-enriching methods, such as hybrid selection (16). Samples subjected to SWGA yielded similar ≥5× genome coverage per sequenced base pair compared to that obtained by direct sequencing of a leukocyte-filtered patient sample (Fig. 4A). The percentage of the 14 large chromosomes of P. vivax considered callable for samples that underwent SWGA fell within the range of that obtained by direct sequencing of leukocyte-filtered samples (Fig. 4B). Additionally, post-SWGA sequences yielded similar levels of mean base quality when normalized across 100-bp windows of various %GC contents in the reference genome compared to the results for leukocyte-filtered samples (Fig. 5).
To examine the utility of post-SWGA sequences for variant analysis, we called 45,821 single-nucleotide polymorphisms (SNPs) from the whole-blood and dried blood spot samples that were subjected to SWGA. For the whole-blood samples, an average of 14,463 SNPs was identified per sample, which is consistent with prior studies of P. vivax field isolates (10). Compared to the results for leukocyte-filtered samples, SNP characteristics such as SNP rate, transition-to-transversion (Ti/Tv) ratio, and nonsynonymous-to-synonymous ratio were nearly identical in the samples that underwent SWGA (Table 2).
TABLE 2 Comparison of single-nucleotide polymorphism characteristics of Plasmodium vivax sequences after selective whole-genome amplification (SWGA) and leukocyte filtrationa
Sample preparation method
Nonsynonymous-to-synonymous ratio (SNP effect)
SNPs per base
Samples for SWGA were obtained from Iquitos, Peru, and samples for leukocyte filtration were from a previously published study done in Madre de Dios, Peru (10).
In addition, the proportions of SNPs that were exonic, intronic, intergenic, or at 5′ and 3′ untranslated regions were similar between samples prepared using the two methods of P. vivax enrichment. We also detected SNPs in several known drug resistance genes previously detected in samples from Peru (10) and Colombia (8) in the whole-blood and dried blood spot samples (Table 3; see also Table S1 in the supplemental material), further validating the utility of sequences derived from SWGA for variant calling. This includes several intronic mutations around a putative chloroquine resistance transporter gene (pvcrt), in addition to coding mutations in the dihydrofolate reductase (pvdhfr), multidrug resistance protein 1 (pvmdr1), multidrug resistance protein 2 (pvmrp2), and dihydropteroate synthetase (dhps) genes.
TABLE 3 Nonsynonymous SNPs in known drug resistance genes detected in Plasmodium vivax samples that underwent selective whole-genome amplification (SWGA)
No. of samples bearing indicated allele(no. confidently genotyped)
Allele of the P. vivax Sal-1 reference genome at the indicated position.
Allele found in P. vivax samples from Peru that underwent SWGA.
Amino acid change resulting from base change from the reference to the alternate allele.
We also compared sample clonality estimates of post-SWGA sequences to microsatellite analyses on the same unamplified samples. We estimated the clonality of the 6 post-SWGA sequences with the highest coverage using the Fws statistic, a measure of within-host diversity previously used to characterize multiplicity of infection in Plasmodium falciparum patient samples (Table 4) (5, 30). An Fws score of ≥0.95 indicates low within-host diversity and infection with a single parasite, while an Fws score of ≤0.70 is suggestive of a multiclonal infection. Microsatellite analysis on these same 6 unamplified samples indicated that all were clonal, except for sample 9, where the presence of 2 microsatellite markers at more than one position suggested that it could be a multiclonal sample. However, for all 6 post-SWGA sequences, the Fws score was ≥0.95, suggesting that all were clonal infections. Thus, while SWGA does not introduce errors that lead to a falsely low Fws, it may lead to underestimations of clonality in multiclonal samples.
TABLE 4 Clonality estimates of samples that underwent selective whole-genome amplification compared to microsatellite analysis from the unamplified samples
No. of heterozygous SNPs
No. of microsatellite markers before SWGA in microsatellite locus:
Finally, we constructed a neighbor-joining tree using core-genome SNPs to determine the relatedness of our samples to one another and to other P. vivax isolates from Peru and around the world (Fig. 6). In this tree, our samples, which were from the Iquitos region of Peru, clustered with one another and were most closely related to leukocyte-filtered samples from another region of Peru (10). They also exhibited the expected degree of relatedness to previously published P. vivax sequences derived from other leukocyte-filtered (15), hybrid-selected (17), and monkey-adapted (19) clinical samples from diverse areas of the world, further validating the use of post-SWGA sequences from downstream analyses.
In this study, we validate SWGA as a cost-effective, robust method to enrich P. vivax gDNA from unprocessed whole-blood and dried blood spot clinical samples to improve the efficiency and decrease the cost of subsequent WGS. This is a method that can be applied to clinical samples infected with other malaria species, such as P. malariae, P. ovale curtisi, and P. ovale wallikeri, where parasite densities are low, and where there is no routine ex vivo culture (31), though species-specific primer sets would likely be required. SWGA utilizes readily available reagents, does not require processing at the time of sample collection, and can be performed in a simple, overnight reaction. While several methods have been used successfully to enrich P. vivax gDNA for WGS from infected whole-blood samples, most are resource and labor intensive. Short-term ex vivo culture of P. vivax isolates or adaptation to growth in monkeys produces a large amount of P. vivax DNA, but these methods require significant resources. Single-cell sequencing allows for highly sensitive dissection of multiclonal samples; however, this approach requires cryopreserved samples and specialized laboratory equipment (20). While leukocyte filtration is cost-effective and efficient, it is not always possible to perform at field sites with limited infrastructure, because samples require refrigeration within 6 hours to minimize white blood cell lysis and reduce irreversible contamination from human DNA. Hybrid selection is less labor intensive, but the production of the RNA baits used for capture requires either large amounts of P. vivax Sal-1 DNA or costly commercially synthesized RNA bait.
Using SWGA, we achieved a higher-than-average callable P. vivax genome than was obtained for leukocyte-depleted clinical samples sequenced at a similar depth. SWGA generally yielded the highest genome coverage for clinical samples with the highest parasite densities, consistent with our experience with P. falciparum (22). For the 12 samples with parasite densities of >5,000 parasites/µl (0.1% parasitemia), we were able to call, on average, 71.5% of the core genome, compared to 37% for the 6 samples with parasite densities of <5,000 parasites/µl. Increased sequencing effort is needed to obtain maximal genome coverage for samples with lower parasite densities (see Table S2 and Fig. S6 in the supplemental material). In these cases, the low genome coverage is likely the result of stochastic amplification of a small number of starting P. vivax genomes, which leads to very deep coverage of some genomic regions and little or no coverage of others (22). If maximal genome coverage is desired, sequential SWGA reactions with pvset1920 followed by pvset1 increase coverage slightly (3 to 5% 1× genome coverage) (see Fig. S7). Additionally, performing multiple independent SWGA reactions on a sample and combining the sequencing reads can improve genome coverage (4 to 12% 1× genome coverage) (see Fig. S8). Since multiple rounds of SWGA or pooling the products from multiple reactions increases the workload and expenses for a small improvement in genome coverage, we opted for a single amplification reaction with pvset1920 for our samples. However, these protocol modifications may be useful if high genome coverage is needed from samples with low parasitemia.
One potential limitation of SWGA of P. vivax from clinical samples is the ability to amplify all clones in a multiclonal sample. One of our samples appeared to be multiclonal by microsatellite analysis of the unamplified gDNA, and yet, Fws analysis of the post-SWGA sequence suggested that the sample was comprised of a single clone. It is possible that in this case, a majority clone was amplified preferentially over minority clones. Another possibility is that the microsatellite marker assessment may overestimate clonality. Analyses of additional multiclonal samples will be necessary to address this question. Another important limitation of SWGA is that copy number variant (CNV) detection is not possible on post-SWGA sequences. The uneven distribution of primer-targeted motifs in the target genome results in peaks and troughs in mean genome coverage that can confound CNV detection methods. Finally, SWGA requires long strands of gDNA for efficient amplification of the target genome and is unlikely to work well on degraded or ancient DNA samples.
Whole-genome analysis has the potential to reveal much about the biology and epidemiology of P. vivax infections. For example, comparison of recurrent infections using WGS can help distinguish relapse due to reactivation of hypnozoites from reinfection or drug resistance, an epidemiological distinction of public health importance. SWGA enables high-quality and cost-effective WGS of P. vivax from unprocessed blood samples that would otherwise be impossible or prohibitively expensive to sequence. Advanced technologies like SWGA will greatly facilitate future P. vivax whole-genome sequencing projects, thereby improving our ability to understand and combat the most widespread form of malaria.
MATERIALS AND METHODS
Patient sample collection and preparation.
The P. vivax-infected DNA sample used for initial testing of selective amplification primer sets (MRL2) was provided by the Malaria Reference Laboratory of the London School of Hygiene and Tropical Medicine, London, United Kingdom. The six dried blood spot samples used in this study were collected from patients with symptomatic P. vivax infection in Peru. Parasitemia was quantified with real-time PCR, and gDNA was extracted using a QIAamp DNA blood minikit (Qiagen).
Eighteen whole-blood samples used for additional testing and further sequencing analysis were derived from whole-blood samples collected from patients with symptomatic P. vivax infections from two sites around Iquitos, Peru, during a study conducted by U.S. Naval Medical Research Unit No. 6 (32). Thick blood smears were examined to identify the parasite species and to determine the level of parasitemia. Parasite density was calculated by counting the number of asexual parasites per 200 white blood cells in the thick smear (Assuming an average of 6,000 white blood cells per µl). Each blood smear was examined by two microscopists independently and by a third microscopist in the event of a discrepancy. The final parasite density was calculated as the average of the density readings from the two concordant microscopists. Whole-blood samples were collected in the field using EDTA-containing Vacutainer tubes. Samples were frozen and transported to the central laboratory until further processing. DNA was isolated from thawed whole blood using a QIAamp DNA blood minikit (Qiagen) following the manufacturer’s recommendations and as described elsewhere (33). Samples were subsequently resuspended in Tris-EDTA (TE) buffer, and gDNA was quantified using a Qubit 2.0 fluorometer.
The initial set of pvset1 primers was designed as described previously (22). Primer set pvset1920 was designed using the swga program, which scores primer sets based on their selectivity and evenness of binding (measured using the Gini index) and, thus, automates and improves primer selection. The source code of swga, along with download links and documentation, are available at https://www.github.com/eclarke/swga . pvset1920 was designed to specifically amplify longer regions (>195,000 bp) of the P. vivax reference genome (Sal-1) that were GC rich (48.5 to 50.6%) and yielded low genome coverage following SWGA with pvset1. A total of 1,939 primer sets were identified that exhibited a minimum background binding distance of 25,000 bp and a maximum foreground binding distance of 37,000 bp. These were scored using swga’s composite primer scoring algorithm (Gini score × foreground mean/background mean), and the set with the lowest score (pvset1920) was chosen for testing.
pvset1 consists of the following 10 primers, with asterisks indicating phosphorothioate bonds that are necessary to prevent degradation by phi29: 5′-CGTTG*C*G-3′, 5′-TTTTTTC*G*C-3′, 5′-TCGTG*C*G-3′, 5′-CGTTTTTT*T*T-3′, 5′-TTTTTTTC*G*T-3′, 5′-CCGTT*C*G-3′, 5′-CGTTTC*G*T-3′, 5′-CGTTTC*G*C-3′, 5′-CGTTTT*C*G-3′, and 5′-TCGTTC*G*T-3′. pvset1920 consists of the following 12 primers: 5′-AACGAAGC*G*A-3′, 5′-ACGAAGCG*A*A-3′, 5′-ACGACGA*A*G-3′, 5′-ACGCGCA*A*C-3′, 5′-CAACGCG*G*T-3′, 5′-GACGAAA*C*G-3′, 5′-GCGAAAAA*G*G-3′, 5′-GCGAAGC*G*A-3′, 5′-GCGGAAC*G*A-3′, 5′-GCGTCGA*A*G-3′, 5′-GGTTAGCG*G*C-3′, and 5′-AACGAAT*C*G-3′.
Selective whole-genome amplification.
Thirty to 70 ng of input DNA was added to a 50-µl reaction mixture containing 3.5 µM SWGA primers, 30 U phi29 DNA polymerase enzyme (New England Biolabs), 1× phi29 buffer (New England Biolabs), 4 mM dNTPs (Roche), 1% bovine serum albumin, and water. The reaction was carried out on a thermocycler with cycling conditions consisting of a ramp down from 35°C to 30°C (10 min per degree), 16 h at 30°C, 10 min at 65°C, and hold at 4°C. The samples were diluted 1:1 with DNase-free, RNase-free water and purified with AMPure XP beads (Beckman-Coulter) at a 1:1 ratio according to the manufacturer’s protocol. When a second round of selective amplification was performed, the reaction mixture contained 100 to 200 ng of the AMPure XP-purified product from the first reaction.
One hundred twenty-five to 500 ng of gDNA extracted from P. vivax-infected whole-blood samples was digested with 5 units of FspEI (New England Biolabs) and 5 units of MspJI (New England Biolabs) enzymes in a 30-µl reaction mixture. A mock digest with an identical amount of gDNA and no enzymes was run in parallel. Samples were digested for 2 h at 37°C and then heat inactivated at 80°C for 15 min. For coverage analysis, rarefaction analysis was performed on Illumina MiSeq sequencing reads derived from samples digested with MspJI or FspEI (digest and SWGA) or mock digested with no enzyme prior to SWGA with primer set pvset1920. The coverage obtained by mapping 200,000 MiSeq sequencing reads (~25 million bp of sequencing depth) was compared between digested and mock-digested samples. P values were obtained using the two-tailed t test.
The SWGA products and unamplified DNA used for primer set testing were sequenced on an Illumina MiSeq using a modified Nextera library preparation method. For HiSeq runs, next-generation-sequencing libraries of SWGA products were prepared using the Nextera XT DNA preparation kit (Illumina) according to the manufacturer’s protocol. These samples were pooled and clustered on a HiSeq 2500 (Illumina) in Rapid Run mode with 100-bp paired-end reads. For the coverage analysis of leukocyte-filtered samples, fastq files from a prior study of P. vivax field samples (10) were used.
Raw fastq files were aligned to the Sal-1 reference genome (PlasmoDB version 13, http://plasmodb.org/common/downloads/release-13.0/PvivaxSal1/fasta/data/ ) using the Burroughs-Wheeler Aligner (version 0.7.8) (34) and SAMtools (version 0.1.19) (35, 36) in the Platypus pipeline, as previously described (37). Picard (version 2.0.1) was used to remove unmapped reads, and the Genome Analysis Toolkit (GATK) (38) was used to realign the sequences around the indels. Picard’s CollectGcBiasMetrics tool was used to generate the GC bias plots. GATK’s DepthOfCoverage tool was used to determine the percentages of the total and core genomes covered by ≥5 reads, mean coverage, and coverage over the core genome. The coordinates of the P. vivax core genome, which excludes subtelomeric and hypervariable regions with significantly higher read mapping errors, was obtained from a recent analysis of hundreds of P. vivax sequences from clinical isolates (15).
For rarefaction analyses, sequences were aligned with smalt (Wellcome Trust Sanger Institute) and mapped with custom scripts in R (https://www.r-project.org /). To visualize base composition across chromosomes, plots were created in Geneious (version 9.1) (39) using the Sal-1 P. vivax reference sequences and 25-bp windows. Plots of chromosome coverage were created with IGVTools (version 2.3.40) (40, 41).
Variant calling and analysis.
We followed the GATK’s best practices to call variants (42, 43). The aligned sequences were run through GATK’s HaplotypeCaller in “reference confidence” mode to create genomic variant call format (GVCF) files for each sample. This reference confidence model highlights areas of the genome that are likely to have variation and produces a comprehensive record of genotype likelihoods and annotations for each site. The samples were joint genotyped using the GenotypeGVCFs tool.
Variants were further filtered based on quality scores and sequencing bias statistics based on default parameters from GATK. SNPs were filtered out if they met any of the following criteria: quality depth (QD) of <2.0, mapping quality (MQ) of <50.0, Phred-scaled P value using Fisher’s exact test to detect strand bias (FS) of >60.0, symmetric odds ratio (SOR) of >4.0, Z score from Wilcoxon rank-sum test of alternative versus reference read-mapping qualities (MQRankSum) of <−12.5, and ReadPosRankSum (RPRS) of <−8.0. Variants were annotated using snpeff (version 4.2) (44).
Fws of samples with the highest genome coverage was estimated using moimix (https://doi.org/10.5281/zenodo.58257 ), a package available through R. The package calculates the Fws statistic using the equation Fws = 1 − (Hw/Hs), where Hw is the within-host heterozygosity and Hs is the population-level heterozygosity (5, 30). The core P. vivax genome, as defined by Pearson et al. (15), was used for core-genome analysis. For microsatellite genotyping, five neutral microsatellite loci of significant variability in the Peruvian Amazon were typed in a previous study (32). If there was more than one marker at any given locus, the sample was considered multiclonal, per prior genotyping studies (45–47).
A neighbor-joining tree was constructed using SNPs from the core P. vivax genome from sequences obtained in this study, along with sequences from previously published studies that are available in the NCBI Short Read Archive. The Mdio samples were from a previous study conducted by our laboratory in Peru (10), and the rest of the sequences were obtained from two recent large-scale P. vivax sequencing studies (15, 17). In order to assess the phylogenetic relationships of sequenced isolates, we constructed a multiple sequence alignment from filtered SNPs called in GATK using an in-house Perl script. This alignment was used as the input for maximum-likelihood phylogenetic analysis in the randomized accelerated maximum-likelihood program (RAxML) (48) with 500 pseudoreplicates using the generalized time-reversible model, and the resulting tree was visualized in Dendroscope (49).
Ethics approval and consent to participate.
The sample from Malaria Research Laboratories (MRL) was an anonymized DNA sample previously collected by the MRL and provided under the MRL’s remit to undertake epidemiological surveillance relevant to imported malaria in the United Kingdom. The protocol for the collection of field samples was approved by the Institutional Review Board of the U.S. Naval Medical Research Center (Protocol NMRCD.2005.0005) and the National Institutes of Health of Peru (protocol 009-2004) in compliance with all applicable federal regulations governing the protection of human subjects. All adult subjects provided written informed consent, and all children 8 to 17 years old provided verbal assent to participate in the study.
The P. vivax genome Illumina sequencing reads of the 22 samples used for variant analysis in this study are available in the National Center for Biotechnology Information’s Sequence Read Archive with the study accession number SRP095853 .
E.A.W. was supported by National Institutes of Health (NIH) grant R01 AI 103058. Support for sample preparation and sequencing at the University of California, San Diego, was provided by NIH grant P50 GM 085764. A.N.C. was supported through NIH grant T32 AI 007036. Work done by B.H.H., D.E.L., and S.A.S. was supported by grants from the NIH (R01 AI 091595, T32 AI 007532, and P30 AI 045008). K.F. was supported by UC San Diego Clinical and Translational Research Institute grant UL1TR001442. J.M.V. received support for the collection of dried blood spot samples from grants from the NIH/NIAID (U19 AI 089681 and D43 TW 007120). A.G.L. received support from training grant 2D43 TW007393 awarded by the Fogarty International Center of the NIH. We thank the Malaria Research Laboratory UK and all the patients who provided the samples for this study.
Several authors of the manuscript are military service members or employees of the United States Government. This work was prepared as part of their duties. Title 17 USC § 105 provides that Copyright protection under this title is not available for any work of the United States Government. Title 17 USC § 101 defines a U.S. Government work as a work prepared by a military service member or employee of the United States Government as part of that person’s official duties.
The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, or the U.S. Government.
This article is a direct contribution from a Fellow of the American Academy of Microbiology.
Singh B, Kim Sung L, Matusop A, Radhakrishnan A, Shamsul SS, Cox-Singh J, Thomas A, and Conway DJ. 2004. A large focus of naturally acquired Plasmodium knowlesi infections in human beings. Lancet363:1017–1024.
Calderaro A, Piccolo G, Gorrini C, Rossi S, Montecchini S, Dell’Anna ML, De Conto F, Medici MC, Chezzi C, and Arcangeletti MC. 2013. Accurate identification of the six human plasmodium spp. causing imported malaria, including Plasmodium ovale wallikeri and Plasmodium knowlesi. Malar J12:321.
Sutherland CJ, Tanomsing N, Nolder D, Oguike M, Jennison C, Pukrittayakamee S, Dolecek C, Hien TT, do Rosário VE, Arez AP, Pinto J, Michon P, Escalante AA, Nosten F, Burke M, Lee R, Blaze M, Otto TD, Barnwell JW, Pain A, Williams J, White NJ, Day NP, Snounou G, Lockhart PJ, Chiodini PL, Imwong M, and Polley SD. 2010. Two nonrecombining sympatric forms of the human malaria parasite Plasmodium ovale occur globally. J Infect Dis201:1544–1550.
Borrmann S, Straimer J, Mwai L, Abdi A, Rippert A, Okombo J, Muriithi S, Sasi P, Kortok MM, Lowe B, Campino S, Assefa S, Auburn S, Manske M, Maslen G, Peshu N, Kwiatkowski DP, Marsh K, Nzila A, and Clark TG. 2013. Genome-wide screen identifies new candidate genes associated with artemisinin susceptibility in Plasmodium falciparum in Kenya. Sci Rep3:3318.
Flannery EL, Wang T, Akbari A, Corey VC, Gunawan F, Bright AT, Abraham M, Sanchez JF, Santolalla ML, Baldeviano GC, Edgel KA, Rosales LA, Lescano AG, Bafna V, Vinetz JM, and Winzeler EA. 2015. Next-generation sequencing of Plasmodium vivax patient samples shows evidence of direct evolution in drug-resistance genes. Infect Dis1:367–379.
Chan ER, Menard D, David PH, Ratsimbasoa A, Kim S, Chim P, Do C, Witkowski B, Mercereau-Puijalon O, Zimmerman PA, and Serre D. 2012. Whole genome sequencing of field isolates provides robust characterization of genetic diversity in Plasmodium vivax. PLoS Negl Trop Dis6:e1811.
Hester J, Chan ER, Menard D, Mercereau-Puijalon O, Barnwell J, Zimmerman PA, and Serre D. 2013. De novo assembly of a field isolate genome reveals novel Plasmodium vivax erythrocyte invasion genes. PLoS Negl Trop Dis7:e2569.
Venkatesan M, Amaratunga C, Campino S, Auburn S, Koch O, Lim P, Uk S, Socheat D, Kwiatkowski DP, Fairhurst RM, and Plowe CV. 2012. Using CF11 cellulose columns to inexpensively and effectively remove human DNA from Plasmodium falciparum-infected whole blood samples. Malar J11:41.
Pearson RD, Amato R, Auburn S, Miotto O, Almagro-Garcia J, Amaratunga C, Suon S, Mao S, Noviyanti R, Trimarsanto H, Marfurt J, Anstey NM, William T, Boni MF, Dolecek C, Tran HT, White NJ, Michon P, Siba P, Tavul L, Harrison G, Barry A, Mueller I, Ferreira MU, Karunaweera N, Randrianarivelojosia M, Gao Q, Hubbart C, Hart L, Jeffery B, Drury E, Mead D, Kekre M, Campino S, Manske M, Cornelius VJ, MacInnis B, Rockett KA, Miles A, Rayner JC, Fairhurst RM, Nosten F, Price RN, and Kwiatkowski DP. 2016. Genomic analysis of local variation and recent evolution in Plasmodium vivax. Nat Genet48:959–964.
Hupalo DN, Luo Z, Melnikov A, Sutton PL, Rogov P, Escalante A, Vallejo AF, Herrera S, Arévalo-Herrera M, Fan Q, Wang Y, Cui L, Lucas CM, Durand S, Sanchez JF, Baldeviano GC, Lescano AG, Laman M, Barnadas C, Barry A, Mueller I, Kazura JW, Eapen A, Kanagaraj D, Valecha N, Ferreira MU, Roobsoong W, Nguitragool W, Sattabonkot J, Gamboa D, Kosek M, Vinetz JM, González-Cerón L, Birren BW, Neafsey DE, and Carlton JM. 2016. Population genomics studies identify signatures of global dispersal and drug resistance in Plasmodium vivax. Nat Genet48:953–958.
Larremore DB, Sundararaman SA, Liu W, Proto WR, Clauset A, Loy DE, Speede S, Plenderleith LJ, Sharp PM, Hahn BH, Rayner JC, and Buckee CO. 2015. Ape parasite origins of human malaria virulence genes. Nat Commun6:8368.
Sundararaman SA, Plenderleith LJ, Liu W, Loy DE, Learn GH, Li Y, Shaw KS, Ayouba A, Peeters M, Speede S, Shaw GM, Bushman FD, Brisson D, Rayner JC, Sharp PM, and Hahn BH. 2016. Genomes of cryptic chimpanzee plasmodium species reveal key evolutionary events leading to human malaria. Nat Commun7:11078.
Guggisberg AM, Sundararaman SA, Lanaspa M, Moraleda C, González R, Mayor A, Cisteró P, Hutchinson D, Kremsner PG, Hahn BH, Bassat Q, and Odom AR. 2016. Whole genome sequencing to evaluate the resistance landscape following antimalarial treatment failure with fosmidomycin-clindamycin. J Infect Dis214:1085–1091.
Morin JA, Cao FJ, Lázaro JM, Arias-Gonzalez JR, Valpuesta JM, Carrascosa JL, Salas M, and Ibarra B. 2012. Active DNA unwinding dynamics during processive DNA replication. Proc Natl Acad Sci U S A109:8115–8120.
Schwartz A, Baidjoe A, Rosenthal PJ, Dorsey G, Bousema T, and Greenhouse B. 2015. The effect of storage and extraction methods on amplification of Plasmodium falciparum DNA from dried blood spots. Am J Trop Med Hyg92:922–925.
Cohen-Karni D, Xu D, Apone L, Fomenkov A, Sun Z, Davis PJ, Kinney SR, Yamada-Mabuchi M, Xu SY, Davis T, Pradhan S, Roberts RJ, and Zheng Y. 2011. The MspJI family of modification-dependent restriction endonucleases for epigenetic studies. Proc Natl Acad Sci U S A108:11040–11045.
Oyola SO, Gu Y, Manske M, Otto TD, O’Brien J, Alcock D, Macinnis B, Berriman M, Newbold CI, Kwiatkowski DP, Swerdlow HP, and Quail MA. 2013. Efficient depletion of host DNA contamination in malaria clinical sequencing. J Clin Microbiol51:745–751.
Ansari HR, Templeton TJ, Subudhi AK, Ramaprasad A, Tang J, Lu F, Naeem R, Hashish Y, Oguike MC, Benavente ED, Clark TG, Sutherland CJ, Barnwell JW, Culleton R, Cao J, and Pain A. 2016. Genome-scale comparison of expanded gene families in Plasmodium ovale wallikeri and Plasmodium ovale curtisi with Plasmodium malariae and with other plasmodium species. Int J Parasitol46:685–696.
Durand S, Cabezas C, Lescano AG, Galvez M, Gutierrez S, Arrospide N, Alvarez C, Santolalla ML, Bacon DJ, and Graf PC. 2014. Efficacy of three different regimens of primaquine for the prevention of relapses of Plasmodium vivax malaria in the Amazon Basin of Peru. Am J Trop Med Hyg91:18–26.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics25:2078–2079.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, and DePristo MA. 2010. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res20:1297–1303.
Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, and Drummond A. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics28:1647–1649.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, and Daly MJ. 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet43:491–498.
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, and DePristo MA. 2013. From FastQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformatics43:11.10.1–11.10.33.
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, and Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly6:80–92.
Imwong M, Sudimack D, Pukrittayakamee S, Osorio L, Carlton JM, Day NP, White NJ, and Anderson TJ. 2006. Microsatellite variation, repeat array length, and population history of Plasmodium vivax. Mol Biol Evol23:1016–1018.
de Souza AM, de Araújo FC, Fontes CJ, Carvalho LH, de Brito CF, and de Sousa TN. 2015. Multiple-clone infections of Plasmodium vivax: definition of a panel of markers for molecular epidemiology. Malar J14:330.
If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.