INTRODUCTION
The high level of genetic diversity is one of the main contributors to the failure of immune control and drug treatment during HIV-1 infection. This diversity is generated primarily by the error-prone reverse transcriptase during DNA synthesis, a process that results in approximately one mutation every three replication cycles (1–4). Moreover, each HIV-1 virion contains two copies of the RNA genome, allowing the reverse transcriptase to switch between the two copackaged RNA genomes. This process of recombination also influences HIV-1's sequence diversity by generating progeny that are a genetic mix of the two parental strains (5). Recombination occurs much more frequently than mutation and is a powerful force that influences the evolution of the HIV-1 genome (for a review, see reference 4). Investigations into locations of inter/intrasubtype recombination indicate that sequence identity is sufficient to explain most breakpoint locations (6–9). This is unsurprising, as sequence similarity between genomic partners is a strict requirement for efficient recombination (7, 10–12). Given that the vast majority of HIV-1 infections are not the result of coinfections with multiple divergent viral strains but are initiated from a single virion, a model system that measures recombination between genetically similar genomes rather than inter/intrasubtype recombination will better approximate the quasispecies in vivo (13–15). However, little is known about the recombination likely to occur within the viral quasispecies of an infected individual, because it is difficult to detect recombination between genetically similar genomes. Understanding recombination is a critical piece in the puzzle of HIV-1's evolutionary history and may help with the development of future treatments or with vaccine design.
Measuring recombination involves analyzing the progeny of heterozygous virions (virions containing two genetically different genomes) to determine where recombination breakpoints exist and at what frequency they are generated. Studies to date have measured recombination rates in a number of elegant ways. The use of retroviral reporter systems, where correctly positioned recombination will recreate a functional foreign gene insert conferring antibiotic resistance or fluorescence (16–18), allows for the rapid screening of recombinants but does not allow the measurement of recombination on the natural HIV-1 sequence. A more direct method of detecting recombination is through the sequencing of reverse transcription products derived from an authentic HIV-1 replication cycle. Importantly, recombination can be observed only when it leads to the generation of chimeric molecules. That is, template switching between identical genomes, or an even number of template switches between two genetic loci, will lead to no genetic changes and will go unobserved. Thus, to detect recombination on the native HIV-1 genome, genetically different strains must be utilized. Previous studies have leveraged sequence differences between highly divergent but naturally occurring subtypes to measure intra- or intersubtype recombination (19–22). However, as the overall sequence similarity between RNA templates is a major driving force governing recombination (6, 7, 10, 12), and the majority of infected individuals harbor viral populations that are known to be genetically similar (14, 23), measurements of recombination between genetically divergent strains will reflect only the special case of inter/intrasubtype recombination, not recombination among the genetically similar HIV-1 genomes found in most viral quasispecies.
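The parity argument in the preceding paragraph can be illustrated with a minimal sketch. It assumes a hypothetical list of marker positions and template-switch positions (none of these values come from the study) and reports which marker-delimited intervals would show an observable breakpoint. Only intervals containing an odd number of switches change the parent of origin between the flanking markers.

```python
def observed_breakpoints(markers, switches):
    """Return (left_marker, right_marker) pairs that flank an observable breakpoint.

    markers  : sorted nucleotide positions of marker sites (hypothetical values)
    switches : positions at which reverse transcriptase changed template
    """
    observed = []
    for left, right in zip(markers, markers[1:]):
        # Count switches that fall after `left` and up to and including `right`.
        n = sum(1 for s in switches if left < s <= right)
        # An even count returns synthesis to the starting template, so both
        # flanking markers still derive from the same parental genome.
        if n % 2 == 1:
            observed.append((left, right))
    return observed


markers = [100, 150, 300, 321]    # illustrative marker positions only
switches = [120, 200, 250, 310]   # four template switches actually occurred
print(observed_breakpoints(markers, switches))
```

In this example four switches occur but only two breakpoints are detectable, because the two switches between positions 150 and 300 cancel each other out. Closer marker spacing reduces the chance that such double switches go unseen.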
To address these issues, we developed a minimally codon-modified HIV-1 genome and showed that this could be used to directly measure recombination under conditions where sequence similarity between RNA templates remains high (24). Using Sanger sequencing of single-round reverse transcription products in the absence of selection, we showed that recombination does not occur randomly. This is in agreement with studies showing that recombination rates depend on a complex set of factors, such as the availability of nucleotide (nt) substrates (25–27), the RNA template itself (7, 12, 28), overall sequence similarity (6, 7, 10, 12), and local sequence context of recombining sequences (28–30). Studies using both in vitro assays and single-cycle HIV-1 vectors have identified recombination hot spots in the untranslated regions (UTRs) (30–32), in gag (29, 33), and in env (28, 34). However, only limited information on recombination is available for other regions of the HIV-1 genome (33). We and others have attempted to use direct sequencing to locate recombination hot spots within the HIV-1 genome (24, 33, 35), but the large amount of sequencing data required made it impossible to draw firm conclusions with strong statistical support.
In this study, we made use of next-generation sequencing to perform a comprehensive analysis of HIV-1 recombination using the marker method, with two marker configurations in gag and pol that allow recombination to be measured over 13 and 26 regions, respectively. This configuration is uniquely high resolution, with regions (separated by adjacent marker points) ranging from 21 to 159 nucleotides in length. Additionally, the system has broad coverage within gag and pol. We develop a statistical approach for comparing recombination rates and find that the recombination rate is not constant along the genome but varies with nucleotide position. This variation is statistically significant, with some regions showing a 6-fold difference in recombination rate. We identify 7 hot spots and 3 cold spots in gag and 5 hot spots and 7 cold spots in pol. Hot spots appear in gag at the beginning of the matrix, the matrix-capsid junction, and the capsid-p2 junction and in pol at the protease-p51 junction. We find no hot spots around regions that have been implicated in resistance to protease inhibitors and reverse transcriptase inhibitors. We also analyze recombination rates using a virus with a completely different set of engineered marker points and find that differences in recombination rate are not simply due to our silent marker manipulation of the viral sequence. Our results show that the viral gene region is a strong independent predictor of recombination rate.
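Because the marker-delimited regions differ in length, breakpoint counts have to be normalized before regions can be compared. The sketch below is only an illustration of that normalization, not the statistical approach developed in the study. The counts, region lengths, sequence number, and function names are all hypothetical.

```python
def region_rates(counts, lengths, n_sequences):
    """Per-nucleotide recombination rate for each region.

    counts      : observed breakpoints per region (hypothetical data)
    lengths     : region lengths in nucleotides
    n_sequences : number of sequenced reverse transcription products
    """
    return [c / (length * n_sequences) for c, length in zip(counts, lengths)]


def relative_to_pooled(counts, lengths, n_sequences):
    """Ratio of each region's rate to the length-weighted pooled rate."""
    pooled = sum(counts) / (sum(lengths) * n_sequences)
    return [r / pooled for r in region_rates(counts, lengths, n_sequences)]


# Three illustrative regions; the values are invented for this example.
counts = [30, 5, 60]
lengths = [100, 50, 100]
rates = region_rates(counts, lengths, n_sequences=1000)
ratios = relative_to_pooled(counts, lengths, n_sequences=1000)
print(rates)
print(ratios)
```

With these made-up numbers the third region recombines about six times as often per nucleotide as the second. A real analysis would also need a significance test that accounts for sampling error in short regions.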
DISCUSSION
The high replication rate of HIV-1 and high rates of mutation and recombination lead to remarkable adaptability of the virus in the face of intense evolutionary pressure. Recombination is thought to make natural selection more efficient by breaking linkages between mutations (48–50). That is, recombination helps to maintain genetic diversity by breaking linkages between advantageous and deleterious mutations while also facilitating the removal of deleterious mutations by bringing them together in the same genome. Importantly, recombination can also pair advantageous mutations, which can facilitate the acquisition of multidrug resistance leading to treatment failure (48–54). Recombination may also be an important mechanism by which the virus eventually escapes immune control (55–58). However, recombination also has the potential to inhibit adaptation and evolution depending on epistasis and genetic drift (51). Consequently, an improved understanding of recombination is important for understanding the evolutionary history of HIV-1 and may help to guide the design of robust antiretroviral therapies.
Many studies have shown that even in the absence of selection, recombination does not occur randomly on the HIV-1 genome, highlighting the presence of additional factors governing the recombination process (11, 19, 28–35, 59). However, many of these studies do not measure recombination rate in its natural genome context, or they measure recombination between highly divergent genomes that may not be representative of the situation in vivo, where we expect recombination between closely related members of the viral quasispecies. Here, we present a system that allows the study of recombination between highly similar genomes that mimic the HIV-1 quasispecies within an HIV-1-infected patient. We delineate the process of retroviral recombination through infection of primary T lymphocytes with a minimally codon-modified full-length virus. An advantage of this method is that we can target specific areas of the genome while controlling the interval length and hence the accuracy of our study. We have previously used a similar system to analyze recombination rates in a small region of gag (37). In that case, we were unable to draw conclusions about the location of recombination hot spots, primarily because this requires analysis of large numbers of sequences (19, 35, 37). In this study, we applied next-generation sequencing to systematically measure high-resolution recombination rates in gag and pol. These two genome regions were chosen because of their importance in the generation of drug-resistant virus and immune escape mutations (60).
We have optimized this system and shown that it is not biased by confounding factors related to experimentally induced recombination or to the occurrence of multiple template switches over intervals of various lengths (24, 37). Using two independent sets of marker modifications, we show that putative recombination hot spots are not due to modifications introduced by our marker system. Indeed, there is a high correlation of recombination hot spots between our two systems. Notably, we demonstrate greater-than-6-fold recombination rate changes across gag and pol, and these changes are consistent regardless of viral phenotype (r = 0.68, P < 0.001) and blood donor (r = 0.44 to 0.71, P < 0.001 to P = 0.04). We identify 12 genome regions with significantly higher rates of recombination and 10 genome regions with significantly lower rates of recombination.
It is instructive to compare our recombination hot spots between closely related genomes with those identified in natural HIV-1 sequences. Surprisingly, the gag hot/cold spots identified in our study match closely with those identified by analyzing patient sequences (6, 9, 61). This match is unexpected, because regions of sequence similarity are presumed to drive intersubtype recombination, and one would not expect to see the impact of local recombination hot spots after the action of so many confounding factors, such as selection for functional proteins, drug resistance, or selection from the immune system (9, 62). One of the most comprehensive studies, by Simon-Loriere and colleagues, analyzed sequences retrieved from the Los Alamos National Laboratory HIV sequence database (http://www.hiv.lanl.gov) and provided evidence of recombination (9). Their study identified two hot spots and one cold spot in the capsid of gag, corresponding to our regions GH5 to GH8, GH12 and GH13, and GH9 and GH10, respectively. These regions also corresponded to hot and cold spot clusters in our analysis. The hot region spanning GH5 to GH8 does include a subregion with a strong and significant cold spot (GH6; −0.73 × 10⁻³; P < 0.0001) that is not present in the Simon-Loriere study. However, this subregion may have been missed in their data set, as the segment GH6 is only 21 bp in length, and they averaged their recombination breakpoints using a sliding window of 200 nt. It is interesting to note that these two hot regions span the matrix-capsid and capsid-p2 junctions of Gag. Indeed, it has been proposed that the distribution of RNA structures along the HIV-1 genome has evolved to facilitate gene swapping in a way that maximizes genetic diversity while minimizing the chance that the resulting progeny is impaired (9, 61). Our study does not directly address this issue, as our marker points were designed to minimize structural changes to the genome. However, our data showing the position of hot and cold spots in the genome will help to inform future mechanistic studies into the factors that influence recombination.
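The suggestion that a 21-nucleotide cold spot could be hidden by a 200-nt sliding window can be checked with a small numerical sketch. The rate profile below is hypothetical (arbitrary units, not values from either study), and the helper function is written for this illustration only.

```python
def sliding_mean(values, window):
    """Mean of `values` over every full window of length `window`."""
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window one position: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Hypothetical per-nucleotide profile: a 400-nt hot region (rate 2.0,
# arbitrary units) containing a 21-nt cold segment (rate 0.2).
profile = [2.0] * 400
profile[190:211] = [0.2] * 21

smoothed = sliding_mean(profile, window=200)
print(min(profile), round(min(smoothed), 3))
```

Under these assumptions the raw profile drops to one-tenth of the surrounding rate, whereas the smoothed profile never falls more than about 10% below it. A short cold segment inside a hot region would therefore be easy to miss after window averaging.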
Within pol, some of our hot spots do not match those found by analyzing patient sequence databases. In our data set, we observe a hot spot near the beginning of p51 (PH6; 0.74 × 10⁻⁵; P < 0.05) that is followed by a region of intermediate recombination ending with a strong recombination cold spot at PH12 (−0.66 × 10⁻⁵; P < 0.0001). The Simon-Loriere study identified a broad hot spot beginning at region PH6 and peaking at PH11. Thus, where their study finds one of their strongest hot spots, we find a region of intermediate recombination ending with one of our coldest spots at PH12. As this region contains important resistance mutations, such as the thymidine analogue mutations (60), the detection of hot spots for recombination in the in vivo data could be evidence for selection. Similarly, we identify a cold spot (PH31; −0.75 × 10⁻⁵; P < 0.001) that falls close to the p51-RNase H junction, which was labeled as a hot spot for recombination in the Simon-Loriere study. On the other hand, we identify hot spot PH19 (0.54 × 10⁻⁵; P < 0.05), which falls within an unstructured peptide loop of the reverse transcriptase enzyme (63). Interestingly, this hot region, PH19 to PH21, corresponds exactly to some of the most highly structured RNA in the HIV-1 genome, as measured by selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry (63). Indeed, RNA structures are proposed to favor recombination by causing reverse transcriptase to pause on the template (12, 27, 64–66), and mechanistic studies demonstrate that the presence of RNA structure is often a feature of recombination hot spots (34, 67). It has been previously reported that HIV-1 gene junctions are enriched in RNA structure and are thus more recombinogenic than other regions of the HIV-1 genome (61, 63). We anecdotally note that our recombination hot spots do seem to be enriched at gene junctions, with the exception of the RNase H junction. This suggests that local fluctuations in recombination rates could drive the evolution of the RNA genome on a global scale. Further investigation of these genomic locations is warranted, as the molecular mechanisms that cause recombination hot and cold spots may shed further light on the higher-level organization of the HIV-1 genome.
As recombination is thought to facilitate viral evolution by intermixing immune escape and drug resistance mutations within HIV-1 gag and pol, knowledge of how recombination rates vary within these particular regions of the viral genome (68) is of importance for designing antiviral strategies. From a therapeutic viewpoint, the shuffling of resistance mutations within gag and pol could impact the generation of multidrug-resistant viruses (48–50). In general, the farther apart two genomic regions are, the less tightly they are linked and the easier it is to shuffle mutations between them. For genomic regions that are close together, it should be easier to generate an RT double mutation where the resistance mutations are separated by a recombination hot spot. Our data suggest that the major reverse transcriptase drug resistance mutations lie in a relatively stable region of the genome, theoretically reducing the risk that they will be brought together by recombination. It should be noted, however, that an important prerequisite for recombination is the copackaging of genetically distinct genomes into viral particles via efficient coinfection of cells. Early studies suggested that these conditions were likely to be fulfilled in vivo, with between 75 and 80% of infected spleen cells harboring two or more proviruses, and most of these cells harboring genetically distinct proviruses (69). More recent studies on both CD4+ T cells and infected spleen cells contradict this view and show that the majority of cells are only singly infected (68, 70). Nevertheless, there is ample evidence that at least some recombination does occur in vivo and that it is functionally relevant to immune escape and the generation of multidrug-resistant HIV-1 (48–52, 54–58, 68). Furthermore, it is possible that the location of recombination hot spots may be more important under scenarios of low coinfection than under scenarios where the conditions for recombination are rampant. It will be important to test this assertion by including the possibility of recombination hot spots in models of HIV-1 dynamics.
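One simple way to include hot spots in such a model is sketched below. It assumes that template switches occur as independent Poisson events along the genome, so that two sites end up on different parental templates with probability (1 − e^(−2m))/2, where m is the expected number of switches between them. This independence assumption, the function name, and all rates are choices made for this illustration and are not taken from the study. The result is also conditional on a heterozygous virion having been formed in the first place.

```python
import math


def prob_odd_switches(rates, lengths):
    """Probability of an odd number of template switches between two sites.

    Assumes switches are independent Poisson events (an assumption of this
    sketch). `rates` are per-nucleotide switch rates and `lengths` are the
    nucleotide lengths of the regions lying between the two sites.
    """
    expected = sum(r * n for r, n in zip(rates, lengths))
    return 0.5 * (1.0 - math.exp(-2.0 * expected))


# Two resistance sites 300 nt apart; all rates are hypothetical.
uniform = prob_odd_switches([1e-3, 1e-3, 1e-3], [100, 100, 100])
with_hot_spot = prob_odd_switches([1e-3, 6e-3, 1e-3], [100, 100, 100])
print(round(uniform, 3), round(with_hot_spot, 3))
```

With these invented rates, a 6-fold hot spot in the middle region raises the chance of separating or pairing the two sites from roughly 0.23 to roughly 0.40 per heterozygous replication cycle. The probability saturates at 0.5 for distant sites, which is why hot spots matter most for closely spaced mutations.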
Taken together, our data provide unique insights into HIV-1 recombination occurring between highly similar genomes likely to be found in the majority of infected individuals. Our results demonstrate that recombination does not occur randomly, and we identify recombination hot spots and cold spots in gag and pol. Importantly, our recombination hot/cold spots match closely with those found by analysis of patient sequence databases, indicating that, for gag and pol, the recombinogenic properties of the RNA genome itself, rather than sequence similarity, are likely to be the main driver of recombinant genomes circulating in the human population. Further studies in this area may ultimately prove crucial in developing robust antiviral strategies against HIV-1.