INTRODUCTION
The ability of HIV to persist as an integrated provirus within a small fraction of infected cells, even during suppressive antiretroviral therapy (ART), is the main barrier to cure (1, 2). It is also the reason why ART must be taken for life. Seeding of HIV sequences into reservoir cells begins immediately following infection (3, 4) and continues until viral suppression is achieved on ART, thereby establishing a genetically diverse viral reservoir (5–9). Understanding the within-host evolutionary dynamics of the proviruses that persist during ART, as well as the origins of HIV sequences that emerge from the reservoir if ART is interrupted, will aid the development of curative strategies.
In recent years, our understanding of reservoir dynamics has been enriched by studies that have interpreted on-ART proviral genetic diversity in blood in the context of HIV’s within-host evolutionary history (5–7, 10–12). These studies have revealed that a large percentage of proviruses that persist in the blood during ART (most of which are genetically defective (13–15)), as well as the vast majority of replication-competent reservoir sequences that persist during this time, date to the year or two preceding ART initiation (5, 6, 9–11). We now understand that this is because reservoir turnover during untreated infection is relatively rapid (the half-life of persisting proviruses during this period is estimated to be a year or less (5, 16)). This means that, if ART is not initiated until advanced chronic infection, many of the earliest within-host lineages will have already been eliminated by this time. Nevertheless, proviruses dating to earlier periods of infection are routinely recovered during ART, albeit less frequently (5–7, 9, 11).
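The skew towards recently seeded lineages follows from simple turnover arithmetic. As a minimal sketch, assuming (hypothetically) that proviruses are seeded at a constant rate throughout untreated infection and decay exponentially with a fixed half-life, the fraction of the proviral pool at ART initiation that dates to the final few years can be computed as follows. The constant seeding rate, the function name, and the example durations are illustrative assumptions, not quantities estimated in this study.

```python
def fraction_seeded_recently(infection_years, window_years, half_life_years):
    """Fraction of proviruses present at ART initiation that were seeded
    within the last `window_years` of untreated infection.

    Assumes (for illustration only) constant seeding and single-phase
    exponential decay with the given half-life.
    """
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    if not 0 < window_years <= infection_years:
        raise ValueError("window must be positive and no longer than infection")

    def surviving(age_limit):
        # A provirus seeded `a` years before ART survives with probability
        # 0.5 ** (a / half_life). Integrating over ages 0..age_limit gives a
        # quantity proportional to 1 - 0.5 ** (age_limit / half_life).
        return 1.0 - 0.5 ** (age_limit / half_life_years)

    return surviving(window_years) / surviving(infection_years)


# Hypothetical example: 10 years of untreated infection, 1-year half-life.
recent = fraction_seeded_recently(infection_years=10, window_years=2, half_life_years=1)
print(f"Fraction dating to the 2 years before ART: {recent:.2f}")
```

Under these assumptions, roughly three quarters of the pool at ART initiation dates to the preceding two years, while a longer half-life yields a flatter age distribution.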
During the initial years of ART, the proviral pool decreases in size (initial on-ART half-lives of intact and defective proviruses are ~4 and >10 years, respectively, with decay slowing further thereafter (17–19)). At the same time, clonal expansion of infected cells also occurs (20–22). Given these opposing processes, and assuming that no new viral variants are seeded into the reservoir during ART (5, 23), it is reasonable to hypothesize that the persisting proviral pool will decline in genetic diversity over time, as distinct proviruses are gradually eliminated. Relatively few studies, however, have investigated on-ART proviral genetic stability (20, 24–29). Moreover, only two have done so in the context of HIV’s within-host evolutionary history (5, 6), which can shed light on the lineage origins and ages of persisting proviruses. Their results, however, were not entirely concordant. While one study suggested that younger HIV lineages may be preferentially eliminated during the initial years of ART (though this did not reach statistical significance (6)), the other supported relative proviral genetic stability even in the longer term (though the primary goal of the latter analysis was to investigate whether residual HIV replication occurs during ART, not to evaluate proviral genetic stability over time (5)). Even fewer studies have compared the within-host evolutionary origins and ages of proviruses persisting on ART with those of HIV sequences emerging from the reservoir (i.e., as rebound viremia) (30), which have been shown to include within-host recombinants of unknown origin (31). Such analyses can help illuminate how the rebound-competent reservoir in blood may be distinct from the overall, largely defective, proviral pool.
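For a sense of scale of the on-ART decay cited above, the following sketch converts a half-life into the fraction of proviruses expected to remain after a given time on ART. It assumes simple single-phase exponential decay, which is a simplification: the cited decay slows after the initial years, so these values are illustrative lower bounds on what remains rather than predictions. The function name is hypothetical.

```python
def fraction_remaining(years_on_art, half_life_years):
    """Fraction of an initial proviral population remaining after
    `years_on_art`, assuming single-phase exponential decay (illustrative)."""
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    if years_on_art < 0:
        raise ValueError("time on ART cannot be negative")
    return 0.5 ** (years_on_art / half_life_years)


# Half-lives taken from the text: ~4 years (intact); 10 years is used here as
# a conservative stand-in for the ">10 years" quoted for defective proviruses.
for label, half_life in [("intact", 4), ("defective", 10)]:
    print(label, round(fraction_remaining(4, half_life), 2))
```

Because intact proviruses decay faster than defective ones in this picture, the defective fraction of the pool would be expected to grow during the initial years of ART.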
To address these knowledge gaps, we reconstructed within-host HIV evolutionary histories in seven participants enrolled in the Women’s Interagency HIV Study (WIHS) who seroconverted during follow-up. Our goal was to investigate the genetic stability of distinct proviruses sampled up to four times, up to 12 years following ART initiation. In two participants, we also investigated the diversity and age distribution of reservoir-origin HIV sequences that emerged in plasma post-ART.
DISCUSSION
We reconstructed within-host HIV evolutionary histories from pre-ART plasma and on-ART proviral sequences sampled over a median of 14 (range 9–23) years in seven participants. These analyses can reveal the lineage origins and ages of proviruses persisting on ART, and provide insights into the temporal stability of the on-ART proviral pool in terms of its genetic diversity, composition, and age distribution. Consistent with previous reports, clonal (defined as env-gp120-identical) sequences persisted long-term; in fact, one clone was recovered at all time points over an 8-year period in participant 5. Clones also “waxed and waned” over time (e.g., participants 5 and 6) and dominated in some cases (participants 3 and 6’s proviral pools were >50% clonal) (20, 21, 42–48). To prevent clonal expansion, which increased over time in 4 of 6 participants, from influencing our genetic diversity assessments, clones were collapsed down to a single representative per time point.
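The clone-collapsing step can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: it assumes sequences are already aligned, treats exact string identity within a time point as the definition of a clone, and uses hypothetical function and variable names.

```python
from collections import Counter


def collapse_clones(sequences_by_timepoint):
    """Collapse identical sequences to one representative per time point.

    `sequences_by_timepoint` maps a time point label to a list of sequence
    strings. Returns two dicts keyed by time point: the distinct sequences
    (in first-seen order) and the fraction of sampled sequences that belong
    to a clone, i.e. that were observed more than once at that time point.
    """
    distinct = {}
    clonal_fraction = {}
    for timepoint, seqs in sequences_by_timepoint.items():
        counts = Counter(seqs)
        # Counter keeps first-seen order, so each clone is represented once.
        distinct[timepoint] = list(counts)
        n_clonal = sum(n for n in counts.values() if n > 1)
        clonal_fraction[timepoint] = n_clonal / len(seqs) if seqs else 0.0
    return distinct, clonal_fraction


# Toy input with made-up sequences, for illustration only.
toy = {"year 1": ["ACGT", "ACGT", "ACGA"], "year 5": ["ACGT", "TTTT"]}
distinct, clonal_fraction = collapse_clones(toy)
print(distinct)
print(clonal_fraction)
```

Diversity statistics would then be computed on the collapsed set, so that a large clone counts once at each time point instead of dominating the estimate.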
Despite increasing clonality, the distinct proviral sequences comprising the proviral pool were highly stable in terms of genetic diversity, composition, and age distribution. Though we found no broad evidence that proviral diversity was being lost over time, we did detect a modest yet statistically significant shift in proviral composition in participant 4, for whom the proviruses sampled 12 years post-ART were on average older and exhibited a different population structure than those sampled in the earlier years of ART. We do not believe that this is a sampling artifact, as proviruses were sampled twice in year 12 of ART, with consistent results. Rather, the more plausible explanation is that younger proviruses (i.e., those seeded just prior to ART initiation) were preferentially eliminated during the initial years of ART. This gradually shifted the balance towards older, more long-lived proviruses, making them more likely to be detected with the limiting-dilution approaches used here. This observation is also consistent with a study of four individuals with HIV subtype C with longitudinal on-ART sampling (6), as well as a recent study in a non-human primate model of HIV (49), both of which suggested that younger proviruses were preferentially eliminated during these initial years of ART. Of note, participant 4 had the longest follow-up of any individual in the study, which may have allowed the opportunity to observe this phenomenon.
Our study also indicates that the replication-competent HIV reservoir in blood (measured as HIV sequences that emerged in plasma post-ART in participants 1 and 5) represents a genetically restricted subset of the overall proviral pool, which is predominantly defective. Consistent with prior studies (5–7, 9–12), participants’ on-ART proviral pools ranged from modestly (e.g., participant 4) to substantially (e.g., participant 3) skewed towards viral variants that dated to the years immediately preceding ART, which is consistent with continual reservoir seeding, yet relatively rapid turnover, during untreated infection. By contrast, the plasma HIV sequences that emerged post-ART were a restricted subset that exclusively dated to the years immediately prior to ART. This suggests that replication-competent reservoir sequences older than this had already been eliminated, or were extremely rare. Alternatively, it is possible that they exist but could not reactivate (e.g., due to integration into inaccessible chromatin (50)) or that they reactivated but could not replicate effectively (e.g., because they were inhibited by host immune responses). Indeed, it is increasingly appreciated that HIV rebound is a selective process, where the viruses that replicate to high levels in plasma are not necessarily those that reactivated first, but those that host immune responses, particularly antibodies, fail to control (51–53). Moreover, because rebound viruses integrated near the time of ART initiation, they are expected to be enriched in immune escape mutations, as sequences from this infection era will have had the longest time to adapt to within-host responses (30).
Participant 1’s data also suggested that, during extended ART interruption, viral rebound occurs in sequential “waves” of reactivation from individual reservoir cells (or clonal populations). This was supported by the emergence of slightly more ancestral viral sequences 6 months into the treatment interruption. Participant 1’s data are also consistent with a simian immunodeficiency virus (SIV) study that showed that rebound viruses can re-seed the reservoir if rebound viral loads reach pre-ART levels (54), which occurred in this case.
Of note, recombinant proviruses were identified in nearly all participants, and recombinant HIV RNA sequences also emerged in plasma in participant 1 after ART interruption. Almost all of these recombinant sequences represented mosaics of sequences that plausibly co-circulated at the same time. Nevertheless, we identified two recombinant proviruses, one each from participants 2 and 4, whose parents dated to different enough infection eras that co-circulation of these sequences was unlikely. Rather, the discovery of these two proviruses suggests that reservoir cells can become superinfected with HIV from another infection era. Though mathematical modeling suggests that this type of recombination occurs, and represents a latent HIV genome survival mechanism (39), it has never, to our knowledge, been empirically observed. We acknowledge, however, that our observations are not definitive and that HIV sequences with substantially different root-to-tip divergences could theoretically have co-circulated for long periods yet remained unsampled in blood.
The source of recombinant viruses during rebound also remains an open question (31). While recombinant plasma HIV RNA sequences were observed during participant 1’s rebound event, we did not identify any proviruses that exactly matched these sequences (though proviral sampling occurred some years after the rebound). While this suggests that recombinants were generated de novo during rebound, we cannot exclude the possibility that matching proviruses did exist in blood but we failed to detect them, that they had existed but were eliminated before we were able to sample them, or that recombinant proviruses resided in tissue.
Our study has some caveats and limitations. All participants were women. Though there is no evidence that men and women differ in terms of rates of viral evolution (32) or on-ART proviral genetic composition and age distribution (6), there is evidence that ex vivo reactivation potential and residual immune activation differ by sex (55–57). Due to very limited sample availability (only 10 million peripheral blood mononuclear cells (PBMCs) per proviral time point), we performed sub-genomic amplification. This is because near-full-genome HIV amplification would likely have generated many sequences with various large deletions in env-gp120 (and/or gag) that could not be phylogenetically dated. We therefore cannot discriminate intact from defective proviruses. In fact, using data from another study (30), we estimate a 22% overall average likelihood (range 2%–35% depending on the participant) that an intact env-gp120 sequence comes from a genomically intact provirus. Because we only sequenced part of the HIV genome, we also cannot definitively characterize proviruses as clonal, which would require full-genome sequencing and integration site characterization. We also acknowledge that sequences isolated only once may still be part of a clonal set (58). Because biological material was so limited, we isolated proviruses directly from PBMCs, so we could neither quantify reservoir sizes nor identify the cell types harboring them. Despite these limitations, our study provides insights into the within-host evolutionary origins and temporal stability of proviral lineages on ART, along with the origins of HIV RNA emerging in blood. It also boosts the representation of women living with HIV subtype B, who are under-represented in the within-host HIV evolutionary reservoir dynamics literature.
In conclusion, the diversity of proviruses persisting on ART, which are largely genetically defective (13–15), broadly reflects the extent of within-host HIV evolution prior to ART (6, 7). Our results also reveal that the clonal expansion that commonly occurs during the initial years of ART is not appreciably accompanied by the loss of distinct proviral lineages during this time. In fact, on-ART proviral genetic composition remained remarkably stable, with the exception of participant 4, in whom some of the proviruses that had integrated near ART initiation had been preferentially eliminated by the 12th year of ART. Our analysis of recombinant sequences also supports the notion that reservoir cells can become superinfected with HIV reactivated from older infection eras, yielding mosaics of older and younger sequences. Finally, our observations suggest that the replication-competent reservoir (studied here as rebound HIV sequences) comprises a genetically restricted, younger subset of all proviruses persisting in blood. If so, HIV cure strategies will need to eliminate a reservoir whose key characteristics may differ from those of the overall proviral pool.
ACKNOWLEDGMENTS
We thank Mark Brockman for helpful discussions.
The authors gratefully acknowledge the contributions of the study participants and the dedication of the staff at the MWCCS sites.
Data in this manuscript were collected by the Women’s Interagency HIV Study (WIHS), now the MACS/WIHS Combined Cohort Study (MWCCS).
The contents of this publication are solely the responsibility of the authors and do not represent the official views of the National Institutes of Health (NIH).
MWCCS (Principal Investigators): Atlanta CRS (Ighovwerha Ofotokun, Anandi Sheth, and Gina Wingood), U01-HL146241; Baltimore CRS (Todd Brown and Joseph Margolick), U01-HL146201; Bronx CRS (Kathryn Anastos, David Hanna, and Anjali Sharma), U01-HL146204; Brooklyn CRS (Deborah Gustafson and Tracey Wilson), U01-HL146202; Data Analysis and Coordination Center (Gypsyamber D’Souza, Stephen Gange and Elizabeth Topper), U01-HL146193; Chicago-Cook County CRS (Mardge Cohen and Audrey French), U01-HL146245; Chicago-Northwestern CRS (Steven Wolinsky, Frank Palella, and Valentina Stosor), U01-HL146240; Northern California CRS (Bradley Aouizerat, Jennifer Price, and Phyllis Tien), U01-HL146242; Los Angeles CRS (Roger Detels and Matthew Mimiaga), U01-HL146333; Metropolitan Washington CRS (Seble Kassaye and Daniel Merenstein), U01-HL146205; Miami CRS (Maria Alcaide, Margaret Fischl, and Deborah Jones), U01-HL146203; Pittsburgh CRS (Jeremy Martinson and Charles Rinaldo), U01-HL146208; UAB-MS CRS (Mirjam-Colette Kempf, Jodie Dionne-Odom, Deborah Konkle-Parker, and James B. Brock), U01-HL146192; UNC CRS (Adaora Adimora and Michelle Floris-Moore), U01-HL146194.
The MWCCS is funded primarily by the National Heart, Lung, and Blood Institute (NHLBI), with additional co-funding from the Eunice Kennedy Shriver National Institute Of Child Health & Human Development (NICHD), National Institute On Aging (NIA), National Institute Of Dental & Craniofacial Research (NIDCR), National Institute Of Allergy And Infectious Diseases (NIAID), National Institute Of Neurological Disorders And Stroke (NINDS), National Institute Of Mental Health (NIMH), National Institute On Drug Abuse (NIDA), National Institute Of Nursing Research (NINR), National Cancer Institute (NCI), National Institute on Alcohol Abuse and Alcoholism (NIAAA), National Institute on Deafness and Other Communication Disorders (NIDCD), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute on Minority Health and Health Disparities (NIMHD), and in coordination and alignment with the research priorities of the National Institutes of Health, Office of AIDS Research (OAR). MWCCS data collection is also supported by UL1-TR000004 (UCSF CTSA), UL1-TR003098 (JHU ICTR), UL1-TR001881 (UCLA CTSI), P30-AI-050409 (Atlanta CFAR), P30-AI-073961 (Miami CFAR), P30-AI-050410 (UNC CFAR), P30-AI-027767 (UAB CFAR), P30-MH-116867 (Miami CHARM), UL1-TR001409 (DC CTSA), KL2-TR001432 (DC CTSA), and TL1-TR001431 (DC CTSA).
In addition, this work was supported by the Canadian Institutes of Health Research (CIHR) through a project grant (PJT-159625 to Z.L.B. and J.B.J.) and a focused team grant (HB1-164063 to Z.L.B.). This work was also supported by the Martin Delaney "REACH" Collaboratory (NIH grant 1-UM1AI164565-01 to Z.L.B.), which is supported by the following NIH co-funding Institutes: NIMH, NIDA, NINDS, NIDDK, NHLBI, and NIAID. This work was also supported by the Einstein-Rockefeller-CUNY Center for AIDS Research (NIH grant # P30AI124414 to H.G.). A.S. and B.R.J. are supported by CIHR Doctoral Research Awards (stipend support). S.M. was supported by an FHS Undergraduate Student Research Award (stipend support). M.C.D. and H.S. are supported by CIHR Canada Graduate Scholarship—Master’s awards (stipend support). N.N.K. was supported by a CIHR Vanier Doctoral Award (stipend support). Z.L.B. was supported by a Scholar Award from Michael Smith Health Research BC (salary support). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.