Helicobacter pylori is a microaerophilic, gram-negative bacterium that colonizes the human stomach (12). Colonization induces chronic gastritis and plays a role in the development of peptic ulcer disease and gastric adenocarcinoma (12, 44).
H. pylori strains are highly diverse (22, 31), as evidenced by allelic variation within vacA (10, 11, 17, 42, 43); the presence of nonconserved DNA fragments among different strains, such as the cag pathogenicity island (4, 15, 16, 28, 41, 44); and the occupation of a single genomic site by different genes, such as iceA1 and iceA2 (44). DNA fingerprinting and multilocus enzyme electrophoresis have shown that H. pylori has greater total genetic diversity than other bacteria that have been studied (2, 3, 20, 22, 32, 39).
H. pylori adhesion to the gastric epithelium is mediated, at least in part, by the Lewis B blood group antigen (13). Adherence to the epithelium is believed to help protect the bacteria from gastric acidity and from displacement by peristalsis. Two H. pylori genes, babA and babB, were identified based on the N-terminal similarity of their products to the Lewis B binding protein (25), and the babA gene product was shown to be necessary for Lewis B binding activity. The two gene products are members of a paralogous family of outer membrane proteins whose members share significant N- and C-terminal similarity (40).
babA and babB are nearly identical in their 5′ and 3′ regions, with most of their sequence divergence in their midregions (8, 25).
DISCUSSION
We classified the 11 babA-related genes in 26695 (10 in J99) as babA paralogues on several criteria, including the natural break in the BLAST probability scores in both strains J99 and 26695 and their substantial N- and C-terminal similarity. Both babA (HP1243) and hopZ (HP9) have been shown to be involved in H. pylori adherence to gastric cells (25, 36), and the paralogues hopD (HP25) and hopA (HP229) appear to be porins involved in molecular transport across the H. pylori membrane (40). These 11 paralogues do not constitute the full H. pylori repertoire for adherence or molecular transport, since other members of the outer membrane protein family (HP912 and HP913) have previously been shown to be involved in adherence (34). HP912, HP913, and HP706 also appear to function as porins (19, 40).
The strong similarity at both the N and C termini of most of these paralogues suggests that these regions are required for conserved functions, whereas the central variable regions likely encode unique functions. The predicted transmembrane domains in the conserved N- and C-terminal regions of each paralogue could serve as membrane anchors, with the variable regions forming extracellular loops involved in specific ligand binding or other unique functions. The extensive 5′ and 3′ identity among the paralogues also could facilitate both intrastrain and interstrain recombination; interstrain recombination would be a powerful mechanism for expanding the functional repertoire of the recipient strains. That babA and babB are in opposite locations relative to their flanking genes in strains J99 and 26695 suggests that a reciprocal exchange could have occurred (8). The high level of identity (74 to 98%) among the downstream intergenic regions of HP317, babA, and babB, together with the identity of the genes downstream of HP317 and babB, also could promote recombination.
Gene duplications among the babA paralogues have been observed in all three strains studied: strain CCUG17875 has two copies of babA (25), whereas strains 26695 and J99 each carry a pair of identical paralogues (HP227 and HP1342 in 26695; JHP212 and JHP1261 in J99). Gene duplication can yield additional functional copies of a gene for enhanced expression of its product (14). Duplicate genes also may serve as foci for gene conversion events that move sequence horizontally between the copies (6, 7, 23). In strain CCUG17875, babA2, but not babA1, is necessary for Lewis B binding (25). Thus, babA1 apparently does not increase binding efficiency but may be useful for increasing babA diversity through gene conversion.
Segments of DNA that differ in GC content from the surrounding chromosome, such as pathogenicity islands, are believed to result from relatively recent cross-species acquisitions (29). That the babA paralogues have a consistently higher GC content (mean, 43%) than the H. pylori genome as a whole (39%) suggests that they may have been a relatively recent genomic acquisition, followed by gene duplication events that produced multiple paralogues. The presence of HP317 in strain 26695 (it is absent from J99) could be explained by an even more recent gene duplication event or, conversely, by its deletion in J99. That the intergenic similarity for the babA paralogues in 26695 and J99 (90.4%) is significantly lower than that for the entire genomes (94.0%) suggests that, on average, they are diversifying faster than the rest of the H. pylori genome.
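The GC comparison above rests on a simple quantity: the fraction of G and C among the bases of a sequence. The Python sketch below computes it for hypothetical input; the function names and toy sequences are illustrative assumptions, not the authors' pipeline, and ambiguous bases (such as N) are assumed to be excluded from the denominator.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (counts["G"] + counts["C"]) / total


def mean_gc(seqs: list[str]) -> float:
    """Unweighted mean GC content across several genes (e.g. a paralogue family)."""
    return sum(gc_content(s) for s in seqs) / len(seqs)


# Hypothetical toy sequences, NOT real babA paralogue data.
paralogues = ["ATGCGCAT", "GGCCAATT"]
print(f"{mean_gc(paralogues):.2f}")  # prints 0.50
```

An unweighted mean treats every paralogue equally; pooling all bases first would instead weight longer genes more heavily. Which convention underlies the 43% figure is not stated here.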
Allelic variation within individual babA paralogues has been reported previously for HP9 (hopZ, with two distinct alleles) and for HP1342 and HP227 (27, 36). For hopZ (36), the region of greatest diversity lies in nearly the same position as the babA and babB diversity regions. Despite the substantial polymorphism throughout the babA fragments, the 42 strains that we studied cluster phylogenetically almost exactly according to the diversity in the 84-nucleotide allele group segment (data not shown); that this region dominates the phylogenetic structure suggests an important functional role. Strain 97-147 (AD5), which has features of each of the other babA allele groups, is most closely related to another South American strain (90-40, AD2). Its unique sequence suggests that recombination may occur within the babA allele groups in the diversity region, paralleling (but less common than) recombination within the babB allele group region.
Most of the diversity in the babB fragments lies in a 108-nucleotide segment. Each of the babB allele groups contains greater diversity than exists within each babA allele group (Figs. 2, 3, and 5), which is likely due to interallelic recombination. In contrast to babA, phylogenetic analyses of the entire babB fragments show that strains segregate not according to allele group but according to the geographic origin of each strain (data not shown). That the babA and babB allele group regions are largely independent of geographic origin but are flanked by regions that show evidence of geographic variation implies that these allele groups have moved horizontally through the H. pylori population. In support of this hypothesis, the similarity plot analysis indicates that the borders of each allele group may represent recombination breakpoints (Fig. 3).
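A similarity plot of the kind referred to above computes percent identity between aligned sequences in a window that slides along the alignment; a position where the query switches from resembling one reference to resembling another is a candidate recombination breakpoint. The Python sketch below assumes pre-aligned, equal-length sequences; the window and step sizes, the function names, and the toy sequences are hypothetical, and alignment columns containing a gap are skipped.

```python
def window_identity(a: str, b: str, window: int, step: int) -> list[tuple[int, float]]:
    """(window midpoint, percent identity) pairs for two aligned sequences.

    Columns where either sequence has a gap ('-') are excluded from each window.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(a) - window + 1, step):
        cols = [(x, y) for x, y in zip(a[start:start + window], b[start:start + window])
                if x != "-" and y != "-"]
        if not cols:
            continue
        same = sum(1 for x, y in cols if x == y)
        points.append((start + window // 2, 100.0 * same / len(cols)))
    return points


def switch_points(query: str, ref1: str, ref2: str, window: int, step: int) -> list[int]:
    """Window midpoints where the reference most similar to the query changes (ties skipped)."""
    s1 = dict(window_identity(query, ref1, window, step))
    s2 = dict(window_identity(query, ref2, window, step))
    switches, previous = [], None
    for mid in sorted(s1.keys() & s2.keys()):
        if s1[mid] == s2[mid]:
            continue
        best = 1 if s1[mid] > s2[mid] else 2
        if previous is not None and best != previous:
            switches.append(mid)
        previous = best
    return switches


# Toy mosaic: the query matches ref1 in its first half and ref2 in its second half.
query = "AAAAAAAAAACCCCCCCCCC"
ref1 = "AAAAAAAAAAGGGGGGGGGG"
ref2 = "TTTTTTTTTTCCCCCCCCCC"
print(switch_points(query, ref1, ref2, window=4, step=2))  # prints [12]
```

The reported position is a window midpoint, so it locates a breakpoint only to within roughly one window; real analyses also need a significance test before calling a switch a recombination event.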
Overall, for the sequences studied, babA shows much more variation in length and lower average similarity than babB. At the third codon position, transversions are more likely to change the encoded amino acid, whereas transitions almost always leave it unchanged (codons for Met and Trp are the only exceptions); thus, transitional substitutions tend to predominate in most species (30). In H. pylori, transitions account for most interstrain diversity, a mean of 80% (range, 66 to 94%) of the polymorphisms (45). However, in both babA and babB sequences, transitions account for only 50% of the polymorphisms; thus, transversions are far more common than in the typical H. pylori gene (45). Because both babA and babB encode outer membrane proteins, the diversity observed in this region of both genes may result from selection, possibly due to ongoing immune recognition.
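The transition/transversion distinction used above is purely chemical: a transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other substitution is a transversion. The Python sketch below counts both classes between two aligned sequences and reports the transition fraction. The function names are hypothetical, and a simple pairwise count is assumed; it is not necessarily how the 50% or 80% figures above were tallied across many strains.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(a: str, b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned DNA sequences.

    Identical columns, gaps, and ambiguous bases are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def transition_fraction(a: str, b: str) -> float:
    """Transitions as a fraction of all substitutions (NaN if the sequences are identical)."""
    ts, tv = classify_substitutions(a, b)
    total = ts + tv
    return ts / total if total else float("nan")


print(classify_substitutions("AAAA", "GCTA"))  # prints (1, 2)
```

Because there are twice as many possible transversions as transitions, a fraction near 0.33 is what random substitution would produce; the roughly 0.5 seen in babA and babB is therefore still transition-biased, just far less so than the genome-wide 0.8.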
Analysis of synonymous and nonsynonymous substitutions among the babA and babB diversity region fragments indicates that the babB fragments share a more recent common ancestor and that, as measured by the Ka/Ks ratio, both gene products are under similar functional constraints. For both genes, the Ka/Ks ratios for comparisons of sequences from specific allele groups differ little from one another (data not shown) and are considerably lower than those observed for fragments within single alleles of either cagA or vacA, indicating that similar functional restrictions apply to these allele groups (11, 42).
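Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks the number of synonymous substitutions per synonymous site; a ratio well below 1 indicates purifying selection. The Python sketch below implements a simplified Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. The method and software actually used for the ratios above are not stated here, so this is an assumed, illustrative choice; it also assumes aligned, gap-free, in-frame coding sequences without stop codons and counts point changes to stop codons as nonsynonymous, a convention on which implementations differ.

```python
import itertools
import math

# Standard genetic code; NCBI table 11 (used by H. pylori) has identical codon assignments.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(itertools.product(BASES, repeat=3))}


def synonymous_sites(codon: str) -> float:
    """Per position, the fraction of the 3 possible point changes that keep the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, nonsynonymous) differences averaged over stop-free mutational pathways."""
    diffs = [i for i in range(3) if c1[i] != c2[i]]
    if not diffs:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in itertools.permutations(diffs):
        current, syn, non = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        else:  # runs only if the pathway was not discarded
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} passes through a stop codon")
    return syn_total / n_paths, non_total / n_paths


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("divergence too high for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (Ka, Ks) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal-length, in-frame coding sequences")
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    for codon in itertools.chain.from_iterable(pairs):
        if CODE.get(codon, "*") == "*":
            raise ValueError(f"invalid or stop codon: {codon}")
    s_sites = sum((synonymous_sites(a) + synonymous_sites(b)) / 2 for a, b in pairs)
    n_sites = 3 * len(pairs) - s_sites
    if s_sites == 0:
        raise ValueError("no synonymous sites in these sequences")
    sd = nd = 0.0
    for a, b in pairs:
        d_syn, d_non = codon_differences(a, b)
        sd += d_syn
        nd += d_non
    return jukes_cantor(nd / n_sites), jukes_cantor(sd / s_sites)


ka, ks = ka_ks("CTGAAA", "CTAAAA")  # one synonymous Leu codon change (toy data)
print(ka / ks if ks else float("nan"))
```

The ratio is undefined when Ks is zero, which is why the usage line guards the division; with real fragments, pairwise ratios would typically be averaged within and between allele groups.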
The existence of substantial recombination in both the babA and babB fragments is supported by the consistency index, compatibility matrices, and the homoplasy test. Homoplasies can arise through recombination or through independent mutations, and the H ratio measures the observed synonymous homoplasies relative to the observed sequence variation (1, 33). The H ratios for babA and babB indicate that substantial recombination has likely occurred within both genes, at a level similar to that observed previously for several housekeeping genes (1). However, the presence of the allele groups and of geographic variation in both babA and babB (Figs. 4 and 6) indicates that recombination has not been sufficient to obscure evidence of clonal descent entirely and suggests specific functional differences among the allele groups. The high frequency of recombination in both babA and babB helps explain why the phylograms of the same strains are not congruent.
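A compatibility matrix records, for every pair of informative sites, whether both could have arisen on a single tree without homoplasy. For sites with two states this reduces to the four-gamete test: the pair is incompatible if all four combinations of states occur. The Python sketch below builds such a matrix; the restriction to biallelic, parsimony-informative sites and the function names are assumptions made for illustration. Incompatibility signals either recombination or recurrent mutation, which is why it is read alongside the homoplasy test rather than on its own.

```python
from itertools import combinations


def informative_sites(seqs: list[str]) -> list[int]:
    """Gap-free alignment columns with exactly two bases, each present in >= 2 sequences."""
    sites = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        if "-" in column:
            continue
        states = set(column)
        if len(states) == 2 and all(column.count(b) >= 2 for b in states):
            sites.append(i)
    return sites


def compatible(seqs: list[str], i: int, j: int) -> bool:
    """Four-gamete test for two biallelic sites: incompatible only if all 4 pairs occur."""
    return len({(s[i], s[j]) for s in seqs}) < 4


def compatibility_matrix(seqs: list[str]) -> tuple[list[int], list[list[bool]]]:
    """Informative site indices and a symmetric matrix of pairwise compatibility."""
    sites = informative_sites(seqs)
    matrix = [[True] * len(sites) for _ in sites]
    for a, b in combinations(range(len(sites)), 2):
        ok = compatible(seqs, sites[a], sites[b])
        matrix[a][b] = matrix[b][a] = ok
    return sites, matrix


def fraction_incompatible(matrix: list[list[bool]]) -> float:
    """Share of site pairs that fail the four-gamete test."""
    n = len(matrix)
    pairs = n * (n - 1) // 2
    bad = sum(not matrix[a][b] for a, b in combinations(range(n), 2))
    return bad / pairs if pairs else 0.0


# Toy alignment in which all four gametes appear at sites 0 and 1.
sites, matrix = compatibility_matrix(["AA", "AG", "GA", "GG"])
print(sites, fraction_incompatible(matrix))  # prints [0, 1] 1.0
```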
The presence of well-conserved allele groups in the babA and babB diversity regions implies an important functional role. babA has been shown previously to be responsible for Lewis B binding in H. pylori (25), which we now confirm. Because of the substantial similarity between babB and babA, babB might be involved in Lewis B binding as well; however, our data clearly indicate that neither the presence of particular babB allele groups nor that of particular babA allele groups is a determining factor in Lewis B binding.
In summary, both substantial conservation and substantial variation exist among the babA paralogues. Two such paralogues, babA and babB, show both geographic and allele group-associated variation in their predicted regions of maximum diversity. The babB fragments appear to share a more recent common ancestor, but the babA and babB gene products are under similar functional constraints. Although recombination accounts for much of the variation in babA and babB, as it does for vacA, it is not sufficient to obscure the clonal structure present in both genes. Despite the involvement of babA in Lewis B binding, neither the babA allele groups nor the babB allele groups are determining factors in Lewis B binding. Determining whether the different alleles of babA and babB have other functional implications could aid our understanding of H. pylori-host ligand interactions.