Pseudomonas aeruginosa is a gram-negative bacterium commonly isolated from fresh water and soil. This opportunistic pathogen causes a variety of human infections, including infections of the ear canal, lung, eye, and urinary tract.
P. aeruginosa is particularly dangerous for immunocompromised patients such as those with human immunodeficiency virus, cancer, or severe burn wounds. Most cystic fibrosis patients have pulmonary
P. aeruginosa infections for the majority of their lives. Infection also occurs in many other organisms, including insects (
9), nematodes (
16), plants (
48), and amoebas (
52).
Iron is necessary for the growth and survival of almost all organisms, with the exception of
Lactobacillus species and some intracellular parasites (
3). Iron is incorporated into numerous protein moieties such as iron-sulfur clusters and participates in many cellular redox reactions. Although abundant on the surface of the Earth, iron is not readily accessible because it rapidly oxidizes to insoluble Fe
3+ compounds under aerobic conditions. Bacteria obtain iron by a variety of mechanisms, including secretion and subsequent uptake of ferrisiderophores, direct uptake of ferrous iron, uptake of host iron-binding molecules, and proteolytic destruction of host proteins with subsequent iron scavenging (
54). Mechanisms for iron acquisition via siderophores are diverse; at least 500 chemically distinct siderophores are known (
69). Although almost all organisms require iron, excess iron is highly toxic in aerobic environments, particularly because of the production of hydroxyl radicals in the Fenton reaction.
Pyoverdines are siderophores made by members of the genus
Pseudomonas (
10,
37). Pyoverdine is the primary siderophore of
P. aeruginosa, and pyoverdine mutants have attenuated virulence in multiple infection models (
39,
66). Each strain of
P. aeruginosa makes one of three pyoverdine types, each type having a distinct peptide side chain that is synthesized nonribosomally (
40). The pyoverdine receptor is also type specific, transporting only its corresponding pyoverdine (
14).
In a whole-genome-diversity study in which we compared two cystic fibrosis strains and one environmental strain of
P. aeruginosa with the PAO1 reference strain, we found the pyoverdine region to be the most divergent alignable locus in the genome (
64). Given the divergence in pyoverdine genes and the variety of different pyoverdine molecular structures, the pyoverdine locus appeared to be a possible target of diversifying selection. To determine the degree and patterns of diversity in pyoverdine genes, we sought to clone, sequence, and analyze the pyoverdine region from multiple strains.
MATERIALS AND METHODS
Bacterial strains and growth conditions.
The strains used in this study are listed in Table
1. Strains were grown overnight on L plates at 37°C. Overnight growth at 37°C with solid or liquid Casamino Acids (Difco) medium was used to induce pyoverdine production. L plates containing 12.5 μg of chloramphenicol per ml were used to grow
Escherichia coli containing fosmid clones. XL1-Blue MR
E. coli cells were used as the host strain for fosmid clones (Stratagene, La Jolla, Calif.).
Saccharomyces cerevisiae transformants carrying yeast capture clones were selected on uracil-deficient medium (
63) with 2.5 μg of cycloheximide per ml. L plates containing 6 μg of chloramphenicol per ml were used to grow
E. coli containing yeast capture clones.
E. coli DH10B cells were used as the host cells for yeast capture cloning (
24).
Pyoverdine typing.
Pyoverdine strain typing was done via crude pyoverdine purification and isoelectric focusing gel electrophoresis as described (
40). Once the nucleotide sequences of each pyoverdine sequence type became available, whole-cell PCR with pyoverdine type-specific primers was used to type strains (sequences available from the authors on request).
Fosmid library creation and screening.
P. aeruginosa genomic DNA was prepared by phenol-chloroform extraction and then further purified by ethanol precipitation. Genomic DNA was partially digested with restriction enzyme Sau3AI and size-separated by pulsed-field gel electrophoresis (Bio-Rad CHEF-DRII apparatus, 1% agarose gel, 14°C, 16 h, pulse ramp 1 to 6 s). DNA segments sized 35 to 52 kb were cut from the gel and recovered by electroelution in dialysis bags (Spectra/Por molecular porous membrane tubing). Fosmid vector pFos1 was prepared as described (
30) and genomic DNA fragments were ligated into the prepared vector with T4 DNA ligase (Roche, Basel, Switzerland).
The ligated vector was then packaged with Gigapack III XL packaging extract (Stratagene), and E. coli cells were infected. Fosmid-containing colonies were picked into 384-well plates and also pooled into 96-well plates at eight colonies per pool. Enough colonies were picked to give 10-fold coverage of the genome, assuming an average insert size of 40 kb.
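The number of colonies implied by 10-fold coverage with 40-kb inserts can be estimated with a short calculation. The sketch below assumes a genome size of about 6.3 Mb, roughly that of the PAO1 genome; this value is not stated in the text, and the function names are illustrative. It also applies the standard Clarke-Carbon estimate of the probability that any given locus is represented in the library.

```python
import math


def clones_for_coverage(genome_bp, insert_bp, fold_coverage):
    """Clones needed so that total insert length equals fold_coverage genomes."""
    return math.ceil(fold_coverage * genome_bp / insert_bp)


def prob_locus_represented(n_clones, genome_bp, insert_bp):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


GENOME_BP = 6_300_000  # ASSUMPTION: approximate PAO1 genome size, not from the text
INSERT_BP = 40_000     # average insert size stated in the text
POOL_SIZE = 8          # colonies per pool, as stated in the text

n_clones = clones_for_coverage(GENOME_BP, INSERT_BP, 10)
n_pools = math.ceil(n_clones / POOL_SIZE)
p_locus = prob_locus_represented(n_clones, GENOME_BP, INSERT_BP)

print(f"clones: {n_clones}, pools of {POOL_SIZE}: {n_pools}, P(locus present): {p_locus:.5f}")
```

Under these assumptions the library would need about 1,575 clones, and a locus of interest would be expected in the library with a probability above 0.9999.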
Fosmid pools were screened with PCR at multiple sites upstream and downstream of the pyoverdine region. Primers were designed to amplify approximately 500-bp fragments and amplicons were spaced 1 to 3 kb apart. Once PCR-positive pools were identified, individual fosmid-containing cells were screened with PCR. Positive individual cells were regrown in 3 ml of L broth at 37°C for 18 h, lysed with 0.2 N NaOH with 1% sodium dodecyl sulfate treatment, and DNA was precipitated with isopropanol. Purified fosmids were retested with PCR and by digestion with multiple restriction enzymes.
Yeast capture cloning.
Yeast capture cloning vectors were assembled via yeast recombination (
45). PCR with tailed primers carrying homology to both vector and target DNA was used to build target segments. Primers used to amplify upstream of the pyoverdine region included sequences from genes
pvdQ and
pvdA, while primers used to amplify downstream of the pyoverdine region included sequences from genes PA2405 and PA2406. Targeting plasmid pEHS6 was assembled by transforming into
S. cerevisiae the two target segments, a linearized yeast-
E. coli shuttle vector, and a central fragment carrying the yeast wild-type
CYH2 gene with the Amp/
ori stuffer fragment (
57).
Genomic DNA was prepared as described (
34). After passing genomic DNA 20 times through a 26.5-gauge needle, sheared genomic DNA and the linearized yeast capture cloning vector were transformed into yeast spheroplasts (
57). Plasmid DNA was prepared from pooled yeast colonies and transformed into
E. coli cells (
26). Yeast capture clones were screened by whole-cell PCR with primers specific to vector-insert junctions.
PCR was used to test for the presence of deletions in strains R′ and ATCC O13 by designing primers at the boundaries of each potential breakpoint. Primers located inside putative deletions were designed with sequence from type III strain 206-12.
Multiple complete digest gels and DNA sequencing of fosmid and yeast capture clones.
Multiple complete digest gels were used to verify the 35- to 52-kb size of each isolated fosmid insert or the size of each yeast capture clone (
71). Restriction enzymes EcoRI, BstBI, and NcoI were used. When prior sequence data were available, fragment sizes from newly acquired fosmid or yeast capture clones were compared with those predicted from the sequence.
Shotgun libraries of fosmid and yeast capture clones were made by shearing and ligating prepared DNA into a pBluescript vector. Greater than eightfold sequence coverage was obtained for each clone by standard shotgun sequencing methods. Automated sequence finishing was done with Autofinish software (
22), manual finishing was performed if necessary, and sequence assemblies were checked against multiple complete digest gel patterns. Sequences for strains PAO1, PA14, 1-60, and 2-164 were obtained from GenBank.
DNA sequence analysis.
All pairwise sequence alignments were performed with cross_match software, a banded implementation of the Smith-Waterman algorithm (
http://www.phrap.org/ ). Gene predictions were performed with Genemark (
35). Protein domain predictions were performed with PFAM (
6) and NCBI's Conserved Domain Database (
36). The EMBOSS program lindna was used to generate windowed single nucleotide polymorphism (SNP) plots in Fig.
1, and the EMBOSS programs cusp and cai were used to estimate the codon adaptation index of a gene (
58). The codon adaptation index of genes was calculated in reference to a codon table of all genes in the PAO1 genome. Nonribosomal peptide synthetase specificity was predicted with software at the website
http://raynam.chm.jhu.edu/~nrps/ (
13). PCR primer design was performed with Primer3 software (
60). Hydrophobicity plots were generated with the algorithm of Kyte and Doolittle (
32) with a window size of 15 amino acids. The program PROF 1.0 was used to predict the secondary structure of proteins (
46).
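As an illustration of the hydrophobicity plots mentioned above, the following is a minimal sketch of a Kyte-Doolittle sliding-window profile with the 15-residue window used here. The function name and the choice to report one unweighted mean per window are assumptions made for illustration; this is not the software actually used in the study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=15):
    """Return the mean hydropathy of every full window along the sequence.

    Raises KeyError for nonstandard residues. Sequences shorter than the
    window yield an empty list.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


# Hypothetical peptide, used only to show the call.
example = "MKTLLLTLVVVTIVALASAQDNEKRKDESG"
for start, value in enumerate(hydropathy_profile(example), start=1):
    print(start, round(value, 2))
```

Each value corresponds to a window starting at the printed position; plotting the value against the window midpoint gives the usual hydropathy plot.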
To estimate deviations in tetranucleotide usage in a sequence, we used a Markov maximal-order model (
27,
51,
59). This standard model produces an expected count of oligonucleotide usage that accounts for bias in smaller oligonucleotides contained within the oligonucleotide. Let $W = (w_1, w_2, \ldots, w_m)$ denote the word formed by $m$ adjacent nucleotides and $N(W)$ denote the observed number of occurrences of the word in a sequence of length $n$. The expected count $E(W)$ of $W$ is
\[
E(W) = \frac{N(w_1 w_2 \ldots w_{m-1})\, N(w_2 w_3 \ldots w_m)}{N(w_2 w_3 \ldots w_{m-1})}
\]
To measure how observed tetranucleotide usage deviated from the expectation, we calculated the frequency of divergence of a word F(W) as the ratio of the observed count N(W) to the expected count E(W). Linear regression analysis was then used to compare F(W) from two sequences, such as the pyoverdine region and the PAO1 genome, and produce an $R^2$ value that estimated the similarity in tetranucleotide usage between the sequences.
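The calculation described above can be sketched in a few lines. In this illustration the function names are hypothetical, words are counted on a single strand with overlapping windows, words containing characters other than A, C, G, and T are skipped, and the $R^2$ value is taken as the squared Pearson correlation of F(W) over the words observed in both sequences. These choices are assumptions, not details given in the text.

```python
from collections import Counter


def count_words(seq, k):
    """Count all overlapping words of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def divergence(seq, m=4):
    """F(W) = N(W) / E(W) under the maximal-order Markov model.

    E(W) = N(w1..w_{m-1}) * N(w2..w_m) / N(w2..w_{m-1}).
    """
    seq = seq.upper()
    n_m = count_words(seq, m)
    n_m1 = count_words(seq, m - 1)
    n_m2 = count_words(seq, m - 2)
    result = {}
    for word, observed in n_m.items():
        if set(word) - set("ACGT"):
            continue  # skip ambiguous bases
        expected = n_m1[word[:-1]] * n_m1[word[1:]] / n_m2[word[1:-1]]
        result[word] = observed / expected
    return result


def r_squared(f_a, f_b):
    """Squared Pearson correlation of F(W) over words shared by both inputs."""
    words = sorted(set(f_a) & set(f_b))
    if len(words) < 3:
        raise ValueError("too few shared words for a regression")
    xs = [f_a[w] for w in words]
    ys = [f_b[w] for w in words]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("F(W) has zero variance in one sequence")
    return sxy * sxy / (sxx * syy)
```

In use, `divergence` would be applied once to the region of interest and once to the reference genome, and the two dictionaries passed to `r_squared`.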
Gene names were kept consistent with PAO1 gene naming conventions when predicted protein domains and synteny for genes from different types indicated obvious homologies. The nonribosomal peptide synthetase genes were given gene names consistent with the PAO1 nonribosomal peptide synthetase genes with a number added indicating their pyoverdine type. Type-specific genes were named based on predicted protein domains.
To search for additional type-specific genes outside the divergent pyoverdine region, nucleotide alignments were made between the genomes of type I strains PAO1 and PA14, the unfinished genome of a type II strain (data not shown), and the 0.5X shotgun sequence reads of three type II strains (
64). Type III sequence data were not available for comparison outside of the pyoverdine region.
Tests for positive selection.
We used a maximum-likelihood method (
73) to estimate whether a gene had experienced positive selection, and if so, we identified the specific codons that were under positive selection. Maximum-likelihood models allow one to test a great number of combinations of parameter values and give a score to each combination according to how well the parameters fit the data; the combination with the best score fits the given data best. Our data were a multiple alignment of nucleotide sequences obtained for each gene in the pyoverdine region. Also input to the models was a phylogenetic tree that was constructed for each gene. We used models whose parameters included the nonsynonymous-to-synonymous ratio ω, the transition-to-transversion ratio κ, and equilibrium codon frequencies π in each codon. ω is a measure of how frequently single-base-pair differences change the amino acid coded for in a codon (as opposed to silent, or synonymous changes): ω < 1 suggests purifying selection, ω = 1 suggests neutral evolution, and ω > 1 suggests positive selection. Values for ω and κ were estimated from the data, and codon frequencies were estimated from nucleotide base frequencies at each of the three codon positions. The genes with the highest scoring combination of parameter values that included a ω greater than 1 were evaluated as being under positive selection.
In practice, maximum likelihood does not test all possible parameter values. One must provide a starting point for each parameter, and the software then incrementally changes the parameter values until it reaches a local high score. For each test, we used starting values of 0.1, 1.0, and 3.0 for ω to ensure a broad sampling of likelihood space. When the likelihood scores obtained from each of these starting values agreed, we were confident that we had found the global maximum and not merely a local one. Below are descriptions of two pairs of nested models that have detected positive selection in other systems and which we used to test pyoverdine genes for positive selection.
Model M0 assumed one class of ω for all sites (aligned codons). Model M3 assumed three classes of ω (ω0, ω1, and ω2), each with a respective proportion of sites (p0, p1, and p2). To compare the models, we calculated twice the difference between the maximum log likelihoods of models M3 and M0. This statistic was compared to a chi-square distribution with four degrees of freedom, the difference in the number of free parameters between the models. Evidence for positive selection in a gene required two conditions: that model M3 assign a proportion of codons in the gene to a class with ω > 1, and that the likelihood of model M3 be significantly greater than that of model M0.
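The model comparison described above is a likelihood ratio test, which can be sketched as follows. The log-likelihood values in the example are hypothetical and the function names are illustrative. The chi-square upper-tail probability uses the closed form that holds for an even number of degrees of freedom, which covers the four degrees of freedom used for M0 versus M3.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p value) for nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(stat, df)


# HYPOTHETICAL log likelihoods, for illustration only.
lnl_m0 = -5000.0
lnl_m3 = -4980.0
stat, p_value = likelihood_ratio_test(lnl_m0, lnl_m3, df=4)
print(f"2*delta lnL = {stat:.1f}, p = {p_value:.2e}")
```

A significant p value alone would not be taken as evidence of positive selection; as stated above, model M3 must also place some proportion of codons in a class with ω > 1.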
To identify specific codons under positive selection, we compared two additional models. Model M7 assumed a beta distribution for values of ω in a gene. Model M8 assumed a beta distribution with an added class of ω for a proportion of sites p1. The beta distribution was used because it allowed for a range of ω values in a gene, but only values between 0 and 1. If some codons in a gene were under positive selection, then model M8 would fit the data better because it allowed a fraction of codons to have ω > 1. Two parameters, p and q, described the shape of the beta distribution. Values for p and q as well as ω and p1 were estimated from the data. We compared twice the log-likelihood difference between models M7 and M8 to a chi-square distribution with two degrees of freedom. Evidence for positive selection at specific codons required two conditions: that model M8 fit the data significantly better than model M7, and that model M8 predict a fraction of codons with ω > 1. When positive selection was predicted, we identified the specific codons involved by calculating posterior probabilities at each codon with Bayes' theorem. In other words, given that model M8 assigned a fraction of codons to the class with ω > 1, we calculated the probability that each codon belonged to this class.
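The posterior calculation can be illustrated with a simplified sketch. It assumes that the model has been reduced to a finite list of ω classes, each with a prior proportion of sites, and that the likelihood of the data at one codon under each class is already available. The numbers and the function name are hypothetical, and the sketch omits how the beta distribution would be discretized into classes.

```python
def posterior_class_probs(site_likelihoods, proportions):
    """Bayes' theorem for one codon site.

    site_likelihoods[k] is the likelihood of the site data given class k;
    proportions[k] is the prior proportion of sites in class k.
    Returns the posterior probability of each class for this site.
    """
    if len(site_likelihoods) != len(proportions):
        raise ValueError("one likelihood is needed per class")
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every class")
    return [j / total for j in joint]


# HYPOTHETICAL two-class example: class 0 is conserved, class 1 has omega > 1.
proportions = [0.9, 0.1]
site_likelihoods = [1e-6, 5e-5]
posterior = posterior_class_probs(site_likelihoods, proportions)
print([round(value, 3) for value in posterior])
```

A codon would be reported as positively selected when its posterior probability for the ω > 1 class exceeds a chosen cutoff; the cutoff is not specified in this section.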
The program codeml from the PAML software package was used to test genes for positive selection (
72). Amino acid multiple alignments were created with ClustalW (
67). Alignments were then back-translated to nucleotide sequences and a neighbor-joining tree was generated with the program Neighbor with the phylogenetic software package PHYLIP (
18).
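Back-translation of an amino acid alignment can be sketched as threading each ungapped coding sequence onto its gapped protein row, so that every residue maps to its original codon and every gap becomes three gap characters. The function name and the use of "-" as the gap character are assumptions made for illustration; the text does not say which tool performed this step.

```python
def back_translate(aligned_protein, cds):
    """Thread an ungapped coding sequence onto a gapped protein alignment row.

    Each residue consumes the next codon of cds; each "-" becomes "---".
    Codons beyond the last residue, such as a stop codon, are ignored.
    """
    n_residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    codons = []
    position = 0
    for ch in aligned_protein:
        if ch == "-":
            codons.append("---")
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Hypothetical two-sequence alignment, for illustration only.
protein_rows = {"seqA": "MK-L", "seqB": "MKAL"}
coding = {"seqA": "ATGAAACTGTAA", "seqB": "ATGAAGGCGCTGTGA"}
codon_alignment = {
    name: back_translate(row, coding[name]) for name, row in protein_rows.items()
}
for name, row in codon_alignment.items():
    print(name, row)
```

Because the codon alignment preserves the reading frame, every column triplet corresponds to one aligned codon, which is what codon-based models of ω require.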
To identify the locations of positively selected amino acids in FpvA, we aligned the amino acid sequence of FpvA from PAO1 to sequences from three
E. coli proteins of known structure: ferric citrate transporter FecA (
19), ferric enterobactin transporter FepA (
8), and ferrichrome-iron transporter FhuA (
20). Each of these proteins is in the same protein family as FpvA, and contains a TonB-dependent receptor domain (PF00593).
Nucleotide sequence accession number.
Refer to Table
1 for sequence accession numbers.
DISCUSSION
Major genomic differences between strains of
P. aeruginosa fall into two broad categories. Some strains contain an insertion of one or more genes that are not present in other strains. Examples include the mobile genetic islands pKLK106 and pKLC102 (
31) and pathogenicity islands PAGI-1 (
34), PAGI-2(C), PAGI-3(SG) (
33), and PAPI-1 and PAPI-2 (
25). These insertion-deletion polymorphisms do not occur randomly throughout the genome, but often in hotspots such as specific tRNA genes. Plasmids are similar in that they are present or absent in different strains. In contrast, the pyoverdine region is like the O-antigen biosynthetic region (
56), the
pilA locus (
28), and the flagellar glycosylation region (
5): each is present in all strains, but the genes in each locus are highly divergent between strains. This replacement island phenomenon presumably results from diversifying selection, a type of selection that maintains multiple alleles in a population.
It is interesting that alleles of pvdE or fpvA are more similar to genes from other soil bacteria than to other P. aeruginosa alleles. This situation is typical of a region under diversifying selection, where divergent alleles predating a speciation event can be inherited in both new species. However, since there is unusual codon usage and tetranucleotide usage in some pyoverdine types, horizontal transfer seems a more probable explanation for the trans-species polymorphism observed at this locus. Since the pyoverdine region does not show unusual G+C content, perhaps some pyoverdine genes originated in another organism with high G+C content. Other Pseudomonas species are primary candidates for the source of divergent alleles, but high similarity to fpvA was also seen in the high-G+C soil bacteria Agrobacterium tumefaciens and Azotobacter vinelandii.
The pyoverdine outer membrane receptor gene
fpvA may drive diversity at the pyoverdine locus. To evolve a new pyoverdine type, both the nonribosomal peptide synthetase and the receptor must coevolve, maintaining their mutual specificity. Receptors are typically specific to a particular pyoverdine structure; however, multiple specificity has been reported, such as in the putative uptake of type II pyoverdine by the type III FpvA receptor (
17,
38,
41). Our analysis identifies
fpvA as the most divergent gene alignable between types in the region and under positive selection.
fpvA is chromosomally located among the most divergent genes in the region and has the only intratype variation that is not a result of recombination between types.
Type II
fpvA is an entry site for pyocin S3, a molecule made by some
P. aeruginosa strains that is taken up by and lethal to some other strains (
7). An alternative type I receptor,
fpvB, was recently identified; the receptor is present in strains of each pyoverdine type and is regulated differently than
fpvA, suggesting that there is selection to maintain alternative modes of regulation in the receptor (
21). Viewed together, these aspects of
fpvA function suggest a dynamic evolutionary history, where change in the receptor gene leads to further changes in the system of pyoverdine genes. Adjacent to
fpvA, the nonribosomal peptide synthetase genes
pvdI,
pvdJ, and
pvdD and the putative ABC transporter
pvdE are also highly divergent and probably coevolve with
fpvA. This coevolution would select against recombination events between types that separate these genes, particularly selecting against separation of the receptor gene and the nonribosomal peptide synthetase genes. Repeated rounds of receptor change followed by compensatory mutations elsewhere could result in rapid divergence between pyoverdine sequence types.
Diversifying selection seems to be acting on type-specific genes regardless of their chromosomal location. pvdY is separated by 30 kb from the divergent pyoverdine region but has type-specific divergence. This gene, whose function is currently unknown, presumably has a type-specific function in the pyoverdine system even though it is distantly located from other type-specific genes. It remains possible that there are still other type-specific pyoverdine genes elsewhere in the genome.
One of the most puzzling results of this study is the mixture of alleles in strain 1-60. Two genes in this type II strain carry type I alleles, which are highly divergent from the type II alleles they replace. To have become divergent, each allele must have had a selective advantage and must have evolved separately from alleles of other types for a relatively long period of time. Yet we now observe this mixture of alleles functioning together. If alleles of different types can function together, how were they able to diverge from one another?
Lastly, we considered possible sources of diversifying selection at the pyoverdine locus of
P. aeruginosa. Outer membrane protein genes such as the pyoverdine receptor
fpvA are common targets for entry by phage or pyocins, and siderophore diversity may be a resistance mechanism. Siderophore diversity may also be a defense against ferrisiderophore stealing. Siderophore production is a cooperative behavior, since diffusion makes the ferrisiderophore available to any cell with an appropriate receptor. However, all cooperative activities invite cheating (
23,
70). Many bacterial genome sequences, including the
P. aeruginosa genome, contain numerous putative siderophore receptor genes without the corresponding siderophore synthesis genes (
15,
49,
69). In iron-poor environments, it may be beneficial for a bacterial strain to make a siderophore that is distinct in structure from the major varieties present in the environment. This evolutionary dynamic could lead to continual generation of new siderophores whose selective advantage to a particular strain is constantly compromised when other strains acquire a compatible receptor.