INTRODUCTION
Molecular typing of pathogen populations is essential to gain insight into their genetic diversity and population dynamics in order to elaborate efficient strategies for disease control (
1,
2). In agricultural systems, pests are ideally controlled by integrated approaches, including eradication or treatment of diseased organisms and planting of resistant varieties. However, the durability of resistance can be challenged if pathogen diversity is significant. Importantly, gene flow between pathogen populations can facilitate the breakdown of resistance in crop plants (
3). Hence, efficient and precise molecular-typing tools for identifying strains and differentiating among related bacterial isolates are essential for microevolutionary reconstruction as a population genetics approach for integrated plant protection.
Rice, one of the major crops worldwide, is affected by two bacterial diseases that are caused by strains of
Xanthomonas oryzae: bacterial leaf blight (BLB), caused by
X. oryzae pv. oryzae, and bacterial leaf streak (BLS), caused by
X. oryzae pv. oryzicola. Collectively, these two diseases cause significant yield losses in tropical and temperate rice-growing areas.
X. oryzae pv. oryzae colonizes xylem vessels upon entry into the vascular system.
X. oryzae pv. oryzicola infects the plant via natural openings and colonizes the mesophyll (
4). Genomes of members of both pathovars have been sequenced; however, the determinants of tissue specificity are still largely unknown (
5,
6). While
X. oryzae pv. oryzicola has been shown to be seedborne and seed transmitted (
7,
8), the evidence that
X. oryzae pv. oryzae is seedborne is still controversial (
7,
9).
Because both pathogens infect the same host species and cause symptoms that may be difficult to distinguish at later stages of infection, neither BLB nor BLS is easy to diagnose unambiguously in the field. In the laboratory, strains of the two pathovars are identified by inoculation methods on susceptible host plants (
10). Upon leaf inoculation,
X. oryzae pv. oryzicola colonizes the mesophyll, resulting in water-soaked lesions, while
X. oryzae pv. oryzae colonizes and spreads in the vascular system, resulting in long lesions along the leaf blade (
4,
10). To complement the visual assessment of symptoms with a more reliable diagnosis, a genomics-based multiplex PCR has been developed to differentiate the two pathovars (
11). Recently, these diagnostic loci were converted into a loop-mediated isothermal amplification (LAMP) assay (
12). In addition to pathovar discrimination, other assays were developed that can differentiate
X. oryzae strains from Africa and Asia (
13). Currently, phenotypic, multiplex PCR, and LAMP methods are routinely used by several laboratories to identify, characterize, and detect
X. oryzae strains.
BLB was first described in Fukuoka Prefecture, Japan, in 1884 (
14,
15) and has been reported in a number of rice-growing countries from Iran to Japan and the Philippines (
16). The disease was also reported in African countries, including Mali, Senegal, Niger, Gabon, Nigeria, Cameroon, Burkina Faso, Mauritania, and the Gambia (
17). Reports for northern Australia (
18) and South America (
19) exist but are rather sporadic and less important (
16).
BLS was first described in the Philippines in 1918 and is prevalent in Asia (
16), as well as in Africa in Mali (
20) and Burkina Faso (
21), and was recently reported in Madagascar (
22), Burundi (
23), and Uganda (
24).
So far, studies have focused on
X. oryzae pv. oryzae, describing the genetic and pathotypic diversity of different populations. Most of these studies analyzed regional populations of
X. oryzae pv. oryzae, mainly using pathogenicity assays and/or restriction fragment length polymorphism (RFLP) profiling (
25–30). For instance, Mishra and coworkers (2013) analyzed more than 1,000 isolates of
X. oryzae pv. oryzae based on their reactions to 10 resistance genes. They also differentiated a subset of strains by an RFLP analysis using IS
1112 as a probe. However, the study included only strains from India (
30).
Much less is known about
X. oryzae pv. oryzicola diversity. In the Philippines, one study evaluated the diversity of 123
X. oryzae pv. oryzicola strains by RFLP analysis, concluding that the pathovar is endemic due to the large diversity of strains (
31). A repetitive-element palindromic (rep)-PCR analysis of 141 Chinese
X. oryzae pv. oryzicola strains revealed significant genetic diversity among
X. oryzae pv. oryzicola strains in southwest China (
32). One of the most exhaustive phylogenetic studies of
X. oryzae was performed by Gonzalez and coworkers, who analyzed a set of 26 Asian
X. oryzae pv. oryzae strains, 21 African
X. oryzae pv. oryzae strains, and 14
X. oryzae pv. oryzicola strains from different origins (
20). Using a polyphasic approach, including RFLP, rep-PCR, and fluorescent amplified fragment length polymorphism (AFLP), three lineages were defined within the species as Asian
X. oryzae pv. oryzae, African
X. oryzae pv. oryzae, and
X. oryzae pv. oryzicola (from Asia and Africa). More recently, multilocus sequence typing (MLST) targeting nine housekeeping genes of a few strains of
X. oryzae confirmed the designation of the three lineages (
33). Other studies combining multilocus sequence analysis (MLSA) with the analysis of the type III effector repertoires of 40
X. oryzae strains belonging to different pathovars and from different geographical origins clearly confirmed that both
X. oryzae pv. oryzae and
X. oryzae pv. oryzicola pathogens belong to closely related but distinct phylogenetic groups formerly defined as lineages (
20,
34). Finally, MLST and RFLP analyses focusing on a large collection of
X. oryzae pv. oryzicola strains revealed a high level of genetic diversity among African strains (
35).
Molecular methods previously used to evaluate the genetic diversity of
X. oryzae were either cumbersome, poorly reproducible, insufficiently discriminative, or difficult to interpret evolutionarily. Over the past several years, MLST based on a set of housekeeping genes has become popular to investigate microbial populations of bacterial pathogens (
36). However, this technique does not have enough resolution for an in-depth study of
X. oryzae populations or epidemic outbreaks. Now, with easy access to nearly complete genome sequences, single-nucleotide polymorphism (SNP) analyses of genomes receive more attention due to the unprecedented increase in resolution they provide and the consequent possibility of investigating epidemics more accurately (
37,
38). Although the costs of genome sequencing have decreased tremendously in the last few years, such typing methods are unlikely to become widely adopted as a standard due to bioinformatic and infrastructural constraints. Most developing countries that are concerned with epidemiological surveillance of plant diseases are not yet equipped for such analyses.
First reported in eukaryotic species, DNA motifs that are repeated in multiple copies were identified in bacterial genomes (
39). Variation between strains is reflected by changes in the size of these repeat arrays, which are called variable-number tandem repeats (VNTRs) or mini- or microsatellites. Among the different mutation models proposed for microsatellites, the stepwise mutation model (SMM) has been widely adopted, even though large jumps in repeat numbers may occasionally occur (
40,
41). The SMM postulates that the size of a VNTR locus evolves through the addition or deletion of one repeat unit per mutation event. Consequently, VNTR loci provide data in which related profiles can be connected, reflecting patterns of evolutionary descent that can be used for epidemiological tracing of bacterial strains. The flanking regions next to the repeats are generally well conserved, sometimes even among different species. PCR primers can therefore be designed to analyze VNTR polymorphisms at different levels, e.g., the species or subspecies level (
42). Since 2001, shortly after the first genome sequence became available, VNTR studies of bacterial plant pathogens became more and more popular, as exemplified by
Xylella fastidiosa (
43). In 2009, the first VNTR typing scheme was developed for
Xanthomonas species (
44). Later, a 25-locus-based VNTR scheme (named MLVA-25) was developed for
X. oryzae pv. oryzicola (
45) and evaluated on a limited collection of
X. oryzae pv. oryzicola strains. Preliminary
in silico analyses indicated that a few loci would be useful to characterize the
X. oryzae pv. oryzae strains, as well, but the discriminatory power would have been rather low for the two
X. oryzae pv. oryzae lineages in comparison to the
X. oryzae pv. oryzicola lineage. Hence, a highly discriminatory multilocus variable-number tandem-repeat analysis (MLVA) scheme that could at the same time identify the different
X. oryzae pathovars and distinguish strains within pathovars would be very useful.
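The stepwise mutation model described above can be illustrated with a short sketch. Under the SMM assumption that each mutation adds or removes one repeat unit, the minimum number of mutation events separating two strains is the sum of the absolute differences in repeat number across loci. The function name and the repeat counts below are hypothetical and are not data from this study.

```python
from typing import Sequence


def smm_steps(profile_a: Sequence[int], profile_b: Sequence[int]) -> int:
    """Minimum number of single-repeat mutation events between two profiles.

    Each profile lists the repeat number at every VNTR locus, in the same
    locus order. Under the SMM, a difference of k repeats at one locus
    requires at least k mutation events.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


# Hypothetical repeat counts at four loci (illustration only).
strain_x = [7, 12, 5, 9]
strain_y = [8, 12, 3, 9]
print(smm_steps(strain_x, strain_y))  # 1 + 0 + 2 + 0 = 3
```

This distance ignores the occasional large jumps in repeat number mentioned above, so it should be read as a lower bound on the number of events rather than an estimate of divergence time.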
Here, we report on a new multilocus VNTR analysis that allows the production of robust and reproducible genetic data to efficiently characterize X. oryzae strains and to study epidemics of BLB and BLS on rice. For this purpose, a collection of 338 strains of the X. oryzae pathovars oryzae and oryzicola, originating from 20 countries, was analyzed. Both global and small-scale MLVAs of the three X. oryzae genetic lineages are discussed in the context of the geographical origins of the strains, the sampling dates, and the host plants. With this work, we wish to promote the worldwide use of MLVA-16 (a 16-locus-based VNTR scheme) for monitoring of BLB and BLS on various temporal and geographical scales.
DISCUSSION
We developed a new MLVA scheme that identifies the different lineages of X. oryzae, i.e., the pathovars and, within X. oryzae pv. oryzae, the lineages of different continental origin, and that shows high discriminatory power on small scales in space and time.
On a global scale, the MLVA-16 scheme confirmed the lineage differentiation that was largely described in previous typing and phylogenetic studies (
20,
33,
34,
55). However, on a smaller scale, the MLVA-16 scheme was shown to be more discriminatory than other previously used molecular-typing tools. The MLVA-16 scheme could discriminate strains from the same region and even strains originating from the same field. Interestingly, epidemiologically related strains kept a signature of their relationship by descent on this small spatiotemporal scale.
Recombination does not strongly bias the phylogenetic signal.
Even though the association indices are quite low for some lineages, linkage disequilibrium was highly significant. Moreover, distance matrices obtained from MLVA-16 and from a different genotyping technique (MLSA) were significantly correlated. These lines of evidence support low levels of recombination, as has also been suggested for other
Xanthomonas species (
56). However, one needs to be cautious, because these observations could result from sampling biases. For instance, geographical isolation might have occurred in our collections, consequently promoting high linkage disequilibrium (
57). To test this hypothesis, the locus association should be tested on a densely sampled population, i.e., one with more strains isolated on a small spatiotemporal scale. Nevertheless, our results support the usefulness of the 16 VNTR markers to investigate demographic or epidemiological patterns.
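A common way to test whether two distance matrices are correlated is a Mantel permutation test. The text above does not state which procedure was applied, so the following is only a minimal sketch of that general technique, with hypothetical function names and toy matrices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def upper_triangle(matrix, order):
    """Pairwise distances above the diagonal, with strains reordered by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (correlation, one-sided permutation p-value) for two matrices."""
    identity = list(range(len(dist_a)))
    flat_b = upper_triangle(dist_b, identity)
    observed = pearson(upper_triangle(dist_a, identity), flat_b)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel strains in one matrix only
        if pearson(upper_triangle(dist_a, order), flat_b) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy symmetric distance matrices for four strains (illustration only).
toy_mlva = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 1], [5, 4, 1, 0]]
toy_mlsa = [[0, 2, 8, 10], [2, 0, 6, 8], [8, 6, 0, 2], [10, 8, 2, 0]]
r, p = mantel(toy_mlva, toy_mlsa)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations. With only four strains the number of distinct relabelings is small, so the p-value from this toy input carries no weight.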
As a special case, a distance matrix obtained from a
tal gene-based RFLP data set was not correlated with an MLVA-16-based distance matrix when applied to a set of Malian and Burkina Faso
X. oryzae pv. oryzicola strains.
tal genes, which consist of tandem repeats with a large repeat unit (
58), contribute to the colonization of host plants in a compatible interaction or trigger specific defenses in an incompatible interaction. Hence,
tal gene-based markers are expected to be under selection. Moreover, the RFLP band sizes of
tal genes do not always reflect their functional relatedness, i.e., identical
tal RFLP bands can belong to functionally unrelated
tal genes while different
tal RFLP bands can correspond to functionally analogous
tal genes. Together, these considerations may explain why a matrix generated by this marker is not correlated with a matrix obtained from neutral markers, such as most VNTRs.
MLVA-16 as a tool to identify lineages.
The MLVA-16 scheme produced some private and monomorphic alleles for each genetic lineage. Consequently, the different lineages of X. oryzae can be distinguished. In addition, other lineage-polymorphic loci also contribute to this differentiation. Hence, MLVA-16 directly identifies the pathogen responsible for a disease, i.e., BLB or BLS. Since BLB and BLS symptoms can be confused in the field at very early and late stages of the diseases, and also in cases of double infection, this information is important for efficient disease management. In the context of prior knowledge about prevalent BLB and BLS pathogens in a certain geographic area, the ability to distinguish between the two pathovars helps in deployment of disease-resistant rice cultivars and/or application of specific antibacterial agents. Moreover, the ability to distinguish between BLS and BLB might be helpful for quarantine purposes, especially with the increase in rice seed trade between countries.
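The notions of private and monomorphic alleles used above can be made concrete with a small sketch. An allele is treated as private to a lineage when it occurs in no other lineage, and a locus as monomorphic within a lineage when all of its strains carry the same allele. The lineage labels, function names, and repeat counts are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def alleles_by_lineage(strains, locus):
    """Map each lineage to the set of alleles observed at one locus."""
    seen = defaultdict(set)
    for lineage, profile in strains:
        seen[lineage].add(profile[locus])
    return dict(seen)


def private_alleles(strains, locus):
    """Alleles at `locus` that occur in exactly one lineage, per lineage."""
    seen = alleles_by_lineage(strains, locus)
    result = {}
    for lineage, alleles in seen.items():
        elsewhere = set()
        for other, other_alleles in seen.items():
            if other != lineage:
                elsewhere |= other_alleles
        result[lineage] = alleles - elsewhere
    return result


def is_monomorphic(strains, locus, lineage):
    """True if every strain of `lineage` carries the same allele at `locus`."""
    return len(alleles_by_lineage(strains, locus)[lineage]) == 1


# Hypothetical two-locus profiles (illustration only).
toy_strains = [
    ("Asian Xoo", (5, 3)),
    ("Asian Xoo", (5, 4)),
    ("African Xoo", (2, 3)),
    ("African Xoo", (2, 3)),
    ("Xoc", (9, 3)),
]
print(private_alleles(toy_strains, 0))
print(is_monomorphic(toy_strains, 0, "Asian Xoo"))
```

In this toy input, locus 0 alone would assign a strain to its lineage, whereas locus 1 is polymorphic within one lineage and shared among the others.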
FST indices of diversity suggest that African
X. oryzae pv. oryzae strains are closer to
X. oryzae pv. oryzicola than to Asian
X. oryzae pv. oryzae, as shown previously by Gonzalez and coworkers using RFLP, AFLP, and rep-PCR. However, these results are challenged by an MLSA of the three housekeeping genes
gyrB,
rpoD, and
glnA at a lower resolution, where the two
X. oryzae pv. oryzae lineages group together and are more distant from
X. oryzae pv. oryzicola (
34). Interestingly, another MLSA using a different set of housekeeping genes,
fusA,
gyrB, and
gapA, rather supported our scenario, i.e., that African
X. oryzae pv. oryzae strains are closer to
X. oryzae pv. oryzicola than to Asian
X. oryzae pv. oryzae (
33). Apparently, the choice of markers (with only a few polymorphisms) used for such analysis plays an important role in the discrimination of lineages. To resolve this problem, a genome-wide SNP analysis or an MLSA of the core genome could be performed (
55). In conclusion, MLVA-16 differentiates lineages well on a global scale.
MLVA-16 has limited value for large-scale epidemiology.
First reported in Asia,
X. oryzae pv. oryzae and
X. oryzae pv. oryzicola were described decades later in Africa (
4,
20). It has been assumed that BLB and BLS were introduced accidentally from Asia to Africa. The fact that both African
X. oryzae pv. oryzae and
X. oryzae pv. oryzicola showed less allelic richness than Asian
X. oryzae pv. oryzae and
X. oryzae pv. oryzicola supports an Asian ancestor for African strains. However, even though this study was conducted on a large strain collection, additional and more extensive sampling will be necessary to confirm an Asian origin of the African populations of
X. oryzae.
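Comparing allelic richness between collections of unequal size usually requires a correction for sample size. The text above does not specify how richness was computed, so the sketch below only illustrates one standard option, rarefaction, which gives the expected number of distinct alleles in a random subsample of g strains. The function name and the example values are hypothetical.

```python
from collections import Counter
from math import comb


def rarefied_richness(alleles, g):
    """Expected number of distinct alleles in a random subsample of size g.

    `alleles` holds the allele carried by each sampled strain at one locus.
    For an allele observed c times among n strains, the probability that it
    is missing from a subsample of size g is C(n - c, g) / C(n, g).
    """
    n = len(alleles)
    if not 0 < g <= n:
        raise ValueError("g must be between 1 and the sample size")
    counts = Counter(alleles)
    return sum(1 - comb(n - c, g) / comb(n, g) for c in counts.values())


# Hypothetical repeat numbers at one locus (illustration only).
large_sample = [4, 4, 5, 5, 6, 7, 7, 8]
small_sample = [4, 4, 4, 5]
# Compare both collections at the size of the smaller one.
print(rarefied_richness(large_sample, 4))
print(rarefied_richness(small_sample, 4))
```

Averaging this quantity over all loci gives a single richness value per lineage that is comparable across collections of different sizes.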
On a global scale, the MLVA-16 scheme revealed hardly any shared or related haplotypes (SLVs and DLVs) among strains from different countries. Similar findings were obtained with a microsatellite-based MLVA-14 of
X. citri (
46). Generally, rapidly evolving microsatellites do not allow links to be drawn among large numbers of strains over very large geographic scales or long time spans. Therefore, another MLVA scheme based on tandem repeats (TRs) with a slower molecular clock, i.e., minisatellites, in which repeat units are longer, would be more appropriate for large-scale epidemiology. Indeed, as shown by N′Guessan and coworkers for
Ralstonia solanacearum, the number of alleles in a large collection of strains decreases with an increase in the repeat unit size (
59). Similarly, Pruvost and coworkers demonstrated the superiority of a minisatellite-based scheme over a microsatellite-based scheme for large-scale epidemiology of
X. citri (
46). Preliminary analyses have shown that no useful VNTR loci with repeat unit sizes above 12 bp are shared among the three
X. oryzae lineages, thus preventing the development of a universal scheme. In the future, lineage-specific minisatellite schemes will be developed for global epidemiological monitoring.
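The relationships referred to above as SLVs and DLVs (single- and double-locus variants) depend only on the number of loci at which two haplotypes differ. A minimal sketch of that classification follows. The function names and profiles are hypothetical, and the label used for more distant pairs is an arbitrary choice made for this example.

```python
def locus_differences(profile_a, profile_b):
    """Number of loci at which two haplotypes carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def variant_class(profile_a, profile_b):
    """Classify a pair of haplotypes as identical, SLV, DLV, or more distant."""
    labels = {0: "identical", 1: "SLV", 2: "DLV"}
    return labels.get(locus_differences(profile_a, profile_b), "more distant")


# Hypothetical five-locus haplotypes (illustration only).
founder = (6, 9, 4, 11, 7)
print(variant_class(founder, (6, 9, 4, 12, 7)))  # one locus differs -> SLV
print(variant_class(founder, (6, 8, 4, 12, 7)))  # two loci differ -> DLV
print(variant_class(founder, (5, 8, 3, 12, 7)))  # four loci differ -> more distant
```

Unlike a stepwise distance, this count treats a difference of one repeat and a difference of several repeats at the same locus as a single change.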
Despite these concerns, MLVA-16 provided some important information on a larger geographical scale. For instance, we identified two Chinese X. oryzae pv. oryzicola strains (Xoc-China and HN-DA-1) that differed greatly from the other Asian X. oryzae pv. oryzicola strains (only 7 alleles in common with the closest Asian X. oryzae pv. oryzicola strain). Our results suggest that these strains are more closely related to African X. oryzae pv. oryzicola strains (10 alleles in common with the closest African X. oryzae pv. oryzicola strain). This finding coincides with the recent increase of the seedborne BLS disease in Africa and with tighter commercial links between Africa and China. Moreover, the two strains from South America, which are related to two Philippine haplotypes, support a scenario with distinct introduction events into South America, perhaps from the Philippines. Finally, two African X. oryzae pv. oryzae strains from different countries, BAI3 from Burkina Faso and NAI8 from Niger, shared the same haplotype. This finding may reflect the exchange of material between western African countries.
In conclusion, even if the MLVA-16 scheme is not ideal for large-scale epidemiology with slow migration, it is still a valuable tool for present-day situations in which migration is very fast because of human-mediated transport. To demonstrate the power of this tool, we would need a larger set of strains representative of the total diversity. Currently, access to strains or even genomic DNA of X. oryzae is limited due to quarantine restrictions or regulatory issues concerning biodiversity in the different countries. The MLVAbank database, which makes our data accessible for analyses and comparisons, will help to coordinate international efforts to understand the epidemiology of X. oryzae.
MLVA-16 as a new tool for local epidemiological monitoring.
Our results show that the MLVA-16 scheme is very useful for local epidemiological surveillance on a regional or country-wide scale for all three
X. oryzae lineages. First, African
X. oryzae pv. oryzae strains sampled in two Malian regions (Niono and Sélingue) in the same period (2010 and 2012) showed closely related haplotypes. Their allelic profiles indicate a clonal expansion from an unidentified founder. However, the Malian collection showed a dynamic population structure, since Malian strains sampled in 2010 and 2012 are distant from Malian strains sampled up to 2009. Nevertheless, a phylogenetic signature of relatedness remained while epidemiological information was lost, because strains of the two complexes differ by at least nine alleles. Second, on a slightly larger scale, a set of
X. oryzae pv. oryzicola strains sampled in southern China in 2011 shared identical or similar haplotypes, as shown by the minimum spanning tree and eBURST analysis (
Fig. 4; see Fig. S5 in the supplemental material). Third, 24 Philippine
X. oryzae pv. oryzae strains sampled over 9 years in Laguna Bay are linked in a clonal complex. It would be interesting to clarify how such a limited number of haplotypes was maintained for many years in this region. Moreover, since several very different haplotypes of Philippine
X. oryzae pv. oryzae were found to coexist within this region, further investigation by extensive sampling would be of high interest.
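The idea of a clonal complex can be sketched as a grouping of haplotypes that are connected through chains of close variants. The code below is a simplified single-linkage grouping and not the eBURST or minimum-spanning-tree procedures cited above. The linkage threshold (one differing locus), the strain names, and the profiles are assumptions made for illustration.

```python
def differing_loci(profile_a, profile_b):
    """Number of loci at which two haplotypes differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def clonal_complexes(haplotypes, max_diff=1):
    """Group strains linked, directly or through a chain, by <= max_diff loci.

    `haplotypes` maps a strain name to its allelic profile. The result is a
    sorted list of groups, each given as a sorted list of strain names.
    """
    unvisited = set(haplotypes)
    complexes = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            linked = {
                name
                for name in unvisited
                if differing_loci(haplotypes[current], haplotypes[name]) <= max_diff
            }
            unvisited -= linked
            group |= linked
            frontier.extend(linked)
        complexes.append(sorted(group))
    return sorted(complexes)


# Hypothetical three-locus haplotypes (illustration only).
toy_haplotypes = {
    "A": (1, 1, 1),
    "B": (1, 1, 2),  # one locus away from A
    "C": (1, 2, 2),  # one locus away from B, two from A
    "D": (5, 5, 5),  # unrelated to the others
}
print(clonal_complexes(toy_haplotypes))  # [['A', 'B', 'C'], ['D']]
```

Strain C joins the complex through B even though it differs from A at two loci, which mirrors how a limited set of related haplotypes sampled over several years can remain linked in one complex.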
Our results show that the MLVA-16 scheme holds great potential for small-scale reconstruction of population dynamics. Such an improved understanding of microevolution will be key for short- and medium-term management of control strategies for X. oryzae, particularly by deciphering the pathways of dispersal both by natural transmission and by human-mediated transmission (including seed transmission).
Conclusions.
The MLVA-16 scheme will be useful for discriminating among
X. oryzae strains with high resolution and for identifying the lineages and pathovars of these pathogens. MLVA-16 will further allow us to analyze new outbreaks and epidemics of both pathovars of
X. oryzae. Additional samplings at the population level will be a further step to evaluate the efficiency of MLVA-16 by describing the patterns of descent of the strains and the population structures. Specifically, recently isolated
X. oryzae pv. oryzicola strains from eastern and central Africa could be characterized (
22–24).
Directions for use.
In cases where the pathovar and geographical origin are known, e.g., upon multiplex PCR (
13), one could omit the one primer pool of the four that corresponds to loci that are largely monomorphic within the lineage in question. Hence, one would use a subset of primer pools in a multiplex MLVA-12 scheme that is adapted to the pathovars and origins of the isolates. Asian
X. oryzae pv. oryzae strains need to be analyzed with pools 1, 2, and 4 (MLVA-12a); African
X. oryzae pv. oryzae strains need pools 1, 2, and 3 (MLVA-12b); and
X. oryzae pv. oryzicola strains need pools 1, 3, and 4 (MLVA-12c). Results can be analyzed and compared with our data and those of others using the public database at IRD Montpellier (
http://www.biopred.net/MLVA/), which allows sharing of information with other scientists in an interactive manner and provides access to a few phylogenetic-analysis tools (
46).
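The pool selection described above can be written down as a small lookup. The pool numbers and scheme names are taken from the text. The lineage labels used as keys and the function name are hypothetical, and falling back to all four pools when the lineage is unknown is an assumption based on the full MLVA-16 scheme.

```python
# Primer pools to run for each lineage, as given in the text.
# The key strings are hypothetical labels chosen for this example.
POOLS_BY_LINEAGE = {
    "Asian Xoo": ("MLVA-12a", (1, 2, 4)),
    "African Xoo": ("MLVA-12b", (1, 2, 3)),
    "Xoc": ("MLVA-12c", (1, 3, 4)),
}

ALL_POOLS = (1, 2, 3, 4)


def pools_for(lineage=None):
    """Return (scheme name, primer pools) for a strain of known lineage.

    When the lineage is unknown (None), all four pools are returned so that
    the full MLVA-16 scheme is run.
    """
    if lineage is None:
        return "MLVA-16", ALL_POOLS
    try:
        return POOLS_BY_LINEAGE[lineage]
    except KeyError:
        raise ValueError(f"unknown lineage: {lineage!r}") from None


print(pools_for("African Xoo"))  # ('MLVA-12b', (1, 2, 3))
print(pools_for())               # ('MLVA-16', (1, 2, 3, 4))
```

Pool 1 is required for all three lineages, so only pools 2, 3, and 4 vary with the lineage being typed.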