GENOME ANNOUNCEMENT
Frankia strains are filamentous, sporulating, aerobic actinobacteria that induce nitrogen-fixing root nodules on about 220 plant species from eight families in three orders: Fagales, Rosales, and Cucurbitales. Phylogenetic analyses place symbiotic Frankia strains into three distinct clusters (7). Genomes of two members of cluster I have been sequenced: Frankia alni strain ACN14a (7.50 Mbp; GenBank accession number NC_008278) and the Casuarina-infective Frankia sp. strain HFPCcI3 (5.43 Mbp; GenBank accession number NC_007777). Genomes of three cluster III representatives have also been sequenced: Frankia sp. strains EAN1pec (8.9 Mbp; GenBank accession number NC_009921), EUN1f (9.35 Mbp; GenBank accession number NC_014666), and EuI1c (8.8 Mbp; GenBank accession number ADGX00000000).
Strains of cluster II, which represents the basal clade of the symbiotic Frankia strains (2, 7), have not been cultured despite numerous attempts. Cluster II strains enter into nitrogen-fixing root nodule symbioses with actinorhizal species in the orders Cucurbitales (Datiscaceae and Coriariaceae) and Rosales (Rosaceae, and Ceanothus in the Rhamnaceae). Here, we announce the first genome sequence of a strain from this cluster, obtained using DNA from Frankia isolated from root nodules (3) of the American suffruticose endemic Datisca glomerata (C. Presl) Baill. (Durango root). The Frankia strain was originally sampled from soil in Pakistan that was used to infect Datisca cannabina and Coriaria nepalensis (5), and homogenates of the nodules were later used to inoculate and repeatedly reinoculate D. glomerata in greenhouses at Göttingen and Stockholm Universities. Based on the recommendation of Murray and Stackebrandt (6), we propose naming this strain “Candidatus Frankia datiscae” Dg1.
The finished genome of Dg1 was generated at the Joint Genome Institute using a combination of Illumina (1) and 454 (4) technologies. One Illumina GAii shotgun library, one 454 Titanium standard library, and one paired-end 454 library were constructed and sequenced. For finishing, gaps and misassemblies were resolved by sequencing cloned PCR fragments.
The genome sequence of Dg1 comprises 5,323,186 bp, with a GC content of 70% and 78% coding bases. It has one circular chromosome with 4,579 genes, of which 4,202 encode proteins, 56 encode structural RNAs, 2 encode rRNA operons, and 325 represent pseudogenes. This is the smallest Frankia genome sequenced so far, slightly smaller than that of strain CcI3. It has been suggested that the genome of CcI3 underwent a process of reduction while the strain's capacity for saprotrophic growth was reduced (8). The small genome size of Dg1 is also consistent with that hypothesis, considering that cluster II strains have not been cultured. Moreover, Dg1 has 325 pseudogenes, compared to 50 in CcI3, 128 in EAN1pec, and 12 in ACN14a, suggesting ongoing genome degradation. This degradation does not appear to involve genes encoding enzymes involved in glycolysis or amino acid, purine, or pyrimidine biosynthesis.
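The pseudogene comparison above can be put on a common scale by dividing each count by genome size. The following is only a minimal sketch of that arithmetic, using the figures quoted in this announcement. The per-Mbp normalization and all variable and function names are illustrative assumptions, not part of the original analysis.

```python
# Figures copied from the text above. The per-Mbp normalization is an
# illustrative assumption, not an analysis reported in the announcement.
GENOME_MBP = {"Dg1": 5.323186, "CcI3": 5.43, "EAN1pec": 8.9, "ACN14a": 7.50}
PSEUDOGENES = {"Dg1": 325, "CcI3": 50, "EAN1pec": 128, "ACN14a": 12}

DG1_GENOME_BP = 5_323_186     # total genome length in bp
DG1_CODING_FRACTION = 0.78    # 78% coding bases
DG1_TOTAL_GENES = 4_579       # total gene count as stated in the text


def pseudogenes_per_mbp(strain: str) -> float:
    """Pseudogene count divided by genome size in Mbp (hypothetical helper)."""
    return PSEUDOGENES[strain] / GENOME_MBP[strain]


# Approximate number of coding bases in Dg1.
coding_bp = round(DG1_GENOME_BP * DG1_CODING_FRACTION)

# Share of the annotated Dg1 genes that are pseudogenes.
dg1_pseudogene_share = PSEUDOGENES["Dg1"] / DG1_TOTAL_GENES

# List strains from highest to lowest pseudogene density.
for strain in sorted(GENOME_MBP, key=pseudogenes_per_mbp, reverse=True):
    print(f"{strain}: {pseudogenes_per_mbp(strain):.1f} pseudogenes per Mbp")

print(f"Dg1 coding bases: about {coding_bp:,} bp")
print(f"Dg1 pseudogene share of genes: {dg1_pseudogene_share:.1%}")
```

With these inputs, Dg1 has the highest pseudogene density of the four strains, so the contrast does not depend on differences in genome size.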
Nucleotide sequence accession number.
The “Candidatus Frankia datiscae” Dg1 genome sequence and annotation data have been deposited in GenBank under accession number NC_015656.