Frankia strains are filamentous, sporulating, aerobic actinobacteria that induce nitrogen-fixing root nodules on about 220 plant species from eight families in three orders: Fagales, Rosales, and Cucurbitales. Phylogenetic analyses place symbiotic Frankia strains into three distinct clusters (7). Genomes of two cluster I members have been sequenced: Frankia alni strain ACN14a (7.50 Mbp; GenBank accession number NC_008278) and the Casuarina isolate Frankia sp. strain HFPCcI3 (5.43 Mbp; GenBank accession number NC_007777). Three genomes of cluster III representatives have been sequenced: Frankia sp. strains EAN1pec (8.9 Mbp; GenBank accession number NC_009921), EUN1f (9.35 Mbp; GenBank accession number NC_014666), and EuI1c (8.8 Mbp; GenBank accession number ADGX00000000).
Strains of cluster II, which represents the basal clade of symbiotic Frankia, cannot be cultured despite numerous attempts. Cluster II strains form nitrogen-fixing root nodule symbioses with actinorhizal species in the orders Cucurbitales (Datiscaceae and Coriariaceae) and Rosales (Rosaceae and Ceanothus in the Rhamnaceae). Here, we announce the first genome sequence of a strain from this cluster, obtained from DNA of Frankia isolated from root nodules (3) of the suffruticose American endemic Datisca glomerata (C. Presl) Baill. (Durango root). The Frankia strain was originally obtained from soil in Pakistan that was used to infect Datisca cannabina and Coriaria nepalensis, and homogenates of the resulting nodules were later used to inoculate, and repeatedly reinoculate, D. glomerata in greenhouses at Göttingen and Stockholm Universities. Following the recommendation of Murray and Stackebrandt (6), we propose naming this strain “Candidatus Frankia datiscae” Dg1.
The finished genome of Dg1 was generated at the Joint Genome Institute using a combination of Illumina (1) and 454 (4) technologies. One Illumina GAII shotgun library, one 454 Titanium standard library, and one paired-end 454 library were constructed and sequenced. For finishing, gaps and misassemblies were resolved by sequencing cloned PCR fragments.
The genome sequence of Dg1 comprises 5,323,186 bp, with a GC content of 70% and 78% coding bases. It consists of one circular chromosome carrying 4,579 genes, of which 4,202 encode proteins, 56 encode structural RNAs (including two rRNA operons), and 325 are pseudogenes. This is the smallest Frankia genome sequenced so far, slightly smaller than that of strain CcI3 (5.43 Mbp). It has been suggested that the genome of CcI3 underwent reduction as the strain's capacity for saprotrophic growth declined (8). The small genome of Dg1 is also consistent with this hypothesis, given that cluster II strains have not been cultured. Moreover, Dg1 has 325 pseudogenes, compared with 50 in CcI3, 128 in EAN1pec, and 12 in ACN14a, suggesting ongoing genome degradation. This degradation does not appear to affect genes encoding enzymes of glycolysis or of amino acid, purine, or pyrimidine biosynthesis.
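Two of the genome statistics above, GC content and the percentage of coding bases, reduce to simple counts over the assembled sequence. The sketch below is an illustration under stated assumptions, not the pipeline used to annotate Dg1. It assumes that the sequence is held as a Python string and that coding sequences are supplied as hypothetical 0-based, half-open (start, end) intervals. A gene spanning the origin of the circular chromosome is assumed to be split into two intervals. Overlapping genes are merged so that shared bases are counted once.

```python
"""Sketch: GC content and coding-base fraction of a genome sequence.

Assumptions (illustrative only): the genome is one nucleotide string;
coding regions are hypothetical 0-based, half-open (start, end) intervals;
origin-spanning genes on a circular chromosome are pre-split into two intervals.
"""


def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (N etc. ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def coding_fraction(genome_length: int, cds: list[tuple[int, int]]) -> float:
    """Fraction of genome positions covered by at least one coding interval."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length if genome_length else 0.0


if __name__ == "__main__":
    toy = "ATGGCCGCGTAA"  # toy sequence, not Dg1 data
    print(gc_content(toy))                          # 7 of 12 bases are G or C
    print(coding_fraction(len(toy), [(0, 6), (3, 9)]))  # merged to (0, 9): 9 of 12
```

Merging intervals before summing matters for compact genomes, in which adjacent genes frequently overlap by a few bases; summing gene lengths directly would overcount the coding fraction.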
Nucleotide sequence accession number.
The “Candidatus Frankia datiscae” Dg1 genome sequence and annotation data have been deposited in GenBank under accession number NC_015656.