GENOME ANNOUNCEMENT
Actinobacteria of the genus “Candidatus Microthrix” are commonly found in biological wastewater treatment plants, where they are notorious for causing solid-liquid separation problems and foaming (see reference 14 for a recent review). They grow poorly on culture media, yet they seem to have an in situ competitive advantage mediated by their avid uptake of long-chain fatty acids, which are accumulated as neutral lipids under anaerobic conditions and converted into phospholipids for cell division under aerobic conditions (1, 13).
Here, we present the draft genome sequence of “Candidatus Microthrix parvicella” Bio17-1, a strain isolated from a Dutch wastewater treatment plant serving fish industries (10). First, two sequencing libraries were prepared from genomic DNA with mean insert lengths of 350 bp (paired ends) or 2,750 bp (mated pairs) and sequenced on an Illumina Genome Analyzer II. Raw 100-bp reads were error corrected with Quake (8). A total of 5.84 × 10^6 paired-end and 1.12 × 10^6 single-end reads with a minimum mean quality value of 30 and a minimum length of 70 bp were used for assemblies. Second, 24,031 single molecule, real-time (SMRT) sequence reads were obtained on a Pacific Biosciences PacBio RS using C1 chemistry. Error correction yielded 2,625 reads (232 to 1,984 bp).
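The read selection criteria stated above (minimum mean quality value of 30, minimum length of 70 bp) can be illustrated with a minimal Python sketch. This is not the pipeline used for the assembly: the tuple layout of a read, the function names, and the Phred+33 quality encoding are all assumptions made for illustration, and the actual tools and encoding may have differed.

```python
from typing import List, Tuple

# A read as (identifier, sequence, Phred quality scores).
# This tuple layout is a hypothetical choice for the sketch.
Read = Tuple[str, str, List[int]]

MIN_MEAN_QUALITY = 30  # threshold stated in the text
MIN_LENGTH = 70        # threshold stated in the text, in bp


def decode_phred33(quality_string: str) -> List[int]:
    """Convert a FASTQ quality string to Phred scores.

    Assumes a Phred+33 offset; older Illumina output may use Phred+64.
    """
    return [ord(ch) - 33 for ch in quality_string]


def passes_filter(read: Read) -> bool:
    """Return True if the read meets both the length and mean-quality thresholds."""
    _, sequence, qualities = read
    if len(sequence) < MIN_LENGTH or not qualities:
        return False
    mean_quality = sum(qualities) / len(qualities)
    return mean_quality >= MIN_MEAN_QUALITY


def filter_reads(reads: List[Read]) -> List[Read]:
    """Keep only reads that pass the filter, preserving input order."""
    return [read for read in reads if passes_filter(read)]
```

A real workflow would more likely stream reads from FASTQ files with an established parser rather than hold them in a list; the list form is used here only to keep the example self-contained.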
Using the Illumina sequence reads, two preliminary assemblies were obtained with Velvet (17) and Edena (7) and merged with the minimus2 utility (16). The resulting 27 contigs were scaffolded with SSPACE (3), and gaps were filled with GapFiller (4). Additional assemblies were obtained using SOAPdenovo (11) (kmer values between 65 and 81, steps of 2) and CABOG (12). Error-corrected PacBio reads (9) were mapped onto the preliminary assemblies. Draft contigs were broken where discrepancies among assemblies or PacBio reads suggested misassemblies. Conversely, contigs were joined where contig ends overlapped with perfect identity for at least 500 bp. Manual curation of the assemblies was performed using Consed (5). Automatic annotation and draft metabolic reconstruction were performed by the RAST server (2). CRISPR loci were identified using CRISPRFinder (6).
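The joining rule described above (contig ends overlapping with perfect identity for at least 500 bp) can be sketched as follows. This is an illustrative simplification, not the curation procedure itself, which was carried out manually in Consed: the function names are hypothetical, only same-strand suffix-to-prefix overlaps are considered, and reverse-complement orientations are ignored.

```python
from typing import Optional

MIN_OVERLAP = 500  # minimum perfect overlap stated in the text, in bp


def perfect_end_overlap(left: str, right: str, min_overlap: int = MIN_OVERLAP) -> int:
    """Length of the longest exact match between the end of `left` and the
    start of `right`, or 0 if no match of at least `min_overlap` bp exists."""
    max_length = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the longest overlap.
    for length in range(max_length, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def join_contigs(left: str, right: str, min_overlap: int = MIN_OVERLAP) -> Optional[str]:
    """Merge two contigs across a perfect end overlap, or return None
    if the overlap criterion is not met."""
    overlap = perfect_end_overlap(left, right, min_overlap)
    if overlap == 0:
        return None
    # Keep the overlapping region once.
    return left + right[overlap:]
```

For example, `join_contigs("AAACCCGGG", "CCGGGTTT", min_overlap=5)` merges the two strings across their shared `CCGGG`, whereas the default 500-bp threshold rejects such a short overlap. The scan is quadratic in the worst case, which is acceptable for a handful of contig ends but not for read-scale data.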
The draft assembly consists of 4,202,850 bp, arranged in 13 scaffolds (16 contigs), with a mean GC content of 66.4%. Automated annotation identified 4,063 coding sequences, in addition to 1 rRNA operon and 46 tRNAs covering all amino acids. A complete pentose phosphate pathway and tricarboxylic acid (TCA) cycle are encoded in the genome. As previously hypothesized for “Candidatus Microthrix parvicella” strain RN1 (15), a nitrate reductase is encoded by the genome, but no nitrite reductase appears to be present. The strain is also predicted to be a prototroph for all amino acids, to be able to polymerize/depolymerize polyhydroxybutyrate, to accumulate polyphosphate, and to translate several selenoproteins. No genes are annotated that are related to photosynthesis. The assembly contains one CRISPR locus with 88 spacers.
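Summary figures such as total assembly length and mean GC content can be recomputed from the deposited contig sequences. The sketch below shows one way to do so; the function names are hypothetical, and counting every base (including any ambiguous `N` positions) in the denominator is an assumption that may differ from how the reported 66.4% was derived.

```python
from typing import Iterable, List


def assembly_length(contigs: Iterable[str]) -> int:
    """Total number of bases across all contigs."""
    return sum(len(contig) for contig in contigs)


def gc_content(contigs: Iterable[str]) -> float:
    """Mean GC content of the assembly as a percentage.

    All bases count toward the denominator, so ambiguous bases (e.g. N)
    lower the value slightly; this is an assumption of the sketch.
    """
    sequences: List[str] = [contig.upper() for contig in contigs]
    total = assembly_length(sequences)
    if total == 0:
        raise ValueError("no sequence data supplied")
    gc_bases = sum(seq.count("G") + seq.count("C") for seq in sequences)
    return 100.0 * gc_bases / total
```

Pooling bases across contigs before dividing gives a length-weighted mean, which is the usual convention for a whole-assembly GC figure.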
“Candidatus Microthrix parvicella” Bio17-1's ability to process and accumulate excessive amounts of fatty acids is highlighted by its gene content: the genome encodes 28 homologs of long-chain fatty acid–acyl coenzyme A (acyl-CoA) ligase and 17 of enoyl-CoA hydratase. The genetic inventory of “Candidatus Microthrix parvicella” makes it of particular interest for future wastewater treatment strategies based around the comprehensive reclamation of nutrients and chemical energy-rich biomolecules.
Nucleotide sequence accession numbers.
The genome sequence of “Candidatus Microthrix parvicella” strain Bio17-1 has been deposited at DDBJ/EMBL/GenBank under accession number AMPG00000000; the version described in this paper is the first version, AMPG01000000. A provisional annotation is available upon request. Raw sequence reads were deposited in the Sequence Read Archive under accession number SRA058866.