The genus Weissella consists of Gram-positive, catalase-negative, non-spore-forming, nonmotile, heterofermentative bacteria with irregular rod-shaped or coccoid morphologies (1). Weissella species are common in diverse nutrient-rich environments, including fermented foods, soil, and the intestines of many animals, including humans (2). Weissella confusa strains have infrequently been reported to cause infections in both humans (3–5) and nonhuman primates (6); however, members of the genus are not typically associated with disease. Novel Weissella sp. bacteria have recently been associated with disease outbreaks in rainbow trout (Oncorhynchus mykiss) in China (7), Brazil (8), and the United States (T. J. Welch and C. M. Good, submitted for publication). Each of these outbreaks occurred at commercial rainbow trout farms and caused high morbidity and mortality. The origin of the bacteria associated with these outbreaks is unknown, but 16S rRNA gene sequences from the Brazilian, Chinese, and U.S. isolates are >99% identical, suggesting that the strains are closely related (Welch and Good, submitted). The trout isolates also share >99% 16S rRNA sequence identity with W. ceti sp. nov., which was recently isolated from beaked whales (9); the whale and fish isolates may therefore constitute a single species. The occurrence of this pathogen on three continents within a relatively short period (5 years) suggests that weissellosis is a rapidly emerging disease of farmed rainbow trout. Comparing the genome sequences of the U.S., Brazilian, and Chinese strains will be necessary to understand the evolutionary relationships among them and may also provide insight into the recent emergence of this pathogen. As a basis for these comparisons, and to identify putative virulence genes, we sequenced the genome of Weissella ceti NC36, a representative strain from the U.S. outbreak.
Genomic DNA was purified using the MasterPure Gram Positive DNA Purification Kit (Epicentre) according to the manufacturer's protocol. The genome was assembled from a combination of Illumina sequences (MiSeq, 150-bp paired-end reads; 583× coverage) and Pacific Biosciences sequences (PacBio RS, 10-kb library of continuous long reads [CLR]; 198× coverage). An initial assembly was generated with ABySS (10) using only the Illumina reads, yielding 23 high-quality contigs of ≥500 bp. The PacBio CLR reads were then used to join these contigs with AHA (Pacific Biosciences). Finally, small assembly errors were corrected iteratively by mapping the Illumina reads onto the joined contigs and building a new consensus with Bowtie2, SAMtools, and custom scripts (11, 12). The final assembly consisted of seven contigs (N50, 385,673 bp; maximum length, 518,056 bp); the genome size was estimated at ~1.35 Mb, with a G+C content of 40.8%. Automated annotation with NCBI's Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) identified 16 rRNA genes, 68 tRNA genes, and 1,264 protein-coding sequences (CDSs).
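The assembly statistics above rest on two simple definitions. The N50 is the contig length L such that contigs of length ≥ L together contain at least half of all assembled bases, and mean coverage is the total number of sequenced bases divided by the genome size. The Python sketch below implements both; the helper names are illustrative, and the toy contig lengths are hypothetical rather than the NC36 contigs. The read-count estimate assumes every read contributes its full 150 bp, so it ignores trimming, duplicates, and unmapped reads. Conventions also differ slightly between tools (some require strictly more than half of the bases); this sketch uses "at least half".

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for non-empty input


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(coverage: float, read_length: int, genome_size: int) -> float:
    """Invert mean_coverage: reads needed to reach a given mean depth."""
    return coverage * genome_size / read_length


if __name__ == "__main__":
    # Hypothetical toy contig set, not the NC36 assembly.
    toy = [500, 400, 300, 200, 100]
    print(n50(toy))  # total 1500, half 750; 500 + 400 = 900 >= 750 -> 400
    # Approximate read count implied by 583x coverage of a ~1.35-Mb genome
    # with 150-bp reads (assumes full-length reads; counts both mates).
    print(round(reads_for_coverage(583, 150, 1_350_000)))  # 5247000
```

Under these assumptions, 583× coverage of ~1.35 Mb corresponds to roughly 5.2 million 150-bp reads (about 2.6 million pairs).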
Comparative analysis highlighted several putative virulence factors that have no homologs encoded in any other sequenced Weissella genome. These include five collagen adhesins (WCNC_00912, WCNC_00917, WCNC_00922, WCNC_05547, and WCNC_06207), a platelet-associated adhesin (WCNC_01820), and a mucus-binding protein (WCNC_01840).
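The section does not state how the absence of homologs was determined. Purely as an illustration, assuming a protein similarity search (for example, BLASTP) of NC36 proteins against the other Weissella genomes, with hits parsed into records and self-hits removed, the Python sketch below flags query loci that lack a qualifying hit in every other genome. The `Hit` record, the `unique_loci` helper, the locus tags and genome identifiers in the example, and the 30% identity and 50% query-coverage cutoffs are all hypothetical choices, not the authors' method or thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One parsed similarity-search hit (hypothetical record layout).
    Hits are assumed to be against other genomes only (no self-hits)."""
    query: str           # locus tag in the query genome
    subject_genome: str  # identifier of the other genome that was searched
    identity: float      # percent identity of the alignment
    query_cover: float   # percent of the query protein covered by the alignment


def unique_loci(
    queries: Iterable[str],
    hits: Iterable[Hit],
    min_identity: float = 30.0,  # illustrative cutoff, not from the source
    min_cover: float = 50.0,     # illustrative cutoff, not from the source
) -> List[str]:
    """Return the query loci with no qualifying hit in any other genome,
    preserving the input order of `queries`."""
    with_homolog = {
        h.query
        for h in hits
        if h.identity >= min_identity and h.query_cover >= min_cover
    }
    return [q for q in queries if q not in with_homolog]


if __name__ == "__main__":
    # Hypothetical locus tags and hits, for illustration only.
    queries = ["tag_A", "tag_B", "tag_C"]
    hits = [
        Hit("tag_A", "genome_1", identity=85.0, query_cover=95.0),
        Hit("tag_B", "genome_2", identity=22.0, query_cover=90.0),
    ]
    print(unique_loci(queries, hits))  # ['tag_B', 'tag_C']
```

Requiring both an identity and a coverage cutoff keeps short local matches, such as a single shared domain, from counting as homologs; moving either cutoff changes which loci come out as unique.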