Among the contigs assembled from the total sequences of the two libraries, 19 contained repeated or circular sequences, that is, the same sequence was present at both the start and the end of the contig. The sizes of these repeated or circular sequences ranged from 290 to 2,495 bp (Table 4). Four of these contigs (C020, C112, C005, and C132, with sizes of 2,090, 1,984, 1,634, and 1,108 bp, respectively) showed significant hits with ssDNA eu-viruses in the TBLASTX and BLASTX searches of the GenBank database; the other contigs were not related to any known sequences. To confirm that the circular DNA contigs really had circular structures, we designed inverse PCR primers in opposite directions against two contigs (C005 and C112) that were thought to contain putative circular genomes. PCR with these primers yielded amplicons of the expected size from the MDA-amplified viral sample. Sequencing of the amplicons (data not shown) showed that they had the same sequences as the circular genomes. In the case of C005, the chimeric sequence that was present in the contig sequence and was excluded during construction of the circular genome was not found in the sequence of the PCR amplicon.

With the exception of replication gene-related ORFs, none of the ORFs in the putative genomic components showed significant hits to known proteins. The viral replication proteins of C020 and C112 each had two conserved domains, the putative viral replication protein domain (PF02407) and the viral replication domain C terminus (PF08419). The viral replication protein of C005 had one conserved domain, the viral replication domain C terminus (PF08419). No conserved domains were identified for C132. Analyses of these putative viral genomic components were performed (Fig. 5). C132 was excluded from the analyses because this contig was obtained from only two reads (Table 4), and potential sequencing errors and/or chimeric sequences can result in highly biased analyses.

The genome organizations of the three putative genome components were similar to that of PCV2, a circovirus (Fig. 5A), as well as to those of other circoviruses, nanoviruses, and geminiviruses (35, 46). All of the components had a putative stem-loop structure in their intergenic regions. The loop region of this structure contains a conserved nonanucleotide motif that is found in plant geminiviruses and plant nanoviruses and that corresponds to the site of viral DNA replication (34). The stem-loop sequences were aligned with that of PCV2 (see Fig. S1 in the supplemental material). Apart from the ORF for the putative Rep protein, each putative genome component had only one or two additional ORFs (overlapping ORFs were not considered). If the circular component constitutes a viral genome, one of these ORFs could encode the capsid protein, but none of them gave significant hits with known proteins. Because various kinds of capsid proteins from circoviruses, nanoviruses, and geminiviruses have a high frequency of arginine/lysine residues in their amino-terminal regions (35), we investigated the arginine/lysine frequency in the amino-terminal regions of the ORFs of the putative genome components. We found that one of the ORFs of each component had an arginine/lysine-rich amino-terminal region (see Fig. S1 in the supplemental material), although this was only weakly so in the case of C112.
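
As an illustration of the arginine/lysine comparison described above, the following Python sketch computes the fraction of basic residues in the amino-terminal window of a translated ORF. It is a minimal sketch and not the procedure used in this study: the function name `basic_residue_fraction`, the default window length of 30 residues, and the two toy sequences are all assumptions made for the example.

```python
def basic_residue_fraction(protein: str, window: int = 30) -> float:
    """Return the fraction of Arg (R) and Lys (K) residues in the
    first `window` residues of a protein sequence.

    The window length is an illustrative assumption, not a value
    taken from the study.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Normalize case and drop a trailing stop-codon symbol, if present.
    n_term = protein.upper().rstrip("*")[:window]
    if not n_term:
        raise ValueError("empty protein sequence")
    basic = sum(1 for aa in n_term if aa in "RK")
    return basic / len(n_term)


# Hypothetical toy sequences, not taken from the contigs in this study.
rich = "MRRKRRYRRRKPRTSRRRKS" + "A" * 40
poor = "MASTLLEGVDA" * 5

print(basic_residue_fraction(rich, window=20))
print(basic_residue_fraction(poor, window=20))
```

Applied to every non-Rep ORF of a component, a score of this kind would allow the ORFs to be ranked by how basic their amino-terminal regions are; any cutoff for calling a region "rich" would have to be chosen by comparison with known capsid proteins.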