The fundamental unit of cellulose is the microfibril, consisting of a bundle of parallel chains of β-1,4-glucan that are hydrogen bonded to one another, forming a crystalline array (
4,
25). Although cellulose is best known as the major component of plant and most algal cell walls, the
CesA genes encoding the putative catalytic subunit of cellulose synthase (EC 2.4.1.12) were first identified in the cellulose-producing bacterium
Acetobacter xylinus (
39,
47).
CesA genes in other prokaryotes and several seed plants, including cotton,
Arabidopsis thaliana, maize, rice, and poplar (
36), have subsequently been characterized. All share a common domain structure that includes putative transmembrane helices (TMH) and a cytoplasmic loop consisting of four conserved regions (U1 to U4), each containing a D residue or QXXRW sequence predicted to be involved in substrate binding and catalysis (D-D-D-QXXRW motif). An N-terminal zinc-binding domain, a strongly conserved region (CR-P) between U1 and U2, and a more variable region between U2 and U3 are found only in plant CesAs (
11).
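The QXXRW element of the D-D-D-QXXRW motif described above is a simple pattern (glutamine, any two residues, arginine, tryptophan), so it can be located in a protein sequence with a regular expression. The following is a minimal sketch only: the function name `find_qxxrw` and the toy peptide are hypothetical, and the peptide is not a real CesA sequence.

```python
import re

# QXXRW: Q, any two residues, then R and W (one-letter amino acid codes).
QXXRW = re.compile(r"Q[A-Z]{2}RW")

def find_qxxrw(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each QXXRW hit."""
    return [(m.start() + 1, m.group()) for m in QXXRW.finditer(protein.upper())]

# Hypothetical toy peptide for illustration; NOT a real CesA sequence.
toy_peptide = "MKDGSAADCDHYVNNSQVLRWALGSV"
hits = find_qxxrw(toy_peptide)
print(hits)
```

A match of this kind shows only that the pattern is present; it does not establish that the residues take part in substrate binding or catalysis.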
Although the factors that determine terminal complex structure, and thus microfibril dimensions, remain unknown, several lines of evidence indicate that
CesA gene products play a direct role in maintaining the association of the particles that compose terminal complexes. The
rsw1 mutation in
Arabidopsis thaliana, which results in a single amino acid substitution in the cytoplasmic domain of a cellulose synthase (
Arabidopsis thaliana CesA1 [
AtCesA1]), disrupts assembly of crystalline cellulose microfibrils and leads to accumulation of noncrystalline β-1,4-glucan. Freeze fracture of
rsw1 mutants showed that the rosettes are dissociated (
2). It has also been shown that the products of two cotton
CesA genes (
Gossypium hirsutum CesA1 [
GhCesA1] and
GhCesA2) can associate in vitro through their zinc-binding domains, indicating a role for this domain in terminal complex assembly (
26). The
Acetobacter CesA proteins, which assemble as a linear terminal complex, lack the zinc-binding domain and two other domains found in all seed plant CesA proteins (
11). These observations indicate that comparing the
CesA genes of organisms with different types of terminal complexes may reveal domains that control terminal complex assembly and thus microfibril structure.
The origin of rosettes is thought to be a crucial event in the evolution of land plants because it is linked to fundamental changes in cytokinesis and intercellular communication that provided the basis for the origin of the complex body plan (
16). Green algae demonstrate the greatest diversity in terminal complex structure, and the history of the evolution of rosettes from linear terminal complexes appears to be preserved within this group (
6,
21,
44). According to a recent classification, the monophyletic group Charophyta includes the land plants and six orders of green algae, including the Zygnematales, the Coleochaetales, and the Charales, which are thought to be the closest relatives of land plants (
23). Within the Charophyta, all species examined have six-particle rosettes except for
Coleochaete scutata (
44), which has a unique eight-particle terminal complex (
32). Other green algae have linear terminal complexes (
44).
Mesotaenium caldariorum is in the order Zygnematales, which diverged from the land plant lineage before the Coleochaetales and the Charales (
23). Thus, characterization of
M. caldariorum CesAs (
McCesAs) will reveal the extent of
CesA divergence since plants colonized the land and provide a basis for analyzing the
CesA genes from algae with different types of terminal complexes, including
C. scutata, with its apparently derived eight-particle terminal complex, and chlorophyte green algae, with presumably more primitive linear terminal complexes.
RESULTS
Degenerate primers based on conserved regions of the deduced amino acid sequences of plant and prokaryote
CesA genes were used to amplify
CesA gene fragments from isolated genomic DNA and a genomic DNA library phage suspension from
M. caldariorum. Primer pair 1F-3R amplified two major fragments and several minor fragments from the phage suspension (Fig.
2, lane 1). To test for specificity of amplification, the products of this reaction were subjected to fully nested and half-nested PCR (Fig.
2, lanes 2 to 7). Fully nested PCR with primer pairs 2F-1R and 2F-2R amplified fragments of about 300 and 350 bp, respectively (Fig.
2, lanes 2 and 3). This is close to the expected product sizes of 293 and 338 bp that were calculated from the
GhCesA1 sequence and verified by amplification of cloned
GhCesA1 with primer pairs 2F-1R and 2F-2R (data not shown). Half-nested PCR with primer pairs 1F-1R and 1F-2R produced numerous bands (Fig.
2, lanes 4 and 5), including those close to the expected product sizes of 326 and 370 bp, respectively. Half-nested PCR with primer pairs 2F-3R and 3F-3R produced strong bands at about 1 and 1.2 kb (Fig.
2, lanes 6 and 7). These exceed the expected values of 733 and 528 bp, presumably due to the presence of one or more introns, since the products differ from each other by about the expected 205 bp. Direct amplification of genomic DNA with primer pairs 1F-1R and 1F-2R did not produce products of the expected sizes (Fig.
2, lanes 8 and 9). However, at least some of the products of amplification with primer pairs 1F-3R, 3F-3R, and 2F-3R were similar in size to those resulting from amplification of the genomic DNA library phage suspension with the same primers.
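The intron argument for the 2F-3R and 3F-3R products rests on simple arithmetic, which can be sketched as follows. The expected sizes are those quoted in the text. The observed sizes are the approximate band sizes, and assigning the ~1.2-kb band to 2F-3R and the ~1-kb band to 3F-3R is an assumption made here because 2F-3R has the larger expected product.

```python
# Expected product sizes in bp, as quoted in the text (calculated from GhCesA1).
expected_bp = {"2F-3R": 733, "3F-3R": 528}

# Approximate band sizes in bp. ASSUMPTION: the ~1.2-kb band belongs to 2F-3R
# and the ~1-kb band to 3F-3R, because 2F-3R has the larger expected product.
observed_bp = {"2F-3R": 1200, "3F-3R": 1000}

# Difference between the two products, expected versus observed.
expected_diff = expected_bp["2F-3R"] - expected_bp["3F-3R"]
observed_diff = observed_bp["2F-3R"] - observed_bp["3F-3R"]

# Extra length in each product beyond the intron-free expectation.
excess_bp = {pair: observed_bp[pair] - expected_bp[pair] for pair in expected_bp}

print("expected difference:", expected_diff)
print("observed difference:", observed_diff)
print("excess per product:", excess_bp)
```

Because band sizes read from a gel are approximate, only rough agreement between the two differences is meaningful. A similar excess in both products is what would be expected if the extra sequence lies in the region they share.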
The major bands from lanes 2, 7, and 12 (Fig.
2) were excised, purified, and cloned into pCR-TOPO 2.1. When the inserts were excised, 11 clones derived from the product in lane 2 (Fig.
2) appeared identical, but the major product in lane 7 produced two distinct classes of inserts and the product in lane 12 produced three distinct classes, including one containing an internal restriction site. A single representative of each of the six distinct clones was sequenced and compared to sequences in GenBank with BLASTX (
1). The predicted products of two clones derived from amplification of genomic DNA with the primer pair 2F-3R were similar to
GhCesA1, spanning the regions upon which the primers were based (Fig.
1). Although the deduced polypeptides share 84% (
M. caldariorum clone 1 [
Mc1] compared with
Zea mays CesA4 [
ZmCesA4] [accession number AF200528]) and 73% (
Mc2 compared with
ZmCesA5 [accession number AF200529]) amino acid identity with known CesAs encoded within three open reading frames, they lack similarity in the amino acids encoded by the regions spanning nucleotides 409 to 705 and 970 to 1307 (
Mc1) and nucleotides 394 to 624 and 889 to 1162 (
Mc2). Prediction of intron-exon boundaries with NetGene2 (
18) supports the hypothesis that these regions represent introns (Fig.
3). The spliced sequences have open reading frames of 733 and 718 bp, respectively, and their predicted amino acid sequences share 76% identity.
Mc3 and
Mc4, derived from amplification of a genomic library suspension with primer pairs 2F-2R and 3F-3R, respectively, were very similar to
Mc2, differing by 17 bp within their 1,224-bp consensus sequence (data not shown). Together, the four clones represent at least two distinct
McCesA sequences (Fig.
3).
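The putative intron coordinates and spliced lengths reported for Mc1 and Mc2 can be cross-checked with a few lines of arithmetic. This sketch assumes the nucleotide ranges are 1-based and inclusive, which the text does not state. The dictionary names are illustrative.

```python
# Putative intron coordinates from the text.
# ASSUMPTION: ranges are 1-based and inclusive.
introns = {
    "Mc1": [(409, 705), (970, 1307)],
    "Mc2": [(394, 624), (889, 1162)],
}

# Spliced open reading frame lengths in bp, as quoted in the text.
spliced_bp = {"Mc1": 733, "Mc2": 718}

def span_length(start: int, end: int) -> int:
    """Length of an inclusive, 1-based coordinate range."""
    return end - start + 1

intron_total = {
    clone: sum(span_length(s, e) for s, e in spans)
    for clone, spans in introns.items()
}

# Unspliced length implied by spliced length plus intron length.
implied_genomic_bp = {
    clone: spliced_bp[clone] + intron_total[clone] for clone in introns
}

print(intron_total)
print(implied_genomic_bp)
```

The implied Mc2 length can be compared with the 1,224-bp consensus quoted for Mc2 to Mc4. A difference of a base or two could reflect the coordinate convention assumed here rather than a discrepancy in the data.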
The cloned
Mc1 and
Mc2 fragments were used to synthesize probes for screening an
M. caldariorum genomic library. A total of 300,000 plaques were screened, 103 plaques were selected, and 10 clones were purified. Phage DNA was isolated from each of these clones, and the inserts were excised with
BamHI, revealing five distinct restriction patterns. One clone was subcloned, sequenced in its entirety, and assembled. Comparison to sequences in GenBank with BLASTX revealed eight open reading frames with high similarity to plant
CesAs. Start and stop codons were identified in frame at the N-terminal and C-terminal ends. Prediction of splicing sites by using GenScanW with both
Arabidopsis and maize parameter matrices (
9) indicated the presence of 11 exons and 10 introns, and the spliced gene produced an open reading frame of 3,390 bp. This gene was similar to
Mc1, differing by only nine base substitutions and a 9-bp insert within their 1,377-bp consensus, and was named
McCesA1. Two additional genomic clones were partially subcloned and sequenced. One was very similar to
McCesA1, differing by a single deletion and two base substitutions, including a T→C substitution that produced an additional
BamHI site. The other clone was also similar to
Mc1, differing by seven base substitutions within their 1,368-bp consensus sequences. Genomic clones corresponding to
Mc2 to
Mc4 were retrieved neither in the first screen nor when the genomic library was rescreened with only the probe based on
Mc2. Of nine additional clones that were partially sequenced, three were nearly identical to
McCesA1, three were more similar to
Mc1, and three were similar to
McCesA1 but had additional deletions (data not shown). The designation
McCesA2 was assigned to
Mc2, which represents
Mc2 to
Mc4.
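As a quick consistency check on the McCesA1 gene model, the spliced open reading frame length can be converted to codons and the exon and intron counts compared. The text does not say whether the 3,390 bp includes the stop codon, so this sketch reports both possibilities rather than choosing one.

```python
orf_bp = 3390    # spliced open reading frame length quoted in the text
n_exons = 11     # exon count quoted in the text
n_introns = 10   # intron count quoted in the text

# An open reading frame should be a whole number of codons.
codons, remainder = divmod(orf_bp, 3)

# ASSUMPTION: either reading of "open reading frame" is possible here.
residues_if_stop_excluded = codons
residues_if_stop_included = codons - 1

# Introns separate adjacent exons, so n exons imply n - 1 introns.
counts_consistent = (n_introns == n_exons - 1)

print(codons, remainder)
print(residues_if_stop_excluded, residues_if_stop_included)
print(counts_consistent)
```

The first figure agrees with the 1,130-residue protein reported in this section if the stop codon is not counted in the 3,390 bp.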
By using ClustalX software (
43), the predicted
McCesA1 protein was compared with proteins representing different subfamilies of seed plant CesAs (
20). The hypothetical
McCesA1 protein of 1,130 amino acids contains all domains characterized in plant CesAs, as highlighted in Fig.
4. These include the zinc-binding domain near the N terminus (
26). As predicted by HMMTOP (
45),
McCesA1 contains eight putative TMH. The cytoplasmic domain between the second and third TMH includes the four putative substrate-binding domains, U1 to U4, which are highly conserved in all known CesAs. Between U1 and U2 is the CR-P, a conserved region in plants (
34) that is absent in bacterial CesAs (
11) and is poorly conserved in some cyanobacteria (
31) and the slime mold
Dictyostelium discoideum (
3). The
McCesA1 CR-P is very similar to those of plant CesAs (up to 87% identity with
A. thaliana CesA1 [
AtCesA1] [accession number AF027172]) and bears only slight similarity to those of cyanobacterial CesAs (13% identity with that of
Nostoc punctiforme, contig 499).
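Percent identity figures such as those quoted for the CR-P depend on how aligned positions are counted. The text does not specify the convention used, so the following is only a sketch under one common assumption: identity is computed over alignment columns in which neither sequence has a gap. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-').

    ASSUMPTION: gap-containing columns are excluded from the denominator;
    other conventions (e.g. dividing by alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical toy alignment; NOT real CesA sequences.
toy_a = "MKT-AYLD"
toy_b = "MRTGAY-D"
print(round(percent_identity(toy_a, toy_b), 1))
```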
McCesA1 is also similar to seed plant CesAs in regions that are not universally conserved (Fig.
4). These include the hypervariable region between U2 and U3 (
34), also known as the class-specific region (CSR) (
46). Like those of seed plants, the
McCesA1 CSR contains basic residues at the N terminus and acidic residues at the C terminus, including DDXED and EXE motifs (amino acids 747 to 751 and 756 to 758, respectively). It also contains three K motifs (centered on amino acids 697, 720, and 735) and a cysteine-rich region (amino acids 703 to 715) and shares up to 42% amino acid identity with the CSRs of plant CesAs (
ZmCesA1 [accession number AF200525]). The region between the zinc-binding domain and the first TMH is also highly variable among the known CesAs.
McCesA1 has the longest N terminus, including a unique 28-residue block and blocks corresponding to all of the sequence blocks found in the N-terminal regions of other plant CesA proteins.
McCesA1 joins 11 other
CesA genomic sequences in which intron-exon boundaries are conserved (
36,
37). All
McCesA1 intron-exon junctions are also found in
AtCesA1 and
AtCesA3 (Fig.
4 and
5). Within the region corresponding to
McCesA1 exon 6, these
Arabidopsis genes have an additional intron, which is present in all other
Arabidopsis CesAs except
AtCesA4,
AtCesA5, and
AtCesA9. A second additional intron within the region corresponding to
McCesA1 exon 7 is present in all
Arabidopsis CesAs except
AtCesA7, and a third additional intron in the region corresponding to
McCesA1 exon 10 is present in all
Arabidopsis CesAs. In
McCesA1 and all
CesAs examined, the C-terminal exon contains TMH-4 through TMH-8 and the penultimate exon contains H-3, H-4, and TMH-3 (Fig.
5).
Figure
6A shows a parsimony phylogram corresponding to the bootstrap consensus tree for deduced amino acid sequences encoded by
McCesA1 and selected seed plant
CesAs, rooted with deduced amino acid sequences encoded by two cyanobacterial
CesAs. Prior to alignment with ClustalX (
43), the sequences were edited to remove the poorly conserved N terminus upstream of the (P/L/S)(Y/F)R consensus sequence, the variable region between the G(Y/F)(D/E/S/G) and (L/I)(K/R)E consensus sequences, and the C terminus downstream of the WV(R/K) consensus sequence (Fig.
4). The analysis shows a high similarity between
McCesA1 and seed plant CesAs (Fig.
6A). Although higher-level groupings are strongly supported (bootstrap values, 72 to 100%), the early divergence of
McCesA1 and separation of the seed plant
CesAs into two major clades are supported only weakly (bootstrap values, 52 to 68%). An analysis including
McCesA2 was carried out with the conserved region from the DQF consensus sequence directly following TMH-2 to the DCDH consensus sequence of U2 (Fig.
4). The unrooted phylogram corresponding to the bootstrap consensus tree shows strong support for an
M. caldariorum clade that is separate from that corresponding to the seed plant CesAs (Fig.
6B). Some of the sequences included in Fig.
6A were omitted from Fig.
6B for clarity. When included, their positions were consistent with those shown in Fig.
6A. The topologies of trees created using distance methods (neighbor joining) were identical to those shown except for the position of
AtCesA7 in the rooted tree (data not shown).
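The bootstrap values cited above come from repeatedly resampling alignment columns with replacement and rebuilding the tree from each pseudoreplicate. The text does not state the software or the number of replicates used, so the sketch below illustrates only the generic resampling step, not tree construction. The function name and toy alignment are hypothetical.

```python
import random

def bootstrap_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap pseudoreplicate of an alignment.

    Columns are drawn with replacement until the replicate has the same
    number of columns as the original; every sequence gets the same columns.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    n_cols = lengths.pop()
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

# Hypothetical toy alignment; NOT real CesA sequences.
toy_alignment = {"seqA": "MKTAY", "seqB": "MRTAY", "seqC": "MKSGY"}
replicate = bootstrap_columns(toy_alignment, random.Random(0))
print(replicate)
```

In a full analysis, a tree would be inferred from each replicate, and the bootstrap value for a clade would be the percentage of replicate trees that contain it.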