Engelmann spruce (Picea engelmannii) is a conifer found primarily in western North America. Here, we present the complete chloroplast genome sequence of Picea engelmannii genotype Se404-851. This chloroplast sequence will benefit future conifer genomic research and contribute resources to further species conservation efforts.
We sequenced, assembled, and annotated the complete chloroplast genome of Engelmann spruce (Picea engelmannii, genotype Se404-851). Engelmann spruce dominates much of the large spruce forests of interior British Columbia, where it has been reported to hybridize with Picea glauca and Picea sitchensis (1), and its range extends southward to New Mexico. The tree has three distinct genomes: a nuclear genome, a mitochondrial genome, and a plastid (i.e., chloroplast) genome. In general, chloroplast genomes are derived from the ancestral genomes of the microbial endosymbiont from which these organelles originated (2).
A tissue sample was collected from a 13-year-old Engelmann spruce grown at the Kalamalka Forestry Centre in British Columbia (50°14′38.4ʺN, 119°16′40.8ʺW; elevation, 450 m) and planted from a seed from Don Fernando Mountain, New Mexico (36°17′60ʺN, 105°24′0ʺW; elevation, 2,987 m). Genomic DNA was extracted from 60 g tissue by Bio S&T using an organelle exclusion method yielding 300 μg of high-quality purified nuclear DNA, as previously described (3). The sample was sequenced at Canada’s Michael Smith Genome Sciences Centre.
To sequence the sample, a 900-bp whole-genome library was constructed following a previously described protocol (4, 5) with minor modifications. Briefly, 5 μg of genomic DNA was subjected to shearing by sonication (Covaris LE220) using a duty factor of 5 and peak incident power of 450 for 70 seconds. The sonicated DNA products were fractionated in a 6% PAGE gel to recover fragments greater than 700 bp for library preparation. These PCR-free libraries were sequenced with paired-end 150-base reads on an Illumina HiSeq X platform using V4 chemistry according to the manufacturer’s recommendations. With this protocol, four libraries were generated, sequencing approximately 200 million reads from each of them.
To assemble the chloroplast genome, we subsampled the whole-genome shotgun sequencing reads of one lane of one library (i.e., 41,748,620 read pairs) to subsets of 0.75, 1.5, 3, 6, 12, 25, and 41 million read pairs and then assembled each subset with ABySS v2.1.1 (6) (k-mer size [k], 128; k-mer count [kc], 3). The ABySS assembly of the 3-million read-pair subset resulted in a single 123,601-bp contig that aligned to the reference chloroplast sequence (Picea glauca admix genotype PG29, NCBI accession number NC_028594), with zero misassemblies and no internal gaps, based on QUAST v5.0.0 (8) analysis.
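The subsampling step can be illustrated with a short Python sketch. The text does not say which tool was used to subsample, so the reservoir-sampling approach and the function names `subset_fractions` and `subsample_pairs` are assumptions made purely for illustration; only the read-pair total and the subset sizes come from the text.

```python
import random

# Values taken from the text: read pairs in the lane used, and subset sizes.
TOTAL_PAIRS = 41_748_620
SUBSET_MILLIONS = [0.75, 1.5, 3, 6, 12, 25, 41]


def subset_fractions(total_pairs, subset_millions):
    """Return (target_pairs, fraction_of_total) for each subset size.

    Hypothetical helper: shows what share of the lane each subset represents.
    """
    result = []
    for millions in subset_millions:
        target = min(int(millions * 1_000_000), total_pairs)
        result.append((target, target / total_pairs))
    return result


def subsample_pairs(pairs, k, seed=0):
    """Reservoir-sample k read pairs from an iterable.

    Each item is a (read1, read2) tuple, so mates always stay together.
    This is one common way to subsample; it is an assumption, not
    necessarily the method the authors used.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, pair in enumerate(pairs):
        if i < k:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair
    return reservoir


if __name__ == "__main__":
    for target, fraction in subset_fractions(TOTAL_PAIRS, SUBSET_MILLIONS):
        print(f"{target:>11,d} read pairs = {fraction:.1%} of the lane")
```

Sampling whole pairs rather than individual reads keeps the paired-end information intact, which a paired-end assembler relies on.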
Using BLAST v2.7.1 (9), we aligned our assembly to the reference chloroplast sequence (PG29), modifying start and stop positions for consistency with previously published conifer chloroplast genomes. To ensure that there were no missing sequences at the ends of our assembly, we introduced a gap at the end, circularized the sequence, and ran Sealer v2.1.1 (10), closing the “end” gap and removing overlapping sequences as previously described (11). Finally, the resulting assembly was polished with Pilon v1.22 (12), using the 3-million subset of read pairs aligned with the Burrows-Wheeler Aligner (BWA) v0.1.7 (13).
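Two ideas in this step, removing sequence that is duplicated at both ends of a circular contig and shifting the start coordinate to match a reference, can be sketched in a few lines of Python. This is a conceptual illustration only: the authors used Sealer and BLAST for these tasks, and the functions `trim_terminal_overlap` and `rotate_circular` below are hypothetical helpers that handle exact matches only.

```python
def trim_terminal_overlap(seq, min_overlap=1):
    """Drop a suffix that exactly repeats the prefix of a circular contig.

    A linear assembly of a circular molecule can end with bases that
    repeat its beginning. Only exact overlaps are detected here (an
    assumption of this sketch); the longest overlap found is removed.
    """
    for n in range(len(seq) // 2, min_overlap - 1, -1):
        if n > 0 and seq[:n] == seq[-n:]:
            return seq[:-n]
    return seq


def rotate_circular(seq, start):
    """Rotate a circular sequence so 0-based index `start` becomes position 0."""
    if not seq:
        return seq
    start %= len(seq)
    return seq[start:] + seq[:start]


if __name__ == "__main__":
    contig = "ACGTTGCAACG"  # toy contig: the final "ACG" repeats the first "ACG"
    circular = trim_terminal_overlap(contig)
    print(circular)
    print(rotate_circular(circular, 3))
```

Real assemblies may contain mismatches or indels within the overlap, which is one reason alignment-based tools are used in practice rather than exact string comparison.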
The complete P. engelmannii genotype Se404-851 chloroplast genome is 123,542 bp long with a 38.74% GC content. Using GeSeq v1.65 (14) with other Picea chloroplast genomes as references (7, 11), we annotated 114 genes, comprising 74 protein-coding genes, 36 tRNA-coding genes, and 4 rRNA-coding genes. Four of these genes (rps12, petB, petD, and rpl16) were manually annotated. We used OrganellarGenomeDRAW v1.2 (15) to generate the map in Fig. 1.
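As a small worked check of the summary statistics, the Python sketch below shows one way to compute GC content from a sequence string and confirms that the gene categories reported above sum to the stated total. The function name `gc_content` and the choice to ignore ambiguous bases are assumptions of this sketch, not details from the text.

```python
def gc_content(seq):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; that convention is an
    assumption of this sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Gene counts as reported in the text.
GENE_COUNTS = {"protein-coding": 74, "tRNA": 36, "rRNA": 4}
TOTAL_GENES = sum(GENE_COUNTS.values())

if __name__ == "__main__":
    print(f"toy GC content: {gc_content('GGCCAATT'):.2f}%")
    print(f"annotated genes: {TOTAL_GENES}")
```

Applied to the sequence deposited for this genome, a function like this should give a value close to the reported 38.74%, although rounding and the handling of ambiguous bases could cause small differences.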
The introduction of this new chloroplast genome will benefit conifer genomic research and inform future evolutionary studies.
The complete chloroplast genome sequence of Picea engelmannii genotype Se404-851 is available under GenBank accession number MK241981, and the raw reads are available under SRA numbers SRX5070635 and SRR8252852. The annotations used as references were from Picea abies (GenBank accession number NC_021456), Picea asperata (GenBank accession number NC_032367), Picea glauca genotype PG29 (GenBank accession number NC_028594), Picea morrisonicola (GenBank accession number NC_016069), and Picea sitchensis (GenBank accession numbers NC_011152 and KU215903).
This work was supported by funds from Genome Canada, Genome BC, and Genome Quebec as part of the Spruce-Up (www.spruce-up.ca) (243FOR) and AnnoVis (281ANV) projects.
Sutton BCS, Flanagan DJ, Gawley JR, Newton CH, Lester DT, El-Kassaby YA. 1991. Inheritance of chloroplast and mitochondrial DNA in Picea and composition of hybrids from introgression zones. Theor Appl Genet 82:242–248.
Ku C, Nelson-Sathi S, Roettger M, Sousa FL, Lockhart PJ, Bryant D, Hazkani-Covo E, McInerney JO, Landan G, Martin WF. 2015. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524:427–432.
Jones MR, Schrader KA, Shen Y, Pleasance E, Ch’ng C, Dar N, Yip S, Renouf DJ, Schein JE, Mungall AJ, Zhao Y, Moore R, Ma Y, Sheffield BS, Ng T, Jones SJM, Marra MA, Laskin J, Lim HJ. 2016. Response to angiotensin blockade with irbesartan in a patient with metastatic colorectal cancer. Ann Oncol 27:801–806.
Tsang ES, Shen Y, Chooback N, Ho C, Jones M, Renouf DJ, Lim HJ, Sun S, Yip S, Pleasance E, Ma Y, Zhao Y, Mungall AJ, Moore R, Jones S, Marra M, Laskin JJ. 2017. Clinical outcomes after whole genome sequencing in patients with metastatic non-small cell lung cancer. J Clin Oncol 35:e20563.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. 2014. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963.
Lohse M, Drechsel O, Kahlau S, Bock R. 2013. OrganellarGenomeDRAW—a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res 41:W575–W581.