To expand our molecular biological knowledge of these industrially important microalgae, we determined the draft genome sequence of H. lacustris NIES-144, which was obtained from the National Institute for Environmental Studies (NIES, Japan).
H. lacustris NIES-144 was cultured in C medium (4) under a 14-h/10-h light/dark photocycle at 25°C. Genomic DNA was extracted from Haematococcus cells using a FastDNA Spin kit for soil (MP Biomedicals, USA). A paired-end library and mate pair libraries (3 kb and 10 kb) were prepared using a Covaris (USA) sonicator in combination with the TruSeq DNA LT sample prep kit (paired-end) or the Nextera mate pair sample preparation kit (mate pair) (Illumina). The paired-end library was sequenced on an Illumina HiSeq 2500 using the TruSeq rapid sequencing by synthesis (SBS) kit, and the mate pair libraries were sequenced on an Illumina HiSeq 2000 using the TruSeq SBS kit v3.
The mate pair reads (average, 154,807,864 reads) were processed with cutadapt 1.2.1 (5) to remove adapter sequences. The paired-end reads (215,289,986 reads) and the trimmed mate pair reads (average, 105,701,143 reads) were assembled with ALLPATHS-LG R45226 (6) using the parameters GENOME_SIZE: 125,000,000; FRAG_COVERAGE: 100; JUMP_COVERAGE: 100; and HAPLOIDIFY: True. The assembly comprised 9,693 scaffolds with a total length of 172 Mb (genome coverage, 186×; GC content, 58.4%; N50 scaffold length, 38,941 bp). The completeness of the draft genome was estimated at 57.7% with Benchmarking Universal Single-Copy Orthologs (BUSCO) v3.1.0 using the eukaryota_odb9 data set (7).
Prior to gene structure prediction, repeat sequences in the H. lacustris NIES-144 genome were identified and masked with RepeatMasker v4.0.9 (8) using default parameters. Gene structures in the masked Haematococcus genome were then predicted with MAKER v2.31.10 (9) in combination with AUGUSTUS 3.3.2 (10), SNAP v2006-07-28 (11), and GeneMark-ES 4.3.0 (12), using Chlamydomonas, Arabidopsis thaliana, and Chlamydomonas reinhardtii model parameters, respectively. As transcript and protein homology evidence for the MAKER prediction, we used the transcriptome data of H. lacustris NIES-144 (SRA accession number SRX3729494) (13) and the protein sequences of representative eukaryotic species, including H. lacustris strain SAG192.80 (3). A total of 13,309 genes of H. lacustris NIES-144 were functionally annotated by BLASTp searches against the UniProtKB Swiss-Prot and TrEMBL databases (14) with an E value threshold of <1.0 × 10⁻⁵ and by InterProScan v5.36-75.0 (15) searches against the Pfam database (16). In addition, 277 tRNA genes were predicted with tRNAscan-SE v2.0 (17). This genome will provide the prerequisite information for genetic engineering and spur the further development of efficient astaxanthin production by this microalga.
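The N50 scaffold length and GC content reported for the assembly are standard summary statistics. The following is a minimal sketch of how they can be computed from scaffold lengths and sequences; it is not the ALLPATHS-LG implementation, and excluding gap (N) characters from the GC calculation is an assumption about how such a value is usually reported.

```python
from typing import Iterable, Sequence


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    if total == 0:
        raise ValueError("no sequence lengths given")
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first scaffold (longest first) at which the
        # cumulative length reaches at least 50% of the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable when total > 0")


def gc_content(sequences: Sequence[str]) -> float:
    """Percent G+C among unambiguous A/C/G/T bases.

    Assumption: N (gap) and other ambiguous characters are ignored.
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)
```

For example, with hypothetical scaffold lengths of 100, 80, 60, 40, and 20 bp (300 bp in total), the two longest scaffolds already cover 180 bp, at least half of the total, so the N50 is 80 bp.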
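The E value cutoff used in the BLASTp-based annotation amounts to a simple filter on tabular search results. The following minimal Python sketch assumes BLAST+ tabular output (-outfmt 6) with its default 12 columns. It also assumes a best-hit-by-bit-score rule for picking one subject per gene, which the text does not specify. The gene and subject identifiers in the example are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

E_VALUE_CUTOFF = 1.0e-5  # threshold stated in the text (E < 1.0e-5)


def best_hits(handle: TextIO, cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for hits with E < cutoff.

    Assumes the default -outfmt 6 columns: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # discard hits that do not pass the E value threshold
        # Assumed rule: keep the hit with the highest bit score per query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    example = io.StringIO(
        "g1\tsp|P1|A\t80\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
        "g1\ttr|Q2|B\t60\t100\t40\t0\t1\t100\t1\t100\t1e-10\t90\n"
        "g2\tsp|P3|C\t30\t50\t35\t0\t1\t50\t1\t50\t0.01\t25\n"
    )
    print(best_hits(example))  # -> {'g1': ('sp|P1|A', 1e-30, 150.0)}
```

Here g2 is left unannotated because its only hit (E = 0.01) fails the cutoff. In practice, InterProScan/Pfam domain assignments would be merged with these hits as a separate step.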