ABSTRACT

To enable an in-depth survey of the metabolic potential of complex soil microbiomes, we performed ultra-deep metagenome sequencing, collecting >1 Tb of sequence data from three grassland soils representing different precipitation regimes.

ANNOUNCEMENT

As part of the Pacific Northwest National Laboratory (PNNL) Science Focus Area program (1, 2), we are investigating the impact of environmental change on microbial community function in grassland soils. Three grassland soils, representing different moisture regimes, were selected for ultra-deep metagenome sequencing, resulting in >1 Tb of sequence data per location. This data set serves as a resource for deep analysis of soil microbiome composition and metabolic potential.
Soils were collected from three grassland field site locations. Arid regime soil (irrigated agriculture), characterized as a coarse silty loam, was collected from the Washington State University Irrigated Agriculture Research and Extension Center (IAREC) (46.25N, 119.73W). Intermediate precipitation regime soil (rain-fed and irrigated agriculture), characterized as a fine clay loam, was collected from the Konza Prairie Biological Station (KPBS) (39.10N, 96.61W) (3, 4). Frequent precipitation regime soil (rain-fed and tile-drained agriculture), characterized as a fine silty clay loam, was collected from the Iowa State University Comparison of Biofuel Systems (COBS) (41.92N, 93.75W) (5).
Surface soil samples (2 cm by 0 to 20 cm) were collected from three randomly selected field site block locations using a push corer (3 subsamples per block, 3 replicates per subsample). Replicate subsamples were sieved together, resulting in 9 independent samples per site. Samples were flash frozen and stored at −80°C until further processing.
DNA was extracted from 3 × 0.25 g soil for each of the 9 field samples per site using the PowerSoil DNA extraction kit (Qiagen), with bead beating, and quantified. The extracted DNA samples from each site were combined to generate a pooled sample from each location (IAREC, COBS, and KPBS) for sequencing. Metagenomic libraries were prepared using the TruSeq PCR-free kit (Illumina) and a starting material of 1 μg DNA from the pooled DNA. Sequencing was performed on an Illumina HiSeq X system at Fulgent Genetics (Los Angeles, CA), generating 150-nucleotide paired-end reads to a final effort of at least 1 Tb of sequence per site (Table 1). BBDuk (BBTools package v38.38) (6) was used to trim adapter sequences from raw reads (adapters_no_transposase database), to perform quality filtering (parameters: int, ow; k, 27; hdist, 1; qtrim, f; minlen, 35), and to remove contaminants (sequencing_artifacts and phix174_ill reference database). Assembly was performed using the metaHipMer assembler (see MIMS metadata files for the specific developmental version used for each site) with k-mer lengths of 21, 31, 55, and 71 (7) on the NERSC Cori platform (https://docs.nersc.gov/systems/cori). Scaffolds <2,500 bp long were omitted from further analysis. Quality-screened reads were mapped to scaffolds using the Burrows-Wheeler Aligner (v0.7.12) (8), and depth of coverage was determined across each scaffold using SAMtools (v1.9) (9).
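The 2,500-bp scaffold cutoff described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`read_fasta`, `filter_scaffolds`) and the toy sequences are hypothetical, and the sketch assumes that scaffolds of exactly 2,500 bp are retained, since only scaffolds <2,500 bp are described as omitted.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Cutoff taken from the text: scaffolds <2,500 bp were omitted.
MIN_SCAFFOLD_LEN = 2500


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_scaffolds(lines: Iterable[str],
                     min_len: int = MIN_SCAFFOLD_LEN) -> Dict[str, str]:
    """Keep only scaffolds whose length is at least min_len."""
    return {name: seq for name, seq in read_fasta(lines) if len(seq) >= min_len}


# Hypothetical toy assembly: one long scaffold, one short scaffold,
# and one multi-line scaffold of exactly 2,500 bp.
toy = [
    ">scaffold_1", "A" * 3000,
    ">scaffold_2", "ACGT" * 100,
    ">scaffold_3", "G" * 1250, "C" * 1250,
]
kept = filter_scaffolds(toy)
print(sorted(kept))
```

In practice the lines would come from an open assembly FASTA file rather than an in-memory list; the same functions accept a file handle because they only iterate over lines.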
TABLE 1 Metagenome statistics for grassland soil microbiomes

| Site | Total no. of reads | No. of quality bases (Tb) | Total no. of scaffolds | N50 (bp) | No. of scaffolds ≥2,500 bp | Total length of scaffolds ≥2,500 bp (bp) | No. of predicted proteins | PNNL DataHub accession no. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IAREC | 7,536,393,634 | 1.123 | 83,651,096 | 1,404 | 241,472 | 989,234,018 | 1,255,684 | WA-TmG.1.0 |
| KPBS | 7,343,389,182 | 1.088 | 68,100,771 | 1,111 | 304,736 | 1,283,171,244 | 1,388,888 | KS-TmG.1.0 |
| COBS | 7,723,367,404 | 1.152 | 77,470,427 | 1,194 | 289,845 | 1,152,748,070 | 1,255,684 | IA-TmG.1.0 |

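For readers unfamiliar with the assembly statistics in Table 1, the following Python sketch shows one common way to compute an N50 value, together with the count and total length of scaffolds at or above a length cutoff. It uses the usual definition of N50 (the length of the scaffold at which the running total, summed from longest to shortest, first reaches half of the total assembly length). The text does not state which set of scaffolds the reported N50 was computed over, so the sketch simply works on whatever list it is given; the function names and the scaffold lengths are hypothetical.

```python
from typing import Sequence, Tuple


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one scaffold length")


def summarize_long_scaffolds(lengths: Sequence[int],
                             min_len: int = 2500) -> Tuple[int, int]:
    """Return (count, total length) of scaffolds with length >= min_len."""
    long_ones = [length for length in lengths if length >= min_len]
    return len(long_ones), sum(long_ones)


# Hypothetical scaffold lengths in bp (not taken from the data set).
toy_lengths = [12000, 5000, 2500, 2400, 900, 300]

print("N50 (bp):", n50(toy_lengths))
print("Scaffolds >=2,500 bp (count, total bp):",
      summarize_long_scaffolds(toy_lengths))
```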
Prodigal (v2.6.3) (10) was used to predict coding regions. Predicted protein sequences were searched using hmmsearch (v.3.1b2) (11) against the eggNOG (v4.5) (12), Pfam (v32.0) (13), and Nucleo-Cytoplasmic Virus Orthologous Group (NCVOG) (release date, 9 June 2014) (14) databases. Annotation assignments were given based on best bit scores (E-value cutoff, 1.0e−05).
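The best-bit-score assignment described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the `Hit` record is a simplified, hypothetical stand-in for parsed search results (it is not the hmmsearch output format), the family names are made up, the E-value cutoff is treated as inclusive, and the best hit is chosen per protein within whatever list of hits is passed in. The text does not say whether assignments were made per database or across databases.

```python
from typing import Dict, Iterable, NamedTuple

# Cutoff taken from the text (E-value cutoff, 1.0e-05).
E_VALUE_CUTOFF = 1.0e-5


class Hit(NamedTuple):
    """Simplified, hypothetical record for one protein-to-profile match."""
    protein: str
    family: str
    evalue: float
    bitscore: float


def best_hits(hits: Iterable[Hit],
              max_evalue: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Keep, for each protein, the passing hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # fails the E-value cutoff (inclusive cutoff assumed)
        current = best.get(hit.protein)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.protein] = hit
    return best


# Hypothetical hits: protein p1 matches two families, p2 fails the cutoff,
# and p3 sits exactly on the cutoff.
toy_hits = [
    Hit("p1", "FAM_A", 1e-30, 110.5),
    Hit("p1", "FAM_B", 1e-12, 55.0),
    Hit("p2", "FAM_C", 1e-3, 20.1),
    Hit("p3", "FAM_D", 1e-5, 30.0),
]
annotations = best_hits(toy_hits)
for protein, hit in sorted(annotations.items()):
    print(protein, hit.family, hit.bitscore)
```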
These metagenomes are intended as a resource for the scientific community and should facilitate understanding of the highly diverse and complex metabolic potential that is encoded in soil microbial genomes.

Data availability.

Metagenomic sequence data have been deposited in the PNNL DataHub repository and are available for download under project DOI numbers WA-TmG.1.0, KS-TmG.1.0, and IA-TmG.1.0. The versions described in this paper are the first versions. Packages contain raw reads, assemblies, functional annotations, field site plot maps, MIMS.me.soil.5.0 metadata information, and package “read me” files.

ACKNOWLEDGMENTS

This research was supported by the Department of Energy (DOE) Office of Biological and Environmental Research. This research is a contribution of the Scientific Focus Area Phenotypic Response of the Soil Microbiome to Environmental Perturbations project and the EMSL/JGI FICUS award (award 50978). The PNNL is operated for the DOE by Battelle Memorial Institute under contract DE-AC05-76RLO1830. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. DOE Office of Science User Facility operated under contract DE-AC02-05CH11231.
We thank Robert S. Egan, Leonid Oliker, and Katherine A. Yelick for access to NERSC resources, developmental metaHipMer code, and expert advice in running the assembly process.

REFERENCES

1. Jansson JK, Hofmockel KS. 2018. The soil microbiome—from metagenomics to metaphenomics. Curr Opin Microbiol 43:162–168.
2. Jansson JK, Hofmockel KS. 2020. Soil microbiomes and climate change. Nat Rev Microbiol 18:35–46.
3. Fay PA, Carlisle JD, Knapp AK, Blair JM, Collins SL. 2000. Altering rainfall timing and quantity in a mesic grassland ecosystem: design and performance of rainfall manipulation shelters. Ecosystems 3:308–319.
4. Fay PA, Carlisle JD, Danner BT, Lett MS, McCarron JK, Stewart C, Knapp AK, Blair JM, Collins SL. 2002. Altered rainfall patterns, gas exchange, and growth in grasses and forbs. Int J Plant Sci 163:549–557.
5. Jarchow ME, Liebman M. 2013. Nitrogen fertilization increases diversity and productivity of prairie communities used for bioenergy. Glob Change Biol Bioenergy 5:281–289.
6.
7. Georganas E, Egan R, Hofmeyr S, Goltsman E, Arndt B, Tritt A, Buluç A, Oliker L, Yelick K. 2018. Extreme scale de novo metagenome assembly, p 122–134. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX.
8. Li H, Durbin R. 2010. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589–595.
9. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079.
10. Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.
11. Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195.
12. Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, Jensen LJ, von Mering C, Bork P. 2016. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 44:D286–D293.
13. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. 2014. Pfam: the protein families database. Nucleic Acids Res 42:D222–D230.
14. Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J 6:223.

Published In

Microbiology Resource Announcements
Volume 9, Number 32, 6 August 2020
eLocator: 10.1128/mra.00718-20
Editor: Frank J. Stewart, Georgia Institute of Technology

History

Received: 19 June 2020
Accepted: 29 June 2020
Published online: 6 August 2020

Contributors

Authors

William C. Nelson
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Lindsey N. Anderson
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Ruonan Wu
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Jason E. McDermott
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Sheryl L. Bell
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Ari Jumpponen
Division of Biology, Kansas State University, Manhattan, Kansas, USA
Sarah J. Fansler
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Kimberly J. Tyrrell
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Yuliya Farris
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Kirsten S. Hofmockel
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA
Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, USA
Janet K. Jansson
Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington, USA

Notes

Address correspondence to William C. Nelson or Janet K. Jansson.
William C. Nelson and Lindsey N. Anderson contributed equally to this work. Author order was determined by drawing straws.
