CGEB - Integrated Microbiome Resource (IMR)

Protocols

The following protocol summaries describe the generation of paired-end sequencing reads from 16S or 18S PCR amplicons carrying multiple barcodes (i.e., "indices") on the Illumina MiSeq. The amplicons are approx. 400-500 bp long and are read as 300+300 bp with ~100-200 bp of overlap. The protocols assume a run of up to 384 barcoded amplicons (380 samples + 4 PCR controls). They are a synthesis of multiple sources in the current scientific literature, but draw mainly on the following:

Comeau AM, Douglas GM, Langille M. 2017. Microbiome Helper: A custom and streamlined workflow for microbiome research. mSystems, 2:e00127-16. [overview of the entire IMR wet-lab and bioinformatics pipeline]
Comeau AM, Li WK, Tremblay JE, Carmack EC, Lovejoy C. 2011. Arctic Ocean microbial community structure before and after the 2007 record sea ice minimum. PLoS ONE, 6:e27492. [initial 16S V6-V8 and 18S V4 primer design/sequences]
Walters W, et al. 2015. Improved bacterial 16S rRNA gene (V4 and V4-5) and fungal internal transcribed spacer marker gene primers for microbial community surveys. mSystems, 1:e00009-15. [16S V4-V5 primer design/sequences]
Op De Beeck M, Lievens B, Busschaert P, Declerck S, Vangronsveld J, Colpaert JV. 2014. Comparison and validation of some ITS primer pairs useful for fungal metabarcoding studies. PLoS ONE, 9:e97629. [ITS2 primer design/sequences]
Earth Microbiome Project (EMP) at www.earthmicrobiome.org/protocols-and-standards/ [blocking protocol for eukaryote contaminants]
Human Microbiome Project (HMP) at hmpdacc.org/resources/tools_protocols.php [general considerations/extraction for stool samples]

Sample Collection

Sample collection issues can be very specific to your sample type (i.e., the medium you are sampling). Many of our IMR projects so far focus on water-column "environmental" samples or fecal matter. For the former, varying volumes of water are typically collected onto filters and frozen at -20°C or -80°C in a storage buffer. For the latter, fresh fecal pellets are collected from mice, or human stool is sampled, and then frozen immediately at -20°C or -80°C (without buffer). All samples are kept frozen until the extraction step below. For transcriptomics approaches with environmental, lab culture or biopsy samples (not recommended for stool), samples must be flash-frozen in liquid nitrogen immediately after collection.

DNA/RNA Extraction

DNA/RNA is extracted from the samples using the method/kit appropriate to the specific sample type. You may prefer to choose the method yourself for samples you have experience with, or you may wish to follow our preferred extraction methods. There is no general consensus in the literature on the choice of kit, other than that all kits affect the final profiles to some degree and that bead-beating is most probably a must for difficult materials (and/or for Gram-positive bacteria and protist cysts). We have evaluated, and currently use, the MO BIO PowerFecal DNA Kit with mouse pellets and human fecal samples. We are also testing various DNA kits for use with human urine. We have not yet ventured into RNA kits for metatranscriptomes, but we will be examining them in the near future. Quantification and quality checks are done (via NanoDrop or PicoGreen/Qubit) to verify success. Optional: A gel can be run to verify integrity (generally unnecessary for PCR-only studies, but required for shotgun metagenomic sequencing).

Library Preparation

→ 16S/18S/ITS Amplicons

Amplicon fragments are PCR-amplified from the DNA in duplicate, using separate template dilutions (generally 1:1 & 1:10) and the high-fidelity Phusion polymerase. A single round of PCR is done using "fusion primers" (Illumina adaptors + indices + specific regions) targeting either the 16S V6-V8 (Bacteria/Archaea; ~440-450 bp), 16S V4-V5 (primarily Bacteria; ~410 bp), 18S V4 (Eukarya; ~440 bp) or ITS2 (Fungi; variable length, avg. ~350 bp) regions, with multiplexing that allows up to 380 samples to be run together. PCR products are verified visually by running a high-throughput Invitrogen 96-well E-gel. Any samples with failed PCRs (or spurious bands) are re-amplified under optimized PCR conditions until they produce correct bands, so that the sample plate is complete before continuing. The duplicate PCR reactions from each sample are pooled in one plate, then cleaned up and normalized using the high-throughput Invitrogen SequalPrep 96-well Plate Kit. The (up to) 380 samples are then pooled to make one library, which is quantified fluorometrically before sequencing.
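
As a rough illustration of why these amplicon lengths suit 300+300 bp paired-end reads, the sketch below estimates how many bases the two reads would share for each target. It assumes the quoted lengths are the full sequenced fragment (the text does not say whether primers are included), uses the midpoint of the 440-450 bp range, and ignores quality trimming, so the numbers are nominal upper bounds rather than measured values.

```python
# Nominal read overlap for each amplicon target on a 300+300 bp paired-end run.
READ_LENGTH = 300  # bp per read

# Approximate amplicon lengths (bp) quoted in the text.
# 445 is the midpoint of the ~440-450 bp range (an assumption for illustration).
AMPLICON_LENGTHS = {
    "16S V6-V8": 445,
    "16S V4-V5": 410,
    "18S V4": 440,
    "ITS2 (average)": 350,
}


def expected_overlap(amplicon_length, read_length=READ_LENGTH):
    """Return the number of bases covered by both reads.

    Returns 0 if the reads are too short to meet, and the full amplicon
    length if each read alone spans the whole amplicon.
    """
    overlap = 2 * read_length - amplicon_length
    return max(0, min(overlap, amplicon_length))


for region, length in AMPLICON_LENGTHS.items():
    print(f"{region}: ~{length} bp amplicon, ~{expected_overlap(length)} bp overlap")
```

A larger overlap gives the stitching step more bases over which to reconcile the two reads, which matters most at the lower-quality 3' ends.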

→ (Meta)genomes ("Shotgun")

Microbial (or mtDNA) genomes and community metagenomes are prepared using the Illumina Nextera XT kit which requires a very small amount of starting material (1 ng) as it is a PCR-based library preparation procedure. Briefly, samples are "tagmented" (enzymatically "sheared" and tagged with adaptors), PCR amplified while adding barcodes, purified using columns or beads, normalized using Illumina beads or manually, then pooled for loading onto the MiSeq or NextSeq.
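
Because the input is fixed at 1 ng, extracted DNA usually has to be diluted before tagmentation. The following is a minimal sketch of that dilution arithmetic (C1 x V1 = C2 x V2). The 5 µL input volume and 20 µL working volume are assumptions chosen for illustration and are not taken from the text; the kit documentation should be checked for the actual volumes.

```python
def dilution_for_input(stock_ng_per_ul, target_ng=1.0,
                       input_volume_ul=5.0, final_volume_ul=20.0):
    """Return (stock_ul, diluent_ul) for a working dilution.

    The working dilution has a concentration of target_ng / input_volume_ul,
    so that pipetting input_volume_ul of it delivers target_ng of DNA.
    input_volume_ul and final_volume_ul are illustrative assumptions.
    """
    target_conc = target_ng / input_volume_ul  # ng/uL needed in the dilution
    if stock_ng_per_ul < target_conc:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_conc * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Example: a sample quantified at 2.0 ng/uL.
stock_ul, diluent_ul = dilution_for_input(2.0)
print(f"Mix {stock_ul:.2f} uL DNA with {diluent_ul:.2f} uL diluent")
```

Samples that are already below the target concentration raise an error rather than returning a negative diluent volume.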

→ (Meta)transcriptomes ("RNA-Seq")

We are currently evaluating Illumina (Ribo-Zero + TruSeq Stranded mRNA LT) vs. NuGEN (Ovation Complete Prokaryotic RNA-Seq) kits for the production of sufficiently rRNA-depleted libraries for RNA-Seq. Briefly, after rRNA depletion, the remaining mRNAs are converted to cDNA in a way that preserves strand information, tagged with adaptors+barcodes, PCR amplified, purified using columns or beads, normalized using beads or manually, then pooled for loading onto the MiSeq or NextSeq.

Next-Generation Sequencing

Amplicon samples, small metagenomic sets, and small genomes are run on our Illumina MiSeq using 300+300 bp paired-end V3 chemistry, which allows for overlap and stitching together of paired amplicon reads into one full-length read of higher quality. Output is generally ~22 million raw reads and ~13 Gb of sequence, i.e., ~60,000 reads per sample for 380 amplicons. For larger metagenomic/metatranscriptomic projects, we run on our recently acquired Illumina NextSeq 550 using 150+150 bp paired-end "high output" chemistry, generating up to ~400 million raw reads and ~120 Gb of sequence.
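
The output figures above can be checked with simple arithmetic. The sketch below assumes that each "raw read" corresponds to one read pair (one cluster sequenced from both ends); that interpretation is an assumption, but it reproduces the quoted totals. The function names are illustrative only.

```python
def total_bases_gb(read_pairs, read_length, paired=True):
    """Total sequence yield in gigabases for a run."""
    reads_per_pair = 2 if paired else 1
    return read_pairs * read_length * reads_per_pair / 1e9


def reads_per_sample(read_pairs, n_samples):
    """Average read pairs per sample, assuming perfectly even pooling."""
    return read_pairs // n_samples


# MiSeq, 300+300 bp, ~22 million raw reads
print(total_bases_gb(22_000_000, 300))    # about 13.2 Gb
print(reads_per_sample(22_000_000, 380))  # 57894, i.e. roughly 60,000

# NextSeq 550 "high output", 150+150 bp, ~400 million raw reads
print(total_bases_gb(400_000_000, 150))   # about 120 Gb
```

Real per-sample depth will vary around this average, since normalization before pooling is never perfectly even and some reads are lost during quality filtering.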

Bioinformatics Analyses

Details of our amplicon and metagenomics pipelines are available at https://github.com/mlangill/microbiome_helper/wiki, but the following is a summary of the major deliverables clients will receive (analyses will require a "mapping file" from the clients containing any relevant metadata for the study):

16S/18S/ITS Amplicon Analysis

Combined FASTA file of the quality-controlled sequences (formatted for use in QIIME)
Final OTU tables in text, BIOM and STAMP formats (from open-reference picking at 97% [Bact./Arch./ITS] or 98% [Euk.] identity)
FASTA file of representative sequences (one per OTU)
Taxonomic assignment files at various levels (e.g., phylum, genus, etc.)
Alpha-diversity rarefaction plots
Beta-diversity UniFrac plots
Functional prediction files generated from PICRUSt (Bact. only)
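
To illustrate what the alpha-diversity rarefaction plots in the list above represent, here is a minimal sketch of rarefaction: reads are randomly subsampled without replacement to a fixed depth, and the number of OTUs observed at that depth is counted. This is a simplified stand-in written with the Python standard library, not the pipeline's own implementation, and the OTU names and counts are made up for the example.

```python
import random


def rarefy(otu_counts, depth, seed=None):
    """Subsample `depth` reads without replacement; return {otu: count}."""
    # Expand the count table into one entry per read.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    subsample = {}
    for otu in rng.sample(pool, depth):
        subsample[otu] = subsample.get(otu, 0) + 1
    return subsample


def rarefaction_curve(otu_counts, depths, seed=0):
    """Observed OTU count at each depth (one random subsample per depth)."""
    return [(d, len(rarefy(otu_counts, d, seed))) for d in depths]


# Hypothetical sample: two abundant OTUs and one rare OTU.
sample = {"OTU1": 50, "OTU2": 30, "OTU3": 5}
for depth, observed in rarefaction_curve(sample, [5, 20, 50, 85]):
    print(depth, observed)
```

Production tools typically repeat the subsampling many times at each depth and plot the mean, which smooths the curve; a single draw per depth is used here only to keep the sketch short.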

Metagenomics Analysis

FASTA files of the final sequences screened to remove human (or other) contaminants
Taxonomic composition of the samples from MetaPhlAn 2.0 (text and STAMP files)
Functional prediction files generated from HUMAnN (text and STAMP files) for individual KO numbers, KEGG modules and KEGG pathways
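
The "mapping file" requested from clients for these analyses is, in QIIME's convention, a tab-delimited table whose first column header is `#SampleID`. The sketch below checks a few basic properties of such a file (header, consistent field counts, unique sample IDs). The `Treatment` column and the sample values are hypothetical, and the exact columns required for a given project are not specified here.

```python
import csv
import io


def read_mapping_file(handle):
    """Parse a tab-delimited mapping file into {sample_id: {column: value}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "#SampleID":
        raise ValueError("first column must be '#SampleID'")
    samples = {}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != len(header):
            raise ValueError(
                f"row for {row[0]!r} has {len(row)} fields, expected {len(header)}"
            )
        if row[0] in samples:
            raise ValueError(f"duplicate sample ID {row[0]!r}")
        samples[row[0]] = dict(zip(header[1:], row[1:]))
    return samples


# Hypothetical two-sample mapping file.
example = (
    "#SampleID\tTreatment\tDescription\n"
    "S1\tcontrol\tmouse_pellet_1\n"
    "S2\ttreated\tmouse_pellet_2\n"
)
metadata = read_mapping_file(io.StringIO(example))
print(metadata["S1"]["Treatment"])
```

Catching duplicate IDs and ragged rows before analysis avoids samples being silently dropped or mislabeled downstream.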

Custom Bioinformatics: Additional bioinformatic analyses can be requested at an hourly rate or through research collaboration. Please contact us for more details.