Data from: Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates

Main Authors: Dowle, Eddy J., Pochon, Xavier, Banks, Jonathan C., Shearer, Karen, Wood, Susanna A.
Format: Dataset
Published: 2015
Subjects:
Online Access: https://zenodo.org/record/4998943
Table of Contents:
  • Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome c oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (<1% of the total abundance or biomass), moderately abundant (1–5%) and highly abundant (>5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated that primer biases occurred during amplicon metabarcoding, with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step.
  • MyBaits Sequences: Sequences used for the MyBait probe design (120 bp). File: MyBaitsSeqNoWrapNoNames.fasta
  • Mybaits M4: Mybaits data file for sample M4. File: MybaitsM4.zip
  • Mybaits M10: Mybaits data file for sample M10. File: MybaitsM10.zip
  • Mybaits M11: Mybaits data file for sample M11. File: MybaitsM11.zip
  • Mybaits M12: Mybaits data file for sample M12. File: MybaitsM12.zip
  • Mybaits M13: Mybaits data file for sample M13. File: MybaitsM13.zip
  • Mybaits M14: Mybaits data file for sample M14. File: MybaitsM14.zip
  • Mybaits M15: Mybaits data file for sample M15. File: MybaitsM15.zip
  • Mybaits M16: Mybaits data files for sample M16. File: MybaitsM16.zip
  • Mybaits M17: Mybaits data file for sample M17. File: MybaitsM17.zip
  • Mybaits M18: Mybaits data file for sample M18. File: MybaitsM18.zip
  • Mybaits M19: Mybaits data file for sample M19. File: MybaitsM19.zip
  • Mybaits M20: Mybaits data file for sample M20. File: MybaitsM20.zip
  • Amplicon Sequencing Files: Amplicon sequencing data for all 12 samples, zipped together. File: AmpliconSequencingFiles.zip
  • Table results: Table showing the results in read counts between samples and methods. File: SITable2.xlsx
  • Sanger Sequencing Reads COI: Sanger sequencing of some missing species. Name in Fasta file. Species names are unknown. File: COIreads.fasta
  • Sample file explanation: Samples are referred to by their number and type of lab prep in the sample files. For example, RS4 has an A4 (amplicon) and an M4 (Mybaits) sample associated with it in the sample files. File: SampleFileExplanation
  • MMCI biomass calculation: Spreadsheet with the details regarding the biomass calculations. File: MMCIbiomassCalculationSupp.xlsx
  • Reference databases: Two files zipped together. One fasta file contains all the genes associated with the mayflies, stoneflies and caddisflies. These three groups had the additional genes used in the mapping protocol. The second fasta file contains the sequences for all other species (only COI sequences). File: ReferenceDataBases.zip
  • Location Samples: Details on the locations of the river samples. All rivers are located in New Zealand. File: LocationSamples.xlsx
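
The abstract describes two calculations that may be easier to follow in code: sorting taxa into rare (<1%), moderately abundant (1–5%) and highly abundant (>5%) classes, and the Spearman rank correlation between sequence reads and abundance or biomass. The following is a minimal Python sketch of both, not the authors' analysis pipeline. The function names and the handling of values exactly at 1% and 5% are assumptions made for illustration, and the numbers in the usage note are invented and are not taken from SITable2.xlsx.

```python
from typing import List


def abundance_class(share: float) -> str:
    """Classify a taxon by its share (0..1) of total abundance or biomass.

    Thresholds follow the abstract: rare <1%, moderate 1-5%, high >5%.
    Treating exactly 1% and exactly 5% as 'moderate' is an assumption.
    """
    if share < 0.01:
        return "rare"
    if share <= 0.05:
        return "moderate"
    return "high"


def ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x ** 0.5 * var_y ** 0.5)
```

As a usage example with invented counts, `spearman([120, 30, 6, 1], [5000, 900, 1200, 0])` compares the morphological counts of four hypothetical taxa with their read counts, and `abundance_class(6 / 157)` places the third taxon in the moderately abundant class.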