Staggered Mock Metagenomic Communities

Artificial metagenomic reads from mock communities. Data from https://www.imicrobe.us/#/projects/265. The raw sequence data from the staggered mock community (SRP115095 in project accession PRJNA397434) were used to develop simulated metagenomes. We used the known abundance profile of the staggered mock community to generate simulated metagenomes with GemSIM (454 sequencing) at 0.5, 1, 5, and 10 million reads, with reads apportioned according to the relative abundance of organisms in the staggered mock community. We developed four simulated metagenomes to test the effect of changing the abundance of the dominant organisms and the genetic composition: 10 million reads from the staggered mock community (mock 1), the mock community with alterations in a few abundant species (mock 2), the mock community with many alterations in abundant species (mock 3), and mock 3 with additional sequences from archaea to further alter the genetic composition (mock 4). The same community profiles were used to generate a paired-end Illumina dataset (100 million reads) using GemSIM (Illumina v4 error model). Finally, the community profiles were used with SimLoRD to generate simulated third-generation sequencing datasets (Pacific Biosciences SMRT sequencing, 1 million reads). SimLoRD default parameters were used for those datasets.
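
As an illustration of how a fixed read depth can be divided among organisms according to relative abundance, the following is a minimal sketch only. It is not the procedure used by GemSIM or SimLoRD. The organism names and abundance weights are hypothetical placeholders rather than the actual staggered mock community profile, and the function name allocate_reads is introduced here for illustration.

```python
def allocate_reads(abundances, total_reads):
    """Split total_reads across organisms in proportion to integer abundance weights.

    Uses largest-remainder rounding so the per-organism counts sum
    exactly to total_reads.
    """
    weight_sum = sum(abundances.values())
    if weight_sum <= 0:
        raise ValueError("abundance weights must sum to a positive number")

    counts = {}
    remainders = {}
    for name, weight in abundances.items():
        quota = weight * total_reads
        counts[name] = quota // weight_sum      # whole reads guaranteed to this organism
        remainders[name] = quota % weight_sum   # leftover fraction, kept as an integer

    # Hand out the reads lost to rounding, largest leftover first.
    shortfall = total_reads - sum(counts.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:shortfall]:
        counts[name] += 1
    return counts


# Hypothetical staggered profile: weights spanning three orders of magnitude.
example_profile = {
    "organism_A": 1000,
    "organism_B": 100,
    "organism_C": 10,
    "organism_D": 1,
}

# Read depths taken from the description: 0.5, 1, 5, and 10 million reads.
for depth in (500_000, 1_000_000, 5_000_000, 10_000_000):
    per_organism = allocate_reads(example_profile, depth)
    print(depth, per_organism)
```

Integer weights keep the arithmetic exact. A profile given as fractions would first need to be scaled to integers, or handled with a rounding rule chosen with floating-point error in mind.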

Additional Info

Field	Value
Author	Bonnie Hurwitz
Last Updated	June 23, 2024, 17:17 (UTC)
Created	June 23, 2024, 17:16 (UTC)
Citation	Bonnie Hurwitz 2018. Staggered Mock Metagenomic Communities. CyVerse Data Commons. DOI 10.7946/MQ0G
Date created in discovery environment	2018-12-12 18:24:50
Date last modified in discovery environment	2020-02-20 22:05:33
PublicationYear	2018
Publisher	CyVerse Data Commons
ResourceType	mock metagenomic data
Rights	ODC PDDL
Subject	Bacteria, Metagenomics, artificial data
identifierType	DOI