Staggered Mock Metagenomic Communities

Artificial metagenomic reads from mock communities. Data from https://www.imicrobe.us/#/projects/265. The raw sequence data from the staggered mock community (SRP115095 in project accession PRJNA397434) were used to develop simulated metagenomes. We used the known abundance profile of the staggered mock community to generate simulated metagenomes of 0.5, 1, 5, and 10 million reads with GemSim (454 sequencing), based on the relative abundance of organisms in the staggered mock community. We developed four simulated metagenomes to test the effect of changing the abundance of the dominant organisms and the genetic composition: 10 million reads from the staggered mock community (mock 1), the mock community with alterations in a few abundant species (mock 2), the mock community with many alterations in abundant species (mock 3), and mock 3 with additional sequences from archaea to further alter the genetic composition (mock 4). The same community profiles were used to generate a paired-end Illumina dataset (100 million reads) with GemSim (Illumina v4 error model). Finally, the community profiles were used with SimLord, run with default parameters, to generate simulated third-generation sequencing datasets (Pacific Biosciences SMRT sequencing, 1 million reads).
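As a rough illustration of what "reads based on the relative abundance of organisms" means in practice, the sketch below splits a total read count across organisms in proportion to integer abundance weights. It is not the procedure or input format used by GemSim or SimLord. The organism names and weights are hypothetical placeholders, not the actual staggered mock community profile, and the function name `allocate_reads` is an assumption made for this example.

```python
def allocate_reads(weights, total_reads):
    """Split total_reads across organisms in proportion to integer weights.

    Uses largest-remainder rounding so the per-organism counts always
    sum exactly to total_reads.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    counts = {}
    remainders = {}
    for name, weight in weights.items():
        # Exact integer arithmetic: whole reads plus the leftover fraction.
        counts[name], remainders[name] = divmod(total_reads * weight, total_weight)

    # Hand the reads lost to rounding down to the largest remainders.
    leftover = total_reads - sum(counts.values())
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for name in ranked[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical staggered profile (weights in percent), for illustration only.
profile = {"organism_A": 70, "organism_B": 20, "organism_C": 9, "organism_D": 1}

# The four read depths named in the description: 0.5, 1, 5, and 10 million.
for depth in (500_000, 1_000_000, 5_000_000, 10_000_000):
    print(depth, allocate_reads(profile, depth))
```

In a staggered community the weights span a wide range, so the rarest organisms receive only a small fraction of the reads even at the 10 million read depth.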

Additional Info

Field Value
Author Bonnie Hurwitz
Last Updated June 23, 2024, 17:17 (UTC)
Created June 23, 2024, 17:16 (UTC)
Citation Bonnie Hurwitz 2018. Staggered Mock Metagenomic Communities. CyVerse Data Commons. DOI 10.7946/MQ0G
Date created in discovery environment 2018-12-12 18:24:50
Date last modified in discovery environment 2020-02-20 22:05:33
PublicationYear 2018
Publisher CyVerse Data Commons
ResourceType mock metagenomic data
Rights ODC PDDL
Subject Bacteria, Metagenomics, artificial data
identifierType DOI