      A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data

          Abstract

          Background

          DNA metabarcoding has become one of the most widely used techniques for studying the taxonomic composition of various sample types. Handling the large amount of data generated by high-throughput sequencing requires a bioinformatics workflow, and the QIIME2 platform has emerged as one of the most reliable and commonly used. However, pre-formatted reference databases for taxonomic assignment are available for only a few barcode sequences. Users who want to develop a new custom reference database still face several bottlenecks, and a detailed procedure explaining how to develop and format such a database is currently missing. This work therefore presents a detailed workflow explaining, from start to finish, how to develop such a curated reference database for any barcode sequence.

          Results

          We developed DB4Q2, a detailed workflow that allowed the development of plant reference databases dedicated to ITS2 and rbcL, two barcode sequences commonly used in plant metabarcoding studies. This workflow addresses several of the main bottlenecks connected with the development of a curated reference database. The detailed and commented structure of DB4Q2 makes it possible to develop reference databases even without extensive bioinformatics skills, and avoids the ‘black box’ systems that are sometimes encountered. Filtering steps have been included to discard presumably fungal and misidentified sequences. The flexibility of DB4Q2 allows several key sequence processing steps to be included or skipped, and downloading issues can be avoided. Benchmarking the databases developed using DB4Q2 revealed that they performed well compared to previously published reference datasets.

          Conclusion

          This study presents DB4Q2, a detailed procedure to develop custom reference databases in order to carry out taxonomic analyses with QIIME2, but also with other bioinformatics platforms if desired. This work also provides ready-to-use plant ITS2 and rbcL databases for which the prediction accuracy has been assessed and compared to that of other published databases.
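
          As a rough illustration of the kind of output such a workflow produces, the sketch below drops presumably fungal records and writes the two pieces a QIIME2 reference database usually consists of: a FASTA file of sequences and a headerless two-column taxonomy table (sequence ID, tab, taxonomy string). It is not the DB4Q2 procedure itself, whose steps are not detailed in this abstract. The record layout, the function names `drop_fungal` and `to_reference_files`, the keyword-based fungal filter, and the example records are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (sequence ID, taxonomy string, DNA sequence).
Record = Tuple[str, str, str]


def drop_fungal(records: List[Record]) -> List[Record]:
    """Keep only records whose taxonomy string does not mention fungi.

    A simple keyword filter, used here as a stand-in; the filtering
    applied by DB4Q2 may work differently.
    """
    return [rec for rec in records if "fungi" not in rec[1].lower()]


def to_reference_files(records: List[Record]) -> Tuple[str, str]:
    """Return (FASTA text, headerless two-column taxonomy TSV text)."""
    fasta = "".join(f">{sid}\n{seq}\n" for sid, _, seq in records)
    taxonomy = "".join(f"{sid}\t{tax}\n" for sid, tax, _ in records)
    return fasta, taxonomy


# Made-up example records (not taken from the published databases).
records: List[Record] = [
    ("seq1", "k__Viridiplantae; g__Quercus; s__Quercus_robur", "ACGTACGT"),
    ("seq2", "k__Fungi; g__Alternaria; s__Alternaria_alternata", "TTGACCGA"),
    ("seq3", "k__Viridiplantae; g__Fagus; s__Fagus_sylvatica", "GGCATTCA"),
]

kept = drop_fungal(records)
fasta_text, taxonomy_text = to_reference_files(kept)
print(fasta_text, end="")
print(taxonomy_text, end="")
```

          Saved to disk, the two outputs could then be imported with `qiime tools import`, using the types `FeatureData[Sequence]` and `FeatureData[Taxonomy]` (the latter with `--input-format HeaderlessTSVTaxonomyFormat`), assuming a standard QIIME2 installation.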

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s12863-022-01067-5.

                Author and article information

                Contributors
                b.dubois@cra.wallonie.be
                Journal
                BMC Genomic Data (BMC Genom Data)
                BioMed Central (London)
                2730-6844
                8 July 2022
                Volume 23, article 53
                Affiliations
                [1] Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Center, Chaussée de Charleroi 234, 5030 Gembloux, Belgium
                [2] Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Center, Rue de Liroux 2, 5030 Gembloux, Belgium
                [3] Knowledge and Valorization of Agricultural Products Department, Quality and Authentication Unit, Walloon Agricultural Research Center, Chaussée de Namur 24, 5030 Gembloux, Belgium
                [4] Knowledge and Valorization of Agricultural Products Department, Protection, Control Products and Residues Unit, Walloon Agricultural Research Center, Rue du Bordia 11, 5030 Gembloux, Belgium
                Article
                1067
                DOI: 10.1186/s12863-022-01067-5
                PMC ID: 9264521
                PubMed ID: 35804326
                © The Author(s) 2022

                Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 2 December 2021
                Accepted: 1 July 2022
                Categories
                Research
                Keywords: reference database, qiime2, bioinformatics workflow, metabarcoding, high-throughput sequencing, its2, rbcl, plant
