      Is Open Access

      CatBoost for big data: an interdisciplinary review

      research-article


          Abstract

          Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current GBDT implementations in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and to learn best practices both from studies that cast CatBoost in a positive light and from studies where CatBoost does not outshine other techniques, since both types of scenario offer lessons. Furthermore, as a Decision Tree-based algorithm, CatBoost is well suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach that covers studies related to CatBoost in a single work. This provides researchers with an in-depth understanding that helps clarify the proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.


                Author and article information

                Contributors
                jhancoc4@fau.edu
                khoshgof@fau.edu
                Journal
                J Big Data
                Journal of Big Data
                Springer International Publishing (Cham )
                2196-1115
                4 November 2020
                2020; 7(1): 94
                Affiliations
                GRID grid.255951.f, ISNI 0000 0004 0635 0263, Florida Atlantic University, 777 Glades Road, Boca Raton, FL, USA
                Author information
                http://orcid.org/0000-0003-0699-3042
                Article
                369
                10.1186/s40537-020-00369-8
                7610170
                33169094
                c4b864c4-72ec-479b-a883-195411e048b5
                © The Author(s) 2020

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 5 August 2020
                Accepted: 19 October 2020
                Categories
                Survey Paper

                Keywords: CatBoost, big data, categorical variable encoding, ensemble methods, machine learning, decision tree
