Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues


Abstract

Despite the large number of recent advances and developments in landslide susceptibility mapping (LSM), there is still a lack of studies focusing on specific aspects of LSM model sensitivity. For example, the influence of factors such as the survey scale of the landslide conditioning variables (LCVs), the resolution of the mapping unit (MUR) and the optimal number and ranking of LCVs has never been investigated analytically, especially on large data sets. In this paper we attempt such an investigation, concentrating on the impact of model tuning choices on the final result rather than on the comparison of methodologies. To this end, we adopt a simple implementation of random forest (RF), a machine learning technique, to produce an ensemble of landslide susceptibility maps for a set of different model settings, input data types and scales. Random forest is an ensemble of decision trees that relates a set of predictors to the actual landslide occurrence. Because it is a nonparametric model, it can incorporate a range of numerical or categorical data layers and, unlike linear discriminant analysis for example, it does not require unimodal training data. Many widely acknowledged landslide predisposing factors are taken into account, mainly related to lithology, land use, geomorphology, and structural and anthropogenic constraints. In addition, for each factor we also include in the predictor set a measure of the standard deviation (for numerical variables) or the variety (for categorical ones) over the mapping unit. As in other systems, the use of RF enables one to estimate the relative importance of the single input parameters and to select the optimal configuration of the classification model. The model is first applied using the complete set of input variables; then an iterative process is implemented in which progressively smaller subsets of the parameter space are considered.
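The per-unit predictors mentioned above (standard deviation for numerical layers, variety for categorical ones) can be illustrated with a minimal Python sketch. It assumes each raster cell carries the id of the mapping unit it belongs to, and it interprets "variety" as the number of distinct categories found in the unit, as in common GIS zonal statistics. The helper `summarize_units` and the toy slope and lithology values are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def summarize_units(unit_ids, values, categorical=False):
    """Aggregate raster-cell values into per-mapping-unit predictors.

    Hypothetical helper: numerical layers -> mean and population standard
    deviation over the unit's cells; categorical layers -> majority class
    and 'variety' (assumed here to be the count of distinct classes).
    """
    # Group cell values by the mapping unit they fall in.
    cells_by_unit = defaultdict(list)
    for uid, value in zip(unit_ids, values):
        cells_by_unit[uid].append(value)

    summary = {}
    for uid, cells in cells_by_unit.items():
        if categorical:
            counts = Counter(cells)
            summary[uid] = {
                "majority": counts.most_common(1)[0][0],
                "variety": len(counts),
            }
        else:
            summary[uid] = {"mean": mean(cells), "std": pstdev(cells)}
    return summary


# Toy example: six cells belonging to two mapping units (made-up values).
units = [1, 1, 1, 2, 2, 2]
slope_deg = [10, 20, 30, 5, 5, 5]
lithology = ["clay", "clay", "sand", "limestone", "limestone", "limestone"]

print(summarize_units(units, slope_deg))
print(summarize_units(units, lithology, categorical=True))
```

In this toy case the heterogeneous unit 1 gets a non-zero slope standard deviation and a lithology variety of 2, while the homogeneous unit 2 gets 0.0 and 1; columns like these would be appended to the RF predictor set alongside the unit means or majority classes.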
The impact of the scale and accuracy of the input variables, as well as the effect of the random component of the RF model on the susceptibility results, are also examined. The model is tested in the Arno River basin (central Italy). We find that the dimension of the parameter space, the mapping unit (scale) and the training process strongly influence the classification accuracy and the prediction process. This, in turn, implies that a careful sensitivity analysis, making use of both traditional and new tools, should always be performed before producing final susceptibility maps at all levels and scales.
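One simple way to quantify the effect of the random component is to refit the RF several times with different random seeds and compare the resulting maps unit by unit. The Python sketch below assumes the per-unit landslide probabilities of each run are already available; the function names, the class thresholds (0.25, 0.5, 0.75) and the probability values are hypothetical placeholders, not settings or results from the study.

```python
from statistics import mean, pstdev


def susceptibility_class(p, thresholds=(0.25, 0.5, 0.75)):
    """Map a probability to a class index 0..len(thresholds) (hypothetical breaks)."""
    return sum(p >= t for t in thresholds)


def seed_sensitivity(runs, thresholds=(0.25, 0.5, 0.75)):
    """Summarize run-to-run variability of susceptibility maps.

    runs[r][u] is the predicted landslide probability of mapping unit u
    in run r (e.g. an RF refit with a different random seed).
    Returns per-unit (mean, std) and the fraction of units whose
    susceptibility class is not the same in every run.
    """
    n_units = len(runs[0])
    stats = []
    unstable = 0
    for u in range(n_units):
        probs = [run[u] for run in runs]
        stats.append((mean(probs), pstdev(probs)))
        # A unit is "unstable" if the runs disagree on its class.
        classes = {susceptibility_class(p, thresholds) for p in probs}
        if len(classes) > 1:
            unstable += 1
    return stats, unstable / n_units


# Three hypothetical runs over three mapping units.
runs = [
    [0.10, 0.48, 0.80],
    [0.12, 0.52, 0.82],
    [0.11, 0.50, 0.79],
]
stats, unstable_fraction = seed_sensitivity(runs)
print(stats)
print(unstable_fraction)
```

Here only the middle unit, whose probability straddles the 0.5 break, changes class between runs, so the unstable fraction is 1/3. The same comparison could, in principle, be applied to maps produced with different predictor subsets or mapping-unit resolutions, provided they are first aligned to common units.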
