Assessing the performance of GIS-based machine learning models with different accuracy measures for determining susceptibility to gully erosion.


Faculty of Agriculture, Department of Soil Science, Bu Ali Sina University, Ahmadi Roshan Avenue, 6517838695 Hamedan, Iran. Electronic address: [Email]


The main purpose of this study was to compare the discrimination and reliability of four machine learning models for creating a gully erosion susceptibility map (GESM) in part of the Ekbatan Dam Basin, Hamedan, western Iran. Extensive field surveys using GPS and the visual interpretation of satellite images were used to prepare a digital map of the spatial distribution of gullies. A total of 130 locations were sampled to characterize the spatial distribution of soil surface properties. Topographic attributes were derived from a digital elevation model (DEM), and the land use and normalized difference vegetation index (NDVI) maps were created from satellite imagery. The functional relationships between gully erosion and its controlling factors were modeled using random forest (RF), support vector machine (SVM), Naïve Bayes (NB), and generalized additive model (GAM) approaches. Model performance was evaluated by 10-fold cross-validation based on efficiency, the Kappa coefficient, the area under the receiver operating characteristic (ROC) curve (AUC), mean absolute error (MAE), and root mean square error (RMSE). The RF model had the highest efficiency, Kappa coefficient, and AUC and the lowest MAE and RMSE of the four models. It showed the highest predictive performance (mean AUC = 92.4%), followed by SVM (mean AUC = 90.9%), GAM (mean AUC = 89.9%), and NB (mean AUC = 87.2%). Overall accuracy ranged from excellent (NB, GAM) to outstanding (RF, SVM). The performance of all models remained stable when the calibration and validation samples were changed through the 10-fold cross-validation. According to the variable importance analysis of the RF model, the most important variables were distance from rivers, calcium carbonate equivalent (CCE), and topographic position index (TPI).
The resulting maps can help identify areas at risk of gully erosion and facilitate the implementation of soil conservation and sustainable management plans.
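The evaluation scheme described above (fold-level efficiency, Kappa, AUC, MAE, and RMSE, averaged over 10-fold cross-validation) can be sketched in standard-library Python as follows. This is not the authors' implementation. The stratified fold assignment, the reading of "efficiency" as the share of correctly classified sites, the 0.5 classification threshold, computing MAE and RMSE on predicted probabilities against 0/1 labels, and the one-feature `midpoint_logistic` scorer (a hypothetical stand-in for RF, SVM, NB, or GAM) are all assumptions made for illustration.

```python
import math
import random


def stratified_folds(y, k=10, seed=0):
    """Split sample indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the gully/non-gully ratio of the full data set.
    Assumes each class has at least k samples, so every fold contains both."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the share of
    (gully, non-gully) pairs in which the gully site scores higher (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    t1, p1 = sum(y_true) / n, sum(y_pred) / n
    p_chance = t1 * p1 + (1 - t1) * (1 - p1)
    return (p_obs - p_chance) / (1 - p_chance)


def evaluate(y_true, scores, threshold=0.5):
    """Fold-level metrics. ASSUMPTION: 'efficiency' is taken to be the share
    of correctly classified sites after thresholding the scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    n = len(y_true)
    return {
        "efficiency": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "kappa": cohen_kappa(y_true, y_pred),
        "auc": auc(y_true, scores),
        "mae": sum(abs(t - s) for t, s in zip(y_true, scores)) / n,
        "rmse": math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, scores)) / n),
    }


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """k-fold CV: train on k-1 folds, score the held-out fold, and return the
    mean of each metric over the k folds (e.g. the 'mean AUC')."""
    per_fold = []
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in fold])
        per_fold.append(evaluate([y[i] for i in fold], scores))
    return {m: sum(f[m] for f in per_fold) / k for m in per_fold[0]}


def midpoint_logistic(train_X, train_y, test_X):
    """HYPOTHETICAL one-feature stand-in for RF/SVM/NB/GAM: a logistic curve
    centred halfway between the two class means of the training data."""
    m0 = sum(x for x, t in zip(train_X, train_y) if t == 0) / train_y.count(0)
    m1 = sum(x for x, t in zip(train_X, train_y) if t == 1) / train_y.count(1)
    mid = (m0 + m1) / 2
    sign = 1 if m1 >= m0 else -1  # higher score points toward the gully class
    return [1 / (1 + math.exp(-sign * (x - mid))) for x in test_X]


if __name__ == "__main__":
    X = list(range(20)) + list(range(30, 50))  # toy, perfectly separable feature
    y = [0] * 20 + [1] * 20                    # 0 = non-gully, 1 = gully
    summary = cross_validate(X, y, midpoint_logistic, k=10)
    print(summary["auc"], summary["kappa"])    # prints "1.0 1.0"
```

Any model can be plugged in through `fit_predict`, as long as it returns a gully probability for each held-out site. Stratifying the folds keeps both classes in every held-out fold, which AUC and Kappa need in order to be defined.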


Discrimination, Gully erosion susceptibility, Latin hypercube sampling technique (cLHS), Machine learning models, Reliability, Topographic attributes