The Application of GIS-Based Logistic Regression and Frequency Ratio Approaches for Landslide Susceptibility Assessment: A Case Study of the Souk Ahras Region, NE Algeria.
Fatna Mahdadi 1*, Abederrahmane Boumezbeur 2.
1 Geology and Environment Laboratory, Department of Geology, University of Constantine 1, Constantine, Algeria.
2 Department of Geology, Sciences Faculty, University of Tebessa, Tebessa, Algeria.
Landslide susceptibility assessment (LSA) can be carried out with various statistical modeling techniques, among them the logistic regression (LR) and frequency ratio (FR) models. This work produced landslide susceptibility maps (LSMs) on a geographic information system (GIS) platform using the LR and FR methods in the northwest of the Souk Ahras region, NE Algeria. A landslide inventory map was established from visual interpretation of satellite images and field survey data. Slope instability in this region is related to a large variety of factors pertaining to the geological, geomorphological, hydrological and climatic characteristics of the terrain. Consequently, a spatial database of seven causal factors was compiled and used to predict landslide-prone areas. The LSMs produced with the LR and FR models were subdivided into five classes according to their degree of susceptibility to landslides: very low, low, moderate, high and very high. These raster-based LSMs were compared and verified against both the training and testing inventory datasets, and the area under the curve (AUC) was used for model evaluation. The results showed that the LR model provides higher prediction accuracy than the FR model, with an AUC of 0.9045 (90.45 %) for the success rate and 0.9181 (91.81 %) for the prediction rate. In addition, the results showed that about 30 % to 37 % of the study area falls in the high and very high susceptibility classes. The resulting LSMs can support regional management and sustainable development planning.
Keywords: statistical modeling, geographic information system (GIS), landslide inventory, Souk Ahras region, landslide-related factors.
Landslides are natural processes that cause a great deal of damage to people and their environment, especially in the rapidly growing population centres of less developed countries.
With the recent development of computer technologies, GIS can play an important role in landslide prediction: it offers a distinct advantage in storing, analyzing and displaying large amounts of data, acquired either directly in the field or through remote sensing, in order to assess slope stability within an area.
In the literature, various statistical methods have been used for LSA, including logistic regression (Jacobs et al., 2018), the analytic hierarchy process (Achour et al., 2017), weight of evidence (Teerarungsigul et al., 2016), frequency ratio (Youssef et al., 2015), and many more. These approaches have been successfully applied by several researchers (Lee and Sambath, 2006; Pradhan et al., 2010; Greco and Sorriso-Valvo, 2013; Sivakami and Sundaram, 2014; Chen et al., 2016; Hadji et al., 2016; Le et al., 2017), using GIS software to handle the geospatial database.
As a case study, part of the northwest of the Souk Ahras region, NE Algeria, one of the areas most exposed to landslides in the country, was selected for LSA on a pixel-based mapping unit.
Souk Ahras is a mountainous region known for the widespread occurrence of landslides. Studying them requires that the geomorphological, geological and hydro-climatic factors likely to affect slope stability be considered together, with a characteristic weight for each factor.
For this study, seven common causative parameters were prepared for the LS analyses: slope angle, elevation, slope aspect, lithological units, distance to river, NDVI, and rainfall. They were used to prepare LSMs with the LR and FR statistical approaches.
The accuracy of the LR and FR models was evaluated using the ROC (receiver operating characteristic) curve and the AUC (area under the curve) parameter. Data processing and modeling were done using ArcMap 10.4 and XLSTAT-Pro 7.5 software. The results revealed that about 30 to 37 % of the study area falls in the high and very high susceptibility classes. The resulting LSMs can support regional management and sustainable development planning.
The Souk Ahras region is located in the extreme east of Algeria and occupies an area of 4 360 km². The study area lies in the northwestern part of the region (Fig. 1a) and was selected for landslide susceptibility assessment and the production of susceptibility maps. It lies between latitudes 36°5'18.352"N and 36°11'6.16"N and longitudes 7°18'54.91"E and 7°27'56.89"E, and covers an area of 73 km². It is a mountainous area that is part of the Tellian mountain belt, with slopes ranging from 0° to more than 66°. The elevation ranges from 675 m to 1283 m and decreases from northeast to southwest.
The climate is of sub-humid Mediterranean type, characterized by a cold, wet winter and a hot, dry summer, with annual precipitation between 428 mm and 460 mm.
Geologically, the region is essentially formed of sedimentary rocks (figure 2a). The Upper Cretaceous formations are represented by an alternation of limestone and marl-limestone. A predominantly marly Miocene cover, with some sandstone and conglomerate, occupies the majority of the study area. The Plio-Quaternary consists of alluvial deposits, sandstones, puddingstones and gravels.
Materials and Methods
The susceptibility assessment of natural hazards such as landslides depends on a sound understanding of how the causative factors interact to bring about instability. It is this understanding that allows accurate LS mapping of the terrain.
In the statistical approaches, all the landslide conditioning factors that could be mapped were entered into ArcMap 10.4 and converted from vector to raster thematic maps. Subsequently, an overlay approach was adopted to derive the frequency statistics of each factor map relative to the landslide inventory map.
The raster grid of the study area has 1 098 rows and 1 376 columns, with a total of 729 429 pixels in the study area at a pixel size of 10 m × 10 m. The first step of data gathering was the preparation of the landslide inventory map, in which landslides cover approximately 4 628 pixels of the study area.
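As a consistency check on these figures, the pixel counts can be converted to areas. This is a worked calculation from the numbers quoted in the text, assuming every counted pixel is a full 10 m × 10 m cell:

$$729\,429 \times 100\ \mathrm{m^2} = 72\,942\,900\ \mathrm{m^2} \approx 72.9\ \mathrm{km^2}$$

$$4\,628 \times 100\ \mathrm{m^2} = 462\,800\ \mathrm{m^2} \approx 0.46\ \mathrm{km^2}, \qquad \frac{4\,628}{729\,429} \approx 0.63\ \%$$

The first result agrees with the stated study area of about 73 km²; the second suggests that mapped landslides occupy well under 1 % of it. The full 1 098 × 1 376 grid holds 1 510 848 cells, so the 729 429 pixels presumably correspond to the cells that fall inside the study-area boundary.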
In addition, seven thematic data layers corresponding to the geomorphological, geological and environmental causal factors mentioned above were prepared to evaluate the relationship between existing landslides and these factors, in order to obtain weight values for each parameter using the FR and LR statistical methods. These weights support the preparation and evaluation of the LSMs of the region (Lee and Sambath, 2006).
The database used in this study includes the previous landslide locations recorded in the region and thematic maps of the seven major causal factors, stored as thematic layers on the GIS platform.
The inventory map is the first and most important thematic layer in the LSA procedure. A total of 90 landslides were mapped from the interpretation of satellite images and previous reports, and validated by several field surveys conducted during the years 2015 – 2018. The resulting map was converted to raster format at a 10 m pixel size. It was used to create a training dataset of 3 471 pixels (approximately 75 % of the total landslide area) and a testing dataset of 1 157 pixels (approximately the remaining 25 %), the latter being used to validate the models. The landslide distribution in the study area is shown in figure 1a.
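The 75 % / 25 % partition described above can be illustrated with a short sketch. The text does not say how the pixels were assigned to each set (randomly per pixel, or per landslide body), so the random per-pixel split below is an assumption, and the function name, the seed and the integer pixel identifiers are hypothetical.

```python
import numpy as np

def split_landslide_pixels(pixel_ids, train_fraction=0.75, seed=0):
    """Randomly split landslide pixel identifiers into training and testing sets.

    Assumption: a simple random per-pixel split; the paper does not state
    the actual sampling scheme.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(pixel_ids)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 4 628 landslide pixels.
landslide_pixels = np.arange(4628)
train, test = split_landslide_pixels(landslide_pixels)
print(len(train), len(test))  # → 3471 1157
```

With a 0.75 fraction, the split reproduces the 3 471 and 1 157 pixel counts quoted in the text.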
Geology is considered to be the most important factor in the occurrence of landslides (Yesilnacar and Topal, 2005; Yalcin et al., 2011). Eleven lithological units (figure 2a) were digitized from the Sedrata geological map at a scale of 1:50 000, produced by the Algerian geological survey. Differences between these units lead to variations in strength and slope stability.
Slope angle is a main predisposing cause of slope failure in mountainous regions (Hadji et al., 2016). In theory, susceptibility is assumed to increase with slope steepness. In this work, the slope angle was derived from the DEM on a regular 10 m × 10 m grid, with slopes varying from 0° to 66°. The slope map (figure 2b) was then divided into five categories: < 5°, 5° – 15°, 15° – 25°, 25° – 35° and > 35°.
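The reclassification of a continuous slope raster into the five categories above can be sketched as follows. This is only an illustration with NumPy, not the ArcMap workflow used in the study; the function name is hypothetical, and assigning a boundary value (for example exactly 5°) to the upper class is an assumption, since the text does not specify it.

```python
import numpy as np

# Class boundaries in degrees, taken from the text: <5, 5-15, 15-25, 25-35, >35.
SLOPE_BREAKS = [5.0, 15.0, 25.0, 35.0]

def reclassify_slope(slope_deg):
    """Return class codes 1..5 for an array of slope angles in degrees."""
    # np.digitize returns 0 for values below the first break, so add 1.
    # Assumption: a value equal to a break falls into the upper class.
    return np.digitize(slope_deg, SLOPE_BREAKS) + 1

sample = np.array([2.0, 10.0, 20.0, 30.0, 50.0])
print(reclassify_slope(sample))  # → [1 2 3 4 5]
```

The same pattern would apply to the other continuous factors (rainfall, elevation, distance to river) with their own break values.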
Rainfall is a determining factor in the erosion processes responsible for triggering gravity-driven downslope movements. In this work, the precipitation factor was represented as a thematic layer of average annual precipitation (figure 2c), reclassified into three classes: 428 – 437 mm/year, 437 – 447 mm/year and 447 – 460 mm/year.
Elevation is one of the most important parameters controlling landslide occurrence in mountainous areas (Conforti et al., 2014). In theory, LS increases with elevation, which is directly related to precipitation in its different forms, such as rainfall and snow. The elevation map (figure 2d) has five classes: 675 – 800 m, 800 – 900 m, 900 – 1000 m, 1000 – 1100 m and 1100 – 1283 m.
Slope aspect is also considered an important predisposing factor. Previous research has shown a link between slope aspect and proneness to landslides (Hadji et al., 2016). Aspect can influence landslide distribution through its relation to the orientation of tectonic fractures and to the concentration of soil moisture (Hadji et al., 2016). The slope aspect map was also derived from the DEM and subdivided into nine classes (figure 2e): flat, north, northeast, east, southeast, south, southwest, west and northwest.
The hydrographic network map provides information on the distribution of unstable areas: streams modify soil behaviour through gully erosion, which can undercut slopes and trigger ground movements. Buffer zones were therefore defined around the drainage lines using a multiple buffer analysis with a 50 m interval (figure 2f), giving seven classes: < 50 m, 50 – 100 m, 100 – 150 m, 150 – 200 m, 200 – 250 m, 250 – 500 m and > 500 m.
The normalized difference vegetation index (NDVI) is a determining factor in slope stability and is used to indicate the plant cover in an area (Yusof and Pradhan, 2014). In general, relatively low vegetation cover favours landslide occurrence. In this study, a Landsat satellite image was used to calculate the NDVI values (figure 2g) using the following formula:
$$\mathrm{NDVI} = \frac{RI - R}{RI + R} \qquad (1)$$
Where RI is the value of the infrared portion of the electromagnetic spectrum and R is the value of the red portion. The resulting map has three classes: −0.185425416 to 0.188091246, 0.188091246 to 0.295223932 and 0.295223932 to 0.552921474.
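Equation (1) can be applied band by band, as in the minimal sketch below. The function name and the sample reflectance values are hypothetical, and returning NaN where both bands are zero is a choice made here to avoid division by zero, not something described in the text.

```python
import numpy as np

def ndvi(infrared, red):
    """Compute NDVI = (RI - R) / (RI + R) element-wise.

    Pixels where RI + R == 0 are returned as NaN (an assumption of this sketch).
    """
    infrared = np.asarray(infrared, dtype=float)
    red = np.asarray(red, dtype=float)
    total = infrared + red
    out = np.full(total.shape, np.nan)
    np.divide(infrared - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance values for three pixels.
values = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
print(values)
```

The first pixel (strong infrared response relative to red) yields a high NDVI, the second yields zero, and the third is flagged as NaN because both bands are zero.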
Landslide Susceptibility Mapping
Frequency Ratio model
The frequency ratio (FR) method (Lee and Min, 2001) is one of the bivariate statistical methods frequently used to calculate the probabilistic relationship between landslides and landslide conditioning factors. In this work, the FR was calculated for every class of the seven factors used in the landslide susceptibility mapping by dividing the proportion of landslide occurrence in the class by the proportion of the study area that the class occupies. The FR values of the different parameter classes are given in Table 1.
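In symbols, the ratio described above can be written as follows. The notation is introduced here for illustration and is not taken from the original text: N_ij is the number of landslide pixels in class j of factor i, N the total number of landslide pixels, A_ij the number of pixels in that class and A the total number of pixels in the study area.

$$FR_{ij} = \frac{N_{ij} / N}{A_{ij} / A}$$

Under this definition, a value above 1 means the class holds a larger share of the landslides than of the area, and a value below 1 the opposite. For instance, a hypothetical class containing 20 % of the landslide pixels but only 10 % of the study area would have FR = 0.20 / 0.10 = 2.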
The landslide susceptibility index (LSI) was calculated (equation 2) by summing the FR values of all the conditioning factors (Lee and Talib, 2005):
$$\mathrm{LSI} = FR_1 + FR_2 + \dots + FR_n \qquad (2)$$
Where n is the total number of factors. A landslide susceptibility map is prepared by combining each causative factor with its frequency ratio value. The final LSM was divided into five categories: very low, low, moderate, high and very high (figure 3a).
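The FR calculation and the summation in equation (2) can be sketched on small arrays as follows. This is an illustrative NumPy version, not the ArcMap overlay procedure used in the study; the function names and the toy rasters are hypothetical, and the sketch assumes every pixel belongs to the study area and that at least one landslide pixel exists.

```python
import numpy as np

def frequency_ratio(class_map, landslide_mask):
    """Return {class_code: FR} for one classified factor raster."""
    class_map = np.asarray(class_map)
    landslide_mask = np.asarray(landslide_mask, dtype=bool)
    total_pixels = class_map.size
    total_landslide = landslide_mask.sum()  # assumed to be greater than zero
    ratios = {}
    for code in np.unique(class_map):
        in_class = class_map == code
        landslide_share = (in_class & landslide_mask).sum() / total_landslide
        area_share = in_class.sum() / total_pixels
        ratios[int(code)] = float(landslide_share / area_share)
    return ratios

def landslide_susceptibility_index(class_maps, landslide_mask):
    """Sum the per-pixel FR values over all factor rasters (equation 2)."""
    lsi = np.zeros(np.asarray(class_maps[0]).shape, dtype=float)
    for class_map in class_maps:
        class_map = np.asarray(class_map)
        for code, fr in frequency_ratio(class_map, landslide_mask).items():
            lsi[class_map == code] += fr
    return lsi

# Toy 2 x 4 rasters (hypothetical values, not the study data).
slope_classes = np.array([[1, 1, 2, 2], [1, 1, 2, 2]])
aspect_classes = np.array([[1, 2, 1, 2], [1, 2, 1, 2]])
landslides = np.array([[0, 0, 1, 1], [0, 0, 0, 1]], dtype=bool)

print(frequency_ratio(slope_classes, landslides))  # → {1: 0.0, 2: 2.0}
print(landslide_susceptibility_index([slope_classes, aspect_classes], landslides))
```

In the toy example, slope class 2 holds all three landslide pixels on half of the area, so its FR is 2, while class 1 has no landslides and an FR of 0.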
Logistic regression model
Logistic regression (LR) is a multivariate statistical technique used to assess landslide susceptibility. The LR model, fitted by maximum likelihood, can be expressed in the following form:
$$P = \frac{1}{1 + e^{-z}} \qquad (3)$$
Where P is the estimated probability of landslide occurrence, which varies from 0 to 1, and z is the linear combination of the causal factors, which varies from −∞ to +∞. It is defined by the following equation:
$$z = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \qquad (4)$$
Where B_0 is the intercept, n is the number of independent variables, B_i (i = 1, 2, …, n) are the regression coefficients of the independent variables and X_i (i = 1, 2, …, n) are the independent variables.
In this paper, the classes of the seven chosen independent variables were normalized to the range 0 to 1, and the dependent variable was expressed in binary format, representing the presence (1) or absence (0) of landslides. The data were converted to dbf format and imported into XLSTAT-Pro 7.5 statistical software in order to evaluate the relationship between landslide events and landslide causal factors, which gave the following equation:
$$
\begin{aligned}
P &= \frac{1}{1 + e^{-z}}, \\
z &= -0.441247659273368 - 0.0861714381342469\,X_{\text{lithological units}} + 0.203331123854193\,X_{\text{slope angle}} \\
  &\quad - 0.0449963579116557\,X_{\text{rainfall}} + 0.273420719917235\,X_{\text{elevation}} + 0.0436096196834892\,X_{\text{distance to river}} \\
  &\quad - 0.13323468675971\,X_{\text{NDVI}} - 0.0217731355150329\,X_{\text{slope aspect}}
\end{aligned}
$$
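The fitted equation can be evaluated for a single pixel as in the sketch below. The intercept and coefficients are copied from the equation above; the dictionary keys, the function name and the sample pixel are hypothetical, and the exact way class values were normalized to the 0 to 1 range is not given in the text, so the inputs here are placeholders.

```python
import math

# Intercept and coefficients as reported in the fitted equation above.
INTERCEPT = -0.441247659273368
COEFFICIENTS = {
    "lithological_units": -0.0861714381342469,
    "slope_angle": 0.203331123854193,
    "rainfall": -0.0449963579116557,
    "elevation": 0.273420719917235,
    "distance_to_river": 0.0436096196834892,
    "ndvi": -0.13323468675971,
    "slope_aspect": -0.0217731355150329,
}

def landslide_probability(factors):
    """Evaluate P = 1 / (1 + exp(-z)) for one pixel's normalized factor values."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * factors[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pixel with every normalized factor at 0: only the intercept remains.
baseline = {name: 0.0 for name in COEFFICIENTS}
# Same pixel with the slope-angle factor at its maximum normalized value.
steep = dict(baseline, slope_angle=1.0)

print(landslide_probability(baseline), landslide_probability(steep))
```

Because the slope-angle and elevation coefficients are positive, raising either input increases the estimated probability, while higher NDVI values lower it.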
To delineate areas where landslides can occur, the raster calculator of ArcGIS 10.4 was used to weight the individual factor maps and sum them into a total weight map (figure 3b). The susceptibility values were divided into five classes: very low, low, moderate, high and very high.
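The division of a continuous susceptibility surface into five classes, and the share of area in each class, can be sketched as follows. The text does not state which break method was used (GIS packages commonly offer natural breaks, quantile and equal-interval schemes, among others), so the quantile breaks below are purely an illustrative assumption, and all names are hypothetical.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def classify_susceptibility(index_values):
    """Assign class codes 0..4 using quantile breaks (an assumed scheme)."""
    values = np.asarray(index_values, dtype=float)
    breaks = np.quantile(values, [0.2, 0.4, 0.6, 0.8])
    codes = np.digitize(values, breaks)
    return codes, breaks

def class_area_percent(codes):
    """Percentage of pixels falling in each of the five classes."""
    counts = np.bincount(np.asarray(codes).ravel(), minlength=5)
    return 100.0 * counts / np.asarray(codes).size

# Hypothetical susceptibility values for 100 pixels.
codes, breaks = classify_susceptibility(np.arange(100.0))
for name, share in zip(CLASS_NAMES, class_area_percent(codes)):
    print(f"{name}: {share:.1f} %")
```

With quantile breaks every class covers the same share of the area by construction; a different break scheme would yield unequal shares such as those reported for the study area.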
Validation of Landslide Susceptibility Maps
The mapping results were verified using ROC curves, diagrams in which the cumulative percentage of the study area, ranked by decreasing LSI, is plotted against the cumulative percentage of observed landslide occurrence. The area under the ROC curve (AUC), which takes values from 0.5 to 1.0, is used to check the prediction performance of the model.
In this study, the ROC curves of the LR and FR models (figures 5a and 5b) gave AUC values of 0.9045 (90.45 %) and 0.8670 (86.70 %), respectively, for the success rate curve (training dataset), and 0.9181 (91.81 %) and 0.8804 (88.04 %), respectively, for the prediction rate curve (testing dataset). These results indicate good agreement between the spatial distribution of landslide events and the LSMs produced, and show that both models predict the potential locations of future landslides in the study area with high accuracy.
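The curve described above can be computed by ranking pixels from most to least susceptible and accumulating the landslide pixels encountered. The sketch below integrates that curve with the trapezoidal rule; the function name and toy data are hypothetical, ties between equal scores are broken arbitrarily, and this is not necessarily the exact procedure used in the study. Note that this cumulative-area curve is only close to a classical ROC curve when landslides occupy a small fraction of the area.

```python
import numpy as np

def success_rate_auc(susceptibility, landslide_mask):
    """Area under the cumulative-landslide versus cumulative-area curve."""
    scores = np.asarray(susceptibility, dtype=float).ravel()
    labels = np.asarray(landslide_mask, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")  # most susceptible pixels first
    cum_landslides = np.cumsum(labels[order]) / labels.sum()
    cum_area = np.arange(1, scores.size + 1) / scores.size
    # Prepend the origin so the curve starts at (0, 0).
    x = np.concatenate(([0.0], cum_area))
    y = np.concatenate(([0.0], cum_landslides))
    # Trapezoidal rule written out explicitly.
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

# Toy example: four pixels, the two landslide pixels have the highest scores.
print(success_rate_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 0.75
```

In the toy example the ranking is perfect, yet the area is 0.75 rather than 1 because landslides make up half of the pixels; with a landslide fraction well under 1 %, as in the study area, a perfect ranking would give a value very close to 1. Passing the training pixels gives a success-rate value, and passing the testing pixels gives a prediction-rate value.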
Discussion and Conclusions
Landslide susceptibility mapping is an essential procedure for delineating the areas prone to this phenomenon. Recently, many statistical techniques and approaches based on computer technology, GIS and remote sensing have been used by many researchers to prepare LSMs.
In this study, statistical approaches based on the LR and FR models were used to evaluate and map landslide susceptibility in the northwest of the Souk Ahras region, over a total area of 73 km², using GIS technology. This mountainous area in northeastern Algeria is frequently subjected to landslides of different sizes, controlled mainly by the interplay of several influencing parameters: lithological units, elevation, slope aspect, slope angle, distance to river, NDVI, and rainfall.
The analysis of the landslide data with the LR and FR models resulted in the LSMs shown in figures 3a and 3b, which were classified into five zones according to their degree of susceptibility: very low, low, moderate, high and very high. The resulting maps were compared with both the training and testing datasets of the landslide inventory, and their performance was evaluated using the area under the ROC curve (AUC).
The LSM analysis shows that the high-susceptibility sites are located in the northeastern and southern parts of the study area. More than 45 % of landslides occur on slopes between 15° and 35°; they mainly affect scree slopes with marly gangue, alluvium and Miocene marls.
The susceptibility maps generated by the two statistical models look similar, with minor differences. The map obtained from the LR model shows the following distribution of susceptibility in the study area: high and very high susceptibility, 47.39 %; moderate, 24.09 %; and low to very low for the remainder, where landslides are unlikely to occur. The FR model gives 37.59 % for high and very high susceptibility, 24.57 % for moderate, and low to very low for the remainder. Field evidence and statistical validation show that the LR model is more reliable than the FR model.
The susceptibility maps produced can serve as a reference document for anticipating potential hazards associated with urban expansion, road network development and any other activity involving earthworks.
Achour, Y., Boumezbeur, A., Hadji, R., Chouabbi, A., Cavaleiro, V., & Bendaoud, E. A. (2017). Landslide susceptibility mapping using analytic hierarchy process and information value methods along a highway road section in Constantine, Algeria. Arabian Journal of Geosciences, 10(8), 194.
Bonham-Carter, G. F. (1994). Geographic information systems for geoscientists: modeling with GIS. Computer Methods in the Geosciences, 13. Elsevier, Burlington, 398 p.
Chen, W., Wang, J., Xie, X., Hong, H., Van Trung, N., Bui, D. T., & Li, X. (2016). Spatial prediction of landslide susceptibility using integrated frequency ratio with entropy and support vector machines by different kernel functions. Environmental Earth Sciences, 75(20), 1344.
Conforti, M., Pascale, S., Robustelli, G., & Sdao, F. (2014). Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). Catena, 113, 236-250.
Greco, R., & Sorriso-Valvo, M. (2013). Influence of management of variables, sampling zones and land units on LR analysis for landslide spatial prevision. Natural Hazards and Earth System Sciences, 13(9), 2209.
Hadji, R., Rais, K., Gadri, L., Chouabi, A., & Hamed, Y. (2016). Slope failure characteristics and slope movement susceptibility assessment using GIS in a medium scale: a case study from Ouled Driss and Machroha municipalities, Northeast Algeria. Arabian Journal for Science and Engineering, 42(1), 281-300.
He, S., Pan, P., Dai, L., Wang, H., & Liu, J. (2012). Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China. Geomorphology, 171, 30-41.
Hosmer, D. W., & Lemeshow, S. (2000). Interpretation of the fitted logistic regression model. Applied Logistic Regression, Second Edition, 47-90.
Jacobs, L., Dewitte, O., Poesen, J., Sekajugo, J., Nobile, A., Rossi, M., & Kervyn, M. (2018). Field-based landslide susceptibility assessment in a data-scarce environment: the populated areas of the Rwenzori Mountains. Natural Hazards and Earth System Sciences, 18(1), 105.
Kavzoglu, T., Sahin, E. K., & Colkesen, I. (2015). An assessment of multivariate and bivariate approaches in landslide susceptibility mapping: a case study of Duzkoy district. Natural Hazards, 76(1), 471-496.
Kleinbaum, D. G., & Klein, M. (2002). Analysis of matched data using logistic regression. Logistic Regression: A Self-Learning Text, 227-265.
Le, L., Lin, Q., & Wang, Y. (2017). Landslide susceptibility mapping on a global scale using the method of logistic regression. Natural Hazards and Earth System Sciences, 17(8), 1411.
Lee, S., & Min, K. (2001). Statistical analysis of landslide susceptibility at Yongin, Korea. Environmental Geology, 40(9), 1095-1113.
Lee, S., & Sambath, T. (2006). Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environmental Geology, 50(6), 847-855.
Lee, S., & Talib, J. A. (2005). Probabilistic landslide susceptibility and factor effect analysis. Environmental Geology, 47(7), 982-990.
McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. University of California, Berkeley, California.
Nefeslioglu, H. A., Gokceoglu, C., & Sonmez, H. (2008). An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Engineering Geology, 97(3), 171-191.
Ozdemir, A. (2015). Sinkhole susceptibility mapping using a frequency ratio method and GIS technology near Karapınar, Konya-Turkey. Procedia Earth and Planetary Science, 15, 502-506.
Pradhan, B., Oh, H. J., & Buchroithner, M. (2010). Weights-of-evidence model applied to landslide susceptibility mapping in a tropical hilly area. Geomatics, Natural Hazards and Risk, 1(3), 199-223.
Saaty, T. L. (1980). The analytic hierarchy process: planning, priority setting, resource allocation. McGraw-Hill International Book Company, New York, 287 p.
Sivakami, C., & Sundaram, A. (2014). Landslide susceptibility zone using frequency ratio model, remote sensing & GIS: a case study of Western Ghats, India (part of Kodaikanal Taluk). Journal of Environment and Earth Science, 4(22), 54-61.
Swets, J. A. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857), 1285-1293.
Teerarungsigul, S., Torizin, J., Fuchs, M., Kühn, F., & Chonglakmani, C. (2016). An integrative approach for regional landslide susceptibility assessment using weight of evidence method: a case study of Yom River Basin, Phrae Province, Northern Thailand. Landslides, 13(5), 1151-1165.
Van Westen, C. J., Rengers, N., & Soeters, R. (2003). Use of geomorphological information in indirect landslide susceptibility assessment. Natural Hazards, 30(3), 399-419.
Yalcin, A., Reis, S., Aydinoglu, A. C., & Yomralioglu, T. (2011). A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena, 85(3), 274-287.
Yesilnacar, E., & Topal, T. (2005). Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology, 79(3), 251-266.
Youssef, A. M., Al-Kathery, M., & Pradhan, B. (2015). Landslide susceptibility mapping at Al-Hasher area, Jizan (Saudi Arabia) using GIS-based frequency ratio and index of entropy models. Geosciences Journal, 19(1), 113-134.
Yusof, N. M., & Pradhan, B. (2014). Landslide susceptibility mapping along PLUS expressways in Malaysia using probabilistic based model in GIS. In IOP Conference Series: Earth and Environmental Science (Vol. 20, No. 1, p. 012031). IOP Publishing.
List of Figures
Figure 1 a: Geographical location of the study area, shown with the landslide inventory; b, c and d: field views of some of the landslides in the area.
Figure 2a Lithological units map; 1: Scree slope with marly gangue, 2: Alluvium, 3: Arable land, slope formations, ancient alluvium and undetermined Quaternary, 4: Limestone, 5: Clays, conglomerates, sandstones and limestone, 6: Sandstones, 7: Predominantly marly Miocene, 8: Sandstones and conglomerates, 9: Limestone with Inoceramus and marl with Globotruncana, 10: Biomicrite with Globotruncana and platy black marl-limestone, 11: Marl and biomicrite with Rotalipora.
Figure 2b Slope angle map of the study area
Figure 2c Rainfall map of the study area
Figure 2d Elevation map of the study area
Figure 2e Slope aspect map of the study area
Figure 2f Distance to river map of the study area
Figure 2g NDVI map of the study area
Figure 3 Landslide susceptibility maps using a: frequency ratio model; b: logistic regression model.
Figure 4 Histogram of landslide area per susceptibility class for the LR and FR models.
Figure 5 ROC curves with AUC values representing the quality of the models: a. success rate; b. prediction rate
List of Tables
Table 1: Spatial relationship between landslide conditioning factors and landslide events using the Frequency Ratio and Logistic Regression models.
Table 2: Landslide distribution in the different susceptibility classes.