Background & Summary

Land Use and Land Cover (LULC) play a key role in environmental planning, management and monitoring. Accurate LULC information is essential for evaluating potential risks to ecosystems and biodiversity, ensuring food security, mitigating natural hazards, and facilitating effective urban planning. LULC maps are often used as an indicator or a proxy of natural and economic processes in environmental modeling. For instance, they are used as inputs in models that map population distribution1,2, poverty or income3,4, ecosystem services (carbon storage, water yield, etc.)5,6, and ecological accounting7.

Over the last decades, remote sensing and satellite products have revolutionized the detection and mapping of LULC, as they provide a spatially extensive, multi-temporal and time-saving source of information about LULC8. Earlier LULC mapping studies made intensive use of medium and low-resolution earth observation satellites, such as LANDSAT (MSS and TM), ASTER, MODIS and SPOT, but with important limitations. First, they often led to confusion between land-cover types because of the limited number of spectral bands available to distinguish them. Second, they poorly captured changes in vegetation over time, because of low return frequencies. Finally, they showed a limited ability to capture fine details and small-scale features on the Earth’s surface because of their coarse spatial resolution9,10,11. New satellites, such as Pléiades, Landsat 9 and Sentinel-2, with high return frequencies of multitemporal products, large multispectral sensors and very high-resolution imagery, address the above-mentioned limitations and offer new opportunities for LULC mapping12,13.

The methods used for classifying LULC from remote sensing products have also evolved considerably in recent years, with machine learning algorithms driving the latest developments in LULC mapping. Techniques such as Random Forest, Support Vector Machine, and Artificial Neural Networks were found to significantly improve on the accuracy and efficiency of traditional approaches, which historically relied on manual interpretation of satellite imagery or simple spectral analysis8,14,15. Machine learning algorithms are very flexible regarding input data, which enables them to process multisource remote sensing products - including LiDAR, radar, and hyperspectral imagery - of varying resolution and spectra. In addition, they allow full automation of the classification process and enable efficient analysis of large volumes of data.

Recently published high resolution global LULC datasets make use of new remote sensing products and advanced machine learning classification algorithms. For instance, WorldCover, launched in 2020 by the European Space Agency, is an open-access global land cover map at 10 m resolution, including 11 classes, based on both Sentinel-1 and Sentinel-2 images16. Other similar initiatives include GlobeLand3017, the ESRI 2020 global LULC map18 and Google Earth Engine Dynamic World NRT19. While these global datasets have the advantage of providing new information about regions with limited data until now (e.g. South America, Africa), they often contain a limited number of LULC classes and show varying levels of accuracy, strongly depending on ecological biomes19,20. Indeed, the main challenges in LULC mapping are the detection of specific ecosystems, such as wetlands or mangroves, and the detection of small-scale features, such as agro-forest mosaics, urban areas and dispersed settlements. Integrating multiple sources of remote sensing products, at different time periods to capture changes in vegetation, with precise in-situ data is often mentioned as the way to improve their detection9,21,22.

The aim of this study is to apply a new method, the MORINGA processing chain, to generate a high resolution and detailed (21 LULC classes) LULC map for the greater Mariño watershed (Peru) in 2019, using the most recent remote sensing imagery (Sentinel-2 and Pléiades) and a random forest algorithm. The greater Mariño watershed is an important area for biodiversity conservation and water management in the Andes, and accurate LULC mapping is crucial for informed decision-making about natural resources. Identifying changes in LULC over time will support more effective management and conservation strategies. With this study we also contribute to the efforts to develop standardized and replicable methodologies for high resolution and high accuracy LULC mapping.

Material and Methods

Study site

The greater Mariño watershed stretches over 403 km2 along the eastern slopes of the Southern Peruvian Andes, in the region of Apurimac, Peru (Fig. 1). The local climate is dry and hot in the interandean valleys and cold and humid in the highlands. Annual precipitation is also highly variable, with a dry season (June to August) characterized by lower rainfall in contrast with the wet season (December to March)23. The elevation varies from 1614 to 5180 m, with very diverse landscapes and ecosystems: dry forests, glaciers, wetlands (bofedales) and more than a dozen high-elevation lakes. Approximately 70000 people live in the watershed, mostly in two urban areas, Abancay and Tamburco. Agriculture at high and mid elevations is subsistence oriented, whereas at low elevations both crop and livestock farming are commercially oriented and more intensive. There are also tourism activities in the Ampay Forest Sanctuary, which protects 36 km2 of land. Like other mountain social-ecological systems, the greater Mariño watershed provides important but vulnerable ecosystem services that contribute substantially to people’s quality of life in the area. Some landscape planning instruments oriented toward biodiversity and ecosystem conservation have been implemented in the past, or are under implementation. These include, for example, the creation of a protected area (the Ampay National Sanctuary), a payment scheme for hydrological services, and several nature-based solution programs, such as reforestation schemes or wetland restoration projects24.

Fig. 1

Map of the greater Mariño watershed.

Overview of the MORINGA processing chain

The LULC classification was produced with the MORINGA processing chain, a supervised object-based LULC classification method using multi-sensor satellite imagery25. It has recently been applied to several tropical agrosystems, including La Réunion island12, Madagascar26,27, Senegal28 and Haiti29,30. The MORINGA chain is composed of four steps: (1) segmentation of a Very High Spatial Resolution (VHSR) satellite image (such as Spot 6/7 or Pléiades); (2) object-level extraction of spectral and textural predictors derived from several High Spatial Resolution (HSR) satellite images (such as Sentinel-2 or Landsat 8) at different dates, along with the VHSR satellite image and other remote sensing products (such as a DEM); (3) training and validation of a random forest classifier using a field database (possibly at different levels of a LULC nomenclature); (4) application of the classifier to the whole study area to map LULC (Fig. 2). The pre-processing of satellite images, so that they can be used at steps 1 and 2, is also part of the MORINGA processing chain.

Fig. 2

Overview of the MORINGA processing chain.

The MORINGA processing chain is implemented in a Python 3.8 environment and relies mainly on the GDAL/OGR library and the Orfeo ToolBox (OTB) version 7.2 (https://www.orfeo-toolbox.org). It is complemented with custom modules for specific steps (e.g., for computational reasons the calculation of object-based statistics at step 2 makes use of an ad hoc C++ module, “obiatools”, whose source code is available at https://gitlab.irstea.fr/raffaele.gaetano/obiatools). Some pre-processing steps are also performed outside Python, in QGIS (e.g. slope calculation). The source code of the MORINGA processing chain is available at https://gitlab.irstea.fr/raffaele.gaetano/moringa. The implementation of these different steps in the greater Mariño watershed, as well as the satellite images used, are described in more detail in the following sections.

Field database and land-use land-cover nomenclature

Fieldwork was carried out in May and June 2016, which corresponds to the beginning of winter and the dry season (i.e. the end of the peak of the growing season) and, in agricultural areas, to the beginning of harvest. Sampling sites were selected through a mix of systematic sampling (points distributed over the whole study area to capture the altitudinal gradient effect) and stratified sampling (to ensure that sufficient observations were collected for each LULC class). Some sampling sites were located outside the greater Mariño watershed - while remaining in close proximity - in order to sample specific LULC classes which are scarce in the study area (e.g. Pine plantations, Polylepis sp. forests). At each sampling site, we recorded GPS coordinates, took pictures in the direction of the four cardinal points, and registered the vegetation and species observed. Each sampled site was then digitized into a polygon delimiting a plot with homogeneous LULC, which was classified into one of the categories presented in Table 1 (level 3). This nomenclature is aligned with other LULC maps provided at national31,32 or regional33,34 scale. The VHSR image was used for the delineation of the polygons, based on photointerpretation. The final field database comprised 1698 polygons, covering a total of 16.75 km2 (Table 2, Table SM 2).

Table 1 Description of the land-use and land-cover classes.
Table 2 Three level LULC nomenclature.

Satellite images and their pre-processing

Topography

The TanDEM-X Digital Elevation Model (DEM) was obtained from the European Space Agency (ESA), through its scientific research support program. TanDEM-X is part of the ESA Third Party Missions Programme, which comprises 50 satellites dedicated to earth observation (https://earth.esa.int/eogateway/missions/terrasar-x-and-tandem-x). TanDEM-X is almost identical to its twin, TerraSAR-X, with which it flies in close formation to produce elevation models of high accuracy and resolution (12 m spatial resolution), thanks to a powerful radar system: Synthetic Aperture Radar (SAR)35. Pixels with no data were filled with the mean elevation of the study area, using the OTB BandMathX application.
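The no-data filling described above can be illustrated with a minimal NumPy sketch. This is not the OTB BandMathX expression used in the study; the no-data value (-9999) and the tiny array are assumptions made only for the example.

```python
import numpy as np

def fill_nodata_with_mean(dem, nodata):
    """Replace no-data cells with the mean elevation of the valid cells."""
    dem = np.asarray(dem, dtype=float)
    valid = dem != nodata              # True where an elevation was measured
    filled = dem.copy()
    filled[~valid] = dem[valid].mean()
    return filled

# Hypothetical 2 x 2 DEM (metres) with one missing cell flagged as -9999.
dem = np.array([[1614.0, -9999.0],
                [3000.0, 5180.0]])
filled = fill_nodata_with_mean(dem, nodata=-9999.0)
```

Filling with a single mean value is simple but flattens the terrain inside the gaps, so it is best suited to small and scattered no-data areas.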

Very high spatial resolution (VHSR)

We used three Pléiades images (of different sizes) acquired on the 7th of October 2019 (i.e. at the end of the dry season) simultaneously in both panchromatic and multispectral modes, at a spatial resolution of 0.5 and 2 m respectively (Table SM 1). These images are distributed commercially by AIRBUS Defense and Space at the primary geometric processing level and with basic radiometric processing (12-bit native). Access to the Pléiades images was funded and facilitated by DINAMIS, a French institutional data hub that provides access to high and very high resolution optical and radar data (https://dinamis.data-terra.org). DINAMIS is part of the Data Terra national research infrastructure, whose main mission is to develop an integrated platform for Earth system data, services and products (https://www.data-terra.org).

Pre-processing consisted of (1) the calculation of Top Of Atmosphere (TOA) reflectance, by correcting the distributed images for sensor calibration and radiation incidence, and (2) the orthorectification of the images using the TanDEM-X DEM (with the OTB OrthoRectification application). The three pre-processed tiles of the Pléiades panchromatic and multispectral images were then mosaicked, and finally the two resulting mosaics were pansharpened using the Bayesian fusion algorithm (OTB Pansharpening application), to obtain a multispectral image at 0.5 m spatial resolution. The Pléiades multispectral image at 2 m resolution was not used further in the processing chain (only the pansharpened image at 0.5 m resolution was used).

High spatial resolution (HSR)

We also used a time series of 333 Sentinel-2 images, acquired between the 1st of January 2018 and the 30th of October 2019, to capture the vegetation dynamics throughout the year before the acquisition date of the Pléiades image (Table SM 1). Sentinel-2 images are provided by two satellites (Sentinel-2 A and B), deployed by the European Space Agency (ESA) and the Copernicus program. The time span between acquisitions by either satellite is five days. The images were downloaded free of charge through the PEPS platform (https://peps.cnes.fr) at level 1 C (i.e. orthorectified TOA reflectance). The Sen2Cor (https://step.esa.int/main/snap-supported-plugins/sen2cor/) atmospheric correction processor for Sentinel-2 images was used to obtain a level 2 A Bottom-Of-Atmosphere (BOA) reflectance product from the distributed level 1 C images, as well as cloud, cloud shadow and snow masks36.

Two Sentinel-2 tiles (T18LYL and T18LYK) were necessary to cover the whole study area: they were mosaicked to generate a time series of Sentinel-2 mosaics at different dates. Although already orthorectified, Sentinel-2 images were also readjusted to the VHSR Pléiades image using the OTB HomologousPointsExtraction application with the red band (Pléiades band 1, Sentinel-2 band 3) as a reference (step 2 of the MORINGA processing chain). To eliminate clouds, we created synthetic images every 20 days (gapfilling processing, ImageTimeSeriesGapFilling OTB application). The final Sentinel-2 time series is thus composed of 22 synthetic images, from the 25th of July 2018 to the 5th of October 2019.
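The principle of the gapfilling step can be sketched for a single pixel as follows. The sketch assumes linear interpolation between cloud-free acquisitions, which is only one possible choice and is not claimed to reproduce the settings used with ImageTimeSeriesGapFilling; the dates, values and cloud mask are invented for the example.

```python
import numpy as np

def gapfill_to_regular_grid(days, values, cloud_mask, step=20):
    """Interpolate cloud-free values of one pixel onto a regular grid of dates.

    days must be sorted in increasing order (day offsets from the first image).
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    clear = ~np.asarray(cloud_mask, dtype=bool)       # keep cloud-free dates only
    grid = np.arange(days.min(), days.max() + 1, step)
    return grid, np.interp(grid, days[clear], values[clear])

# Hypothetical series: the acquisition at day 5 is flagged as cloudy.
days = [0, 5, 10, 25, 40]
signal = [0.20, 0.90, 0.30, 0.45, 0.60]
cloudy = [False, True, False, False, False]
grid, synthetic = gapfill_to_regular_grid(days, signal, cloudy)
```

Here the cloudy value at day 5 is ignored, and the synthetic value at day 20 is interpolated between the clear observations at days 10 and 25.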

LULC classification with the MORINGA processing chain

Predictors calculation: topographic, textural and spectral indices

Several indices were calculated from the VHSR and HSR images (Table 3), to be later used as predictors in the classification model. Following previous studies, four textural indices developed by Haralick37 were calculated using the panchromatic Pléiades image27,38,39. Textures are important for detecting landscape patterns, such as tree or crop rows, which are easily detectable in the VHSR image. Textural indices were computed with the HaralickTextureExtraction OTB application. Four sizes of sliding window were used for each index, with radius values of 1 (i.e. a sliding window of 3 × 3 pixels), 5 (11 × 11 pixels), 11 (23 × 23 pixels) and 21 (43 × 43 pixels) (Table 3).
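The relation between radius and window size quoted above (a radius r gives a window of 2r + 1 pixels per side) can be checked with a short sketch. The ground footprint is derived here from the 0.5 m panchromatic resolution and is added only for illustration.

```python
def window_size(radius):
    """Side length, in pixels, of a square sliding window of the given radius."""
    return 2 * radius + 1

radii = [1, 5, 11, 21]
sizes = {r: window_size(r) for r in radii}

# Side of the window on the ground, in metres, at 0.5 m per pixel.
footprints_m = {r: window_size(r) * 0.5 for r in radii}
```

The largest window (radius 21) therefore spans 21.5 m on the ground, which is the scale at which the coarsest texture patterns are summarized.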

Table 3 Textural and spectral indices computed from VHSR and HSR images.

Nine spectral indices were also calculated from the Pléiades pansharpened image and from the Sentinel-2 time series of synthetic images (Table 3), using the OTB RadiometricIndices application. The Sentinel-2 sensor delivers 13 spectral bands, ranging from 10 to 60 m resolution, but only the 10 bands with a resolution of 20 m or finer were exploited in this study (i.e. the three 60 m resolution bands were discarded). These bands were used as direct predictors in the classification model, and also to compute 6 spectral indices that are commonly used to characterize and classify LULC (Table 3).
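As an illustration of how a spectral index is derived from two bands, the sketch below computes a normalized difference, the form taken by NDVI when applied to the near-infrared and red bands. NDVI is used here only as a familiar example; Table 3 lists the indices actually computed, and the reflectance values are invented.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b), returning 0 where the denominator is 0."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    out = np.zeros_like(total)
    np.divide(a - b, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances for three pixels.
nir = np.array([0.5, 0.3, 0.0])
red = np.array([0.1, 0.3, 0.0])
ndvi = normalized_difference(nir, red)
```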

Finally, slope was calculated from TanDEM-X DEM with QGIS and used as a predictor in addition to elevation. To classify LULC, we therefore used a total of 352 Sentinel-2 derived predictors ( = 22 dates * 10 bands + 22 dates * 6 spectral indices), 20 Pléiades derived predictors ( = 4 spectral indices + 4 textural indices * 4 radius) and 2 TanDEM-X derived predictors (elevation and slope).
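The predictor counts given above can be verified with a few lines of arithmetic. The variable names are illustrative only.

```python
import math

n_dates, n_bands, n_s2_indices = 22, 10, 6

# Sentinel-2: every band and every index at each of the 22 synthetic dates.
sentinel2 = n_dates * n_bands + n_dates * n_s2_indices

# Pléiades: 4 spectral indices plus 4 textural indices at 4 radii.
pleiades = 4 + 4 * 4

# TanDEM-X: elevation and slope.
tandemx = 2

total = sentinel2 + pleiades + tandemx

# Number of predictors drawn at each tree node under the square-root rule.
features_per_node = math.sqrt(total)
```

The total of 374 predictors is the value used later for the size of the random feature subset at each tree node.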

Object detection by segmentation of the VHSR image

For the segmentation of Pléiades pansharpened mosaic, we used a method proposed by Baatz and Schäpe40 and implemented in OTB LargeScaleGenericRegionMerging remote application, available at https://gitlab.irstea.fr/remi.cresson/LSGRM41. Various tuning tests were performed on different sub-regions of the study area before selecting the following values (tested values are indicated between brackets):

  • scale parameter: 350 [70–450]

  • weight parameter on the shape: 0.3 [0.1–0.8]

  • weight parameter on compactness: 0.7 [0.5–0.7]

This segmentation step partitions the image into homogeneous objects and extracts their contours. The geometries delimited in the image were exported as a shapefile, and for each object of the segmentation we extracted the mean value of each of the 374 textural, spectral and topographic predictors presented in the previous section. The segmentation was then intersected with the polygons of the field database for which LULC was recorded, and for each element of the intersection the mean values of the predictors were also extracted. These elements composed the training dataset of the classification algorithm (35 392 training elements). Extractions were made with the C++ “obiatools” module.
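The object-level extraction of mean predictor values can be sketched with NumPy as below. This is not the “obiatools” implementation; it assumes the segmentation is available as a raster of integer object labels numbered 0 to n-1, each present at least once, and uses a tiny invented predictor raster.

```python
import numpy as np

def object_means(labels, predictor):
    """Mean predictor value per object, for integer labels 0..n-1."""
    labels = np.asarray(labels).ravel()
    values = np.asarray(predictor, dtype=float).ravel()
    sums = np.bincount(labels, weights=values)   # sum of values per object
    counts = np.bincount(labels)                 # number of pixels per object
    return sums / counts

# Hypothetical segmentation with three objects and one predictor band.
labels = np.array([[0, 0, 1],
                   [2, 2, 1]])
predictor = np.array([[1.0, 3.0, 10.0],
                      [4.0, 6.0, 20.0]])
means = object_means(labels, predictor)
```

Repeating this for each of the 374 predictors yields one row of features per object, which is the table on which the classifier is trained and applied.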

Random forest training

The random forest algorithm was used to classify LULC from the training dataset produced at the previous step. This algorithm is based on an ensemble of classification or regression decision trees, each created using random subsets of predictors and training data, whose predictions are combined by majority voting or averaging42,43. Over the last two decades, the use of random forest for remote sensing applications has received increasing attention due to its capacity to handle large datasets (of observations and predictors) and missing data, its processing speed, and its high accuracy8,14. Applications have focused, for instance, on mapping LULC27,44, vegetation biomass45,46, urban areas47,48 and habitat quality and health49,50,51.

One random forest model was trained at the level 3 of the LULC nomenclature, using OTB TrainVectorClassifier application, and the following tuning options:

  • Maximum depth of the tree: 25

  • Minimum number of samples in each node: 10 (OTB default value)

  • Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split: 10 (OTB default value)

  • Size of the randomly selected subset of features at each tree node: square root of the total number of predictors (OTB default value, in this application: \(\sqrt{374}=19.34\))

  • Maximum number of trees in the forest: 800

  • Sufficient accuracy (OOB error): 0.01 (OTB default value)

All observations in the training dataset whose size was greater than 25 m2 were used for training the classifier, which was then applied to each element of the segmentation for which we extracted predictor values, in order to generate a level 3 LULC classification.
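The training setup described above can be approximated with scikit-learn, as in the sketch below. It is a stand-in for the OTB TrainVectorClassifier application, not a reproduction of it: mapping the OTB minimum node size to min_samples_split is an assumption, the categorical clustering and sufficient-accuracy options have no counterpart here, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_forest(X, y, area_m2, min_area=25.0, n_trees=800, seed=0):
    """Train a random forest on observations larger than min_area (m2)."""
    X, y = np.asarray(X), np.asarray(y)
    keep = np.asarray(area_m2) > min_area     # discard very small elements
    model = RandomForestClassifier(
        n_estimators=n_trees,      # maximum number of trees in the text
        max_depth=25,              # maximum tree depth in the text
        min_samples_split=10,      # assumed equivalent of the OTB node size
        max_features="sqrt",       # square root of the number of predictors
        oob_score=True,            # keep the out-of-bag accuracy estimate
        random_state=seed,
    )
    return model.fit(X[keep], y[keep])

# Synthetic stand-in for the training table (300 elements, 374 predictors).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 374))
y = (X[:, 0] > 0).astype(int)
area = rng.uniform(5.0, 100.0, size=300)

# Fewer trees than in the study, only to keep the example fast.
model = train_forest(X, y, area, n_trees=50)
predicted = model.predict(X[:5])
```

Once trained, the same model object would be applied to the predictor table of every segmentation object to produce the classification.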

Predictor importance (also called variable or feature importance) was calculated to highlight which predictors contributed most to the classification. Predictor importance is commonly used as a tool for interpreting machine learning algorithms and explaining how particular predictions are made52. It was calculated using the Python module scikit-learn, with a random-forest-specific importance score based on the mean accumulation of impurity decrease (https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html).
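A minimal sketch of impurity-based importance with scikit-learn is given below. The predictor names and the synthetic data, in which the class depends only on the first column, are assumptions chosen so that the expected ranking is obvious.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
names = ["elevation", "slope", "noise"]      # hypothetical predictor names
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)                # class driven by column 0 only

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Mean decrease in impurity, averaged over trees and normalized to sum to 1.
importances = model.feature_importances_
ranking = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
```

Impurity-based scores are computed on the training data, so they are best read as a ranking rather than as an absolute measure of predictive value.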

Elevation showed the highest importance, followed by two textural indices (Haralick contrast with radius of 21 and 11), and two vegetation and water spectral indices derived from Sentinel-2 HSR images in August 2019 (Fig. 3A). Slope also appeared as an important predictor, which suggests that considering topography is crucial for LULC classification in areas of high relief such as the Andes. Half of the 16 textural indices were among the most important predictors, which also indicates that Pléiades-derived textures drove the LULC classification and explained a large amount of the variance in our training dataset. Finally, several Sentinel-2 spectral indices and bands at different dates were among the most important predictors, which underlines the importance of considering time series of multispectral images for characterizing vegetation dynamics during the classification process.

Fig. 3

Interpretation and validation of the random forest classifier. (A) Importance scores of the selected predictors for the level 3 classification. (B) Boxplots of the F1-scores obtained during the cross-validation (level 3 of the LULC classification). For comparison, the F1-scores obtained by comparing the MORINGA classification to the full field database are represented in red.

Post-processing procedure

The post-processing of the LULC classification produced by the MORINGA chain consisted of four steps: (1) conversion to raster; (2) smoothing by majority filter; (3) cross-checking with GIS data and (4) manual correction by photo-interpretation. All post-processing operations were conducted at the finest level of the classification (level 3), and then scaled up using the nested structure of the nomenclature.

First, the vector classification obtained with the MORINGA chain was converted to a raster format, at the resolution of the Pléiades pansharpened and panchromatic images (0.5 m). The resolution of the Pléiades image was preferred over that of the Sentinel images (10 and 20 m), as the Pléiades image is the one used for the construction of the field database (polygon delineation based on Pléiades image photointerpretation) and for segmentation, which are two crucial steps for the supervised classification. As the objects were identified at a 0.5 m resolution, it is essential to convert the MORINGA classification into a raster at the same resolution to ensure their integrity. Indeed, the 0.5 m resolution preserved the isolated landscape features identified during segmentation (such as rural buildings or roads), which would have been merged with neighboring LULC classes by a rasterization at lower resolution.

Second, a majority filter was used to remove isolated pixels and smooth the classification contours, with the OTB ClassificationMapRegularization application and a radius of 3 (corresponding to a 7 × 7 pixel sliding window). This smoothing only removed objects smaller than 1.75 m2 (in comparison, the size of a residential house in rural areas is approximately 10 m2), and therefore did not alter the identification of the isolated landscape features mentioned above.
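The effect of a majority filter can be sketched with NumPy as follows. This is not the OTB ClassificationMapRegularization implementation: edge pixels are handled by replicating the border, and ties are resolved in favor of the smallest class code, both of which are arbitrary choices made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def majority_filter(classes, radius=3):
    """Assign to each pixel the most frequent class in its square window."""
    classes = np.asarray(classes)
    size = 2 * radius + 1
    padded = np.pad(classes, radius, mode="edge")
    windows = sliding_window_view(padded, (size, size))   # shape (H, W, size, size)
    labels = np.unique(classes)
    # Count, for every pixel, how often each class occurs in its window.
    counts = np.stack([(windows == c).sum(axis=(-2, -1)) for c in labels])
    return labels[counts.argmax(axis=0)]

# Hypothetical map of class 1 with a single isolated pixel of class 2.
classes = np.ones((9, 9), dtype=int)
classes[4, 4] = 2
smoothed = majority_filter(classes, radius=3)
```

The isolated pixel is absorbed by the surrounding class, which is the behavior the smoothing step relies on. The per-class counting used here is memory hungry and is meant for small illustrative arrays only.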

Then, we cross-checked the LULC classification with external data sources to detect unexpected behavior of the MORINGA classifier. For each LULC class of the nomenclature at level 3, specific open-access GIS references were identified (Table SM 3) and intersected with the classification to highlight potential errors. All disagreements between the classification and the reference GIS data were systematically inspected and, where necessary, corrected manually by photo-interpretation of the Pléiades image, using the Thematic Raster Editor (ThRasE), a QGIS Python plugin that allows flexible and fast raster editing (https://plugins.qgis.org/plugins/ThRasE/). For instance, crop and pasture classes were compared to the map of agricultural areas (https://siea.midagri.gob.pe/portal/informativos/superficie-agricola-peruana) developed by the Peruvian ministry in charge of agriculture53, and water bodies to the Global Surface Water Explorer54. Other data sources were provided by the European Commission, the Peruvian Ministry of Agrarian Development and Irrigation, the Ministry of the Environment of Peru and the OpenStreetMap community.

Finally, the classification was carefully screened using the tile-by-tile navigation option of ThRasE (tile size of approximately 4 km2), and the Pléiades image as a reference (with true and false color composites to highlight vegetation areas). All the classification errors detected were manually corrected. The road network LULC category was added at this stage, by combining elements of the classification from different LULC classes (built-up areas mainly, but also other land use classes at lower percentage). OpenStreetMap data was used to confirm the location of photo-interpreted roads55.

Vectorization

The post-processed classification raster was converted to a vector database using the Raster To Polygon conversion tool from ArcGIS Pro, with the polygon simplification option activated to smooth contours. The Repair Geometry tool was then applied to inspect polygons for geometry problems and repair them, with the “Delete Features with Null Geometry” option turned off.

Data Records

The final LULC classification (Fig. 4) and its description are available at the Recherche Data Gouv repository under the CC BY 4.0 license, in both raster and vector format (https://doi.org/10.57745/DDP1ZR)56. The raster format is only provided for level 3 of the LULC nomenclature at 0.5 m resolution, whereas the three nomenclature levels are provided in separate layers of the geopackage file (Table SM 5). The field database used to train the random forest is accessible at the same repository and under the same license; this dataset contains LULC observations at the three nomenclature levels, in a geopackage file (Table SM 6). All three datasets are delivered in the local UTM projection (WGS 84 UTM 18 S, EPSG code 32718).

Fig. 4

Three-level classification of land-use land-cover in the greater Mariño watershed (Peru) in 2019.

Technical Validation

Random forest cross-validation and performance metrics

In the random forest algorithm, the subset of training data left out from each tree (also called Out-Of-Bag, or OOB, observations) can be used for assessing the prediction error rate, yielding the so-called OOB error, a measure of the classifier performance. Random forests can therefore be trained and validated using all available observations. However, as some authors have noted, this approach can lead to a biased estimation of performance, because of overfitting and because it does not consider the size of training observations57,58. In this study we therefore decided to implement, in addition to the OOB error, a second approach for estimating the random forest classifier performance, based on cross-validation.

Cross-validation is a procedure to estimate classification performance, where the training dataset is split into K separate folds. For each fold k, a random forest model is trained on the K-1 other folds (i.e. excluding the data of fold k), and then applied to the data of fold k to assess its performance, taking into account the size of training observations. It is worth noting that the K models developed during the cross-validation procedure are slightly different from the overall model fitted using all observations from the training dataset, as they are trained with only a subset of the data: the objective is not to generate final predictions (i.e. the final LULC classification), but to evaluate the quality of the classification model58. In this study we implemented a 5-fold cross-validation, and we estimated the quality of the classification in each fold using the following performance metrics (calculated on training observations weighted by their area, Fig. 3B). The same metrics were also calculated before and after the post-processing, considering all observations available in the training database (Table 1).

  • F1 score, a harmonic mean of the precision and recall, ranging from 0 to 1, computed for each LULC class separately59.

  • An overall accuracy score, computed as the average of each LULC accuracy score (corresponding to the total surface of correctly classified objects divided by the total surface of training observations)59.

  • Cohen’s kappa, which measures the level of agreement between the proposed classification and the reference data beyond the agreement expected from a random classification60.

  • Pontius’ quantity disagreement (Q, which measures the differences in the proportion of area or quantity of each LULC class), allocation disagreement (A, which measures the differences in the spatial arrangement or allocation of each LULC class) and total disagreement (D, calculated as the average of Q and A). Pontius metrics have been proposed to address some of the limitations of Cohen’s kappa, by explicitly considering the spatial allocation of LULC classes, distinguishing between false positives and false negatives, and not assuming that the disagreement is due to chance61.
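The quantity and allocation components listed above can be computed from a confusion matrix, as in the sketch below. It follows the per-class definitions of Pontius and Millones as commonly stated (absolute difference between row and column totals for quantity, twice the smaller of the omission and commission errors for allocation). The matrix is assumed to hold areas, with rows for the classification and columns for the reference, and its values are invented.

```python
import numpy as np

def pontius_disagreement(confusion):
    """Return (quantity, allocation) disagreement from a confusion matrix."""
    p = np.asarray(confusion, dtype=float)
    p = p / p.sum()                      # convert areas to proportions
    row = p.sum(axis=1)                  # proportion of each class in the map
    col = p.sum(axis=0)                  # proportion of each class in the reference
    diag = np.diag(p)                    # proportion in agreement per class
    quantity = np.abs(row - col).sum() / 2
    allocation = (2 * np.minimum(row - diag, col - diag)).sum() / 2
    return quantity, allocation

# Hypothetical two-class confusion matrix, in m2.
confusion = [[40, 10],
             [5, 45]]
Q, A = pontius_disagreement(confusion)
```

In this example 15% of the area is misclassified, of which 5 points come from differing class proportions (Q) and 10 points from differing locations (A).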

We used the Python module sklearn.metrics (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.htm) to calculate the F1 score, accuracy and Cohen’s kappa during cross-validation. Pontius’ metrics (A, Q, and D) were calculated manually for each validation fold, from the corresponding confusion matrix generated with the OTB ComputeConfusionMatrix application. The same application was used to calculate all performance metrics before and after post-processing, considering all available observations from the training database.
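A 5-fold cross-validation with area-weighted metrics can be sketched with scikit-learn as follows. The way folds were built in the study is not described here, so the stratified split is an assumption, as are the reduced number of trees and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, area, n_splits=5, seed=0):
    """Train on K-1 folds, score the held-out fold with area weights."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X, y):
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        w = area[test]                   # weight each observation by its area
        scores.append({
            "accuracy": accuracy_score(y[test], pred, sample_weight=w),
            "kappa": cohen_kappa_score(y[test], pred, sample_weight=w),
            "f1_per_class": f1_score(y[test], pred, average=None, sample_weight=w),
        })
    return scores

# Synthetic stand-in for the training table.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
area = rng.uniform(25.0, 500.0, size=200)
scores = cross_validate(X, y, area)
```

Each fold yields one value per metric, and the spread of those values across folds is what a boxplot of cross-validation scores summarizes.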

Corrections applied to the classification during post-processing

Corrections were applied to 8.5% of the study area (a map locating the exact changes is provided in Figure SM 1). The most frequent error was crops confused with dry shrublands and semi-arid steppes of the valley (15% of the area corrected during post-processing) (Table 4). Mixed shrublands found at higher elevation and grasslands were often misclassified as wetlands (14% of corrections), and eucalyptus plantations were confused with other types of woodlands (8% of the corrected surface). Frequent confusions were also observed between types of shrublands (11% of total corrections).

Table 4 Corrections applied to the MORINGA classification (rows) during post-processing (columns).

Some LULC classes that did not cover large portions of the study area showed higher levels of post-processing corrections (Table SM 4). For instance, 77% of the areas classified as beach and riverine rocks by the MORINGA chain were confused with rocks and natural bare soils, and 57% of the area classified as lakes was in fact rocks and natural bare soils. The confusion between surface water and bare soils can be explained by relief and shadow effects, as observed in other publications62,63,64. The presence of clouds on the Pléiades image affected the quality of the segmentation in small areas of the study site: the contours of the objects affected by clouds were corrected manually during this post-processing stage. The confusion between riverine rocks and bare soils is due to the close resemblance of their multispectral signals and suggests that other topographic parameters, such as distance to the river network, could be added to the MORINGA predictors to improve the distinction between these LULC classes.

Final classification validation

The overall accuracy (i.e. the arithmetic mean of F1-scores from each LULC class) and Cohen’s kappa index showed a very high agreement between the post-processed map and the training database. The final quantity disagreement obtained after post-processing (Pontius Q) was 0.0042, while the allocation disagreement (Pontius A) was 0.0053 (Table 1). This means that most of the disagreement (approx. 60%) is explained by the precise location of the different LULC classes in the maps (Pontius A), and not by the relative importance of each LULC class (Pontius Q). Pontius total disagreement (D) was very low, which confirms the strong agreement between the post-processed map and the training database.

The slight decrease in overall accuracy and Cohen’s kappa index observed after post-processing can be explained by changes in F1-score in two LULC classes (Table 1): “Beach and riverine rock” and “Fruit crop”. Fruit crops are among the most difficult LULC classes to detect, along with wetlands, small-scale fields, and urban areas, which machine learning algorithms typically tend to misidentify65,66,67,68. For “Beach and riverine rock”, the change in accuracy can be explained by an error in the training database, where a polygon of 5451 m2 was wrongly classified as “Beach and riverine rock” instead of “Rock and natural bare soil”, among the 14 polygons identified as “Beach and riverine rock” areas in the training database (Table SM 2).