Background & Summary

Environmental degradation and climate change represent a significant threat to life. To address these, the European Union (EU) established the Green Deal1, a set of policies aimed at making the EU the first climate-neutral continent by 2050. The protection of biodiversity is one of the topics covered by these policies, and the EU has promoted conservation initiatives to halt the loss of biodiversity. Specifically, based on the European Directive 92/43/EEC “Habitat”2, the EU created the Natura 2000 Network, one of the world’s largest systems of protected lands3,4,5. The evaluation of the conservation status of the habitats and species included in the Annexes to the “Habitat” Directive is crucial for preventing biodiversity loss6,7,8.

According to the “Habitat” Directive, the conservation status of each habitat type has to be assessed against four criteria (area, range, structure and functions, and future prospects). Among them, the “structure and functions” parameters refer to the physiognomical/structural component of the habitat and to the ecological processes occurring at different temporal and spatial scales which are necessary for the long-term maintenance of the habitat itself6. The European and national guidelines6,9 state that these parameters can be monitored by combining floristic and vegetation data collected at the local scale (e.g., Natura 2000 sites).

In this work we focus on Annex I habitat “6210(*) Semi-natural dry grasslands and scrubland facies on calcareous substrates (Festuco-Brometalia) (*important orchid sites)”, one of the most diverse and species-rich habitats in the world after tropical rainforests10, whose maintenance depends on traditional practices such as extensive grazing and mowing11. Monitoring the “structure and functions” of this habitat entails the recording of several parameters, such as total vegetation cover, presence and coverage of dominant species, typical species (in particular orchid species), indicator species of disturbance and dynamic phenomena, and alien species8,12. Typical species, hereinafter TS, “should be selected to reflect favourable structure and functions of the habitat type”6. This means that TS should simultaneously be: i) a good indicator of favourable habitat quality; ii) unique to the habitat or present over a broad portion of the habitat range; iii) sensitive to changes in the habitat conditions6,13. In practice, given the variability of Annex I habitats, and in particular the wide floristic diversity of grasslands, there are no officially recommended lists of TS at either the European or the national scale, and it is recommended to identify the target TS at the regional or even local scale14. The presence of orchid species is generally considered an indicator of favourable conservation status, and their abundance in the surveyed area is particularly relevant as an indicator of the priority status of the habitat 621015,16,17,18,19.

The concept of “early warning species” (hereinafter EWS) applies to those plants whose settlement in habitat patches indicates ongoing processes altering the structure and functions of the habitat itself; the concept applies also (perhaps mostly) to non-typical taxa, whose settlement can be a clear sign of transformation. In this sense, non-typical EWS can be used as very informative proxies, pointing out the first functional shifts of the community20. Among the signals of grassland habitat degradation, the appearance and establishment of edge and mantle species is considered an important early warning indicator that should be monitored, as it reflects ongoing dynamic processes that can lead, over a relatively short time, to complete habitat alteration9.

The standard collection of vegetation data suitable for the assessment of grassland conservation status is generally performed through field relevés9,14. According to the national standards9,14, the relevé must be carried out in randomly located 4 × 4 m2 homogeneous plots, whose number is proportional to the total habitat surface. The recommended 4 × 4 m2 plot size complies with the EU standards for grassland sampling21. The data acquisition and the assessment of the habitat conservation status are typically performed only by human operators, a time- and cost-inefficient approach. Additionally, funding for biodiversity preservation is often inadequate22,23. Given these observations, the goal of the Natural Intelligence project (https://www.nih2020.eu/) is to enrich human monitoring capabilities through the use of robotic systems. In particular, we use legged robots to acquire information on the habitat, mimicking the surveys performed by humans. The choice of this type of robot is a trade-off between mobility and battery duration.

In this dataset, we report the data collected during a monitoring mission in occurrences of habitat 6210 in Valsorda, Gualdo Tadino (PG), Italy (Fig. 1), inside the Natura 2000 SAC IT5210014, employing the quadrupedal robot ANYmal C24 (Fig. 2). The data were collected by a team of robotic engineers and plant scientists. The information included in this dataset can be divided into two groups. The first is a collection of videos and images of three indicator plant species acquired from the robot cameras. The second includes data related to the autonomous monitoring missions. Specifically, we report the point cloud of the area, the robot state information, and the acquired videos and pictures.

Fig. 1

Location of the data acquisition campaign.

Fig. 2

ANYmal C robot collecting data in habitat 6210. The picture highlights the sensors mounted on the robot for data collection: one Velodyne VLP-16 Puck LITE, two FLIR Blackfly BFS-GE-16S2C-BD2 wide-angle cameras, and four Intel RealSense D435 RGB-D cameras.

This dataset has a multidisciplinary scope and can be used by researchers in several fields. For instance, the point clouds and the information about the robot state could be used by robotic engineers to test or validate their own methods, as well as to benchmark the robot performance. On the other hand, the plant videos and images recorded by the robot could be used by botanists to assess the quality of this information as well as the habitat’s condition, or by computer scientists interested in testing their Artificial Intelligence (AI) algorithms for species detection and classification, as well as in exploring new possibilities for integrated human-AI monitoring.

To the best of the authors’ knowledge, this is the first publicly available dataset on robotic habitat monitoring.

Methods

The gathering of the full dataset - comprising monitoring missions and typical and early warning indicator species - was carried out by a team composed of both robotic engineers and plant scientists in Valsorda, Gualdo Tadino 06023 (PG), Italy, inside the Natura 2000 SAC IT5210014 (Fig. 1). This location was chosen because it represents a notable stand of the target habitat 6210 in the central Apennines. The period of data acquisition was from 10 to 13 May 2022. This month was selected because, as stated in the national guidelines for habitat monitoring12, the optimal sampling period for habitat 6210 covers the time range between May and August, when the majority of the composing species reach their maximum development and flower. Considering the particular role played by orchid species in this habitat type15 and based on knowledge of the local flora and vegetation, data collection was carried out in May as an ideal period to detect both the target TS and EWS in the field. May is the best period to monitor their occurrence, since they all typically flower between April and June25. In particular, orchid species are practically indistinguishable in the early stages of their life cycle, when only basal leaves occur, but become clearly recognizable at the time of flowering.

The platform used for data acquisition is the quadrupedal robot ANYmal C24 (Fig. 2) produced by ANYbotics AG. This robot can move in either an autonomous or a teleoperated manner. Information about the environment is collected through a LiDAR sensor and four RGB-D cameras. The former is a Velodyne VLP-16 Puck LITE (https://velodynelidar.com/products/puck-lite/), which acquires a 3D map of the environment. The latter are four Intel RealSense D435 cameras (https://www.intelrealsense.com/depth-camera-d435/) that are able to collect full HD RGB images and/or 30 fps videos. Figure 2 depicts the location of the sensors: the LiDAR is placed on the rear part of the system, while the four cameras are placed one per side. The platform is also equipped with two wide-angle FLIR Blackfly BFS-GE-16S2C-BD2 cameras, which were not used for data gathering. The information about the robot state is collected via the ROS interface of the robot and saved as ROS bag files. For more details about this particular file type, please see the official ROS bag reference (http://wiki.ros.org/rosbag).

The dataset contains two different sets of data: (i) typical and early warning species data and (ii) monitoring mission data. Both sets are collected with the same platform and are related to habitat 6210; however, their content and goal are different. The first batch contains categorized pictures and videos of indicator species for habitat 6210. Each species occurrence was chosen independently, with the sole purpose of providing an example of the indicator species. The second batch contains information about habitat surveys performed by the robot using procedures similar to the ones followed by botanists. Both methodologies are described in detail in the following sections.

Typical and early warning species data

This part of the dataset includes pictures and videos of indicator species for habitat 6210. We selected as indicator species three plant taxa with a key role as explanatory proxies of the habitat 6210 conservation status: specifically, two TS and one non-typical EWS. All data were categorized by botanists with expert knowledge of the local flora.

Following the guidelines in6, and in accordance with the suggestion to select the TS of habitat 6210 at the local level12,14, we selected two typical orchid species among those mentioned in the national Interpretation Manual16: Anacamptis morio (L.) R.M.Bateman, Pridgeon & M.W.Chase and Dactylorhiza sambucina (L.) Soó (in both the yellow and pink forms). Furthermore, the selected TS are among the most frequent orchid species occurring in habitat 6210 at the regional level26.

As for EWS, emblematic examples in the Apennine stands of habitat 6210 are Asphodelus macrocarpus Parl., Pteridium aquilinum (L.) Kuhn, Brachypodium rupestre (Host) Roem. & Schult., and Brachypodium genuense (DC.) Roem. & Schult.27,28,29. Among these species, we chose Asphodelus macrocarpus Parl., a rhizomatous geophyte with rapid vegetative growth, which typically expands from heliophilous forest edges, colonizing grassland habitats. The occurrence and impact of this species on habitat 6210 in the central Apennines are well documented, especially where traditional agropastoral activities are reduced or abandoned27,30,31. Please note that the nomenclature of plant species follows the World Flora Online portal32.

To acquire the data, plant scientists first identified the instances of the indicator species in the study area employing the diagnostic keys in25. Then, an expert human operator teleoperated the robot, placing it in front of the chosen instance, which may include one or more of the three species. Finally, the video acquisition was manually started, and a video of at least 900 frames was recorded together with 10 pictures. This procedure was performed 16 times for each of the indicator species; in the case of Dactylorhiza sambucina, we executed the procedure 16 times for each form, pink and yellow. Pictures and videos contain at least one instance of the indicator species, but they may also contain instances of other species. Table 1 summarizes the entries of this part of the dataset, while Fig. 3 shows an example picture for each of the indicator species.

Table 1 Names of the four selected indicator species (and forms) of habitat 6210 and number of videos and pictures for each.
Fig. 3

Examples of species pictures taken by the robot. (a–c) are typical species, while (d) is an early warning species.

The peculiarity of this part of the dataset is that, to the best of the authors’ knowledge, it is the first dataset of classified images and videos of indicator species for habitat 6210 taken by a quadrupedal robot. The main goal of this batch of data is to publicly share data that can be used to design novel classification algorithms or to test already developed methods on robot-acquired data. Given this objective, no other information is provided about these data.

Monitoring mission data

This other part of the dataset contains information about the monitoring missions. The idea is to use the quadrupedal robot ANYmal C to imitate the survey of a standard plot carried out by botanists during a field relevé. This means that the robot follows the same standard procedures and collects information about the area, but it neither analyzes nor classifies it.

Each area to be surveyed is randomly chosen, and its GPS coordinates are recorded by a human operator employing a Garmin eTrex 10 GPS device. Each coordinate has been recorded with an accuracy of at least 3 m. In addition to location information, the time, date, and weather conditions were also noted. Georeferencing is essential to compare data with past or future surveys, while date and weather information are useful for an approximate estimate of the sunlight. Time and date were automatically saved with the robot status, while the weather was visually assessed and manually noted.

The actual survey is divided into two phases: (i) mapping, and (ii) autonomous monitoring mission. We define mapping as the creation of a 3D map of the environment. In this phase, the LiDAR sensor mounted on the robot is used to scan the surrounding area: the laser sensor measures the distance between the robot and the obstacles. To improve the digital representation of the environment, an expert human operator guides the robot around while the LiDAR gathers information. The result of this procedure is a set of points, namely a point cloud, which represents the environment in terms of both terrain and potential obstacles. An external video was also recorded to qualitatively document the whole phase.

The point cloud obtained in phase (i) is essential for robot self-localization, which in turn is fundamental for autonomous locomotion, a prerequisite for phase (ii), i.e., the autonomous monitoring missions. In this phase, the robot mimics the habitat relevé typically executed by plant scientists. This means that the robot acquires information on plots of the same size as those surveyed by botanists following the national and international standards. In accordance with these standards9,21, the area to be monitored is 4 × 4 m2, and ANYmal examines it completely. To do so, the robot autonomously moves on a grid, following a set of waypoints that starts at the bottom right corner of the area and ends at the top right one; Fig. 4 schematizes the robot motion. During this phase the robot keeps its orientation constant. When a waypoint is reached, i.e., at the center of each square in Fig. 4, the robot stops and takes a picture with each of the four RGB-D cameras. Each camera also records a video for the whole mission duration. In addition to these data, in this phase we also recorded the robot status information and an external video of the whole robot motion. No analysis is performed on the robot-acquired data.

Fig. 4

Motion of the robot during the autonomous monitoring mission. The robot starts on the blue square outside the grid. Green arrows indicate the motion of the robot. At each waypoint, i.e., the center of each square, the robot stops and takes pictures with all cameras. The yellow star is the final position of the monitoring mission. During the whole mission the cameras recorded a video, and the robot status information was saved.
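To make the waypoint pattern concrete, the following minimal MATLAB sketch generates such a grid of waypoint centers. It is not part of the released code, and the number of grid cells per side is an assumption made for illustration (the released data and Fig. 4 remain the authoritative references).

% Minimal sketch: waypoint centers for a square monitoring plot.
% Assumptions (not from the released code): a 4 x 4 m2 plot divided
% into an n-by-n grid; the robot visits the center of each cell in a
% serpentine order starting from the bottom right corner.
plotSide = 4;                    % plot side length [m]
n = 4;                           % cells per side (assumed)
cellSide = plotSide / n;
centers = cellSide/2 : cellSide : plotSide;   % cell-center coordinates [m]

waypoints = zeros(n*n, 2);
for row = 1:n                    % bottom row first
    xs = fliplr(centers);        % odd rows: right to left
    if mod(row, 2) == 0
        xs = centers;            % even rows: left to right
    end
    waypoints((row-1)*n+1 : row*n, :) = [xs(:), repmat(centers(row), n, 1)];
end

% At each waypoint the robot stops and triggers one picture per RGB-D
% camera; here we only plot the resulting serpentine path.
plot(waypoints(:,1), waypoints(:,2), '-o');
axis equal; xlabel('x [m]'); ylabel('y [m]');

With n = 4, the generated path starts at the bottom right cell and ends at the top right one, consistently with the description above.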

Table 2 reports information about the three monitoring missions we performed. Since the three areas were close to each other, a single mapping phase was sufficient to enable all autonomous monitoring missions. Table 3 summarizes the robot status information, which is collected as ROS bag files recording several topics of the robot control architecture. Robot status information includes, for instance, robot base position, joint positions, joint velocities, joint accelerations, joint torques, joint currents, and battery status. Details about the topics and the specifications of the datastream output from the ANYmal C robot are available on the website of ANYmal Research (https://www.anymal-research.org/). Figure 6 shows an example of (part of) the robot status during part of the Plot 1 mission.

Table 2 Date, time, weather information, and georeferencing information of the autonomous monitoring missions.
Table 3 ROS topics recorded during robot motion.

Data Records

In this section we describe the data contained in this dataset, list the different file formats, and explain how to visualize them. All data are uploaded on Zenodo33 at https://doi.org/10.5281/zenodo.7385369, while example code to extract the bag files is uploaded on Zenodo34 and GitHub35.

The tree structure of the full dataset is depicted in Fig. 5. We provide two sets of data together with a README.txt file to navigate through them. The first set, named “Typical and early warning species”, is a collection of videos and images of three indicator species, which are TS or EWS for habitat 6210. A README.txt file explains the structure of this set of data. For each species, we provide a “Pictures” and a “Video” folder containing the pictures and the videos of the indicator species. Each species has been recorded 16 times. Table 1 summarizes the number of entries of this set of data.

Fig. 5

Directory structure tree of the dataset.

Fig. 6

Examples of graphs regarding data extracted from the ROS bag file of Plot 1 (see Fig. 5), performed as in Fig. 4. See also the provided MATLAB code. For the sake of image clarity, only approximately a quarter of the entire mission (40 s out of 140 s) is shown in these graphs.

The second set, named “Monitoring missions”, contains data related to the mapping and the three monitoring missions. A README.txt file explains the structure of this set of data. The “Mapping” folder contains a README.txt file, a video of the mapping operation, and the acquired point cloud. Each of the three “Plot” folders contains three subfolders and a README.txt file with the time and date at the start of the mission, together with weather information. The three subfolders contain a qualitative video of the operation, the ROS bag files related to the robot state, and the videos and images acquired by the robot. The latter are named based on the camera used to acquire the information: specifically, “depth_front”, “depth_left”, “depth_rear”, and “depth_right” refer to the RGB-D cameras placed on the front, left, rear, and right side of the robot, respectively. The bag files are instead named with the date and time at the start of the recording.

Data formats

In the following, we distinguish the data elements depending on their file format.

.jpg

This is a standard file format for images and can be viewed using most common image viewers. The jpg files in the dataset are the images collected by the robot, both during the autonomous monitoring missions and during the typical and early warning species data gathering.

.mp4 and .avi

These are standard file formats for videos and can be viewed using most common multimedia players. The avi files are videos collected by the robot, both during the autonomous monitoring missions and during the typical and early warning species data gathering. The mp4 files are videos taken with a digital reflex camera, and their only purpose is to qualitatively show the robot motion.

.ply and .pb

Point clouds have been saved in two formats: ply and pb. The Polygon File Format (ply) is a standard format for 3D models, and it can be opened with several software packages, e.g., MATLAB (https://www.mathworks.com/products/matlab.html), but there are also free alternatives like MeshLab. The pb format is the default one for saving point clouds within the ANYmal Research ROS/Gazebo framework (https://www.anymal-research.org/), and also the one used to load point clouds there. Thus, depending on the goal, users may prefer the ply or the pb file format.
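As a minimal example of the first option (a sketch assuming MATLAB with the Computer Vision Toolbox; the file name is hypothetical), a ply point cloud can be loaded and displayed as follows:

% Minimal sketch: load and display a ply point cloud in MATLAB.
% Requires the Computer Vision Toolbox; 'map.ply' is a hypothetical
% file name standing for the point cloud in the "Mapping" folder.
pc = pcread('map.ply');          % read the point cloud from file
pcshow(pc);                      % interactive 3D visualization
xlabel('x [m]'); ylabel('y [m]'); zlabel('z [m]');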

.bag

This is the standard format that ROS employs to store robot data. The dataset contains bag files related to the autonomous monitoring missions. The bag files contain data saved from so-called ROS topics (http://wiki.ros.org/rostopic), which are streams of data in the form of ROS messages; the particular list of topics that we saved is shown and described in Table 3, and it covers all the relevant information about the state of the robot and its main components. These data can be inspected through several ROS packages, but also through other means, for instance, using data processing software like MATLAB (https://www.mathworks.com/products/matlab.html).
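For instance, assuming MATLAB with the ROS Toolbox and a hypothetical file name, the topics stored in one of the provided bag files can be listed as follows (a sketch, not the released code):

% Minimal sketch: inspect the contents of a ROS bag file in MATLAB.
% Requires the ROS Toolbox; 'plot1.bag' is a hypothetical file name.
bag = rosbag('plot1.bag');       % index the bag file
disp(bag.AvailableTopics);       % table of topics, message types, counts

The listed topics should match those reported in Table 3.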

Example code for data analysis

The robot status is stored as bag files. The extraction of the data from bag files can be performed using many ROS packages or other software, e.g., MATLAB (https://www.mathworks.com/products/matlab.html). In Code 1, we provide a minimal example of a MATLAB script to analyse the data contained in the bag files. To run this example, MATLAB (https://www.mathworks.com/products/matlab.html) and the ROS Toolbox (https://www.mathworks.com/products/ros.html) are required. This code was written and tested on MATLAB R2022a, but it may also work on later MATLAB releases. The function extracts the data of a specified topic from a ROS bag file and returns them as a MATLAB structure. Again, the list of topics available in the data we provide is shown in Table 3. More detailed descriptions of the topics and the specifications of the datastream output from the ANYmal C robot are available on the website of ANYmal Research (https://www.anymal-research.org/), which, however, can only be accessed by research institutions after a partnership request is accepted by ANYbotics. Nevertheless, please note that the provided data can be freely accessed without registering to ANYmal Research, and the provided code is sufficient to visualize and analyze them: partnership with ANYmal Research only grants access to very specific details about the robot.

Code 1

Example MATLAB code to extract data from ROS bags.
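For reference, a rough sketch of what such an extraction function may look like is given below; the function and variable names are ours and do not necessarily correspond to the released Code 1, which remains the authoritative version.

function msgs = extractTopic(bagPath, topicName)
% EXTRACTTOPIC  Read all messages of one topic from a ROS bag file and
% return them as a cell array of MATLAB structures.
% Sketch only (requires the ROS Toolbox); names are illustrative and do
% not correspond to the released Code 1.
bag  = rosbag(bagPath);                           % index the bag file
sel  = select(bag, 'Topic', topicName);           % restrict to one topic
msgs = readMessages(sel, 'DataFormat', 'struct'); % decode as structs
end

A call such as msgs = extractTopic('plot1.bag', topicName), with topicName taken from Table 3, then returns the recorded messages ready for plotting, e.g., as in Fig. 6.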

Technical Validation

Quality assurance during the fieldwork was provided both by the robotic engineering team, i.e., Franco Angelini (FA), Mathew Jose Pollayil (MJP), and Manolo Garabini (MG), and by the plant scientist team, i.e., Federica Bonini (FB) and Daniela Gigante (DG). All authors oversaw the data acquisition and reviewed the final dataset, meticulously checking for inaccuracies and inconsistencies. It is also worth highlighting that the shared data are provided raw, without any post-processing phase that may alter their soundness. The validity of the data collection is also ensured by a set of choices described in detail in the following.

Location selection

The chosen location is included in the Natura 2000 site designated as SAC IT5210014 “Monti Maggio - Nero (sommità)” in 201436; as such, the presence of the target habitat had been previously detected, identified, and assessed according to the standard European and national protocols9,15,37. Besides wide patches of the target Annex I habitat 6210, the SAC includes four more Annex I habitats, namely 8210, 9210, 9260, and 9340 (for a definition of the Annex I habitat codes, see2,15). Official data on the occurrence and distribution of these habitats in the SAC IT5210014 are part of its Management Plan, published by the Regional Offices of Umbria38, and are also available in the EU Standard Data Form of the SAC39. The habitat maps are part of the Natura 2000 Network Management Plans, adopted by the Umbria Region with40 and subsequently approved by the Regional Council. In the field, DG and FB ensured that the plots chosen for data collection were typical examples of habitat 6210.

Date selection

As described in the Methods section, the ideal period to perform the data collection is May. However, in a natural environment, the precise choice of the best sampling period mainly depends on the seasonal climatic variations, in terms of rainfall and temperature, of the survey area, which influence vegetative growth. In order to acquire meaningful data, DG and FB tracked the flowering progress of the target indicator species starting from mid-April, and the dates 10–13 May were selected based on the orchids’ phenology (e.g., time of full blooming) in the study area.

On field indicator species classification

The section of the dataset named “Typical and early warning species data” contains pictures and videos of the indicator species. As described in the Methods section, the selection of the indicator species for habitat 6210 was performed following articles published in international journals6,12,14,26,27,30,31. In the field, plant scientists (DG and FB), who are experts on the habitat 6210 flora, selected and classified the specific instances of the chosen indicator species following the guidelines in25.

Mapping accuracy

The validity of the mapping is directly ensured by the autonomous missions. Indeed, in the mapping phase the robot recorded a digital reconstruction of the surroundings, which was then used during the autonomous mission phase. Since the robot never failed to localize itself in the environment and was able to autonomously walk and perform the monitoring missions, we can infer that the point cloud is sound.

Monitoring mission validity

The monitoring missions were conducted using methodologies that mimic the field relevé performed by botanists, to ensure compliance with national and international standards9,14,21. The monitoring software developed by the engineering team prints error messages on the terminal whenever faults occur in the data acquisition process; no such messages appeared, ensuring that no issues arose during the data acquisition. Additionally, no analysis has been performed on the robot-acquired data, which ensures that no data corruption was introduced after acquisition.

Database inspection

At the end of the data collection, the database was created. Only valid and complete data were added to the dataset. Once the dataset was created, both teams carefully reviewed each entry to check its validity. This means that the plant scientists inspected each video and picture in the “Typical and early warning species data” directory to ensure its correct folder placement. Analogously, the robotic engineering team inspected the robot status folders and the point cloud folder to ensure their correct folder placement. In particular, MJP ran test scripts to ensure that no corrupted file was present.

Usage Notes

The presented dataset is highly multidisciplinary and can be employed by researchers working in several different fields. For instance, it can be exploited to achieve the long-term goal of the Horizon 2020 Natural Intelligence project (https://www.nih2020.eu/), namely, to assist plant scientists in performing habitat monitoring procedures using legged robotic systems. To achieve this objective, two main points need to be tackled: (i) having a robotic platform able to autonomously replicate the survey procedures suggested by national and international standards; and (ii) having algorithms able to (partially) assess the habitat conservation status.

As described in the Background & Summary section, the habitat conservation status can be related to various parameters, among which is the presence/absence of typical or early warning species6,8,9,14. For this reason, to achieve point (ii), it is necessary to have algorithms able to classify different plant species. The part of the proposed dataset named “Typical and early warning species” can be used towards this goal: indeed, it provides labeled pictures and videos, which are usually necessary to develop and tune new classification algorithms. For these reasons, this part of the dataset can be very useful for engineers or computer scientists aiming to design new artificial-intelligence-based methods for the detection and classification of plant species, e.g.41. Alternatively, these robot-acquired pictures and videos can be used to validate existing algorithms that were developed using human-acquired data, as sketched below.
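As a minimal sketch of the first step (assuming MATLAB, that the dataset has been extracted locally, and that the jpg files have first been gathered into one folder per species; the path is hypothetical), the labeled pictures can be indexed for training as follows:

% Minimal sketch: index the species pictures for training a classifier.
% Assumes the jpg files have been gathered into one folder per species
% under a hypothetical local path 'species_pictures'.
imds = imageDatastore('species_pictures', ...
    'IncludeSubfolders', true, ...
    'FileExtensions', '.jpg', ...
    'LabelSource', 'foldernames');   % one label per species folder
countEachLabel(imds)                 % sanity check on class balance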

The part of the dataset named “Monitoring missions” can be used to work towards point (i). For instance, the LiDAR-acquired point clouds and the robot status can be exploited to advance research on robotic locomotion, navigation, and obstacle avoidance. Additionally, this part of the dataset can be used by botanists to assess the habitat conservation status and to compare it with past or future data from the same plots, or with data from different plots.

The mentioned usages of the dataset are intended as examples; many others can be found. For instance, instead of using the point cloud or robot status data to develop algorithms for habitat monitoring, they can be used to design new methodologies for search and rescue, inspection and maintenance, etc. Indeed, point clouds representing natural environments are quite useful to validate algorithms tested only in simulation scenarios, reducing the sim-to-real gap.

Alternatively, it is possible to exploit the monitoring mission data to build new models. Indeed, these data are georeferenced, so they can be used for remote sensing analyses, similarly to42,43. The idea is to analyze the images/videos taken by the robot during the autonomous mission to classify the plant species, and then integrate this information into species distribution models or use it to validate previously defined models. The position of each plant observed during the autonomous monitoring mission can also be estimated precisely by merging the GPS information with the robot position estimate, as sketched below.
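As a minimal illustration of this last idea (a sketch under a simplifying flat-earth approximation; all numeric values are hypothetical), the GPS coordinates of a detected plant can be approximated by offsetting the plot reference coordinate with the robot local position estimate:

% Minimal sketch: approximate a plant's GPS position by combining the
% georeferenced plot origin with the robot's local position estimate.
% Flat-earth approximation; all numeric values are hypothetical.
lat0 = 43.23; lon0 = 12.78;      % plot origin GPS [deg] (hypothetical)
dx = 2.5; dy = 1.0;              % robot offsets in the plot frame,
                                 % east and north [m] (hypothetical)

R = 6371000;                              % mean Earth radius [m]
dLat = rad2deg(dy / R);                   % northward offset [deg]
dLon = rad2deg(dx / (R * cosd(lat0)));    % eastward offset [deg]

fprintf('Plant at approx. (%.6f, %.6f)\n', lat0 + dLat, lon0 + dLon);

Since the handheld GPS receiver has an accuracy of about 3 m, its error dominates the overall uncertainty, and this flat-earth approximation is adequate at plot scale.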