Introduction

The global increase in the number of recorded disasters caused by natural hazards necessitates the investigation of their adverse impacts and associated risks at the regional to local scale1. The term risk refers to the combination of hazard, exposure, and vulnerability2. All these variables are subject to various types of uncertainty, including those from input data, parameterization, and model structure3,4. Despite numerous studies on the spatiotemporal variability of hazards and exposure at local, regional, and global scales, the vulnerability aspect is still not well understood and has yet to be refined5,6,7,8,9,10,11. The term vulnerability refers to the circumstances shaped by physical, social, economic, and environmental factors or processes that heighten the susceptibility of an individual, a community, assets, or systems to the effects of hazards12. Such conditions/processes include but are not limited to predispositions, weaknesses, and lack of coping capacities that favor adverse effects on exposed elements, physical fragility of infrastructure, and blanket descriptors of social characteristics, e.g., age and gender2. Accurate quantification of all these components at a relevant spatial scale is key to successful assessment of risk from natural hazards13.

Current vulnerability assessment practices provide information at a resolution significantly coarser than that at which actual impacts occur. Such a coarse resolution carries significant uncertainty in the estimates, and data at finer resolution have been shown to significantly improve overall risk estimates14,15. Currently, the Centers for Disease Control and Prevention (CDC) provides the finest data on the spatial variability of the social vulnerability index (SVI), at the census tract scale over the Conterminous United States (CONUS)16. Although finer than other alternatives, the SVI assesses vulnerability only at the scale of cities, towns, or other administrative areas. This is despite the fact that different neighborhoods or block groups within a city, given their different socioeconomic characteristics, can demonstrate different levels of vulnerability to natural hazards17.

The compilation of data on qualitative and quantitative risk factors, and the subjectivity involved in determining the weight/contribution of each factor to the overall risk, have remained challenging. For instance, to calculate the overall SVI for each census tract over the CONUS, the contribution of all variables is considered to be equal18. Schmidtlein et al.19 used principal components analysis (PCA) to examine the sensitivity of quantitative features underlying the social vulnerability index (SoVI) developed by Cutter et al.20. Additionally, to improve the socioeconomic vulnerability index (SEVI), Khajehei et al.14 and Tanim et al.13 applied probabilistic PCA (PPCA), which copes with missing values in the data.

Here, we develop a Socio-Economic-Infrastructure Vulnerability (SEIV) index that characterizes the spatial variation in vulnerability across the CONUS at the census block level. The SEIV index draws on existing social, economic, and infrastructure indicators available via federally supported databases to improve upon vulnerability indices currently used for natural hazard risk assessment. To conduct a more comprehensive assessment and account for the accessibility of critical infrastructure during hazardous situations, for each block we consider the building count and the distance to the nearest emergency/essential facilities, in addition to the commonly used demographic and housing information. Using the variance inflation factor (VIF), we detect multicollinear variables and remove them from the dataset to avoid redundant information and reduce the complexity of the input data. One significant contribution of the proposed SEIV to the process of risk assessment is the reduction of subjectivity in determining the contribution of various social, economic, and infrastructural components to the overall estimated vulnerability. For this purpose, we use a machine learning (ML) algorithm to objectively assign weights to the contributing variables based on the significance of their contribution to damage reported in the past. Such an objective weighting scheme helps ensure that the generated results are globally acceptable, so the developed tools and techniques are transferable to other places.

Results

Block-level vulnerability distribution

Current aggregate-level vulnerability indices favor top-down risk mitigation approaches and fail to engage local communities in improving resilience. Demographic and infrastructural characteristics can differ drastically between two neighboring blocks. This means that indices aggregating these features at larger scales (e.g., the county level) may not resolve the spatial distribution of vulnerability characteristics well and thus introduce significant uncertainty into risk estimates. Decisions made based on such information are consequently not well-informed, are subject to failure, and eventually breed distrust in government. Indices currently in use for operational risk management, for example, integrate the underlying characteristics over Harris County or the City of Houston16 to provide one single vulnerability estimate over the entire unit of interest (county or census tract). Such indices simply fail to locate the communities/neighborhoods most in need of risk mitigation measures and of bottom-up approaches for adaptation. In fact, with such aggregate-level assessment, there is no way to determine the relative urgency of action and resource allocation for improved resiliency. This results in one-size-fits-all, top-down governance that simply ignores the existing variability in vulnerability dynamics at the community level, which in turn increases the likelihood of failure21. An index that resolves the heterogeneity of vulnerability to natural hazards at finer scales (i.e., the block level), as SEIV does, plays a key role in effective risk management and has the potential to provide crucial information to stakeholders at different levels for enhanced risk-informed decision making. Figure 1 presents summary statistics of the percentage of blocks within each state that fall under the various vulnerability categories defined by SEIV, along with the spatial distribution of vulnerability for six of the largest metros in the United States. Moreover, such information, when publicly available, will facilitate communication between local community leaders, non-governmental organizations (NGOs), and local/federal governments to build trust, synergize resources over agreed-upon strategies, and mobilize resources to increase resiliency against upcoming hazards.

Fig. 1: Spatial distribution of Socio-Economic-Infrastructure Vulnerability (SEIV) index statistics over the conterminous United States (CONUS).

The upper panel (a) depicts the distribution of SEIV at an aggregate level (here, the state level). The lower subpanels (b–g) represent the block-level distribution of SEIV in six large metros of the United States. NH New Hampshire, VT Vermont, MA Massachusetts, RI Rhode Island, CT Connecticut, NJ New Jersey, DE Delaware, DC District of Columbia, MD Maryland, WV West Virginia.

Communities across the CONUS are not equally susceptible to upcoming hazards. There exists an inequality in vulnerability at various scales (i.e., between blocks, counties, and states). Figure 2 shows the Lorenz curve22 of the SEIV index distribution over the CONUS. The Gini coefficient is a measure of inequality within a social group and is calculated from the difference between the Lorenz curve (the observed cumulative distribution) and a perfectly equal distribution. The coefficient ranges from 0 to 1, where 0 represents perfect equality and 1 indicates perfect inequality. The vulnerability Gini coefficient here is estimated to be 0.11, which demonstrates that communities across the CONUS will be disproportionately affected by natural hazards. While the 20% least vulnerable blocks hold a 15% share of vulnerability, the 20% most vulnerable hold a 25% share of the cumulative vulnerability.
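For illustration, a Gini coefficient of this kind can be computed directly from the block-level index values; the following is a minimal sketch (not the authors' implementation), assuming a one-dimensional array of SEIV values:

```python
import numpy as np

def gini(values: np.ndarray) -> float:
    """Gini coefficient of a sample, via the standard closed form
    for sorted values (equivalent to twice the area between the
    equality line and the empirical Lorenz curve)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    i = np.arange(1, n + 1)  # ranks 1..n of the sorted values
    return (2.0 * np.sum(i * v) / (n * np.sum(v))) - (n + 1.0) / n

# Synthetic stand-in for block-level SEIV values (illustration only).
rng = np.random.default_rng(0)
seiv_values = rng.beta(4, 4, size=100_000)
print(f"Gini coefficient: {gini(seiv_values):.2f}")
```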

Fig. 2: Lorenz curve properties for the SEIV (socio-economic-infrastructure vulnerability) index distribution over the CONUS (conterminous United States).

The Lorenz curve and equality line are shown in red and blue, respectively.

Among the states across the CONUS, Idaho (84%), Minnesota (82%), Ohio (76%), South Dakota (74%), New Hampshire (67%), Utah (67%), Rhode Island (66%), Louisiana (61%), North Dakota (60%), West Virginia (58%), and Alabama (57%) are the states with the highest percentage of blocks with high and very high SEIV indices. This is despite the fact that Ohio and Minnesota are among the top 10 states in GDP (gross domestic product) ranking and have relatively good standing in income Gini ranking23,24. This demonstrates how aggregate-level indices can misguide the process of vulnerability assessment and further highlights the need for an index that provides reliable vulnerability information at the finest resolution. On the other hand, Delaware (75%), Maryland (69%), Florida (68%), Vermont (67%), Massachusetts (66%), California (65%), Connecticut (65%), Washington (61%), New Jersey (59%), and Colorado (57%) are the top 10 states showing low and very low SEIV indices.

Contribution of various components to the overall vulnerability

Vulnerability components do not contribute equally to the adverse impacts of natural hazards. Here, we waived the conventional assumption of the equal contribution of underlying components to overall vulnerability (e.g., via the rank percentile method in SVI). Our analysis, based on reported property damage costs, uses an ML algorithm to assign weights to the various variables that contribute to vulnerability. The proposed model performs well in estimating the damage from validation events, which are not used in training. The results based on our model show a difference in the relative contribution of the components involved (Fig. 3). The medical care distance, persons 65 years and older, distance to a shelter, percent minority (e.g., various ethnicities and races), and percent old built units are found to be the descriptors most positively correlated with vulnerability (i.e., the larger these are, the higher the vulnerability). In contrast, the percent high income, percent recently built units, and average cash rent are the strongest inversely related predictors of vulnerability. The weights obtained from this approach help reduce subjectivity in determining the relative contribution of vulnerability factors and thus help provide a more realistic depiction of vulnerability distribution across the CONUS than what was available before.

Fig. 3: Contribution range of various factors in the Socio-Economic-Infrastructure Vulnerability (SEIV) index based on random forest (RF) regression. Blue (red) represents a positive (negative) contribution to SEIV.

The bounds of the boxes are the 25th and 75th percentiles, while the whiskers represent the minima and maxima. These box plot statistics are derived from n = 130,481 independent events.

It should be noted that the detected patterns of contribution for the different components (i.e., increasing/decreasing with the vulnerability of an area) are conditioned on the pre-filtering practice implemented in this study, which disregards events with very high and very low reported costs in the process of training. This means that, in this analysis, given that events with more than $250 M in reported damage are excluded from the training population, higher incomes/housing costs and lower incomes/housing costs have a negative and positive contribution, respectively, to property damage from storm events, which is contrary to similar research in this topical area (i.e., Bolin and Stanford 199825; Tierney 200626; Osberghaus 202127). Hence, our results are valid over the medium range of hazard events, in which high-income households with sufficient resources and flexible planning capacities are less vulnerable to hazard situations and are able to cope with and recover from them much more easily than low-income families. Such a difference between our recognized patterns and those of similar studies stems from the nonlinear relationship between income/housing cost and property damage over different ranges of event costs.

Various factors might have contributed to the current distribution of vulnerability. Lack of both physical (e.g., transportation) and nonphysical infrastructure (e.g., insurance) forces low-income families to dwell in hazard-prone areas, and their limited recovery capacity exacerbates the situation over time, especially when major hazards hit the area28. On the other hand, high-income households with sufficient resources and flexible planning capacities avoid such losses over time, so the inequality of vulnerability deepens, especially after major disasters. Infrastructural characteristics play a crucial role in resilience management29. In fact, vulnerability depends on the capacity of communities to cope with and recover from hazards. Without appropriate characterization of these aspects, i.e., the accessibility of critical facilities during and after hazards, vulnerability assessment based solely on social and economic indices may mischaracterize vulnerability. SEIV improves upon antecedent indices by integrating information about various categories of building counts (e.g., multidwellings, manufactured housing, nursing homes, and nonprofits) and distances to essential facilities (i.e., emergency operation centers, shelters, medical care, fire stations, and police stations). This extra layer of information is a significant step toward a more comprehensive vulnerability assessment, which was not possible when information was aggregated at larger levels. In fact, the block-level assessment of vulnerability enables such detailed analysis of infrastructural heterogeneity and thus significantly increases the reliability of the assessment results for decision-making. SEIV, which takes infrastructural characteristics into account, helps close this gap via communicable assessment at the scale relevant to bottom-up resilience initiatives.

Discussion

The proposed SEIV index is a valid vulnerability index that meets the widely accepted criteria of practicality, transparency, interpretability, relevance, theoretical consistency, internal consistency, and external consistency30. SEIV is practical and transparent, as the data and method are both publicly available, so the results are reproducible and usable by interested scientists and practitioners. The range of variability (between 0 and 1) makes SEIV interpretable: the closer to zero, the less vulnerable the community, and as the number grows toward one, vulnerability increases. Such a straightforward indexing system provides interpretable information and thus eases the process of risk assessment and communication among potential end users. Relevance is ensured in SEIV index development through an expert opinion-based input variable selection procedure. The SEIV index draws on existing social, economic, and infrastructure indicators available via federally supported databases to improve upon vulnerability indices currently used for natural hazard risk assessment. To conduct a more comprehensive assessment and account for the accessibility of critical infrastructure during hazardous situations, in addition to the commonly used demographic and housing information, for each block we considered the building count and the distance to the nearest emergency/essential facilities. Therefore, considering the concept of vulnerability assessment, all variables used in the SEIV index play a major role in quantifying the susceptibility of an individual, a community, and/or their assets to the impacts of hazards. SEIV is theoretically consistent, as it is estimated based on the classic theory of risk that characterizes risk as the combined effect of hazard and vulnerability. While there exists no adequate criterion/measure to explicitly quantify vulnerability, one can reasonably use proxies (i.e., reported damage) to characterize the risk and thereby validate the proposed measure of vulnerability. Moreover, here we use the Random Forest (RF) algorithm, which is widely used in various fields of science (e.g., Biau and Scornet, 2016; Abbaszadeh et al., 2019)31,32, to determine the contribution of the selected variables to the SEIV index in terms of built-in feature importance weights. To check the internal consistency and robustness criteria, we use K-fold cross-validation (with K = 10) to estimate the skill of the model on unseen data; this approach helps stabilize the variability of accuracy estimates33. The performance metrics in estimating the damage from validation events, which are not used in training, help appropriately evaluate the external consistency of the proposed indexing system. However, as mentioned earlier, in the absence of an explicit quantification of vulnerability, a thorough analysis of this type of consistency is not straightforward and must be done through proxy measures.

There are challenges in keeping the SEIV index up to date. First, census block boundaries, defined by visible and invisible features, change as urban features are added or lost due to urbanization or in response to natural disasters. Such changes and their associated data updates (via Hazus, for example) are released within 3–4 years after each decadal census revision34. The inconsistency of such data updates with the changes made by anthropogenic or natural processes poses a challenge in keeping the SEIV database up to date.

Incompleteness and/or lack of comprehensiveness of data at the block level is another challenge to tackle for the most accurate SEIV indexing system. For a more comprehensive SEIV index, access to important aspects of vulnerability information, such as disability, education, unemployment, or language status, which are not currently available at the block level, would be very helpful. Additionally, for more accurate training of ML algorithms, access to storm event damage data that provides, in addition to latitude and longitude, information on the spatial extent over which damage has been estimated would be helpful.

Also, the inherent noise in the Hazus data that stems from the nature of block-level resolution in the census is another consideration for this index. Census block demographic data have notable noise infused to protect privacy; this introduces some uncertainty into the SEIV index estimated using block-level data in this study. Although it has not been done yet, denoising the publicly available data, or accessing the raw data for an in-depth analysis of variability in the various factors contributing to socio-economic and infrastructure vulnerability, would help overcome this issue. Moreover, it should be noted that block-level Hazus data are based on Longitudinal Employer-Household Dynamics data with a set of known nonsampling errors, including misreported data, late reporters whose records are missing and imputed, and geographic/industry edits and imputations. Where fine-resolution data with good accuracy are available, a thorough uncertainty analysis to determine the contribution of each of these potential sources of error would help produce better estimates of vulnerability.

While our aim here is to assess relative social, economic, and infrastructural vulnerability for all blocks over the entire CONUS and to propose a single weighting system, some forms of classification might prove helpful in follow-up studies. For example, some components that show a stronger positive correlation with vulnerability (e.g., manufactured housing counts, distance to emergency facilities, and percent of the population over 65) are characteristics more common in rural areas, while others that show a negative correlation with vulnerability are characteristics typically associated with urban/metropolitan areas (e.g., average home value, average cash rent, percent high income). Thus, stratification of results by urban-rural classification might provide valuable context for interpretation. However, the subjectivity of such stratification (urban vs. rural) across a large number of blocks over the CONUS would add significant uncertainty to the calculated index. Considering the urban/rural character of a block requires complete and comprehensive data with an agreed demarcation for determining whether any block over the CONUS belongs to an urban or rural area.

Methods

Providing the spatial distribution of block-level vulnerability across the CONUS is the main objective of this study. The descriptive social and economic characteristics data at the census block level, the smallest geographic area for which the Bureau of the Census collects and tabulates decennial census data, are available via the U.S. Census Bureau every 10 years34,35. Blocks are formed by visible features such as streets, roads, railroads, streams, and other visible physical and cultural features, and by nonvisible boundaries such as property lines, cities, school districts, and county limits shown on Census Bureau maps35. We obtained the descriptive components (demographics and building count by census block) from the Hazus data inventory through the Comprehensive Data Management System (CDMS). Hazus, a natural hazard modeling application developed by the Federal Emergency Management Agency (FEMA), provides information on potential loss estimates from floods, hurricanes, earthquakes, and tsunamis. CDMS is a complementary tool used to search for and transfer data into and out of a given Hazus state inventory dataset. The data available in Hazus 5.0, released on April 30, 2021, contain 59 demographic and 33 building count variables for 10,946,768 blocks over the CONUS. They are based on 2010 census data, modified by the National Structure Inventory (NSI) data developed by the United States Army Corps of Engineers Hydrologic Engineering Center, Flood Impact Assessment (USACE HEC-FIA). This modification, in coordination with FEMA, has been based on land cover patterns to include areas where structures are most likely to be found and to remove undeveloped areas, such as those covered by bodies of water, wetlands, or forests, to avoid overestimation of losses. Additionally, the General Building Stock (GBS) data in Hazus used here are based on RSMeans (a construction cost estimator's toolbox) version 201834.

One distinguishing aspect of SEIV is the consideration of the distance from each block to the nearest emergency facilities. For this purpose, the geographic information of emergency facilities, including fire stations, medical care facilities, shelters, and emergency operation centers, is obtained via Hazus, as updated by the 2019 Homeland Infrastructure Foundation-Level Data (HIFLD)34. We have also used the HIFLD database to obtain geographic information on police stations (local law enforcement locations) and national shelter system facilities that have been assigned as shelters by either FEMA or the American Red Cross. The geographic information of blocks (i.e., latitude and longitude) for distance calculations is available via the Census Bureau's block-level geographic information36. It should be noted that, had such data been available, the inclusion of transportation infrastructure and access modeling could have improved the outcome of this analysis compared to the currently implemented direct distance measure. While its development is beyond the scope of this work, such a model should also consider that some disastrous situations may interrupt the serviceability of transportation infrastructure, and an ideal model needs to take these factors into account for an appropriate accessibility assessment during hazardous situations.
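As an illustration of the direct distance measure, the great-circle distance from each block centroid to its nearest facility can be computed efficiently with a ball tree under the haversine metric. This is a minimal sketch with hypothetical coordinates, not the production pipeline used here:

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0

def nearest_facility_km(block_latlon: np.ndarray,
                        facility_latlon: np.ndarray) -> np.ndarray:
    """Great-circle distance (km) from each block centroid to the nearest
    facility. Both inputs are (n, 2) arrays of [lat, lon] in degrees."""
    tree = BallTree(np.radians(facility_latlon), metric="haversine")
    dist_rad, _ = tree.query(np.radians(block_latlon), k=1)  # arc distance, radians
    return dist_rad[:, 0] * EARTH_RADIUS_KM

# Hypothetical example: three block centroids and two shelters.
blocks = np.array([[29.76, -95.37], [29.80, -95.40], [29.70, -95.30]])
shelters = np.array([[29.75, -95.36], [29.82, -95.45]])
print(nearest_facility_km(blocks, shelters))
```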

To train and validate the developed ML algorithm, we use the National Oceanic and Atmospheric Administration (NOAA) Storm Events database, a comprehensive repository that provides information on different types of natural disasters between 2006 and 202037. We used the database to obtain the location and property damage for 130,481 events with estimated damage between $10,000 and $250 million to ensure that the regression analysis results remained valid over the range of common possibilities. Excluding incidents with very high (>$250 million) and very low (<$10,000) costs from the damage data helps prevent the effect of hazard intensity from overshadowing the contribution of vulnerability. In fact, a relatively high level of damage (i.e., greater than $250 million) requires large-scale weather events, under which a detailed analysis of the contribution of vulnerability components to the resulting risk is not possible25.

We developed a comprehensive block-level SEIV index that would be useful for high-resolution hydroclimatic risk assessment over the CONUS. Figure 4 demonstrates the overall flow of tasks/processes to calculate SEIV. To make the information consistent and reliable for analysis, we first need to sort the data obtained from various sources and select the variables necessary for this study. Among the 91 variables describing various aspects of socioeconomic vulnerability available via Hazus, we select the 38 most relevant components based on the existing literature14,16,26. Among those selected based on expert opinion, the percent of the female population is included because women's vulnerability and discrimination tend to become more visible during periods of societal disruption, such as catastrophic disasters, when the social structure is disturbed38. During and after a disaster, pregnancy, childbirth, insecurity, greater exposure to sexual violence, health issues, and coping mechanisms after a shock are some of the factors that make the female population vulnerable in critical situations38. Such a reduction in data dimension based on expert opinion ensures the avoidance of redundancy and irrelevance in the underlying variables39,40. These variables then go through a round of mixing to create 22 mixed variables that combine information across subcategories, as sketched below. For example, we combine the three original subcategories of income less than 10 K, income between 10 K and 20 K, and income between 20 K and 30 K into a mixed variable called low income (Table 1).
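A minimal sketch of this mixing step, with hypothetical column names standing in for the Hazus fields:

```python
import pandas as pd

# Hypothetical Hazus-style block table; column names are illustrative only.
blocks = pd.DataFrame({
    "block_id":       ["A", "B"],
    "income_lt_10k":  [12, 40],   # households with income < $10K
    "income_10k_20k": [30, 25],   # households with income $10K-$20K
    "income_20k_30k": [18, 15],   # households with income $20K-$30K
})

# Mix the three income subcategories into a single "low income" variable.
blocks["low_income"] = (
    blocks["income_lt_10k"] + blocks["income_10k_20k"] + blocks["income_20k_30k"]
)
print(blocks[["block_id", "low_income"]])
```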

Fig. 4: Flowchart of estimating the SEIV (socio-economic-infrastructure vulnerability) index.

FEMA's HAZUS refers to the natural hazard modeling application developed by the Federal Emergency Management Agency, and NOAA is the National Oceanic and Atmospheric Administration. The machine learning algorithm used here is the Random Forest.

Table 1 List of selected variables under each category

To prevent differences in range from dominating the assessment of the variables' relative contribution to overall vulnerability, we normalize each of the selected/mixed variables through the min–max feature scaling equation:

$$x^{\prime} = \frac{x - \min(x)}{\max(x) - \min(x)}$$
(1)

where \(x\) is the original value of the mixed variable, and \(x^{\prime}\) is the normalized value. Such normalization helps deal with different measuring units and with outlier problems41,42. These mixed variables then need to be fed into an ML algorithm (here, random forest regression) to determine the relative contribution of each to the estimated SEIV. To prepare the set of selected variables for regression analysis, we first need to evaluate the severity of multicollinearity among the variables involved. Within a given dataset, highly correlated variables may contain similar information about the variance and thus may adversely affect the efficiency of regression analysis43,44. For this purpose, we have implemented the VIF, which helps detect the multicollinearity of variables within a given dataset and avoids collinearity problems in regression45. VIF is calculated as:

$${\mathrm{VIF}}_{i} = \frac{1}{1 - R_{i}^{2}}$$
(2)

where \(R_{i}^{2}\) is the coefficient of determination of the regression of variable \(i\) on all other predictor variables. Large VIF values (i.e., >10) indicate significant multicollinearity in the dataset, which should be addressed by removing the correlated variables45. By implementing the VIF, we detected multicollinearity in the set of regression variables. Therefore, we removed the components with VIFs above 10 from the index calculation, leaving 19 final variables.
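A minimal sketch of the scaling and VIF screening steps using statsmodels; the one-at-a-time removal loop is a common recipe and an assumption about the exact procedure, as the text states only that variables with VIF above 10 were removed:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Eq. (1): rescale every column to the [0, 1] range."""
    return (df - df.min()) / (df.max() - df.min())

def drop_multicollinear(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Drop the variable with the largest VIF until all VIFs <= threshold."""
    cols = list(df.columns)
    while True:
        X = sm.add_constant(df[cols].to_numpy())
        # Index i + 1 skips the constant column prepended above.
        vifs = [variance_inflation_factor(X, i + 1) for i in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            return df[cols]
        cols.pop(worst)

# Hypothetical usage with a table `mixed` holding the 22 mixed variables:
# final_vars = drop_multicollinear(min_max_scale(mixed))  # expect 19 columns left
```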

To determine the contribution of the final selected variables to the SEIV index, we used a random forest (RF) algorithm. RF is an ensemble learning method that uses a multitude of decision trees at training time to improve the performance of the model31. RF can be used for both classification and regression problems; for regression, the output of RF is the mean prediction of the individual trees46. The RF algorithm has a built-in feature importance that describes which features are most relevant. For regression, the feature selection for internal nodes of each tree is based on variance reduction. Thus, for each feature, we can measure how much, on average, it decreases the impurity; the feature with the highest average decrease is the most relevant and important47. To calculate the feature importance, we train and validate the RF algorithm using storm property damage data provided via the NOAA Storm Events database. Since the damage estimates are reported at specific latitudes and longitudes, we need to define a buffer zone around the reporting point to match the damages to our vulnerability estimates within that given area. In fact, the buffer zone serves as a spatial sampling domain over which blocks carry vulnerability information associated with the reported damage. We found a buffer zone with a radius of 0.1 degrees around the reporting point suitable for selecting blocks that could have contributed significantly to the reported damage during an event.
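As a sketch of this step, the following trains an RF regressor and reads off the impurity-based importances with scikit-learn. The data are synthetic stand-ins for the buffer-zone features and reported damage; the 80-tree, 80/20 configuration follows the settings described below:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: rows are (event, buffer-zone) samples; columns are the
# 19 scaled vulnerability variables plus a synthetic hazard feature.
rng = np.random.default_rng(0)
X = rng.random((5000, 20))
y = X @ rng.random(20) + 0.1 * rng.standard_normal(5000)  # mock damage target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rf = RandomForestRegressor(n_estimators=80, random_state=0)
rf.fit(X_train, y_train)

print("R^2 on held-out data:", rf.score(X_test, y_test))
# Impurity-based importances (summing to 1) serve as candidate SEIV weights.
print("Feature importances:", rf.feature_importances_)
```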

For a reported damaging event at a given location, the estimated damage and vulnerability may not be identical, and hazard characteristics play a major role in the extent of the damage observed. Thus, we generated 1000 synthetic hazard scenarios and picked the ensemble median of the resulting weights for the vulnerability calculation. In the absence of real hazard intensity observations and of a unifying system that makes hazard information comparable across various types of hazards (i.e., tornados, wildfires, and floods), these synthetic hazard scenarios probabilistically estimate the range of possibilities for the underlying hazard intensities that could have contributed to a given level of damage. To represent the interconnectedness between hazard and damage in the process of hazard scenario generation, we follow this recipe for generating synthetic scenarios using the generalized extreme value (GEV) distribution. Several studies support the application of the GEV to describe the underlying distribution of different hazards (see KJ Beven et al. 2018, for example). To determine the location parameter of the GEV, we multiply the damage values by a fixed number (here 0.33) to ensure a strong correlation exists between damage and the synthetic hazard, and then normalize them for statistical analysis. Next, for a given location, we generate 1000 random hazards with scale parameters ranging between 0.05 and 0.1. Hazard scenarios generated by the GEV with scale parameters beyond this range no longer appear related to damage, so considering them in the ML would not be meaningful. These synthetic hazard scenarios are then considered as an input feature to the RF algorithm. The ensemble median of the 1000 simulations provides the importance weights for the SEIV index calculations.
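A minimal sketch of this scenario-generation recipe with SciPy; the GEV shape parameter is not stated in the text, so the Gumbel limit (shape = 0) is assumed here, and the damage values are stand-ins:

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(7)

# Stand-in for the normalized reported damage of the training events.
damage_norm = rng.random(500)

loc = 0.33 * damage_norm                    # location tied to damage (factor 0.33)
scales = rng.uniform(0.05, 0.1, size=1000)  # scale parameters in the stated range

# One synthetic hazard intensity per event for each of the 1000 scenarios.
# The GEV shape parameter is an assumption (c = 0, the Gumbel limit).
scenarios = np.stack([
    genextreme.rvs(c=0.0, loc=loc, scale=s, random_state=rng) for s in scales
])                                          # shape: (1000, n_events)

# Each scenario row is appended to the predictors for a separate RF run;
# the ensemble median of the 1000 resulting importance-weight vectors is
# what feeds the SEIV calculation.
```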

Using a trial-and-error approach, we consider 80 decision trees for the algorithm, with 80% of the data for training and 20% for testing the model. To evaluate the performance of the algorithm, we used K-fold cross-validation (with K = 10). K-fold cross-validation is a method commonly used to estimate the skill of ML models on unseen data. The data set is first randomly divided into K subsets of equal sample size. Then, each fold is taken in turn as the test set, with the other K−1 folds taken as the training set; the model is fit to the training data and evaluated on the test data48. The performance of the model is the mean of the model skill scores across the K cross-validation runs. This approach helps stabilize the variability of accuracy estimates33.
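A minimal cross-validation sketch under these settings, again using hypothetical stand-in data rather than the actual damage records:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical predictors/targets standing in for the buffer-zone
# vulnerability features and normalized reported damage.
rng = np.random.default_rng(42)
X = rng.random((5000, 20))
y = X @ rng.random(20) + 0.1 * rng.standard_normal(5000)

rf = RandomForestRegressor(n_estimators=80, random_state=0)  # 80 trees, as above
scores = cross_val_score(rf, X, y, cv=10, scoring="r2")      # K = 10 folds
print(f"Mean R^2 across folds: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Then, the estimated weight from the RF algorithm (Wi) for each selected variable i is used to calculate the SEIV index for each block in the CONUS by: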

$${\mathrm{SEIV}} = \sum_{i=1}^{19} W_{i} X^{\prime}_{i}$$
(3)

where \(X^{\prime}_{i}\) is the normalized magnitude of the selected variable \(X_{i}\). SEIV ranges from 0 to 1; the higher the number, the more vulnerable the block is estimated to be to natural hazards.

In summary, the SEIV index is obtained through the following steps: (1) normalize each of the selected/mixed variables through min–max feature scaling \((X^{\prime}_{i})\) to remove the effect of differing ranges; (2) implement the VIF to evaluate the severity of multicollinearity among the variables and remove the highly correlated ones; (3) run the RF algorithm to determine the contribution of the final selected variables \((W_{i})\); (4) interpret the influence of each component on socio-economic-infrastructure vulnerability and assign signs of positive/negative contribution to them; (5) calculate the SEIV index for each block in the CONUS as the sum, over the 19 variables, of the estimated RF weight \((W_{i})\) multiplied by the normalized variable \((X^{\prime}_{i})\).
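A minimal sketch of step (5), i.e., Eq. (3), with stand-in weights and scaled variables; the sign assignment of step (4) is assumed to be already folded into the weights here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: 19 normalized variables per block and RF-derived weights.
# Weights are normalized to sum to 1 so that SEIV stays within [0, 1].
X_scaled = rng.random((1_000_000, 19))  # one row per census block
weights = rng.dirichlet(np.ones(19))    # hypothetical importance weights

seiv = X_scaled @ weights               # Eq. (3): one SEIV value per block
```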

The blocks across the CONUS are classified based on their estimated SEIV quantile \(q_{\mathrm{SEIV}}\) into very low (\(q_{\mathrm{SEIV}} < 0.2\)), low (\(0.2 \le q_{\mathrm{SEIV}} < 0.4\)), medium (\(0.4 \le q_{\mathrm{SEIV}} < 0.6\)), high (\(0.6 \le q_{\mathrm{SEIV}} < 0.8\)), and very high (\(q_{\mathrm{SEIV}} \ge 0.8\)).
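This quantile binning is straightforward with pandas; a sketch with stand-in SEIV values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
seiv = pd.Series(rng.random(1_000_000))  # stand-in block-level SEIV values

# Five equal-frequency quantile bins matching the categories above.
labels = ["very low", "low", "medium", "high", "very high"]
seiv_class = pd.qcut(seiv, q=5, labels=labels)
print(seiv_class.value_counts())
```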

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.