Network recovery based on system crash early warning in a cascading failure model

Zhou, Dong; Elmokashfi, Ahmed

doi:10.1038/s41598-018-25591-6

Article
Open access
Published: 10 May 2018

Scientific Reports volume 8, Article number: 7443 (2018)

Abstract

This paper investigates the possibility of saving a network that is predicted to undergo a cascading failure that will eventually lead to a total collapse. We model cascading failures using the recently proposed KQ model, predict an impending total collapse by monitoring critical slowing down indicators, and then attempt to prevent the collapse by adding new nodes to the network. To this end, we systematically evaluate five node addition rules and the effects of intervention delay and network degree heterogeneity. Surprisingly, unlike for random homogeneous networks, we find that a delayed intervention is preferred for saving scale-free networks. We also find that for homogeneous networks, the best strategy is to wire newly added nodes to existing nodes in a uniformly random manner. For heterogeneous networks, however, a random selection of nodes based on their degree mostly outperforms a uniform random selection. These results provide new insights into restoring networks by adding nodes after observing early warnings of an impending complete breakdown.

Introduction

Cascading failures and recovery from them are among the most popular research directions in network science. Recently, percolation theory has been widely used to model cascading failures in interdependent networks, where failures propagate among networks through predefined dependency links^{1,2,3,4,5,6,7,8,9}. Overload-triggered cascades in single or coupled networks have also been the subject of much work in the past decade^{10,11,12,13,14,15,16,17,18,19,20,21}. Besides these models, k-core cascades and sandpile models have been employed to understand failure propagation and system collapse^{22,23,24,25,26,27}. Based on these modeling frameworks, different approaches to system repair have also been studied. Most of these works include rules for restoring nodes that fail during the cascading process^{28,29,30,31,32,33}. For example, A. Majdandzic, et al. in 2014 presented a model where a node recovers from an internal or external failure after a fixed period of time. This model leads to an interesting phase-flipping phenomenon, as well as strong hysteresis²⁸. The model was later extended with a randomized recovery method³¹. More recently, M.A. Di Muro, et al. studied a node repair strategy for interdependent random networks, where a failed node can be repaired with a certain probability if it is part of the current giant connected component³². A. Majdandzic, et al. further studied a cascade and node recovery model for multi-layer interacting networks and also investigated the optimal repair strategy for a collapsed coupled system³³.

Many cascading failure models exhibit the interesting phenomenon of “critical slowing down”: systems near criticality can experience a much longer cascading process (the so-called “plateau stage”), which is sensitive to noise, before a final total collapse^{1,34,35,36}. For example, D. Zhou, et al. studied the branching process behind critical cascading failures in interdependent networks and showed the critical and non-critical scaling rules of the total cascade length³⁴. In addition, G.J. Baxter, et al. studied the critical and non-critical dynamics of the k-core pruning process³⁵. Recently, D. Lee, et al. presented a universal mechanism for hybrid percolation transitions and investigated the resulting critical cascading process³⁶. These studies mainly focused on characterizing the duration of the critical slowing down phase. Separately, early warning indicators of system transitions based on critical slowing down have been evaluated for many real systems^{37,38,39,40,41,42}. This technique has also been used to predict system collapse in cascading failure models. For example, B. Podobnik, et al. studied indicators for predicting total collapses in a cascading failure model on random networks⁴³.

Although the critical slowing down phenomenon has been leveraged to provide indicators of impending cascades, an important question remains open: how can a system be restored after an early warning has been recorded? In this work, we attempt to answer this question. To this end, we systematically explore several system recovery strategies applied after observing an early warning of a total system crash. We base our work on the recently proposed cascading failure model by Y. Yu, et al.⁴⁴. This model is an extension of the k-core cascade, where a node is removed from the network with probability f if it has fewer than k_s connections or if it has lost more than a fraction q of its original neighbors. Further, as in⁴³, we employ the moving standard deviation (MVSD) of the remaining system size time series as an early indicator of an impending cascade. We then compare five node-addition-based recovery strategies and study the effect of response time delay on system recovery. We find that, for homogeneous Erdős–Rényi (ER) networks, an earlier node addition leads to a larger survival ratio. For scale-free (SF) networks, however, a delayed recovery can be better in some cases. We also find, for ER networks, that it is always better to connect the newly added nodes to existing nodes in a uniformly random manner. For SF networks, in contrast, a roulette selection based on each node’s original degree (or its reciprocal) can perform better for earlier node additions. These results provide insights into how to save a system that has been predicted to collapse.

Results

Cascading failure model and recovery strategies

In this work, we follow the KQ modeling framework of system crash introduced by Y. Yu, et al.⁴⁴. A node is removed from the system with probability f if its current degree is smaller than a threshold k_s or if it has lost more than a fraction q of its original neighbors. The fraction of remaining nodes is used as a measure of system robustness. For certain parameter values, the KQ model exhibits a slow cascading failure process in a plateau stage (a pseudo-steady state) before an abrupt total collapse. In the following, we focus on such cases, since our goal is to investigate early warning indicators and compare system recovery strategies. We consider two cases: ER networks with 〈k〉 = 20, k_s = 11, q = 0.09 and f = 0.1, and SF networks with γ = 1.8, k_s = 5, q = 0.39 and f = 0.2. These parameter values are inspired by those used by Y. Yu, et al., who in turn based their choice on measurements from real-world systems. Figure 1 shows 30 realizations of the cascading failure process for both ER and SF networks, where S(t), t = 1, 2, …, denotes the fraction of remaining nodes at time step t. In both cases, the system is near criticality and has a plateau stage before reaching its final state (either a total collapse or survival near the plateau). Comparing Fig. 1(a) and Fig. 1(b), the ER case has a plateau at around \(S(t)\sim 1\), while the SF case has a plateau at around \(S(t)\sim 0.2\). The latter plateau is much lower because the heterogeneous degree distribution leads to more failures at the beginning than in ER networks. For ER networks, as long as the mean degree is significantly larger than the threshold k_s, the system experiences very few failures in the early time steps. In other words, the system size does not change significantly, which results in the observed plateau around 1.
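The removal rule above can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: it assumes a synchronous update in which the removal conditions of all nodes are evaluated on the state at the start of each time step, records S(t) after the t-th update, and uses a plain Erdős–Rényi G(n, p) generator with p = 〈k〉/(n − 1). The names `er_graph`, `kq_step` and `run_kq` are hypothetical.

```python
import random


def er_graph(n, mean_degree, rng):
    """Erdős–Rényi G(n, p) graph as an adjacency dict, with p = <k> / (n - 1)."""
    p = mean_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def kq_step(adj, alive, k0, k_s, q, f, rng):
    """One KQ time step (synchronous update, an assumption of this sketch).

    A node is susceptible if its current degree among alive nodes is below k_s,
    or if it has lost more than a fraction q of its k0 original neighbours.
    Each susceptible node is then removed with probability f.
    """
    to_remove = []
    for v in alive:
        k = sum(1 for u in adj[v] if u in alive)
        lost = (k0[v] - k) / k0[v] if k0[v] > 0 else 0.0
        if (k < k_s or lost > q) and rng.random() < f:
            to_remove.append(v)
    alive.difference_update(to_remove)  # remove only after every node was checked


def run_kq(adj, k_s, q, f, max_steps, rng):
    """Return S(1), S(2), ...: the fraction of remaining nodes after each step."""
    n = len(adj)
    k0 = {v: len(adj[v]) for v in adj}  # original degrees
    alive = set(adj)
    trajectory = []
    for _ in range(max_steps):
        kq_step(adj, alive, k0, k_s, q, f, rng)
        trajectory.append(len(alive) / n)
        if not alive:  # total collapse: nothing left to simulate
            break
    return trajectory
```

For instance, with `rng = random.Random(1)`, calling `run_kq(er_graph(1000, 20, rng), k_s=11, q=0.09, f=0.1, max_steps=300, rng=rng)` uses the ER parameters quoted above; the network size of 1000 is an arbitrary choice for illustration, since this section does not state the size used in the paper.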

To provide early warning indicators of a total collapse during the plateau stage, we need to capture both the beginning and the end of the plateau stage. To do this, we first define the moving standard deviation of S(t), MVSD(t), as the standard deviation (SD) of S(t) over a time window of length 5: S(t − 4), S(t − 3), …, S(t). For t ≤ 4, the first t values of S(t) are used instead. Values after the current time step t are not used, since we aim to provide early warning predictions based only on historical records. We use a short window of length 5 because some realizations for the SF network, as shown in Fig. 1(b), reach a total collapse within 20 time steps.
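The MVSD definition translates directly into a causal rolling statistic. A minimal sketch follows; the text does not say whether the population or the sample SD is used, so the population SD (`statistics.pstdev`) is an assumption here, chosen because it is defined for the single-value window at t = 1. The function name `mvsd` is hypothetical.

```python
import statistics


def mvsd(series, window=5):
    """Moving standard deviation MVSD(t) of a time series S(1), S(2), ...

    MVSD(t) is the SD of the trailing window S(t - window + 1), ..., S(t);
    for the first window - 1 steps, all values observed so far are used.
    Only past and current values enter, so the indicator is causal.
    The population SD is an assumption; the source does not name the estimator.
    """
    values = []
    for i in range(len(series)):
        start = max(0, i - window + 1)  # truncated window for the early steps
        values.append(statistics.pstdev(series[start:i + 1]))
    return values
```

Because each value depends only on the prefix of the series, the indicator can be updated online as new S(t) values arrive.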

We define the beginning of the plateau stage as T_start = 1 for the ER network case. For the SF network case, T_start is defined as the first time step at which MVSD(t) drops below 0.01. This threshold is motivated by the observation that, in most cases, the MVSD falls below 0.01 during the plateau stage. The end of the plateau stage, T_pred, is defined as the first time step at which MVSD(T_pred) > mean(MVSD(t = T_start, …, T_pred − 1)) + 3 · SD(MVSD(t = T_start, …, T_pred − 1)), i.e. the first time the MVSD exceeds the mean of its previous plateau values by more than three standard deviations. This definition is inspired by the observation that systems tend to show a continuously increasing SD when leaving the pseudo-steady state.
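Both plateau boundaries can be computed from a list of MVSD values with `mv[0] = MVSD(1)`. The sketch below follows the two definitions above; the `min_baseline` parameter is a hypothetical guard that is not in the text (with a single baseline value the SD is zero, so the literal rule fires on any increase), and setting `min_baseline=1` recovers the literal rule. The function name is likewise hypothetical.

```python
import statistics


def plateau_bounds(mv, start_threshold=None, n_sigma=3.0, min_baseline=2):
    """Return (T_start, T_pred) as 1-based time steps from MVSD values mv,
    where mv[0] = MVSD(1). Either entry is None if it is never reached.

    start_threshold=None: ER convention, T_start = 1.
    start_threshold=0.01: SF convention, T_start = first t with MVSD(t) < 0.01.
    T_pred: first t with MVSD(t) > mean + n_sigma * SD of MVSD(T_start..t-1).
    min_baseline (hypothetical guard, not from the text): minimum number of
    baseline values before the test is applied; min_baseline=1 is the literal rule.
    """
    if start_threshold is None:
        t_start = 1
    else:
        t_start = next((i + 1 for i, v in enumerate(mv) if v < start_threshold), None)
        if t_start is None:
            return None, None
    for t in range(t_start + min_baseline, len(mv) + 1):
        baseline = mv[t_start - 1:t - 1]  # MVSD(T_start), ..., MVSD(t - 1)
        threshold = statistics.mean(baseline) + n_sigma * statistics.pstdev(baseline)
        if mv[t - 1] > threshold:
            return t_start, t
    return t_start, None
```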

Following the prediction of the start and end of the plateau stage, we try to restore the system by adding N_a new nodes at time step t = T_pred + T_delay, where T_delay ≥ 1 is the time delay of the node addition process. Each additional node is connected to k_a remaining nodes; if fewer than k_a nodes remain, each additional node is connected to all of them. Next, we describe the strategies for wiring the newly added nodes.

“Uniformly random selection”: at time step T_pred + T_delay, each additional node is connected to k_a uniformly randomly sampled remaining nodes.

“Largest degree selection”: at time step T_pred + T_delay, each additional node is connected to k_a remaining nodes that had the largest degree values in the original network.

“Smallest degree selection”: at time step T_pred + T_delay, each additional node is connected to k_a remaining nodes that had the smallest degree values in the original network.

“Roulette selection”: at time step T_pred + T_delay, each additional node is connected to k_a randomly selected remaining nodes, and the probability that a remaining node is selected is proportional to its degree in the original network.

“Anti-roulette selection”: at time step T_pred + T_delay, each additional node is connected to k_a randomly selected remaining nodes, and the probability that a remaining node is selected is proportional to the reciprocal of its original degree.
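The five wiring rules can be sketched as a single selection function. This is an illustration under stated assumptions, not the authors' implementation: uniform selection draws k_a distinct nodes; roulette and anti-roulette make k_a independent draws with replacement (consistent with the remark in the next subsection that one remaining node can be selected multiple times), and repeated draws are assumed to collapse into a single link; ties in the two deterministic rules are broken by node label. The function name and rule labels are hypothetical.

```python
import random


def select_targets(rule, remaining, orig_degree, k_a, rng):
    """Return the set of remaining nodes that one new node is wired to.

    rule: one of 'uniform', 'largest', 'smallest', 'roulette', 'anti-roulette'
    (hypothetical labels for the five rules in the text).
    orig_degree: degree of each node in the original network.
    If fewer than k_a nodes remain, the new node connects to all of them.
    """
    nodes = sorted(remaining)  # fixed order for reproducibility and tie-breaking
    if len(nodes) < k_a:
        return set(nodes)
    if rule == "uniform":
        return set(rng.sample(nodes, k_a))  # k_a distinct nodes, equal probability
    if rule in ("largest", "smallest"):
        # Stable sort: nodes with equal original degree keep their label order.
        ranked = sorted(nodes, key=lambda v: orig_degree[v], reverse=(rule == "largest"))
        return set(ranked[:k_a])
    if rule == "roulette":
        weights = [orig_degree[v] for v in nodes]  # proportional to original degree
    elif rule == "anti-roulette":
        # Proportional to 1/degree; max(.., 1) guards against degree-0 nodes (assumption).
        weights = [1.0 / max(orig_degree[v], 1) for v in nodes]
    else:
        raise ValueError(f"unknown rule: {rule}")
    # k_a independent draws with replacement; duplicates collapse into one link,
    # so the new node may end up with fewer than k_a distinct neighbours.
    return set(rng.choices(nodes, weights=weights, k=k_a))
```

Calling this once per new node, with the set of nodes alive at T_pred + T_delay as `remaining`, reproduces the wiring step; whether new nodes also link to each other is not stated in the text, so the sketch leaves that out.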

We use a threshold d on the fraction of remaining nodes, S(t), to determine whether a realization of the cascading failure process ends in a total collapse. We set d = 0.5 for ER networks and d = 0.1 for SF networks; these thresholds correspond to half the system size at the respective pseudo-steady states. For each realization with a total collapse, we repeat the node addition independently M_a times and compute the survival ratio η, defined as the number of trials without a total collapse divided by M_a. For each node-addition trial that still ends in a total collapse, we also record the time step t = T_d at which S(t) drops below the threshold d. We repeat this process for M realizations. To illustrate the prediction and mitigation of a total collapse via node addition, Fig. 2 shows an example for an ER network and one for an SF network. Both examples use N_a = 100 and M_a = 10, but the values of k_a and T_delay differ between the networks: k_a = 30 and T_delay = 6 for the ER network, and k_a = 8 and T_delay = 5 for the SF network. These values were carefully chosen so that the survival ratio η does not take the extreme values 0 or 1. Node addition follows the uniformly random selection rule. The ER network survived in 8 of the 10 trials and the SF network in 9 of the 10 trials (see the middle and lower panels of Fig. 2(a,b)), giving survival ratios of 0.8 and 0.9, respectively.
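The bookkeeping for η and T_d reduces to a threshold test on each trial's S(t) trajectory. A minimal sketch, assuming a trial counts as a total collapse as soon as its S(t) falls below d (consistent with the definition of T_d above); the function names are hypothetical and the trajectories in the example are toy values, not simulation output.

```python
def collapse_time(trajectory, d):
    """T_d: first 1-based time step t with S(t) < d, or None if S(t) never drops below d."""
    for t, s in enumerate(trajectory, start=1):
        if s < d:
            return t
    return None


def survival_ratio(trajectories, d):
    """eta: number of node-addition trials without a total collapse, divided by M_a."""
    survived = sum(1 for traj in trajectories if collapse_time(traj, d) is None)
    return survived / len(trajectories)


# Toy trajectories mimicking the ER example of Fig. 2: 8 of M_a = 10 trials survive.
toy = [[1.0, 0.98, 0.97]] * 8 + [[1.0, 0.6, 0.2, 0.0]] * 2
print(survival_ratio(toy, d=0.5))     # 0.8
print(collapse_time(toy[-1], d=0.5))  # 3
```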

Comparisons of node addition rules

In the following, we investigate how different node addition rules impact the ability to recover a system with an impending total collapse for both ER and SF networks. We also study the role of the time delay T_delay.

First, we focus on the ER network case with N_a = 100 and different values of k_a. Figure 3(a–e) shows how the mean survival ratio 〈η〉 varies with T_delay for different values of k_a, for each of the five node addition rules. According to Fig. 3(a,d,e), for the three randomized selection rules, the survival ratio decreases from 1 to around 0 as T_delay increases or as k_a decreases. In contrast, Fig. 3(b,c) shows that the largest degree and smallest degree selection rules yield much lower survival ratios. This is because all N_a ⋅ k_a additional links are placed between the N_a new nodes and the same k_a remaining nodes with the largest or smallest original degrees. This leads to a final state of around N_a + k_a nodes, which is below the threshold d = 0.5. We also notice that for the roulette and anti-roulette selection rules, the survival ratio tends to decrease when k_a becomes too large. This may be related to the fact that, for each additional node, the same remaining node can be selected multiple times, which reduces the positive effect of node addition.
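The duplicate-selection effect can be quantified. Assuming, as a modeling choice, that the k_a roulette draws are independent and made with replacement, and that repeated draws produce a single link, linearity of expectation gives the expected number of distinct neighbours of a new node as Σ_i [1 − (1 − p_i)^{k_a}], where p_i is the selection probability of node i. The helper `expected_distinct` below is hypothetical and only illustrates the formula.

```python
def expected_distinct(weights, k_a):
    """Expected number of distinct nodes hit by k_a independent draws with
    replacement, where node i is drawn with probability w_i / sum(w).
    By linearity of expectation: sum_i [1 - (1 - p_i) ** k_a]."""
    total = float(sum(weights))
    return sum(1.0 - (1.0 - w / total) ** k_a for w in weights)


# Toy example: one hub of weight 100 plus 100 nodes of weight 1, k_a = 50 draws.
print(round(expected_distinct([100] + [1] * 100, 50), 1))
```

In this toy example, a new node ends up with only about 23 distinct neighbours out of 50 draws, because roughly half of the draws land on the hub; with strongly heterogeneous weights, raising k_a adds distinct links far more slowly than it raises the nominal cost.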

Figure 4(a–d) also compares the five node selection rules, but here each panel fixes T_delay and varies k_a. For example, Fig. 4(a) shows the results for T_delay = 1, i.e. an immediate system recovery, with k_a varying between 0 and 200. The uniformly random selection is evidently the best; the roulette and anti-roulette selection rules yield similar but slightly smaller survival ratios. According to Fig. 4(b–d), for larger T_delay values, the uniformly random selection consistently outperforms the roulette and anti-roulette selection rules. These results suggest that, for restoring an ER network, there is no need to select the nodes to connect to based on their degree.

Figures 5 and 6 show the same analyses as Figs 3 and 4, but for the SF case. We again add N_a = 100 nodes, with different k_a and T_delay values. Surprisingly, Fig. 5(a) shows that for the uniformly random selection, the survival ratio η does not decrease monotonically with T_delay but peaks at around T_delay = 11 for different k_a values. This means that, to prevent the total collapse of an SF network, a delayed recovery can sometimes be better. As shown in Fig. 5(d,e), the roulette and anti-roulette rules behave similarly. Moreover, for an immediate node addition, the roulette rule performs better than the other two randomized rules (this is explained below when we discuss Fig. 6). Finally, as shown in Fig. 5(b,c), the largest degree and smallest degree selection rules perform much better than in the ER case. This is because almost all of the N_a additional nodes and the k_a selected remaining nodes tend to survive when k_a is large enough compared to k_s. Since k_a ≥ k_s in this regime, the final state of about N_a + k_a nodes is at least N_a + k_s, which lies above the threshold d = 0.1 and leads to η ≈ 1.

The non-monotonic behavior of the mean survival ratio in Fig. 5(a) arises because increasing T_delay has two competing effects. On the one hand, a larger T_delay leaves a smaller remaining network before node addition, which tends to produce a smaller final system state after node addition. On the other hand, for larger T_delay, the fixed N_a ⋅ k_a new links are shared among fewer remaining nodes, so each remaining node is on average connected to more new nodes and receives a larger degree increment. To demonstrate this, Supplementary Figs 1 and 2 show the distributions of S(t) and node degrees before adding new nodes, as well as of the final state after node addition, for the ER and SF cases, respectively. Supplementary Fig. 1(a) shows the PDF of S(t) before node addition for different T_delay values. Supplementary Fig. 1(b,c) shows the PDF and the CDF of the degrees in the remaining network before adding nodes. Supplementary Fig. 1(d) shows the PDF of the final state after node addition under the uniformly random selection with N_a = 100, k_a = 100 and M_a = 10. Supplementary Fig. 2(a–d) shows the same as Supplementary Fig. 1(a–d) for the SF network case.

We find that for the ER case, the second effect of increasing T_delay (larger degree increments) is weak. Consequently, for most systems at T_delay = 21 and T_delay = 31, the remaining system size before node addition plus the 100 new nodes stays below the threshold d = 0.5, so larger degree increments do not help to increase the survival ratio. For the SF case, however, the fraction of remaining nodes with small degrees before node addition is non-negligible even for T_delay = 1, so larger degree increments are more helpful than in the ER case. For T_delay = 1 and T_delay = 5, the degree increment of each remaining node is still not large enough to save it. For T_delay = 9, thanks to the larger degree increments, most final states are not at 0 but around 0.11, which is above the threshold d = 0.1 and leads to a larger survival ratio η. For T_delay = 13 and T_delay = 17, the first effect (reduced remaining system size) dominates, as in the ER case, and consequently most final system states are below the threshold d = 0.1.
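The second effect can be made explicit with a back-of-the-envelope estimate. Assume uniformly random selection and let R denote the number of remaining nodes just before node addition, with R ≥ k_a (R is our notation, not the paper's). Each new node picks a given remaining node with probability k_a/R, so summing over the N_a new nodes, the expected degree increment of a remaining node is

\[
\mathbb{E}[\Delta k] \;=\; N_a \cdot \frac{k_a}{R} \;=\; \frac{N_a\,k_a}{R}.
\]

For a fixed budget N_a ⋅ k_a, halving R doubles the expected increment: a longer delay shrinks R and so gives each surviving node more new links, while the same shrinkage lowers the final system size.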

Similar to Fig. 4, Fig. 6(a–d) compares the five selection rules for the SF case at different time delays. For T_delay = 1, the roulette selection is better than the anti-roulette and uniformly random selections. At T_delay = 5, however, the anti-roulette selection is better than the other two randomized rules. For larger T_delay, the uniformly random selection becomes the best. This behavior differs from the ER case. To interpret these findings, we consider the degree distribution of the surviving SF network just before node addition. At T_delay = 1 (see Supplementary Fig. 2(c)), the remaining nodes that satisfy the removal conditions are only a small fraction of all remaining nodes. It is therefore more important to add links to the original hub nodes, which support the connectivity of the remaining network. At T_delay = 5, the remaining network before node addition contains a much larger fraction of nodes with small degrees. Consequently, the anti-roulette rule is better, since it reinforces more of these susceptible nodes. Finally, for T_delay = 9 or T_delay = 13, the roulette and anti-roulette selection rules are worse than the uniformly random one, because both original hub nodes and original low-degree nodes tend to satisfy the removal conditions. These intricate effects of the time delay T_delay are not observed in the ER case, since ER networks retain a homogeneous degree distribution before node addition.

The above results can also be viewed in light of the total “cost” of the recovery process. In real-world social networks, the cost of introducing one more individual (node) is largely determined by his or her importance: it costs much more to bring famous people into the system. We can therefore assume that the cost of adding a node is proportional to its degree, i.e. its number of connections to surviving nodes. This is equivalent to defining the cost of each additional node as k_a and the total cost of system recovery as N_a ⋅ k_a. According to Figs 4 and 6, for recovering a homogeneous network, the uniformly random selection rule performs best, since it reaches higher survival ratios at a lower total cost (controlled by the parameter k_a). For an early, an intermediate, or a late recovery of an SF network, the roulette, anti-roulette, or uniformly random selection rule, respectively, yields larger survival ratios at a lower cost.

Tradeoffs between the number of additional nodes and their degree

In this subsection, we investigate the tradeoff between N_a and k_a for a fixed total cost. On the one hand, a larger N_a tends to produce a larger final system state, which helps system recovery. On the other hand, a larger k_a makes the additional nodes more robust. It is therefore important to know which parameter matters more for the survival ratio η. Here we only show results for the three randomized node selection rules in order to focus on non-trivial results.

Figure 7(a–c) shows, for the ER case, how the mean survival ratio changes with N_a for a fixed total cost N_a ⋅ k_a = 5000 and a set of T_delay values. For the uniformly random selection, the survival ratio is not strongly affected by N_a for any of the T_delay values, except for very large N_a (see Fig. 7(a)). This is because, at a fixed total cost, k_a shrinks as N_a grows and eventually falls below k_s = 11, so that the new nodes themselves satisfy the removal condition. For the roulette and anti-roulette selection rules, the effect of N_a is similar to that for the uniformly random selection, except at small N_a values.
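The cut-off in Fig. 7(a) follows from simple arithmetic: under N_a ⋅ k_a = C, the condition k_a ≥ k_s holds only for N_a ≤ C/k_s. The sketch below assumes integer N_a and k_a (the text does not say how non-integer k_a would be handled); it enumerates exact budget splits and computes this cut-off. The function names are hypothetical.

```python
def cost_pairs(total_cost):
    """All integer pairs (N_a, k_a) with N_a * k_a == total_cost, ordered by N_a."""
    return [(n_a, total_cost // n_a)
            for n_a in range(1, total_cost + 1)
            if total_cost % n_a == 0]


def max_na_with_ka_at_least(total_cost, k_s):
    """Largest N_a for which k_a = total_cost / N_a still satisfies k_a >= k_s."""
    return total_cost // k_s


print(max_na_with_ka_at_least(5000, 11))  # ER budget and k_s: 454
print(max_na_with_ka_at_least(1200, 5))   # SF budget and k_s: 240
print([p for p in cost_pairs(1200) if 100 <= p[0] <= 150])  # [(100, 12), (120, 10), (150, 8)]
```

For the ER budget, any N_a above 454 pushes k_a below k_s = 11; for the SF budget of 1200 with k_s = 5, the cut-off is N_a = 240, well above the 100 to 150 range discussed next.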

Figure 8(a–c) shows the same as Fig. 7 but for the SF case with a total cost N_a ⋅ k_a = 1200. Here, N_a has a stronger impact on the mean survival ratio 〈η〉 than in the ER case. For the uniformly random selection, a very small N_a is preferred at T_delay = 1. However, the best-performing N_a rises to between 100 and 150 for T_delay = 5 or T_delay = 9, and rises further for T_delay = 13 and T_delay = 17 (see Fig. 8(a)). Thus, a more delayed system recovery requires a larger N_a and a smaller k_a; in other words, more additional nodes are needed to recover a system whose remaining size is smaller when the addition starts. The roulette and anti-roulette selection rules show a similar behavior (see Fig. 8(b,c)). These results offer guidance for restoring near-collapse systems under a fixed total cost.

Discussion

In this paper, we investigate the possibility of recovering networks that exhibit early warnings of a total collapse by adding new nodes. To this end, we model system collapse using the recently introduced KQ cascade model and employ the moving standard deviation of the remaining network size time series as an early indicator of an impending cascade. We use five rules for wiring the newly added nodes to existing nodes. These include three random rules: uniformly random, roulette and anti-roulette. The latter two connect a new node to randomly selected existing nodes with probability proportional and inversely proportional, respectively, to their degree in the original network. The five rules also include two deterministic rules that connect new nodes to the existing nodes with the largest and smallest degrees in the original network, respectively. We find that an early addition of nodes (i.e. immediately after observing early warning signals) is always better for preventing ER networks from a total collapse, which we attribute to their homogeneous degree distribution. SF networks, however, can benefit more from a delayed intervention, i.e. from starting the node addition after a certain time delay T_delay. Investigating the interplay between the five connection rules and T_delay shows that the uniformly random selection is always the best strategy for saving ER networks. For SF networks, the best wiring rule changes from roulette to anti-roulette, and finally to uniformly random, as T_delay increases. This complex interplay is a product of node degree heterogeneity in SF networks. Finally, we explore the balance between the number of added nodes N_a and their degree k_a needed to restore a collapsing system at a fixed cost N_a ⋅ k_a. We find that SF networks need more added nodes as T_delay increases, whereas N_a has minimal impact on the survival of ER networks.

Our findings provide insights into saving networks that are predicted to be approaching a total collapse. For example, the counterintuitive results for SF network restoration, i.e. the positive impact of a time delay, may be applicable to social structures such as companies and to other networks facing an impending cascade. Note that many real-world social networks are known to have heterogeneous structures.

Going forward, we plan to apply the proposed network recovery framework to other types of cascading failure models. These include overload-based cascades^{10,20}, which are known to exhibit slowing down near criticality. Furthermore, while the KQ cascade and node-addition-based recovery are more closely related to social networks like Facebook, it will be interesting to investigate failure models and recovery scenarios relevant to other systems. For example, cascades based on dependencies or overloads, with recovery by reconnecting failed nodes^{29,30,32,33}, are more applicable to systems with physical connections, such as power grids and traffic systems.

References

Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–8 (2010).
Article ADS CAS PubMed Google Scholar
Parshani, R., Buldyrev, S. V. & Havlin, S. Interdependent networks: reducing the coupling strength leads to a change from a first to second order percolation transition. Physical Review Letters 105, 048701 (2010).
Article ADS PubMed Google Scholar
Parshani, R., Buldyrev, S. V. & Havlin, S. Critical effect of dependency groups on the function of networks. Proc. Natl. Acad. Sci. USA 108, 1007–10 (2011).
Article ADS CAS PubMed Google Scholar
Gao, J., Buldyrev, S. V., Stanley, H. E. & Havlin, S. Networks formed from interdependent networks. Nature Physics 8, 40 (2012).
Article ADS CAS Google Scholar
Hu, Y., Ksherim, B., Cohen, R. & Havlin, S. Percolation in interdependent and interconnected networks: Abrupt change from second- to first-order transitions. Physical Review E 84, 066116 (2011).
Article ADS Google Scholar
Hu, Y. et al. Percolation of interdependent networks with intersimilarity. Physical Review E 88, 052805 (2013).
Article ADS Google Scholar
Reis, S. D. et al. Avoiding catastrophic failure in correlated networks of networks. Nature Physics 8, 762–767 (2014).
Article ADS Google Scholar
Feng, L., Monterola, C. P. & Hu, Y. The simplified self-consistent probabilities method for percolation and its application to interdependent networks. New Journal of Physics 17, 063025 (2015).
Article ADS Google Scholar
Yuan, X., Hu, Y., Stanley, H. E. & Havlin, S. Eradicating catastrophic collapse in interdependent networks via reinforced nodes. Proceedings of the National Academy of Sciences 114, 3311–5 (2017).
Article ADS CAS Google Scholar
Motter, A. E. & Lai, Y.-C. Cascade-based attacks on complex networks. Physical Review E 66, 065102 (2002).
Article ADS Google Scholar
Crucitti, P., Latora, V. & Marchiori, M. Model for cascading failures in complex networks. Physical Review E 69, 045104 (2004).
Article ADS Google Scholar
Motter, A. E. Cascade control and defense in complex networks. Physical Review Letters 93, 098701 (2004).
Article ADS PubMed Google Scholar
De Martino, D., Dall’Asta, L., Bianconi, G. & Marsili, M. Congestion phenomena on complex networks. Physical Review E 79, 015101 (2009).
Article ADS Google Scholar
Brummitt, C. D., D’Souza, R. M. & Leicht, E. A. Suppressing cascades of load in interdependent networks. Proceedings of the National Academy of Sciences 109, E680–E689 (2012).
Article ADS CAS Google Scholar
Tan, F., Xia, Y., Zhang, W. & Jin, X. Cascading failures of loads in interconnected networks under intentional attack. Europhysics Letters 102, 28009 (2013).
Article ADS CAS Google Scholar
Li, D., Jiang, Y., Kang, R. & Havlin, S. Spatial correlation analysis of cascading failures: congestions and blackouts. Sci. Rep. 4, 5381 (2014).
Google Scholar
Tan, F., Wu, J., Xia, Y. & Tse, C. K. Traffic congestion in interconnected complex networks. Physical Review E 89, 062813 (2014).
Article ADS Google Scholar
Chen, Z., Zhang, J., Du, W.-B., Lordan, O. & Tang, J. Optimal allocation of node capacity in cascade-robustness networks. PLoS ONE 10, e0141360 (2015).
Article PubMed PubMed Central Google Scholar
Xia, Y., Zhang, W. & Zhang, X. The effect of capacity redundancy disparity on the robustness of interconnected networks. Physica A 447, 561–568 (2016).
Article ADS Google Scholar
Zhao, J., Li, D., Sanhedrai, H., Cohen, R. & Havlin, S. Spatio-temporal propagation of cascading overload failures in spatially embedded networks. Nature Communications 7, 10094 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhou, D. & Elmokashfi, A. Overload-based cascades on multiplex networks and effects of inter-similarity. PLoS ONE 12, e0189624 (2017).
Article PubMed PubMed Central Google Scholar
Garcia, D., Mavrodiev, P. & Schweitzer, F. Social resilience in online communities: The autopsy of friendster. In Proceedings of the first ACM conference on Online social networks, 39–50 (ACM, 2013).
Dorogovtsev, S. N., Goltsev, A. V. & Mendes, J. F. F. K-core organization of complex networks. Physical Review Letters 96, 040601 (2006).
Article ADS CAS PubMed MATH Google Scholar
Bak, P., Tang, C. & Wiesenfeld, K. Self-organized criticality. Physical Review A 38, 364 (1988).
Article ADS MathSciNet CAS MATH Google Scholar
Goh, K.-I., Lee, D.-S., Kahng, B. & Kim, D. Sandpile on scale-free networks. Physical Review Letters 91, 148701 (2003).
Article ADS PubMed Google Scholar
Lee, K.-M., Goh, K.-I. & Kim, I.-M. Sandpiles on multiplex networks. Journal of the Korean Physical Society 60, 641–647 (2012).
Article ADS Google Scholar
Noël, P.-A., Brummitt, C. D. & D’Souza, R. M. Controlling self-organizing dynamics on networks using models that self-organize. Physical Review Letters 111, 078701 (2013).
Article ADS PubMed Google Scholar
Majdandzic, A. et al. Spontaneous recovery in dynamical networks. Nat. Phys. 10, 34 (2014).
Article CAS Google Scholar
Liu, C., Li, D., Zio, E. & Kang, R. A modeling framework for system restoration from cascading failures. PLoS ONE 9, e112363 (2014).
Article ADS PubMed PubMed Central Google Scholar
Liu, C. et al. Modeling of self-healing against cascading overload failures in complex networks. Europhysics Letters 107, 68003 (2014).
Article ADS Google Scholar
Böttcher, L., Lukovic′, M., Nagler, J., Havlin, S. & Herrmann, H. Failure and recovery in dynamical networks. Sci. Rep. 7, 41729 (2017).
Article ADS PubMed PubMed Central Google Scholar
Di Muro, M. A., La Rocca, C. E., Stanley, H. E., Havlin, S. & Braunstein, L. A. Recovery of Interdependent Networks. Sci. Rep. 6, 22834 (2016).
Article ADS PubMed PubMed Central Google Scholar
Majdandzic, A. et al. Multiple tipping points and optimal repairing in interacting networks. Nature Communications 7, 10850 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhou, D. et al. Simultaneous first-and second-order percolation transitions in interdependent networks. Physical Review E 90, 012803 (2014).
Baxter, G. J., Dorogovtsev, S. N., Lee, K.-E., Mendes, J. F. F. & Goltsev, A. V. Critical dynamics of the k-core pruning process. Phys. Rev. X 5, 031017 (2015).
Lee, D., Choi, W., Kertész, J. & Kahng, B. Universal mechanism for hybrid percolation transitions. Sci. Rep. 7, 5723 (2017).
Dakos, V. et al. Slowing down as an early warning signal for abrupt climate change. Proceedings of the National Academy of Sciences 105, 14308–14312 (2008).
Scheffer, M. et al. Early-warning signals for critical transitions. Nature 461, 53–59 (2009).
Dai, L., Vorselen, D., Korolev, K. S. & Gore, J. Generic indicators for loss of resilience before a tipping point leading to population collapse. Science 336, 1175–1177 (2012).
Scheffer, M. et al. Anticipating critical transitions. Science 338, 344–348 (2012).
Dakos, V. & Bascompte, J. Critical slowing down as early warning for the onset of collapse in mutualistic communities. Proceedings of the National Academy of Sciences 111, 17546–17551 (2014).
van de Leemput, I. A. et al. Critical slowing down as early warning for the onset and termination of depression. Proceedings of the National Academy of Sciences 111, 87–92 (2014).
Podobnik, B. et al. Predicting the Lifetime of Dynamic Networks Experiencing Persistent Random Attacks. Sci. Rep. 5, 14286 (2015).
Yu, Y. et al. System crash as dynamics of complex networks. Proceedings of the National Academy of Sciences 113, 11726–11731 (2016).


Acknowledgements

We thank the Norwegian Research Council for financial support through the DOMINOS project (Grant No. 240850).

Author information

Authors and Affiliations

Simula Metropolitan CDE, Fornebu, 1364, Norway
Dong Zhou & Ahmed Elmokashfi


Contributions

A.E. conceived the study. Both authors conducted the simulations and experiments, analysed the results, and prepared the manuscript.

Corresponding author

Correspondence to Ahmed Elmokashfi.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Zhou, D., Elmokashfi, A. Network recovery based on system crash early warning in a cascading failure model. Sci Rep 8, 7443 (2018). https://doi.org/10.1038/s41598-018-25591-6


Received: 24 January 2018
Accepted: 19 April 2018
Published: 10 May 2018
DOI: https://doi.org/10.1038/s41598-018-25591-6

